OpenAI #Audio: Transcribe to WebVTT

OpenAI: Audio, Transcribe to WebVTT

Transcribes audio and video files into the WebVTT caption format, using the OpenAI API “whisper-1” model by default (configurable) to convert audio data into text. Set abbreviations and technical terms as the PROMPT for more accurate transcription.

Configs for this Auto Step

AuthzConfU: U: Select HTTP_Authz Setting (Secret API Key as “Fixed Value”) *
StrConfM: M: Set MODEL Name (default “whisper-1”)^#{EL}
SelectConfA1: A1: Select FILE for Source Audio/Video *
StrConfA2: A2: Set Request Summary PROMPT^#{EL}
StrConfB1: B1: Set Sampling Temperature (default “0”)^#{EL}
StrConfB2: B2: Set Language (default null)^#{EL}
SelectConfC1: C1: Select STRING that stores WebVTT Text (update)
SelectConfC2: C2: Select FILE that stores WebVTT File (append)
StrConfC3: C3: Set File Name (default “{pid}.vtt”)^#{EL}

Script

// GraalJS Script (engine type: 2)

//////// START "main()" /////////////////////////////////////////////////////////////////

main();
function main(){ 

////// == Config Retrieving ==
const strAuthzSetting   = configs.get      ( "AuthzConfU" );   /// REQUIRED
  engine.log( " AutomatedTask Config: Authz Setting: " + strAuthzSetting );
const strModel          = configs.get( "StrConfM" ) !== "" ?   // NotRequired
                          configs.get( "StrConfM" ) : "whisper-1"; // (default)
  engine.log( " AutomatedTask Config: OpenAI Model: " + strModel );
const filesPocketAudio  = configs.getObject( "SelectConfA1" ); /// REQUIRED
  let filesAudio        = engine.findData( filesPocketAudio );
  if( filesAudio      === null ) {
    throw new Error( "\n AutomatedTask UnexpectedFileError:" +
                     " No File {A1} is attached \n" );
  }else{ // java.util.ArrayList of QfileView
    engine.log( " AutomatedTask FilesArray {A1}: " +
                 filesAudio.size() + " file(s)" );
  }
const strPrompt         = configs.get      ( "StrConfA2" );    // NotRequired
const strTemperature    = configs.get      ( "StrConfB1" );    // NotRequired
const strLanguage       = configs.get      ( "StrConfB2" );    // NotRequired
const strPocketVtt      = configs.getObject( "SelectConfC1" ); // NotRequired
const filesPocketVtt    = configs.getObject( "SelectConfC2" ); // NotRequired
  let filesVtt          = engine.findData( filesPocketVtt ) ??
                          new java.util.ArrayList(); // if `null`, ArrayList of QfileView
const strSaveAs         = configs.get( "StrConfC3" ) !== "" ?   // NotRequired
                          configs.get( "StrConfC3" ) :
                          processInstance.getProcessInstanceId() + ".vtt"; // (default)


////// == Data Retrieving ==
// (Nothing. Retrieved via Expression Language in Config Retrieving)


////// == Calculating ==
//// OpenAI API > Documentation > API REFERENCE > AUDIO
//// https://platform.openai.com/docs/api-reference/audio

/// prepare request1
let request1Uri = "https://api.openai.com/v1/audio/transcriptions";
let request1 = httpClient.begin(); // HttpRequestWrapper
    request1 = request1.authSetting( strAuthzSetting ); // with "Authorization: Bearer XX"
    request1 = request1.multipart( "file", filesAudio.get(0) );
    request1 = request1.multipart( "model", strModel );
    request1 = request1.multipart( "response_format", "vtt" ); // "verbose_json" to Json
    if ( strPrompt !== "" ) {
      request1 = request1.multipart( "prompt",      strPrompt );
    }
    if ( strTemperature !== "" ) {
      request1 = request1.multipart( "temperature", strTemperature );
    }
    if ( strLanguage !== "" ) {
      request1 = request1.multipart( "language",    strLanguage  );
    }

/// try request1
engine.log( " AutomatedTask ApiRequest1 Start: " + request1Uri ); // log before sending
const response1     = request1.post( request1Uri ); // HttpResponseWrapper
const response1Code = response1.getStatusCode() + ""; // JavaNum to string
const response1Body = response1.getResponseAsString();
engine.log( " AutomatedTask ApiResponse1 Status: " + response1Code );
if( response1Code !== "200"){
  throw new Error( "\n AutomatedTask UnexpectedResponseError: " +
                    response1Code + "\n" + response1Body + "\n" );
}

/// append file to FILES data
filesVtt.add(
  new com.questetra.bpms.core.event.scripttask.NewQfile(
    strSaveAs, "text/vtt", response1Body
  )
);


////// == Data Updating ==

if( strPocketVtt !== null ){
  engine.setData( strPocketVtt,
                  response1Body
                );
}
if( filesPocketVtt !== null ){
  engine.setData( filesPocketVtt,
                  filesVtt
                );
}


} //////// END "main()" /////////////////////////////////////////////////////////////////


/*
Notes:
- If you place this "Automated Step" in the Workflow diagram,
    - the request will be automatically sent every time the process token arrives.
    - A request is automatically sent to the OpenAI API server. (REST API)
    - The response from the OpenAI API server is automatically parsed.
    - You can incorporate "AI assistance" into your business processes.
- Audio File: The file stored in the FILE-type data item is used as the audio source.
    - The first file is used as the audio source.
    - The second and subsequent files are not referenced.
    - Audio file to transcribe
        - "mp3", "mp4", "mpeg", "mpga", "m4a", "wav", or "webm"
        - e.g., a WebVTT file is automatically generated from a video file such as an mp4.
        - Consider the size limit on the API side.
            - Status 413: "Maximum content size limit (26214400)" (26,214,400 bytes = 25 MiB, about 26 MB) (as of 202303)
        - Note: Upload in Questetra BPM Suite is limited to 100MB (as of 202303)
- PROMPT: Text to improve the quality of the generated transcripts
    - The PROMPT should be in English.
    - Summary text for reference in English translation
        - Words or acronyms that the model often misrecognizes in the audio
- An API key is required to use the OpenAI API.
    - Get an API Key in advance.
    - Set "Secret API Key" to "HTTP Authz Setting" (Token Fixed Value)
- WebVTT:
    - Web Video Text Tracks is a World Wide Web Consortium (W3C) standard for displaying timed text in connection with the HTML5 `<track>` element. [Wikipedia](https://en.wikipedia.org/wiki/WebVTT)
    - Closed Captions: to make content more accessible e.g. to prevent discrimination.
    - Subtitles: e.g. for a French film screened in an English-speaking country.
- Model Name:
    - https://platform.openai.com/docs/models/model-endpoint-compatibility

APPENDIX:
- Sampling temperature
    - range: "[0,1]", default: "0"
- Language
    - ISO 639-1
    - e.g. "`en`", "`ja`", "`fr`", "`de`", "`pt`", "`es`", "`ko`", "`nl`", "`zh`", ...
    - https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes
- Headers for developers belonging to multiple organizations are not yet supported (as of 202303).
    - `OpenAI-Organization`
- PROMPT test
    - (no Prompt)
        - "I'm going to step off the land now. That's one small step for man, one giant leap for mankind."
    - "Captain Neil Armstrong became the first man to set foot on the Moon"
        - "I'm going to step off the land now. That's one small step for a man, one giant leap for mankind."
    - "Commander Neil Armstrong climbed down the ladder of the Lunar Module (LM)"
        - "I'm going to step off the LM now. That's one small step for a man, one giant leap for mankind."


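Notes-ja (English translation):
- If you place this [Automated Step] in the Workflow diagram, a request is sent automatically every time a process token arrives.
    - A request is automatically sent to the OpenAI API server. (REST API communication)
    - The response from the OpenAI API server is automatically parsed.
    - You can incorporate "AI assistance" into your business processes.
- Audio File: The file stored in the FILE-type data item is used as the audio source.
    - The first stored file is used as the audio source.
    - The second and subsequent files are not referenced.
    - Audio file (meeting recordings, etc.) formats
        - "mp3", "mp4", "mpeg", "mpga", "m4a", "wav", or "webm"
        - For example, a WebVTT file is automatically generated from a video file such as an mp4.
        - Mind the size limit on the API side.
            - Status 413: "Maximum content size limit (26214400)" (about 26MB) (as of 202303)
        - Also note that uploads in Questetra BPM Suite are limited to 100MB (as of 202303)
- Summary PROMPT: Register text that improves the quality of the generated text.
    - English is recommended for the PROMPT.
    - Summary text that serves as a reference for translation into English
        - Abbreviations, idioms, and other terms that the model tends to misrecognize
- An API key is required to use the OpenAI API.
    - Obtain an API Key in advance.
    - Setting the "Secret API Key": [HTTP Authz Setting] > [Token Fixed Value]
- WebVTT:
    - "Web Video Text Tracks" is a World Wide Web Consortium (W3C) standard for displaying "timed text".
    - It is associated with the media through the HTML5 `<track>` element. [Wikipedia](https://ja.wikipedia.org/wiki/WebVTT)
    - Closed captions: to improve accessibility (for example, to remove inconvenience for people who are hard of hearing)
    - Subtitles: captions that assume the viewer can hear the audio, such as translated subtitles when a French film is screened in an English-speaking country
- Model Name:
    - https://platform.openai.com/docs/models/model-endpoint-compatibility

APPENDIX-ja (English translation):
- Sampling temperature (temperature)
    - range: "[0,1]", default: "0"
- Language
    - ISO 639-1
    - e.g. "`en`", "`ja`", "`fr`", "`de`", "`pt`", "`es`", "`ko`", "`nl`", "`zh`", ...
    - https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes
- Headers for developers belonging to multiple organizations are not yet supported (as of 202303).
    - `OpenAI-Organization`
- PROMPT test (Japanese audio; 芝刈り "lawn mowing" and 柴刈り "gathering firewood" are both pronounced "shibakari")
    - (no Prompt)
        - "おじいさんは山へしばっかりに おばあさんも山へしばっかりに行きました。"
    - "お爺さんとお婆さんが芝刈りに行く話です" (prompt meaning: "A story about an old man and an old woman who go lawn mowing")
        - "お爺さんは山へ芝刈りに、お婆さんも山へ芝刈りに行きました。"
    - "お爺さんとお婆さんが柴刈りに行く話です" (prompt meaning: "A story about an old man and an old woman who go gathering firewood")
        - "お爺さんは山へ柴刈りに、お婆さんも山へ柴刈りに行きました。"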
*/

Download

openai-audio-transcribe-to-webvtt-2023.xml
- 2023-03-16 (C) Questetra, Inc. (MIT License)
openai-audio-transcribe-to-webvtt-202308.xml
- 2023-08-08 (C) Questetra, Inc. (MIT License)
- for “GraalJS standard (engine-type 3)” on v15.0 or above

Warning: Freely modifiable JavaScript (ECMAScript) code. No warranty of any kind.
(Installing Addon Auto-Steps is available only on the Professional edition.)

Audio/Video files for test
- ja-old-story-peach-boy.m4a
- YouTube: API時代の業務フロー図#5 分岐と分流の違い
  - learn-workflow-part5-show-converted.wav
  - learn-workflow-part5-show.-p1-p3-480p.mp4

Notes

If you place this automated step in the Workflow diagram,
- the request will be automatically sent every time the process token arrives.
- A request is automatically sent to the OpenAI API server. (REST API)
- The response from the OpenAI API server is automatically parsed.
- You can incorporate AI assistance into your business processes.
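
The request mentioned above carries a small set of multipart form fields. The following is a minimal sketch in plain JavaScript of how those text fields are chosen, mirroring the conditions in the script. The function name `buildTranscriptionFields` and its option names are hypothetical, and the audio file part is left out.

```javascript
// Hypothetical helper: collects the text fields of the transcription request.
// Empty optional settings are omitted so that the API applies its own defaults.
function buildTranscriptionFields(options = {}) {
  const { model = "", prompt = "", temperature = "", language = "" } = options;
  const fields = {
    model: model !== "" ? model : "whisper-1", // default model of this Auto Step
    response_format: "vtt",                     // ask for WebVTT text
  };
  if (prompt !== "") fields.prompt = prompt;
  if (temperature !== "") fields.temperature = temperature;
  if (language !== "") fields.language = language;
  return fields;
}

// Example values are illustrative only.
const fields = buildTranscriptionFields({ prompt: "Questetra BPM Suite, BPMN", language: "en" });
console.log(fields);
```

In the script itself, each entry corresponds to one `request1.multipart( name, value )` call.
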
Audio File: The file stored in the FILE-type data item is used as the audio source.
- The first file is used as the audio source.
- The second and subsequent files are not referenced.
- Audio file to transcribe
  - mp3, mp4, mpeg, mpga, m4a, wav, or webm
  - E.g. a WebVTT file is automatically generated from a video file such as an mp4.
  - Consider the size limit on the API side.
    - Status 413: “Maximum content size limit (26214400)” (26,214,400 bytes = 25 MiB, about 26 MB) (as of 202303)
  - Note: Upload in Questetra BPM Suite is limited to 100MB (as of 202303)
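
The format list and the size limit above could be checked before a request is sent, which avoids waiting for a 413 response. This is a sketch only: `checkAudioFile` is a hypothetical helper, and it assumes the file name and byte length are already known (inside the Auto Step they would have to be read from the attached file object).

```javascript
// Limits quoted in the notes above (as of 202303).
const OPENAI_MAX_BYTES = 26214400; // 25 * 1024 * 1024
const SUPPORTED_EXTENSIONS = ["mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"];

// Hypothetical helper: returns a list of problems, empty when the file looks acceptable.
// Assumption: the limit is compared against the file size alone; the real limit applies
// to the request content, so a file just under the limit may still be rejected.
function checkAudioFile(fileName, sizeInBytes) {
  const problems = [];
  const dot = fileName.lastIndexOf(".");
  const extension = dot >= 0 ? fileName.slice(dot + 1).toLowerCase() : "";
  if (!SUPPORTED_EXTENSIONS.includes(extension)) {
    problems.push("unsupported extension: " + (extension || "(none)"));
  }
  if (sizeInBytes > OPENAI_MAX_BYTES) {
    problems.push("file is " + sizeInBytes + " bytes; limit is " + OPENAI_MAX_BYTES);
  }
  return problems;
}

console.log(checkAudioFile("meeting.m4a", 12000000));  // no problems
console.log(checkAudioFile("meeting.flac", 30000000)); // two problems
```
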
PROMPT: Text to improve the quality of the generated transcripts
- The PROMPT should be in English.
- Summary text for reference in English translation
  - Words or acronyms that the model often misrecognizes in the audio
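
One way to assemble such a PROMPT is to join a one-sentence summary with a list of frequently misrecognized terms. The sketch below is an illustration under assumptions: `buildPrompt` and the "Glossary:" label are hypothetical, and nothing here is prescribed by the OpenAI API.

```javascript
// Hypothetical helper: joins a short summary and a glossary into one PROMPT string.
function buildPrompt(summary, terms = []) {
  const cleaned = terms.map((term) => term.trim()).filter((term) => term !== "");
  const unique = [...new Set(cleaned)]; // drop repeated terms, keep first-seen order
  const parts = [];
  if (summary.trim() !== "") parts.push(summary.trim());
  if (unique.length > 0) parts.push("Glossary: " + unique.join(", ") + ".");
  return parts.join(" ");
}

console.log(buildPrompt(
  "Commander Neil Armstrong climbed down the ladder of the Lunar Module (LM).",
  ["LM", "Lunar Module", "LM"]
));
```

The resulting string would be set as the value of config A2.
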
An API key is required to use the OpenAI API.
- Get an API Key in advance.
- Set “Secret API Key” to “HTTP Authz Setting” (Token Fixed Value)
WebVTT:
- Web Video Text Tracks is a World Wide Web Consortium (W3C) standard for displaying timed text in connection with the HTML5 <track> element. (Wikipedia: https://en.wikipedia.org/wiki/WebVTT)
- Closed Captions: to make content more accessible, e.g. to prevent discrimination.
- Subtitles: e.g. for a French film screened in an English-speaking country.
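
To make the "timed text" idea concrete, here is a minimal sketch that reads WebVTT text, such as the string stored by config C1, into a list of cues. It handles only the basic layout (a `WEBVTT` header followed by blank-line-separated cues); `parseWebVtt` and `vttTimeToSeconds` are hypothetical helpers, and the timestamps in the sample are invented for illustration.

```javascript
// Converts "HH:MM:SS.mmm" or "MM:SS.mmm" into seconds.
function vttTimeToSeconds(stamp) {
  return stamp.split(":").map(Number).reduce((total, part) => total * 60 + part, 0);
}

// Splits WebVTT text into cues of the form { start, end, text }.
function parseWebVtt(text) {
  const blocks = text.replace(/\r\n?/g, "\n").trim().split(/\n{2,}/);
  const cues = [];
  for (const block of blocks) {
    const lines = block.split("\n");
    const timingIndex = lines.findIndex((line) => line.includes("-->"));
    if (timingIndex === -1) continue; // header or note block, not a cue
    const [start, rest] = lines[timingIndex].split("-->");
    const end = rest.trim().split(/\s+/)[0]; // ignore cue settings after the end time
    cues.push({
      start: vttTimeToSeconds(start.trim()),
      end: vttTimeToSeconds(end),
      text: lines.slice(timingIndex + 1).join("\n"),
    });
  }
  return cues;
}

const sample = [
  "WEBVTT",
  "",
  "00:00:00.000 --> 00:00:04.500",
  "I'm going to step off the LM now.",
  "",
  "00:00:04.500 --> 00:00:09.000",
  "That's one small step for a man, one giant leap for mankind.",
].join("\n");

console.log(parseWebVtt(sample));
```
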
Model Name:
- https://platform.openai.com/docs/models/model-endpoint-compatibility

Appendix

Sampling temperature
- range: “[0,1]”, default: “0”
Language
- ISO 639-1
- e.g. “en”, “ja”, “fr”, “de”, “pt”, “es”, “ko”, “nl”, “zh”, ...
- https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes
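
The two optional settings above can be sanity-checked before they are sent. The following sketch assumes both values arrive as strings, as they do from the config fields; `validateOptions` is a hypothetical helper, and the language check only tests the two-letter form, not membership in the ISO 639-1 list.

```javascript
// Hypothetical helper: returns error messages for the two optional settings.
// Empty strings are accepted because the script then omits the parameter.
function validateOptions(temperature, language) {
  const errors = [];
  if (temperature !== "") {
    const value = Number(temperature);
    if (!Number.isFinite(value) || value < 0 || value > 1) {
      errors.push("temperature must be a number in [0,1]: " + temperature);
    }
  }
  if (language !== "" && !/^[a-z]{2}$/.test(language)) {
    errors.push("language must be a two-letter lowercase code: " + language);
  }
  return errors;
}

console.log(validateOptions("0.2", "ja"));  // no errors
console.log(validateOptions("1.5", "jpn")); // two errors
```
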
Headers for developers belonging to multiple organizations are not yet supported (as of 202303).
- OpenAI-Organization
PROMPT test
- (no Prompt)
  - “I’m going to step off the land now. That’s one small step for man, one giant leap for mankind.”
- “Captain Neil Armstrong became the first man to set foot on the Moon”
  - “I’m going to step off the land now. That’s one small step for a man, one giant leap for mankind.”
- “Commander Neil Armstrong climbed down the ladder of the Lunar Module (LM)”
  - “I’m going to step off the LM now. That’s one small step for a man, one giant leap for mankind.”