OpenAI: Audio, Transcribe to WebVTT

Transcribes audio and video files into the WebVTT caption format. The OpenAI API converts the audio data into text, using the “whisper-1” model by default (configurable). For more accurate transcription, set abbreviations and technical terms as the PROMPT.

Configs for this Auto Step
AuthzConfU
U: Select HTTP_Authz Setting (Secret API Key as “Fixed Value”) *
StrConfM
M: Set MODEL Name (default “whisper-1”)#{EL}
SelectConfA1
A1: Select FILE for Source Audio/Video *
StrConfA2
A2: Set Request Summary PROMPT#{EL}
StrConfB1
B1: Set Sampling Temperature (default “0”)#{EL}
StrConfB2
B2: Set Language (default null)#{EL}
SelectConfC1
C1: Select STRING that stores WebVTT Text (update)
SelectConfC2
C2: Select FILE that stores WebVTT File (append)
StrConfC3
C3: Set File Name (default “{pid}.vtt”)#{EL}
Script
// GraalJS Script (engine type: 2)

//////// START "main()" /////////////////////////////////////////////////////////////////

main();
function main(){ 

////// == Config Retrieving ==
const strAuthzSetting   = configs.get      ( "AuthzConfU" );   /// REQUIRED
  engine.log( " AutomatedTask Config: Authz Setting: " + strAuthzSetting );
const strModel          = configs.get( "StrConfM" ) !== "" ?   // NotRequired
                          configs.get( "StrConfM" ) : "whisper-1"; // (default)
  engine.log( " AutomatedTask Config: OpenAI Model: " + strModel );
const filesPocketAudio  = configs.getObject( "SelectConfA1" ); /// REQUIRED
  let filesAudio        = engine.findData( filesPocketAudio );
  if( filesAudio      === null ) {
    throw new Error( "\n AutomatedTask UnexpectedFileError:" +
                     " No File {A1} is attached \n" );
  }else{ // java.util.ArrayList of QfileView
    engine.log( " AutomatedTask FilesArray {A1}: " +
                 filesAudio.size() + " file(s)" );
  }
const strPrompt         = configs.get      ( "StrConfA2" );    // NotRequired
const strTemperature    = configs.get      ( "StrConfB1" );    // NotRequired
const strLanguage       = configs.get      ( "StrConfB2" );    // NotRequired
const strPocketVtt      = configs.getObject( "SelectConfC1" ); // NotRequired
const filesPocketVtt    = configs.getObject( "SelectConfC2" ); // NotRequired
  let filesVtt          = engine.findData( filesPocketVtt ) ??
                          new java.util.ArrayList(); // if `null`, start from an empty ArrayList (of QfileView)
const strSaveAs         = configs.get( "StrConfC3" ) !== "" ?   // NotRequired
                          configs.get( "StrConfC3" ) :
                          processInstance.getProcessInstanceId() + ".vtt"; // (default)


////// == Data Retrieving ==
// (Nothing. Retrieved via Expression Language in Config Retrieving)


////// == Calculating ==
//// OpenAI API > Documentation > API REFERENCE > AUDIO
//// https://platform.openai.com/docs/api-reference/audio

/// prepare request1
let request1Uri = "https://api.openai.com/v1/audio/transcriptions";
let request1 = httpClient.begin(); // HttpRequestWrapper
    request1 = request1.authSetting( strAuthzSetting ); // with "Authorization: Bearer XX"
    request1 = request1.multipart( "file", filesAudio.get(0) );
    request1 = request1.multipart( "model", strModel );
    request1 = request1.multipart( "response_format", "vtt" ); // "verbose_json" to Json
    if ( strPrompt !== "" ) {
      request1 = request1.multipart( "prompt",      strPrompt );
    }
    if ( strTemperature !== "" ) {
      request1 = request1.multipart( "temperature", strTemperature );
    }
    if ( strLanguage !== "" ) {
      request1 = request1.multipart( "language",    strLanguage  );
    }

/// try request1 (log the start before the request is actually sent)
engine.log( " AutomatedTask ApiRequest1 Start: " + request1Uri );
const response1     = request1.post( request1Uri ); // HttpResponseWrapper
const response1Code = response1.getStatusCode() + ""; // JavaNum to string
const response1Body = response1.getResponseAsString();
engine.log( " AutomatedTask ApiResponse1 Status: " + response1Code );
if( response1Code !== "200"){
  throw new Error( "\n AutomatedTask UnexpectedResponseError: " +
                    response1Code + "\n" + response1Body + "\n" );
}

/// append file to FILES data
filesVtt.add(
  new com.questetra.bpms.core.event.scripttask.NewQfile(
    strSaveAs, "text/vtt", response1Body
  )
);


////// == Data Updating ==

if( strPocketVtt !== null ){
  engine.setData( strPocketVtt,
                  response1Body
                );
}
if( filesPocketVtt !== null ){
  engine.setData( filesPocketVtt,
                  filesVtt
                );
}


} //////// END "main()" /////////////////////////////////////////////////////////////////


/*
Notes:
- If you place this "Automated Step" in the Workflow diagram,
    - the request will be automatically sent every time the process token arrives.
    - A request is automatically sent to the OpenAI API server. (REST API)
    - The response from the OpenAI API server is automatically parsed.
    - You can incorporate "AI assistance" into your business processes.
- Audio File: The file stored in the FILE-type data item {A1} is used as the audio source.
    - The first file is used as the audio source.
    - The second and subsequent files are not referenced.
    - Accepted formats for the file to transcribe
        - "mp3", "mp4", "mpeg", "mpga", "m4a", "wav", or "webm"
        - e.g., a WebVTT file is automatically generated from a video file such as an mp4.
        - Consider the size limit on the API side.
            - Status 413: "Maximum content size limit (26214400)" (26,214,400 bytes = 25 MiB, about 26MB) (as of 202303)
        - Note: File uploads in Questetra BPM Suite are limited to 100MB (as of 202303)
- PROMPT: Text to improve the quality of the generated transcripts
    - English is recommended for the PROMPT.
    - Summary text that serves as a reference for translation into English
        - Words or acronyms that the model often misrecognizes in the audio
- An API key is required to use the OpenAI API.
    - Get an API Key in advance.
    - Register the "Secret API Key" in an "HTTP Authz Setting" (Token Fixed Value)
- WebVTT:
    - Web Video Text Tracks is a World Wide Web Consortium (W3C) standard for displaying timed text in connection with the HTML5 `<track>` element. [Wikipedia](https://en.wikipedia.org/wiki/WebVTT)
    - Closed Captions: make content more accessible, e.g. for viewers who are hard of hearing.
    - Subtitles: assume the viewer can hear the audio, e.g. translated subtitles for a French film screened in an English-speaking country.
- Model Name:
    - https://platform.openai.com/docs/models/model-endpoint-compatibility

APPENDIX:
- Sampling temperature
    - range: "[0,1]", default: "0"
- Language
    - ISO 639-1
    - e.g. "`en`", "`ja`", "`fr`", "`de`", "`pt`", "`es`", "`ko`", "`nl`", "`zh`", ...
    - https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes
- Headers for developers belonging to multiple organizations are not yet supported (as of 202303).
    - `OpenAI-Organization`
- PROMPT test (each PROMPT is followed by the resulting transcription)
    - (no Prompt)
        - "I'm going to step off the land now. That's one small step for man, one giant leap for mankind."
    - "Captain Neil Armstrong became the first man to set foot on the Moon"
        - "I'm going to step off the land now. That's one small step for a man, one giant leap for mankind."
    - "Commander Neil Armstrong climbed down the ladder of the Lunar Module (LM)"
        - "I'm going to step off the LM now. That's one small step for a man, one giant leap for mankind."


APPENDIX (Japanese audio):
- PROMPT test with Japanese audio (each PROMPT is followed by the resulting transcription)
    - (no Prompt)
        - "おじいさんは山へしばっかりに おばあさんも山へしばっかりに行きました。"
    - "お爺さんとお婆さんが芝刈りに行く話です" (芝刈り: "lawn mowing")
        - "お爺さんは山へ芝刈りに、お婆さんも山へ芝刈りに行きました。"
    - "お爺さんとお婆さんが柴刈りに行く話です" (柴刈り: "gathering firewood")
        - "お爺さんは山へ柴刈りに、お婆さんも山へ柴刈りに行きました。"
    - In each case the transcription adopts the kanji spelling used in the PROMPT.
*/
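
The script stores the response body unchanged, as WebVTT text {C1} and as a `.vtt` file {C2}. If a later step needs the individual cues (start time, end time, caption text), the text could be split along the lines of the following sketch. The `parseVtt` function and the sample input are hypothetical and hand-written for illustration; they are not part of the Auto Step and not real API output. The parser handles only simple cue blocks and is not a full implementation of the W3C specification.

```javascript
// Minimal WebVTT cue parser (illustrative sketch, not a complete W3C WebVTT parser).
function parseVtt(vttText) {
  // Normalize line endings, then split into blocks separated by blank lines.
  const blocks = vttText.replace(/\r\n?/g, "\n").split(/\n{2,}/);
  const cues = [];
  for (const block of blocks) {
    const lines = block.split("\n").filter((line) => line !== "");
    const timingIndex = lines.findIndex((line) => line.includes("-->"));
    if (timingIndex === -1) continue; // "WEBVTT" header, NOTE blocks, etc.
    const [start, rest] = lines[timingIndex].split("-->");
    const end = rest.trim().split(/\s+/)[0]; // drop cue settings after the end time
    cues.push({
      start: start.trim(),
      end: end,
      text: lines.slice(timingIndex + 1).join("\n"),
    });
  }
  return cues;
}

// Hand-written sample input (an assumption for this example).
const sample = [
  "WEBVTT",
  "",
  "00:00:00.000 --> 00:00:04.000",
  "That's one small step for a man,",
  "",
  "00:00:04.000 --> 00:00:07.500",
  "one giant leap for mankind.",
].join("\n");

console.log(parseVtt(sample).length); // 2
```

Each returned object carries the timestamps as strings, so converting them to seconds is left to the caller.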

Download

Warning: Freely modifiable JavaScript (ECMAScript) code. No warranty of any kind.
(Installing Addon Auto-Steps is available only on the Professional edition.)

Notes

  • If you place this automated step in the Workflow diagram,
    • the request will be automatically sent every time the process token arrives.
    • A request is automatically sent to the OpenAI API server. (REST API)
    • The response from the OpenAI API server is automatically parsed.
    • You can incorporate AI assistance into your business processes.
  • Audio File: The file stored in the FILE-type data item {A1} is used as the audio source.
    • The first file is used as the audio source.
    • The second and subsequent files are not referenced.
    • Accepted formats for the file to transcribe
      • mp3, mp4, mpeg, mpga, m4a, wav, or webm
      • E.g. a WebVTT file is automatically generated from a video file such as an mp4.
      • Consider the size limit on the API side.
        • Status 413: “Maximum content size limit (26214400)” (26,214,400 bytes = 25 MiB, about 26MB) (as of 202303)
      • Note: File uploads in Questetra BPM Suite are limited to 100MB (as of 202303)
  • PROMPT: Text to improve the quality of the generated transcripts
    • English is recommended for the PROMPT.
    • Summary text that serves as a reference for translation into English
      • Words or acronyms that the model often misrecognizes in the audio
  • An API key is required to use the OpenAI API.
    • Get an API Key in advance.
    • Register the “Secret API Key” in an “HTTP Authz Setting” (Token Fixed Value)
  • WebVTT:
    • Web Video Text Tracks is a World Wide Web Consortium (W3C) standard for displaying timed text in connection with the HTML5 <track> element. (Wikipedia: https://en.wikipedia.org/wiki/WebVTT)
    • Closed Captions: make content more accessible, e.g. for viewers who are hard of hearing.
    • Subtitles: assume the viewer can hear the audio, e.g. translated subtitles for a French film screened in an English-speaking country.
  • Model Name:
    • https://platform.openai.com/docs/models/model-endpoint-compatibility
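
The format list and the size limit in the notes above could be checked before a file is sent, so that an oversized upload fails early instead of returning Status 413. The following is a minimal sketch; `checkAudioFile`, `MAX_BYTES` and `ALLOWED_EXTENSIONS` are hypothetical names and not part of the Auto Step script. It assumes the limit applies to the file size alone; since the API message refers to content size, a file just under the limit may still be rejected once multipart overhead is added.

```javascript
// Hypothetical pre-flight check; names and behavior are assumptions for illustration.
const MAX_BYTES = 26214400; // value quoted in the Status 413 message (as of 202303)
const ALLOWED_EXTENSIONS = ["mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"];

// Returns a list of problems; an empty list means the file looks acceptable.
function checkAudioFile(fileName, sizeBytes) {
  const problems = [];
  const dot = fileName.lastIndexOf(".");
  const ext = dot >= 0 ? fileName.slice(dot + 1).toLowerCase() : "";
  if (!ALLOWED_EXTENSIONS.includes(ext)) {
    problems.push("unsupported extension: " + (ext || "(none)"));
  }
  if (sizeBytes > MAX_BYTES) {
    problems.push("file too large: " + sizeBytes + " bytes (limit " + MAX_BYTES + ")");
  }
  return problems;
}

console.log(checkAudioFile("meeting.mp4", 10 * 1024 * 1024)); // []
console.log(checkAudioFile("meeting.flac", 30 * 1024 * 1024));
```

Returning a list rather than throwing lets the caller report every problem at once.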

Appendix

  • Sampling temperature
    • range: “[0,1]”, default: “0”
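  • Language
    • ISO 639-1
    • e.g. “en”, “ja”, “fr”, “de”, “pt”, “es”, “ko”, “nl”, “zh”, ...
    • https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes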
  • Headers for developers belonging to multiple organizations are not yet supported (as of 202303).
    • OpenAI-Organization
  • PROMPT test (each PROMPT is followed by the resulting transcription)
    • (no Prompt)
      • “I’m going to step off the land now. That’s one small step for man, one giant leap for mankind.”
    • “Captain Neil Armstrong became the first man to set foot on the Moon”
      • “I’m going to step off the land now. That’s one small step for a man, one giant leap for mankind.”
    • “Commander Neil Armstrong climbed down the ladder of the Lunar Module (LM)”
      • “I’m going to step off the LM now. That’s one small step for a man, one giant leap for mankind.”
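
The script adds `prompt`, `temperature` and `language` to the request only when the corresponding config is non-empty. The sketch below mirrors that logic in plain JavaScript and adds validation against the ranges listed in this appendix. `buildOptionalParams` is a hypothetical helper, and the validation is an assumption: the original script passes the values through unchecked.

```javascript
// Hypothetical helper; the range checks are an addition, not the script's behavior.
function buildOptionalParams(strPrompt, strTemperature, strLanguage) {
  const params = {};
  if (strPrompt !== "") {
    params.prompt = strPrompt;
  }
  if (strTemperature !== "") {
    const t = Number(strTemperature);
    if (!Number.isFinite(t) || t < 0 || t > 1) {
      throw new Error("temperature must be a number in [0,1]: " + strTemperature);
    }
    params.temperature = strTemperature; // sent as a string, as in the script
  }
  if (strLanguage !== "") {
    if (!/^[a-z]{2}$/.test(strLanguage)) {
      throw new Error("language must be a two-letter ISO 639-1 code: " + strLanguage);
    }
    params.language = strLanguage;
  }
  return params;
}

// Empty strings are skipped, so the API falls back to its defaults for them.
console.log(buildOptionalParams("", "0.2", "ja"));
```

Omitting empty values, rather than sending empty strings, leaves the choice of default to the API side.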
