OpenAI #Chat: Answer Prompt with Images

Creates a response for a prompt with images. The MODEL (GPT-4 with Vision) takes in images and answers questions related to them, such as what an image represents or what it contains (e.g. dinner ideas based on what is in the fridge).
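The request this step sends to the Chat endpoint combines one text part with one part per image URL. A minimal sketch of that payload, in plain JavaScript (the function name and the demo values are illustrative; the actual step reads the prompt and URLs from its configs):

```javascript
// Sketch of the Chat Completions payload the step assembles.
// Per the OpenAI Vision guide, each image part is an object with a
// "url" and its own "detail" ("low" or "high") setting.
function buildVisionPayload(prompt, imageUrls, model, detail) {
  const content = [{ type: "text", text: prompt }];
  for (const url of imageUrls) {
    content.push({ type: "image_url", image_url: { url: url, detail: detail } });
  }
  return { model: model, messages: [{ role: "user", content: content }] };
}

const payload = buildVisionPayload(
  "What could I cook with what is in this fridge?",
  ["https://example.com/fridge.jpg"],
  "gpt-4-vision-preview",
  "low"
);
console.log(JSON.stringify(payload, null, 2));
```

Multiple images simply become additional `image_url` entries in the same `content` array.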

Auto Step icon
Configs for this Auto Step
AuthzConfU1
U1: Select HTTP_Authz Setting (Secret API Key as “Fixed Value”) *
StrConfA1
A1: Set Text PROMPT *#{EL}
StrConfA2
A2: Set Image URLs on each line *#{EL}
SelectConfC1
C1: Select STRING that stores Generated Text (update)
StrConfM
M: Set MODEL Name (default “gpt-4-vision-preview”)#{EL}
StrConfU2
U2: Set OpenAI Organization ID (“org-xxxx”)#{EL}
StrConfB1
B1: Set DETAIL parameter (“high” or “low”; default “low”)#{EL}
StrConfB2
B2: Set MaxTokens#{EL}
SelectConfC2
C2: Select NUMERIC that stores PROMPT Tokens (update)
SelectConfC3
C3: Select NUMERIC that stores Total Tokens (update)
Script (click to open)
// Script Example of Business Process Automation
// for 'engine type: 3' ("GraalJS standard mode")
// cf. 'engine type: 2' ("GraalJS Nashorn compatible mode") (renamed from "GraalJS" at 20230526)

//////// START "main()" /////////////////////////////////////////////////////////////////

main();
function main(){ 

////// == Config Retrieving / 工程コンフィグの参照 ==
const strAuthzSetting  = configs.get      ( "AuthzConfU1" );                       /// REQUIRED
  engine.log( " AutomatedTask Config: Authz Setting: " + strAuthzSetting );
const strOrgId         = configs.get      ( "StrConfU2" );                         // NotRequired
  engine.log( " AutomatedTask Config: OpenAI-Organization: " + strOrgId );
const strModel         = configs.get      ( "StrConfM" ) !== "" ?                  // NotRequired
                         configs.get      ( "StrConfM" ) : "gpt-4-vision-preview"; // (default)
  engine.log( " AutomatedTask Config: OpenAI Model: " + strModel );

const strTextPrompt    = configs.get      ( "StrConfA1" );                         /// REQUIRED
  if( strTextPrompt  === "" ){
    throw new Error( "\n AutomatedTask ConfigError:" +
                     " Config {A1: Prompt} must be non-empty \n" );
  }
const strImageUrls     = configs.get      ( "StrConfA2" );                         /// REQUIRED
  if( strImageUrls   === "" ){
    throw new Error( "\n AutomatedTask ConfigError:" +
                     " Config {A2: ImageUrls} must be non-empty \n" );
  }
const arrImageUrls     = strImageUrls.split("\n");

const strDetail        = configs.get      ( "StrConfB1" ) !== "" ?                 // NotRequired
                         configs.get      ( "StrConfB1" ) : "low";                 // (default)
const strMaxTokens     = configs.get      ( "StrConfB2" );                         // NotRequired
const numMaxTokens     = parseInt ( strMaxTokens, 10 );
  engine.log( " AutomatedTask Config: Max Tokens: " + numMaxTokens );

const strPocketGenerated = configs.getObject ( "SelectConfC1" );                   // NotRequired
const numPocketPrompt    = configs.getObject ( "SelectConfC2" );                   // NotRequired
const numPocketTotal     = configs.getObject ( "SelectConfC3" );                   // NotRequired



////// == Data Retrieving / ワークフローデータの参照 ==
// (Nothing. Retrieved via Expression Language in Config Retrieving)


////// == Calculating / 演算 ==

//// OpenAI API > Documentation > API REFERENCE > CHAT
//// https://platform.openai.com/docs/api-reference/chat/create (not updated)
//// https://platform.openai.com/docs/guides/vision

/// prepare json
let strJson = {};
    strJson.model = strModel;
    if ( ! isNaN(numMaxTokens) ) {
      strJson.max_tokens       = numMaxTokens;
    }
//    strJson.response_format = {};
//    strJson.response_format.type = "json_object";

    strJson.messages = [];
    strJson.messages[0] = {};
    strJson.messages[0].role = "user";
    strJson.messages[0].content = [];
    strJson.messages[0].content[0] = {};
    strJson.messages[0].content[0].type = "text";
    strJson.messages[0].content[0].text = strTextPrompt;

    for ( let i = 0; i < arrImageUrls.length; i++ ) {
      const objTmp = {};
      objTmp.type = "image_url";
      objTmp.image_url = {};
      objTmp.image_url.url    = arrImageUrls[i];
      objTmp.image_url.detail = strDetail; // "low" (default) or "high", set per image
      strJson.messages[0].content.push ( objTmp );
    }

/// prepare request1
let request1Uri = "https://api.openai.com/v1/chat/completions";
let request1 = httpClient.begin(); // HttpRequestWrapper
    request1 = request1.authSetting( strAuthzSetting ); // with "Authorization: Bearer XX"
    request1 = request1.body( JSON.stringify( strJson ), "application/json" );
    if ( strOrgId !== "" ){
      request1 = request1.header( "OpenAI-Organization", strOrgId );
    }

/// try request1
const response1     = request1.post( request1Uri ); // HttpResponseWrapper
engine.log( " AutomatedTask ApiRequest1 Start: " + request1Uri );
const response1Code = response1.getStatusCode() + ""; // JavaNum to string
const response1Body = response1.getResponseAsString();
engine.log( " AutomatedTask ApiResponse1 Status: " + response1Code );
if( response1Code !== "200"){
  throw new Error( "\n AutomatedTask UnexpectedResponseError: " +
                    response1Code + "\n" + response1Body + "\n" );
}


/// parse response1
/* engine.log( response1Body ); // debug
{
  "id": "chatcmpl-8I9gORGEvsLaVE10Dir0pOMZI5I37",
  "object": "chat.completion",
  "created": 1699337816,
  "model": "gpt-4-1106-vision-preview",
  "usage": {
    "prompt_tokens": 1887, 
    "completion_tokens": 16,
    "total_tokens": 1903
  },
  "choices": [{
    "message": {
      "role": "assistant", 
      "content": "\u6700\u521d\u306e\u753b\u50cf\u306b\u306f\u3001\"Fine tuning workflow\" \u3068\u3044\u3046"
    },
    "finish_details": {
      "type": "max_tokens"
    },
    "index": 0
    }]
}
*/
const response1Obj = JSON.parse( response1Body );
engine.log( " AutomatedTask ApiResponse1 finish_details: " + response1Obj.choices[0].finish_details.type );


////// == Data Updating / ワークフローデータへの代入 ==

if( strPocketGenerated !== null ){
  engine.setData( strPocketGenerated, response1Obj.choices[0].message.content );
}
if( numPocketPrompt !== null ){
  engine.setData( numPocketPrompt, new java.math.BigDecimal( response1Obj.usage.prompt_tokens ) );
}
if( numPocketTotal !== null ){
  engine.setData( numPocketTotal, new java.math.BigDecimal( response1Obj.usage.total_tokens ) );
}

} //////// END "main()" /////////////////////////////////////////////////////////////////


/*
Notes:
- This [Automated Step] obtains the answer text via OpenAI API (Chat endpoint).
    - Specify the instruction (prompt) using Text and Image-Url.
    - Also possible to specify multiple images (Image-Urls).
    - Compatible with GPT-4V preview version (gpt-4-vision-preview).
        - Specifications are subject to change.
        - https://platform.openai.com/docs/guides/vision
- If you place this [Automated Step] in the workflow diagram, communication will occur every time a process arrives.
    - Request from the Questetra BPM Suite server to the OpenAI server.
    - Analyzes the response from the OpenAI server and stores the necessary information.
- [HTTP Authz Settings] is required for workflow apps that include this [Automated Step].
    - An API key is required to use OpenAI API. Please obtain an API key in advance.
        - https://platform.openai.com/api-keys
    - Set 'Secret API Key' as communication token. [HTTP Authz Settings] > [Token Fixed Value]

APPENDIX
- For low res mode, a 512px x 512px image is expected.
- For high res mode,
    - the short side of the image should be less than 768px and
    - the long side of the image should be less than 2,000px.
- Supported type of files
    - PNG (.png), JPEG (.jpeg .jpg), WEBP (.webp), and non-animated GIF (.gif)
- An error may occur depending on the timing.
    - 429 error ('Too Many Requests')
    - Timeout script error `java.util.concurrent.TimeoutException`


Notes-ja:
- この[自動工程]は、OpenAI API (Chat エンドポイント)を通じて、回答文を取得します。
    - 指示文(プロンプト)は Text および Image-Url にて指定します。
    - 複数画像(Image-Urls)の指定も可能です。
    - GPT-4V プレビュー版(gpt-4-vision-preview)に対応します。
        - 仕様が変更される可能性があります。
        - https://platform.openai.com/docs/guides/vision
- この[自動工程]をワークフロー図に配置すれば、案件到達の度に通信が発生します。
    - Questetra BPM Suite サーバから OpenAI サーバに対してリクエストします。
    - OpenAI サーバからのレスポンスを解析し、必要情報を格納します。
- この[自動工程]を含むワークフローアプリには、[HTTP 認証設定]が必要です。
    - OpenAI API の利用には API key が必要です。あらかじめ API Key を取得しておいてください。
        - https://platform.openai.com/api-keys
    - 'Secret API Key' を通信トークンとしてセットします。[HTTP 認証設定]>[トークン直接指定]

APPENDIX-ja
- 低解像度モード(low)の場合、512px x 512px 画像が推奨です。
- 高解像度モード(high)の場合、
    - 画像の短辺は 768px 未満、
    - 画像の長辺は 2,000px 未満である必要があります。
- サポートされる画像フォーマット
    - PNG (.png), JPEG (.jpeg .jpg), WEBP (.webp), and non-animated GIF (.gif)
- タイミングによって、エラーになる場合があります。
    - 429エラー('Too Many Requests')
    - Timeout スクリプトエラー `java.util.concurrent.TimeoutException`
*/
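The parse-and-store steps at the end of `main()` can be sketched outside the Questetra environment in plain JavaScript. The function name is illustrative, and the returned plain numbers stand in for the `java.math.BigDecimal` values and `engine.setData()` calls used inside GraalJS:

```javascript
// Sketch: extract the values main() stores, from a Chat Completions
// response body shaped like the sample logged in the script above.
function extractResult(responseBody) {
  const obj = JSON.parse(responseBody);
  return {
    text: obj.choices[0].message.content,   // → Generated Text (C1)
    promptTokens: obj.usage.prompt_tokens,  // → PROMPT Tokens (C2)
    totalTokens: obj.usage.total_tokens     // → Total Tokens (C3)
  };
}
```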

Download

warning Freely modifiable JavaScript (ECMAScript) code. No warranty of any kind.
(Installing Addon Auto-Steps is available only on the Professional edition.)

Notes

  • This [Automated Step] obtains the answer text via OpenAI API (Chat endpoint).
    • Specify the instruction (prompt) using Text and Image-Url.
    • Also possible to specify multiple images (Image-Urls).
    • Compatible with GPT-4V preview version (gpt-4-vision-preview).
  • If you place this [Automated Step] in the workflow diagram, communication will occur every time a process arrives.
    • Request from the Questetra BPM Suite server to the OpenAI server.
    • Analyzes the response from the OpenAI server and stores the necessary information.
  • [HTTP Authz Settings] is required for workflow apps that include this [Automated Step].
    • An API key is required to use OpenAI API. Please obtain an API key in advance.
    • Set ‘Secret API Key’ as the communication token. [HTTP Authz Settings] > [Token Fixed Value]
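Under [Token Fixed Value], the Secret API Key travels as a standard Bearer header. A sketch of the equivalent headers outside Questetra (the function is a hypothetical stand-in for what `authSetting()` and `header()` produce in the script; key and org values are placeholders):

```javascript
// Hypothetical sketch of the HTTP headers the step sends.
function buildHeaders(apiKey, orgId) {
  const headers = {
    "Authorization": "Bearer " + apiKey,
    "Content-Type": "application/json"
  };
  if (orgId !== "") {
    headers["OpenAI-Organization"] = orgId; // only when config U2 is set
  }
  return headers;
}
```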


Appendix

  • For low res mode, a 512px x 512px image is expected.
  • For high res mode,
    • the short side of the image should be less than 768px and
    • the long side of the image should be less than 2,000px.
  • Supported type of files
    • PNG (.png), JPEG (.jpeg .jpg), WEBP (.webp), and non-animated GIF (.gif)
  • An error may occur depending on the timing.
    • 429 error (‘Too Many Requests’)
    • Timeout script error java.util.concurrent.TimeoutException
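Because 429 responses are timing-dependent, a retry often succeeds. A minimal sketch of a retry loop (`sendFn` is a hypothetical stand-in for re-issuing the HTTP request; a real version would also wait, e.g. roughly 2^attempt seconds, between attempts):

```javascript
// Hypothetical retry loop for transient 429 ('Too Many Requests') answers.
function postWithRetry(sendFn, maxAttempts) {
  let last = null;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    last = sendFn();
    if (last.status !== 429) {
      return last; // success, or a non-retryable status code
    }
    // backoff wait elided in this sketch
  }
  throw new Error("HTTP 429 after " + maxAttempts + " attempts");
}
```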

See Also

OpenAI #Chat: Answer Text Prompt
OpenAI #Images: Generate
