Incorporate Optical Character Reading into Workflow (AI-OCR/BPM Integration)

Hi there!

The extraction of characters from an image (OCR: Optical Character Recognition) has been a hot topic recently.

I tried incorporating OCR processing into a Workflow in Questetra BPM Suite, since OCR can be easily implemented using Google’s API. (If this kind of work occurs frequently, it seems worthwhile to turn it into a Workflow.)

The following are the details of what I tried.

TOC
Settings for invoking Google Cloud Vision API from Questetra
 Settings on the Google side
 Settings on the Questetra side
Closing

 

Settings for invoking Google Cloud Vision API from Questetra

In this case, I call the Google Cloud API of Google Cloud Platform (GCP) from Questetra BPM Suite. Specifically, the Workflow uploads an image file to a Cloud Storage bucket and then sends a request to the Vision API that refers to that file. I referred to the “Using API Explorer” quick start page of Cloud Vision for the process flow.

Settings on the Google side

In accordance with the Using API Explorer page, I created a project in the GCP Console and enabled the Vision API.

Then, in accordance with the OAuth2 Authentication page, I created credentials and obtained a client ID and client secret. I made a note of both, as they are needed for the settings on the Questetra side.

Also, I created a Cloud Storage bucket in accordance with the Using API Explorer page. (In the example script shown later, the bucket is named “questetra-visionapi-test”.)

Settings on the Questetra side

I created a simple Workflow in which an image is uploaded to a File-type Data Item and the OCR result is returned once the Task has been operated.
You can download the sample App Archive HERE. (Download the .zip file and extract it to find a .qar file.)

For the OAuth settings, I used the client ID and client secret that I noted earlier, referring to the following page. (In the example script, the setting name is “GoogleVisionAPItest”. Note that this setting must be configured separately even if you import the sample Workflow.)
 Settings when Calling REST API of Another Service from Questetra (Questetra to be OAuth2 client)

Regarding the Script Task, the code in the Script is as follows. (It is included in the sample App Archive as well.)

var bucket = 'questetra-visionapi-test'; // bucket name
var file = 'p' + processInstance.getProcessInstanceId(); // file name
var token = httpClient.getOAuth2Token('GoogleVisionAPItest');
var response, code;

// Google Cloud Storage
response = httpClient.begin()
  .bearer(token)
  .queryParam('name', file)
  .queryParam('uploadType', 'media')
  .body(q_file.get(0))
  .post('https://www.googleapis.com/upload/storage/v1/b/' + bucket + '/o');

code = response.getStatusCode();
if (code !== 200) {
  throw "response is not 200: "+ code + " " + response.getResponseAsString();
}

// Google Cloud Vision
var visionReq = {
  "requests" : [{
    "features" : [{ "type": "DOCUMENT_TEXT_DETECTION"}],
    "image" : {"source" : { "gcsImageUri" : 'gs://' + bucket + '/' + file}}
  }]
};

response = httpClient.begin()
  .bearer(token)
  .body(JSON.stringify(visionReq), "application/json")
  .post('https://vision.googleapis.com/v1/images:annotate');

code = response.getStatusCode();
if (code !== 200) {
  throw "response is not 200: "+ code + " " + response.getResponseAsString();
}

engine.setDataByVarName('q_response', response.getResponseAsString());

The overall processing flow in the script is as follows: first, the File-type Data Item q_file is sent to Cloud Storage; then the Cloud Vision API is invoked with the file's URI as an argument. The returned result is saved in q_response, a String-type Data Item. The Field names of the Data Items, such as q_file and q_response, must match the settings on the Workflow side. In addition, you can specify which API feature to use with the type parameter in features. Please see the following page for details.
 Cloud Vision API: Feature
* The result differs between DOCUMENT_TEXT_DETECTION and TEXT_DETECTION, even for the same image. The former seemed to work better for images of a document that had been created in Microsoft Word. There are also other features I am interested in, such as detecting landmarks in images.
* Although I used a Script Task in this case, the same thing can also be implemented with two Message Intermediate Events (HTTP) instead.
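
To make the choice of feature easier to see, the request body used in the script can be built by a small helper. This is only a sketch: buildVisionRequest is a hypothetical name that is not part of the sample App Archive, and the file name 'p123' is a made-up value. The feature type strings are assumed to be the ones listed on the Cloud Vision API Feature page.

```javascript
// Hypothetical helper (not in the sample App Archive): builds the
// images:annotate request body for a file already uploaded to Cloud Storage.
function buildVisionRequest(bucket, file, featureType) {
  return {
    "requests": [{
      "features": [{ "type": featureType }],
      "image": { "source": { "gcsImageUri": 'gs://' + bucket + '/' + file } }
    }]
  };
}

// Same shape as the request in the script, but asking for TEXT_DETECTION.
// 'p123' is a made-up file name for illustration.
var visionReq = buildVisionRequest('questetra-visionapi-test', 'p123', 'TEXT_DETECTION');
var visionReqBody = JSON.stringify(visionReq);
```

In the Script Task, visionReqBody would take the place of JSON.stringify(visionReq) in the call to .body(), so switching between the two text detection features becomes a one-word change.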

As mentioned in the Using API Explorer page, “You can store up to 5GB of data in Cloud Storage for free and make up to 1000 feature requests to the Vision API for free per month.” Please keep in mind that you will be charged when exceeding the limit.

 

Closing

Thus, invoking Google Cloud API from Workflows can be easily achieved.

The Translation API could also be invoked in the same way as the Cloud Vision API. It would be good to incorporate this into Workflows for translation-related work. (I will write about it in another post when I get the chance.)
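
As a rough idea of what that might look like, here is a sketch of a request body for the Cloud Translation API (v2). I have not tried this in a Workflow; buildTranslateRequest is a hypothetical name, and the sample text and target language are made-up values.

```javascript
// Hypothetical helper (untested in Questetra): builds a request body for the
// Cloud Translation API v2, intended to be sent with POST to
// https://translation.googleapis.com/language/translate/v2
function buildTranslateRequest(text, targetLang) {
  return {
    "q": text,            // text to translate
    "target": targetLang, // target language code, e.g. "ja"
    "format": "text"      // treat the input as plain text, not HTML
  };
}

// Made-up values for illustration
var translateBody = JSON.stringify(buildTranslateRequest("Hello", "ja"));
```

The body would presumably be sent with the same httpClient pattern as in the Vision script, assuming the OAuth setting has been granted a scope that covers the Translation API. The translated text should then be found under data.translations[0].translatedText in the returned JSON.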

Also, based on this flow, it is possible to send a picture to a certain email address and receive a transcription in reply: create an entry point for emails using a Message Start Event (Email), extract the data from the returned JSON, and reply with a Throwing Message Intermediate Event (Email). (I actually created it as well, but I cannot tell you the address due to the limitation on API usage.)
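
For the step of extracting data from the returned JSON, something like the following could work. It is a sketch under assumptions: extractText is a hypothetical helper that is not in the sample App Archive, and the sample response is made up for illustration. It relies on the recognized text being returned in responses[0].fullTextAnnotation.text.

```javascript
// Hypothetical helper: pulls the recognized text out of the JSON string
// returned by images:annotate.
function extractText(responseJson) {
  var parsed = JSON.parse(responseJson);
  var first = (parsed.responses || [])[0] || {};
  // An individual image can fail even when the HTTP status is 200
  if (first.error) {
    throw new Error("Vision API error: " + first.error.message);
  }
  // fullTextAnnotation is absent when no text was detected
  return first.fullTextAnnotation ? first.fullTextAnnotation.text : "";
}

// Made-up sample response for illustration
var sample = JSON.stringify({
  "responses": [{ "fullTextAnnotation": { "text": "Hello\nWorld" } }]
});
var transcription = extractText(sample);
```

In the Script Task, extractText(response.getResponseAsString()) could be saved to a separate String-type Data Item (for example a hypothetical q_text) and used as the body of the reply email.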

If you have any questions concerning this post, please feel free to ask via our Inquiry Form.
