Incorporate Optical Character Reading into Workflow (AI-OCR/BPM Integration)

Hi, there!

Extracting text from images (OCR: Optical Character Recognition) has been a hot topic recently.

Since OCR can be realized easily using Google's API, I tried incorporating OCR processing into a Workflow of Questetra BPM Suite.
* If such work occurs frequently, it seems worthwhile to build it into a Workflow.

The following are the details of what I tried.

TOC
Settings for invoking Google Cloud Vision API from Questetra
 Settings on the Google side
 Settings on the Questetra side
Closing


Settings for invoking Google Cloud Vision API from Questetra

In this case, I realize it by calling the API (Google Cloud API) of Google Cloud Platform (GCP) from Questetra BPM Suite. Specifically, the workflow uploads an image file to a Cloud Storage bucket and then sends a request to the Vision API referring to that image file. For the flow of the processing, I referred to Using API Explorer of the Cloud Vision API.

Settings on the Google side

In accordance with Using API Explorer of the Cloud Vision API, I created a project in the GCP Console and enabled the Vision API.

Then, in accordance with the OAuth2 Authentication page here, I created credentials and obtained a client ID and a client secret. I made a note of both, as they are needed for the settings on the Questetra side.

Also, I created a Cloud Storage bucket in accordance with Using API Explorer. (In the example script mentioned later, the bucket is named “questetra-visionapi-test”.)

Settings on the Questetra side

I created a simple workflow in which an image is uploaded to a File-type Data Item and the result of the OCR is returned after operating the Task.
You can download the sample App Archive HERE. (Download the .zip file and extract it to find a .qar file.)

Concerning the OAuth settings, I configured them with the client ID/client secret I noted earlier, referring to the following page. (In the example script, the setting name is “GoogleVisionAPItest”. Note that this setting must be configured separately even if you import the sample workflow.)
 Settings when Calling REST API of Another Service from Questetra (Questetra to be OAuth2 client)
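
For reference, the OAuth2 setting would look roughly like the following. This is only a sketch based on Google's standard OAuth2 endpoints, and the scope shown is an assumption that covers both Cloud Storage and the Vision API.

 Name: GoogleVisionAPItest
 Authorization Endpoint URL: https://accounts.google.com/o/oauth2/v2/auth
 Token Endpoint URL: https://oauth2.googleapis.com/token
 Scope: https://www.googleapis.com/auth/cloud-platform
 Client ID / Client Secret: the values noted on the Google side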

Regarding the Script Task, the script is as follows. (It is also included in the sample App Archive.)

var bucket = 'questetra-visionapi-test'; // bucket name
var file = 'p' + processInstance.getProcessInstanceId(); // file name
var token = httpClient.getOAuth2Token('GoogleVisionAPItest');
var response, code;

// Google Cloud Storage: upload the image file (q_file) to the bucket
response = httpClient.begin()
  .bearer(token)
  .queryParam('name', file)
  .queryParam('uploadType', 'media')
  .body(q_file.get(0))
  .post('https://www.googleapis.com/upload/storage/v1/b/' + bucket + '/o');

code = response.getStatusCode();
if (code !== 200) {
  throw "response is not 200: "+ code + " " + response.getResponseAsString();
}

// Google Cloud Vision: run DOCUMENT_TEXT_DETECTION on the uploaded file
var visionReq = {
  "requests" : [{
    "features" : [{ "type": "DOCUMENT_TEXT_DETECTION"}],
    "image" : {"source" : { "gcsImageUri" : 'gs://' + bucket + '/' + file}}
  }]
};

response = httpClient.begin()
  .bearer(token)
  .body(JSON.stringify(visionReq), "application/json")
  .post('https://vision.googleapis.com/v1/images:annotate');

code = response.getStatusCode();
if (code !== 200) {
  throw "response is not 200: "+ code + " " + response.getResponseAsString();
}

engine.setDataByVarName('q_response', response.getResponseAsString());

The overall flow of the processing in the script is: first, q_file, the File-type Data Item, is sent to Cloud Storage; then the Cloud Vision API is invoked with the file's gs:// URI as an argument. The returned result is saved in q_response, a String-type Data Item. The script is associated with the settings on the Workflow side through the Field Names of the Data Items, such as q_file and q_response. In addition, you can specify which feature of the API to use with the type parameter in features. Please see the following page for the details.
 Cloud Vision API: Feature
* The result differs between DOCUMENT_TEXT_DETECTION and TEXT_DETECTION, even for the same image. The former seemed better for images of documents created with software such as Microsoft Word. There are other features as well, such as detecting landmarks in images, which I am interested in.
* Although I used a Script Task in this case, the same processing can also be realized by using two Message Intermediate Events (HTTP) instead.
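
By the way, the JSON saved in q_response can be unpacked in a subsequent Script Task. The following is a minimal sketch, assuming DOCUMENT_TEXT_DETECTION was requested; the String-type Data Item q_text is a hypothetical addition, not part of the sample App.

// extract the recognized text from the Vision API response (sketch)
var json = JSON.parse(q_response);
var text = '';
// fullTextAnnotation holds the whole recognized text for (DOCUMENT_)TEXT_DETECTION
if (json.responses[0].fullTextAnnotation) {
  text = json.responses[0].fullTextAnnotation.text;
}
engine.setDataByVarName('q_text', text); // q_text: hypothetical String-type Data Item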

As mentioned in Using API Explorer of the Cloud Vision API, “You can store up to 5GB of data in Cloud Storage for free and make up to 1000 feature requests to the Vision API for free per month.” Please keep in mind that you will be charged when exceeding these limits.


Closing

As shown above, invoking Google Cloud APIs from Workflows can be achieved easily.

Not only the Cloud Vision API; the Translation API could be invoked in the same way. It would be good to incorporate it into Workflows for translation-related business. (I will write about it in another post when I get the chance.)
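
To give a rough idea, a Script Task calling the Translation API would look something like the following. This is only a sketch under assumptions: the OAuth2 setting name “GoogleTranslationAPItest” and the String-type Data Items q_source and q_translated are hypothetical, not part of the sample App.

// Google Cloud Translation: translate q_source into English (sketch)
var token = httpClient.getOAuth2Token('GoogleTranslationAPItest');
var response = httpClient.begin()
  .bearer(token)
  .queryParam('q', q_source) // text to translate
  .queryParam('target', 'en') // target language code
  .post('https://translation.googleapis.com/language/translate/v2');
if (response.getStatusCode() !== 200) {
  throw "response is not 200: " + response.getStatusCode() + " " + response.getResponseAsString();
}
var json = JSON.parse(response.getResponseAsString());
engine.setDataByVarName('q_translated', json.data.translations[0].translatedText);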

Also, based on this flow, it is possible to build a service where you send a picture to a certain email address and a transcription is sent back: create an inlet for emails using a Message Start Event (Email), extract the text from the returned JSON, and reply with a Throwing Message Intermediate Event (Email). (I actually created it as well, but I cannot tell you the address because of the API usage limits.)

If you have any questions concerning this post, please feel free to ask via our Inquiry Form.
