#String: Extract by RegExp

Extracts all strings that match the regular expression. For example, if you set a URL regular expression, all URLs in the text will be extracted.
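
As a rough illustration of that behavior, the sketch below runs a URL pattern over a short sample text in plain JavaScript. The function name `extractAll` and the sample text are assumptions made for this example; the add-on itself reads its inputs from the step configuration rather than from literals.

```javascript
// Minimal sketch (not the add-on itself): collect every substring that
// matches a pattern. `extractAll` is a hypothetical helper name.
function extractAll(text, pattern, ignoreCase = false) {
  // The "g" flag is required by String.prototype.matchAll.
  const flags = ignoreCase ? "ig" : "g";
  const re = new RegExp(pattern, flags);
  // Each match object holds the full matched string at index 0.
  return [...text.matchAll(re)].map((m) => m[0]);
}

// URL pattern written as a string literal, so backslashes are doubled.
const urlPattern = "https?://[\\w/:%#\\$&\\?\\(\\)~\\.=\\+\\-]+";
const sample = "See https://questetra.com/en/ and http://questetra.com for details";
const urls = extractAll(sample, urlPattern);
console.log(urls.join("\n"));
```

The extracted strings are joined with newlines here because the step stores its result as a single multi-line text value.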

Configs for this Auto Step

StrConfA: A: Set Text to Search for *^#{EL}
StrConfB1: B1: Set Regular Expression (eg: “(\d{3}-\d{4})|(\d{7})”) *^#{EL}
BoolConfB2: B2: Case Sensitive or Case should be Ignored
StrConfB3: B3: Set Max Num of Extractions (default: “10”)^#{EL}
SelectConfC: C: Select DATA to store Extracted Strings (update) *
SelectConfD1: D1: Select NUMERIC for Number of Text Lines (update)
SelectConfD2: D2: Select NUMERIC for Number of Matches (update)
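
The B2 and B3 settings can be pictured as follows. This is a sketch based on how the script on this page interprets them; `resolveOptions` is a hypothetical name, and the real step obtains these values through `configs.get` and `configs.getObject`.

```javascript
// Sketch of how B2 (toggle) and B3 (text) turn into RegExp flags and a limit.
// `resolveOptions` is a hypothetical helper, not part of the add-on.
function resolveOptions(ignoreCase, strMax) {
  const parsed = parseInt(strMax, 10);
  return {
    flags: ignoreCase ? "ig" : "g",          // B2 on: case is ignored
    max: Number.isNaN(parsed) ? 10 : parsed, // B3 empty or non-numeric: 10
  };
}

const byDefault = resolveOptions(false, "");
const custom = resolveOptions(true, "3");
console.log(byDefault, custom);
```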

Script

// Script Example of Business Process Automation
// for 'engine type: 3' ("GraalJS standard mode")
// cf. 'engine type: 2' ("GraalJS Nashorn compatible mode") (renamed from "GraalJS" at 20230526)


//////// START "main()" /////////////////////////////////////////////////////////////////
main();
function main(){ 

//// == Config Retrieving ==
const strInput       = configs.get       ( "StrConfA" );      // REQUIRED
  if( strInput     === "" ){
    throw new Error( "\n AutomatedTask ConfigError:" +
                     " Config {A: InputText} is empty \n" );
  }
const numInputLines  = strInput.split("\n").length;

const strRegExp      = configs.get       ( "StrConfB1" );     // REQUIRED
const boolIgnoreCase = configs.getObject ( "BoolConfB2" );    // TOGGLE
  // https://questetra.zendesk.com/hc/en-us/articles/360024574471-R2300 "Boolean object"
const strMax         = configs.get       ( "StrConfB3" );     // not required
const numMax         = isNaN ( parseInt(strMax,10) ) ?
                       10 : // default
                       parseInt(strMax,10);
  engine.log( " #of Max: " + numMax );
const strPocketC     = configs.getObject ( "SelectConfC" );   // REQUIRED
const numPocketD1    = configs.getObject ( "SelectConfD1" );  // not required
const numPocketD2    = configs.getObject ( "SelectConfD2" );  // not required



//// == Data Retrieving ==
// (nothing)



//// == Calculating ==
const regSearch  = boolIgnoreCase ?
                   new RegExp( strRegExp, 'ig' ) : new RegExp( strRegExp, 'g' );

let arrMatches   = [ ...strInput.matchAll ( regSearch ) ];  // Spread Syntax
let arrExtracted = [];


for ( let i = 0; i < arrMatches.length; i++ ) {
  arrExtracted.push ( arrMatches[i][0] );
  engine.log( " AutomatedTask Match Index: " + arrMatches[i].index );
  if ( i ===  numMax - 1 ){ break; }
}


//// == Data Updating ==
/// ref) Retrieving / Updating from ScriptTasks
/// https://questetra.zendesk.com/hc/en-us/articles/360024574771-R2301

if ( strPocketC !== null ){ 
  engine.setData( strPocketC, arrExtracted?.join( '\n' ) ?? "" );
}

if ( numPocketD1 !== null ){ 
  engine.setData( numPocketD1, new java.math.BigDecimal( numInputLines ) );
}
if ( numPocketD2 !== null ){ 
  engine.setData( numPocketD2, new java.math.BigDecimal( arrMatches.length ) );
}

} //////// END "main()" /////////////////////////////////////////////////////////////////



/*
NOTES
- When the Process reaches this [Automated Step], the "Extraction" is automatically executed.
    - "Match strings" within the input text are extracted.
    - You can set the maximum number of items to extract.
- The number of matches can also be recorded.
- The number of lines in the input text can also be recorded.
- Regular Expressions
    - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_expressions

▼Test Data for Debug:
Our front pages are https://questetra.com/en/ and https://questetra.com/ja/.
URLには https://questetra.com/トップ とか https://questetra.com/ja/?hoge=12.34 とか
 HTTPS://questetra.com/ja/#hoge とか http://questetra.com 
とか https://questetra.com#hoge とか https://questetra.
com/ とか https://questetra.com/ いろいろある

APPENDIX
- RegExp Example
    - URL
        - `https?://[\w/:%#\$&\?\(\)~\.=\+\-]+`
    - Japanese Postal Code
        - `(\d{3}-\d{4})|(\d{7})`
        - `([0-9]{3}-[0-9]{4})|([0-9]{7})`
    - ISO Date from 2024-12-15 to 2025-01-06
        - `(2024-12-1[5-9])|(2024-12-[2-3][0-9])|(2025-01-0[1-6])`
*/

Download

string-extract-by-regexp-2025.xml
- 2025-04-03 (C) Questetra, Inc. (MIT License)

Warning: Freely modifiable JavaScript (ECMAScript) code. No warranty of any kind.
(Installing Addon Auto-Steps is available only on the Professional edition.)

Notes

- When a Process reaches this [Automated Step], the extraction is automatically executed.
  - Matching strings within the input text are extracted.
  - You can set the maximum number of items to be extracted.
- The number of matches can also be recorded. This is the total match count and is not limited by the maximum number of extractions.
- The number of lines in the input text can also be recorded.
- Regular Expressions
  - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_expressions
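
The relationship between the three recorded values can be sketched like this. In the script on this page, the line count comes from splitting the input on newlines, and the match count is taken before the maximum is applied, so it can exceed the number of extracted strings. `summarize` is a hypothetical name used only here, and the sketch assumes a positive maximum.

```javascript
// Sketch: line count, total match count, and capped extraction.
// `summarize` is a hypothetical helper; the fields mirror C, D1, and D2.
function summarize(text, pattern, max) {
  const matches = [...text.matchAll(new RegExp(pattern, "g"))];
  return {
    lines: text.split("\n").length,                    // D1
    matchCount: matches.length,                        // D2 (not capped)
    extracted: matches.slice(0, max).map((m) => m[0]), // C (capped at max)
  };
}

const result = summarize("a1\nb22\nc333", "\\d+", 2);
console.log(result);
```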

Appendix

RegExp Example
- URL
  - https?://[\w/:%#\$&\?\(\)~\.=\+\-]+
- Japanese Postal Code
  - (\d{3}-\d{4})|(\d{7})
  - ([0-9]{3}-[0-9]{4})|([0-9]{7})
- ISO Date from 2024-12-15 to 2025-01-06
  - (2024-12-1[5-9])|(2024-12-[2-3][0-9])|(2025-01-0[1-6])
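
To get a feel for what these patterns accept, they can be tried against a few sample strings. The sketch below is a plain JavaScript check in which the sample strings and the `findAll` helper are made up for illustration. Note that the patterns are not anchored, so `\d{7}` would also match inside a longer run of digits, and `2024-12-[2-3][0-9]` also accepts non-existent days such as 2024-12-39.

```javascript
// Sketch: try the appendix patterns on sample strings.
// `findAll` is a hypothetical helper name.
function findAll(text, pattern) {
  return [...text.matchAll(new RegExp(pattern, "g"))].map((m) => m[0]);
}

// Japanese postal code, with or without the hyphen.
const postal = findAll("Zip 604-0835 or 6040835", "(\\d{3}-\\d{4})|(\\d{7})");

// ISO dates from 2024-12-15 to 2025-01-06; the first and last samples
// fall just outside the range.
const dates = findAll(
  "2024-12-14 2024-12-15 2024-12-31 2025-01-06 2025-01-07",
  "(2024-12-1[5-9])|(2024-12-[2-3][0-9])|(2025-01-0[1-6])"
);
console.log(postal, dates);
```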