#String: Extract by Regular Expression

Extracts every string that matches the regular expression. For example, if a URL regular expression is set, all URLs in the text are extracted.
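
As a rough illustration of the extraction described above, the following minimal sketch collects every match with `String.prototype.matchAll`. The helper name `extractAll` and the sample sentence are assumptions made for this example and are not part of the add-on; the URL pattern is the one listed in the script's appendix.

```javascript
// Hypothetical helper (not part of the add-on): returns every matched substring.
function extractAll(text, pattern) {
  // matchAll() requires a RegExp with the global ("g") flag.
  const re = new RegExp(pattern, "g");
  // Each match object holds the whole matched string at index 0.
  return [...text.matchAll(re)].map((m) => m[0]);
}

// Assumed sample text for illustration only.
const sample =
  "See https://questetra.com/en/ and https://questetra.com/ja/ for details.";
const urls = extractAll(sample, "https?://[\\w/:%#\\$&\\?\\(\\)~\\.=\\+\\-]+");

console.log(urls.join("\n"));
```

Note that the pattern is written here as a JavaScript string literal, so each backslash is doubled; in the config field the pattern is entered with single backslashes.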

Configs for this Auto Step

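StrConfA: A: Set the text to search *^#{EL}
StrConfB1: B1: Set the regular expression (e.g. “(\d{3}-\d{4})|(\d{7})”) *^#{EL}
BoolConfB2: B2: Case-sensitive ⇔ Case-insensitive
StrConfB3: B3: Set the maximum number of extractions (default: “10”)^#{EL}
SelectConfC: C: Select the multi-line String type data item that stores the extracted strings (update) *
SelectConfD1: D1: Select the Numeric type data item that stores the number of text lines (update)
SelectConfD2: D2: Select the Numeric type data item that stores the number of matches (update)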
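
The effect of B2 and B3 can be sketched as follows. This is a simplified illustration, assuming the toggle value arrives as a boolean and the maximum arrives as a string; the helper names `buildRegExp` and `parseMax` and the sample text are hypothetical and do not appear in the add-on.

```javascript
// Hypothetical helpers (not part of the add-on) illustrating B2 and B3.
function buildRegExp(pattern, ignoreCase) {
  // "g" is always set so that every match is found; "i" is added on demand.
  return new RegExp(pattern, ignoreCase ? "ig" : "g");
}

function parseMax(strMax) {
  const n = parseInt(strMax, 10);
  // Fall back to the documented default of 10 when the value is not a number.
  return Number.isNaN(n) ? 10 : n;
}

// Assumed sample text for illustration only.
const text = "HTTPS://questetra.com/ja/#hoge and http://questetra.com";
const pattern = "https?://[\\w/:%#\\$&\\?\\(\\)~\\.=\\+\\-]+";

const strict = [...text.matchAll(buildRegExp(pattern, false))].map((m) => m[0]);
const relaxed = [...text.matchAll(buildRegExp(pattern, true))].map((m) => m[0]);

console.log(strict);  // case-sensitive: the uppercase "HTTPS" URL is skipped
console.log(relaxed); // case-insensitive: both URLs are found
console.log(parseMax(""), parseMax("3"), parseMax("abc"));
```

An empty or non-numeric B3 value falls back to 10 in this sketch, which matches the default stated in the config label.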

Script

// Script Example of Business Process Automation
// for 'engine type: 3' ("GraalJS standard mode")
// cf. 'engine type: 2' ("GraalJS Nashorn compatible mode") (renamed from "GraalJS" at 20230526)


//////// START "main()" /////////////////////////////////////////////////////////////////
main();
function main(){ 

//// == Config Retrieving ==
const strInput       = configs.get       ( "StrConfA" );      // REQUIRED
  if( strInput     === "" ){
    throw new Error( "\n AutomatedTask ConfigError:" +
                     " Config {A: InputText} is empty \n" );
  }
const numInputLines  = strInput.split("\n").length;

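const strRegExp      = configs.get       ( "StrConfB1" );     // REQUIRED
  if( strRegExp    === "" ){ // an empty pattern would match at every position
    throw new Error( "\n AutomatedTask ConfigError:" +
                     " Config {B1: RegExp} is empty \n" );
  }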
const boolIgnoreCase = configs.getObject ( "BoolConfB2" );    // TOGGLE
  // https://questetra.zendesk.com/hc/ja/articles/360024574471-R2300 "Boolean object"
const strMax         = configs.get       ( "StrConfB3" );     // not required
const numMax         = isNaN ( parseInt(strMax,10) ) ?
                       10 : // default
                       parseInt(strMax,10);
  engine.log( " #of Max: " + numMax );
const strPocketC     = configs.getObject ( "SelectConfC" );   // REQUIRED
const numPocketD1    = configs.getObject ( "SelectConfD1" );  // not required
const numPocketD2    = configs.getObject ( "SelectConfD2" );  // not required



//// == Data Retrieving ==
// (nothing)



//// == Calculating ==
const regSearch  = boolIgnoreCase ?
                   new RegExp( strRegExp, 'ig' ) : new RegExp( strRegExp, 'g' );

let arrMatches   = [ ...strInput.matchAll ( regSearch ) ];  // Spread Syntax
let arrExtracted = [];


for ( let i = 0; i < arrMatches.length; i++ ) {
  arrExtracted.push ( arrMatches[i][0] );
  engine.log( " AutomatedTask Match Index: " + arrMatches[i].index );
  if ( i ===  numMax - 1 ){ break; }
}


//// == Data Updating ==
/// ref) Retrieving / Updating from ScriptTasks
/// https://questetra.zendesk.com/hc/ja/articles/360024574771-R2301

if ( strPocketC !== null ){ 
  engine.setData( strPocketC, arrExtracted?.join( '\n' ) ?? "" );
}

if ( numPocketD1 !== null ){ 
  engine.setData( numPocketD1, new java.math.BigDecimal( numInputLines ) );
}
if ( numPocketD2 !== null ){ 
  engine.setData( numPocketD2, new java.math.BigDecimal( arrMatches.length ) );
}

} //////// END "main()" /////////////////////////////////////////////////////////////////



/*
NOTES
- When the Process reaches this [Automated Step], the "Extraction" is automatically executed.
    - "Match strings" within the input text are extracted.
    - You can set the maximum number of items to extract.
- The number of matches can also be recorded.
- The number of lines in the input text can also be recorded.
- Regular Expressions
    - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_expressions

▼Test Data for Debug:
Our front pages are https://questetra.com/en/ and https://questetra.com/ja/.
URLには https://questetra.com/トップ とか https://questetra.com/ja/?hoge=12.34 とか
 HTTPS://questetra.com/ja/#hoge とか http://questetra.com 
とか https://questetra.com#hoge とか https://questetra.
com/ とか https://questetra.com/ いろいろある

APPENDIX
- RegExp Example
    - URL
        - `https?://[\w/:%#\$&\?\(\)~\.=\+\-]+`
    - Japanese Postal Code
        - `(\d{3}-\d{4})|(\d{7})`
        - `([0-9]{3}-[0-9]{4})|([0-9]{7})`
    - ISO Date from 2024-12-15 to 2025-01-06
        - `(2024-12-1[5-9])|(2024-12-[2-3][0-9])|(2025-01-0[1-6])`
*/

Download

string-extract-by-regexp-2025.xml
- 2025-04-03 (C) Questetra, Inc. (MIT License)

Warning: this is freely modifiable JavaScript (ECMAScript) code, provided without warranty of any kind.
(Add-on Auto Steps can be installed only in the Professional edition.)

Notes

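When a Process reaches this [Automated Step], the "extraction" is executed automatically.
- "Match strings" within the input text are extracted.
- You can set the maximum number of items to extract.
The number of matches can also be recorded.
The number of lines in the input text can also be recorded.
About regular expressions
- https://developer.mozilla.org/ja/docs/Web/JavaScript/Guide/Regular_expressions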
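
The relationship between the three recorded values can be sketched as below. In the script above, the match count is taken from the full list of matches, so it is not capped by the maximum, while the line count comes from splitting the input on newlines. The helper name `summarize` and the sample input are hypothetical, and the sketch assumes the maximum is a positive integer.

```javascript
// Hypothetical helper (not part of the add-on); assumes max is a positive integer.
function summarize(text, pattern, max) {
  const matches = [...text.matchAll(new RegExp(pattern, "g"))];
  return {
    extracted: matches.slice(0, max).map((m) => m[0]), // capped by max
    matchCount: matches.length,                        // total matches, not capped
    lineCount: text.split("\n").length,                // newline-separated lines
  };
}

// Assumed sample input: four lines, three of which contain a postal code.
const input = "100-0001\n2000002\nno code here\n300-0003";
const result = summarize(input, "(\\d{3}-\\d{4})|(\\d{7})", 2);

console.log(result.extracted.join("\n"));
console.log(result.matchCount, result.lineCount);
```

Because the line count is based on a split, text that ends with a trailing newline is counted as having one extra (empty) line.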

Appendix

RegExp Example
- URL
  - https?://[\w/:%#\$&\?\(\)~\.=\+\-]+
- Japanese Postal Code
  - (\d{3}-\d{4})|(\d{7})
  - ([0-9]{3}-[0-9]{4})|([0-9]{7})
- ISO Date from 2024-12-15 to 2025-01-06
  - (2024-12-1[5-9])|(2024-12-[2-3][0-9])|(2025-01-0[1-6])
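
Before setting one of these patterns, it may help to try it against a few boundary values. The sketch below does this for the postal code and date-range examples; the helper name `hits` and the test strings are assumptions chosen for illustration.

```javascript
// Hypothetical helper (not part of the add-on):
// true when the pattern matches somewhere in the text.
function hits(pattern, text) {
  return new RegExp(pattern).test(text);
}

const postal = "(\\d{3}-\\d{4})|(\\d{7})";
const isoDate = "(2024-12-1[5-9])|(2024-12-[2-3][0-9])|(2025-01-0[1-6])";

// Postal code: hyphenated and plain 7-digit forms match; a short form does not.
console.log(hits(postal, "100-0001"), hits(postal, "1000001"), hits(postal, "100-001"));

// Date range: both ends are included, the days just outside are not.
console.log(hits(isoDate, "2024-12-14"), hits(isoDate, "2024-12-15"));
console.log(hits(isoDate, "2025-01-06"), hits(isoDate, "2025-01-07"));

// The pattern checks digits only, so a non-existent day still matches.
console.log(hits(isoDate, "2024-12-39"));
```

As the last line suggests, the date pattern does not validate calendar dates, and neither pattern is anchored, so a longer digit run such as an 8-digit number still yields a 7-digit match.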