Multiline String, Filter by RegExp

Filters multiline text down to the lines that match a regular expression (RegExp). Each line is tested against the pattern individually. The lines that do not match (the removed lines) can be stored as well.

Configs for this Auto Step

StrConfA1: A1: Set Multiline Text *^#{EL}
StrConfB1: B1: Set Matching Condition (REGEXP) for each line *^#{EL}
BoolConfB2: B2: Case Sensitive or Case should be Ignored
SelectConfC1: C1: Select STRING that stores Narrowed Text (update)
SelectConfC2: C2: Select STRING that stores Removed Lines (update)
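
As a rough illustration of how B1 and B2 work together, the sketch below builds a RegExp from a pattern string and a boolean toggle. The helper name `buildLineMatcher` is hypothetical and not part of the add-on. It assumes that switching B2 on means case is ignored, which matches what the script below does with the `i` flag.

```javascript
// Hypothetical helper (not part of the add-on): combines the B1 pattern string
// with the B2 toggle, assuming B2 "on" means "ignore case".
function buildLineMatcher(pattern, ignoreCase) {
  return new RegExp(pattern, ignoreCase ? "i" : "");
}

// In a JavaScript string literal the backslash must be doubled;
// in the B1 config field the same pattern is typed as questetra\.com
const strict = buildLineMatcher("questetra\\.com", false);
const relaxed = buildLineMatcher("questetra\\.com", true);

console.log(strict.test("https://QUESTETRA.COM/"));  // false
console.log(relaxed.test("https://QUESTETRA.COM/")); // true
```

With the toggle off, only the lowercase URL lines in the test data would be kept by this pattern; with it on, the uppercase variants are kept too.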

Script

// GraalJS Script (engine type: 2)

//////// START "main()" ////////////////////////////////////////////////////////////////
main();
function main(){ 

//// == Config Retrieving ==
const strInput         = configs.get      ( "StrConfA1" );    /// REQUIRED
  if( strInput       === "" ){
    throw new Error( "\n AutomatedTask ConfigError:" +
                     " Config {A1: String} is empty \n" );
  }
const strRegexp        = configs.get      ( "StrConfB1" );    /// REQUIRED
  if( strRegexp      === "" ){
    throw new Error( "\n AutomatedTask ConfigError:" +
                     " Config {B1: String} is empty \n" );
  }
const boolIgnoreCase   = configs.getObject( "BoolConfB2" );   /// TOGGLE
  // https://questetra.zendesk.com/hc/en-us/articles/360024574471-R2300 "Boolean object"
const strPocketOutput  = configs.getObject( "SelectConfC1" ); // NotRequired
const strPocketRemoved = configs.getObject( "SelectConfC2" ); // NotRequired


//// == Data Retrieving ==
// (Nothing. Retrieved via Expression Language in Config Retrieving)


//// == Calculating ==
const arrInput         = strInput.split( '\n' );
const regConstructor   = boolIgnoreCase ?
                         new RegExp( strRegexp, 'i' ) : new RegExp( strRegexp );
// engine.log( " AutomatedTask RegExp: " + regConstructor.toString() );

let   arrOutput    = [];
let   arrRemoved   = [];

for ( let i = 0; i < arrInput.length; i++ ) {
  if ( regConstructor.test( arrInput[i] ) ){
    arrOutput.push ( arrInput[i] );
  } else {
    arrRemoved.push ( arrInput[i] );
  }
}

// Both arrays always exist, so join() is safe; an empty array yields "".
const strOutput  = arrOutput.join( '\n' );
const strRemoved = arrRemoved.join( '\n' );


//// == Data Updating ==
if ( strPocketOutput !== null ) {
  engine.setData ( strPocketOutput,  strOutput );
}
if ( strPocketRemoved !== null ) {
  engine.setData ( strPocketRemoved, strRemoved );
}

} //////// END "main()" /////////////////////////////////////////////////////////////////



/*
Notes:
- When a process reaches this automated step, "multi-line text" is automatically processed.
    - Each line of multiline text is tested with RegExp, and only lines matching the condition are output.
    - Note that it is possible to store not only filtered-down lines, but also filtered-out lines.
- If the computation takes too long, the process is terminated partway through.
    - A large number of lines, or a complex RegExp, that needs more than 30 seconds results in termination.
- No line feed code is inserted at the end of the output string.
    - "Narrowed Text" (filtered down)
    - "Removed Lines" (filtered out)
- Various human steps can be automated (unmanned).
     - Filter by date
     - Remove invalid email addresses
     - Remove invalid zip codes
     - Extract URLs starting with "http://"
     - Extract only jp domain URLs

Appendix-en:
- Basic regular expressions: Groups
    - `(apple|orange)`
        - Either "apple" or "orange"
- Basic regular expressions: A character class
    - `[xyz]`
        - Any one of the enclosed characters, "x" "y" or "z"
    - `[a-c]`
        - Any one in the range "a" to "c"
    - `[^xyz]`
        - Any one that is neither "x" nor "y" nor "z"
    - `\d`
        - Any digit / `[0-9]`
    - `\w`
        - Any alphanumeric character or underscore / `[A-Za-z0-9_]`
    - `\t`
        - A horizontal tab
    - `.`
        - Any single character except line terminators
- Basic regular expressions: Assertions
    - `^`
        - Beginning of the line
    - `$`
        - End of the line
    - `\b`
        - A word boundary
    - `\B`
        - A non-word boundary
- Basic regular expressions: Quantifiers
    - `x*`
        - The preceding item "x" 0 or more times
    - `x+`
        - The preceding item "x" 1 or more times
    - `x?`
        - The preceding item "x" 0 or 1 times
    - `x{n}`
        - "n" occurrences of the preceding item "x"
    - `x{n,m}`
        - at least "n" and at most "m" occurrences
- RegExp Examples
    - Japanese postal code anywhere in the line
        - `(\d{3}-\d{4})|(\d{7})`
        - `([0-9]{3}-[0-9]{4})|([0-9]{7})`
    - Comma + Gmail address
        - `,[a-zA-Z0-9\.\+]+@(gmail\.com|googlemail\.com)`
        - `,[a-zA-Z0-9.+]+@(gmail\.com|googlemail\.com)`
    - Secure HTTP in the middle of the line
        - `https:\/\/`
        - `https://`
    - ".com/" or ".com" at the end of the line
        - `\.com/?$`
    - Date at the beginning of the line (2022-12-15 to 2023-01-06)
        - `(^2022-12-1[5-9])|(^2022-12-[2-3])|(^2023-01-0[1-6])`

Test Data for Debug:
date,email,zipcode,url
2022-12-14,example@gmail.com,604-0835,https://questetra.com/
2022-12-14,suzuki@example.com,604-0835,https://QUESTETRA.COM/
2022-12-15,sato@example.com,604-0835,https://questetra.com/
2022-12-18,example@gmail.com,604-0835,https://support.questetra.com/
2023-01-06,example@googlemail.com,604-0835,HTTP://QUESTETRA.COM/
2023-01-06,suzuki@example.com,6040-835,https://questetra.com/
2023-01-06,SATO@example.com,6040835,https://questetra.com/
2023-01-07,EXAMPLE@GMAIL.COM,604-0835,http://support.questetra.com/
*/

Download

multiline-string-filter-by-regexp-2023.xml
- 2023-01-24 (C) Questetra, Inc. (MIT License)

Freely modifiable JavaScript (ECMAScript) code. No warranty of any kind.
(Addon Auto-Steps can be installed only on the Professional edition.)

Notes

When a process reaches this automated step, the multi-line text is automatically processed.
- Each line of multiline text is tested with RegExp, and only lines matching the condition are output.
- Note that it is possible to store not only filtered-down lines, but also filtered-out lines.
If the computation takes too long, the process is terminated partway through.
- A large number of lines, or a complex RegExp, that needs more than 30 seconds results in termination.
No line feed code is inserted at the end of the output string.
- “Narrowed Text” (filtered down)
- “Removed Lines” (filtered out)
Various human steps can be automated (unmanned).
- Filter by date
- Remove invalid email addresses
- Remove invalid zip codes
- Extract URLs starting with “http://”
- Extract only jp domain URLs
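
The behaviour described in these notes can be sketched outside the workflow engine as a plain function. This is a minimal sketch, not the add-on's own code: the name `filterLines` and its return shape are assumptions made for illustration, and the sample lines are taken from the test data in the script comments.

```javascript
// Hypothetical standalone version of the filter: splits the text into lines,
// tests each line, and returns both the kept and the removed lines.
function filterLines(text, pattern, ignoreCase) {
  const re = new RegExp(pattern, ignoreCase ? "i" : "");
  const kept = [];
  const removed = [];
  for (const line of text.split("\n")) {
    (re.test(line) ? kept : removed).push(line);
  }
  // join() puts "\n" only between lines, so neither string ends with a line feed.
  return { kept: kept.join("\n"), removed: removed.join("\n") };
}

const text = [
  "2022-12-14,example@gmail.com,604-0835,https://questetra.com/",
  "2023-01-06,suzuki@example.com,6040-835,https://questetra.com/",
  "2023-01-07,EXAMPLE@GMAIL.COM,604-0835,http://support.questetra.com/",
].join("\n");

const result = filterLines(text, "^2023-01-0[1-6]", false);
console.log(result.kept);
// 2023-01-06,suzuki@example.com,6040-835,https://questetra.com/
console.log(result.removed.split("\n").length); // 2
```

Only the line dated 2023-01-06 is kept; the other two lines end up in the removed text, in their original order.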

Appendix

Basic regular expressions: Groups
- (apple|orange)
  - Either “apple” or “orange”
Basic regular expressions: A character class
- [xyz]
  - Any one of the enclosed characters, “x” “y” or “z”
- [a-c]
  - Any one in the range “a” to “c”
- [^xyz]
  - Any one that is neither “x” nor “y” nor “z”
- \d
  - Any digit / [0-9]
- \w
  - Any alphanumeric character or underscore / [A-Za-z0-9_]
- \t
  - A horizontal tab
- .
  - Any single character except line terminators
Basic regular expressions: Assertions
- ^
  - Beginning of the line
- $
  - End of the line
- \b
  - A word boundary
- \B
  - A non-word boundary
Basic regular expressions: Quantifiers
- x*
  - The preceding item “x” 0 or more times
- x+
  - The preceding item “x” 1 or more times
- x?
  - The preceding item “x” 0 or 1 times
- x{n}
  - “n” occurrences of the preceding item “x”
- x{n,m}
  - at least “n” and at most “m” occurrences
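
A few of the constructs listed above can be tried out with a short script. The sample strings below are made up for illustration and do not come from the add-on; each entry records the result that `RegExp.prototype.test()` is expected to return.

```javascript
// Each entry: [pattern, sample line, expected result of test()].
// Backslashes are doubled because the patterns are JavaScript string literals.
const checks = [
  ["(apple|orange)", "one orange", true], // group with alternatives
  ["[^xyz]", "xyz", false],               // no character outside x, y, z
  ["^\\d{3}$", "604", true],              // exactly three digits
  ["^\\d{3}$", "6040", false],            // four digits is one too many
  ["\\bcat\\b", "concat", false],         // "cat" is not a separate word
  ["colou?r", "color", true],             // "u" occurs 0 or 1 times
  ["a{2,3}", "baab", true],               // two consecutive "a"
];

for (const [pattern, line, expected] of checks) {
  const actual = new RegExp(pattern).test(line);
  console.log(`${pattern} on "${line}": ${actual} (expected ${expected})`);
}
```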
RegExp Examples
- Japanese postal code anywhere in the line
  - (\d{3}-\d{4})|(\d{7})
  - ([0-9]{3}-[0-9]{4})|([0-9]{7})
- Comma + Gmail address
  - ,[a-zA-Z0-9\.\+]+@(gmail\.com|googlemail\.com)
  - ,[a-zA-Z0-9.+]+@(gmail\.com|googlemail\.com)
- Secure HTTP in the middle of the line
  - https:\/\/
  - https://
- “.com/” or “.com” at the end of the line
  - \.com/?$
- Date at the beginning of the line (2022-12-15 to 2023-01-06)
  - (^2022-12-1[5-9])|(^2022-12-[2-3])|(^2023-01-0[1-6])
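
To see what two of the example patterns select, they can be run against the eight data rows of the "Test Data for Debug" block in the script comments. The sketch below is only an illustration; the helper `countMatches` is a hypothetical name, and the optional `flags` argument stands in for the B2 case toggle.

```javascript
// Data rows copied from the "Test Data for Debug" block (header row omitted).
const rows = [
  "2022-12-14,example@gmail.com,604-0835,https://questetra.com/",
  "2022-12-14,suzuki@example.com,604-0835,https://QUESTETRA.COM/",
  "2022-12-15,sato@example.com,604-0835,https://questetra.com/",
  "2022-12-18,example@gmail.com,604-0835,https://support.questetra.com/",
  "2023-01-06,example@googlemail.com,604-0835,HTTP://QUESTETRA.COM/",
  "2023-01-06,suzuki@example.com,6040-835,https://questetra.com/",
  "2023-01-06,SATO@example.com,6040835,https://questetra.com/",
  "2023-01-07,EXAMPLE@GMAIL.COM,604-0835,http://support.questetra.com/",
];

// Hypothetical helper: counts the rows that a pattern would keep.
const countMatches = (pattern, flags = "") =>
  rows.filter((row) => new RegExp(pattern, flags).test(row)).length;

const datePattern = "(^2022-12-1[5-9])|(^2022-12-[2-3])|(^2023-01-0[1-6])";
const gmailPattern = ",[a-zA-Z0-9.+]+@(gmail\\.com|googlemail\\.com)";

console.log(countMatches(datePattern));       // 5
console.log(countMatches(gmailPattern));      // 3
console.log(countMatches(gmailPattern, "i")); // 4
```

The date pattern drops the two 2022-12-14 rows and the 2023-01-07 row. The Gmail pattern misses the uppercase address unless case is ignored.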