Multiline String, Filter by RegExp

Multiline String, Filter by RegExp

Multiline String, Filter by RegExp

複数行文字列, 正規表現フィルタ

Narrows down to just the matched lines from the multiline text. A regular expression (RegExp) can be specified as the match condition. Each line is determined to match a regular expression. Also stores the filtered-out lines (removed lines).

Auto Step icon
Configs for this Auto Step
StrConfA1
A1: Set Multiline Text *#{EL}
StrConfB1
B1: Set Matching Condition (REGEXP) for each line *#{EL}
BoolConfB2
B2: Case Sensitive or Case should be Ignored
SelectConfC1
C1: Select STRING that stores Narrowed Text (update)
SelectConfC2
C2: Select STRING that stores Removed Lines (update)
Script (click to open)
// GraalJS Script (engine type: 2)

//////// START "main()" ////////////////////////////////////////////////////////////////
main();
function main(){ 

//// == Config Retrieving / 工程コンフィグの参照 ==
const strInput         = configs.get      ( "StrConfA1" );    /// REQUIRED
  if( strInput       === "" ){
    throw new Error( "\n AutomatedTask ConfigError:" +
                     " Config {A1: String} is empty \n" );
  }
const strRegexp        = configs.get      ( "StrConfB1" );    /// REQUIRED
  if( strRegexp      === "" ){
    throw new Error( "\n AutomatedTask ConfigError:" +
                     " Config {B1: String} is empty \n" );
  }
const boolIgnoreCase   = configs.getObject( "BoolConfB2" );   /// TOGGLE
  // https://questetra.zendesk.com/hc/ja/articles/360024574471-R2300 "Boolean object"
const strPocketOutput  = configs.getObject( "SelectConfC1" ); // NotRequired
const strPocketRemoved = configs.getObject( "SelectConfC2" ); // NotRequired


//// == Data Retrieving / ワークフローデータの参照 ==
// (Nothing. Retrieved via Expression Language in Config Retrieving)


//// == Calculating / 演算 ==
const arrInput         = strInput.split( '\n' );
const regConstructor   = boolIgnoreCase ?
                         new RegExp( strRegexp, 'i' ) : new RegExp( strRegexp );
// engine.log( " AutomatedTask RegExp: " + regConstructor.toString() );

let   arrOutput    = [];
let   arrRemoved   = [];

for ( let i = 0; i < arrInput.length; i++ ) {
  if ( regConstructor.test( arrInput[i] ) ){
    arrOutput.push ( arrInput[i] );
  } else {
    arrRemoved.push ( arrInput[i] );
  }
}

let strOutput  = arrOutput?.join( '\n' ) ?? "";
let strRemoved = arrRemoved?.join( '\n' ) ?? "";
// https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Optional_chaining


//// == Data Updating / ワークフローデータへの代入 ==
if ( strPocketOutput !== null ) {
  engine.setData ( strPocketOutput,  strOutput );
}
if ( strPocketRemoved !== null ) {
  engine.setData ( strPocketRemoved, strRemoved );
}

} //////// END "main()" /////////////////////////////////////////////////////////////////



/*
Notes:
- When a process reaches this automated step, "multi-line text" is automatically processed.
    - Each line of multiline text is tested with RegExp, and only lines matching the condition are output.
    - Note that it is possible to store not only filtered-down lines, but also filtered-out lines.
- If the amount of calculation is too large, the process will be terminated in the middle.
    - A large number of Lines or complex RegExp that requires more than 30 secs will result in a termination.
- No line feed code is inserted at the end of the output string.
    - "Narrowed Text" (filtered down)
    - "Removed Lines" (filtered out)
- Various human steps can be automated (unmanned).
     - Filter by date
     - Remove invalid email addresses
     - Remove invalid zip codes
     - Extract URLs starting with "http://"
     - Extract only jp domain URLs

Appendix-en:
- Basic regular expressions: Groups
    - `(apple|orange)`
        - Either "apple" or "orange"
- Basic regular expressions: A character class
    - `[xyz]`
        - Any one of the enclosed characters, "x" "y" or "z"
    - `[a-c]`
        - Any one in the range "a" to "c"
    - `[^xyz]`
        - Any one that is neither "x" nor "y" nor "z"
    - `\d`
        - Any digit / `[0-9]`
    - `\w`
        - Any alphanumeric character / `[A-Za-z0-9_]`
    - `\t`
        - A horizontal tab
    - `.`
        - Any single character except line terminators
- Basic regular expressions: Assertions
    - `^`
        - Beginning
    - `$`
        - End
    - `\b`
        - A word boundary
    - `\B`
        - A non-word boundary
- Basic regular expressions: Quantifiers
    - `x*`
        - The preceding item "x" 0 or more times
    - `x+`
        - The preceding item "x" 1 or more times
    - `x?`
        - The preceding item "x" 0 or 1 times
    - `x{n}`
        - "n" occurrences of the preceding item "x"
    - `x{n,m}`
        - at least "n" and at most "m" occurrences
- RegExp Examples
    - Japanese postal code in the middle of the line
        - `(\d{3}-\d{4})|(\d{7})`
        - `([0-9]{3}-[0-9]{4})|([0-9]{7})`
    - Comma + Gmail address
        - `,[a-zA-Z0-9\.\+]+@(gmail\.com|googlemail\.com)`
        - `,[a-zA-Z0-9.+]+@(gmail\.com|googlemail\.com)`
    - Secure HTTP in the middle of the line
        - `https:\/\/`
        - `https://`
    - ".com/" or ".com" at the end of the line
        - `\.com/?$`
    - Date at the begging of the line(2022-12-15 to 2023-01-06)
        - `(^2022-12-1[5-9])|(^2022-12-[2-3])|(^2023-01-0[1-6])`

Notes-ja:
- 案件がこの自動工程に到達した際、「複数行テキスト」が自動的に加工さます。
    - 複数行テキストの各行がそれぞれ指定した正規表現でテストされ、条件にマッチした行だけが出力されます。
    - なお、フィルタ抽出された行だけでなく、フィルタ除去された行を格納することも可能です。
- 計算量が多すぎる場合、途中終了します。
    - 行数が多い・正規表現が極めて複雑など、30秒を超えるような処理は途中で終了する可能性があります。
- 出力文字列の末尾に改行コードは挿入されません。
    - "絞り込まれたテキスト"
    - "除去された行"
- 様々なヒューマン工程を自動化(無人化)できます。
    - 日付を絞り込む
    - 不正なメールアドレスを除去する
    - 不正な郵便番号を除去する
    - "http://" 始まりのURLを抽出する
    - jpドメインのURLだけを抽出する

Appendix-ja:
- 基本的な正規表現(グループ/Groups)
    - `(apple|orange)`
        - "apple" か "orange" のいずれか / Either "apple" or "orange"
- 基本的な正規表現(文字クラス/A character class)
    - `[xyz]`
        - "x" か "y" か "z" の 一文字 / Any one of the enclosed characters, "x" "y" or "z"
    - `[a-c]`
        - "a" から "c" までのいずれか一文字 / Any one in the range "a" to "c"
    - `[^xyz]`
        - "x" でも "y" でも "z" でもない一文字 / Any one that is neither "x" nor "y" nor "z"
    - `\d`
        - 数字一文字 / Any digit / `[0-9]`
    - `\w`
        - 半角英数字一文字 / Any alphanumeric character / `[A-Za-z0-9_]`
    - `\t`
        - 水平タブ / A horizontal tab
    - `.`
        - 改行文字を除くあらゆる一文字 / Any single character except line terminators
- 基本的な正規表現(アサーション言明/Assertions)
    - `^`
        - 先頭 / Beginning
    - `$`
        - 末尾 / End
    - `\b`
        - 区切り / A word boundary
    - `\B`
        - 区切り以外 / A non-word boundary
- 基本的な正規表現(数量詞/Quantifiers)
    - `x*`
        - 直前アイテム "x" の0回以上の繰返 / The preceding item "x" 0 or more times
    - `x+`
        - 直前アイテム "x" の1回以上の繰返 / The preceding item "x" 1 or more times
    - `x?`
        - 直前アイテム "x" の0回か1回の出現 / The preceding item "x" 0 or 1 times
    - `x{n}`
        - 直前アイテム "x" のn回の出現 / "n" occurrences of the preceding item "x"
    - `x{n,m}`
        - 直前アイテム "x" がnからm回出現 / at least "n" and at most "m" occurrences
- 正規表現 設定例 Example
    - 途中の日本の郵便番号
        - `(\d{3}-\d{4})|(\d{7})`
        - `([0-9]{3}-[0-9]{4})|([0-9]{7})`
    - カンマ+Gmailアドレス
        - `,[a-zA-Z0-9\.\+]+@(gmail\.com|googlemail\.com)`
        - `,[a-zA-Z0-9.+]+@(gmail\.com|googlemail\.com)`
    - 途中のセキュアHTTP
        - `https:\/\/`
        - `https://`
    - 行末の ".com/" もしくは ".com"
        - `\.com/?$`
    - 行頭の日付(2022-12-15 to 2023-01-06)
        - `(^2022-12-1[5-9])|(^2022-12-[2-3])|(^2023-01-0[1-6])`

Test Data for Debug:
date,email,zipcode,url
2022-12-14,example@gmail.com,604-0835,https://questetra.com/
2022-12-14,suzuki@example.com,604-0835,https://QUESTETRA.COM/
2022-12-15,sato@example.com,604-0835,https://questetra.com/
2022-12-18,example@gmail.com,604-0835,https://support.questetra.com/
2023-01-06,example@googlemail.com,604-0835,HTTP://QUESTETRA.COM/
2023-01-06,suzuki@example.com,6040-835,https://questetra.com/
2023-01-06,SATO@example.com,6040835,https://questetra.com/
2023-01-07,EXAMPLE@GMAIL.COM,604-0835,http://support.questetra.com/
*/

Download

warning Freely modifiable JavaScript (ECMAScript) code. No warranty of any kind.
(Installing Addon Auto-Steps are available only on the Professional edition.)

Notes

  • When a process reaches this automated step, the multi-line text is automatically processed.
    • Each line of multiline text is tested with RegExp, and only lines matching the condition are output.
    • Note that it is possible to store not only filtered-down lines, but also filtered-out lines.
  • If the amount of calculation is too large, the process will be terminated in the middle.
    • A large number of Lines or complex RegExp that requires more than 30 secs will result in a termination.
  • No line feed code is inserted at the end of the output string.
    • “Narrowed Text” (filtered down)
    • “Removed Lines” (filtered out)
  • Various human steps can be automated (unmanned).
    • Filter by date
    • Remove invalid email addresses
    • Remove invalid zip codes
    • Extract URLs starting with “http://
    • Extract only jp domain URLs

Capture

Narrows down to just the matched lines from the multiline text. A regular expression (RegExp) can be specified as the match condition. Each line is determined to match a regular expression. Also stores the filtered-out lines (removed lines).

Appendix

  • Basic regular expressions: Groups
    • (apple|orange)
      • Either “apple” or “orange”
  • Basic regular expressions: A character class
    • [xyz]
      • Any one of the enclosed characters, “x” “y” or “z”
    • [a-c]
      • Any one in the range “a” to “c”
    • [^xyz]
      • Any one that is neither “x” nor “y” nor “z”
    • \d
      • Any digit / [0-9]
    • \w
      • Any alphanumeric character / [A-Za-z0-9_]
    • \t
      • A horizontal tab
    • .
      • Any single character except line terminators
  • Basic regular expressions: Assertions
    • ^
      • Beginning
    • $
      • End
    • \b
      • A word boundary
    • \B
      • A non-word boundary
  • Basic regular expressions: Quantifiers
    • x*
      • The preceding item “x” 0 or more times
    • x+
      • The preceding item “x” 1 or more times
    • x?
      • The preceding item “x” 0 or 1 times
    • x{n}
      • “n” occurrences of the preceding item “x”
    • x{n,m}
      • at least “n” and at most “m” occurrences
  • RegExp Examples
    • Japanese postal code in the middle of the line
      • (\d{3}-\d{4})|(\d{7})
      • ([0-9]{3}-[0-9]{4})|([0-9]{7})
    • Comma + Gmail address
      • ,[a-zA-Z0-9\.\+]+@(gmail\.com|googlemail\.com)
      • ,[a-zA-Z0-9.+]+@(gmail\.com|googlemail\.com)
    • Secure HTTP in the middle of the line
      • https:\/\/
      • https://
    • “.com/” or “.com” at the end of the line
      • \.com/?$
    • Date at the begging of the line(2022-12-15 to 2023-01-06)
      • (^2022-12-1[5-9])|(^2022-12-[2-3])|(^2023-01-0[1-6])

See Also

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Scroll to Top

Discover more from Questetra Support

Subscribe now to keep reading and get access to the full archive.

Continue reading