TSV String, Create Summary Table of Count

TSV String, Create Summary Table of Count

Built-in automatic processing step [Update CSV Data] can also be used to perform the same processing. (Recommended)

The settings for generating a summary table of data counts from TSV strings in [Update CSV Data] are as follows.

  • C1: Data items to be entered from TSV file
  • C1a: “TSV” / “Header included”
  • C1b: Table name to be called in C3 (e.g., Sales Data)
  • C2: (Leave blank)
  • C3: Specify the following query after replacing “Product Name” and “Sales Data” with the actual column names
    SELECT
    "Product Name", -- Column name of the aggregate key in the source TSV file
    COUNT(*) AS "Count",
    ROUND(100.0 * COUNT(*) / (SELECT COUNT(*) FROM "Sales Data"), 2) AS "Percentage"
    -- Sales Data is the table name representing the source TSV specified in C1b
    FROM
    "Sales Data""
    GROUP BY
    "Product Name"
  • C4: “Save the entire table to a single data item in CSV / TSV format”
  • C4a: Data item to store the result TSV
  • C4b: “TSV”
  • C4c: “Only when necessary”

The TSV data to be entered should be prepared with the column names for each column entered in the first row.

TSV String, Create Summary Table of Count
Aggregates the count of data by aggregation-keys. Count and Percentage are tabulated as a summary table TSV. E.g. Survey Results TSV is automatically generated from Survey Records TSV. Frequency distribution.
Configs
  • A1: Set TSV *#{EL}
  • B2: Set Column ID of Aggregation-Key Field (eg “4” ) *#{EL}
  • C1: To Sort by Count, Set DESC or ASC (eg “ASC” )#{EL}
  • C2: To Sort by Agg-Key, Set DESC or ASC (eg “ASC” )#{EL}
  • D1: Select STRING DATA that stores Summary TSV (update) *
  • D2: Select NUMERIC DATA that stores Count Total (update)
Script (click to open)
// GraalJS Script (engine type: 3)

//////// START "main()" /////////////////////////////////////////////////////////////////
main();
function main(){ 

//// == Config Retrieving / 工程コンフィグの参照 ==
const strTsv         = configs.get( "StrConfA1" );           /// REQUIRED /////////////
  if( strTsv       === "" ){
    throw new Error( "\n AutomatedTask ConfigError:" +
                     " Config {A1: Tsv} is empty \n" );
  }
  const arr2dTsv     = parseAsRectangular( strTsv );
const strYaggField   = configs.get( "StrConfB2" );           /// REQUIRED /////////////
  if( strYaggField === "" ){
    throw new Error( "\n AutomatedTask ConfigError:" +
                     " Config {B2: Y-AggField} is empty \n" );
  }
  const numYaggField = parseInt( strYaggField, 10 );
  if( isNaN( numYaggField ) || numYaggField < 0 ){
    throw new Error( "\n AutomatedTask ConfigError:" +
                     " Config {B2: YaggField} must be a positive integer \n" );
  }
  if( numYaggField  >= arr2dTsv[0].length ){
    throw new Error( "\n AutomatedTask ConfigError:" +
                     " Config {B2: YaggField} is larger than TsvWidth \n" );
  }
const strNumSort     = configs.get( "StrConfC1" );           // NotRequired ///////////
const strYaggSort    = configs.get( "StrConfC2" );           // NotRequired ///////////

const strPocketOutput     = configs.getObject( "SelectConfD1" ); // NotRequired ///////
const numPocketGrandTotal = configs.getObject( "SelectConfD2" ); // NotRequired ///////


//// == Data Retrieving / ワークフローデータの参照 ==
// (Nothing. Retrieved via Expression Language in Config Retrieving)


//// == Calculating / 演算 ==

// omit repeated Keys in Y-Aggregation Field values (make Keys uniq)
let arrYaggKeys = [];
for( let i = 0; i < arr2dTsv.length; i++ ){
  if( arrYaggKeys.indexOf( arr2dTsv[i][ numYaggField ] ) === -1){
      arrYaggKeys.push(    arr2dTsv[i][ numYaggField ] );
  }
  // Array.indexOf(): strict equality `===` 
  // https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Array/indexOf
}

// sort by Keys
if( strYaggSort === "ASC" ){ // ASC: ascending alphabetical from A to Z
  arrYaggKeys.sort( function( strA, strB ){
    if( strA > strB ){ return  1; }
    if( strA < strB ){ return -1; }
    return 0;
  });
}
if( strYaggSort === "DESC" ){ // DESC: descending from Z to A
  arrYaggKeys.sort( function( strA, strB ){
    if( strA < strB ){ return  1; }
    if( strA > strB ){ return -1; }
    return 0;
  });
}

// initialize arr2dOutput[y][x]
let arr2dOutput = [];
for( let i = 0; i < arrYaggKeys.length; i++ ){
  arr2dOutput[i] = [];
  arr2dOutput[i][0] = arrYaggKeys[i];
  arr2dOutput[i][1] = 0; // count
  arr2dOutput[i][2] = 0; // share
} // arr2dOutput.length === arrYaggKeys.length

// sum up
for( let i = 0; i < arr2dTsv.length; i++ ){
  let strKey = arr2dTsv[i][numYaggField];
  arr2dOutput[ arrYaggKeys.indexOf( strKey ) ][1] += 1;
}

for( let i = 0; i < arr2dOutput.length; i++ ){
  arr2dOutput[i][2] = (arr2dOutput[i][1] / arr2dTsv.length).toFixed(3);
}

// sort by Total
if( strNumSort === "ASC" ){ // ASC: ascending alphabetical from 0 to 10
  arr2dOutput.sort( function( arrA, arrB ){
    if( arrA[1] > arrB[1] ){ return  1; }
    if( arrA[1] < arrB[1] ){ return -1; }
    return 0;
  });
}
if( strNumSort === "DESC" ){ // DESC: descending from 10 to 0
  arr2dOutput.sort( function( arrA, arrB ){
    if( arrA[1] < arrB[1] ){ return  1; }
    if( arrA[1] > arrB[1] ){ return -1; }
    return 0;
  });
}

// output Summary Tabulation
let strOutput = "";
for( let i = 0; i < arr2dOutput.length; i++ ){
  strOutput += arr2dOutput[i].join("\t") + "\n";
}
strOutput = strOutput.slice( 0, -1 ); // delete last "\n"


//// == Data Updating / ワークフローデータへの代入 ==
engine.setData( strPocketOutput, strOutput );
if( numPocketGrandTotal !== null ){
  engine.setData( numPocketGrandTotal, new java.math.BigDecimal( arr2dTsv.length ) );
}


} //////// END "main()" /////////////////////////////////////////////////////////////////


// Parses TSV string as two-dimensional rectangular data matrix and creates a 2D array.
function parseAsRectangular( strTsv ){
  const arrTsv = strTsv.split("\n");

  /// Get numMinWidth and numMaxWidth (blank lines are excluded)
  let numMinWidth   = Infinity; // cf. String-Type Max: 1 million
  let numMaxWidth   = 0;
  let numBlanklines = 0;
  for( let i = 0; i < arrTsv.length; i++ ){
    if( arrTsv[i] === "" ){ // Skip blank lines
      numBlanklines += 1;
      continue;
    }
    let arrCells = arrTsv[i].split("\t");
    if( numMinWidth > arrCells.length ){ numMinWidth = arrCells.length; }
    if( numMaxWidth < arrCells.length ){ numMaxWidth = arrCells.length; }
  }
  engine.log( " AutomatedTask TsvDataCheck:" + 
              " MinWidth:" + numMinWidth +
              " MaxWidth:" + numMaxWidth +
              " Lines:" + arrTsv.length +
              " (BlankLines:" + numBlanklines + ")" );

  /// Get numMinWidth and numMaxWidth (blank lines are excluded)
  let arr2dTsv      = [];
  for( let i = 0; i < arrTsv.length; i++ ){
    if( arrTsv[i] === "" ){ // Skip blank lines
      continue;
    }
    let arrTmp = [];
    let arrCells = arrTsv[i].split("\t");
    for( let j = 0; j < numMaxWidth; j++ ){
      if( j < arrCells.length ){
        arrTmp[j] = arrCells[j];
      }else{
        arrTmp[j] = "";
      }
    }
    arr2dTsv.push( arrTmp );
  }

  return arr2dTsv;
}

/*
TSV Example:
2021-08	Orange	200
2021-08	Apple	1,200
2021-09	Apple	1,100
2021-09	Orange	200
2021-09	Tomato	500
2021-09	Orange	200
2021-09	Tomato	500
2021-10	Tomato	500
2021-10	Tomato	500
2021-10	Tomato	450
2021-10	Tomato	450

Notes:
- When the process reaches this automated task, TSV is automatically read.
    - TSV: Sales Records, Survey Records, etc.
- If there is a blank line in the input TSV text, it will be skipped.
    - The line feed code for the last line is not added either.
- The output TSV text (total table TSV, etc.) is rectangular data.
    - The number of rows in TSV is the number of types of "aggregate Key".
    - The number of columns of TSV is 3 of "key", "count" and "percentage"

APPENDIX:
- TSV (Tab Separated Values) text assumes rectangular data.
    - Data that is not rectangular is automatically formatted with empty characters.
    - Blank lines (including the end) are ignored.
- Only "DESC" or "ASC" is valid for the sort config.
    - If no sort setting, it will be in the order of appearance.
    - Character sorting is in code order.
    - If both character sort and numeric sort are configured, numeric sort is performed after character sort.
- To create a histogram, place "TSV String, Switch Rows and Columns" downstream.
    - https://support.questetra.com/addons/tsv-string-switch-rows-and-columns-2021/


Notes-ja:
- 案件が自動処理工程に到達した際、文字列型データに保存されているTSVが自動的に読み込まれます。
    - TSV: 売上レコード、アンケート回答ログ、など
- 入力TSVテキストに空行がある場合、スキップされます。
    - 出力TSVの最終行に改行コードは付与されません。
- 出力されるTSVテキスト(合計値テーブルTSV等)は、矩形データです。
    - TSVの行数は "集約Key" の種類数になります。
    - TSVの列数は、"集約Key列" "個数列" "割合値列" の3です。

APPENDIX-ja:
- TSV(Tab Separated Values)テキストは、矩形データを前提とします。
    - 矩形でないデータは、空文字によって自動整形されます。
    - 空行(末尾改行を含む)は無視されます。
- ソート設定は "DESC" もしくは "ASC" のみが有効です。
    - ソート設定がない場合、出現順になります。
    - 文字ソートは文字コード順になります。
    - 文字ソートと数値ソートがどちらも設定された場合、文字ソートの後に数値ソートが実行されます。
- ヒストグラム(histogram)を作成するには、下流に『TSV 文字列, 列と行を入替』を配置するなどします。
    - https://support.questetra.com/ja/addons/tsv-string-switch-rows-and-columns-2021/
*/

Download

warning Freely modifiable JavaScript (ECMAScript) code. No warranty of any kind.
(Installing Addon Auto-Steps are available only on the Professional edition.)

Notes

  • When the process reaches this automated task, the TSV is read automatically.
    • TSV: Sales Records, Survey Records, etc.
  • If there is a blank line in the input TSV text, it will be skipped.
    • The line feed code for the last line is not added either.
      - The output TSV text (total table TSV, etc.) is rectangular data.
    • The number of rows in TSV is the number of types of “aggregate Key”.
    • The number of TSV columns is 3: key, count and percentage

Capture

Aggregates the count of data by aggregation-keys. Count and Percentage are tabulated as a summary table TSV. Eg, Survey Results TSV is automatically generated from Survey Records TSV. Frequency distribution.
Aggregates the count of data by aggregation-keys. Count and Percentage are tabulated as a summary table TSV. Eg, Survey Results TSV is automatically generated from Survey Records TSV. Frequency distribution.

Appendix

  • TSV (Tab Separated Values) text assumes rectangular data.
    • Data that is not rectangular is automatically formatted with empty characters.
    • Blank lines (including the end) are ignored.
  • Only “DESC” or “ASC” is valid for the sort config.
    • If no sort setting is specified, it will be sorted in the order of appearance.
    • Character sorting is in code order.
    • If both character sort and numeric sort are configured, numeric sort is performed after character sort.
  • To create a histogram, place “TSV String, Switch Rows and Columns” downstream.

See also

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Scroll to Top

Discover more from Questetra Support

Subscribe now to keep reading and get access to the full archive.

Continue reading