Text Files, Convert Character Encoding
Text Files, Convert Character Encoding
Converts text files charset (Coded Character Set). For example, converts UTF-8 encoding to Shift_JIS or UTF-16. If multiple files are attached, all will be converted according to the same rules.
Conversion may fail if the file’s Content-Type is recorded in a format with parameters. When a parameter is appended after a semicolon. When a generated file is replaced in the automated step “Generate Text File”, etc.

warning PAGE UPDATED

https://support.questetra.com/addons/text-files-convert-character-encoding-2023/
Configs
  • A1: Select FILE DATA for Original Text Files *
  • A2: Set Original Charset (eg “UTF-8” )#{EL}
  • B1: Set New Charset (eg “UTF-16” )#{EL}
  • B2: Select FILE DATA that stores New Text Files (append) *
Script (click to open)
// GraalJS Script (engine type: 2)

//////// START "main()" /////////////////////////////////////////////////////////////////
main();
function main(){ 

//// == Config Retrieving / 工程コンフィグの参照 ==
const filesPocketInput    = configs.getObject( "SelectConfA1" );  /// REQUIRED ///////////////
  let filesInput          = engine.findData( filesPocketInput );  // java.util.ArrayList
  if( filesInput        === null ) {
    throw new Error( "\n AutomatedTask UnexpectedFileError:" +
                     " No File {A1} is attached \n" );
  }else{
    engine.log( " AutomatedTask FilesArray {A1}: " +
                filesInput.size() + " files" );
  }
let   strInputCharset     = configs.get( "StrConfA2" );           // NotRequired /////////////
  if( strInputCharset   === "" ){
      strInputCharset     = "UTF-8";
  }
let   strOutputCharset    = configs.get( "StrConfB1" );           // NotRequired /////////////
  if( strOutputCharset  === "" ){
      strOutputCharset    = "UTF-8";
  }
const filesPocketOutput   = configs.getObject( "SelectConfB2" );  /// REQUIRED ///////////////
  let filesOutput         = engine.findData( filesPocketOutput ); // java.util.ArrayList
  if( filesOutput       === null ) {
    engine.log( " AutomatedTask FilesArray {B2}: (empty)" );
    filesOutput           = new java.util.ArrayList();
  }else{
    engine.log( " AutomatedTask FilesArray {B2}: " +
                filesOutput.size() + " files" );
  }


//// == Data Retrieving / ワークフローデータの参照 ==
// (Nothing. Retrieved via Expression Language in Config Retrieving)


//// == Calculating / 演算 ==
const numFilesInput = filesInput.size() - 0;

for( let i = 0; i < numFilesInput; i++ ){
  const strInputFileName = filesInput.get(i).getName() + "";
  const strInputFileSize = filesInput.get(i).getLength() + " bytes";
  const strInputFileMime = filesInput.get(i).getContentType();

  let strInputText = "";
  let numLineCounter = 0;
  fileRepository.readFile( filesInput.get(i), strInputCharset, function(line) {
  // com.questetra.bpms.core.event.scripttask.FileRepositoryWrapper
  // https://questetra.zendesk.com/hc/ja/articles/360024574471-R2300#FileRepositoryWrapper
      strInputText += line + '\n';
      numLineCounter ++;
  });
  engine.log( " AutomatedTask FileLoaded: " + strInputFileName + " (" + strInputFileMime + ")" );
  engine.log( " AutomatedTask: " + strInputFileSize + " / " + numLineCounter + " lines" );

  filesOutput.add(
    new com.questetra.bpms.core.event.scripttask.NewQfile(
      strInputFileName,
      strInputFileMime + "; charset=" + strOutputCharset,
      strInputText
    )
  );
}


//// == Data Updating / ワークフローデータへの代入 ==
engine.setData( filesPocketOutput, filesOutput );

} //////// END "main()" /////////////////////////////////////////////////////////////////


/*
Notes:
- Used when incorporating "Step in which Text file Encoding is automatically changed" in the workflow.
    - Charset of Text file is automatically changed when the process reaches this automated task.
- The file name of the output file will be the same as the input file.
    - The line feed code is `LF`.
- Converts according to the specified Encodings.
    - No auto-detect feature.
    - If not specified, the default Encoding is `UTF-8`.

APPENDIX:
- `UTF-8`
    - Compactly encodes more than 1 million Unicode characters in the world with 1 to 4 bytes.
    - It became the most common character code in 2008 and is used in 97% of web pages as of 2021.
    - UTF-8 is superset of US-ASCII (single-byte characters). (upward compatible)
        - That is, ASCII files are also UTF-8 files. (US-ASCII is a subset of UTF-8)
        - Similarly, ASCII files are also Shift_JIS files.
- `UTF-16`
    - Encodes over 1 million Unicode characters in the world with 2-4 bytes.
    - If there are many Asian characters such as Japanese and Chinese, encode them compactly.
- Another encoding
    - `charset=UTF-16` (Unicode [characters around the world])
    - `charset=UTF-16BE` (Unicode [characters around the world])
    - `charset=UTF-16LE` (Unicode [characters around the world])
    - `charset=UTF-32` (Unicode [characters around the world])
    - `charset=x-UTF-32LE-BOM` (Unicode [characters around the world])
    - `charset=ISO-8859-1` (Western language characters)
    - `charset=Shift_JIS` (Japanese characters)
    - `charset=Big5` (Traditional Chinese characters)
    - `charset=GB2312` (Simplified Chinese EUC characters)
    - `charset=GBK` (Simplified Chinese GB characters)
    - `charset=KOI8-R` (Russian)
    - In addition, "UTF8B (UTF-8 with BOM)" cannot be output. (File for Windows / pray for its eradication)
    - https://docs.oracle.com/javase/9/intl/supported-encodings.htm


Notes-ja:
- ワークフロー内に「TextファイルEncodingが自動的に変更される工程」を組み込む際に利用します。
    - 案件が自動処理工程に到達した際、TextファイルのCharsetが自動的に変更されます。
- 出力ファイルのファイル名は、入力ファイルと同じファイル名になります。
    - 改行コードは `LF` です。
- 指定された Encoding に従って変換します。
    - 自動判別機能はありません。
    - 未指定の場合、デフォルトの Encoding は `UTF-8` です。

APPENDIX-ja:
- `UTF-8`
    - 世界100万種以上のUnicode文字を、1~4バイトでコンパクトにエンコードします。
    - 2008年に最も一般的な文字コードとなり、2021年時点で97%のウェブページで利用されています。
    - UTF-8 は US-ASCII(1バイト文字)の上位互換です。(US-ASCII は UTF-8 のサブセットです)
        - すなわち ASCII ファイルは UTF-8 ファイルでもあります。
        - 同様に ASCII ファイルは Shift_JIS ファイルでもあります。
- `UTF-16`
    - 世界100万種以上のUnicode文字を、2~4バイトでエンコードします。
    - 日本語や中国語などのアジア文字が多い場合は、コンパクトにエンコードします。
- その他のエンコーディング
    - `charset=UTF-16` (Unicode[世界中の文字])
    - `charset=UTF-16BE` (Unicode[世界中の文字])
    - `charset=UTF-16LE` (Unicode[世界中の文字])
    - `charset=UTF-32` (Unicode[世界中の文字])
    - `charset=x-UTF-32LE-BOM` (Unicode[世界中の文字])
    - `charset=ISO-8859-1` (ヨーロッパ言語の文字)
    - `charset=Shift_JIS` (日本語の文字)
    - `charset=Big5` (繁体中国語の文字)
    - `charset=GB2312` (簡体中国語EUC文字)
    - `charset=GBK` (簡体中国語GBの文字)
    - `charset=KOI8-R` (ロシア語)
    - なお "UTF8B (BOM付 UTF-8)" は出力できません。(Windows用ファイル/その撲滅を祈念)
    - https://docs.oracle.com/javase/9/intl/supported-encodings.htm
    - https://docs.oracle.com/javase/jp/9/intl/supported-encodings.htm
*/

Download

2021-08-20 (C) Questetra, Inc. (MIT License)
https://support.questetra.com/addons/text-files-convert-character-encoding-2021/
The Add-on import feature is available with Professional edition.
Freely modifiable JavaScript (ECMAScript) code. No warranty of any kind.

Notes

  • Used when incorporating a Step in which Text file Encoding is automatically changed in the workflow.
    • Charset of Text file is automatically changed when the process reaches this automated task.
  • The file name of the output file will be the same as the input file.
    • The line feed code is LF.
  • Converts according to the specified Encodings.
    • No auto-detect feature.
    • If not specified, the default Encoding is UTF-8.

Capture

Converts text files charset (Coded Character Set). For example, converts UTF-8 encoding to Shift_JIS or UTF-16. If multiple files are attached, all will be converted according to the same rules.
Converts text files charset (Coded Character Set). For example, converts UTF-8 encoding to Shift_JIS or UTF-16. If multiple files are attached, all will be converted according to the same rules.

Appendix

  • UTF-8
    • Compactly encodes more than 1 million Unicode characters in the world with 1 to 4 bytes.
    • It became the most common character code in 2008 and is used in 97% of web pages as of 2021.
    • UTF-8 is superset of US-ASCII (single-byte characters). (upward compatible)
      • That is, ASCII files are also UTF-8 files. (US-ASCII is a subset of UTF-8)
      • Similarly, ASCII files are also Shift_JIS files.
  • UTF-16
    • Encodes over 1 million Unicode characters in the world with 2-4 bytes.
    • If there are many Asian characters such as Japanese and Chinese, encode them compactly.
  • Another encoding
    • charset=UTF-16 (Unicode [characters around the world])
    • charset=UTF-16BE (Unicode [characters around the world])
    • charset=UTF-16LE (Unicode [characters around the world])
    • charset=UTF-32 (Unicode [characters around the world])
    • charset=x-UTF-32LE-BOM (Unicode [characters around the world])
    • charset=ISO-8859-1 (Western language characters)
    • charset=Shift_JIS (Japanese characters)
    • charset=Big5 (Traditional Chinese characters)
    • charset=GB2312 (Simplified Chinese EUC characters)
    • charset=GBK (Simplified Chinese GB characters)
    • charset=KOI8-R (Russian)
    • In addition, “UTF8B (UTF-8 with BOM)” cannot be output. (File for Windows / pray for its eradication)
    • https://docs.oracle.com/javase/9/intl/supported-encodings.htm

See also

2 thoughts on “Text Files, Convert Character Encoding”

  1. Pingback: TSV String, Convert – Questetra Support

  2. Pingback: Converter, CSV-String to TSV-String – Questetra Support

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: