CSV Data Processing: From Manual to Automated

In Questetra BPM Suite, the automatic processing step [Update CSV Data] allows for the aggregation and processing of CSV/TSV format text data within a workflow. Similar to spreadsheet calculations in Excel, this step can extract specific columns or rows based on certain conditions, and even combine two tables.

To carry out this process, we use a specialized language called SQL, which is often perceived as difficult and exclusive to engineers. However, with knowledge of some basic rules, non-engineers can also utilize it. Additionally, recent advancements in generative AI, like ChatGPT, allow SQL to be created by simply describing the desired action.

This article is aimed at those who would like to try using the [Update CSV Data] Task, especially those who have little experience with SQL, and provides easy-to-understand explanations of the basic settings, query examples, and solutions to common errors.


1. What is [Update CSV Data]?

1-1. Overview

[Update CSV Data] is a built-in automated processing step with the following features. It can be used on Advanced and Professional editions of the Workflow platform.

  • Treat CSV/TSV data as a database table
    CSV/TSV data stored in a String-type (multiple lines) data item is converted to a database table internally and SQL is issued.
    • Treated as a table in H2 Database
    • The user prepares SQL for the H2 Database
  • Specify up to three data items for input and output
    • C1, C2: CSV/TSV data items to be processed (if multiple items are specified, they can be combined using JOIN, etc.)
    • C3: Data item to save SQL execution results (can overwrite the same item as C1)
    • C4: Write the SQL query to be executed (maximum 2000 characters)
  • The first line of CSV/TSV data is treated as column names
    For example, if the first row is Product ID, Product Name, Price, the table will have columns "Product ID", "Product Name", "Price"
    • An error will occur if the column name is empty or exceeds 64 characters
    • Duplicate column names will cause an error
    • If the number of columns is insufficient/too many, they will be automatically filled in

1-2. Why this item is useful

  • No need to manually process CSV files
    Eliminate the need to sort or aggregate data using Excel or other programs and process the data automatically
  • Multiple CSV/TSV can be matched with JOIN
    If you specify multiple data items (C1, C2), you can use the JOIN clause to join tables together, and you can also process and aggregate the data
  • Easy for non-engineers to work on
    Although SQL stands for Structured Query Language, grasping the fundamentals of SELECT, WHERE, and JOIN will enable you to utilize it effectively. Additionally, with the assistance of generative AI to provide SQL suggestions, memorizing grammar is no longer necessary.
[Update CSV Data] Settings Screen

2. Basic Setup

2-1. Data item to be processed (C1, C2)

  • Data items to be loaded into C1
    • Specify CSV/TSV data saved as String-type (multiple lines)
    • You can also specify a table name (e.g. "Products")
    • Organize the column names to avoid errors in the column row (first row)
  • When C2 is additionally specified
    • Specify when handling the second CSV/TSV data
    • You cannot use the same data item or table name as C1 (duplicate error)

2-2. Data item to store the query results (C3)

  • If you select the same item as C1, overwrite and save
    The processing results are overwritten to the same item, so this is useful when the original data is not needed
  • By specifying a different item, both the original data and the results can be saved
  • Select output format (CSV or TSV)

2-3. Query (SQL) input field (C4)

  • Specifying table and column names
    Must be enclosed in double quotes ("), e.g. select * from "Product"
  • Maximum 2000 characters
    When writing long queries, consider splitting and simplifying them
  • Beware of forbidden words
    The names of functions and commands related to the file system cannot be used

3. Simple Query Example

Here are some very basic SQL queries. Even if you are not an engineer, I think you can try them out right away.

3-0. Sample Data

Merchandise

Product ID	Product Name	Price
a-01	Apple	200
a-02	Orange	150
a-03	Banana	100
a-04	Strawberry	500
b-01	Tomato	150
b-02	Cabbage	300
b-03	Radish	250
b-04	Pepper	200

Sales

Date	Product ID	Quantity	Sales
2025-02-20	a-01	3	600
2025-02-20	a-02	5	750
2025-02-20	a-03	2	200
2025-02-20	a-04	4	2000
2025-02-20	b-01	2	300
2025-02-20	b-02	1	300
2025-02-20	b-03	3	750
2025-02-20	b-04	6	1200
2025-02-21	a-01	5	1000
2025-02-21	a-02	4	600
2025-02-21	a-03	4	400
2025-02-21	a-04	3	1500
2025-02-21	b-01	2	300
2025-02-21	b-02	3	900
2025-02-21	b-03	4	1000
2025-02-21	b-04	2	400

3-1. Retrieving all data (SELECT statement)

Query(SQL)

select * from "Merchandise"
  • Extract all columns from the “Merchandise” table
  • You don’t need to specify the column name, so try it out and check the results
  • You can also retrieve specific columns by specifying the column name
    • Example: select "Product ID", "Product Name" from "Merchandise"

Query Results

"Product ID"	"Product Name"	"Price"
"a-01"	"Apple"	"200"
"a-02"	"Orange"	"150"
"a-03"	"Banana"	"100"
"a-04"	"Strawberry"	"500"
"b-01"	"Tomato"	"150"
"b-02"	"Cabbage"	"300"
"b-03"	"Radish"	"250"
"b-04"	"Pepper"	"200"

3-2. Retrieving with conditions (WHERE clause)

Query(SQL)

select * from "Merchandise" where "Product ID" like 'a-%'
  • Extract only data whose Product ID starts with “a-“
  • %: any string, _: any single character

Query Results

"Product ID"	"Product Name"	"Price"
"a-01"	"Apple"	"200"
"a-02"	"Orange"	"150"
"a-03"	"Banana"	"100"
"a-04"	"Strawberry"	"500"

3-3. Joining multiple tables (JOIN clause)

If multiple CSV/TSVs are specified (e.g. C1 = product, C2 = sales), matching is possible using a JOIN clause.

Query (SQL)

select p.*, s.* except(s."Product ID")
from "Merchandise" as p
join "Sales" as s
  on p."Product ID" = s."Product ID"
  • Match products and sales by Product ID and obtain both information together
  • s. “Product ID” is excluded because it may be duplicated and unnecessary

Query Results

"Product ID"	"Product Name"	"Price"	"Date"	"Quantity"	"Sales"
"a-01"	"Apple"	"200"	"2025-02-20"	"3"	"600"
"a-02"	"Orange"	"150"	"2025-02-20"	"5"	"750"
"a-03"	"Banana"	"100"	"2025-02-20"	"2"	"200"
"a-04"	"Strawberry"	"500"	"2025-02-20"	"4"	"2000"
"b-01"	"Tomato"	"150"	"2025-02-20"	"2"	"300"
"b-02"	"Cabbage"	"300"	"2025-02-20"	"1"	"300"
"b-03"	"Radish"	"250"	"2025-02-20"	"3"	"750"
"b-04"	"Pepper"	"200"	"2025-02-20"	"6"	"1200"
"a-01"	"Apple"	"200"	"2025-02-21"	"5"	"1000"
"a-02"	"Orange"	"150"	"2025-02-21"	"4"	"600"
"a-03"	"Banana"	"100"	"2025-02-21"	"4"	"400"
"a-04"	"Strawberry"	"500"	"2025-02-21"	"3"	"1500"
"b-01"	"Tomato"	"150"	"2025-02-21"	"2"	"300"
"b-02"	"Cabbage"	"300"	"2025-02-21"	"3"	"900"
"b-03"	"Radish"	"250"	"2025-02-21"	"4"	"1000"
"b-04"	"Pepper"	"200"	"2025-02-21"	"2"	"400"

4. Creating Queries Using Generative AI and Troubleshooting

4-1. Creating queries using ChatGPT etc.

If you instruct the AI to merge product master and sales data by Product ID and filter for products priced at 1,000 yen or higher, it will generate a basic SQL query for you.

  • It can be easily used by anyone, even those without an engineering background, by simply communicating the desired process
  • Make sure to adjust table and column names in the proposed SQL to align with your Workflow App before pasting it

Here is an example of SQL generated by ChatGPT (o1). The prompt has been slightly improved from the above text with some tinkering. In this case, the Product ID is a string, not a number, so type conversion (CAST) was not necessary, but by modifying only this part, it worked properly and the expected results were obtained.

Example prompts for ChatGPT and generated SQL queries

4-2. Typical errors and how to deal with them

When a Step fails, check the details in the [Error Notification Email] or [Process Log] and take appropriate action. To view the [Process Log], you may need to enable Administrator Mode on the Process Details screen and have [Process Manager] in the target app.

  • Column name error
    • An error occurs if blank, more than 64 characters, or duplicates
    • Please check the spelling of the column (first line) as well
  • Query syntax error
    • Forgetting to use double quotes, putting “WHERE” and “FROM” in the wrong order, etc.
    • Please double-check the grammar when using SQL generated by AI
  • Use of prohibited words
    • Keywords that are considered inappropriate from a security standpoint, such as file system function names, will be rejected
    • Prohibited words cannot be used as column names either
  • Query is too long (exceeds the maximum of 2000 characters)
    • Consider streamlining and splitting redundant subqueries and complex joins before executing them
Example of error output in [Process Log]

5. Summary

  • The [Update CSV Data] process is designed to handle tabular data and can be utilized by both technical and non-technical users
    • It is practical enough to understand the basics of SQL (SELECT, WHERE, JOIN, etc.) and copy the examples
  • Leveraging generative AI
    • The system can generate SQL for you, even if you’re not familiar with its detailed notation. Simply explain what you want to do in plain language
  • Next steps
    • You can perform more advanced analysis by using aggregate functions (SUM, COUNT, GROUP BY, etc.) and conditional branching (CASE WHEN …). Expand your SQL knowledge little by little as needed.
  • Pre and post-processing, integration into workflow
    • Expand the scope of use by obtaining and setting CSV/TSV data in the required format, instead of manually inputting it
      • Converts Table-type data items or text files attached to File-type data items (built-in automatic converters)
      • Set data obtained from external services (built-in automation steps, original API requests, etc.)
    • As with pre-processing, using the query result data contributes to business improvement
      • Format the data into a report and send it via email (regular reporting)
      • Generate data based on the collected data and have AI analyze it before sending it to the next process
      • Convert data into the format required by the external service and link the data to the external service

Consider using the [Update CSV Data] Task to automate the processing of CSV/TSV data that you previously handled manually, and to achieve a more efficient workflow with fewer errors.

Scroll to Top

Discover more from Questetra Support

Subscribe now to keep reading and get access to the full archive.

Continue reading