DataTongs for MySQL | SchemaSmith

Overview

Now that you have your source and structure under source control and you've got your state based deployments in place it's time to manage your seed data in the same way. DataTongs will help you extract all or parts of your data and manage deployment via generated scripts using the data in a JSON format. It connects to a MySQL database and extracts data for the tables specified in the config file and will generate REPLACE INTO, INSERT ... ON DUPLICATE KEY UPDATE, or INSERT IGNORE statements depending on the configured merge type. You can define a filter clause in case you allow mixing user and system data in a single table and the extraction will honor that filter.

Simply point the application at a database with the correct seed data to capture, fill out the list of tables and the key columns for each, set the merge options you wish to generate and we'll handle generating the scripts for you. You can generate new scripts any time you need to update the seed data or manually update the script(s) if you find that simpler. Your application seed data will become part of your source control and state based updates.

Where DataTongs Fits

SchemaTongs extracts your schema into metadata.
SchemaHammer provides powerful search and edit capabilities for your metadata.
SchemaQuench applies your schema and data metadata to your MySQL server.
DataTongs extracts your seed data into metadata

Quick Start Guide

1. Configure your appSettings.json

Point DataTongs at your source database and list the tables to extract. This example uses Schema Package Integration mode to write content files and configure table JSON automatically:

{
  "Source": {
    "Server": "localhost",
    "Database": "sakila",
    "User": "root",
    "Password": ""
  },
  "ScriptPath": "",
  "ContentPath": "/opt/schemapackage/templates/main/TableData",
  "Tables": [
    {
      "Name": "language",
      "KeyColumns": "",
      "SelectColumns": "",
      "Filter": "",
      "MergeType": ""
    },
    {
      "Name": "category",
      "KeyColumns": "",
      "SelectColumns": "",
      "Filter": "",
      "MergeType": ""
    },
    {
      "Name": "country",
      "KeyColumns": "",
      "SelectColumns": "country_id,country",
      "Filter": "country IN ('United States','Canada','United Kingdom')",
      "MergeType": ""
    }
  ],
  "ShouldCast": {
    "OutputContentFiles": true,
    "OutputScripts": false,
    "ConfigureDataDelivery": true,
    "MergeType": "Upsert",
    "DisableTriggers": false
  }
}

2. Run DataTongs

Execute DataTongs from the command line and tongs will begin the casting process.

> DataTongs

3. Add them to your repository

Once you determine your deployment scheme you can add them to your repository and configure either a script folder or the data delivery section of each table to deploy your seed data.

See the DataTongs Walkthrough to help you get started with the tool.

Operating Modes

DataTongs operates in two modes depending on how you want to deploy the extracted data. The modes are mutually exclusive — ConfigureDataDelivery requires OutputContentFiles to be enabled and OutputScripts to be disabled.

Schema Package Integration

When ConfigureDataDelivery is enabled, DataTongs writes .tabledata content files and automatically updates the corresponding table JSON files in your schema package with data delivery properties. SchemaQuench reads these properties at deployment time and generates the appropriate statements on the fly.

Configuration:

ShouldCast:OutputContentFiles = true
ShouldCast:OutputScripts = false
ShouldCast:ConfigureDataDelivery = true
ContentPath must reside within a template directory (one that contains Template.json)

Properties written to table JSON files:

Property	Source
`ContentFile`	Relative path from template root to the `.tabledata` file.
`MergeType`	Per-table `MergeType` override, or the global `ShouldCast:MergeType` default.
`MatchColumns`	From the table's `KeyColumns` setting, if provided.
`MergeFilter`	From the table's `Filter` setting, if provided.
`MergeDisableTriggers`	From the global `ShouldCast:DisableTriggers` setting. Reserved for future use.

DataTongs only writes the table JSON file when a property value has actually changed, keeping your source control diffs clean.

Standalone Script Generation

When OutputScripts is enabled, DataTongs generates self-contained .sql files. These scripts can be executed directly or placed in a script folder for SchemaQuench to run.

Configuration:

ShouldCast:OutputScripts = true
ShouldCast:ConfigureDataDelivery = false (or omitted)

Each script is named Populate tablename.sql and written to ScriptPath. The JSON data is embedded inline in the script. The script type depends on your MergeType setting:

MergeType	MySQL Statement	Behavior
`Replace`	`REPLACE INTO`	Deletes and re-inserts rows matching primary/unique key.
`Upsert`	`INSERT ... ON DUPLICATE KEY UPDATE`	Updates non-key columns on duplicate; inserts new rows.
`Insert`	`INSERT IGNORE`	Inserts new rows only; silently skips duplicates.

You can also combine OutputScripts with OutputContentFiles to generate both scripts and content files — useful when you want scripts for manual review and content files for automated delivery.

Test Data

You can also use DataTongs to extract test data. Generate standalone scripts and execute them outside of SchemaQuench to populate test databases as part of your test pipeline. This simplifies integration test setup and lets you easily reset to a consistent starting state between test runs.

Extraction Details

Column Auto-Detection

When SelectColumns is left blank for a table, DataTongs queries INFORMATION_SCHEMA.COLUMNS to build the column list automatically. Generated columns (both VIRTUAL and STORED) are excluded. Spatial columns use ST_AsText() for extraction. Binary/blob columns use TO_BASE64() encoding. Date/time columns are formatted with DATE_FORMAT() for consistent ISO output.

When KeyColumns is left blank, DataTongs auto-detects match columns using the table's primary key from INFORMATION_SCHEMA.TABLE_CONSTRAINTS. If no primary key exists, the first column is used with a warning.

Excluded Columns

The following column types are automatically excluded:

Column Type	Reason
Generated columns	`VIRTUAL` and `STORED` generated columns are excluded from extraction. Values are calculated by the database engine.
`AUTO_INCREMENT` columns	Excluded from merge script generation. Auto-increment values are managed by MySQL and should not be explicitly inserted during data delivery.

Special Type Handling

Certain MySQL data types require special handling during extraction and script generation:

MySQL Type	Extraction	JSON_TABLE Mapping
`GEOMETRY`, `POINT`, `LINESTRING`, `POLYGON`, `MULTI*`	Converted via `ST_AsText()` to Well-Known Text. Reconstructed via `ST_GeomFromText()`.	`VARCHAR(4000)`
`BINARY`, `VARBINARY`, `BLOB` types	Encoded via `TO_BASE64()` with newline stripping. Decoded via `FROM_BASE64()`.	`TEXT`
`DATE`	`DATE_FORMAT(col, '%Y-%m-%d')`	`DATE`
`DATETIME`, `TIMESTAMP`	`DATE_FORMAT(col, '%Y-%m-%dT%H:%i:%s')`	`DATETIME`
`TIME`	`TIME_FORMAT(col, '%H:%i:%s')`	`TIME`
`BIT`	`CAST(col AS UNSIGNED)`	`UNSIGNED`
`JSON`	Extracted as-is. MySQL's `JSON_OBJECT` handles nesting.	`JSON`

Extraction SQL

DataTongs extracts table data using JSON_ARRAYAGG and JSON_OBJECT:

SELECT JSON_ARRAYAGG(
    JSON_OBJECT(
        'col1', `col1`,
        'col2', `col2`,
        ...
    )
) AS json_data
FROM `database`.`tableName`
WHERE {filter}
ORDER BY {keyColumns};

Results are ordered by key columns for deterministic output and cleaner source control diffs. Column expressions use the appropriate type handling functions from the special type table above (e.g., ST_AsText() for geometry, DATE_FORMAT() for dates).

The JSON output is lightly formatted with newlines between row objects for readability, then written to the .tabledata file or embedded in the script. Generated scripts parse the JSON data using JSON_TABLE() to feed the REPLACE INTO, INSERT ... ON DUPLICATE KEY UPDATE, or INSERT IGNORE statement.

ShouldCast Flags Reference

Flag	Default	Description
`OutputContentFiles`	`true`	Write `.tabledata` JSON files to `ContentPath`. Each file is named `tablename.tabledata`. Required for `ConfigureDataDelivery`.
`OutputScripts`	`true`	Write standalone `.sql` scripts to `ScriptPath`. Each file is named `Populate tablename.sql`. Incompatible with `ConfigureDataDelivery`.
`MergeType`	`Upsert`	Default merge type for generated scripts and table JSON when `ConfigureDataDelivery` is enabled. Valid values: `Replace`, `Upsert`, `Insert`. Can be overridden per table.
`DisableTriggers`	`false`	Sets `MergeDisableTriggers` on table JSON when `ConfigureDataDelivery` is enabled. Reserved for future use — does not currently disable triggers or foreign key checks.
`ConfigureDataDelivery`	`false`	Automatically update table JSON files with delivery properties after extraction. Requires `OutputContentFiles = true` and `OutputScripts = false`. See Operating Modes.

Additional Resources

DataTongs Walkthrough - A step-by-step guide for casting data from a sample product. How-to Data Delivery - Making data deployment easier. Reference Configuration Settings - Explore the configuration options available in appSettings.json. Reference SchemaTongs - Learn how to cast your databases. Related SchemaQuench - Learn how to quench your databases. Related Cast data with Ask Forge - Run DataTongs through the AI-powered CLI. AI Assistant

DataTongs - MySQL (Enterprise)