SchemaSmith Documentation

DataTongs - MySQL (Enterprise)

Extract data and generate deploy scripts built for repeatable Enterprise CI/CD pipelines.

DataTongs reference data extraction tool

Overview

Now that you have your source and structure under source control and you've got your state based deployments in place it's time to manage your seed data in the same way. DataTongs will help you extract all or parts of your data and manage deployment via generated scripts using the data in a JSON format. It connects to a MySQL database and extracts data for the tables specified in the config file and will generate REPLACE INTO, INSERT ... ON DUPLICATE KEY UPDATE, or INSERT IGNORE statements depending on the configured merge type. You can define a filter clause in case you allow mixing user and system data in a single table and the extraction will honor that filter.

Simply point the application at a database with the correct seed data to capture, fill out the list of tables and the key columns for each, set the merge options you wish to generate and we'll handle generating the scripts for you. You can generate new scripts any time you need to update the seed data or manually update the script(s) if you find that simpler. Your application seed data will become part of your source control and state based updates.

Where DataTongs Fits

  • SchemaTongs extracts your schema into metadata.
  • SchemaHammer provides powerful search and edit capabilities for your metadata.
  • SchemaQuench applies your schema and data metadata to your MySQL server.
  • DataTongs extracts your seed data into metadata

Quick Start Guide

1. Configure your appSettings.json

Point DataTongs at your source database and list the tables to extract. This example uses Schema Package Integration mode to write content files and configure table JSON automatically:

{
  "Source": {
    "Server": "localhost",
    "Database": "sakila",
    "User": "root",
    "Password": ""
  },
  "ScriptPath": "",
  "ContentPath": "/opt/schemapackage/templates/main/TableData",
  "Tables": [
    {
      "Name": "language",
      "KeyColumns": "",
      "SelectColumns": "",
      "Filter": "",
      "MergeType": ""
    },
    {
      "Name": "category",
      "KeyColumns": "",
      "SelectColumns": "",
      "Filter": "",
      "MergeType": ""
    },
    {
      "Name": "country",
      "KeyColumns": "",
      "SelectColumns": "country_id,country",
      "Filter": "country IN ('United States','Canada','United Kingdom')",
      "MergeType": ""
    }
  ],
  "ShouldCast": {
    "OutputContentFiles": true,
    "OutputScripts": false,
    "ConfigureDataDelivery": true,
    "MergeType": "Upsert",
    "DisableTriggers": false
  }
}
2. Run DataTongs

Execute DataTongs from the command line and tongs will begin the casting process.

> DataTongs
3. Add them to your repository

Once you determine your deployment scheme you can add them to your repository and configure either a script folder or the data delivery section of each table to deploy your seed data.

See the DataTongs Walkthrough to help you get started with the tool.

Operating Modes

DataTongs operates in two modes depending on how you want to deploy the extracted data. The modes are mutually exclusive — ConfigureDataDelivery requires OutputContentFiles to be enabled and OutputScripts to be disabled.

Schema Package Integration

When ConfigureDataDelivery is enabled, DataTongs writes .tabledata content files and automatically updates the corresponding table JSON files in your schema package with data delivery properties. SchemaQuench reads these properties at deployment time and generates the appropriate statements on the fly.

Configuration:

  • ShouldCast:OutputContentFiles = true
  • ShouldCast:OutputScripts = false
  • ShouldCast:ConfigureDataDelivery = true
  • ContentPath must reside within a template directory (one that contains Template.json)

Properties written to table JSON files:

Property Source
ContentFile Relative path from template root to the .tabledata file.
MergeType Per-table MergeType override, or the global ShouldCast:MergeType default.
MatchColumns From the table's KeyColumns setting, if provided.
MergeFilter From the table's Filter setting, if provided.
MergeDisableTriggers From the global ShouldCast:DisableTriggers setting. Reserved for future use.

DataTongs only writes the table JSON file when a property value has actually changed, keeping your source control diffs clean.

Standalone Script Generation

When OutputScripts is enabled, DataTongs generates self-contained .sql files. These scripts can be executed directly or placed in a script folder for SchemaQuench to run.

Configuration:

  • ShouldCast:OutputScripts = true
  • ShouldCast:ConfigureDataDelivery = false (or omitted)

Each script is named Populate tablename.sql and written to ScriptPath. The JSON data is embedded inline in the script. The script type depends on your MergeType setting:

MergeType MySQL Statement Behavior
Replace REPLACE INTO Deletes and re-inserts rows matching primary/unique key.
Upsert INSERT ... ON DUPLICATE KEY UPDATE Updates non-key columns on duplicate; inserts new rows.
Insert INSERT IGNORE Inserts new rows only; silently skips duplicates.

You can also combine OutputScripts with OutputContentFiles to generate both scripts and content files — useful when you want scripts for manual review and content files for automated delivery.

Test Data

You can also use DataTongs to extract test data. Generate standalone scripts and execute them outside of SchemaQuench to populate test databases as part of your test pipeline. This simplifies integration test setup and lets you easily reset to a consistent starting state between test runs.

Extraction Details

Column Auto-Detection

When SelectColumns is left blank for a table, DataTongs queries INFORMATION_SCHEMA.COLUMNS to build the column list automatically. Generated columns (both VIRTUAL and STORED) are excluded. Spatial columns use ST_AsText() for extraction. Binary/blob columns use TO_BASE64() encoding. Date/time columns are formatted with DATE_FORMAT() for consistent ISO output.

When KeyColumns is left blank, DataTongs auto-detects match columns using the table's primary key from INFORMATION_SCHEMA.TABLE_CONSTRAINTS. If no primary key exists, the first column is used with a warning.

Excluded Columns

The following column types are automatically excluded:

Column Type Reason
Generated columns VIRTUAL and STORED generated columns are excluded from extraction. Values are calculated by the database engine.
AUTO_INCREMENT columns Excluded from merge script generation. Auto-increment values are managed by MySQL and should not be explicitly inserted during data delivery.

Special Type Handling

Certain MySQL data types require special handling during extraction and script generation:

MySQL Type Extraction JSON_TABLE Mapping
GEOMETRY, POINT, LINESTRING, POLYGON, MULTI* Converted via ST_AsText() to Well-Known Text. Reconstructed via ST_GeomFromText(). VARCHAR(4000)
BINARY, VARBINARY, BLOB types Encoded via TO_BASE64() with newline stripping. Decoded via FROM_BASE64(). TEXT
DATE DATE_FORMAT(col, '%Y-%m-%d') DATE
DATETIME, TIMESTAMP DATE_FORMAT(col, '%Y-%m-%dT%H:%i:%s') DATETIME
TIME TIME_FORMAT(col, '%H:%i:%s') TIME
BIT CAST(col AS UNSIGNED) UNSIGNED
JSON Extracted as-is. MySQL's JSON_OBJECT handles nesting. JSON

Extraction SQL

DataTongs extracts table data using JSON_ARRAYAGG and JSON_OBJECT:

SELECT JSON_ARRAYAGG(
    JSON_OBJECT(
        'col1', `col1`,
        'col2', `col2`,
        ...
    )
) AS json_data
FROM `database`.`tableName`
WHERE {filter}
ORDER BY {keyColumns};

Results are ordered by key columns for deterministic output and cleaner source control diffs. Column expressions use the appropriate type handling functions from the special type table above (e.g., ST_AsText() for geometry, DATE_FORMAT() for dates).

The JSON output is lightly formatted with newlines between row objects for readability, then written to the .tabledata file or embedded in the script. Generated scripts parse the JSON data using JSON_TABLE() to feed the REPLACE INTO, INSERT ... ON DUPLICATE KEY UPDATE, or INSERT IGNORE statement.

ShouldCast Flags Reference

Flag Default Description
OutputContentFiles true Write .tabledata JSON files to ContentPath. Each file is named tablename.tabledata. Required for ConfigureDataDelivery.
OutputScripts true Write standalone .sql scripts to ScriptPath. Each file is named Populate tablename.sql. Incompatible with ConfigureDataDelivery.
MergeType Upsert Default merge type for generated scripts and table JSON when ConfigureDataDelivery is enabled. Valid values: Replace, Upsert, Insert. Can be overridden per table.
DisableTriggers false Sets MergeDisableTriggers on table JSON when ConfigureDataDelivery is enabled. Reserved for future use — does not currently disable triggers or foreign key checks.
ConfigureDataDelivery false Automatically update table JSON files with delivery properties after extraction. Requires OutputContentFiles = true and OutputScripts = false. See Operating Modes.

Additional Resources

See Also