The full DataTongs reference for MySQL — source connection, the Tables array, ShouldCast flags, INSERT ... ON DUPLICATE KEY UPDATE script emission, and the ConfigureDataDelivery pipeline that wires reference data straight into your schema package.
Lookup tables, seed data, the whole library of reference rows every environment needs — they're a schema-management problem as much as the tables themselves.
DataTongs grips reference data from a live database and produces self-contained synchronization scripts that bring it into any target. Point it at a source, list the tables you care about, and it produces one SQL file per table — ready to drop into a schema package's Table Data folder or run directly against any compatible instance.
DataTongs supports SQL Server, PostgreSQL, and MySQL. MySQL doesn't have a MERGE statement, so DataTongs generates the appropriate idiom based on the configured behaviour: INSERT ... ON DUPLICATE KEY UPDATE for insert + update, INSERT IGNORE for seed-only, or INSERT ... ON DUPLICATE KEY UPDATE followed by DELETE WHERE NOT EXISTS for full sync. The script data travels inside the file as values literals — no staging tables, no LOAD DATA dependencies, no external files at run time. The configuration shape and the workflow are identical across all three platforms.
GitHub Releases — Download the latest release from the SchemaSmith releases page. Unzip and place the executable on your PATH. DataTongs ships in the same distribution as SchemaTongs and SchemaQuench.
From the directory containing your DataTongs.settings.json:
DataTongs
With a custom config file:
DataTongs --ConfigFile:/etc/schemasmith/extract-reference-data.json
With a full MySQL connection string override:
DataTongs --ConnectionString:"Server=myserver;Port=3306;Database=AppMain;Uid=appuser;Pwd=secret;"
DataTongs connects to the source database, reads every table listed in the Tables array, serializes each row to JSON in deterministic key order, and writes a sync script plus a sibling .tabledata file for each table. Progress is logged to DataTongs - Progress.log and any errors to DataTongs - Errors.log in the working directory.
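Deterministic key order is what keeps repeated extractions diff-clean. As a minimal sketch of the idea (not DataTongs' actual implementation), sorting keys during JSON serialization makes the output independent of the order columns came back from the driver:

```python
import json

def serialize_row(row: dict) -> str:
    # sort_keys guarantees identical JSON text no matter what order
    # the columns were read in, so re-extraction produces no diff noise
    return json.dumps(row, sort_keys=True, ensure_ascii=False)

a = serialize_row({"CountryCode": "US", "Name": "United States"})
b = serialize_row({"Name": "United States", "CountryCode": "US"})
print(a == b)  # True — same text for both insertion orders
```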
DataTongs reads configuration from multiple sources, merged in precedence order (highest priority last):
- DataTongs.settings.json in the working directory (or the file specified by --ConfigFile)
- Environment variables prefixed with SmithySettings_
- CLI switches

For the full explanation of configuration loading, environment variable mapping, and CLI switches, see Configuration.
{
"Source": {
"Server": "",
"Port": "3306",
"User": "",
"Password": "",
"Database": "",
"Platform": "MySQL",
"ConnectionProperties": {}
},
"ContentPath": "",
"ScriptPath": "",
"Tables": [
{ "Name": "Country", "KeyColumns": "CountryCode" },
{ "Name": "Currency", "KeyColumns": "CurrencyCode", "Filter": "IsActive = 1" }
],
"ShouldCast": {
"OutputScripts": true,
"OutputContentFiles": false,
"DisableTriggers": false,
"MergeUpdate": true,
"MergeDelete": false,
"MergeType": "Insert/Update",
"ConfigureDataDelivery": false
}
}
| Key | Type | Default | Description |
|---|---|---|---|
Source:Platform | string | (required) | Set to MySQL to select the MySQL script generator. |
Source:Server | string | (required) | Database server hostname or IP. |
Source:Port | string | 3306 | TCP port. |
Source:User | string | (empty) | Login username. |
Source:Password | string | (empty) | Login password. |
Source:Database | string | (required) | Source database (schema) to extract data from. |
Source:ConnectionProperties | object | {} | Arbitrary key-value pairs appended to the connection string. Platform-specific keys — see Configuration. |
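For example, extra driver options can be passed through ConnectionProperties. The key names below are illustrative MySQL connection-string options, not values DataTongs requires or validates:

```json
{
  "Source": {
    "Platform": "MySQL",
    "Server": "db.example.com",
    "Database": "AppMain",
    "ConnectionProperties": {
      "SslMode": "Required",
      "ConnectionTimeout": "30"
    }
  }
}
```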
The --ConnectionString switch bypasses all Source settings and passes the provided value directly to the MySQL driver. Source:Platform is still required so the right script generator claims the connection.
| Key | Type | Default | Description |
|---|---|---|---|
ScriptPath | string | "." | Directory where the generated Populate <schema>.<table>.sql merge scripts are written. Created automatically if it doesn't exist. |
ContentPath | string | "." | Directory where the sibling .tabledata files (raw JSON row data) are written. When --ConfigureDataDelivery is on, this path is also used to locate the enclosing template — DataTongs walks upward from ContentPath looking for a Template.json. |
The typical placement inside a schema package is ScriptPath pointing at Templates/<Name>/Table Data and ContentPath pointing at Templates/<Name>/data. That layout makes the script files visible to SchemaQuench's TableData quench slot and the .tabledata files reachable from each table's DataDelivery block.
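That placement, expressed in configuration — with "Main" standing in for the template name, for illustration:

```json
{
  "ScriptPath": "Templates/Main/Table Data",
  "ContentPath": "Templates/Main/data"
}
```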
| Field | Type | Default | Description |
|---|---|---|---|
Name | string | (required) | Table name. If no schema prefix is given, the connection database is assumed. MySQL's default schema is the Database on the connection; cross-schema extraction uses schema.table. |
KeyColumns | string | (auto-detected) | Comma-separated column names for the row-matching key. When blank, auto-detected from the table's primary key or best unique index. Prefix a column with * for NULL-safe comparison on nullable keys. |
SelectColumns | string | (all columns) | Comma-separated column names to extract. When blank, every non-excluded column is extracted. Use to narrow extraction to a specific subset. |
Filter | string | (empty) | SQL WHERE clause (without the WHERE keyword) to filter which rows are extracted. |
MergeType | string | (derived) | Per-table override for the DataDelivery:MergeType default. Values: None, Insert, Insert/Update, Insert/Update/Delete. When blank, derived from global ShouldCast:MergeUpdate + ShouldCast:MergeDelete. |
See Table configuration below for key-column auto-detection, nullable-key handling, and filter semantics. The Tables array itself can't be configured via environment variables — table definitions must live in the JSON configuration file.
Specify tables by name. If you omit the schema, the connection's Database is assumed:
{ "Name": "Country" }
{ "Name": "Products" }
{ "Name": "config.feature_flags" }
DataTongs validates that each table exists in the source database before attempting extraction. Tables that don't exist are skipped with an error message.
Key columns define the row-matching predicate — the unique-key match that drives MySQL's ON DUPLICATE KEY UPDATE behaviour.
Auto-detection. When KeyColumns is blank, DataTongs queries the table's indexes and selects the primary key if one exists. If there's no primary key, it falls back to the first available unique index. Nullable columns in the detected key are automatically prefixed with * for NULL-safe comparison. This handles the vast majority of tables without any manual configuration.
Manual override. When you specify KeyColumns, DataTongs uses your list instead of auto-detection. Separate multiple columns with commas:
{ "Name": "OrderLine", "KeyColumns": "OrderID, LineNumber" }
Nullable key columns. If a key column allows NULLs, prefix it with *. This generates NULL-safe matching for the change-detection predicate:
{ "Name": "Mapping", "KeyColumns": "SourceCode, *TargetCode" }
When auto-detection discovers a nullable column in a unique index, it automatically applies the * behaviour. Note: on MySQL, INSERT ... ON DUPLICATE KEY UPDATE itself matches on the table's declared unique keys, so a column prefixed with * must still participate in a unique index for the merge to work as intended.
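MySQL expresses NULL-safe equality with the <=> operator, which treats two NULLs as equal. A sketch of the kind of predicate the * prefix implies — illustrative shape, not DataTongs' literal output:

```sql
-- Ordinary equality: NULL = NULL evaluates to unknown, so the row never matches.
-- NULL-safe equality: NULL <=> NULL evaluates to true, so NULL keys match NULL keys.
SELECT *
FROM `Mapping` t
WHERE t.`SourceCode` = 'A1'
  AND t.`TargetCode` <=> NULL;  -- matches rows whose TargetCode IS NULL
```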
No key available. If a table has no primary key, no unique index, and no KeyColumns configured, DataTongs skips the table with an error message.
The Filter field accepts a SQL WHERE clause (without the WHERE keyword). It controls which rows are extracted from the source database.
{ "Name": "FeatureFlags", "KeyColumns": "FlagName", "Filter": "IsActive = 1" }
This extracts only active flags. On MySQL, INSERT ... ON DUPLICATE KEY UPDATE alone cannot delete rows, so with MergeDelete off the Filter acts purely as an extraction constraint; when the delete step is enabled, the filter also scopes which target rows are eligible for removal (see ShouldCast flags — MergeDelete).
Control how DataTongs generates merge scripts and whether it wires the results into a schema package.
| Flag | Default | Effect |
|---|---|---|
OutputScripts | true | Generate merge scripts. When false, only .tabledata files are written. |
OutputContentFiles | false | Write raw data to sibling .tabledata content files. Automatically enabled when ConfigureDataDelivery is on. |
DisableTriggers | false | On MySQL, DisableTriggers is a best-effort signal in the generated DataDelivery block. Inline trigger disabling within a DML statement is not supported by MySQL the way it is on SQL Server; consult the SchemaQuench docs for how this is honoured at delivery time. |
MergeUpdate | true | Includes the update clause. When true, rows are emitted as INSERT ... ON DUPLICATE KEY UPDATE. When false, seed-only behaviour emits INSERT IGNORE and existing rows are left untouched regardless of whether their data differs. |
MergeDelete | false | Appends a DELETE WHERE NOT EXISTS step that removes target rows not present in the source data. When both MergeUpdate and MergeDelete are on, full-sync emits INSERT ... ON DUPLICATE KEY UPDATE followed by the delete step. When Filter is configured, the delete step respects the filter so rows outside the filter are never removed.
MergeType | Insert/Update | Default DataDelivery:MergeType for tables that don't set it explicitly. Values: None, Insert, Insert/Update, Insert/Update/Delete. Used when writing DataDelivery blocks via --ConfigureDataDelivery. |
ConfigureDataDelivery | false | After extraction, writes a DataDelivery block into each matching table's JSON file inside the enclosing schema package. See --ConfigureDataDelivery. |
These settings compose into three practical patterns:
| Scenario | MergeUpdate | MergeDelete | MySQL output |
|---|---|---|---|
| Full sync | true | true | INSERT ... ON DUPLICATE KEY UPDATE followed by DELETE WHERE NOT EXISTS. Target matches source exactly. |
| Add and update, no deletes | true | false | INSERT ... ON DUPLICATE KEY UPDATE. Insert missing, update changed, leave extra rows alone. |
| Seed only | false | false | INSERT IGNORE. Insert missing rows only. Existing rows untouched, nothing deleted. |
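For example, the full-sync scenario expressed in the configuration file (only the relevant flags shown):

```json
{
  "ShouldCast": {
    "MergeUpdate": true,
    "MergeDelete": true
  }
}
```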
MySQL doesn't have a MERGE statement. DataTongs generates the appropriate idiom based on MergeUpdate and MergeDelete: INSERT ... ON DUPLICATE KEY UPDATE for insert + update, INSERT IGNORE for seed-only, and INSERT ... ON DUPLICATE KEY UPDATE followed by DELETE WHERE NOT EXISTS for full sync. The script is self-contained — no staging tables, no external files, no bulk-load dependencies. Run it against any compatible database and the data arrives.
INSERT INTO `northwind`.`shippers` (`company_name`, `phone`, `shipper_id`) VALUES
('Speedy Express', '(503) 555-9831', 1),
('United Package', '(503) 555-3199', 2),
('Federal Shipping', '(503) 555-9931', 3)
ON DUPLICATE KEY UPDATE
`company_name` = VALUES(`company_name`),
`phone` = VALUES(`phone`);
The ON DUPLICATE KEY UPDATE clause runs whenever an incoming row would collide with an existing unique-key value, carrying the new column values from the VALUES list via the VALUES() function. Primary-key and unique columns appear in the INSERT list but not in the UPDATE clause, so their values are set on insert and never modified on collision.
INSERT IGNORE INTO `northwind`.`shippers` (`company_name`, `phone`, `shipper_id`) VALUES
('Speedy Express', '(503) 555-9831', 1),
('United Package', '(503) 555-3199', 2),
('Federal Shipping', '(503) 555-9931', 3);
When MergeUpdate is false, DataTongs emits INSERT IGNORE. Rows that would collide with an existing unique key are silently skipped; only missing rows are inserted.
When both MergeUpdate and MergeDelete are on, DataTongs appends a delete step after the insert/update to remove any target rows not present in the source data:
DELETE FROM `northwind`.`shippers`
WHERE `shipper_id` NOT IN (1, 2, 3);
When a Filter is configured, the delete step is scoped to rows matching the filter, so rows outside the filter are never removed.
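For the FeatureFlags example above (Filter: "IsActive = 1"), the scoped delete takes roughly this shape — the schema name and key values here are illustrative, not verbatim DataTongs output:

```sql
DELETE FROM `appmain`.`FeatureFlags`
WHERE (IsActive = 1)                                  -- the configured Filter scopes the delete
  AND `FlagName` NOT IN ('NewCheckout', 'DarkMode');  -- keys present in the extracted data
```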
Auto-increment columns are handled at the INSERT level — explicit values from the source are carried into the VALUES list so existing auto-increment sequences are preserved. Generated (virtual or stored) columns are auto-excluded from extraction because they cannot be assigned directly.
DataTongs detects column types and applies the correct extraction and restoration strategy for each. On MySQL:
- Auto-increment columns are handled at the INSERT level, so explicit values are carried and existing sequences are preserved.
- Generated (virtual or stored) columns are excluded from extraction because they cannot be assigned directly and never appear in the VALUES list.

When a column is excluded from extraction, DataTongs logs a warning naming the table, the column, and the underlying type. If you need data from these columns, handle them with custom scripts outside of DataTongs.
DataTongs can do more than write scripts — it can wire the extracted data straight into your schema package's declarative delivery pipeline. Pass --ConfigureDataDelivery and DataTongs writes a DataDelivery block into each matching table's JSON file after extracting the data.
DataTongs --ConfigureDataDelivery
DataTongs --ConfigureDataDelivery --TemplatePath:./Templates/Main
The switch can also be enabled via configuration:
{
"ShouldCast": { "ConfigureDataDelivery": true }
}
For every table that produces output, DataTongs opens Tables/<table>.json inside the template and writes (or overwrites) a DataDelivery block with:
| Property | Value |
|---|---|
ContentFile | Path to the written .tabledata file, relative to the template root. |
MergeType | Either the per-table MergeType from config, or a default derived from ShouldCast:MergeUpdate + ShouldCast:MergeDelete (see below). |
MatchColumns | The key columns used during extraction — either the per-table KeyColumns, or the primary key auto-detected from the source database (same logic that drives the generated script). |
MergeFilter | The per-table Filter, if one was specified. |
MergeDisableTriggers | Mirrors ShouldCast:DisableTriggers. |
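Put together, a written block might look like this for the Currency table from the sample configuration — an illustrative shape; see Schema Packages — DataDelivery for the authoritative property reference:

```json
{
  "DataDelivery": {
    "ContentFile": "data/Currency.tabledata",
    "MergeType": "Insert/Update",
    "MatchColumns": "CurrencyCode",
    "MergeFilter": "IsActive = 1",
    "MergeDisableTriggers": false
  }
}
```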
Tables whose JSON file doesn't exist in the template are skipped with a warning. No file is created from scratch — this tool configures existing tables, it doesn't scaffold new ones (use SchemaTongs for that).
When a table's config doesn't set MergeType explicitly, the default is derived from the global ShouldCast flags:
MergeUpdate | MergeDelete | Default MergeType |
|---|---|---|
true | true | Insert/Update/Delete |
true | false | Insert/Update |
false | any | Insert |
A per-table MergeType on the Tables[] entry always wins over the derived default.
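For example, a Tables[] entry that pins one table to seed-only delivery regardless of the global flags:

```json
{ "Name": "FeatureFlags", "KeyColumns": "FlagName", "MergeType": "Insert" }
```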
--ConfigureDataDelivery needs to know where your template lives. It uses this precedence:
1. The --TemplatePath:<path> CLI switch.
2. The TemplatePath config key.
3. Walking upward from ContentPath looking for the nearest directory containing a Template.json.

If nothing is found, DataTongs logs a warning and skips the configuration step — extraction still completes normally. See Schema Packages — DataDelivery for the full property reference and the schema validation that catches misconfigurations at authoring time.
DataTongs extracts data using a deterministic, JSON-shaped query — on MySQL, built with JSON_OBJECT aggregation. Two principles follow:
On SQL Server and PostgreSQL, generated MERGE scripts carry an inline change-detection predicate. MySQL's INSERT ... ON DUPLICATE KEY UPDATE handles the collision branch at the engine level — the database itself decides whether each incoming row's key collides and fires the update when it does. DataTongs emits the full column list in the UPDATE clause so every non-key column is refreshed on collision.
Running the generated script twice against unchanged data is safe: every row collides, every update sets the same values, and the net effect is zero row changes. Diffs are clean. Reviews are honest.
Last reviewed May 2026 by the SchemaSmith Team.