SchemaSmith Documentation

Data Delivery - MySQL (Enterprise)

Package and deliver MySQL data updates as controlled artifacts.

Data Delivery

Overview

MySQL 8.0+ — Data delivery for MySQL uses platform-native syntax: REPLACE INTO for full-row upserts, INSERT ... ON DUPLICATE KEY UPDATE for selective column updates, and INSERT IGNORE for insert-only seeding. These replace SQL Server's MERGE statement with MySQL-native equivalents.

Also available for SQL Server and PostgreSQL

In the Community edition, data delivery is handled through scripts: SchemaQuench supports those scripts via the TableData quench slot and its behavior. The Enterprise edition expands on this capability by letting you export your data to JSON files and configure tables for automated data delivery. This feature uses the JSON files to generate merge scripts dynamically with MySQL-native syntax (REPLACE INTO, INSERT ... ON DUPLICATE KEY UPDATE, and INSERT IGNORE), which are applied immediately before the TableData scripts.

If you continue using TableData scripts, SchemaQuench loops over them to resolve dependency order. This can be expensive, especially with many tables or complex dependency chains. You can optionally name your scripts so that they sort into a controlled execution order, reducing or eliminating the need for looping, though this approach increases the maintenance burden.

Configuration

With data delivery, we traverse the foreign keys to optimize the delivery order. We also use looping to resolve dependencies that cannot be accurately determined up front, such as circular references.

To facilitate this method, you can use DataTongs to capture the necessary JSON files. The DVDRental product in the demo repository is configured to use data delivery for both seed data and the full set of test data.

At the table level, you can simplify setup and reduce script management by configuring the following properties in your table's JSON file:


    {
        "ContentFile": "Table Contents/actor.tabledata",
        "MergeType": "Upsert",
        "MergeDisableTriggers": false,
        "MergeUpdateDescendents": false,
        "MatchColumns": "",
        "MergeFilter": ""
    }
                                

A complete table JSON with these properties configured can be found at the bottom of this file.

Property Required Default Description
ContentFile Yes (none) Relative path from the template root to the .tabledata JSON file created by DataTongs.
MergeType Yes (none) Controls which merge strategy is used: Replace, Upsert, or Insert.
MergeDisableTriggers No false When true, drops triggers on the target table before the merge and recreates them afterward.
MatchColumns No Auto-detected Comma-separated column names used to match source and target rows. Overrides auto-detection from the PK or a unique index.
MergeFilter No (none) SQL predicate applied to the delete phase only, scoping which rows are eligible for deletion.

Important

  • The ContentFile path is relative to the template for which the table is defined.
  • The MergeFilter scopes the DELETE operation when included.
  • The supported MergeType values are Replace, Upsert, and Insert (or None to skip delivery).
  • MySQL merge operations use REPLACE INTO (Replace), INSERT ... ON DUPLICATE KEY UPDATE (Upsert), or INSERT IGNORE (Insert) depending on the configured merge type.

Merge Strategies

MySQL data delivery supports three merge strategies, each mapping to a native MySQL statement. All strategies use JSON_TABLE to parse content file data inline, avoiding temporary tables.

MergeType Values

MergeType SQL Pattern Behavior
Replace REPLACE INTO Deletes any existing row matching the PK or unique index, then inserts the new row. Fires ON DELETE CASCADE on child tables.
Upsert INSERT ... ON DUPLICATE KEY UPDATE Inserts new rows and updates non-key columns on conflict. No rows are deleted, so cascade rules are not triggered.
Insert INSERT IGNORE Adds new rows only. Duplicate key violations are silently skipped. Existing rows are never modified.

JSON_TABLE Pattern

Each merge strategy builds its SQL around the JSON_TABLE function, which extracts rows from the JSON content file directly within a SELECT statement:


    SELECT jt.*
    FROM JSON_TABLE(@json_data, '$[*]' COLUMNS (
        actor_id    INT           PATH '$.actor_id',
        first_name  VARCHAR(45)   PATH '$.first_name',
        last_name   VARCHAR(45)   PATH '$.last_name'
    )) AS jt;
                                

The @json_data session variable holds the content file contents. Column definitions are generated from the table metadata, with types mapped to their MySQL equivalents.
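For illustration, the variable can be populated by hand with a small payload matching the actor example (a hypothetical two-row excerpt; in practice SchemaQuench loads it from the content file):

```sql
SET @json_data = '[
    {"actor_id": 1, "first_name": "PENELOPE", "last_name": "GUINESS"},
    {"actor_id": 2, "first_name": "NICK",     "last_name": "WAHLBERG"}
]';
```

With the variable set, the SELECT above returns one relational row per JSON array element.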

Replace — REPLACE INTO

The Replace strategy generates a REPLACE INTO ... SELECT FROM JSON_TABLE statement. MySQL's REPLACE INTO deletes any existing row that matches the primary key or unique index, then inserts the new row.


    REPLACE INTO actor (actor_id, first_name, last_name)
    SELECT jt.actor_id, jt.first_name, jt.last_name
    FROM JSON_TABLE(@json_data, '$[*]' COLUMNS (
        actor_id    INT           PATH '$.actor_id',
        first_name  VARCHAR(45)   PATH '$.first_name',
        last_name   VARCHAR(45)   PATH '$.last_name'
    )) AS jt;
                                

Warning

Because REPLACE INTO performs a delete-then-insert internally, it triggers ON DELETE CASCADE on child tables. If a parent table uses Replace and has child foreign keys with ON DELETE CASCADE, child rows will be deleted even if the parent row data has not changed. Use Upsert instead when child tables have cascade rules.
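To make the hazard concrete, consider a hypothetical child table referencing actor with a cascading foreign key:

```sql
-- Hypothetical schema: a REPLACE INTO actor that touches actor_id = 1 first
-- deletes the existing parent row, which cascades and deletes its notes,
-- even when the re-inserted actor row is identical.
CREATE TABLE actor_note (
    note_id  INT PRIMARY KEY,
    actor_id INT NOT NULL,
    note     TEXT,
    CONSTRAINT fk_actor_note_actor
        FOREIGN KEY (actor_id) REFERENCES actor (actor_id)
        ON DELETE CASCADE
);
```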

Upsert — INSERT ... ON DUPLICATE KEY UPDATE

The Upsert strategy generates an INSERT INTO ... SELECT FROM JSON_TABLE ... ON DUPLICATE KEY UPDATE statement. Key columns (match columns) are excluded from the UPDATE clause, so only non-key values are modified on conflict. This is the non-destructive approach — no rows are deleted, so cascade rules are not triggered.


    INSERT INTO actor (actor_id, first_name, last_name)
    SELECT jt.actor_id, jt.first_name, jt.last_name
    FROM JSON_TABLE(@json_data, '$[*]' COLUMNS (
        actor_id    INT           PATH '$.actor_id',
        first_name  VARCHAR(45)   PATH '$.first_name',
        last_name   VARCHAR(45)   PATH '$.last_name'
    )) AS jt
    ON DUPLICATE KEY UPDATE
        first_name = VALUES(first_name),
        last_name  = VALUES(last_name);
                                

Insert — INSERT IGNORE

The Insert strategy generates an INSERT IGNORE INTO ... SELECT FROM JSON_TABLE statement. Rows that would cause a duplicate key violation are silently skipped. Existing rows are never modified.


    INSERT IGNORE INTO actor (actor_id, first_name, last_name)
    SELECT jt.actor_id, jt.first_name, jt.last_name
    FROM JSON_TABLE(@json_data, '$[*]' COLUMNS (
        actor_id    INT           PATH '$.actor_id',
        first_name  VARCHAR(45)   PATH '$.first_name',
        last_name   VARCHAR(45)   PATH '$.last_name'
    )) AS jt;
                                

Additional Merge Options

MergeFilter
The MergeFilter property restricts which rows are affected during a Replace operation. When set, only rows matching the filter expression are candidates for deletion before re-insertion. This is useful when a table contains both managed seed data and user-created rows that should not be removed.

    {
        "MergeType": "Replace",
        "MergeFilter": "category = 'system'"
    }
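The exact SQL the generator emits is internal, but conceptually the filter scopes a cleanup delete. A sketch of that shape, assuming a hypothetical app_setting table keyed by setting_id:

```sql
-- Illustrative only: remove managed ('system') rows that are no longer in the
-- content file, leaving user-created rows untouched.
DELETE FROM app_setting
WHERE category = 'system'              -- the MergeFilter predicate
  AND setting_id NOT IN (
      SELECT jt.setting_id
      FROM JSON_TABLE(@json_data, '$[*]' COLUMNS (
          setting_id INT PATH '$.setting_id'
      )) AS jt
  );
```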
                                        
MergeDisableTriggers
When set to true, triggers on the target table are dropped before delivery and recreated afterward. The default value is false.
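MySQL has no statement to disable a trigger in place, which is why the wrapper drops and recreates. A sketch of the pattern (the trigger name and body are hypothetical):

```sql
-- Before delivery: capture the trigger definition, then drop it.
DROP TRIGGER IF EXISTS actor_before_update;

-- ... merge statements run here ...

-- After delivery: recreate the trigger from its saved definition.
CREATE TRIGGER actor_before_update
BEFORE UPDATE ON actor
FOR EACH ROW
SET NEW.last_update = NOW();
```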
CASCADE Pre-Flight Validation
Before delivery begins, SchemaSmith queries INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS and INFORMATION_SCHEMA.KEY_COLUMN_USAGE for every table configured with the Replace merge type. If any child table has a foreign key with ON DELETE CASCADE pointing at a Replace table, delivery fails with an error before any SQL is executed. Resolution options: change the parent table's MergeType to Upsert, or remove the ON DELETE CASCADE rule from the child foreign key.
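A standalone query in the same spirit as the pre-flight check (the tool's actual query is internal; this sketch flags cascading foreign keys aimed at a hypothetical Replace table, actor):

```sql
SELECT DISTINCT
       rc.CONSTRAINT_NAME,
       kcu.TABLE_NAME           AS child_table,
       rc.REFERENCED_TABLE_NAME AS parent_table
FROM INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS rc
JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE kcu
  ON  kcu.CONSTRAINT_SCHEMA = rc.CONSTRAINT_SCHEMA
  AND kcu.CONSTRAINT_NAME   = rc.CONSTRAINT_NAME
WHERE rc.CONSTRAINT_SCHEMA     = DATABASE()
  AND rc.DELETE_RULE           = 'CASCADE'
  AND rc.REFERENCED_TABLE_NAME = 'actor';  -- table configured with MergeType Replace
```

Any row returned represents a child table whose data would be cascade-deleted by a Replace delivery.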

Delivery Algorithm

Data delivery uses a two-pass, foreign-key-aware algorithm to deliver data in the correct dependency order. This avoids FK constraint violations without requiring you to manually order your tables.

Foreign Key Classification

Before delivery begins, each foreign key relationship is classified based on the nullability of its columns. Only foreign keys pointing to other tables in the delivery set are considered.

FK Type Condition Effect
Required Any FK column is NOT NULL The referenced table must be delivered first. This creates a hard dependency edge.
Deferred All FK columns are nullable The FK columns can be temporarily set to NULL, allowing the table to be inserted in Pass 1 without the referenced data.
Ignored Self-referencing FK, or referenced table is not in the delivery set Not factored into delivery ordering.
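The nullability test can be expressed directly against INFORMATION_SCHEMA. A sketch (not the tool's internal query) that labels each foreign key in the current schema:

```sql
SELECT kcu.TABLE_NAME,
       kcu.CONSTRAINT_NAME,
       CASE WHEN MAX(c.IS_NULLABLE = 'NO') = 1
            THEN 'Required'   -- at least one NOT NULL column: hard dependency edge
            ELSE 'Deferred'   -- all columns nullable: can be NULLed in Pass 1
       END AS fk_class
FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE kcu
JOIN INFORMATION_SCHEMA.COLUMNS c
  ON  c.TABLE_SCHEMA = kcu.TABLE_SCHEMA
  AND c.TABLE_NAME   = kcu.TABLE_NAME
  AND c.COLUMN_NAME  = kcu.COLUMN_NAME
WHERE kcu.TABLE_SCHEMA = DATABASE()
  AND kcu.REFERENCED_TABLE_NAME IS NOT NULL
  AND kcu.REFERENCED_TABLE_NAME <> kcu.TABLE_NAME  -- self-referencing FKs are Ignored
GROUP BY kcu.TABLE_NAME, kcu.CONSTRAINT_NAME;
```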

Pass 1: Topological Delivery

Tables are delivered in topological order based on their required dependency edges. The algorithm iterates until all deliverable tables are processed:

  • No deferred columns: Tables whose required dependencies have all been delivered receive the full merge (using the configured MergeType).
  • Has deferred columns: Tables with nullable FK columns pointing to not-yet-delivered tables receive an INSERT IGNORE merge. The deferred FK columns are set to NULL in the source query. These tables are queued for Pass 2.
  • Failed deliveries: If a table fails during delivery (e.g., due to a constraint violation from an unresolved dependency), it is retried in the next iteration.

Circular Dependencies

If tables remain after topological iteration completes (all have unmet required dependencies), they form a hard circular dependency. These tables are delivered as-is with full merge scripts. Any FK constraint violations will surface as errors and can be resolved with migration scripts.

Pass 2: Deferred Column Update

After all tables have been inserted, tables that were deferred in Pass 1 receive a second full merge using their configured MergeType. This fills in the actual FK values now that all referenced rows exist in the database.

WhatIf Mode

When WhatIf mode is enabled, the delivery algorithm performs the same FK classification and topological ordering, but no SQL is executed against the database. Instead, it logs what each table delivery would do, including which tables would be deferred, which would be delivered with retry, and which would receive a Pass 2 update.

Checkpoint and Resume

SchemaQuench tracks "Data Delivery" as a single checkpoint step per template. If delivery fails partway through (e.g., on the fifth of ten tables), the entire data delivery step re-executes on the next run. The merge-based algorithm is designed to be safely re-runnable, so tables that were already delivered are re-merged without data loss or duplication.

Content File Format

Content files use the .tabledata extension and contain a JSON array of row objects. Each object represents one row, with property names matching the column names of the source table.

[
{"actor_id": 1, "first_name": "PENELOPE", "last_name": "GUINESS", "last_update": "2006-02-15 04:34:33"},
{"actor_id": 2, "first_name": "NICK", "last_name": "WAHLBERG", "last_update": "2006-02-15 04:34:33"}
]

How Content Files Are Created

Content files are created by DataTongs using MySQL's JSON_ARRAYAGG(JSON_OBJECT(...)) function to extract table data as JSON. DataTongs can also automatically configure the data delivery properties in your table JSON files when the ConfigureDataDelivery setting is enabled.
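An extraction query in that style for the actor example (column list assumed from the earlier examples):

```sql
SELECT JSON_ARRAYAGG(
           JSON_OBJECT(
               'actor_id',    actor_id,
               'first_name',  first_name,
               'last_name',   last_name,
               'last_update', last_update
           )
       ) AS tabledata
FROM actor;
```

The result is a single JSON array of row objects, ready to be written to a .tabledata file.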

Column metadata is not stored in the content file. At delivery time, SchemaQuench queries the target database's INFORMATION_SCHEMA to resolve column types, nullability, and auto-increment status. This means the content file is decoupled from the target schema -- columns can be added or removed from the table definition without regenerating the content file, as long as the match columns remain present.
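A metadata lookup of that kind can be sketched as follows (illustrative; the tool's actual query is internal):

```sql
SELECT COLUMN_NAME,
       COLUMN_TYPE,
       IS_NULLABLE,
       EXTRA              -- flags auto_increment and generated columns
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME   = 'actor'
ORDER BY ORDINAL_POSITION;
```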

Excluded Columns

The following column types are automatically excluded from content files and merge scripts:

Column Type Reason
AUTO_INCREMENT columns MySQL assigns new identity values on insert.
Generated columns (virtual and stored) Computed by MySQL from expressions and cannot be inserted or updated directly.

Special Type Handling

Certain MySQL data types require special encoding in content files to preserve fidelity during extraction and delivery:

Data Type Encoding Extract Function Delivery Function
BINARY, VARBINARY, BLOB types Base64 TO_BASE64() FROM_BASE64()
GEOMETRY, POINT, LINESTRING, etc. WKT strings ST_AsText() ST_GeomFromText()
DATE YYYY-MM-DD Native JSON serialization Native JSON parsing
DATETIME, TIMESTAMP ISO 8601 Native JSON serialization Native JSON parsing
TIME HH:MM:SS Native JSON serialization Native JSON parsing
JSON Passed through as-is Native JSON extraction Native JSON parsing
ENUM, SET String values Native JSON serialization Native JSON parsing
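For instance, a table with binary and spatial columns (a hypothetical store_site table) would be extracted with the encoding functions from the table above:

```sql
-- Extraction side: encode binary and spatial values for JSON transport.
SELECT JSON_ARRAYAGG(
           JSON_OBJECT(
               'site_id',  site_id,
               'logo',     TO_BASE64(logo),      -- VARBINARY/BLOB -> Base64
               'location', ST_AsText(location)   -- GEOMETRY -> WKT
           )
       )
FROM store_site;
-- At delivery time the values are decoded with FROM_BASE64() and ST_GeomFromText().
```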

Match Column Detection

When MatchColumns is not specified, SchemaQuench auto-detects the best match columns from the target table using this priority:

  1. Primary Key -- preferred if one exists.
  2. Unique index -- selected by fewest columns, with preference for all non-nullable columns.

You can override auto-detection by specifying column names in the MatchColumns property as a comma-separated list.
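Candidate match columns can be inspected with a query like this sketch over INFORMATION_SCHEMA.STATISTICS (unique indexes only; the index named 'PRIMARY' is the primary key):

```sql
SELECT INDEX_NAME,
       GROUP_CONCAT(COLUMN_NAME ORDER BY SEQ_IN_INDEX) AS columns,
       COUNT(*)                                        AS column_count
FROM INFORMATION_SCHEMA.STATISTICS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME   = 'actor'
  AND NON_UNIQUE   = 0                 -- unique indexes and the primary key
GROUP BY INDEX_NAME
ORDER BY (INDEX_NAME = 'PRIMARY') DESC, column_count;
```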

Platform Notes

MySQL-specific considerations that affect data delivery behavior and configuration.

MySQL 8.0+ Required

Data delivery relies on JSON_TABLE, which was introduced in MySQL 8.0. Earlier versions of MySQL do not support this function and cannot use data delivery.

max_allowed_packet Advisory

Content file data is loaded into a MySQL session variable and passed to JSON_TABLE. Large content files may exceed the server's max_allowed_packet setting, causing delivery to fail with a "packet too large" error.

Binary and geometry columns are especially affected because Base64 and WKT encoding increases the JSON payload size significantly compared to the raw data. If delivery fails on tables with these column types, increase max_allowed_packet in the MySQL server configuration.
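To check and raise the limit (the value below is illustrative; SET GLOBAL requires the SUPER or SYSTEM_VARIABLES_ADMIN privilege, and existing sessions keep their old value):

```sql
SHOW VARIABLES LIKE 'max_allowed_packet';

-- Raise to 256 MB for new connections; to make it permanent, set
-- max_allowed_packet = 256M in the server configuration file.
SET GLOBAL max_allowed_packet = 268435456;
```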

AUTO_INCREMENT Columns

AUTO_INCREMENT columns are excluded from content files. MySQL assigns new identity values automatically on insert. If a table's primary key is an AUTO_INCREMENT column, match columns will fall back to a unique index instead.

JSON_TABLE Type Mapping

When generating the JSON_TABLE COLUMNS clause, MySQL column types are mapped as follows:

MySQL Column Type JSON_TABLE Column Type
INT, TINYINT, SMALLINT, MEDIUMINT, BIGINT INT
DECIMAL, NUMERIC DECIMAL(65,30)
FLOAT FLOAT
DOUBLE DOUBLE
BIT(n) BIT(n)
DATE DATE
DATETIME(n), TIMESTAMP(n) DATETIME(n)
CHAR, VARCHAR, TEXT, TINYTEXT, MEDIUMTEXT, LONGTEXT TEXT
BINARY, VARBINARY, BLOB, TINYBLOB, MEDIUMBLOB, LONGBLOB TEXT (Base64 encoded)
GEOMETRY, POINT, LINESTRING, POLYGON, etc. TEXT (WKT encoded)
ENUM, SET Full column type definition
JSON JSON

Additional Resources