SchemaSmith Documentation

Getting Started

Recommended best practices and considerations.

Development Cycle

SchemaTongs

The first step is to extract your existing structure into files ready for source control.

See the SchemaTongs Walkthrough to help you get started with the tool.

  • Choose the database that should be the gold copy and extract it to create an initial product.
  • Extract existing databases one by one and use a text comparison tool, such as BeyondCompare, to get a clear picture of the drift between databases. Understanding what kind of changes will occur when this new template is used to true up each one is critical. You may find changes that you want to add to or remove from your template during this comparison process.
  • Assess the impact of the changes. You may need to modify code for views or stored procedures in some way to account for environmental differences. Synonyms can be helpful if you have inconsistent naming conventions across servers.
  • Consider the time needed to add or modify indexes and other potentially long-running operations. Consider data cleanup for missing unique indexes or foreign key constraints to ensure the data is in a safe state to apply them. You may need to roll out this new process in layers or digestible pieces if the time to apply the changes will exceed allowable maintenance windows.
  • Add any necessary migration scripts to validate the data cleanup and to perform a final cleaning at the time of update, handling any bad data introduced between a manual cleanup and the eventual application of the missing constraints or indexes (a minimal cleanup sketch follows this list).
  • The end goal is to repair and eliminate drift, but you do not craft a sword with a single hammer blow. If you attempt to rush a complicated forge, you might snap the metal and ruin the whole piece. Starting small and adding new objects over time is a good practice. The entire database does not have to be fixed in one pass.
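
As an example of the kind of cleanup migration described above, the following minimal T-SQL sketch removes duplicate rows ahead of a missing unique index; the table and column names are hypothetical.

    -- Hypothetical cleanup ahead of a unique index on dbo.Customer (Email):
    -- keep the most recent row per email address and delete the rest.
    WITH Ranked AS
    (
        SELECT ROW_NUMBER() OVER (PARTITION BY Email
                                  ORDER BY CreatedDate DESC) AS RowNum
        FROM dbo.Customer
    )
    DELETE FROM Ranked
    WHERE RowNum > 1;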

Best Practices

We strongly recommend that each object file, whether it is table JSON or a SQL file for other objects, be named with the schema and object name. This will make them easier to locate in your source control when you need to make changes. Our examples will use the form <schema>.<object name>.ext, but you could use spacing or underscores if you prefer, or follow a different convention altogether. It is your schema... your source code... you should tailor your repository to meet your needs.

For scripted (.sql) items, we strongly recommend using CREATE OR ALTER in your scripts so that they can be applied safely every time. This keeps all the objects trued up with every release and helps eliminate the drift that can creep in over time. Dependency management for scripted objects is left to the script itself, for cases such as dropping a computed column that references a function being updated. This allows you to decide the correct way to manage it. If the function rarely changes, then perhaps the script should become "create when not exists" rather than using the CREATE OR ALTER approach. If the object changes often, then you can script the removal of such dependencies. Dropping a computed column or constraint is generally all you need, because scripted objects are applied prior to table updates and the missing items are then restored automatically by SchemaQuench.
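
As a minimal sketch of that pattern (all object names here are hypothetical), a script for a function referenced by a computed column might first drop the dependent column and rely on SchemaQuench to restore it:

    -- Hypothetical: dbo.ApplyTax is referenced by a computed column on
    -- dbo.Orders. Drop the dependent column so the function can be altered;
    -- SchemaQuench restores the column from the table definition afterward.
    IF COL_LENGTH('dbo.Orders', 'TotalWithTax') IS NOT NULL
        ALTER TABLE dbo.Orders DROP COLUMN TotalWithTax;
    GO

    CREATE OR ALTER FUNCTION dbo.ApplyTax (@Amount MONEY)
    RETURNS MONEY
    AS
    BEGIN
        RETURN @Amount * 1.08;  -- hypothetical flat tax rate
    END
    GO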

For triggers, we strongly recommend a naming convention of <schema>.<table name>.<trigger name>.sql to keep them grouped by table. An alternative approach is to create a folder for each table, put the triggers in the correct folder, and simply use <trigger name>.sql. Use whatever makes it easiest to find and update the object later.
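
For example, a file named dbo.Orders.trg_OrdersAudit.sql (hypothetical names) might contain a trigger scripted the same safe, reapplyable way:

    -- Hypothetical audit trigger; CREATE OR ALTER keeps it safe to reapply.
    CREATE OR ALTER TRIGGER dbo.trg_OrdersAudit
    ON dbo.Orders
    AFTER UPDATE
    AS
    BEGIN
        SET NOCOUNT ON;
        INSERT INTO dbo.OrdersAudit (OrderId, AuditDate)
        SELECT OrderId, SYSUTCDATETIME()
        FROM inserted;
    END
    GO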

Important

SchemaTongs will follow our recommended naming conventions and the basic folder layout. You are free to rename and group the files it generates for your repository, but keep in mind that the tool will not understand those changes. Rerunning the tool will recreate the files with the original folder structure and naming, and you will need to reapply your renames and moves to update the repository.

Folders and Subfolders

We have defined the folder structure for each template, divided into the basic object types we support in SchemaTongs, plus the ability to define before and after migration scripts for any special handling, such as:

  • data fixup prior to adding foreign or unique keys
  • data fixup after a new column is applied
  • forcing the creation of dependent objects before the rest of the process runs

In short, anything that might need to happen outside the main flow of the update, at either the beginning or the end of the database update. There is also a folder for data delivery scripts to help you deliver seed data.

All of these folders support the addition of subfolders to allow grouping objects into domains or any other logical organization that simplifies your development efforts. Even the Tables folder can have subfolders to organize your tables into logical groupings when the model gets complex enough that you want to break it down.
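
As an illustration only (apart from the Tables and TableData folders referenced in this guide, the folder and file names below are hypothetical and may differ from what SchemaTongs generates), a template might end up organized like this:

    Tables/
        Sales/                          -- optional domain subfolder
            dbo.Orders.json
            dbo.OrderLine.json
    Functions/
        dbo.ApplyTax.sql
    Triggers/
        dbo.Orders.trg_OrdersAudit.sql
    TableData/
        dbo.StateCode.sql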

Source Control

With the product defined, it is time to get it under source control.

Using SchemaSmithy allows you to break your database schema into easily located and identified pieces that can be placed in source control as text files just like any other source code. You can then observe the changes and evolution of your schema just as you do with your application code base. You can also choose to place your SchemaSmithy project directly in a folder of your main source control project so that database changes become atomic with the code. Some teams use this approach and some prefer to keep database changes separate with a different rollout cadence. That flexibility is completely in your hands.

From this point your schema changes will be straightforward to see and track via source control like your other source code. See Examining The Product Repository Structure for information on how the files will be laid out on disk. Socialize this new process and have people begin making changes here rather than in your previous location.

We strongly recommend starting with gatekeepers (code owners in git) who must approve pull requests to ensure the new repository is being used correctly. In the beginning, you may need both the old and new systems as you migrate to this approach and slowly clean up the drift. The same group should approve all changes going into the old and new repositories to ensure changes are checked into the correct location.
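
In GitHub, for instance, a minimal CODEOWNERS entry (the path and team name are hypothetical) routes every schema change to the gatekeeper group for approval:

    # Hypothetical: the database gatekeepers must review all schema changes.
    /SchemaSmithy/ @your-org/db-gatekeepers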

Some organizations restrict database changes to a small group of experienced and trained database developers and/or DBAs. While it is good to have oversight, it can create a bottleneck when multiple project streams run concurrently. SchemaSmithy allows anyone to check in changes and open pull requests while keeping oversight and approval in the hands of that smaller group. Project startups and designs can still involve those individuals to ensure a cohesive database design. Incremental changes can then move down to developers, with the core group only tasked with making sure proper guidelines are followed and nothing veers outside agreed design parameters. Needing an extra three characters on a VARCHAR column does not have to pull someone into a meeting or create extra tickets that block other coding work while waiting for an additional resource to make the change.

Most column changes are as simple as making the type or property adjustment (length, type, nullability, etc.), and SchemaQuench handles the rest. We drop dependent constraints and indexes and recreate them as needed so the developer and reviewer can focus on the specific change, not the cascade of actions that might be required to perform it. CI should catch any edge cases that we might not handle, and you can always create migration scripts for any special conditions specific to your environment.
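
For illustration (this is the manual cascade SchemaQuench spares you, not its exact internals), widening a column that an index depends on would otherwise look something like this, with hypothetical names:

    -- By hand, widening an indexed column cascades into several steps:
    DROP INDEX IX_Customer_Email ON dbo.Customer;
    ALTER TABLE dbo.Customer ALTER COLUMN Email VARCHAR(320) NOT NULL;
    CREATE INDEX IX_Customer_Email ON dbo.Customer (Email);
    -- With SchemaQuench, you only change the length in the table definition.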

We recommend running CI for every PR. Catch issues early and fix them before they cause downstream impacts. The same principles that apply to your code apply to your structure. Test early, test often, fix problems, and move forward with confidence. We also recommend applying the product to existing databases with representative data. Some issues only arise when data is present, such as altering a data type when the data does not conform to the new type or is too long to fit in a truncated length. Just like any other code, you need human review and the double check of testing to keep your quality high and your customer impacts low.

It is also advisable to use something like Docker or another local setup to test applying your product to completely empty databases. This helps make sure that you include all the objects you need in the repository and that you can start a fresh environment when you need one. Too often, dependencies creep in over time. For example, a "before" migration script gets added that needs an object that would not exist yet in a clean run. Those scripts should be protected against failing during an initial deployment, and it is valuable to test that condition regularly to maintain the ability. Being able to quickly spin up a local, empty set of databases for things like integration testing can be a powerful addition to the developer's toolkit. Running tests locally in the same form they will run in CI can save hours of waiting for feedback that could have been addressed before creating the first PR.
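
For example, a "before" migration script that fixes up data can guard itself so a clean, empty deployment does not fail; the table and cleanup here are hypothetical:

    -- Hypothetical "before" migration: skip the fixup on a fresh database
    -- where the table does not exist yet.
    IF OBJECT_ID('dbo.Customer', 'U') IS NOT NULL
    BEGIN
        UPDATE dbo.Customer
        SET Email = LTRIM(RTRIM(Email))
        WHERE Email <> LTRIM(RTRIM(Email));
    END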

We provide a schema file that you can edit and use to validate your JSON files in places such as CI or GitHub Actions to prevent downstream issues. You can specify which things are optional and which are required, such as requiring every table to have at least one index if that is your policy. You can also remove support for things you do not want used in your product, such as XML Indexes or Full Text Indexes. It is one more layer of automation that can help maintain consistency beyond simple human review.
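
As a sketch of the idea only (the property names in your table JSON and in the shipped schema file will differ), a JSON Schema fragment that requires every table to declare at least one index could look like:

    {
      "properties": {
        "Indexes": { "type": "array", "minItems": 1 }
      },
      "required": ["Indexes"]
    }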

Unleashing the Power of Templates

Templates are like the class definition for your database. They define the tables and other objects that should exist in each "instance" or database for that template. Whether your server has one database that matches the template or 100... SchemaQuench identifies them based on the script you provide and updates each one to match the template definition. One run of SchemaQuench updates all the templates and all the associated databases for each template in the template order specified in the product.json file.
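
For instance, if databases for a template follow a naming convention, the identification script you provide might simply return the matching names; the convention here is hypothetical:

    -- Hypothetical identification script: every online database whose name
    -- starts with 'Tenant_' is an instance of this template.
    SELECT name
    FROM sys.databases
    WHERE name LIKE 'Tenant[_]%'
      AND state_desc = 'ONLINE';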

There are times when complex dependencies exist between templates... A chicken/egg relationship that may have developed over time. The update process requires defining the order that templates will be deployed, but what if a database needs certain objects to exist ahead of time, due to circular dependencies? Your choice is to maintain some form of "startup" script to create the empty database and then add those initial dependencies... OR you could define a pre template that only applies those special dependent objects.

You can define a template for things like master or msdb to apply server level objects and scripts such as jobs or server configurations. The order they appear in the product.json will determine the order they get applied to your server. You might even have two templates that refer to the master database where one applies before and one at the end. It is best if any given object only exists in one template for a given database, but there is no restriction that will prevent you from having different versions of the same object that apply in some sequence. Keep in mind that it should be a LAST resort and include plenty of comments to help people understand that more than one version exists in different places and why.

Using Script Tokens

Script Tokens allow configuring certain values by environment to simplify your deployments. Update a few token values and change things like specific database names. They also allow you to specify things like minimum versions to be used in your version validation scripts and the new version to stamp in the version stamp scripts. These tokens are defined at the product level and can be used in the product and template scripts located in those JSON files.
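
As a rough sketch (the <<MinimumVersion>> placeholder below stands in for a script token and is not necessarily SchemaSmith's actual token syntax, and the version table is hypothetical), a version validation script might look like:

    -- Hypothetical version validation: fail if the stamped version is below
    -- the minimum supplied via a script token.
    IF EXISTS (SELECT 1 FROM dbo.SchemaVersion
               WHERE Version < '<<MinimumVersion>>')
        RAISERROR('Database version is below the required minimum.', 16, 1);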

SchemaQuench

Now it is time to begin applying changes.

See the SchemaQuench Walkthrough to help you get started with the tool.

  • Use the product validation script to ensure you will only apply to the correct server(s) (a minimal sketch follows this list).
  • Use the database identification script for each template to find all databases of the given type.
  • Both the product and template definitions include the ability to stamp and validate a version. Use these scripts to validate that you are not unintentionally applying an older version.
  • Deployment is easily automated:
    • Deliver the command line application and the contents of your product folders
    • Set up the config to point to the product location (SchemaPackagePath)
    • Execute the application to apply the update.
  • There is no requirement to run directly from the server being updated; that is up to your own preferences and requirements for access.
  • This deployment should become part of your CI/CD process.
  • Continue to add missing pieces over time until your "true up" is completed.
  • Eliminate any legacy deployment process to keep everything under a single source of truth in source control for simple and efficient change tracking.
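
The product validation script in the first bullet is one you write yourself; a minimal sketch might simply verify the server name (the server names, and the convention for signaling failure, are assumptions here):

    -- Hypothetical product validation: only proceed on approved servers.
    DECLARE @server SYSNAME = @@SERVERNAME;
    IF @server NOT IN ('PROD-SQL-01', 'STAGE-SQL-01')
        RAISERROR('This product does not apply to server %s.', 16, 1, @server);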

Additional Considerations

  • Capture your seed data using DataTongs and add the scripts in the TableData folder to keep it up to date with every release.
  • Make use of migration scripts before and after the database update to handle any bad data cleanup or edge cases that might come up.
  • We do not drop database objects such as procedures, views, and triggers. Consider removing them in two releases: in the first, replace the object's script with a drop-if-exists, as shown below; in a future release, delete the script from source control.
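
For the first of those two releases, the object's script file can be replaced with a drop; the procedure name is hypothetical:

    -- Release N: dbo.usp_LegacyReport is retired; its script now only drops it.
    DROP PROCEDURE IF EXISTS dbo.usp_LegacyReport;
    -- Release N+1: delete this file from source control.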

If you have a specific need that is not addressed, please reach out to our master craftsman and YouTube spokesman Forge Barrett. He will attempt to provide guidance or put the item up for consideration on our roadmap.

Additional Resources