Статьи

Автоматизируйте синхронизацию данных с помощью SQL Data Compare и TeamCity за 15 минут

Какое-то время я откладывал задачу по настройке процесса синхронизации для определенной части корпоративных данных. Эти данные заполняются в единую таблицу в производственной среде по ночам, но их также нужно время от времени синхронизировать с тестовой средой и средой разработки. Не вдаваясь в подробности о природе данных, они состоят из около 700 000 записей, которые изменяются либо посредством обновлений, либо путем вставок. Обычно я не люблю переносить производственные данные в другую среду ( есть просто лучшие способы ), но природа этих данных требует синхронизации сред, чтобы разработчики могли выполнять свою работу.

Я некоторое время сидел на этом требовании, поскольку я не наслаждался тем, что могло бы стать трудоемкой и трудоемкой задачей. Это была таблица с более чем 40 столбцами, включая множество внешних ключей (один из них самореференциальный) и вычисляемое поле, а также чтобы не создавать проблем со ссылочной целостностью, мне нужно, чтобы это происходило быстро, чтобы не хаос с производственной средой. На мой взгляд, это будет означать ручное написание какой-либо формы ETL либо непосредственно в TSQL или через SSIS, либо даже идти по пути репликации SQL. Возможно, это была даже задача SQL MERGE, но эти специфические среды все еще зависали в SQL 05, так что этот маршрут отсутствовал.

One quick caveat: I’m not a DBA, I’m a developer who works with databases. There may have been other angles to come at this from, but the solution I arrived at is fast, simple and easy to monitor. The fact that it ended up being a 15 minute job on my weekend and I didn’t mind giving up a little of my valuable Sunday on this particular task was a very nice result!

SQL Data Compare and the command line

Red Gate’s SQL Data Compare is rather adept at syncing databases across environments via a friendly UI. Enter the source and target servers and databases, run the compare, look at the differences, run the sync. Easy as that.

Thing is though, you can also instrument all this via the command line. It looks something like this:

SQLDataCompare.exe /s1:SourceServer /db1:SourceDb /s2:TargetServer
/db2:TargetDb /Include:table:\[MyTable\] /Include:Identical /v /sync

This is mostly pretty self-explanatory, it’s the four switches at the end that are the interesting ones: Only compare the table titled “MyTable” (the square braces are escaped by the preceding backslash), include identical tables in the comparison result so the process doesn’t return an error when everything matches, give me verbose output and actually sync the differences, don’t just compare them.

Because there’s a nice command line interface we can easily orchestrate this from TeamCity. The environment in question was already running the SQL Compare suite as it’s used for Automated database releases with TeamCity and Red Gate. Reviewing the project I created, it’s just a normal TeamCity project with no VCS settings (we’re not getting anything from source control), a command line build step, a couple of build parameters and a trigger. Actually, the command is a little trickier because I’ve extracted it into a template and used environment variables for the server names:

Конфигурация сборки, показывающая синтаксис командной строки

The reason I’ve done this is so that I can reuse the build between both production-to-test and test-to-development data syncs. This makes maintenance really easy if I want to change command line parameters later on. Here’s what the build parameters look like for the production-to-test sync:

Переменные среды, используемые в команде

The test-to-development sync is identical (I simply copied the production-to-test build), other than obviously setting the two parameters above according to the respective server names. I’ll trigger off the production-to-test sync well and truly out of business hours then on successful completion, trigger a test-to-development build. This way the window where the production server is being touched in minimised.

One last thing: make sure the account TeamCity is connecting to SSQL Server with has had the principle of least privilege actively applied. Put your security hat on and ask “What does this account absolutely, positively need to be able to do?” and then don’t let it do anything beyond this. For example, it only needs to read from production but it needs to be able to select, insert and update in the test and development environments (I’m assuming no redundant records are inserted directly into these environments and that deletions won’t be necessary).

Once a build runs successfully, the log looks something like this:

[10:39:14]: Skip checking for changes - there are no VCS roots defined
[10:39:14]: Clearing temporary directory:
            C:\TeamCity\buildAgent\temp\buildTmp
[10:39:14]: Checkout directory: C:\TeamCity\buildAgent\work\db35110eeed7e0e5
[10:39:14]: Repository sources transferred
[10:39:14]: Updating sources: server side checkout...
[10:39:14]: Publishing internal artifacts (1s)
[10:39:16]: [Publishing internal artifacts] Sending build.start.properties
            file
[10:39:14]: Starting: "C:\Program Files\Red Gate\SQL Data Compare 9
            \SQLDataCompare.exe" /s1:MyProductionServer /db1:MyTable
            /s2:MyTestServer /db2:MyTable /Include:table:\[MyTable\]
            /Include:Identical /v /sync
[10:39:14]: in directory: C:\TeamCity\buildAgent\work\db35110eeed7e0e5
[10:39:15]: Red Gate SQL Data Compare Command Line Utility V9.1.0.365
[10:39:15]: ===============================================================
[10:39:15]: Copyright c Red Gate Software Ltd 1999-2011
[10:39:15]: Serial Number: XXX-XXX-XXXXXX-XXXX
[10:39:15]: Comparing database MyProductionServer.MyTable with database 
            MyTestServer.MyTable...
[10:39:15]: Registering databases
[10:39:17]: Mapping
[10:39:17]: Comparing databases
[10:40:53]: Generating SQL scripts
[10:40:54]: Synchronizing databases
[10:41:15]: Finished
[10:41:15]: Summary Information
[10:41:15]: ===============================================================
[10:41:15]: DB1 = MyProductionServer.MyTable
[10:41:15]: DB2 = MyTestServer.MyTable
[10:41:15]: Object type     Name                     Records      DB1 DB2
[10:41:15]: ---------------------------------------------------------------
[10:41:15]: Table           [dbo].[MyTable]           683119      ==  ==  
[10:41:15]: Table           [dbo].[MyTable]             7379      <>  <>  
[10:41:15]: Table           [dbo].[MyTable]             3217      >>      
[10:41:15]: ---------------------------------------------------------------
[10:41:15]: OK
[10:41:15]: Process exited with code 0
[10:41:15]: Publishing internal artifacts
[10:41:15]: [Publishing internal artifacts] Sending build.finish.properties 
            file
[10:41:15]: Build finished

Let’s break this down a little: there are a total of 693,715 rows in the source, the vast majority of which are identical to the target other than the 7,379 which have some differences and the 3,217 which don’t exist in the target at all. All of this runs in a grand total of 2 minutes and 1 second but the actual sync script, that is the writing to the target environment, takes only 21 seconds. Nice.

Gotchas

There were two small sightly derailing gotchas in all this: Firstly, Red Gate had a little oversight with the earlier releases of version 9 of SQL Data Compare and managed to break the command line runner. There’s a patch available at that link which brings it up to version 9.1.0.365 which works just fine.

The other one was that TeamCity was initially failing the build with a message like this:

[11:09:21]: Comparing databases
[11:09:26]: Unhandled Exception: System.InvalidOperationException: Showing a modal dialog box or form when the application is not running in UserInteractive mode is not a valid operation. Specify the ServiceNotification or DefaultDesktopOnly style to display a notification from a service application.

It turns out there was a dialog box sitting there on the TeamCity machine waiting for some action (from memory, it was one of those “Can we anonymously collect feedback” boxes). It would be nice to be able to instruct SQL Data Compare to run unattended but I couldn’t see anything in the command help that indicated this. Anyway, log on to the server, dismiss the dialog and that’s the end of that.

Summary

This is a pretty straight forward implementation but it’s doing something altogether more complex under the covers. Perhaps it could have been orchestrated from SQL Server using other means but I like the intuitive way SQL Data Compare handles it and I really like being able to see it in my TeamCity build list including the output describing what was affected. And of course I can get all the usual TeamCity notifications if something goes wrong.

But best of all, it’s literally a 15 minute job to setup and it does exactly what I need. That’s a very nice result indeed.

Source:  http://www.troyhunt.com/2011/08/automated-data-syncing-with-sql-data.html