Tuesday, August 18, 2015

Continuous Integration with Stata

Continuous Integration is a workflow that frequently merges code changes and automatically runs tests. This can be especially helpful when working in teams. Several free services exist, such as Travis-CI, which provides Linux build infrastructure and integrates with GitHub. Basically, you tell Travis how to set up a Linux environment for your project and which tests to run. Travis then automates this, checking your repository at every push.

First off, you'll obviously need a Stata license that allows for this kind of work. The single-user license allows for 3 "seats", or you may have a network license with extra capacity. In Travis, you should go to the Settings for your repo and set "Limit concurrent builds" to 1 (or whatever is permitted by your license).

Travis starts each build from a clean Linux machine image, so your setup will have to allow Travis to install Stata automatically. Travis allows you to store private passwords on the test infrastructure, so you could either host the Stata files on a password-protected machine or encrypt the files with a password (e.g. with gpg --symmetric). Either way, Travis will download the files on every build, so you may want to make them as small as possible. I've found the best way is to take an existing install and then strip out anything unnecessary, including documentation (all *.pdf, *.sthlp, *.ihlp, *.key, and *.dta, which are almost all example data), graphical tools (*.mnu, *.png, *.dlg, *.idlg), and alternate executables. Compress this folder with xz. Then upload the files to a web server that allows for command-line download (if you are using a general hosting site, you could check out plowshare), making sure to take care of security.
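Here is a rough sketch of the stripping and packing step as shell functions. The function names and the archive layout are my own assumptions, not part of any official tooling, and the sketch leaves alternate executables alone because which ones you can drop depends on your license. Run it on a copy of your install, never on the original.

```shell
# Delete documentation, example data and GUI support files from a COPY of
# a Stata install. The patterns are the ones listed in the paragraph above.
strip_stata_dir() {
    dir="$1"
    find "$dir" -type f \( \
        -name '*.pdf' -o -name '*.sthlp' -o -name '*.ihlp' -o \
        -name '*.key' -o -name '*.dta' -o \
        -name '*.mnu' -o -name '*.png' -o -name '*.dlg' -o -name '*.idlg' \
    \) -exec rm -f {} +
}

# Pack the stripped folder with xz, then encrypt it symmetrically.
# gpg prompts for the passphrase and writes "<archive>.gpg".
pack_stata_dir() {
    src="$1"       # stripped Stata folder
    archive="$2"   # e.g. stata.tar.xz (hypothetical name)
    tar -cf - -C "$(dirname "$src")" "$(basename "$src")" | xz -9 > "$archive" &&
    gpg --symmetric "$archive"
}

# Example (paths are placeholders):
#   strip_stata_dir /tmp/stata-copy
#   pack_stata_dir  /tmp/stata-copy stata.tar.xz
```

After stripping, it is worth running your test suite locally against the reduced folder once, to confirm nothing your code needs was removed.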

Now you can write your .travis.yml to set up the machine:

  1. Extract the password needed for your Stata files (you can do this by installing the Travis client on your local machine and then using it to encrypt the password as an environment variable).
  2. Download and set up the Stata files.
  3. Add the Stata folder to the PATH (it needs to be able to find the license file).
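The three steps above might look something like the following script, called from the install phase of .travis.yml. This is only a sketch under assumptions: STATA_URL and STATA_PASS are hypothetical variable names, the archive is assumed to be an xz-compressed tar file encrypted with gpg --symmetric, and the function names are mine.

```shell
# Step 1: on your local machine, store the passphrase as an encrypted
# environment variable using the Travis client, for example:
#   travis encrypt STATA_PASS=your-passphrase --add env.global
# STATA_PASS and STATA_URL are hypothetical names used by this sketch.

# Step 2: download, decrypt and unpack the Stata files into "$1".
# Note: gpg 2.1+ may also need "--pinentry-mode loopback".
install_stata() {
    dest="$1"
    mkdir -p "$dest" &&
    curl -fsSL "$STATA_URL" -o "$dest/stata.tar.xz.gpg" &&
    gpg --batch --yes --passphrase "$STATA_PASS" \
        --output "$dest/stata.tar.xz" --decrypt "$dest/stata.tar.xz.gpg" &&
    tar -xJf "$dest/stata.tar.xz" -C "$dest"
}

# Step 3: put the Stata folder at the front of the PATH.
add_stata_to_path() {
    PATH="$1:$PATH"
    export PATH
}

# Example, assuming the archive unpacks to a folder named "stata":
#   install_stata "$HOME/stata-dl" && add_stata_to_path "$HOME/stata-dl/stata"
```

If you put this in a separate file, source it from the build rather than executing it, so that the PATH change survives into the later test commands.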
Finally, add commands to test your code:
  • Obviously, you already have lots of code to test your programs, right? :)
  • In order for Travis to know if there was a problem, the executable that runs the tests should return a non-zero exit code upon failure. The Stata executable will not do this (!), but you can use a simple wrapper like statab.sh. If you download this, remember that you will have to make it executable.
  • The tests can be run either from a master do file, or be called independently from a script or makefile.
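To give an idea of what such a wrapper has to do, here is a minimal sketch (not the statab.sh script itself). It relies on two Stata behaviours: batch mode (stata -b do foo.do) writes foo.log in the current directory, and a failed command leaves a return code such as r(601); on its own line in that log. The function names, and the assumption that the executable is called stata, are mine.

```shell
# Succeed only if the log exists and contains no Stata return-code line.
log_is_clean() {
    [ -f "$1" ] || return 1
    if grep -Eq '^r\([0-9]+\);' "$1"; then
        return 1
    fi
    return 0
}

# Run a do-file in batch mode and turn the log into an exit status.
# Assumes an executable named "stata" is on the PATH.
run_do() {
    dofile="$1"
    log="$(basename "$dofile" .do).log"
    rm -f "$log"
    stata -b do "$dofile"
    if log_is_clean "$log"; then
        return 0
    fi
    # Show the end of the log so the failure is visible in the build output.
    [ -f "$log" ] && tail -n 20 "$log" >&2
    return 1
}

# Example (file name is a placeholder):
#   run_do tests/master.do || exit 1
```

Scanning the log is a heuristic: it will also trip if your do-file deliberately prints a line that looks like a return code, so keep test output free of such lines.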
The more I use Stata, the more valuable it becomes to me!