Wednesday, April 22, 2015

Code Management in Dropbox

Dropbox is not the ideal version control system, but some people do not want to make the leap to a full VCS (like git or svn which full software developers use). Given that one should not make the perfect the enemy of the good, what simple tips can Stata-base economists follow that will make their lives easier? Here are some tips that will be helpful on projects that last a while (longer than short-term memory or the involvement of an assistant) or involve multiple people. As always, one should have the over-arching goal that the project be reproducible (at almost all points, you can follow an explicit recipe and produce all required outputs). While my tips will help with this, you may require some project-specific solutions, but it is well worth the effort.

  1. Backup: Structure your folders so that a minimal number can be used to reproduce everything in the project. A common approach is to have a code folder (which changes often) and a data/raw folder (which should almost never change). (This means all "manual" actions like cleaning should be embedded in code!) With this approach one can store a one-time backup of the raw data and then store period snapshots of the code folder. At key times (e.g. after every new draft or presentation) zip up the code folder and archive it untouched. Since all code is periodically backed up, you can get ride of no longer used code.
  2. Avoid editing conflicts: If multiple people are editing the same files here is a tip to reduce friction. If you start editing a file, make an innocuous edit and save it. Then, before editing a file, check the "last modified" time. If someone else edited the file recently, chat to negotiate access (your team should decide what "recently" means).
  3. Branching: If you want to a make a series of changes that will leave a key file in an unusable state then you should make a "branch". You can do this (a) in Dropbox by duplicating the files (including unstable outputs) with modified names (have a convention here), or (b) copying the project out of Dropbox for editing. Try to make changes small enough that they can be merged back into the main file quickly. The longer files diverge the harder this will be.
  4. Syncing: For intermediate files only needed internally by one piece of code, you can reduce syncing by creating those files in your `c(tmpdir)' (with -tempfile- or just named explicitly) rather than in your project folder.

Here are other simple general tips that can be applied to Dropbox:

    1. No one's full path should be written into the code. Code should be run given Stata is in the appropriate working directory (often the root of the project folder or in the code subfolder). A nice benefit is that if the project folder is copied out of Dropbox it should run easily. If you have machine-specific configuration details, those should be defined on the machine in environment variables and brought into Stata using -local env_var : environment ENV_VAR-
    2. Have a local code/ado folder where you store all the needed modules. In the header of your code you should include a script that sets code/ado as PERSONAL and then restricts your adopath to just PERSONAL;BASE. You may want to -net set ado PERSONAL- and -mata: mata mlib index- also. 
    3. Learn how to use a visual text diff/merge utility like the one that comes with TortoiseSVN. This will make it easy to compare files between editors and across time.
    4. If possible, do not use spaces in file names (use "_")
    What other tips do you all have?