Monday, September 12, 2016

Guidelines for making a Stata package

Here are some guidelines for making a well-behaved Stata package (ado):
  1. Provide all relevant output programmatically (for example as r() or e() results), not just textually. Another package may want to work with yours.
  2. Provide a way to check the version of your package.
  3. Use the version command, so that your program keeps behaving as written under later Stata releases.
  4. The package file should have a line like 'd Distribution-Date: 01jan2000' so that the package can be updated by adoupdate.
  5. Make clear in the help file what side effects your program may have (globals, characteristics, mata objects, scalars, advancing the RNG state by using randomization functions, files left around). Clean up after yourself if you can, especially after errors (use preserve, tempfiles/tempvars/tempnames, and capture blocks with cleanup sections).
  6. List your package dependencies. You can do this in the package file, but you should also mention this in the help file (and say if you autoinstall dependencies). 
  7. If you capture the output from a long running process, make sure to allow for exit if the break key was pressed (check _rc==1).
  8. Use _assert. It makes error checking code much nicer to read.
  9. Provide tests (such as static source code checks and unit checks) with good coverage. Relatedly, your help material should include useful examples.
  10. Put a brief description and the authors in *! initial lines above the command in the ado file (these are displayed by which). Don't make this a full changelog; put that in a changelog file.
  11. If you use compiled plugins or mata mlibs, provide the source code. This can help users determine solutions to errors.
  12. If the package implements statistical estimation routines, include references for the procedure and state the details of your algorithm explicitly.
  13. Make sure your program works when either the current directory or the tempdir contains spaces.
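
Several of the points above (the *! header lines, version, _assert, returning results programmatically, and a capture block that still honors the break key) can be sketched in one small ado file. This is only an illustration under assumptions: the command name mycmd, its version string, and the returned names r(sumsq) and r(cmd_version) are hypothetical and not taken from any real package.

```stata
*! mycmd 1.0.0 12sep2016 (hypothetical example command)
*! Author: your name here
program define mycmd, rclass
    version 13
    syntax varname(numeric) [if] [in]

    marksample touse
    quietly count if `touse'
    // _assert gives a readable one-line check with a chosen return code
    _assert r(N) > 0, msg("no observations") rc(2000)

    // temporary names are removed automatically, even after an error
    tempvar sq
    tempname total

    // capture the work so that a cleanup section can always run
    capture noisily {
        quietly generate double `sq' = `varlist'^2 if `touse'
        quietly summarize `sq' if `touse', meanonly
        scalar `total' = r(sum)
    }
    local rc = _rc

    // cleanup section would go here (close file handles, drop globals, ...)

    // re-issue any error; this also passes on rc==1 (break key pressed)
    if `rc' exit `rc'

    display as text "Sum of squares: " as result `total'

    // programmatic output for other packages
    return scalar sumsq = `total'
    return local cmd_version "1.0.0"
end
```

After running something like `mycmd price`, a calling program can read r(sumsq) and r(cmd_version) instead of parsing the displayed text, and `which mycmd` would show the *! lines once the file is installed as mycmd.ado.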
Extra-special niceties:
  1. Have an option so that relevant output (such as displayed text or saved files) is not dependent on machine-specific characteristics (e.g. directory, # processors, speed/time). This allows for log files to be compared across runs to check for real differences using text comparison methods.
  2. Provide a way to access previous versions of the package and a place to post errors. You can easily do this by hosting your project on a development website like GitHub.
  3. Provide a way to cite the programs.
  4. Provide a centralized place for listing issues/bug reports/enhancement requests/etc (e.g. at GitHub).
  5. If you provide a web-based development site (e.g. at GitHub), provide a version of your help in HTML (see the log2html Stata package).
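
The first nicety (output that does not depend on the machine) could look roughly like the sketch below. The command name mycmd_report and the option name portable are hypothetical; the idea is simply that machine-specific lines are displayed only when the option is not given.

```stata
program define mycmd_report
    version 13
    // hypothetical option: when specified, skip machine-specific output
    syntax [, PORTable]

    if "`portable'" == "" {
        // these lines differ across machines and runs
        display as text "Working directory: " as result c(pwd)
        display as text "Processors: " as result c(processors)
        display as text "Run at: " as result c(current_time)
    }

    // this line depends only on the data in memory
    display as text "Observations: " as result _N
end
```

With `mycmd_report, portable`, two log files from different machines should then differ only where the results themselves differ, so a plain text comparison is enough.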