Monday, September 12, 2016

Guidelines for making a Stata package

Here are some guidelines for making a well behaved Stata package (ado):
  1. Provide all relevant output programmatically, not just textually. Another package may want to work with yours.
  2. Provide a way to check the version of your package.
  3. Use version.
  4. The package file should have a line like 'd Distribution-Date: 01jan2000' so that the package can be updated from adoupdate.
  5. Make clear in the help file what side effects your program may have (globals, characteristics, mata objects, scalars, incrementing of the RNG state by using randomization functions, files left around). Cleanup after yourself if you can, especially after errors (use preserve, tempfiles/tempvars/tempnames, and capture blocks with cleanup sections).
  6. List your package dependencies. You can do this in the package file, but you should also mention this in the help file (and say if you autoinstall dependencies). 
  7. If you capture the output from a long running process, make sure to allow for exit if the break key was pressed (check _rc==1).
  8. Use _assert. It makes error checking code much nicer to read.
  9. Provide tests (such as static source code checks and unit checks) with good coverage. Relatedly, your help material should include useful examples.
  10. Put a brief description/authors in *! initial lines above the command in the ado file (used by which). Don't make this a full changelog, put that in a changelog file.
  11. If you use compiled plugins or mata mlibs, provide the source code. This can help users determine solutions to errors.
  12. If the package is estimating statistical routines then include references for the procedure and make very explicit details of your algorithm.
  13. Make sure your program works with when either the current directory or the tempdir contain spaces.
Extra-special niceties:
  1. Have an option so that relevant output (such as displayed text or saved files) is not dependent on machine-specific characteristics (e.g. directory, # processors, speed/time). This allows for log files to be compared across runs to check for real differences using text comparison methods.
  2. Provide a way to access previous versions of package and a place to post errors. You can easily do this by host your project on a development website like GitHub.
  3. Provide a way to cite the programs.
  4. Provide a centralized place for listing issues/bug reports/enhacement requests/etc (e.g. at GitHub).
  5. If you provide a web-based development site (e.g. at GitHub) provide a version of your help in HTML (see the log2html Stata package).

Monday, May 02, 2016

LyX template for UMD dissertation format

The following is a template for formatting dissertation in LyX for PhD dissertations for the University of Maryland, College Park. The umdthesis.cls file is a slightly modified version of thesis.cls from here (and renamed so it doesn't conflict with existing thesis files). Instructions for using the LyX file are in a comment at the top of that file.


Saturday, April 23, 2016

"Auto-edit failed" errors in LyX

A nice feature of LyX is that you can edit \included (or \inputed) files from the InsertFileChild Document dialog window. I was recently having a problem, however, where I could edit file types that had notepad as the default handler but not those that had Notepad++ as the default handler. I would get dialog messages saying "Error: Cannot edit file \n Auto-edit file file_path.tex failed". If this happened to you, here's the reason and fix.

Unless the child document is a .lyx file, LyX will check to see which handler is associated with text files in Tools→ Preferences→ File Handling→ File Formats, "Plain Text". If this is set to "Custom:auto", then LyX will call Win32's ShellExecute() with the action/verb as edit. The problem my setup is that if you merely set Notepad++ as the default handler of a certain file type by right clicking on the file type and selecting Open withChoose Another App→ Use this App to open all XXX files then the application isn't registered as the handler with the edit verb. Possible solutions
  1. Set Notepad++ to be editor for all included files. Do this by setting the edit command for Plain Text files as "C:\Program Files (x86)\Notepad++\notepad++.exe". This is what I did
  2. LyX could be patched so that if the first call to ShellExecute()  in  osWin32.cpp:autoOpenFile() with edit failed with error code SE_ERR_NOASSOC (=31) to try again with a verb of NULL.
  3. The user can register an edit verb with the default file handler in the registry. See here and here.
General reference for this type of problem.

Thursday, February 18, 2016

More efficient news reading

Nassim Taleb famously doesn't read the news. One of his reasons is that much of the news is inconsequential. This is both because news outlets need to publish even on slows days (the BBC has on only one day stated "[t]here is no news") and because it is difficult to tell at the time what will be important, especially from just headlines. Existing metrics to help determine if something of note actually happened include article view/share counts (e.g. RSS feeds of the "top 10" articles) and links (Google News), but these are insufficient. A better metric would be if a financial market (such as a prediction market) was available. Imagine a news service that tracked markets you cared about and if there was a large and non-temporary price change it would you send you the top stories from around that time period. One could imagine that the user could use a set of news search terms and the link to a market to setup and alert themselves. This could also be a platform for deciding about new prediction markets to set up. The target end-user would be general news readers, not traders as there'd obviously have a bit of a lag.

Sunday, January 31, 2016

Reproducible PDFs from LaTeX

I like using version control, such as git, to store project files. When working with others it is often convenient to commit generated files, such as PDFs from LyX files. PDFs, however, may be different for inconsequential reasons, making them more inconvenient with version control systems.

I have patched the pdftex executables from MiKTeX so that if you are on Windows (x64), the PDFs can be identical. It removes the creation date, modification date, and trailer ID so that the PDF file is constant across runs. It also removes the filepath for included graphics so that if the file is made somewhere else (e.g. an a collaborator's machine) then they will still be identical (assuming they are built using the same LaTeX setup; would be nice to improve this). No changes are needed for you lyx/tex files.

To install:

  1. Backup your current pdftex executables (in C:\Program Files (x86)\MiKTeX 2.9\miktex\bin): MiKTeX209-pdftex.dllmiktex-pdftex.exe, and pdftex.exe.
  2.  Download this zip file and unpack the new versions of those three files into the above directory.
  3. Be careful of updating the MiKTeX executables (though packages are fine).
I patched the version of pdftex that came with MiKTeX 2.9 (2015-08-24). It was primarily cobbled together from existing changes to the pdftex source code (svn access). See the diffs when changing revisions 222->223, 723->724, and 727->728. Hopefully, these changes will eventually be mainlined into the LaTeX distributions.

Sunday, September 13, 2015

Compiling Stata plugins

Here are some notes on how to compile plugins for Stata. The official plugin page is a bit out of date.

Linux: Add option -fPIC. If you want to build a 32-bit plugin on a 64-bit system then you will need the 32-bit version of libc (e.g. on Debian install g++-multilib and libc6-dev-i386).

Windows: You will need the mingw-w64 compiler (unless you just want 32-bit in which case you can use the original MinGW).  You can get this in several ways, but since I already had Cygwin I just installed the mingw64-x86_64-* packages (for 32 bit you can use mingw64-i686-*). You will then use the x86_64-w64-mingw32-g++ command rather than g++ -mno-cygwin. Add the -static flag in the link step otherwise Stata might give a "could not load plugin" due to unfound DLLs (e.g. I used C++ classes and I found using Procmon that after reading my plugin file Stata couldn't find libgcc_s_seh-1.dll).

Mac: Install the Xcode command-line tools. I added the  -shared option and removed the -bundle option (-bundle can't be used with dylibs). For modern Xcode (since version 5.0) then you have to use clang/clang++.

Wednesday, September 02, 2015

assign_treatment: A Tool for Stratified Randomization

This blog post by David McKenzie and Miriam Bruhn talks about assigning treatments at random when stratification variables cause cells (unique combinations of stratification variable values) to have uneven number of methods. The provided Stata do-file is hard-coded for a six treatments and is a bit difficult to adapt for another number. I've made a module that does the assignment for varying numbers of treatments as well of providing different methods of assigning the remainder units ("misfits") from each cell.

The three main methods provided all assign the misfits at random and achieve cell-level balance (counts differ at most by 1):

  • full - This is the same method as McKenzie and Bruhn. The misfits from each cell are separately randomized to any combination of treatments. This can cause the overall number of units per treatment to be unbalanced even though they will be balanced at the cell-level level.
  • reduction - This method achieves overall balance as well as better balance for specified stratification variables. It does this by limiting misfits to be assigned to a "wrapped" interval of treatments (e.g. (2,3,4) or (6,1,2) if there are six treatments) and then having those intervals dovetail together.
  • full-obalance - This method allows misfits to be assigned to any combination of treatments and achieves overall balance. It does so by assigning units one at a time to fill repeating slots of (1,...,T,1,...T,...,1..). At each stage it keeps track of possible units that could fill a spot (without causing two from the same cell to have the same treatment). It randomly picks one and then attempts to fill the next spot (while giving a slight weight to trying to fit first misfits from cells with many misfits). If filling a spot is impossible the algorithm backs up to the last point where there was a choice and tries a new option.

To install the module, run
. net install assign_treatment, from( replace

Tests are provided in here.