Sunday, September 13, 2015

Compiling Stata plugins

Here are some notes on how to compile plugins for Stata. The official plugin page is a bit out of date.

Linux: Add the -fPIC option. To build a 32-bit plugin on a 64-bit system you will need the 32-bit version of libc (e.g. on Debian install g++-multilib and libc6-dev-i386).

Windows: You will need the mingw-w64 compiler (unless you only want 32-bit, in which case you can use the original MinGW). You can get it in several ways, but since I already had Cygwin I just installed the mingw64-x86_64-* packages (for 32-bit use mingw64-i686-*). You then use the x86_64-w64-mingw32-g++ command rather than g++ -mno-cygwin. Add the -static flag in the link step; otherwise Stata might report "could not load plugin" because of missing DLLs (e.g. I used C++ classes, and Procmon showed that after reading my plugin file Stata couldn't find libgcc_s_seh-1.dll).

Mac: Install the Xcode command-line tools. I added the -shared option and removed the -bundle option (-bundle can't be used with dylibs). With modern Xcode (version 5.0 and later) you have to use clang/clang++.

Wednesday, September 02, 2015

assign_treatment: A Tool for Stratified Randomization

This blog post by David McKenzie and Miriam Bruhn talks about assigning treatments at random when stratification variables cause cells (unique combinations of stratification variable values) to have numbers of units that do not divide evenly among the treatments. The provided Stata do-file is hard-coded for six treatments and is a bit difficult to adapt to another number. I've made a module that does the assignment for varying numbers of treatments and also provides different methods of assigning the remainder units ("misfits") from each cell.
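To make "misfits" concrete: a cell with n units and T treatments has n mod T units left over once every treatment has received the same number. Here is a minimal Python sketch of that count. It is only an illustration and not part of the module; the function name misfits_per_cell and the example cell labels are made up.

```python
from collections import Counter


def misfits_per_cell(strata, T):
    """Return {cell: number of leftover units} for T treatments.

    `strata` holds one cell label per unit (hypothetical input format).
    """
    sizes = Counter(strata)
    # Each treatment gets n // T units; the remaining n % T are the misfits.
    return {cell: n % T for cell, n in sizes.items()}


if __name__ == "__main__":
    # Three cells of sizes 8, 7 and 12, with six treatments.
    labels = ["a"] * 8 + ["b"] * 7 + ["c"] * 12
    print(misfits_per_cell(labels, 6))  # {'a': 2, 'b': 1, 'c': 0}
```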

The three main methods provided all assign the misfits at random and achieve cell-level balance (counts differ at most by 1):

  • full - This is the same method as McKenzie and Bruhn's. The misfits from each cell are separately randomized to any combination of treatments. This can cause the overall number of units per treatment to be unbalanced even though they will be balanced at the cell level.
  • reduction - This method achieves overall balance as well as better balance for specified stratification variables. It does this by limiting misfits to be assigned to a "wrapped" interval of treatments (e.g. (2,3,4) or (6,1,2) if there are six treatments) and then having those intervals dovetail together.
  • full-obalance - This method allows misfits to be assigned to any combination of treatments and achieves overall balance. It does so by assigning units one at a time to fill repeating slots of (1,...,T,1,...,T,1,...). At each stage it keeps track of the units that could fill the current slot (without causing two units from the same cell to have the same treatment). It randomly picks one and then attempts to fill the next slot (while giving slightly more weight to placing misfits from cells with many misfits first). If filling a slot is impossible, the algorithm backs up to the last point where there was a choice and tries a new option.
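To show the difference between the first two methods, here is a rough Python sketch. It is not the module's code: the function names, the input format (a dict mapping each cell to its unit ids) and the details of how the intervals are chained are my assumptions for illustration. In particular, assign_reduction is just one simple reading of "wrapped intervals that dovetail" (each cell's interval starts where the previous cell's ended) and it ignores the extra balancing on specified stratification variables.

```python
import random
from collections import Counter


def assign_full(cells, T, rng):
    """'full' sketch: misfits in each cell get a random set of distinct treatments."""
    assignment = {}
    for units in cells.values():
        units = list(units)
        rng.shuffle(units)
        n_full = (len(units) // T) * T
        # Complete blocks: every treatment 1..T appears equally often.
        for i, unit in enumerate(units[:n_full]):
            assignment[unit] = i % T + 1
        # Misfits: sample without replacement, so no treatment repeats in the cell.
        misfits = units[n_full:]
        for unit, t in zip(misfits, rng.sample(range(1, T + 1), len(misfits))):
            assignment[unit] = t
    return assignment


def assign_reduction(cells, T, rng):
    """'reduction' sketch (assumed reading): misfits take consecutive treatments,
    wrapping past T, and each cell's interval starts where the last one ended."""
    assignment = {}
    next_t = rng.randrange(T)  # zero-based start of the first interval
    for units in cells.values():
        units = list(units)
        rng.shuffle(units)  # random order decides who ends up a misfit
        n_full = (len(units) // T) * T
        for i, unit in enumerate(units[:n_full]):
            assignment[unit] = i % T + 1
        for unit in units[n_full:]:
            assignment[unit] = next_t + 1
            next_t = (next_t + 1) % T
    return assignment


def counts(assignment, T):
    """Units per treatment, as a list indexed by treatment 1..T."""
    c = Counter(assignment.values())
    return [c.get(t, 0) for t in range(1, T + 1)]


if __name__ == "__main__":
    rng = random.Random(2015)
    # Hypothetical cells of sizes 8, 7, 9 and 14 with six treatments.
    cells = {c: [f"{c}{i}" for i in range(n)]
             for c, n in [("a", 8), ("b", 7), ("c", 9), ("d", 14)]}
    print("full:     ", counts(assign_full(cells, 6, rng), 6))
    print("reduction:", counts(assign_reduction(cells, 6, rng), 6))
```

In both sketches the counts within a cell differ by at most 1. With the chained intervals, the overall counts also differ by at most 1, whereas the "full" version can end up with some treatments getting several more units than others.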

To install the module, run
. net install assign_treatment, from(https://raw.github.com/bquistorff/Stata-modules/master/a/) replace

Tests are provided here.