Monday, August 30, 2021

Cross-fitting in Stata

Cross-fitting is a method for producing out-of-sample ("honest") predictions/residuals from a model. It is commonly used these days with Machine Learning models, as they typically have flexibility/bias built in, which makes their in-sample predictions too optimistic. Cross-fitting can be seen as the initial part of cross-validation: the data are split into K folds, and predictions for fold k come from a model trained on all the data except fold k. When K equals the sample size, this generates jackknife (leave-one-out) predictions, whose residuals are approximately unbiased.
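To make the procedure concrete, here is a minimal by-hand sketch of 5-fold cross-fitting on the auto dataset. It is only an illustration of the idea, not how any particular package implements it; the seed and the names fold, u, yhat_k, price_hat_manual and price_resid_manual are made up for this example.

```stata
* Manual 5-fold cross-fitting sketch (all new variable names are illustrative)
sysuse auto, clear
set seed 12345

* Randomly assign each observation to one of K = 5 folds of near-equal size
generate double u = runiform()
sort u
generate byte fold = mod(_n - 1, 5) + 1

* For each fold k: fit on every other fold, then predict only for fold k
generate double price_hat_manual = .
forvalues k = 1/5 {
    quietly regress price mpg if fold != `k'
    quietly predict double yhat_k if fold == `k', xb
    quietly replace price_hat_manual = yhat_k if fold == `k'
    drop yhat_k
}

* Out-of-fold residuals
generate double price_resid_manual = price - price_hat_manual
summarize price_hat_manual price_resid_manual
```

Every observation ends up with exactly one prediction, and that prediction comes from a regression that never saw the observation.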

I wrote a crossfit command for this; you can install it from my Stata-modules GitHub repo. It encapsulates both the fit and prediction steps and generates either predictions or, if you specify the outcome, residuals.


//net install crossfit, from(https://raw.github.com/bquistorff/Stata-modules/master/c/) replace
sysuse auto
crossfit price_hat_oos, k(5): reg price mpg
crossfit price_resid_oos, k(5) outcome(price): reg price mpg
ereturn list //additionally get MSE, MAE, R2

The crossfold package does almost everything needed, except that it's non-trivial to get the predictions/residuals as a new variable (especially when there's a global 'if' clause). Maybe one day we should merge!

Random Forest implementation in Stata

I recently needed a Random Forest implementation on a slightly older version of Stata and found the choices quite lacking. 

  1. crtrees (deck) is a Stata-native implementation, but I hit confusing errors when running it. I tried to fix them, but the code looks like it has been sent through a code obfuscator!
  2. Stata's native interface with Python wasn't available to me since I was using Stata 15.
  3. rforest is a binding to a Java implementation. I couldn't install Java and, since most machine learning these days happens in R or Python, Java is a very odd language choice today.

I then stumbled upon rcall, which allows calling R programs from Stata. R was available on my platform, so I made a simple Stata binding to the fast ranger package in R. You can install it from my Stata-modules GitHub repo. An R process is spun up and all the work is done with a single command, so it encapsulates both the fit and prediction steps (with either standard or out-of-bag predictions).


//net install ranger, from(https://raw.github.com/bquistorff/Stata-modules/master/r/) replace
sysuse auto
ranger price mpg turn, predict(price_hat_oos)



Thursday, July 29, 2021

Getting Stata automation working without administrative privileges

"You have no power here!"

If you're using Stata on Windows, Stata automation is really powerful. It lets you use Jupyter notebooks efficiently via stata_kernel (or IPyStata's automation interface). The "installation", however, typically requires admin rights. If you only have a non-admin account, here's how to do it.

Installation consists of registering the Stata type library in the Windows Registry. The default process adds entries under HKEY_CLASSES_ROOT, which is not user-editable. With a bit of tweaking, however, these entries can work if installed under HKEY_CURRENT_USER, which is user-editable. Here is my modified reg entry used for Stata 15. It worked for both SE and MP flavors. Possibly it works for other versions. You can copy it to the target system, edit the Stata path if different, rename it from .txt to .reg, and import it using regedit.exe.[1]

Here are the steps I took, in case you need to replicate the procedure (e.g., if the above does not work for you). I did the registration on a system where I did have admin rights and logged what was added to the registry using NirSoft's RegFromApp:

RegFromApp.exe /AutoSave "modified.reg" "original.reg" /RunProcess "C:\Program Files (x86)\Stata15\StataMP-64.exe" /ProcessParams "/Register"

Then I followed these instructions to modify modified.reg.[2] Hope that helps!

[1] If you don't have access to Regedit then you can try C:\Windows\System32\reg.exe import modified.reg.

[2] Note that when I registered, unregistered, and re-registered, the second registration was missing some keys under HKCU\SOFTWARE\Classes\: AppID\stata.EXE, AppID\<GUID from previous>, stata.StataOLEApp.1, stata.StataOLEApp.1\CLSID, stata.StataOLEApp, stata.StataOLEApp\CLSID, stata.StataOLEApp\CurVer.