Monday, August 30, 2021

Random Forest implementation in Stata

I recently needed a Random Forest implementation on a slightly older version of Stata and found the choices quite lacking. 

  1. crtrees (deck) is a Stata-native implementation, but I ran into confusing errors when running it. I tried to fix them, but the code looks like it has been sent through a code obfuscator!
  2. Stata's native interface with Python wasn't available to me since I was using Stata 15.
  3. rforest is a binding to a Java implementation. I couldn't install Java, and since most machine learning these days happens in R or Python, Java is a very odd language choice.

I then stumbled upon rcall, which allows calling R programs from Stata. R was available on my platform, so I made a simple Stata binding to the fast ranger package in R. You can install it from my Stata-modules GitHub repo. A single command spins up an R program and does all the work, so it encapsulates both the fit and prediction steps (either standard or out-of-bag predictions).


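// One-time install (uncomment to run):
//net install ranger, from(https://raw.github.com/bquistorff/Stata-modules/master/r/) replace
sysuse auto, clear
// Fit a random forest of price on mpg and turn; store the predictions in a new variable
ranger price mpg turn, predict(price_hat_oos)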
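
As a quick sanity check on the predictions, here is a minimal sketch that computes a root mean squared error from the stored predictions using only built-in Stata commands. It assumes, as the variable name suggests, that predict() leaves the predictions in a new numeric variable called price_hat_oos; the names sq_err and the displayed label are my own and purely illustrative.

```stata
sysuse auto, clear
ranger price mpg turn, predict(price_hat_oos)

// Squared prediction error for each car (assumes price_hat_oos was created above)
generate double sq_err = (price - price_hat_oos)^2

// Average the squared errors and take the square root
summarize sq_err, meanonly
display "RMSE of predictions: " sqrt(r(mean))
```

If the predictions are out-of-bag, this number is a rough estimate of out-of-sample error; if they are standard in-sample predictions, it will generally look more optimistic.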


