I recently needed a Random Forest implementation on a slightly older version of Stata and found the available options lacking:
- crtrees (deck) is a Stata-native implementation, but I ran into confusing errors when running it. I tried to fix them, but the code looks like it has been sent through a code obfuscator!
- Stata's native interface with Python wasn't available to me since I was using Stata 15.
- rforest is a binding to a Java implementation. I couldn't install Java, and since most machine learning these days happens in R or Python, Java is a very odd language choice today.
I then stumbled upon rcall, which allows calling R programs from Stata. R was available on my platform, so I wrote a simple Stata binding to the fast ranger package for R. You can install it from my Stata-modules GitHub repo. A single command spins up an R process and does all the work there, so it covers both the fit and the prediction step (either standard or out-of-bag predictions).
//net install ranger, from(https://raw.github.com/bquistorff/Stata-modules/master/r/) replace
sysuse auto
ranger price mpg turn, predict(price_hat_oos)
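As a rough check on the result, the stored predictions can be compared with the observed prices. This is only a sketch: it assumes the ranger command and R are installed, and that predict(price_hat_oos) stores out-of-bag predictions, as the variable name suggests. The variables resid_oos and sq_err are hypothetical names introduced for illustration; everything after the ranger call is standard Stata.

```stata
// Load the example data and fit the forest, as in the call above
sysuse auto, clear
ranger price mpg turn, predict(price_hat_oos)

// Residuals of the (assumed) out-of-bag predictions
generate double resid_oos = price - price_hat_oos
summarize resid_oos

// Root mean squared error of those predictions
generate double sq_err = resid_oos^2
summarize sq_err, meanonly
display "OOB RMSE: " sqrt(r(mean))

// Correlation between observed and predicted prices
correlate price price_hat_oos
```

Because out-of-bag predictions for an observation come only from trees that did not see it during training, an error computed this way is a more honest estimate of out-of-sample performance than one based on in-sample fitted values.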