Friday, September 30, 2016

R+H2O for marketing campaign modeling

My last post about telco churn prediction with R+H2O attracted an unexpectedly strong response. It seems that the R+H2O combo currently has very good momentum :). Therefore Wit Jakuczun decided to publish a case study, based on the same technology stack, that he uses in his R boot camps.

You can find a shortened version of the case study material, along with the source code and data sets, on GitHub.

I think it is nice not only because it is another example of how to use H2O in R, but also because it is a basic introduction to combining segmentation and predictive modeling for marketing campaign targeting.

Monday, September 12, 2016

Telco churn prediction with R+H2O

Recently my friend Wit Jakuczun and I discussed a blog post on Revolution showing an application of SQL Server R Services to build and run a telco churn model. It is a very nice analysis, and we thought it would be interesting to compare the results to H2O, which is a great tool for automated building of prediction models.

Today Wit published his code for the analysis on GitHub. You can check it out here. The resulting model is pretty good, with an AUC of 0.947, which is better than any of the models presented on the Revolution blog. But in my opinion the key value of Wit's work is that it shows how simple it is to build models with H2O using R. From his GitHub repository you can download a ready-to-run RStudio project and all the instructions needed to build the model in just a few lines of code.

Tuesday, January 6, 2015

Sequence generation in R, Python and Julia

Recently I was comparing the implementations of sequence generation functions in R, Python (numpy) and Julia. Interestingly, even such basic functions differ slightly in their implementation. In my opinion Julia provides the best solution and Python the worst.
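To give a flavor of why such a basic function can be tricky, here is a minimal sketch in plain Python. It is not taken from the comparison itself; the helper names `seq_by_step` and `seq_by_count` are hypothetical. The first helper derives the length from `ceil((stop - start) / step)`, the rule that the numpy documentation gives for `arange`, and the second fixes the number of points up front. With floating-point steps, the first rule can yield one element more than you might expect from a half-open interval.

```python
import math


def seq_by_step(start, stop, step):
    """Half-open sequence [start, stop) with length ceil((stop - start) / step).

    Hypothetical helper: with a floating-point step, rounding error can make
    the computed length one larger than expected.
    """
    n = max(0, math.ceil((stop - start) / step))
    return [start + i * step for i in range(n)]


def seq_by_count(start, stop, n):
    """Closed sequence of exactly n evenly spaced points from start to stop.

    Hypothetical helper: the length is fixed by the caller, so it cannot
    drift because of rounding.
    """
    if n == 1:
        return [start]
    width = (stop - start) / (n - 1)
    return [start + i * width for i in range(n)]


# (1.3 - 1) / 0.1 evaluates to slightly more than 3 in floating point,
# so the step-based rule produces 4 elements instead of 3.
by_step = seq_by_step(1, 1.3, 0.1)
by_count = seq_by_count(1, 1.3, 4)
print(len(by_step), by_step)
print(len(by_count), by_count)
```

Whether a library fixes the length, the step, or the endpoint first is exactly the kind of design choice on which implementations can differ.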

Wednesday, July 16, 2014

Comparing localsolver with Rglpk on k-medoids example

Recently I coauthored a new localsolver package that can be used to solve large-scale optimization problems from R. It is a wrapper around a commercial solver that is free for academia. If you are interested in why it is worth a look, read on.

Wednesday, June 25, 2014

R Scrabble: Part 2

Ivan Nazarov and Bartek Chroł left very interesting comments on my last post about counting the number of subwords in NGSL words. In particular, they proposed large speedups of my code, so I decided to try a larger data set. Today I will work with TWL2006, the official word authority for tournament Scrabble in the USA.
The question is whether the exponential relationship between the number of letters in a word and the number of its subwords, observed in the NGSL data set, still holds for TWL2006.
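As a rough illustration of the quantity being counted, here is a minimal Python sketch. It assumes that a subword is any dictionary word that can be built from a subset of the letters of the given word (respecting letter multiplicities), and that the word itself counts if it is in the dictionary. The posts themselves may use a different definition. The function names and the toy dictionary are hypothetical and stand in for NGSL or TWL2006.

```python
from collections import Counter


def is_subword(candidate, word):
    """True if candidate can be spelled using letters of word (assumed definition)."""
    need = Counter(candidate)
    have = Counter(word)
    return all(have[ch] >= k for ch, k in need.items())


def count_subwords(word, dictionary):
    """Number of dictionary entries that are subwords of word."""
    return sum(1 for w in dictionary if is_subword(w, word))


# Toy dictionary for illustration only; not a real word list.
toy_dictionary = ["a", "at", "cat", "act", "tack", "cart",
                  "art", "rat", "tar", "car", "arc"]

for w in ["at", "cat", "cart"]:
    print(w, len(w), count_subwords(w, toy_dictionary))
```

With a full word list, an exponential relationship would show up as a roughly linear trend when the logarithm of the mean subword count is plotted against word length.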

Saturday, June 14, 2014

RGolf: NGSL Scrabble

This is the last part of RGolf before summer. As R excels at visualization, today's task is to generate a plot.

Friday, May 30, 2014

RGolf: rolling window

I learned a lot from my last RGolf post, so today I have another problem taken from practice.