As an Excel power user (someone called me a guru recently!) I know Excel can be used to do pretty much anything – I’ve even seen Excel being used to play the Game of Life. If this is the case why do we need R?
In this post I’ll tell you why and then show you.
We can write an R script once to do any of the following :
- Acquire data
If the R is written in the correct way it’s reproducible by default. This is very beneficial if data is dynamic or if you need to pass the script to a colleague. You don’t want them getting a different result.
Excel is flexible as mentioned above. Excel is also limited by the resources Microsoft decide to invest in the product. Even if Microsoft had the resources I think they would only include functionality that is useful for a broad spectrum of people.
How about R? First off, it’s open source which means if we need a real niche feature we can write it ourselves and many people do. Furthermore, the architecture of R is based around a package system which makes it highly adaptable. To be fair, Excel has extensibility (VBA, COM, .NET) but to write an extension one has to learn a different language. Packages in R are written with R. There’s a little extra to learn but learn to use R and you’re very close to being a package author.
At the time of writing there are over 6000 packages available for use. Given R’s long lineage I am willing to bet that the problem you have will have been solved by someone else.
Scalability and Availability
You can run R in lots of different places:
- In databases
- On really big machines
It’s important to know that R runs in-memory. This makes it very fast but also means if you run out of memory your stuck. This isn’t a major problem as memory is cheap. If you really want power you can run on the big Amazon/Microsoft/Google cloud services.
With R we can send the output to lots of different places. We’re not constrained by a worksheet.
- Web page
Okay, enough theory. Here’s an example of customer analysis. The problem we’re seeing is that customers are disappearing. We need to find more insight before we can look to mitigate the problem.
As you can see in the Excel screenshot below, I’ve pulled in data from a database and grouped customers based on demographics. We see 12 months of customer count.
The second table is a copy of the first but with customer count aligned to the left (for charting/formula clarity). I did this by hand. You could it with a formula to be fair. The next table calculates Retention which takes the current month count and divides by month 1. With this third table it’s simple to create a chart.
We’ve solved our problem. What is going on with group 4?
It’s not all good though. What happens if we see more groups in the database? or more months? This worksheet doesn’t grow. I know we can structure the workbook to make it expandable. We can use dynamic ranges to populate the chart. This is okay for me. I’m a power user after all. What about the other users? Wouldn’t it be good if we could forget about structure and layout and just focus on the problem at hand as I’ve done here.
This is what R gives us. I’ve not posted the R on this page as WordPress is a bit picky. I wrote the code and published to a website using R Markdown. R Markdown gives us a facility to write a report or slides and embed R directly in the doc. You can find a copy of the R Markdown here and the published report here.
I hope you can see how R let’s us forget the structure. If you run the R on your machine you’ll get the same result. Of course, this comes at the expense of the worksheet which gives R a steep learning curve. Let me tell you though that once over the curve you have access to a new world of analysis capabilities.