Data Analytics modelling: why tune by hand?

Once we’ve got clean, transformed data, the next step in an analysis is to create a model.

There are many types of models that can be used, depending on the type of analysis or prediction being made: for instance, predicting a class, predicting values or finding unusual points.

Within each family of models I’d really like to be able to cycle through the candidates and apply each one to my dataset, then see the results ranked by accuracy, p-value, sensitivity, specificity and so on.

With the model algorithms already pre-baked why can’t we just consume them in a fairly efficient way?

Of course we can do this by hand with Python or R but it would be much better if the software handled this type of plumbing/set-up.

Here’s an example of doing it with R from Suraj V Vidyadaran. He cycles through 17 classification algorithms, applying each one and outputting a confusion matrix for it. This is a great resource for learning R, but in my opinion it also shows that there are patterns in the modelling that can be abstracted away.
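
To make the pattern concrete, here is a minimal sketch of the same loop-and-rank idea in plain Python. It is not the R example mentioned above: the data is made up, and the three toy classifiers (`fit_majority`, `fit_nearest_centroid`, `fit_one_nn`) and the helper `rank_models` are my own hypothetical names, written by hand so the example has no dependencies. It ranks on accuracy, sensitivity and specificity only and leaves the p-value out.

```python
from collections import Counter

# Made-up toy data: (feature value, class label), with 1 as the positive class.
TRAIN = [(0.5, 0), (1.0, 0), (1.5, 0), (2.0, 0), (2.5, 0),
         (3.0, 1), (3.5, 1), (4.0, 1), (4.5, 1)]
TEST = [(1.2, 0), (2.2, 0), (2.6, 0), (2.7, 1), (3.2, 1), (3.8, 1)]


def fit_majority(rows):
    """Baseline: always predict the most common training label."""
    label = Counter(y for _, y in rows).most_common(1)[0][0]
    return lambda x: label


def fit_nearest_centroid(rows):
    """Predict the class whose mean feature value is closest to x."""
    means = {}
    for cls in {y for _, y in rows}:
        values = [x for x, y in rows if y == cls]
        means[cls] = sum(values) / len(values)
    return lambda x: min(means, key=lambda cls: abs(x - means[cls]))


def fit_one_nn(rows):
    """Predict the label of the single nearest training point."""
    return lambda x: min(rows, key=lambda row: abs(x - row[0]))[1]


def ratio(numerator, denominator):
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def evaluate(predict, rows):
    """Count the confusion matrix cells and derive the metrics from them."""
    cells = Counter((actual, predict(x)) for x, actual in rows)
    tp, fn = cells[(1, 1)], cells[(1, 0)]
    tn, fp = cells[(0, 0)], cells[(0, 1)]
    return {
        "confusion": {"tp": tp, "fp": fp, "tn": tn, "fn": fn},
        "accuracy": ratio(tp + tn, len(rows)),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


# The "plumbing": one registry of models, one loop, one ranked result.
MODELS = {
    "majority": fit_majority,
    "nearest_centroid": fit_nearest_centroid,
    "one_nn": fit_one_nn,
}


def rank_models(models, train_rows, test_rows, metric="accuracy"):
    """Fit every model, score it on the test rows, and sort best-first."""
    scored = [(name, evaluate(fit(train_rows), test_rows))
              for name, fit in models.items()]
    return sorted(scored, key=lambda item: item[1][metric], reverse=True)


if __name__ == "__main__":
    for name, scores in rank_models(MODELS, TRAIN, TEST):
        print(name, scores["accuracy"], scores["sensitivity"], scores["specificity"])
```

Adding another algorithm only means adding one entry to `MODELS`; the fitting, scoring and ranking code stays the same, which is the part I’d like the software to handle.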

 

 

Posted in Analytics, R | Leave a comment

My new Power BI course is live

I created a course on the Udemy platform to teach non-developers how to create visualisations and share them on PowerBI.com.

If you want a course that includes slides/screencasts/assignments please check out my course.  No need to be a techie 🙂

I cover getting data, cleaning it, transforming it and creating visualisations online.

Learn Power BI

Posted in Uncategorized | Leave a comment

Power BI Friday

I’ve started to send out a Power BI Friday newsletter via email that contains a curated selection of interesting news from the week.

If you’re interested in signing up you can do so here.

It can be hard to keep up with the rapid changes to Power BI, so I hope this becomes a useful resource.

Posted in Uncategorized | Leave a comment

Collecting requirements from users for a BI project

When holding a workshop to collect requirements for a BI project you can take two broad approaches.

  1. When customers know what they want it’s easier to talk dimensions and facts, e.g. £ sales by customer, product, brand, time, or £ Operating Expenses by Account, Department, Cost Center, Legal Entity. If the users have the right subject knowledge they probably realise that £ Operating Expenses is not available by Product.
  2. Sometimes users don’t quite know what they want, in which case you need to delve into processes. For instance, take a Billing process. It can be helpful to turn this into a process diagram as it will help tease out requirements. It can also be used later to help document functional requirements when used in combination with an ERD.
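
For the first approach, one way to capture what comes out of the workshop is a simple fact-by-dimension matrix. The sketch below is only an illustration: the two facts and their dimensions are taken from the example in the list above, while the names `BUS_MATRIX` and `unsupported_dimensions` are hypothetical.

```python
# Which dimensions each fact can be analysed by, as listed in the workshop example.
BUS_MATRIX = {
    "Sales": {"Customer", "Product", "Brand", "Time"},
    "Operating Expenses": {"Account", "Department", "Cost Center", "Legal Entity"},
}


def unsupported_dimensions(fact, requested):
    """Return the requested dimensions that the fact is not available by."""
    return sorted(set(requested) - BUS_MATRIX[fact])


if __name__ == "__main__":
    # A user asks for Operating Expenses by Department and Product.
    print(unsupported_dimensions("Operating Expenses", ["Department", "Product"]))
```

A request that returns a non-empty list is worth raising in the workshop itself, before anyone assumes the breakdown exists.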

The takeaway here: don’t assume users know what they want. Adapt your approach to the situation at hand.

Posted in Business Analysis | Leave a comment

Business Analysis for BI projects

BI projects have complex elements such as dimensional modelling, fact modelling, hierarchies and calculations.

This makes it essential that Business Analysis is applied on all projects. It’s during the BA phase that requirements are picked up. As BI professionals it’s important we use our judgement to tease out requirements that users may not be aware of, for instance slowly changing dimensions, late arriving facts and late arriving dimensions.

We can also start to gauge data quality at this stage. Users will often give their opinion on it, although data profiling remains the key tool for determining data quality.
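
As a rough illustration of what profiling means in practice, here is a minimal sketch that summarises a single column. The sample rows and the function name `profile_column` are made up for the example; a real profiling tool reports far more than this.

```python
# Made-up sample rows standing in for a source table.
CUSTOMERS = [
    {"customer_id": 1, "country": "UK"},
    {"customer_id": 2, "country": "uk"},
    {"customer_id": 3, "country": None},
    {"customer_id": 3, "country": "France"},
    {"customer_id": 5, "country": ""},
]


def profile_column(rows, column):
    """Summarise one column: row count, missing values, distinct values, repeats."""
    values = [row.get(column) for row in rows]
    # Treat both None and the empty string as missing.
    present = [v for v in values if v is not None and v != ""]
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "duplicates": len(present) - len(set(present)),
    }


if __name__ == "__main__":
    for column in ("customer_id", "country"):
        print(column, profile_column(CUSTOMERS, column))
```

Even a summary this small surfaces things users may not mention, such as a repeated key or the same country spelt two ways.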

We can also get an understanding of the largest fact tables and any fact-grained dimensions.

Some BI projects have limited facts; we see this with HR & Legal. It’s very important to determine this early on. This doesn’t prevent users from requesting facts: for instance, a hiring manager may wish to see the number of new hires or the number of applicants. If such a fact isn’t available, it will have to be created or inferred from some other field such as Hire Date.
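
Inferring a new-hires count from Hire Date could look something like the sketch below. The employee rows, the field names and the function `new_hires_by_month` are all hypothetical; the only idea taken from the text is counting employees by the month of their hire date.

```python
from collections import Counter
from datetime import date

# Made-up employee rows; a missing hire date is recorded as None.
EMPLOYEES = [
    {"employee_id": 1, "hire_date": date(2016, 1, 11)},
    {"employee_id": 2, "hire_date": date(2016, 1, 25)},
    {"employee_id": 3, "hire_date": date(2016, 2, 1)},
    {"employee_id": 4, "hire_date": None},
    {"employee_id": 5, "hire_date": date(2016, 3, 14)},
]


def new_hires_by_month(rows):
    """Count employees per (year, month) of hire, skipping rows with no date."""
    counts = Counter(
        (row["hire_date"].year, row["hire_date"].month)
        for row in rows
        if row["hire_date"] is not None
    )
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    for (year, month), hires in new_hires_by_month(EMPLOYEES).items():
        print(f"{year}-{month:02d}: {hires}")
```

The number of applicants can’t be derived this way, since applicants who were never hired have no Hire Date; that fact would need its own source.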

It’s also important we understand the process that underlies the OLTP database.   With BA tools we can document this process.   I had one situation where a billing process included dynamic revenue types.   The user wanted to analyse & report on revenue.   This required master data to be added to the ETL pipeline.  Not a small feat.    It’s things like this you want to know about early on.

There’s a lot more that I won’t go into here: currency translation, cost allocations, complex measures.

It’s vitally important these things are well understood before development starts.

Do you agree?

Posted in Business Analysis | Leave a comment