Sales Analytics in Qlik: From the Basics to Statistical Modeling

The basics

The most common Qlik application involves sales data analysis. Period.

Well, I don’t have enough information to back that up, and since data analysis is my life, I can’t make unsupported claims without some major nervous facial twitching (or so my wife says). However, I would bet based on personal experience that it is one of the most common applications, and moreover, I’d go as far as to say that it is often the first analytical application that businesses develop after buying Qlik.

One of the most obvious reasons that this could be true is that sales data is huge, low-hanging fruit. Sales is what drives most businesses and the data trail it leaves is usually the most readily available data to analyze. When a company purchases Qlik, it is often after continuous investments in an ERP (Enterprise Resource Planning) system, a CRM (Customer Relationship Management) system, a customer portal and/or numerous Excel reports – all of which make sales data ripe for harvest.

Many of you are probably familiar with the following sales metrics.

Metric | Definition
Gross Sales Revenue | Sales before discounts, measured in monetary units
Net Sales Revenue | Sales after discounts, measured in monetary units
Sales Volume | Sales measured in non-monetary units such as individual items, boxes, pallets, kilograms, tons, etc.
Unit Sales | Also referred to as Average Price, it can be defined as Net Sales Revenue divided by Sales Volume
Hits | Number of sales transactions or invoices
Gross Profit | Net Sales Revenue minus the Cost of Goods Sold (COGS)
Gross Profit Margin | Gross Profit divided by Net Sales Revenue
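
To make the arithmetic behind these definitions concrete, here is a minimal Python sketch. The figures are invented purely for illustration, and the variable names are my own shorthand rather than fields from any real Qlik data model.

```python
# Illustrative figures only; none of these numbers come from real data.
gross_sales_revenue = 1000.00  # sales before discounts, in monetary units
discounts = 100.00             # total discounts granted
sales_volume = 300             # non-monetary units sold (items, boxes, ...)
cost_of_goods_sold = 540.00    # COGS for the same sales

# Net Sales Revenue: sales after discounts.
net_sales_revenue = gross_sales_revenue - discounts

# Average Price: Net Sales Revenue divided by Sales Volume.
average_price = net_sales_revenue / sales_volume

# Gross Profit: Net Sales Revenue minus COGS.
gross_profit = net_sales_revenue - cost_of_goods_sold

# Gross Profit Margin: Gross Profit divided by Net Sales Revenue.
gross_profit_margin = gross_profit / net_sales_revenue

print(f"Net Sales Revenue:   {net_sales_revenue:.2f}")
print(f"Average Price:       {average_price:.2f}")
print(f"Gross Profit:        {gross_profit:.2f}")
print(f"Gross Profit Margin: {gross_profit_margin:.1%}")
```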

If you are familiar with these metrics, you’ll also be well acquainted with the series of dimensions that often slice and dice them.  Catalogs that describe customers, sales representatives, products, dates, branches, stores, promotion codes, and channels answer the questions of who, what, when, where, why, and how that surround the sales numbers.  You’ll also recognize the importance of using some type of reference data that comes in the form of a budget, a forecast, or at the very least, historical data.

Almost immediately after businesses start to take advantage of this basic, yet powerful, sales data analysis, they start to adjust their business questions. We can answer some new questions by adding a new metric, a new dimension, or a new visualization, but some new questions involve more advanced analysis techniques. One of the most popular questions that I encounter and that requires a more sophisticated approach is the evaluation of customer retention and the detection of customers that the business is in danger of losing. From here on, we’ll refer to this type of analysis as customer churn.

Customer Churn

In its simplest form, customer churn measures how many customers a business has gained or lost during a certain period.  We can calculate it in the following way.
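
One common way to write it, offered as a sketch rather than the only possible definition, is as a rate over a period. The symbols are labels I've assumed for illustration: $C_{\text{lost}}$ is the number of customers lost during the period and $C_{\text{start}}$ is the number of customers at the start of it.

```latex
\text{Churn rate} = \frac{C_{\text{lost}}}{C_{\text{start}}}
```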

This formula is simple for those businesses that sell a subscription-based service that customers explicitly cancel. However, many other businesses never get any warning that their customers have decided to not do business with them anymore.

There is, however, a nice workaround for those businesses that don’t sell a subscription-based service, but whose customers make recurring purchases. For example, a grocery store with some sort of membership program could deduce that a customer who usually purchases groceries on a weekly basis is lost when he hasn’t made any purchase after 1 month. This customer’s purchase frequency, like every other customer’s, will naturally vary over time. For example, he may have gone 2 times in 1 week if he was having people over for a spontaneous dinner, or skipped 1 week when he was on vacation. The grocery store can take this natural variation into account and use a statistical model that predicts at what point the customer’s absence is likely to be permanent.

After having created a statistical model based on purchase frequency to determine lost customers, the grocery store could go a step further and identify at-risk customers.  This dynamic grouping contains customers whose absence is longer than usual, but whose return is still possible. Any attempts to recover these customers might be a better investment than a random marketing campaign. For example, in the case of the customer we mentioned in the previous paragraph, we could have been alerted to the customer’s unusual absence after 3 weeks and taken the appropriate steps to encourage him to come back before it was too late.
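
As a rough sketch of this idea outside of Qlik, the Python below fits a normal distribution to the gaps between a customer's purchases and asks how unusual the current absence is. It assumes the gaps are roughly normally distributed, which is only one possible modeling choice. The function name and the sample dates are hypothetical, and it needs at least three purchases with some variation between gaps.

```python
from datetime import date
from statistics import NormalDist, mean, stdev


def absence_probability(purchase_dates, today):
    """Return the modeled probability that a gap between purchases is
    as short as the current absence or shorter.

    Values close to 1 mean the absence is much longer than usual.
    Assumes gaps between purchases are roughly normally distributed.
    """
    ordered = sorted(purchase_dates)
    # Days between each pair of consecutive purchases.
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    model = NormalDist(mean(gaps), stdev(gaps))
    days_absent = (today - ordered[-1]).days
    return model.cdf(days_absent)


# Hypothetical customer who shops roughly once a week.
purchases = [
    date(2017, 1, 2),
    date(2017, 1, 9),
    date(2017, 1, 15),
    date(2017, 1, 23),
    date(2017, 1, 30),
]

# One week after the last purchase: a perfectly ordinary absence.
print(absence_probability(purchases, date(2017, 2, 6)))
# Three weeks after the last purchase: far longer than usual.
print(absence_probability(purchases, date(2017, 2, 20)))
```

Where to draw the line between an at-risk and a lost customer is a business decision; the model only tells us how unusual the absence is.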

In Qlik, we can use the t-test and normal distribution functions to create statistical models to predict purchase frequency.  I explain both methods in Mastering QlikView Data Visualization. The exponential distribution may also be a useful statistical model that predicts purchase frequency.  There is no native function in Qlik to calculate the exponential distribution, but we can easily create one in Qlik by “mathing” the hell out of it.

Exponential Distribution Overview

Next week I will explain the exact formulas and functions we use to create an exponential distribution.  For now, let’s review when we can use an exponential distribution and what we can expect it to tell us about customer behavior.

If customers make purchases at a constant rate, and if each purchase is independently likely to occur at that rate, then an exponential distribution may be the right fit to predict a customer’s probable purchase frequency. For example, if we expect customers to make at least one purchase every day and if each day’s purchase only depends on the absence of some random event like a hurricane, an earthquake, a sickness, a labor strike, an accident, etc., then we may be able to reliably detect unusual customer inactivity with an exponential distribution.

The following QlikView chart shows a statistical model based on an exponential distribution that predicts how consistently I do at least 15 minutes of exercise daily.

In the previous chart, the x-axis shows the number of days that have passed or will pass without exercising. The y-axis shows the probability that a certain number of days have passed or will pass since the last time I exercised. I refer to what the axes measure in both past and future tense because this chart measures both past and predicted behavior.

The blue bars represent the past behavior and are the actual data I’ve collected about how frequently I exercise. The actual data is based on my behavior during one month. During that month and according to the chart, I’ve exercised daily 80% of the time and I’ve missed 1 day 15% of the time. I’ve also missed 4 or 5 days 5% of the time. In this way, I can visually confirm that my exercise consistency resembles an exponential curve and that I can use the exponential distribution to predict my future behavior.

The two lines represent the statistical model that predicts my future behavior. The exponential distribution (blue line) is the probability that I will exercise after any given number of days since the last time I exercised. This line will approach 0 as it continues to the right. The accumulated exponential distribution (green line) is the probability that I will have exercised after any given number of days or fewer since the last time I exercised. This line will approach 1 as it continues to the right.
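
For readers who want to see the shape of these two lines before we get to the Qlik formulas, here is a minimal Python sketch of the textbook exponential distribution. It is not the Qlik implementation. The function names are my own, and estimating the rate as 1 divided by the average number of days between events is an assumption made for the example. Strictly speaking, the first function returns a probability density rather than the probability of an exact day count.

```python
import math


def exponential_pdf(days, rate):
    """Density of the exponential distribution at `days` (falls toward 0)."""
    return rate * math.exp(-rate * days)


def exponential_cdf(days, rate):
    """Probability that the event happens within `days` or fewer (rises toward 1)."""
    return 1.0 - math.exp(-rate * days)


# Assumed for illustration: on average 1.25 days pass between events.
mean_days_between_events = 1.25
rate = 1.0 / mean_days_between_events

for days in range(0, 6):
    density = exponential_pdf(days, rate)
    cumulative = exponential_cdf(days, rate)
    print(f"{days} days: density={density:.3f}, cumulative={cumulative:.3f}")
```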

With the previous explanations in mind, we can make the following interpretation. In the month that I collected my behavioral data I never exercised exactly 3 days after the last time I exercised. However, according to the statistical model, there is a 15% probability that in the future I will exercise after missing exactly 3 days and almost a 90% probability I will exercise after missing 3 or fewer days.

You can apply this same model to customer purchasing frequency and customer churn. For example, if a customer hasn’t purchased anything in the last 5 days and the probability that he would make a purchase in 5 or fewer days is between 70% and 95%, then we classify him as an at-risk customer. If the probability that he would make a purchase in 8 or fewer days is 95% and he continues to be absent, then we classify him as a lost customer after 8 days. I’ve used days in this example, but you can also use minutes, hours, weeks, or months.
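
The classification rule above can be sketched in a few lines of Python. The 70% and 95% thresholds come from the example. The function name, the "active" label, and the choice to estimate the rate from the customer's average days between purchases are assumptions for illustration, not the Qlik formulas.

```python
import math


def classify_customer(days_absent, mean_days_between_purchases,
                      at_risk_threshold=0.70, lost_threshold=0.95):
    """Label a customer from the accumulated exponential distribution.

    The probability used is the chance that a purchase would already
    have happened within `days_absent` days or fewer.
    """
    rate = 1.0 / mean_days_between_purchases
    probability = 1.0 - math.exp(-rate * days_absent)
    if probability >= lost_threshold:
        return "lost"
    if probability >= at_risk_threshold:
        return "at-risk"
    return "active"


# Hypothetical customer who buys every 2.5 days on average.
for days in (2, 5, 8):
    print(days, classify_customer(days, 2.5))
```

With an average of 2.5 days between purchases, this hypothetical customer becomes at-risk by day 5 and lost by day 8, which mirrors the example in the text. A different average shifts both cutoffs.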

We’ll review the exact formulas and functions to calculate and visualize the exponential distribution next week.

One more thing…

Congratulations to all those selected to be 2017 Qlik Luminaries. A special congrats to my team member Julian Villafuerte (a.k.a Qlik Freak) who has done some exceptional work over the past year.

Let the Chain of Inspirations Continue

As Qlik Sense comes of age, I anxiously wait for the day when we will talk about the good ol’ QlikView days when we used to make map charts using a background image and a scatterplot chart. In the meantime, I’ve been finishing up one last, great QlikView adventure with Mastering QlikView Data Visualization and seeing how far ol’ QlikView can still go. Although we have not seen a major update to its native visualizations in 5 years, I’ve been pleasantly surprised that there are still a few new tricks to be learned and boundaries to be pushed.

Stephen Redmond was the first to make a cookbook on tips and tricks, and he inspired us to look for ways to squeeze the most out of QlikView. More recently, I was inspired by a QlikFix blog post Barry Harmsen wrote on macros to create my own macro that generates design layout grids, and I called it the QlikView Grid System Tool. In turn, Barry was inspired to create an improved GridMaker. (Yes, he even improved upon the name.)

That, and to add insult to injury, he has an uncanny knack for making the most hilarious memes. I’ll leave the meme business to my fellow consultant Qlik Freak Julian Villafuerte, but as far as QlikView is concerned, I thought I’d return the favor and continue the chain of inspiration.

While writing my book, I was recently inspired by an excellent QlikFix blog post written by Frédérique Verhagen about creating bar chart target lines in QlikView.  This post was the piece of the puzzle I was missing to finally create a single object, native bullet graph in QlikView.  If you convert Frédérique’s bar chart into a combo chart with stacked bars, an error bar and a stock chart expression then you have a native bullet graph that is as robust as any other normal QlikView chart.

BulletGraph

By the way, you can learn the trick on how to add a stock chart expression in Stephen Redmond’s cookbook.  I’ve left the rest of the details in my book because otherwise Packt will blame me for plagiarizing.  No kidding, they’ve already done it once.  I bet some of you can figure it out by yourselves, so I’ll also keep the rest of the tricks for the book.

And of course, the bullet graph isn’t perfect.  It would be impeccable if only we could change the line width of the stock expression.  If you like the idea then vote to add this feature in this Qlik Community idea. Though, in all honesty, what I’m really hoping to do is to inspire one of you to take it that little bit further.

Karl

Data Visualization for Accounting and Unicorns

Recently, I’ve been reviewing the financial analysis part of my new book and I’m reminded how hard it is to create data visualization for an area obsessed with calculating every amount to the exact cent.  When we develop our visual analysis it is so easy to get trapped in this labyrinth of detailed tables and numbers.  We hit our heads against the wall and try to imagine one omniscient visualization that captures every detail and we only succeed in creating something more impenetrable than the table we were trying to replace.  So, let’s step back and think about what we need to do to create a successful visual analysis for an accounting department.

Data Discovery in QlikView – Part 2 – Easier Data Integration and Modeling

Qlik Sense and QlikView are two data discovery tools that make it easy to go from raw data to data visualization. This is in contrast to Tableau, which is limited to combining data tables with joins. If your data source is, for example, an OLTP system that contains a long list of tables with complex relationships, you will probably have to invest time to transform and model the data in another tool before you can use Tableau for data discovery.

The couple of times I tried to implement Tableau for a company, I had to work all night in SQL Server Express to create a data model that made Tableau easy to use. It was this experience that made me realize that data discovery does not depend on great data visualization alone. It also depends on being able to easily extract, transform, and model data.


QlikView Grid System Tool

When we work with data visualization in Qlik Sense, we work within a grid system. Even though Qlik Sense gets some flak because it does not allow you the freedom to escape the grid, using a grid system is a good practice to follow. We should even apply the same practice when we design QlikView applications.

Data Discovery in QlikView – Part 1 – Agile Implementation

Lately, consultancies with little QlikView experience have asked me to review the feasibility of using QlikView for a variety of projects. It was obvious after only a quick glance at the projects’ goals that they did not take into consideration the strengths and necessities of QlikView. I had come to believe people understood the concept of data discovery and that we were past the idea that QlikView was just a quick reporting tool, but I was mistaken.

Many still believe QlikView only stands for fast implementation time, ease of use and a visual display.   They try to adapt what they understand of BI to what they’ve heard about QlikView. Of course, you can’t blame them because we humans naturally interpret new information based on past experiences.

So, I’ve decided to write a series of blog posts that explain the strengths of QlikView so that we can understand how to use it effectively in our organizations.  I will then conclude the series by detailing the reasons why QlikView projects sometimes go awry.  We’ll add an extra part in each post about how Qlik Sense may or may not change how we use QlikView.

First, let’s explain the concept of data discovery and how we should go about implementing data discovery projects.

Learning QlikView Data Visualization – SVG Map Extension Download Update

Since Learning QlikView Data Visualization was published, the URL to download the SVG map extension created by Brian Munz and highlighted in chapter 7 has changed to https://github.com/brianwmunz/svgReader-QV11.

If you have any questions about the content of the book, don’t hesitate to ask.

See you around,

Karl