K-Means Clustering with Qlik

By Stiliyan Neychev

April 27, 2022

With the Qlik Sense September 2020 update just around the corner, we want to talk about what we believe is its most exciting new feature: Advanced Analytics calculations on the front end. The first function available will be k-means clustering, and we had some time to test it in the Technical Preview. Today we’ll take a more detailed look at its use cases across various areas of work, and then show how you can incorporate it into your Qlik Sense dashboards. But first…

What is K-Means Clustering?

Just to make sure we’re all on the same page, k-means clustering is an unsupervised machine learning algorithm, meaning it works on unlabeled data. Its goal is to find groups within your data, and the number of groups is set in advance by the parameter K. In short, the algorithm starts from K cluster centers (centroids), then repeatedly assigns each data point to its nearest centroid and moves each centroid to the mean of its assigned points, until the assignments stop changing. Now that we have that out of the way, let’s see which business use cases these advanced analytics can be applied to.
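To make the iterative assignment described above more concrete, here is a minimal sketch in plain Python. It illustrates the general algorithm only and is not Qlik’s implementation. The function name kmeans, its parameters, and the sample points are our own assumptions, and seeding the centroids with the first K points is a simplification (real implementations usually seed randomly or with k-means++).

```python
import math


def kmeans(points, k, max_iterations=100):
    """Group `points` (tuples of equal length) into k clusters.

    Returns (labels, centroids). Illustrative sketch, not Qlik's code.
    """
    # Assumption: seed the centroids with the first k points for simplicity.
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we are done
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels, centroids


# Made-up sample data: two visibly separate groups of 2D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Even though both starting centroids fall inside the first group here, the assign-and-update loop pulls one of them over to the second group within two passes.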

Use Cases for K-Means Clustering in Businesses

Currently, k-means clustering is most often found in marketing departments, where it helps segment customers into groups based on purchase history, interests, activity monitoring, etc. It’s also often used to group documents into clusters based on commonly used terms, tags, content, and so on. However, there are plenty of other exciting use cases in which you can apply the algorithm, like:

  • Healthcare – k-means clustering can be used to group areas based on factors like frequency of emergencies, susceptibility to illness, the average age of residents, etc. Then, by finding the center of each cluster, you get a candidate location for a healthcare center that is close to all the areas it needs to serve. Furthermore, this type of unsupervised learning can be used to identify similarities between patients based on their characteristics in order to explore treatment costs and methods.
  • Financial Services – as previously mentioned, this algorithm is often used to segment customers into different groups. So, it’s only logical that it’d be a recommended approach for banking institutions, which need to group their customers based on information like purchasing habits or personal income. This, along with data mining, can provide them with accurate data, which they can use to improve their services and offer more personalized packages.
  • Cybersecurity – machine learning is already widely used in fraud detection thanks to the available historical data on fraudulent claims. K-means clustering can use this information to assess new claims and estimate whether they’re genuine based on their proximity to clusters of fraudulent claims. Additionally, this algorithm can be used to cluster IT alert messages. With a little bit of training, it can set priorities based on affected units, provide average issue resolution times based on historical data, and even help predict failures or malfunctions.
  • Logistics – with the help of publicly available traffic and ride information data, we can use k-means clustering to find optimal travel stops, routes, etc. This can benefit not only taxi services but also delivery services, since we can gain insight into current traffic patterns, the most common pickup locations, and more. The process can be optimized further by combining the algorithm with a real-time information provider like a drone.
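Several of the use cases above come down to the same step: once clusters exist, a new data point is compared against the cluster centers. The sketch below shows that step for the claims example. It is a hedged illustration only: the function nearest_centroid, the two features, and all of the numbers are hypothetical, and we assume an analyst has already decided which cluster represents suspicious claims, since k-means itself attaches no meaning to its groups.

```python
import math


def nearest_centroid(point, centroids):
    """Return (index, distance) of the centroid closest to `point`."""
    distances = [math.dist(point, c) for c in centroids]
    index = min(range(len(centroids)), key=distances.__getitem__)
    return index, distances[index]


# Hypothetical centroids from an earlier clustering run over
# (claim amount in thousands, days since the policy started).
centroids = [(2.0, 400.0), (45.0, 12.0)]
SUSPICIOUS_CLUSTER = 1  # assumption: labelled by an analyst, not by k-means

new_claim = (40.0, 15.0)
cluster, distance = nearest_centroid(new_claim, centroids)
needs_review = cluster == SUSPICIOUS_CLUSTER
print(cluster, round(distance, 2), needs_review)
```

In practice the features would usually be scaled first, because otherwise the one with the largest numeric range dominates the distance.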

These were just a few examples of how you can take advantage of k-means clustering. Regardless of your business, you’ll be able to gain more insight and improve your decision-making process. But with all this said, let us show you how to implement this in Qlik Sense this September.

K-Means Clustering in Qlik Sense

In the latest Qlik Sense update, k-means clustering will come in the form of four functions: KMeans2D, KMeansND, KMeansCentroid2D, and KMeansCentroidND. We’ve done a little bit of testing through the Technical Preview, and they’re straightforward to use. Let us show you.

Let’s start with the KMeans2D function, because it’s the easier one to grasp. It’s called 2D because it covers only two dimensions, which is ideal for standard charts since you can easily visualize the outcome. For each of the two dimensions, you provide one set of values, and Qlik Sense then groups the resulting data points into the number of clusters you specify.

In the screenshot above, we can see how this function groups the data. We’ve added a bit of varying color for clearer visibility, so you can see how the algorithm groups the data points based on their proximity to each other. Note that the light orange “Deli” group is actually the third one and not the fifth, as it may seem at first glance.

In the other function – KMeansND – you can specify the number of dimensions, but keep in mind that anything beyond three dimensions will be tough to conceptualize for a human brain. The parameters are filled in a similar way to the previous function.

As you can see in the screenshot, this results in Qlik clustering on your expressions in the order you provide them. However, since the chart is still only 2D, one of the expressions won’t be represented. We chose to plot Sales on the Y-axis and the products we have in Inventory on the X-axis. If we could make the chart 3D, Lead Time would be represented along the Z-axis.
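One consequence of plotting a three-dimensional clustering on a 2D chart is that two points can sit right next to each other on screen and still belong to different clusters, because the hidden expression separates them. The sketch below illustrates this in plain Python. The centroids, product names, and values are made up for illustration, and the assign helper is our own stand-in for the assignment step, not a Qlik function.

```python
import math

# Hypothetical cluster centers over (inventory, sales, lead_time).
centroids_3d = [(50.0, 200.0, 2.0), (50.0, 200.0, 20.0)]


def assign(point, centroids):
    """Index of the centroid nearest to `point`, using every coordinate."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


# Two made-up products with almost identical inventory and sales,
# but very different lead times.
products = {
    "A": (48.0, 205.0, 3.0),
    "B": (49.0, 204.0, 18.0),
}

# The cluster is decided in 3D...
clusters = {name: assign(p, centroids_3d) for name, p in products.items()}

# ...but the chart only shows inventory (X) and sales (Y).
chart_xy = {name: p[:2] for name, p in products.items()}

print(clusters)  # {'A': 0, 'B': 1}
print(chart_xy)  # {'A': (48.0, 205.0), 'B': (49.0, 204.0)}
```

On the chart, A and B would be near neighbors with different colors, which is expected rather than a sign that the clustering went wrong.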

So, you might be asking yourself, “If this is a function, then can’t I use it in all of Qlik’s charts?” To that, we’ll answer with a big ol’ “Yes!” The best way to visualize clusters is with a Scatter Plot chart, but this doesn’t mean that you can’t use a Table to list all the groups instead. We’ll leave the experimenting to you, though, and we’re eager to see how you use this new feature.

Conclusion

We hope you’re as excited as we are for this new Qlik Sense update. If you need more details on the new features that will be added, you can check out the Qlik Community. So, what would you use this powerful algorithm for, now that you have some use cases in mind?