Modeling Patient Outcomes with Classification Algorithms

Ever since the pandemic began, the healthcare sector has been under pressure to quickly provide accurate data and treatment protocols. Honestly, 14th-century doctors would probably envy our modern systems and speedy approaches to crises. However, having access to all the data isn’t what makes the difference; it’s how you use it or, in this case, how fast it can be analyzed. Now we’re at a point where patient data is kept private, limiting the number of organizations that can apply their expertise. We believe that patient data should be shared one way or another. So stick around to see all the benefits this could bring. For future context, though, let’s first check out …

What are Classification Algorithms?

Once again, we’re diving into machine learning, so we can enhance our dashboards and utilize our tools to their full potential. This time we needed a supervised learning algorithm, and classification is one of the most common kinds. You can very much look at this as a form of pattern recognition. In short, the computer program inspects a provided set of already classified data and learns from it. Then it uses that knowledge to classify new data into preset categories. Generally, the more representative examples you train your algorithm on, the more accurate its groupings will be. A simple example of this would be your emails being sorted into spam and non-spam.
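
To make the spam example concrete, here is a minimal sketch of a classifier that learns from a handful of labeled messages and then sorts new ones. It is a toy word-count (naive Bayes) model written for illustration, not the approach used in our dashboard; the training messages and the function names train and classify are made up.

```python
import math
from collections import Counter

def train(examples):
    """Learn per-label word counts from (text, label) pairs."""
    word_counts = {}          # label -> Counter of words seen under that label
    label_counts = Counter()  # label -> number of training messages
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocab

def classify(model, text):
    """Return the label whose training messages best match the words in text."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Start from how common the label is, then add evidence word by word.
        score = math.log(label_counts[label] / total)
        n_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Made-up training data: already classified messages.
training_data = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("meeting agenda for monday", "not spam"),
    ("lunch with the team on monday", "not spam"),
]
model = train(training_data)

print(classify(model, "claim your free prize"))
print(classify(model, "agenda for the team meeting"))
```

With only four training messages this is far from reliable, which is exactly why the amount and quality of training data matter.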

Classification algorithms come in several different forms, but we won’t go into too much detail there. However, if you wish to read up a bit more about them, you can check out this short article. In general, this type of supervised machine learning is used quite frequently in a multitude of fields. One example is banks using it to analyze their clients and determine whether they’ll default on a loan or not. Marketers and advertisers rely on machine learning to classify customers, predict user churn rates, and much more. Another interesting example is healthcare organizations utilizing algorithms to increase diagnostic accuracy, identify at-risk patients, predict patient readmission rates, etc. Speaking of which, let’s now move to our application of classification algorithms within our COVID dashboard.

Classification Algorithms in our COVID Dashboard

Predicting a Patient’s Condition

Our previous article talked about our COVID dashboard project, how we used publicly available data to make insightful visualizations, and how we took advantage of some advanced analytics to get more forecasting out of it. Well, we wanted to expand our app a bit more and include a new sheet that tries to answer one of this year’s top questions: which other illnesses, when they accompany COVID, could lead to undesirable outcomes?

To do that, first, we had to acquire some patient data, which proved more difficult than anticipated. However, in the end, we managed to obtain data for about 35,000 patients from a public research portal we found. From there, we were able to train our algorithm to determine how a new patient with similar conditions would fare. This was made possible by observing many different factors, such as when the patient registered at the medical center, what symptoms they were reporting, what other illnesses they suffered from, etc.

Initially, the only model we trained was Logistic Regression, one type of classification algorithm, but we soon added several more, like Random Forest, Stochastic Gradient Descent Classifier, and a few others. This switch was inspired by the desire to observe each new case from several different “viewpoints.” In supervised learning, you definitely need to observe the algorithms’ results in order to determine how much more training they require. We’ve kept an eye on them and recorded their outputs in the form of “False Positive” and “False Negative” values, that is, cases the model flagged incorrectly and cases it missed. Anyway, all you really need to know about them is that the lower the numbers are, the more trustworthy the algorithm will be.
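
For readers curious how such values are counted, below is a minimal sketch that compares two models by their false positives and false negatives. The labels and predictions are made up, the names model_a and model_b are placeholders, and treating 1 as the "poor outcome" class is an assumption for illustration.

```python
def error_counts(y_true, y_pred, positive=1):
    """Return (false_positives, false_negatives) for one model's predictions."""
    pairs = list(zip(y_true, y_pred))
    # False positive: predicted positive, but the actual label was not.
    fp = sum(1 for actual, pred in pairs if pred == positive and actual != positive)
    # False negative: actual label was positive, but the model missed it.
    fn = sum(1 for actual, pred in pairs if pred != positive and actual == positive)
    return fp, fn

# Made-up ground truth and predictions (1 = poor outcome, 0 = recovery).
actual = [1, 0, 0, 1, 1, 0, 0, 1]
predictions = {
    "model_a": [1, 0, 1, 1, 0, 0, 0, 1],
    "model_b": [1, 1, 1, 1, 1, 0, 0, 0],
}

results = {name: error_counts(actual, preds) for name, preds in predictions.items()}
for name, (fp, fn) in results.items():
    print(f"{name}: false positives={fp}, false negatives={fn}")
```

In a medical setting the two error types rarely cost the same, since a missed at-risk patient is usually worse than a false alarm, so it helps to track them separately rather than as a single accuracy figure.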

In order to be able to predict a patient’s condition reliably, we had to take into account that not every accompanying factor has the same impact. To deal with this, we had our algorithms consider parameter importance by assigning values to everything from age and gender to accompanying illnesses like pneumonia or diabetes. Unlike the error values above, the higher the percentage shown, the worse it is for patients who have COVID. How about we explain what we mean with an example? If we have a 90-year-old man infected with COVID but no other illnesses, then our model would predict that he’ll live. However, if we toggle that he’s also developed pneumonia, some of the algorithms would show that he’d be in trouble.
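
The toggle in that example can be sketched with a logistic-regression-style score, where each factor contributes according to its weight. The weights, the bias, and the feature names below are invented for illustration and are not the values our trained models learned; they are chosen only so that the sketch mirrors the 90-year-old example.

```python
import math

# Hypothetical weights, made up for this sketch (not from the dashboard's models).
WEIGHTS = {"age_decades": 0.25, "pneumonia": 2.0, "diabetes": 0.8}
BIAS = -3.5

def risk(patient):
    """Return a risk estimate between 0 and 1 from a weighted sum of factors."""
    z = BIAS + sum(weight * patient.get(name, 0) for name, weight in WEIGHTS.items())
    return 1 / (1 + math.exp(-z))  # logistic (sigmoid) function

# A 90-year-old with no accompanying illnesses.
patient = {"age_decades": 9, "pneumonia": 0, "diabetes": 0}
without_pneumonia = risk(patient)

# The same patient with pneumonia toggled on.
with_pneumonia = risk({**patient, "pneumonia": 1})

print(f"without pneumonia: {without_pneumonia:.0%}")
print(f"with pneumonia:    {with_pneumonia:.0%}")
```

Under these assumed weights the estimate stays below 50% without pneumonia and rises above it once pneumonia is added, because the pneumonia weight is large relative to the others.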

Taking Advantage of the Data

Let’s make one thing clear now. This algorithm isn’t omnipotent, so it doesn’t have the power to say for certain how a patient’s condition will develop; it’s just a mathematical construct, after all. In our case, the algorithm was trained with the help of those 35,000 data entries, and thus it points out only the trends it managed to observe from them. However, there will always be exceptions and things we didn’t take into account.

If you have a quick chat with any doctor, you’ll notice that they consider more than 5-10 parameters when reaching a verdict for a patient. They’ve got loads of experience to draw from, which is something algorithms, in general, are still struggling to emulate. So, why do we make them then? Most often, it’s to better understand how specific treatments or medications are performing. However, another great reason would be to reinforce the professionals’ verdicts with actual data. With these dashboards, doctors can draw on millions of cases instead of the thousands they would usually see over their careers.

The presence of publicly available patient data can be a game-changer when it comes to determining how medics should deal with an illness. Currently, the lack of a globally accepted framework for sharing patient data between hospitals is a real issue. As far as we can tell, the WHO (World Health Organization) has a way to collect such data, but it’s unavailable to most, if not all, analysts. From what we’ve gathered, research – clinical or otherwise – on this topic is closed off. And that’s not necessarily good since this COVID pandemic has shown us that many people and companies are quite eager to create and share useful dashboards that can help inform everyone.

Back in the day, doctors gathered at medical conventions to share their observations and exchange experience. With the help of the internet, meeting in person is no longer necessary, and information could be shared with the snap of a finger if we so desire. If the problem lies in the data’s sensitivity, then maybe an anonymized version could be considered, or a site that checks your identity, credentials, and purpose before giving you access. Of course, that’s one course of action; another helpful one would be to make it common practice for hospitals to share patient data with each other.

By contributing to public knowledge, hospitals can help speed up the creation of health protocols for new diseases and perfect the protocols for the known ones. Through teamwork, we could hasten operations and take the next step forward sooner. We’re all for this advancement, and we’re ready to assist any medical or pharma company in making dashboards with insightful visualizations, advanced analytics, and intuitive design, such as our COVID dashboard. We’ve got the experience and the tools; all we need is the right data for the job.

Conclusion

The available public registers allow us to form a business response to this epidemic and any future ones, while models such as the one we covered here can help keep medical professionals up to date. It’s crucial for doctors to be aware of the latest data since that can minimize errors in judgment and bring clarity worldwide in the face of a new disease. Medical centers should consider making patient data sharing a common occurrence since that can help propel the medical field forward. What’s your stance on public access to patient data?

Dimitar Dekov
Stiliyan Neychev