Predicting Heart Disease Diagnoses with Machine Learning

As an interesting exercise I decided to do some machine learning analysis on an old-ish dataset on heart disease.

The goal of this exercise was to train a machine learning model on this dataset that accurately predicts whether a sample patient has been diagnosed with heart disease.

I’m a software developer and not a doctor or any kind of medical professional, so it’s important I stress this is only an exercise on machine learning. But I am interested to see how accurate a model we can come up with given this dataset.

According to the web page, the 13 attributes of each sample are:

The weird thing you might notice about the set is that the patient’s sex is recorded as a boolean, with no definition of which sex corresponds to 0 and which to 1. Is female 1? If I have missed this detail in the web page, please leave a comment below!

In any case, we still have a good range of quantifiable physical attributes to work with, and 270 is a solid enough number of records to train a machine learning model on. So going into this exercise, I’m confident we can come up with some accurate classifications.

I used a Support Vector Machine (SVM) for classification, with a radial basis function (RBF) kernel, exp(-gamma |u-v|²). SVMs are supervised learning models commonly used in classification and regression analyses.
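As a rough illustration, the RBF kernel formula can be written in a few lines of plain Python. This is only a sketch of the formula, not the code used for the results in this post; the function name rbf_kernel and the gamma value in the usage lines are assumptions chosen for the example.

```python
import math


def rbf_kernel(u, v, gamma):
    """Return exp(-gamma * |u - v|^2) for two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    # Squared Euclidean distance between the two samples.
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)


if __name__ == "__main__":
    # Hypothetical two-feature samples; gamma=0.5 is an arbitrary choice.
    print(rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma=0.5))
    print(rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.5))
```

The kernel equals 1 for identical samples and falls towards 0 as samples move apart, with gamma controlling how quickly that happens.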

Naive Bayes would likely have been a lot faster and less resource-intensive, but it’s not appropriate in this case, because it assumes that all features in a sample are independent of one another (hence ‘Naive’). That is clearly not going to be the case in any kind of medical dataset.

My test consisted of 10,000 iterations in my code. Each iteration created a new stratified random split of the dataset, with a 90/10 split of train/test samples, trained a fresh model on the training samples, and recorded the accuracy of its predictions on the test samples.
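The repeated stratified split can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the original code: the names stratified_split and repeat_evaluation are hypothetical, and evaluate stands in for whatever callback trains a fresh model on the training indices and returns its accuracy on the test indices.

```python
import random
import statistics
from collections import defaultdict


def stratified_split(labels, test_fraction=0.1, rng=None):
    """Return (train_indices, test_indices), keeping class proportions."""
    rng = rng or random.Random()
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train, test = [], []
    for indices in by_class.values():
        shuffled = indices[:]
        rng.shuffle(shuffled)
        # Take the same fraction of every class for the test set.
        n_test = round(len(shuffled) * test_fraction)
        test.extend(shuffled[:n_test])
        train.extend(shuffled[n_test:])
    return train, test


def repeat_evaluation(labels, evaluate, iterations, seed=0):
    """Run `evaluate(train, test)` on fresh splits; return (mean, std dev)."""
    rng = random.Random(seed)
    scores = []
    for _ in range(iterations):
        train, test = stratified_split(labels, 0.1, rng)
        scores.append(evaluate(train, test))
    return statistics.mean(scores), statistics.pstdev(scores)
```

Stratifying by class keeps the ratio of diagnosed to undiagnosed patients roughly the same in every train and test set, so no single iteration is scored on an unrepresentative sample.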

I normalised the 13 attributes in the dataset using the L2 norm, scaling each individual sample vector to have unit norm, before feeding them into the code.
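Per-sample L2 normalisation divides every attribute of a sample by the Euclidean length of that sample. A minimal sketch in plain Python, assuming each sample is a list of numbers; the function name l2_normalise and the example values are illustrative only:

```python
import math


def l2_normalise(sample):
    """Scale one sample so that its Euclidean (L2) norm is 1."""
    norm = math.sqrt(sum(x * x for x in sample))
    if norm == 0.0:
        # An all-zero sample has no direction; return it unchanged.
        return list(sample)
    return [x / norm for x in sample]


if __name__ == "__main__":
    # Hypothetical two-attribute sample, not taken from the dataset.
    print(l2_normalise([3.0, 4.0]))
```

Because the scaling is applied per sample rather than per attribute, attributes with large raw values contribute most to the normalised vector.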

This resulted in a model that predicted heart disease diagnoses with 83.9% accuracy, with a variance of 0.5% across the 10,000 iterations.

To summarise:

I find exercises like this to be a fun and challenging way of identifying machine learning methods that can be leveraged outside the world of software development and data science.

Given a larger dataset and more time to mine it with different machine learning models, I’d like to see higher accuracy than this.

Yes, 84% is a relatively high degree of accuracy, but in matters of literal life or death, we need to aim higher.

The concern is not so much falsely warning a healthy patient that they are likely to have heart disease; it is the converse, telling a patient who has heart disease that they are healthy. Such models should only be used as an indicator.
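One way to make that concern measurable is to count missed diagnoses separately from false alarms, rather than relying on overall accuracy alone. The sketch below is illustrative only: the function names are hypothetical, and it assumes the label 1 marks a patient diagnosed with heart disease, which may not match how the dataset encodes its classes.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (true_pos, false_pos, true_neg, false_neg) counts."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1  # healthy patient falsely warned
        else:
            if a == positive:
                fn += 1  # diagnosed patient missed by the model
            else:
                tn += 1
    return tp, fp, tn, fn


def sensitivity(actual, predicted, positive=1):
    """Fraction of truly positive patients that the model flags."""
    tp, _, _, fn = confusion_counts(actual, predicted, positive)
    if tp + fn == 0:
        return float("nan")  # no positive patients to measure against
    return tp / (tp + fn)
```

A model can score well on accuracy while still missing a meaningful share of diagnosed patients, which is why sensitivity is worth tracking alongside it.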

Still, it is pretty amazing that we can predict a heart disease diagnosis with just a few lines of code and 270 sample records, with fairly good accuracy.

Please download the dataset linked above and try a similar exercise yourself. I’d like to see someone do better than me!
