A.I. and Racehorses: Can you teach a computer to understand what a 'good cardio' looks like?
I have been working on a few data science projects over the past few months, mainly to do with improving the current algorithms that we use to predict outcomes for Performance Genetics and BreezeupIQ. One of the more interesting aspects that is close to completion, hopefully within the next week, is an approach that I believe is a world-first, at least in terms of the Equine field.
When I ultrasound the heart of a racehorse as part of my sales selection technique, be that a yearling, two year old or older horse, in addition to measurements that I take of the heart, I capture a 10 second video loop of the heart as it beats. I have been gathering this video data for over 7 years now so I have a lot of samples of both fast and slow racehorses, but the use of the video was limited to my own perception of what a 'good cardio' looked like, and it wasn't something that I used as data input in my database for my machine learning algorithm to learn from.
Using the 10 second video loop, I have now been able to train an Artificial Neural Network, specifically a Convolutional Neural Network, to recognize the difference between the different types of cardiovascular parameters that exist in the thoroughbred and also the difference between the cardiovascular parameters found in fast and slow racehorses. I am pretty sure that nobody else in the field that measures cardiovascular parameters is using this approach and I am also pretty sure that nobody in the veterinary field is using Artificial Neural Networks on any video captured in this way.
The above paper, which can be read here and which appeared on arXiv towards the middle of last year, inspired me to try using a computer vision algorithm to classify the cardiovascular video. For me at least, it was a 'proof of concept' from the medical field of using images (not video) of cardiovascular measurements to predict disease state, so extrapolating this to video seemed at least possible.
After reading the paper, I did a little bit of digging around on GitHub and came across this repo on video classification using Keras and Tensorflow (Google's open-source machine learning framework).
This linked to a blog post on Medium by the repo author that outlined the different approaches to using neural networks and videos for prediction of outcomes.
The obvious advantage of this approach, that is, training a computer to understand equine cardiovascular parameters, is that it is objective rather than subjective. If you provide enough data, and label it properly, an algorithm can be trained to predict future outcomes without any subjective human input. Rather than me saying "this looks like a nice cardio", a properly trained computer vision pipeline would more accurately determine exactly what the cardio measured was.
To date, I have measured cardios in the 'traditional way' and been as objective as possible with the data gathered. That is, using an ultrasound I take a 10 second M-mode video clip of the left ventricle of the heart and then measure out the cardio to gather the following measurements of the chamber:
IVSd - Interventricular septal end diastole
LVIDd - Left ventricular internal diameter end diastole
LVFWd - Left ventricular free wall diameter during diastole
IVSs - Interventricular septal end systole
LVIDs - Left ventricular internal diameter end systole
LVFWs - Left ventricular free wall diameter during systole
Basically these are the measurements of the walls of the chamber and the internal diameter of the chamber when it is at maximal fill and maximal contraction. From these measurements I can also generate calculated functional indices such as Left Ventricular Mass (LVMass), Fractional Shortening (FS), Ejection Fraction (EF), Stroke Volume (SV) and Cardiac Output (CO).
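As a rough illustration of how indices like these can be derived from the raw measurements, here is a minimal sketch. It assumes the Teichholz volume estimate and the cube (Devereux) formula for LV mass, both of which come from human echocardiography; the formulas I actually use are not spelled out in this post, and the function names, units (cm, mL, g) and example values are made up for the example.

```python
def teichholz_volume(diameter_cm):
    """Teichholz estimate of ventricular volume (mL) from an internal diameter (cm)."""
    return 7.0 / (2.4 + diameter_cm) * diameter_cm ** 3


def cardiac_indices(ivsd, lvidd, lvfwd, lvids, heart_rate_bpm):
    """Derive functional indices from M-mode measurements given in cm.

    Assumed formulas (not necessarily the ones used for the models in this post):
    FS from the change in internal diameter, EF/SV from Teichholz volumes,
    CO from SV and heart rate, LV mass from the cube formula.
    """
    edv = teichholz_volume(lvidd)  # end-diastolic volume
    esv = teichholz_volume(lvids)  # end-systolic volume
    sv = edv - esv                 # stroke volume in mL
    return {
        "FS_pct": 100.0 * (lvidd - lvids) / lvidd,
        "EF_pct": 100.0 * sv / edv,
        "SV_ml": sv,
        "CO_l_min": sv * heart_rate_bpm / 1000.0,
        "LVMass_g": 0.8 * 1.04 * ((ivsd + lvidd + lvfwd) ** 3 - lvidd ** 3) + 0.6,
    }


# Hypothetical measurements, for illustration only.
example = cardiac_indices(ivsd=3.0, lvidd=10.0, lvfwd=2.5, lvids=6.0, heart_rate_bpm=36)
for name, value in example.items():
    print(f"{name}: {value:.1f}")
```

Dividing a value such as LVMass_g by an estimated body weight would then give a scaled variable along the lines of LVMassbyWeight.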
Additionally, I take body measurements of the horse using a tape and use these measurements to generate a height, weight, body length and body surface area of the horse. These latter measurements can then be used to scale the cardio by sex, so something like LVMassbyWeight (Left Ventricular Mass divided by Estimated Body Weight) becomes a variable for consideration. For the past seven years all of this data has been entered into my SQL database, and I then use a machine learning algorithm, specifically XGBoost, to determine which variables are important and to create an algorithm that is trained to predict the probability of a horse being an elite runner.
So, having gathered a lot of data and used the best possible machine learning algorithm (XGBoost is the 'go to' machine learning algorithm for most data scientists) to predict the outcomes, what is the "best possible accuracy" of any algorithm using this technique?
If we use a balanced dataset, that is, one with as many fast horses as slow ones, the accuracy of this algorithm is quite good but open to considerable improvement. A coin flip has a probability of 0.5, or 50%, so a model with an Accuracy of 0.5 would be no better than looking at a yearling at a sale and flipping a coin to say 'yes' or 'no', and so of no real use. A model with an Accuracy of 1.00 would perfectly predict all the elite horses as elite and all the non-elite as non-elite. That is an impossible task, given that horses only have to beat who turns up on a given day, and that what you define as 'elite' can have a considerable impact on the model's ability to accurately describe an elite racehorse.
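To make the coin-flip baseline concrete, here is a small sketch of how Accuracy is computed on a balanced dataset. The labels and predictions are made up for the example and are not drawn from my data.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Balanced toy dataset: 1 = elite, 0 = non-elite (made-up labels).
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
always_elite = [1] * 10                      # says 'yes' to every horse
some_model = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1]  # 4 of 5 elite and 3 of 5 non-elite correct

print(accuracy(y_true, always_elite))  # 0.5
print(accuracy(y_true, some_model))    # 0.7
```

On a balanced dataset, a rule that says 'yes' to every horse scores exactly 0.5, which is why 0.5 is the floor that any useful model has to beat.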
The current Cardio+Biometrics model that I have has an Accuracy of 0.702. This is a solid number because, in this case, there are a lot of outside effects and variables that we are not measuring (like DNA for one) that could influence the output (the prediction of racing class) and that are not present in the cardiovascular parameters and biomechanical features that are measured. The Accuracy also represents a significant 'edge' in terms of being able to disqualify horses that are 'all dressed up with nowhere to go'.
Building the Video Classification Model
Using video is no easy task. Effectively, you have to break each video down into individual frames, so if my 10 second video is taken at 30 frames a second, I am looking at 300 frames (or images). You then have to train one model, a Convolutional Neural Network, to learn from the 300 images what differs between fast and slow horses, and then another model, a Recurrent Neural Network, to look at the differences that occur from one image to another (compare the first image to the second, the first image to the third, the first image to the last, etc).
This two stream approach best captures the spatio-temporal nature of video, or put another way, it not only captures the shape of the cardio, but how the cardio is moving as it beats. In developing the project I settled on a bespoke CNN+RNN approach as right now the likes of Clarifai, Microsoft's Custom Vision and the most recent addition Google's AutoML, aren't quite up to speed to allow me to use a custom labelled data set as I have with fast and slow racehorses.
The initial analysis of the video was performed using an RNN/Long Short Term Memory based neural network model. This initial model provided basic results with an Accuracy of 0.620, less than my current Cardio+Biometrics model at 0.702, but it gave me a baseline for improvement. It turned out that the best way to improve the model was to improve the quality of the videos themselves, as they are quite noisy and sometimes not very informative because of the specifics of echocardiogram videos of horses in general (horses move, have variability of fat between the transducer and heart, etc). To improve the video itself, the following transformations were applied to each of the images:
Resized the video to 100x100 pixels
Divided each element by 255.0 and saved the frames as an array
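The two transformations above can be sketched in plain Python as follows. This is only an illustration under some assumptions: frames are taken to be grayscale grids of 0-255 values, resizing uses simple nearest-neighbour sampling (the interpolation actually used is not stated here), and the function names are made up. In practice an image library would do this far faster.

```python
def resize_nearest(frame, out_h=100, out_w=100):
    """Resize a 2D grid of pixel values with nearest-neighbour sampling."""
    in_h, in_w = len(frame), len(frame[0])
    return [
        [frame[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def preprocess_clip(frames, size=100):
    """Resize every frame to size x size and scale pixel values into [0, 1]."""
    return [
        [[pixel / 255.0 for pixel in row] for row in resize_nearest(frame, size, size)]
        for frame in frames
    ]


# Three synthetic 240x320 frames standing in for frames cut from a cardio clip.
clip = [[[(r + c) % 256 for c in range(320)] for r in range(240)] for _ in range(3)]
array = preprocess_clip(clip)
print(len(array), len(array[0]), len(array[0][0]))  # frames, height, width
```

Scaling to the 0-1 range keeps the inputs on a similar footing from frame to frame, which generally helps a neural network train more smoothly.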
Once all this was done, I tried various neural network models including VGG, InceptionV3, and the state of the art NASNet, which is a derivative of Google's AutoML, as well as a custom CNN model. Here is the accuracy of the various neural network models that were tried:
TDCNN+LSTM+optical flow - 0.558
VGG16+LSTM - 0.669
NASNet+median - 0.681
Inception_v3+LSTM - 0.687
NASNet+gabor - 0.693
NASNet+LSTM - 0.718
NASNet+preproc - 0.724
TDCNN+LSTM - 0.730
TDCNN+LSTM+preproc - 0.742
The best model was a custom CNN and RNN (LSTM) model that used the pre-processing transformations listed above. Importantly, with just the video alone, and nothing else, I was able to generate an Accuracy of 0.742, which was actually better than my current Cardio+Biometrics model at 0.702.
While the neural network on the video alone is better than the Cardio+Biometrics model that I developed previously, that doesn't mean it is best to just replace the model. Rather, I need to use the data that the traditional Cardio+Biometrics test generates in order to help the video model relate to the sex and size of the horse. When we drill down into the data that is collected in the Cardio+Biometrics model and work out what is actually relevant to performance, here are the most important features:
Sex - the sex of the horse.
BackLength - the measurement from the top of the withers to the point of the hip.
BodyLength - the measurement from the point of the shoulder to the Ischium at back of the horse (near the tail).
Girth Circumference - the measurement of the girth over the point of the withers and under the girth.
Leg Length - measurement from the elbow to the proximal sesamoid.
LV Pelvic Index - the measurement from the Hip to the Ischium, divided by the measurement from the ground to the withers
Shoulder to Hip - the measurement from the point of the shoulder to the Hip.
ZscoreWeight - The Standard Score of the Estimated body weight of the horse. I use the formula developed by Staniar, et al, which I have found to be the most accurate measurement of weight that can be applied in the field. The weight is then transformed into a standard/Z-Score based on horses of the same age and sex so there is a normalization of the data.
LVIDd - Left ventricular internal diameter end diastole
LVIDs - Left ventricular internal diameter end systole
SV - A calculation of stroke volume of the cardiovascular parameters.
CO - A calculation of cardiac output of the cardiovascular parameters.
FS - A calculation of fractional shortening of the cardiovascular parameters.
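For the ZscoreWeight feature in the list above, the standard score within each age and sex group can be sketched like this. The field names and weights are hypothetical, the estimated weight is assumed to have been calculated already (the Staniar formula itself is not reproduced here), and using the population standard deviation is my assumption for the example.

```python
from collections import defaultdict
from statistics import mean, pstdev


def zscore_weight(horses):
    """Return {name: z-score of weight within the horse's (age, sex) group}."""
    groups = defaultdict(list)
    for horse in horses:
        groups[(horse["age"], horse["sex"])].append(horse["weight_kg"])

    scores = {}
    for horse in horses:
        weights = groups[(horse["age"], horse["sex"])]
        spread = pstdev(weights)
        # A group with no spread (e.g. a single horse) gets a neutral score of 0.
        scores[horse["name"]] = (
            (horse["weight_kg"] - mean(weights)) / spread if spread > 0 else 0.0
        )
    return scores


# Made-up horses, for illustration only.
horses = [
    {"name": "a", "age": 1, "sex": "colt", "weight_kg": 400.0},
    {"name": "b", "age": 1, "sex": "colt", "weight_kg": 420.0},
    {"name": "c", "age": 1, "sex": "colt", "weight_kg": 440.0},
    {"name": "d", "age": 1, "sex": "filly", "weight_kg": 380.0},
]
print(zscore_weight(horses))
```

A score of 0 means the horse is of average weight for its age and sex, while positive and negative scores mark heavier and lighter horses relative to that group.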
A Combined Model
The final step, at least to this point of my research, was to combine the most important features that the CNN+RNN neural network had created from the video with the most important features from the standard Cardio+Biometrics dataset and pass these features into a new combined prediction model.
Without getting too technical (if I haven't already!), as a result of the CNN+RNN process I am able to extract the most important variables from the CNN+RNN that are related to racehorse performance. It turns out that there are four features from the CNN+RNN part of the model that significantly explain the most important aspects of the cardio video. These four variables are then combined with the Cardio+Biometrics data for the same horses in order to develop the combined model.
You can see from the above that the four CNN+RNN features that are extracted from the video are by far the most important features overall. These are followed by the LVPelvicIndex, Stroke Volume, Cardiac Output, Leg Length and Body Length. Interestingly, variables that had some importance in the Cardio+Biometrics dataset, namely LVIDs and, somewhat surprisingly, the Sex of the horse, have no importance in this combined model. An important takeaway for me here is that I should only be measuring what matters. There are a lot of variables that I used to measure that have no real importance in predicting the outcome, so I should no longer be measuring them.
The resulting list of features above was used as input parameters in a Random Forest model. Training the model on these parameters and using a 10-fold cross validation of the data, the outcome was:
Combined CNN+RNN+Cardio+Biometrics Random Forest Model Accuracy: 0.825.
CNN+RNN Video Model Accuracy: 0.742
Current Cardio+Biometrics Random Forest Model Accuracy: 0.702
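For readers unfamiliar with 10-fold cross validation, the sketch below shows the mechanics: the data is split into ten folds, each fold is held out once while the model is trained on the other nine, and the ten accuracies are averaged. To keep it dependency-free, a simple nearest-centroid classifier stands in for the Random Forest, and the data is synthetic (four columns playing the role of the video features and three playing the role of biometric features). None of it reproduces my actual pipeline or results.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices once and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_centroids(X, y):
    """Stand-in model: the per-class mean of every feature."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict_centroid(centroids, x):
    """Predict the class whose centroid is closest to x."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(x, center))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def cross_val_accuracy(X, y, k=10, seed=0):
    """Mean held-out accuracy over k folds."""
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit_centroids([X[i] for i in train_idx], [y[i] for i in train_idx])
        hits = sum(predict_centroid(model, X[i]) == y[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return sum(scores) / len(scores)


# Synthetic, balanced data: 4 informative columns plus 3 noise columns.
rng = random.Random(1)
y = [i % 2 for i in range(200)]
X = [
    [rng.gauss(label, 1.0) for _ in range(4)] + [rng.gauss(0.0, 1.0) for _ in range(3)]
    for label in y
]
print(round(cross_val_accuracy(X, y), 3))
```

Because every horse is scored only by a model that never saw it during training, the averaged figure is a fairer estimate of how the model would perform on new horses than accuracy measured on the training data.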
So there you have it. By combining the best features from the CNN+RNN video model, with the relevant data from my current Cardio+Biometrics model, I was able to develop a model that has VERY GOOD accuracy, much better than my current Cardio+Biometrics model that I use.
The Next Steps
So, while the model that I have now developed is very good, undoubtedly better and more scientifically rigorous than any human could do, there are still improvements that I can make to further increase overall Accuracy of the model developed. These are:
Apply to our DNA data the same feature selection that I used on the Cardio+Biometrics dataset, where only the features that mattered most went into the combined model. We have a dataset of DNA markers, about 220 SNPs that are already known to be correlated with racehorse success. These SNPs are located in genes that are involved in muscle fiber type determination, oxygenation, etc. I can perform the same 'feature selection' process on these SNPs that we did with the Cardio+Biometrics data above and then use just those SNPs in a new Final Combined model that would use the 4 features extracted from the video, the 14 features from the Cardio+Biometrics data model and whatever SNPs are deemed important. It is possible, actually probable, that the SNPs will eliminate a lot of the cardio or biometric measurements, as there are SNPs in there that are proxies for weight (those in Myostatin and PPARGC1a) and height (LCORL SNPs). It may be that I get down to only having to take a hair sample, the video and a handful of measurements on the horse, and that this captures all that is needed to make a determination of future performance.
Add some type of K-Means clustering of the video sequences, like they have in this document here - Video Classification Algorithm Based on Improved K-Means - so as to group the cardio videos by their similarity. While there are certainly "bullseye" cardios that are great in just about any type of horse, there are also cardio shapes that are more effective in certain types of horses doing different things. K-Means itself is unsupervised, but if I apply a label to the type of horse that it is, for example a good backmarker turf horse or a good on-pace dirt horse, I should be able to check whether the clusters separate the horses into different cardio types.
In my current machine learning algorithm, and the one that I used for 'feature selection' and the final combined model, I only used one machine learning algorithm - XGBoost. The use of Ensemble Stacking, where you use a whole lot of weak and strong learning algorithms, has been shown to be more effective than using a single algorithm, so I plan to integrate an Automated Machine Learning process to do this. There are a few programs out there that can do this, but the best that I have found to date is H2O.ai. They have developed an open-source machine learning platform with an AutoML feature that includes automatic training and tuning of many models and the creation of a Stacked Ensemble which, in most cases, should be the top performing model available. This should improve the Accuracy of my predictions and make it easy for me to have an algorithm retrained on new data. I currently update my dataset every 3 months, so when horses turn four and have at least 3 starts, they become samples for the algorithm to learn from. Setting this up as an automated feature will enable me to quickly iterate towards a more accurate algorithm, and one that needs less hand tuning.
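Of the next steps listed above, the K-Means clustering idea is the easiest to sketch. The version below is plain K-Means on made-up two-dimensional feature vectors. It is not the improved algorithm from the linked document, and which features would be extracted from the cardio video to feed it is left open.

```python
import random


def kmeans(points, k, iterations=50, seed=0):
    """Plain K-Means: returns (cluster index per point, list of centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, new_assignments) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
    return assignments, centroids


# Two made-up groups of feature vectors standing in for two 'cardio types'.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centers = kmeans(points, k=2)
print(labels)
```

The cluster indices could then be cross-tabulated against the horse-type labels to see whether particular cardio types line up with particular kinds of runner.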
This new CNN+RNN model is a massive improvement in what I am currently doing and I would say with reasonable confidence that nobody else in the equine field is measuring cardios this way. It's going to take me a little while to get this new model into production (I am waiting for Google to set up a GPU instance so I can retrain my model when I add new data) but I hope that I can start using it by the middle of this year when the first Northern Hemisphere yearling sales come around.
I am also looking forward to adding the SNP data in to see where I end up with it all in terms of Accuracy. I think it will get towards 0.90 in terms of Accuracy but I doubt it will get much past that. As comprehensive as I think I have been in using different techniques and gathering as much relevant data as I can, there are other factors that I am not measuring that I am sure help determine the difference between elite and non-elite racehorses. There is of course also a commercial downside to improving the Accuracy as I have and this will be exacerbated if by adding the SNP data the accuracy gets towards 0.9 (that is 90% Accuracy). The closer you get to perfectly classifying elite and non-elite racehorses, the harder elite horses become to find at a sale and the more testing you have to do to find them.