In addition to creating a deep learning neural network to understand what a 'good cardio' looks like, I stumbled across a neat bit of work out of the Mathis Mouse Motor Lab at Harvard University.
They have trained a deep learning neural network to automate the tracking of features on a mouse across a variety of tasks. Their paper, "Markerless tracking of user-defined features with deep learning", is published on arxiv.org and the code for their work is on GitHub.
In layman's terms, their work does the following:
Take a video of an animal and break the video down into frames. So if you have a 10 second video taken at 30 frames per second, you have 300 frames.
Mark up each frame with a point on each biomechanical/anatomical landmark (joint) that is of interest.
Train a deep learning neural network on 40% of the frames and test the outcome on the other 60%. The network generalizes to the data very quickly. The network they trained on the mouse below had an average error of less than 5 pixels; that is, where the computer thought the point should be placed was less than 5 pixels away from where the human had placed it (so it's super accurate).
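To make the steps above concrete, here is a minimal sketch in plain Python of the arithmetic involved: counting frames, splitting them 40/60, and measuring the average pixel error between a predicted point and a hand-placed one. This is not the lab's code. The function names and the sample coordinates are made up for illustration, and I am assuming the error is the straight-line (Euclidean) distance in pixels.

```python
import math
import random

def frame_count(duration_s, fps):
    """Number of frames in a clip of the given length and frame rate."""
    return int(round(duration_s * fps))

def split_frames(n_frames, train_fraction, seed=0):
    """Shuffle the frame indices and split them into train and test sets."""
    indices = list(range(n_frames))
    random.Random(seed).shuffle(indices)
    n_train = int(round(n_frames * train_fraction))
    return sorted(indices[:n_train]), sorted(indices[n_train:])

def mean_pixel_error(predicted, labelled):
    """Average straight-line distance, in pixels, between paired points."""
    distances = [math.dist(p, h) for p, h in zip(predicted, labelled)]
    return sum(distances) / len(distances)

# A 10 second clip at 30 frames per second.
n_frames = frame_count(10, 30)
train_idx, test_idx = split_frames(n_frames, train_fraction=0.4)

# Hypothetical (x, y) positions for two landmarks on one frame:
# where a human clicked, and where a network placed the point.
human_points = [(100.0, 200.0), (150.0, 210.0)]
network_points = [(103.0, 204.0), (150.0, 213.0)]
error = mean_pixel_error(network_points, human_points)

print(n_frames, len(train_idx), len(test_idx), error)
```

The shuffle before the split matters: taking the first 40% of a video in order would train only on the start of the clip.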
I figured that if they could get it to work on a mouse, we could try a racehorse. Last year I took a lot of video of yearlings at the September Yearling sale and older horses at the November Breeding Stock sale (and stallions that had retired to stud). I have over 1,000 videos, so I selected a few and sent them along to Mackenzie and Alexander Mathis, who agreed to give it a try.
Once I had hand-annotated the video, they trained their deep learning neural network using the same 40/60 (train/test) split described above. Below is the video of the horse: on the left are the markups that the neural network made, and on the right are the markers on their own, without the horse, to show what they look like.
Pretty neat, huh!
The error of the neural network was about 3 pixels, so with some more training and more data it should be able to get down to less than 2 pixels, which would be super accurate. There is a little more work to be done in terms of training the model to understand horses of different colors (grey/white might be tough) as well as different circumstances (walking into shadow, walking on grass, etc.). Once this is done, it would only be a matter of feeding the video in and the neural network would place all the points on the video automatically.
Once this is done, what would I be looking to do with it?
I'd like to know if there is a way to cluster the videos of the horses into neighborhoods, using some type of t-SNE clustering or the like. Basically, if a horse has particular bone lengths and a particular 'way of going' at the walk, can it be put in a group of similar horses?
If that is possible, is there a difference between fast and slow horses in that neighborhood? Or is the neighborhood itself 'elite' or 'non-elite'?
Regardless of the above, since the markerless biomechanics will help a neural network look at the 'right' places to see what a horse is, will I be able to create a new neural network that is better than a human at selecting an elite racehorse just off the data of the walk?
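As a rough illustration of the 'neighborhood' idea, here is a sketch that turns tracked landmarks into bone-length proportions and then finds each horse's closest match. To be clear, this is not t-SNE. It is a plain nearest-neighbor lookup that I am swapping in because it runs with nothing but standard Python. The horse names, landmark names and coordinates are all invented for the example.

```python
import math

# Hypothetical landmark positions (x, y in pixels) from one frame per horse.
horses = {
    "horse_a": {"shoulder": (100, 100), "elbow": (100, 160), "knee": (100, 230)},
    "horse_b": {"shoulder": (200, 200), "elbow": (200, 320), "knee": (200, 460)},
    "horse_c": {"shoulder": (50, 50), "elbow": (50, 130), "knee": (50, 180)},
}

# Pairs of landmarks that define a 'bone' (an assumption for illustration).
BONES = [("shoulder", "elbow"), ("elbow", "knee")]

def bone_proportions(landmarks):
    """Each bone length as a share of the total, so image scale drops out."""
    lengths = [math.dist(landmarks[a], landmarks[b]) for a, b in BONES]
    total = sum(lengths)
    return [length / total for length in lengths]

def nearest_neighbours(name, features, k=1):
    """Names of the k horses whose proportions are closest to the given horse."""
    target = features[name]
    ranked = sorted(
        (math.dist(target, vec), other)
        for other, vec in features.items()
        if other != name
    )
    return [other for _, other in ranked[:k]]

features = {name: bone_proportions(lm) for name, lm in horses.items()}
print(nearest_neighbours("horse_a", features))
```

Using proportions rather than raw pixel lengths means a horse filmed from further away is not treated as a smaller horse. A real version would also need features describing the walk itself, not just a single frame.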
Plenty of work to do. The overall goal is to build something that will help make predictions of outcomes more reliable. Humans are great at learning from True Positives: we remember the horses we saw as yearlings that turned out to be good horses. Where we are poor is that we are slow to learn from False Positives (horses that we thought were good types as yearlings but that turned out to be slow) and False Negatives (horses that we thought would be slow as yearlings but that turned out to be fast). It turns out that by improving our understanding of, and learning from, False Positives and False Negatives, we end up learning how to identify more True Positives (and more True Negatives) and classify outcomes properly.
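For anyone who wants to keep score on their own yearling opinions, the four outcomes above can be tallied with a few lines of Python. This is only a sketch: the five records are invented, and 'fast' is whatever yes/no definition of a good racehorse you choose to use.

```python
from collections import Counter

def outcome(predicted_good, was_fast):
    """Label one horse as TP, FP, FN or TN."""
    if predicted_good and was_fast:
        return "TP"  # liked as a yearling, turned out fast
    if predicted_good and not was_fast:
        return "FP"  # liked as a yearling, turned out slow
    if not predicted_good and was_fast:
        return "FN"  # passed over as a yearling, turned out fast
    return "TN"      # passed over as a yearling, turned out slow

# Hypothetical records: (what I thought as a yearling, what happened on the track).
records = [
    (True, True),
    (True, False),
    (False, True),
    (False, False),
    (True, False),
]

counts = Counter(outcome(p, a) for p, a in records)

# Of the horses I liked, how many were fast?
precision = counts["TP"] / (counts["TP"] + counts["FP"])
# Of the fast horses, how many did I like?
recall = counts["TP"] / (counts["TP"] + counts["FN"])

print(dict(counts), precision, recall)
```

Writing down the False Positives and False Negatives alongside the wins is the whole point: the two ratios at the end only move if those mistakes are counted.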