Thoroughbred Selection by The Numbers
Between the end of the Northern Hemisphere yearling sales and the start of the Southern Hemisphere ones, we've been doing a lot more data modeling, using different tools to weigh statistical data (foal rank, age of mare, generational interval, etc.), biomechanical data (leg length, girth, body length, etc.), cardiac data (left ventricular measurements) and genetic data (SNPs of exercise-relevant genes). The aim is to build a single unified model and improve our thoroughbred sales selection process for our clients.
Horses like Verrazano and New Years Day are great advertisements for what we are doing, but any company worth its salt is continually improving its prediction models, and we are no different in that respect.
Some of the tools that we have been using include Nutonian's Eureqa as well as some of the regression and neural network tools freely available in R. There are some interesting models being developed, and one in particular looks very promising because of how cleverly it weighs the genetic variants. What it is showing, however, is that relying on one system doesn't come close to giving you the answer you are looking for. Cardiac data explains only ~25% of the difference between elite and non-elite runners. If you relied on cardiac data alone to select your yearlings, you'd get some good results just by taking the horses with the best cardios, but in plenty of cases you'd miss out on some really good horses whose genetic profiles mean they don't really need a great cardio (think sprinters).
Equally, relying on genetics alone has its flaws. In taking a blood or hair sample and examining the variation in SNPs involved in exercise, you are only getting the variation that was fixed at conception. Once the foal is born (and indeed before it is born) the SNPs are set, and no environmental interaction is taken into consideration. If the horse got pleurisy or rattles, or lacked nutrition at a vital stage in a way that compromised its athletic potential, this isn't going to be captured in the DNA tests that we or any other company do. In looking at the cardio, you actually get a peek into the environmental interaction as well as assessing athletic potential. Putting these together and weighing them appropriately results in a much more predictive model.
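To make the idea of weighing the different data sources concrete, here is a minimal sketch in Python of a logistic-style score that blends the four types of data into one probability. It is not our actual model: the weights, the intercept and the two example horses are all made up for illustration, and each input is assumed to have been standardized beforehand so that 0 means an average yearling.

```python
import math

# Hypothetical weights for illustration only; not fitted to any real data.
WEIGHTS = {
    "statistical": 0.8,
    "biomechanical": 0.4,
    "cardiac": 0.6,
    "genetic": 0.7,
}
INTERCEPT = -2.0  # assumed baseline log-odds of being an elite runner


def elite_probability(scores):
    """Blend standardized scores into one probability with a logistic function.

    A data source that is missing from `scores` is treated as 0 (average).
    """
    log_odds = INTERCEPT + sum(
        weight * scores.get(name, 0.0) for name, weight in WEIGHTS.items()
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


# A sprinter type: average cardiac score but a strong genetic profile.
sprinter = {"statistical": 0.5, "biomechanical": 0.2, "cardiac": 0.0, "genetic": 1.5}
# A horse with a big heart but an ordinary genetic profile.
big_heart = {"statistical": 0.5, "biomechanical": 0.2, "cardiac": 1.5, "genetic": 0.0}

if __name__ == "__main__":
    print(f"sprinter:  {elite_probability(sprinter):.3f}")
    print(f"big heart: {elite_probability(big_heart):.3f}")
```

With these invented weights the sprinter type still scores reasonably despite an average heart, which is exactly the kind of horse a cardio-only screen would pass over. In practice the weights would come from fitting the model to race results rather than being set by hand.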
In terms of yearling sales, most of the advantage is in fact found in statistical data. You can't test every horse at a sale, and the catalogue page is a minefield of misinformation and bias that creates market asymmetry. There are some factors that each give you a small advantage at the sales, but if you start to add them together they become a big one. The data can tell you what your odds ratio is with each individual yearling offered, and it is a great place to start your selection process. This type of advantage reminds me of Kevin Kelley, the football coach at Pulaski Academy. After watching a 15-minute video based on the paper by UC Berkeley professor Dr. David Romer on why you shouldn't ever punt the ball in American football, he changed the way his team played because of what he describes as a "15% advantage that I will take every time to win". They never punt, always onside kick, and are winning state championships. If you have a few minutes, watch the video below and think about how this relates to how we currently buy yearlings, and the advantage that can be had because of tradition.
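As a rough illustration of how small advantages stack up, the Python sketch below converts a baseline probability to odds, multiplies in a handful of odds ratios, and converts back to a probability. The 3% baseline and the four 1.15 odds ratios are invented numbers, and multiplying odds ratios like this assumes the factors are independent of one another, which real catalogue factors often are not.

```python
def combine_odds_ratios(baseline_probability, odds_ratios):
    """Apply a list of odds ratios to a baseline probability.

    Assumes the factors are independent, so their odds ratios multiply.
    """
    odds = baseline_probability / (1.0 - baseline_probability)
    for ratio in odds_ratios:
        odds *= ratio
    return odds / (1.0 + odds)


baseline = 0.03                    # hypothetical chance that a random yearling is elite
edges = [1.15, 1.15, 1.15, 1.15]   # four hypothetical factors, each a small edge

if __name__ == "__main__":
    combined = combine_odds_ratios(baseline, edges)
    print(f"baseline: {baseline:.3f}, with all four edges: {combined:.3f}")
```

No single factor here moves the needle much, but the four together shift the odds far more than any one of them does, which is the point of starting the selection process with the statistical data.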