Earlier this week I was asked to write down some thoughts about genetic testing in Thoroughbreds for the upcoming International Thoroughbred Breeders Federation congress in March. In the process (the below doesn’t represent what I have sent to the ITBF), I started to think about what we have done to date, the successes and failures we’ve had, and some of the lessons learned from my various experiences with clients and industry bodies.
Lesson One – DNA is not necessarily destiny.
If somebody says that they can take a hair or blood sample from a horse and tell you with 100% certainty whether or not you have a fast racehorse, they are telling you a lie. That is not to say that genetic testing isn’t important or doesn’t have its place in predictive analytics. Rather, as a standalone test it is somewhat oversold at the moment for what it can actually deliver.
The reason for this is that while some SNPs (changes in genetic code) are highly correlated with elite performance, that doesn’t necessarily mean that the SNP explains the total variance in the population as a whole. By highly correlated I mean that one variation of a SNP (say, A:A) appears in a high proportion of the elite population while another version (G:G) appears in a high proportion of the non-elite population. A lot of SNPs have good to high correlation with performance but actually explain a lower percentage of the variance in the population. Without wanting to get into a larger discussion about the difference between Correlation Coefficients and Goodness of Fit, suffice to say that even the best prediction model relying on genetic variants alone can only explain about 40% of the variance in the Thoroughbred population. This leaves another 60% of the variance to other factors that genetic markers are not going to pick up when you do a test.
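To make the gap between correlation and variance explained concrete, here is a minimal sketch in Python. The genotype codes and performance ratings are made up purely for illustration and are not real test data. It also assumes the simplest possible case, a single marker and a straight-line fit, where the share of variance explained is just the correlation squared. A real multi-marker model is more involved, but the arithmetic shows why a correlation that sounds strong can still leave most of the variance unexplained.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical data, invented for this example only:
# genotype = copies of the "A" allele (G:G = 0, A:G = 1, A:A = 2)
# rating   = an arbitrary performance rating for each horse
genotype = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
rating = [60, 75, 82, 68, 70, 88, 79, 95, 85, 74, 98, 90]

r = pearson_r(genotype, rating)
r_squared = r ** 2  # variance explained by a one-predictor linear fit

print(f"correlation (r):             {r:.2f}")
print(f"variance explained (r^2):    {r_squared:.2f}")
print(f"variance left unexplained:   {1 - r_squared:.2f}")
# The correlation that would be needed to explain 40% of the variance
print(f"r needed to explain 40%:     {sqrt(0.40):.2f}")
```

In this toy data the A:A horses clearly rate higher on average, yet the marker still leaves well over half of the spread in ratings unaccounted for.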
Part of the reason this is just 40% is the method that we are using in the field and the limitations it has. We are testing the genetic markers that the yearling or two-year-old was born with the day it came out of its mother. If the foal got pleurisy, rattles or some other condition that will compromise its performance (like being riddled with OCDs), the genetic test isn’t going to pick it up. More importantly, we (and that is all genetic testing companies) don’t pick up the environmental interaction with genes. The more we know about this, the more important it seems to be in determining performance.
Thus, if a yearling with “average” genetics is really well raised and its genes interact positively with the environment, it could well be a superior runner. Equally, if a yearling with “good” genetics is raised poorly and doesn’t get much opportunity in life, its ability to become an elite racehorse is compromised. This is why companies that use genetic testing alone (and that includes us...although that is about to change...) are going to ‘miss’ some horses, and in an era where people seem to believe that DNA equals destiny, missing a good horse is hard to explain to the lay person.
The other issue we have is that there are "elite horses" and then there are truly "elite horses". Breeders and owners view the graded/group system as a good measure of genetic merit, but unfortunately at times that faith is misplaced. This is especially apparent, at least in America, at the beginning of the year, when horses are running against their own age group and sex. We invariably find that three-year-old fillies running against their own age group on the turf at this time of the year can get some awfully cheap graded stakes wins to their name and look a lot better than they actually are. Right now the same can be said for the mares division in North America. With the best of the division having run at the Breeders' Cup and not looking to return to the track until at least late February, mares' graded stakes races are currently being run with five-horse fields that include a couple of mares that are just there to see if they can get a graded stakes placing before they head off to the breeding shed. These are hardly elite animals, yet the graded stakes system rewards them as such and the market perceives them as such, even though the difference between these types of horses and the truly elite horse is vast. As a standalone, genetic testing for performance can give you an advantage in selection (it is a very good negative predictor), but there are some important limitations to it.
Lesson Two – Cardiovascular evaluation explains even less
While genetic testing only explains around 40% of the variance, measurement of cardiovascular capacity via M-mode echocardiography explains even less, somewhere around 25%. Again, that is not to say that M-mode measurement of cardiac parameters doesn’t have its uses in selection. It does, but the reality is that:
It is less relevant in sprinters (1000-1400m), where cardiac capacity is less of a determinant of performance than the raw sprinting ability of the muscle. That said, when you get a great cardio in a sprinter (like, say, Majesticperfection, Fast Bullet, Rain Affair, etc.) they are pretty hard to beat on the front end of a race.
It is less relevant in fillies than it is in colts, for the obvious reason that the metabolic cost of taking a filly, generally smaller and lighter than a colt, from point A to point B is lower, so her cardiac capacity doesn't need to be as great.
It is best applied to horses that want to run a mile (1600m) or more and it is vitally important for colts.
Echocardiography is different to electrocardiography, the latter being where the T wave of the cardiac trace is examined to generate heart scores. That type of evaluation has next to zero correlation with performance and is a complete waste of time. For more information on how echocardiography works click here.
Lesson Three – Specialization is required.
Variables (not the variance within those variables) can kill in predictive analytics, and Thoroughbred performance involves literally thousands of variables. Trying to build a model, or multiple models, that properly predict the racing potential of each individual yearling at a sale is impossible, even with the combined genetic/cardio prediction model that we have (more about that soon). There is too much of a difference in genotype and phenotype between an early-maturing sprinting colt and a later-maturing distance mare. More specifically, there is a lot of genetic and cardiac difference between a high-class mare that runs 9f on the turf in France and a high-class mare that runs the same distance on the dirt in America. They are quite different even though 'class' is a constant.
Specialization is required. It is much better to be trying to predict one type of yearling to perform at the highest level than it is to be predicting all types. If you determine that you want to predict elite colts that run over a mile as three-year-olds (let's face it, that is where the money is), then you can build a model that just looks for that horse as a yearling and, by separating out the signal from the noise, get very good at finding them.
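As a rough illustration of why narrowing the target helps, the sketch below compares how well a single measurement tracks performance when every type of horse is pooled together against how well it does inside one segment. Everything here is hypothetical: the segment labels, the cardio scores and the ratings are invented for the example, and a plain correlation stands in for what would really be a much richer model. The numbers were chosen to mirror the point made earlier that cardio matters a great deal for colts over a mile and much less for sprinting fillies.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical yearlings: (segment, cardio score, later performance rating).
# All values are made up for illustration only.
horses = [
    ("colt_miler", 1, 70), ("colt_miler", 2, 76), ("colt_miler", 3, 80),
    ("colt_miler", 4, 87), ("colt_miler", 5, 92),
    ("filly_sprinter", 1, 88), ("filly_sprinter", 2, 72),
    ("filly_sprinter", 3, 90), ("filly_sprinter", 4, 70),
    ("filly_sprinter", 5, 85),
]


def cardio_vs_rating(rows, segment=None):
    """Correlation of cardio score with rating, optionally within one segment."""
    chosen = [row for row in rows if segment is None or row[0] == segment]
    cardio = [row[1] for row in chosen]
    rating = [row[2] for row in chosen]
    return pearson_r(cardio, rating)


r_pooled = cardio_vs_rating(horses)
r_colts = cardio_vs_rating(horses, "colt_miler")
r_fillies = cardio_vs_rating(horses, "filly_sprinter")

print(f"all horses pooled:  r = {r_pooled:.2f}")
print(f"colt milers only:   r = {r_colts:.2f}")
print(f"filly sprinters:    r = {r_fillies:.2f}")
```

Pooling the two groups blurs a relationship that is very clear within the colt milers, which is the signal-versus-noise argument for specializing in one type of horse.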