Age, and Frequency, Shall Weary Them
We have always said (to anyone who would listen!) that there is no single silver bullet for predicting racehorse performance. In distance horses, cardiovascular and splenic capacity are vitally important, but scientific studies have shown that they explain only about 22% of the variation in performance. The remaining 78% is explained by other factors, and in sprinters the cardiovascular parameters are less of a determinant than other factors (genetics becomes more important there).
Genetic profiling is the same. You can tell a lot about how far and how fast a horse could run by looking at variations within genes like MSTN, PPARGC1a, CKM, PDK4, TFAM, ACE and ACTN3, but anyone who says that they can just take some DNA and tell you unequivocally that you have a fast or slow horse is simplifying the complex.
In terms of selection at yearling sales, neither cardiovascular testing nor genetic testing is truly applicable at scale. That is, you can't test every horse that goes through a yearling sale, as the volume of yearlings on offer makes this physically impossible. Equally, buyers invariably select for different things at sales. Some want staying fillies, some want sprinting colts, and developing a solid model for each of these horse types in terms of genetics or cardiovascular parameters requires a large data set.
This is where statistical data modeling steps in. Using a lot of data and statistical techniques such as symbolic regression (we highly recommend that anyone interested take a look at Eureqa), one can model for different types of horses. Say, for example, that you want to purchase a colt that can win the Kentucky Derby. There are two factors at play in this model: distance and class. Distance is a highly heritable trait and can be modeled quite easily using the median winning distance to describe optimal racing distance. Because it is highly heritable, having the median winning distance of the sire (and his offspring) and the dam (and her offspring) tells you a lot about the potential of the horse that you are modeling. Where the dam is unraced (or, more rarely, the sire), the next generation of parents is enough to tell us what we need to know. Interestingly from a modeling viewpoint, the median winning distance of the broodmare sire carries slightly more weight for distance than the median winning distance of the grandam. However, when we look at a catalog page there is no information about the broodmare sire at all. This type of information asymmetry can be taken advantage of.
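As a minimal sketch of the median winning distance idea described above, the snippet below pools a parent's own winning distances with those of its offspring and falls back to the next generation when there is nothing to pool. The function names, the pooling rule, the use of furlongs and every number are assumptions made for illustration only; they are not taken from our actual model.

```python
from statistics import median

def median_winning_distance(winning_distances):
    """Return the median of a list of winning distances, or None if empty."""
    return median(winning_distances) if winning_distances else None

def parent_distance_signal(own_wins, offspring_wins, next_generation_wins=()):
    """Median winning distance for a parent (hypothetical pooling rule)."""
    # Assumption: the parent's own wins and its offspring's wins are pooled.
    pooled = list(own_wins) + list(offspring_wins)
    if not pooled:
        # Unraced parent with no winning offspring: use the next generation.
        pooled = list(next_generation_wins)
    return median_winning_distance(pooled)

# Invented distances in furlongs, for illustration only.
sire_signal = parent_distance_signal([8, 10, 10], [8, 9, 12, 10])
dam_signal = parent_distance_signal([], [], next_generation_wins=[6, 7, 8])
```

Signals like these, computed for the sire, the dam, the broodmare sire and the grandam, would then be candidate inputs to whatever regression technique is used, with the weights learned from the data rather than set by hand.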
Class is a little harder to model, as it requires the figure that you are modeling for to be heritable and to explain as much of the variance in the population as possible. This is where using Beyer speed figures and other such figures as the basis of a breeding model fails. The figures don't explain the variance in the population well and, as good as they are as a handicapping tool, they are a poor judge of genetic merit in terms of breeding. Other measures, such as the log of earnings per start, ranking and performance ratings, are more heritable and explain the population better than speed figures. As with distance, the answer comes from a large number of data points, most of which you can't see on a catalog page, that are important for discriminating racing class.
One of the data points that we have modeled that is important, but that you won't see on a catalog page, is "foal rank of sire".
To explain: if you took the career of Sadler's Wells, noted the first foal of his first crop, born in January 1986, as foal number 1 and the next foal born as number 2 (In The Wings, incidentally), and kept on with that process, you would get to Montjeu (foal number 786), Galileo (989) and High Chaparral (1108). Where it gets interesting is that once you get past 1350 foals in Sadler's Wells' career, a point he reached at age 18, his ratio of Gr1 winners to foals drops from 4.66% to 1.1%. Basically, in the first 60% of his career he was a great stallion and in the last 40% he was just a good stallion. As good a stallion as he was, he didn't sire a Gr1 winner in his last 318 foals.
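The foal rank idea above can be sketched in a few lines: order a sire's foals by birth date, number them from 1, and compare the ratio of Gr1 winners to foals on either side of a chosen cutoff rank. The function names, the tuple layout and the toy foals below are all hypothetical and are not Sadler's Wells' actual records.

```python
from datetime import date

def assign_foal_ranks(foals):
    """foals: list of (name, birth_date, is_gr1_winner) tuples.

    Returns (rank, name, is_gr1_winner) tuples ordered by birth date,
    with the earliest-born foal ranked 1.
    """
    ordered = sorted(foals, key=lambda foal: foal[1])
    return [(rank, name, gr1)
            for rank, (name, _, gr1) in enumerate(ordered, start=1)]

def gr1_ratio_split(ranked, cutoff_rank):
    """Gr1-winners-to-foals ratio up to and after cutoff_rank."""
    def ratio(flags):
        return sum(flags) / len(flags) if flags else None
    early = [gr1 for rank, _, gr1 in ranked if rank <= cutoff_rank]
    late = [gr1 for rank, _, gr1 in ranked if rank > cutoff_rank]
    return ratio(early), ratio(late)

# Invented foals, deliberately out of order, for illustration only.
foals = [
    ("Foal E", date(1999, 2, 20), False),
    ("Foal B", date(1986, 2, 3), True),
    ("Foal A", date(1986, 1, 10), False),
    ("Foal F", date(2000, 5, 5), False),
    ("Foal C", date(1987, 3, 1), True),
    ("Foal D", date(1988, 4, 15), False),
]
ranked = assign_foal_ranks(foals)
early_ratio, late_ratio = gr1_ratio_split(ranked, cutoff_rank=4)
```

With real data the cutoff would be something like the 1350 foals mentioned above, and the two ratios would be the before and after figures.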
We looked at this phenomenon in 30 great stallions, and the result is that on average 85% of their Gr1 and Gr2 winners are sired in the first 55% of their entire foal count. Depending on the number of mares that they covered, this tipping point usually occurs somewhere between 16 and 18 years of age. Somewhat perversely, this is usually when the stallion is at the height of his commercial powers. If a stallion retires at the age of four, covers his first mares at age five, and has his first runners at age eight, he is usually 10 before his quality is established. It takes at least another two seasons of sustained results on the racetrack, however, before he reaches his maximum service fee, which invariably occurs at age 12. But within four to six years of that point, presuming he is serving large books of mares, he isn't the same stallion that established his merit.
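The "85% in the first 55%" measure described above could be computed along the following lines, given each stallion's total foal count and the foal ranks of his Gr1 and Gr2 winners. This is only a sketch: the function name, the data layout and the two made-up stallions are assumptions, and the output is not a reproduction of our 30-stallion result.

```python
def early_share_of_winners(winner_ranks, total_foals, early_fraction=0.55):
    """Share of top-level winners whose foal rank falls in the early part
    of the sire's career, defined as the first early_fraction of all foals."""
    if not winner_ranks:
        return None
    cutoff = early_fraction * total_foals
    early = sum(1 for rank in winner_ranks if rank <= cutoff)
    return early / len(winner_ranks)

# Invented data: (foal ranks of Gr1/Gr2 winners, total foals sired).
stallions = {
    "Stallion X": ([5, 20, 40, 50, 90], 100),
    "Stallion Y": ([10, 30, 60, 150], 200),
}
shares = {name: early_share_of_winners(ranks, total)
          for name, (ranks, total) in stallions.items()}
average_share = sum(shares.values()) / len(shares)
```

Averaging per-stallion shares, as here, treats every stallion equally regardless of how many foals he sired; pooling all winners before taking the share would be a different, equally defensible choice.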
Another interesting stallion in which to look at this phenomenon is the fantastic New Zealand stallion Zabeel. He is still covering mares this season at age 26. In his first 10 crops, totalling 1005 foals, he produced stakes winners at a high rate of 10% of foals born (SW/Fls). In his next five crops, sired at his most expensive fee and presumably with yearlings going to the best trainers, he drops to a 6.7% SW/Fls stallion. In his current 6yo, 5yo and 4yo crops combined, his SW/Fls ratio has dropped to just 3.9%. When Zabeel does have a good Gr1 winner now, it is usually from a repeat mating (Gondokoro and, earlier, Zabeelionaire) or out of a really good race mare on a good cross (Zydeco, out of Gr1 winner All Time High). Age and frequency apparently do weary them.
The reason for this drop-off, even in the best of stallions, is unclear. There could be any number of reasons or theories for it, but perhaps the most plausible is that sperm production is a replication process influenced by methylation patterns; the process could therefore easily accumulate progressive mutations and be heavily influenced by the overall health of the stallion. An observation that may also have some validity is that in wild horse herds, most stallions acquire a small harem of mares at age 5 or 6, with this harem increasing in size through about age 12 and declining thereafter. Maybe nature is telling us something.
There are a number (literally over 100) of data points that we have modeled for racing class. Depending on the model that you are looking at (a model for sprinters gives different data points than one for distance runners), there are about 12 that tell you as much as you can know, using statistics alone, about the likely performance of a horse. From there, using genetics and cardiovascular scores as an overlay on the horses that the model predicts to be elite gives greater certainty to the overall prediction.