The Sales Orphan
Craig Bernick, president of Glen Hill Farm, a racing & breeding business based in Ocala, Florida, penned a timely tweet on the eve of the September Yearling Sales.
"Unfortunately there will be many orphans generated by yearling buyers over the next few weeks. The game is set up to create them."
Firstly, we have what is known in data science as a highly imbalanced dataset. That is, the positive case (a fast racehorse) is far less common in the population than the negative case (a slow horse). In fact, depending on the year and on how quickly or slowly the American Graded Stakes Committee adds or removes graded stakes races, elite runners make up only 3% to 5% of the September Sale population, which is also the rate at which you would find one by random chance.
What this means is that if you took a random selection of about 20 to 30 yearlings out of the sale, on average you should get one good horse by chance. It makes you think about yearling buyers who consistently fail to buy an elite horse: they're not unlucky; they are actively selecting against elite runners!
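To put rough numbers on this, here is a small sketch that treats each yearling as an independent draw with a fixed chance of being elite. That independence is a simplifying assumption, the 3% and 5% rates come from the text, and the function names and the 100-purchase buyer are hypothetical. It shows the expected number of elite horses in a random selection, the chance of getting at least one, and how unlikely it is to miss entirely across many purchases by luck alone.

```python
def expected_elite(n: int, base_rate: float) -> float:
    """Expected number of elite runners among n randomly chosen yearlings."""
    return n * base_rate


def prob_at_least_one(n: int, base_rate: float) -> float:
    """Chance that n independent random picks contain at least one elite runner."""
    return 1 - (1 - base_rate) ** n


def prob_none(n: int, base_rate: float) -> float:
    """Chance that n independent random picks contain no elite runner at all."""
    return (1 - base_rate) ** n


if __name__ == "__main__":
    # Base rates quoted in the text; independence between picks is assumed
    for rate in (0.03, 0.05):
        for n in (20, 30):
            print(
                f"rate={rate:.0%} n={n}: expected={expected_elite(n, rate):.2f}, "
                f"P(at least one)={prob_at_least_one(n, rate):.0%}"
            )
    # Hypothetical buyer: 100 yearlings bought, picked at random at a 4% base rate
    print(f"P(no elite in 100 random picks at 4%) = {prob_none(100, 0.04):.1%}")
```

Under these assumptions "one good horse by chance" is an average rather than a guarantee, but a long run of purchases without a single elite runner quickly becomes very unlikely to be bad luck alone.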
Overall, however, selecting an elite runner is a difficult endeavor. The imbalance is not as extreme as in problems such as fraud detection, where the positive case is usually less than 1% of the overall population, but at 3-5% it is still very hard.
Secondly, and probably more importantly, the feedback loop for learning to select horses is very poor. We learn best when we get instant feedback on the selections we have made, because that feedback lets us reduce our false positives (FP = horses that we think are fast but are slow) and false negatives (FN = horses that we think are slow but turn out to be fast), which in turn increases our true positives (TP = horses that we think are fast that are fast) and true negatives (TN = horses that we think are slow that are slow).
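The four outcomes can be made concrete with a small sketch that tallies a buyer's past calls against what actually happened. The data layout (a list of (predicted_fast, turned_out_fast) pairs) and the function name are hypothetical, chosen only for illustration.

```python
from collections import Counter


def confusion_counts(calls: list[tuple[bool, bool]]) -> Counter:
    """Tally TP, FP, FN and TN from (predicted_fast, turned_out_fast) pairs."""
    counts = Counter({"TP": 0, "FP": 0, "FN": 0, "TN": 0})
    for predicted_fast, turned_out_fast in calls:
        if predicted_fast and turned_out_fast:
            counts["TP"] += 1  # thought fast, was fast
        elif predicted_fast:
            counts["FP"] += 1  # thought fast, was slow
        elif turned_out_fast:
            counts["FN"] += 1  # thought slow, was fast
        else:
            counts["TN"] += 1  # thought slow, was slow
    return counts


if __name__ == "__main__":
    # Hypothetical record of five yearlings a buyer looked at
    calls = [(True, True), (True, False), (False, True), (False, False), (False, False)]
    print(dict(confusion_counts(calls)))
```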
Generally speaking, those who buy within the industry are judged by Precision, or strike rate: the percentage of the horses predicted "yes" (that is, bought) that turn out to be elite. So, if the average buyer/agent buys 10 horses and 1 turns out to be good, their Precision is 10%: 1 TP from 10 total selections (1 TP + 9 FP).
What we don't measure agents and buyers by, but what is equally important, is Recall: of all the horses an agent/buyer saw that turned out to be elite, the percentage they actually selected, or TP / (TP + FN). It's just as important to learn from the horses we thought were slow that turned out to be fast.
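Both measures follow directly from the TP, FP and FN counts defined earlier: Precision = TP / (TP + FP) and Recall = TP / (TP + FN). The sketch below uses the 10-horse example from the text (1 TP, 9 FP); the number of elite horses the buyer saw but passed on (FN = 4) is a made-up figure for illustration.

```python
def precision(tp: int, fp: int) -> float:
    """Strike rate: share of the horses selected that turned out elite."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    """Share of all elite horses the buyer saw that they actually selected."""
    return tp / (tp + fn) if tp + fn else 0.0


if __name__ == "__main__":
    tp, fp = 1, 9  # the 10-horse example from the text
    fn = 4         # hypothetical: elite horses the buyer saw but passed on
    print(f"Precision: {precision(tp, fp):.0%}")  # 1 / 10
    print(f"Recall:    {recall(tp, fn):.0%}")     # 1 / 5
```

Returning 0.0 when there are no selections (or no elite horses seen) is just a convenience choice. In this hypothetical, a 10% strike rate sits alongside a 20% Recall: four of the five elite horses this buyer saw were passed over, and those four misses are the ones worth learning from.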
The challenge that all agents and buyers face is the long time between when they see a horse as a yearling and when they get feedback on it. Sometimes it is a couple of years or more. By then it isn't possible for the agent/buyer to remember exactly what they saw when they looked at the yearling they thought would be slow that turned out to be fast, or where they made a mistake. We tend to rationalize such misses away, but they are a product of a poor feedback loop.
It is for these two reasons (the long time between when a yearling is seen and when its outcome is known, and the low percentage of positive cases among all cases, 3-5%) that an agent for whom just 10% of purchases become elite is considered quite successful.
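One way to see why 10% counts as success is to compare it with the 3-5% base rate. The ratio of a buyer's strike rate to the base rate is often called lift in data science (a term not used in the original post); the sketch below simply divides one by the other, using the rates quoted earlier.

```python
def lift(precision: float, base_rate: float) -> float:
    """How many times better than random selection a buyer's strike rate is."""
    return precision / base_rate


if __name__ == "__main__":
    agent_precision = 0.10  # a 10% strike rate, as described in the text
    for base_rate in (0.03, 0.05):
        print(f"base rate {base_rate:.0%}: lift = {lift(agent_precision, base_rate):.1f}x")
```

On these figures, a 10% agent is finding elite runners at roughly two to three times the rate of random selection.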
Which leads to the obvious answer: a superior selection and feedback loop, one that records what it sees and then works on reducing false positives and false negatives once outcomes are known, will over time result in much better Precision... and fewer orphans.
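A minimal sketch of the "records what it sees" half of such a loop, assuming a simple in-memory log: each assessment is stored at sale time with the buyer's call and notes, the outcome is filled in years later, and the misclassified horses (false positives and false negatives) are pulled out alongside the original notes for review. All class, field and function names here are hypothetical; a real system would also need persistent storage and an agreed definition of "elite".

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Assessment:
    hip: str                             # hypothetical sale identifier (e.g. hip number)
    predicted_elite: bool                # the call made when the yearling was seen
    notes: str                           # what was observed at the time
    became_elite: Optional[bool] = None  # filled in once the outcome is known


@dataclass
class FeedbackLog:
    records: dict[str, Assessment] = field(default_factory=dict)

    def record(self, hip: str, predicted_elite: bool, notes: str) -> None:
        """Store the assessment at sale time, before any outcome is known."""
        self.records[hip] = Assessment(hip, predicted_elite, notes)

    def set_outcome(self, hip: str, became_elite: bool) -> None:
        """Attach the race outcome, typically years after the sale."""
        self.records[hip].became_elite = became_elite

    def mistakes(self) -> tuple[list[Assessment], list[Assessment]]:
        """Return (false positives, false negatives) among resolved records."""
        resolved = [a for a in self.records.values() if a.became_elite is not None]
        fps = [a for a in resolved if a.predicted_elite and not a.became_elite]
        fns = [a for a in resolved if not a.predicted_elite and a.became_elite]
        return fps, fns


if __name__ == "__main__":
    log = FeedbackLog()
    # Hypothetical example notes recorded at the sale
    log.record("101", True, "correct, good walk")
    log.record("102", False, "slightly back at the knee")
    log.record("103", False, "small, plain")
    # Outcomes arrive later; 103 is still unresolved
    log.set_outcome("101", False)
    log.set_outcome("102", True)
    fps, fns = log.mistakes()
    for a in fps + fns:
        print(a.hip, "predicted elite" if a.predicted_elite else "passed", "->", a.notes)
```

The design choice that matters is that the original call and notes are never overwritten when the outcome arrives, so the reasoning behind each miss is still available when it finally becomes a known false positive or false negative.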