Data and More Data
Popular Science, an online magazine dedicated to new technology and science, has a series of articles online regarding the use of data, and in particular data mining, to make informed decisions. One of the more interesting articles was about the Irish bookmaker Paddy Power, which uses both real data (race form, pedigree, etc.) to decide how to frame a market for a race, and crowdsourced information and betting patterns to make alterations to that market. All of it is driven by data.
You can click on the article here. What we found most interesting was the comment made about the difference between weak and strong data, and how strongly you can make recommendations or comments based on that data. Prediction of racehorse performance is no different. The more data you have to make a decision, the better. What we are also seeing is that at a genomic level there are types of horses that are significantly more predictable than others. It is hard to predict an early-maturing two-year-old that you see racing at Hollywood Park or Belmont in the summer and winning stakes races. Some of them are predictable, usually the ones like, say, Posse, who go on to become better racehorses as they get older, but the bulk of them are really hard to predict. Equally, the longer-distance horses, especially those that get good late in life and want to race on turf, are problematic. Most of this variability, of course, comes down to the fact that you only have to beat the horses that turn up on a given day.
Defining the phenotype or outcome that you are trying to measure is vitally important in this game. We make no bones about the fact that we are primarily interested in buying the late two-year-old that matures at three to win consistently at graded stakes level. There is a reason for this. Most owners, trainers and indeed horses are trying to do the same, so there are more horses to measure and more data. More data, of course, means more accurate prediction.