Unfinished Portrait: Official Data on Equine Injuries

The Jockey Club’s press release summarizing the findings of its Equine Injury Database (EID) paints an incomplete picture of the impact of racing surfaces on injuries to racehorses. The headline conclusion is impressive: synthetic surfaces in North America had a fatal breakdown rate of 1.22 per 1,000 starts versus 2.11 for dirt surfaces.

The conclusion above is an undisputed fact given the data collected. What is not indisputable is the causal reading of it: “synthetic surfaces prevent 0.89 racehorse deaths (per 1,000 starts) versus dirt tracks”. Another common phrasing might be “if all dirt surfaces in North America were converted to synthetic, we could reduce on-track breakdowns by 40%”.
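
The arithmetic behind those two phrasings can be checked directly from the headline rates. A minimal sketch (the variable names are mine, not the Jockey Club's):

```python
# Headline fatal breakdown rates per 1,000 starts, as quoted from the press release.
dirt_rate = 2.11
synthetic_rate = 1.22

# Absolute gap, in deaths per 1,000 starts.
absolute_diff = dirt_rate - synthetic_rate

# Relative gap: the share of the dirt rate that the absolute gap represents.
relative_diff = absolute_diff / dirt_rate

print(f"absolute difference: {absolute_diff:.2f} per 1,000 starts")
print(f"relative difference: {relative_diff:.0%}")
```

The relative gap works out to a little over 40%, which is where the conversion claim comes from. Neither number says anything about cause; they only restate the two observed rates.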

I cannot call those two statements indisputable because other factors come into play. The factors that need to be isolated include track policies affecting horse safety, the track personnel responsible for implementing those policies, and the racing class of the horses running, among many others.

What has been most frustrating about the discussion around this, and around Keeneland’s switch to dirt, is that the full Equine Injury Database has not been made public so that a true, transparent investigation into horse safety can be made. The Jockey Club has hired an equine breakdown specialist, a well-respected epidemiologist named Tim Parkin, to parse the full data. I have no doubt that Dr. Parkin can do the job in question; however, I believe a stronger result would come out of a public and/or peer-reviewed process.

The ultimate goal would be to provide a magnitude, within a range of confidence, for the impact of track surface on breakdowns, and also a magnitude for the other important factors, with policies, personnel, and racing class at the front of the line. I firmly believe that synthetic surfaces play a (statistically) significant role in equine safety. I believe, with equal fervor, that the impact is not 0.89 deaths per 1,000 starts – I believe it is lower than that.

Why? Because I have run some numbers. As part of the 5-year summary of the database, The Jockey Club (TJC) provided the same summarized breakout for tracks that were willing to make their breakdown rates public, 28 in all. (This includes all NYRA and California tracks, Keeneland and Gulfstream – all these tracks should be praised for sharing their results.) Of course, TJC released these stats only in summarized PDFs by track; while very data-unfriendly, this is more granularity than we have seen before.

Not one to let a file format get in my way, I imported the granular data from the PDFs of all 28 public tracks to create a public-reported EID. Fortunately for our number-crunching exercise, all synthetic tracks save Arlington Park were represented in the public data. There is a lot of data to crunch through, but two results have jumped out at me that I want to share. Moreover, I want people to have access to this data so they can confirm or refute my results and find things on their own. I am making the data available here:

TJC – Public EID on Google Docs

Public Equine Injury Database Summary Results

Preface: It is important to note that this database has a lot of variance. There are 204 separate Track-Surface-Year combinations in the data. The public dataset has a weighted average of 1.72 deaths per 1,000 starts (DPK) for all surfaces, but equally weighting each track comes out to 1.96. More importantly, the standard deviation for this sample of 204 datapoints is 1.27, which is 65% of the mean. My rule of thumb is that a standard deviation of 25% of the mean is “normal”, so the EID data are high-variance by comparison. Higher variance generally makes it harder to attribute an effect to any one variable. This alone gives me pause when drawing conclusions from the dataset.
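
To make the difference between the two averages concrete, here is a small sketch. The three rows are hypothetical and are not taken from the public EID; only the final ratio uses figures quoted above. The choice of sample standard deviation is also my assumption.

```python
from statistics import mean, stdev

# HYPOTHETICAL rows: (label, starts, fatalities). Not figures from the public EID.
rows = [
    ("Track A / dirt / 2009", 8000, 12),
    ("Track B / turf / 2009", 1500, 5),
    ("Track C / synthetic / 2009", 6000, 6),
]

def dpk(starts, deaths):
    """Deaths per 1,000 starts."""
    return 1000 * deaths / starts

# Weighted average: pool every start and every fatality, then take one rate.
weighted_dpk = dpk(sum(s for _, s, _ in rows), sum(d for _, _, d in rows))

# Equal-weighted average: one rate per row, each row counted once.
row_rates = [dpk(s, d) for _, s, d in rows]
equal_dpk = mean(row_rates)

# Standard deviation as a share of the mean.
# Sample standard deviation is an assumption; the text does not say which was used.
cv = stdev(row_rates) / equal_dpk

# The ratio quoted in the text, from the reported 1.27 and 1.96.
reported_cv = 1.27 / 1.96

print(f"weighted {weighted_dpk:.2f}, equal-weighted {equal_dpk:.2f}, cv {cv:.0%}")
print(f"reported std dev / mean: {reported_cv:.0%}")
```

In this toy example the small row with a high rate pulls the equal-weighted average above the pooled one, which would be consistent with the direction of the 1.96 versus 1.72 gap in the real data.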

1. Tracks with Synthetic Surfaces also have safer turf courses

Turf                   DPK
All Turf               1.54
Tracks w Synth Main    1.39
All Other              1.58

Since synthetic mains were so well-represented in the data, we could actually break out the results of their turf courses separately. While it’s a small relationship (12% lower), tracks with synth mains had safer turf courses than all other turf courses reported. I have not tested for statistical significance, but 132,000 turf starts are in the summary.

This indicates to me that perhaps there are policies and personnel in place at these tracks that contribute to overall racehorse safety, and the magnitude could be as much as 0.20 DPK.
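
For anyone who wants to run the significance test I skipped, one common option is a two-proportion z-test on fatalities per start. The sketch below is only an outline of that test, not a result. The counts in the usage lines are placeholders chosen to be roughly consistent with the rates quoted above; they are not taken from the dataset, and the real per-track counts from the spreadsheet should be substituted.

```python
from math import sqrt
from statistics import NormalDist

def two_rate_z_test(deaths_a, starts_a, deaths_b, starts_b):
    """Two-sided two-proportion z-test comparing fatality rates per start."""
    rate_a = deaths_a / starts_a
    rate_b = deaths_b / starts_b
    # Pooled rate under the null hypothesis that both groups share one rate.
    pooled = (deaths_a + deaths_b) / (starts_a + starts_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / starts_a + 1 / starts_b))
    z = (rate_a - rate_b) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# PLACEHOLDER counts (assumption): turf at synth-main tracks vs all other turf.
z, p_value = two_rate_z_test(39, 27_800, 165, 104_200)
print(f"z = {z:.2f}, two-sided p = {p_value:.2f}")
```

A normal approximation is reasonable here because the start counts are large; with the full database, a model that also includes class and distance would be a better tool than a single two-group comparison.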

2. The relationship between distance and DPK persists on a synthetic sample

Turfway Park and Presque Isle Downs are two tracks in the public dataset. What makes them uniquely valuable to the analysis is that they have no turf course, so their distance-to-DPK relationship is isolated to synthetics. (Again, the full TJC data could isolate this for each track and surface, but we’re using what we have.)

PID + TP     5Y DPK    5Y DPK
<6f          1.22      1.22
6.0-7.5f     0.95      0.96 (6f+)
8f+          0.98
Total        1.02      1.02

Over 66,000 races, races run at less than 6f were 25% more likely to have a fatality than 6f+ at these two tracks.

If distance – or, more importantly, some other variable (such as class) for which distance is a proxy – were not a factor, then we would expect the racing surface to reduce or eliminate the relationship with distance. (For all races, <6f is 20% higher.) This factor, be it distance or class, may have a DPK magnitude of 0.24 when applied to the higher all-weather (AW) DPK stat.

3. When looking at a certain class level on dirt in the public database, shorter distances are not less safe than longer (i.e. the distance relationship disappears)

California Racing Fairs all report data and all run at a similar class level – basically lower level claimers topping out with an infrequent allowance or overnight stake. Plus the majority have only dirt tracks. Look at the data:

CRF          5Y DPK    5Y DPK
<6f          2.31      2.31
6.0-7.5f     2.42      2.17 avg (6f+)
8f+          1.72
Total        2.21      2.21

I grant that this says little about the safety of dirt vs synthetic. A statistician might conclude from this data, nonetheless, that the observed relationship between distance and DPK is not as strong when controlling for a certain class level. Therefore class, and not distance, is a strong candidate as the explanatory variable. Mainly, it reinforces the need to look at the data more closely.
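
The comparison between the two tables can be reduced to one ratio per group: how much higher the <6f rate is than the combined 6f+ rate. A minimal sketch using the rounded figures from the tables above (the function and dictionary names are my own):

```python
# 5-year DPK figures copied from the two tables above (rounded as published).
groups = {
    "PID + TP (synthetic only)": {"<6f": 1.22, "6f+": 0.96},
    "California Racing Fairs (mostly dirt)": {"<6f": 2.31, "6f+": 2.17},
}

def sprint_excess(rates):
    """Relative excess of the <6f rate over the combined 6f+ rate."""
    return rates["<6f"] / rates["6f+"] - 1

for name, rates in groups.items():
    print(f"{name}: <6f is {sprint_excess(rates):+.0%} vs 6f+")
```

On these rounded figures the sprint excess at the fairs is a fraction of what it is at the two synthetic-only tracks. That is only a descriptive comparison between two groups that differ in surface as well as class, so it cannot separate the two on its own.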

Conclusions

I will not claim that what I have provided above definitively proves my case. That’s why I’m making this database publicly available (again, here) so others can test these claims and look at the data more robustly. There are definitely other interesting observations in the public dataset that I think both strengthen AND weaken the synthetic surface claims. The Jockey Club needs to do something to get the full dataset in front of more eyeballs. But I’m pretty confident that three conclusions, which are really no-brainers, are true:

  1. The best racetracks do more than install synthetic surfaces to ensure equine safety. I believe installing a synthetic surface was, for a time, a credible additional commitment to safety. People and policies matter a lot, perhaps more than surface.
  2. Racing class is an important predictor of the likelihood of a breakdown – the link needs to be investigated and quantified.
  3. Synthetic surfaces do indeed reduce fatal injuries vs other surfaces, but not by 40%, and not without a pre-established commitment to safety from the track and its personnel.

My statistical instinct tells me the real preventative value of synthetics is in the 15-25% range, which is still really great, about 65 horses/year, more if we factor in training. I would love to know for certain – this is a call to make sure that happens as soon as possible.
