Finding a Birdie in a Haystack

Happy New Year everyone! I took some time off around the holidays, but I am excited to get back to writing about how a data-driven approach can help you meet your golfing resolutions this year!

This post continues my introduction to golf statistics and how to use them. Previously I discussed how a player looking to get better must first determine what getting better looks like. Assuming you have decided what improvement means to you, how do you determine which stats to measure in order to get you there?

With so many statistics used on TV and in magazines, it is crucial to determine which are most useful, and which are entertaining noise. This article will hopefully help differentiate the two.

The 7 Characteristics of Good Golf Stats

Some golf statistics have been used for decades – others were only recently invented. With limited time and resources, most players want to track only the statistics that will best help their game. I believe that the best and most useful statistics are:

1. Reliable

A statistic is only useful if it correlates well with performance (whatever performance means). For example, a friend of yours might suggest Patrick Reed has a good chance of winning the Masters this year, since he won it last year (perhaps suggesting he does well on the course). However, back-to-back wins have only happened 3 times in the 81 Masters tournaments. No matter how much logical sense it makes, if a statistic is not reliable it should not be tracked.

2. Predictive

Some statistics are only useful in retrospective discussions of a round, tournament, or season. While these analyses can be useful in understanding why the results occurred as they did, statistics driving improvement are most useful when they predict future performance.

Here is an example from Mark Broadie’s book, Every Shot Counts. Broadie defines a stat called “Putting’s Contribution to Victory (PCV),” which is the percentage of a winner’s advantage that is due to putting.¹ This stat can help us determine whether a player won because he putted well or because the rest of his game was exceptional (or a combination). For instance, Vijay Singh won the 2006 Barclays Classic with a PCV of 79% (i.e. most of his success was due to his putting). On the other hand, Singh won the 2008 WGC-Bridgestone with a PCV of -37% (i.e. he won despite putting below average for the week).

The PCV stat helps us contextualize the results of a tournament, and should probably be used more among TV analysts. However, it does not do a very good job of informing us about what it takes to win or who is likely to win next. The two examples above show that Vijay Singh has at times relied almost entirely on above-average putting to win, and at times compensated for below-average putting to win. We cannot say whether Vijay Singh’s recipe for success involves getting hot with his putter or getting hot with his full swing. It might be interesting to note that Vijay’s average PCV is 20%, but if he came to us before a tournament and asked whether he should work more on putting or on the rest of his game, PCV by itself would not give us an answer.²

PCV is an example of an extremely informative stat that is not (by itself) particularly predictive. We can learn a lot about why a player won, but we cannot really say what a player should do to win more often.
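To make the arithmetic concrete, here is a minimal Python sketch of the ratio behind PCV: strokes gained putting divided by total strokes gained. The function name and the numbers are my own illustrations (made-up values, not real tournament data), so treat it as a sketch rather than Broadie's exact procedure.

```python
def putting_contribution(sg_putting: float, sg_total: float) -> float:
    """Share of total strokes gained that came from putting."""
    if sg_total == 0:
        raise ValueError("total strokes gained is zero; the ratio is undefined")
    return sg_putting / sg_total


# Made-up numbers for illustration only, not real tournament data.
hot_putter = putting_contribution(sg_putting=3.0, sg_total=4.0)
cold_putter = putting_contribution(sg_putting=-1.0, sg_total=4.0)

print(f"{hot_putter:.0%}")   # 75%
print(f"{cold_putter:.0%}")  # -25%
```

A negative result simply means the player lost strokes on the greens and made up for it elsewhere, which is how a winner can end up with a negative PCV.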

3. Informative

A statistic is only useful if it provides insight as to where performance gains or losses arise. Even if a statistic is reliable and predictive, it is hard to use it in an improvement plan if we do not know the reasons behind the results.

The greens in regulation (GIR) statistic is a great example of both informative and uninformative qualities in a stat. GIR measures the fraction of holes on which a player reaches the green with two putts left for par (that is, in par minus two strokes or fewer). It is also highly correlated with scoring average and is generally predictive of future low scores. Because the GIR statistic does not involve any short game shots, it is particularly useful for comparing a player’s ball striking and short game abilities (low scores with a low GIR tell us the player is getting up and down for par a lot, high scores with a high GIR mean a player is probably struggling with his putting, etc.). In this way, GIR is very informative.

However, once we identify overall ball striking abilities through GIR, we get no more detail about full swing strengths and weaknesses through GIR alone. A low GIR could be because a player struggles with her irons or with her tee shots. We don’t know if the player needs to work on hitting longer tee shots, finding more fairways, improving iron contact, or making less risky decisions. In this way, the GIR statistic is very uninformative. Most statistics provide limited insight on their own – often they must be used in combination to be truly informative.
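If you keep your own scorecards, GIR is simple to compute. The sketch below assumes you record, for each hole, the par and the number of strokes it took to get onto the putting surface. The `Hole` record and function names are hypothetical, and the sample holes are invented.

```python
from typing import NamedTuple


class Hole(NamedTuple):
    par: int
    strokes_to_green: int  # strokes taken to get the ball onto the putting surface


def hit_green_in_regulation(hole: Hole) -> bool:
    # Regulation leaves two putts for par: on the green in (par - 2) strokes or fewer.
    return hole.strokes_to_green <= hole.par - 2


def gir_fraction(holes: list[Hole]) -> float:
    if not holes:
        raise ValueError("need at least one hole")
    return sum(hit_green_in_regulation(h) for h in holes) / len(holes)


# Invented holes for illustration only.
sample = [Hole(4, 2), Hole(3, 1), Hole(5, 4), Hole(4, 3)]
print(gir_fraction(sample))  # 0.5
```

Notice that the result says nothing about why the two greens were missed, which is exactly the limitation described above.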

4. Applicable

The best statistics apply to all golfers regardless of ability, gender, age, or playing style. However, some statistics are only applicable for certain players or situations, and an important part of data-driven analysis involves recognizing which statistics are most useful for the particular player.

For example, if you have a high handicap, a great statistic to track is the number of “big number” holes you have per round (maybe 7 or higher). Most high handicap players I know can string together a long streak of bogeys and pars, only to lose ground with an 8 or 9 (or worse). Setting a goal of eliminating those disaster holes and tracking them is a good way to see rapid improvement.

On the other hand, tracking big numbers makes no sense for tour pros because their disaster holes are extremely rare. Similarly, a tour pro may be interested in tracking birdies per round, while birdies are too rare for an average player to find insight in tracking.
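Counting big numbers takes nothing more than your hole-by-hole scores. Here is a small sketch, assuming each round is stored as a list of hole scores and using the threshold of 7 suggested above. The function name and the scorecards are made up for illustration.

```python
def big_numbers_per_round(rounds: list[list[int]], threshold: int = 7) -> float:
    """Average count of holes per round scored at or above the threshold."""
    if not rounds:
        raise ValueError("need at least one round")
    total = sum(1 for scores in rounds for s in scores if s >= threshold)
    return total / len(rounds)


# Two made-up nine-hole scorecards.
rounds = [
    [5, 6, 4, 8, 5, 5, 9, 4, 6],
    [5, 5, 7, 4, 6, 5, 5, 6, 5],
]
print(big_numbers_per_round(rounds))  # 1.5
```

Changing `threshold` lets you adapt the stat to your own level, for example counting double bogeys or worse as your game improves.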

5. Accessible

Some statistics are easy to measure, record, and understand. Others require a vigilant player to make continual, meticulous notes. While a player is most informed by careful analysis of every shot from every perspective, much of that work may be a waste of time. Even worse, a surfeit of information can be overwhelming, and finding useful insight amongst the noise may be impossible.

Thus a statistic can be accessible in three ways: easy to measure, easy to calculate, and easy to understand. Technology has made many statistics more accessible, with systems like Arccos and Game Golf automatically tracking each shot (and of course Tour players have the benefit of ShotLink to gather all of their stats). There are numerous apps available to take care of calculations, and machine learning tools have the potential to simplify the analysis. Still, the best statistics are easy to record and analyze.

6. Relative or Absolute (where appropriate)

Virtually every statistic falls into one of two categories: Relative statistics are computed by comparing a player’s performance to others (strokes gained, total driving, any sort of ranking, etc.), while absolute statistics only consider details of the player’s performance in isolation (greens in regulation, putts per hole, bogeys per round, etc.).

Neither category is inherently better than the other, but it is important to recognize when either type is most applicable. Relative statistics are most useful when the performance goal involves comparisons to other players and when those comparisons are plentiful. Absolute statistics are better when there is limited data about a player’s opponents or when comparisons with the average are not useful.

A tour player looking to make more cuts will probably benefit from relative statistics, since his goal is to improve relative to his peers. However, if this tour player is in his 50s, he may not find it useful to learn his driving distance is below average, since he faces physical factors limiting his improvement ability. An average player looking to win his weekly matches may need to look to absolute stats, since he likely does not have statistical data about his buddies.

7. Grounded

This quality is very similar to Informative, but I believe it deserves a description of its own. By grounded, I mean a statistic is grounded in golf fundamentals. This is best explained by analogy: In my training as a physicist, we were taught not just to identify trends and determine their origin, but to connect our results to physics fundamentals. Since the laws of physics have been developed over many years utilizing many scientific approaches, our results are usually more robust if we can ground them in these fundamentals.

In a similar way, golf has its own fundamentals that have been refined through generations of players, coaches, physiologists and sports scientists. For instance, we typically break our shot types into categories – driving, iron play, putting, etc. Within these categories, we can be even more specific (draw/fade, 7 iron performance, lag putts, etc.), and we expect these categories to be somewhat distinct from each other. Thus, just as scientific data is best when it relates back to fundamentals, golf stats are best when we can see the roles of each of these individual categories.

A statistic like scrambling is grounded in the fact that short game is different from full swing shots, but it does not distinguish between chipping and putting – two skills that we know are distinct. Stats like proximity to the hole or strokes gained allow us to distinguish between chipping and putting skills.

Other fundamentals could include the physics of a golf swing, course management, psychology, etc. Note that being grounded is not the same as adopting conventional wisdom – rather it is recognizing that golf performance depends on a variety of factors, and we already have some knowledge of what those factors are and how they relate to each other. By grounding a statistic in those factors, we can better recognize where to focus our practice.

Applying it

Each of the qualities listed above is important in evaluating a statistic’s usefulness. No statistic will excel in every criterion, and often the best insight comes from a variety of stats. The average player has limited time for both analysis and practice, and so tracking only the best stats can help simplify and optimize the improvement strategy.

Over the next few weeks I will be reviewing some of the most common golf statistics, evaluating each using these criteria. Stay tuned to learn more about which data are worth your time.

In the meantime, let me know what you think of my characterization. Did I miss anything? Do you think some of these qualities are more important than others? Let me know in the comments and enjoy planning your new season goals!

¹ PCV is computed by dividing a winner’s strokes gained putting by the winner’s total strokes gained on the field.
² With a little more data we can give a better answer. Broadie also defines the Putting Contribution to Scoring (PCS), which is the strokes gained putting divided by total strokes gained (same as PCV but for all rounds). Vijay averaged 1.58 strokes gained per round from 2004-2012, but his PCS is -11%, meaning his putting is typically below the field average. Since his PCV is 20% and his PCS is -11%, this says that his putting is significantly better when he wins than in a typical tournament. With this context we can say that yes, Vijay should probably work more on his putting.