Tuesday, July 12, 2011

Baseball Writers Mark Simon And Tom Tango Tell Us Why We Need A Single-Number Tournament Selection Formula

I read a lot of baseball blogs and articles. One of my favorite baseball statistical research blogs (The Book Blog) linked to a post featuring an interview with ESPN writer (and former/occasional d3hoops contributor) Mark Simon. Here’s what Simon said about one of the “newer” baseball statistics (I’ll tie it into tournament selection later):

What one baseball stat do you think is about to “break out” or is the most effective measuring tool?:

The one that has gained the most traction at ESPN this season is Wins Above Replacement and I think it’s because it’s a simple concept. You want to evaluate how good a player is: you can’t add his batting average and fielding percentage. That’s silly. But you can look at other factors and how they impact winning.

Fangraphs has its WAR and Baseball-Reference has one as well. But in truth, everyone has their own WAR.

My dad and I were talking about this the other day. He was talking about why he thinks no one in baseball is better now, and what he was doing was processing all the factors he values…he puts a higher value on speed (and triples) than you or I might…and he thinks there is a “fan popularity” impact for every player.

In his mind, he’s smushing all those factors together, just as the Fangraphs version and BB-Ref versions do. His version is personal. He and I don’t have to agree. But it makes for the most fun kind of baseball discussion.

And here’s further commentary by The Book Blog author Tom Tango (Tangotiger):

We all come up with our “single number”, even though we kick and scream that we shouldn’t come up with a single number. If one guy argues that Felix is better than Lincecum, and the other argues the opposite, then guess what: they’ve each “smushed” a bunch of parameters, considerations and gut feelings to get to their final opinion.

I remember an old boss of mine deriding the idea of a spreadsheet that would take a bunch of factors into consideration to come up with everyone’s rating at the office, and, in turn, everyone’s salary. He said that he has to do everything on a case-by-case basis.

But, lost to him is that, in the end, everyone DOES get a final number: a salary. So, you can have a consistent process, that considers everything objective and subjective. Or, you can consider those same objective and subjective things, and smush them together in your mind on a case-by-case basis. You are STILL considering the exact same things.

The difference is that by going case-by-case you may be applying different weights to different parameters for different people as the mood strikes you. If you have a process, that doesn’t happen.

I kind of hate to rip off large chunks of each post like this, but I think the entire context here is important to making my point.

In order to make tournament selections and seedings, the Division III Selection Committee is whittling each team down to a single “number” (it’s not an actual number, but they do have to eventually come up with rankings and decisions). They are coming to the conclusion that team A should be in while team B should be out. This team will host a round, but that team won’t.

As far as the committee members have ever admitted, they don’t actually compute a single number. This has always disturbed me, because if you’re not even trying to treat teams objectively, then there’s no way that each team is getting a fair shake.

Is it possible that a “computer ranking” spits out a perfect tournament field? Probably not. But has the NCAA committee ever produced one either? No.

Of course you couldn’t just assign weights and numbers willy-nilly (as it seems they did with the home-road multiplier this past year). You would need to approach the formula in a systematic way.

Take each of the primary criteria. Assign each one a weight. Add them together. Determine how close you want the final number to be before you open up the discussion to secondary criteria. Assign a weight to each one. Add them together.
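The steps above can be sketched in a few lines of code. Everything here is a hypothetical illustration: the criteria names, the weights, and the team values are made up, not the committee's actual criteria or any real data.

```python
# A minimal sketch of the "single number" idea: a weighted sum of
# primary criteria. Criteria, weights, and values are all assumptions.

def primary_score(team, weights):
    """Weighted sum of a team's primary criteria."""
    return sum(weights[c] * team[c] for c in weights)

# Hypothetical weights for three made-up primary criteria.
weights = {"win_pct": 0.5, "sos": 0.3, "vs_ranked_pct": 0.2}

# Hypothetical team profiles.
teams = {
    "Team A": {"win_pct": 0.80, "sos": 0.55, "vs_ranked_pct": 0.50},
    "Team B": {"win_pct": 0.75, "sos": 0.60, "vs_ranked_pct": 0.60},
    "Team C": {"win_pct": 0.85, "sos": 0.45, "vs_ranked_pct": 0.40},
}

scores = {name: primary_score(t, weights) for name, t in teams.items()}
ranked = sorted(scores, key=scores.get, reverse=True)

# If two teams' scores land within this margin of each other, the
# comparison would fall through to weighted secondary criteria
# (the same machinery, just with a different set of columns).
TOO_CLOSE = 0.02
```

Note that with these (invented) numbers, Team B edges Team A despite a lower win percentage, because the schedule-strength criteria pull more weight. That's exactly the kind of trade-off the weights make explicit instead of leaving to mood.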

Take the system and test it out against the previous few years’ final regular season results. Did it produce a reasonable tournament field? Fix any obvious problems. Adjust the weights accordingly. Retest. Assign new criteria if needed. Reweigh. Retest. Repeat.
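The test-and-retest loop is just as mechanical. Here is a rough sketch of the backtest: score each past season with the formula, compare the formula's field to the field that was actually selected, and count the agreement. The season data below is entirely made up for illustration.

```python
# Backtesting sketch: does the formula's field match past fields?
# All scores and selections here are invented placeholder data.

def select_field(scores, field_size):
    """Top N teams by formula score."""
    return set(sorted(scores, key=scores.get, reverse=True)[:field_size])

# Hypothetical past seasons: formula scores vs. the actual field.
seasons = [
    {"scores": {"A": 0.67, "B": 0.66, "C": 0.62, "D": 0.60},
     "actual_field": {"A", "B", "C"}},
    {"scores": {"E": 0.71, "F": 0.64, "G": 0.63, "H": 0.59},
     "actual_field": {"E", "F", "H"}},
]

for season in seasons:
    formula_field = select_field(season["scores"], field_size=3)
    matched = formula_field & season["actual_field"]
    print(f"{len(matched)} of {len(season['actual_field'])} teams match")
```

Each mismatch is a prompt to either adjust a weight, add a criterion, or conclude that the formula got it right and the historical pick was the outlier — which is precisely the "glaring flaw" question below.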

If at any point you notice a “glaring flaw” in the system, ask yourself: is the system wrong, or does this team just not deserve to be let in?

My dummy computer system predicted 16 “correct” teams out of 18 for the past tournament. My last team in (Keystone) didn’t make it, but one of my first four out (Mary Hardin-Baylor) did. Carleton and Illinois Wesleyan were outliers with my system. This formula was put together without much of the data that the NCAA has at its disposal. It took me a total of maybe two hours to make and tweak. It would be easy to improve upon.

I’m not saying to overweight win percentage or underweight strength of schedule or anything of the sort. That’s all for the committee to figure out. Just make sure that the weights apply to everyone consistently.