2006-07-28T17:51:23.630+01:00
For the rare few who stumble across this site: nearly all my work is published at http://www.beyondtheboxscore.com, so please take a look.
2006-04-02T17:16:34.833+01:00
Yikes. It is that time of year again. The clinking of turnstiles, the sweet scent of freshly cut turf, the gentle sizzling of hotdogs and the spinning of cotton candy: all symbols of spring blending into summer and the dawn of a new baseball season. Spring training is over, trade talk is put to one side, and every fan of every team is united in the hope, and sometimes expectation, of a precious World Series victory (unless you support the Royals). The start of the new season also brings a plethora of predictions from high-brow newspapers, respected journals, random blogs and, come to think of it, almost every punter who has ever shown a hint of interest in our National Pastime. Given that I haven’t jumped on this particular bandwagon yet, I guess it is time for the prediction / guess / speculation (delete as appropriate) baton to pass to me.

I confess that I have left this article a little late, what with opening day a matter of only hours away! So rather than regurgitate the usual dross about who’s going to win this or that pennant, I’ll do something a little different. I have taken $300 from my own pocket and bet on the losingest (is that really a word?) team in each division. Over the course of the season we’ll see how well I do and how much, if any, money I make. Sounds fun, right? Well, I thought so too until I handed 300 big ones over to a slightly overweight, cigar-chomping bookmaker. So, who did I plump for? Here is the list, with odds:

AL East: Orioles 3/1
AL Central: Royals 1/6
AL West: Mariners Evens
NL East: Nationals 2/1
NL Central: Reds Evens
NL West: Rockies 1/2

Let’s go through each in turn, starting with the AL East. The Os to lose! What is all that about? What about the Devil Rays? I admit this is a close call, but apart from Tejada, possibly the finest shortstop in all of baseball, there is nothing else.
The only thing that can save this team is if Leo Mazzone works his magic and makes stars of a hitherto ropey pitching staff that has lost arguably its greatest asset in BJ Ryan. In my book the Rays have turned the corner, and with superstar talent coming through (Young, Upton, Crawford and Kazmir, to name four) they could surprise everyone. In any case, the Orioles’ odds looked too good to ignore.

In the AL Central the Royals virtually pick themselves, again! There is not much else to say except that any team with a blog dedicated to the quest to lose fewer than 100 games is not in good shape (http://breaking-100.blogspot.com/).

Finally, the AL West. This is a little trickier, but with only four teams I’ll use a process of elimination. The Angels and A’s are genuine contenders, not just for the division but for the World Series, so we can ignore them. The Rangers have upgraded their rotation (Millwood) and shipped out the awfully overrated Soriano, so they should finish with at least an even record. That leaves the Mariners, who, despite having the most exciting young arm in baseball and adding the impressive Johjima, will still struggle. Beltre may regress towards the mean but will still struggle in Safeco’s cavernous outfield, and Seattle’s main acquisition, Jarrod Washburn, had a FIP of 5 last year, a full 2 points above his ERA, suggesting that his 2005 was a fluke. Someone has got to lose and it should be the Mariners.

Phew, the AL summary is over; let’s move on to the NL, starting with the Central. Pirates or Reds, Pirates or Reds, Pirates or Reds? Hmm, I’ll take the Reds, thank you very much. Actually it isn’t as difficult as I made out: the Reds ranked bottom in every major statistical category last season, and projections for this season aren’t much different. I’ll bank my evens odds.

Moving swiftly on to the NL East: this is a straight toss-up between the Marlins and the Nationals. Although the Marlins look marginally worse on[...]
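The bets are quoted as fractional odds, so here is a minimal Python sketch of what each one would pay and what probability the bookmaker is implying. It assumes the $300 was split equally, $50 per team; the post doesn't actually say how the stake was divided. The names `BETS`, `STAKE`, `profit` and `implied_probability` are hypothetical, for illustration only.

```python
from fractions import Fraction

# Fractional odds as quoted in the post: a/b means "win a for every b staked".
BETS = {
    "Orioles": Fraction(3, 1),
    "Royals": Fraction(1, 6),
    "Mariners": Fraction(1, 1),   # evens
    "Nationals": Fraction(2, 1),
    "Reds": Fraction(1, 1),       # evens
    "Rockies": Fraction(1, 2),
}

# ASSUMPTION: the $300 is split equally across the six bets.
STAKE = Fraction(300, len(BETS))


def profit(odds, stake=STAKE):
    """Winnings if the bet comes in, excluding the returned stake."""
    return stake * odds


def implied_probability(odds):
    """Break-even probability for fractional odds a/b, i.e. b / (a + b)."""
    return 1 / (odds + 1)


if __name__ == "__main__":
    for team, odds in BETS.items():
        print(f"{team:10s} profit ${float(profit(odds)):7.2f}  "
              f"implied p = {float(implied_probability(odds)):.0%}")
    total = sum(profit(o) for o in BETS.values())
    print(f"Profit if all six come in: ${float(total):.2f}")
```

On a $50 stake, the 3/1 Orioles bet would make $150, but the 1/6 Royals bet would make only about $8.33. If all six came in, the total profit would be about $383. The implied probabilities show how heavily the bookmaker already fancied the Royals (about 86%) compared with the Orioles (25%).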
2006-03-30T05:59:38.510+01:00
The Book: Playing the Percentages in Baseball is arguably the most important sabermetric publication since The Hidden Game of Baseball over two decades ago. Hyperbole, you might say. Maybe, but consider that the authors, Tom Tango (Tangotiger), Mitchel Lichtman and Andy Dolphin, are probably three of the finest sabermetricians on the planet, and you can begin to understand why I am so excited! Moreover, this book has been two agonizing years in the making.

Baseball is a simple game: win games by outscoring your opponents. And you don’t need to watch much baseball to know that managers will do pretty much anything to eke out that vital victory. That’s because the manager’s job is to make the decisions and trade-offs that maximize the win (or run) potential at every possible juncture.

But are they actually making the right decisions? This is where The Book comes in. Using a variety of statistical techniques and a truckload of data, the authors set out to test some of the many myths that managers swear black and blue by. These debates are played out across a variety of chapters:

Batting and pitching streaks
Batter / pitcher match-ups
Clutch hitting
Batting order
Platooning
Starting pitchers
Relief pitchers
Sacrifice bunting
Intentional walking
Base running
Game theory (responding to your opponent’s actions)

OK, I know what you are thinking. Many of these topics have been discussed before, so what is different about Tango, Lichtman and Dolphin’s approach? Well, amazingly, our pen-toting trio manage to break new ground on pretty much every subject. Part of the joy of reading The Book is the feeling of discovering and learning alongside the authors, so I don’t want to give too much away, but here are a few tasty morsels:

Sacrifice bunting can make sense in certain situations
Disruptive running has an enormous negative influence on batting
Pinch hitting for non-pitchers rarely pays off

And the learning continues across all the debates.
The conclusions are summarized in a box entitled “The Book says”, which contains the pithy takeaways that you’d do well to remember and reflect on when you are watching your next game.

Some reviews I have read commented that The Book is quite technical. I disagree. Sure, you need an aptitude for learning, but the writing is so lucid and exact that a layman with a bit of time on his hands is perfectly capable of picking up the main points.

Another huge plus is the inclusion of a “Toolshed” chapter, as well as a detailed appendix on some of the statistical techniques used. In fact, reading through these sections was such an edifying experience that I have consistently returned to many of the ideas to check my own thinking on the methods. Topics such as regression to the mean and Markov chains are explained succinctly, yet with a clarity that a book like this so often lacks.

The only slightly negative comment I can make is that some of the studies with small sample sizes seem out of place with the overall ethos of the book; here the authors struggle to establish firm conclusions while still persisting in diving deep into the data. Still, even these analyses are a joy to pore over, and they reinforce the central concept of drawing accurate conclusions from data.

In summary, if you are reading this review then buy The Book. To hear the opinions of three of the most respected sabermetricians in baseball is a joy and a privilege. It sets a standard of work for others to aspire to, and I can only hope that volume 2 isn’t another two years in the making.

Buy the book at http://www.insidethebook.com[...]
2006-03-27T05:42:01.453+01:00
I thought I'd post some comments on my line drive article, which appeared on BTF last month. Based on this feedback I'm going to take another look at line drive rates using a regression-to-the-mean technique, similar to the one outlined in Tango et al.'s book.
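The post doesn't spell out the technique, so here is a minimal Python sketch of one common form of regression to the mean. It blends a pitcher's observed rate with the league rate by adding k "phantom" league-average balls in play to his real ones. The value of k, and the 2000 used in the example, are hypothetical assumptions. In practice k would be estimated from the data, for example from how well one season's rates predict the next.

```python
def regress_to_mean(observed_rate, n, league_rate, k):
    """Shrink an observed rate toward the league rate.

    Equivalent to adding k league-average trials (here, balls in play) to the
    n actually observed. k is an ASSUMED constant: the larger it is, the more
    the stat is treated as luck rather than skill.
    """
    if n < 0 or k < 0 or n + k == 0:
        raise ValueError("n and k must be non-negative and not both zero")
    return (observed_rate * n + league_rate * k) / (n + k)


def reliability(n, k):
    """Weight given to the observed rate: n / (n + k)."""
    return n / (n + k)


if __name__ == "__main__":
    # Hypothetical pitcher: 25% line drives on 400 BIP, league rate 20%, k = 2000.
    print(round(regress_to_mean(0.25, 400, 0.20, 2000), 4))
    print(round(reliability(400, 2000), 4))
```

With k that large, a 25% line drive rate over 400 BIP shrinks to about 20.8%. That is roughly what you'd expect if line drives allowed are largely luck.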
2006-03-28T21:37:35.103+01:00
Come on, admit it. We have all fantasized about being in charge of our favorite baseball team. Every year when we pick a fantasy team we ponder what life might be like as the General Manager of a ball club, resolving dilemmas like: deciding between A-Rod and Pujols, trading off pitching and hitting, desperately trying to fill that last roster slot with the best sub-par hitter, wheeling and dealing the middle order, and picking up a rookie on waivers who (hopefully!) goes on to spank 30 homers, but probably doesn't. Even better, what about being an owner? Well, my team, the Atlanta Braves, is up for sale and I'd love to buy them. So how much will it cost?

Ultimately a baseball club is a business, just like any other company, and therefore we can value it using similar techniques. The method I'm going to introduce is called Discounted Cash Flow (DCF) analysis. Don't be put off by the name: although investment bankers are paid hundreds of thousands of dollars to perfect DCF, we will use it in its simplest form.

DCF values a business or investment by calculating the present value of all its future cash flows. What does this mean? Suppose that I sell you a product, e.g. a financial option, which can be sold this time next year for $100. How much would you pay for it today? $100? Well, no. Our trusty friend from the 1970s, inflation, means that $100 buys less this time next year than it does today. If inflation is 5%, $100 next year is worth about $95 today ($100 / 1.05 ≈ $95.24). Therefore the maximum that you'd pay in this instance is about $95, less if you wanted to make a profit.

A business is similar, except that rather than generating a single payment it spits out cash year after year. If we know how much money our business generates in all future years, and also what the inflation rate is for each year, we can work out the maximum that we would pay for it. OK, so much for the theory; let's see how it works in practice.
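The theory above can be sketched in a few lines of Python. This is a minimal illustration, not a valuation model. `rate` stands in for whatever annual discount rate you choose: the post uses inflation as a simple stand-in, whereas a full DCF would normally use a cost of capital. The growing-perpetuity helper is a standard textbook shortcut for valuing "all future years", added here for illustration; it is not part of the post.

```python
def present_value(cash_flows, rate):
    """Discount a list of annual cash flows back to today.

    cash_flows[0] is received one year from now, cash_flows[1] two years
    from now, and so on; `rate` is a constant annual discount rate.
    """
    return sum(cf / (1 + rate) ** year
               for year, cf in enumerate(cash_flows, start=1))


def growing_perpetuity(next_cash_flow, rate, growth):
    """Value today of a cash flow that grows at `growth` forever.

    Only defined when rate > growth; otherwise the sum never converges.
    """
    if rate <= growth:
        raise ValueError("rate must exceed growth")
    return next_cash_flow / (rate - growth)


if __name__ == "__main__":
    # The $100-next-year example at 5%: roughly $95.24 today.
    print(round(present_value([100], 0.05), 2))
```

The same function handles a business that "spits out cash year after year": pass one cash flow per year and each is discounted by one more year's worth of the rate.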
We need to work out the present value of all those future cash flows. A good starting point is the current year's cash flow, and to get that we need to know revenues and costs. Revenue minus cost equals profit and, for this purpose, cash. Technical note: this ignores capital expenditure (adjusting for depreciation / amortization) and changes in working capital. Those adjustments are not necessary for a high-level exercise like this, but they would be needed for a full valuation.

For a public company, one listed on the stock exchange for instance, annual reports give us all the information we need. However, ball clubs are intensely private companies which are loath to reveal even a smattering of their financials. One approach is to ballpark (excuse the pun) the cash flow: we simply list all the revenues and costs and try to work out the size of each. Let's give this a go for revenue. Revenues include tickets, concessions, car parking, advertising, TV / radio rights and merchandising, to name a few; I'm sure you can list many more. For the time being let's consider the first three. The Braves' attendance in 2005 was 2.6m. Estimating the average ticket price at $20 gives $52m in ticket sales. Adding in concessions, say a beer and a hotdog per person (total $10), gives an extra $26m. Car parking? The Braves have 10,000 spaces, so let's say that 80% are filled on average for each game. At $10 per car that is $80,000 per game, or about $6.5m over 81 home games. Just from gamedays we have roughly $84m. We could continue this exercise and work out all the other revenue sources, though we'd struggle a little with television revenue given that AOL are the current owners. Luckily Forbes produces an annual estimate of revenue for us. The last available data is for 2004, when revenue was estimated at $162m. Given our initial gameday estimate of roughly $84m, $162m doesn't seem too crazy.
Actually you could argue that it is a little on the low side given the intricacies of the TV contract (basically the Braves sell their TV rights at below fair value to inflate AOL's [...]
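The back-of-the-envelope gameday sum above can be written out as a small Python function. It uses the post's own assumptions: 2.6m attendance, a $20 ticket, $10 of concessions per fan, and 10,000 parking spaces at 80% occupancy and $10 a car. Turning the per-game parking figure into a season total needs a number of home games, and 81 is my assumption, based on a 162-game schedule split evenly. The function name is illustrative.

```python
HOME_GAMES = 81  # ASSUMPTION: half of a 162-game regular season


def gameday_revenue(attendance, ticket_price, concessions_per_fan,
                    parking_spaces, parking_fill, parking_price,
                    home_games=HOME_GAMES):
    """Rough season gameday revenue in dollars, broken down by source."""
    tickets = attendance * ticket_price
    concessions = attendance * concessions_per_fan
    # Parking is estimated per game, then scaled up to a full home season.
    parking_per_game = parking_spaces * parking_fill * parking_price
    parking = parking_per_game * home_games
    return {
        "tickets": tickets,
        "concessions": concessions,
        "parking_per_game": parking_per_game,
        "parking": parking,
        "total": tickets + concessions + parking,
    }


if __name__ == "__main__":
    estimate = gameday_revenue(2_600_000, 20, 10, 10_000, 0.8, 10)
    for source, dollars in estimate.items():
        print(f"{source:16s} ${dollars:,.0f}")
```

Parking comes out at about $80,000 per game, or roughly $6.5m a season. Gamedays alone therefore account for something like $84m.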
2006-03-26T06:59:22.886+01:00
So Barry Bonds is suing the authors of the book Game of Shadows which, for those of you who have been comatose for the last couple of weeks, alleges that he took steroids for breakfast, lunch and dinner.
2006-03-26T06:30:02.550+01:00
Firstly, my apologies for the paucity of posts over the last 2 weeks! I have been on vacation in Southern Africa, where it is practically impossible to get a decent internet connection. Anyway, the good news is that I am back in civilization and will be posting on a number of topics over the coming week or two.
2006-03-07T12:38:00.636+00:00
I was struck by something a poster said on the Scout message board: "(1) how many extra HR did the pitcher allow because as a strikeout pitcher because he pitches up in the zone and therefore allows more FB, and (2) how many extra walks does this pitcher allow?".
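One rough way to put the poster's trade-off into numbers is FIP (Fielding Independent Pitching), which weights exactly these events: home runs, walks and strikeouts. Below is a minimal Python sketch using the standard FIP weights: 13 per HR, 3 per walk or hit batsman, and minus 2 per strikeout, all divided by innings. The league constant of 3.10 is an assumption; it is normally set each season so that league-average FIP matches league-average ERA. The constant names are illustrative.

```python
def fip(hr, bb, k, ip, hbp=0, constant=3.10):
    """Fielding Independent Pitching.

    (13*HR + 3*(BB + HBP) - 2*K) / IP + constant.
    `ip` is innings as a true decimal (200 1/3 innings -> 200 + 1/3, not 200.1).
    `constant` is an ASSUMED league constant.
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# In FIP terms, how many extra strikeouts offset one extra event?
STRIKEOUTS_PER_HOMER = 13 / 2  # 6.5
STRIKEOUTS_PER_WALK = 3 / 2    # 1.5


if __name__ == "__main__":
    # Hypothetical strikeout pitcher: 20 HR, 50 BB, 150 K in 200 innings.
    print(round(fip(20, 50, 150, 200), 2))
```

Read this way, a strikeout pitcher who allows one extra home run needs about 6.5 extra strikeouts just to break even in FIP, and about 1.5 for each extra walk. That is one way to frame the poster's two questions.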
2006-02-26T12:27:41.380+00:00
For those of you reading the Strategy and Sabermetrics discussion board (http://mb3.scout.com/fbaseballfrm8.showMessage?topicID=1166.topic), I mentioned on the strikeout proficiency thread that I was going to post some year-to-year correlations. The correlations cover 1994-2004 for the following equations:
2006-02-24T08:34:43.523+00:00
If you go back a couple of posts you'll see the analysis (based on the THT Annual 2006) which confirms the hypothesis that giving up a line drive is largely luck. Also in that post I said that I thought there might be some elite pitchers with a special ability to prevent line drives. My flawless logic was that if I were on the mound I'm pretty sure I'd give up a heck of a lot more line drives than Roger Clemens! Well, today I had a bit of time to investigate a little further, and here are the results.

OK, so what I did was work out the line drive percentage for all pitchers who gave up more than 40 BIP in both 2004 and 2005. I then allocated each pitcher a score from 1 to 6 based on where he ranked in line drive percentage, separately for 2004 and 2005. If a pitcher had a low line drive percentage he got a 1; otherwise his score would be closer to 6. Each group is the same size, so you can envisage a 6 by 6 matrix representing the distribution, where pitchers who gave up few line drives in 2004 AND 2005 would be in the top left and those who were particularly bad would be in the bottom right. If it's not clear, hopefully the diagram below will help: (image)

At this point you are probably wondering why I am bothering with this categorisation. Why don't I just run a regression? Remember, what I am trying to detect here is the presence of an elite group of pitchers, hence the segmenting. Technically you could say I should be comparing this group with the REST of the population and not just the pitchers in the bottom right corner. If we find a difference between the extremes, let's come back to this.

So my hypothesis is that there may be an elite group of pitchers who don't give up many line drives, residing in the top left corner of the diagram. Pitchers in this corner include Mariano Rivera, Johan Santana, Tim Wakefield, Billy Wagner, AJ Burnett and Jose Contreras, to name a few. Not a bad list.
But in the other corner there were also some A-list names: Jason Isringhausen, Mark Prior and (ouch) Brad Lidge. Hmm, my hypothesis looks doomed!

Anyway, to test this I ran an independent samples t-test on this data using FIP (Fielding Independent Pitching, developed by Tangotiger) as the test variable. FIP is a good measure of how effective a pitcher is independent of the defence behind him. The two groups were: group 1, pitchers ranked either 1 or 2 in both 2004 and 2005; and group 2, pitchers ranked either 5 or 6 in both 2004 and 2005. Everyone else was excluded. Running the analysis, it turned out that the test wasn't significant. In other words, there was no significant difference in FIP between the two groups, which counts against my hypothesis.

Not a surprise, I suppose, given the low year-to-year correlations in line drive percentage and the observations above. But I was still curious whether people like Santana and Rivera, who distinguished themselves by having a low line drive % in both 2004 and 2005, did occupy an elite group of pitchers. So I segmented the existing groups into 4:

Group 1: elite pitchers, ranked 1 in both 2004 and 2005
Group 2: semi-elite, ranked 1 and 2 or 2 and 2
Group 3: poor pitchers, ranked 5 and 6 or 5 and 5
Group 4: worst, ranked 6 in both years

I then ran an ANOVA to compare the different samples. And, no surprise, the test failed: the overall mean described the data as well as the ANOVA model's group means did. Why am I boring you with all this? Well, the one interesting thing I found was that there was a significant difference between the worst (group 4) and the rest. FIP for group 4 was almost a whole point higher. Now this is probably because the sample size was small (only 10 in group 4 vs 30 in the other groups). But if this is confirmed with a larger dataset it opens the possibility that there are some pitchers who simply shouldn't be in the m[...]
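Here is a minimal Python sketch of the two-extremes comparison described above. It ranks qualifying pitchers into six equal-size groups by line drive rate in each season. It then keeps those ranked 1-2 in both years and those ranked 5-6 in both years, and compares their FIP.

Two choices here are my own. First, it uses Welch's t-test, which doesn't assume equal variances, because the post doesn't say which variant was run. Second, it stops at the t statistic and degrees of freedom rather than computing a p-value. As a rough guide, |t| needs to be around 2 or more for significance at the 5% level once each group has a few dozen pitchers. The input dictionaries are hypothetical.

```python
import math
import statistics


def sextile_ranks(rates):
    """Rank pitchers 1..6 by ascending rate, in (near) equal-size groups.

    rates: dict of pitcher -> line drive rate. Rank 1 = fewest line drives.
    """
    order = sorted(rates, key=rates.get)
    n = len(order)
    return {pitcher: i * 6 // n + 1 for i, pitcher in enumerate(order)}


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = statistics.variance(a), statistics.variance(b)
    na, nb = len(a), len(b)
    se2 = va / na + vb / nb
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


def compare_extremes(ld_2004, ld_2005, fip):
    """Compare FIP of consistently low-LD pitchers (ranks 1-2 in both years)
    with consistently high-LD pitchers (ranks 5-6 in both years).

    Each argument is a dict keyed by pitcher; only pitchers present in all
    three (e.g. already filtered to >40 BIP in both seasons) are ranked.
    """
    qualified = ld_2004.keys() & ld_2005.keys() & fip.keys()
    r04 = sextile_ranks({p: ld_2004[p] for p in qualified})
    r05 = sextile_ranks({p: ld_2005[p] for p in qualified})
    low = [fip[p] for p in qualified if r04[p] <= 2 and r05[p] <= 2]
    high = [fip[p] for p in qualified if r04[p] >= 5 and r05[p] >= 5]
    return welch_t(low, high)
```

A negative t means the low line drive group has the lower (better) FIP. The four-group ANOVA follow-up would reuse `sextile_ranks` with different group boundaries.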
2006-02-23T16:21:28.360+00:00
Thanks for all your feedback on my review of batted ball data from THT. One question that came up a couple of times is whether "flyball pitchers" induce more pop-ups than groundball pitchers. I want to tackle this question in a couple of ways. Firstly, let's look at a graph of pop-ups vs. flyballs for all pitchers with more than 40 balls in play in 2005: (image)
There is a clear relationship, but the Rsq is only 0.18 (significant at the 0.01 level). This means that only 18% of the variance in flyballs is explained by pop-ups (for a simple linear fit like this, the same 18% holds in the other direction). (In case you were wondering, including 2004 data gives a similar correlation.)
Actually the correlation could be a little stronger than it first appears. Because both variables are a percentage of balls in play, if, say, flyballs increase then there is less "room" for pop-ups to increase. This is why groundballs correlate inversely with flyballs: if you don't have one you have the other (ignoring line drives).
Nothing surprising so far. Another way to look at this is to ask the contrarian question: do groundball pitchers give up fewer pop-ups than flyball pitchers? Given that flyballs and groundballs make up ~70% of batted balls, we can simply categorise pitchers according to whether they give up a lot of groundballs or not, and then look for a difference in pop-ups between these two populations. Clear? Let's have a look at how it works in practice.
To categorise pitchers into those that give up groundballs and those that don't, I'm simply going to split the sample of pitchers at the mean groundball rate: those above the mean go in the "groundball" group; those who aren't go in "other". Then we can run an independent t-test on the two populations to see if there is a significant difference in pop-ups. And, again no surprise, there is a difference, and calculating an "Rsq" for this gives 0.63, which, as expected, is much more pronounced. We could further control for line drives, but since we know that (for pitchers) they are largely random I have ignored them.
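The mean split and the "Rsq" for a two-group comparison can be sketched in Python as follows. This assumes the "Rsq" was derived from the t statistic in the usual way, r² = t² / (t² + df), using the pooled-variance (Student) t-test, for which that identity holds exactly. The post doesn't state how it was calculated, and the function names and inputs are hypothetical.

```python
import math
import statistics


def pooled_t(a, b):
    """Student's independent-samples t statistic with pooled variance."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    sp2 = ((na - 1) * statistics.variance(a)
           + (nb - 1) * statistics.variance(b)) / df
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))
    return t, df


def t_to_r_squared(t, df):
    """Share of variance explained by group membership: t^2 / (t^2 + df)."""
    return t * t / (t * t + df)


def groundball_split(gb_rate, popup_rate):
    """Split pitchers at the mean groundball rate and compare pop-up rates.

    Both arguments are dicts keyed by pitcher (hypothetical inputs).
    Returns (t, df, r_squared) for "groundball" vs "other".
    """
    cutoff = statistics.mean(gb_rate.values())
    groundball = [popup_rate[p] for p in gb_rate if gb_rate[p] > cutoff]
    other = [popup_rate[p] for p in gb_rate if gb_rate[p] <= cutoff]
    t, df = pooled_t(groundball, other)
    return t, df, t_to_r_squared(t, df)
```

On this reading, an "Rsq" of 0.63 would mean that group membership accounts for about 63% of the variance in pop-up rate. A negative t means the groundball group gives up fewer pop-ups.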
So, what does all this mean? We know from linear weights and the run expectancy matrix that a pop-out is worth almost the same as a strikeout. This poses a wider question. We know from my last post that inducing groundballs is very effective for the fielding team, because most of them (75%) are turned into outs and those that are not are predominantly singles. But our analysis here tells us that pop-ups, which, let's not forget, are almost as valuable as strikeouts, are the domain of flyballers. I haven't run the analysis, but it would be interesting to see whether groundball or flyball pitchers have a higher propensity to strike batters out. Then we could use batted ball run value data to build the profile(s) of what an elite pitcher looks like.
2006-02-21T21:46:44.323+00:00
I know that I'm a little late to the party, since The Hardball Times (THT) Annual was released back in December, but I have just finished reading it. If you haven't got a copy, can I suggest that you (temporarily) stop reading, go to http://www.hardballtimes.com and order yourself one. I am not going to review the annual in detail except to say that all of the articles are of the highest quality and extremely well written. What I want to do is spend some time discussing my thoughts on what I consider the most interesting part of the book, namely the analysis of batted ball types. The boys at THT ordered up a special cut of batted ball data for the last three years from Baseball Info Solutions and carried out all sorts of clever whiz-bang analysis on it. If you are a regular reader of THT, or indeed of other blogs like Sabernomics, then this won't be new. But it is only with the advent of the THT Annual 2006 that I have focused on the potential of batted ball data.

Of particular interest were the year-to-year correlations. Using this technique we can determine the extent to which a pitcher (or batter) has control over various events. For example, if we correlate groundballs per BIP (Ball in Play) for the entire population of pitchers in 2004 vs 2005 we get the chart below (thanks to Yahoo Stats Group for the play-by-play data for all charts in here): (image)

No surprise. Pitchers who gave up a lot of groundballs in 2004 did so in 2005. The Rsq is 0.5, which says that 50% of the variance in 2005 groundball rates is explained by 2004 groundball rates, which is reasonably high. That is why we refer to groundball pitchers; Tim Hudson comes to mind. Now, where it gets more interesting is if we look at the same chart for line drives. Here it is: (image)

As you can see, the Rsq is very small: 0.01. In other words, whether or not a pitcher gives up a line drive is largely luck. I bet you didn't know that (unless you read THT).
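A year-to-year correlation like the ones above boils down to pairing each pitcher's rate in one season with his rate in the next and squaring the correlation coefficient. Here is a minimal Python sketch. Two details are assumptions: the input format (pitcher -> (events, balls in play)) and the 40-BIP minimum, which matches the cutoff used elsewhere on this blog. The play-by-play data itself isn't included.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def year_to_year_r_squared(season_a, season_b, min_bip=40):
    """Rsq between per-BIP rates in two seasons for pitchers in both.

    season_a / season_b: dict of pitcher -> (event_count, balls_in_play),
    e.g. groundballs or line drives. Pitchers need more than min_bip BIP
    in each season (an ASSUMED cutoff).
    """
    pitchers = sorted(p for p in season_a.keys() & season_b.keys()
                      if season_a[p][1] > min_bip and season_b[p][1] > min_bip)
    xs = [season_a[p][0] / season_a[p][1] for p in pitchers]
    ys = [season_b[p][0] / season_b[p][1] for p in pitchers]
    return pearson_r(xs, ys) ** 2
```

An Rsq around 0.5 for groundballs versus 0.01 for line drives is what separates a repeatable skill from something close to noise.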
Doing the same for batters shows a slightly larger Rsq (~0.1) between one year and the next for line drives, so hitters do show a small degree of skill in hitting line drives. The THT Annual 2006 does this (and a lot more) for a range of different batter / pitcher events, and I encourage you to have a look.

All very interesting, but so what, you may ask? To really understand what is going on, let's look at another article from, you've guessed it, the THT Annual 2006 (no, I am not a contributor). This particular piece works out the run value per batted ball above or below an average baseline. Here are some selected examples:

Line drive: 0.356
Outfield flyball: 0.035
Groundball: -0.101
Strikeout: -0.287

What this is saying is that if a batter slams a line drive then it contributes runs for the offense; it is the highest-value event. This is because line drives result in an out only 25% of the time. And, remember, we said earlier that line drives are largely luck (for pitchers, at least)! Pretty amazing. Now take groundballs. Hitting a groundball is bad news, because it results in an out 75% of the time, and if it doesn't, the chances are that you will only get to first base.

Two things jump out at me that I want to look at further. Firstly, I want to dig into line drives a little deeper. If we can find some pitchers who consistently prevent line drives better than others, then they should be more valuable. I guarantee you that if I were on the mound, a lot more than 20% of the balls in play against me would be line drives. Secondly, I want to use this to develop a measuring system for pitchers / batters. Now I know this has been done (check out J.C. Bradbury's PrOPS metric at http://www.sabernomics.com; all his work is excellent if you have time to peruse it), but I am curious whether we can find a part of the player population that has been significantly undervalued. Also I'll continue to explore the batted ball da[...]
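As a first step towards that kind of measuring system, here is a minimal Python sketch that tallies the run values quoted above against a pitcher's event counts. It only covers the four events listed, so pop-ups, walks, home runs and the rest are left out. The counts in the example are made up and the function name is hypothetical, so treat the result as a partial, illustrative profile rather than a finished metric.

```python
# Run values per event relative to average, as quoted from the THT Annual 2006.
# Positive numbers favour the offense, so for a pitcher lower is better.
RUN_VALUES = {
    "line_drive": 0.356,
    "outfield_flyball": 0.035,
    "groundball": -0.101,
    "strikeout": -0.287,
}


def runs_vs_average(event_counts, run_values=RUN_VALUES):
    """Sum of count * run value over the supplied events.

    Events missing from run_values raise a KeyError rather than being
    silently ignored.
    """
    return sum(count * run_values[event] for event, count in event_counts.items())


if __name__ == "__main__":
    # Hypothetical pitcher season.
    counts = {"line_drive": 100, "outfield_flyball": 150,
              "groundball": 200, "strikeout": 150}
    print(round(runs_vs_average(counts), 2))
```

Because each line drive swings the total by so much, a pitcher who could genuinely suppress them would stand out. That is exactly why the year-to-year question matters.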
2006-02-21T18:49:59.720+00:00
Right. It has been a couple of days since I last posted and I promised to give you an update on what to expect from me on this blog over the next few months. Ok, so here goes (assuming that anyone has actually read any of this yet!).
2006-02-21T18:47:20.040+00:00
Welcome to my blog on baseball strategy. Well, on everything about baseball really; I guess I thought the title "Baseball Strategy" sounded neat, but perhaps not!