The Model Charting England’s Evolution Into World Cup Contenders

Key takeaways

– England’s exceptional batting line-up compares favourably with several legendary World Cup winning teams.

– England’s bowling unit is weak compared to their rivals at the World Cup.

– Based on domestic 50-over cricket, James Vince is the outstanding candidate to replace Alex Hales as the reserve top-order batsman.

– Jofra Archer’s performance level in the T20 Blast, IPL and Big Bash is comparable to that of the elite fast bowlers playing in those leagues, although the skillset required differs from that of the 50-over format.

It’s beyond cliché to describe this summer as a big one for English cricket. The early domestic season serves as the starter, with elevated scrutiny given the potential opportunities for top-order batsmen in the Test team and the dessert course of an Ashes series on the horizon in late summer. The main sustenance, though, will be the One-Day International (ODI) World Cup that takes place across England and Wales from 30th May to 14th July.

England enter the tournament not only as the number one ranked team in the official ICC rankings, but having got there through an aggressive and exciting brand of cricket. In 2016 they set a new record for the highest innings total in an ODI of 444 before smashing their own record in 2018 when scoring 481 against Australia. Prior to this World Cup cycle, England had passed 350 twice; they’ve done it 13 times since May 2015. This is a different England team to what has gone before.

The following article presents team and player performance ratings from models being developed by the OptaPro data science team. The aim is to give an overview of England’s strengths, weaknesses and selection options as they begin their final preparations for the World Cup and confirm their squad.

Quantifying team strength

International cricket is a challenging environment from a modelling and analysis point of view. The significantly unbalanced schedule, the wide range of conditions across the five continents that regularly host fixtures and the array of venues all have a substantial impact on performance and form. The regular tinkering with regulations, as well as the evolving capability to score ever more quickly, makes historical comparisons difficult. Furthermore, with amateurs feasibly lining up against stars earning millions every year, there can be a chasm between the relative ability of teams.

Performances and results can fluctuate hugely. For example, on 27th February this year, England scored 418 runs in 50 overs for the loss of six wickets, ultimately beating the West Indies by 29 runs. Just three days later on 2nd March, England were bowled out for 113 in 28.1 overs, ultimately losing by seven wickets with almost 38 overs remaining in the West Indies innings. Quantifying performance levels given such factors and variability is a difficult task.

However, cricket is a sport rich with accumulated knowledge and wisdom; conditions in Australia and Bangladesh are generally different, as are those in England and the West Indies. Certain venues have a reputation for being hospitable for batsmen, while others provide encouragement to the bowling contingent. While teams don’t play each other across a systematic schedule, the regular match-ups do provide information on their relative strengths and weaknesses, especially in multi-match series. While the weaker ‘Associate’ members don’t play a huge number of matches each year, we can make broad assumptions about their ability relative to the established nations to inform our method.

Such information is ideal for a Bayesian modelling approach that can take this prior wisdom and use it to determine the impact of the factors that may affect performance. We use PyMC3 to define a hierarchical model, fitted via Markov Chain Monte Carlo (MCMC), to determine the strengths of international cricket teams over individual World Cup cycles; these 3-4 year cycles are a compromise, trading a larger sample size against the ability to resolve real shorter-term changes in performance. Similar approaches have been used previously in the academic literature to rank Test match batsmen.
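To illustrate how prior knowledge steadies the rating of a team with few matches, here is a minimal sketch of Bayesian shrinkage using a simple conjugate normal update. It is not the PyMC3 model described above; the function name, the prior values and the sample figures are all assumptions chosen purely for illustration.

```python
def posterior_mean_and_sd(prior_mean, prior_sd, sample_mean, sample_sd, n_matches):
    """Combine a normal prior with n observations of known spread.

    Returns the posterior mean and standard deviation of a team's
    typical innings total (conjugate normal-normal update).
    """
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = n_matches / sample_sd ** 2
    post_precision = prior_precision + data_precision
    post_mean = (prior_precision * prior_mean
                 + data_precision * sample_mean) / post_precision
    return post_mean, post_precision ** -0.5


# Hypothetical figures: a prior belief of 200 runs (sd 30) for a weaker side,
# against an observed average of 260 runs with match-to-match sd of 60.
few_matches = posterior_mean_and_sd(200.0, 30.0, 260.0, 60.0, n_matches=4)
many_matches = posterior_mean_and_sd(200.0, 30.0, 260.0, 60.0, n_matches=80)
```

With only four matches the estimate sits midway between the prior and the observed average, while with eighty matches the data dominate and the uncertainty narrows. This is the behaviour that makes broad prior assumptions useful for teams that play rarely.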

In terms of inputs for the model, we used completed ODI matches since the 1979 World Cup that were not subject to weather-enforced alterations to the playing conditions. Matches prior to this were less uniform in terms of their playing conditions, with the number of overs and balls per over differing in many instances. During the 1970s, the clearest predictor of the match winner was which team batted second, as judging the target required presented a significant barrier to the team batting first, placing undue influence on the toss of a coin. The ratings and analysis presented below consider only matches between World Cups (i.e. they do not include World Cup matches themselves, as our aim is to quantify the strength of teams going into a tournament).
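The selection rules above could be expressed as a simple filter. This is a sketch only: the OdiMatch record, its field names and the cutoff date used in the example are hypothetical and do not reflect the actual data feed or the exact cutoff used in the analysis.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class OdiMatch:
    """Hypothetical match record; field names are illustrative only."""
    played_on: date
    completed: bool          # a result was reached
    weather_altered: bool    # overs reduced or target revised
    world_cup_match: bool    # played as part of a World Cup


def is_eligible(match, earliest):
    """Apply the inclusion rules: recent enough, completed, unaltered, non-World Cup."""
    return (match.played_on >= earliest
            and match.completed
            and not match.weather_altered
            and not match.world_cup_match)


# Illustrative cutoff only; the caller supplies whatever date marks the start of the sample.
cutoff = date(1979, 7, 1)
matches = [
    OdiMatch(date(1975, 6, 7), True, False, True),
    OdiMatch(date(2018, 6, 19), True, False, False),
    OdiMatch(date(2017, 9, 21), True, True, False),
]
eligible = [m for m in matches if is_eligible(m, cutoff)]
```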

We determine separate batting and bowling strengths for each team as these are broadly speaking independent disciplines. As defined here, fielding ability would be encapsulated in a team’s bowling strength.

Further details on the modelling approach are included at the end of the article.

England health check

In terms of results from the model, we determine team strengths and express them as innings run totals against an ‘average’ team at a neutral venue. In order to compare across eras, the ratings are given as if a team was playing in the current World Cup cycle.
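As a rough sketch of how a rating can be turned into an innings total, suppose that the model's effects add on a log scale (a common choice for count models, but an assumption here rather than something stated in the article). Setting the opposition, home and venue effects to zero then gives the expected total against an average team at a neutral venue, and using the current cycle's baseline for every team puts all eras on the same footing. The function names and all numbers below are hypothetical.

```python
import math


def expected_runs(log_baseline, batting=0.0, bowling=0.0, home=0.0, venue=0.0):
    """Expected innings total under an assumed log-link: effects add on the log scale."""
    return math.exp(log_baseline + batting + bowling + home + venue)


def rating_in_runs(batting_effect, current_cycle_log_baseline):
    """Runs expected against an average attack (bowling=0) at a neutral venue (home=0, venue=0)."""
    return expected_runs(current_cycle_log_baseline, batting=batting_effect)


# Hypothetical values: a current-cycle baseline of 250 runs and a batting effect of +0.10.
baseline = math.log(250.0)
average_side = rating_in_runs(0.0, baseline)
strong_side = rating_in_runs(0.10, baseline)
```

Under this assumption a batting effect of +0.10 corresponds to roughly a 10% higher expected total than an average side, whichever era the team played in.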

Over the 40 years and 10 World Cup cycles considered in the analysis, the top five batting teams are composed of the 2003 and 2007 Australia teams, the 2011 India team and the 1987 West Indies side. On the bowling side, the top five is dominated by the West Indies from 1983-1996 with their formidable and intimidating fast bowling attacks, with the 1996 South Africa unit sneaking in at four.

The figure below illustrates England’s ratings over time, with higher run totals signifying a better batting team and lower innings run totals associated with better bowling teams. Keen-eyed readers will have noted that only four teams were listed in the top-five batting teams; the omitted team was the current England line-up who sit second in our historical ratings. The sustained excellence of India’s batting is reflected in them taking up the sixth and seventh spots.

According to our historical ratings, England’s batting has generally been around average to above-average. The uncertainty in these ratings is also illustrated and we can be quite confident of the exceptional nature of this England team both compared to their peers and previous England teams. It has been clear for some time that this England team is both very different to their forefathers and at the vanguard of modern cricket, which our ratings illustrate and quantify.

On the flip-side, England’s bowling is a cause for concern, with our best estimate of their performance level over the past four years being that of an average team. Of the teams who have qualified for the 2019 World Cup, only West Indies and Sri Lanka are seen as clearly weaker, with England’s bowling sitting alongside that of Bangladesh. Our best estimate is that South Africa and India have the strongest bowling units going into the competition.

Much of the narrative heading into the tournament has been around England’s bowling attack, which our ratings quantify explicitly relative to their peers. On the basis of these ratings, there is certainly a case for considering alternative options.

Quantifying batting ability

In order to gauge performance levels of individuals, we adapted the team strength model to study batsmen and bowlers with a focus on matches since the 2015 World Cup.

For batsmen, the model considers runs scored in an innings and the strike-rate in that innings. The ideal traits of an ODI player are the ability to score heavily and quickly, and such a setup will isolate the better batsmen who demonstrate such skills while accounting for factors like home advantage, opposition and venue. We do not explicitly prescribe the relative importance of runs scored and strike-rate in determining batting ability; examining the relationship between the batsman ratings and these two inputs reveals that runs scored is more strongly related to the overall rating, although strike-rate is clearly an important facet.
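One simple way to examine which input tracks the ratings more closely is to compare correlation coefficients. The sketch below uses a hand-written Pearson correlation and made-up numbers for five hypothetical players; it illustrates the kind of check described, not the actual analysis or its results.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up values for five hypothetical players (not real ratings).
ratings = [1.0, 2.0, 3.0, 4.0, 5.0]
runs_per_innings = [12.0, 18.0, 33.0, 38.0, 52.0]
strike_rates = [80.0, 95.0, 85.0, 100.0, 90.0]

r_runs = pearson(ratings, runs_per_innings)
r_strike = pearson(ratings, strike_rates)
```

In this toy data the ratings correlate more strongly with runs per innings than with strike-rate, mirroring the pattern reported for the real ratings.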

The outputs of the model are illustrated below, with individual players who have featured in ODIs from 2015-2019 represented by markers according to their most frequent batting position over the period. Broadly speaking, there is a decline in the batsman rating moving down the order; the model doesn’t consider batting position as a feature, so it is blind to this expectation.

An interesting aspect of the ratings is that, because both runs scored and strike-rate are considered, lower-order batsmen can have a relatively high rating compared to top-order batsmen, which is typically related to them scoring quicker than average. This shouldn’t be interpreted as such players being necessarily more capable than their top-order peers, just that they are performing well in the context they typically play in.

The figure highlights England’s players that have featured over the period, as well as Virat Kohli who is the player with the highest rating. What is clear from the figure is that England’s top order is exceptional, while their lower order is very capable as well. The quality and depth of England’s batting drives their performance rating outlined by the team strength model presented above.

Identifying potential replacements for Hales

Alex Hales’ exclusion means that there is a surprise opening in the squad to cover the top-order batsmen. Hales’ record has been excellent, regularly providing the aggressive approach that England prize. The settled nature of England’s batting line-up has meant that there have been relatively few opportunities for those now being considered to replace him. James Vince and Ben Duckett have had just 5 and 3 innings respectively to impress in ODIs and are thus difficult to judge in that context.

Based on a preliminary version of the model for domestic 50-over cricket, Vince is the outstanding candidate over the past 2-4 years, having scored heavily and quickly. Hales and Duckett’s Nottinghamshire team-mate Ben Slater and Warwickshire’s Sam Hain would be the other clear candidates based solely on domestic performances. It should however be noted that the relatively low number of domestic matches makes such ratings more uncertain.

No matter how England proceed, they have lost a proven and capable international player in Hales and should his replacement be thrust into the middle during the World Cup they will not have the same international track record to draw upon.

Quantifying bowling ability

For bowlers, the model considers economy-rate and wickets taken, while accounting for opponent and venue. Broadly speaking, the ideal ODI bowler would be a regular wicket-taker with a low economy-rate, which the model attempts to isolate. As with the batsman ratings, we do not prescribe the relative importance of these factors in determining bowling ability, but it appears that wicket-taking and economy-rate carry roughly equal weight in the ratings.
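For readers less familiar with the statistic, economy-rate is runs conceded per six-ball over. Overs are conventionally written as "overs.balls", so the 28.1 overs quoted earlier means 28 overs and one ball rather than a decimal. The helper names below are hypothetical and the bowling figures in the example are invented.

```python
def overs_to_balls(overs):
    """Convert cricket overs notation (e.g. "28.1" = 28 overs and 1 ball) to balls."""
    whole, _, part = overs.partition(".")
    extra_balls = int(part) if part else 0
    if not 0 <= extra_balls <= 5:
        raise ValueError("the part after the point must be between 0 and 5 balls")
    return int(whole) * 6 + extra_balls


def economy_rate(runs_conceded, overs):
    """Runs conceded per six-ball over."""
    balls = overs_to_balls(overs)
    if balls == 0:
        raise ValueError("no balls bowled")
    return runs_conceded * 6 / balls


# Invented figures: 38 runs conceded from 9.3 overs (57 balls).
example_rate = economy_rate(38, "9.3")
```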

The outputs of the model are illustrated below and are grouped by the bowling action of the player. For bowlers, lower values signify better players and the axis is thus reversed to reflect this. Rashid Khan is the clear stand-out bowler with his incredible wicket-taking ability combined with a very low economy-rate. Jasprit Bumrah is rated as the premier pace bowler.

In addition to Khan and Bumrah, England’s bowlers are highlighted with Adil Rashid, Chris Woakes and Liam Plunkett the stand-outs, with capable support from Moeen Ali relative to other off-spinners. The rest of the bowling options under consideration are seen as relatively average or poor, although the ratings do not currently account for the stage of an innings when the bowler is typically employed. Such ratings however do suggest that England’s consideration of alternative options may be a wise move.

The merits of Jofra Archer

Much of the focus in the build-up to the tournament has been on the prodigiously talented fast bowler Jofra Archer, who is now available for selection upon fulfilling residency requirements. Archer has been included in the expanded squad for England’s final preparations for the World Cup, which begin with a single ODI against Ireland tomorrow, followed by one T20 match and five ODIs against Pakistan. Archer wasn’t named in England’s initial 15-man squad for the tournament but they have the opportunity to alter this up until 23rd May.

Archer’s reputation has largely been forged in various 20-over competitions including the IPL and Big Bash. Based on a preliminary model of bowlers in the T20 Blast, BBL and IPL over the past two years, Archer ranks sixth among fast bowlers, indicating that his performance level has been excellent. The number of matches he has played means that his rating is more robust than those of many of his peers. Bumrah is again the highest ranked fast bowler and Archer compares favourably with him, as well as with Kagiso Rabada, who also features highly in the ODI ratings. One caveat in terms of translating 20-over skills to the longer one-day format is that the model more strongly weights economy-rate as the signifier of bowling ability due to wickets being relatively harder to come by in the shorter format.

Archer will get an extended audition in the pre-tournament matches and if his 20-over exploits translate well to ODIs, then England’s bowling attack will get a welcome and timely boost.

Conclusions

The team and player performance ratings outlined here provide both room for optimism around England’s prospects at the World Cup and areas for concern. The batting unit is exceptional and its depth is extraordinary in comparison to the other contenders. In a historical context, the team’s batting is comparable to several legendary World Cup winning teams and has the capacity to dominate their opponents or chase down what would generally be seen as imposing totals.

The main concern for England is certainly their bowling attack. The setup and length of the tournament likely means that the odd misfiring batting performance won’t be terminal in terms of qualifying from the group stage. However, if the batsmen were to fall below their usual standards in a crucial match and post a score in the 200-250 range, there would be serious questions over the team’s ability to defend such a total. Considering their bowling options in the lead-up to the tournament certainly appears the prudent course of action.

England’s overall improvement over the past four years has been remarkable with their bold and exciting cricket combined with home advantage putting them in a better position to win a World Cup than any previous England team in our analysis. This is a different England team to what has gone before.

Further model details

The likelihood functions evaluated in the MCMC approach use the number of runs scored in an innings and the number of wickets that fall, and consider home advantage, venue and the identity of the batting and bowling teams. First and second innings are separated so that the target run-rate can be included as a variable in the model for the second innings; this provides additional context for a given match, with higher first innings totals on average leading to higher second innings totals. The number of overs in the innings is also included as a variable to account for run-rates, which is particularly important in the second innings when a team’s innings ends due to them passing their target score.

Innings run totals are modelled according to a negative binomial distribution, which generalises the Poisson distribution by allowing greater variation or ‘dispersion’ to reflect the significant variability prevalent in cricket. Wickets lost are modelled as a Poisson distribution.
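The relationship between the two distributions can be sketched as follows. In the mean-dispersion form of the negative binomial, the variance is mu + mu^2/alpha, so it always exceeds the Poisson variance of mu and approaches it as alpha grows large. The parameter values below are hypothetical and are not fitted values from the model.

```python
import math


def poisson_logpmf(k, mu):
    """Log-probability of k events under a Poisson distribution with mean mu."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)


def negbin_logpmf(k, mu, alpha):
    """Log-probability of k under a negative binomial with mean mu and dispersion alpha."""
    return (math.lgamma(k + alpha) - math.lgamma(alpha) - math.lgamma(k + 1)
            + alpha * math.log(alpha / (alpha + mu))
            + k * math.log(mu / (alpha + mu)))


def negbin_variance(mu, alpha):
    """Variance of the negative binomial: the Poisson variance plus an extra mu^2/alpha."""
    return mu + mu ** 2 / alpha


# Hypothetical values: a mean innings total of 250 with dispersion alpha = 20.
mean_total, dispersion = 250.0, 20.0
nb_variance = negbin_variance(mean_total, dispersion)  # compare with a Poisson variance of 250
total_probability = sum(math.exp(negbin_logpmf(k, mean_total, dispersion))
                        for k in range(2000))
```

With these illustrative values the standard deviation is around 58 runs, compared with under 16 for a Poisson distribution with the same mean, which is why the extra dispersion matters for something as variable as innings totals.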

Effectively, the model setup will isolate stronger batting teams as those that post higher run totals and lose fewer wickets while considering their opponent and the other factors included in the model. On the flip-side, stronger bowling teams are those that concede fewer runs while taking more wickets.

In order to account for the evolution of the game over time, we define changes relative to global mean run totals and wickets taken that can vary across each World Cup cycle. The goal here is to capture changes in regulations, e.g. power-play rules, as well as underlying tactics and approaches, e.g. the advent of ‘pinch-hitters’, ODI specialists or Twenty20 matches. We observe relatively stable innings run totals up to the 1996 World Cup, then a rise over the next two cycles before another rise in the late 2000s and early part of this decade, culminating in greater run totals since the 2015 World Cup.

Separate to such changes, we attempt to isolate changes in conditions at venues over time across each decade; such an approach is a compromise between sample size and real short-term changes at a venue. Using a longer time horizon also decouples the venue adjustment somewhat from the World Cup cycle adjustment, which aids in separating these potentially highly correlated factors. For most venues, changes are relatively small and/or uncertain in magnitude, although there are clear trends at grounds such as Headingley, which was a particularly challenging environment for batsmen in the 1980s, ranking in the toughest 2.5% of venues. Since the turn of the century, however, it has become much more hospitable, with above average run totals being posted after accounting for all other factors in the model.