
Introducing Role Discovery: Generating Data-Driven Roles In Elite Professional Football


By defining roles within a club’s game model, recruitment analysts can directly compare players who fit the parameters of each role more objectively. In this article, Stats Perform outlines how AI-derived models, powered by event data, can automatically detect the on-field roles of players competing in leading competitions around the world.


By: Andy Cooper

Taking A Different Approach To Technical Scouting

During the past decade, a scouting department’s process for identifying, monitoring and assessing recruitment prospects has continuously evolved as new technology, powered by video and data, has become available.

However, has this resulted in any noticeable change in how a club actually defines its assessment criteria for players? Within departments, positional KPIs will have been defined based on the game model, and a player’s performance outputs are measured against them. These KPIs can also be built into position-specific templates that scouts use to assess players subjectively, adding a further layer of context.

But whilst these processes are undoubtedly more focussed and rigorous than those in place 10 years ago, one major question has not been properly addressed: how reliable is it to assess players against the KPIs of a specific position?

Today, players and their positions continue to be labelled in the same way as they were 25 years ago: full-back, centre back, centre forward and so on. However, when comparing recruitment prospects playing for different clubs in the same ‘position’, many will possess fundamentally different qualities, which make it difficult to assess them objectively against the same KPIs.

This is where Stats Perform’s new Role Discovery concept comes in. Role Discovery is a new method of assessing players, pioneered by our AI team, which addresses the limitations of assessing players based on position. Instead, it focusses on profiling players based on their role in a team.

By applying unsupervised learning, Role Discovery allows recruitment analysts to evaluate players within the context of a wider team style, grouping prospects according to the key tendencies and characteristics required to fulfil each role within a game model.

In this piece, we are going to run through some of the key components behind Role Discovery, which will enable clubs to consider a new approach to technical scouting going into next year.

Stage 1: Identifying Each Player’s Spatial Tendencies, In The Context Of Playing Style

In total, Role Discovery detects 18 different types of role across the pitch, derived from 29 different player clusters. The starting point for role detection is identifying the most common start and end locations of a player’s passes in order to generate spatial heat maps.
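As a rough illustration of this first step, the sketch below bins a player's pass start and end points into a coarse grid and counts them, which is the simplest form of a heat map. The 0 to 100 coordinate system, the 6 × 5 grid and the function names are assumptions for illustration only; a production model might smooth the counts or use finer zones.

```python
from collections import Counter

# Assumed coordinate system (not Stats Perform's documented convention):
# x runs 0-100 from a team's own goal line to the opposition goal line,
# y runs 0-100 from one touchline to the other.
PITCH_LENGTH = 100.0
PITCH_WIDTH = 100.0


def grid_cell(x, y, n_x=6, n_y=5):
    """Map a pitch coordinate to a (column, row) cell on an n_x by n_y grid."""
    col = min(int(x / PITCH_LENGTH * n_x), n_x - 1)
    row = min(int(y / PITCH_WIDTH * n_y), n_y - 1)
    return col, row


def pass_heat_maps(passes, n_x=6, n_y=5):
    """Count how often a player's passes start and end in each grid cell.

    `passes` is a list of (start_x, start_y, end_x, end_y) tuples.
    Returns two Counters keyed by (column, row): start cells and end cells.
    """
    starts = Counter(grid_cell(sx, sy, n_x, n_y) for sx, sy, _, _ in passes)
    ends = Counter(grid_cell(ex, ey, n_x, n_y) for _, _, ex, ey in passes)
    return starts, ends


def top_zones(counts, k=3):
    """Return the k busiest cells, e.g. a player's three most common pass origins."""
    return [cell for cell, _ in counts.most_common(k)]


# Usage with three made-up passes:
starts, ends = pass_heat_maps([(85, 50, 92, 48), (88, 45, 95, 52), (70, 70, 80, 60)])
print(top_zones(starts, 1))  # [(5, 2)]
```

`top_zones(starts, 3)` corresponds to the "three most common areas" view used for the comparison that follows.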

The graphic below shows the three most common areas on the pitch where two centre forwards, Robert Lewandowski and Roberto Firmino, played their passes from during the 2018/19 season.

Although both players are labelled as centre forwards, these heat maps make it clear that they fulfilled different roles for their respective clubs. Lewandowski’s most common pass locations were the central areas around the edge of, and inside, the box. In contrast, Firmino played a lot of balls from the half-space inside the right-hand channel.

However, this information alone provides limited insight, as it lacks any context of the match situation. Therefore, by applying the Stats Perform Playing Styles framework to a player’s touches, we can begin to understand how each player influences a specific phase of play.

This highlights further stylistic differences between Lewandowski and Firmino. The charts below show each player’s cluster of ball touches when their team is engaged in a specific style, compared to other players in the top five leagues.

Compared to the average, Firmino’s cluster is most heavily involved in the Counter Attack phase and also features strongly in the Sustained Threat phase. Lewandowski’s cluster is below the average in Sustained Threat, indicating that his role isn’t as involved in his team’s possession sequences in the attacking third when compared to other roles.
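A simple way to express "compared to the average" is to divide a player's share of touches in each style by the same share across all players. The sketch below does that. It assumes each touch already carries a style label from an upstream classifier, and it works on a single player rather than on the touch clusters the article describes. All numbers are made up.

```python
from collections import Counter


def style_shares(touch_styles):
    """Share of touches made while the team was in each playing style.

    `touch_styles` holds one style label per touch, e.g. "Counter Attack".
    """
    counts = Counter(touch_styles)
    total = sum(counts.values())
    return {style: n / total for style, n in counts.items()}


def relative_to_average(player_touches, population_touches):
    """Ratio of a player's style share to the population's share.

    Above 1.0: more involved in that style than the average player.
    Below 1.0: less involved.
    """
    player = style_shares(player_touches)
    population = style_shares(population_touches)
    return {
        style: player.get(style, 0.0) / share
        for style, share in population.items()
    }


# Made-up touch labels for illustration only.
population = ["Build-Up"] * 50 + ["Counter Attack"] * 20 + ["Sustained Threat"] * 30
player = ["Build-Up"] * 2 + ["Counter Attack"] * 4 + ["Sustained Threat"] * 4
ratios = relative_to_average(player, population)
print(ratios)
```

In this toy example the player's ratio for Counter Attack is 2.0, meaning twice the population's share, which is the kind of signal described for Firmino above.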

Stage 2: Quantifying The Quality Of A Player’s Contributions In Possession

Having established the areas of the pitch where a player most frequently gets on the ball, and their contribution to specific Playing Styles, Role Discovery then applies Stats Perform’s Possession Value (PV) Framework to measure how much a player’s involvement in possession sequences increases (or decreases) their team’s probability of scoring within the next ten seconds of play.

Possession Value assigns credit to players based on positive and negative contributions, covering key on-the-ball events including passing, dribbling and crossing, as well as defensive actions such as tackles and interceptions.
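In its simplest form, this kind of credit assignment scores each action as the change in the estimated probability of scoring within the next ten seconds. The sketch below assumes those before-and-after probabilities are supplied by an upstream model (the hard part, not shown) and that a per-90 figure is a simple rate over minutes played. Summing only the positive values is one possible reading of a "PV+" style metric, not Stats Perform's published definition.

```python
from dataclasses import dataclass


@dataclass
class Action:
    player: str
    # Estimated probability that the player's team scores within the next
    # ten seconds, before and after the action. Assumed to come from an
    # upstream model that is not shown here.
    p_before: float
    p_after: float


def action_value(action):
    """Change in short-term scoring probability caused by one action.

    Positive when the action made a goal more likely, negative when it
    made a goal less likely (e.g. a misplaced pass).
    """
    return action.p_after - action.p_before


def pv_per_90(actions, player, minutes_played, positive_only=False):
    """Sum a player's action values and scale to a per-90-minute rate."""
    if minutes_played <= 0:
        raise ValueError("minutes_played must be positive")
    values = [action_value(a) for a in actions if a.player == player]
    if positive_only:
        values = [v for v in values if v > 0]
    return sum(values) / minutes_played * 90


actions = [
    Action("A", 0.02, 0.05),  # progressive pass: +0.03
    Action("A", 0.05, 0.01),  # lost possession: -0.04
    Action("B", 0.01, 0.03),
]
print(pv_per_90(actions, "A", 90))
print(pv_per_90(actions, "A", 90, positive_only=True))
```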

The table below lists the players who recorded the highest PV+ output per 90 for Liverpool in the 2018/19 Premier League season, highlighting the progressive contributions of Mo Salah, Andrew Robertson and Firmino in increasing goal probability.

Another new metric applied to analyse a player’s use of the ball is Expected Pass Completion (xP). This model takes into account various factors, including the distance and angle of a pass and other contextual elements of a possession move, to establish the probability of a pass being completed.
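Models of this kind are commonly built as a probabilistic classifier over pass features. The sketch below uses a logistic function of pass distance, pass angle and a hypothetical "under pressure" flag. The coefficients are invented for illustration rather than fitted, and the real xP model uses more contextual features than these three.

```python
import math

# Hypothetical coefficients, invented for illustration. A real model would
# learn these (and many more features) from historical pass outcomes.
INTERCEPT = 3.0
COEF_DISTANCE = -0.08   # longer passes are less likely to be completed
COEF_ANGLE = 0.6        # backward passes (larger angle) are safer
COEF_PRESSURE = -0.7    # passes under pressure are riskier


def pass_features(start, end):
    """Distance and angle of a pass on the assumed 0-100 pitch coordinates.

    The angle is measured from the direction of attack (+x) and lies in
    [0, pi]: 0 is straight forward, pi is straight backward.
    """
    dx, dy = end[0] - start[0], end[1] - start[1]
    return math.hypot(dx, dy), abs(math.atan2(dy, dx))


def expected_pass_completion(start, end, under_pressure=False):
    """Probability (0-1) that the pass is completed, via a logistic link."""
    distance, angle = pass_features(start, end)
    z = (INTERCEPT
         + COEF_DISTANCE * distance
         + COEF_ANGLE * angle
         + COEF_PRESSURE * (1.0 if under_pressure else 0.0))
    return 1.0 / (1.0 + math.exp(-z))


short_back = expected_pass_completion((50, 50), (45, 50))
long_forward = expected_pass_completion((50, 50), (90, 50))
print(round(short_back, 3), round(long_forward, 3))
```

A completed pass with a low xP is what marks the "penetrative" passing discussed next: the player succeeded where the model expected failure more often.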

By applying this metric, we can better understand a player’s tendencies in possession, identifying which players attempt and complete a high proportion of penetrative passes that, in probability terms, are harder to complete.

To incorporate this data into the Role Discovery model, we split the pitch into zones and record each player’s xP completion rate in each zone, to establish where they are making key contributions for their team.

By combining PV and xP, we can assess the potential reward for a player making a pass, set against the potential risk, and compare how they perform against other players.
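The zone-level summary can be sketched as a group-by over a player's passes. Here we assume that "xP completion rate" means the mean xP of the passes a player attempts into a zone (our reading of the text, not a published definition) and pair it with the total PV those passes added. Zone labels such as `"CAM"` are hypothetical.

```python
from collections import defaultdict


def zone_profile(passes):
    """Summarise a player's passes by destination zone.

    `passes` is a list of (zone, xp, pv) tuples: the destination zone label,
    the pass's expected completion probability (risk) and the change in
    possession value it produced (reward).
    Returns {zone: (mean_xp, total_pv, n_passes)}.
    """
    grouped = defaultdict(list)
    for zone, xp, pv in passes:
        grouped[zone].append((xp, pv))
    profile = {}
    for zone, rows in grouped.items():
        mean_xp = sum(xp for xp, _ in rows) / len(rows)
        total_pv = sum(pv for _, pv in rows)
        profile[zone] = (mean_xp, total_pv, len(rows))
    return profile


# Made-up passes: "CAM" stands for a central attacking midfield zone.
profile = zone_profile([("CAM", 0.8, 0.02), ("CAM", 0.7, 0.04), ("BOX", 0.4, 0.10)])
print(profile["CAM"])
```

A high mean xP alongside a high PV total suggests safe passes that still add value. A low mean xP flags a riskier passer in that zone.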

Returning to our Lewandowski and Firmino comparison, the graphic below shows each player’s performance when playing passes into a central attacking midfield zone during 2018/19.

Firmino’s xP completion rate of above 0.75 indicates that he plays safe, high-probability passes into this zone. At the same time, he is still increasing his team’s probability of scoring, ranking in the top 10% for progressive PV.

Lewandowski, a different type of striker, is nearer the average for xP completion rate and is in the bottom 10% for PV. This reinforces the point that he is involved in fewer possession chains and is more likely to be taking a lot of shots at goal, making him the final player in a chain. However, this doesn’t tell the full story, as we will explain later in the article.

Stage 3: Establishing How A Player Is Involved In Possession Sequences

To further identify how a player interacts with his teammates, Role Discovery incorporates another new model, Movement Chains, which identifies a team’s most common passing patterns that generate the highest PV on the pitch.

Movement Chains are labelled using passing motifs made up of four passes, which help to identify how a team progresses the ball and which players are involved.

As with xP, to analyse Movement Chains we split the pitch into zones, so that we can use clustering to identify the most frequent patterns by which a team moves the ball out of one zone and into another. As illustrated below, the starting zone could be the central area of the defensive third, with the end zone being the edge of the opposition box.

Movement Chains can also be applied in the context of Playing Styles, to identify how teams move the ball when engaged in a specific style. The example below shows Liverpool’s most dangerous chains during 2018/19 that ended in the opposition penalty area when they were engaged in Build-Up, highlighting how they play through the inside right and inside left channels to penetrate the box.
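The article does not spell out the motif encoding or the clustering step, so the sketch below takes a simpler route. It slides a four-pass window along each possession, encodes it as the sequence of zones the ball visits, and counts exact matches rather than clustering similar ones. Zone names, the `(style, passes)` input format and the optional filters are assumptions. Filtering by style and end zone mirrors the Build-Up example above.

```python
from collections import Counter


def four_pass_motifs(passes):
    """Yield each run of four consecutive passes as a tuple of zones.

    `passes` is a list of (start_zone, end_zone) pairs for one possession.
    A motif is the starting zone of the first pass followed by the end
    zone of each of the four passes, so it has five entries.
    """
    for i in range(len(passes) - 3):
        window = passes[i:i + 4]
        yield (window[0][0],) + tuple(end for _, end in window)


def top_motifs(possessions, style=None, end_zone=None, k=5):
    """Most frequent four-pass motifs, optionally filtered.

    `possessions` is a list of (style_label, passes) pairs. `style` keeps
    only possessions in that playing style; `end_zone` keeps only motifs
    that finish in that zone.
    """
    counts = Counter()
    for label, passes in possessions:
        if style is not None and label != style:
            continue
        for motif in four_pass_motifs(passes):
            if end_zone is None or motif[-1] == end_zone:
                counts[motif] += 1
    return counts.most_common(k)


# Hypothetical zone labels and possessions, for illustration only.
chain = [("DEF_C", "MID_R"), ("MID_R", "ATT_R"), ("ATT_R", "ATT_C"),
         ("ATT_C", "BOX"), ("BOX", "BOX")]
possessions = [("Build-Up", chain), ("Build-Up", chain[:4]), ("Counter Attack", chain[:4])]
result = top_motifs(possessions, style="Build-Up", end_zone="BOX")
print(result)
```

Ranking motifs by summed PV instead of raw counts (adding each motif's PV rather than 1) would be a small change, and closer to the article's focus on the patterns that generate the highest PV.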

Stage 4: Applying Each Model To Create Roles

To identify an individual player’s involvement in their team’s possession chains, we apply a Chain2Vec model, inspired by Word2Vec, which identifies the chain clusters a player frequently appears in, their context and how the player interacts within a single passing motif. This modelling, based on deep learning techniques, provides a compact representation of player and team involvement.
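Chain2Vec's architecture is not described here, so the sketch below only illustrates the Word2Vec idea it is inspired by. It treats each passing chain as a "sentence" of tokens and trains skip-gram embeddings with negative sampling, so that tokens appearing in similar chain contexts end up with similar vectors. The token scheme (a hypothetical motif-cluster label plus player IDs), the hyperparameters and the uniform negative sampling are all assumptions. A real system would more likely use an optimised library implementation.

```python
import math
import random


def _sigmoid(x):
    x = max(-30.0, min(30.0, x))  # clip to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-x))


def train_skipgram(chains, dim=8, window=2, negatives=3, epochs=50, lr=0.05, seed=0):
    """Skip-gram with negative sampling, the Word2Vec training objective.

    Each chain is a list of string tokens (here, a hypothetical motif-cluster
    label followed by the IDs of the players involved).
    Returns {token: embedding vector}.
    """
    rng = random.Random(seed)
    vocab = sorted({token for chain in chains for token in chain})
    index = {token: i for i, token in enumerate(vocab)}
    w_in = [[rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for _ in vocab]
    w_out = [[0.0] * dim for _ in vocab]

    for _ in range(epochs):
        for chain in chains:
            ids = [index[t] for t in chain]
            for pos, center in enumerate(ids):
                lo, hi = max(0, pos - window), min(len(ids), pos + window + 1)
                for ctx_pos in range(lo, hi):
                    if ctx_pos == pos:
                        continue
                    # One observed (positive) context plus uniformly drawn
                    # negatives; Word2Vec itself samples negatives from a
                    # unigram^0.75 distribution.
                    targets = [(ids[ctx_pos], 1.0)]
                    targets += [(rng.randrange(len(vocab)), 0.0) for _ in range(negatives)]
                    grad_center = [0.0] * dim
                    for ctx, label in targets:
                        score = sum(a * b for a, b in zip(w_in[center], w_out[ctx]))
                        g = lr * (label - _sigmoid(score))
                        for d in range(dim):
                            grad_center[d] += g * w_out[ctx][d]
                            w_out[ctx][d] += g * w_in[center][d]
                    for d in range(dim):
                        w_in[center][d] += grad_center[d]
    return {token: w_in[index[token]] for token in vocab}


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical chains: a motif-cluster token followed by player IDs.
chains = [["motif:A", "P1", "P2", "P3"], ["motif:B", "P4", "P5", "P6"]] * 20
embeddings = train_skipgram(chains, epochs=20)
print(round(cosine(embeddings["P1"], embeddings["P2"]), 3))
```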

Alongside this, each player’s touch maps are used to provide spatial context, together with the outputs of the Playing Styles, xP and PV models. In addition, a breakdown of the types and locations of the open-play shots a player takes is included.

All of these outputs are then fed into the final Role Discovery model, which uses a clustering algorithm, a Gaussian Mixture Model, to learn groupings and separate players into distinct groups. From these clusters, we then use data-driven descriptions to condense more than 400 individual player descriptors into 18 separate roles across the pitch.
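As a hedged, dependency-free sketch of the clustering step, the code below fits a Gaussian Mixture Model with diagonal covariances by expectation-maximisation. Each input vector stands in for a player's combined features: touch map, Playing Styles, xP, PV, shot profile and Chain2Vec embedding, concatenated and scaled. That feature layout is our assumption. In practice a library implementation such as scikit-learn's `GaussianMixture` would be the usual choice, and the number of components would be chosen by validation. The later step of condensing clusters into named roles is not shown.

```python
import math


def fit_gmm(points, k, iters=100, min_var=1e-6):
    """Fit a Gaussian Mixture Model with diagonal covariances by EM.

    `points` is a list of equal-length feature vectors, one per player.
    Returns (weights, means, variances, responsibilities), where
    responsibilities[i][j] is the probability that player i belongs to
    component j.
    """
    n, dim = len(points), len(points[0])
    # Deterministic farthest-point initialisation: start from the first
    # player, then repeatedly add the player furthest from the chosen means.
    means = [list(points[0])]
    while len(means) < k:
        far = max(points, key=lambda p: min(
            sum((a - b) ** 2 for a, b in zip(p, m)) for m in means))
        means.append(list(far))
    overall = [sum(p[d] for p in points) / n for d in range(dim)]
    start_var = [max(sum((p[d] - overall[d]) ** 2 for p in points) / n, min_var)
                 for d in range(dim)]
    variances = [list(start_var) for _ in range(k)]
    weights = [1.0 / k] * k

    def log_density(x, mean, var):
        return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                          for xi, m, v in zip(x, mean, var))

    resp = []
    for _ in range(iters):
        # E-step: posterior membership probabilities (log-sum-exp for stability).
        resp = []
        for x in points:
            logs = [math.log(weights[j]) + log_density(x, means[j], variances[j])
                    for j in range(k)]
            top = max(logs)
            probs = [math.exp(v - top) for v in logs]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: re-estimate each component from its weighted members.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = [sum(r[j] * x[d] for r, x in zip(resp, points)) / nj
                        for d in range(dim)]
            variances[j] = [max(sum(r[j] * (x[d] - means[j][d]) ** 2
                                    for r, x in zip(resp, points)) / nj, min_var)
                            for d in range(dim)]
    return weights, means, variances, resp


def hard_assign(resp):
    """Most likely component for each player."""
    return [max(range(len(r)), key=r.__getitem__) for r in resp]


# Two made-up, well-separated groups of players with 2-D feature vectors.
players = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
weights, means, variances, resp = fit_gmm(players, k=2)
print(hard_assign(resp))
```

Unlike k-means, a GMM gives each player a probability of belonging to every group, which suits players whose profiles sit between two roles.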

These roles are labelled as follows:

Returning to Lewandowski and Firmino, having established that each player fits into a different role, we can now compare each of them with other players in the same role, rather than with each other.

The graphic below displays the same passing comparison we used previously, but this time we have isolated the players in their respective roles, which are highlighted in orange.

In the case of Firmino, when compared with all players in the top five leagues he ranks in the top 10% for xP completion rate, but when compared only with other ‘Attacking Creative Threat’ players he is much closer to the average.

When Lewandowski is compared only with other ‘Advanced Forwards’, we can see that his passes are much safer than average, and while his progressive PV contribution is still below average, it is not as extreme as when he is compared with all players.
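The shift described for Firmino and Lewandowski comes from changing the reference population. A minimal sketch, assuming a simple "percentage of players strictly below" definition of percentile rank and a hypothetical dictionary of player records (names, roles and values invented):

```python
from bisect import bisect_left


def percentile_rank(value, population):
    """Percentage of the population with a strictly lower value (0-100)."""
    ordered = sorted(population)
    return 100.0 * bisect_left(ordered, value) / len(ordered)


def overall_vs_role(name, metric, players):
    """Percentile of one player on `metric`, against all players and
    against only players assigned the same role.

    `players` maps a player name to a record such as
    {"role": "Advanced Forward", "xp_rate": 0.71}.
    """
    value = players[name][metric]
    role = players[name]["role"]
    everyone = [p[metric] for p in players.values()]
    same_role = [p[metric] for p in players.values() if p["role"] == role]
    return percentile_rank(value, everyone), percentile_rank(value, same_role)


# Invented players and values, for illustration only.
players = {
    "A": {"role": "Creative", "xp_rate": 0.80},
    "B": {"role": "Creative", "xp_rate": 0.82},
    "C": {"role": "Creative", "xp_rate": 0.78},
    "D": {"role": "Forward", "xp_rate": 0.60},
    "E": {"role": "Forward", "xp_rate": 0.65},
}
overall, within = overall_vs_role("A", "xp_rate", players)
print(overall, within)
```

Player A sits at the 60th percentile against everyone but only around the 33rd within its own role, the same kind of drop seen for Firmino's xP completion rate.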

Conclusion: Making Player Comparisons More Objective

By defining roles, recruitment analysts can apply Role Discovery to compare players who fit the parameters of each role directly and more objectively. This can streamline the scouting process, ensuring that players who play in the same position on the field are not assessed like-for-like but are instead analysed in the context of the role they fulfil.

The elements that make up the model encapsulate four key recruitment considerations:

  • Where does a player play on the pitch? – Spatial footprint
  • When do they make their key contributions? – Playing Styles engagement
  • What is the risk-reward profile and quality of their key contributions, compared to others in the same role? – xP and progressive PV outputs
  • What types of passing moves is a player involved in? – Movement chain involvement

In addition, Role Discovery can also be used as an aid to identify players who possess the key attributes for a different role or position to where they currently play for their club. This would help analysts identify the likes of a young Gareth Bale or Philipp Lahm, examples of players who have evolved from being full-backs at the start of their careers to playing in roles higher up the pitch.

Case studies providing examples of how Role Discovery can be applied to identify replacements for a specific role will be published on the Stats Perform website later this year, together with some examples of analysis derived from the models powering the concept, including xP and Movement Chains.

If you would like to know more about Role Discovery or have any questions relating to what we have developed so far, please do get in touch with us here.