Optimal Playing Position Prediction in Football Matches: A Machine Learning Approach
Abstract
Deciding optimal playing position can sometimes a challenging task for anyone working in sport management industry, particularly football. This study will present a solution by implementing Machine Learning approach to find and help football managers determine and predict where to place individual existing football players/potential players into different positions such as Attacking Midfielder (AM), Defending Midfielder (DMC), AllAround Midfielder (M), Defender (D), Forward Winger (FW), and Goalkeeper (GK) in a specific team formation based on their attributes. To aid in this identification process, it may be beneficial to understand how a player’s playstyle can affect where a player will be positioned in a team formation. The attributes used in facilitating the identification of the player position will be based on Passing Capabilities (AveragePasses), Offensive Capabilities (Possession, etc), Defensive Capabilities (Blocks, Through Balls, Tackles, etc), and Summary (Playtime, Goals, Assists, Passing Percentage, etc). The data that will be analysed upon will be scrapped manually from a popular football site that present football players statistics in a structured and ordered manner using a scrapping tool called Octoparse 8.0. Afterwards, the data that has been processed will be used to create a machine learning predictor modelled using various classification algorithms, which are KNN, Naive Bayes, Support Vector Machine, Decision Tree, and Random Forest ,coded using the Python programming language with the help of various machine learning and data science libraries, further enriched with copious graphs and charts which provides insight regarding the task at hand. The result of this study outputted in the form of the model predictor’s evaluation metric proves the Decision Tree algorithm have both the highest accuracy and f1-score of 76% and 75% respectively, while Naïve Bayes sits the lowest at both 69% accuracy and f1-score. The evaluation has prioritized validating and filtering algorithms that have overfitting in copious amounts which are evident in both the KNN and Support Vector Machine algorithms. As a result, the model formed in this study can be used as a tool for prediction in facilitating and aiding football managers, team coaches, and individual football players in recognizing the performance of a player relative to their position, which in turn would help teams in acquiring a specific type of player to fill a systematic frailty in their existing team roster.

