Using Deep Learning to improve FIFA 18 graphics

Author: Chintan Trivedi





Comparison of Cristiano Ronaldo’s face, with the left one from FIFA 18 and the right one generated by a Deep Neural Network.

Game studios spend millions of dollars and thousands of development hours designing game graphics, trying to make them look as close to reality as possible. While the graphics have looked amazingly realistic in the last few years, it is still easy to distinguish them from the real world. However, with the massive advancements made in the field of image processing using Deep Neural Networks, is it time to leverage them to improve the graphics while also reducing the effort required to create them?

Let us try to answer that using the game FIFA 18…

Football (i.e. soccer) being my favorite sport, FIFA is the natural game of choice for all of my deep learning experiments. To find out whether recent developments in deep learning can help me answer my question, I focused on improving the player faces in FIFA using the (in?)famous deepfakes algorithm. It is a Deep Neural Network that can be trained to learn and generate extremely realistic human faces. My focus in this project is on recreating the player faces from within the game and improving them so that they look exactly like the actual players.

Note: Here is a great explanation of how the deepfakes algorithm works. Tl;dr version: it can swap the face of anyone in a video with anybody else’s face using Autoencoders and Convolutional Neural Networks.

Gathering training data

Unlike the game developers, I could collect all required data from Google search without having to trouble Ronaldo with any motion-capture fancy dress.

Let us start by looking at one of the best designed faces in FIFA 18, that of Cristiano Ronaldo, and see if we can improve it. To gather the data required for the deepfakes algorithm, I simply recorded the player’s face from the instant replay option in the game. Now, we want to replace this face with the actual face of Ronaldo. For this, I downloaded a bunch of images from Google such that the images clearly show his face from different angles. That’s all that is needed to get us started with the training process of our model.

Model architecture & Training

The deepfakes algorithm involves training of deep neural networks called autoencoders. These networks are used for unsupervised learning and have an encoder that can encode an input to a compact representation called the “encoding”, and a decoder that can use this encoding to reconstruct the original input. This architecture forces the network to learn the underlying distribution of the input rather than simply parroting back the input. For images as our input, we use a convolutional net as our encoder and a deconvolutional net as our decoder. This architecture is trained to minimize the reconstruction error for unsupervised learning.

In our case, we train two autoencoder networks simultaneously. One network learns to recreate the face of Ronaldo from FIFA 18 graphics. The other network learns to recreate the face from actual pictures of Ronaldo. In deepfakes, both networks share the same encoder but are trained with different decoders. Thus, we now have two networks that have learnt what Ronaldo looks like in the game and in real life.
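To make the shared-encoder idea concrete, here is a minimal sketch in plain Python. It is not the actual deepfakes code: the "faces" are toy 4-number vectors instead of images, the encoder and decoders are single linear layers instead of convolutional nets, and every name (`shared_encoder`, `decoder_game`, `decoder_real`, the two basis sets) is made up for illustration. What it does show is the training pattern described above: two decoders, one encoder that receives gradient updates from both networks, and a reconstruction error that is minimized.

```python
import random

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def rand_matrix(rows, cols, rng):
    """Small random weights in [-0.5, 0.5]."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def reconstruction_loss(encoder, decoder, x):
    """Mean squared error between x and decoder(encoder(x))."""
    y = matvec(decoder, matvec(encoder, x))
    return sum((yi - xi) ** 2 for yi, xi in zip(y, x)) / len(x)

def train_step(encoder, decoder, x, lr=0.05):
    """One gradient step on the reconstruction loss; updates both matrices in place."""
    z = matvec(encoder, x)                      # the compact "encoding"
    y = matvec(decoder, z)                      # the reconstruction
    grad_y = [2.0 * (yi - xi) / len(x) for yi, xi in zip(y, x)]
    # Backpropagate through the decoder before changing its weights.
    grad_z = [sum(decoder[i][j] * grad_y[i] for i in range(len(y)))
              for j in range(len(z))]
    for i in range(len(y)):
        for j in range(len(z)):
            decoder[i][j] -= lr * grad_y[i] * z[j]
    for j in range(len(z)):
        for k in range(len(x)):
            encoder[j][k] -= lr * grad_z[j] * x[k]

def make_face(basis, rng):
    """Toy 'face': a random mix of two basis vectors (a stand-in for an image)."""
    a, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    return [a * u + b * v for u, v in zip(basis[0], basis[1])]

rng = random.Random(0)
GAME_BASIS = ([1, 0, 1, 0], [0, 1, 0, 1])   # hypothetical "FIFA graphics" domain
REAL_BASIS = ([1, 1, 0, 0], [0, 0, 1, 1])   # hypothetical "real photo" domain
game_faces = [make_face(GAME_BASIS, rng) for _ in range(50)]
real_faces = [make_face(REAL_BASIS, rng) for _ in range(50)]

shared_encoder = rand_matrix(2, 4, rng)      # one encoder used by both networks
decoder_game = rand_matrix(4, 2, rng)        # decoder of the first network
decoder_real = rand_matrix(4, 2, rng)        # decoder of the second network

def mean_loss(decoder, faces):
    """Average reconstruction loss of one network over a set of faces."""
    return sum(reconstruction_loss(shared_encoder, decoder, f) for f in faces) / len(faces)

loss_before = (mean_loss(decoder_game, game_faces), mean_loss(decoder_real, real_faces))
for _ in range(200):                         # epochs
    for game_x, real_x in zip(game_faces, real_faces):
        train_step(shared_encoder, decoder_game, game_x)   # network 1: game graphics
        train_step(shared_encoder, decoder_real, real_x)   # network 2: real pictures
loss_after = (mean_loss(decoder_game, game_faces), mean_loss(decoder_real, real_faces))
print("loss before:", loss_before)
print("loss after: ", loss_after)
```

Because both `train_step` calls update the same `shared_encoder`, the encoder has to find a representation that works for both domains, while each decoder only ever learns to draw its own domain.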

First autoencoder network learning from FIFA graphics

Second autoencoder network learning from actual pictures

When training starts from a model pre-trained on other faces, the total loss goes down from around 0.06 to 0.02 within 4 hours on a GTX 1070. In my case, I continued training on top of the original CageNet model, which has been trained to generate Nicolas Cage’s face.

Using the trained models to swap faces

Now comes the fun part. The algorithm is able to swap faces by adopting a clever trick. The second autoencoder network is actually fed with the input of the first one. This way, the shared encoder produces the encoding from the FIFA face, but the decoder of the second network reconstructs the real face from this encoding. Voila, this setup just converted the face from FIFA to the actual face of Ronaldo.
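The routing behind this trick fits in a few lines. The sketch below is only an illustration under toy assumptions: a "face" is a dictionary with a pose and a style, the stand-in encoder keeps just the pose, and each stand-in decoder paints that pose in its own style. The names `make_face_swapper`, `toy_encoder` and `toy_decoders` are hypothetical and do not come from the deepfakes code.

```python
def make_face_swapper(encoder, decoders):
    """Return a function that encodes a face and decodes it with a chosen decoder."""
    def swap(face, target):
        encoding = encoder(face)            # the shared encoder accepts any input face
        return decoders[target](encoding)   # the chosen decoder decides the output style
    return swap

# Toy stand-ins (assumptions): the encoder keeps only what both domains share.
def toy_encoder(face):
    return {"pose": face["pose"]}

toy_decoders = {
    "game": lambda enc: {"pose": enc["pose"], "style": "fifa_graphics"},
    "real": lambda enc: {"pose": enc["pose"], "style": "real_photo"},
}

swap = make_face_swapper(toy_encoder, toy_decoders)
fifa_frame = {"pose": "looking_left", "style": "fifa_graphics"}
converted = swap(fifa_frame, target="real")
print(converted)  # {'pose': 'looking_left', 'style': 'real_photo'}
```

The only difference between reconstruction and swapping is which decoder receives the encoding: `target="game"` gives back a game-style face, `target="real"` gives the converted one.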


The second network converting FIFA face to real face of Ronaldo


The GIF below shows a quick preview of results from running this algorithm on faces of other players. I think the improvement is astonishing, but maybe I am biased, so you be the judge.




What if you could play “The Journey” mode of the game as yourself instead of playing as Alex Hunter? All you have to do is upload a minute-long video of yourself and download the trained model a few hours later. There you go, you may now play the entire Journey mode as yourself. Now that’d be some next level of immersive gaming!

Where it excels and where it needs more work

The biggest advantage I feel we get with this approach is the amazing life-like faces and graphics that are hard to distinguish from the real world. All of this can be achieved with only a few hours of training, compared to years taken by game designers with the current approach. This means game publishers can come out with new titles much faster rather than spending decades in development. This also means that the studios can save millions of dollars that could now be put into hiring decent story-writers.

The glaring limitation so far is that these faces have been generated post facto, like CGI in movies, while games require them to be generated in real time. However, one big difference is that this approach does not require any human intervention to generate results once a model has been trained, and the only thing holding it back is the computation time required to generate the output image. I believe it will not be long before we have lightweight, not-too-deep generative models that can run very fast without compromising output quality, just like we now have YOLO and SSD MobileNets for real-time object detection, something that wasn’t possible with previous models like RCNNs.


If someone like me, who has no experience in graphics design, can come up with improved faces within just a few hours, I truly believe that if game developers were to invest heavily in this direction, it could change the face of the gaming industry (yes, pun intended) in the not-too-distant future. Now if only anyone from EA Sports was reading this…


Building a Deep Neural Network to play FIFA 18

Author: Chintan Trivedi



A.I. bots in gaming are usually built by hand-coding a bunch of rules that impart game-intelligence. For the most part, this approach does a fairly good job of making the bot imitate human-like behavior. However, for most games it is still easy to tell apart a bot from an actual human playing. If we want to make these bots behave more human-like, would it help to not build them using hand-coded rules? What if we simply let the bot figure out the game by making it learn from looking at how humans play?

Exploring this would require a game where it is possible to collect such data of humans playing the game ahead of developing the game itself. FIFA is one such game that let me explore this. Being able to play the game and record my in-game actions and decisions allowed me to train an end-to-end Deep Learning based bot without having to hard-code a single rule of the game.

The code for this project along with the trained model can be found here:

Mechanism for playing the game

The underlying mechanism to build such a bot needs to work without having access to any of the game’s internal code. Good thing then that the premise of this bot says we do not want to look at any such in-game information. A simple screenshot of the game window is all that is needed to feed into the bot’s game engine. It processes this visual information and outputs the action it wants to take which gets communicated to the game using a key-press simulation. Rinse and repeat.
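The loop described above can be sketched as follows. This is a hedged outline rather than the project's actual code: `run_bot` and its three callables (`grab_screen`, `decide_action`, `press_key`) are hypothetical names, and the screen capture and key-press simulation are passed in as plain functions so the sketch runs without any screen-grabbing or input-simulation library.

```python
import time

def run_bot(grab_screen, decide_action, press_key, max_steps, delay_s=0.0):
    """Screenshot -> decision -> key press, repeated max_steps times."""
    pressed = []
    for _ in range(max_steps):
        frame = grab_screen()          # a screenshot of the game window
        key = decide_action(frame)     # the bot's "game engine" picks an action
        if key is not None:
            press_key(key)             # simulated key press sent to the game
            pressed.append(key)
        if delay_s:
            time.sleep(delay_s)        # optional pause between frames
    return pressed

# Stand-ins for testing the loop without a running game (all assumptions).
fake_frames = iter(["ball_far", "ball_near", "goal_in_sight"])
policy = {"ball_far": "up", "ball_near": "pass", "goal_in_sight": "shoot"}
sent_to_game = []

log = run_bot(lambda: next(fake_frames), policy.get, sent_to_game.append, max_steps=3)
print(log)  # ['up', 'pass', 'shoot']
```

In a real setup the three callables would wrap a screen-capture call, the trained networks and a keyboard-simulation call; keeping them as parameters means the loop itself never touches the game's internals.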


Now that we have a framework in place to feed input to the bot and to let its output control the game, we come to the interesting part: learning game intelligence. This is done in two steps: (1) using a convolutional neural network to understand the screenshot image and (2) using long short-term memory networks to decide the appropriate action based on that understanding of the image.

STEP 1: Training Convolutional Neural Network (CNN)

CNNs are well known for their ability to detect objects in an image with high accuracy. Add to that fast GPUs and intelligent network architectures and we have a CNN model that can run in real time.


To make our bot understand the image it is given as input, I use an extremely lightweight and fast CNN called MobileNet. The feature map extracted from this network represents a high-level understanding of the image, like where the players and other objects of interest are located on the screen. This feature map is then used with a Single-Shot MultiBox Detector (SSD) to detect the players on the pitch along with the ball and the goal.
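As a rough illustration of what the bot can do with the detector's output, the sketch below groups detections into players, ball and goal. The output format is an assumption made for this example (a list of dictionaries with `label`, `score` and `box` keys), as are the function name `group_detections` and the 0.5 confidence threshold; a real SSD implementation will have its own output layout.

```python
def group_detections(detections, min_score=0.5):
    """Keep confident detections and group their boxes by object type."""
    grouped = {"player": [], "ball": [], "goal": []}
    for det in detections:
        # Drop low-confidence boxes and labels the bot does not care about.
        if det["score"] >= min_score and det["label"] in grouped:
            grouped[det["label"]].append(det["box"])
    return grouped

# Hypothetical detector output; boxes are (x_min, y_min, x_max, y_max) in pixels.
detections = [
    {"label": "player", "score": 0.92, "box": (100, 200, 140, 280)},
    {"label": "player", "score": 0.31, "box": (400, 210, 440, 290)},
    {"label": "ball", "score": 0.77, "box": (150, 270, 160, 280)},
    {"label": "referee", "score": 0.88, "box": (300, 100, 340, 180)},
]
objects = group_detections(detections)
print(objects)
```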



STEP 2: Training Long Short-Term Memory Networks (LSTM)




Now that we have an understanding of the image, we could go ahead and decide what move we want to make. However, we don’t want to look at just one frame and take action. We’d rather look at a short sequence of these images. This is where LSTMs come into the picture, as they are well known for being able to model temporal sequences in data. Consecutive frames are used as time steps in our sequence, and a feature map is extracted for each frame using the CNN model. These are then fed into two LSTM networks simultaneously.
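One simple way to turn a stream of frames into fixed-length sequences is a sliding window. The sketch below is an assumption about how this could be done, not the project's code: `FrameSequence` is a hypothetical helper, and short strings stand in for the per-frame feature maps produced by the CNN.

```python
from collections import deque

class FrameSequence:
    """Sliding window over the most recent per-frame features."""

    def __init__(self, length):
        # A deque with maxlen drops the oldest frame automatically.
        self.frames = deque(maxlen=length)

    def push(self, features):
        """Add one frame; return the full window once enough frames have arrived."""
        self.frames.append(features)
        if len(self.frames) == self.frames.maxlen:
            return list(self.frames)   # one time step per frame, oldest first
        return None

window = FrameSequence(length=3)
sequences = [window.push(f) for f in ["f1", "f2", "f3", "f4", "f5"]]
print(sequences)
# [None, None, ['f1', 'f2', 'f3'], ['f2', 'f3', 'f4'], ['f3', 'f4', 'f5']]
```

Each non-empty result is one input sequence for the LSTMs; the window length (3 here) is an arbitrary choice for the example.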

The first LSTM performs the task of learning what movement the player needs to make. Thus, it’s a multi-class classification model. The second LSTM gets the same input and has to decide what action to take out of cross, through, pass and shoot: another multi-class classification model. The outputs from these two classification problems are then converted to key presses to control the actions in the game.
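Converting the two classifier outputs into key presses could look like the sketch below. Several things here are assumptions for illustration: the four movement classes, the key bindings in `ACTION_KEYS`, and the rule that an action key is only pressed when its probability reaches `action_threshold`. Only the four action names come from the description above.

```python
MOVEMENTS = ["up", "down", "left", "right"]       # assumed movement classes
ACTIONS = ["cross", "through", "pass", "shoot"]   # action classes named in the text
ACTION_KEYS = {"cross": "q", "through": "w", "pass": "s", "shoot": "d"}  # hypothetical bindings

def argmax(probs):
    """Index of the largest probability (first one on ties)."""
    return max(range(len(probs)), key=lambda i: probs[i])

def to_key_presses(movement_probs, action_probs, action_threshold=0.5):
    """Pick one movement key, plus an action key if the model is confident enough."""
    keys = [MOVEMENTS[argmax(movement_probs)]]
    best_action = argmax(action_probs)
    if action_probs[best_action] >= action_threshold:
        keys.append(ACTION_KEYS[ACTIONS[best_action]])
    return keys

print(to_key_presses([0.1, 0.2, 0.6, 0.1], [0.05, 0.05, 0.1, 0.8]))  # ['left', 'd']
print(to_key_presses([0.7, 0.1, 0.1, 0.1], [0.3, 0.3, 0.2, 0.2]))    # ['up']
```

The threshold is one possible way to let the bot keep moving without pressing an action key on every frame; a separate "no action" class would be another.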

These networks have been trained on data collected by manually playing the game and recording the input image and the target key press. One of the few instances where gathering labelled data does not feel like a chore!

Evaluating the bot’s performance

I don’t know what accuracy measure to use to judge the bot’s performance, other than to let it just go out there and play the game. Based on only 400 minutes of training, the bot has learned to make runs towards the opponent’s goal, make forward passes and take shots when it detects the goal. In the beginner mode of FIFA 18, it has already scored 4 goals in about 6 games, 1 more than Paul Pogba has in the 17/18 season as of the time of writing.

Video clips of the bot playing against the inbuilt bot can be found on my YouTube channel, with the video embedded below.


My initial impressions of this approach to building game bots are certainly positive. With limited training, the bot has already picked up the basic rules of the game: making movements towards the goal and putting the ball in the back of the net. I believe it can get very close to human-level performance with many more hours of training data, something that would be easy for the game developer to collect. Moreover, extending the model training to learn from real-world footage of matches would enable the game developers to make the bot’s behavior much more natural and realistic. Now if only anyone from EA Sports was reading this…