Mapping the Color Dynamics of La La Land with the Python Imaging Library (PIL) and scikit-learn

The colors of La La Land contribute a great deal to the movie’s visual spectacle, and they also seem to carry a deeper meaning. Several articles on the web discuss what the colors mean. Here, I want to plot the color dynamics of La La Land in a single image: a color spectrum over time. With this spectrum in hand, the movie’s colors become easier to analyze.

For this project, I used a 1080p video file of La La Land. I sampled one frame every two seconds. In total, I used 3,020 movie frames for the analysis.

In terms of tools, I used the Python Imaging Library (PIL) to process the movie frames and the scikit-learn library to implement K-means clustering.

Frame to Color Strip: Python Imaging Library and scikit-learn

Given a single frame of the La La Land movie file, we want to obtain the distribution of its constituent colors and turn it into a color strip. If we do this for every sampled frame, we can stitch the strips together and obtain a color spectrum.

The following chart shows the procedure to go from a movie frame to a color strip.

[Figure: flowchart for plotting the color dynamics of La La Land with the Python Imaging Library (PIL) and scikit-learn]

Loading and Clustering

The first thing we have to do is load the movie frame into Python. We can do this using the Python Imaging Library (PIL). After opening the frame as a PIL Image object, we obtain an array of the RGB (Red, Green, Blue) values of all the pixels in the image.
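As a rough sketch of this loading step (not the code used for the project), the snippet below turns a PIL Image into an array with one RGB row per pixel. The function name frame_to_pixels is made up for illustration, and a tiny synthetic image stands in for a real movie frame.

```python
import numpy as np
from PIL import Image


def frame_to_pixels(image):
    """Return an (n_pixels, 3) uint8 array of RGB values for a PIL image."""
    rgb = image.convert("RGB")            # drop alpha / palette modes if present
    return np.asarray(rgb, dtype=np.uint8).reshape(-1, 3)


# Synthetic stand-in for a movie frame. A real frame would be opened with
# Image.open("frame_0001.png"), where the filename is hypothetical.
frame = Image.new("RGB", (4, 3), (200, 30, 60))
pixels = frame_to_pixels(frame)
print(pixels.shape)
```

Flattening the frame this way discards pixel positions, which is fine here because only the color distribution matters.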

Next, we apply a machine learning clustering algorithm called K-means clustering to reduce the many RGB colors in the image to a small set of representative colors. In this case, we reduce the image to 50 colors. Check out my other article, K-Means Clustering Explained, to learn more about the algorithm. I’m going to use the scikit-learn implementation of the K-means clustering algorithm.
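A minimal sketch of the clustering step with scikit-learn's KMeans might look like the following. The function name reduce_colors, the random seed, and the random demo pixels are assumptions for illustration; the demo uses 5 clusters instead of 50 to keep it fast.

```python
import numpy as np
from sklearn.cluster import KMeans


def reduce_colors(pixels, n_colors=50, seed=0):
    """Cluster RGB pixels and return (palette, counts).

    palette: (n_colors, 3) uint8 array of cluster-center colors
    counts:  number of pixels assigned to each cluster
    """
    model = KMeans(n_clusters=n_colors, n_init=10, random_state=seed)
    labels = model.fit_predict(pixels.astype(np.float64))
    palette = np.rint(model.cluster_centers_).astype(np.uint8)
    counts = np.bincount(labels, minlength=n_colors)
    return palette, counts


# Random pixels stand in for a real frame's pixel array.
rng = np.random.default_rng(0)
demo_pixels = rng.integers(0, 256, size=(500, 3))
palette, counts = reduce_colors(demo_pixels, n_colors=5)
```

Running K-means on every pixel of a 1080p frame can be slow. Shrinking the frame before clustering is one possible workaround, though that is a suggestion here rather than something the project is known to have done.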

You may be thinking, why do all this?

The colors in the color strip of each movie frame have to be sorted because we want to have continuity and consistency across color strips – so that colors from one color strip to the next flow nicely. The problem is, there is no easy way to sort colors. Color is a 3-dimensional quantity, whereas sorting is a 1-dimensional problem. If we sort the raw RGB colors alone, we get a very noisy and unnatural sorting. However, if we compress the color space to a small set of colors, say 50 colors, we are able to reduce noise and have some consistency.

RGB to HLS and Color Sorting

After reducing the color space to a set of 50 colors, we change the color representation from RGB to HLS (Hue, Lightness, Saturation). Why? Because we want to sort the colors by hue, which seems to be the best option for what we want to do. There are many possible ways to sort colors, but simple RGB sorting doesn’t work well. You can read more about sorting colors in Alan Zucconi’s wonderful article.
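One way to sketch the hue sort uses the standard library's colorsys module, which expects channel values between 0 and 1. The function name sort_by_hue is hypothetical, and breaking hue ties by lightness and then saturation is an assumption rather than something stated above.

```python
import colorsys


def sort_by_hue(colors):
    """Sort (r, g, b) tuples with 0-255 channels by their HLS representation.

    The sort key is the (hue, lightness, saturation) tuple, so hue decides
    the order and the other two components only break ties.
    """
    def hls_key(rgb):
        r, g, b = (channel / 255.0 for channel in rgb)
        return colorsys.rgb_to_hls(r, g, b)

    return sorted(colors, key=hls_key)


palette = [(0, 0, 255), (0, 255, 0), (255, 0, 0)]
sorted_palette = sort_by_hue(palette)
print(sorted_palette)  # red (hue 0), then green (1/3), then blue (2/3)
```

Because the sort returns the original RGB tuples, no separate conversion back from HLS is needed in this version.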

HLS to RGB and Saving

We convert back to RGB and build a histogram of the colors in the frame, so that each color’s share of the strip reflects how much of the frame it covers. Lastly, we make an Image object of the color strip and save it as a PNG file.
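The strip-building step could be sketched as follows. This assumes a vertical strip one pixel wide in which each color fills a number of rows proportional to its pixel count; the strip's real dimensions and orientation are not given above, and make_color_strip is a made-up name. The demo writes the PNG to an in-memory buffer rather than to disk.

```python
import io

import numpy as np
from PIL import Image


def make_color_strip(colors, counts, height=100):
    """Build a 1-pixel-wide RGB strip from sorted colors and their counts."""
    counts = np.asarray(counts, dtype=np.float64)
    # Row index where each color's band ends, scaled to the strip height.
    edges = np.round(np.cumsum(counts) / counts.sum() * height).astype(int)
    strip = np.zeros((height, 1, 3), dtype=np.uint8)
    start = 0
    for color, end in zip(colors, edges):
        strip[start:end, 0] = color
        start = end
    return Image.fromarray(strip)


strip = make_color_strip([(255, 0, 0), (0, 0, 255)], [3, 1], height=8)

# Save as PNG; pass a path such as "strip_0001.png" (hypothetical) to write a file.
buffer = io.BytesIO()
strip.save(buffer, format="PNG")
```

With counts of 3 and 1 and a height of 8, the first color fills six rows and the second fills two.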

Color Dynamics: Strip to Spectrum

Now that we know how to process a single movie frame, we can repeat the process for every sampled frame of the movie. Doing so, we get a huge set of color strips.

We have to stitch the color strips together. We can do this using the paste method of the Python Imaging Library’s Image class.
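A minimal sketch of the stitching step with Image.paste is shown below. It assumes the strips are laid side by side from left to right in time order, which is a guess at the layout; stitch_strips and the three solid-color demo strips are illustrative only.

```python
from PIL import Image


def stitch_strips(strips):
    """Paste strips side by side, left to right, into one RGB image."""
    width = sum(strip.width for strip in strips)
    height = max(strip.height for strip in strips)
    spectrum = Image.new("RGB", (width, height))
    x = 0
    for strip in strips:
        spectrum.paste(strip, (x, 0))   # upper-left corner of this strip
        x += strip.width
    return spectrum


# Three solid-color strips stand in for the per-frame color strips.
demo_strips = [
    Image.new("RGB", (1, 4), color)
    for color in [(255, 0, 0), (0, 255, 0), (0, 0, 255)]
]
spectrum = stitch_strips(demo_strips)
print(spectrum.size)
```

With 3,020 strips of one pixel each, this layout would give an image 3,020 pixels wide, one column per sampled frame.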

The finished product is a spectrum of the movie’s color dynamics.

In Conclusion

We were able to use the Python Imaging Library and scikit-learn to transform the La La Land movie file into a spectrum that shows the color dynamics over time. Check out the finished product in The Color Dynamics of La La Land.
