Image Aesthetics Assessment

2020

Machine learning is awesome. It provides a powerful set of techniques for making data-driven decisions in tasks that normally require human input. One such task is judging image aesthetics. Most people can easily tell which images are more beautiful than others, but it’s hard to say what specific rules those images follow. Aesthetics is a subjective concept, which makes it a good fit for supervised machine learning. In this article I’m going to explain what a neural network is and how we use it to solve this problem.

🔎

A neural network is a computational model inspired by the structure of the human brain. It contains simple information-processing units called neurons that are organized into layers. Here’s an illustration of a simple neural network.

neural network

The output of one neuron is an input to another neuron. Each connection has a weight associated with it, which represents its relative importance. Each neuron implements a two-stage calculation to map its inputs to an output. First, it computes a weighted sum of its inputs. Next, the result is fed into a non-linear function known as an activation function. When designing a neuron, we can choose from many different activation functions. Currently, ReLU (rectified linear unit) is one of the most popular choices.

neuron activation
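The two-stage calculation can be sketched in a few lines of plain Python. This is only an illustration: the function names `relu` and `neuron_output` and all the numbers are made up for this example, and a real network would usually also add a bias term, which is left out here to match the description above.

```python
def relu(x):
    """ReLU activation: keep positive values, replace negative ones with zero."""
    return max(0.0, x)


def neuron_output(inputs, weights):
    """Stage 1: weighted sum of the inputs. Stage 2: non-linear activation."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return relu(weighted_sum)


# Illustrative values, not taken from any trained network.
inputs = [1.0, 2.0, 3.0]

# 0.5*1 - 0.25*2 + 1.0*3 = 3.0, which ReLU passes through unchanged.
activation = neuron_output(inputs, [0.5, -0.25, 1.0])

# -1 - 2 - 3 = -6.0, which ReLU clamps to 0.0.
negative_case = neuron_output(inputs, [-1.0, -1.0, -1.0])
```

The second call shows why the activation matters: without it, the neuron would just be a linear function of its inputs.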

In short, training a neural network means searching for the best set of weights. To do that, we need input data together with the expected output. The process starts by initializing the weights randomly. It then iteratively updates the weights in response to the errors the network makes compared to the expected output. Once the network is trained, we can use it to evaluate new data.
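Here is a minimal sketch of that loop for the simplest possible case: a single neuron with one weight and no activation function. The update rule used here is plain gradient descent on the squared error, which is an assumption on my part since the text above does not name a specific algorithm. The function name `train`, the learning rate and the data are all hypothetical.

```python
import random


def train(samples, learning_rate=0.05, epochs=200, seed=0):
    """Fit a single weight w so that w * x approximates the expected output y."""
    rng = random.Random(seed)
    w = rng.uniform(-1.0, 1.0)  # start from a random weight
    for _ in range(epochs):
        for x, y in samples:
            error = w * x - y  # difference between prediction and expected output
            # Gradient of 0.5 * error**2 with respect to w is error * x.
            w -= learning_rate * error * x
    return w


# Toy data where the expected output is always twice the input.
samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
learned_w = train(samples)

# After training, the neuron can be evaluated on an input it has not seen.
prediction = learned_w * 4.0
```

With this toy data the weight should settle very close to 2, so the prediction for the input 4 ends up close to 8. Real networks have many weights and use backpropagation to compute the gradient for each of them, but the loop has the same shape.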

🌁

A convolutional neural network (CNN) is designed for image recognition tasks. The key to understanding how CNNs work is understanding convolution. Convolution is an operation that slides a small filter (a matrix of weights) over the matrix of image pixels to extract a visual feature. The example below shows the result of applying an edge-enhancing filter.

neuron activation
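To make the operation concrete, here is a small sketch of convolution in plain Python, with no padding and a stride of 1. The function name `convolve2d`, the tiny 4x4 image and the 2x2 filter are all made up for illustration, and the filter is a simple left-to-right edge detector rather than the exact filter used for the picture above. Like most deep learning libraries, the sketch does not flip the filter before sliding it.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image and return the resulting feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply the kernel with the image patch under it and sum up.
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map


# 4x4 image: dark left half (0) and bright right half (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# 2x2 filter that responds to a dark-to-bright transition from left to right.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, kernel)
# feature_map == [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The middle column of the result is the only place with a non-zero response, which is exactly where the vertical edge sits in the image.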

Convolution makes it possible to find all the locations in the image where a specific visual feature occurs. The output of the convolution is known as a feature map. The filters are determined automatically by learning from the training data. In a cat-versus-dog classification problem, the filters will pick up the visual features that determine the result, e.g. nose size or ear shape. In CNNs, convolution is followed by two other steps: a non-linear activation function (ReLU) and pooling. Pooling reduces the spatial size of the feature map and extracts dominant features that are largely invariant to small translations, transformations and distortions.

neuron activation
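The two steps that follow convolution can be sketched like this. I am assuming max pooling with non-overlapping 2x2 windows, which is a common choice but not the only one. The function names `relu_map` and `max_pool` and the feature map values are hypothetical.

```python
def relu_map(feature_map):
    """Apply ReLU to every value of a feature map."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; assumes height and width are divisible by size."""
    pooled = []
    for i in range(0, len(feature_map), size):
        row = []
        for j in range(0, len(feature_map[0]), size):
            window = [
                feature_map[i + di][j + dj]
                for di in range(size)
                for dj in range(size)
            ]
            row.append(max(window))  # keep only the strongest response
        pooled.append(row)
    return pooled


# Illustrative 4x4 feature map, as if produced by a convolution.
feature_map = [
    [1, -3, 2, 0],
    [4, 0, -1, 5],
    [-2, -2, 3, 1],
    [0, -6, 2, 7],
]

activated = relu_map(feature_map)
pooled = max_pool(activated)
# pooled == [[4, 5], [0, 7]]
```

The 4x4 map shrinks to 2x2, and each remaining value says how strongly the feature appeared somewhere in its window. Shifting a feature by one pixel inside a window would not change the pooled result.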

A standard convolutional neural network has two stages. The first stage is a series of convolutional layers, each with an activation function, followed by pooling layers. The second stage is a fully connected artificial neural network, where every neuron in one layer is connected to every neuron in the next layer. Before the second stage, we have to flatten the pooled feature maps into one long one-dimensional vector.
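The hand-off between the two stages might look roughly like the sketch below. The names `flatten` and `dense`, the two pooled feature maps and the weights are invented for this example, and the fully connected layer is shown without bias or activation to keep it short.

```python
def flatten(feature_maps):
    """Turn a list of 2D feature maps into a single one-dimensional vector."""
    return [v for fmap in feature_maps for row in fmap for v in row]


def dense(inputs, weights):
    """Fully connected layer: weights[k] holds one weight per input for output neuron k."""
    return [
        sum(w * x for w, x in zip(neuron_weights, inputs))
        for neuron_weights in weights
    ]


# Two illustrative 2x2 pooled feature maps.
pooled_maps = [
    [[4, 5], [0, 7]],
    [[1, 0], [2, 3]],
]

vector = flatten(pooled_maps)
# vector == [4, 5, 0, 7, 1, 0, 2, 3]

# Two output neurons, each connected to all eight inputs (made-up weights).
weights = [
    [1, 0, 0, 0, 0, 0, 0, 1],   # 4 + 3 = 7
    [0, 1, 0, -1, 0, 0, 1, 0],  # 5 - 7 + 2 = 0
]

outputs = dense(vector, weights)
# outputs == [7, 0]
```

Every output neuron sees every value of the flattened vector, which is what "fully connected" means here.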

💻

To be continued…