Reproducing Cartoon Images

A pretty interesting app for today's users.
type: application
level: medium

To attract users, major social platforms such as TikTok, Facebook, and Instagram constantly improve their interfaces and features, for example with beautification tools for taking photos, recording videos, and livestreaming. One such feature converts portrait (or outdoor) photos into new images in different artistic styles, depending on the user's preferences.

In this article, we introduce a simple application of this kind of image conversion, developed by the author group Jie Chen, Gang Liu, and Xin Chen [1]. It is built on Generative Adversarial Networks (GANs) trained on datasets in different art styles (Figure 1).

Figure 1. Illustrative images.

Generative Adversarial Networks (GANs)

How GANs work

GANs are a type of generative model: they observe many samples drawn from a distribution and generate new samples from the same distribution. Other generative models include variational autoencoders (VAEs) and autoregressive models.

The GAN architecture

A basic GAN architecture has two networks: the generator and the discriminator. GANs get the word "adversarial" in their name because the two networks are trained simultaneously and compete against each other, as in a zero-sum game. The generator produces new images, and its goal is to make them look so real that they fool the discriminator. In the simplest GAN architecture for image synthesis, the generator's input is random noise and its output is a generated image (Figure 2).

Figure 2. Generator input and output.
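The zero-sum game described above is usually written as the standard GAN minimax objective, shown here for reference. The discriminator D maximises it by assigning a high probability D(x) to real images x and a low probability to generated images G(z) made from noise z, while the generator G minimises it. AnimeGANv2 trains with its own combination of losses, so treat this as the textbook formulation rather than the authors' exact objective.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]
```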

The discriminator is simply a binary image classifier, which you may already be familiar with. Its job is to classify whether an image is real or fake (Figure 3).

Figure 3. Discriminator input and output.
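Because the discriminator is a binary classifier, it is commonly trained with binary cross-entropy: real images get label 1 and generated images get label 0. The sketch below assumes the discriminator's outputs are already probabilities D(x) in (0, 1); the function names are illustrative, not taken from the AnimeGANv2 code.

```python
import math


def bce(prob: float, label: int) -> float:
    """Binary cross-entropy for one prediction; prob is D(x) in (0, 1)."""
    eps = 1e-12  # clamp to avoid log(0)
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def discriminator_loss(real_probs, fake_probs) -> float:
    """Mean BCE on the real batch (label 1) plus mean BCE on the fake batch (label 0)."""
    real = sum(bce(p, 1) for p in real_probs) / len(real_probs)
    fake = sum(bce(p, 0) for p in fake_probs) / len(fake_probs)
    return real + fake
```

For example, a discriminator that is right (D = 0.9 on a real image, 0.1 on a fake one) gets a much lower loss than one that has been fooled (0.1 on real, 0.9 on fake).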

Putting it all together, a basic GAN works as follows: the generator produces fake images; the real images (the training dataset) and the fake images are fed into the discriminator in separate batches; and the discriminator predicts whether each image is real or fake (Figure 4).

Figure 4. The method.
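The alternating training procedure can be illustrated with a deliberately tiny toy, which is not how AnimeGANv2 itself is trained. Here each "image" is a single number, the real data are drawn from a normal distribution with mean 4, the generator is the hypothetical linear model G(z) = a*z + b, and the discriminator is the hypothetical logistic model D(x) = sigmoid(w*x + c), so the cross-entropy gradients can be written by hand. Each step builds a real batch and a fake batch separately, updates the discriminator, then updates the generator against the updated discriminator.

```python
import math
import random


def sigmoid(s: float) -> float:
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def train_toy_gan(steps: int = 200, batch: int = 64, lr: float = 0.05, seed: int = 0):
    """Toy 1-D GAN (hypothetical example): 'real images' are numbers from N(4, 1).

    Generator:     G(z) = a*z + b, noise z ~ N(0, 1)
    Discriminator: D(x) = sigmoid(w*x + c), estimated probability that x is real
    Gradients of the cross-entropy losses are written out by hand.
    """
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator parameters
    w, c = 0.0, 0.0  # discriminator parameters
    for _ in range(steps):
        # 1) Build the two batches separately: real data and generator output.
        real = [rng.gauss(4.0, 1.0) for _ in range(batch)]
        z = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        fake = [a * zi + b for zi in z]

        # 2) Discriminator step: minimise -log D(real) - log(1 - D(fake)).
        d_real = [sigmoid(w * x + c) for x in real]
        d_fake = [sigmoid(w * x + c) for x in fake]
        grad_w = (sum((d - 1.0) * x for d, x in zip(d_real, real))
                  + sum(d * x for d, x in zip(d_fake, fake))) / batch
        grad_c = (sum(d - 1.0 for d in d_real) + sum(d_fake)) / batch
        w -= lr * grad_w
        c -= lr * grad_c

        # 3) Generator step (non-saturating loss): minimise -log D(G(z)),
        #    using the freshly updated discriminator.
        d_gen = [sigmoid(w * (a * zi + b) + c) for zi in z]
        grad_a = sum((d - 1.0) * w * zi for d, zi in zip(d_gen, z)) / batch
        grad_b = sum((d - 1.0) * w for d in d_gen) / batch
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b, w, c
```

After a few steps, for example `train_toy_gan(steps=20)`, the discriminator learns that larger values look "real" (w > 0) and the generator's offset b is pushed upward toward the real data. Over longer runs, toy dynamics like these can oscillate around the real distribution rather than settle cleanly.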

Reproducing Cartoon Images

Building on the GAN architecture, the authors combined datasets in many different art styles with an improved GAN algorithm called AnimeGANv2. With it, they designed an application that converts photos into stylized images that suit the user's taste. This work is also the foundation for developing our related projects.


The model uses three basic datasets collected from famous animated films, each in a different painting style:

1. The style of artist Miyazaki Hayao, from the film "The Wind Rises" (Figure 5).

Figure 5. The style of Miyazaki Hayao.

2. The style of artist Makoto Shinkai, from the films "Your Name" and "Weathering with You" (Figure 6).

Figure 6. The style of Makoto Shinkai.

3. The style of artist Kon Satoshi, from the film "Paprika" (Figure 7).

Figure 7. The style of Kon Satoshi.

The Method

AnimeGANv2 uses layer normalization of features to prevent the network from producing high-frequency artifacts in the generated images. The original AnimeGAN is prone to such artifacts because it uses instance normalization, which is the same reason StyleGAN generates high-frequency artifacts. Total variation loss alone cannot completely suppress this high-frequency noise. Instance normalization is generally regarded as the best normalization method for style transfer: it lets different channels of the feature map have different feature properties, which promotes the diversity of styles in the images the model generates. Layer normalization, by contrast, gives all channels of the feature map the same distribution of feature properties, which effectively prevents the generation of local noise.
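The difference between the two normalizations comes down to which values share a mean and variance. The sketch below works on one image whose feature map is given as a list of channels, each a flat list of H*W activations (an assumed layout chosen for readability). Instance normalization computes statistics per channel, while layer normalization, assumed here to span all channels and positions of the image, computes one shared set. The learnable scale and shift parameters that both layers normally have are omitted.

```python
import math


def _normalize(values, eps=1e-5):
    """Subtract the mean and divide by the standard deviation of `values`."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def instance_norm(fmap, eps=1e-5):
    """Separate mean/variance for every channel of one image."""
    return [_normalize(ch, eps) for ch in fmap]


def layer_norm(fmap, eps=1e-5):
    """One mean/variance shared by all channels and positions of one image."""
    flat = [v for ch in fmap for v in ch]
    normed = _normalize(flat, eps)
    n = len(fmap[0])  # assumes every channel has the same number of positions
    return [normed[i * n:(i + 1) * n] for i in range(len(fmap))]
```

With `fmap = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]]`, `instance_norm` rescales each channel by its own statistics, so both channels end up with zero mean and nearly identical values. `layer_norm` uses a single set of statistics for the whole map, so the overall mean is zero but the second channel stays above the first.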

Figure 8

Figure 8. The AnimeGANv2 architecture.

The network structure of the generator in AnimeGANv2 is shown in Figure 8. K is the size of the convolution kernel, S is the stride, C is the number of convolution kernels (output channels), IRB is the inverted residual block, resize is the interpolation up-sampling method, and SUM is element-wise addition. The generator parameters of AnimeGANv2 take up 8.6MB, compared with 15.8MB for AnimeGAN. AnimeGANv2 uses the same discriminator as AnimeGAN, except that its discriminator uses layer normalization instead of instance normalization.
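Part of why a generator built from inverted residual blocks can stay small is visible in a simple weight count. The sketch below assumes the MobileNetV2-style layout of an inverted residual block (a 1x1 expansion convolution, a 3x3 depthwise convolution, and a 1x1 projection convolution) and ignores biases and normalization parameters. The channel count of 256 and expansion ratio of 2 used in the example are illustrative assumptions, not figures taken from the paper.

```python
def inverted_residual_params(c_in: int, c_out: int, expansion: int, k: int = 3) -> int:
    """Weights of a MobileNetV2-style inverted residual block (assumed layout):
      1x1 expansion conv   c_in -> c_in * expansion
      kxk depthwise conv   one kxk filter per expanded channel
      1x1 projection conv  c_in * expansion -> c_out
    Biases and normalization parameters are ignored.
    """
    hidden = c_in * expansion
    return c_in * hidden + k * k * hidden + hidden * c_out


def plain_residual_params(c: int, k: int = 3) -> int:
    """Weights of a residual block with two ordinary kxk convs, c -> c -> c."""
    return 2 * k * k * c * c
```

Under these assumptions, `inverted_residual_params(256, 256, 2)` gives 266,752 weights, while `plain_residual_params(256)` gives 1,179,648, roughly 4.4 times more. The saving comes from replacing full 3x3 convolutions with a depthwise one plus cheap 1x1 convolutions.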


The authors built a simple app to demonstrate the model; you can refer to it here (Figure 9).

Figure 9. App illustration.

In addition, the model works not only on portraits but also on real-life scenes, although the converted scenes still look quite dark (Figure 10).

Figure 10. Converted real-life scenes.


[1] Jie Chen, Gang Liu, Xin Chen. "AnimeGAN: A Novel Lightweight GAN for Photo Animation." ISICA 2019: Artificial Intelligence Algorithms and Applications, pp. 242-256, 2019.
[2] Sylvain Combettes. "A basic intro to GANs (Generative Adversarial Networks)." Oct 26, 2020.