- A quick example
- Emojis instead of photographs
- The people behind the emojis
- Per category
- Bald emojis
- Bearded faces
- Curly hair
- Face contour
- Sad facial expressions
- Santa Claus
- Skin colour
- Surprise expressions
- Wrinkles and expression lines
- Massaging hands
- Health workers
- Male teachers (and more glasses)
- Old age
- Young age
- Steamy pictures
- Head accessories
- Left and right weight
- Per category
- Limitations and biases
- The code
- Thanks for reading
There has been a lot of improvement in GANs [1] in the last few years. One of their many uses has been upscaling blurry images.
The GAN [1] generates a realistic face based on all the faces it has seen during training. This, which looks like an "enhance!" joke from CSI, is of course not reconstructing the original image (that information is simply not there in the low-resolution image). It is extrapolating what an image could look like.
A quick example
Emojis instead of photographs
We know this has decent results — like in the example above — with downscaled photos. How good would it be with stylized characters? Will it interpret the features correctly?
We ran the generator on all the emojis. For all the ones recognized as a face, a generation was performed and included in the image below.
The people behind the emojis
We ran every emoji, for different platforms, through the GAN [1]. These are the results for all emojis recognized as a face [3]. The emojis that were not recognized are omitted. What follows is a selection of interesting examples.
As with other examples, not a single bald face was generated for this article. We can see in the following images how the head shape is similar, but it is always — in every image we generated — covered with hair.
Beards do not seem to be generated either. Below we can see examples of how they are interpreted as shadows or double chins.
Caps and accessories do not seem to be often present in the training dataset. A cook’s hat has been interpreted here as white hair, with pretty convincing results.
We can also notice how moustaches are interpreted as “reserved space”, or as a slightly hairier area, but not as a full moustache or beard — an indication that they were probably also absent in training. The same space is reserved in the toque’s white area. This produces an ample forehead.
Curly hair from emojis is not correctly reproduced. It could be because curly hair in an emoji is overly stylized and unrealistic, or because of a lack of training examples. We can only speculate. Below are some examples:
The results are similar for different hair and skin colours:
A few more elf examples below. We can see the lower ear lobes have been reinterpreted as hair or long earrings. The reptile-like pattern of the shirt is surprisingly consistent. The ears are — as would be expected — pointy.
Sad facial expressions
Wrinkles and expression lines
In the following examples the hands around the head seem to have been interpreted as part of the face, producing a long forehead. The relaxed faces produced reflect the emojis’ expressions very well.
In 2 out of 3 graduation emojis, the neural network generated glasses that were not there in the emoji (!). I find that startling, although it might be that we are reading too much into it.
Male teachers (and more glasses)
Here we have the opposite case to the previous example: glasses present in the emoji have disappeared in the generated faces.
In addition, we can notice how the blackboard in the background shapes the haircut of the generated faces.
This is a selection of some of the best images generated from emojis representing old people.
Some of the emojis generated images with younger characters. One specific emoji style generated all the younger-looking images. I do not know whether this is due to the random seed, or whether something in the colours makes it more favourable.
To my surprise, the GAN [1] very consistently recognized images where there was a glow or steam, and accordingly generated diffuse images.
Crowns are probably not part of the training set — my wild guess. Instead of appearing in the resulting image, they seem to worsen the lighting conditions of the generated image.
Diadems have a similar result to crowns. They are not generated either.
The generated images never include a hair accessory — headscarves included. The rest of the generation is still believable.
Most turban-wearing emojis seem to generate more skin surface. This translates into big foreheads and receding hairlines.
Left and right weight
The original image can influence whether the generated image will be a frontal or a three-quarter angle. We can see a good example of this in asymmetric source images, such as the emojis in the tipping-hand and raising-hand categories. The generated faces are more likely — though not always — to lean toward the side where the raised hand was.
With the vampire emojis as source we observe an interesting effect: ears and teeth seem to get shaped by the source image, while other attributes — such as eye colour — do not adapt as easily.
Here is a summary of the main failures we ran into when feeding emojis as input. Some of these probably do not occur when using photographic images, while others do.
Limitations and biases
There are always biases based on the data used for training. This is a limitation of learning by example. While not a criticism, it is useful to be aware of what the biases of a specific model are. The authors of pulse very wisely acknowledge these biases [5]. In this article this has less relevance, since we are not even using photographs but stylized icons — emojis — which have their own biases in how they are stylized.
In this case we can speculate that the training did not include many cases of:
- dark-skinned people
- very young people or babies
- hair other than wavy or straight
- bald people
- red hair
- helmets, caps, hats, headscarves, turbans, crowns, diadems
- accessories, except earrings and glasses

Moustaches produce a bigger space and some hair, but nothing too dense. I would guess the training data consists mostly of clean-shaven or hairless subjects. In addition, we are using emojis as input — not photographs — and it is very likely that in most emojis the stylized hair bears only a slight resemblance to real hair.
The faces generated try very hard to smile. Sometimes this results in a quirky half smile. This might be a cultural phenomenon in the training data — smiling when a photo is about to be taken.
If you would like to reproduce the examples above or play with new ones, here is how.
Cloning the repo
```shell
git clone https://github.com/adamian98/pulse
```
The pulse repository [6] has instructions to install dependencies with conda. If you prefer virtualenv, the following might be useful.

```shell
cd pulse
# install dependencies
sudo apt-get install python3.8-dev
sudo apt install libpython3.8-dev
virtualenv -p /usr/bin/python3.8 newenv3
./newenv3/bin/pip install certifi cffi chardet cryptography \
  cycler idna intel-openmp kiwisolver matplotlib mkl numpy \
  olefile pandas pillow pycparser pyopenssl pyparsing pysocks \
  python-dateutil torch pytz readline requests scipy tk torchvision \
  tornado urllib3 wheel zstd dlib
```
There are two main scripts we need to use:
`align_face.py` sets all the images in the input folder into the right format and downscales them to the desired size. The lower the resolution — the more downscaling — the more room the GAN [1] has to reconstruct the high-resolution image.
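As an illustration of how little information survives the downscaling step, here is a minimal Pillow sketch. It uses a fabricated solid-colour image instead of a real photo, and skips the face detection and cropping that `align_face.py` also performs:

```python
from PIL import Image

# Fabricate a 1024x1024 stand-in for an aligned face photo.
img = Image.new("RGB", (1024, 1024), color=(200, 160, 120))

# Downscale to 16x16, comparable to running with -output_size=16:
# at this resolution almost all facial detail is gone, which is
# exactly the room the GAN has to fill in.
small = img.resize((16, 16), Image.BICUBIC)
small.save("face_16px.png")
print(small.size)  # (16, 16)
```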
`run.py` generates the prediction and — optionally — saves the intermediate steps. This can be useful to generate animations such as the one picturing Lena at the beginning of this article.
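The intermediate steps can then be stitched into an animation. A minimal sketch with Pillow — the frames here are fabricated placeholders; in practice you would load the images `run.py` saved with `-save_intermediate` from its output directory:

```python
from PIL import Image

# Placeholder frames standing in for run.py's intermediate outputs.
frames = [Image.new("RGB", (64, 64), color=(80 * i, 80 * i, 80 * i))
          for i in range(3)]

# Stitch them into an animated GIF: 80 ms per frame, looping forever.
frames[0].save("progress.gif", save_all=True,
               append_images=frames[1:], duration=80, loop=0)
```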
Running (16px downscale)
```shell
# align face in image and downscale to resolution (16px)
./newenv3/bin/python align_face.py -input_dir 'lena_input_folder_16px' -output_size=16
# by default, downscaled images go to the `pulse/input` folder
# you might want to clear it of other images
# make the prediction and output intermediate steps
./newenv3/bin/python run.py -output_dir='output_16' -save_intermediate -steps=200
```
Running (32px downscale)
```shell
# align face in image and downscale to resolution (32px)
./newenv3/bin/python align_face.py -input_dir 'lena_input_folder_32px' -output_size=32
# by default, downscaled images go to the `pulse/input` folder
# you might want to clear it of other images
# make the prediction and output intermediate steps
./newenv3/bin/python run.py -output_dir='output_32' -save_intermediate -steps=200
```
Default folders and options
As stated above, some folders — like `pulse/input` — are used by default. We can see a list of all the default options and folders in the original source code.
I hope you liked it. This was a selection of the generated images. You can check all the emoji source / generated image pairs in the GitHub repository.
Thanks for reading
I want to make clear we do not have access to the training data. When we refer to the training data containing or not containing examples, we are only talking about the likelihood and abundance of such examples based on the results we generated for the source emojis. This is made explicit in some instances; sometimes — for the benefit of the reader — we do not repeat it. For those cases please assume this meaning.