Generating faces from emojis with StyleGAN and PULSE
- A quick example
- Emojis instead of photographs
- The people behind the emojis
- Per category
- Artifacts
- Babies
- Bald emojis
- Bearded faces
- Brides
- Cooks
- Curly hair
- Earrings
- Elves
- Face contour
- Sad facial expressions
- Santa Claus
- Skin colour
- Surprise expressions
- Wrinkles and expression lines
- Massaging hands
- Health workers
- Glasses
- Male teachers (and more glasses)
- Old age
- Young age
- Steamy pictures
- Head accessories
- Left and right weight
- Vampires
- Skulls
- Nonhuman
- Failures
- Per category
- Limitations and biases
- The code
- Coda
- Thanks for reading
There has been a lot of improvement in GANs 1 in recent years. One of their many uses has been upscaling blurry images.
The GAN 1 generates a realistic face based on all the faces it has seen during training. This, which sounds like an "enhance!" joke from CSI, is of course not reconstructing the original image (that information is simply not present in the low-resolution image). It is extrapolating what the image could look like.
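To make the idea concrete, here is a minimal sketch of this kind of upscaling, assuming a pretrained StyleGAN-like `generator` is available. The actual PULSE implementation differs in important details, such as how it constrains the latent space:

```python
# A minimal sketch of the idea, not the authors' code: search the latent
# space of a face generator for an image whose downscaled version matches
# the low-resolution input. `generator` is an assumed pretrained model.
import torch
import torch.nn.functional as F

def upscale_by_search(generator, low_res, steps=200, lr=0.4):
    latent = torch.randn(1, 512, requires_grad=True)  # latent code to optimize
    opt = torch.optim.Adam([latent], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        high_res = generator(latent)  # candidate high-resolution face
        # downscale the candidate and compare it with the input
        down = F.interpolate(high_res, size=low_res.shape[-2:],
                             mode='bicubic', align_corners=False)
        loss = F.mse_loss(down, low_res)
        loss.backward()
        opt.step()
    return generator(latent).detach()
```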
A quick example
We use the image of Lena 2 and downscale it with a script. Check the end of the article for all the code.
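If you want to prepare a downscaled input by hand, a few lines of Pillow are enough. This snippet is only illustrative (the file names are made up); the repository's `align_face.py`, shown later, handles alignment and downscaling for you:

```python
# Illustrative only: align_face.py does the real alignment + downscaling.
from PIL import Image

img = Image.open('lena.png')
small = img.resize((16, 16), Image.BICUBIC)  # downscale to 16x16 pixels
small.save('lena_16px.png')
```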
Emojis instead of photographs
We know this has decent results — like in the example above — with downscaled photos. How good would it be with stylized characters? Will it interpret the features correctly?
We have run the generator on all the emojis. For every emoji recognized as a face, the generated result is included in the image below.
The people behind the emojis
We ran every emoji, for different platforms, through the GAN 1. These are the results for all the emojis recognized as a face 3. The emojis that were not recognized are omitted. Below is a selection of interesting examples.
(For more detailed examples, check the Per category section; for a selection of failures, see the Failures section.)
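As a rough illustration of what "recognized as a face" means here: `align_face.py` relies on a dlib-based face detector, so a quick pre-check over a folder of emoji images could look like the sketch below (the folder name is made up):

```python
# Rough sketch of the "recognized as a face" check.
import glob

import dlib
import numpy as np
from PIL import Image

detector = dlib.get_frontal_face_detector()
for path in sorted(glob.glob('emojis/*.png')):
    img = np.array(Image.open(path).convert('RGB'))
    faces = detector(img, 1)  # 1 = upsample once before detecting
    status = 'recognized' if faces else 'omitted'
    print(f'{path}: {status}')
```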
Per category
Artifacts
Babies
Bald emojis
As with the other examples, not a single bald face came out while generating images for this article. We can see in the following images how the head shape is similar, but it is always — in all the images we generated — covered with hair.
Bearded faces
Beards also do not seem to be generated. Below we can see examples of how they are interpreted as shadows or double chins.
Brides
Cooks
We can see how caps and accessories are not often present in the training dataset. A cook’s hat has been interpreted here as white hair, with pretty convincing results.
We can also notice how the moustaches are interpreted as "reserved space", or as a slightly more hairy area, but not as a full moustache or beard, an indication that they were probably also absent in training. The same space is reserved in the toque's white area. This produces an ample forehead.
Curly hair
Curly hair from emojis is not correctly reproduced. It could be because the curly hair in an emoji is overly stylized and unrealistic, or because of a lack of training examples; we can only speculate. Below are some examples:
The results are similar for different hair and skin colours:
Earrings
Elves
A few more elf examples below. We can see the lower ear lobes have been reinterpreted as hair or long earrings. The reptile-like pattern of the shirt is surprisingly consistent. The ears are — as would be expected — pointy.
Face contour
Sad facial expressions
Santa Claus
Skin colour
Surprise expressions
Wrinkles and expression lines
Massaging hands
In the following examples the hands around the head seem to have been interpreted as part of the face, producing a long forehead. The relaxed faces produced reflect the emojis' expressions very well.
Health workers
Glasses
In 2 out of 3 graduate emojis, the neural network has generated glasses that were not there in the emoji (!). I find that startling, although we might be reading too much into it.
Male teachers (and more glasses)
Here we have the opposite case from the previous example: glasses present in the emoji have disappeared from the generated faces.
In addition, we can notice how the blackboard in the background shapes the haircut of the generated faces.
Old age
This is a selection of some of the best images generated from an image representing an old person.
Young age
Some of the emojis generated images with younger characters. One specific style of emoji generated all the younger-looking images. I do not know whether this is due to the random seed, or whether something in the colours makes it more favourable.
Steamy pictures
To my surprise, the GAN 1 very consistently recognized images where there was a glow or steam, and accordingly generated diffuse images.
Head accessories
Crowns
Crowns are probably not part of the training set — my wild guess. Instead of appearing in the resulting image, they seem to worsen the lighting conditions of the generated image.
Diadems
Diadems show a similar result to crowns: they are also not generated.
Hats
Headscarves
The generated images never show a hair accessory — including headscarves. The rest of the generation is still believable.
Helmets
Red hair
Turbans
Most turban-wearing emojis seem to generate more skin surface. This translates into big foreheads and receding hairlines.
Left and right weight
The original image can influence whether the generated image will be frontal or at a three-quarter angle. We can see a good example of this in asymmetric source images, such as the emojis in the tipping hand and raising hand categories. The generated images are more likely — though not always — to lean towards the side where the raised hand was.
Vampires
With the vampire emojis as a source we observe an interesting effect: ears and teeth seem to take on the shapes from the source image, while other attributes — such as eye colour — do not adapt as easily.
Skulls
Nonhuman
Failures
Here is a summary of the main failures we ran into when feeding emojis as input. Some of these probably do not occur with photographic input, while others do.
Limitations and biases
There are always biases based on the data used for training; this is a limitation of learning by example. While not a criticism, it is useful to be aware of what the biases of a specific model are. The authors of PULSE very wisely acknowledge these biases 5. In this article this has less relevance, since we are not even using photographs but stylized icons — emojis — as an investigation, and these have their own biases in how they are stylized.
In this case we can speculate that the training did not include many cases of:
- dark-skinned people
- different ages: the training set does not seem to include very young people or babies
- hair:
  - hair other than wavy or straight
  - bald people
  - red hair
- beards and moustaches: moustaches produce a bigger area with some hair, but nothing too dense. I would guess the training data consists mostly of clean-shaven or hairless subjects. In addition, we are using emojis as input — not photographs — and in most emojis the stylized version of hair very likely bears only a slight resemblance to real hair.
- helmets, caps, hats, headscarves, turbans, crowns, diadems
- accessories, except earrings and glasses
- non-smiling faces: the generated faces are trying very hard to smile, sometimes resulting in a quirky half smile. This might be a cultural phenomenon in the training data — smiling when a photo is about to be taken.
The code
If you would like to reproduce the examples above or play with new ones, here is how.
Cloning the repo
```bash
git clone https://github.com/adamian98/pulse
```
Installing dependencies
The pulse repository 6 has instructions to install dependencies with `conda`. If you prefer `virtualenv`, the following might be useful.
```bash
cd pulse
# install dependencies
sudo apt-get install python3.8-dev
sudo apt install libpython3.8-dev
virtualenv -p /usr/bin/python3.8 newenv3
./newenv3/bin/pip install certifi cffi chardet cryptography \
  cycler idna intel-openmp kiwisolver matplotlib mkl numpy \
  olefile pandas pillow pycparser pyopenssl pyparsing pysocks \
  python-dateutil torch pytz readline requests scipy tk torchvision \
  tornado urllib3 wheel zstd dlib
```
Running
There are two main scripts we need to use:

- `align_face.py` puts all the images in the input folder in the right format and downscales them to the desired size. The lower the resolution — i.e. the more downscaling — the more room the GAN 1 has to reconstruct the high-resolution image.
- `run.py` generates the prediction and — optionally — saves the intermediate steps. This can be useful to generate animations such as the one picturing Lena at the beginning of this article.
Running (16px downscale)
```bash
# align the face in the image and downscale it to the target resolution (16px)
./newenv3/bin/python align_face.py -input_dir 'lena_input_folder_16px' -output_size=16

# by default, downscaled images go to the pulse/input folder;
# you might want to clear it of other images first

# make the prediction and output the intermediate steps
./newenv3/bin/python run.py -output_dir='output_16' -save_intermediate -steps=200
```
Running (32px downscale)
```bash
# align the face in the image and downscale it to the target resolution (32px)
./newenv3/bin/python align_face.py -input_dir 'lena_input_folder_32px' -output_size=32

# by default, downscaled images go to the pulse/input folder;
# you might want to clear it of other images first

# make the prediction and output the intermediate steps
./newenv3/bin/python run.py -output_dir='output_32' -save_intermediate -steps=200
```
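The frames written by `-save_intermediate` can then be stitched into an animation like the Lena one at the beginning of this article. A minimal sketch with Pillow, assuming the intermediate frames end up as PNGs somewhere under the output folder (the exact layout may differ):

```python
# Stitch the frames saved by -save_intermediate into a GIF. The folder
# layout is an assumption; point the glob at wherever your frames landed.
import glob
from PIL import Image

paths = sorted(glob.glob('output_16/**/*.png', recursive=True))
frames = [Image.open(p) for p in paths]  # assumes at least one frame exists
frames[0].save('lena_steps.gif', save_all=True,
               append_images=frames[1:], duration=80, loop=0)
```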
Default folders and options
As stated above, some folders — like `pulse/input` — are specified by default. We can see the full list of default options and folders in the original source code of `align_face.py` and `run.py`.
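Those listings are not reproduced here, but as a rough illustration of their shape: the options are declared with `argparse`, roughly along these lines. Only the flags used in this article appear, and apart from `input` (mentioned above) the default values are placeholders rather than the repository's actual ones:

```python
# Illustrative sketch only: flag names are the ones used in this article;
# check the repository source for the full list and the real defaults.
import argparse

# align_face.py (subset of its options)
align = argparse.ArgumentParser()
align.add_argument('-input_dir', type=str, default='faces')   # placeholder
align.add_argument('-output_size', type=int, default=32)      # placeholder

# run.py (subset of its options)
run = argparse.ArgumentParser()
run.add_argument('-input_dir', type=str, default='input')     # confirmed above
run.add_argument('-output_dir', type=str, default='out')      # placeholder
run.add_argument('-steps', type=int, default=100)             # placeholder
run.add_argument('-save_intermediate', action='store_true')
```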
Coda
I hope you liked it. This was a selection of the generated images. You can check all the emoji source / generated image pairs in the GitHub repository:
Thanks for reading
Footnotes

2. https://en.wikipedia.org/wiki/Lenna — a commonly used test image for image processing.
3. By `align_face.py`; see the Running sections.
4. I want to make clear we do not have access to the training data. When we refer to the training data containing or not containing examples, we are only talking about the likelihood and abundance of it based on the results we generated for the source emojis. This is made explicit in some instances. Sometimes — for the benefit of the reader — we do not repeat it. For those cases please assume this meaning.
5. Biases were addressed in an updated section of the paper: https://arxiv.org/pdf/2003.03808.pdf