Time to start getting your hands dirty with some paint. Digital paint, at least.
I’m not going to teach you everything that digital painting requires. That’s a whole field in itself, but I would like to show my process for working with style transfer neural networks and offer some tips on things that I think help along the way.
I spoke a little in the last post about batch scripting for image generation. You can go check that bit out, but I’ll skip over it here and talk about this aspect on a more conceptual and practical level.
Image generation is all about curation. It’s a process of throwing a lot of stuff at the wall and seeing what sticks. Because automation and plentiful storage space make it trivially easy to generate images, I will often run an image through 100 different “source-styles”. Generating one image takes maybe 5 minutes, so 100 renders take a little over 8 hours, which is easy to knock out if you let the system run overnight. It works while you sleep.
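The arithmetic behind that overnight estimate is easy to sketch. Below is a minimal Python example that queues every content/style pairing and estimates the run time. The file names are hypothetical, the 5-minutes-per-render figure is just the rough estimate from above, and the actual call into a style transfer implementation is left out because it depends on which one you run.

```python
from itertools import product

def build_jobs(content_images, style_images):
    """Pair every content image with every style image (hypothetical file names)."""
    return [(c, s) for c, s in product(content_images, style_images)]

def estimated_hours(num_jobs, minutes_per_job=5):
    """Rough wall-clock estimate for running the jobs back to back on one GPU."""
    return num_jobs * minutes_per_job / 60

content = ["portrait.jpg"]
styles = [f"style_{i:03d}.jpg" for i in range(100)]

jobs = build_jobs(content, styles)
print(len(jobs))                             # 100
print(round(estimated_hours(len(jobs)), 1))  # 8.3
```

With one content image and 100 styles, that’s 100 renders at 5 minutes each, or about 8.3 hours back to back.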
Picking your style source images
Keep detail low
You want to think of your style source images like wallpaper, or texture. It’s like a landscape: it’s got to be broad, and it can’t be too detailed. One reason I say this is that a lot of detail gets blurred out in the downsizing process. I run my neural network on my home cloud server, which has a GTX 970. It’s a bit dated, but I think you’d probably run into similar resolution limits with newer cards. If you enlarge an image without distorting its aspect ratio, you have to scale up both width and height, so the pixel count grows with the square of the scale factor: doubling both sides means four times as many pixels. A card would probably have to be several times more powerful than the 970 to push the resolution much higher. That’s why I enlarge the images after the fact: the only way to get usable results is to downsize the inputs so the longest side is 500px. The resulting images can be scaled up with Waifu2x, but the image that goes in has to be pretty small first.
That process loses detail, so broad images are good. I think it’s best to pick an image whose composition isn’t dominated by any one strong subject.
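Here is a small Python sketch of the two calculations in that argument: the target size when the longest side is capped at 500px, and how quickly pixel count grows when you scale both sides. The function names are my own; in practice you would pass the resulting dimensions to an image library (for example Pillow’s `Image.resize`). Memory use in style transfer tends to grow with pixel count, which is the assumption behind treating the squared factor as a rough cost.

```python
def fit_longest_side(width, height, longest=500):
    """Scale (width, height) so the longer side equals `longest`, keeping the aspect ratio."""
    scale = longest / max(width, height)
    return round(width * scale), round(height * scale)

def pixel_cost_ratio(scale):
    """Scaling both width and height by `scale` multiplies the pixel count by scale squared."""
    return scale ** 2

print(fit_longest_side(3000, 2000))  # (500, 333)
print(pixel_cost_ratio(2))           # 4
```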
Balance of high and low contrast areas
You want something that has broad areas of lower contrast and low noise, interrupted by areas of sharp transition in hue, value, and saturation.
Color can be mapped into 3 dimensions: hue (the color, like red or green), saturation (how much color versus gray there is), and value (the darkness or brightness of the color). Here’s a 3D visualization toy/tool you can play around with to understand it better, or a 2D one if you prefer. This article also has some great 3D visualization tools to play with and gives a more in-depth explanation of the different ways color variables can be mapped in 3D space.
High contrast areas are ones in which neighboring colors differ strongly along one of these axes.
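To make “differ strongly along an axis” concrete, here is a minimal Python sketch using the standard library’s `colorsys` module to measure how far apart two neighboring colors are on each HSV axis. The helper names are hypothetical, and the measure is deliberately naive: hue is treated as an angle that wraps around at 360°, while saturation and value are plain differences on a 0 to 1 scale.

```python
import colorsys

def to_hsv(rgb):
    """Convert an 8-bit (r, g, b) tuple to (hue in degrees, saturation 0-1, value 0-1)."""
    r, g, b = (c / 255 for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    return h * 360, s, v

def axis_contrast(rgb_a, rgb_b):
    """Difference between two neighboring colors along each HSV axis."""
    h1, s1, v1 = to_hsv(rgb_a)
    h2, s2, v2 = to_hsv(rgb_b)
    hue_gap = abs(h1 - h2) % 360
    hue_gap = min(hue_gap, 360 - hue_gap)  # hue wraps around the color wheel
    return {"hue": hue_gap, "saturation": abs(s1 - s2), "value": abs(v1 - v2)}

print(axis_contrast((255, 128, 0), (0, 128, 255)))   # orange vs. blue: large hue gap only
print(axis_contrast((40, 40, 40), (220, 220, 220)))  # dark vs. light gray: value gap only
```

For grays, `colorsys` reports a hue of 0 and a saturation of 0, so only the value difference carries any meaning there.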
Value contrast is by far the most important, probably because rods far outnumber cones in your eyes. Rods are more numerous and more sensitive, but don’t detect color*, while cones detect both color and value. Value contrast is easiest to see in HDR photography, where the camera captures a wider range of dark and light values through multiple exposures and compresses them to retain more value contrast information.
Hue is simply what you normally think of as color: red, blue, green, etc. The contrast between hues is generally greatest between complementary colors. Orange and blue contrast well against one another, which is why the pairing used to be so common in film posters.
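If you want to find a complementary color numerically, one simple approach is to rotate the hue halfway around the wheel while keeping saturation and value fixed. The sketch below does that with Python’s `colorsys` module; the `complement` helper is hypothetical. Note that this uses the RGB/HSV color wheel; a traditional painter’s red-yellow-blue wheel pairs colors slightly differently.

```python
import colorsys

def complement(rgb):
    """Rotate the hue of an 8-bit (r, g, b) color by 180 degrees, keeping saturation and value."""
    r, g, b = (c / 255 for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    r2, g2, b2 = colorsys.hsv_to_rgb((h + 0.5) % 1.0, s, v)
    return tuple(round(c * 255) for c in (r2, g2, b2))

print(complement((255, 128, 0)))  # orange -> a sky blue
```

On the HSV wheel, orange’s complement comes out as a sky blue, which lines up with the orange/blue pairing above.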
Your eyes do not perceive all colors equally well. This is referred to as relative color luminosity. Colors like red and blue are perceived as “darker” because they sit at the edges of our color vision: our perception of light starts at the low-frequency end with red and goes up to blue. Purple is a color we see when our lowest-frequency receptors (red) and highest-frequency receptors (blue) are stimulated together; no single frequency of light looks purple. In some sense purple is an illusion. I suspect this overlap might be inherent even if we could see beyond blue into ultraviolet light, because ultraviolet light is roughly 800 THz, basically an “octave” higher than red at roughly 400 THz. (There’s also a neat article on some of the history of overlap between color theory and music theory here.)
Besides aiming for complementary colors, you can also try to pair colors with high and low luminosity. Red and blue have a certain amount of contrast because they’re pretty far apart on the spectrum, but both are perceived with low luminosity, which means they can get sort of muddy. (Think of how the American flag would look without the white stars and stripes to break things up.)
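One standard way to put a number on perceived brightness is the relative luminance formula used by the WCAG accessibility guidelines, which weights green far more heavily than red or blue. The Python sketch below applies it, plus the WCAG contrast ratio, to the flag example. It’s a stand-in for the perceptual effect described above, not something the neural network itself computes, and the helper names are my own.

```python
def srgb_to_linear(c):
    """Undo the sRGB gamma curve for one channel in the 0-1 range."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def relative_luminance(rgb):
    """Relative luminance of an 8-bit sRGB color, using the Rec. 709 / WCAG weights."""
    r, g, b = (srgb_to_linear(c / 255) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(rgb_a, rgb_b):
    """WCAG contrast ratio: 1.0 means no luminance contrast, 21.0 is black on white."""
    la, lb = sorted((relative_luminance(rgb_a), relative_luminance(rgb_b)), reverse=True)
    return (la + 0.05) / (lb + 0.05)

RED, BLUE, WHITE = (255, 0, 0), (0, 0, 255), (255, 255, 255)
print(round(contrast_ratio(RED, BLUE), 2))    # 2.15: hues far apart, but both dark
print(round(contrast_ratio(RED, WHITE), 2))   # 4.0
print(round(contrast_ratio(BLUE, WHITE), 2))  # 8.59
```

Red against blue scores only about 2.15, while either color against white scores much higher, which is roughly what the white stripes are doing for the flag.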
Saturation contrast is used less often, but it’s quite striking when it is, as in Sin City or Schindler’s List. It’s significant, but it’s hard to map semantically onto your source content images. If you take an image that is gray with blue flowers and apply it as a style to a portrait, you might end up with a face covered in random blue splotches, because the neural network follows differences in form and edges and doesn’t really understand foreground/background or the differences between objects.
A good reference point for the kinds of images you should look for is color blocking. Color blocking is a technique used in fashion, inspired by the work of Piet Mondrian. A strongly color-blocked scene has areas of flatness butted up against other flat areas, each with high contrast to its neighbors.
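As a rough way to think about this numerically, here is a hypothetical Python heuristic that classifies neighboring pixel differences in a grayscale grid as nearly flat or as sharp edges. A color-blocked image scores mostly flat with a small share of hard edges. Fine-grained noise has contrast everywhere, but most of it disappears once the image is shrunk, which is the downsizing problem described earlier. The thresholds and the block-averaging `downsample` function are arbitrary placeholders, not part of any style transfer tool.

```python
import random

def contrast_profile(grid, flat=0.05, edge=0.3):
    """Return (fraction of nearly flat neighbor pairs, fraction of sharp-edge pairs).

    `grid` is a 2D list of values in 0-1. Only horizontal and vertical
    neighbors are compared. The thresholds are arbitrary placeholders.
    """
    rows, cols = len(grid), len(grid[0])
    diffs = []
    for y in range(rows):
        for x in range(cols):
            if x + 1 < cols:
                diffs.append(abs(grid[y][x] - grid[y][x + 1]))
            if y + 1 < rows:
                diffs.append(abs(grid[y][x] - grid[y + 1][x]))
    flat_frac = sum(d < flat for d in diffs) / len(diffs)
    edge_frac = sum(d > edge for d in diffs) / len(diffs)
    return flat_frac, edge_frac

def downsample(grid, factor):
    """Average non-overlapping factor x factor blocks: a crude stand-in for resizing."""
    rows, cols = len(grid) // factor, len(grid[0]) // factor
    return [
        [
            sum(grid[y * factor + dy][x * factor + dx]
                for dy in range(factor) for dx in range(factor)) / factor ** 2
            for x in range(cols)
        ]
        for y in range(rows)
    ]

# A "color blocked" image: two flat regions meeting at one hard boundary.
blocked = [[0.2] * 16 + [0.9] * 16 for _ in range(32)]

# Fine-grained noise: contrast everywhere, but at a tiny scale.
random.seed(0)
noise = [[random.random() for _ in range(32)] for _ in range(32)]

print(contrast_profile(blocked))               # mostly flat, a few hard edges
print(contrast_profile(noise))                 # edges scattered everywhere
print(contrast_profile(downsample(noise, 4)))  # after shrinking, the edges mostly vanish
```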
Some contrast but not too much
It’s hard for me to reliably say how much or which kinds of contrasts are best because there are so many additional variables, but I think you can look at two extremes.
Highly “patterned” images with extreme and distinct contrasts between areas can overwhelm a content image, such as in this Lisa Frank tiger pattern below.
Insufficient contrast (or contrast that gets averaged out because it is small in scale, as with noise) can give the neural network too little texture information to transfer style properly. (This is often true of photographic images, since reality tends to be much more gradual in its gradients than abstracted illustration.)
That is not to say that these kinds of images are not still useful! They can generate very interesting textures or color fields which can then be brought back into the final painting later on in light touches.
To understand how this works, I think you have to look at what the researchers behind the neural network project say in their paper on the subject. It’s a neural network that has learned to recognize texture, and it’s tasked with successively transforming the image to resemble the style source image. It’s not transferring the essence of the art†.
In some ways I think it’s helpful to think of the texture of the source style image as being “wrapped around” the source content image.
This is why I think impressionism tends to be such a popular choice for these neural networks. Impressionist painters tend to offer you a nice rhythm of areas of noise and areas of emptiness.
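In the widely used Gatys et al. formulation of neural style transfer, the “style” of an image is summarized by Gram matrices: correlations between feature channels inside a pretrained convolutional network, summed over every position in the image. I can’t confirm that this is exactly what the network discussed above does, but it’s a useful way to see why style behaves like a texture wrapped around the content. The Python sketch below uses a couple of made-up feature channels instead of real network activations and shows that shuffling the spatial positions leaves the Gram matrix unchanged: where a texture element sits in the style image simply isn’t recorded.

```python
def gram_matrix(features):
    """Gram matrix of feature maps.

    `features` is a list of channels, each a flat list of activations over the
    same spatial positions. Entry (i, j) sums channel_i * channel_j over all
    positions, so it records which features co-occur but not where they occur.
    """
    n = len(features[0])
    return [
        [sum(a[k] * b[k] for k in range(n)) for b in features]
        for a in features
    ]

# Two made-up feature channels over six spatial positions.
feats = [
    [1.0, 0.0, 2.0, 0.0, 1.0, 3.0],
    [0.5, 1.0, 0.0, 2.0, 1.0, 0.0],
]

# Rearrange the spatial positions; every channel is permuted the same way.
order = [5, 3, 1, 0, 4, 2]
shuffled = [[ch[k] for k in order] for ch in feats]

print(gram_matrix(feats) == gram_matrix(shuffled))  # True
```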
Picking your source content images
It’s a good idea to have a strong single subject in your source content images. This is again due to resolution limits. Landscape-like images make good source style images because they are broad, won’t bias the transferred style too much, and don’t lose as much clarity at low resolution. For those same reasons, landscape-type images make poor content images. The neural network has trouble “wrapping” the style around the content if the content doesn’t have a strong composition. Portraits, still lifes, and other single-subject images give the neural network more clues to wrap around. Landscapes mostly work as source content images when there’s exceptionally sharp blocking, and I don’t see that often except maybe at the horizon line. When you use an overly broad sweep of an image, what you get back will look more like noise.
I think it’s important to explain one of the things I find most fascinating about this artistic process. Neural networks are algorithmic in a sense, and they run on a computer doing math, but this is completely unlike any generative art we’ve had before. More traditional generative work is a human artist feeding data into a box‡, which then spits out some kind of result. It goes in one direction. Using neural networks the way I do is more like a two-way classical dialectic than a one-way transformation.
Image generation through the batch process can produce unpredictable results. This is a powerful creative tool because it can inspire new ideas.
I would compare the neural fusion process to the “games” the surrealists used to create their work from some kind of primal unconscious. One of their more famous games is called “Exquisite Corpse”. A huge sheet of paper is folded up. One person draws on one half of the paper and then hands it off to someone who hasn’t seen what they’ve drawn. They fold the paper so that the other person can see a few of the starts of lines at the edge but can’t see the actual picture, and that person has to draw their own picture so that it melds with all the lines at the edge.
Then you unfold the paper and you get a weird crazy image.
Another example is the “cut-ups” that Beat Generation writers like William S. Burroughs would make. They’d cut existing writing up into words and phrases and then build a new story by rearranging those pieces.
Kinda like writing a document entirely by copying and pasting, or like using those magnetic word tiles you see on fridges.
Working with the neural network is a lot like that. You have a partner on the other side of a wall. You can’t see them, but you can kinda hear them, and maybe they speak a different language, but you can still sort of interact. Working with a neural network feels a lot more to me like improv comedy than like using a filter.
You curate the source images, and your partner transforms them in unexpected ways. The next step is to take those resulting images, curate them, layer them together in Photoshop, and selectively pull different aspects of these paintings in and out while also adding your own bespoke touches.
The basic layer structure is simple.
It’s basically just the original photo with a bunch of different NN renders laid over the top, using layer masks to selectively reveal and hide different parts of each render. On top of all that, you add some key details to “sell” the painted look, help keep things clean, enhance the art direction, etc.
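The layer structure is simple enough to model in a few lines. The Python sketch below is a simplified model of a layer stack, assuming every layer uses the Normal blend mode at full opacity: each layer mask is a grid of values from 0 (hide the layer and let what’s below show through) to 1 (reveal the layer), and in-between values blend linearly. Real Photoshop compositing has more options, such as blend modes and layer opacity, which are left out here, and the function names are hypothetical.

```python
def apply_mask(below, layer, mask):
    """Blend one layer over the image below it using a grayscale layer mask.

    All three are 2D grids of values in 0-1. A mask value of 1 reveals the layer,
    0 hides it (the image below shows through), and values in between blend.
    """
    return [
        [m * l + (1 - m) * b for b, l, m in zip(row_b, row_l, row_m)]
        for row_b, row_l, row_m in zip(below, layer, mask)
    ]

def flatten(base, layers):
    """Composite (layer, mask) pairs from bottom to top, like a layer stack."""
    result = base
    for layer, mask in layers:
        result = apply_mask(result, layer, mask)
    return result

photo = [[0.5, 0.5], [0.5, 0.5]]
render = [[0.9, 0.9], [0.9, 0.9]]  # stands in for one style transfer render
mask = [[1.0, 0.0], [0.5, 0.0]]    # reveal top-left, half-blend bottom-left, hide the rest

print(flatten(photo, [(render, mask)]))
```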
To run through it with an example:
Original source photo on the bottom.
Next is a version of the photo that has been edited in Photoshop. The primary goal of this layer is to create a denoised, non-photographic layer that can act as a base. You can build this layer with a combination of filters, however you like, but my approach is to start with Filter > Pixelate > Facet and then apply one of the filters from the Filter Gallery, like Dry Brush.
Then I find the most important details that have been lost in this process and use a layer mask to hide those areas so that the photo shows through. The difference is subtle but important.
Then just layer up all the neural network outputs as you like on top. I’ve isolated each of them below so you can see what they look like.
The more you are using a neural network output for detail, the higher up it will probably sit in the layer structure. The more “broad” the output render, the lower down it will be. This is pretty basic, though, and not a hard and fast rule. Just use your brain and don’t put things on top that should be hidden behind other layers.
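Photoshop’s Facet and Dry Brush filters are proprietary, so I won’t guess at their internals. As a rough stand-in for what this base layer is doing, here is a plain median filter in Python, a common and simple denoising technique: it removes isolated specks of photographic noise while keeping hard edges fairly intact. The grid values and the `median_filter` name are illustrative only.

```python
from statistics import median

def median_filter(grid, radius=1):
    """Replace each value with the median of its (2*radius+1) x (2*radius+1) neighborhood.

    Coordinates outside the grid are clamped to the nearest edge pixel.
    """
    rows, cols = len(grid), len(grid[0])
    out = []
    for y in range(rows):
        row = []
        for x in range(cols):
            window = [
                grid[min(max(y + dy, 0), rows - 1)][min(max(x + dx, 0), cols - 1)]
                for dy in range(-radius, radius + 1)
                for dx in range(-radius, radius + 1)
            ]
            row.append(median(window))
        out.append(row)
    return out

# A flat patch with one noisy speck and a hard edge on the right.
patch = [
    [0.2, 0.2, 0.2, 0.8, 0.8],
    [0.2, 0.9, 0.2, 0.8, 0.8],
    [0.2, 0.2, 0.2, 0.8, 0.8],
]
print(median_filter(patch))  # the speck is gone, the edge stays put
```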
This isn’t a linear process either. It’s a lot of hopping around and pulling various layers in and out while also sometimes creating new layers to do custom detail work on. Here’s what all the hand-painted detail looks like in isolation:
Neural networks provide “specificity plus abstraction” that Photoshop filters do not, which I discussed in my first article on this subject. Adding custom detail is the last step in this neural network painting project, and it’s a vital one because it allows for specificity beyond what even artificial neural networks provide. It uses the most powerful neural network of all: the human mind! It’s what makes all the difference between “neural network as a fancy Photoshop filter” and “neural network as an assistive tool for artists”.
In addition to providing a higher level of specificity than the artificial neural network is capable of on its own (at least for now), adding custom detail gives the artist another step in which to inject their own expressive subjectivity. As I said in my first article, the ability to abstract while maintaining specificity is part of what makes a piece of art look “human”. This is because choices must be made about what information is eliminated and what is emphasized. Just as you hold a dialogue with the neural network, pulling out and pushing back various “versions” of the picture by using masks to selectively hide and reveal, you further express your own subjectivity through your choices of which details to paint in manually and which to remove.
Some examples of detailing you might want to consider at this stage are:
- Pull the foreground out from the background to emphasize different elements, increase perceived depth, etc.
- Sharpen/highlight edges to better define the form
- Smooth over weird textures that throw off the wholeness and solidity of an image’s form
- Sharpen eyes (and other facial features) and add eye-light
Finally, here are the neural network layers together with the added details:
Summing up and the future
- Pick good style source images
- Treat your curation process of the images like it’s a dialogue with a creative partner
- Selectively pull in and out the image results according to this dialogue process
- Tightly lock up the final fit and finish of the piece with manual hand detailing that supports the subjective viewpoint the piece has produced
Whew, this has been a big series for me. I’ve got a lot more articles in progress on neural networks, technology, art, and the intersection of art and technology. I am going to change my approach, though, moving away from these multipart series toward something more agile that I can update more often.
I’m learning a bunch of stuff super fast, and trying to cram everything into a single monolithic manifesto is too slow a way to work. I do have a lot of stuff related to this that I want to talk about though, so if you’re interested in keeping up on the subject, please subscribe to the newsletter!
As a teaser of what comes next, I’m thinking of doing some proof-of-concept work using neural networks to “photobash”. Photobashing is a digital painting technique often used in the concept art world to generate imagery extremely fast. I think there’s a lot of possibility here for using neural networks as part of this process. So expect some more fantastical pieces soon that aren’t just straight-ahead portraits!
- *Actually not totally true, as apparently your rods can detect some blue light in extremely low-light conditions. Neato!
- †I have an essay I want to write about this more specifically but I’ll forgo it for now.
- ‡This is a bit simplistic, but I’ll cover this issue in a later blog post specifically about generative art and its relationship to neural networks.