This is the first in a series of articles explaining how artists can use neural networks like DeepStyle to make art.
Let’s start with the basics. What are neural networks? Neural networks are a computational approach to solving problems that is loosely modeled on the structure of the human brain. Rather than following one fixed rule that turns inputs into an output (x + y = z), they take in a large set of inputs and run them through a large set of nodes, or neurons, that transform those inputs and give an output (a, b, c, d, e might come out as “Cabde”). The first time you throw a bunch of data through the nodes, you’re not going to get a very coherent output. This is where neural networks differ from traditional computing: they learn from their mistakes. You throw in data, you get out junk. You tell the computer whether its output is getting warmer or colder, and then it tries again. It does this millions of times until it arrives at a balance of nodes that gives a rough approximation of a “correct” answer. This is super useful for the kinds of tasks that humans have usually excelled at while computers have not: things like categorization, object recognition, speech recognition, or more “intuitive” guesswork.
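If it helps to see the “warmer or colder” loop in code, here is a minimal sketch in Python. It is not Deep Style, just a single toy “neuron” with two adjustable weights. The function name train_neuron, the toy data, and the learning rate are all made up for illustration.

```python
import random

def train_neuron(samples, steps=5000, learning_rate=0.05, seed=0):
    """Fit one linear 'neuron' by repeatedly guessing and correcting."""
    rng = random.Random(seed)
    n_inputs = len(samples[0][0])
    # Start from random weights, so the first guesses are junk.
    weights = [rng.uniform(-1, 1) for _ in range(n_inputs)]
    for _ in range(steps):
        inputs, target = rng.choice(samples)
        guess = sum(w * x for w, x in zip(weights, inputs))
        error = guess - target  # the "warmer or colder" signal
        # Nudge every weight a little in the direction that shrinks the error.
        weights = [w - learning_rate * error * x for w, x in zip(weights, inputs)]
    return weights

# Hypothetical toy data: the "correct" answer is 2*a + 3*b.
samples = [((a, b), 2 * a + 3 * b) for a in (0.0, 0.5, 1.0) for b in (0.0, 0.5, 1.0)]
weights = train_neuron(samples)
print([round(w, 2) for w in weights])
```

After enough rounds the two weights should settle close to 2 and 3, even though nobody told the program those numbers directly. A real network does the same thing with millions of weights instead of two.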
Here’s a good video explaining the subject:
Also a shoutout to @stephencwelch for such a great video explaining the subject. One of the “human” things that neural networks are making progress in is art. Let me introduce you to DeepStyle:
Essentially, Deep Style has been trained to recognize objects and textures. You give it a photo and a painting, and it makes the photo look like it was painted in the style of the painting. You’ve probably seen the apps in the App Store that use this technology, calling the different painting effects “filters”. This process is very similar to a Google project called “Deep Dream”, which you can read about here:
Deep Style isn’t just a Photoshop filter. It’s a lot more intelligent than that, but like Photoshop it should be looked at as a tool for artists. The development of art is intrinsically linked to the development of technology. Impressionism came out of new scientific discoveries in optics. The invention of amplified electric instruments led to rock’n’roll. Technology creates and defines the canvas, but we have to decide what to put on it.
I will admit that being trained in a traditional oil painting, fine arts tradition has left me somewhat leery of “cheating”. As I’ve gotten older, though, I’ve learned about the long history of “mechanical reproduction” in art. From the camera obscura all the way to modern “photobashing”, people have been finding ways to “cheat” and produce work faster and more accurately. Ever seen those super impressive videos on YouTube where people draw ultra realistic portraits just by drawing across the board? That’s actually a mechanical reproduction technique not unlike tracing. For the longest time I couldn’t figure out how people could just move their hand linearly across the board, filling in the details like a printer, but that’s actually part of the trickery of the time lapse.
So Deep Style can be used as a tool, like any other artistic tool. I wanted more flexibility than downloading an app with presets would give me, so I partitioned my computer’s hard drive and installed Linux to run the neural network. You can download an implementation of the neural network on GitHub here.
I’m not a computer scientist. I’m just an artist who finds tech interesting. Right now, neural networks are so niche that they’re mostly only used and understood by Silicon Valley engineers and computer science academics. I think the basic concept of a neural network is actually quite intuitive (it’s modeled after our own brain structure, after all), but it still has a bit of a barrier to entry for non-technically inclined people. I hope that by sharing my experiences and knowledge I can help other artists understand neural networks and find cool ways to use them in their own workflow.
First, a couple of terms:
Content image: This is the photo you want to transform. If I were trying to make a dog look like it had been painted by Monet, the dog would be the content image.
Style image: This is the image you are deriving the style from. So in the case of the previous example, the Monet painting would be the style image.
The possibilities left to explore with neural networks, even just with Deep Style itself, are massive, but I’ll focus on one particular set of techniques I’ve been experimenting with. I call these paintings “neural fusion paintings”.
Basics of the technique: I run an image through a neural network under a bunch of different parameters, using different source style images, sometimes from the same artist and sometimes from multiple artists. Then I layer those styled images over the original photo and use a layer mask in Photoshop to selectively reveal and hide different parts of the styled images. Finally, I add a top layer where I touch up some of the details and blend different parts of the image.
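For anyone curious what a layer mask is doing under the hood, the layering step can be roughly sketched in Python. This is only an illustration of the idea, not what Photoshop actually runs. The function names blend_with_mask and fuse and the tiny two-pixel “images” are invented for the example. A mask value of 1.0 fully reveals the styled layer and 0.0 keeps whatever is underneath.

```python
def blend_with_mask(base, styled, mask):
    """Blend two same-sized RGB images; mask 1.0 shows styled, 0.0 shows base."""
    result = []
    for base_row, styled_row, mask_row in zip(base, styled, mask):
        row = []
        for base_px, styled_px, m in zip(base_row, styled_row, mask_row):
            # Weighted average of each color channel, controlled by the mask.
            row.append(tuple(round(m * s + (1 - m) * b)
                             for b, s in zip(base_px, styled_px)))
        result.append(row)
    return result

def fuse(photo, styled_layers):
    """Stack several (styled_image, mask) layers over the photo, bottom to top."""
    canvas = photo
    for styled, mask in styled_layers:
        canvas = blend_with_mask(canvas, styled, mask)
    return canvas

# Hypothetical 1x2 pixel images standing in for a photo and two styled outputs.
photo = [[(100, 100, 100), (100, 100, 100)]]
styled_a = [[(200, 0, 0), (200, 0, 0)]]
styled_b = [[(0, 0, 200), (0, 0, 200)]]
mask_a = [[1.0, 0.0]]  # reveal style A on the left pixel only
mask_b = [[0.0, 0.5]]  # half-blend style B into the right pixel

fused = fuse(photo, [(styled_a, mask_a), (styled_b, mask_b)])
print(fused)  # [[(200, 0, 0), (50, 50, 150)]]
```

The hand-painted detail layer would sit on top of this stack as one more layer with its own mask.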
This neural fusion technique is useful for a couple of reasons. First, I have a pretty hefty computer (I also do VR art and development), but even my computer has trouble processing these images. Neural networks take a massive number of simultaneous computations. This means they need a really good graphics card. I have a GTX 970 with 3.5 gigs of VRAM and that means I can get, with optimizations, output images that are about 600–800 pixels in width.
People have come up with some solutions, like slicing up the images into smaller pieces, and then applying the neural network to each piece and then sewing them back together, but this technique results in a “flat” look that loses a lot of the object recognition skills that make Deep Style so convincingly like a painting.
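The slice-and-stitch idea can be sketched like this in Python. It assumes a hypothetical stylize function that handles one tile at a time, and the names split_into_tiles, stitch_tiles, and stylize_in_tiles are made up for the example. The stand-in “style” here just averages each tile, which makes the problem easy to see: every tile is processed without knowing anything about its neighbors.

```python
def split_into_tiles(image, tile_size):
    """Cut a 2D grid of pixel values into tiles keyed by their (top, left) corner."""
    height, width = len(image), len(image[0])
    tiles = {}
    for top in range(0, height, tile_size):
        for left in range(0, width, tile_size):
            tiles[(top, left)] = [row[left:left + tile_size]
                                  for row in image[top:top + tile_size]]
    return tiles

def stitch_tiles(tiles, height, width):
    """Paste tiles back into a full-size image at their original positions."""
    canvas = [[None] * width for _ in range(height)]
    for (top, left), tile in tiles.items():
        for dy, row in enumerate(tile):
            for dx, pixel in enumerate(row):
                canvas[top + dy][left + dx] = pixel
    return canvas

def stylize_in_tiles(image, tile_size, stylize):
    """Run a (hypothetical) stylize function on each tile, then stitch the results."""
    tiles = split_into_tiles(image, tile_size)
    styled = {corner: stylize(tile) for corner, tile in tiles.items()}
    return stitch_tiles(styled, len(image), len(image[0]))

def flatten_tile(tile):
    """Stand-in 'style': replace every pixel in the tile with the tile's average."""
    values = [v for row in tile for v in row]
    average = sum(values) // len(values)
    return [[average] * len(row) for row in tile]

# Hypothetical 4x4 grayscale image with values 0..15.
image = [[y * 4 + x for x in range(4)] for y in range(4)]
print(stylize_in_tiles(image, 2, flatten_tile))
```

Each 2x2 block comes back as one flat value with hard seams between blocks, which is a crude version of the “flat” look described above.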
With the neural fusion technique, though, some of the lack of resolution can be disguised through your manual blending layer in Photoshop. Another advantage is that the hand-painted detail layer can make up for some of the information the neural network can’t access on a smaller graphics card. The bigger the graphics card, the higher-res the image it can hold in VRAM, and therefore the more texture detail it can extract. If you resize a picture of a person down to 100px by 100px, the photo is so small that you can’t meaningfully extract any useful information about facial features, text, etc. It’s just a bunch of splotchy color squares. The apps you can get in the App Store run on servers that are probably more powerful than your home computer, which means they can do some pretty rad stuff even if you don’t get the same control as with a home installation.
The very talented artist Fredrick Nolting has done some awesome work using the output from apps as a starting place for his expressionistic paintings:
There are lots of possibilities, and as GPU power goes up, the possibilities for artists will go up. This is a rich vein to explore. It won’t just be an Instagram filter, an automatic conversion of a photograph. It will provide a tool to transform visual material for the artist to sculpt and bend to their ends.
There are two aspects of “painterly” graphical representation that need to be taken into account: noise and specificity. Real life is naturally noisy. Light and color have very complex interactions with the surface of a material and with the way they bounce around a room. There are lots of tiny variations from speck to speck. When we say that a piece of CGI looks “fake”, what we’re usually picking up on is an insufficiently complex simulation of light on the surface, resulting in a simplistic, homogeneous “plastic” reflection of light. (Remember, polymers are basically just repeating chains of molecules, so the way light interacts with them is going to be fairly consistent compared to the way it would interact with a complex wabi-sabi organic surface like skin.)
An “un-noisy” rendering style might be preferable, though! As argued by Scott McCloud, the emotional strength of art is often tied to its ability to simplify and abstract the world.
The process of running images through Deep Style naturally lowers their surface noise. Even though the resulting images can sometimes be harder to make out, or might look more chaotic, on a pixel-by-pixel level they’re generally more consistent. You can actually do this in Photoshop to some degree using its “Artistic” or “Facet” filters, but the result will end up looking like a filter because those filters lack the second element.
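One rough way to put a number on “surface noise” is to average how much each pixel differs from its immediate neighbors. The sketch below is my own crude stand-in in Python, not a measurement Deep Style or Photoshop reports. The function name surface_noise and the little grayscale patches are invented for illustration.

```python
def surface_noise(image):
    """Mean absolute difference between each pixel and its right and lower neighbors."""
    height, width = len(image), len(image[0])
    total, count = 0, 0
    for y in range(height):
        for x in range(width):
            if x + 1 < width:  # compare with the pixel to the right
                total += abs(image[y][x] - image[y][x + 1])
                count += 1
            if y + 1 < height:  # compare with the pixel below
                total += abs(image[y][x] - image[y + 1][x])
                count += 1
    return total / count if count else 0.0

# Hypothetical 3x3 grayscale patches: a speckled "photo" and a smoother "painted" one.
photo_patch = [[120, 90, 140], [80, 150, 100], [130, 95, 145]]
painted_patch = [[110, 112, 115], [111, 113, 116], [112, 114, 118]]

print(surface_noise(photo_patch) > surface_noise(painted_patch))  # True
```

A lower score means neighboring pixels agree with each other more, which is the “more consistent” quality described above. It says nothing about whether the smoothing respected the subject of the picture.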
Specificity is in some ways at odds with decreasing noise. As you decrease the overall noise, you often decrease distinction, the sharpness dividing a foreground object from its background. It sort of smooths out details and makes everything a little “smudgy”. This makes a picture look less like a real painting because it looks programmatic. It is essentially created without reference to the object depicted. A Photoshop filter doesn’t know that it’s applying an Artistic filter to a face, or a dog, or a landscape. It just sees a grid of colors. Neural networks are able to bring greater specificity to the way they denoise and apply texture, which creates a more accurate illusion of being painted by hand. The result seems to carry intention with it rather than being the output of an anonymous algorithm.
The places the neural network has the most difficulty with, and that consequently suffer the most at lower resolution, are facial features and fine edges: areas of high specificity. This is another advantage of the neural fusion technique. Details that might otherwise get smudged out, like individual hairs, eyelashes, or pupils, can have their specificity brought back in through a hand-painted detail layer in Photoshop.
Common issues are smudges between surfaces (a face might kind of “bleed” into the background behind it) or a rippling effect where there should be a consistent flat or curved surface. The neural network doesn’t know that it’s a surface: while your eyes know what a hand is, and the curve of a hand, the neural network is just working with a bunch of pixels and trying to find the average. You can selectively reveal some of the original photo to help here, but the texture and color will often not match properly and will look a little weird, because photographs are generally higher noise than the intended painted look. Using some of the Artistic filters in Photoshop along with hand painting can generally help with this issue if you do choose to use the photograph directly in the painting. Having figure drawing experience is a big help here, even if in many ways the bulk of the work is being taken care of for you by the neural network.
There’s more I could dive into here, like the technical details of setting up the neural network for non-technical people, but I think I’ll bring this article to a close for now. I would love feedback! If you’re an artist working with neural networks, let me know! If there’s something unclear, or something you’d like to know more about, ask me! Here’s a bonus GIF to show how the layers are built up:
Below you can also see some other portraits I’ve made. I’m going to be expanding on this process very soon, branching out from portraits, and exploring some additional techniques, so be sure to subscribe to the newsletter and stay tuned!
Also published on Medium.