PixelNN: Example-based Image Synthesis

Aayush Bansal, Yaser Sheikh, Deva Ramanan


Abstract

We present a simple nearest-neighbor (NN) approach that synthesizes high-frequency photorealistic images from an "incomplete" signal such as a low-resolution image, a surface normal map, or edges. Current state-of-the-art deep generative models designed for such conditional image synthesis lack two important properties: (1) they are unable to generate a large set of diverse outputs, due to the mode collapse problem, and (2) they are not interpretable, making it difficult to control the synthesized output. We demonstrate that NN approaches can potentially address both limitations, but suffer in accuracy on small datasets. We design a simple pipeline that combines the best of both worlds: the first stage uses a convolutional neural network (CNN) to map the input to an (overly smoothed) image, and the second stage uses a pixel-wise nearest-neighbor method to map the smoothed output to multiple high-quality, high-frequency outputs in a controllable manner. We demonstrate our approach for various input modalities, and for domains ranging from human faces to cats and dogs to shoes and handbags.
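
The second stage of the pipeline described above admits a compact illustration. Below is a minimal NumPy sketch of pixel-wise nearest-neighbor synthesis: for every pixel of a smoothed query image (standing in for the first-stage CNN output), the nearest pixel among a set of (smoothed, real) training exemplars is found and its high-frequency counterpart is copied over. The function names, toy data, and the use of raw local patches as the per-pixel descriptor are illustrative assumptions, not the exact features or implementation used in the paper.

# A minimal NumPy sketch of stage 2 (pixel-wise nearest neighbors).
# Patch size, image size, and raw-RGB patch descriptors are assumptions
# made for illustration; they are not the paper's exact feature choice.

import numpy as np


def extract_patches(img, k=3):
    """Return a (H*W, k*k*C) array of flattened k x k patches, one per pixel."""
    h, w, c = img.shape
    pad = k // 2
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    patches = np.empty((h * w, k * k * c), dtype=np.float32)
    idx = 0
    for i in range(h):
        for j in range(w):
            patches[idx] = padded[i:i + k, j:j + k, :].ravel()
            idx += 1
    return patches


def pixelwise_nn(smoothed_query, exemplar_pairs, k=3):
    """For each query pixel, find its nearest pixel (by local patch distance)
    among the exemplars' smoothed images and copy the corresponding pixel
    from that exemplar's high-frequency ground-truth image."""
    h, w, c = smoothed_query.shape
    query_desc = extract_patches(smoothed_query, k)        # (H*W, D)

    # Build the database of descriptors and the pixels they map to.
    db_desc, db_pixels = [], []
    for smoothed_ex, real_ex in exemplar_pairs:
        db_desc.append(extract_patches(smoothed_ex, k))
        db_pixels.append(real_ex.reshape(-1, c))
    db_desc = np.concatenate(db_desc, axis=0)              # (N, D)
    db_pixels = np.concatenate(db_pixels, axis=0)          # (N, C)

    # Brute-force L2 matching; fine for the toy sizes used here.
    out = np.empty((h * w, c), dtype=np.float32)
    for p in range(h * w):
        d2 = np.sum((db_desc - query_desc[p]) ** 2, axis=1)
        out[p] = db_pixels[np.argmin(d2)]
    return out.reshape(h, w, c)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: three (smoothed, real) exemplar pairs and one smoothed query.
    exemplars = [(rng.random((32, 32, 3), dtype=np.float32),
                  rng.random((32, 32, 3), dtype=np.float32)) for _ in range(3)]
    query = rng.random((32, 32, 3), dtype=np.float32)
    result = pixelwise_nn(query, exemplars)
    print(result.shape)  # (32, 32, 3)

Swapping the patch descriptor for richer per-pixel features, or keeping the k nearest matches instead of a single one, is one way to obtain multiple diverse and controllable outputs in the spirit of the approach.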

Paper



PixelNN: Example-based Image Synthesis.

A. Bansal, Y. Sheikh, and D. Ramanan

arXiv | bibtex

Comparison with Pix2Pix


Multiple Outputs


Edges-to-Faces


Normals-to-Faces


Edges-to-Cats-&-Dogs


Normals-to-Cats-&-Dogs


Example Frequency Analysis


We performed a frequency analysis via the FFT to examine the frequency content of our synthesized images.
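
The exact analysis used for the figure is not reproduced here; the sketch below shows one common way to inspect frequency content with NumPy's FFT, by radially averaging the 2-D power spectrum of an image. The input is a random array used purely as a placeholder.

# A small NumPy sketch of FFT-based frequency analysis: compute the 2-D
# power spectrum of a grayscale image and average it over radial bands.

import numpy as np


def radial_power_spectrum(img):
    """Return the radially averaged log power spectrum of a 2-D image."""
    f = np.fft.fftshift(np.fft.fft2(img))
    power = np.abs(f) ** 2

    h, w = img.shape
    cy, cx = h // 2, w // 2
    y, x = np.indices((h, w))
    r = np.sqrt((y - cy) ** 2 + (x - cx) ** 2).astype(int)

    # Average the power over all pixels at the same integer radius.
    sums = np.bincount(r.ravel(), weights=power.ravel())
    counts = np.bincount(r.ravel())
    return np.log(sums / np.maximum(counts, 1) + 1e-8)


if __name__ == "__main__":
    img = np.random.default_rng(0).random((128, 128))  # placeholder image
    spectrum = radial_power_spectrum(img)
    print(spectrum[:5])  # low-frequency bins first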

Please send comments and questions to Aayush Bansal.