CS 180: Introduction to Computational Photography and Computer Vision

Project 6: Image Quilting & Lightfield Camera

Stephen Su

Image Quilting Overview

In this project, we explore methods for synthesizing texture and for transferring texture from one object to another. Texture synthesis takes a small sample image of a texture and uses it to generate a larger image of that texture. Texture transfer gives an object the appearance of a sample texture while preserving its basic shape. The project follows the implementation of the paper Image Quilting for Texture Synthesis and Transfer by Efros and Freeman.

Part 1: Randomly Sampled Texture

The simplest way to generate texture is to randomly sample square patches of size patch_size from the sample image and tile them to form the output image. To make the sampling reproducible, we use a random seed of $0$. For the following images, the output is $400 \times 400$ pixels with patch_size = 40.
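As a rough sketch of this step (the helper name quilt_random and the assumption that the sample is an H x W x 3 NumPy array in $[0, 1]$ are mine, not the official starter code):

```python
import numpy as np

def quilt_random(sample, out_size, patch_size, seed=0):
    """Tile an out_size x out_size output with randomly sampled square patches."""
    rng = np.random.default_rng(seed)          # seed 0, as used for the results below
    h, w = sample.shape[:2]
    out = np.zeros((out_size, out_size, sample.shape[2]))
    for i in range(0, out_size, patch_size):
        for j in range(0, out_size, patch_size):
            # Pick a random top-left corner inside the sample image.
            r = rng.integers(0, h - patch_size + 1)
            c = rng.integers(0, w - patch_size + 1)
            patch = sample[r:r + patch_size, c:c + patch_size]
            # Crop at the output boundary in case patch_size does not divide out_size.
            ph, pw = min(patch_size, out_size - i), min(patch_size, out_size - j)
            out[i:i + ph, j:j + pw] = patch[:ph, :pw]
    return out
```

Below are some examples.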

Figures: sample and randomly sampled output for Bricks, Cracker, Bread, Text, Popcorn, and Tree.

Part 2: Overlapping Patches

A better approach to texture synthesis is to have each newly sampled patch overlap the patches already placed. To decide which patch fits best, we introduce a sum of squared differences (SSD) cost function: we iterate through every possible patch in the sample texture, compute the SSD over the region where the candidate overlaps the existing patches, and then randomly select one of the tol lowest-cost patches. For the following examples, we use an output image of size $400 \times 400$ pixels, patch_size = 35, overlap = 9, and tol = 3.
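Here is an illustrative sketch of the cost computation and patch selection (the helper names overlap_ssd and choose_patch are mine; a real implementation would typically vectorize the SSD with filtering rather than loop over every location):

```python
import numpy as np

def overlap_ssd(template, mask, sample, patch_size):
    """SSD between the already-filled part of `template` (where mask == 1)
    and every patch_size x patch_size patch of `sample`, as a 2D cost map."""
    h, w = sample.shape[:2]
    costs = np.empty((h - patch_size + 1, w - patch_size + 1))
    for r in range(costs.shape[0]):
        for c in range(costs.shape[1]):
            patch = sample[r:r + patch_size, c:c + patch_size]
            costs[r, c] = np.sum(mask * (patch - template) ** 2)
    return costs

def choose_patch(costs, tol=3, rng=None):
    """Randomly pick one of the `tol` lowest-cost patch locations."""
    rng = rng or np.random.default_rng(0)
    best = np.argsort(costs, axis=None)[:tol]   # flat indices of the tol lowest costs
    return np.unravel_index(rng.choice(best), costs.shape)
```

Below are some examples.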

Figures: sample and overlapping-patch output for Bricks, Cracker, Bread, Text, Popcorn, and Tree.

Overall, these results look much more coherent than the randomly sampled ones!

Part 3: Seam Finding

To further improve our results, we implement seam finding. Instead of pasting a newly sampled patch directly on top of the existing patches, seam finding computes the minimum-cost contiguous path through the SSD error surface of the overlapping region. This minimum-error path from one side of the overlap to the other is then used to cut both the existing patches and the sampled patch, and stitching the two pieces together along the seam produces a clean patch that fits with its neighbors. For the following examples, we use an output image of size $400 \times 400$ pixels, patch_size = 33, overlap = 16, and tol = 3.
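Before the examples, here is a minimal sketch of the dynamic-programming seam for a vertical overlap (the name min_cut_mask is mine; the horizontal case can reuse it by transposing the error surface):

```python
import numpy as np

def min_cut_mask(err):
    """Given a 2D error surface over a vertical overlap (rows x overlap width),
    find the minimum-cost top-to-bottom seam by dynamic programming and return
    a binary mask that is 1 to the left of the seam (keep the existing pixels)."""
    h, w = err.shape
    cost = err.astype(float)
    # Accumulate minimum path costs row by row.
    for i in range(1, h):
        for j in range(w):
            lo, hi = max(j - 1, 0), min(j + 2, w)
            cost[i, j] += cost[i - 1, lo:hi].min()
    # Backtrack from the cheapest entry in the last row.
    seam = np.zeros(h, dtype=int)
    seam[-1] = int(np.argmin(cost[-1]))
    for i in range(h - 2, -1, -1):
        lo, hi = max(seam[i + 1] - 1, 0), min(seam[i + 1] + 2, w)
        seam[i] = lo + int(np.argmin(cost[i, lo:hi]))
    mask = np.zeros((h, w))
    for i in range(h):
        mask[i, :seam[i]] = 1   # existing patch keeps pixels left of the seam
    return mask
```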

Figures: sample and seam-finding output for Bricks, Cracker, Bread, Text, Popcorn, and Tree.

For some examples, there are clearly fewer visual artifacts compared to the naive overlapping-patch method!

Part 4: Texture Transfer

We can apply texture synthesis to transfer texture from one object to another. The result is an image with the structure of a guidance image and the style of a sample texture image. We first define a correspondence map to compare the guidance image and the sample texture; here, the correspondence maps are blurred, grayscale versions of the guidance image and the sample texture image. We then modify the cost function to account for the guidance image as well as the overlap with previously placed patches. The new cost function is $$ \alpha \cdot \mathrm{SSD}(x_t, x_{sample}) + (1 - \alpha) \cdot \mathrm{SSD}(x_{guidance}, x_{sample})$$ where $x_t$ is the current patch to fill in (which may overlap patches placed earlier), $x_{sample}$ is the candidate patch from the sample texture, and $x_{guidance}$ is the corresponding patch of the guidance image. We then proceed with the seam finding method as usual.
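As a sketch of how this combined cost might be evaluated for one candidate patch (the function name transfer_cost and the argument layout are my own assumptions):

```python
import numpy as np

def transfer_cost(template, mask, sample_patch,
                  guidance_patch, sample_corr_patch, alpha=0.4):
    """Combined texture-transfer cost for a single candidate patch.

    template          : patch-sized region of the output synthesized so far
    mask              : 1 where `template` is already filled (the overlap)
    sample_patch      : candidate patch from the sample texture
    guidance_patch    : matching patch of the blurred, grayscale guidance image
    sample_corr_patch : the candidate patch in the blurred, grayscale sample
    """
    overlap_err = np.sum(mask * (sample_patch - template) ** 2)
    correspondence_err = np.sum((sample_corr_patch - guidance_patch) ** 2)
    return alpha * overlap_err + (1 - alpha) * correspondence_err
```

Below are some results.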

Each example shows the sample texture, the guidance image, and the transfer result with the listed parameters.

Example 1: Sketch and Richard Feynman (patch_size=25, overlap=7, tol=3, alpha=0.4)

Example 2: T1 Faker (patch_size=15, overlap=5, tol=3, alpha=0.3)

Example 3: Fabric and Pepe the Frog (patch_size=15, overlap=5, tol=3, alpha=0.3)

Example 4: Yarn and Squidward (patch_size=15, overlap=5, tol=3, alpha=0.3)

Bells and Whistles

We can combine the texture transfer method above with the blending techniques from Project 2 to create a face-in-toast image. We first use texture transfer to render a face with an image of toast as the sample texture; for the face, we use Professor Efros as our example. Below are the results.

Toast Texture

Professor Efros

patch_size=25, overlap=13, tol=3, alpha=0.3

We then apply Laplacian pyramid blending between the original toast image and the texture-transferred toast image. Using the following mask for the pyramid blend, we obtain:

Mask

Face-In-Toast!
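For reference, here is a minimal sketch of a stack-based Laplacian blend along the lines of Project 2 (the function name, the number of levels, and the use of scipy.ndimage.gaussian_filter are my own choices, not necessarily the Project 2 code):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def laplacian_blend(im_a, im_b, mask, levels=5, sigma=2.0):
    """Blend two H x W x 3 images with a Laplacian stack. `mask` is H x W x 3
    in [0, 1], with 1 where im_a (the texture-transferred face) should appear."""
    sig = (sigma, sigma, 0)                    # blur spatially, not across channels
    a, b, m = im_a.astype(float), im_b.astype(float), mask.astype(float)
    out = np.zeros_like(a)
    for _ in range(levels):
        a_blur, b_blur = gaussian_filter(a, sig), gaussian_filter(b, sig)
        # Combine the band-pass (Laplacian) layers under an increasingly soft mask.
        out += m * (a - a_blur) + (1 - m) * (b - b_blur)
        a, b, m = a_blur, b_blur, gaussian_filter(m, sig)
    # Add back the remaining low-frequency content.
    out += m * a + (1 - m) * b
    return np.clip(out, 0, 1)
```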

Project Insights

I loved being able to apply the techniques from this project to images I found online. My favorite part was being able to produce the Efros Toast!

Citations

Efros and Freeman, Image Quilting for Texture Synthesis and Transfer.

Lightfield Camera Overview

This project is based on the paper Light Field Photography with a Hand-held Plenoptic Camera by Ng et al.: by capturing many images over a plane orthogonal to the optical axis, we can achieve complex effects using simple operations such as shifting and averaging images. The goal of this project is to reproduce some of these effects using lightfield data. We will be using the Chess dataset from the Stanford Lightfield Archive.

Part 1: Depth Refocusing

Objects far away from the camera don't vary their position significantly when the camera moves around while keeping the optical axis unchanged. Nearby objects, on the other hand, vary their position significantly across images. If we average all the images in the grid without any shifting, we will produce an image that is sharp around the far-away objects but blurry around the nearby objects. Here is what an averaged image would look like.

Averaged Image

If we shift each image appropriately before averaging, we can instead focus on objects at different depths. To determine the shift for each image, we first need to define a center point. Each image comes with a $(u, v)$ coordinate representing the camera's position, as well as an $(x, y)$ coordinate representing the image's position within the $17 \times 17$ grid. Our center image is the one at $(x, y) = (8, 8)$, and we denote its position by $(u_{center}, v_{center})$. The shift for the image at position $(u, v)$ is given by $$u_{shift} = (u - u_{center}) \cdot c$$ $$v_{shift} = (v - v_{center}) \cdot c$$ Here, $(u_{shift}, v_{shift})$ is the shift applied to that image, and $c$ is a hyperparameter we adjust to manipulate the point of focus in the averaged image. Positive $c$ values shift the focus toward nearby objects, while negative $c$ values shift it toward far-away objects. After shifting every image with the same $c$ value, we average the shifted images to refocus at the chosen depth.
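A minimal sketch of this shift-and-average step (the function name refocus, the axis order passed to scipy.ndimage.shift, and the sign convention are assumptions that depend on how the dataset's $(u, v)$ axes map onto image rows and columns):

```python
import numpy as np
from scipy.ndimage import shift as nd_shift

def refocus(images, uv, uv_center, c):
    """Shift each sub-aperture view toward the center view and average.

    images    : list of H x W x 3 images from the 17 x 17 grid
    uv        : list of (u, v) camera positions, one per image
    uv_center : (u_center, v_center), taken from the (x, y) = (8, 8) image
    c         : refocusing coefficient (positive focuses on nearer objects)
    """
    acc = np.zeros_like(images[0], dtype=float)
    for img, (u, v) in zip(images, uv):
        u_shift = (u - uv_center[0]) * c
        v_shift = (v - uv_center[1]) * c
        # Shift rows by v_shift and columns by u_shift (axis order is an assumption).
        acc += nd_shift(img.astype(float), (v_shift, u_shift, 0))
    return acc / len(images)
```

Below are some examples of the dataset averaged with various $c$ values.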

$c = -0.5$

$c = -0.1$

$c = 0.0$

$c = 0.1$

$c = 0.5$

Here is a GIF displaying the various $c$ values from $-0.5$ to $0.5$.

Part 2: Aperture Adjustment

The size of the aperture determines how much light the camera captures. Averaging a large number of the sub-aperture images is similar to using a large aperture: light from a wider range of viewpoints is combined, so more of the scene is captured and objects away from the plane of focus blur more. Averaging a smaller number of images is similar to using a smaller aperture, since we capture less of the scene and combine less light. We can simulate this change by selectively choosing which images to include in the averaging process.

To determine which images to average, we compute for each image the distance $d$ from its position $(u, v)$ to $(u_{center}, v_{center})$, and keep only the images whose $d$ is within a chosen radius $r$ of the center. We then shift and average the selected images using the same process as in the previous section, with $c = 0.2$.
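A sketch of this aperture simulation, reusing the same shifting as above (function and variable names are my own):

```python
import numpy as np
from scipy.ndimage import shift as nd_shift

def adjust_aperture(images, uv, uv_center, r, c=0.2):
    """Average only the sub-aperture views within radius r of the center view,
    refocused with coefficient c."""
    acc = np.zeros_like(images[0], dtype=float)
    count = 0
    for img, (u, v) in zip(images, uv):
        d = np.hypot(u - uv_center[0], v - uv_center[1])
        if d > r:
            continue                              # outside the simulated aperture
        u_shift = (u - uv_center[0]) * c
        v_shift = (v - uv_center[1]) * c
        acc += nd_shift(img.astype(float), (v_shift, u_shift, 0))
        count += 1
    return acc / max(count, 1)
```

Below are some examples of various $r$ values used.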

$r = 10$

$r = 30$

$r = 50$

Here is a GIF of the various $r$ values used from $5$ to $70$.

Project Insights

It was fun seeing how much of a camera we can simulate from data alone with just some shifting and averaging! My favorite part was learning more about how cameras work and picking up the related terminology throughout this project.

Citations

Ng et al., Light Field Photography with a Hand-held Plenoptic Camera.
The Stanford Lightfield Archive (Chess dataset).