In this project, we explore different methods for synthesizing texture and transferring texture from one object to another. Texture synthesis takes a small sample image of a texture and uses it to generate a larger image of that texture. Texture transfer gives an object the appearance of having the same texture as a sample while preserving its basic shape. The project follows the implementation of the paper Image Quilting for Texture Synthesis and Transfer by Efros and Freeman.
The easiest way to generate texture is to randomly sample square patches of size `patch_size` from our sample image. To keep the sampling reproducible, we use a random seed of $0$. We then create the output image by tiling these square patches. For the following images, we use an output image of size $400 \times 400$ pixels and `patch_size = 40`. A rough sketch of this procedure is shown below, followed by some examples.
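Here is a minimal NumPy sketch of this random tiling; the function name and the use of `np.random.default_rng(0)` are my own choices rather than anything prescribed by the paper.

```python
import numpy as np

def quilt_random(sample, out_size=400, patch_size=40, seed=0):
    """Tile an output image with patches sampled uniformly at random."""
    rng = np.random.default_rng(seed)          # seed of 0 for reproducibility
    H, W = sample.shape[:2]
    out = np.zeros((out_size, out_size) + sample.shape[2:], dtype=sample.dtype)
    for i in range(0, out_size, patch_size):
        for j in range(0, out_size, patch_size):
            # Pick a random top-left corner for the patch in the sample.
            r = rng.integers(0, H - patch_size + 1)
            c = rng.integers(0, W - patch_size + 1)
            # Clip the patch at the output border if it doesn't fit exactly.
            h = min(patch_size, out_size - i)
            w = min(patch_size, out_size - j)
            out[i:i + h, j:j + w] = sample[r:r + h, c:c + w]
    return out
```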
Sample | Output |
---|---|
![]() Bricks | ![]() Randomly Sampled |
![]() Cracker | ![]() Randomly Sampled |
![]() Bread | ![]() Randomly Sampled |
![]() Text | ![]() Randomly Sampled |
![]() Popcorn | ![]() Randomly Sampled |
![]() Tree | ![]() Randomly Sampled |
A better approach for texture synthesis is to have each newly sampled patch overlap with the patches already placed in the output. To decide which patch fits best, we introduce a sum of squared differences (SSD) cost function: for every possible patch in the sample texture, we compute the SSD between the candidate patch and the existing output over their overlapping region. We then randomly select one of the `tol` lowest-cost patches. For the following examples, we use an output image of size $400 \times 400$ pixels, `patch_size = 35`, `overlap = 9`, and `tol = 3`. A sketch of the patch-selection step is shown below, followed by some examples.
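A rough sketch of the patch-selection step is below; `existing` is the output region being filled, `mask` marks which of its pixels are already set, and the helper names are hypothetical.

```python
import numpy as np

def ssd_overlap_cost(candidate, existing, mask):
    """SSD between a candidate patch and the already-filled pixels (mask == 1)."""
    diff = (candidate.astype(float) - existing.astype(float)) ** 2
    if diff.ndim == 3:                       # sum over colour channels
        diff = diff.sum(axis=2)
    return (diff * mask).sum()

def choose_patch(sample, existing, mask, patch_size, tol=3, rng=None):
    """Scan every patch in the sample and pick one of the `tol` lowest-cost patches."""
    rng = rng or np.random.default_rng(0)
    H, W = sample.shape[:2]
    costs, corners = [], []
    for r in range(H - patch_size + 1):
        for c in range(W - patch_size + 1):
            patch = sample[r:r + patch_size, c:c + patch_size]
            costs.append(ssd_overlap_cost(patch, existing, mask))
            corners.append((r, c))
    best = np.argsort(costs)[:tol]           # indices of the tol cheapest patches
    r, c = corners[rng.choice(best)]
    return sample[r:r + patch_size, c:c + patch_size]
```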
Sample | Output |
---|---|
![]() Bricks | ![]() Overlapping Patches |
![]() Cracker | ![]() Overlapping Patches |
![]() Bread | ![]() Overlapping Patches |
![]() Text | ![]() Overlapping Patches |
![]() Popcorn | ![]() Overlapping Patches |
![]() Tree | ![]() Overlapping Patches |
Overall, these results look much more coherent than the randomly sampled ones!
To further improve our results, we implement seam finding. Instead of simply pasting newly sampled patches on top of existing ones, seam finding computes the contiguous path of minimum SSD cost through the overlapping region, which lets us combine the two patches cleanly. This minimum-cost path from one side of the overlap to the other is then used to slice both the existing patches and the newly sampled patch. After slicing, we stitch the pieces together to produce a patch that blends in with the existing output. For the following examples, we use an output image of size $400 \times 400$ pixels, `patch_size = 33`, `overlap = 16`, and `tol = 3`. A sketch of the minimum-cost seam computation is shown below, followed by some examples.
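Here is a sketch of the dynamic-programming seam computation for a vertical overlap (the horizontal case is just the transpose); `err` is assumed to hold the per-pixel SSD between the two patches over the overlapping strip, and the function names are my own.

```python
import numpy as np

def min_cost_seam(err):
    """err: 2D per-pixel SSD in the overlap region (rows x overlap width).
    Returns, for each row, the column of the minimum-cost vertical seam."""
    h, w = err.shape
    cost = err.astype(float)
    for i in range(1, h):                     # accumulate costs row by row
        for j in range(w):
            lo, hi = max(j - 1, 0), min(j + 2, w)
            cost[i, j] += cost[i - 1, lo:hi].min()
    seam = np.zeros(h, dtype=int)             # backtrack from the cheapest entry
    seam[-1] = int(np.argmin(cost[-1]))
    for i in range(h - 2, -1, -1):
        j = seam[i + 1]
        lo, hi = max(j - 1, 0), min(j + 2, w)
        seam[i] = lo + int(np.argmin(cost[i, lo:hi]))
    return seam

def seam_mask(err):
    """Boolean mask: True where pixels should come from the existing (left) patch."""
    seam = min_cost_seam(err)
    cols = np.arange(err.shape[1])
    return cols[None, :] < seam[:, None]
```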
Sample | Output |
---|---|
![]() Bricks | ![]() Seam Finding |
![]() Cracker | ![]() Seam Finding |
![]() Bread | ![]() Seam Finding |
![]() Text | ![]() Seam Finding |
![]() Popcorn | ![]() Seam Finding |
![]() Tree | ![]() Seam Finding |
For some examples, there are clearly fewer visual artifacts compared to the naive overlapping-patch method!
We can apply texture synthesis to transfer texture from one object to another. The result is an image with the same structure as a guidance image but the style of a sample texture image. We first define a correspondence map, which we use to compare the guidance image with the sample texture; here the correspondence map is a blurred, grayscale version of each image. We then modify our cost function to account for the guidance image as well as the overlap with existing patches. The new cost is $$\alpha \cdot \text{SSD}(x_t, x_{\text{sample}}) + (1 - \alpha) \cdot \text{SSD}(x_{\text{guidance}}, x_{\text{sample}})$$ where $x_t$ is our current patch to fill in (which may contain overlaps from other patches), $x_{\text{sample}}$ is a candidate patch from the sample texture image, and $x_{\text{guidance}}$ is the corresponding region of the guidance image; $\alpha$ trades off faithfulness to the texture against faithfulness to the guidance image. We then proceed with our seam finding method as usual. A sketch of the combined cost appears below, followed by some results.
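A compact sketch of the combined cost might look like the following, assuming the correspondence maps (blurred grayscale images) have already been computed and cropped to the candidate patch location; the function names are hypothetical.

```python
import numpy as np

def ssd(a, b, mask=None):
    """Sum of squared differences, optionally restricted to mask == 1 pixels."""
    d = (np.asarray(a, dtype=float) - np.asarray(b, dtype=float)) ** 2
    while d.ndim > 2:                         # collapse colour channels, if any
        d = d.sum(axis=-1)
    return (d * mask).sum() if mask is not None else d.sum()

def transfer_cost(candidate, existing, mask, candidate_corr, guidance_corr, alpha=0.5):
    """alpha * SSD over the already-filled overlap
       + (1 - alpha) * SSD between the blurred-grayscale correspondence maps."""
    return alpha * ssd(candidate, existing, mask) + (1 - alpha) * ssd(candidate_corr, guidance_corr)
```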
![]() Texture | ![]() Guidance Image | ![]() |
![]() Texture | ![]() Guidance Image | ![]() |
![]() Texture | ![]() Guidance Image | ![]() |
![]() Texture | ![]() Guidance Image | ![]() |
We can combine the texture transfer methods learned here with the blending techniques from Project 2 to create a face-in-toast image. We first use texture transfer to render a face using an image of toast as the sample texture. For the face, we use Professor Efros as the example. Below are the results.
![]() Toast Texture | ![]() Professor Efros | ![]() |
We can then use Laplacian pyramid blending between our original toast image and the texture-transferred toast image. Using the mask below in the Laplacian pyramid blend gives us the final result.
![]() Mask | ![]() Face-In-Toast! |
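For reference, here is a rough sketch of the Laplacian pyramid blend borrowed from Project 2, written with OpenCV's `pyrDown`/`pyrUp`; the number of levels and the function name are arbitrary choices here.

```python
import cv2
import numpy as np

def laplacian_blend(img_a, img_b, mask, levels=5):
    """Blend img_a (where mask == 1) into img_b with Laplacian pyramid blending.
    img_a, img_b: HxWx3 uint8 images of the same size; mask: HxW float in [0, 1]."""
    ga = [img_a.astype(np.float32)]
    gb = [img_b.astype(np.float32)]
    gm = [mask.astype(np.float32)]
    for _ in range(levels):                      # Gaussian pyramids
        ga.append(cv2.pyrDown(ga[-1]))
        gb.append(cv2.pyrDown(gb[-1]))
        gm.append(cv2.pyrDown(gm[-1]))

    blended = []
    for i in range(levels):                      # blend the Laplacian levels
        size = ga[i].shape[1::-1]                # (width, height) of this level
        la = ga[i] - cv2.pyrUp(ga[i + 1], dstsize=size)
        lb = gb[i] - cv2.pyrUp(gb[i + 1], dstsize=size)
        m = gm[i][..., None]
        blended.append(m * la + (1 - m) * lb)
    m = gm[levels][..., None]                    # blend the coarsest Gaussian level
    out = m * ga[levels] + (1 - m) * gb[levels]

    for i in range(levels - 1, -1, -1):          # collapse the pyramid
        out = cv2.pyrUp(out, dstsize=blended[i].shape[1::-1]) + blended[i]
    return np.clip(out, 0, 255).astype(np.uint8)
```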
I loved being able to apply the techniques from this project to some of the images I found online. My favorite part was producing the Efros Toast!
This project is based on the paper Light Field Photography with a Hand-held Plenoptic Camera by Ng et al., in which multiple images are captured over a plane orthogonal to the optical axis, enabling complex effects through simple operations such as shifting and averaging images. The goal of this project is to reproduce some of these effects using lightfield data. We will be using the Chess dataset from the Stanford Lightfield Archive.
Objects far away from the camera don't vary their position significantly when the camera moves around while keeping the optical axis unchanged. Nearby objects, on the other hand, vary their position significantly across images. If we average all the images in the grid without any shifting, we will produce an image that is sharp around the far-away objects but blurry around the nearby objects. Here is what an averaged image would look like.
![]() Averaged Image |
Similarly, if we shift each image before averaging, we can focus on objects at different depths. To determine the shift for each image, we first need to define a center point. Each image comes with a $(u, v)$ coordinate representing the image's position, as well as an $(x, y)$ coordinate representing the image's position within a $17 \times 17$ grid. Our center image is the one at $(x, y) = (8, 8)$, and we denote its position as $(u_{\text{center}}, v_{\text{center}})$. To calculate the shift for each image at position $(u, v)$, we use the following formulas. $$u_{\text{shift}} = (u - u_{\text{center}}) \cdot c$$ $$v_{\text{shift}} = (v - v_{\text{center}}) \cdot c$$ Here, $(u_{\text{shift}}, v_{\text{shift}})$ is the shift applied to the current image, and $c$ is a hyperparameter we adjust to manipulate the point of focus in the averaged image. Positive $c$ values shift the focus toward nearby objects, while negative $c$ values shift it toward far-away objects. After shifting every image using a single value of $c$, we average the shifted images to refocus at the corresponding depth. A sketch of this procedure appears below, followed by examples of the dataset averaged with various $c$ values.
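Here is a minimal sketch of the shift-and-average step, using `scipy.ndimage.shift` for sub-pixel shifts; the mapping of $(u, v)$ to row/column shifts and the sign convention depend on the dataset and are assumptions on my part.

```python
import numpy as np
from scipy.ndimage import shift as nd_shift

def refocus(images, uvs, c, center_uv):
    """Shift each sub-aperture image toward the centre view and average.
    images: list of HxWx3 float arrays; uvs: list of (u, v) positions."""
    u0, v0 = center_uv
    acc = np.zeros_like(images[0], dtype=float)
    for img, (u, v) in zip(images, uvs):
        du = (u - u0) * c
        dv = (v - v0) * c
        # Shift rows by dv and columns by du (sub-pixel; channels left unshifted).
        acc += nd_shift(img, (dv, du, 0), order=1, mode='nearest')
    return acc / len(images)
```

With $c = 0$ this reduces to the plain average shown earlier.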
![]() $c = -0.5$ | ![]() $c = -0.1$ | ![]() $c = 0.0$ | ![]() $c = 0.1$ | ![]() $c = 0.5$ |
Here is a GIF displaying the various $c$ values from $-0.5$ to $0.5$.
![]() |
The size of the aperture determines how much light is captured by the camera. In a sense, averaging a large number of images is similar to using a large aperture: with more images, more of the scene is captured and more light comes in. Conversely, averaging a smaller number of images is similar to using a smaller aperture, since we capture less of the scene and less light. We can simulate this change by selectively choosing which images to include in the average.
To determine which images to average, we compute a distance $d$ for each image, where $d$ is the distance from $(u, v)$ to $(u_{\text{center}}, v_{\text{center}})$. We keep only images within a radius $r$ of the center. We then shift and average the selected images using the same process as the previous section, with $c = 0.2$. A sketch of this selection is shown below, followed by examples with various $r$ values.
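A sketch of the aperture simulation is below; the distance test and the shift convention mirror the refocusing sketch above, and the function name is hypothetical.

```python
import numpy as np
from scipy.ndimage import shift as nd_shift

def aperture_average(images, uvs, center_uv, radius, c=0.2):
    """Average only the views whose (u, v) lies within `radius` of the centre view,
    applying the same refocusing shift as before (here with c = 0.2)."""
    u0, v0 = center_uv
    acc, count = None, 0
    for img, (u, v) in zip(images, uvs):
        if np.hypot(u - u0, v - v0) > radius:     # skip views outside the aperture
            continue
        du, dv = (u - u0) * c, (v - v0) * c
        shifted = nd_shift(np.asarray(img, dtype=float), (dv, du, 0), order=1, mode='nearest')
        acc = shifted if acc is None else acc + shifted
        count += 1
    return acc / count
```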
![]() $r = 10$ | ![]() $r = 30$ | ![]() $r = 50$ |
Here is a GIF of the various $r$ values used from $5$ to $70$.
![]() |
It was fun seeing how much of a camera we can simulate from the data alone, using just some shifting and averaging! My favorite part was learning more about how cameras work and all the related terminology throughout this project.