CS 180 Project 1

Approach

One of the key issues in this project was how to align the three color channels on top of each other. Simply stacking the channels often produces blurry, unappealing images that are hard to look at. To counteract this, we displace each channel before stacking. Using the blue channel as our base, we align both the red and green channels to it. For low-resolution images, we searched a [-15, 15] window along both the x and y axes. To score each displacement, we used the Normalized Cross-Correlation (NCC) metric, which for vectors is the dot product of two normalized vectors. Because channels are represented as matrices, we instead take the Hadamard (element-wise) product of two normalized matrices and sum over all elements of the result. To normalize a matrix, we divide each element by the Frobenius norm of that matrix. Furthermore, to avoid fitting noise, we removed 5% from the margins so that only the inner pixels are scored. A higher NCC score between two channels indicates higher similarity, so we take the displacement that produces the highest NCC score.
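The scoring and window search described above can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: the function names, the use of np.roll to apply a displacement, the (row, column) ordering of the returned offset, and the reading of "5% from the margins" as 5% per side are all assumptions.

```python
import numpy as np

def crop_margins(channel, frac=0.05):
    """Drop `frac` of the height and width from each side (assumed meaning of the 5% trim)."""
    h, w = channel.shape
    dy, dx = int(h * frac), int(w * frac)
    return channel[dy:h - dy, dx:w - dx]

def ncc(a, b):
    """Sum of the element-wise product of two Frobenius-normalized matrices."""
    a = a / np.linalg.norm(a)  # for a 2-D array the default norm is the Frobenius norm
    b = b / np.linalg.norm(b)
    return float(np.sum(a * b))

def align_exhaustive(channel, base, radius=15):
    """Try every displacement in [-radius, radius]^2 and keep the highest NCC score."""
    best_score, best_shift = -np.inf, (0, 0)
    base_inner = crop_margins(base)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # np.roll wraps pixels around the border; the margin crop hides most of that.
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = ncc(crop_margins(shifted), base_inner)
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```

Calling align_exhaustive(red, blue) and align_exhaustive(green, blue) would give one offset per channel. Note that this follows the description literally and does not subtract the mean before normalizing, and it does not guard against an all-zero channel.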

For higher resolutions (mainly .tif files), exhaustively searching a window for the best displacement becomes computationally expensive. To speed up the search on larger images, we implemented an image pyramid. An image pyramid represents the image at multiple smaller scales, typically separated by factors of 2. At each level, we scale the image down by a factor of 2 until it is small enough. Once the resolution is low enough, we run the exhaustive window search for the best displacement. Each larger level then takes the displacement found at the level below it, scales it up by a factor of 2, and runs another search using that scaled displacement as its starting point. This continues until we work our way back up to the original resolution, which yields the final displacement.
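A recursive version of this coarse-to-fine search might look like the sketch below. The stopping size (min_size=64), the refinement radius of 2 pixels, and downscaling by averaging 2x2 blocks are illustrative assumptions, since the write-up does not state these values; the helper names are hypothetical as well.

```python
import numpy as np

def score(a, b, frac=0.05):
    """NCC over the inner pixels (frac trimmed from each side, as assumed here)."""
    h, w = a.shape
    dy, dx = int(h * frac), int(w * frac)
    a, b = a[dy:h - dy, dx:w - dx], b[dy:h - dy, dx:w - dx]
    return float(np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def search(channel, base, center, radius):
    """Exhaustive search in a square window of the given radius around `center`."""
    candidates = [(dy, dx)
                  for dy in range(center[0] - radius, center[0] + radius + 1)
                  for dx in range(center[1] - radius, center[1] + radius + 1)]
    return max(candidates,
               key=lambda s: score(np.roll(channel, s, axis=(0, 1)), base))

def halve(img):
    """Downscale by 2 by averaging 2x2 blocks (an odd trailing row or column is dropped)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def align_pyramid(channel, base, min_size=64, coarse_radius=15, refine_radius=2):
    """Estimate the displacement at half resolution first, then refine it at this level."""
    if min(base.shape) <= min_size:
        # Small enough: fall back to the full window search.
        return search(channel, base, (0, 0), coarse_radius)
    dy, dx = align_pyramid(halve(channel), halve(base),
                           min_size, coarse_radius, refine_radius)
    # A displacement found at half resolution doubles at this resolution.
    return search(channel, base, (2 * dy, 2 * dx), refine_radius)
```

Because each level only refines the doubled estimate from the level below, the total displacement found can be much larger than the 15-pixel window used at the coarsest level.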

Unaligned Images	Aligned Images
cathedral.jpg	Red Offset (12, 3), Green Offset (5, 2)
monastery.jpg	Red Offset (3, 2), Green Offset (-3, 2)
tobolsk.jpg	Red Offset (6, 3), Green Offset (3, 2)

High Resolution Image Gallery

Unaligned Images	Aligned Images
church.tif	Red Offset (58, -5), Green Offset (25, 3)
emir.tif	Red Offset (103, 43), Green Offset (49, 24)
harvesters.tif	Red Offset (124, 14), Green Offset (60, 16)
icon.tif	Red Offset (89, 23), Green Offset (40, 17)
lady.tif	Red Offset (110, 12), Green Offset (55, 8)
melons.tif	Red Offset (177, 11), Green Offset (82, 9)
onion_church.tif	Red Offset (108, 36), Green Offset (51, 26)
sculpture.tif	Red Offset (140, -26), Green Offset (33, -11)
self_portrait.tif	Red Offset (175, 34), Green Offset (79, 29)
three_generations.tif	Red Offset (112, 10), Green Offset (55, 13)
train.tif	Red Offset (87, 32), Green Offset (42, 6)

Bells and Whistles

Although the algorithm correctly aligned each of the images, the results still contained black and colored borders along their margins that make the photos difficult to look at. To address this, we implemented automatic cropping. The algorithm first applies a Canny filter to the photo to get a black-and-white representation of its edges. Then, we search within the margins of the photo for lines, and crop the margins wherever we find one. The steps are as follows:

Apply a Gaussian blur to the image, with sigma=3. Blurring removes excess noise and smooths out edges.
Reduce the image to grayscale. This is to simplify further computations.
As with the image pyramid, represent the image at a smaller scale by repeatedly downscaling it by a factor of 2 until the image's length or width is within 300 pixels. Any crop found on this smaller image has to be scaled back up to the larger image.
Compute a Canny filter over the downscaled image for edge detection. This will give the edges of the image in black and white.
Find the edges along the top, bottom, left, and right margins of the image. To ensure that we do not crop excessively, we only look within 10% of the image from each side.
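The grayscale and downscaling steps above, along with the bookkeeping needed to map a crop back to full resolution, can be sketched with NumPy alone. The Gaussian blur and Canny steps are left out here because they depend on an image-processing library. The helper names, the channel-mean grayscale conversion, the 2x2 block averaging, and the reading of "length or width within 300 pixels" as "the shorter side is at most 300" are all assumptions.

```python
import numpy as np

def to_gray(rgb):
    """Collapse an H x W x 3 image to grayscale by averaging the channels (assumed conversion)."""
    return rgb.mean(axis=2)

def downscale_until(gray, max_side=300):
    """Halve the image until its shorter side is at most max_side.

    Returns the small image and the total downscale factor, which is
    needed later to map crop positions back to the full-size image.
    """
    factor = 1
    while min(gray.shape) > max_side:
        h, w = gray.shape[0] // 2 * 2, gray.shape[1] // 2 * 2
        g = gray[:h, :w]  # drop an odd trailing row or column
        gray = (g[0::2, 0::2] + g[1::2, 0::2] + g[0::2, 1::2] + g[1::2, 1::2]) / 4.0
        factor *= 2
    return gray, factor

def scale_crop(position, factor):
    """Map a row or column index found on the small image back to the original."""
    return position * factor
```

For example, a border detected at row 5 of an image that was halved twice corresponds to row 20 of the original.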

Here is what the image looks like after each step of the algorithm on emir:

Gaussian Blur

Black and White

Downscale

Canny Filter

After applying the Canny filter, the main issue becomes detecting the horizontal lines in the top and bottom margins and the vertical lines in the left and right margins. Often, the borders detected by the Canny filter are not straight. To handle this, we instead look for lines within a window, combining the pixels in the window with a bitwise OR. This ensures that even if a line is not perfectly straight, we can still detect it. The threshold for cropping at a line is 0.75: a line has to cover at least 75% of the image, either vertically or horizontally, for us to crop at it. Below are the results from cropping. We compare our automatic cropping with a default crop of 10% along the top, bottom, left, and right margins.
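One way to implement the windowed line search is sketched below, operating on a boolean edge map such as the output of a Canny filter. The window height of 3, the choice to crop just past the innermost window that passes the threshold, and the function names are assumptions; only the 10% search region and the 0.75 coverage threshold come from the description above.

```python
import numpy as np

def find_top_crop(edges, search_frac=0.10, window=3, threshold=0.75):
    """Return how many rows to crop from the top of a boolean edge map."""
    h, w = edges.shape
    limit = int(h * search_frac)  # only look within the top 10% of the image
    crop = 0
    for r in range(limit):
        # OR the rows of the window together so a slightly wobbly line still
        # registers as one continuous line.
        band = np.logical_or.reduce(edges[r:r + window], axis=0)
        if band.mean() >= threshold:  # line covers at least 75% of the width
            crop = r + window         # assumed policy: crop past the innermost hit
    return crop

def find_crop_box(edges, scale=1, **kw):
    """Apply the same search to all four sides by flipping and transposing.

    Returns (top, bottom, left, right) slice bounds, multiplied by `scale`
    to map them back to the full-resolution image.
    """
    h, w = edges.shape
    top = find_top_crop(edges, **kw)
    bottom = find_top_crop(edges[::-1], **kw)
    left = find_top_crop(edges.T, **kw)
    right = find_top_crop(edges.T[::-1], **kw)
    return (top * scale, (h - bottom) * scale, left * scale, (w - right) * scale)
```

With the resulting bounds, the crop itself would be image[top:bottom, left:right]. A window of a single row would miss a border whose left half sits one row above its right half, which is the case the OR over several rows is meant to catch.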

10% Cropping	Automatic Cropping