2D Fourier Transform for Image Recognition and Processing in Julia

Summary
An exploration of using the 2D Fast Fourier Transform (FFT) in Julia to analyze image frequencies, understanding how computers extract features for recognition.

Quick Reference: FFT Spectrum Components

ComponentContainsReconstruction
Full SpectrumMagnitude + PhasePerfect original image
Magnitude OnlyIntensity infoBlurry, no structure
Phase OnlyEdge/structure infoOutlines preserved

In previous exploration of the Fourier Transform, we focused on processing one-dimensional arrays or vectors, such as the analysis of blue whale vocalizations. Now, we extend this concept to the 2D Fourier Transform to understand the frequency characteristics inherent in images. In the field of Artificial Intelligence, image recognition—the ability to distinguish between a cat, a dog, or a car—relies on identifying distinct features. Just as humans recognize objects based on specific traits, we must enable computers to extract and analyze these features.

Consider a classic application like recognizing digits in the MNIST dataset. The process typically involves passing data through convolutional layers, activation layers, pooling layers, and affine transformation layers, finally resulting in a Softmax output that provides classification probabilities. In simple terms, a convolution kernel (a matrix) acts as a sliding window, performing convolution operations across the image. These operations identify low-level features like textures and edges. By stacking these layers with activation and pooling functions, the network extracts increasingly high-level, abstract features.

When we talk about image features, we are essentially dealing with frequencies—specifically, the rate of change in pixel brightness. In the context of AI and deep learning, this is often referred to as the gradient. By feeding a vast number of images into a model and utilizing gradient descent to update parameters like weights and biases, we fit the model to maximize recognition accuracy. Once trained, we test the model with new images to evaluate its generalization performance.

Just as listening to music happens in the time domain, viewing images happens in the spatial domain. While this is intuitive for humans, processing raw spatial data can be computationally complex for certain tasks. The Fourier Transform allows us to convert this data from the spatial domain into the frequency domain, simplifying the analysis of periodic patterns and variations. For this demonstration, we will convert our input image to grayscale.

Since our input is a 3D color image (RGB), we first convert it to grayscale. This reduces it to a single channel, representing intensity values, which simplifies the Fourier analysis:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
using Images
using FFTW
using Plots

## Load and Preprocess Image
image_path = joinpath(@__DIR__, "wukong.jpg")
img = load(image_path)

# Convert to grayscale (common in image processing and AI for gradient analysis)
img_gray = Gray.(img)
img_matrix = Float64.(img_gray)

## 2D Fourier Transform Analysis
# Compute 2D discrete Fourier transform
fft_result = fft(img_matrix)

# Shift zero-frequency component to spectrum center
fft_shifted = fftshift(fft_result)

# Calculate magnitude spectrum (frequency domain amplitude)
magnitude = abs.(fft_shifted)

# Normalize magnitude to 0-255 range for better visualization
magnitude_normalized = (magnitude .- minimum(magnitude)) ./ 
                       (maximum(magnitude) - minimum(magnitude)) .* 255

# Calculate phase spectrum (frequency domain phase angles in radians)
phase = angle.(fft_result)

## Plot FFT Analysis Results
p1 = heatmap(img_matrix,
    c=:grays, yflip=true, axis=false, colorbar=false,
    title="Original Image", aspect_ratio=:equal)

p2 = heatmap(magnitude_normalized,
    c=:grays, yflip=true, axis=false, colorbar=false,
    title="Magnitude Spectrum", aspect_ratio=:equal)

p3 = heatmap(log.(magnitude_normalized .+ 1),
    c=:grays, yflip=true, axis=false, colorbar=false,
    title="Log Magnitude Spectrum", aspect_ratio=:equal)

p4 = heatmap(phase,
    c=:grays, yflip=true, axis=false, colorbar=false,
    title="Phase Spectrum", aspect_ratio=:equal)

# Display all FFT analysis plots
plot(p1, p2, p3, p4,
    layout=(1, 4),
    size=(1600, 400))

As observed, the original image is visually intuitive to us. However, the frequency components and phase angles in the frequency domain—while abstract and less intuitive to the human eye—contain structured data that is often easier for computers to process and analyze.

Next, we perform the 2D Inverse Fourier Transform. We will attempt to reconstruct the image using different components: the full spectrum, the magnitude spectrum alone, and the phase spectrum alone.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
## Inverse Fourier Transform Reconstruction
# Reconstruct from full spectrum (magnitude + phase) - perfect reconstruction
reconstructed_full = abs.(ifft(fft_result))

# Reconstruct from magnitude only (phase = 0) - loses structural details
reconstructed_magnitude = abs.(ifft(abs.(fft_result)))

# Reconstruct from phase only (magnitude = 1) - preserves edge information
reconstructed_phase = abs.(ifft(exp.(1im .* phase)))

## Plot Reconstruction Comparison
q1 = heatmap(reconstructed_phase,
    c=:grays, yflip=true, axis=false, colorbar=false,
    title="Phase Reconstruction", aspect_ratio=:equal)

q2 = heatmap(reconstructed_full,
    c=:grays, yflip=true, axis=false, colorbar=false,
    title="Full Spectrum Reconstruction", aspect_ratio=:equal)

q3 = heatmap(reconstructed_magnitude,
    c=:grays, yflip=true, axis=false, colorbar=false,
    title="Magnitude Reconstruction", aspect_ratio=:equal)

# Display all reconstruction results
plot(q1, q2, q3,
    layout=(1, 3),
    size=(1200, 400))

The results are illuminating: the full spectrum perfectly reconstructs the original image. Interestingly, the phase spectrum alone preserves the structural outlines and edges of the image, while the magnitude spectrum primarily contains intensity information but lacks structure.

To understand the conversion of phase angles to complex numbers in the code (exp.(1im .* phase)), we can look to Euler’s formula. For instance, e(iπ)=1e^{(i\pi)} = -1 represents a 180180-degree rotation on the complex plane (landing on 1-1 on the real axis). Similarly, a 4545-degree rotation is e(iπ/4)=cos(π/4)+isin(π/4)0.707+0.707ie^{(i\pi/4)} = \cos(\pi/4) + i\sin(\pi/4) \approx 0.707 + 0.707i.