ML-02: The Perceptron and Decision Boundaries

Summary
Dive into the perceptron — the building block of neural networks. Learn how a simple linear classifier draws decision boundaries, and build one from scratch.

Learning Objectives

After reading this post, you will be able to:

  • Understand what binary classification means
  • Explain the mathematical representation of a decision boundary
  • Implement the perceptron algorithm from scratch
  • Distinguish between linearly separable and non-separable data

Theory

Building on the ML fundamentals from the previous article, this post introduces the perceptron — the simplest neural network and the foundation of deep learning.

The Biological Inspiration: From Neuron to Perceptron

Binary Classification Problem

Consider a bank evaluating credit card applications. Based on historical data:

| Applicant | Annual Income | Debt Ratio | Approved? |
|---|---|---|---|
| Alice | $80,000 | 0.2 | ✅ Yes |
| Bob | $30,000 | 0.8 | ❌ No |
| Carol | $60,000 | 0.3 | ✅ Yes |
| Dave | $40,000 | 0.6 | ❌ No |

This is a binary classification problem: we want to classify new applicants into two groups (approved/rejected).

graph LR
    A[Input Features] --> B[Classifier]
    B --> C{Class +1 or -1?}
    C -->|+1| D[Approved]
    C -->|-1| E[Rejected]
    style D fill:#c8e6c9
    style E fill:#ffcdd2
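
To make this concrete, here is one way the applicant table could be encoded numerically (income in thousands of dollars, labels $+1$ for approved and $-1$ for rejected). The encoding is illustrative, not part of an original dataset:

🐍 Python
import numpy as np

# Each row: [annual income in $1,000s, debt ratio]
X = np.array([
    [80, 0.2],  # Alice
    [30, 0.8],  # Bob
    [60, 0.3],  # Carol
    [40, 0.6],  # Dave
])

# Labels: +1 = approved, -1 = rejected
y = np.array([1, -1, 1, -1])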

The Decision Boundary

When plotting the data on a 2D plane (income vs. debt ratio), we observe that approved and rejected applicants can be separated by a line:

Credit Card Approval: Decision Boundary Example

This separating line is called the decision boundary.

Mathematical Formulation

How can this separating line be described mathematically? This is where the decision boundary equation comes in.

A line in 2D can be written as:

$$w_1 x_1 + w_2 x_2 + b = 0$$

Or in vector form:

$$\boldsymbol{w} \cdot \boldsymbol{x} + b = 0$$

Where:

  • $\boldsymbol{w} = (w_1, w_2)$ is the weight vector (normal to the line)
  • $\boldsymbol{x} = (x_1, x_2)$ is the input features
  • $b$ is the bias term
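
For example, with the purely illustrative values $\boldsymbol{w} = (1, 1)$ and $b = -2$, the decision boundary is the line

$$x_1 + x_2 - 2 = 0 \quad\Longleftrightarrow\quad x_2 = 2 - x_1$$

Points above this line give $\boldsymbol{w} \cdot \boldsymbol{x} + b > 0$, and points below give a negative value.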

The Perceptron Model

With the mathematical representation in place, the next step is to define how the model makes predictions.

The perceptron classifies a point based on which side of the line it falls:

$$h(\boldsymbol{x}) = \text{sign}(\boldsymbol{w} \cdot \boldsymbol{x} + b)$$

Where the sign function returns:

  • $+1$ if $\boldsymbol{w} \cdot \boldsymbol{x} + b > 0$
  • $-1$ if $\boldsymbol{w} \cdot \boldsymbol{x} + b < 0$

graph TB
    subgraph Perceptron
        A["x₁, x₂, ..., xₙ"] --> B["∑ wᵢxᵢ + b"]
        B --> C["sign()"]
        C --> D["+1 or -1"]
    end
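
As a quick illustration of this prediction rule (with made-up weights and inputs, not tied to the credit example):

🐍 Python
import numpy as np

def predict(x, w, b):
    """Perceptron prediction: which side of the boundary does x fall on?"""
    return 1 if np.dot(w, x) + b > 0 else -1

# Hypothetical weights and bias, chosen only for illustration
w = np.array([1.0, -2.0])
b = 0.5

print(predict(np.array([3.0, 1.0]), w, b))  # 1*3 - 2*1 + 0.5 =  1.5 > 0 -> +1
print(predict(np.array([1.0, 2.0]), w, b))  # 1*1 - 2*2 + 0.5 = -2.5 < 0 -> -1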

Linear Separability

The perceptron works perfectly when classes can be separated by a line. But what does “separable” really mean?

A dataset is linearly separable if there exists a line (or hyperplane) that perfectly separates the two classes:

| Linearly Separable | Not Linearly Separable |
|---|---|
| A line can separate all points | No line can perfectly separate |
| Perceptron converges | Perceptron may not converge |
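
The classic non-separable case is the XOR pattern. A tiny illustrative snippet (not part of the post's dataset):

🐍 Python
import numpy as np

# XOR-style labels: opposite corners of the unit square share a class.
X_xor = np.array([[0, 0], [1, 1],
                  [0, 1], [1, 0]])
y_xor = np.array([-1, -1, 1, 1])

# Any line that puts (0,1) and (1,0) on its positive side must also put
# at least one of (0,0) or (1,1) there, so no line separates the classes.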

The Perceptron Learning Algorithm

The perceptron learns iteratively by examining each training example and adjusting weights when misclassifications occur:

Algorithm:

1. Initialize w = 0, b = 0
2. For each training example (xᵢ, yᵢ):
   - If yᵢ(w · xᵢ + b) ≤ 0:  (misclassified)
     - Update: w ← w + yᵢxᵢ
     - Update: b ← b + yᵢ
3. Repeat until no misclassifications (or max iterations)

Intuition: When a point is misclassified, we “push” the decision boundary toward the correct side.
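
For instance (a made-up single step): suppose the current weights are $\boldsymbol{w} = (0, 0)$, $b = 0$, and the next example is $\boldsymbol{x} = (2, 1)$ with $y = +1$. Then $y(\boldsymbol{w} \cdot \boldsymbol{x} + b) = 0 \le 0$, so the point counts as misclassified and we update:

$$\boldsymbol{w} \leftarrow (0, 0) + 1 \cdot (2, 1) = (2, 1), \qquad b \leftarrow 0 + 1 = 1$$

After the update, $y(\boldsymbol{w} \cdot \boldsymbol{x} + b) = 2 \cdot 2 + 1 \cdot 1 + 1 = 6 > 0$, so this point is now on the correct side.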

Animation: Watch how the perceptron adjusts its decision boundary step by step

Code Practice

Now let’s implement everything from scratch and see the perceptron in action.

Implementing the Perceptron from Scratch

🐍 Python
import numpy as np
import matplotlib.pyplot as plt

class Perceptron:
    def __init__(self, learning_rate=1.0, max_iter=1000):
        self.lr = learning_rate
        self.max_iter = max_iter
        self.w = None
        self.b = None
        
    def fit(self, X, y):
        n_samples, n_features = X.shape
        
        # Initialize weights and bias
        self.w = np.zeros(n_features)
        self.b = 0
        
        for _ in range(self.max_iter):
            misclassified = False
            
            for xi, yi in zip(X, y):
                # Check if point is misclassified
                if yi * (np.dot(self.w, xi) + self.b) <= 0:
                    # Update weights and bias
                    self.w += self.lr * yi * xi
                    self.b += self.lr * yi
                    misclassified = True
            
            # Stop if no misclassifications
            if not misclassified:
                break
                
        return self
    
    def predict(self, X):
        linear_output = np.dot(X, self.w) + self.b
        return np.sign(linear_output)
    
    def score(self, X, y):
        predictions = self.predict(X)
        return np.mean(predictions == y)

Testing on Linearly Separable Data

🐍 Python
# Generate linearly separable data
np.random.seed(42)
X_pos = np.random.randn(20, 2) + np.array([2, 2])
X_neg = np.random.randn(20, 2) + np.array([-2, -2])
X = np.vstack([X_pos, X_neg])
y = np.array([1] * 20 + [-1] * 20)

# Train perceptron
clf = Perceptron()
clf.fit(X, y)

print(f"Weights: {clf.w}")
print(f"Bias: {clf.b}")
print(f"Accuracy: {clf.score(X, y):.2%}")

Output:

Weights: [2.49671415 1.8617357 ]
Bias: 1.0
Accuracy: 100.00%

Visualizing the Decision Boundary

🐍 Python
def plot_decision_boundary(clf, X, y, title="Decision Boundary"):
    plt.figure(figsize=(8, 6))
    
    # Plot data points
    plt.scatter(X[y == 1][:, 0], X[y == 1][:, 1], 
                c='#4CAF50', marker='o', label='Class +1', edgecolors='k')
    plt.scatter(X[y == -1][:, 0], X[y == -1][:, 1], 
                c='#EF5350', marker='x', label='Class -1', s=100)
    
    # Plot decision boundary
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    x_line = np.linspace(x_min, x_max, 100)
    
    # Decision boundary: w1*x1 + w2*x2 + b = 0
    # => x2 = -(w1*x1 + b) / w2
    if clf.w[1] != 0:
        y_line = -(clf.w[0] * x_line + clf.b) / clf.w[1]
        plt.plot(x_line, y_line, 'k-', linewidth=2, label='Decision Boundary')
    
    plt.xlabel('$x_1$', fontsize=12)
    plt.ylabel('$x_2$', fontsize=12)
    plt.title(title)
    plt.legend()
    plt.grid(True, alpha=0.3)
    plt.savefig('assets/decision_boundary.png', dpi=150)
    plt.show()

plot_decision_boundary(clf, X, y, "Perceptron Decision Boundary")
Perceptron Decision Boundary

What Happens with Non-Separable Data?

🐍 Python
# Add some noise to make data non-separable
X_noisy = X.copy()
y_noisy = y.copy()
# Flip some labels
y_noisy[0] = -1
y_noisy[21] = 1

# Train perceptron
clf_noisy = Perceptron(max_iter=100)
clf_noisy.fit(X_noisy, y_noisy)

print(f"Accuracy on noisy data: {clf_noisy.score(X_noisy, y_noisy):.2%}")

plot_decision_boundary(clf_noisy, X_noisy, y_noisy, 
                       "Perceptron on Non-Separable Data")

Output:

Accuracy on noisy data: 95.00%
Perceptron Decision Boundary on Non-Separable Data

The perceptron still finds a reasonable boundary, but it can’t achieve 100% accuracy on non-separable data.

Using sklearn’s Perceptron

🐍 Python
from sklearn.linear_model import Perceptron as SkPerceptron
from sklearn.preprocessing import StandardScaler

# Standardize features (recommended for perceptron)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Train sklearn perceptron
sk_clf = SkPerceptron(max_iter=1000, random_state=42)
sk_clf.fit(X_scaled, y)

print(f"sklearn Perceptron Accuracy: {sk_clf.score(X_scaled, y):.2%}")

Output:

sklearn Perceptron Accuracy: 100.00%

Deep Dive

Q1: Why does the perceptron update rule work?

When a point $(\boldsymbol{x}_i, y_i)$ is misclassified, updating $\boldsymbol{w} \leftarrow \boldsymbol{w} + y_i \boldsymbol{x}_i$ and $b \leftarrow b + y_i$ nudges the decision boundary toward the correct side for that point (it may take several updates before the point is actually classified correctly). Each update increases the signed margin $y_i(\boldsymbol{w} \cdot \boldsymbol{x}_i + b)$ by $\|\boldsymbol{x}_i\|^2 + 1 > 0$, pushing the output toward the correct sign.
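
Explicitly, writing $\boldsymbol{w}' = \boldsymbol{w} + y_i \boldsymbol{x}_i$ and $b' = b + y_i$, and using $y_i^2 = 1$:

$$y_i(\boldsymbol{w}' \cdot \boldsymbol{x}_i + b') = y_i(\boldsymbol{w} \cdot \boldsymbol{x}_i + b) + y_i^2\left(\|\boldsymbol{x}_i\|^2 + 1\right) = y_i(\boldsymbol{w} \cdot \boldsymbol{x}_i + b) + \|\boldsymbol{x}_i\|^2 + 1$$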

Q2: What if data isn’t linearly separable?

The basic perceptron will oscillate forever without converging. Solutions include:

  • Pocket Algorithm: Keep track of the best weights seen so far (see the sketch below)
  • Soft-margin methods: Allow some misclassifications
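
A minimal sketch of the pocket idea, assuming a training loop like the `Perceptron` class above (illustrative, not a canonical implementation):

🐍 Python
import numpy as np

def pocket_perceptron(X, y, max_iter=100):
    """Run perceptron updates, but keep ('pocket') the best weights seen so far."""
    w, b = np.zeros(X.shape[1]), 0.0
    best_w, best_b, best_acc = w.copy(), b, 0.0

    for _ in range(max_iter):
        for xi, yi in zip(X, y):
            if yi * (np.dot(w, xi) + b) <= 0:      # misclassified -> perceptron update
                w = w + yi * xi
                b = b + yi
                acc = np.mean(np.sign(X @ w + b) == y)
                if acc > best_acc:                 # pocket the improved weights
                    best_w, best_b, best_acc = w.copy(), b, acc
    return best_w, best_b

On non-separable data this returns the boundary with the fewest training mistakes observed during the run, rather than whatever weights the loop happened to end on.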

Q3: Perceptron vs. Logistic Regression?

| Aspect | Perceptron | Logistic Regression |
|---|---|---|
| Output | Hard decision (+1/-1) | Probability (0-1) |
| Loss Function | Perceptron criterion (penalizes misclassified margins) | Cross-entropy |
| Non-separable data | May not converge | Training converges (convex loss) |

Summary

| Concept | Key Points |
|---|---|
| Binary Classification | Classifying into two groups |
| Decision Boundary | $\boldsymbol{w} \cdot \boldsymbol{x} + b = 0$ |
| Perceptron | $h(\boldsymbol{x}) = \text{sign}(\boldsymbol{w} \cdot \boldsymbol{x} + b)$ |
| Linear Separability | A line can perfectly separate classes |
| Update Rule | $\boldsymbol{w} \leftarrow \boldsymbol{w} + y_i \boldsymbol{x}_i$, $b \leftarrow b + y_i$ (on misclassified points) |
