ML-02: The Perceptron and Decision Boundaries

Summary
Dive into the perceptron — the building block of neural networks. Learn how a simple linear classifier draws decision boundaries, and build one from scratch.

Learning Objectives

After reading this post, you will be able to:

  • Understand what binary classification means
  • Explain the mathematical representation of a decision boundary
  • Implement the perceptron algorithm from scratch
  • Distinguish between linearly separable and non-separable data

Theory

Building on the ML fundamentals from the previous article, this post introduces the perceptron — the simplest neural network and the foundation of deep learning.

The Biological Inspiration: From Neuron to Perceptron

Binary Classification Problem

Consider a bank evaluating credit card applications. Based on historical data:

| Applicant | Annual Income | Debt Ratio | Approved? |
|---|---|---|---|
| Alice | $80,000 | 0.2 | ✅ Yes |
| Bob | $30,000 | 0.8 | ❌ No |
| Carol | $60,000 | 0.3 | ✅ Yes |
| Dave | $40,000 | 0.6 | ❌ No |

This is a binary classification problem: we want to classify new applicants into two groups (approved/rejected).

graph LR
    A[Input Features] --> B[Classifier]
    B --> C{Class +1 or -1?}
    C -->|+1| D[Approved]
    C -->|-1| E[Rejected]
    style D fill:#c8e6c9
    style E fill:#ffcdd2
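
To make this concrete, here is one way the applicant table could be encoded numerically (income in thousands of dollars, labels $+1$ for approved and $-1$ for rejected). The encoding is illustrative, not part of an original dataset:

🐍 Python
import numpy as np

# Each row: [annual income in $1,000s, debt ratio]
X = np.array([
    [80, 0.2],  # Alice
    [30, 0.8],  # Bob
    [60, 0.3],  # Carol
    [40, 0.6],  # Dave
])

# Labels: +1 = approved, -1 = rejected
y = np.array([1, -1, 1, -1])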

The Decision Boundary

When plotting the data on a 2D plane (income vs. debt ratio), we observe that approved and rejected applicants can be separated by a line:

Credit Card Approval: Decision Boundary Example

This separating line is called the decision boundary.

Mathematical Formulation

How can this separating line be described mathematically? This is where the decision boundary equation comes in.

A line in 2D can be written as:

$$w_1 x_1 + w_2 x_2 + b = 0$$

Or in vector form:

$$\boldsymbol{w} \cdot \boldsymbol{x} + b = 0$$

Where:

  • $\boldsymbol{w} = (w_1, w_2)$ is the weight vector (normal to the line)
  • $\boldsymbol{x} = (x_1, x_2)$ is the input features
  • $b$ is the bias term
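
For example, with the purely illustrative values $\boldsymbol{w} = (1, 1)$ and $b = -2$, the decision boundary is the line

$$x_1 + x_2 - 2 = 0 \quad\Longleftrightarrow\quad x_2 = 2 - x_1$$

Points above this line give $\boldsymbol{w} \cdot \boldsymbol{x} + b > 0$, and points below give a negative value.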

The Perceptron Model

With the mathematical representation in place, the next step is to define how the model makes predictions.

The perceptron classifies a point based on which side of the line it falls:

$$h(\boldsymbol{x}) = \text{sign}(\boldsymbol{w} \cdot \boldsymbol{x} + b)$$

Where the sign function returns:

  • $+1$ if $\boldsymbol{w} \cdot \boldsymbol{x} + b > 0$
  • $-1$ if $\boldsymbol{w} \cdot \boldsymbol{x} + b < 0$

graph TB
    subgraph Perceptron
        A["x₁, x₂, ..., xₙ"] --> B["∑ wᵢxᵢ + b"]
        B --> C["sign()"]
        C --> D["+1 or -1"]
    end
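
As a quick illustration of this prediction rule (with made-up weights and inputs, not tied to the credit example):

🐍 Python
import numpy as np

def predict(x, w, b):
    """Perceptron prediction: which side of the boundary does x fall on?"""
    return 1 if np.dot(w, x) + b > 0 else -1

# Hypothetical weights and bias, chosen only for illustration
w = np.array([1.0, -2.0])
b = 0.5

print(predict(np.array([3.0, 1.0]), w, b))  # 1*3 - 2*1 + 0.5 =  1.5 > 0 -> +1
print(predict(np.array([1.0, 2.0]), w, b))  # 1*1 - 2*2 + 0.5 = -2.5 < 0 -> -1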

Linear Separability

The perceptron works perfectly when classes can be separated by a line. But what does “separable” really mean?

A dataset is linearly separable if there exists a line (or hyperplane) that perfectly separates the two classes:

| Linearly Separable | Not Linearly Separable |
|---|---|
| A line can separate all points | No line can perfectly separate |
| Perceptron converges | Perceptron may not converge |
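
The classic non-separable case is the XOR pattern. A tiny illustrative snippet (not part of the post's dataset):

🐍 Python
import numpy as np

# XOR-style labels: opposite corners of the unit square share a class.
X_xor = np.array([[0, 0], [1, 1],
                  [0, 1], [1, 0]])
y_xor = np.array([-1, -1, 1, 1])

# Any line that puts (0,1) and (1,0) on its positive side must also put
# at least one of (0,0) or (1,1) there, so no line separates the classes.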

The Perceptron Learning Algorithm

The perceptron learns iteratively by examining each training example and adjusting weights when misclassifications occur:

Algorithm:

1. Initialize w = 0, b = 0
2. For each training example (xᵢ, yᵢ):
   - If yᵢ(w · xᵢ + b) ≤ 0:  (misclassified)
     - Update: w ← w + yᵢxᵢ
     - Update: b ← b + yᵢ
3. Repeat until no misclassifications (or max iterations)

Intuition: When a point is misclassified, we “push” the decision boundary toward the correct side.
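
For instance (a made-up single step): suppose the current weights are $\boldsymbol{w} = (0, 0)$, $b = 0$, and the next example is $\boldsymbol{x} = (2, 1)$ with $y = +1$. Then $y(\boldsymbol{w} \cdot \boldsymbol{x} + b) = 0 \le 0$, so the point counts as misclassified and we update:

$$\boldsymbol{w} \leftarrow (0, 0) + 1 \cdot (2, 1) = (2, 1), \qquad b \leftarrow 0 + 1 = 1$$

After the update, $y(\boldsymbol{w} \cdot \boldsymbol{x} + b) = 2 \cdot 2 + 1 \cdot 1 + 1 = 6 > 0$, so this point is now on the correct side.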

Animation: Watch how the perceptron adjusts its decision boundary step by step

Code Practice

Now let’s implement everything from scratch and see the perceptron in action.

Implementing the Perceptron from Scratch

🐍 Python
import numpy as np
import matplotlib.pyplot as plt

class Perceptron:
    def __init__(self, learning_rate=1.0, max_iter=1000):
        self.lr = learning_rate
        self.max_iter = max_iter
        self.w = None
        self.b = None
        
    def fit(self, X, y):
        n_samples, n_features = X.shape
        
        # Initialize weights and bias
        self.w = np.zeros(n_features)
        self.b = 0
        
        for _ in range(self.max_iter):
            misclassified = False
            
            for xi, yi in zip(X, y):
                # Check if point is misclassified
                if yi * (np.dot(self.w, xi) + self.b) <= 0:
                    # Update weights and bias
                    self.w += self.lr * yi * xi
                    self.b += self.lr * yi
                    misclassified = True
            
            # Stop if no misclassifications
            if not misclassified:
                break
                
        return self
    
    def predict(self, X):
        linear_output = np.dot(X, self.w) + self.b
        return np.sign(linear_output)
    
    def score(self, X, y):
        predictions = self.predict(X)
        return np.mean(predictions == y)

Testing on Linearly Separable Data

🐍 Python
# Generate linearly separable data
np.random.seed(42)
X_pos = np.random.randn(20, 2) + np.array([2, 2])
X_neg = np.random.randn(20, 2) + np.array([-2, -2])
X = np.vstack([X_pos, X_neg])
y = np.array([1] * 20 + [-1] * 20)

# Train perceptron
clf = Perceptron()
clf.fit(X, y)

print(f"Weights: {clf.w}")
print(f"Bias: {clf.b}")
print(f"Accuracy: {clf.score(X, y):.2%}")

Output:

Weights: [2.49671415 1.8617357 ]
Bias: 1.0
Accuracy: 100.00%

Visualizing the Decision Boundary

🐍 Python
def plot_decision_boundary(clf, X, y, title="Decision Boundary"):
    plt.figure(figsize=(8, 6))
    
    # Plot data points
    plt.scatter(X[y == 1][:, 0], X[y == 1][:, 1], 
                c='#4CAF50', marker='o', label='Class +1', edgecolors='k')
    plt.scatter(X[y == -1][:, 0], X[y == -1][:, 1], 
                c='#EF5350', marker='x', label='Class -1', s=100)
    
    # Plot decision boundary
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    x_line = np.linspace(x_min, x_max, 100)
    
    # Decision boundary: w1*x1 + w2*x2 + b = 0
    # => x2 = -(w1*x1 + b) / w2
    if clf.w[1] != 0:
        y_line = -(clf.w[0] * x_line + clf.b) / clf.w[1]
        plt.plot(x_line, y_line, 'k-', linewidth=2, label='Decision Boundary')
    
    plt.xlabel('$x_1$', fontsize=12)
    plt.ylabel('$x_2$', fontsize=12)
    plt.title(title)
    plt.legend()
    plt.grid(True, alpha=0.3)
    plt.savefig('assets/decision_boundary.png', dpi=150)
    plt.show()

plot_decision_boundary(clf, X, y, "Perceptron Decision Boundary")
Perceptron Decision Boundary

What Happens with Non-Separable Data?

🐍 Python
# Add some noise to make data non-separable
X_noisy = X.copy()
y_noisy = y.copy()
# Flip some labels
y_noisy[0] = -1
y_noisy[21] = 1

# Train perceptron
clf_noisy = Perceptron(max_iter=100)
clf_noisy.fit(X_noisy, y_noisy)

print(f"Accuracy on noisy data: {clf_noisy.score(X_noisy, y_noisy):.2%}")

plot_decision_boundary(clf_noisy, X_noisy, y_noisy, 
                       "Perceptron on Non-Separable Data")

Output:

Accuracy on noisy data: 95.00%
Perceptron Decision Boundary on Non-Separable Data

The perceptron still finds a reasonable boundary, but it can’t achieve 100% accuracy on non-separable data.

Using sklearn’s Perceptron

🐍 Python
from sklearn.linear_model import Perceptron as SkPerceptron
from sklearn.preprocessing import StandardScaler

# Standardize features (recommended for perceptron)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Train sklearn perceptron
sk_clf = SkPerceptron(max_iter=1000, random_state=42)
sk_clf.fit(X_scaled, y)

print(f"sklearn Perceptron Accuracy: {sk_clf.score(X_scaled, y):.2%}")

Output:

sklearn Perceptron Accuracy: 100.00%

Deep Dive

Q1: Why does the perceptron update rule work?

When a point $(\boldsymbol{x}_i, y_i)$ is misclassified, updating $\boldsymbol{w} \leftarrow \boldsymbol{w} + y_i \boldsymbol{x}_i$ and $b \leftarrow b + y_i$ nudges the decision boundary toward the correct side for that point (it may take several updates before the point is actually classified correctly). Each update increases the signed margin $y_i(\boldsymbol{w} \cdot \boldsymbol{x}_i + b)$ by $\|\boldsymbol{x}_i\|^2 + 1 > 0$, pushing the output toward the correct sign.
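
Explicitly, writing $\boldsymbol{w}' = \boldsymbol{w} + y_i \boldsymbol{x}_i$ and $b' = b + y_i$, and using $y_i^2 = 1$:

$$y_i(\boldsymbol{w}' \cdot \boldsymbol{x}_i + b') = y_i(\boldsymbol{w} \cdot \boldsymbol{x}_i + b) + y_i^2\left(\|\boldsymbol{x}_i\|^2 + 1\right) = y_i(\boldsymbol{w} \cdot \boldsymbol{x}_i + b) + \|\boldsymbol{x}_i\|^2 + 1$$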

Q2: What if data isn’t linearly separable?

The basic perceptron will oscillate forever without converging. Solutions include:

  • Pocket Algorithm: Keep track of the best weights seen so far (see the sketch below)
  • Soft-margin methods: Allow some misclassifications
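
A minimal sketch of the pocket idea, assuming a training loop like the `Perceptron` class above (illustrative, not a canonical implementation):

🐍 Python
import numpy as np

def pocket_perceptron(X, y, max_iter=100):
    """Run perceptron updates, but keep ('pocket') the best weights seen so far."""
    w, b = np.zeros(X.shape[1]), 0.0
    best_w, best_b, best_acc = w.copy(), b, 0.0

    for _ in range(max_iter):
        for xi, yi in zip(X, y):
            if yi * (np.dot(w, xi) + b) <= 0:      # misclassified -> perceptron update
                w = w + yi * xi
                b = b + yi
                acc = np.mean(np.sign(X @ w + b) == y)
                if acc > best_acc:                 # pocket the improved weights
                    best_w, best_b, best_acc = w.copy(), b, acc
    return best_w, best_b

On non-separable data this returns the boundary with the fewest training mistakes observed during the run, rather than whatever weights the loop happened to end on.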

Q3: Perceptron vs. Logistic Regression?

| Aspect | Perceptron | Logistic Regression |
|---|---|---|
| Output | Hard decision (+1/-1) | Probability (0-1) |
| Loss Function | Perceptron criterion (penalizes misclassified margins) | Cross-entropy |
| Non-separable data | May not converge | Training converges (convex loss) |

Summary

| Concept | Key Points |
|---|---|
| Binary Classification | Classifying into two groups |
| Decision Boundary | $\boldsymbol{w} \cdot \boldsymbol{x} + b = 0$ |
| Perceptron | $h(\boldsymbol{x}) = \text{sign}(\boldsymbol{w} \cdot \boldsymbol{x} + b)$ |
| Linear Separability | A line can perfectly separate classes |
| Update Rule | $\boldsymbol{w} \leftarrow \boldsymbol{w} + y_i \boldsymbol{x}_i$, $b \leftarrow b + y_i$ (on misclassified points) |
