ML-00: Python Environment Setup for Machine Learning

Published: 2022/04/05 | Category: CODE/Supervised Machine Learning

Summary

This post compares four ways to manage a Python environment (system Python, venv, Conda, and Docker) and sets up a reproducible ML environment for this blog series. The recommended setup is Docker with VSCode Dev Containers, because it fixes the operating system, Python version, and libraries in one place.

Learning Objectives

After reading this post, you will be able to:

Understand the pros/cons of different Python environment methods
Know why Docker is recommended for this series
Set up the ML environment using Docker (recommended) or alternatives
Run the first test to verify your environment

Theory

Before diving into machine learning, you need a properly configured Python environment. Let’s explore the options and choose the best approach for this series.

Python Environment Methods Comparison

There are several ways to manage Python environments. Here’s a quick comparison:

Method	Description	Best For
System Python	Python installed directly on OS	Quick scripts, beginners
venv	Built-in virtual environment	Simple projects, lightweight
Conda	Package + environment manager	Data science, complex dependencies
Docker	Containerized environment	Reproducibility, team collaboration

graph TB
    subgraph Methods["Python Environment Methods"]
        A["System Python<br/>Simple but risky"]
        B["venv<br/>Lightweight isolation"]
        C["Conda<br/>Data science friendly"]
        D["Docker<br/>Full reproducibility"]
    end
    A -->|"More Isolation"| B
    B -->|"Better Dependency Mgmt"| C
    C -->|"Complete Portability"| D
    style D fill:#c8e6c9

Method Details

Let’s examine each method in more detail, along with its trade-offs.

System Python (Not Recommended)

# Just install packages globally
pip install numpy pandas scikit-learn

✅ Pros	❌ Cons
Zero setup	Pollutes system Python
Works immediately	Version conflicts between projects
	Hard to reproduce on other machines
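
To see why a global install is risky, it helps to check where pip would actually put packages. The following is a minimal sketch using only the standard library; the function name `describe_interpreter` is made up for illustration.

```python
import sys
import sysconfig


def describe_interpreter():
    """Report whether this interpreter is isolated and where packages go."""
    # In a venv-style environment, sys.prefix points at the environment while
    # sys.base_prefix still points at the Python installation it came from.
    # Conda environments are separate installations, so they report False here.
    in_virtual_env = sys.prefix != sys.base_prefix
    return {
        "executable": sys.executable,
        "in_virtual_env": in_virtual_env,
        # Default install location for pure-Python packages.
        "site_packages": sysconfig.get_paths()["purelib"],
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

If `in_virtual_env` is False and `site_packages` sits under the operating system's Python, a plain `pip install` will change packages that other projects may also depend on.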

venv (Python Built-in)

# Create virtual environment
python -m venv ml_env

# Activate (Windows)
ml_env\Scripts\activate

# Activate (macOS/Linux)
source ml_env/bin/activate

# Install packages
pip install numpy pandas scikit-learn

# Save dependencies
pip freeze > requirements.txt

✅ Pros	❌ Cons
Built into Python	Only isolates Python packages
Lightweight	Doesn’t handle system libraries
Easy to understand	Python version tied to system
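
The "Python version tied to system" point can be checked directly: every venv contains a `pyvenv.cfg` file that records which installed Python it was created from. The sketch below creates a throwaway environment with the standard `venv` module and reads that file. The helper name `read_venv_config` is an illustrative choice, and `with_pip=False` is used only to keep the demo fast.

```python
import configparser
import tempfile
import venv
from pathlib import Path


def read_venv_config(env_dir):
    """Parse pyvenv.cfg, the file that links a venv to its base Python."""
    text = (Path(env_dir) / "pyvenv.cfg").read_text(encoding="utf-8")
    # Disable interpolation so a '%' in a path is not treated specially.
    parser = configparser.ConfigParser(interpolation=None)
    # pyvenv.cfg has no section header, so add a dummy one for configparser.
    parser.read_string("[venv]\n" + text)
    return dict(parser["venv"])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = Path(tmp) / "ml_env"
        # Same effect as `python -m venv ml_env`, but without installing pip.
        venv.create(env_dir, with_pip=False)
        config = read_venv_config(env_dir)
        print("base interpreter directory:", config["home"])
        print("python version:", config["version"])
```

The `home` entry points back at an existing Python installation, which is why venv cannot give you a Python version you have not already installed.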

Conda (Anaconda/Miniconda)

# Create environment with specific Python version
conda create -n ml_env python=3.11

# Activate
conda activate ml_env

# Install packages
conda install numpy pandas scikit-learn

# Export environment
conda env export > environment.yml

✅ Pros	❌ Cons
Manages Python versions	Large installation size
Handles non-Python deps (CUDA, MKL)	Can be slow to resolve dependencies
Great for data science	Environment files may not be portable
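
One reason exported environment files travel badly is that `conda env export` records platform-specific build strings and an absolute `prefix:` path. Conda offers `--no-builds` and `--from-history` options on export for this; as an illustration of what changes, here is a rough sketch that post-processes an exported file. The function name `make_portable` and the sample text are assumptions for this example, not part of conda.

```python
import re

# Matches a conda dependency line such as "  - numpy=1.26.4=py311h08b1b3b_0".
_DEP_WITH_BUILD = re.compile(r"^(\s*-\s+[^=\s]+=[^=\s]+)=[^=\s]+\s*$")


def make_portable(environment_yml):
    """Drop build strings and the machine-specific prefix line."""
    kept = []
    for line in environment_yml.splitlines():
        if line.startswith("prefix:"):
            continue  # absolute path on the exporting machine
        match = _DEP_WITH_BUILD.match(line)
        # Keep "name=version" and discard the trailing "=build" part.
        kept.append(match.group(1) if match else line)
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    sample = (
        "name: ml_env\n"
        "dependencies:\n"
        "  - numpy=1.26.4=py311h08b1b3b_0\n"
        "prefix: /home/user/miniconda3/envs/ml_env\n"
    )
    print(make_portable(sample), end="")
```

Lines under a `pip:` section use `==` and are left untouched by the pattern.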

Docker (Recommended for This Series)

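# Start an interactive Python shell with the current folder mounted at /app.
# Assumes an image tagged ml-blog:latest already exists locally.
docker run --rm -it -v "$(pwd):/app" -w /app ml-blog:latest python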

✅ Pros	❌ Cons
Highly reproducible when image and package versions are pinned	Learning curve for beginners
Isolates everything (OS, libs, Python)	Slight overhead
Same environment everywhere	Requires Docker installation
Easy to share and deploy
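
The `docker run` command above relies on `$(pwd)`, which is shell-specific and does not work in every Windows shell. One way around that is to build the command in Python, where the current directory is resolved portably. This is a sketch only: `build_run_command` is a hypothetical helper, and the `--rm` and `-w` flags are additions chosen for convenience, not part of the original command.

```python
from pathlib import Path


def build_run_command(image, project_dir=None, container_dir="/app"):
    """Return the argument list for an interactive Python session in Docker."""
    host_dir = Path(project_dir or Path.cwd()).resolve()
    return [
        "docker", "run",
        "--rm",                               # remove the container on exit
        "-it",                                # interactive terminal
        "-v", f"{host_dir}:{container_dir}",  # mount the project folder
        "-w", container_dir,                  # start in the mounted folder
        image,
        "python",
    ]


if __name__ == "__main__":
    # "ml-blog:latest" is the tag used in this post; the image must already
    # exist locally (built or pulled) for the command to succeed.
    command = build_run_command("ml-blog:latest")
    print(" ".join(command))
    # To launch it from Python: subprocess.run(command, check=True)
```

Passing the arguments as a list avoids quoting problems when the project path contains spaces.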

Why Docker for This Series?

Among all options, Docker stands out for one critical reason: reproducibility.

graph LR
    A[You] --> B[Docker Container]
    C[Your Classmate] --> B
    D[Production Server] --> B
    B --> E["Same Python 3.12<br/>Same NumPy 2.x<br/>Same scikit-learn 1.x<br/>Same Results!"]
    style B fill:#e1f5fe
    style E fill:#c8e6c9

💡 The Goal: When you run the code from this blog series, you get exactly the same results as everyone else — no “it works on my machine” surprises!
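
Identical results also depend on identical package versions, so it can help to record the exact versions that are installed. The sketch below does roughly what `pip freeze` does, but only for the packages you name. It uses `importlib.metadata` from the standard library; the function name `pin_installed` is illustrative.

```python
from importlib import metadata


def pin_installed(package_names):
    """Return (pinned requirement lines, names that are not installed)."""
    pinned, missing = [], []
    for name in package_names:
        try:
            pinned.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            missing.append(name)
    return pinned, missing


if __name__ == "__main__":
    # Distribution names as used with pip, not import names.
    names = ["numpy", "pandas", "matplotlib", "scikit-learn", "jupyter"]
    pinned, missing = pin_installed(names)
    print("\n".join(pinned))
    if missing:
        print("# not installed:", ", ".join(missing))
```

Saving the printed `==` lines alongside the project gives everyone the same versions on the next install.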

Docker Installation (Windows)

Step 1: Install Docker Desktop

Download Docker Desktop from: docker.com/products/docker-desktop
Run the installer and follow the prompts
During installation, ensure “Use WSL 2 based engine” is checked
Restart your computer if prompted

Step 2: Verify Installation

Open PowerShell and run:

docker --version
docker run hello-world

If you see “Hello from Docker!”, you’re ready to go!

💡 Tip: If WSL 2 is not installed, Docker Desktop will prompt you to install it.
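
The same check can be scripted, which is handy if you want a setup script to fail early with a clear message. A minimal sketch, assuming the Docker CLI is named `docker` on your PATH; `docker_version` is a hypothetical helper name.

```python
import shutil
import subprocess


def docker_version(executable="docker"):
    """Return the `docker --version` output, or None if Docker is unavailable."""
    path = shutil.which(executable)
    if path is None:
        return None  # CLI is not on PATH
    try:
        result = subprocess.run(
            [path, "--version"],
            capture_output=True, text=True, timeout=15, check=True,
        )
    except (OSError, subprocess.SubprocessError):
        return None  # CLI was found but could not be run
    return result.stdout.strip()


if __name__ == "__main__":
    version = docker_version()
    print(version or "Docker CLI not found; install Docker Desktop first.")
```

This only confirms that the CLI is installed. Running `docker run hello-world` is still the way to confirm that the Docker engine itself is running.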

Code Practice

Time to set up your environment! Choose one of the following options based on your preference.

Option A: Using VSCode Dev Container (Recommended)

VSCode Dev Containers give the smoothest workflow for this series: the VSCode window stays on your machine, while the Python interpreter, the terminal, and the extensions listed in the configuration run inside the container.

Step 1: Install VSCode Extension

Install the Dev Containers extension in VSCode:

Extension ID: ms-vscode-remote.remote-containers

Step 2: Create Configuration Files

Create a .devcontainer folder in your project root with these files:

.devcontainer/devcontainer.json:

{
  "name": "Python 3.12 ML",
  "image": "python:3.12-slim",

  "customizations": {
    "vscode": {
      "settings": {
        "python.defaultInterpreterPath": "/usr/local/bin/python"
      },
      "extensions": [
        "ms-python.python",
        "ms-python.debugpy",
        "ms-toolsai.jupyter"
      ]
    }
  },

  "features": {
    "ghcr.io/devcontainers/features/git:1": {}
  },

  "postCreateCommand": "pip install --upgrade pip && pip install -r requirements.txt",

  "remoteUser": "root"
}

requirements.txt (in project root):

# ML Course Dependencies
numpy>=1.24.0
pandas>=2.0.0
matplotlib>=3.7.0
scikit-learn>=1.3.0
jupyter>=1.0.0
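
The `>=` specifiers set lower bounds only, so the versions you actually get depend on when the container is built. The sketch below checks installed versions against those minimums. It is a simplified illustration: the helper names are made up, only `name>=version` lines are understood, and pre-release suffixes are ignored (the `packaging` library handles full version semantics if you need them).

```python
import re
from importlib import metadata


def parse_minimums(requirements_text):
    """Map package name -> minimum version for simple 'name>=version' lines."""
    minimums = {}
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        match = re.fullmatch(r"([A-Za-z0-9_.-]+)\s*>=\s*(\S+)", line)
        if match:
            minimums[match.group(1)] = match.group(2)
    return minimums


def version_tuple(version):
    """Turn '1.24.0' into (1, 24, 0), stopping at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        digits = re.match(r"\d+", piece)
        if not digits:
            break
        parts.append(int(digits.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare numerically, padding with zeros so '1.3' equals '1.3.0'."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    sample = "numpy>=1.24.0\nscikit-learn>=1.3.0\n"
    for name, minimum in parse_minimums(sample).items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            print(f"{name}: not installed")
            continue
        status = "OK" if meets_minimum(installed, minimum) else "TOO OLD"
        print(f"{name}: {installed} (needs >= {minimum}) {status}")
```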

Step 3: Open in Container

Open your project folder in VSCode
Press F1 and select “Dev Containers: Reopen in Container”
Wait for the container to build (first time takes a few minutes)
You’re now working inside the container!
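
To confirm that a terminal or notebook really is running inside the container, a quick heuristic check can help. This is a sketch under two assumptions: Docker creates a `/.dockerenv` file at the container's filesystem root, and VSCode Dev Containers set a `REMOTE_CONTAINERS` environment variable. Treat the second in particular as an assumption and verify it in your own setup.

```python
import os
from pathlib import Path


def running_in_container(marker="/.dockerenv", environ=os.environ):
    """Heuristic: True if this process appears to run inside a container."""
    # Docker places this marker file at the root of a container's filesystem.
    if Path(marker).exists():
        return True
    # Assumption: Dev Containers export REMOTE_CONTAINERS=true in the container.
    return environ.get("REMOTE_CONTAINERS", "").lower() == "true"


if __name__ == "__main__":
    print("inside container:", running_in_container())
```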

graph LR
    A[Your Project] --> B[VSCode]
    B --> C[Dev Container]
    C --> D["Python 3.12<br/>+ All ML Libraries<br/>+ Jupyter Support"]
    style C fill:#e1f5fe
    style D fill:#c8e6c9

Option B: Using venv (Without Docker)

If you prefer not to use Docker:

# Create the environment
python -m venv ml_env

# Activate (Windows)
ml_env\Scripts\activate

# Activate (macOS/Linux)
source ml_env/bin/activate

# Install dependencies
pip install numpy pandas matplotlib scikit-learn jupyter

Option C: Using Conda (Without Docker)

# Create environment
conda create -n ml_code python=3.12 numpy pandas matplotlib scikit-learn jupyter -y

# Activate
conda activate ml_code
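
With Options B and C it is easy to forget to activate the environment before installing or running anything. Activation scripts export environment variables that can be inspected: venv sets `VIRTUAL_ENV` and conda sets `CONDA_DEFAULT_ENV`. A small sketch, with `active_environment` as an illustrative name:

```python
import os


def active_environment(environ=os.environ):
    """Name the activated environment based on variables set by activation."""
    # venv's activate scripts set VIRTUAL_ENV to the environment directory.
    if environ.get("VIRTUAL_ENV"):
        return f"venv: {environ['VIRTUAL_ENV']}"
    # `conda activate` sets CONDA_DEFAULT_ENV to the environment name.
    if environ.get("CONDA_DEFAULT_ENV"):
        return f"conda: {environ['CONDA_DEFAULT_ENV']}"
    return "none detected"


if __name__ == "__main__":
    print("active environment:", active_environment())
```

After `conda activate ml_code`, this should report `conda: ml_code`. If it reports `conda: base` or `none detected`, the activation step was skipped.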

Environment Verification

Regardless of which method you chose, run this code to verify everything is set up correctly:

# verify_environment.py
import sys
print(f"✓ Python {sys.version.split()[0]}")

import numpy as np
print(f"✓ NumPy {np.__version__}")

import pandas as pd
print(f"✓ Pandas {pd.__version__}")

import matplotlib
print(f"✓ Matplotlib {matplotlib.__version__}")

import sklearn
print(f"✓ scikit-learn {sklearn.__version__}")

print("\n🎉 Environment ready for the ML Code Series!")

Expected Output (your version numbers will differ depending on when you install):

✓ Python 3.12.12
✓ NumPy 2.3.5
✓ Pandas 2.3.3
✓ Matplotlib 3.10.8
✓ scikit-learn 1.8.0

🎉 Environment ready for the ML Code Series!
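
The script above stops at the first missing package with an ImportError. If you would rather see a full report in one pass, a variant along these lines may be useful. It is a sketch rather than a replacement: `check_packages` is a hypothetical helper, and it takes import names, which can differ from pip names (scikit-learn is imported as `sklearn`).

```python
import importlib


def check_packages(module_names):
    """Map each module name to its version string, or None if import fails."""
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            # Not every module defines __version__, so fall back gracefully.
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    report = check_packages(["numpy", "pandas", "matplotlib", "sklearn"])
    for name, version in report.items():
        status = "OK" if version else "MISSING"
        print(f"{status:8}{name} {version or ''}")
```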

Summary

Method	Recommendation
Docker	⭐ Recommended — guaranteed reproducibility
venv	Good alternative if Docker not available
Conda	Good for data science workflows
System	❌ Not recommended — may cause conflicts

Johan Blog