ML-00: Python Environment Setup for Machine Learning

Summary
Compare Python environment options and set up a reproducible ML environment using Docker for this blog series. We examine System Python, venv, Conda, and Docker, recommending Docker + VSCode Dev Containers for guaranteed reproducibility.

Learning Objectives

After reading this post, you will be able to:

  • Understand the pros/cons of different Python environment methods
  • Know why Docker is recommended for this series
  • Set up the ML environment using Docker (recommended) or alternatives
  • Run the first test to verify your environment

Theory

Before diving into machine learning, you need a properly configured Python environment. Let’s explore the options and choose the best approach for this series.

Python Environment Methods Comparison

There are several ways to manage Python environments. Here’s a quick comparison:

| Method | Description | Best For |
| --- | --- | --- |
| System Python | Python installed directly on OS | Quick scripts, beginners |
| venv | Built-in virtual environment | Simple projects, lightweight |
| Conda | Package + environment manager | Data science, complex dependencies |
| Docker | Containerized environment | Reproducibility, team collaboration |

graph TB
    subgraph Methods["Python Environment Methods"]
        A["System Python<br/>Simple but risky"]
        B["venv<br/>Lightweight isolation"]
        C["Conda<br/>Data science friendly"]
        D["Docker<br/>Full reproducibility"]
    end
    A -->|"More Isolation"| B
    B -->|"Better Dependency Mgmt"| C
    C -->|"Complete Portability"| D
    style D fill:#c8e6c9

Method Details

Let’s examine each method in more detail, along with its trade-offs.

System Python

# Just install packages globally
pip install numpy pandas scikit-learn

| ✅ Pros | ❌ Cons |
| --- | --- |
| Zero setup | Pollutes system Python |
| Works immediately | Version conflicts between projects |
| | Hard to reproduce on other machines |

venv (Python Built-in)

# Create virtual environment
python -m venv ml_env

# Activate (Windows)
ml_env\Scripts\activate

# Activate (macOS/Linux)
source ml_env/bin/activate

# Install packages
pip install numpy pandas scikit-learn

# Save dependencies
pip freeze > requirements.txt

| ✅ Pros | ❌ Cons |
| --- | --- |
| Built into Python | Only isolates Python packages |
| Lightweight | Doesn’t handle system libraries |
| Easy to understand | Python version tied to system |
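
After activating a venv, it can be useful to confirm that the interpreter you are running really belongs to it. The sketch below relies on the standard-library behavior that `sys.prefix` differs from `sys.base_prefix` inside a venv. The function name `in_virtual_env` is chosen for illustration, and the check does not detect Conda environments.

```python
import sys


def in_virtual_env() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    status = "active" if in_virtual_env() else "not active"
    print(f"Virtual environment: {status}")
    print(f"Interpreter: {sys.executable}")
```

If the script reports "not active" right after activation, the shell is probably still resolving `python` to the system installation.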

Conda (Anaconda/Miniconda)

# Create environment with specific Python version
conda create -n ml_env python=3.11

# Activate
conda activate ml_env

# Install packages
conda install numpy pandas scikit-learn

# Export environment
conda env export > environment.yml

| ✅ Pros | ❌ Cons |
| --- | --- |
| Manages Python versions | Large installation size |
| Handles non-Python deps (CUDA, MKL) | Can be slow to resolve dependencies |
| Great for data science | Environment files may not be portable |

Docker

# Run Python inside a prebuilt image (ml-blog:latest is an example tag; build or pull it first)
docker run -it -v "$(pwd):/app" ml-blog:latest python

| ✅ Pros | ❌ Cons |
| --- | --- |
| 100% reproducible | Learning curve for beginners |
| Isolates everything (OS, libs, Python) | Slight overhead |
| Same environment everywhere | Requires Docker installation |
| Easy to share and deploy | |

Why Docker for This Series?

Among all options, Docker stands out for one critical reason: reproducibility.

graph LR
    A[You] --> B[Docker Container]
    C[Your Classmate] --> B
    D[Production Server] --> B
    B --> E["Same Python 3.12<br/>Same NumPy 2.x<br/>Same scikit-learn 1.x<br/>Same Results!"]
    style B fill:#e1f5fe
    style E fill:#c8e6c9

💡 The Goal: When you run the code from this blog series, you get exactly the same results as everyone else — no “it works on my machine” surprises!
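
One way to make "same environment" checkable is to reduce the installed versions to a short fingerprint that two people can compare. The following is a minimal sketch using only the standard library. The package list and the helper names `collect_versions` and `fingerprint` are assumptions for illustration, and matching fingerprints indicate matching package versions, not identical numerical results in every case.

```python
import hashlib
import platform
from importlib import metadata

# Packages used in this series; adjust the tuple to match your project.
PACKAGES = ("numpy", "pandas", "matplotlib", "scikit-learn")


def collect_versions(packages=PACKAGES):
    """Map each package name to its installed version, or None if missing."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def fingerprint(versions):
    """Hash the sorted name=version pairs into a short, comparable string."""
    text = "\n".join(f"{name}={versions[name]}" for name in sorted(versions))
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


if __name__ == "__main__":
    found = collect_versions()
    for name, version in found.items():
        print(f"{name}: {version or 'not installed'}")
    print(f"Environment fingerprint: {fingerprint(found)}")
```

Two machines that print the same fingerprint have the same Python and package versions for the listed packages.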

Docker Installation (Windows)

Step 1: Install Docker Desktop

  1. Download Docker Desktop from: docker.com/products/docker-desktop
  2. Run the installer and follow the prompts
  3. During installation, ensure “Use WSL 2 based engine” is checked
  4. Restart your computer if prompted

Step 2: Verify Installation

Open PowerShell and run:

docker --version
docker run hello-world

If you see “Hello from Docker!”, you’re ready to go!
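
The same check can be scripted from Python, for example as part of a project setup script. This is a rough sketch: the function name `docker_version` is hypothetical, and it only confirms that the `docker` command is on your PATH and responds to `--version`. It does not confirm that the Docker engine is running, which is what `docker run hello-world` tests.

```python
import shutil
import subprocess


def docker_version(executable="docker"):
    """Return the output of `docker --version`, or None if it cannot be run."""
    path = shutil.which(executable)
    if path is None:
        return None  # the CLI is not on PATH
    try:
        result = subprocess.run(
            [path, "--version"],
            capture_output=True,
            text=True,
            timeout=10,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    version = docker_version()
    if version is None:
        print("Docker CLI not found; install Docker Desktop and reopen the terminal.")
    else:
        print(f"Found: {version}")
```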


💡 Tip: If WSL2 is not installed, Docker Desktop will prompt you to install it.

Code Practice

Time to set up your environment! Choose one of the following options based on your preference.

Option A: Using Docker with VSCode Dev Containers (Recommended)

VSCode Dev Containers provide the best development experience: your editor runs inside the container with all extensions pre-configured.

Step 1: Install VSCode Extension

Install the Dev Containers extension in VSCode:

  • Extension ID: ms-vscode-remote.remote-containers

Step 2: Create Configuration Files

Create a .devcontainer folder in your project root with these files:

.devcontainer/devcontainer.json:

{
  "name": "Python 3.12 ML",
  "image": "python:3.12-slim",

  "customizations": {
    "vscode": {
      "settings": {
        "python.defaultInterpreterPath": "/usr/local/bin/python"
      },
      "extensions": [
        "ms-python.python",
        "ms-python.debugpy",
        "ms-toolsai.jupyter"
      ]
    }
  },

  "features": {
    "ghcr.io/devcontainers/features/git:1": {}
  },

  "postCreateCommand": "pip install --upgrade pip && pip install -r requirements.txt",

  "remoteUser": "root"
}
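
A stray comma or a missing key in this file typically shows up only as a failed container build, so a quick pre-check can save a rebuild cycle. The sketch below parses the text with the standard `json` module and reports missing keys. The `EXPECTED_KEYS` tuple reflects what this post's configuration uses, not what the Dev Containers specification requires, and `check_devcontainer` is a hypothetical helper name. VSCode also accepts comments in `devcontainer.json`, which plain `json.loads` would reject, so this check suits only comment-free files like the one above.

```python
import json

# Keys that the configuration in this post relies on (an assumption of this
# sketch, not a requirement of the Dev Containers specification).
EXPECTED_KEYS = ("name", "image", "postCreateCommand")


def check_devcontainer(text):
    """Parse devcontainer.json text and return a list of problems found."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg} (line {exc.lineno})"]
    if not isinstance(config, dict):
        return ["top-level value must be a JSON object"]
    return [f"missing key: {key}" for key in EXPECTED_KEYS if key not in config]


if __name__ == "__main__":
    # Deliberately incomplete sample: postCreateCommand is missing.
    sample = '{"name": "Python 3.12 ML", "image": "python:3.12-slim"}'
    for problem in check_devcontainer(sample):
        print(problem)
```

Run as a script, the sample reports `missing key: postCreateCommand`; an empty list means the checked keys are all present.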

requirements.txt (in project root):

# ML Course Dependencies
numpy>=1.24.0
pandas>=2.0.0
matplotlib>=3.7.0
scikit-learn>=1.3.0
jupyter>=1.0.0
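
Note that `>=` only sets a minimum version, so a container rebuilt months later may install newer releases than the ones you tested with. If you want rebuilds to stay identical, one option is to pin each listed package to the version currently installed, similar in spirit to `pip freeze` but limited to the packages named in the file. The sketch below assumes simple `name>=version` lines with no extras or environment markers, and `pin_requirements` is a name chosen for illustration.

```python
from importlib import metadata


def pin_requirements(lines, get_version=metadata.version):
    """Rewrite `name>=x` lines as `name==installed_version`.

    Comments and blank lines pass through unchanged; packages that are not
    installed keep their original line.
    """
    pinned = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            pinned.append(stripped)
            continue
        # Take the package name: everything before the first version operator.
        name = stripped
        for op in ("==", ">=", "<=", "~=", "!=", ">", "<"):
            name = name.split(op)[0]
        name = name.strip()
        try:
            pinned.append(f"{name}=={get_version(name)}")
        except metadata.PackageNotFoundError:
            pinned.append(stripped)
    return pinned


if __name__ == "__main__":
    sample = ["# ML Course Dependencies", "numpy>=1.24.0", "pandas>=2.0.0"]
    print("\n".join(pin_requirements(sample)))
```

The `get_version` parameter exists so the function can be exercised without the packages installed; by default it queries the current environment.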

Step 3: Open in Container

  1. Open your project folder in VSCode
  2. Press F1 and select “Dev Containers: Reopen in Container”
  3. Wait for the container to build (first time takes a few minutes)
  4. You’re now working inside the container!

graph LR
    A[Your Project] --> B[VSCode]
    B --> C[Dev Container]
    C --> D["Python 3.12<br/>+ All ML Libraries<br/>+ Jupyter Support"]
    style C fill:#e1f5fe
    style D fill:#c8e6c9

Option B: Using venv (Without Docker)

If you prefer not to use Docker:

# Create and activate environment
python -m venv ml_env

# Activate (Windows)
ml_env\Scripts\activate

# Activate (macOS/Linux)
source ml_env/bin/activate

# Install dependencies
pip install numpy pandas matplotlib scikit-learn jupyter

Option C: Using Conda (Without Docker)

# Create environment
conda create -n ml_code python=3.12 numpy pandas matplotlib scikit-learn jupyter -y

# Activate
conda activate ml_code

Environment Verification

Regardless of which method you chose, run this code to verify everything is set up correctly:

# verify_environment.py
import sys
print(f"✓ Python {sys.version.split()[0]}")

import numpy as np
print(f"✓ NumPy {np.__version__}")

import pandas as pd
print(f"✓ Pandas {pd.__version__}")

import matplotlib
print(f"✓ Matplotlib {matplotlib.__version__}")

import sklearn
print(f"✓ scikit-learn {sklearn.__version__}")

print("\n🎉 Environment ready for the ML Code Series!")

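Expected Output (exact version numbers depend on when you install, because the dependencies specify minimum versions only):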

✓ Python 3.12.12
✓ NumPy 2.3.5
✓ Pandas 2.3.3
✓ Matplotlib 3.10.8
✓ scikit-learn 1.8.0

🎉 Environment ready for the ML Code Series!
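
The verification script stops at the first failed import, so a single missing package hides the status of the rest. A variant that attempts every import and reports all gaps at once might look like the sketch below. The helper name `check_modules` is hypothetical, and the module list uses import names (scikit-learn is imported as `sklearn`).

```python
import importlib

# Import names (not pip names): scikit-learn is imported as `sklearn`.
MODULES = ("numpy", "pandas", "matplotlib", "sklearn")


def check_modules(names=MODULES):
    """Return {module_name: version string, or None if the import failed}."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None
        else:
            report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    results = check_modules()
    for name, version in results.items():
        mark = "✓" if version else "✗"
        print(f"{mark} {name} {version or 'MISSING'}")
    missing = [name for name, version in results.items() if version is None]
    if missing:
        print(f"\nMissing: {', '.join(missing)}")
    else:
        print("\nAll packages imported successfully.")
```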

Summary

| Method | Recommendation |
| --- | --- |
| Docker | ⭐ Recommended: guaranteed reproducibility |
| venv | Good alternative if Docker is not available |
| Conda | Good for data science workflows |
| System Python | ❌ Not recommended: may cause conflicts |
