ML-00: Python Environment Setup for Machine Learning

Summary
Compare Python environment options and set up a reproducible ML environment using Docker for this blog series. We examine System Python, venv, Conda, and Docker, recommending Docker + VSCode Dev Containers for guaranteed reproducibility.

Learning Objectives

After reading this post, you will be able to:

  • Understand the pros/cons of different Python environment methods
  • Know why Docker is recommended for this series
  • Set up the ML environment using Docker (recommended) or alternatives
  • Run the first test to verify your environment

Theory

Python Environment Methods Comparison

There are several ways to manage Python environments. Here’s a quick comparison:

Method        | Description                     | Best For
System Python | Python installed directly on OS | Quick scripts, beginners
venv          | Built-in virtual environment    | Simple projects, lightweight
Conda         | Package + environment manager   | Data science, complex dependencies
Docker        | Containerized environment       | Reproducibility, team collaboration

[Figure: Python environment methods in order of increasing isolation, dependency management, and portability: System Python (simple but risky) → venv (lightweight isolation) → Conda (data science friendly) → Docker (full reproducibility)]

Method Details

System Python

# Just install packages globally
pip install numpy pandas scikit-learn

✅ Pros           | ❌ Cons
Zero setup        | Pollutes system Python
Works immediately | Version conflicts between projects
                  | Hard to reproduce on other machines

venv (Python Built-in)

# Create virtual environment
python -m venv ml_env

# Activate (Windows)
ml_env\Scripts\activate

# Activate (macOS/Linux)
source ml_env/bin/activate

# Install packages
pip install numpy pandas scikit-learn

# Save dependencies
pip freeze > requirements.txt

✅ Pros            | ❌ Cons
Built into Python  | Only isolates Python packages
Lightweight        | Doesn’t handle system libraries
Easy to understand | Python version tied to system
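
A quick way to confirm that activation worked is to ask the interpreter itself. The sketch below relies on the documented behavior that `sys.prefix` differs from `sys.base_prefix` inside a venv; the helper name `in_virtualenv` is only an illustration, not part of any library.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print(f"Interpreter: {sys.executable}")
    print(f"Inside a virtual environment: {in_virtualenv()}")
```

If this prints False after you ran the activate command, the shell is most likely still using the system interpreter.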

Conda (Anaconda/Miniconda)

# Create environment with specific Python version
conda create -n ml_env python=3.11

# Activate
conda activate ml_env

# Install packages
conda install numpy pandas scikit-learn

# Export environment
conda env export > environment.yml

✅ Pros                             | ❌ Cons
Manages Python versions             | Large installation size
Handles non-Python deps (CUDA, MKL) | Can be slow to resolve dependencies
Great for data science              | Environment files may not be portable
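
To check from Python which Conda environment is active, one option is to read the variables that `conda activate` exports. This is a minimal sketch that assumes `CONDA_DEFAULT_ENV` is set by activation; the function name `active_conda_env` and its `environ` parameter are hypothetical and exist only to keep the example testable.

```python
import os


def active_conda_env(environ=None):
    """Return the active Conda environment name, or None if none is active."""
    env = os.environ if environ is None else environ
    # `conda activate <name>` exports CONDA_DEFAULT_ENV=<name>;
    # an empty or missing value is treated as "no environment".
    return env.get("CONDA_DEFAULT_ENV") or None


if __name__ == "__main__":
    print(f"Active Conda environment: {active_conda_env()}")
```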

Docker

# Pull and run: everything included!
# Quoting the mount keeps project paths that contain spaces intact.
docker run -it -v "$(pwd):/app" ml-blog:latest python

✅ Pros                                | ❌ Cons
100% reproducible                      | Learning curve for beginners
Isolates everything (OS, libs, Python) | Slight overhead
Same environment everywhere            | Requires Docker installation
Easy to share and deploy               |

Why Docker for This Series?

[Figure: You, your classmate, and the production server all run the same Docker container, which gives the same Python 3.12, the same NumPy 2.x, the same scikit-learn 1.x, and therefore the same results.]

💡 The Goal: When you run the code from this blog series, you get exactly the same results as everyone else — no “it works on my machine” surprises!
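
One way to make "the same environment" checkable is to have each machine print a short summary of its interpreter and package versions and compare the output. The sketch below is an illustration under stated assumptions: the package list, the function names `environment_report` and `fingerprint`, and the 12-character hash length are all choices made for this example.

```python
import hashlib
import json
import platform
from importlib import metadata

# Packages to report on; adjust this list to your own project.
PACKAGES = ["numpy", "pandas", "matplotlib", "scikit-learn"]


def environment_report(packages=PACKAGES) -> dict:
    """Collect the Python version, platform, and installed package versions."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": platform.python_version(),
        "system": platform.system(),
        "machine": platform.machine(),
        "packages": versions,
    }


def fingerprint(report: dict) -> str:
    """Hash the report so two machines can compare one short string."""
    canonical = json.dumps(report, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


if __name__ == "__main__":
    report = environment_report()
    print(json.dumps(report, indent=2))
    print("fingerprint:", fingerprint(report))
```

Two people running the same container image should see the same fingerprint; a mismatch points to a differing Python or package version worth inspecting in the full report.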

Docker Installation (Windows)

Step 1: Install Docker Desktop

  1. Download Docker Desktop from: docker.com/products/docker-desktop
  2. Run the installer and follow the prompts
  3. During installation, ensure “Use WSL 2 based engine” is checked
  4. Restart your computer if prompted

Step 2: Verify Installation

Open PowerShell and run:

docker --version
docker run hello-world

If you see “Hello from Docker!”, you’re ready to go!
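
The same check can be scripted, which is handy if you later want a setup script to fail early when Docker is missing. This is a minimal sketch using only the standard library; `docker_version` is a hypothetical helper name, and it only confirms that the CLI responds, not that the Docker engine is running.

```python
import shutil
import subprocess


def docker_version():
    """Return the output of `docker --version`, or None if Docker is unavailable."""
    docker = shutil.which("docker")
    if docker is None:
        return None  # the docker CLI is not on PATH
    try:
        result = subprocess.run(
            [docker, "--version"],
            capture_output=True,
            text=True,
            timeout=30,
            check=True,
        )
    except (subprocess.SubprocessError, OSError):
        return None  # the CLI exists but could not be run successfully
    return result.stdout.strip()


if __name__ == "__main__":
    version = docker_version()
    print(version if version else "Docker CLI not found")
```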

💡 Tip: If WSL2 is not installed, Docker Desktop will prompt you to install it.

Code Practice

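Option A: Using Docker with VSCode Dev Containers (Recommended)

VSCode Dev Containers provide the best development experience for this series: the editor attaches to the container, so Python, the terminal, and the preconfigured extensions all run inside it.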

Step 1: Install VSCode Extension

Install the Dev Containers extension in VSCode:

  • Extension ID: ms-vscode-remote.remote-containers

Step 2: Create Configuration Files

Create a .devcontainer folder in your project root with these files:

.devcontainer/devcontainer.json:

{
  "name": "Python 3.12 ML",
  "image": "python:3.12-slim",

  "customizations": {
    "vscode": {
      "settings": {
        "python.defaultInterpreterPath": "/usr/local/bin/python"
      },
      "extensions": [
        "ms-python.python",
        "ms-python.debugpy",
        "ms-toolsai.jupyter"
      ]
    }
  },

  "features": {
    "ghcr.io/devcontainers/features/git:1": {}
  },

  "postCreateCommand": "pip install --upgrade pip && pip install -r requirements.txt",

  "remoteUser": "root"
}
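
A typo in this file usually surfaces only when the container build fails, so it can help to sanity-check it first. The sketch below parses the text as strict JSON and looks for the keys that the configuration above relies on. The helper `check_devcontainer` and the `REQUIRED_KEYS` list are assumptions made for this example, not an official schema. Note also that `devcontainer.json` may legally contain comments, which strict `json.loads` would reject.

```python
import json

# Keys used by the configuration in this post (not an official required set).
REQUIRED_KEYS = ("name", "image", "postCreateCommand")


def check_devcontainer(text: str) -> list:
    """Return a list of problems found in a devcontainer.json string."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(config, dict):
        return ["top level must be a JSON object"]
    return [f"missing key: {key}" for key in REQUIRED_KEYS if key not in config]


if __name__ == "__main__":
    sample = '{"name": "Python 3.12 ML", "image": "python:3.12-slim"}'
    print(check_devcontainer(sample))
```

To check a real file, read it with `pathlib.Path(".devcontainer/devcontainer.json").read_text()` and pass the result to the function.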

requirements.txt (in project root):

# ML Course Dependencies
numpy>=1.24.0
pandas>=2.0.0
matplotlib>=3.7.0
scikit-learn>=1.3.0
jupyter>=1.0.0
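
Because these entries set minimum versions, two installs made at different times can resolve to different releases. If you want to record exactly what your environment ended up with, something like the following sketch can produce pinned lines, similar in spirit to `pip freeze` but limited to the listed packages. The function name `pinned_requirements` is illustrative only.

```python
from importlib import metadata

# The packages listed in requirements.txt above.
PACKAGES = ["numpy", "pandas", "matplotlib", "scikit-learn", "jupyter"]


def pinned_requirements(packages=PACKAGES) -> list:
    """Return 'name==version' lines for each listed package that is installed."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            continue  # skip packages that are not installed here
    return lines


if __name__ == "__main__":
    print("\n".join(pinned_requirements()))
```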

Step 3: Open in Container

  1. Open your project folder in VSCode
  2. Press F1 and select “Dev Containers: Reopen in Container”
  3. Wait for the container to build (first time takes a few minutes)
  4. You’re now working inside the container!

[Figure: Your project opens in VSCode, which connects to the Dev Container providing Python 3.12, all ML libraries, and Jupyter support.]
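
If you are unsure whether a terminal or notebook is really running inside the container, a couple of heuristics can help. This sketch assumes two things that are not stated in this post: that Docker creates a `/.dockerenv` file in the container's root, and that the Dev Containers extension sets `REMOTE_CONTAINERS=true`. Treat both as hints rather than guarantees; `container_hints` is a hypothetical helper.

```python
import os
from pathlib import Path


def container_hints(environ=None) -> dict:
    """Report heuristic signs that the code runs inside a Dev Container."""
    env = os.environ if environ is None else environ
    return {
        # Assumption: Docker places this marker file in the container root.
        "dockerenv_file": Path("/.dockerenv").exists(),
        # Assumption: the Dev Containers extension exports this variable.
        "dev_containers_env": env.get("REMOTE_CONTAINERS", "").lower() == "true",
    }


if __name__ == "__main__":
    for hint, present in container_hints().items():
        print(f"{hint}: {present}")
```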

Option B: Using venv (Without Docker)

If you prefer not to use Docker:

# Create the environment
python -m venv ml_env

# Activate (Windows)
ml_env\Scripts\activate

# Activate (macOS/Linux)
source ml_env/bin/activate

# Install dependencies
pip install numpy pandas matplotlib scikit-learn jupyter

Option C: Using Conda (Without Docker)

# Create environment
conda create -n ml_code python=3.12 numpy pandas matplotlib scikit-learn jupyter -y

# Activate
conda activate ml_code

Environment Verification

Run this code to verify your setup is correct:

# verify_environment.py
import sys
print(f"✓ Python {sys.version.split()[0]}")

import numpy as np
print(f"✓ NumPy {np.__version__}")

import pandas as pd
print(f"✓ Pandas {pd.__version__}")

import matplotlib
print(f"✓ Matplotlib {matplotlib.__version__}")

import sklearn
print(f"✓ scikit-learn {sklearn.__version__}")

print("\n🎉 Environment ready for the ML Code Series!")

Expected Output (your exact version numbers may differ depending on when you install):

✓ Python 3.12.12
✓ NumPy 2.3.5
✓ Pandas 2.3.3
✓ Matplotlib 3.10.8
✓ scikit-learn 1.8.0

🎉 Environment ready for the ML Code Series!
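
The script above stops with a traceback at the first missing import. If you would rather see the status of every package in one pass, a variant along these lines may be useful. It is a sketch, not a replacement for the script above: `check_modules` and the `MODULES` mapping are names chosen for this example, and the status labels are plain ASCII so the output works in any terminal.

```python
import importlib
import sys

# Display name -> import name for the libraries used in this series.
MODULES = {
    "NumPy": "numpy",
    "Pandas": "pandas",
    "Matplotlib": "matplotlib",
    "scikit-learn": "sklearn",
}


def check_modules(modules=MODULES) -> dict:
    """Map each display name to its version string, or None if the import fails."""
    results = {}
    for label, module_name in modules.items():
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            results[label] = None
        else:
            results[label] = getattr(module, "__version__", "unknown")
    return results


def main() -> int:
    """Print one status line per library and return the number missing."""
    print(f"Python {sys.version.split()[0]}")
    results = check_modules()
    for label, version in results.items():
        status = "OK" if version else "MISSING"
        print(f"[{status}] {label} {version or ''}".rstrip())
    return sum(1 for version in results.values() if version is None)


if __name__ == "__main__":
    main()
```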

Summary

Method | Recommendation
Docker | ⭐ Recommended: guaranteed reproducibility
venv   | Good alternative if Docker not available
Conda  | Good for data science workflows
System | ❌ Not recommended: may cause conflicts

References