[← Back to main docs](index.md)

# GPU Stack Installation & CUDA Enablement

Use this guide to install the GPU-capable versions of PyTorch and TensorFlow,
verify CUDA availability, and switch between CPU and GPU smoke tests.

## 1. Prerequisites

- **NVIDIA driver** that supports the CUDA version you plan to use. Update
  through GeForce Experience or the NVIDIA App (Windows) or your distro’s
  driver manager (Linux).
- **CUDA toolkit** (optional). Most pip wheels bundle the needed runtime, but
  native builds may require the toolkit/`nvcc`.
- **Python 3.10–3.12** in a virtual environment.
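
The prerequisites above can be checked from Python before anything is installed. The following is a minimal sketch, not part of the project: the helper names are hypothetical, the version bounds are copied from the list above, and finding `nvidia-smi` on `PATH` only suggests that a driver is installed, not that it supports a given CUDA version.

```python
import shutil
import sys

# Supported interpreter range, taken from the prerequisites list.
MIN_PYTHON = (3, 10)
MAX_PYTHON = (3, 12)


def python_supported(version=None):
    """Return True if the (major, minor) version falls within 3.10-3.12."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_PYTHON <= major_minor <= MAX_PYTHON


def in_virtualenv():
    """Return True inside a venv/virtualenv (conda environments are not detected)."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def driver_tool_path():
    """Return the path to nvidia-smi, or None if it is not on PATH."""
    return shutil.which("nvidia-smi")


if __name__ == "__main__":
    print("Python supported:", python_supported())
    print("Virtual environment:", in_virtualenv())
    print("nvidia-smi:", driver_tool_path() or "not found on PATH")
```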

## 2. PyTorch (CUDA) Installation

Pick the CUDA build that matches your driver/toolkit. Example for CUDA 12.1:

```bash
# Windows / Linux (CUDA wheels are not published for macOS)
pip install torch==2.2.2 --index-url https://download.pytorch.org/whl/cu121
```

Need CUDA 11.8 instead? Swap the wheel URL:

```bash
pip install torch==2.2.2 --index-url https://download.pytorch.org/whl/cu118
```
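
Both commands differ only in the `cuXYZ` suffix of the index URL. As an illustration, the sketch below derives that suffix from a `MAJOR.MINOR` CUDA version. The helper names are hypothetical, and the code does not check that a wheel index exists for a given combination; only `cu121` and `cu118` are shown in this guide.

```python
import re

PYTORCH_WHEEL_BASE = "https://download.pytorch.org/whl"


def cuda_tag(cuda_version):
    """Convert a CUDA version such as '12.1' into a wheel tag such as 'cu121'."""
    match = re.fullmatch(r"(\d+)\.(\d+)", cuda_version.strip())
    if match is None:
        raise ValueError(f"expected MAJOR.MINOR, got {cuda_version!r}")
    return f"cu{match.group(1)}{match.group(2)}"


def pip_install_args(cuda_version, torch_version="2.2.2"):
    """Build the pip argument list for a CUDA-specific PyTorch wheel."""
    index_url = f"{PYTORCH_WHEEL_BASE}/{cuda_tag(cuda_version)}"
    return ["pip", "install", f"torch=={torch_version}", "--index-url", index_url]


if __name__ == "__main__":
    # Prints the same two commands shown above.
    print(" ".join(pip_install_args("12.1")))
    print(" ".join(pip_install_args("11.8")))
```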

### Verify PyTorch CUDA

```bash
python - <<'PY'
import torch
print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
PY
```
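
When `torch.cuda.is_available()` prints `False`, it helps to know whether a CPU-only wheel was installed or whether the driver is the problem. The sketch below (a hypothetical helper, not part of the project) also reports `torch.version.cuda`, which is `None` for builds without CUDA support, and returns early if PyTorch is not installed.

```python
import importlib.util


def torch_cuda_report():
    """Summarize the installed PyTorch build without failing if it is absent."""
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}

    import torch

    report = {
        "installed": True,
        "torch": torch.__version__,
        # None means the wheel was built without CUDA support.
        "built_for_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        "devices": [],
    }
    if report["cuda_available"]:
        report["devices"] = [
            torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())
        ]
    return report


if __name__ == "__main__":
    for key, value in torch_cuda_report().items():
        print(f"{key}: {value}")
```

A CUDA version under `built_for_cuda` combined with `cuda_available: False` usually points at the driver rather than the wheel.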

## 3. TensorFlow (GPU) Installation

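TensorFlow ≥2.11 ships a unified wheel, but it uses the GPU only when the
installed TensorFlow build matches the host's driver/CUDA/cuDNN stack.
Install a TensorFlow version that is compatible with the host you are bringing
up. On Linux, recent releases also offer the `tensorflow[and-cuda]` extra,
which installs matching CUDA libraries from pip. On native Windows, GPU support
ended with TensorFlow 2.10, so use WSL2 for newer releases. Example: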

```bash
pip install tensorflow
```

### Verify TensorFlow CUDA

```bash
python - <<'PY'
import tensorflow as tf
print("TensorFlow:", tf.__version__)
gpus = tf.config.list_physical_devices("GPU")
print("GPUs:", gpus)
if not gpus:
    raise SystemExit("No GPU devices detected")
with tf.device("/GPU:0"):
    a = tf.random.normal((2048, 2048))
    b = tf.random.normal((2048, 2048))
    _ = tf.reduce_mean(tf.matmul(a, b)).numpy()
print("GPU matmul ok")
PY
```
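
If no GPU is detected, the CUDA and cuDNN versions that the wheel was built against are a useful starting point for aligning the stack. The sketch below is a hypothetical helper that reads them from `tf.sysconfig.get_build_info()`. It assumes the keys `is_cuda_build`, `cuda_version`, and `cudnn_version`, which may be missing on CPU-only builds, so every lookup uses `.get`.

```python
import importlib.util


def tf_build_report():
    """Summarize the installed TensorFlow build without failing if it is absent."""
    if importlib.util.find_spec("tensorflow") is None:
        return {"installed": False}

    import tensorflow as tf

    build = tf.sysconfig.get_build_info()
    return {
        "installed": True,
        "tensorflow": tf.__version__,
        # Keys may be missing on CPU-only builds, hence the defaults.
        "is_cuda_build": build.get("is_cuda_build", False),
        "cuda_version": build.get("cuda_version"),
        "cudnn_version": build.get("cudnn_version"),
        "gpus": [d.name for d in tf.config.list_physical_devices("GPU")],
    }


if __name__ == "__main__":
    for key, value in tf_build_report().items():
        print(f"{key}: {value}")
```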

## 4. Switching Between CPU and GPU Runs

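- **CPU-only smoke test:** set `CUDA_VISIBLE_DEVICES` to an empty string (for
  example, `export CUDA_VISIBLE_DEVICES=""` in a POSIX shell) so that no CUDA
  device is visible, then run: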

  ```bash
  gpumemprof info
  gpumemprof track --duration 10 --interval 0.5 --output track.json --format json
  gpumemprof analyze track.json --format txt --output analysis.txt
  gpumemprof diagnose --duration 0 --output ./diag
  ```

  On Apple Silicon, `CUDA_VISIBLE_DEVICES` does not affect the Metal backend,
  so `gpumemprof info` may still report the `mps` backend. Treat this as a
  non-CUDA smoke test rather than a strictly CPU-only run.

- **GPU path:** unset `CUDA_VISIBLE_DEVICES` (or set it to a GPU index) and run
  one known-good PyTorch workload plus the TensorFlow GPU matmul check:

  ```bash
  python -m examples.basic.pytorch_demo
  python - <<'PY'
  import tensorflow as tf
  from stormlog.tensorflow import TFMemoryProfiler

  profiler = TFMemoryProfiler(device="/GPU:0", enable_tensor_tracking=True)
  with profiler.profile_context("matmul_step"):
      a = tf.random.normal((4096, 4096))
      b = tf.random.normal((4096, 4096))
      c = tf.matmul(a, b)
      _ = tf.reduce_mean(c).numpy()

  results = profiler.get_results()
  print(f"Peak memory: {results.peak_memory_mb:.2f} MB")
  print(f"Snapshots captured: {len(results.snapshots)}")
  PY
  ```

  After this path is clean, you can run `python -m examples.basic.tensorflow_demo`,
  the training-backed example that requires a source checkout.
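
The two paths differ only in the value of `CUDA_VISIBLE_DEVICES` that the child process sees. The sketch below builds those environments in Python. It is an illustration only: the helper names are hypothetical, and it relies on standard CUDA behavior, where an empty value hides every device and an unset variable exposes all of them.

```python
import os


def device_env(gpu_index=None, base_env=None):
    """Return an environment copy for a non-CUDA run or a pinned-GPU run.

    gpu_index=None hides every CUDA device (empty CUDA_VISIBLE_DEVICES);
    an integer exposes only that device.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = "" if gpu_index is None else str(gpu_index)
    return env


def all_gpus_env(base_env=None):
    """Return an environment copy with CUDA_VISIBLE_DEVICES unset (all GPUs visible)."""
    env = dict(os.environ if base_env is None else base_env)
    env.pop("CUDA_VISIBLE_DEVICES", None)
    return env


if __name__ == "__main__":
    print("non-CUDA:", repr(device_env()["CUDA_VISIBLE_DEVICES"]))
    print("GPU 0:", repr(device_env(0)["CUDA_VISIBLE_DEVICES"]))
    print("all GPUs:", repr(all_gpus_env().get("CUDA_VISIBLE_DEVICES")))
```

Pass the returned dictionary as `env=` to `subprocess.run` to launch either smoke test without changing the parent shell.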

## 5. Common Issues

| Symptom | Fix |
| --- | --- |
| `torch.cuda.is_available()` returns `False` | Confirm the NVIDIA driver is installed, reinstall PyTorch from the correct CUDA wheel index, or reboot after a driver install. |
| TensorFlow sees `/GPU:0` but training-backed ops fail | Confirm the GPU matmul check above succeeds first, then align the TensorFlow/CUDA/cuDNN stack before moving to Keras or cuDNN-backed demos. |
| TensorFlow cannot find cuDNN | Install a TensorFlow build that matches the current CUDA/cuDNN stack and rerun the GPU matmul check before using training-backed examples. |
| `RuntimeError: CUDA driver not found` | Check that `nvidia-smi` works on the command line; reinstall the driver if necessary. |
| CI path installing wrong framework | Follow `.github/workflows/ci.yml` logic: install the base deps, then exactly one framework (PyTorch or TensorFlow). |

## 6. Next Steps

Once the GPU stack is working, run the Textual TUI for an interactive check:

```bash
pip install "stormlog[tui,torch]"
stormlog
```

Use the overview tab to confirm the profiler sees your GPUs before running the
full benchmarking or release checklist.