← Back to main docs

Troubleshooting Guide

This guide focuses on the failure modes that show up in the current codebase and workflows.

Installation and entrypoints

gpumemprof: command not found

Reinstall the package into the active environment:

pip install -e .
hash -r
gpumemprof --help

tfmemprof: command not found

Install the TensorFlow extra:

pip install -e ".[tf]"
hash -r
tfmemprof --help

stormlog: command not found

Install the TUI dependencies:

pip install -e ".[tui,torch]"
hash -r
stormlog

Missing dependencies

ModuleNotFoundError: No module named 'torch'

Install the PyTorch extra instead of trying to use CUDA-specific profiling without the framework:

pip install "stormlog[torch]"

ModuleNotFoundError: No module named 'tensorflow'

pip install "stormlog[tf]"

Visualization export errors

PNG and HTML exports depend on the visualization stack:

pip install "stormlog[viz]"

Runtime mismatches

GPUMemoryProfiler fails on a non-CUDA machine

That class is for CUDA-backed PyTorch profiling. On CPU-only or MPS-only systems, use:

  • gpumemprof monitor

  • gpumemprof track

  • CPUMemoryProfiler

  • CPUMemoryTracker

If you need setup guidance for real CUDA profiling, see the GPU Setup Guide.

TensorFlow CLI is installed but reports no GPU

Start with:

tfmemprof info

If it still reports no GPU devices, treat it as an environment problem first:

  • TensorFlow build may be CPU-only

  • device visibility may be restricted

  • the current host may genuinely be CPU-only

The profiler still supports CPU-backed TensorFlow runs.

TUI issues

Monitoring starts but Visualizations stays empty

The Visualizations tab only renders after timeline samples exist.

Use this sequence:

  1. open Monitoring

  2. click Start Live Tracking

  3. let the workload run long enough to create samples

  4. open Visualizations

  5. click Refresh Timeline

Diagnostics loads but shows no rank data

Check the source you loaded:

  • live diagnostics require an active tracker session with telemetry events

  • artifact diagnostics require real JSON, CSV, or diagnose paths

  • after changing artifact paths, click Refresh

PNG or HTML export appears blank

This usually means there were no timeline samples, not that the export code failed.

Validate in order:

  1. start tracking

  2. confirm the monitoring log is receiving events

  3. refresh the Visualizations tab

  4. export again

The TUI layout looks broken

The app can run in a small terminal, but it is easier to use with a wider window. The deterministic snapshot coverage uses roughly 140x44.

CLI workflow issues

gpumemprof analyze rejects --input

That is expected. The current PyTorch CLI uses a positional input file:

gpumemprof analyze track.json --format txt --output analysis.txt

tfmemprof analyze rejects the positional input style

That is also expected. The current TensorFlow CLI uses --input:

tfmemprof analyze --input tf_monitor.json --detect-leaks --optimize

Diagnose bundle feels too slow

Use --duration 0 when you only need the bundle structure and not a new sampling window:

gpumemprof diagnose --duration 0 --output ./diag_bundle
tfmemprof diagnose --duration 0 --output ./tf_diag

CI and docs issues

Sphinx build fails locally

Install the docs extra and rebuild:

pip install -e ".[docs]"
python3 -m sphinx -W --keep-going -b html docs docs/_build/html

A docs snippet looks suspicious

Use the code and --help output as the source of truth, then run:

python3 -m pytest tests/test_docs_regressions.py -v