PyTorch Guide

This guide is for the current PyTorch-backed Stormlog workflow: profiling torch.cuda workloads, tracking long-running jobs across supported backends, and exporting artifacts for later diagnostics.

Before you start

Validate the environment:

gpumemprof info
python -m examples.basic.pytorch_demo

python -m examples.basic.pytorch_demo is source-checkout only. If you installed from PyPI, use gpumemprof info, gpumemprof track, and the Python snippets in the Usage Guide instead.

If no supported torch.cuda backend is available, the example script will skip the bounded profiling path. In that case, use the tracker, CLI, or CPU-only flows in the CPU Compatibility Guide.

Daily workflow: ML engineer

Use this when you want to understand the memory cost of a training or inference step.

import torch
from stormlog import GPUMemoryProfiler

profiler = GPUMemoryProfiler(track_tensors=True)
device = profiler.device
model = torch.nn.Linear(1024, 256).to(device)

def train_step() -> torch.Tensor:
    x = torch.randn(64, 1024, device=device)
    y = model(x)
    return y.sum()

profile = profiler.profile_function(train_step)
summary = profiler.get_summary()

print(profile.function_name)
print(f"Peak memory: {summary['peak_memory_usage'] / (1024**3):.2f} GB")

Use profile_context when you want one named block rather than one function call:

import torch
from stormlog import GPUMemoryProfiler

profiler = GPUMemoryProfiler()
device = profiler.device
model = torch.nn.Linear(1024, 256).to(device)

with profiler.profile_context("forward_pass"):
    outputs = model(torch.randn(32, 1024, device=device))

Daily workflow: debugging growth over time

Use the tracker when one profiled call is not enough. This is the backend-aware PyTorch path for CUDA, ROCm, and MPS telemetry.

from stormlog import MemoryTracker

tracker = MemoryTracker(
    sampling_interval=0.5,
    enable_alerts=True,
)

tracker.start_tracking()

# run workload here

tracker.stop_tracking()
stats = tracker.get_statistics()
print(stats["total_events"])
print(stats["peak_memory"])

If you want exports without writing custom code, use the CLI instead:

gpumemprof track --duration 30 --interval 0.5 --output track.json --format json
gpumemprof analyze track.json --format txt --output analysis.txt
gpumemprof diagnose --duration 0 --output ./diag_bundle

Daily workflow: release or CI triage

Use these commands when you need a reproducible signal rather than an ad hoc notebook session:

Source checkout only. These commands require the repository examples/ package.

python -m examples.cli.quickstart
python -m examples.cli.capability_matrix --mode smoke --target both --oom-mode simulated

If the run produces telemetry or a diagnose bundle, load the artifacts into the TUI Diagnostics tab to compare ranks or inspect anomaly indicators.

TUI-assisted PyTorch workflow

The current TUI PyTorch-related tabs are:

PyTorch for sample workloads and collected profile summaries
Monitoring for live tracking and exports
Visualizations for timeline export
Diagnostics for artifact review

Launch:

pip install "stormlog[tui,torch]"
stormlog

Recommended validation sequence

Use this when changing PyTorch behavior or reviewing a regression:

Source checkout only. Replace the example-module steps below with the CLI-only validation from the CLI Guide if you installed from PyPI.

gpumemprof info
python -m examples.basic.pytorch_demo
python -m examples.advanced.tracking_demo
gpumemprof track --duration 10 --interval 0.5 --output track.json --format json
gpumemprof analyze track.json --format txt --output analysis.txt
gpumemprof diagnose --duration 0 --output ./diag_bundle

Common issues

`GPUMemoryProfiler` raises because no `torch.cuda` backend is available

That is expected on CPU-only or MPS-only hosts. Use MemoryTracker, the CLI, or the CPU profiler classes instead.

`gpumemprof analyze` rejects `--input`

The current CLI uses a positional input file:

gpumemprof analyze track.json --format txt --output analysis.txt

Plot export fails

Install the visualization extra:

pip install "stormlog[viz]"