TensorFlow Guide

This guide covers the current TensorFlow workflow in Stormlog: profiling TensorFlow code directly, tracking TensorFlow memory usage from the CLI, and exporting artifacts for later review.

Before you start

Validate the environment:

tfmemprof info

If you are bringing up a GPU runtime, start with the workload-backed matmul recipe in TensorFlow Production Recipes before using training-backed examples.

python -m examples.basic.tensorflow_demo is source-checkout only and exercises TensorFlow’s training path. Use it after the runtime is aligned for training- backed ops, not as the first GPU smoke test.

These checks work on CPU-backed TensorFlow installs as well as GPU-backed ones.

Daily workflow: ML engineer

Use TFMemoryProfiler when you want snapshots and aggregate results around a real TensorFlow workload.

from stormlog.tensorflow import TFMemoryProfiler

profiler = TFMemoryProfiler(enable_tensor_tracking=True)

with profiler.profile_context("training"):
    model.fit(x_train, y_train, epochs=1, batch_size=32)

results = profiler.get_results()
print(f"Peak memory: {results.peak_memory_mb:.2f} MB")
print(f"Snapshots captured: {len(results.snapshots)}")

Decorator-style profiling also exists through the TensorFlow profiler API, but the context-manager flow above is the clearest daily-workflow path.

Daily workflow: investigate sustained growth

The current TensorFlow CLI is the simplest way to capture longer-running telemetry:

tfmemprof monitor --interval 0.5 --duration 30 --output tf_monitor.json
tfmemprof track --interval 0.5 --threshold 4096 --output tf_track.json
tfmemprof analyze --input tf_monitor.json --detect-leaks --optimize --report tf_report.txt
tfmemprof diagnose --duration 0 --output ./tf_diag

For CPU-backed TensorFlow or when the GPU backend is unavailable, add --device /CPU:0 to monitor and track.

Key difference from gpumemprof:

TensorFlow analysis uses --input
PyTorch analysis uses a positional file argument

Daily workflow: release or CI triage

Use the maintained example and scenario modules instead of inventing a one-off shell script:

Source checkout only. These commands require the repository examples/ package.

python -m examples.cli.quickstart
python -m examples.scenarios.tf_end_to_end_scenario
python -m examples.cli.capability_matrix --mode smoke --target both --oom-mode simulated

This is the fastest route to a reproducible TensorFlow artifact set.

TUI-assisted TensorFlow workflow

The current TUI can help after or during a TensorFlow run:

TensorFlow tab for sample workloads and collected summaries
Monitoring for live tracking
Visualizations for plot export
Diagnostics for artifact review
CLI & Actions for command-driven flows

Launch:

pip install "stormlog[tui,torch]"
stormlog

The current TUI startup path imports PyTorch immediately, so TensorFlow-only environments still need the torch extra for stormlog to launch.

Recommended validation sequence

Use this when you need a compact TensorFlow confidence pass:

tfmemprof info
tfmemprof monitor --interval 0.5 --duration 15 --output tf_monitor.json
tfmemprof analyze --input tf_monitor.json --detect-leaks --optimize --report tf_report.txt
tfmemprof diagnose --duration 0 --output ./tf_diag

If you are validating a GPU-backed runtime, run the /GPU:0 matmul recipe from TensorFlow Production Recipes before this sequence. If you want the source-checkout training-backed example after that, run python -m examples.basic.tensorflow_demo.

Common issues

`tfmemprof` is installed but no GPU devices appear

Run:

tfmemprof info

If it still shows no GPU devices, treat it as an environment issue first. The TensorFlow flow still supports CPU-backed runs.

`tfmemprof analyze` rejects positional input

That is expected. Use:

tfmemprof analyze --input tf_monitor.json --detect-leaks --optimize

Plot export fails

Install the visualization extra:

pip install "stormlog[viz]"