stormlog.device_collectors
Backend-aware device memory collector abstractions.
Functions
- Build a backend collector for CUDA/ROCm/MPS runtime environments.
- Return the active torch runtime backend in this environment.
Classes
- CudaDeviceCollector: Collector for NVIDIA CUDA runtime memory counters.
- DeviceMemoryCollector: Backend-specific collector contract for device memory signals.
- DeviceMemorySample: Normalized device-memory sample produced by a backend collector.
- DeviceMemorySampleResult: Device-memory sample plus diagnostics about partial/core collection failures.
- MPSDeviceCollector: Collector for Apple Metal (MPS) runtime counters.
- ROCmDeviceCollector: Collector for ROCm runtimes surfaced through torch.cuda APIs.
- class stormlog.device_collectors.DeviceMemoryCollector
Bases:
ABC
Backend-specific collector contract for device memory signals.
- abstract is_available()
Return whether this collector can sample in the current runtime.
- Return type:
bool
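The contract above can be pictured with a small stand-in. This is a minimal sketch, not the library's code: `CollectorContract` and `AlwaysOffCollector` are hypothetical names, and only `is_available()` is modeled because it is the only abstract method documented here. The real base class may declare further abstract methods.

```python
from abc import ABC, abstractmethod


class CollectorContract(ABC):
    """Hypothetical stand-in for DeviceMemoryCollector (only is_available is modeled)."""

    @abstractmethod
    def is_available(self) -> bool:
        """Return whether this collector can sample in the current runtime."""


class AlwaysOffCollector(CollectorContract):
    """Hypothetical collector for a backend that is never present."""

    def is_available(self) -> bool:
        # A real collector would probe its runtime here.
        return False


collector = AlwaysOffCollector()
```

Because `is_available()` is abstract, the base class cannot be instantiated directly; each backend has to supply its own availability check.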
- class stormlog.device_collectors.DeviceMemorySample(allocated_bytes, reserved_bytes, used_bytes, free_bytes, total_bytes, active_bytes, inactive_bytes, device_id)
Bases:
object
Normalized device-memory sample produced by a backend collector.
- Parameters:
allocated_bytes (int)
reserved_bytes (int)
used_bytes (int)
free_bytes (int | None)
total_bytes (int | None)
active_bytes (int | None)
inactive_bytes (int | None)
device_id (int)
- allocated_bytes: int
- reserved_bytes: int
- used_bytes: int
- free_bytes: int | None
- total_bytes: int | None
- active_bytes: int | None
- inactive_bytes: int | None
- device_id: int
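Several fields are typed `int | None`, so consumers need to handle missing values. The sketch below mirrors the documented field names and types in a hypothetical `SampleStandIn` dataclass and adds a hypothetical helper, `used_fraction`, that returns `None` when the total is unknown. Neither name is part of stormlog.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SampleStandIn:
    """Hypothetical mirror of the documented DeviceMemorySample fields."""

    allocated_bytes: int
    reserved_bytes: int
    used_bytes: int
    free_bytes: Optional[int]
    total_bytes: Optional[int]
    active_bytes: Optional[int]
    inactive_bytes: Optional[int]
    device_id: int


def used_fraction(sample) -> Optional[float]:
    """Return used/total, or None when the backend did not report a total."""
    if sample.total_bytes is None or sample.total_bytes <= 0:
        return None
    return sample.used_bytes / sample.total_bytes


GIB = 1024 ** 3
full = SampleStandIn(GIB, 2 * GIB, 2 * GIB, 6 * GIB, 8 * GIB, None, None, 0)
sparse = SampleStandIn(GIB, 2 * GIB, 2 * GIB, None, None, None, None, 0)
```

Here `used_fraction(full)` evaluates to 0.25, while `used_fraction(sparse)` is `None` because no total was reported.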
- class stormlog.device_collectors.DeviceMemorySampleResult(sample, partial_fields=(), errors=<factory>, core_error=None)
Bases:
object
Device-memory sample plus diagnostics about partial/core collection failures.
- Parameters:
sample (DeviceMemorySample | None)
partial_fields (tuple[str, ...])
errors (dict[str, str])
core_error (str | None)
- sample: DeviceMemorySample | None
- partial_fields: tuple[str, ...] = ()
- errors: dict[str, str]
- core_error: str | None = None
- property is_partial: bool
- property is_core_failure: bool
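One way a caller might branch on these diagnostics is sketched below. The semantics of `is_partial` and `is_core_failure` are not documented on this page, so the stand-in assumes that a result is partial when `partial_fields` is non-empty and is a core failure when `core_error` is set. `ResultStandIn` and `summarize` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class ResultStandIn:
    """Hypothetical mirror of the documented DeviceMemorySampleResult fields."""

    sample: Optional[object]
    partial_fields: Tuple[str, ...] = ()
    errors: Dict[str, str] = field(default_factory=dict)
    core_error: Optional[str] = None

    @property
    def is_partial(self) -> bool:
        # Assumption: partial means at least one field could not be collected.
        return bool(self.partial_fields)

    @property
    def is_core_failure(self) -> bool:
        # Assumption: a core failure is signaled by a non-None core_error.
        return self.core_error is not None


def summarize(result) -> str:
    """Describe a result using only the documented attributes and properties."""
    if result.is_core_failure:
        return f"core failure: {result.core_error}"
    if result.is_partial:
        return "partial sample, missing: " + ", ".join(result.partial_fields)
    return "complete sample"


failed = ResultStandIn(sample=None, core_error="driver unavailable")
partial = ResultStandIn(
    sample=object(),
    partial_fields=("free_bytes",),
    errors={"free_bytes": "query not supported"},
)
```

Checking the core failure first keeps callers from reading a `sample` that may be `None`.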
- class stormlog.device_collectors.CudaDeviceCollector(device=None)
Bases:
DeviceMemoryCollector
Collector for NVIDIA CUDA runtime memory counters.
- Parameters:
device (Union[str, int, torch.device, None])
- telemetry_collector = 'stormlog.cuda_tracker'
- is_available()
Return whether this collector can sample in the current runtime.
- Return type:
bool
- class stormlog.device_collectors.ROCmDeviceCollector(device=None)
Bases:
CudaDeviceCollector
Collector for ROCm runtimes surfaced through torch.cuda APIs.
- Parameters:
device (Union[str, int, torch.device, None])
- telemetry_collector = 'stormlog.rocm_tracker'
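The inheritance shown above, with the ROCm collector deriving from the CUDA collector and documenting only a different `telemetry_collector` value, resembles a plain class-attribute override. A minimal sketch of that pattern follows; `CudaLike`, `RocmLike`, and `label` are hypothetical, and the real classes may override more than is listed on this page.

```python
class CudaLike:
    """Hypothetical stand-in for a CUDA-style collector."""

    telemetry_collector = "stormlog.cuda_tracker"

    def label(self) -> str:
        # Attribute lookup goes through the instance's class, so subclasses
        # that override the attribute change the label without new methods.
        return self.telemetry_collector


class RocmLike(CudaLike):
    """Hypothetical stand-in that reuses all behavior and changes only the tag."""

    telemetry_collector = "stormlog.rocm_tracker"


cuda_label = CudaLike().label()
rocm_label = RocmLike().label()
```

With this layout, samples gathered through the shared code path would still be attributable to the correct backend by their tag.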
- class stormlog.device_collectors.MPSDeviceCollector(device=None)
Bases:
DeviceMemoryCollector
Collector for Apple Metal (MPS) runtime counters.
- Parameters:
device (Union[str, int, torch.device, None])
- telemetry_collector = 'stormlog.mps_tracker'
- is_available()
Return whether this collector can sample in the current runtime.
- Return type:
bool
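Since every collector exposes `is_available()`, a caller could probe several candidates and keep the first usable one. The sketch below illustrates that idea with hypothetical stand-ins (`FakeCollector`, `first_available`); the candidate order is an assumption, and the module's own builder function may choose differently.

```python
from typing import Iterable, Optional


class FakeCollector:
    """Hypothetical collector whose availability is fixed at construction."""

    def __init__(self, telemetry_collector: str, available: bool) -> None:
        self.telemetry_collector = telemetry_collector
        self._available = available

    def is_available(self) -> bool:
        return self._available


def first_available(collectors: Iterable[FakeCollector]) -> Optional[FakeCollector]:
    """Return the first collector reporting availability, or None if none do."""
    for candidate in collectors:
        if candidate.is_available():
            return candidate
    return None


candidates = [
    FakeCollector("stormlog.cuda_tracker", available=False),
    FakeCollector("stormlog.mps_tracker", available=True),
]
chosen = first_available(candidates)
```

In this example `chosen` is the MPS stand-in. Returning `None` when no collector is available lets the caller decide whether to skip device sampling or raise.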