Latest in Continuous Profiling for GPUs

Attending ai.engineer 2025 in San Francisco

June 4, 2025

This week we're attending the ai.engineer conference in San Francisco.

We are thrilled to have a booth! Please join us at booth S18.

Our continuous profiling for GPUs product now collects more metrics: detailed clock speed, temperature, and PCIe throughput data. Together they provide deeper context on performance and potential bottlenecks.

Expanded Metric Collection

Clock Speed Metrics

GPUs throttle aggressively when they hit their temperature limits (typically 83-87°C). You'll see clock speeds drop in steps, sometimes by 200-300 MHz, which directly reduces frame rates and rendering performance.
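To make that throttling pattern concrete, here is a minimal sketch of how you might flag it in sampled metrics. The `Sample` type, the `find_throttle_events` helper, and the 83°C and 200 MHz defaults are illustrative assumptions based on the ranges above, not part of the product's API.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One hypothetical metric sample for a single GPU."""
    timestamp_s: float
    temperature_c: float
    sm_clock_mhz: float


def find_throttle_events(samples, temp_limit_c=83.0, min_drop_mhz=200.0):
    """Return (timestamp, drop) pairs where the clock fell by at least
    min_drop_mhz since the previous sample while the GPU was at or
    above temp_limit_c."""
    events = []
    for prev, cur in zip(samples, samples[1:]):
        drop = prev.sm_clock_mhz - cur.sm_clock_mhz
        if cur.temperature_c >= temp_limit_c and drop >= min_drop_mhz:
            events.append((cur.timestamp_s, drop))
    return events


samples = [
    Sample(0.0, 78.0, 1900.0),
    Sample(1.0, 82.0, 1900.0),
    Sample(2.0, 85.0, 1650.0),  # hot and a 250 MHz step down: flagged
    Sample(3.0, 86.0, 1635.0),  # hot, but only a 15 MHz change
    Sample(4.0, 79.0, 1400.0),  # large drop while cool: not thermal
]
events = find_throttle_events(samples)
print(events)
```

Requiring both conditions keeps ordinary clock changes, such as a GPU going idle, from being counted as thermal throttling.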

Temperature Metrics

Modern GPUs operate safely up to specific temperature limits, typically 83-87°C for most consumer cards. Knowing these thresholds helps you recognize when your card is getting close to the point where it starts to throttle.

PCIe Throughput Metrics

Modern high-end GPUs can saturate PCIe bandwidth in specific scenarios, such as large transfers between host and device memory. Monitoring PCIe throughput reveals when the bus, rather than GPU processing power, is the limiting factor.
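As a rough illustration of what "saturating PCIe" means in numbers, the sketch below compares an observed throughput against the theoretical per-direction bandwidth of the link. The per-lane figures follow from the PCIe signalling rates (8, 16, and 32 GT/s with 128b/130b encoding). The function names and the 80% threshold are assumptions chosen for the example, and real links top out somewhat below the theoretical peak because of protocol overhead.

```python
# Approximate theoretical bandwidth per lane, per direction, in GB/s.
# Gen3, Gen4, Gen5: 8, 16, 32 GT/s * (128 / 130) / 8 bits per byte.
PCIE_GB_PER_S_PER_LANE = {3: 0.985, 4: 1.969, 5: 3.938}


def pcie_link_utilization(observed_gb_per_s, generation, lanes):
    """Fraction of the theoretical link bandwidth currently in use."""
    peak = PCIE_GB_PER_S_PER_LANE[generation] * lanes
    return observed_gb_per_s / peak


def is_pcie_bound(observed_gb_per_s, generation, lanes, threshold=0.8):
    """Heuristic: treat the bus as the likely bottleneck once utilization
    reaches the threshold (0.8 is an assumed cut-off, not a standard)."""
    return pcie_link_utilization(observed_gb_per_s, generation, lanes) >= threshold


# A Gen4 x16 link peaks at roughly 31.5 GB/s per direction.
print(round(pcie_link_utilization(27.0, generation=4, lanes=16), 2))
print(is_pcie_bound(27.0, generation=4, lanes=16))
print(is_pcie_bound(5.0, generation=4, lanes=16))
```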

Per-Process Utilization Metrics

Process-level GPU utilization tracking in PyTorch environments gives granular visibility into how individual training processes, data loaders, and model components consume GPU resources. That makes it possible to optimize machine learning workflows precisely.

Drill down by node, GPU, and process ID (pid).
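Conceptually, that drill-down is a group-by over utilization samples. The following is a minimal sketch under assumed inputs: each sample is a hypothetical `(node, gpu_index, pid, utilization_percent)` tuple, and `mean_utilization_by_process` is an illustrative helper rather than a product API.

```python
from collections import defaultdict


def mean_utilization_by_process(samples):
    """Average GPU utilization per (node, gpu_index, pid) key."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum, count]
    for node, gpu_index, pid, utilization in samples:
        entry = totals[(node, gpu_index, pid)]
        entry[0] += utilization
        entry[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Hypothetical samples: a training process and a lighter helper process.
samples = [
    ("node-a", 0, 4121, 90.0),
    ("node-a", 0, 4121, 70.0),
    ("node-a", 0, 4388, 10.0),
    ("node-a", 1, 4121, 50.0),
]
for key, mean in sorted(mean_utilization_by_process(samples).items()):
    print(key, mean)
```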

On-GPU-Time profiles for CUDA

We’ll show off a sneak peek of a feature that’s coming soon: GPU time profiles. These show the time each CUDA kernel takes on the GPU, along with a flame graph of the CPU stack traces that led to its launch.

This lets engineers working on GPU-bound workloads see at a glance which kernels to optimize first for maximum impact.
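To give a feel for the data behind such a profile, here is a minimal sketch that attributes GPU time to the CPU stack that launched each kernel, using the common "folded stacks" text format that flame graph tools consume. The record layout, the helper names, and the kernel names are assumptions made up for the example; they do not describe how the upcoming feature is implemented.

```python
from collections import Counter


def fold_gpu_time(launches):
    """Sum GPU time per 'frame;frame;kernel' folded stack."""
    folded = Counter()
    for cpu_stack, kernel, gpu_time_us in launches:
        folded[";".join([*cpu_stack, kernel])] += gpu_time_us
    return folded


def top_kernels(launches, n=3):
    """Rank kernels by total GPU time, regardless of who launched them."""
    per_kernel = Counter()
    for _cpu_stack, kernel, gpu_time_us in launches:
        per_kernel[kernel] += gpu_time_us
    return per_kernel.most_common(n)


# Hypothetical launch records: (CPU stack, kernel name, GPU time in µs).
launches = [
    (["main", "train_step", "forward"], "sgemm", 900),
    (["main", "train_step", "backward"], "sgemm", 1500),
    (["main", "train_step", "forward"], "relu", 100),
    (["main", "train_step", "forward"], "sgemm", 300),
]
for stack, gpu_time_us in sorted(fold_gpu_time(launches).items()):
    print(stack, gpu_time_us)
print(top_kernels(launches, n=1))
```

The folded view answers "which code path is responsible for this GPU time", while the per-kernel ranking answers "which kernel should I look at first".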

A small CUDA program generating random matrices and multiplying them.

A more realistic flame graph showing the time spent on the GPU for a PyTorch training run.

In the coming weeks, we will blog about this new feature in more detail. Stay tuned!

Get Started Today!

Ready to unlock the full potential of your GPUs?

  • Learn More: Visit the Polar Signals Continuous Profiling for GPUs Product Page for detailed information.
  • See the Docs: Dive into our Documentation to understand setup and usage.
  • Request a Demo: Talk to our team to get started and see how it can fit your specific needs.