Iteratively explore your flame graphs with the MCP sandwich tool

Don't bloat your LLM's context. Let it iteratively explore the parts of your flame graphs that actually matter.

Author: Matthias Loibl

The power of continuous profiling comes from always having the data. Every function, every millisecond, already collected and waiting. But data only helps if you can get to it without breaking your flow, and for a while the last step was always the same: stop what you're doing, open the profiler UI, and start clicking.

Our MCP server already closed most of that gap. Point an MCP client (Claude Code, Claude Desktop, Cursor, anything that speaks the Model Context Protocol) at your Polar Signals data, and you can ask it questions in plain English. It answers from your actual production profiles instead of guessing.

This release makes it noticeably better. We added tools that let the model read your profiles the way you'd read a flame graph, and cleaned up the wording across the existing ones so it picks the right one more often.

What we had so far

The workflow was already most of the way there. The model finds the right profile type, lists the labels and their values to scope down to one process, and then pulls a profile. The catch is what that profile looked like: a flat table.

  • list_projects
  • profile_types
  • labels
  • values
  • get_profile
  • query_source_report

As the list shows, the tools were named inconsistently. get_profile in particular was a poor name because it didn't say what the tool returned; get_profile_table would have described it better.

However, the table is genuinely useful. It lists every function with its cumulative value (time including everything it calls) and its flat value (time spent in the function itself), so "where is the time going" gets answered in a sentence. But a table is flat by definition. It throws away the one thing that makes a flame graph a flame graph: the call stack. Which function calls which, and how the cost flows through the call stack, isn't in there.
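To make the two columns concrete, here is a minimal sketch of how cumulative and flat values fall out of raw stack samples. The sample data and the build_table helper are made up for illustration; this is not how the Polar Signals backend computes its table.

```python
from collections import defaultdict

# Hypothetical samples: (call stack from entry point to leaf, value in ms).
SAMPLES = [
    (["main", "handle", "parse"], 30),
    (["main", "handle", "render"], 50),
    (["main", "handle"], 20),
]


def build_table(samples):
    """Return {function: {"cumulative": ms, "flat": ms}} for stack samples."""
    table = defaultdict(lambda: {"cumulative": 0, "flat": 0})
    for stack, value in samples:
        # Cumulative: the sample counts once for every distinct function on
        # the stack (set() avoids double counting recursive frames).
        for fn in set(stack):
            table[fn]["cumulative"] += value
        # Flat: only the leaf frame was executing when the sample was taken.
        table[stack[-1]]["flat"] += value
    return dict(table)


table = build_table(SAMPLES)
for fn, row in sorted(table.items(), key=lambda kv: -kv[1]["cumulative"]):
    print(f"{fn:8} cumulative={row['cumulative']}ms flat={row['flat']}ms")
```

Note what the result no longer contains: nothing in the table says that handle called parse, which is exactly the structure the rest of this post is about getting back.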

For an LLM that's a real problem, and not only a missing feature. To reason about structure from a flat table, the model has to load the whole table into its context and reconstruct the relationships in its head. The interesting function may be three rows from the top, but the model still carries several hundred rows to find it.

So what's new?

That's why we added the sandwich tool. Once the table surfaces a suspicious function, the model drills into that one function instead of staring at the table. The sandwich isn't a flame graph, but it brings back the hierarchy a flame graph gives you: it returns two trees rooted at the focused function, its callers branching up to the entry points that pay for it, and its callees branching down to the leaves it spends time in.

Sandwich view for comm=prometheus focused on the runtime.mallocgc function

Note: runtime.slicebytetostring, runtime.newobject, runtime.growslice, and runtime.makeslice sit directly above runtime.mallocgc in the sandwich screenshot above, and they are the first four callers in the MCP output below.

polarsignals - query_profile_sandwich (MCP)(project_id: "ec7e346b-...", query: "parca_agent:samples:count:cpu:nanoseconds:delta{comm=\"prometheus\"}", function_name: "runtime.mallocgc", time_range: "1h")
Sandwich View for: runtime.mallocgc
Query: parca_agent:samples:count:cpu:nanoseconds:delta{comm="prometheus"}
Time Range: 2026-06-30T14:12:06Z to 2026-06-30T15:12:06Z
Two call trees, each with the focused function at depth 0 and listed breadth-first — rebuild each tree from its link column. The link column and its direction differ per section (see the note under each); -1 marks the focused function. Unsymbolized frames (empty function_name) add an 'address' column with their hex memory address.

=== CALLERS ===
Callers of the focused function. It is shown at depth 0 but is really the innermost frame — the leaf of the call stack — so its descendants are the functions that call it and reading deeper walks UPWARD toward the entry. Each row's child is the function it called toward the focus. The deepest row is only the highest frame captured here, NOT necessarily the true root: the actual entry point is at the top of the stack and may lie beyond what's shown (the row limit can cut it off), so don't assume the deepest frame is where the goroutine started.
Truncated to the top 100 frames (breadth-first); deeper frames omitted.
[100]{id,child,depth,cumulative,flat,function_name}:
  0,-1,0,18.7s,0s,runtime.mallocgc
  1,0,1,3.9s,0s,runtime.slicebytetostring
  2,0,1,9.3s,0s,runtime.newobject
  3,0,1,1.1s,0s,runtime.growslice
  4,0,1,3.4s,0s,runtime.makeslice
  5,4,2,526.3ms,0s,github.com/prometheus/prometheus/tsdb.(*blockBaseSeriesSet).Next
  6,2,2,1.6s,0s,github.com/prometheus/prometheus/tsdb.(*blockChunkSeriesSet).At
  7,2,2,4.1s,0s,github.com/prometheus/prometheus/tsdb.(*blockSeriesSet).At
  8,2,2,526.3ms,0s,github.com/prometheus/prometheus/tsdb.(*blockSeriesEntry).Iterator
  9,3,2,105.3ms,0s,github.com/prometheus/prometheus/promql.expandSeriesSet
  10,4,2,210.5ms,0s,github.com/prometheus/prometheus/tsdb/chunkenc.NewXORChunk
 ...

=== CALLEES ===
Callees of the focused function, which is the root (depth 0). Each row's parent is the function that called it (normal flame-graph direction), so reading deeper follows execution into callees; the deepest leaves are where time is actually spent.
[77]{id,parent,depth,cumulative,flat,function_name}:
  0,-1,0,21.8s,1.2s,runtime.mallocgc
  1,0,1,13.3s,2.7s,runtime.mallocgcSmallScanNoHeader
  2,0,1,1.5s,315.8ms,runtime.mallocgcTiny
  3,0,1,4.3s,684.2ms,runtime.mallocgcSmallNoscan
  4,1,2,3.2s,3.1s,runtime.nextFreeFast
  5,2,2,473.7ms,473.7ms,runtime.nextFreeFast
  6,2,2,105.3ms,0s,asm_common_interrupt
  7,3,2,1.6s,1.5s,runtime.nextFreeFast
  8,2,2,263.2ms,263.2ms,runtime.acquirem
  9,2,2,157.9ms,157.9ms,runtime.getMCache
  10,1,2,2.1s,105.3ms,runtime.(*mcache).nextFree
  ...

The payoff is that the model navigates instead of dumps. It walks up and down those trees and pulls only the branches that matter for the question at hand into context, the same way you'd expand and collapse a flame graph rather than read every frame at once. Less noise, more signal, and the model stays focused on the path that's actually expensive.
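The output above says to rebuild each tree from its link column. As a rough sketch of what that means, the snippet below parses the first callee rows shown above and links each row to its parent. The helper names (parse_rows, link_children, render) are our own for illustration and are not part of the MCP server.

```python
# First callee rows from the output above: id,parent,depth,cumulative,flat,name
ROWS = """\
0,-1,0,21.8s,1.2s,runtime.mallocgc
1,0,1,13.3s,2.7s,runtime.mallocgcSmallScanNoHeader
2,0,1,1.5s,315.8ms,runtime.mallocgcTiny
3,0,1,4.3s,684.2ms,runtime.mallocgcSmallNoscan
4,1,2,3.2s,3.1s,runtime.nextFreeFast
5,2,2,473.7ms,473.7ms,runtime.nextFreeFast
6,2,2,105.3ms,0s,asm_common_interrupt
7,3,2,1.6s,1.5s,runtime.nextFreeFast
8,2,2,263.2ms,263.2ms,runtime.acquirem
9,2,2,157.9ms,157.9ms,runtime.getMCache
10,1,2,2.1s,105.3ms,runtime.(*mcache).nextFree
"""


def parse_rows(text):
    """Parse rows into {id: node}; the second column is the link column."""
    nodes = {}
    for line in text.strip().splitlines():
        # Split at most 5 times so a function name containing commas stays whole.
        node_id, link, depth, cumulative, flat, name = line.strip().split(",", 5)
        nodes[int(node_id)] = {
            "link": int(link), "depth": int(depth), "cumulative": cumulative,
            "flat": flat, "name": name, "children": [],
        }
    return nodes


def link_children(nodes):
    """Attach every row to the row its link points at; return the root id."""
    root_id = None
    for node_id, node in nodes.items():
        if node["link"] == -1:        # -1 marks the focused function
            root_id = node_id
        elif node["link"] in nodes:   # skip links to rows that were truncated
            nodes[node["link"]]["children"].append(node_id)
    return root_id


def render(nodes, node_id, indent=0):
    """Return the subtree below node_id as indented text lines."""
    node = nodes[node_id]
    lines = ["  " * indent + f"{node['name']} ({node['cumulative']})"]
    for child in node["children"]:
        lines.extend(render(nodes, child, indent + 1))
    return lines


nodes = parse_rows(ROWS)
root = link_children(nodes)
print("\n".join(render(nodes, root)))
```

The same linking works for the callers section, where the link column is named child instead of parent; there, going deeper in the rebuilt tree walks up the call stack rather than down.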

Sometimes you want the whole tree, though, not a single function. So this release also brings actual flame graphs into the MCP. We had a tool called flame graph before, but it returned a table: the same naming sloppiness we owned up to above. The new query_profile_flamegraph returns the real hierarchy. A full flame graph dumped into context has the same problem as the flat table, only bigger, and that's what the focus_path parameter is for. Pass a stack prefix and the tree stays rooted at the real root while every sibling branch gets trimmed away, so the model zooms into one path the same way you'd click into a flame graph in the UI. It even works on frames without symbols: unsymbolized frames return their hex memory address, and focus_path accepts those addresses, so kernel and native stacks are just as navigable.
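The trimming idea can be sketched in a few lines. This is an illustration under our own assumptions: the tree shape, the focus helper, and passing the path as a list of frame names are all hypothetical, and the actual parameter format and trimming rules of query_profile_flamegraph are not spelled out here.

```python
def focus(tree, path):
    """Return a copy of `tree` that keeps only the branch following `path`.

    `tree` is {"name": str, "value": int, "children": [...]} and `path` is a
    list of frame names starting at the root. Everything below the last frame
    of the path is kept whole; sibling branches along the way are dropped.
    """
    if not path or tree["name"] != path[0]:
        return None
    if len(path) == 1:
        return tree  # end of the prefix: keep the full subtree
    kept = []
    for child in tree["children"]:
        pruned = focus(child, path[1:])
        if pruned is not None:
            kept.append(pruned)
    return {"name": tree["name"], "value": tree["value"], "children": kept}


def names(node):
    """Names of a node's direct children."""
    return [child["name"] for child in node["children"]]


# Hypothetical flame graph: root -> main -> {handle -> {parse, render}, gc}
TREE = {
    "name": "root", "value": 100, "children": [
        {"name": "main", "value": 100, "children": [
            {"name": "handle", "value": 70, "children": [
                {"name": "parse", "value": 30, "children": []},
                {"name": "render", "value": 40, "children": []},
            ]},
            {"name": "gc", "value": 30, "children": []},
        ]},
    ],
}

focused = focus(TREE, ["root", "main", "handle"])
print(names(focused["children"][0]))  # children of main after trimming
```

In this sketch the root keeps its original value even though the gc branch is gone, which mirrors the idea of staying rooted at the real root: the remaining frames can still be read as a share of the whole profile.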

Alongside the new tools, we went through the existing ones and cleaned up their descriptions and output wording. Small thing, but it matters: the clearer each tool describes itself, the more reliably the model reaches for the right one instead of guessing or asking you to clarify.

  • list_projects
  • query_profile_types
  • query_profile_labels
  • query_profile_label_values
  • query_profile_table
  • query_profile_sandwich
  • query_profile_flamegraph
  • query_profile_source

Teaching the model the workflow

Tool descriptions only carry so much, though. The MCP server also serves a continuous-profiling skill, a markdown resource that agents automatically load before their first profiling query. It walks the model through the whole investigation: find the project, pick a profile type, scope down to a single process with labels and their values, then table, sandwich, flame graph, and finally source. With that sequence in context, the model picks the right tool with the right parameters far more reliably, instead of fumbling through the API one guess at a time.
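Written down as data, that sequence looks roughly like the list below. The sketch also builds the JSON-RPC request an MCP client would send for the sandwich step, reusing the arguments from the example call earlier in this post. The tool_call helper is hypothetical; the tools/call method and its name/arguments parameters come from the Model Context Protocol, and the arguments of the other tools are not shown here because this post doesn't list them.

```python
import json

# Investigation order described above, using the tool names from this release.
WORKFLOW = [
    "list_projects",
    "query_profile_types",
    "query_profile_labels",
    "query_profile_label_values",
    "query_profile_table",
    "query_profile_sandwich",
    "query_profile_flamegraph",
    "query_profile_source",
]


def tool_call(request_id, name, arguments):
    """Build an MCP `tools/call` JSON-RPC 2.0 request for one tool."""
    if name not in WORKFLOW:
        raise ValueError(f"unknown tool: {name}")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Arguments copied from the sandwich example earlier in this post
# (the project id is truncated there, so it stays truncated here).
request = tool_call(1, "query_profile_sandwich", {
    "project_id": "ec7e346b-...",
    "query": 'parca_agent:samples:count:cpu:nanoseconds:delta{comm="prometheus"}',
    "function_name": "runtime.mallocgc",
    "time_range": "1h",
})
print(json.dumps(request, indent=2))
```

In practice your MCP client builds these requests for you; the point of the skill is that the model knows which of them to ask for next.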

A skill is context too, of course, loaded on every investigation. So for this release we rewrote it: about 40% smaller, from roughly 4,700 tokens down to 2,850, with the guidance for the sandwich and flame graph tools added in.

Evals

Rewriting the instructions the model depends on is exactly the kind of change that regresses silently, so we run evals for the skill: the same set of questions, replayed against a frozen window of real profiling data that we also use for our UI end-to-end tests, before and after every change. The rewrite passed without regressions, and the run traces showed the model actually picking up the new flame graph guidance, depth limits, focus paths and all. If a shorter skill or a reworded tool description ever makes the model investigate worse, the evals catch it before we release it.

One question, end to end

Here's the example that sold us on the sandwich tool. Ask about a single garbage-collection function:

Where is runtime.scanobject spending time, and what's calling it?

The model calls the sandwich tool, which roots its trees at runtime.scanobject and walks both directions. Run live against a real process, the callers side comes back with three GC paths into scanobject (background fractional, background idle, and mutator-assist), all rooted at runtime.systemstack, with the sample times summing exactly to the root:

runtime.systemstack                                            842.1ms
├── runtime.gcAssistAlloc.func1 → … → runtime.scanobject       210.5ms
└── runtime.gcBgMarkWorker.func2
    ├── runtime.gcDrainMarkWorkerFractional → … → scanobject   421.1ms
    └── runtime.gcDrainMarkWorkerIdle → … → scanobject         210.5ms
                                                          ── = 842.1ms

The model reads that back as a diagnosis: most of the scanobject time is background mark work, with a chunk of mutator-assist, meaning the allocator is occasionally falling behind and pulling your own goroutines into GC. All from one question, without leaving the chat.

Notice that the leaves sum exactly to the root. That's not decoration. It's a correctness property of the underlying aggregation, and the cheapest check we have that the query did the right thing.
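The check is simple enough to do by hand, or in a few lines. The sketch below uses the values from the tree above and Decimal instead of floats, so the comparison is exact rather than subject to floating-point rounding. The dictionary keys are our own shorthand for the three paths.

```python
from decimal import Decimal

# Values from the caller tree above, in milliseconds.
ROOT_MS = Decimal("842.1")  # runtime.systemstack
LEAVES_MS = {
    "mutator assist": Decimal("210.5"),          # gcAssistAlloc.func1
    "background fractional": Decimal("421.1"),   # gcDrainMarkWorkerFractional
    "background idle": Decimal("210.5"),         # gcDrainMarkWorkerIdle
}

total = sum(LEAVES_MS.values(), Decimal("0"))
if total != ROOT_MS:
    raise AssertionError(f"leaves sum to {total}ms, root is {ROOT_MS}ms")

# Share of scanobject time that is background mark work vs. mutator assist.
background = LEAVES_MS["background fractional"] + LEAVES_MS["background idle"]
background_share = background / ROOT_MS
assist_share = LEAVES_MS["mutator assist"] / ROOT_MS
print(f"background: {float(background_share):.0%}, assist: {float(assist_share):.0%}")
```

Roughly three quarters of the time is background mark work and one quarter is mutator assist, which is the split behind the diagnosis above.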

How it stays honest

One detail we're happy with. Every one of these tools now builds its query from the same templates our cloud UI renders. We share a small set of query templates across the Go services and the TypeScript frontend, with golden fixtures that fail CI the moment the two drift apart.

Simply put, the MCP server and the web app ask your database the exact same questions. When you cross-check a model's answer against the flame graph in the UI, they match, because they're literally the same query underneath.
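For readers who haven't used golden fixtures, here is a minimal sketch of the pattern. Everything in it is hypothetical: the template syntax, the fixture name, and the helper functions are invented for illustration and don't reflect our actual templates, which live in the Go and TypeScript codebases. Only the query string itself is taken from the example earlier in this post.

```python
from string import Template

# Hypothetical shared template: a profile type plus one label matcher.
QUERY_TEMPLATE = Template('${profile_type}{${label}="${value}"}')

# Hypothetical golden fixture: the exact string every implementation must render.
GOLDEN = {
    "cpu_by_comm": 'parca_agent:samples:count:cpu:nanoseconds:delta{comm="prometheus"}',
}


def render_query(profile_type, label, value):
    """Render the query from the shared template."""
    return QUERY_TEMPLATE.substitute(profile_type=profile_type, label=label, value=value)


def check_golden(name, rendered):
    """Fail loudly if a rendered query drifts from its golden fixture."""
    expected = GOLDEN[name]
    if rendered != expected:
        raise AssertionError(f"{name}: rendered {rendered!r}, expected {expected!r}")
    return True


rendered = render_query(
    "parca_agent:samples:count:cpu:nanoseconds:delta", "comm", "prometheus"
)
check_golden("cpu_by_comm", rendered)
print(rendered)
```

Each implementation runs the same check against the same fixture file, so a change to one side that alters the rendered query fails the build instead of silently diverging.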

Try it out

The MCP server speaks to Polar Signals Cloud and works with any MCP-capable client. We're just getting started here.

If you already have the MCP server connected, the new tools are there the next time you ask why a function is hot. If you don't, the AI & IDEs page on Polar Signals Cloud walks you through pointing your client at your data. We hope this helps you get to your profiles without breaking your flow, and we'd love to hear what you think!


Sign up for Polar Signals Cloud to try it out. If you have questions, join us on Discord.
