Custom Labels for Node.js
The World So Far
A popular feature of both Parca and Polar Signals Cloud, discussed previously on this blog, is the ability to attach arbitrary key/value labels from within your code at runtime; these labels are then used to annotate the stack traces seen by our suite of profilers.
For example, if you use distributed tracing, you might set the trace_id label to the current trace ID. Then, when viewing profiles in Parca or Polar Signals Cloud later on, you can see a flame graph of CPU time (or GPU time or wall-clock time) for the exact trace you're interested in.
What's New: TL;DR
Until now, this feature was only available in Go, Rust, C, and C++ (and other languages that can link against C). Naturally, our users at Node shops (most notably Notion) were intrigued and wanted it to be available for their JavaScript and TypeScript code as well.
Today we are announcing the launch of the feature for Node.js v22 and above, with the @polarsignals/custom-labels library providing a simple API for instrumenting your code with whatever labels you want.
How to Use It
The library is very simple to use. First, install our npm package:
npm install @polarsignals/custom-labels
Then, execute any piece of code wrapped in a withLabels call:
const cl = require('@polarsignals/custom-labels');

// snip...

cl.withLabels(
  () => {
    // some code to execute, during which the "username" and "traceId"
    // labels should be set
  },
  "username", currentUserName, "traceId", currentTraceId
);
Now, the username and traceId labels will be correctly set not only while the specified closure is executed, but also during any asynchronous work that it spawns. For full details, see the readme.
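For instance, here is a sketch of how this might look in a request handler. This is illustrative only: handleRequest, lookupUser, and renderPage are hypothetical stand-ins for your own code, and only withLabels comes from the library.

const cl = require('@polarsignals/custom-labels');

// Hypothetical request handler; lookupUser and renderPage are
// stand-ins for application code, not part of the library.
function handleRequest(req, res) {
  cl.withLabels(
    async () => {
      // The labels are applied while this code runs...
      const user = await lookupUser(req);
      // ...and, because they propagate across asynchronous boundaries,
      // they also cover work resumed after awaits and timers.
      setTimeout(() => renderPage(user, res), 10);
    },
    "endpoint", req.url, "traceId", req.headers["x-trace-id"]
  );
}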
If that's all you wanted to know, great! If not, read on for some technical details of how the problem differs for Node compared to native code, and how we overcame those difficulties.
Technical Implementation Details
The Native Approach: Thread-Local Storage
In traditional synchronous programming, the fundamental unit of executing code is the thread. Furthermore, it is relatively straightforward to use thread-local variables in close-to-the-metal systems languages like Rust and C. Any thread can set its own value for such a variable, and later read back what it wrote, without interference from other threads in the system (which may in turn be setting their own values for the same variable).
For these reasons, the thread was chosen as the scope of granularity for custom labels in the initial Rust/C/C++ implementation. What this means is that, conceptually, code using custom_labels looks like this pseudocode:
On some thread:

1. Set label trace_id=4a4df982-c597-49c1-8fc6-c05a44f8d80d
2. Do some stuff on the same thread...
3. Set label trace_id=da4cc50d-af13-4c34-9bad-e45fb6194bd5
4. Do some more stuff on the same thread...
5. Delete label trace_id.
When this thread is paused by the kernel so that a stack trace can be taken for profiling, the profiler reads the keys and values of the current label set out of the aforementioned thread-local variable and records them as metadata for the trace.
The Native Approach Extended: Asynchronous Code In Rust
Modern-day programming, however, is often not traditional and synchronous. Various techniques beyond simple multithreading are used to structure concurrent code; for example, goroutines in Go, or async programming in Rust. We won't talk much about goroutines in this post, but it's worth digressing to talk about Rust, because its asynchronous programming model bears some similarity to Node's.
In the post linked above, we discussed making async code work with custom labels, but the underlying approach is basically the same: the current set of labels is a thread-local property. Async tasks may go on and off the CPU several times before they complete, and may even migrate between threads; we cope with this by wrapping the stack of futures in an outer future that applies the task's labels, polls its inner future, and then un-applies them. Since that outer future is polled whenever the task is scheduled onto a thread, the thread's current label set always matches the asynchronous task running on it.
Node.js Custom Labels: First Try
Of course, when our users asked us to implement custom labels for Node.js, where asynchronous programming is absolutely ubiquitous, our first instinct was to do it the same way as in Rust, where it was working well. When the user added labels to the current async task, we would call into a native extension library that would add them to a label set (maintained as a native object in C++ code). Later, whenever that task was scheduled to run on the CPU, we would set the thread-local current-labelset variable to point to that set, and unset it when the task went off-CPU again.
This meant that our library had to gain control whenever a task was scheduled or unscheduled. These notifications are helpfully provided by Node's built-in async_hooks module, in particular by its "before" and "after" hooks.
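To make that concrete, here is a heavily simplified sketch of the shape of that first attempt. This is not our actual code; setCurrentLabelset stands in for the call into the native extension, and the bookkeeping is reduced to its essentials.

const async_hooks = require('node:async_hooks');

// Map from async resource ID to the label set of the task that owns it.
const labelsByAsyncId = new Map();

// Stand-in for the native-extension call that pointed the thread-local
// "current labels" variable at a label set (or cleared it).
function setCurrentLabelset(labels) { /* native code in the real library */ }

// Called (via the public API) to attach labels to the current task.
function addLabelsToCurrentTask(labels) {
  labelsByAsyncId.set(async_hooks.executionAsyncId(), labels);
  setCurrentLabelset(labels); // the task is already running, so apply now
}

const hook = async_hooks.createHook({
  init(asyncId, type, triggerAsyncId) {
    // A new async resource inherits the labels of whatever created it.
    const parentLabels = labelsByAsyncId.get(triggerAsyncId);
    if (parentLabels) labelsByAsyncId.set(asyncId, parentLabels);
  },
  before(asyncId) {
    // The task is about to run on this thread: apply its labels.
    setCurrentLabelset(labelsByAsyncId.get(asyncId));
  },
  after() {
    // The task is going off-CPU: clear the thread-local labels.
    setCurrentLabelset(undefined);
  },
  destroy(asyncId) {
    labelsByAsyncId.delete(asyncId);
  },
});
hook.enable();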
This approach had the advantage that we didn't have to change our profiler at all: it still found the labels through the same thread-local symbol it already read for Rust and other native languages, and the label-processing code didn't need to care that it happened to be unwinding a JavaScript stack instead of a Rust one.
Unfortunately, it had a much more significant disadvantage: performance overhead. The extra bookkeeping Node must do once async hooks are enabled, combined with the cost of switching into our JavaScript code every time a hook fired, caused a meaningful regression in tail latency for our beta testers at Notion. That violated what I view as the cardinal rule of performance tools: first, do no harm!
Node.js Custom Labels: Working Version
Obviously, another approach was needed, so we went back to the drawing board. Thankfully, the developers of Node and V8 had already thought a lot about the problem of propagating a context through async code. This work culminated in the AsyncContextFrame-based implementation of AsyncLocalStorage. AsyncLocalStorage has existed for a while as a way to attach values to asynchronous tasks, but it was previously built on async_hooks (much like our "first try" described above), with all the performance issues that entails.
However, since Node v22, an experimental flag (--experimental-async-context-frame) switches it to a much more performant implementation instead. The details are a bit complicated, but in essence, the data to be propagated is written to a field maintained by V8 called continuation_preserved_embedder_data, and the runtime itself then ensures that the correct value is in place whenever code is scheduled to run.
Fortunately, this implementation became the default in Node v24, meaning our users can set labels without relying on any special command-line flags.
On the implementation side, this means that setting the labels is now trivial: just create an instance of AsyncLocalStorage and manipulate it as necessary. Actually finding the values from eBPF, however, becomes much more tedious. Rather than being stored at a well-known symbol, as they are for native code, the labels now live in a JavaScript object inside V8's internals, and neither Node nor V8 ships enough symbols by default to tell us the layout of their internal types, which is what we would need in order to extract the data at runtime.
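To illustrate the "trivial" half, here is a hedged sketch of how a withLabels-style API can be built on AsyncLocalStorage; this shows the general shape, not necessarily how our library is structured internally.

const { AsyncLocalStorage } = require('node:async_hooks');

// One store whose current value is the active label set.
const labelStorage = new AsyncLocalStorage();

// Run fn with the given labels applied; the runtime propagates the
// store's value across await points, timers, and promise callbacks.
function withLabels(fn, ...keysAndValues) {
  const labels = new Map(labelStorage.getStore()); // inherit outer labels
  for (let i = 0; i < keysAndValues.length; i += 2) {
    labels.set(keysAndValues[i], keysAndValues[i + 1]);
  }
  return labelStorage.run(labels, fn);
}

The hard part, as described above, is the read side: the eBPF unwinder has to locate that store's current value inside V8's heap, from outside the process, without the help of a well-known symbol.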
Node and V8 are, of course, open-source software. We can therefore check out the code ourselves and, for each released version, determine the offsets of the relevant fields within their internal objects. Better yet, rather than doing this by hand, we can use libclang to analyze the code for us and print out the relevant offsets, using a quick and dirty script like this one.
Putting it all together: based on the current Node version, our user-mode agent communicates the correct offsets to the eBPF-based unwinder, which finds the current set of labels at sampling time and reports them as part of each sample.
Conclusion
To use the feature, make sure you're running at least parca-agent version 0.44.0. As always, happy profiling!