Correctly Profiling Node.js with Zero-Instrumentation

Or: A tale of handling unexpected behaviour in the V8 runtime.

July 12, 2023

The Parca Agent project aims to provide profiling for any server application without requiring any instrumentation. Among many other languages and runtimes, we want to support Node.js environments as well as possible.

Node.js is a popular JavaScript runtime environment used for server-side applications. It's built on Chrome's V8 JavaScript engine, which uses just-in-time (JIT) compilation to achieve high runtime performance. However, JIT compilation makes certain tasks, such as profiling, particularly challenging. In this blog post, we'll dive into how we recently improved the profiling capabilities for Node.js applications in the Parca Agent project.

The improvements described in this blog post were recently released as part of Parca Agent v0.22.0.

The Problem

Recently, users of Parca Agent noticed that, specifically for Node.js workloads, profiling data was not showing up at all in Polar Signals Cloud, even though they were sure these workloads were using plenty of CPU. We investigated and quickly found that, indeed, metrics showed a high rate of stack unwinding failures for these workloads.

After investigating various avenues that led nowhere, we eventually found this code comment in the V8 codebase:

```cpp
// Generate code for entering a JS function with the interpreter.
// On entry to the function the receiver and arguments have been pushed on the
// stack left to right.
//
// The live registers are:
// o rax: actual argument count
// o rdi: the JS function object being called
// o rdx: the incoming new target or generator object
// o rsi: our context
// o rbp: the caller's frame pointer
// o rsp: stack pointer (pointing to return address)
//
// The function builds an interpreter frame. See InterpreterFrameConstants in
// frame-constants.h for its layout.
```

Aha! This means that the `Builtins_` functions in Node.js, although compiled ahead-of-time (AOT) together with the runtime, are generated by V8's own custom code generator rather than by the C++ compiler. These functions are therefore essentially equivalent to hand-written assembly: unlike the rest of the runtime code, they come without the information that tells tooling how to unwind them. Luckily, these functions do use frame pointers (note `rbp: the caller's frame pointer` in the comment above), so we can reliably unwind these frames by following the frame pointers (if you're unfamiliar with this or want to understand it in more depth, read our blog post on DWARF-based Stack Walking Using eBPF).
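To illustrate what following frame pointers means, here is a minimal C++ sketch. It assumes the x86-64 convention in which a function prologue (`push rbp; mov rbp, rsp`) leaves the caller's saved `rbp` at `[rbp]` and the return address at `[rbp + 8]`, and it walks a simulated stack modeled as a map from address to 8-byte value. The `StackMemory` type and the function names are hypothetical; this is not Parca Agent's actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical model of captured stack memory: address -> 8-byte value.
using StackMemory = std::unordered_map<std::uint64_t, std::uint64_t>;

// Reads one 8-byte stack slot; returns false if it was not captured or is misaligned.
bool ReadSlot(const StackMemory& mem, std::uint64_t addr, std::uint64_t* out) {
  if (addr % 8 != 0) return false;
  auto it = mem.find(addr);
  if (it == mem.end()) return false;
  *out = it->second;
  return true;
}

// Walks an x86-64 frame-pointer chain. Each frame set up with
// `push rbp; mov rbp, rsp` holds the caller's rbp at [rbp] and the
// return address at [rbp + 8]. Returns program counters, innermost first.
std::vector<std::uint64_t> UnwindWithFramePointers(const StackMemory& mem,
                                                   std::uint64_t pc,
                                                   std::uint64_t rbp,
                                                   std::size_t max_frames = 64) {
  assert(max_frames > 0);
  std::vector<std::uint64_t> pcs;
  while (pcs.size() < max_frames) {
    pcs.push_back(pc);
    if (rbp == 0) break;  // a zero rbp conventionally marks the outermost frame
    std::uint64_t saved_rbp = 0;
    std::uint64_t return_address = 0;
    if (!ReadSlot(mem, rbp, &saved_rbp) ||
        !ReadSlot(mem, rbp + 8, &return_address) || return_address == 0) {
      break;  // the chain is broken or ends here
    }
    // The stack grows downwards, so callers' frames sit at higher addresses;
    // this also guards against loops in a corrupted chain.
    if (saved_rbp != 0 && saved_rbp <= rbp) break;
    pc = return_address;
    rbp = saved_rbp;
  }
  return pcs;
}
```

Each step costs only two memory reads and needs no per-function tables, which is why frame pointers are a convenient fallback for code that ships without unwind information.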

Since this is equivalent to writing raw assembly, it could also have been solved in the V8 runtime itself by emitting `.cfi` directives for these functions. However, these Node.js binaries are already widely used in real deployments, so we had no choice but to introduce a special case for V8.

The Solution: Frame Pointer Unwinding

We addressed this issue by enhancing Parca Agent's Node.js profiling support to use frame-pointer unwinding specifically for these builtin frames and then switch back to DWARF-based unwinding for the rest of the stack. Frame-pointer unwinding walks the call stack by following the chain of saved frame pointers, where each frame stores a pointer to its caller's frame next to the return address, so it does not need any unwind tables. This allows us to profile Node.js and other V8-based applications more accurately and comprehensively.
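The switching logic can be sketched as follows, again in C++ over a simulated stack. This is an illustration under several assumptions, not the actual patch: the address range of V8's `Builtins_` code is assumed to be known up front (`builtins`), and the hypothetical `UnwindRow` is a heavily simplified stand-in for a row of a DWARF call frame information table (the canonical frame address, or CFA, is `rsp` or `rbp` plus an offset, the return address sits at CFA - 8, and the caller's `rbp` may be saved at a fixed offset from the CFA).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <vector>

using StackMemory = std::unordered_map<std::uint64_t, std::uint64_t>;

struct Registers {
  std::uint64_t pc, rsp, rbp;
};

// Hypothetical, heavily simplified stand-in for one row of a DWARF CFI table.
struct UnwindRow {
  std::uint64_t start, end;                // program counters [start, end)
  bool cfa_from_rbp;                       // CFA = (rbp or rsp) + cfa_offset
  std::int64_t cfa_offset;
  std::optional<std::int64_t> rbp_offset;  // caller's rbp saved at CFA + offset
};

struct AddressRange {
  std::uint64_t start, end;  // [start, end)
  bool Contains(std::uint64_t pc) const { return pc >= start && pc < end; }
};

std::optional<std::uint64_t> Read(const StackMemory& mem, std::uint64_t addr) {
  auto it = mem.find(addr);
  if (it == mem.end()) return std::nullopt;
  return it->second;
}

// Frame-pointer step: [rbp] = caller's rbp, [rbp + 8] = return address,
// and the caller's rsp is just above the return address.
std::optional<Registers> StepFramePointer(const StackMemory& mem, const Registers& r) {
  auto saved_rbp = Read(mem, r.rbp);
  auto ret = Read(mem, r.rbp + 8);
  if (!saved_rbp || !ret) return std::nullopt;
  return Registers{*ret, r.rbp + 16, *saved_rbp};
}

// DWARF-style step using the (simplified) table row covering pc.
// A real unwinder would look up pc - 1 for return addresses; omitted here.
std::optional<Registers> StepDwarf(const StackMemory& mem,
                                   const std::vector<UnwindRow>& table,
                                   const Registers& r) {
  for (const UnwindRow& row : table) {
    if (r.pc < row.start || r.pc >= row.end) continue;
    std::uint64_t cfa = (row.cfa_from_rbp ? r.rbp : r.rsp) +
                        static_cast<std::uint64_t>(row.cfa_offset);
    auto ret = Read(mem, cfa - 8);
    if (!ret) return std::nullopt;
    std::uint64_t rbp = r.rbp;  // unchanged unless the row says it was saved
    if (row.rbp_offset) {
      auto saved = Read(mem, cfa + static_cast<std::uint64_t>(*row.rbp_offset));
      if (!saved) return std::nullopt;
      rbp = *saved;
    }
    return Registers{*ret, cfa, rbp};
  }
  return std::nullopt;  // no unwind information for this pc
}

// Uses frame pointers inside V8 builtins and DWARF-style rules elsewhere.
std::vector<std::uint64_t> UnwindHybrid(const StackMemory& mem,
                                        const std::vector<UnwindRow>& table,
                                        AddressRange builtins, Registers r,
                                        std::size_t max_frames = 64) {
  std::vector<std::uint64_t> pcs;
  while (r.pc != 0 && pcs.size() < max_frames) {
    pcs.push_back(r.pc);
    auto next = builtins.Contains(r.pc) ? StepFramePointer(mem, r)
                                        : StepDwarf(mem, table, r);
    if (!next) break;
    r = *next;
  }
  return pcs;
}
```

Passing an empty `builtins` range reproduces the original problem in miniature: the DWARF-style lookup finds no row for a builtin's program counter, so the walk stops at that frame and everything above it is lost.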

The full patch can be found here.

And with these patches: success! We now see both the JavaScript frames, unwound and symbolized, and the Node.js runtime frames:

Screenshot showing Node.js runtime and Javascript code frames of code calculating the Fibonacci sequence.

What does this mean for Node.js developers?

These patches give developers of Node.js and other V8-based applications more comprehensive profiling data through the Parca Agent project. This in-depth data can help developers identify bottlenecks in their code and, in turn, optimize their applications for better performance.

Beyond improving support in Parca Agent, we hope this post helps developers of other profiling and debugging tools understand the edge cases that come with supporting the V8 runtime.

Getting the Best Profiling Results

For best profiling results, we recommend passing these two flags to the node command:

```shell
$ node --perf-basic-prof --interpreted-frames-native-stack main.js
```

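These ensure that stack traces can be correctly symbolized and that interpreted frames show up with their JavaScript function names instead of just "interpreted" (V8 JIT-compiles hot functions quickly, but code still runs in the interpreter until that happens). `--perf-basic-prof` makes V8 write a perf map file, `/tmp/perf-<pid>.map`, that maps JIT-compiled code addresses to function names. `--interpreted-frames-native-stack` gives each interpreted function its own copy of the interpreter entry trampoline, so that a profiler can tell interpreted functions apart by address alone.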
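In the perf map file written by `--perf-basic-prof`, each line has the form `START SIZE symbolname`, with `START` and `SIZE` in hexadecimal. A minimal C++ sketch of how a profiler might use such a file to symbolize a JIT code address follows; `ParsePerfMap` and `Symbolize` are hypothetical names, and this is not how Parca Agent is necessarily implemented.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <map>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>

struct PerfMapEntry {
  std::uint64_t size;
  std::string name;
};

// Parses perf map text, one "START SIZE symbolname" entry per line with START
// and SIZE in hex, into a map keyed by start address. Malformed lines are skipped.
std::map<std::uint64_t, PerfMapEntry> ParsePerfMap(std::istream& in) {
  std::map<std::uint64_t, PerfMapEntry> entries;
  std::string line;
  while (std::getline(in, line)) {
    std::istringstream fields(line);
    std::string start_hex, size_hex, name;
    if (!(fields >> start_hex >> size_hex)) continue;
    std::getline(fields >> std::ws, name);  // symbol names may contain spaces
    try {
      std::uint64_t start = std::stoull(start_hex, nullptr, 16);
      std::uint64_t size = std::stoull(size_hex, nullptr, 16);
      entries[start] = PerfMapEntry{size, name};  // later entries win
    } catch (const std::logic_error&) {  // invalid_argument or out_of_range
      continue;
    }
  }
  return entries;
}

// Returns the symbol whose [start, start + size) range contains addr, if any.
std::optional<std::string> Symbolize(
    const std::map<std::uint64_t, PerfMapEntry>& entries, std::uint64_t addr) {
  auto it = entries.upper_bound(addr);  // first entry starting above addr
  if (it == entries.begin()) return std::nullopt;
  --it;  // entry with the greatest start <= addr
  if (addr - it->first < it->second.size) return it->second.name;
  return std::nullopt;
}
```

Keeping the entries in an ordered map makes each lookup logarithmic: `upper_bound` followed by one step back finds the only candidate entry, and the size check rejects addresses that fall in gaps between JIT-compiled code objects.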

Help Us Test!

While we've tested this patch with several Polar Signals Cloud users and many versions of Node.js, please help us test it even further by deploying Parca Agent v0.22.0.

Join the Parca Discord server if you have any questions, need help troubleshooting, or just want to say hi. See you there!
