Why Compiler Function Inlining Matters
All about function inlining: how it helps us create performant software, how we can work with it, and how it influences profiling.
Let's imagine a super simple program written in Go.
This program simply counts up to 1000 and adds each number onto a running result.
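The program listing isn't reproduced in this text, so here is a sketch of what it looks like based on the description above:

```go
package main

import "fmt"

// add returns the sum of its two arguments. The //go:noinline
// directive tells the Go compiler not to inline this function.
//
//go:noinline
func add(a, b int) int {
	return a + b
}

func main() {
	result := 0
	// Iterate up to 1000, adding each number onto the result.
	for i := 0; i < 1000; i++ {
		result = add(result, i)
	}
	fmt.Println(result) // prints 499500
}
```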
Note: This works similarly in other compiled languages; we just use Go as an example.
The comment `//go:noinline` tells the compiler not to inline this function. How does it perform? Let’s see with this simple benchmark.
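The benchmark itself isn't reproduced here, so this is a sketch of what it might look like (the `add` function is repeated so the snippet stands on its own):

```go
package main

import "testing"

// add is repeated here so the snippet is self-contained.
//
//go:noinline
func add(a, b int) int {
	return a + b
}

// BenchmarkAdd runs the same loop as the example program:
// summing the numbers up to 1000 by calling add in a tight loop.
func BenchmarkAdd(b *testing.B) {
	for n := 0; n < b.N; n++ {
		result := 0
		for i := 0; i < 1000; i++ {
			result = add(result, i)
		}
		_ = result
	}
}
```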
We run this benchmark with:
`go test -bench=BenchmarkAdd -count=10 | tee BenchmarkAddNoInline.txt`
On its own, this doesn't tell us much, so we want to compare it against a benchmark run with the `add` function inlined. If we remove the `//go:noinline` comment, the compiler should inline the function. Let's run the benchmark again:
`go test -bench=BenchmarkAdd -count=10 | tee BenchmarkAddInline.txt`
Interesting!
Go has a little helper tool called benchstat (installable with `go install golang.org/x/perf/cmd/benchstat@latest`) that we can use to compare these results:
`benchstat BenchmarkAddNoInline.txt BenchmarkAddInline.txt`
It seems that for this example program inlining the add function makes a huge difference. Why is that?
Why does inlining exist?
When you call a function in your program, the compiler must emit a few extra instructions to actually make that function call happen. Specifically, depending on the function call ABI, the compiler will pass function arguments either on the stack or via CPU registers. Following that, the return address must be pushed onto the stack so that execution can continue where it left off once the called function returns.
Finally, a jump (or similar) instruction is used to begin executing the called function. When the function call returns, we must reverse that process a bit by restoring the caller's stack frame and reading return values off the stack or out of CPU registers. If this sounds like a lot of overhead for such a small function, you're right. Each individual call is relatively cheap, but when a function is called in a tight loop, for example, the cost can really add up. Inlining removes this overhead by simply "inlining", or copying, the instructions the function would normally execute directly into the function that calls it.
Inlining has some other nice properties as well: because the inlined instructions are laid out contiguously with the caller's, the CPU can fetch and cache them more effectively, instead of having to jump to a different location in memory before executing.
Inlining
Since the function is small and cheap enough (the compiler assigns each function a cost and only inlines functions below a threshold), Go decides to inline it. We can check this by compiling the program with some extra flags: `go build -gcflags -m main.go`
We can compare the assembly with and without inlining by adding and removing that `//go:noinline` comment.
On the compiled binary, we can run `go tool objdump main | grep main.go`:
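The original output isn't reproduced here, but with `//go:noinline` in place it looks roughly like the following (addresses, encodings, and the exact instruction sequence depend on the Go version and build; this is an illustrative sketch):

```
TEXT main.main(SB) /home/metalmatze/src/github.com/polarsignals/inlining/main.go
  ...
  CALL main.add(SB)
  ...

TEXT main.add(SB) /home/metalmatze/src/github.com/polarsignals/inlining/main.go
  ADDQ BX, AX
  RET
```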
As you can see, at the end there is our `add` function with two lines of assembly. We can also see the `CALL main.add(SB)` instruction that invokes the function. Now, if we let the Go compiler inline the `add` function, we get the following assembly:
```
TEXT main.main(SB) /home/metalmatze/src/github.com/polarsignals/inlining/main.go
  main.go:3  0x4553e0  31c0          XORL AX, AX
  main.go:5  0x4553e2  eb03          JMP 0x4553e7
  main.go:5  0x4553e4  48ffc0        INCQ AX
  main.go:5  0x4553e7  483de8030000  CMPQ $0x3e8, AX
  main.go:5  0x4553ed  72f5          JB 0x4553e4
  main.go:8  0x4553ef  c3            RET
```
As you can see, there is no `CALL main.add(SB)` anymore; everything now happens within `main.main`, which means the overhead of calling `add` is gone.
Function inlining and profiling
These inlined functions essentially disappear as distinct function calls in the compiled binaries. As humans, we don't necessarily know this when looking at a profile, so it's important to be able to tell them apart when analyzing profiling data.
In pprof, samples reference Locations, and each Location holds one or more Lines that each point to a Function. Inlined functions thus share a Location with their caller, but have their own Line (think about it: these functions are still defined on a different source code line), which points to their own Function. More on the pprof internals can be found in our previous "DIY pprof profiles using Go" blog post!
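To make that concrete, here is a small sketch that mirrors pprof's data model with local illustrative types (these are not the actual `google/pprof` API; names, fields, and line numbers are simplified assumptions):

```go
package main

import "fmt"

// Simplified mirror of pprof's profile format: a Location is one
// machine address; it can carry multiple Lines when functions were
// inlined there, and each Line points to its own Function.
type Function struct {
	Name     string
	Filename string
}

type Line struct {
	Function *Function
	Line     int64 // the source line this frame refers to
}

type Location struct {
	Address uint64
	// By pprof convention, the first entry is the innermost
	// (inlined) frame and the last entry is the caller.
	Lines []Line
}

// inlinedLocation models add being inlined into main: one machine
// address carrying two logical frames. Line numbers are illustrative.
func inlinedLocation() Location {
	return Location{
		Address: 0x4553e4,
		Lines: []Line{
			{Function: &Function{Name: "main.add", Filename: "main.go"}, Line: 4},
			{Function: &Function{Name: "main.main", Filename: "main.go"}, Line: 12},
		},
	}
}

func main() {
	loc := inlinedLocation()
	for _, l := range loc.Lines {
		fmt.Printf("%s (%s:%d)\n", l.Function.Name, l.Function.Filename, l.Line)
	}
}
```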
Rendering these inlined functions is done by showing them as part of the stack trace and essentially “squeezing” them in between the other functions.

Here you can see a part of a Prometheus goroutine stack trace. The `waitRead` function was inlined and is shown like any other function.
Rendering a flame graph with inlined functions
Each profile within Parca, a continuous profiling project for applications and infrastructure, needs to be rendered as an icicle graph. This means we walk all stack traces of a profile and build a tree data structure from them: the individual stack traces merge at the root, and each one is inserted as its own subtree into the existing tree.
With inlined functions, rendering Parca's icicle graphs properly becomes quite a challenge: while merging a new stack trace into the tree, each inlined function becomes its own subtree of stack traces again, and these have to be correctly merged into the existing tree too.
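As a much-simplified sketch of that merging (this is illustrative, not Parca's actual implementation; the frame names are made up, and inlined frames are assumed to already be expanded into entries of their own):

```go
package main

import "fmt"

// node is one frame in the merged icicle-graph tree.
type node struct {
	name     string
	value    int64
	children map[string]*node
}

func newNode(name string) *node {
	return &node{name: name, children: map[string]*node{}}
}

// insert merges one stack trace (root-first, with inlined frames
// already expanded into entries of their own) into the tree,
// accumulating the sample value along the path.
func (n *node) insert(stack []string, value int64) {
	n.value += value
	if len(stack) == 0 {
		return
	}
	child, ok := n.children[stack[0]]
	if !ok {
		child = newNode(stack[0])
		n.children[stack[0]] = child
	}
	child.insert(stack[1:], value)
}

func main() {
	root := newNode("root")
	// Two hypothetical samples: in the first one, waitRead was
	// inlined into its caller but still gets its own frame.
	root.insert([]string{"main.main", "conn.Read", "conn.waitRead"}, 5)
	root.insert([]string{"main.main", "conn.Read"}, 3)
	fmt.Println(root.value) // prints 8: both samples merged at the root
}
```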
Finally, our implementation handles these cases correctly since we merged our Pull Request: https://github.com/parca-dev/parca/pull/485
Roadmap for inlined functions
Currently, we don’t show the inlined functions in any specific way. What do you think, reader, would you want us to handle these more specifically in the icicle graphs? Is it fine for you to simply show them as "normal" functions?
Further reading
- https://en.wikipedia.org/wiki/Inline_expansion
- https://dave.cheney.net/2014/06/07/five-things-that-make-go-fast
- https://dave.cheney.net/2020/04/25/inlining-optimisations-in-go
- https://medium.com/@felipedutratine/does-golang-inline-functions-b41ee2d743fa
- https://medium.com/a-journey-with-go/go-inlining-strategy-limitation-6b6d7fc3b1be
- internal/inline/inl.go