Winter is coming - FrostDB in Prometheus

FrostDB is an embedded columnar database built for observability data. This article will explore FrostDB's potential by using it in Prometheus.

July 22, 2022

We built FrostDB specifically to handle high-cardinality continuous profiling data. However, FrostDB has the potential to be a great solution for many other types of observability data as well. An outline of this post:

  • How FrostDB stores data
  • Why this is a good fit for metrics data
  • Introducing Epimetheus, a Prometheus fork that is backed by FrostDB

How FrostDB stores data

FrostDB is a columnar database, which means that instead of storing each value contiguously alongside the other values of the data point it belongs to, we store it contiguously with the other values of the same column. You can see an example of this layout in the image below, where some observability data is stored in columnar format on the left, and in row-oriented format on the right.
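To make the two layouts concrete, here is a minimal sketch in plain Go. It does not use FrostDB; the `Sample` and `Columns` types and the `toColumns` helper are hypothetical names chosen for illustration only.

```go
package main

import "fmt"

// Sample is one observation in row-oriented form. The field names are
// illustrative and are not taken from FrostDB.
type Sample struct {
	Name      string
	Timestamp int64
	Value     float64
}

// Columns holds the same observations in column-oriented form: one
// contiguous slice per field instead of one struct per observation.
type Columns struct {
	Names      []string
	Timestamps []int64
	Values     []float64
}

// toColumns transposes row-oriented samples into columnar slices.
func toColumns(rows []Sample) Columns {
	cols := Columns{
		Names:      make([]string, 0, len(rows)),
		Timestamps: make([]int64, 0, len(rows)),
		Values:     make([]float64, 0, len(rows)),
	}
	for _, r := range rows {
		cols.Names = append(cols.Names, r.Name)
		cols.Timestamps = append(cols.Timestamps, r.Timestamp)
		cols.Values = append(cols.Values, r.Value)
	}
	return cols
}

func main() {
	rows := []Sample{
		{"num_goroutines", 1, 12},
		{"num_goroutines", 2, 14},
		{"num_goroutines", 3, 13},
	}
	cols := toColumns(rows)
	// Reading a single column only touches that column's slice.
	fmt.Println(cols.Names)
	fmt.Println(cols.Values)
}
```

In the row-oriented slice, every `Sample` interleaves a name, a timestamp and a value. In `Columns`, all values of one kind sit next to each other.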

This type of storage format allows us to compress data of the same type more efficiently. For example, we can compress labels with the same value, such as `num_goroutines`, using a run-length encoded (RLE) dictionary. It is also useful when you only need to query data of a specific column. Say you only want to know the labels that are present in the database: you only need to read the contiguous blocks of memory for the labels, instead of reading all blocks of memory and then filtering out the labels in each block. Lastly, columnar storage can leverage vectorized instructions, which can make certain types of computations more efficient to run.
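The following sketch illustrates the idea behind dictionary encoding combined with run-length encoding on a single column. It is a simplified toy, not Parquet's actual on-disk encoding, and the `run`, `encode` and `decode` names are our own for this example.

```go
package main

import "fmt"

// run is one run-length-encoded entry: a dictionary index repeated Count times.
type run struct {
	Index int
	Count int
}

// encode dictionary-encodes a string column and then run-length encodes
// the resulting dictionary indexes.
func encode(column []string) (dict []string, runs []run) {
	indexOf := map[string]int{}
	for _, v := range column {
		idx, ok := indexOf[v]
		if !ok {
			// First time we see this value: add it to the dictionary.
			idx = len(dict)
			indexOf[v] = idx
			dict = append(dict, v)
		}
		// Extend the current run if the value repeats, else start a new run.
		if n := len(runs); n > 0 && runs[n-1].Index == idx {
			runs[n-1].Count++
			continue
		}
		runs = append(runs, run{Index: idx, Count: 1})
	}
	return dict, runs
}

// decode expands the runs back into the original column.
func decode(dict []string, runs []run) []string {
	var out []string
	for _, r := range runs {
		for i := 0; i < r.Count; i++ {
			out = append(out, dict[r.Index])
		}
	}
	return out
}

func main() {
	column := []string{
		"num_goroutines", "num_goroutines", "num_goroutines",
		"heap_bytes", "heap_bytes",
	}
	dict, runs := encode(column)
	fmt.Println(dict, runs)
	fmt.Println(decode(dict, runs))
}
```

Five strings collapse into a two-entry dictionary and two runs, which is why long stretches of identical label values compress so well.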

Under the hood, FrostDB stores all of its data in the Apache Parquet columnar file format. Using this open source file format allows us to build FrostDB with a concept called dynamic columns. Dynamic columns are columns that are generated on the fly, every time a new column name (for example, a previously unseen label name) is seen. This allows FrostDB to handle high-cardinality workloads without write amplification occurring on the storage.
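As a rough mental model of dynamic columns, consider the toy table below. It is not how FrostDB is implemented; the `table` type and its methods are hypothetical, and the sketch only shows the behavior we care about: a label name that has not been seen before becomes a new column, and earlier rows are treated as null for it.

```go
package main

import (
	"fmt"
	"sort"
)

// table is a toy columnar table whose label columns are created on demand.
// A nil entry marks a row that has no value for that label.
type table struct {
	rows    int
	columns map[string][]*string
}

func newTable() *table {
	return &table{columns: map[string][]*string{}}
}

// insert appends one row. Any label name not seen before becomes a new
// column that is backfilled with nil for all earlier rows.
func (t *table) insert(labels map[string]string) {
	for name := range labels {
		if _, ok := t.columns[name]; !ok {
			t.columns[name] = make([]*string, t.rows)
		}
	}
	for name, col := range t.columns {
		if v, ok := labels[name]; ok {
			value := v
			t.columns[name] = append(col, &value)
		} else {
			t.columns[name] = append(col, nil)
		}
	}
	t.rows++
}

// columnNames returns the names of all columns created so far, sorted.
func (t *table) columnNames() []string {
	names := make([]string, 0, len(t.columns))
	for name := range t.columns {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	tbl := newTable()
	tbl.insert(map[string]string{"env": "production"})
	tbl.insert(map[string]string{"env": "production", "container_id": "abc123"})
	fmt.Println(tbl.columnNames())
}
```

The second insert introduces `container_id` without rewriting the existing `env` column, which is the property that matters for high-cardinality workloads.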

Why this is a good fit for metrics data

As you may be able to tell from the columnar diagram in the previous section, metrics workloads tend to have a lot of repeated values. For example, you may have a large number of pods in your production Kubernetes cluster that all have the label pair `env=production`. Ephemeral infrastructure like Kubernetes pods can also have high cardinality, with things like container IDs that churn when pods get rescheduled. High-cardinality and repetitive data sets lend themselves nicely to a columnar datastore with dynamic columns like FrostDB.

Introducing Epimetheus, a Prometheus fork that is backed by FrostDB

To demonstrate FrostDB's ability to store different types of observability data, we've built Epimetheus, a fork of Prometheus that is backed by FrostDB.

A small disclaimer about Epimetheus: this was built to showcase FrostDB and is not meant to be a replacement for Prometheus, nor are any claims about performance being made here. FrostDB is still very much in its early stages and has not had the time and rigor put into it that Prometheus' own TSDB has. Maybe one day we'll get there, but for now it's cool to see where FrostDB could be headed.

To store the metrics data that Prometheus ingests, we need to define a FrostDB schema with three columns: a column for the value of the metric, a column for the timestamp of the observation, and a dynamic column for all the labels that can be associated with an observation. The schema is defined as follows:

    dynparquet.NewSchema(
        "metrics_schema",
        []dynparquet.ColumnDefinition{{
            Name:          "labels",
            StorageLayout: parquet.Encoded(parquet.String(), &parquet.RLEDictionary),
            Dynamic:       true,
        }, {
            Name:          "timestamp",
            StorageLayout: parquet.Int(64),
            Dynamic:       false,
        }, {
            Name:          "value",
            StorageLayout: parquet.Leaf(parquet.DoubleType),
            Dynamic:       false,
        }},
        []dynparquet.SortingColumn{
            dynparquet.NullsFirst(dynparquet.Ascending("labels")),
            dynparquet.Ascending("timestamp"),
        },
    )


This schema uses the RLE dictionary encoding that we talked about in the first section to efficiently store the label columns. It also sorts by labels first, so that repeated label values end up next to each other, which makes RLE an even more efficient method of storage.
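To see why sort order matters for RLE, we can count the runs of equal adjacent values in a column before and after sorting. This is a small standalone sketch; `countRuns` is a helper written for this example and is not part of FrostDB.

```go
package main

import (
	"fmt"
	"sort"
)

// countRuns returns how many runs of equal adjacent values a column has.
// Fewer runs means run-length encoding needs fewer entries.
func countRuns(column []string) int {
	runs := 0
	for i, v := range column {
		if i == 0 || v != column[i-1] {
			runs++
		}
	}
	return runs
}

func main() {
	env := []string{
		"production", "staging", "production",
		"staging", "production", "staging",
	}
	fmt.Println("runs unsorted:", countRuns(env)) // prints "runs unsorted: 6"

	// Sort a copy so the original order stays available for comparison.
	sorted := append([]string(nil), env...)
	sort.Strings(sorted)
	fmt.Println("runs sorted:", countRuns(sorted)) // prints "runs sorted: 2"
}
```

The data is identical in both cases. Only the ordering changes, and the number of RLE entries drops from six to two.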

Now that we have our storage schema defined, we can implement the storage interface and write metrics to FrostDB. The next step is to be able to query this data out of FrostDB. We effectively need to implement three basic interfaces to support Prometheus queries against FrostDB. We can implement the `LabelNames` interface by performing a schema scan: since label names are the dynamic column names, we only need to read the schemas in the database to put together the full set of label names.

    err := engine.ScanTable("metrics").
        Project(logicalplan.DynCol("labels")).
        Filter(promMatchersToFrostDBExprs(matchers)).
        Execute(context.Background(), func(ar arrow.Record) error {
            defer ar.Release()
            for i := 0; i < int(ar.NumCols()); i++ {
                sets[ar.ColumnName(i)] = struct{}{}
            }
            return nil
        })

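Once the column names have been collected into a set, they still need to be turned into the sorted list of label names that Prometheus expects. A minimal sketch of that step might look like the following. The `labelNames` helper is hypothetical, and it assumes that dynamic columns are named `labels.<name>`, which is suggested by the `"labels."+name` expression used elsewhere in this post but is otherwise an assumption.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// labelNames turns a set of column names into a sorted list of label names.
// Assumption: dynamic label columns are named "labels.<name>". Columns
// without that prefix are skipped.
func labelNames(sets map[string]struct{}) []string {
	const prefix = "labels."
	names := make([]string, 0, len(sets))
	for col := range sets {
		if strings.HasPrefix(col, prefix) {
			names = append(names, strings.TrimPrefix(col, prefix))
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	sets := map[string]struct{}{
		"labels.env":      {},
		"labels.__name__": {},
		"labels.pod":      {},
	}
	fmt.Println(labelNames(sets))
}
```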

For the `LabelValues` query interface, we need to actually read the values of the label columns, so we use the `Distinct` query function and a table scan to read only the label columns from storage.

    err := engine.ScanTable("metrics").
        Filter(promMatchersToFrostDBExprs(matchers)).
        Distinct(logicalplan.Col("labels."+name)).
        Execute(context.Background(), func(ar arrow.Record) error {
            defer ar.Release()
            parseRecordIntoSeriesSet(ar, sets)
            return nil
        })


The last query interface we need to implement is the `Select` interface, which performs the actual querying of the data. For this, we need to select all the columns for the matching data from the database.

    err := engine.ScanTable("metrics").
        Filter(logicalplan.And(
            logicalplan.And(
                logicalplan.Col("timestamp").GT(logicalplan.Literal(hints.Start)),
                logicalplan.Col("timestamp").LT(logicalplan.Literal(hints.End)),
            ),
            promMatchersToFrostDBExprs(matchers),
        )).
        Project(
            logicalplan.DynCol("labels"),
            logicalplan.Col("timestamp"),
            logicalplan.Col("value"),
        ).
        Execute(context.Background(), func(ar arrow.Record) error {
            defer ar.Release()
            parseRecordIntoSeriesSet(ar, sets)
            return nil
        })

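The query above returns flat rows, while Prometheus works with series: one label set with its samples ordered by time. The sketch below shows the general shape of that regrouping in plain Go. It is not the Epimetheus implementation of `parseRecordIntoSeriesSet`; the `row`, `point`, `seriesKey` and `selectSeries` names are hypothetical, and the time filter simply mirrors the `GT`/`LT` bounds from the query.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// row is one flattened result row: a label set plus a single observation.
type row struct {
	Labels    map[string]string
	Timestamp int64
	Value     float64
}

// point is one observation within a series.
type point struct {
	Timestamp int64
	Value     float64
}

// seriesKey builds a stable identifier for a label set by joining the
// sorted name=value pairs. This is good enough for a toy example.
func seriesKey(labels map[string]string) string {
	pairs := make([]string, 0, len(labels))
	for k, v := range labels {
		pairs = append(pairs, k+"="+v)
	}
	sort.Strings(pairs)
	return strings.Join(pairs, ",")
}

// selectSeries keeps rows with start < timestamp < end and groups them
// into series by label set, with each series ordered by timestamp.
func selectSeries(rows []row, start, end int64) map[string][]point {
	series := map[string][]point{}
	for _, r := range rows {
		if r.Timestamp <= start || r.Timestamp >= end {
			continue
		}
		key := seriesKey(r.Labels)
		series[key] = append(series[key], point{r.Timestamp, r.Value})
	}
	for key := range series {
		pts := series[key]
		sort.Slice(pts, func(i, j int) bool { return pts[i].Timestamp < pts[j].Timestamp })
	}
	return series
}

func main() {
	rows := []row{
		{map[string]string{"env": "production"}, 10, 1},
		{map[string]string{"env": "staging"}, 10, 5},
		{map[string]string{"env": "production"}, 20, 2},
		{map[string]string{"env": "production"}, 30, 3},
	}
	for key, pts := range selectSeries(rows, 5, 30) {
		fmt.Println(key, pts)
	}
}
```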

Thanks to Prometheus' simple storage interface and FrostDB's flexible query engine, we are able to replace TSDB with FrostDB with very little effort.

You can find all the code required to implement Prometheus with FrostDB here.

We are excited to see what other use cases FrostDB might have in the observability and analytics space, and hope to see it become a solution for many projects in the future.
