Introduction
The time-series database InfluxDB v2 has been released in november 2020. A paper has been published about how to migrate from InfluxDB v1 to InfluxDB v2 - InfluxDB - Migration to version 2.
InfluxQL (Influx Query Language), the InfluxDB v 1.x SQL Like language, is still supported in InfluxDB v2 for backward compatibility, but only for InfluxDB 1.x users with authentication migrated to InfluxDB v2.
Flux is now the native language in InfluxDB v2. Flux language unifies queries and ETL processings (InfluxQL, TickScripts).
When we are used to SQL, moving from InfluxQL to Flux seems a little bit difficult, but not really. Bear in mind that Flux language brings features that InfluxQL could not cover (joins, pivots, external data sources…).
In this paper, a brief overview about migrating the existing InfluxQL queries to Flux.
Quick reminders
Measurements and series
In the time series database InfluxDB, the format of a point is the following :
measurement[,tag=value[,tag=value]] field=value[,field=value] [<timestamp>]
vpsmetrics,location=france,host=vpsfrsqlpac1 pcpu=49,mem=877 1580918550000000000 vpsmetrics,location=france,host=vpsfrsqlpac2 pcpu=22,mem=455 1580918550000000000
Series are the combination measurement/possible tag keys :
measurement, tag key1=value1, tag key2=value2 [,...]
vpsmetrics,location=france,host=vpsfrsqlpac1 vpsmetrics,location=france,host=vpsfrsqlpac2
- Data are overwritten if the point already exists.
- The server’s timestamp is used if omitted.
- Data types can be forced when writing the first point :
value=25i
for an integer datatype,status=t|f
for a boolean datatype.
v1 databases and retention policies, v2 buckets
A v1 database/retention policy is a bucket in version 2. An organization is mandatory and initialized at upgrade or at creation.
A first Flux query
Every Flux query starts with the from
clause selecting the bucket where the measurement is stored.
Flux | InfluxQL |
---|---|
|
|
Time range, range
The time range is then applied using the range
clause.
Time range can be absolute or relative, RFC3339 timestamps format is used for absolute times.
The operator |>
adds clauses/functions.
Flux | InfluxQL |
---|---|
|
|
Filtering, filter
Filtering on measurements, tag keys, field keys… are performed using filter
.
To filter on the measurement vpsmetrics
:
Flux | InfluxQL |
---|---|
|
|
Dot notation is possible :
|
Dot notation is used in the next sections. This choice depends on the coding preferences.
To add filters on tag keys (host
…) :
Flux | InfluxQL |
---|---|
|
|
To add filters on field keys (pcpu
…) :
Flux | InfluxQL |
---|---|
|
|
Getting results, yield
Use yield
to get the results, optionally with a name.
Flux | InfluxQL |
---|---|
|
|
Great, we get our first Flux query :
Regular expressions
Compared to InfluxQL, basic regular expressions are used in Flux queries in the same way. Nothing new, except the new package regexp in InfluxDB v2, not covered here as this paper only focuses on migrating InfluxQL code to Flux code.
Flux | InfluxQL |
---|---|
|
|
InfluxDB v2 output format
It’s important to understand the InfluxDB v2 raw data output format, different than the v1 output format. In the output below, filter is set only on the measurement name, filters on tag keys and on field keys are not applied.
|
table _start _stop _time _value _field _measurement host location
----- -------------------- --------------------- ------------------------------ ------ ------ ------------ ------------ --------
0 2021-02-05T00:00:00Z 2021-02-05T23:59:00Z 2021-02-05T03:00:06.446501067Z 1182 mem vpsmetrics vpsfrsqlpac1 france
0 2021-02-05T00:00:00Z 2021-02-05T23:59:00Z 2021-02-05T03:00:16.604175869Z 817 mem vpsmetrics vpsfrsqlpac1 france
…
1 2021-02-05T00:00:00Z 2021-02-05T23:59:00Z 2021-02-05T03:00:06.446501067Z 62 pcpu vpsmetrics vpsfrsqlpac1 france
1 2021-02-05T00:00:00Z 2021-02-05T23:59:00Z 2021-02-05T03:00:16.604175869Z 66 pcpu vpsmetrics vpsfrsqlpac1 france
…
2 2021-02-05T00:00:00Z 2021-02-05T23:59:00Z 2021-02-05T03:00:07.420674651Z 429 mem vpsmetrics vpsfrsqlpac2 france
2 2021-02-05T00:00:00Z 2021-02-05T23:59:00Z 2021-02-05T03:00:17.176860469Z 464 mem vpsmetrics vpsfrsqlpac2 france
…
3 2021-02-05T00:00:00Z 2021-02-05T23:59:00Z 2021-02-05T03:00:07.420674651Z 29 pcpu vpsmetrics vpsfrsqlpac2 france
3 2021-02-05T00:00:00Z 2021-02-05T23:59:00Z 2021-02-05T03:00:17.176860469Z 32 pcpu vpsmetrics vpsfrsqlpac2 france
- A table identifier is applied on each results set.
- Range time is fully described by the columns
_start
and_stop
. - The measurement name is in the column
_measurement
. - The field key and its value are respectively in the columns
_field
and_value
. - Tag keys columns are displayed at the end.
Columns can be removed using the drop
or keep
functions, we may
not want all the columns in the raw data ouput format :
|
|
Columns are renamed using rename
function :
|
Windowing data, aggregateWindow
The most important work when migrating InfluxQL queries to Flux syntax : the data windowing queries with aggregates. Time series databases are designed to this purpose.
The translation is quite easy with the function aggregateWindow
. The example below computes the
mean value with the mean
function,
the other aggregate functions are obviously available (count
, sum
…)
Flux | InfluxQL |
---|---|
|
|
… _time _value _field _measurement …
… -------------------- ----------------- --------- ------------
2021-02-05T06:00:00Z pcpu vpsmetrics
2021-02-05T07:00:00Z pcpu vpsmetrics
2021-02-05T08:00:00Z 61.70815450643776 pcpu vpsmetrics
2021-02-05T09:00:00Z 60.55806451612903 pcpu vpsmetrics
2021-02-05T10:00:00Z 60.01699716713881 pcpu vpsmetrics
…
To remove the NULL
values, the argument createEmpty
is set to FALSE
in the aggregateWindow
function.
Flux | InfluxQL |
---|---|
|
|
To apply a default value for empty data, the fill
function with the argument value
is called after windowing data :
Flux | InfluxQL |
---|---|
|
|
To apply the previous non null value for empty data, the fill
function with the argument usePrevious
is called after windowing data :
Flux | InfluxQL |
---|---|
|
|
Behind the scene, the aggregateWindow
function is not really a native function, this function
performs steps involving especially the window
and mean
functions :
|
|
The column _stop
is duplicated to re-add the column _time
removed by the mean
function call.
From subqueries
Subqueries are frequently used in InfluxQL to compute aggregates on "windowed" data. The translation to Flux language is quite straightforward for basic subqueries :
Flux | InfluxQL |
---|---|
|
|
When computed columns are created using InfluxQL subqueries, pivot
and map
Flux functions are combined,
syntax a little bit more complicated. Pivoting data is mandatory to align fields within each input table that have the same timestamp,
the map
function creates the computed column :
Flux | InfluxQL |
---|---|
|
|
Alternatively, the pivot
function can be replaced by the function schema.fieldsAsCols
.
schema.fieldsAsCols
is a special application of the pivot
function that pivots on
_field
and _time
columns to align fields.
Flux | InfluxQL |
---|---|
|
|
Copying data
In InfluxDB v1.x, SELECT INTO
statements are used to copy data from one measurement to another.
In InfluxDB v2, use the function to()
.
Notice in the below example the set
function to change the measurement name in order to copy the data into the same bucket.
Flux | InfluxQL |
---|---|
|
|
Translation tool, influx client transpile option
Update (March 9 2021) : the transpile
option will be deprecated in
next versions - GitHub Influxdata/Influxdb, fix(cmd/influx): delete unsupported influx transpile command
The influx
client transpile
option translates InfluxQL queries to Flux syntax,
not fully reliable but a good starting tool for beginners in Flux language.
Some adjustments to apply (range time…) and then testings :
$ influx transpile \ 'SELECT pcpu FROM netdatatsdb.autogen.vpsmetrics WHERE (time > now() -1d and host =~ /^vps/ and host !~ /(de|uk|us)/)'
package main from(bucket: "netdatatsdb/autogen") |> range(start: 2021-02-04T17:16:31.295656071Z, stop: 2262-04-11T23:47:16.854775806Z) |> filter(fn: (r) => (r._measurement == "vpsmetrics" and r._field == "pcpu")) |> filter(fn: (r) => (r["host"] =~ /^vps/ and r["host"] !~ /(de|uk|us)/)) |> group(columns: ["_measurement", "_start", "_stop", "_field"], mode: "by") |> keep(columns: ["_measurement", "_start", "_stop", "_field", "_time", "_value"]) |> rename(columns: {_value: "pcpu"}) |> yield(name: "0")
$ influx transpile \ 'SELECT mean(pcpu) FROM netdatatsdb.autogen.vpsmetrics WHERE time >= now() - 1w GROUP BY time(15m) FILL(none)'
package main from(bucket: "netdatatsdb/autogen") |> range(start: 2021-01-29T16:25:00.305159561Z, stop: 2021-02-05T16:25:00.305159561Z) |> filter(fn: (r) => (r._measurement == "vpsmetrics" and r._field == "pcpu")) |> group(columns: ["_measurement", "_start", "_stop", "_field"], mode: "by") |> keep(columns: ["_measurement", "_start", "_stop", "_field", "_time", "_value"]) |> window(every: 15m) |> mean() |> map(fn: (r) => ({r with _time: r._start})) |> window(every: inf) |> rename(columns: {_value: "mean"}) |> yield(name: "0")
Unfortunately, the transpile option does not translate subqueries, an error "unimplemented
" is raised :
$ influx transpile \ 'SELECT max(pctspace) FROM ( SELECT (used / (used + available)) * 100 AS pctspace FROM netdatatsdb.autogen.vps_space WHERE (host='vpsfrsqlpac1' AND time > now() -1d))'
Error: unimplemented: source must be a measurement
Conclusion
One may regret InfluxQL is replaced by Flux, a NoSQL language which seems syntaxically heavier when getting started but Flux language brings features which were not possible with InfluxQL (joins, pivot, SQL external datasources…), these advanced features will be covered in upcoming papers.
It just takes some time to get used to coding with Flux language when coming from SQL language.