Executing a pipeline aggregation
Elasticsearch lets you define aggregations that work on the results of other aggregations (for example, by comparing the results of two metric aggregations); these are called pipeline aggregations.
They are useful whenever you need to compute values from the output of other aggregations, such as statistics over per-bucket results.
Getting ready
You need an up and running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.
To execute the commands, any HTTP client can be used, such as cURL (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or similar. Using Kibana Console is recommended, as it provides code completion and better character escaping for Elasticsearch.
To correctly execute the following commands, you will need to create index-pipagg with the following command:
PUT /index-pipagg
{ "mappings": {
    "properties": {
      "type": { "type": "keyword" },
      "date": { "type": "date" } } } }
Then, populate it with some documents, as follows:
PUT /_bulk
{"index":{"_index":"index-pipagg"}}
{"date": "2022-01-01", "price": 200, "promoted": true, "rating": 1, "type": "hat"}
{"index":{"_index":"index-pipagg"}}
{"date": "2022-01-01", "price": 200, "promoted": true, "rating": 1, "type": "t-shirt"}
{"index":{"_index":"index-pipagg"}}
{"date": "2022-01-01", "price": 150, "promoted": true, "rating": 5, "type": "bag"}
{"index":{"_index":"index-pipagg"}}
{"date": "2022-02-01", "price": 50, "promoted": false, "rating": 1, "type": "hat"}
{"index":{"_index":"index-pipagg"}}
{"date": "2022-02-01", "price": 10, "promoted": true, "rating": 4, "type": "t-shirt"}
{"index":{"_index":"index-pipagg"}}
{"date": "2022-03-01", "price": 200, "promoted": true, "rating": 1, "type": "hat"}
{"index":{"_index":"index-pipagg"}}
{"date": "2022-03-01", "price": 175, "promoted": false, "rating": 2, "type": "t-shirt"}
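If you load the documents from a script rather than from Kibana Console, the bulk body has to be written as newline-delimited JSON: one action line followed by one source line per document, with a final newline at the end. The following is a minimal sketch in Python that only builds that payload; the function name build_bulk_body is hypothetical, and sending the request is left to whichever HTTP client you use.

```python
import json

# Sample documents from this recipe.
docs = [
    {"date": "2022-01-01", "price": 200, "promoted": True, "rating": 1, "type": "hat"},
    {"date": "2022-01-01", "price": 200, "promoted": True, "rating": 1, "type": "t-shirt"},
    {"date": "2022-01-01", "price": 150, "promoted": True, "rating": 5, "type": "bag"},
    {"date": "2022-02-01", "price": 50, "promoted": False, "rating": 1, "type": "hat"},
    {"date": "2022-02-01", "price": 10, "promoted": True, "rating": 4, "type": "t-shirt"},
    {"date": "2022-03-01", "price": 200, "promoted": True, "rating": 1, "type": "hat"},
    {"date": "2022-03-01", "price": 175, "promoted": False, "rating": 2, "type": "t-shirt"},
]


def build_bulk_body(index_name, documents):
    """Return an NDJSON string: one action line plus one source line per document."""
    lines = []
    for doc in documents:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc))
    # The bulk API expects the body to end with a newline character.
    return "\n".join(lines) + "\n"


bulk_body = build_bulk_body("index-pipagg", docs)
print(bulk_body, end="")
```

The resulting string can be sent as the body of the _bulk request with the Content-Type header set to application/x-ndjson.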
How to do it...
To execute a pipeline aggregation, we will perform the following steps:
- Execute a search that computes a composed aggregation: a date histogram divides the sales by month and, for every month, a sum aggregation computes the total price of the sales. To get statistics on these monthly sales, we add a stats_bucket pipeline aggregation and execute the following code:
POST /index-pipagg/_search?size=0
{ "aggs": {
    "sales_per_month": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "month" },
      "aggs": {
        "sales": { "sum": { "field": "price" } } } },
    "stats_monthly_sales": {
      "stats_bucket": {
        "buckets_path": "sales_per_month>sales" } } } }
The result returned by Elasticsearch, if everything is okay, should be as follows:
{ … truncated…
  "aggregations" : {
    "sales_per_month" : {
      "buckets" : [
        {"key_as_string" : "2022-01-01T00:00:00.000Z",
         "key" : 1640995200000, "doc_count" : 3,
         "sales" : { "value" : 550.0 } },
        {"key_as_string" : "2022-02-01T00:00:00.000Z",
         "key" : 1643673600000, "doc_count" : 2,
         "sales" : { "value" : 60.0 } },
        {"key_as_string" : "2022-03-01T00:00:00.000Z",
         "key" : 1646092800000, "doc_count" : 2,
         "sales" : { "value" : 375.0 } } ] },
    "stats_monthly_sales" : {
      "count" : 3, "min" : 60.0, "max" : 550.0,
      "avg" : 328.3333333333333, "sum" : 985.0 } } }
How it works...
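A pipeline aggregation computes its result from the output of another aggregation instead of from the documents themselves. You can think of it as a metric (see the Executing a stats aggregation recipe in this chapter) that works on the results of other aggregations. In the preceding example, the buckets_path value sales_per_month>sales tells stats_bucket to read the sales metric from every bucket of the sales_per_month aggregation.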
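To see where the numbers in the response come from, the following sketch reproduces the same computation in plain Python on the sample documents. It is only an illustration of the logic under simplifying assumptions (dates are bucketed by their year-month prefix); it is not how Elasticsearch implements the aggregations.

```python
from collections import defaultdict

# (date, price) pairs taken from the sample documents of this recipe.
docs = [
    ("2022-01-01", 200), ("2022-01-01", 200), ("2022-01-01", 150),
    ("2022-02-01", 50), ("2022-02-01", 10),
    ("2022-03-01", 200), ("2022-03-01", 175),
]

# Step 1: emulate date_histogram (calendar_interval = month) with a sum
# sub-aggregation by grouping on the "YYYY-MM" prefix of each date.
monthly_sales = defaultdict(float)
for date, price in docs:
    monthly_sales[date[:7]] += price

# Step 2: emulate stats_bucket, which reads the "sales" value of every bucket
# (the buckets_path "sales_per_month>sales") and summarizes those values.
values = [monthly_sales[month] for month in sorted(monthly_sales)]
stats = {
    "count": len(values),
    "min": min(values),
    "max": max(values),
    "avg": sum(values) / len(values),
    "sum": sum(values),
}

print(values)  # [550.0, 60.0, 375.0]
print(stats)
```

The statistics are computed over three values (one per month), not over the seven documents, which is why count is 3 in the response.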
The most commonly used types of pipeline aggregation are as follows:
- avg_bucket: Used to compute the average of a metric across the buckets of a sibling aggregation.
- derivative: Used to compute the derivative of a metric in a parent histogram aggregation.
- max_bucket: Used to compute the maximum of a metric across the buckets of a sibling aggregation.
- min_bucket: Used to compute the minimum of a metric across the buckets of a sibling aggregation.
- sum_bucket: Used to compute the sum of a metric across the buckets of a sibling aggregation.
- stats_bucket: Used to compute the statistics of a metric across the buckets of a sibling aggregation.
- extended_stats_bucket: Used to compute the extended statistics (such as variance and standard deviation) of a metric across the buckets of a sibling aggregation.
- percentiles_bucket: Used to compute the percentiles of a metric across the buckets of a sibling aggregation.
- moving_fn: A moving function, which is used to run a computation (for example, a moving average) over a sliding window of buckets in a parent histogram aggregation.
- cumulative_sum: Used to compute the cumulative sum of a metric in a parent histogram aggregation.
- bucket_script: Used to define a scripted operation between metrics of the parent aggregation's buckets. This is the most powerful one if you need to customize complex value computations between aggregation metrics.
- bucket_selector: Used to filter out buckets of the parent aggregation.
- bucket_sort: Used to sort the buckets of the parent aggregation.
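The difference between the types that return a single result next to the buckets (such as max_bucket) and the types that add a value to every bucket or change the bucket list (such as derivative, cumulative_sum, and a bucket filter) can be sketched in plain Python on the monthly sales of this recipe. This is only an emulation of the semantics, and the threshold of 100 used for the filter is a hypothetical value chosen for illustration.

```python
from itertools import accumulate

# Monthly sales as produced by the sales_per_month aggregation in this recipe.
monthly_sales = [("2022-01", 550.0), ("2022-02", 60.0), ("2022-03", 375.0)]
values = [value for _, value in monthly_sales]

# max_bucket style: a single result computed across all the buckets.
max_key, max_value = max(monthly_sales, key=lambda item: item[1])

# derivative style: the difference from the previous bucket;
# the first bucket has no previous bucket, so it has no value.
derivative = [None] + [cur - prev for prev, cur in zip(values, values[1:])]

# cumulative_sum style: a running total added to every bucket.
cumulative = list(accumulate(values))

# Bucket filter style: keep only the buckets that satisfy a condition
# (hypothetical threshold: monthly sales greater than 100).
selected = [(key, value) for key, value in monthly_sales if value > 100]

print(max_key, max_value)  # 2022-01 550.0
print(derivative)          # [None, -490.0, 315.0]
print(cumulative)          # [550.0, 610.0, 985.0]
print(selected)            # [('2022-01', 550.0), ('2022-03', 375.0)]
```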
Every pipeline aggregation type has additional parameters that control how its result is computed; the official online documentation covers all the corner cases of their usage.
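One example of such a parameter is gap_policy, which decides what happens when a bucket has no value for the metric: skip ignores the bucket, while insert_zeros treats the missing value as zero. The following Python sketch emulates the effect on an average across buckets; the function is illustrative only, and the series with a missing February value is hypothetical.

```python
def avg_bucket(values, gap_policy="skip"):
    """Emulate an average across buckets with a simplified gap policy."""
    if gap_policy == "skip":
        # Buckets without a value are ignored.
        usable = [v for v in values if v is not None]
    elif gap_policy == "insert_zeros":
        # Buckets without a value count as zero.
        usable = [0.0 if v is None else v for v in values]
    else:
        raise ValueError(f"unsupported gap_policy: {gap_policy}")
    return sum(usable) / len(usable) if usable else None


# Hypothetical series: the February bucket has no value for the metric.
monthly = [550.0, None, 375.0]

avg_skip = avg_bucket(monthly, "skip")           # average over two buckets
avg_zeros = avg_bucket(monthly, "insert_zeros")  # average over three buckets
print(avg_skip)  # 462.5
print(avg_zeros)
```

The choice matters for sparse data: with skip the average reflects only the months that have values, while insert_zeros pulls it down for every empty month.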
See also
The official Elasticsearch documentation on pipeline aggregations at https://www.elastic.co/guide/en/elasticsearch/reference/master/search-aggregations-pipeline.html