Elasticsearch 8.x Cookbook: Over 180 recipes to perform fast, scalable, and reliable searches for your enterprise

Product type Paperback

Published in May 2022

Publisher Packt

ISBN-13 9781801079815

Length 750 pages

Edition 5th Edition

Languages

Java

Tools

Elasticsearch

Concepts

Enterprise Search

Author (1):

Alberto Paro

Executing a pipeline aggregation

Elasticsearch allows you to define aggregations that operate on the results of other aggregations rather than directly on documents (for example, by comparing the results of two metric aggregations); these are pipeline aggregations.

They are useful whenever you need to compute a result from other aggregations, such as statistics over the values produced by each bucket.

Getting ready

You need an up-and-running Elasticsearch installation, as described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute the commands, any HTTP client can be used, such as cURL (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or similar. Using Kibana Console is recommended, as it provides code completion and better character escaping for Elasticsearch.

To correctly execute the following commands, you will need to create index-pipagg with the following command:

PUT /index-pipagg
{ "mappings": {
    "properties": {
      "type": { "type": "keyword" },
      "date": { "type": "date" } } } }

Then, populate it with some documents, as follows:

PUT /_bulk
{"index":{"_index":"index-pipagg"}}
{"date": "2022-01-01", "price": 200, "promoted": true, "rating": 1, "type": "hat"}
{"index":{"_index":"index-pipagg"}}
{"date": "2022-01-01", "price": 200, "promoted": true, "rating": 1, "type": "t-shirt"}
{"index":{"_index":"index-pipagg"}}
{"date": "2022-01-01", "price": 150, "promoted": true, "rating": 5, "type": "bag"}
{"index":{"_index":"index-pipagg"}}
{"date": "2022-02-01", "price": 50, "promoted": false, "rating": 1, "type": "hat"}
{"index":{"_index":"index-pipagg"}}
{"date": "2022-02-01", "price": 10, "promoted": true, "rating": 4, "type": "t-shirt"}
{"index":{"_index":"index-pipagg"}}
{"date": "2022-03-01", "price": 200, "promoted": true, "rating": 1, "type": "hat"}
{"index":{"_index":"index-pipagg"}}
{"date": "2022-03-01", "price": 175, "promoted": false, "rating": 2, "type": "t-shirt"}
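
The bulk body is newline-delimited JSON: one action line followed by one source line per document, and the last line must also end with a newline. As a minimal sketch (standard-library Python only; the helper name build_bulk_body is hypothetical and not part of any Elasticsearch client), the payload above could be generated from a list of documents like this:

```python
import json

# The seven sample documents used in this recipe.
DOCS = [
    {"date": "2022-01-01", "price": 200, "promoted": True, "rating": 1, "type": "hat"},
    {"date": "2022-01-01", "price": 200, "promoted": True, "rating": 1, "type": "t-shirt"},
    {"date": "2022-01-01", "price": 150, "promoted": True, "rating": 5, "type": "bag"},
    {"date": "2022-02-01", "price": 50, "promoted": False, "rating": 1, "type": "hat"},
    {"date": "2022-02-01", "price": 10, "promoted": True, "rating": 4, "type": "t-shirt"},
    {"date": "2022-03-01", "price": 200, "promoted": True, "rating": 1, "type": "hat"},
    {"date": "2022-03-01", "price": 175, "promoted": False, "rating": 2, "type": "t-shirt"},
]


def build_bulk_body(index, docs):
    """Return an NDJSON string: an action line, then a source line, per document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    # The bulk endpoint expects the final line to be newline-terminated too.
    return "\n".join(lines) + "\n"


body = build_bulk_body("index-pipagg", DOCS)
print(body, end="")
```

The resulting string can be sent as the request body of the _bulk call with any HTTP client, typically with the Content-Type header set to application/x-ndjson.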

How to do it...

To execute a pipeline aggregation, we will perform the following steps:

Execute a search that groups the sales by month and, for every month, sums the price field to obtain the monthly sales total. A stats_bucket pipeline aggregation then computes statistics over those monthly totals. To do this, we will execute the following code:

POST /index-pipagg/_search?size=0
{ "aggs" : {
        "sales_per_month" : {
            "date_histogram" : {
                "field" : "date",
                "calendar_interval" : "month"
            },
            "aggs": {
                "sales": {
                    "sum": { "field": "price" }}}},
        "stats_monthly_sales": {
            "stats_bucket": {
                "buckets_path": "sales_per_month>sales" } } } }

The result returned by Elasticsearch, if everything is okay, should be as follows:

{ … truncated…
  "aggregations" : {
    "sales_per_month" : {
      "buckets" : [
        {"key_as_string" : "2022-01-01T00:00:00.000Z",
          "key" : 1640995200000, "doc_count" : 3,
          "sales" : { "value" : 550.0 } },
        {"key_as_string" : "2022-02-01T00:00:00.000Z",
          "key" : 1643673600000, "doc_count" : 2,
          "sales" : { "value" : 60.0 } },
        {"key_as_string" : "2022-03-01T00:00:00.000Z",
          "key" : 1646092800000, "doc_count" : 2,
          "sales" : { "value" : 375.0 } } ] },
    "stats_monthly_sales" : {
      "count" : 3, "min" : 60.0, "max" : 550.0,
      "avg" : 328.3333333333333, "sum" : 985.0 } } }
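
The numbers in this response can be checked by hand. The following is a minimal sketch in plain Python that re-computes them locally from the sample documents; it does not call Elasticsearch, and the function names monthly_sales and stats_bucket are only illustrative stand-ins for the date_histogram plus sum step and the stats_bucket step:

```python
from collections import defaultdict

# (date, price) pairs copied from the documents indexed in "Getting ready".
SALES = [
    ("2022-01-01", 200), ("2022-01-01", 200), ("2022-01-01", 150),
    ("2022-02-01", 50), ("2022-02-01", 10),
    ("2022-03-01", 200), ("2022-03-01", 175),
]


def monthly_sales(sales):
    """Mimic a monthly date_histogram with a sum sub-aggregation on price."""
    totals = defaultdict(float)
    for date, price in sales:
        totals[date[:7]] += price  # "YYYY-MM" acts as the month bucket key
    return dict(sorted(totals.items()))


def stats_bucket(values):
    """Mimic stats_bucket: statistics over one metric value per bucket."""
    values = list(values)
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": sum(values) / len(values),
        "sum": sum(values),
    }


per_month = monthly_sales(SALES)
stats = stats_bucket(per_month.values())
print(per_month)  # {'2022-01': 550.0, '2022-02': 60.0, '2022-03': 375.0}
print(stats)
```

Note that the statistics are taken over the three monthly totals, not over the seven documents, which is why count is 3.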

How it works...

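A pipeline aggregation computes its result from the output of another aggregation instead of from documents. You can think of it as a metric (see the Executing a stats aggregation recipe in this chapter) that works on the results of other aggregations. The buckets_path parameter names the input: in sales_per_month>sales, the > separator leads from the sales_per_month bucket aggregation to the sales metric inside each of its buckets.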
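
To make the buckets_path lookup concrete, here is a minimal Python sketch that resolves the path used in this recipe against a trimmed copy of the response. The function resolve_buckets_path is a hypothetical helper written for illustration; it handles only the simple "bucket aggregation > single-value metric" form, whereas real paths can be deeper:

```python
# Trimmed copy of the "aggregations" section of the response shown earlier.
RESPONSE_AGGS = {
    "sales_per_month": {
        "buckets": [
            {"key_as_string": "2022-01-01T00:00:00.000Z", "doc_count": 3,
             "sales": {"value": 550.0}},
            {"key_as_string": "2022-02-01T00:00:00.000Z", "doc_count": 2,
             "sales": {"value": 60.0}},
            {"key_as_string": "2022-03-01T00:00:00.000Z", "doc_count": 2,
             "sales": {"value": 375.0}},
        ]
    }
}


def resolve_buckets_path(aggs, path):
    """Return one metric value per bucket for a path like 'agg>metric'.

    Simplified on purpose: exactly one multi-bucket aggregation followed
    by one single-value metric.
    """
    bucket_agg, metric = path.split(">")
    return [bucket[metric]["value"] for bucket in aggs[bucket_agg]["buckets"]]


values = resolve_buckets_path(RESPONSE_AGGS, "sales_per_month>sales")
print(values)  # [550.0, 60.0, 375.0]
```

These three values are the input that stats_bucket summarizes.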

The most commonly used types of pipeline aggregation are as follows:

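avg_bucket: Used to compute the average of a metric across the buckets of a sibling aggregation.
derivative: Used to compute the derivative of a metric in a parent histogram (or date histogram) aggregation.
max_bucket: Used to compute the maximum of a metric across the buckets of a sibling aggregation.
min_bucket: Used to compute the minimum of a metric across the buckets of a sibling aggregation.
sum_bucket: Used to compute the sum of a metric across the buckets of a sibling aggregation.
stats_bucket: Used to compute the statistics (count, min, max, avg, and sum) of a metric across the buckets of a sibling aggregation.
extended_stats_bucket: Used to compute extended statistics (such as variance and standard deviation) of a metric across the buckets of a sibling aggregation.
percentiles_bucket: Used to compute the percentiles of a metric across the buckets of a sibling aggregation.
moving_fn: A moving function, which slides a window over the buckets of a parent histogram aggregation and runs a custom script on each window (for example, to compute a moving average).
cumulative_sum: Used to compute the cumulative sum of a metric in a parent histogram (or date histogram) aggregation.
bucket_script: Used to run a script that combines the metrics of each bucket in a parent multi-bucket aggregation. This is the most powerful one if you need to customize complex value computations between aggregation metrics.
bucket_selector: Used to filter the buckets of a parent multi-bucket aggregation with a script.
bucket_sort: Used to sort the buckets of a parent multi-bucket aggregation.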
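
Unlike stats_bucket, which sits next to the aggregation it reads, types such as derivative, cumulative_sum, and bucket_script are nested inside the histogram and refer to a metric of the same bucket by name. The following Python sketch shows a possible request body for this recipe's index and then re-computes the expected per-bucket values locally, without contacting Elasticsearch. The aggregation names sales_change, running_total, and avg_ticket are invented for this example, and the script assumes the special _count path for the bucket's document count:

```python
import json

# Hypothetical request body: the recipe's date_histogram and sum, plus three
# pipeline aggregations nested inside the histogram.
REQUEST = {
    "aggs": {
        "sales_per_month": {
            "date_histogram": {"field": "date", "calendar_interval": "month"},
            "aggs": {
                "sales": {"sum": {"field": "price"}},
                "sales_change": {"derivative": {"buckets_path": "sales"}},
                "running_total": {"cumulative_sum": {"buckets_path": "sales"}},
                "avg_ticket": {
                    "bucket_script": {
                        "buckets_path": {"total": "sales", "docs": "_count"},
                        "script": "params.total / params.docs",
                    }
                },
            },
        }
    }
}

# Monthly totals and document counts taken from the response in this recipe.
sales = [550.0, 60.0, 375.0]
doc_counts = [3, 2, 2]

# derivative: difference with the previous bucket; the first bucket has none.
derivative = [None] + [cur - prev for prev, cur in zip(sales, sales[1:])]

# cumulative_sum: running total across the buckets.
running_total = []
total = 0.0
for value in sales:
    total += value
    running_total.append(total)

# bucket_script: per-bucket arithmetic between metrics (sum / doc count).
avg_ticket = [s / n for s, n in zip(sales, doc_counts)]

print(json.dumps(REQUEST, indent=2))
print(derivative)     # [None, -490.0, 315.0]
print(running_total)  # [550.0, 610.0, 985.0]
print(avg_ticket)
```

The printed JSON can be pasted after POST /index-pipagg/_search?size=0 in Kibana Console to compare the real response with the locally computed lists.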

Every type of pipeline aggregation has additional parameters that control how its result is computed; the official online documentation covers all the corner cases of their usage.