Elasticsearch 8.x Cookbook

You're reading from Elasticsearch 8.x Cookbook: Over 180 recipes to perform fast, scalable, and reliable searches for your enterprise

Product type: Paperback
Published: May 2022
Publisher: Packt
ISBN-13: 9781801079815
Length: 750 pages
Edition: 5th Edition
Author: Alberto Paro

Executing a pipeline aggregation

Elasticsearch allows you to define aggregations that operate on the results of other aggregations (for example, by comparing the results of two metric aggregations); these are pipeline aggregations.

They are useful whenever you need to derive values from the output of other aggregations, such as computing statistics across the buckets of a histogram.

Getting ready

You need a running Elasticsearch installation, as described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute the commands, any HTTP client can be used, such as cURL (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or similar. Using Kibana Console is recommended, as it provides code completion and better character escaping for Elasticsearch.

To run the commands in this recipe, first create the index-pipagg index with the following command:

PUT /index-pipagg
{ "mappings": {
    "properties": {
      "type": { "type": "keyword" },
      "date": { "type": "date" } } } }

Then, populate it with some documents, as follows:

PUT /_bulk
{"index":{"_index":"index-pipagg"}}
{"date": "2022-01-01", "price": 200, "promoted": true, "rating": 1, "type": "hat"}
{"index":{"_index":"index-pipagg"}}
{"date": "2022-01-01", "price": 200, "promoted": true, "rating": 1, "type": "t-shirt"}
{"index":{"_index":"index-pipagg"}}
{"date": "2022-01-01", "price": 150, "promoted": true, "rating": 5, "type": "bag"}
{"index":{"_index":"index-pipagg"}}
{"date": "2022-02-01", "price": 50, "promoted": false, "rating": 1, "type": "hat"}
{"index":{"_index":"index-pipagg"}}
{"date": "2022-02-01", "price": 10, "promoted": true, "rating": 4, "type": "t-shirt"}
{"index":{"_index":"index-pipagg"}}
{"date": "2022-03-01", "price": 200, "promoted": true, "rating": 1, "type": "hat"}
{"index":{"_index":"index-pipagg"}}
{"date": "2022-03-01", "price": 175, "promoted": false, "rating": 2, "type": "t-shirt"}

How to do it...

To execute a pipeline aggregation, we will perform the following steps:

  1. Execute a search that splits the sales into monthly buckets with a date_histogram aggregation and sums the price in each bucket. A stats_bucket pipeline aggregation then computes statistics across those monthly sums:
    POST /index-pipagg/_search?size=0
    { "aggs" : {
            "sales_per_month" : {
                "date_histogram" : {
                    "field" : "date",
                    "calendar_interval" : "month"
                },
                "aggs": {
                    "sales": {
                        "sum": { "field": "price" }}}},
            "stats_monthly_sales": {
                "stats_bucket": {
                    "buckets_path": "sales_per_month>sales" } } } }
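
The same request can also be sent from outside Kibana Console. The following is a minimal Python sketch that uses only the standard library. It assumes a local node at http://localhost:9200 with security disabled; both are assumptions and not part of the recipe, and the run_search helper is a hypothetical name introduced for illustration.

```python
import json
import urllib.request

# Assumption: a local, unsecured node. Adjust the URL and add
# authentication/TLS settings to match your own installation.
ES_URL = "http://localhost:9200"

# Same body as the Console request in step 1.
body = {
    "aggs": {
        "sales_per_month": {
            "date_histogram": {"field": "date", "calendar_interval": "month"},
            "aggs": {"sales": {"sum": {"field": "price"}}},
        },
        "stats_monthly_sales": {
            "stats_bucket": {"buckets_path": "sales_per_month>sales"}
        },
    }
}


def run_search(index: str = "index-pipagg") -> dict:
    """Hypothetical helper: POST the body to _search and return the parsed JSON."""
    request = urllib.request.Request(
        f"{ES_URL}/{index}/_search?size=0",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Print the body so it can be compared with the Console request.
print(json.dumps(body, indent=2))
```

Calling run_search() against the populated index should return the same aggregations section that Kibana Console shows for the request.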

The result returned by Elasticsearch, if everything is okay, should be as follows:

{ … truncated…
  "aggregations" : {
    "sales_per_month" : {
      "buckets" : [
        {"key_as_string" : "2022-01-01T00:00:00.000Z",
          "key" : 1640995200000, "doc_count" : 3,
          "sales" : { "value" : 550.0 } },
        {"key_as_string" : "2022-02-01T00:00:00.000Z",
          "key" : 1643673600000, "doc_count" : 2,
          "sales" : { "value" : 60.0 } },
        {"key_as_string" : "2022-03-01T00:00:00.000Z",
          "key" : 1646092800000, "doc_count" : 2,
          "sales" : { "value" : 375.0 } } ] },
    "stats_monthly_sales" : {
      "count" : 3, "min" : 60.0, "max" : 550.0,
      "avg" : 328.3333333333333, "sum" : 985.0 } } }
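
To check these numbers by hand, the following Python sketch recomputes them from the sample documents without contacting Elasticsearch. It only mirrors the arithmetic of the two aggregations; it is not how Elasticsearch computes them internally.

```python
from collections import defaultdict

# (date, price) pairs copied from the bulk request in the Getting ready section.
docs = [
    ("2022-01-01", 200), ("2022-01-01", 200), ("2022-01-01", 150),
    ("2022-02-01", 50), ("2022-02-01", 10),
    ("2022-03-01", 200), ("2022-03-01", 175),
]

# Equivalent of the date_histogram (one bucket per month) with a sum sub-aggregation.
sales_per_month = defaultdict(float)
for date, price in docs:
    sales_per_month[date[:7]] += price  # "2022-01-01" -> "2022-01"

# Equivalent of stats_bucket over the monthly sums.
values = list(sales_per_month.values())
stats = {
    "count": len(values),
    "min": min(values),
    "max": max(values),
    "avg": sum(values) / len(values),
    "sum": sum(values),
}

print(dict(sales_per_month))
print(stats)
```

The monthly sums come out as 550.0, 60.0, and 375.0, which matches the sales values in the buckets, and the statistics over those three values match stats_monthly_sales.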

How it works...

A pipeline aggregation computes its result from the output of other aggregations rather than from the documents themselves. You can think of it as a metric aggregation (see the Executing a stats aggregation recipe in this chapter) that works on the results of other aggregations.
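
The buckets_path parameter tells a pipeline aggregation where to read its input. In sales_per_month>sales, the > character separates the bucket aggregation from the metric nested inside it. The following Python sketch mimics that lookup on a trimmed copy of the response; resolve_buckets_path is a hypothetical helper that handles only this simple two-part form, not the full buckets_path syntax.

```python
# Trimmed copy of the response shown in the recipe.
response = {
    "aggregations": {
        "sales_per_month": {
            "buckets": [
                {"key_as_string": "2022-01-01T00:00:00.000Z", "sales": {"value": 550.0}},
                {"key_as_string": "2022-02-01T00:00:00.000Z", "sales": {"value": 60.0}},
                {"key_as_string": "2022-03-01T00:00:00.000Z", "sales": {"value": 375.0}},
            ]
        }
    }
}


def resolve_buckets_path(aggregations: dict, path: str) -> list:
    """Hypothetical helper: collect one metric value per bucket for an 'agg>metric' path."""
    bucket_agg, metric = path.split(">")
    return [bucket[metric]["value"] for bucket in aggregations[bucket_agg]["buckets"]]


# These are the values that stats_bucket summarizes in the recipe.
values = resolve_buckets_path(response["aggregations"], "sales_per_month>sales")
print(values)
```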

The most commonly used types of pipeline aggregation are as follows:

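  • avg_bucket: Computes the average of a metric across the buckets of a sibling aggregation.
  • derivative: Computes the derivative of a metric across the buckets of a parent histogram or date histogram aggregation.
  • max_bucket: Computes the maximum of a metric across the buckets of a sibling aggregation.
  • min_bucket: Computes the minimum of a metric across the buckets of a sibling aggregation.
  • sum_bucket: Computes the sum of a metric across the buckets of a sibling aggregation.
  • stats_bucket: Computes statistics (count, min, max, avg, and sum) of a metric across the buckets of a sibling aggregation.
  • extended_stats_bucket: Computes extended statistics, such as the variance and standard deviation, of a metric across the buckets of a sibling aggregation.
  • percentiles_bucket: Computes percentiles of a metric across the buckets of a sibling aggregation.
  • moving_fn: Slides a window across the buckets of a parent histogram aggregation and applies a script-defined function (for example, a moving average) to the values in each window.
  • cumulative_sum: Computes the running total of a metric across the buckets of a parent histogram or date histogram aggregation.
  • bucket_script: Runs a script for every bucket of the parent multi-bucket aggregation to compute a new value from that bucket's metrics. This is the most powerful one if you need custom computations between aggregation metrics.
  • bucket_selector: Runs a script that decides which buckets of the parent multi-bucket aggregation are kept in the result.
  • bucket_sort: Sorts the buckets of the parent multi-bucket aggregation and can truncate the result.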
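
Unlike stats_bucket, which produces a single result next to the bucket aggregation, derivative and cumulative_sum add a value to each bucket. The following Python sketch reproduces their arithmetic on the monthly sums from this recipe; it is a local illustration only and does not call Elasticsearch.

```python
# Monthly sums taken from the sales_per_month buckets in the recipe's response.
monthly_sales = [("2022-01", 550.0), ("2022-02", 60.0), ("2022-03", 375.0)]

# derivative: difference between each bucket and the previous one.
# The first bucket has no predecessor, so it gets no value (None here).
derivative = [None] + [
    current - previous
    for (_, previous), (_, current) in zip(monthly_sales, monthly_sales[1:])
]

# cumulative_sum: running total across the buckets, in order.
cumulative_sum = []
running_total = 0.0
for _, value in monthly_sales:
    running_total += value
    cumulative_sum.append(running_total)

for (month, value), delta, total in zip(monthly_sales, derivative, cumulative_sum):
    print(month, value, delta, total)
```

The last running total, 985.0, equals the sum reported by stats_monthly_sales in the recipe.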

Every pipeline aggregation type has additional parameters that control how its result is computed; the official online documentation covers the corner cases of their usage.

See also

The official Elasticsearch documentation on pipeline aggregations at https://www.elastic.co/guide/en/elasticsearch/reference/master/search-aggregations-pipeline.html
