How does a composite aggregation work in Elasticsearch?

A composite aggregation in Elasticsearch builds buckets from combinations of values taken from one or more sources, such as plain field values or date histogram intervals. It then lets you page through all of those buckets efficiently. Here’s how it works:

1. The request defines the sources to group by: The composite aggregation request lists one or more “sources”. Each source names a field, or a value derived from a field such as a daily date histogram bucket. The key of every resulting bucket is the combination of values from all sources, for example one product on one day.

2. Elasticsearch returns the first page of results: The response contains up to “size” buckets (10 by default), sorted by their composite keys. It also includes an “after_key” that identifies the last bucket on the page. The page size is set with the “size” parameter of the composite aggregation.

3. The client requests subsequent pages of results: The client copies the “after_key” into the “after” parameter of the next request. Elasticsearch then returns the buckets whose keys sort strictly after that key, together with a new “after_key”. The client repeats this until a response contains no buckets.

4. Elasticsearch computes sub-aggregations for each bucket: A composite aggregation can also contain sub-aggregations. These are declared inside the composite aggregation in the request and can include statistics such as sums or averages, histograms, or other metrics. They are not a separate pass: their results are calculated for every bucket on the current page and returned alongside it.

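5. The client assembles the complete set of results: Elasticsearch never returns all groups in a single response. Once a request comes back with no buckets, every group has been visited. The client then combines the pages it collected, including the groupings and their sub-aggregation results, into the full result.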
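To make the paging mechanics concrete, here is a toy in-memory model of the steps above. It is a sketch of the observable behaviour only, not how Elasticsearch computes buckets internally. The model:

- groups documents by a tuple of source values;
- orders buckets by that key;
- returns the buckets that sort strictly after an optional “after” key;
- reports an “after_key” for the page.

The helper names composite_page and all_buckets, the “qty” field, and the “qty_sum” sub-aggregation name are hypothetical.

```python
from collections import defaultdict
from typing import Optional


def composite_page(docs: list[dict], fields: list[str], size: int,
                   after: Optional[dict] = None,
                   sum_field: Optional[str] = None) -> dict:
    """Toy model: one page of composite-style buckets from in-memory docs."""
    counts: dict = defaultdict(lambda: [0, 0])  # key tuple -> [doc_count, sum]
    for doc in docs:
        # By default a composite aggregation skips documents that have no
        # value for one of its sources; mimic that here.
        if any(f not in doc for f in fields):
            continue
        key = tuple(doc[f] for f in fields)
        counts[key][0] += 1
        if sum_field is not None:
            counts[key][1] += doc.get(sum_field, 0)

    # Buckets are ordered by their composite key, source by source (ascending).
    ordered = sorted(counts)
    if after is not None:
        # Only keys strictly after the previous page's after_key are eligible.
        after_tuple = tuple(after[f] for f in fields)
        ordered = [k for k in ordered if k > after_tuple]

    buckets = []
    for key in ordered[:size]:
        bucket = {"key": dict(zip(fields, key)), "doc_count": counts[key][0]}
        if sum_field is not None:
            # Per-bucket sub-aggregation result, shaped like a "sum" metric.
            bucket[f"{sum_field}_sum"] = {"value": counts[key][1]}
        buckets.append(bucket)

    page = {"buckets": buckets}
    if buckets:
        page["after_key"] = buckets[-1]["key"]
    return page


def all_buckets(docs, fields, size, sum_field=None):
    """Client side: follow after_key until a page comes back empty."""
    collected, after = [], None
    while True:
        page = composite_page(docs, fields, size, after, sum_field)
        if not page["buckets"]:
            return collected
        collected.extend(page["buckets"])
        after = page["after_key"]


orders = [
    {"product": "mug", "day": "2024-01-02", "qty": 1},
    {"product": "cap", "day": "2024-01-01", "qty": 2},
    {"product": "mug", "day": "2024-01-01", "qty": 3},
    {"product": "cap", "day": "2024-01-01", "qty": 1},
    {"product": "mug", "day": "2024-01-02", "qty": 4},
    {"product": "pen"},  # no "day" value, so it lands in no bucket
]
first = composite_page(orders, ["product", "day"], size=2)
print(first["after_key"])  # {'product': 'mug', 'day': '2024-01-01'}
```

The “after” key acts as a cursor into a sorted key space. Because of this, each page can be computed without remembering earlier pages, and the client alone decides when to stop.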

For example, suppose you have an index of customer orders. Each document has a “product” field for the product ordered and a “timestamp” field for the date and time the order was placed. You could define a composite aggregation with two sources: a “terms” source on the “product” field and a “date_histogram” source on the “timestamp” field with a one-day interval. This groups the orders by product and day. Elasticsearch would then return the first page of buckets, along with an “after_key” identifying the last bucket on the page. You would pass this key as “after” to request the next page, and continue until a page comes back empty.
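The orders example could be expressed as a Query DSL body built in Python, together with a loop that follows “after_key”. This is a sketch under a few assumptions:

- “product” is mapped as a keyword field.
- The aggregation name by_product_day and the helper names are hypothetical.
- The search callable takes a request body and returns the parsed JSON response, so the loop does not depend on any particular client.

With the official Python client (8.x), that callable might be something like lambda body: es.search(index="orders", **body), where the index name is also hypothetical.

```python
from typing import Any, Callable, Optional

AGG_NAME = "by_product_day"  # hypothetical aggregation name


def orders_by_product_and_day(page_size: int = 100,
                              after: Optional[dict] = None) -> dict:
    """Build a search body grouping orders by product and calendar day."""
    composite: dict[str, Any] = {
        "size": page_size,
        "sources": [
            # Assumes "product" is mapped as a keyword field.
            {"product": {"terms": {"field": "product"}}},
            {"day": {"date_histogram": {"field": "timestamp",
                                        "calendar_interval": "1d"}}},
        ],
    }
    if after is not None:
        # Resume after the last bucket of the previous page.
        composite["after"] = after
    return {
        "size": 0,  # only the aggregation is needed, not the matching hits
        "aggs": {AGG_NAME: {"composite": composite}},
    }


def fetch_all_buckets(search: Callable[[dict], dict],
                      page_size: int = 100) -> list[dict]:
    """Page through every bucket; `search` sends a body and returns the response."""
    collected: list[dict] = []
    after: Optional[dict] = None
    while True:
        response = search(orders_by_product_and_day(page_size, after))
        agg = response["aggregations"][AGG_NAME]
        if not agg["buckets"]:
            return collected  # an empty page means every bucket was seen
        collected.extend(agg["buckets"])
        after = agg.get("after_key")
        if after is None:
            return collected
```

Stopping on an empty page, rather than on a page shorter than page_size, keeps the loop correct even if a short page is not the last one.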

Composite aggregations are well suited to walking through every bucket of a large, multi-field grouping page by page, while computing detailed metrics for each group along the way. They are especially useful when there are too many groups to return in a single response. Buckets are ordered by their keys rather than by document count or metric value, so composite aggregations are less suited to “top N” style queries.