8.3 하위 - sub-aggregations

Bucket Aggregation 으로 만든 버킷들 내부에 다시 "aggs" : { } 를 선언해서 또다른 버킷을 만들거나 Metrics Aggregation 을 만들어 사용이 가능합니다. 다음은 terms aggregation을 이용해서 생성한 stations 버킷 별로 avg aggregation을 이용해서 passangers 필드의 평균값을 계산하는 avg_psg_per_st 을 생성하는 예제입니다.

terms aggs 아래에 avg aggs 사용

GET my_stations/_search
{
  "size": 0,
  "aggs": {
    "stations": {
      "terms": {
        "field": "station.keyword"
      },
      "aggs": {
        "avg_psg_per_st": {
          "avg": {
            "field": "passangers"
          }
        }
      }
    }
  }
}

terms aggs 아래에 avg aggs 사용 결과

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "stations" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "강남",
          "doc_count" : 5,
          "avg_psg_per_st" : {
            "value" : 5931.2
          }
        },
        {
          "key" : "불광",
          "doc_count" : 1,
          "avg_psg_per_st" : {
            "value" : 971.0
          }
        },
        {
          "key" : "신촌",
          "doc_count" : 1,
          "avg_psg_per_st" : {
            "value" : 3912.0
          }
        },
        {
          "key" : "양재",
          "doc_count" : 1,
          "avg_psg_per_st" : {
            "value" : 4121.0
          }
        },
        {
          "key" : "종각",
          "doc_count" : 1,
          "avg_psg_per_st" : {
            "value" : 2314.0
          }
        },
        {
          "key" : "홍제",
          "doc_count" : 1,
          "avg_psg_per_st" : {
            "value" : 1021.0
          }
        }
      ]
    }
  }
}

stations 버킷들 별로 avg_psg_per_st 라는 aggregation이 실행된 것을 확인할 수 있습니다. 버킷 안에 또 다른 하위 버킷을 만드는 것도 가능합니다. 다음은 terms aggregation 을 이용해서 line.keyword 별로 lines 버킷을 만들고 그 안에 또다시 terms aggregation을 이용한 stations_per_lines 버킷을 만든 예제입니다.

terms aggs 아래에 하위 terms aggs 사용

GET my_stations/_search
{
  "size": 0,
  "aggs": {
    "lines": {
      "terms": {
        "field": "line.keyword"
      },
      "aggs": {
        "stations_per_lines": {
          "terms": {
            "field": "station.keyword"
          }
        }
      }
    }
  }
}

terms aggs 아래에 하위 terms aggs 사용 결과

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "lines" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "2호선",
          "doc_count" : 6,
          "stations_per_lines" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "강남",
                "doc_count" : 5
              },
              {
                "key" : "신촌",
                "doc_count" : 1
              }
            ]
          }
        },
        {
          "key" : "3호선",
          "doc_count" : 3,
          "stations_per_lines" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "불광",
                "doc_count" : 1
              },
              {
                "key" : "양재",
                "doc_count" : 1
              },
              {
                "key" : "홍제",
                "doc_count" : 1
              }
            ]
          }
        },
        {
          "key" : "1호선",
          "doc_count" : 1,
          "stations_per_lines" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "종각",
                "doc_count" : 1
              }
            ]
          }
        }
      ]
    }
  }
}

각 호선별로 역 버킷이 생성된 것을 확인할 수 있습니다. 두 번째 하위 aggregations 안에 또 다른 metrics aggregations를 입력하는 것도 가능합니다.

하위 버킷이 깊어질수록 elasticsearch 가 하는 작업량과 메모리 소모량이 기하급수적으로 늘어나기 때문에 예상치 못한 오류를 발생 시킬수도 있습니다. 보통은 2레벨의 깊이 이상의 버킷은 생성하지 않는 것이 좋습니다.

Previous8.2 버킷 - Bucket Aggregations Next8.4 파이프라인 - Pipeline Aggregations

Last updated 6 years ago

Was this helpful?