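An Analysis of the Kibana Alerting Implementation

This article analyzes how Kibana Alerting is implemented, focusing on how an alert's configuration maps to Elasticsearch queries.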

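Alerting Configuration

Basic Information

Name: the name of the alert.
Tags: optional.
Check every N second/minute/hour/day: how often the alert condition is checked. This is probably a fixed rate (runs start at a fixed frequency whether or not the previous run has finished, so runs may overlap) rather than a fixed delay (the next run starts a fixed time after the previous run finishes).
Notify every N second/minute/hour/day: how often a notification is sent while the alert is active.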
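
To make the difference between the two scheduling policies concrete, here is a minimal sketch that computes when each run would start under either policy. It does not reflect Kibana's actual scheduler; the function names and the run durations are hypothetical and chosen only for illustration.

```python
def fixed_rate_starts(interval, durations):
    """Start times when each run is scheduled `interval` after the previous start."""
    return [i * interval for i in range(len(durations))]


def fixed_delay_starts(interval, durations):
    """Start times when each run waits `interval` after the previous run finishes."""
    starts, now = [], 0
    for duration in durations:
        starts.append(now)
        now += duration + interval
    return starts


# Hypothetical run durations in seconds, with a check interval of 10 seconds.
durations = [2, 15, 3]
print(fixed_rate_starts(10, durations))   # [0, 10, 20]
print(fixed_delay_starts(10, durations))  # [0, 12, 37]
```

Under the fixed rate policy the second run starts at 10 and takes 15 seconds, so it is still running when the third run starts at 20. Under the fixed delay policy runs can never overlap, but the effective check frequency drifts with the run time.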

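Index Evaluation Settings

Index: the name or index pattern of the indices to query; more than one may be given.
- time field: a field of type date
When: how the value is computed, one of count, average, sum, min, max: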
- Of: the field to aggregate, which must be a numeric field; not needed for count
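Over/Group Over: one of two options:
- all documents: do not group the results
- top N by field: group the results by field and keep the N groups with the most documents. The field must have a keyword mapping. The alert is triggered separately for each group.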

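Index Threshold Trigger Conditions

Threshold: one of is above, is above or equals, is below, is below or equals, or is between, followed by the concrete value.
Time window: only For the last N second/minute/hour/day is supported, i.e. the data from N units ago up to the current time. This is simply a range query with date math (note that no rounding is needed):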

GET /<index>/_search
{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "now-1h",
        "lt": "now"
      }
    }
  }
}
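
The two settings above can be sketched in a few lines of Python: one helper turns the "For the last N unit" window into the range query shown above, and another evaluates a value against the threshold. The helper names are hypothetical, and treating "is between" as inclusive on both ends is an assumption rather than something confirmed here.

```python
import operator

# Date math suffixes understood by Elasticsearch for the supported units.
UNIT_SUFFIX = {"second": "s", "minute": "m", "hour": "h", "day": "d"}

COMPARATORS = {
    "is above": operator.gt,
    "is above or equals": operator.ge,
    "is below": operator.lt,
    "is below or equals": operator.le,
}


def time_window_range(time_field, size, unit):
    """Build the range query for 'For the last <size> <unit>' (no rounding)."""
    return {"range": {time_field: {"gte": f"now-{size}{UNIT_SUFFIX[unit]}", "lt": "now"}}}


def threshold_met(comparator, value, threshold):
    """Check `value` against `threshold`, a list with one or two numbers."""
    if comparator == "is between":
        low, high = threshold
        return low <= value <= high  # assumption: both bounds are inclusive
    return COMPARATORS[comparator](value, threshold[0])


print(time_window_range("@timestamp", 1, "hour"))
print(threshold_met("is above", 9339, [5000]))
```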

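Key Points of the Alerting Configuration

Not every index can have Alerting configured; the index must have a field of type date.
Only recent data can be evaluated: statistics are computed over the data in the range from now-duration to now.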

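Index Evaluation Modes

An index evaluation mode is determined by two dimensions, giving four combinations:

Whether to group: grouping uses a terms agg, plus a sub aggregation when a metric is also computed
Whether to aggregate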
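
As a rough sketch of how the two dimensions combine, the function below assembles the search body for each of the four modes. It is not Kibana's own code; the function name, its parameters, and the aggregation names "groups" and "metric" are hypothetical. The agg_type argument uses the Elasticsearch names (avg, sum, min, max), with "count" meaning no metric aggregation.

```python
def build_search_body(time_field, window, agg_type,
                      agg_field=None, group_field=None, top_n=None):
    """Build a search body for the last `window` (date math, e.g. '1h')."""
    body = {
        "query": {"range": {time_field: {"gte": f"now-{window}", "lt": "now"}}},
        "size": 0,  # only totals and aggregations are needed, not documents
    }
    # count needs no metric aggregation; the other types aggregate agg_field.
    metric = None
    if agg_type != "count":
        metric = {"metric": {agg_type: {"field": agg_field}}}

    if group_field is not None:
        # Grouping: a terms agg, with the metric (if any) as its sub aggregation.
        group = {"terms": {"field": group_field, "size": top_n}}
        if metric is not None:
            group["aggs"] = metric
        body["aggs"] = {"groups": group}
    elif metric is not None:
        body["aggs"] = metric
    return body


print(build_search_body("@timestamp", "1h", "avg",
                        agg_field="api_call.timeElapse",
                        group_field="api_call.apiSvcVerOp.keyword", top_n=3))
```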

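No grouping, no aggregation

count is the only query that needs no aggregation: it simply takes the total count of the matching documents. The corresponding ES Query DSL: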

GET /<index>/_search
{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "now-1h",
        "lt": "now"
      }
    }
  },
  "size": 0
}

We only care about the total count and not the documents themselves, so we set "size": 0.

In the response, look at the hits.total.value field. The hits.hits array is empty because we set size=0:

{
  ...
  "hits" : {
    "total" : {
      "value" : 9339,
      "relation" : "eq"
    },
    "hits" : [ ],
    ...
  }
}
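
Reading the count out of such a response might look like the sketch below; the function name is hypothetical. It also returns whether the count is exact, because Elasticsearch reports "relation": "gte" when the value is only a lower bound, which by default can happen once more than 10,000 documents match unless the request sets "track_total_hits": true.

```python
def total_count(response):
    """Return (count, is_exact) from a search response."""
    total = response["hits"]["total"]
    # "eq" means the value is exact; "gte" means it is only a lower bound.
    return total["value"], total["relation"] == "eq"


# The relevant part of the response shown above.
response = {"hits": {"total": {"value": 9339, "relation": "eq"}, "hits": []}}
count, exact = total_count(response)
print(count, exact)  # 9339 True
```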

No grouping, with aggregation

average, sum, min, and max are all aggregation queries and need an aggregation field, which must be numeric. For example:

GET /<index>/_search
{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "now-1h",
        "lt": "now"
      }
    }
  },
  "size": 0,
  "aggs": {
    "avg_timeElapse": {
       "avg": { "field": "api_call.timeElapse" }
    }
  }
}

In the response we can ignore the hits field and just look at aggregations.avg_timeElapse.value:

{
  ...
  "aggregations" : {
    "avg_timeElapse" : {
      "value" : 240.7022207635926
    }
  }
}

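Grouping, no aggregation

Because count is not an aggregation query, a terms agg alone is enough to group the results. As before, the grouping field must have a keyword mapping. Setting aggs.apiSvcPerOp_buck.terms.size=2 implements the top N semantics: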

GET /<index>/_search
{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "now-1h",
        "lt": "now"
      }
    }
  },
  "size": 0,
  "aggs": {
    "apiSvcPerOp_buck": {
      "terms": {
        "field": "api_call.apiSvcVerOp.keyword",
        "size": 2
      }
    }
  }
}

In the result, look at aggregations.apiSvcPerOp_buck.buckets[].doc_count, which gives the doc_count for each keyword:

{
  ...
  "aggregations" : {
    "apiSvcPerOp_buck" : {
      "doc_count_error_upper_bound" : 52,
      "sum_other_doc_count" : 4739,
      "buckets" : [
        {
          "key" : "user_v1_loadUserInfoByAccountName",
          "doc_count" : 1829
        },
        {
          "key" : "user_v1_listAccountGroups",
          "doc_count" : 1297
        }
      ]
    }
  }
}
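
Since the alert is evaluated separately for each group, the per-group check could be sketched as below. The function name and the threshold of 1500 are hypothetical, and only the "is above" comparison is shown.

```python
def groups_over_threshold(response, agg_name, threshold):
    """Return the keys of the groups whose doc_count is above the threshold."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [b["key"] for b in buckets if b["doc_count"] > threshold]


# The relevant part of the response shown above.
response = {
    "aggregations": {
        "apiSvcPerOp_buck": {
            "buckets": [
                {"key": "user_v1_loadUserInfoByAccountName", "doc_count": 1829},
                {"key": "user_v1_listAccountGroups", "doc_count": 1297},
            ]
        }
    }
}
print(groups_over_threshold(response, "apiSvcPerOp_buck", 1500))
# ['user_v1_loadUserInfoByAccountName']
```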

Grouping, with aggregation

For grouped average, sum, min, and max queries, use a terms agg first and add avg (or the other metric) as its sub aggregation:

GET /<index>/_search
{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "now-1h",
        "lt": "now"
      }
    }
  },
  "size": 0,
  "aggs": {
    "apiSvcPerOp_buck": {
      "terms": {
        "field": "api_call.apiSvcVerOp.keyword",
        "size": 3
      },
      "aggs": {
        "avg_timeElapse": {
          "avg": { "field": "api_call.timeElapse" }
        }
      }
    }
  }
}

In the result, the aggregations.apiSvcPerOp_buck.buckets[*].avg_timeElapse.value field holds the aggregated value for each group:

{
  ...
  "aggregations" : {
    "apiSvcPerOp_buck" : {
      "doc_count_error_upper_bound" : 52,
      "sum_other_doc_count" : 4739,
      "buckets" : [
        {
          "key" : "user_v1_loadUserInfoByAccountName",
          "doc_count" : 1829,
          "avg_timeElapse" : {
            "value" : 51.38928376161837
          }
        },
        {
          "key" : "user_v1_listAccountGroups",
          "doc_count" : 1297,
          "avg_timeElapse" : {
            "value" : 445.27370855821124
          }
        }
      ]
    }
  }
}
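
A minimal sketch of evaluating the grouped metric: collect each group's sub aggregation value and keep the groups that are above a threshold. The function name and the threshold of 200 are hypothetical. The None check is there because a metric such as avg comes back as null when no document in the bucket has a value for the field.

```python
def groups_breaching(response, group_agg, metric_agg, threshold):
    """Return {group key: metric value} for groups whose metric is above threshold."""
    breaching = {}
    for bucket in response["aggregations"][group_agg]["buckets"]:
        value = bucket[metric_agg]["value"]
        # Skip buckets with no metric value (null in the JSON response).
        if value is not None and value > threshold:
            breaching[bucket["key"]] = value
    return breaching


# The relevant part of the response shown above.
response = {
    "aggregations": {
        "apiSvcPerOp_buck": {
            "buckets": [
                {"key": "user_v1_loadUserInfoByAccountName", "doc_count": 1829,
                 "avg_timeElapse": {"value": 51.38928376161837}},
                {"key": "user_v1_listAccountGroups", "doc_count": 1297,
                 "avg_timeElapse": {"value": 445.27370855821124}},
            ]
        }
    }
}
print(groups_breaching(response, "apiSvcPerOp_buck", "avg_timeElapse", 200))
# {'user_v1_listAccountGroups': 445.27370855821124}
```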
