Bookmarks - Monitoring (continuously updated)

2018-12-03, 1 min read, Bookmarks

Monitoring

Metrics, tracing, and logging, on the differences and connections between the three monitoring systems: metrics, tracing, and logging.
Observability 3 ways: logging metrics and tracing, the slides for the previous article.

Performance benchmark

“How NOT to Measure Latency” by Gil Tene, on how to correctly interpret monitoring and load-test results

Metrics

Prometheus

Basic concepts:

How does a Prometheus Counter work?, essential for understanding the rate() function
Counting with Prometheus [I], the presentation that accompanies the previous blog post
rate()/increase() extrapolation considered harmful, a discussion of the extrapolation algorithm used by rate()
How does a Prometheus Gauge work?, an analysis of the gauge type
irate graphs are better graphs, irate gives more immediate results
Avoid irate() in alerts
Rate then sum, never sum then rate, apply rate first and sum afterwards
Why are Prometheus histograms cumulative?, an analysis of the histogram type
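
To make the counter-related items above more concrete, here is a minimal Python sketch of why the order "rate, then sum" matters. It is not Prometheus's implementation: the helper names (`counter_increase`, `simple_rate`, `simple_irate`) and the sample data are hypothetical, and the extrapolation that the real rate() performs is deliberately left out. The only behavior assumed is that a counter never decreases, so any drop is treated as a reset to zero.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def counter_increase(samples: List[Sample]) -> float:
    """Total increase across samples, treating any drop as a counter reset."""
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A counter only goes up; a lower value means the process restarted
        # and the counter began again from zero.
        total += (cur - prev) if cur >= prev else cur
    return total


def simple_rate(samples: List[Sample]) -> float:
    """Per-second rate between first and last sample (no extrapolation)."""
    if len(samples) < 2:
        return 0.0
    elapsed = samples[-1][0] - samples[0][0]
    return counter_increase(samples) / elapsed if elapsed > 0 else 0.0


def simple_irate(samples: List[Sample]) -> float:
    """Rate from the last two samples only, which is why it reacts faster."""
    return simple_rate(samples[-2:])


# Two hypothetical instances scraped every 15s; instance b restarts at t=30.
a = [(0.0, 100.0), (15.0, 115.0), (30.0, 130.0), (45.0, 145.0)]
b = [(0.0, 200.0), (15.0, 230.0), (30.0, 0.0), (45.0, 30.0)]

# Rate per series first: each reset is detected on its own series.
rate_then_sum = simple_rate(a) + simple_rate(b)

# Summing first hides the reset of b inside the aggregate, so the drop in the
# sum is misread as a reset of the whole total and the result is inflated.
summed = [(t, va + vb) for (t, va), (_, vb) in zip(a, b)]
sum_then_rate = simple_rate(summed)

print(f"rate then sum: {rate_then_sum:.2f}/s")
print(f"sum then rate: {sum_then_rate:.2f}/s")
```

In this toy data the per-series result stays close to the real traffic, while the sum-then-rate result is roughly double it, because the reset of one instance is no longer visible after aggregation.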

Usage tips:

Existential issues with metrics, things to watch out for when using a metrics-based monitoring system
Common query patterns in PromQL, several common PromQL queries
Composing range vector functions in PromQL, how to write queries such as: the highest value of rate(x[5m]) over the last hour
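
The last item, "the highest value of rate(x[5m]) over the last hour", can be pictured as evaluating an inner range function at regular steps and then aggregating those results. In Prometheus versions with subquery support this is usually written along the lines of `max_over_time(rate(x[5m])[1h:1m])`. The Python sketch below only imitates that idea on hypothetical data: `window_rate` and `max_rate_over` are made-up helpers, and counter resets and extrapolation are ignored.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def window_rate(samples: List[Sample], end: float, window: float) -> float:
    """Per-second rate over samples in (end - window, end]; no reset handling."""
    inside = [(t, v) for t, v in samples if end - window < t <= end]
    if len(inside) < 2:
        return 0.0
    (t0, v0), (t1, v1) = inside[0], inside[-1]
    return (v1 - v0) / (t1 - t0)


def max_rate_over(samples: List[Sample], now: float, outer: float = 3600.0,
                  inner: float = 300.0, step: float = 60.0) -> float:
    """Evaluate the inner rate at every step of the outer range, keep the max."""
    eval_times = [now - k * step for k in range(int(outer // step) + 1)]
    return max(window_rate(samples, t, inner) for t in eval_times)


# Hypothetical counter scraped every 60s for two hours: 1 event/s normally,
# with a 5-minute burst of 10 events/s starting at t=5400.
samples: List[Sample] = []
value = 0.0
for t in range(0, 7201, 60):
    samples.append((float(t), value))
    per_second = 10.0 if 5400 <= t < 5700 else 1.0
    value += per_second * 60

now = 7200.0
current = window_rate(samples, now, 300.0)  # rate over the latest 5 minutes
peak = max_rate_over(samples, now)          # highest 5m rate in the last hour

print(f"current 5m rate: {current}/s, peak over last hour: {peak}/s")
```

The current rate alone would miss the burst entirely; stepping the inner window across the hour is what surfaces it.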

Operations tips:

Machine metrics

Understanding Machine CPU usage, a Prometheus (P8S) blog post, but still useful for understanding several common CPU metrics

Tracing

TODO

Logging

TODO

Copyright

CC BY-NC-ND 4.0
