Monitoring
- Metrics, tracing, and logging: the differences and connections among the three monitoring systems (metrics, tracing, and logging)
- “Observability 3 ways: logging metrics and tracing”: slides related to the article above
Performance benchmark
- “How NOT to Measure Latency” by Gil Tene: how to correctly interpret monitoring and load-test results
Metrics
Prometheus
Basic concepts:
- How does a Prometheus Counter work?: essential for understanding the rate() function
- Counting with Prometheus [I]: the presentation accompanying the blog post above
- rate()/increase() extrapolation considered harmful: a discussion of the extrapolation algorithm used by rate()
- How does a Prometheus Gauge work?: an analysis of the gauge type
- irate graphs are better graphs: irate gives more immediate results
- Avoid irate() in alerts
- Rate then sum, never sum then rate: apply rate first, then sum
- Why are Prometheus histograms cumulative?: an analysis of the histogram type
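
To see why the order "rate then sum" matters, here is a minimal Python sketch. It is not Prometheus's actual implementation: `counter_increase` is a hypothetical helper that only mimics counter-reset handling, and it ignores the extrapolation and per-second division that the real rate() performs. The two instance series are made-up data.

```python
from typing import List


def counter_increase(samples: List[float]) -> float:
    """Total increase of one counter series, treating any drop as a reset to zero."""
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        # A decrease means the process restarted and the counter began again at 0,
        # so the whole current value counts as new increase.
        total += cur if cur < prev else cur - prev
    return total


# Two hypothetical instances scraped at the same five instants.
instance_a = [100.0, 110.0, 120.0, 130.0, 140.0]  # steady, +10 per scrape
instance_b = [50.0, 60.0, 5.0, 15.0, 25.0]        # restarts after the second scrape

# Rate then sum: resets are handled per series, then the results are aggregated.
rate_then_sum = counter_increase(instance_a) + counter_increase(instance_b)

# Sum then rate: the summed series drops when instance_b restarts, and that drop
# is misread as a reset of the whole sum.
summed = [a + b for a, b in zip(instance_a, instance_b)]
sum_then_rate = counter_increase(summed)

print(rate_then_sum, sum_then_rate)
```

In this toy data the per-series approach reports the real increase, while the summed series reports a much larger one, because the restart of a single instance looks like a reset of the entire aggregate.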
Usage tips:
- Existential issues with metrics: caveats when using a metrics-based monitoring system
- Common query patterns in PromQL: several common PromQL queries
- Composing range vector functions in PromQL: how to write queries such as "the maximum value of rate(x[5m]) over the last hour"
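
The "maximum of rate(x[5m]) over the last hour" query can be pictured as an inner 5-minute rate evaluated repeatedly across an outer 1-hour range. In Prometheus versions with subquery support this is usually written along the lines of `max_over_time(rate(x[5m])[1h:1m])`. The Python sketch below only illustrates the idea: `simple_rate` and `max_rate_over` are hypothetical helpers, the samples are invented, and counter resets and extrapolation are deliberately ignored.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def simple_rate(samples: List[Sample], end: float, window: float) -> float:
    """Per-second rate from the first and last sample inside [end - window, end]."""
    in_window = [(t, v) for t, v in samples if end - window <= t <= end]
    if len(in_window) < 2:
        return 0.0  # not enough data to compute a rate
    (t0, v0), (t1, v1) = in_window[0], in_window[-1]
    return (v1 - v0) / (t1 - t0)


def max_rate_over(samples: List[Sample], now: float, outer: float,
                  inner: float, step: float) -> float:
    """Evaluate simple_rate every `step` seconds across the outer range, keep the max."""
    best = 0.0
    t = now - outer
    while t <= now:
        best = max(best, simple_rate(samples, t, inner))
        t += step
    return best


# Hypothetical counter scraped every 60 s for two hours: +60 per scrape normally,
# +600 per scrape during a five-minute burst starting at t = 5400 s.
samples: List[Sample] = []
value = 0.0
for i in range(121):
    t = i * 60.0
    samples.append((t, value))
    value += 600.0 if 5400.0 <= t < 5700.0 else 60.0

# Highest 5-minute rate seen during the last hour, evaluated once per minute.
peak = max_rate_over(samples, now=7200.0, outer=3600.0, inner=300.0, step=60.0)
print(peak)
```

The steady-state rate in this data is 1 per second, so the peak value reflects the burst window rather than the average over the hour.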
Operations tips:
Machine metrics
- Understanding Machine CPU usage: although this is a Prometheus (P8S) blog post, it is still useful for understanding the common CPU metrics
Tracing
TODO
Logging
TODO