一、概述
在版本4.0+ grafana中增加了Alerting 告警模块,丰富了grafana功能,以前告警需要借助AlertManager,但是有grafana告警模块之后就可以不使用AlertManager告警了,但是grafana也支持对接AlertManager,所以还是非常方面,又可以省区一个组件的维护和资源开销。
下图概述了 Grafana 告警的工作原理,并向您介绍了一些关键概念,这些概念协同工作并构成了我们灵活而强大的警报引擎的核心。
特征:
- 一页包含所有警报:单个 Grafana 警报页面将 Grafana 管理的警报和驻留在与 Prometheus 兼容的数据源中的警报整合到一个位置。
- 多维度告警:警报规则可以为每个警报规则创建多个单独的警报实例(称为多维警报),使你能够强大而灵活地通过单个警报来了解整个系统。
- 路由警报:根据您定义的标签将每个警报实例路由到特定的联系点。通知策略是一组规则,用于将警报路由到联系点的位置、时间和方式。
- 抑制告警:抑制告警允许您停止接收来自一个或多个警报规则的持久通知。您还可以根据特定条件部分暂停警报。
- 抑制告警时间段:使用抑制告警时间段设置,您可以指定不希望生成或发送新通知的时间间隔。您还可以将警报通知冻结在重复时间段内,例如在维护期间。
官方文档:https://grafana.com/docs/grafana/latest/alerting/
关于Grafana其它模块的介绍可以参考我这篇文章:【云原生】Grafana 介绍与实战操作
告警配置全过程如下图:
二、Grafana Alerting 模块介绍
- Alert rules(告警规则)——设置确定是否触发警报实例的评估条件。告警规则由一个或多个查询和表达式、条件、计算频率以及满足条件的持续时间(可选)组成。
- Contact points(联络点即告警通道)——定义在警报触发时如何通知联系人。我们支持多种 告警通道,例如:邮件、webhook、alertmanager、钉钉等等。
- Notification policies(通知策略)——设置警报的路由位置、时间和方式。每个通知策略指定一组标签匹配器,以指示它们负责哪些警报。通知策略分配有一个由一个或多个通知程序组成的联系点。
- Silences(告警抑制)——可以设置某个时间段不告警,例如:系统升级或者阶段。
三、配置图表
图表配置可以参考我这篇文章:【云原生】Grafana 介绍与实战操作
四、告警告警规则
进入编辑界面,可以是下图Edit
进入编辑界面,也可以通过快捷方式“选中图表-》按e
”
配置相关信息
配置link,可以在告警里显示,就可以跳转到相关监控项图表
告警状态变化Normal
-》Padding
-》Firing
五、配置告警通道(Contact points)
1)Email
1、配置smtp(grafana.ini)
[smtp]
enabled = true
host = "smtp.qq.com:465"
user = "xxxxxx@qq.com"
# If the password contains # or ; you have to wrap it with triple quotes. Ex """#password;"""
password = "xxxxxx"
;cert_file =
;key_file =
;skip_verify = false
from_address = xxxxxx@qq.com
from_name = Grafana
# EHLO identity in SMTP dialog (defaults to instance_name)
;ehlo_identity = dashboard.example.com
# SMTP startTLS policy (defaults to 'OpportunisticStartTLS')
;startTLS_policy = NoStartTLS
【温馨提示】上面配置记得换成自己的邮箱密码。
重启grafana
systemctl restart grafana-server
2、配置消息模板
{{ define "myalert" }}
[{{.Status}}] {{ .Labels.alertname }}
Labels:
{{ range .Labels.SortedPairs }}
{{ .Name }}: {{ .Value }}
{{ end }}
{{ if gt (len .Annotations) 0 }}
Annotations:
{{ range .Annotations.SortedPairs }}
{{ .Name }}: {{ .Value }}
{{ end }}
{{ end }}
{{ if gt (len .SilenceURL ) 0 }}
Silence alert: {{ .SilenceURL }}
{{ end }}
{{ if gt (len .DashboardURL ) 0 }}
Go to dashboard: {{ .DashboardURL }}
{{ end }}
{{ end }}
{{ define "mymessage" }}
{{ if gt (len .Alerts.Firing) 0 }}
{{ len .Alerts.Firing }} firing:
{{ range .Alerts.Firing }} {{ template "myalert" .}} {{ end }}
{{ end }}
{{ if gt (len .Alerts.Resolved) 0 }}
{{ len .Alerts.Resolved }} resolved:
{{ range .Alerts.Resolved }} {{ template "myalert" .}} {{ end }}
{{ end }}
{{ end }}
3、配置告警通道
上面配置好后就等待着告警就ok了。告警信息示例如下:
2)WebHook
告警示例 JSON:
{
"receiver": "My Super Webhook",
"status": "firing",
"orgId": 1,
"alerts": [
{
"status": "firing",
"labels": {
"alertname": "High memory usage",
"team": "blue",
"zone": "us-1"
},
"annotations": {
"description": "The system has high memory usage",
"runbook_url": "https://myrunbook.com/runbook/1234",
"summary": "This alert was triggered for zone us-1"
},
"startsAt": "2021-10-12T09:51:03.157076+02:00",
"endsAt": "0001-01-01T00:00:00Z",
"generatorURL": "https://play.grafana.org/alerting/1afz29v7z/edit",
"fingerprint": "c6eadffa33fcdf37",
"silenceURL": "https://play.grafana.org/alerting/silence/new?alertmanager=grafana&matchers=alertname%3DT2%2Cteam%3Dblue%2Czone%3Dus-1",
"dashboardURL": "",
"panelURL": "",
"valueString": "[ metric='' labels={} value=14151.331895396988 ]"
},
{
"status": "firing",
"labels": {
"alertname": "High CPU usage",
"team": "blue",
"zone": "eu-1"
},
"annotations": {
"description": "The system has high CPU usage",
"runbook_url": "https://myrunbook.com/runbook/1234",
"summary": "This alert was triggered for zone eu-1"
},
"startsAt": "2021-10-12T09:56:03.157076+02:00",
"endsAt": "0001-01-01T00:00:00Z",
"generatorURL": "https://play.grafana.org/alerting/d1rdpdv7k/edit",
"fingerprint": "bc97ff14869b13e3",
"silenceURL": "https://play.grafana.org/alerting/silence/new?alertmanager=grafana&matchers=alertname%3DT1%2Cteam%3Dblue%2Czone%3Deu-1",
"dashboardURL": "",
"panelURL": "",
"valueString": "[ metric='' labels={} value=47043.702386305304 ]"
}
],
"groupLabels": {},
"commonLabels": {
"team": "blue"
},
"commonAnnotations": {},
"externalURL": "https://play.grafana.org/",
"version": "1",
"groupKey": "{}:{}",
"truncatedAlerts": 0,
"title": "[FIRING:2] (blue)",
"state": "alerting",
"message": "**Firing**\n\nLabels:\n - alertname = T2\n - team = blue\n - zone = us-1\nAnnotations:\n - description = This is the alert rule checking the second system\n - runbook_url = https://myrunbook.com\n - summary = This is my summary\nSource: https://play.grafana.org/alerting/1afz29v7z/edit\nSilence: https://play.grafana.org/alerting/silence/new?alertmanager=grafana&matchers=alertname%3DT2%2Cteam%3Dblue%2Czone%3Dus-1\n\nLabels:\n - alertname = T1\n - team = blue\n - zone = eu-1\nAnnotations:\nSource: https://play.grafana.org/alerting/d1rdpdv7k/edit\nSilence: https://play.grafana.org/alerting/silence/new?alertmanager=grafana&matchers=alertname%3DT1%2Cteam%3Dblue%2Czone%3Deu-1\n"
}
这里通过python的去写webhook,因为条件有限,还是通过webhook转到邮箱发告警,一般企业会通过webhook转钉钉,微信,zabbix等等。
1、编写webhook api服务
#!/usr/bin/python3
# -*- coding: utf-8 -*-
# @Time : 2022/12/24 11:03
# @Author : liugp
# @Email : liugp@163.com
# @File : GrafanaWebHook.py
import json
import smtplib
from email.mime.text import MIMEText
from email.header import Header
from flask import Flask, request
# pip3 install flask
app = Flask(__name__)
class GrafanaWebHook:
def __init__(self):
# 第三方 SMTP 服务信息
self.mail_host = "smtp.qq.com"
self.mail_user = "xxxxxx@qq.com"
self.mail_pass = "xxxxxx"
self.sender = "xxxxxx@qq.com"
self.receiver = "xxxxxx@163.com" # 接收邮件,可设置为你的QQ邮箱或者其他邮箱
def send_mail(self, title, status, messages):
print(messages)
for message in messages:
message['panelURL'] = str(message['panelURL']).replace("localhost:3000","192.168.182.110:3000")
print(message)
if not 'description' in message['annotations'].keys():
message['annotations']['description'] = "test"
message = MIMEText('grafana alert:' + title + '\n告警时间:' + str(message['startsAt']) +
'\n告警状态:' + str(status) + '\n告警内容:' + str(
message['annotations']['description']) + '\n告警面板:' + str(message['silenceURL']) + '', 'plain', 'utf-8')
message['From'] = self.sender
message['To'] = self.receiver
subject = title
message['Subject'] = Header(subject, 'utf-8')
try:
smtpObj = smtplib.SMTP_SSL(self.mail_host, 465)
smtpObj.login(self.mail_user, self.mail_pass)
smtpObj.sendmail(self.sender, self.receiver, message.as_string())
print("邮件发送成功")
return True
except smtplib.SMTPException as e:
print("Error: 无法发送邮件", e)
return False
def getAlertData(self):
alertData = request.get_data()
# 将str类型的数据转换为dict类型
alertData = json.loads(alertData)
#print(alertData)
return alertData
@app.route('/webhook', methods=["POST"])
def webhook_server():
gw = GrafanaWebHook()
alertData = gw.getAlertData()
title = alertData['title']
status = alertData['status']
messages = alertData['alerts']
ret = gw.send_mail(title, status, messages)
if ret:
return {"status":"ok"}
else:
return {"status":"error"}
if __name__ == "__main__":
app.run(debug=False, host='0.0.0.0', port=18088)
【温馨提示】使用时注意把上面的邮箱和密码修改哦!!!
2、在grafana页面上配置
配置好后就可以等待告警,告警示例如下:
3)Alertmanager
配置如下:
这里主要讲了三种告警通道,其它告警通道小伙伴可以自行测试验证,有疑问的小伙伴也欢迎给我留言,后续会持续更新【云原生+大数据】相关的文章,请小伙伴耐心等待~