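[Monitoring and Observability] 03 - Building an ELK Logging Stack: A Complete Loop from Collection to Alerting

Column: Monitoring and Observability | Difficulty: Advanced | Tags: ELK, Elasticsearch, Logstash, Kibana, Filebeat, Logging

Preface

When logs are scattered across many servers, there is no time to check them machine by machine during an incident. An ELK log platform centralizes all logs in one place and serves as the eyes of the operations team.

1. Architecture Selection

```
Application servers
    ↓ Filebeat (lightweight collection, recommended)
Kafka (message queue, absorbs traffic peaks)
    ↓ Logstash (cleaning, transformation, filtering)
Elasticsearch (storage, indexing)
    ↓ Kibana (querying, visualization)
    ↓ Watcher alert rules
DingTalk / WeCom notifications
```

Why add Kafka? At peak log volume Logstash cannot keep up with the incoming stream. Kafka buffers the backlog so that no data is lost.

2. Quick Setup with Docker Compose

```yaml
# docker-compose.yml
# Note: the Kafka broker (kafka:9092) used by Filebeat and Logstash below
# is not defined in this file and has to be provided separately.
version: '3.8'
services:
  elasticsearch:
    image: elasticsearch:8.8.0
    environment:
      - discovery.type=single-node
      - "ES_JAVA_OPTS=-Xms2g -Xmx2g"
      - xpack.security.enabled=false
    ports:
      - "9200:9200"
    volumes:
      - es_data:/usr/share/elasticsearch/data
  kibana:
    image: kibana:8.8.0
    ports:
      - "5601:5601"
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    depends_on:
      - elasticsearch
  logstash:
    image: logstash:8.8.0
    volumes:
      - ./logstash.conf:/usr/share/logstash/pipeline/logstash.conf
    depends_on:
      - elasticsearch
volumes:
  es_data:
```

3. Filebeat Configuration (deployed on the application servers)

```yaml
# /etc/filebeat/filebeat.yml
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/log/nginx/access.log
    fields:
      service: nginx
      env: production
  - type: log
    paths:
      - /opt/app/logs/*.log
    fields:
      service: myapp
    multiline:
      # Handle multi-line Java exception logs: any line that does not start
      # with a date is appended to the preceding event.
      pattern: '^\d{4}-\d{2}-\d{2}'
      negate: true
      match: after

output.kafka:
  hosts: ["kafka:9092"]
  topic: "logs-%{[fields.service]}"
  codec.json:
    pretty: false
```

4. Logstash Processing Pipeline

```
# logstash.conf
input {
  kafka {
    bootstrap_servers => "kafka:9092"
    topics_pattern => "logs-.*"
    group_id => "logstash"
    codec => "json"
  }
}

filter {
  # Parse the Nginx access log.
  # The trailing request_time assumes an Nginx log_format that appends
  # $request_time; the default combined format does not include it.
  if [fields][service] == "nginx" {
    grok {
      match => {
        "message" => '%{IPORHOST:remote_ip} - %{DATA:user} \[%{HTTPDATE:time}\] "%{WORD:method} %{DATA:url} HTTP/%{NUMBER:http_version}" %{NUMBER:response_code} %{NUMBER:bytes} "%{DATA:referrer}" "%{DATA:agent}" %{NUMBER:request_time}'
      }
    }
    mutate {
      convert => {
        "response_code" => "integer"
        "bytes" => "integer"
        "request_time" => "float"
      }
    }
  }

  # Tag slow requests (longer than 1 second)
  if [request_time] and [request_time] > 1.0 {
    mutate { add_tag => ["slow_request"] }
  }

  # Parse the timestamp
  date {
    match => ["time", "dd/MMM/yyyy:HH:mm:ss Z"]
    target => "timestamp"
  }
}

output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    index => "logs-%{[fields][service]}-%{+YYYY.MM.dd}"
  }
}
```

5. Elasticsearch Index Template

```bash
# Create an index template to avoid field-mapping conflicts
curl -X PUT "http://localhost:9200/_index_template/logs" \
  -H "Content-Type: application/json" \
  -d '{
    "index_patterns": ["logs-*"],
    "template": {
      "settings": {
        "number_of_shards": 2,
        "number_of_replicas": 1,
        "index.lifecycle.name": "logs-policy"
      },
      "mappings": {
        "properties": {
          "timestamp": {"type": "date"},
          "response_code": {"type": "integer"},
          "request_time": {"type": "float"},
          "remote_ip": {"type": "ip"}
        }
      }
    }
  }'
```

6. ILM Index Lifecycle Management (automatic cleanup)

```bash
# Policy: move to warm after 7 days, delete after 30 days
curl -X PUT "http://localhost:9200/_ilm/policy/logs-policy" \
  -H "Content-Type: application/json" \
  -d '{
    "policy": {
      "phases": {
        "hot": {
          "actions": {
            "rollover": {"max_size": "10GB", "max_age": "1d"}
          }
        },
        "warm": {
          "min_age": "7d",
          "actions": {
            "readonly": {},
            "shrink": {"number_of_shards": 1},
            "forcemerge": {"max_num_segments": 1}
          }
        },
        "delete": {
          "min_age": "30d",
          "actions": {"delete": {}}
        }
      }
    }
  }'
```

Note: the rollover action requires writing through a rollover alias or a data stream. With the date-stamped index names produced by the Logstash output in section 4 there is nothing to roll over, so either write through an alias or drop the rollover action and let the daily indices age out.

7. Kibana Query Tips

```
# Nginx 5xx errors (KQL)
response_code >= 500 AND fields.service: "nginx"

# Slow requests (KQL)
tags: "slow_request" AND request_time > 2

# A specific client IP
remote_ip: "192.168.1.100"

# Time range plus keyword (Lucene range syntax)
timestamp:[now-1h TO now] AND message: "OutOfMemoryError"
```

Conclusion

The core value of an ELK platform is that it turns logs into a searchable, alertable data asset. Filebeat handles lightweight collection, Kafka absorbs traffic peaks, Logstash cleans the data, Elasticsearch stores it, and Kibana presents it: five components, each with its own job.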
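
The architecture lists Watcher alert rules but does not show one. As a rough stand-in (not Watcher itself), the Nginx 5xx query from the Kibana section can be run against the Elasticsearch `_count` API from a scheduled script. The sketch below is only an illustration under assumptions: the `ES_URL` and `THRESHOLD` variables, the five-minute window, the default threshold of 50, and all function names are hypothetical, and it assumes Nginx events land in `logs-nginx-*` with the `response_code` and `timestamp` fields from the index template.

```shell
#!/usr/bin/env bash
# Sketch only: ES_URL, THRESHOLD and the function names are assumptions.

# Build the JSON body for the _count API: events with
# response_code >= 500 within the last N minutes.
build_5xx_query() {
  local minutes="$1"
  printf '{"query":{"bool":{"filter":[{"range":{"response_code":{"gte":500}}},{"range":{"timestamp":{"gte":"now-%sm"}}}]}}}' "$minutes"
}

# Extract the integer "count" field from a _count API response on stdin.
parse_count() {
  sed -n 's/.*"count":\([0-9][0-9]*\).*/\1/p'
}

# Succeed (exit status 0) only when count is strictly above the threshold.
exceeds_threshold() {
  local count="$1" threshold="$2"
  [ "$count" -gt "$threshold" ]
}

# Query Elasticsearch and print an alert line if there are too many 5xx.
check_5xx() {
  local es_url="${ES_URL:-http://localhost:9200}"
  local count
  count="$(curl -s -X POST "$es_url/logs-nginx-*/_count" \
    -H 'Content-Type: application/json' \
    -d "$(build_5xx_query 5)" | parse_count)"
  if exceeds_threshold "${count:-0}" "${THRESHOLD:-50}"; then
    echo "ALERT: $count Nginx 5xx responses in the last 5 minutes"
  fi
}
```

Calling `check_5xx` from cron every few minutes would give a crude alert line on stdout. Forwarding that line to DingTalk or WeCom is left out here, since the webhook details are not part of the original setup.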