Published: 2025-06-24 17:33:53 | Author: 北方职教升学中心 | Views: 799
After the configuration is complete, run the verification again:
[root@logstash-server logstash]# bin/logstash -f config/first-pipeline.conf
The output looks like this:
# It takes a moment before the following output appears:
[2024-03-14T11:49:56,399][INFO ][logstash.javapipeline ][main] Pipeline started {"pipeline.id"=>"main"}
[2024-03-14T11:49:56,443][INFO ][logstash.agent ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
{
    "@timestamp" => 2024-03-14T03:49:56.438442963Z,
      "@version" => "1",
    "user_agent" => {"original"=>"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36"},
          "host" => {"name"=>"logstash-server"},
           "log" => {"file"=>{"path"=>"/var/log/httpd.log"}},
          "http" => {"request"=>{"method"=>"GET", "referrer"=>"http://semicomplete.com/presentations/logstash-monitorama-2013/"}, "version"=>"1.1", "response"=>{"status_code"=>200, "body"=>{"bytes"=>203023}}},
     "timestamp" => "04/Jan/2015:05:13:42 +0000",
           "url" => {"original"=>"/presentations/logstash-monitorama-2013/images/kibana-search.png"},
       "message" => "83.149.9.216 - - [04/Jan/2015:05:13:42 +0000] \"GET /presentations/logstash-monitorama-2013/images/kibana-search.png HTTP/1.1\" 200 203023 \"http://semicomplete.com/presentations/logstash-monitorama-2013/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36\"",
         "event" => {"original"=>"83.149.9.216 - - [04/Jan/2015:05:13:42 +0000] \"GET /presentations/logstash-monitorama-2013/images/kibana-search.png HTTP/1.1\" 200 203023 \"http://semicomplete.com/presentations/logstash-monitorama-2013/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36\""},
        "source" => {"address"=>"83.149.9.216"}
}
You will notice that the previously unstructured data has become structured. In this example, the clientip field contains the client's IP address.
The pattern used here, %{COMBINEDAPACHELOG}, is a predefined grok pattern for parsing the "combined" log format of the Apache HTTP server.
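For reference, in the logstash-patterns-core repository this pattern is defined in terms of other predefined patterns, which is why the output contains fields such as clientip, verb, response and bytes. Roughly (the exact definition can differ between Logstash versions, and newer releases also ship it under the name HTTPD_COMBINEDLOG):

```
COMMONAPACHELOG %{IPORHOST:clientip} %{HTTPDUSER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-)
COMBINEDAPACHELOG %{COMMONAPACHELOG} %{QS:referrer} %{QS:agent}
```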
Input plugins consume data from a source, filter plugins modify the data as you specify, and output plugins write the data to a destination.
How does grok know which parts of the text you are interested in? It identifies the interesting fields through its predefined patterns.
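As a sketch of how such patterns work: each %{PATTERN:name} pair matches one predefined regular expression and stores the matched text under name (the field names below are arbitrary, chosen for illustration):

```
filter {
  grok {
    # A message like "55.3.244.1 GET /index.html" would produce the fields
    # client => "55.3.244.1", method => "GET", request => "/index.html"
    match => { "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request}" }
  }
}
```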
To parse the web log into structured fields, the grok filter plugin should be used.
2. How it works
The Logstash event processing pipeline has three stages: input → filter → output.
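The three stages map one-to-one onto the three sections of a pipeline configuration file. A minimal illustrative pipeline that reads lines from stdin and pretty-prints each event to stdout would look like this:

```
input  { stdin {} }                        # input stage: read lines from standard input
# the filter stage is optional and may be omitted entirely
output { stdout { codec => rubydebug } }   # output stage: pretty-print each event
```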
filters: the filter stage is optional; before the data is forwarded to its destination it can apply formatting, field removal, and other simple data processing operations.
The geoip plugin configuration requires you to specify, as source, the name of the field that contains the IP address to look up.
Message queues (kafka, rabbitmq, etc.): input plugins exist for reading data from various message queues.
Since we are reading from a log file, the file input plugin is used here. Create a sample log file:
[root@logstash-server ~]# vim /var/log/httpd.log
83.149.9.216 - - [04/Jan/2015:05:13:42 +0000] "GET /presentations/logstash-monitorama-2013/images/kibana-search.png HTTP/1.1" 200 203023 "http://semicomplete.com/presentations/logstash-monitorama-2013/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36"
Make sure there is no cached data left over:
[root@logstash file]# pwd
/usr/local/logstash/data/plugins/inputs/file
[root@logstash file]# ls -a
.  ..  .sincedb_aff270f7990dabcdbd0044eac08398ef
[root@logstash file]# rm -rf .sincedb_aff270f7990dabcdbd0044eac08398ef
# On the very first run this file does not exist yet, and the data directory does not contain a plugins subdirectory either
The modified pipeline configuration file looks like this:
[root@logstash-server logstash]# vim /usr/local/logstash/config/first-pipeline.conf
# Comments start with the # character
input {
    file {
        path => ["/var/log/httpd.log"]
        start_position => "beginning"
    }
}
filter {
    grok {
        # Filter the web log and emit structured data:
        # match the value of the message field against COMBINEDAPACHELOG
        match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
}
output {
    stdout {}
}
match => { "message" => "%{COMBINEDAPACHELOG}"}
means: match the value of the message field against the predefined pattern COMBINEDAPACHELOG and map the matched pieces into named fields.
- grok is a tool that combines multiple predefined regular expressions to match and split text and map the pieces to named keys. (As an aside, the logstash-output-elasticsearch output plugin uses port 9200 by default.)
The completed pipeline configuration file looks like this:
[root@logstash-server logstash]# vim config/first-pipeline.conf
input {
    file {
        path => ["/var/log/httpd.log"]
        start_position => "beginning"
    }
}
filter {
    grok {
        match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
    geoip {
        source => "clientip"
    }
}
output {
    stdout {}
}
Feed in the same log line again and you will see output like the following:
# Remember to delete the cached state first
[root@logstash-server logstash]# rm -rf data/plugins
[root@logstash-server logstash]# bin/logstash -f config/first-pipeline.conf
[2023-05-04T11:30:41,667][INFO ][logstash.agent ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
{
           "host" => "logstash-server",
           "verb" => "GET",
          "geoip" => {"country_name"=>"Russia", "country_code2"=>"RU", "location"=>{"lat"=>55.7527, "lon"=>37.6172}, "longitude"=>37.6172, "region_name"=>"Moscow", "region_code"=>"MOW", "timezone"=>"Europe/Moscow", "country_code3"=>"RU", "continent_code"=>"EU", "ip"=>"83.149.9.216", "city_name"=>"Moscow", "latitude"=>55.7527, "postal_code"=>"129223"},
          "ident" => "-",
       "clientip" => "83.149.9.216",
           "auth" => "-",
     "@timestamp" => 2023-05-04T03:30:42.063Z,
        "message" => "83.149.9.216 - - [04/Jan/2015:05:13:42 +0000] \"GET /presentations/logstash-monitorama-2013/images/kibana-search.png HTTP/1.1\" 200 203023 \"http://semicomplete.com/presentations/logstash-monitorama-2013/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36\"",
      "timestamp" => "04/Jan/2015:05:13:42 +0000",
       "@version" => "1",
           "path" => "/var/log/httpd.log",
        "request" => "/presentations/logstash-monitorama-2013/images/kibana-search.png",
          "bytes" => "203023",
          "agent" => "\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36\"",
    "httpversion" => "1.1",
       "response" => "200",
       "referrer" => "\"http://semicomplete.com/presentations/logstash-monitorama-2013/\""
}
For details, see the grok and geoip documentation; for other filter plugins, see the filter plugins reference.
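For example, a common next step (not shown in this article) is to add the date filter so that the timestamp string grok extracted becomes the event's @timestamp, instead of the time Logstash read the line:

```
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  date {
    # parse e.g. "04/Jan/2015:05:13:42 +0000" and use it as @timestamp
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}
```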
# List the installed plugins
[root@logstash-server logstash]# ./bin/logstash-plugin list
Using bundled JDK: /usr/local/logstash/jdk
logstash-codec-avro
logstash-codec-cef
logstash-codec-collectd
logstash-codec-dots
logstash-codec-edn
logstash-codec-edn_lines
logstash-codec-es_bulk
logstash-codec-fluent
logstash-codec-graphite
logstash-codec-json
logstash-codec-json_lines
logstash-codec-line
logstash-codec-msgpack
logstash-codec-multiline
logstash-codec-netflow
logstash-codec-plain
logstash-codec-rubydebug
logstash-filter-aggregate
logstash-filter-anonymize
logstash-filter-cidr
logstash-filter-clone
logstash-filter-csv
logstash-filter-date
logstash-filter-de_dot
logstash-filter-dissect
logstash-filter-dns
logstash-filter-drop
logstash-filter-elasticsearch
logstash-filter-fingerprint
logstash-filter-geoip
logstash-filter-grok
logstash-filter-http
logstash-filter-json
logstash-filter-kv
logstash-filter-memcached
logstash-filter-metrics
logstash-filter-mutate
logstash-filter-prune
logstash-filter-ruby
logstash-filter-sleep
logstash-filter-split
logstash-filter-syslog_pri
logstash-filter-throttle
logstash-filter-translate
logstash-filter-truncate
logstash-filter-urldecode
logstash-filter-useragent
logstash-filter-uuid
logstash-filter-xml
logstash-input-azure_event_hubs
logstash-input-beats
└── logstash-input-elastic_agent (alias)
logstash-input-couchdb_changes
logstash-input-dead_letter_queue
logstash-input-elastic_serverless_forwarder
logstash-input-elasticsearch
logstash-input-exec
logstash-input-file
logstash-input-ganglia
logstash-input-gelf
logstash-input-generator
logstash-input-graphite
logstash-input-heartbeat
logstash-input-http
logstash-input-http_poller
logstash-input-imap
logstash-input-jms
logstash-input-pipe
logstash-input-redis
logstash-input-snmp
logstash-input-snmptrap
logstash-input-stdin
logstash-input-syslog
logstash-input-tcp
logstash-input-twitter
logstash-input-udp
logstash-input-unix
logstash-integration-aws
 ├── logstash-codec-cloudfront
 ├── logstash-codec-cloudtrail
 ├── logstash-input-cloudwatch
 ├── logstash-input-s3
 ├── logstash-input-sqs
 ├── logstash-output-cloudwatch
 ├── logstash-output-s3
 ├── logstash-output-sns
 └── logstash-output-sqs
logstash-integration-elastic_enterprise_search
 ├── logstash-output-elastic_app_search
 └── logstash-output-elastic_workplace_search
logstash-integration-jdbc
 ├── logstash-input-jdbc
 ├── logstash-filter-jdbc_streaming
 └── logstash-filter-jdbc_static
logstash-integration-kafka
 ├── logstash-input-kafka
 └── logstash-output-kafka
logstash-integration-logstash
 ├── logstash-input-logstash
 └── logstash-output-logstash
logstash-integration-rabbitmq
 ├── logstash-input-rabbitmq
 └── logstash-output-rabbitmq
logstash-output-csv
logstash-output-elasticsearch
logstash-output-email
logstash-output-file
logstash-output-graphite
logstash-output-http
logstash-output-lumberjack
logstash-output-nagios
logstash-output-null
logstash-output-pipe
logstash-output-redis
logstash-output-stdout
logstash-output-tcp
logstash-output-udp
logstash-output-unix
logstash-output-webhdfs
logstash-patterns-core
6. Configuring input from Beats
# Listen on port 5044 to receive input from filebeat; run this on the logstash server
[root@logstash-server logstash]# vim config/first-pipeline.conf
input {
    beats {
        port => 5044
    }
}
filter {
    grok {
        match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
    # geoip { source => "clientip" }
}
output {
    stdout {}
}
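In a real deployment you would usually forward the parsed events to Elasticsearch instead of stdout. A sketch, assuming an Elasticsearch node at a hypothetical address on the default port 9200 (the index name is purely illustrative):

```
output {
  elasticsearch {
    hosts => ["http://192.168.221.137:9200"]   # hypothetical ES address; 9200 is the default port
    index => "weblogs-%{+YYYY.MM.dd}"          # one index per day
  }
}
```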
After starting Logstash, change the output target in the filebeat yml file as follows:
# Run this on the filebeat server
[root@filebeat-server filebeat]# vim filebeat.yml
...
output.logstash:
  # The Logstash hosts
  hosts: ["192.168.221.140:5044"]   # the IP of the logstash server
...
# Remove the output.elasticsearch section and put output.logstash here instead
Clear the cache directory on the filebeat machine:
[root@filebeat-server filebeat]# rm -rf /usr/local/filebeat/data/
Run filebeat:
[root@filebeat-server filebeat]# systemctl restart filebeat.service
[root@filebeat-server filebeat]# systemctl status filebeat.service
● filebeat.service - Filebeat sends log files to Logstash or directly to Elasticsearch.
   Loaded: loaded (/usr/lib/systemd/system/filebeat.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2024-03-14 15:29:16 CST; 6s ago
 Main PID: 1418 (filebeat)
   CGroup: /system.slice/filebeat.service
           └─1418 /usr/local/filebeat/filebeat -c /usr/local/filebeat/filebea...
Mar 14 15:29:16 filebeat-server systemd[1]: Stopped Filebeat sends log file....
Mar 14 15:29:16 filebeat-server systemd[1]: Started Filebeat sends log file....
Hint: Some lines were ellipsized, use -l to show in full.
Run logstash:
[root@logstash-server logstash]# rm -rf data/plugins
[root@logstash-server logstash]# bin/logstash -f config/first-pipeline.conf
Using bundled JDK: /usr/local/logstash/jdk
OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release
...............
# You can see that logstash is receiving the log entries collected by filebeat
{
    "input" => {"type"=>"log"},
    "source" => {"address"=>"123.127.39.50"},
    "http" => {"request"=>{"referrer"=>"http://81.68.233.173/", "method"=>"GET"}, "version"=>"1.1", "response"=>{"body"=>{"bytes"=>14137}, "status_code"=>200}},
    "ecs" => {"version"=>"1.12.0"},
    "log" => {"offset"=>0, "file"=>{"path"=>"/opt/nginx/log/nginx/access.log"}},
    "agent" => {"id"=>"afbbf9f5-d7f7-4057-a70d-fa4e3a4741fc", "version"=>"8.12.2", "type"=>"filebeat", "ephemeral_id"=>"28cf958a-d735-43d4-88c0-19d4460a39f2", "name"=>"filebeat-server"},
    "@version" => "1",
    "host" => {"containerized"=>false, "architecture"=>"x86_64", "name"=>"filebeat-server", "mac"=>["00-0C-29-40-59-B2"], "id"=>"4746d2ecb7c945cdbc93de5d156817a0", "ip"=>["192.168.221.139", "fe80::4ee8:bb9d:ef6c:9934"], "hostname"=>"filebeat-server", "os"=>{"codename"=>"Core", "platform"=>"centos", "name"=>"CentOS Linux", "type"=>"linux", "version"=>"7 (Core)", "kernel"=>"3.10.0-1062.el7.x86_64", "family"=>"redhat"}},
    "user_agent" => {"original"=>"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.192 Safari/537.36"},
    "service" => {"type"=>"nginx"},
    "@timestamp" => 2024-03-14T07:30:51.531Z,
    "tags" => ["beats_input_codec_plain_applied"],
    "url" => {"original"=>"/logo.jpg"},
    "fileset" => {"name"=>"access"},
    "message" => "123.127.39.50 - - [04/Mar/2021:10:50:28 +0800] \"GET /logo.jpg HTTP/1.1\" 200 14137 \"http://81.68.233.173/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.192 Safari/537.36\" \"-\"",
    "timestamp" => "04/Mar/2021:10:50:28 +0800",
    "event" => {"module"=>"nginx", "original"=>"123.127.39.50 - - [04/Mar/2021:10:50:28 +0800] \"GET /logo.jpg HTTP/1.1\" 200 14137 \"http://81.68.233.173/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.192 Safari/537.36\" \"-\"", "timezone"=>"+08:00", "dataset"=>"nginx.access"}
}
{
    "input" => {"type"=>"log"},
    "ecs" => {"version"=>"1.12.0"},
    "log" => {"offset"=>0, "file"=>{"path"=>"/opt/nginx/log/nginx/error.log"}},
    "agent" => {"id"=>"afbbf9f5-d7f7-4057-a70d-fa4e3a4741fc", "type"=>"filebeat", "version"=>"8.12.2", "ephemeral_id"=>"28cf958a-d735-43d4-88c0-19d4460a39f2", "name"=>"filebeat-server"},
    "@version" => "1",
    "host" => {"containerized"=>false, "architecture"=>"x86_64", "name"=>"filebeat-server", "mac"=>["00-0C-29-40-59-B2"], "id"=>"4746d2ecb7c945cdbc93de5d156817a0", "ip"=>["192.168.221.139", "fe80::4ee8:bb9d:ef6c:9934"], "hostname"=>"filebeat-server", "os"=>{"codename"=>"Core", "family"=>"redhat", "name"=>"CentOS Linux", "type"=>"linux", "version"=>"7 (Core)", "kernel"=>"3.10.0-1062.el7.x86_64", "platform"=>"centos"}},
    "service" => {"type"=>"nginx"},
    "@timestamp" => 2024-03-14T07:30:51.531Z,
    "tags" => ["beats_input_codec_plain_applied", "_grokparsefailure"],
    "fileset" => {"name"=>"error"},
    "message" => "2021/03/04 10:50:28 [error] 11396#0: *5 open() \"/farm/bg.jpg\" failed (2: No such file or directory), client: 123.127.39.50, server: localhost, request: \"GET /bg.jpg HTTP/1.1\", host: \"81.68.233.173\", referrer: \"http://81.68.233.173/\"",
    "event" => {"module"=>"nginx", "original"=>"2021/03/04 10:50:28 [error] 11396#0: *5 open() \"/farm/bg.jpg\" failed (2: No such file or directory), client: 123.127.39.50, server: localhost, request: \"GET /bg.jpg HTTP/1.1\", host: \"81.68.233.173\", referrer: \"http://81.68.233.173/\"", "dataset"=>"nginx.error", "timezone"=>"+08:00"}
}