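Remote Storage for Prometheus
Preface
Prometheus's strength in the container-cloud space is beyond doubt: more and more cloud-native components expose a Prometheus metrics endpoint directly, with no extra exporter required. Adopting Prometheus as the monitoring solution for the whole cluster is therefore a sound choice. For metrics storage, Prometheus ships with local storage, the TSDB time-series database. The advantage of local storage is operational simplicity: starting Prometheus takes a single command, and the two startup flags below set the data path and the retention period.
storage.tsdb.path: path of the TSDB data directory, default data/
storage.tsdb.retention: how long data is kept, default 15 days (later releases deprecate this flag in favor of storage.tsdb.retention.time)
The drawback is that local storage cannot persist large volumes of metrics for long. That said, data compression improved considerably from Prometheus 2.0 onward.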
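To put a number on that limit, the Prometheus storage documentation gives a rule of thumb: required disk space is roughly retention time in seconds × ingested samples per second × bytes per sample, with about 1 to 2 bytes per sample after compression. The sketch below applies that formula; the workload of 100,000 samples per second is a made-up figure for illustration, not something measured in this article.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def estimate_disk_bytes(retention_days: float,
                        samples_per_second: float,
                        bytes_per_sample: float = 2.0) -> float:
    """Rough on-disk size of the local TSDB for a given retention period.

    bytes_per_sample defaults to the pessimistic end of the 1-2 byte range.
    """
    return retention_days * SECONDS_PER_DAY * samples_per_second * bytes_per_sample


# Hypothetical workload: 100,000 samples/s kept for the default 15 days.
needed = estimate_disk_bytes(retention_days=15, samples_per_second=100_000)
print(f"{needed / 1e9:.1f} GB")  # prints "259.2 GB"
```

Doubling either the retention period or the ingestion rate doubles the estimate, which is why long-term retention on a single node gets expensive quickly.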
To get around the limits of single-node storage, Prometheus does not implement clustered storage itself. Instead it exposes remote read and remote write interfaces and lets users pick a suitable time-series database to give Prometheus its scalability.
Prometheus integrates with other remote storage systems in the following two ways:
Prometheus writes metrics to the remote storage in a standard format
Prometheus reads metrics from a remote URL in a standard format
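The two interfaces above can be pictured as a write path that pushes labelled series with timestamped samples, and a read path that asks for a time range plus label conditions. The sketch below models only that data shape with an in-memory store. It is not the wire protocol: the real interfaces send snappy-compressed protobuf messages over HTTP, and the class and method names here (`TimeSeries`, `InMemoryStore`, `write`, `read`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Sample = Tuple[int, float]  # (timestamp in milliseconds, value)


@dataclass
class TimeSeries:
    labels: Dict[str, str]                       # includes "__name__"
    samples: List[Sample] = field(default_factory=list)


class InMemoryStore:
    """Stand-in for a remote storage backend (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._series: Dict[frozenset, List[Sample]] = {}

    def write(self, batch: List[TimeSeries]) -> None:
        # Remote write: append the samples of each series, keyed by its label set.
        for ts in batch:
            key = frozenset(ts.labels.items())
            self._series.setdefault(key, []).extend(ts.samples)

    def read(self, start_ms: int, end_ms: int,
             equal: Dict[str, str]) -> List[TimeSeries]:
        # Remote read: return series whose labels equal the given values,
        # restricted to samples inside [start_ms, end_ms].
        out = []
        for key, samples in self._series.items():
            labels = dict(key)
            if all(labels.get(k) == v for k, v in equal.items()):
                hits = [(t, v) for t, v in samples if start_ms <= t <= end_ms]
                if hits:
                    out.append(TimeSeries(labels, hits))
        return out


store = InMemoryStore()
store.write([TimeSeries({"__name__": "up", "job": "node"},
                        [(1000, 1.0), (2000, 0.0)])])
result = store.read(1500, 2500, {"__name__": "up"})
print(result[0].samples)  # prints [(2000, 0.0)]
```

An adapter for a real database plays the same role as `InMemoryStore`: it accepts the write batches and answers the read queries, translating both into the database's own API.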
The rest of this article focuses on the remote storage options.
Remote write
    # The URL of the endpoint to send samples to.
    url: <string>

    # Timeout for requests to the remote write endpoint.
    [ remote_timeout: <duration> | default = 30s ]

    # List of remote write relabel configurations.
    write_relabel_configs:
      [ - <relabel_config> ... ]

    # Sets the `Authorization` header on every remote write request with the
    # configured username and password.
    # password and password_file are mutually exclusive.
    basic_auth:
      [ username: <string> ]
      [ password: <string> ]
      [ password_file: <string> ]

    # Sets the `Authorization` header on every remote write request with
    # the configured bearer token. It is mutually exclusive with `bearer_token_file`.
    [ bearer_token: <string> ]

    # Sets the `Authorization` header on every remote write request with the bearer token
    # read from the configured file. It is mutually exclusive with `bearer_token`.
    [ bearer_token_file: /path/to/bearer/token/file ]

    # Configures the remote write request's TLS settings.
    tls_config:
      [ <tls_config> ]

    # Optional proxy URL.
    [ proxy_url: <string> ]

    # Configures the queue used to write to remote storage.
    queue_config:
      # Number of samples to buffer per shard before we start dropping them.
      [ capacity: <int> | default = 100000 ]
      # Maximum number of shards, i.e. amount of concurrency.
      [ max_shards: <int> | default = 1000 ]
      # Maximum number of samples per send.
      [ max_samples_per_send: <int> | default = 100 ]
      # Maximum time a sample will wait in buffer.
      [ batch_send_deadline: <duration> | default = 5s ]
      # Maximum number of times to retry a batch on recoverable errors.
      [ max_retries: <int> | default = 10 ]
      # Initial retry delay. Gets doubled for every retry.
      [ min_backoff: <duration> | default = 30ms ]
      # Maximum retry delay.
      [ max_backoff: <duration> | default = 100ms ]
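The queue_config comments describe a retry delay that starts at min_backoff, doubles on every retry, and is capped at max_backoff. The sketch below reads those comments literally and works out the resulting delays for the default values. It is an illustration of the documented behavior, not Prometheus's actual queue code, and the function name `backoff_schedule` is made up.

```python
from typing import List


def backoff_schedule(min_backoff_ms: int, max_backoff_ms: int,
                     max_retries: int) -> List[int]:
    """Delay before each retry: start at min, double each time, cap at max."""
    delays = []
    delay = min_backoff_ms
    for _ in range(max_retries):
        delays.append(delay)
        delay = min(delay * 2, max_backoff_ms)
    return delays


# Defaults from the listing: min_backoff=30ms, max_backoff=100ms, max_retries=10.
delays = backoff_schedule(30, 100, 10)
print(delays)       # prints [30, 60, 100, 100, 100, 100, 100, 100, 100, 100]
print(sum(delays))  # prints 890

# Upper bound on buffered samples if every shard fills up:
# capacity per shard (100000) x max_shards (1000).
print(100_000 * 1_000)  # prints 100000000
```

With the defaults, a failing batch spends under a second in backoff before it is given up on, so a remote endpoint that is down for longer than that will lose samples unless these values are raised.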
Remote read
    # The URL of the endpoint to query from.
    url: <string>

    # An optional list of equality matchers which have to be
    # present in a selector to query the remote read endpoint.
    required_matchers:
      [ <labelname>: <labelvalue> ... ]

    # Timeout for requests to the remote read endpoint.
    [ remote_timeout: <duration> | default = 1m ]

    # Whether reads should be made for queries for time ranges that
    # the local storage should have complete data for.
    [ read_recent: <boolean> | default = false ]

    # Sets the `Authorization` header on every remote read request with the
    # configured username and password.
    # password and password_file are mutually exclusive.
    basic_auth:
      [ username: <string> ]
      [ password: <string> ]
      [ password_file: <string> ]

    # Sets the `Authorization` header on every remote read request with
    # the configured bearer token. It is mutually exclusive with `bearer_token_file`.
    [ bearer_token: <string> ]

    # Sets the `Authorization` header on every remote read request with the bearer token
    # read from the configured file. It is mutually exclusive with `bearer_token`.
    [ bearer_token_file: /path/to/bearer/token/file ]

    # Configures the remote read request's TLS settings.
    tls_config:
      [ <tls_config> ]

    # Optional proxy URL.
    [ proxy_url: <string> ]
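PS
The write_relabel_configs option in the remote write configuration makes full use of Prometheus's powerful relabeling feature. It lets you filter which metrics are written to the remote storage.
For example, to send only selected metrics: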
    remote_write:
      - url: "http://prometheus-remote-storage-adapter-svc:9201/write"
        write_relabel_configs:
          - action: keep
            source_labels: [__name__]
            regex: container_network_receive_bytes_total|container_network_receive_packets_dropped_total
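The effect of the keep rule above can be sketched in a few lines: a series is forwarded only if its `__name__` label matches the regex in full, because Prometheus anchors relabel regexes at both ends. The helper below is a simulation for illustration only. The function name `keep_by_name` and the sample series are made up, and Python's `re` syntax is not identical to the RE2 syntax Prometheus uses, although it behaves the same for a plain alternation like this one.

```python
import re
from typing import Dict, List

REGEX = ("container_network_receive_bytes_total"
         "|container_network_receive_packets_dropped_total")


def keep_by_name(series: List[Dict[str, str]], regex: str) -> List[Dict[str, str]]:
    """Simulate `action: keep` with `source_labels: [__name__]`."""
    pattern = re.compile(regex)
    # fullmatch mirrors the implicit ^...$ anchoring of relabel regexes.
    return [labels for labels in series
            if pattern.fullmatch(labels.get("__name__", ""))]


series = [
    {"__name__": "container_network_receive_bytes_total", "pod": "a"},
    {"__name__": "container_network_receive_bytes_total_extra", "pod": "a"},
    {"__name__": "up", "job": "node"},
]
kept = keep_by_name(series, REGEX)
print([s["__name__"] for s in kept])
# prints ['container_network_receive_bytes_total']
```

The second series is dropped even though its name starts with a listed metric name, which is the practical consequence of the full-match anchoring.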
For external_labels in the global configuration: when using Prometheus federation or remote read/write, consider setting this option so that the individual clusters can be told apart.
    global:
      scrape_interval: 20s
      # The labels to add to any time series or alerts when communicating with
      # external systems (federation, remote storage, Alertmanager).
      external_labels:
        cid: "9"
Existing remote storage solutions
The community has already implemented the following remote storage integrations:
AppOptics: write
Chronix: write
Cortex: read and write
CrateDB: read and write
Elasticsearch: write
Gnocchi: write
Graphite: write
InfluxDB: read and write
OpenTSDB: write
PostgreSQL/TimescaleDB: read and write
SignalFx: write
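Some of the backends above support writes only. From reading the source code, whether remote read can be supported depends on whether the backend supports query matching by regular expression. For the implementation details, the next section walks through prometheus-postgresql-adapter and shows how to implement an adapter of your own.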
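To see why regex support matters, recall that a remote read query carries label matchers of four kinds: equal (`=`), not equal (`!=`), regex match (`=~`), and negative regex match (`!~`). A backend has to evaluate all four to answer an arbitrary query. The sketch below evaluates such matchers against a label set in memory. The tuple representation and the function name `matches` are assumptions made for illustration, not the adapter's actual code.

```python
import re
from typing import Dict, List, Tuple

Matcher = Tuple[str, str, str]  # (label name, operator, value)


def matches(labels: Dict[str, str], matchers: List[Matcher]) -> bool:
    """Return True if the label set satisfies every matcher."""
    for name, op, value in matchers:
        actual = labels.get(name, "")  # a missing label behaves like an empty value
        if op == "=":
            ok = actual == value
        elif op == "!=":
            ok = actual != value
        elif op == "=~":
            ok = re.fullmatch(value, actual) is not None  # regexes are fully anchored
        elif op == "!~":
            ok = re.fullmatch(value, actual) is None
        else:
            raise ValueError(f"unknown matcher operator: {op}")
        if not ok:
            return False
    return True


labels = {"__name__": "up", "job": "node"}
print(matches(labels, [("__name__", "=", "up"), ("job", "=~", "no.*")]))  # prints True
print(matches(labels, [("job", "!~", "node|api")]))                       # prints False
```

A backend with native regex operators can push the last two cases down into its own query language, while one without them would have to fetch candidate series and filter them in the adapter.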
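Backends that support both remote read and write
Cortex comes from Weaveworks. Its architecture wraps Prometheus in a higher-level layer and relies on many components, so it is somewhat complex.
InfluxDB: the open-source edition does not support clustering. With a large volume of metrics the write pressure is high, and the influxdb-relay approach is not true high availability. Ele.me has open-sourced influxdb-proxy, which is worth trying if you are interested.
CrateDB is based on Elasticsearch. I do not know much about it.
TimescaleDB is the option I personally favor. Traditional operations teams know PostgreSQL well, so running it is dependable. It currently supports high availability through streaming replication.
Afterword
If the collected metrics are meant for data analysis, the ClickHouse database is worth considering for its clustering, its write performance, and its support for remote read and write. We are still investigating this and will write a dedicated article once there are results. For now, our planned persistence solution is TimescaleDB.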