Prometheus Cheatsheet
A quick reference guide for Prometheus, covering essential concepts, PromQL queries, configuration, and best practices for monitoring and alerting in a DevOps environment.
Core Concepts
Metrics and Data Model
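Metric Types: Counter (a cumulative value that only increases or resets to zero), Gauge (a value that can go up or down), Histogram (observations counted in configurable buckets), and Summary (observations with quantiles calculated on the client side).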
Data Model: Prometheus stores all data as time series: streams of timestamped values belonging to the same metric and the same set of labeled dimensions.
Labels: Key-value pairs that enable the dimensional data model of Prometheus. Any given combination of labels for the same metric name identifies a particular dimensional instantiation of that metric (for example, all HTTP requests that used a particular method).
Architecture
Prometheus Server: Scrapes and stores time-series data.
Service Discovery: Automatically discovers targets to scrape.
Exporters: Expose metrics from third-party systems (e.g., node_exporter for system metrics).
Alertmanager: Handles alerts sent by Prometheus.
Key Components
Prometheus Server: The core component responsible for scraping metrics, storing them, and evaluating alerting rules.
Exporters: Tools that expose metrics in a Prometheus-readable format, such as node_exporter.
Alertmanager: Handles alerts generated by Prometheus. It can group, deduplicate, and route alerts to various receivers (e.g., email, Slack, PagerDuty).
PromQL - Querying Prometheus
Basic Queries
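A basic PromQL query is a selector: a metric name, optionally narrowed by label matchers in braces and optionally followed by a range such as [5m]. As a rough illustration, the sketch below wraps a few selectors in a recording-rule file so that each expression sits in a context Prometheus can load. The metric http_requests_total, its method and status labels, and all rule names are hypothetical; only up is a metric that Prometheus generates itself.

```yaml
# Hypothetical rule file: metric, label, and rule names are assumptions.
groups:
  - name: selector_examples
    rules:
      # Exact match: only series whose method label equals "GET".
      - record: job:http_requests_get:sum
        expr: sum by (job) (http_requests_total{method="GET"})

      # Regex match (fully anchored): status is three characters starting with 5.
      - record: job:http_requests_5xx:sum
        expr: sum by (job) (http_requests_total{status=~"5.."})

      # Range selector: [1h] selects one hour of samples per series, and
      # avg_over_time() reduces each window back to a single value.
      - record: instance:up:avg_over_time1h
        expr: avg_over_time(up[1h])
```

The same expressions can be typed directly into the Prometheus expression browser without the surrounding rule file.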
Functions
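rate(): Calculates the per-second average rate of increase of the time series in the range vector. Use it with counters.
irate(): Calculates the per-second instant rate of increase from the last two samples in the range vector. Useful for graphing volatile, fast-moving counters.
increase(): Calculates the total increase of the time series over the range vector. Good for graphing totals per time window.
sum(): Aggregation operator that adds up values across time series. With by (...) or without (...), it produces one sum per group of series that share the same values for the grouping labels.
avg(): Aggregation operator that averages values across time series, with the same optional by (...) and without (...) grouping.
histogram_quantile(): Estimates the given quantile (between 0 and 1) from the buckets of a histogram.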
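To show how these functions are typically combined, here is a minimal sketch written as recording rules. It assumes a hypothetical counter http_requests_total and a hypothetical histogram http_request_duration_seconds; process_resident_memory_bytes is a gauge exposed by most official client libraries. The rule names follow the level:metric:operations convention but are otherwise made up.

```yaml
# Hypothetical rule file: the HTTP metric names and rule names are assumptions.
groups:
  - name: function_examples
    rules:
      # rate(): per-second average over a 5m window, then summed per job.
      - record: job:http_requests:rate5m
        expr: sum by (job) (rate(http_requests_total[5m]))

      # increase(): total growth of the counter over the last hour.
      - record: job:http_requests:increase1h
        expr: sum by (job) (increase(http_requests_total[1h]))

      # avg(): mean of a gauge across all instances of a job.
      - record: job:process_resident_memory_bytes:avg
        expr: avg by (job) (process_resident_memory_bytes)

      # histogram_quantile(): 95th percentile latency. The le label must
      # survive the aggregation, because it carries the bucket boundaries.
      - record: job:http_request_duration_seconds:p95_5m
        expr: histogram_quantile(0.95, sum by (job, le) (rate(http_request_duration_seconds_bucket[5m])))
```

Applying rate() before sum() matters: rate() handles counter resets per series, which is no longer possible once the series have been added together.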
Common Queries
Commonly needed queries cover CPU usage, memory usage, and disk usage.
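One possible formulation of CPU, memory, and disk usage queries is sketched below as recording rules. It assumes the targets run node_exporter on Linux and expose its current metric names (node_cpu_seconds_total, node_memory_MemAvailable_bytes, node_memory_MemTotal_bytes, node_filesystem_avail_bytes, node_filesystem_size_bytes). The rule names, the 5m window, and the tmpfs filter are illustrative choices, not fixed conventions.

```yaml
# Assumes node_exporter on Linux; rule names and filters are assumptions.
groups:
  - name: node_usage_examples
    rules:
      # CPU: percentage of time not spent idle, averaged over all cores.
      - record: instance:node_cpu_usage:percent
        expr: 100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])))

      # Memory: percentage of RAM that is not available to new workloads.
      - record: instance:node_memory_usage:percent
        expr: 100 * (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)

      # Disk: percentage of each filesystem in use, skipping tmpfs mounts.
      - record: instance_mountpoint:node_filesystem_usage:percent
        expr: |
          100 * (1 - node_filesystem_avail_bytes{fstype!="tmpfs"}
                   / node_filesystem_size_bytes{fstype!="tmpfs"})
```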
Configuration
Prometheus Configuration File (prometheus.yml)
The main configuration file for Prometheus, written in YAML. It defines global settings, scrape configurations, the rule files to load, and the Alertmanager instances to notify.
global: Global settings such as the scrape interval and the rule evaluation interval.
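A minimal prometheus.yml might look like the sketch below. The intervals, the rules/*.yml path, and the localhost addresses are placeholder assumptions (9090 and 9093 are the default ports of Prometheus and Alertmanager); adjust them to the actual deployment.

```yaml
# Minimal sketch of prometheus.yml; all values are placeholder assumptions.
global:
  scrape_interval: 15s       # how often targets are scraped by default
  evaluation_interval: 15s   # how often rules are evaluated

# Rule files to load; alerting and recording rules live in these files.
rule_files:
  - "rules/*.yml"

# Where Prometheus sends firing alerts.
alerting:
  alertmanagers:
    - static_configs:
        - targets: ["localhost:9093"]

# What to scrape; here Prometheus scrapes its own metrics endpoint.
scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
```

The file can be validated before a reload with promtool check config prometheus.yml.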
Scrape Configuration
A scrape configuration defines how Prometheus scrapes metrics from a target.
job_name: The name of the job, which Prometheus attaches to every scraped series as the job label.
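As an illustration, the following sketch shows a single scrape job with a few commonly used options. The job name, target addresses, and the env label are hypothetical; 9100 is the default node_exporter port.

```yaml
# Sketch of one scrape job; targets and label values are assumptions.
scrape_configs:
  - job_name: "node"
    scrape_interval: 30s     # overrides global.scrape_interval for this job
    scrape_timeout: 10s      # must not exceed the scrape interval
    metrics_path: /metrics   # default path, shown for clarity
    scheme: http             # default scheme, shown for clarity
    static_configs:
      - targets: ["10.0.0.5:9100", "10.0.0.6:9100"]
        labels:
          env: "production"  # extra label added to every series from these targets
```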
Alerting Rules
Alerting rules define conditions under which alerts should be fired.
alert: The name of the alert.
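A minimal sketch of a rule file with one alerting rule follows. It relies on the built-in up metric, which is 0 when a scrape fails; the group name, the 5m duration, and the severity value are illustrative assumptions.

```yaml
# Sketch of a rule file referenced from rule_files in prometheus.yml.
groups:
  - name: example_alerts
    rules:
      - alert: InstanceDown
        expr: up == 0          # condition, written in PromQL
        for: 5m                # must hold this long before the alert fires
        labels:
          severity: critical   # used later for routing in Alertmanager
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been unreachable for more than 5 minutes."
```

Rule files can be checked with promtool check rules followed by the file name.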
Best Practices
Naming Conventions
Use consistent and descriptive names for metrics and labels.
Follow the official Prometheus naming conventions for metrics and labels.
Use labels to add dimensionality to your metrics.
Alerting Strategies
Define meaningful alerts that provide actionable insights.
Group alerts based on severity and route them to the appropriate teams.
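Grouping and severity-based routing are configured in Alertmanager rather than in Prometheus. The sketch below assumes alerts carry a severity label and uses the matchers syntax available in Alertmanager 0.22 and later. The receiver names are hypothetical, and the receivers are left empty; a real setup would add email_configs, slack_configs, pagerduty_configs, or similar under each one.

```yaml
# Sketch of alertmanager.yml; receiver names and timings are assumptions.
route:
  receiver: default-team               # fallback when no child route matches
  group_by: ["alertname", "severity"]  # alerts sharing these labels are batched
  group_wait: 30s                      # wait before sending the first notification
  group_interval: 5m                   # wait before notifying about new alerts in a group
  repeat_interval: 4h                  # wait before re-sending an unresolved alert
  routes:
    - matchers:
        - severity="critical"
      receiver: oncall-pager
    - matchers:
        - severity="warning"
      receiver: team-chat

receivers:
  - name: default-team
  - name: oncall-pager
  - name: team-chat
```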
Monitoring Strategies
Monitor key performance indicators (KPIs) for your applications and infrastructure.
Use dashboards to visualize metrics and identify trends.
Implement service discovery to automatically monitor new instances.
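File-based discovery is one of the simplest service discovery mechanisms and serves as a sketch here: Prometheus watches a set of files and picks up target changes without a restart. The job name, directory, and addresses are hypothetical; cloud or orchestrator integrations such as kubernetes_sd_configs follow the same pattern with different options.

```yaml
# Sketch of file-based service discovery; paths and targets are assumptions.
scrape_configs:
  - job_name: "app"
    file_sd_configs:
      - files:
          - "targets/*.yml"      # files are re-read when they change
        refresh_interval: 1m     # periodic re-read as a fallback

# Example content of one target file, e.g. targets/web.yml:
# - targets: ["10.0.0.7:8080", "10.0.0.8:8080"]
#   labels:
#     env: "staging"
```

A deployment script or configuration management tool can then add a new instance by writing it into a target file.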