Core Concepts
Understand Prometheus metrics fundamentals.
Metric Types
Counter
A counter is a cumulative metric that only increases; its value drops only when it resets to zero, for example after a process restart.
Use for: Counting events (requests, errors, operations)
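Because a counter is cumulative, the useful number is usually how fast it grows. The sketch below derives a per-second rate from a series of samples. The `counterRate` helper and the sample data are hypothetical, and the logic is deliberately simplified: Prometheus's own `rate()` also extrapolates to the edges of the query window. What it shares with Prometheus is the treatment of a drop in value as a counter reset, not as a negative change.

```javascript
// Hypothetical helper: approximates a per-second rate from counter samples.
// Each sample is { t: seconds, v: cumulative value }, ordered by time.
function counterRate(samples) {
  if (samples.length < 2) return 0;
  let increase = 0;
  for (let i = 1; i < samples.length; i++) {
    const prev = samples[i - 1].v;
    const curr = samples[i].v;
    // A lower value means the counter restarted from zero,
    // so the whole current value counts as new increase.
    increase += curr >= prev ? curr - prev : curr;
  }
  const elapsed = samples[samples.length - 1].t - samples[0].t;
  return elapsed > 0 ? increase / elapsed : 0;
}

const steady = [{ t: 0, v: 0 }, { t: 1, v: 5 }, { t: 2, v: 12 }, { t: 3, v: 18 }];
const withReset = [{ t: 0, v: 10 }, { t: 1, v: 14 }, { t: 2, v: 3 }, { t: 3, v: 6 }];

console.log(counterRate(steady));    // 6 (18 hits over 3 seconds)
console.log(counterRate(withReset)); // (4 + 3 + 3) / 3
```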
// Example: Track cache hits
metrics.incrementCounter('redisx_cache_hits_total', { layer: 'l1' });
// Value over time:
// t=0: 0
// t=1: 5
// t=2: 12
// t=3: 18 ← Always increasing
In Prometheus:
# Total cache hits
redisx_cache_hits_total{layer="l1"}
# Rate per second (more useful)
rate(redisx_cache_hits_total{layer="l1"}[5m])
Gauge
A gauge is a metric that can go up or down.
Use for: Current values (active connections, queue size, memory usage)
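A common gauge pattern is tracking in-flight work: increment before the operation starts and decrement in a `finally` block, so the value drops even when the operation throws. The sketch below uses a tiny stand-in `Gauge` class and a hypothetical `withLock` wrapper so it runs without any metrics library; neither is this package's API.

```javascript
// Stand-in gauge for illustration only; a real metrics library provides its own.
class Gauge {
  constructor() { this.value = 0; }
  set(v) { this.value = v; }
  inc(n = 1) { this.value += n; }
  dec(n = 1) { this.value -= n; }
}

const activeLocks = new Gauge();

// Hypothetical wrapper: counts the lock as active only while fn is running.
async function withLock(fn) {
  activeLocks.inc();
  try {
    return await fn();
  } finally {
    // Runs on success and on error, so the gauge cannot drift upward.
    activeLocks.dec();
  }
}

async function demo() {
  const during = await withLock(async () => activeLocks.value);
  await withLock(async () => { throw new Error('boom'); }).catch(() => {});
  return { during, after: activeLocks.value };
}

const demoResult = demo();
demoResult.then((result) => console.log(result)); // during is 1, after is 0
```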
// Example: Track active locks
activeLocks.set(5); // Set to 5
activeLocks.inc(); // Increase to 6
activeLocks.dec(); // Decrease to 5
// Value over time:
// t=0: 0
// t=1: 5
// t=2: 3 ← Can decrease
// t=3: 7 ← Can increase
In Prometheus:
# Current active locks
redisx_locks_active
# Average over time
avg_over_time(redisx_locks_active[5m])
Histogram
A histogram samples observations and counts them in buckets.
Use for: Measuring distributions (latency, request sizes)
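Bucket counts are cumulative: each observation increments every bucket whose upper bound (`le`, "less than or equal") is at or above the observed value, and the implicit `+Inf` bucket always matches. A minimal sketch of that bookkeeping, using a hypothetical `SketchHistogram` class in place of a real client library:

```javascript
// Illustrative only: mimics how a classic Prometheus histogram keeps
// cumulative bucket counts plus a running sum and count.
class SketchHistogram {
  constructor(upperBounds) {
    this.upperBounds = [...upperBounds].sort((a, b) => a - b);
    this.bucketCounts = this.upperBounds.map(() => 0);
    this.sum = 0;
    this.count = 0; // equals the implicit le="+Inf" bucket
  }

  observe(value) {
    this.upperBounds.forEach((le, i) => {
      if (value <= le) this.bucketCounts[i] += 1;
    });
    this.sum += value;
    this.count += 1;
  }
}

const latency = new SketchHistogram([0.001, 0.005, 0.01]);
[0.0004, 0.003, 0.004, 0.02].forEach((seconds) => latency.observe(seconds));

console.log(latency.bucketCounts); // [1, 3, 3]
console.log(latency.count);        // 4
```

The 20 ms observation lands in no finite bucket, so it is visible only in the total count. This is why bucket boundaries should cover the latency range you expect.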
// Example: Track command latency
const timer = commandDuration.startTimer({ command: 'GET' });
// ... execute command ...
timer(); // Records duration
Buckets:
le="0.001": 1523 ← 1523 requests ≤ 1ms
le="0.005": 2891 ← 2891 requests ≤ 5ms (cumulative, includes the bucket above)
le="0.01": 3102 ← 3102 requests ≤ 10ms
le="+Inf": 3102 ← Total requests
In Prometheus:
# P95 latency
histogram_quantile(0.95,
  rate(myapp_redis_command_duration_seconds_bucket[5m])
)
# P99 latency
histogram_quantile(0.99,
  rate(myapp_redis_command_duration_seconds_bucket[5m])
)
# Average latency
rate(myapp_redis_command_duration_seconds_sum[5m])
/
rate(myapp_redis_command_duration_seconds_count[5m])
Summary
A summary is similar to a histogram, but it calculates quantiles on the client side.
Use for: Pre-calculated percentiles (less common, prefer histograms)
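One reason histograms are usually preferred is that their bucket counts can be summed across instances before a quantile is estimated at query time. Quantiles precomputed by a summary cannot be combined that way. The sketch below shows the general idea behind `histogram_quantile` for classic histograms: find the bucket that contains the target rank, then interpolate linearly inside it. The `estimateQuantile` function is a simplified illustration, not Prometheus's code. It assumes at least one finite bucket and reuses the bucket counts from the Histogram section.

```javascript
// Simplified quantile estimate from cumulative buckets.
// buckets: [{ le: upperBound, count: cumulativeCount }], sorted by le,
// with the final bucket using le = Infinity.
function estimateQuantile(q, buckets) {
  const total = buckets[buckets.length - 1].count;
  if (total === 0) return NaN;
  const rank = q * total;

  // First bucket whose cumulative count reaches the target rank.
  const i = buckets.findIndex((b) => b.count >= rank);

  // Rank falls in the +Inf bucket: report the highest finite bound.
  if (buckets[i].le === Infinity) return buckets[buckets.length - 2].le;

  const lowerBound = i === 0 ? 0 : buckets[i - 1].le;
  const countBelow = i === 0 ? 0 : buckets[i - 1].count;
  const inBucket = buckets[i].count - countBelow;

  // Assume observations are spread evenly within the bucket.
  return lowerBound + (buckets[i].le - lowerBound) * ((rank - countBelow) / inBucket);
}

const buckets = [
  { le: 0.001, count: 1523 },
  { le: 0.005, count: 2891 },
  { le: 0.01, count: 3102 },
  { le: Infinity, count: 3102 },
];

console.log(estimateQuantile(0.5, buckets));  // falls in the (0.001, 0.005] bucket
console.log(estimateQuantile(0.95, buckets)); // falls in the (0.005, 0.01] bucket
```

The estimate is only as precise as the bucket layout: every value inside a bucket is assumed to be evenly spread, so wide buckets give coarse percentiles.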
Labels
Labels add dimensions to metrics.
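Each distinct combination of label values is stored as its own time series. A minimal sketch of that idea, using a hypothetical `LabeledCounter` class that keys values by the sorted label set (real client libraries differ in detail):

```javascript
// Illustrative only: one counter value per unique label combination.
class LabeledCounter {
  constructor(name) {
    this.name = name;
    this.series = new Map();
  }

  // Build a stable key so { a: 1, b: 2 } and { b: 2, a: 1 } match.
  static keyFor(labels) {
    return Object.keys(labels)
      .sort()
      .map((k) => `${k}="${labels[k]}"`)
      .join(',');
  }

  inc(labels = {}, amount = 1) {
    const key = LabeledCounter.keyFor(labels);
    this.series.set(key, (this.series.get(key) ?? 0) + amount);
  }
}

const hits = new LabeledCounter('cache_hits_total');
hits.inc({ layer: 'l1' });
hits.inc({ layer: 'l1' });
hits.inc({ layer: 'l2' });

for (const [labels, value] of hits.series) {
  console.log(`${hits.name}{${labels}} ${value}`);
}
// cache_hits_total{layer="l1"} 2
// cache_hits_total{layer="l2"} 1
```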
Good Labels
// ✅ Good - Low cardinality
metrics.incrementCounter('cache_hits_total', {
  layer: 'l1', // Limited values: l1, l2
});
// ✅ Good - Bounded values
metrics.incrementCounter('lock_acquisitions_total', {
  status: 'success', // Limited values: success, failed
});
Bad Labels
// ❌ Bad - High cardinality
metrics.incrementCounter('cache_hits_total', {
  userId: '12345', // Millions of unique values!
  timestamp: Date.now().toString(), // Infinite values!
});
// ❌ Bad - Unbounded
metrics.incrementCounter('lock_acquisitions_total', {
  lockKey: randomUUID(), // Every lock is unique!
});
Label Best Practices
1. Use low-cardinality labels:
// ❌ Bad - Too many unique values
{ endpoint: '/api/users/12345/profile' }
// ✅ Good - Grouped by pattern
{ endpoint: '/api/users/:id/profile' }
2. Limit number of labels:
// ❌ Bad - Too many labels
{
  service, region, datacenter,
  rack, server, pod, container
}
// ✅ Good - Essential labels only
{ service, region, env }
3. Choose meaningful names:
// ❌ Bad - Unclear
{ t: 'u', o: 'g' }
// ✅ Good - Self-documenting
{ type: 'user', operation: 'get' }
Cardinality
Cardinality = Number of unique time series
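As a worked example, the upper bound on one metric's cardinality is the product of the number of distinct values each label can take. A small sketch with a hypothetical `maxCardinality` helper:

```javascript
// Upper bound on time series for one metric: the product of the
// number of distinct values each label can take.
function maxCardinality(labelValues) {
  return Object.values(labelValues).reduce(
    (product, values) => product * new Set(values).size,
    1,
  );
}

const bounded = maxCardinality({
  namespace: ['users', 'products', 'orders'],
  operation: ['get', 'set', 'delete', 'exists'],
});
console.log(bounded); // 12

// Adding one label with 1,000 possible values multiplies the total.
const withTenant = maxCardinality({
  namespace: ['users', 'products', 'orders'],
  operation: ['get', 'set', 'delete', 'exists'],
  tenant: Array.from({ length: 1000 }, (_, i) => `tenant-${i}`),
});
console.log(withTenant); // 12000
```

The real series count is often lower than this bound, because not every combination of values occurs in practice. The bound is still the number to plan for.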
// Metric with 2 labels:
// - namespace: 3 values (users, products, orders)
// - operation: 4 values (get, set, delete, exists)
// Cardinality = 3 × 4 = 12 time series
High Cardinality Problems
Symptoms:
- High memory usage in Prometheus
- Slow queries
- OOM errors
Causes:
// ❌ This creates millions of time series!
httpRequests.inc({
  userId: req.user.id, // 1M users
  requestId: req.id, // Infinite
  timestamp: Date.now(), // Infinite
});
// Cardinality = ∞
Solutions:
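One way to keep an endpoint label bounded is to collapse variable path segments into placeholders before using the path as a label value. This guide does not define the `normalizeEndpoint` helper used in the example that follows, so the snippet below is only a minimal sketch of what it might look like. The rules (numeric IDs and UUIDs become `:id`) are assumptions to adapt to your own routes.

```javascript
const UUID = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
const NUMERIC = /^\d+$/;

// Hypothetical helper: replaces ID-like path segments with ":id".
function normalizeEndpoint(path) {
  const pathname = path.split('?')[0]; // drop any query string
  return pathname
    .split('/')
    .map((segment) => (NUMERIC.test(segment) || UUID.test(segment) ? ':id' : segment))
    .join('/');
}

console.log(normalizeEndpoint('/api/users/12345/profile'));
// /api/users/:id/profile
console.log(normalizeEndpoint('/api/orders/3f2b8c1e-9a4d-4e6f-8b2a-1c3d5e7f9a0b?expand=items'));
// /api/orders/:id
```

If your web framework exposes the matched route pattern, using that value directly is usually simpler and safer than pattern-matching on the raw path.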
// ✅ Use aggregated labels
httpRequests.inc({
  endpoint: normalizeEndpoint(req.path), // ~100 endpoints
  method: req.method, // 9 methods
  status: Math.floor(res.status / 100), // 5 categories (2xx, 3xx, etc)
});
// Cardinality = 100 × 9 × 5 = 4,500 ✓
Naming Conventions
Metric Names
// Pattern: <namespace>_<subsystem>_<name>_<unit>
// ✅ Good examples
redisx_command_duration_seconds
redisx_cache_hits_total
redisx_locks_active
// ❌ Bad examples
redis_time // Missing unit
CacheHits // Not snake_case
myapp_hits // Missing subsystem
Label Names
// ✅ Good - Snake case, descriptive
{
  http_method: 'GET',
  status_code: '200',
  cache_tier: 'l1',
}
// ❌ Bad - Inconsistent, unclear
{
  Method: 'GET', // PascalCase
  code: '200', // Too short
  tier1: 'yes', // tier1 what?
}
Metric Lifecycle
Best Practices
1. Choose the Right Type
| Scenario | Metric Type | Example |
|---|---|---|
| Count events | Counter | cache_hits_total |
| Current value | Gauge | active_connections |
| Latency/duration | Histogram | command_duration_seconds |
| Size distribution | Histogram | message_size_bytes |
2. Use Consistent Units
// ✅ Good - Use base units
command_duration_seconds // Not milliseconds
memory_bytes // Not megabytes
temperature_celsius // Not fahrenheit
// ✅ Good - Add suffix
_total // Counter
_seconds // Duration
_bytes // Size
_ratio // 0-1 value (e.g., hit rate)
3. Keep Labels Bounded
// ✅ Good
const allowedLayers = ['l1', 'l2'];
if (allowedLayers.includes(layer)) {
  metrics.incrementCounter('cache_hits_total', { layer });
}
// ❌ Bad - Unbounded
metrics.incrementCounter('cache_hits_total', { key: userInput }); // Any value!
4. Use Histograms for Latency
// ✅ Good - Can calculate percentiles
const timer = commandDuration.startTimer();
await executeCommand();
timer();
// ❌ Bad - Lose distribution info
const start = Date.now();
await executeCommand();
avgLatency.set(Date.now() - start);
5. Avoid High Cardinality
// Rule of thumb:
// Total cardinality < 10,000 per metric = ✅ Good
// Total cardinality > 100,000 per metric = ⚠️ Warning
// Total cardinality > 1,000,000 per metric = ❌ Bad
Query Examples
Rate
# Requests per second
rate(http_requests_total[5m])
Percentiles
# P95 latency
histogram_quantile(0.95,
  rate(http_request_duration_seconds_bucket[5m])
)
Error Rate
# Error percentage
sum(rate(http_requests_total{status=~"5.."}[5m]))
/
sum(rate(http_requests_total[5m]))
* 100
Aggregation
# Total across all instances
sum(redis_connections_active)
# Average per instance
avg(redis_connections_active)
# Max across instances
max(redis_connections_active)
Next Steps
- Configuration — Configure metrics
- Prometheus — Set up Prometheus
- Custom Metrics — Create your own metrics