The Carbon Daemons
When we talk about “Carbon” we mean one or more of various daemons that make up the
storage backend of a Graphite installation. In simple installations, there is typically
only one daemon, carbon-cache.py. As an installation grows, the carbon-relay.py and
carbon-aggregator.py daemons can be introduced to distribute metrics load and
perform custom aggregations, respectively.
All of the carbon daemons listen for time-series data and can accept it over a common set of protocols. However, they differ in what they do with the data once they receive it. This document gives a brief overview of what each daemon does and how you can use them to build a more sophisticated storage backend.
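To make the idea of a common set of protocols concrete, here is a minimal Python sketch of how a client might encode datapoints for the two listener protocols. By default carbon-cache.py takes newline-delimited text on port 2003 and pickle on port 2004. The sketch assumes the plaintext form `<path> <value> <timestamp>` and a pickle payload of `(path, (timestamp, value))` tuples prefixed by a 4-byte big-endian length. The helper names `plaintext_line`, `pickle_payload` and `send` are hypothetical and are not part of Graphite.

```python
import pickle
import socket
import struct
import time
from typing import List, Optional, Tuple


def plaintext_line(path: str, value: float, timestamp: Optional[int] = None) -> bytes:
    """Encode one datapoint as a single newline-terminated '<path> <value> <timestamp>' line."""
    if timestamp is None:
        timestamp = int(time.time())  # default to "now" in Unix seconds
    return f"{path} {value} {timestamp}\n".encode("utf-8")


def pickle_payload(datapoints: List[Tuple[str, Tuple[int, float]]]) -> bytes:
    """Encode a batch of (path, (timestamp, value)) tuples for the pickle listener."""
    body = pickle.dumps(datapoints, protocol=2)
    # Prefix the pickled list with its length as a 4-byte unsigned big-endian integer.
    return struct.pack("!L", len(body)) + body


def send(host: str, port: int, payload: bytes) -> None:
    """Open a TCP connection to a carbon listener and write the whole payload."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)
```

For example, `send("localhost", 2003, plaintext_line("servers.web01.load", 0.42))` would submit one point over TCP (the host and metric name are placeholders). The pickle form is mainly useful for sending many points in one message.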
carbon-cache.py accepts metrics over various protocols and writes them to disk as efficiently as
possible. This requires caching metric values in RAM as they are received, and flushing them to disk
on an interval using the underlying whisper library. It also provides a query service for in-memory
metric datapoints, used by the Graphite webapp to retrieve “hot data”.
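The caching behaviour described above is essentially a write-behind buffer. The toy `MetricCache` class below is a hypothetical model, much simpler than carbon's real cache: it omits memory limits and write scheduling. Its `writer` callback stands in for the whisper write that happens on flush, and `query` plays the role of the in-memory lookup used for "hot data".

```python
import time
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Datapoint = Tuple[int, float]  # (timestamp, value)


class MetricCache:
    """Toy write-behind cache: buffer datapoints per metric in RAM, flush them in batches."""

    def __init__(self, writer: Callable[[str, List[Datapoint]], None]) -> None:
        self._writer = writer  # hypothetical stand-in for the whisper write
        self._points: Dict[str, List[Datapoint]] = defaultdict(list)

    def store(self, metric: str, timestamp: int, value: float) -> None:
        """Record a received datapoint in RAM only; nothing touches disk yet."""
        self._points[metric].append((timestamp, value))

    def query(self, metric: str) -> List[Datapoint]:
        """Return the unflushed ("hot") points for one metric."""
        return list(self._points.get(metric, []))

    def flush(self) -> int:
        """Hand each buffered metric to the writer once; return how many metrics were written."""
        flushed = 0
        for metric in list(self._points):
            points = self._points.pop(metric)
            self._writer(metric, sorted(points))  # oldest timestamp first
            flushed += 1
        return flushed


def flush_periodically(cache: MetricCache, interval: float, rounds: int) -> None:
    """Flush on a fixed interval, as a daemon's main loop might."""
    for _ in range(rounds):
        time.sleep(interval)
        cache.flush()
```

Batching all pending points for a metric into one write is what lets the cache turn many small incoming updates into fewer disk operations.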
carbon-cache.py requires some basic configuration files to run:
- Tells carbon-cache.py what ports (2003/2004/7002), protocols (newline delimited, pickle) and transports (TCP/UDP) to listen on.
- Defines a retention policy for incoming metrics based on regex patterns. This
policy is passed to whisper when the .wsp file is pre-allocated, and dictates how long data is stored.
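A short Python sketch shows how a regex-based retention policy turns into a fixed-size file. It assumes the first matching pattern wins and uses retention strings of the form `precision:duration` (for example `10s:6h,1m:7d`). The parser is simplified, since it requires a unit suffix on every value, and the rule list format and helper names are hypothetical. The size arithmetic follows whisper's on-disk layout: a 16-byte header, 12 bytes per archive descriptor and 12 bytes per datapoint.

```python
import re
from typing import List, Tuple

# Seconds per unit suffix (simplified: every value must carry a suffix).
UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}


def parse_retention(spec: str) -> List[Tuple[int, int]]:
    """Turn '10s:6h,1m:7d' into [(seconds_per_point, points), ...]."""
    archives = []
    for part in spec.split(","):
        precision, duration = part.strip().split(":")
        step = int(precision[:-1]) * UNITS[precision[-1]]
        span = int(duration[:-1]) * UNITS[duration[-1]]
        archives.append((step, span // step))
    return archives


def pick_retention(metric: str, rules: List[Tuple[str, str]]) -> str:
    """Return the retention spec of the first rule whose regex matches the metric."""
    for pattern, spec in rules:
        if re.search(pattern, metric):
            return spec
    raise LookupError(f"no retention rule matches {metric!r}")


def wsp_size(archives: List[Tuple[int, int]]) -> int:
    """Bytes pre-allocated for a whisper file with these archives."""
    header = 16                        # aggregation type, max retention, xFilesFactor, archive count
    archive_info = 12 * len(archives)  # offset, seconds per point, number of points
    data = 12 * sum(points for _, points in archives)  # 4-byte timestamp + 8-byte double
    return header + archive_info + data
```

For `10s:6h,1m:7d` that is 2160 + 10080 = 12240 points, so 16 + 24 + 146880 = 146920 bytes are allocated up front, however much data eventually arrives.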
As the number of incoming metrics increases, one carbon-cache.py instance may not be
enough to handle the I/O load. To scale out, run multiple carbon-cache.py instances
(on one or more machines) behind a carbon-relay.py or carbon-aggregator.py.
If clients connecting to carbon-cache.py are experiencing errors such as connection
refused by the daemon, a common cause is a shortage of file descriptors.
If the console.log file contains either of these messages:
Could not accept new connection (EMFILE)
exceptions.IOError: [Errno 24] Too many open files: '/var/lib/graphite/whisper/systems/somehost/something.wsp'
then the number of files carbon-cache.py can open will need to be increased.
Many systems default to a maximum of 1024 file descriptors. A value of 8192 or more may
be necessary, depending on how many clients connect to the carbon-cache.py daemon simultaneously.
In Linux, the system-global file descriptor max can be set via sysctl. Per-process limits are set via ulimit. See documentation for your operating system distribution for details on how to set these values.
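A process may also raise its own soft limit, up to the hard limit, without extra privileges. Below is a minimal sketch using Python's standard `resource` module (Unix only). `raise_nofile_limit` is a hypothetical helper, not something carbon provides. Raising the hard limit itself still has to be done through ulimit or system configuration.

```python
import resource


def raise_nofile_limit(target: int = 8192) -> int:
    """Raise this process's soft RLIMIT_NOFILE toward `target`, capped at the hard limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)  # an unprivileged process cannot exceed the hard limit
    if soft == resource.RLIM_INFINITY or soft >= target:
        return soft  # already high enough; never lower an existing limit
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


if __name__ == "__main__":
    print("soft open-files limit:", raise_nofile_limit())
```

This changes only the calling process (and its children), which is the same scope as running ulimit -n in the shell that starts the daemon.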
carbon-relay.py serves two distinct purposes: replication and sharding.
When running with RELAY_METHOD = rules, a carbon-relay.py instance can run in place of a
carbon-cache.py server and relay all incoming metrics to carbon-cache.py instances running
on different ports or hosts.
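Rules-based relaying amounts to matching each metric name against an ordered list of patterns. The Python sketch below models this as a hypothetical list of `(regex, destinations)` pairs, where the first match wins and unmatched metrics go to a default list. This is not carbon's configuration syntax, and the `host:port` addresses are placeholders. Listing several destinations in one rule is what gives replication.

```python
import re
from typing import List, Sequence, Tuple

Rule = Tuple[str, Sequence[str]]  # (regex pattern, destinations as "host:port")


def route(metric: str, rules: List[Rule], default: Sequence[str]) -> List[str]:
    """Return every destination a metric is forwarded to: first matching rule wins, else default."""
    for pattern, destinations in rules:
        if re.search(pattern, metric):
            return list(destinations)
    return list(default)


if __name__ == "__main__":
    rules = [(r"^app\.", ["10.0.0.1:2004", "10.0.0.2:2004"])]  # replicate app.* to two caches
    print(route("app.requests", rules, ["10.0.0.3:2004"]))
    print(route("db.qps", rules, ["10.0.0.3:2004"]))
```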
In RELAY_METHOD = consistent-hashing mode, a DESTINATIONS setting defines a sharding
strategy across multiple carbon-cache.py backends. The same consistent hashing list can be
provided to the Graphite webapp so that it spreads reads across the multiple backends.
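The reason the webapp needs the same list is that consistent hashing is deterministic: given identical backends, any process computes the same owner for a metric. The Python sketch below is a generic hash ring with virtual nodes. Carbon has its own ring implementation whose hash function and replica count may differ, so treat `HashRing`, the MD5-based position and `replicas=100` as illustrative assumptions.

```python
import bisect
import hashlib
from typing import List, Tuple


class HashRing:
    """Generic consistent-hash ring: each backend owns many points on a circle of hash values."""

    def __init__(self, nodes: List[str], replicas: int = 100) -> None:
        self._replicas = replicas
        self._ring: List[Tuple[int, str]] = []  # sorted (position, node) pairs
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode("utf-8")).hexdigest()[:8], 16)

    def add(self, node: str) -> None:
        for i in range(self._replicas):
            bisect.insort(self._ring, (self._hash(f"{node}:{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def get_node(self, metric: str) -> str:
        """Walk clockwise from the metric's hash to the first backend point, wrapping at the end."""
        pos = bisect.bisect_left(self._ring, (self._hash(metric), ""))
        return self._ring[pos % len(self._ring)][1]
```

Removing a backend only remaps the metrics that lived on it; every other metric keeps its owner, which is why this scheme suits sharding.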
carbon-relay.py is configured via:
carbon-aggregator.py can be run in front of carbon-cache.py to buffer metrics over time
before reporting them into whisper. This is useful when granular reporting is not
required, and can help reduce I/O load and whisper file sizes because coarser retention
policies can be used.
carbon-aggregator.py is configured via:
- The [aggregator] section defines listener and destination host/ports.
- Defines a time interval (in seconds) and aggregation function (sum or
average) for incoming metrics matching a certain pattern. At the end of each
interval, the values received are aggregated and published to
carbon-cache.py as a single metric.
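The interval-based aggregation described above can be sketched in Python as bucketing timestamps by interval and reducing each bucket. The daemon does this on streaming input and publishes at the end of each interval. This function instead processes a finished batch so the result is easy to inspect. The rule shape (regex, output name, interval, method) and the method keys `"sum"`/`"average"` are hypothetical and do not reflect carbon's configuration syntax.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

AGGREGATORS: Dict[str, Callable[[List[float]], float]] = {
    "sum": lambda values: sum(values),
    "average": lambda values: sum(values) / len(values),
}


def aggregate(
    datapoints: Iterable[Tuple[str, int, float]],
    pattern: str,
    output: str,
    interval: int,
    method: str,
) -> List[Tuple[str, int, float]]:
    """Bucket matching (metric, timestamp, value) points by interval; emit one point per bucket."""
    func = AGGREGATORS[method]
    buckets: Dict[int, List[float]] = defaultdict(list)
    for metric, timestamp, value in datapoints:
        if re.match(pattern, metric):
            # Align each timestamp to the start of its interval.
            buckets[timestamp - timestamp % interval].append(value)
    return [(output, start, func(values)) for start, values in sorted(buckets.items())]
```

For example, per-host request counters such as `web1.requests` and `web2.requests` could be summed into a single `web.all.requests` series at 60-second resolution. The names here are placeholders.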
carbon-aggregator-cache.py combines carbon-aggregator.py and carbon-cache.py in a single
daemon. This is useful to reduce the resource and administration overhead of running both daemons.
carbon-aggregator-cache.py is configured via: