Open source vs paid monitoring: the real total cost

Verified April 2026

Prometheus is free. Running it is not. An independent TCO comparison of self-hosted, Grafana Cloud, and Datadog at the same 100-host scale.

TL;DR

Self-hosted Prometheus plus Grafana plus Loki plus Tempo for 100 hosts costs approximately $2K to $8K/mo in infrastructure plus 0.5 to 1 FTE of engineering time. Loaded TCO: $8K to $20K/mo. Datadog at the same scale lists at $5K to $15K/mo; Grafana Cloud at $3K to $9K/mo. The answer depends on whether you have the platform engineering capacity to deploy and operate the stack yourself.

Three options at 100 hosts

Loaded TCO comparison

All figures are loaded monthly costs for a typical 100-host deployment. Self-hosted figures price the underlying infrastructure at industry-standard cloud compute rates and engineering time at $200K loaded annual cost per FTE.

Self-hosted (Prometheus, Grafana, Loki, Tempo)

Licence: $0

Infrastructure: $2K to $8K/mo

Engineering: 0.5 to 1 FTE setup; 0.25 to 0.5 FTE ongoing

Training: $5K to $15K initial

Loaded total: $8K to $20K/mo

Cheapest at the licence level. Real cost is engineering time and operational maturity.

Grafana Cloud (managed open source)

Licence: $3K to $9K/mo at 100 hosts

Infrastructure: n/a

Engineering: 0.1 FTE ongoing

Training: Lower; same Grafana / Prometheus stack

Loaded total: $3K to $9K/mo

The pragmatic middle ground. Open-source data formats, managed operations, no vendor lock at the data layer.

Fully commercial (Datadog)

Licence: $5K to $15K/mo at 100 hosts

Infrastructure: n/a

Engineering: 0.1 FTE ongoing

Training: Datadog-specific training

Loaded total: $5K to $15K/mo

Highest licence cost, lowest operational burden, broadest integration ecosystem.
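The loaded arithmetic behind these three cards can be sketched in a few lines of Python. This is a minimal illustration, assuming the $200K loaded cost per FTE stated above and using only the ongoing engineering fractions; the names `monthly_loaded`, `options`, and `loaded_range` are hypothetical. It models steady state only, so it will not reproduce the rounded self-hosted range, which presumably also reflects setup-year effort and training. It also adds the 0.1 FTE to the managed options, which the cards' loaded totals appear to leave out.

```python
FTE_LOADED_ANNUAL = 200_000  # loaded annual cost per FTE, as stated in the text


def monthly_loaded(monthly_spend: float, fte: float) -> float:
    """Monthly cash spend plus the monthly cost of an engineering fraction."""
    return monthly_spend + fte * FTE_LOADED_ANNUAL / 12


# (low, high) monthly spend and (low, high) ongoing FTE, taken from the cards
options = {
    "self-hosted": ((2_000, 8_000), (0.25, 0.5)),
    "grafana-cloud": ((3_000, 9_000), (0.1, 0.1)),
    "datadog": ((5_000, 15_000), (0.1, 0.1)),
}


def loaded_range(name: str) -> tuple[int, int]:
    """Steady-state loaded monthly range, rounded to whole dollars."""
    (spend_lo, spend_hi), (fte_lo, fte_hi) = options[name]
    return (
        round(monthly_loaded(spend_lo, fte_lo)),
        round(monthly_loaded(spend_hi, fte_hi)),
    )


for name in options:
    lo, hi = loaded_range(name)
    print(f"{name}: ${lo:,} to ${hi:,}/mo ongoing")
```

Swapping in your own spend and FTE estimates is the point of the exercise; the ranges here are only as good as the inputs.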

Self-hosted

What 'free' Prometheus actually requires

The licence is zero. The deployment is a multi-component distributed system.

Prometheus

Metrics ingest and storage

2 to 4 vCPU, 16 to 32 GB RAM per ingest replica. 100 GB to 1 TB local storage with WAL. Federation or remote-write to long-term backend (Cortex, Mimir, Thanos, VictoriaMetrics).

Grafana

Dashboarding and alerting UI

1 to 2 vCPU, 4 GB RAM, single replica adequate for most teams. Postgres or MySQL for state.

Loki

Log aggregation

Object storage backend (S3, GCS, Azure Blob). 2 to 4 vCPU per ingester, 8 GB RAM. Horizontally scalable.

Tempo

Distributed tracing

Object storage backend. 2 to 4 vCPU per ingester. Designed for cheap trace storage.

Long-term metrics (Mimir/Thanos/VictoriaMetrics)

Long-term retention and query

Object storage plus query/ingest workers. The deciding factor between hobbyist and production-grade Prometheus deployments.
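To turn the per-replica figures above into a rough compute footprint, a sketch like the following can help. The replica counts are assumptions chosen purely for illustration (the text does not specify them), and the `Component` type and `totals` helper are hypothetical names. Tempo's RAM is left out because the text gives no figure, and the long-term metrics workers, object storage, and Grafana's database are not modelled.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Component:
    name: str
    vcpu: Tuple[int, int]               # (low, high) per replica, from the text
    ram_gb: Optional[Tuple[int, int]]   # None where the text gives no figure
    replicas: int                       # ASSUMPTION: not specified in the text


stack = [
    Component("prometheus", (2, 4), (16, 32), replicas=2),
    Component("grafana", (1, 2), (4, 4), replicas=1),
    Component("loki-ingester", (2, 4), (8, 8), replicas=3),
    Component("tempo-ingester", (2, 4), None, replicas=3),
]


def totals(components):
    """Return ((vcpu_low, vcpu_high), (ram_low_gb, ram_high_gb)) across replicas."""
    vcpu_lo = sum(c.vcpu[0] * c.replicas for c in components)
    vcpu_hi = sum(c.vcpu[1] * c.replicas for c in components)
    # Components without a stated RAM figure are skipped, not guessed.
    ram_lo = sum(c.ram_gb[0] * c.replicas for c in components if c.ram_gb)
    ram_hi = sum(c.ram_gb[1] * c.replicas for c in components if c.ram_gb)
    return (vcpu_lo, vcpu_hi), (ram_lo, ram_hi)


(vcpu_lo, vcpu_hi), (ram_lo, ram_hi) = totals(stack)
print(f"vCPU: {vcpu_lo} to {vcpu_hi}; RAM (excluding Tempo): {ram_lo} to {ram_hi} GB")
```

Multiplying the totals by your provider's per-vCPU and per-GB rates gives a first-pass infrastructure number to compare against the $2K to $8K/mo range.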

The egress trap

Cross-AZ data transfer between Prometheus, Loki, and the applications they monitor adds up. See egresscost.com for cloud egress pricing detail.
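A back-of-the-envelope estimate makes the egress line item concrete. Every input below is a hypothetical placeholder: the telemetry volume per host, the share of traffic that crosses an AZ boundary, and the per-GB rate all vary by workload and provider, so check current pricing before relying on the result.

```python
def cross_az_monthly_cost(
    hosts: int,
    gb_per_host_per_day: float,
    cross_az_fraction: float,
    usd_per_gb: float,
    days: int = 30,
) -> float:
    """Estimated monthly cross-AZ transfer cost for telemetry traffic."""
    gb_crossing = hosts * gb_per_host_per_day * days * cross_az_fraction
    return gb_crossing * usd_per_gb


# ASSUMPTIONS: 5 GB/host/day of metrics and logs, half of it crossing AZs,
# and a placeholder rate of $0.02 per GB. None of these come from the text.
estimate = cross_az_monthly_cost(
    hosts=100,
    gb_per_host_per_day=5,
    cross_az_fraction=0.5,
    usd_per_gb=0.02,
)
print(f"Estimated cross-AZ cost: ${estimate:,.2f}/mo")
```

The cost scales linearly with log volume, so a chatty logging configuration can multiply it quickly. Keeping scrapers and ingesters in the same AZ as their targets reduces `cross_az_fraction` directly.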

Decision matrix

Open source wins when

+ Strong platform engineering team with Kubernetes operational maturity.
+ K8s-native stack where the team already runs Helm charts and Operators.
+ High data volumes where the per-host or per-GB pricing of commercial vendors hits hard.
+ Strict data residency or compliance requirements that favour self-managed.
+ Long-term horizon. Open-source pays back over multi-year deployments, not pilots.

Paid wins when

+ Small or generalist engineering team, no platform function.
+ Aggressive feature breadth (RUM, synthetics, security, AI ops) needed out of the box.
+ Compliance regimes that prefer SaaS audit trails (SOC 2, FedRAMP).
+ Short on-call rota where 4am pages from a self-managed Prometheus cluster are unacceptable.
+ Pre-product-market-fit teams where engineering hours are scarcer than dollars.

The hybrid path

Most teams settle in the middle

Common hybrid pattern

Prometheus and Grafana for infrastructure metrics (free at the licence layer, mature operationally). Datadog or Grafana Cloud for APM and logs (where the integration breadth pays off). Typical saving versus full Datadog: 40 to 60 percent. Trade-off: two-platform operational complexity.
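As a quick worked example, applying the 40 to 60 percent saving to the Datadog list range quoted earlier ($5K to $15K/mo) gives the implied hybrid spend. This sketch assumes the saving applies to the full Datadog bill and ignores the extra engineering time of running two platforms; `hybrid_range` is a hypothetical helper.

```python
def hybrid_range(full_lo: float, full_hi: float,
                 saving_lo: float = 0.40, saving_hi: float = 0.60) -> tuple[int, int]:
    """Implied hybrid monthly spend given a full-Datadog range and a saving range."""
    # Best case: lowest bill with the largest saving; worst case: the reverse.
    return round(full_lo * (1 - saving_hi)), round(full_hi * (1 - saving_lo))


lo, hi = hybrid_range(5_000, 15_000)
print(f"Implied hybrid spend: ${lo:,} to ${hi:,}/mo")
```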

Migration cost

What it costs to switch from Datadog to self-hosted

A realistic migration timeline for a 100-host Datadog deployment to a self-hosted Prometheus stack runs 8 to 16 weeks of engineering time across multiple roles. Key cost components:

2 to 4 weeks: Prometheus / Grafana / Loki / Tempo deployment, parallel to existing Datadog.
2 to 4 weeks: dashboard migration and validation. Grafana imports many but not all Datadog dashboards cleanly.
2 to 3 weeks: alert translation. Datadog monitors map roughly but not exactly to Prometheus alerting rules.
1 to 2 weeks: runbook update, on-call team retraining, paging integration.
1 to 3 weeks: cutover, monitoring of the monitoring, decommissioning.

Loaded engineering cost at typical SaaS-company engineering rates: $30,000 to $80,000 one-off. The annual saving needs to exceed that one-off cost within 12 months for the project to be a clean win.
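The phase estimates and the 12-month hurdle can be checked with a short script. The phase durations and one-off costs come from the list above; the monthly saving is a hypothetical input you would replace with your own figure, and `total_weeks` and `payback_months` are illustrative names.

```python
# (low, high) weeks per phase, from the migration list above
phases = {
    "deploy stack in parallel": (2, 4),
    "dashboard migration and validation": (2, 4),
    "alert translation": (2, 3),
    "runbooks, retraining, paging": (1, 2),
    "cutover and decommissioning": (1, 3),
}


def total_weeks(phases: dict) -> tuple[int, int]:
    """Sum the low and high week estimates across all phases."""
    return (
        sum(lo for lo, _ in phases.values()),
        sum(hi for _, hi in phases.values()),
    )


def payback_months(one_off_cost: float, monthly_saving: float) -> float:
    """Months until cumulative savings cover the one-off migration cost."""
    if monthly_saving <= 0:
        return float("inf")  # the migration never pays back
    return one_off_cost / monthly_saving


weeks_lo, weeks_hi = total_weeks(phases)
print(f"Migration effort: {weeks_lo} to {weeks_hi} weeks")

# ASSUMPTION: a $5K/mo net saving after migration, for illustration only.
for cost in (30_000, 80_000):
    months = payback_months(cost, monthly_saving=5_000)
    verdict = "clean win" if months <= 12 else "misses the 12-month hurdle"
    print(f"${cost:,} one-off: payback in {months:.0f} months ({verdict})")
```

With the assumed saving, the low-end migration pays back comfortably while the high-end one does not, which is why the net monthly saving deserves as much scrutiny as the migration estimate itself.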

Frequently asked

Is Prometheus free?

Yes, at the licence level. Operating Prometheus in production for a 100-host cluster typically costs $2K to $8K/mo in cloud infrastructure plus 0.5 to 1 FTE in setup engineering and 0.25 to 0.5 FTE ongoing. Total loaded TCO sits at $8K to $20K/mo for a non-trivial deployment.

Is open source monitoring really free?

Licence cost is zero. True total cost of ownership includes infrastructure, engineering time for setup and operation, training, and the opportunity cost of platform engineers not building product. For teams without platform engineering capacity, managed open source (Grafana Cloud) usually beats both extremes.

What is the TCO of self-hosted monitoring?

For a 100-host deployment: $2K to $8K/mo cloud infrastructure plus 0.5 to 1 FTE in year one and 0.25 to 0.5 FTE ongoing, plus $5K to $15K initial training. Loaded annual TCO lands at $100K to $250K. At this scale the loaded cost is roughly equivalent to Datadog; below 50 hosts paid wins, above 500 hosts open source wins.

Should I use open source or paid monitoring?

Open source if you have platform engineering capacity and a long horizon. Paid if you have a small generalist team, broad feature requirements, or a short on-call rota. Grafana Cloud is the most defensible middle ground for teams that want open-source data formats without operational burden.