
GreptimeDB vs. Elasticsearch

Inverted indexes were built for text search, not observability storage. For traces, ES inflates storage up to 45x. For logs, GreptimeDB ingests 4.7x faster and stores 1/10 the data in benchmark tests.

Inverted indexes power search.
They also inflate your observability storage.

Elasticsearch is a distributed search and analytics engine built on Apache Lucene, widely used for log analysis, APM, and enterprise search. Its inverted index excels at full-text search, but for observability workloads, especially traces and high-volume logs, the storage overhead is structural: the JSON document model, inverted indexes on every field, and multiple index replicas all inflate the data. See the full log benchmark report, and the storage efficiency analysis for the trace storage comparison.

CHALLENGER

Elasticsearch

Search-first architecture where observability data inflates fast

  • Indexes all fields by default — high-cardinality trace/span IDs inflate storage up to 45x
  • Log ingestion at 39K TPS — GreptimeDB achieves 185K TPS (4.7x) in [benchmark](/blogs/2025-04-24-elasticsearch-greptimedb-comparison-performance)
  • JVM tuning, shard management, and ILM add operational burden
VS

GREPTIMEDB

GreptimeDB

Columnar storage on S3, designed for observability retention from day one

  • Columnar compression stores observability data at 1/10 the size (benchmark)
  • Flexible indexing — skipping index for high-cardinality trace/span IDs, bloom-filter fulltext for logs, inverted index where it fits
  • 4.7x faster log ingestion in benchmark tests, S3 write with only 1-2% throughput loss
  • Jaeger UI compatible — migrate traces in about a week
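
As a sketch of what per-field indexing can look like in GreptimeDB SQL (the table, column names, and index options below are illustrative; verify the exact syntax against the GreptimeDB index documentation for your version):

```sql
-- Hypothetical log table: each column gets the index type that fits it,
-- instead of inverted-indexing everything by default.
CREATE TABLE app_logs (
  ts       TIMESTAMP TIME INDEX,
  host     STRING,
  -- High-cardinality ID: lightweight skipping index, no inverted index
  trace_id STRING SKIPPING INDEX,
  -- Bloom-filter-backed fulltext index for log search
  message  STRING FULLTEXT INDEX WITH (backend = 'bloom'),
  PRIMARY KEY (host)
);
```

The point of the sketch: index cost is an opt-in, per-column decision rather than a global default, which is what keeps high-cardinality trace/span IDs from inflating storage.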
Architecture comparison

Why Elasticsearch clusters grow operationally complex, and why GreptimeDB keeps the path simpler.

Elastic Stack (5-9 components)

  • Beats / OTel Collector / Logstash: collect + transform
  • Ingest Nodes + Data Nodes: index + store
  • Master / Coordinator Nodes: cluster routing
  • ILM + Snapshot Repositories: retention + archive

GreptimeDB (1 database)

  • Frontend node (stateless, auto-scale): query + ingest gateway
  • Datanode (compute, stateless): native object storage

  • Unified SQL + PromQL workflow for observability
  • Native support for logs, metrics, and traces
  • Scale compute and storage independently
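
To illustrate the unified workflow, the same database can answer an ad-hoc log question in plain SQL (table and column names here are illustrative):

```sql
-- Errors per host over the last hour, straight SQL over the log table
SELECT host, count(*) AS errors
FROM app_logs
WHERE message LIKE '%error%'
  AND ts >= now() - INTERVAL '1 hour'
GROUP BY host
ORDER BY errors DESC;
```

The same instance also serves PromQL through its Prometheus-compatible HTTP API, so metric dashboards and log queries run against one system instead of two.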
Feature comparison
| Dimension | GreptimeDB | Elasticsearch |
| --- | --- | --- |
| Storage format | Apache Parquet (columnar, compressed) | Lucene segments with inverted indexes |
| Indexing strategy | Per-field: inverted, skipping, bloom-filter fulltext, vector | Indexes all fields by default; storage-heavy on high-cardinality data |
| Storage efficiency | ~1/10 of ES for log data; up to 45x reduction for traces | JSON + inverted index + replicas inflate storage |
| Query language | SQL + PromQL (dual interface) | Query DSL (JSON-based), Elasticsearch SQL (built-in) |
| Data types | Metrics + logs + traces in one database | Primarily documents (separate systems for metrics) |
| Ingestion throughput | 185K TPS (structured logs) | 39K TPS (structured logs, 4.7x slower) |
| Storage backend | Native object storage (S3, OSS, GCS) | Local disk; snapshot to S3 for cold data |
| Scaling model | Compute-storage disaggregation, stateless nodes | Master/data node cluster with shard management |
| Trace compatibility | Jaeger Query API compatible; migrate in ~1 week | Native Jaeger/ES backend |
| OpenTelemetry | Native OTLP (all signals) | Via Elastic APM / Observability stack |
| License | Apache 2.0 | ELv2 / SSPL / AGPLv3 (triple-licensed since 2024) |
| Operational complexity | Single system for observability | ELK/EEK stack: multiple components to manage |

Log performance data from benchmark tests. Trace storage comparison based on production migration case study. Results vary by workload.

Migration path: as fast as one week

Keep your existing pipelines. Move ingest and query incrementally.

Redirect ingest

Docs

Point compatible ingest endpoints (OTLP, HTTP, Bulk API) to GreptimeDB while existing agents remain unchanged.

30 min
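
For an OpenTelemetry Collector pipeline, the redirect is typically a one-exporter change. The fragment below is a sketch only: the endpoint host, port, and path are assumptions to verify against the GreptimeDB OTLP ingest docs for your deployment.

```yaml
exporters:
  otlphttp/greptimedb:
    # Assumed OTLP/HTTP endpoint; adjust host, port, and path per your setup
    endpoint: http://greptimedb:4000/v1/otlp

service:
  pipelines:
    logs:
      receivers: [otlp]
      exporters: [otlphttp/greptimedb]
```

Existing agents and receivers stay as they are; only the export target changes.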

Switch dashboards

Docs

Update Grafana datasource to GreptimeDB for metrics/logs/traces views and keep dashboards running during transition.

1 hour
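
A sketch of a Grafana provisioning file that points a Prometheus-type datasource at GreptimeDB; the `/v1/prometheus` path is an assumption to check against the GreptimeDB and Grafana documentation for your versions.

```yaml
apiVersion: 1
datasources:
  - name: GreptimeDB
    type: prometheus
    access: proxy
    # Assumed Prometheus-compatible API path; verify for your deployment
    url: http://greptimedb:4000/v1/prometheus
```

Because the API is Prometheus-compatible, existing dashboards keep working while the datasource underneath changes.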

Backfill historical data

Export snapshots or historical indices and bulk import into GreptimeDB without interrupting live writes.

1-3 days
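
Backfill might look like the following, assuming historical data has been exported to Parquet files in an object-storage bucket; the bucket path is illustrative and the exact `COPY ... FROM` options (including any connection/credential clause) should be confirmed in the GreptimeDB docs.

```sql
-- Bulk-load exported historical data without touching the live write path
COPY app_logs FROM 's3://backfill-bucket/es-export/'
WITH (FORMAT = 'parquet');
```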

Decommission Elastic cluster

After verification, gradually scale down Elasticsearch data nodes and retire redundant ingest/index pipelines.

2 weeks
