
zero_

High-Performance Distributed Log Ingestion Platform

Engineering Analysis: This project is a case study in evolutionary architecture: optimizing a system from a simple synchronous API into a high-throughput, distributed ingestion engine capable of handling 2.8x the baseline load through careful resource management and low-level optimizations.


The Evolution of Performance

The core philosophy of this project is iterative optimization. We didn't just build a fast system; we documented the journey. The project includes a benchmark suite that validates each optimization stage.

Stage 1: Baseline (I/O Bottleneck)

Implementation: Synchronous ClickHouse inserts.

Analysis: This stage established the "floor." Throughput was limited by network round-trip time (RTT): the Node.js event loop spent most of its time idle, waiting for database acknowledgments.
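
A minimal sketch of the baseline, assuming @clickhouse/client; the table and handler names are illustrative. Awaiting each insert serializes the whole request path on one network round trip:

// Stage 1 sketch (illustrative): one awaited INSERT per request.
// Assumes @clickhouse/client; table and handler names are hypothetical.
const { createClient } = require('@clickhouse/client');
const client = createClient({ url: 'http://localhost:8123' });

async function handleLog(entry) {
  // The event loop idles here for a full network round trip per request.
  await client.insert({ table: 'logs', values: [entry], format: 'JSONEachRow' });
}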

Stage 2: Fire-and-Forget (Naive Concurrency)

Implementation: Application-level async dispatch + ClickHouse Client Optimization. Technique:

  • Enabled async_insert=1 in the ClickHouse client to allow server-side buffering.
  • Set wait_for_async_insert=0 to return immediately, without waiting for the disk fsync.
  • Used connection pooling (max_open_connections) to manage high concurrency.

Improvement: 0.7x (degraded).

Analysis: While latency improved, the unbounded concurrency overwhelmed the system: thousands of pending promises flooded memory, and the lack of backpressure caused instability. Speed increased, but reliability dropped.
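
A minimal sketch of the Stage 2 anti-pattern. `client` is assumed to be an @clickhouse/client instance configured with async_insert=1 and wait_for_async_insert=0 (see the client sketch further below); the handler name is hypothetical:

// Stage 2 sketch (illustrative): fire-and-forget dispatch with no backpressure.
function handleLog(client, entry) {
  // Anti-pattern: every request spawns an unawaited insert promise.
  client
    .insert({ table: 'logs', values: [entry], format: 'JSONEachRow' })
    .catch((err) => console.error('insert failed', err)); // failures surface too late
  // Returns immediately: under load, thousands of pending promises pile up
  // in memory with nothing to slow the producer down.
}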

Stage 3: Request Coalescing (Application Batching)

Implementation: RequestManager with Double-Buffering. Technique:

  • Request Coalescing: Gathering multiple incoming HTTP requests into a single in-memory batch.
  • Double Buffer Pattern: Using two switched arrays (ping-pong) to accumulate data without constantly allocating new buffers (sketched below).

Analysis: This stage moved the bottleneck from I/O to CPU. While we saved network calls, the overhead of managing thousands of tiny batches in JavaScript (parsing JSON, managing accumulation timers) blocked the main event loop. This proved that application-level batching in a single thread has limits.
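
A minimal sketch of the double-buffer coalescer, assuming a periodic flush and a hypothetical flushToClickHouse callback; the real RequestManager adds size limits and error handling:

// Stage 3 sketch (illustrative): ping-pong buffers for request coalescing.
class RequestManager {
  constructor(flushToClickHouse, flushIntervalMs = 50) {
    this.bufferA = [];
    this.bufferB = [];
    this.active = this.bufferA;           // incoming requests append here
    this.flush = flushToClickHouse;       // hypothetical batch-insert callback
    setInterval(() => this.swapAndFlush(), flushIntervalMs).unref();
  }

  add(entry) {
    this.active.push(entry);              // O(1) append, no per-batch allocation
  }

  async swapAndFlush() {
    if (this.active.length === 0) return;
    const full = this.active;
    // Swap pointer references instead of copying: writers fill the other buffer
    this.active = this.active === this.bufferA ? this.bufferB : this.bufferA;
    await this.flush(full);               // drain the full buffer in one batch
    full.length = 0;                      // reuse the array, no reallocation
  }
}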

Stage 4: Redis Streams (Durable Buffering)

Implementation: Decoupling ingestion from processing. Technique:

  • Producer-Consumer: API writes to Redis Streams (memory speed) and returns immediately.
  • Durable Buffering: Data is persisted in Redis before processing, preventing data loss.

Analysis: This was the breakthrough. By removing the database entirely from the hot path and using Redis as a high-speed buffer, we achieved massive throughput. The bottleneck shifted entirely to how fast Node.js could parse requests.
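
A minimal producer/consumer sketch, assuming ioredis; the stream, group, and consumer names are illustrative:

// Stage 4 sketch (illustrative): Redis Streams decouple ingestion from processing.
const Redis = require('ioredis');
const redis = new Redis();

// Producer (hot path): append at memory speed and return immediately.
async function ingest(entry) {
  await redis.xadd('logs:stream', '*', 'payload', JSON.stringify(entry));
}

// Consumer (off the hot path): read batches through a consumer group.
async function consume() {
  await redis.xgroup('CREATE', 'logs:stream', 'ingesters', '$', 'MKSTREAM')
    .catch(() => {}); // group may already exist
  for (;;) {
    const res = await redis.xreadgroup(
      'GROUP', 'ingesters', 'worker-1',
      'COUNT', 1000, 'BLOCK', 5000,
      'STREAMS', 'logs:stream', '>'
    );
    if (!res) continue;
    const entries = res[0][1]; // [[id, [field, value, ...]], ...]
    // ...batch-insert into ClickHouse here, then acknowledge:
    await redis.xack('logs:stream', 'ingesters', ...entries.map(([id]) => id));
  }
}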

Stage 5: Worker Threads & Clusters (Zero-Copy & CPU Offloading)

Implementation: Parallel processing pipeline. Technique:

  • Cluster Mode: Spawning one process per CPU core to utilize the full machine resources.
  • Worker Threads: Offloading CPU-heavy validation (AJV schema checks) to separate threads, keeping the main event loop free for I/O.
  • Zero-Copy Operations: Careful transfer of data buffers between threads and the network to minimize memory-copy overhead.

Analysis: The final Node.js architecture combines all the previous lessons: we maximize I/O with Redis and cluster mode, and we maximize CPU throughput with worker threads.
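
A minimal sketch of the process topology, using node:cluster and node:worker_threads; file names and message shapes are assumptions:

// Stage 5 sketch (illustrative): one process per core, plus a validation
// worker thread per process to keep the event loop free for I/O.
const cluster = require('node:cluster');
const os = require('node:os');
const { Worker } = require('node:worker_threads');

if (cluster.isPrimary) {
  // Cluster mode: give every core its own event loop.
  for (let i = 0; i < os.availableParallelism(); i++) cluster.fork();
} else {
  // Offload CPU-heavy validation to a worker thread.
  const validator = new Worker('./validation-worker.js');
  validator.on('message', (validEntries) => {
    // ...push the validated batch to Redis Streams
  });

  require('node:http').createServer((req, res) => {
    let body = '';
    req.on('data', (chunk) => (body += chunk));
    req.on('end', () => {
      validator.postMessage(body); // hand the raw payload to the worker thread
      res.writeHead(202).end();    // acknowledge without waiting for validation
    });
  }).listen(3000);
}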

Stage 6: C++ Ingester (Native Performance)

Implementation: Re-write of the ingestion consumer in C++. Technique:

  • Native Code: Using C++ for raw speed and minimal memory footprint.
  • Efficient Parsing: SIMD-accelerated JSON parsing (simdjson).
  • Connection Reuse: Persistent ClickHouse connections to eliminate TCP handshake overhead.

Analysis: This represents the ultimate optimization step: moving the compute-heavy tasks (parsing, batching) out of the managed runtime (Node.js) entirely and into native code.

Stage 7: Protocol Buffers

Implementation: Binary serialization instead of JSON. Technique:

  • Schema-Driven: Using defined .proto contracts (LogEntry) instead of schemaless JSON.
  • Binary Payload: Reducing payload size by avoiding repeated field names and using efficient varint encoding.

Analysis: Initial micro-benchmarks show a 54% reduction in payload size and 9.4x faster parsing compared to JSON. This addresses the next bottleneck: network bandwidth and CPU parsing overhead.
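
A minimal sketch of the schema-driven path, assuming protobufjs and an inline schema; the project's actual .proto contract for LogEntry may define different fields:

// Stage 7 sketch (illustrative): encoding a LogEntry with protobufjs.
const protobuf = require('protobufjs');

const root = protobuf.parse(`
  syntax = "proto3";
  message LogEntry {
    string service = 1;
    string level   = 2;
    string message = 3;
    uint64 ts      = 4;
  }
`).root;
const LogEntry = root.lookupType('LogEntry');

const entry = { service: 'payments', level: 'error', message: 'DB connection failed', ts: Date.now() };
const bin = LogEntry.encode(LogEntry.create(entry)).finish(); // Uint8Array
const json = Buffer.from(JSON.stringify(entry));

// Field names never go on the wire; integers use varint encoding.
console.log(`protobuf: ${bin.length} bytes, JSON: ${json.length} bytes`);
const decoded = LogEntry.decode(bin); // round-trips back to a typed message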

Engineering Deep Dive: Critical Features

1. Zero-Copy & Double Buffering

The Challenge: In high-throughput systems, copying data in memory (e.g., Buffer.concat or JSON.parse creating new objects) is expensive. Implementation:

  • Double Buffer Pattern: The RequestManager maintains two pre-allocated arrays (bufferA and bufferB). We swap the pointer references instead of copying the data.
  • Zero-Copy Protobufs: When using gRPC, we decode Protocol Buffers directly from the binary stream without creating intermediate buffer copies where possible.

2. Batch Validation (CPU cache optimization)

The Challenge: Validating objects one-by-one causes "function call overhead" and prevents V8 optimization (inline caching). Implementation:

  • See src/infrastructure/workers/validation-worker.js.
  • The worker receives a batch of raw data.
  • It pre-allocates the result array (const validEntries = new Array(length)).
  • It iterates in a tight loop. This is cache-friendly and reduces garbage-collection pressure compared to creating thousands of promises for individual items (see the sketch below).
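
A minimal sketch of that worker, assuming ajv and a simplified schema; the real src/infrastructure/workers/validation-worker.js may differ:

// Batch validation sketch (illustrative): tight loop, pre-allocated results.
const { parentPort } = require('node:worker_threads');
const Ajv = require('ajv');

const ajv = new Ajv();
const validate = ajv.compile({
  type: 'object',
  required: ['service', 'level', 'message'],
  properties: {
    service: { type: 'string' },
    level: { type: 'string' },
    message: { type: 'string' },
  },
});

parentPort.on('message', (batch) => {
  const validEntries = new Array(batch.length); // pre-allocate once
  let count = 0;
  for (let i = 0; i < batch.length; i++) {      // tight, cache-friendly loop
    if (validate(batch[i])) validEntries[count++] = batch[i];
  }
  validEntries.length = count;                  // trim the invalid slots
  parentPort.postMessage(validEntries);
});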

3. Single Writer Pattern

The Challenge: ClickHouse performs poorly with many small, concurrent inserts (locking overhead). Implementation:

  • The BatchBuffer acts as a Singleton Funnel.
  • No matter how many concurrent HTTP requests come in, they all pour into one buffer.
  • Only one flush operation happens at a time.
  • Result: We send one massive INSERT query (e.g., 100,000 rows) instead of 100,000 small queries, which maximizes ClickHouse's columnar MergeTree performance (sketched below).
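
A minimal sketch of the single-writer funnel; the insert callback and limits are assumptions, and the real BatchBuffer adds retry and durability logic:

// Single Writer sketch (illustrative): one funnel, one in-flight flush.
class BatchBuffer {
  constructor(insertFn, maxRows = 100_000) {
    this.rows = [];
    this.insert = insertFn;   // e.g., one ClickHouse INSERT per flush
    this.maxRows = maxRows;
    this.flushing = false;    // guarantees a single concurrent writer
  }

  add(row) {
    this.rows.push(row);      // every HTTP request pours into this one buffer
    if (this.rows.length >= this.maxRows) this.flush();
  }

  async flush() {
    if (this.flushing || this.rows.length === 0) return;
    this.flushing = true;
    const batch = this.rows;
    this.rows = [];
    try {
      await this.insert(batch); // exactly one INSERT in flight at a time
    } finally {
      this.flushing = false;
    }
  }
}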

4. Database Optimization (ClickHouse)

Configuration:

  • async_insert = 1: Allows ClickHouse to buffer writes server-side (RAM) before writing to disk.
  • wait_for_async_insert = 0: We don't wait for disk flush (fsync). We trust the "fire-and-forget" speed.
  • compression: { request: true }: We compress HTTP request bodies sent to ClickHouse, trading a little CPU for a large gain in network throughput (see the client sketch below).
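
A minimal sketch of that configuration with @clickhouse/client; the URL and pool size are assumptions, not the project's exact values:

// ClickHouse client sketch (illustrative): the settings described above.
const { createClient } = require('@clickhouse/client');

const client = createClient({
  url: 'http://localhost:8123',
  compression: { request: true },   // compress HTTP bodies sent to ClickHouse
  max_open_connections: 10,
  clickhouse_settings: {
    async_insert: 1,                // buffer server-side (RAM) before disk
    wait_for_async_insert: 0,       // return before fsync (fire-and-forget)
  },
});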

Architecture Principles

The codebase follows strict Onion Architecture to ensure the core logic is immune to infrastructure changes.

  • Domain: Defines LogEntry entity and LogRepository interface.
  • Application: Contains IngestLog use case. It knows what to do, not how to save it.
  • Infrastructure: Implements the actual ClickHouse queries. You could swap this for PostgreSQL/Elasticsearch without touching a line of business logic (see the sketch below).
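
A minimal sketch of that layering; the class and method names are assumptions about the codebase, not its actual API:

// Onion Architecture sketch (illustrative).

// Domain: the contract, with no infrastructure knowledge.
class LogRepository {
  async save(entries) { throw new Error('not implemented'); }
}

// Application: the use case knows *what* to do, not *how* to persist it.
class IngestLog {
  constructor(repository) { this.repository = repository; }
  async execute(entries) {
    // ...domain-level validation of LogEntry objects would happen here
    await this.repository.save(entries);
  }
}

// Infrastructure: the swappable detail (ClickHouse today, PostgreSQL tomorrow).
class ClickHouseLogRepository extends LogRepository {
  constructor(client) { super(); this.client = client; }
  async save(entries) {
    await this.client.insert({ table: 'logs', values: entries, format: 'JSONEachRow' });
  }
}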

Experimental: Vector Semantic Search

We moved beyond simple text search (grep) to Semantic Understanding.

  • How: Using OpenAI embeddings to convert log messages ("DB connection failed") into vectors.
  • Storage: Storing these vectors in ClickHouse.
  • Query: "Find anomalies in payment service" -> converted to a vector -> cosine-similarity search in the DB (sketched below).
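
A minimal sketch of that pipeline, assuming the openai SDK, an embedding column named `embedding`, and a `logs` table; all of these names are assumptions:

// Semantic search sketch (illustrative): embed the query, rank by cosine
// distance in ClickHouse (cosineDistance is a built-in function; smaller = closer).
const OpenAI = require('openai');
const { createClient } = require('@clickhouse/client');

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
const clickhouse = createClient({ url: 'http://localhost:8123' });

async function semanticSearch(query) {
  const res = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: query,
  });
  const vector = res.data[0].embedding;

  const rows = await clickhouse.query({
    query: `
      SELECT message, cosineDistance(embedding, {vector:Array(Float32)}) AS distance
      FROM logs
      ORDER BY distance ASC
      LIMIT 10
    `,
    query_params: { vector },
    format: 'JSONEachRow',
  });
  return rows.json();
}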

Tech Stack

  • Runtime: Node.js 22+ (Cluster Mode + Worker Threads)
  • Protocols: HTTP/1.1, HTTP/2 (Multiplexing), gRPC (Protobufs)
  • Storage: ClickHouse (OLAP), Redis (Streams/Caching)
  • Containerization: Docker Compose

Speed Test (Benchmark)

To see these optimizations in action on your machine:

# 1. Start Infrastructure
docker-compose up -d

# 2. Run the full optimization suite
npm run benchmark

Performance Benchmarks

Latest results running benchmark/http/autocannon-runner.js:

Scenario                  Throughput       Latency (P99)   Improvement
Baseline (Single Node)    ~6,071 req/s     328 ms          1.0x
Optimized (Cluster Mode)  ~17,255 req/s    154 ms          2.8x

Note: Optimized run uses Node.js Cluster Mode (All cores) + Worker Threads + Redis Streams buffering.

Benchmark Environment:

  • OS: macOS
  • CPU: 8-Core
  • RAM: 32GB
  • Runtime: Docker Container + Node.js 22+


Ingestion Throughput (Redis -> ClickHouse)

Implementation         Throughput         Memory Usage   Improvement
Node.js (4 Workers)    ~4,214 logs/sec    ~149 MB        1.0x
C++ (1 Thread)         ~11,542 logs/sec   ~42 MB         2.7x

Serialization Efficiency (Micro-benchmark)

Format     Parsing Speed      Payload Size   Improvement
JSON       ~108k ops/sec      357 bytes      1.0x
Protobuf   ~1,000k ops/sec    163 bytes      9.4x speed / 2x size

About

High-Throughput Log Ingestion Platform (2.8x baseline) using Node.js clusters, worker threads, Redis Streams, ClickHouse, and C++ ingestion. A documented journey through iterative performance optimization.
