High-Performance Distributed Log Ingestion Platform
Engineering Analysis: This project represents a case study in evolutionary architecture—optimizing a system from a simple synchronous API to a high-throughput, distributed ingestion engine capable of handling 7.2x the baseline load through careful resource management and low-level optimizations.
The core philosophy of this project is iterative optimization. We didn't just build a fast system; we documented the journey. The project includes a benchmark suite that validates each optimization stage.
Implementation: Synchronous ClickHouse inserts. Analysis: This stage established the "floor." Throughput was limited by network round-trip time (RTT). The Node.js event loop spent most of its time idle, waiting for database acknowledgments.
Implementation: Application-level async dispatch + ClickHouse Client Optimization. Technique:
- Enabled
async_insert=1in the ClickHouse client to allow server-side buffering. - Disabled
wait_for_async_insert=0to return immediately without waiting for disk fsync. - Used Connection Pooling (
max_open_connections) to manage high concurrency. Improvement: 0.7x (Degraded) Analysis: While latency improved, the unconstrained concurrency overwhelmed the system (Unbounded Concurrency). Thousands of pending promises flooded memory, and the lack of backpressure caused instability. Speed increased, but reliability dropped.
Implementation: RequestManager with Double-Buffering.
Technique:
- Request Coalescing: Gathering multiple incoming HTTP requests into a single in-memory batch.
- Double Buffer Pattern: Using two switched arrays (Ping-Pong) to accumulate data without allocating new buffers constantly. Analysis: This stage moved the bottleneck from I/O to CPU. While we saved network calls, the overhead of managing thousands of tiny batches in JavaScript (parsing JSON, managing accumulation timers) blocked the main event loop. This proved that application-level batching in a single thread has limits.
Implementation: Decoupling ingestion from processing. Technique:
- Producer-Consumer: API writes to Redis Streams (memory speed) and returns immediately.
- Durable Buffering: Data is persisted in Redis before processing, preventing data loss. Analysis: This was the breakthrough. By removing the database entirely from the hot path and using Redis as a high-speed buffer, we achieved massive throughput. The bottleneck shifted entirely to how fast Node.js could parse requests.
Implementation: Parallel processing pipeline. Technique:
- Cluster Mode: Spawning one process per CPU core to utilize the full machine resources.
- Worker Threads: Offloading CPU-heavy validation (AJV schema checks) to separate threads, keeping the main event loop free for I/O.
- Zero-Copy Operations: Careful transfer of data buffers between threads and network to minimize memory copy overhead. Analysis: The final architecture combines all previous lessons. We maximize I/O with Redis/Clusters and maximize CPU with Worker Threads.
Implementation: Re-write of the ingestion consumer in C++. Technique:
- Native Code: Using C++ for raw speed and minimal memory footprint.
- Efficient Parsing: SIMD-accelerated JSON parsing (
simdjson). - Connection Reuse: Persistent ClickHouse connections to eliminate TCP handshake overhead. Analysis: This represents the ultimate optimization step—moving compute-heavy tasks (parsing, batching) out of the managed runtime (Node.js) entirely to native code.
Implementation: Binary serialization instead of JSON. Technique:
- Schema-Driven: Using defined
.protocontracts (LogEntry) instead of schemaless JSON. - Binary Payload: Reducing payload size by avoiding repeated field names and using efficient varint encoding. Analysis: Initial micro-benchmarks show a 54% reduction in size and 9.4x faster parsing compared to JSON. This addresses the next bottleneck: Network Bandwidth and CPU parsing overhead.
The Challenge: In high-throughput systems, copying data in memory (e.g., Buffer.concat or JSON.parse creating new objects) is expensive.
Implementation:
- Double Buffer Pattern: The
RequestManagermaintains two pre-allocated arrays (bufferAandbufferB). We swap pointer references instead of moving data locally. - Zero-Copy Protobufs: When using gRPC, we decode Protocol Buffers directly from the binary stream without creating intermediate buffer copies where possible.
The Challenge: Validating objects one-by-one causes "function call overhead" and prevents V8 optimization (inline caching). Implementation:
- See
src/infrastructure/workers/validation-worker.js. - The worker receives a batch of raw data.
- It pre-allocates the result array (
const validEntries = new Array(length)). - It iterates in a tight loop. This is cache-friendly and reduced garbage collection pressure compared to creating thousands of promises for individual items.
The Challenge: ClickHouse performs poorly with many small, concurrent inserts (locking overhead). Implementation:
- The
BatchBufferacts as a Singleton Funnel. - No matter how many concurrent HTTP requests come in, they all pour into one buffer.
- Only one flush operation happens at a time.
- Result: We send 1 massive INSERT query (e.g., 100,000 rows) instead of 100,000 small queries. This maximizes ClickHouse's columnar merge tree performance.
Configuration:
async_insert = 1: Allows ClickHouse to buffer writes server-side (RAM) before writing to disk.wait_for_async_insert = 0: We don't wait for disk flush (fsync). We trust the "fire-and-forget" speed.compression: { request: true }: We compress HTTP bodies sent to ClickHouse, trading a tiny bit of CPU for massive Network Throughput gains.
The codebase follows strict Onion Architecture to ensure the core logic is immune to infrastructure changes.
- Domain: Defines
LogEntryentity andLogRepositoryinterface. - Application: Contains
IngestLoguse case. It knows what to do, not how to save it. - Infrastructure: Implements the actual ClickHouse queries. You could swap this for PostgreSQL/Elasticsearch without touching a line of business logic.
We moved beyond simple text search (grep) to Semantic Understanding.
- How: Using OpenAI embeddings to convert log messages ("DB connection failed") into vectors.
- Storage: Storing these vectors in ClickHouse.
- Query: "Find anomalies in payment service" -> converts to vector -> Cosine Similarity search in DB.
- Runtime: Node.js 22+ (Cluster Mode + Worker Threads)
- Protocols: HTTP/1.1, HTTP/2 (Multiplexing), gRPC (Protobufs)
- Storage: ClickHouse (OLAP), Redis (Streams/Caching)
- Containerization: Docker Compose
To see these optimizations in action on your machine:
# 1. Start Infrastructure
docker-compose up -d
# 2. Run the full optimization suite
npm run benchmarkLatest results running benchmark/http/autocannon-runner.js:
| Scenario | Throughput | Latency (P99) | Improvement |
|---|---|---|---|
| Baseline (Single Node) | ~6,071 req/s | 328 ms | 1.0x |
| Optimized (Cluster Mode) | ~17,255 req/s | 154 ms | 2.8x |
Note: Optimized run uses Node.js Cluster Mode (All cores) + Worker Threads + Redis Streams buffering.
Benchmark Environment:
- OS: macOS
- CPU: 8-Core
- RAM: 32GB
- Runtime: Docker Container + Node.js 22+
| Implementation | Throughput | Memory Usage | Improvement |
|---|---|---|---|
| Node.js (4 Workers) | ~4,214 logs/sec | ~149 MB | 1.0x |
| C++ (1 Thread) | ~11,542 logs/sec | ~42 MB | 2.7x |
| Format | Parsing Speed | Payload Size | Improvement |
|---|---|---|---|
| JSON | ~108k ops/sec | 357 bytes | 1.0x |
| Protobuf | ~1,000k ops/sec | 163 bytes | 9.4x Speed / 2x Size |

