
e6data’s Architectural Bets: Our Head of Engineering’s Conversation with Pete on the Zero Prime Podcast

e6data’s founding engineer and Head of Engineering, Sudarshan Lakshminarasimhan, unpacks the internals of the company’s compute engine on the Zero Prime podcast.



Our founding engineer and Head of Engineering, Sudarshan, recently went on the Zero Prime podcast and unpacked the internals of our compute engine.

“We don’t treat object stores like cold storage. And we don’t think your planner should be the bottleneck in a high-QPS workload.”

Sudarshan, Founding Engineer, e6data

It’s a story of breaking away from the driver-executor model, rethinking scheduling for the object-store era, and making the case that atomic, per-component scaling actually matters.

The Real Problem (2025 edition)

Everyone says “compute and storage are decoupled.” Not really.

  • You scale the cluster because some ad-hoc queries spike.
  • That 10% of your workload defines your baseline cluster size.
  • Your scheduler doesn’t react in real time, so you over-provision just in case.
  • You get 10% more queries, and you’re forced to double your warehouse size.

Today’s data infra ≠ today’s compute requirements.
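To make the step-scaling trap concrete, here is a back-of-the-envelope sketch in Python. Every number is hypothetical, chosen only to show how a 10% increase in load can force a 100% increase in provisioned capacity under step scaling, versus roughly 10% under per-vCPU scaling:

```python
import math

# Hypothetical numbers, purely illustrative -- not e6data benchmarks.
baseline_vcpus = 64                   # warehouse running near capacity
load_increase = 0.10                  # 10% more queries arrive

needed = baseline_vcpus * (1 + load_increase)   # 70.4 vCPUs of real demand

# Step scaling: the only move available is to double the cluster.
step_provisioned = baseline_vcpus * 2           # 128 vCPUs
step_idle = step_provisioned - needed           # ~57.6 vCPUs paid for but idle

# Atomic, per-vCPU scaling: add only what the load requires.
atomic_provisioned = math.ceil(needed)          # 71 vCPUs
atomic_idle = atomic_provisioned - needed       # <1 vCPU idle

print(f"step scaling:   {step_provisioned} vCPUs ({step_idle:.0f} idle)")
print(f"atomic scaling: {atomic_provisioned} vCPUs ({atomic_idle:.1f} idle)")
```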

Our Architectural Decisions

We are building e6data around a new playbook. No central coordinator. No single mega-driver. No lock-in to a single table format. Here’s the breakdown:

Core Shifts We’ve Made So Far:

1. Disaggregation of internals
- Separate the planner, metadata ops, and workers.
- Each scales independently, not as a monolith.

2. Dynamic, mid-query scaling
- Queries can scale up/down during execution.
- No pre-provisioning for the worst case. Just-in-time compute.

3. Push-based vectorized execution
- We’re similar to DuckDB/Photon but go deeper on compute orchestration (a toy push-based pipeline is sketched after this list).
- Useful when dealing with 1k+ concurrent user-facing queries.

4. No opinionated stack
- Bring your own catalog, governance layer, and format.
- Plug in; don’t port over.
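The push model in shift 3 is easiest to see in miniature. Below is a toy Python sketch of the general technique, not e6data’s engine: the source drives execution by pushing fixed-size batches downstream, instead of each operator pulling one row at a time:

```python
from dataclasses import dataclass, field
from typing import Callable, List

BATCH = 1024  # operate on batches of values, never one row at a time

@dataclass
class Sink:
    """Terminal operator: collects whatever is pushed into it."""
    rows: List[int] = field(default_factory=list)

    def push(self, batch: List[int]) -> None:
        self.rows.extend(batch)

@dataclass
class Filter:
    """Mid-pipeline operator: filters a batch, pushes survivors downstream."""
    predicate: Callable[[int], bool]
    downstream: Sink

    def push(self, batch: List[int]) -> None:
        kept = [v for v in batch if self.predicate(v)]
        if kept:
            self.downstream.push(kept)

def scan(values, downstream) -> None:
    """Source operator: drives the pipeline by pushing fixed-size batches."""
    batch = []
    for v in values:
        batch.append(v)
        if len(batch) == BATCH:
            downstream.push(batch)
            batch = []
    if batch:  # flush the final partial batch
        downstream.push(batch)

sink = Sink()
scan(range(100_000), Filter(lambda v: v % 7 == 0, sink))
print(len(sink.rows))  # 14286 values survive the filter
```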

| Feature | e6data | Legacy Engines |
| --- | --- | --- |
| Scaling granularity | Per-vCPU, component-aware | Full-cluster step scaling |
| Planner architecture | Stateless, elastic | Single-node driver / coordinator |
| Supported formats | Iceberg, Delta, Hudi (interoperable) | Often proprietary / locked-in |
| Cost-performance scaling | Linear with load | Non-linear, with over-provisioning |

Why It Matters

  • Cost: We run 1000 QPS workloads at ~60% lower TCO than other engines.
  • Latency: p95 under 2s, even with mixed workloads (a quick p95 check is sketched below).
  • No Lock-In: Use Iceberg today. Switch to Delta tomorrow. Doesn’t matter to us.
  • Infra Reuse: Already on Kubernetes? Cool. We sit inside that.
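On the latency bullet: “p95 under 2s” means 95% of queries finish within two seconds. Here is a quick, self-contained way to compute p95 from your own query logs; the latencies below are made up for illustration:

```python
import statistics

# Made-up query latencies in seconds; substitute your own logs.
latencies_s = [0.4, 0.5, 0.6, 0.7, 0.9, 1.1, 1.2, 1.4, 1.6, 1.9]

# statistics.quantiles(n=100) returns the 99 percentile cut points;
# index 94 is the 95th percentile.
p95 = statistics.quantiles(latencies_s, n=100)[94]
print(f"p95 = {p95:.2f}s -> {'within budget' if p95 < 2.0 else 'over budget'}")
```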

Where We’re Headed

  • Real-time ingest → queryable in <15s from object storage
  • Vector + SQL → cosine similarity inside SQL filters (sketched below)
  • AI-native enhancements → smart partitioning, query rewriting, and auto-guardrails
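As a sketch of what “cosine similarity inside SQL filters” could look like, here is the standard formula in Python alongside a hypothetical query shape. The SQL syntax shown is illustrative only, not a committed e6data API:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Standard formula: cos(theta) = (a . b) / (|a| * |b|)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical query shape: mix an ordinary SQL predicate with a
# vector-similarity filter in the same WHERE clause.
query = """
SELECT doc_id, title
FROM documents
WHERE category = 'support'
  AND cosine_similarity(embedding, :query_vector) > 0.8
"""

q = np.array([0.10, 0.90, 0.20])    # query embedding
row = np.array([0.12, 0.88, 0.25])  # a stored row's embedding
print(cosine_similarity(q, row))    # ~0.998, so this row passes the filter
```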

Listen to the full podcast on Apple Podcasts or Spotify.

Frequently asked questions (FAQs)
How do I integrate e6data with my existing data infrastructure?

We are universally interoperable and open-source friendly: we can integrate with any object store, table format, data catalog, governance tool, BI tool, or other data application.

How does billing work?

We use a usage-based pricing model based on vCPU consumption. Your billing is determined by the number of vCPUs used, ensuring you only pay for the compute power you actually consume.
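As a hypothetical illustration of such a bill (the rate below is a made-up placeholder, not e6data’s actual pricing), the charge is just vCPU-hours consumed times the rate:

```python
# Hypothetical usage-based bill: pay only for vCPU-time consumed.
RATE_PER_VCPU_HOUR = 0.05  # $/vCPU-hour (placeholder, not real pricing)

usage = [          # (vCPUs in use, hours at that level)
    (8, 2.0),      # steady baseline
    (71, 0.5),     # brief spike, absorbed by scaling out
    (8, 5.5),      # back to baseline
]

vcpu_hours = sum(vcpus * hours for vcpus, hours in usage)
print(f"{vcpu_hours} vCPU-hours -> ${vcpu_hours * RATE_PER_VCPU_HOUR:.2f}")
```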

What file formats does e6data support?

We support all common file formats, including Parquet, ORC, JSON, CSV, Avro, and others.

What kind of performance improvements can I expect with e6data?

e6data delivers 5 to 10 times faster querying at any concurrency, at over 50% lower total cost of ownership across workloads, compared to other compute engines on the market.

What kinds of deployment models are available at e6data?

We support serverless and in-VPC deployment models. 

How does e6data handle data governance rules?

We can integrate with your existing governance tools, and we also offer an in-house solution for data governance, access control, and security.


