
e6data’s Architectural Bets: Our Head of Engineering’s Conversation with Pete on the Zero Prime Podcast

e6data’s founding engineer and Head of Engineering, Sudarshan Lakshminarasimhan, unpacks the internals of the company’s compute engine on the Zero Prime podcast.



Our founding engineer and Head of Engineering, Sudarshan, recently went on the Zero Prime podcast and unpacked the internals of our compute engine.

“We don’t treat object stores like cold storage. And we don’t think your planner should be the bottleneck in a high-QPS workload.”

Sudarshan, Founding Engineer, e6data

It’s a story of breaking away from the driver-executor model, rethinking scheduling for the object-store era, and making the case that atomic, per-component scaling actually matters.

The Real Problem (2025 edition)

Everyone says “compute and storage are decoupled.” Not really.

  • You scale the cluster because some ad-hoc queries spike.
  • That 10% of your workload defines your baseline cluster size.
  • Your scheduler doesn’t react in real time, so you over-provision just in case.
  • You get 10% more queries, and you’re forced to double your warehouse size.

Today’s data infra ≠ today’s compute requirements.
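To make the step-scaling trap concrete, here is a back-of-the-envelope sketch in Python. Every number is hypothetical, chosen only to show how a 10% increase in load can force a 100% increase in provisioned capacity under step scaling, versus roughly 10% under per-vCPU scaling:

```python
import math

# Hypothetical numbers, purely illustrative -- not e6data benchmarks.
baseline_vcpus = 64                   # warehouse running near capacity
load_increase = 0.10                  # 10% more queries arrive

needed = baseline_vcpus * (1 + load_increase)   # 70.4 vCPUs of real demand

# Step scaling: the only move available is to double the cluster.
step_provisioned = baseline_vcpus * 2           # 128 vCPUs
step_idle = step_provisioned - needed           # ~57.6 vCPUs paid for but idle

# Atomic, per-vCPU scaling: add only what the load requires.
atomic_provisioned = math.ceil(needed)          # 71 vCPUs
atomic_idle = atomic_provisioned - needed       # <1 vCPU idle

print(f"step scaling:   {step_provisioned} vCPUs ({step_idle:.0f} idle)")
print(f"atomic scaling: {atomic_provisioned} vCPUs ({atomic_idle:.1f} idle)")
```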

Our Architectural Decisions

We are building e6data around a new playbook. No central coordinator. No single mega-driver. No lock-in to a single table format. Here’s the breakdown:

Core Shifts We’ve Made So Far:

1. Disaggregation of internals
- Separate the planner, metadata ops, and workers.
- Each scales independently, not as a monolith.

2. Dynamic, mid-query scaling
- Queries can scale up/down during execution.
- No pre-provisioning for the worst case. Just-in-time compute.

3. Push-based vectorized execution
- We’re similar to DuckDB/Photon but go deeper on compute orchestration (a toy push-based pipeline is sketched after this list).
- Useful when dealing with 1k+ concurrent user-facing queries.

4. No opinionated stack
- Bring your own catalog, governance layer, and format.
- Plug in; don’t port over.
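The push model in shift 3 is easiest to see in miniature. Below is a toy Python sketch of the general technique, not e6data’s engine: the source drives execution by pushing fixed-size batches downstream, instead of each operator pulling one row at a time:

```python
from dataclasses import dataclass, field
from typing import Callable, List

BATCH = 1024  # operate on batches of values, never one row at a time

@dataclass
class Sink:
    """Terminal operator: collects whatever is pushed into it."""
    rows: List[int] = field(default_factory=list)

    def push(self, batch: List[int]) -> None:
        self.rows.extend(batch)

@dataclass
class Filter:
    """Mid-pipeline operator: filters a batch, pushes survivors downstream."""
    predicate: Callable[[int], bool]
    downstream: Sink

    def push(self, batch: List[int]) -> None:
        kept = [v for v in batch if self.predicate(v)]
        if kept:
            self.downstream.push(kept)

def scan(values, downstream) -> None:
    """Source operator: drives the pipeline by pushing fixed-size batches."""
    batch = []
    for v in values:
        batch.append(v)
        if len(batch) == BATCH:
            downstream.push(batch)
            batch = []
    if batch:  # flush the final partial batch
        downstream.push(batch)

sink = Sink()
scan(range(100_000), Filter(lambda v: v % 7 == 0, sink))
print(len(sink.rows))  # 14286 values survive the filter
```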

| Feature | e6data | Legacy Engines |
| --- | --- | --- |
| Scaling granularity | Per-vCPU, component-aware | Full-cluster step scaling |
| Planner architecture | Stateless, elastic | Single-node driver / coordinator |
| Supported formats | Iceberg, Delta, Hudi (interoperable) | Often proprietary / locked-in |
| Cost-performance scaling | Linear with load | Non-linear, with over-provisioning |

Why It Matters

  • Cost: We run 1000 QPS workloads at ~60% lower TCO than other engines.
  • Latency: p95 under 2s, even with mixed workloads (a quick p95 check is sketched below).
  • No Lock-In: Use Iceberg today. Switch to Delta tomorrow. Doesn’t matter to us.
  • Infra Reuse: Already on Kubernetes? Cool. We sit inside that.
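On the latency bullet: “p95 under 2s” means 95% of queries finish within two seconds. Here is a quick, self-contained way to compute p95 from your own query logs; the latencies below are made up for illustration:

```python
import statistics

# Made-up query latencies in seconds; substitute your own logs.
latencies_s = [0.4, 0.5, 0.6, 0.7, 0.9, 1.1, 1.2, 1.4, 1.6, 1.9]

# statistics.quantiles(n=100) returns the 99 percentile cut points;
# index 94 is the 95th percentile.
p95 = statistics.quantiles(latencies_s, n=100)[94]
print(f"p95 = {p95:.2f}s -> {'within budget' if p95 < 2.0 else 'over budget'}")
```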

Where We’re Headed

  • Real-time ingest → queryable in <15s from object storage
  • Vector + SQL → cosine similarity inside SQL filters (sketched below)
  • AI-native enhancements → smart partitioning, query rewriting, and auto-guardrails
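As a sketch of what “cosine similarity inside SQL filters” could look like, here is the standard formula in Python alongside a hypothetical query shape. The SQL syntax shown is illustrative only, not a committed e6data API:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Standard formula: cos(theta) = (a . b) / (|a| * |b|)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical query shape: mix an ordinary SQL predicate with a
# vector-similarity filter in the same WHERE clause.
query = """
SELECT doc_id, title
FROM documents
WHERE category = 'support'
  AND cosine_similarity(embedding, :query_vector) > 0.8
"""

q = np.array([0.10, 0.90, 0.20])    # query embedding
row = np.array([0.12, 0.88, 0.25])  # a stored row's embedding
print(cosine_similarity(q, row))    # ~0.998, so this row passes the filter
```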

Listen to the full podcast on Apple Podcasts or Spotify.

Frequently asked questions (FAQs)
How do I integrate e6data with my existing data infrastructure?

We are universally interoperable and open-source friendly: we can integrate with any object store, table format, data catalog, governance tool, BI tool, or other data application.

How does billing work?

We use a usage-based pricing model based on vCPU consumption. Your billing is determined by the number of vCPUs used, ensuring you only pay for the compute power you actually consume.
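As a hypothetical illustration of such a bill (the rate below is a made-up placeholder, not e6data’s actual pricing), the charge is just vCPU-hours consumed times the rate:

```python
# Hypothetical usage-based bill: pay only for vCPU-time consumed.
RATE_PER_VCPU_HOUR = 0.05  # $/vCPU-hour (placeholder, not real pricing)

usage = [          # (vCPUs in use, hours at that level)
    (8, 2.0),      # steady baseline
    (71, 0.5),     # brief spike, absorbed by scaling out
    (8, 5.5),      # back to baseline
]

vcpu_hours = sum(vcpus * hours for vcpus, hours in usage)
print(f"{vcpu_hours} vCPU-hours -> ${vcpu_hours * RATE_PER_VCPU_HOUR:.2f}")
```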

What file formats does e6data support?

We support all common file formats, including Parquet, ORC, JSON, CSV, Avro, and others.

What kind of performance improvements can I expect with e6data?

e6data delivers 5 to 10 times faster querying at any concurrency, at over 50% lower total cost of ownership across workloads, compared to other compute engines on the market.

What kinds of deployment models are available at e6data?

We support serverless and in-VPC deployment models. 

How does e6data handle data governance rules?

We can integrate with your existing governance tools, and we also offer an in-house solution for data governance, access control, and security.


