Product

e6data’s Hybrid Data Lakehouse: 10x Faster Queries, Near-Zero Egress, Sub-Second Latency

e6data Team

August 5, 2025

Today,

92% of enterprises already run multi‑cloud
80% worry about over‑relying on one cloud
BFSI firms face region‑locked data rules, while needing global insight

Yet most of these enterprises still end up copying data between regions and providers, incurring hefty egress charges, and juggling inconsistent security controls in each environment, which leads to policy drift and compliance nightmares.

The hard truth is that most cloud data warehouses and lakehouses weren’t designed for hybrid agility. They often depend on a single cloud and lack interoperability across regions and with newer table formats like Iceberg. The result: lock‑in, surprise costs, governance gaps, and poor performance at peak load, especially in highly regulated industries like BFSI and healthcare.

TL;DR: We set out to solve this and built our hybrid data lakehouse, which runs queries 10x faster with ~0% egress fees. Details below.

What We Did Differently: Federated SQL Engine with Hybrid Cluster Architecture

e6data’s Hybrid Data Lakehouse Architecture: An Example Setup

The architecture abstracts the hybrid cluster away from the end user’s querying experience: users write queries as though a single cluster were talking to all of these data sources. Here’s a brief breakdown of how it works:

Federated SQL engine: Unifies queries across different clouds, regions, and on-prem silos without forcing data migration.
Hybrid cluster layout: One “main” entry-point cluster receives the query. Many “ancillary” clusters sit next to each data domain.
Smart task routing: The main cluster pushes each compute task to the ancillary cluster closest to the requested data, minimising egress and latency.
Secure peer-to-peer gateway: Encrypts traffic between clusters and keeps every data hand-off compliant with enterprise policies.
Governance built in: Central IAM and policy definitions propagate automatically to every cluster.
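
To make the routing idea concrete, here is a minimal sketch of how a main cluster might pick an ancillary cluster for each task. It is an illustration only, not e6data’s scheduler: the `Cluster`, `ScanTask`, and `route` names, the location labels, and the simple exact-match rule are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    name: str
    location: str  # hypothetical label, e.g. "aws-us-east-1" or "on-prem-dc1"


@dataclass(frozen=True)
class ScanTask:
    table: str
    data_location: str  # where the table's files live (same hypothetical labels)


def route(task, main, ancillaries):
    """Return the ancillary cluster co-located with the task's data.

    Falls back to the main cluster when no ancillary sits next to the data.
    """
    for cluster in ancillaries:
        if cluster.location == task.data_location:
            return cluster
    return main


main = Cluster("main", "on-prem-dc1")
ancillaries = [
    Cluster("anc-aws", "aws-us-east-1"),
    Cluster("anc-onprem", "on-prem-dc1"),
]
tasks = [
    ScanTask("cloud_sales", "aws-us-east-1"),
    ScanTask("onprem_sales", "on-prem-dc1"),
    ScanTask("archive", "azure-westeurope"),  # no ancillary here, so main handles it
]

# Map each table to the name of the cluster that would run its scan.
plan = {task.table: route(task, main, ancillaries).name for task in tasks}
print(plan)
```

A production scheduler would presumably also weigh load, cache state, and network cost; the exact-match rule here only shows why a task that runs next to its data sends back results rather than raw files.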

Benchmarks: 10x Faster Speed, ~0% Egress Fee, and Another ~40% Off Latency With Caching

Our engineering team battle-tested this several times on production data. One of the experiments used a synthetic 1 TB TPC-DS dataset split across two environments, with 720 M rows in AWS and 2.75 B rows on-prem, configured as follows:

Environment
1. On-prem environment is simulated using e6data’s GCP cloud account
2. Cloud environment is simulated using e6data’s AWS cloud account
Catalogs
1. Hive Catalog on-premises
2. Unity Catalog on cloud
Clusters (all executors are standard e6data executors with 30 cores)
1. “on-prem-only” - 2 executors, 1 planner, workspace components all running on-premises
2. “hybrid” - 1 executor on-premises, 1 executor in cloud, planner and all workspace components on-premises
3. “hybrid-caching” - same as (2), but with data caching enabled
Queries
1. Q1: Full table scan + filter, unions cloud and on-prem sales.
2. Q2: Scan + filter + aggregate, same union.
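
The exact benchmark SQL is not given here, so the following is only a sketch of the two query shapes. It uses Python’s built-in `sqlite3` module with two attached in-memory databases standing in for the cloud and on-prem catalogs; the `sales` schema, the filter thresholds, the sample rows, and the use of `UNION ALL` are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two attached in-memory databases stand in for the cloud and on-prem catalogs.
conn.execute("ATTACH DATABASE ':memory:' AS cloud")
conn.execute("ATTACH DATABASE ':memory:' AS onprem")
for schema in ("cloud", "onprem"):
    conn.execute(
        f"CREATE TABLE {schema}.sales (item_id INTEGER, quantity INTEGER, price REAL)"
    )
conn.executemany("INSERT INTO cloud.sales VALUES (?, ?, ?)", [(1, 2, 10.0), (2, 1, 99.0)])
conn.executemany("INSERT INTO onprem.sales VALUES (?, ?, ?)", [(1, 5, 10.0), (3, 4, 45.0)])

# Q1 shape: full scan + filter on each side, then a union of both sides.
Q1 = """
SELECT item_id, quantity, price FROM cloud.sales WHERE price > 20
UNION ALL
SELECT item_id, quantity, price FROM onprem.sales WHERE price > 20
"""

# Q2 shape: the same filtered union, followed by an aggregate.
Q2 = """
SELECT item_id, SUM(quantity * price) AS revenue
FROM (
    SELECT item_id, quantity, price FROM cloud.sales WHERE price >= 10
    UNION ALL
    SELECT item_id, quantity, price FROM onprem.sales WHERE price >= 10
)
GROUP BY item_id
ORDER BY item_id
"""

q1_rows = conn.execute(Q1).fetchall()
q2_rows = conn.execute(Q2).fetchall()
print(q1_rows)
print(q2_rows)
```

In both shapes the filter sits inside each branch of the union, which is what lets each side be scanned and filtered where its data lives before anything is combined.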

Results: ~10% of the latency, ~1% of the egress (with no caching)

Cluster	Q1 latency	Q1 egress	Q2 latency	Q2 egress
on-prem-only	41 s	35 Gb	8.2 s	7 Gb
hybrid	4 s	0.40 Gb	1.8 s	0.047 Gb
hybrid-caching	2.5 s	0.40 Gb	0.88 s	0.044 Gb

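About 10x faster on Q1 (41 s to 4 s) and about 4.6x faster on Q2 (8.2 s to 1.8 s) by keeping compute local.
~100x less data shuffled; egress fees practically disappear.
Adding caching cuts latency by another ~40% on Q1 (4 s to 2.5 s) and ~50% on Q2 (1.8 s to 0.88 s) with no extra data movement.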
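
For readers who want to check the headline ratios, the short script below recomputes them from the results table. The numbers are copied from the table as-is; the dictionary layout and the helper names `speedup` and `percent_cut` are just choices made for this example.

```python
# Figures copied from the results table: latency in seconds,
# egress in the table's own "Gb" unit.
results = {
    "on-prem-only": {"q1_latency": 41.0, "q1_egress": 35.0, "q2_latency": 8.2, "q2_egress": 7.0},
    "hybrid": {"q1_latency": 4.0, "q1_egress": 0.40, "q2_latency": 1.8, "q2_egress": 0.047},
    "hybrid-caching": {"q1_latency": 2.5, "q1_egress": 0.40, "q2_latency": 0.88, "q2_egress": 0.044},
}


def speedup(baseline, improved):
    """How many times smaller the improved figure is than the baseline."""
    return baseline / improved


def percent_cut(before, after):
    """Percentage reduction going from `before` to `after`."""
    return 100.0 * (before - after) / before


base = results["on-prem-only"]
hybrid = results["hybrid"]
cached = results["hybrid-caching"]

summary = {
    "q1_speedup": speedup(base["q1_latency"], hybrid["q1_latency"]),
    "q2_speedup": speedup(base["q2_latency"], hybrid["q2_latency"]),
    "q1_egress_ratio": speedup(base["q1_egress"], hybrid["q1_egress"]),
    "q2_egress_ratio": speedup(base["q2_egress"], hybrid["q2_egress"]),
    "q1_cache_cut_pct": percent_cut(hybrid["q1_latency"], cached["q1_latency"]),
    "q2_cache_cut_pct": percent_cut(hybrid["q2_latency"], cached["q2_latency"]),
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The Q1 figures give the 10x speed-up and the roughly 40% caching gain; the egress reduction works out to about 88x on Q1 and about 149x on Q2, which is where the ~100x figure comes from.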

What it Enables: Complex, High-speed Analytics with Zero Egress Fee, and One Policy Governance

e6data’s hybrid data lakehouse is supported on all major cloud providers and integrates with most popular metastores and object stores. It is used in production by customers, including a major US-based global bank that saves 95% on egress costs with it. Here is the list of features we support:

No egress fee – every task runs on compute co‑located with the source, therefore eliminating the need to move data anywhere.
One policy everywhere – Regulators expect identical ACLs in every zone; copy-pasting rules don't scale. e6data’s hybrid data lakehouse shares one catalog and IAM hook. If you set a rule once, it applies across the cluster.
Affinity‑aware scheduler – binds workers and tasks to the resources closest to your data, so only the tiny result set crosses the wire, maintaining latency and performance SLAs.
Speaks open formats – works with open formats such as Parquet, Iceberg, and Delta, with no lock-in and no forced migrations.
Single logical cluster – one control plane, yet nodes live across all clouds (AWS, Azure, GCP), regions, and on‑prem.
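
As a rough illustration of the “one policy everywhere” idea, the sketch below has every cluster consult the same policy store, so a rule granted once is visible from all of them. This is not e6data’s IAM integration: `PolicyStore`, `ClusterNode`, and the principal, table, and cluster names are hypothetical, and a shared in-process object stands in for whatever propagation mechanism the real system uses.

```python
from dataclasses import dataclass, field


@dataclass
class PolicyStore:
    """Central place where access rules are defined once (hypothetical)."""

    # Maps (principal, table) to the set of allowed actions.
    rules: dict = field(default_factory=dict)

    def grant(self, principal, table, action):
        self.rules.setdefault((principal, table), set()).add(action)

    def is_allowed(self, principal, table, action):
        return action in self.rules.get((principal, table), set())


@dataclass
class ClusterNode:
    name: str
    policies: PolicyStore  # shared reference, not a per-cluster copy

    def authorize(self, principal, table, action):
        return self.policies.is_allowed(principal, table, action)


store = PolicyStore()
nodes = [ClusterNode(name, store) for name in ("main", "anc-aws", "anc-onprem")]

# Define the rule once, centrally.
store.grant("analyst", "sales", "SELECT")

# Every cluster gives the same answer without any per-cluster configuration.
decisions = {node.name: node.authorize("analyst", "sales", "SELECT") for node in nodes}
print(decisions)
```

Because no cluster holds its own copy of the rules, there is nothing to copy-paste between zones and nothing to drift.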

What’s Next: Agent-facing Analytics on Hybrid Data Lakehouses

The future of analytics belongs to AI data analysts that never log off. They hammer databases with multiple queries, demanding sub-second answers around the clock. In cloud-only setups, this creates runaway spend. A hybrid lakehouse flips the economics. Each agent prompt is handled on the node sitting beside the data, so there are no cross-region transfers and no egress bills, and governance rules stay intact. As agent-driven analytics becomes standard practice, a hybrid lakehouse is the only architecture that can keep up with both the speed and the costs of analytics.

Check out our documentation to dive deeper and see how e6data’s hybrid data lakehouse can fit into your architecture.

FAQs

What is a hybrid data lakehouse?

It is a single logical lakehouse whose storage, compute, and catalog layers can live partly on-prem and partly in the cloud, giving you warehouse-grade reliability on lakehouse data for fast, cost-efficient, and scalable analytics.

How are governance rules handled across clouds?

Policies are defined once in a central catalog/IAM hook and automatically propagate to every cluster, keeping identical ACLs in every zone.

Why is a hybrid approach vital for BFSI and other regulated sectors?

These industries must keep data within jurisdictions; hybrid execution lets them analyse it globally without moving region‑locked data, while avoiding lock‑in and surprise costs.

How is the architecture ready for AI agent‑driven analytics?

Each prompt is executed on the node nearest the data, sustaining sub‑second answers around the clock while controlling spend and maintaining governance.
