Background
Managing petabytes of transactional and customer data spread across multiple cloud providers and on-premises data centers is hard, especially for global companies that operate across dozens of countries and regulatory jurisdictions. A leading global bank, headquartered in the US and serving more than 90 million customers across 150+ countries, faced exactly this challenge. Its data teams needed to analyze global datasets (e.g., for risk and regulatory reports) without violating data residency laws or sacrificing speed.
Traditionally, each region ran its own analytics stack, making it hard to get a unified view. To better serve its global operations, the bank piloted e6data’s hybrid lakehouse platform, which delivers analytics across multiple clouds and regions without driving up egress costs, network costs, or operational complexity.
“We have been impressed with e6data’s locality-aware architecture and its ability to handle our multi-region workloads with low cost, while maintaining the SLAs.”
– VP of Data Engineering, Global Bank
Challenge – Multi-Region Data Governance & Latency Issues
The bank faced an acute challenge: data was siloed in multiple regions (AWS in the US, Azure in Europe, and on-premises storage in Asia) due to data residency and privacy regulations. This created four problems:
- Latency & SLA failures – Cross-region queries were painfully slow (often timing out) and incurred high cloud egress fees. p95 latency for multi-region joins averaged ~5 s, far short of the <1 s SLA demanded by downstream trading desks and dashboards.
- Exploding egress bills – Every month, the team moved over 50 TB of data across regions at a cost of $0.02/GB.
- Operational overhead – The team maintained 6+ Spark ETL pipelines and 4+ IAM/KMS stacks, and spent 480 engineering hours/month just keeping the data in sync.
- Governance drift – Quarterly audits flagged ~30% of tables as out-of-policy because copies had fallen out of sync with their Unity Catalog.
“Every morning, we shipped TBs of data across regions just so a set of queries could run. We were paying for the copies and the time.”
– Head of Risk Analytics, U.S. Global Bank
The bank needed a solution that could query all their data with low latency, minimize data movement, and enforce one set of governance policies everywhere.
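To put the egress bullet in concrete terms, the arithmetic can be sketched as follows. This is a back-of-the-envelope estimate that takes the quoted figures (over 50 TB/month at $0.02/GB) at face value. The decimal unit conversion (1 TB = 1,000 GB) is an assumption, the function name is illustrative, and because the source says "over 50 TB", the result is a lower bound on the transfer charge alone rather than the bank's actual bill.

```python
# Back-of-the-envelope egress estimate from the figures quoted in this section.
TB_MOVED_PER_MONTH = 50   # source says "over 50 TB", so treat this as a lower bound
RATE_USD_PER_GB = 0.02    # per-GB transfer cost quoted in the source
GB_PER_TB = 1000          # assumption: decimal units (1 TB = 1,000 GB)


def monthly_egress_cost(tb_moved: float, rate_usd_per_gb: float,
                        gb_per_tb: int = GB_PER_TB) -> float:
    """Return the monthly transfer charge in USD for a given volume and rate."""
    return tb_moved * gb_per_tb * rate_usd_per_gb


monthly_usd = monthly_egress_cost(TB_MOVED_PER_MONTH, RATE_USD_PER_GB)
annual_usd = monthly_usd * 12

print(f"Monthly transfer charge (lower bound): ${monthly_usd:,.0f}")
print(f"Annual transfer charge (lower bound):  ${annual_usd:,.0f}")
```

The estimate covers only the per-GB transfer charge. The storage for the copies and the engineering hours spent keeping them in sync, both mentioned above, are separate costs that this sketch does not model.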
Solution – e6data’s Hybrid Lakehouse Architecture
To address these challenges, the bank deployed e6data’s hybrid lakehouse platform on top of its existing data infrastructure, extending its analytics across cloud regions and on-prem. The solution introduced three core capabilities:
- Locality-Aware Execution: With e6data, a query that joins U.S. and European datasets executes partial plans within the U.S. cluster and within the EU cluster, then merges the results, so that only minimal aggregated data crosses regions. This locality-aware design gave the bank tight control over network egress while preserving performance and scalability.
- “Zero-Copy” Query Pushdown: e6data connected directly to the bank’s data lakes (Amazon S3 buckets, Azure Blob Storage, and on-prem S3-compatible stores) and queried the data in place, without any ETL. This not only reduced data processing overhead and cost, but also meant the bank’s data engineers did not have to rewrite any SQL or re-architect pipelines.
- Unity Catalog–Based Governance: Governance through Unity Catalog was a critical requirement for the bank. e6data extended it across hybrid environments so that policies were enforced consistently at query time, regardless of data location. This unified governance model eliminated policy “drift” between platforms. All actions were logged centrally, satisfying the bank’s audit requirements.
Together, these features allowed the bank to query data as if it were in one lakehouse, without the usual costs of moving or duplicating data.
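The locality-aware idea described above (run partial plans in each region, then merge small aggregates) can be illustrated with a minimal sketch. This is not e6data's implementation or API. It is the general partial-aggregation pattern, shown with in-memory lists standing in for regional tables. The `Row` type, the sample data, and every function name here are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    customer_id: str
    exposure_usd: float


# Hypothetical stand-ins for tables that must stay in their home region.
REGION_DATA = {
    "us": [Row("c1", 100.0), Row("c2", 50.0), Row("c1", 25.0), Row("c2", 5.0)],
    "eu": [Row("c1", 10.0), Row("c3", 70.0), Row("c3", 30.0)],
}


def partial_aggregate(rows):
    """Runs inside one region: reduce raw rows to per-customer partial sums."""
    totals = defaultdict(float)
    for row in rows:
        totals[row.customer_id] += row.exposure_usd
    return dict(totals)


def merge_partials(partials):
    """Runs at the coordinator: combine the small per-region aggregates."""
    merged = defaultdict(float)
    for partial in partials:
        for customer_id, subtotal in partial.items():
            merged[customer_id] += subtotal
    return dict(merged)


# Each region computes its partial result locally; only these cross regions.
partials = [partial_aggregate(rows) for rows in REGION_DATA.values()]
result = merge_partials(partials)

rows_scanned = sum(len(rows) for rows in REGION_DATA.values())
rows_shipped = sum(len(partial) for partial in partials)

print(result)
print(f"rows scanned locally: {rows_scanned}, aggregate rows shipped: {rows_shipped}")
```

In this toy case the saving is small, but the shape is the point: the data crossing regions scales with the number of distinct keys in the result, not with the number of raw rows stored in each region.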
Results – 6-week pilot and beyond
After a 6-week pilot, the bank saw transformative improvements in both technical performance and operational efficiency. The table below summarizes the key before-and-after metrics: