Background
With more than 68,000 customers worldwide, a leading SaaS provider processes billions of customer interactions monthly across its product suite. At this scale, customer-facing analytics isn't just a feature; it's mission-critical. Their in-app dashboards serve 10 million+ queries per day while maintaining sub-second query response times and handling thousands of concurrent user sessions.
To serve this large user base, the company uses e6data's compute engine, which delivers high-performance querying without driving up compute costs.
“We have been impressed with e6data's performance, concurrency, and granular scalability on our resource-intensive workloads.”
- Head of Platform Engineering, Global SaaS Provider
Challenge: Zero-Failure SLAs at High Concurrency
The company faced scalability challenges with its existing data infrastructure stack and compute engine.
Their customer-facing dashboards process 10 million+ queries across terabytes of data, peaking at 60 QPS. The queries are interactive, so the team required sub-two-second p95 latencies, with a 60-second timeout as a guardrail for the user experience. The team also wanted to support data exports, so that customers could export large volumes of data.
Rising resource utilisation was doubling compute costs every 18 months, which was unsustainable.
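To put that growth rate in yearly terms, the short sketch below converts "2x every 18 months" into an equivalent annual rate. It assumes costs compound smoothly between measurements, which the case study does not state; the function name `annualized_growth` is purely illustrative.

```python
def annualized_growth(multiple: float, period_months: float) -> float:
    """Equivalent yearly growth rate for a cost that grows by `multiple`
    every `period_months` months (assumption: smooth compounding)."""
    if multiple <= 0 or period_months <= 0:
        raise ValueError("multiple and period_months must be positive")
    return multiple ** (12.0 / period_months) - 1.0


# Figures from the text: a 2x jump every 18 months.
rate = annualized_growth(2.0, 18.0)
print(f"Equivalent annual growth: {rate:.1%}")  # roughly 58.7% per year

# Under the same assumption, three years is two doubling periods.
three_year_multiple = (1.0 + rate) ** 3
print(f"Cost multiple after 3 years: {three_year_multiple:.1f}x")
```

Under this assumption, compute spend grows by a little under 60% per year, or about 4x over three years.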
“We have been using our existing compute engine for quite a long time. So, if you throw more compute at it, execution times can surely be faster. But again, the interoperability issue, support for AI & ML use cases- getting them out is quite challenging. This is the reason why a data lakehouse architecture and compute engine work very well.”
- Engineering Lead, Global SaaS Provider
The team also wanted to migrate to a lakehouse-based architecture to support AI/ML use cases, improve interoperability, and avoid vendor lock-in while retaining full functionality. This led them to start evaluating lakehouse-based compute engines.
Solution: Lakehouse Compute Engine Built for High QPS
As part of its transition to a data lakehouse architecture, the company ran a thorough bake-off comparing e6data with three other engines. The evaluation criteria were:
1. Performance: Many of their customer-facing dashboards query highly normalized, frequently updated data, with queries involving an average of 10+ table joins. A sub-2-second response time for these queries at scale was a strict requirement.
2. Failure SLAs: Zero-failure SLAs were non-negotiable for schedules and unloads, irrespective of query complexity. For web queries, SLAs were measured on p95 and p99 query response times, with the query timeout capped at 60 seconds to match their existing solution. The timeout was set at 5 minutes for schedules and 20 minutes for unload operations.
3. Cost-effectiveness: With compute costs doubling every 18 months, the team needed a solution whose cost would stay sustainable as the user base grew.
4. Ease of adoption: They wanted a solution that was compatible with most of their existing data infrastructure (data catalog, BI tools, SQL dialects, cloud providers, etc.).
5. Security: The team required a governed and compliant solution that retains 100% internal control of data and adheres specifically to HIPAA requirements. Private link connectivity was also mandated to strengthen data security and compliance.
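The failure-SLA criteria above can be expressed as a simple check over replayed query results. The following is a minimal sketch, not the company's or e6data's actual tooling: the `QueryResult` record, the workload labels, and the nearest-rank percentile method are assumptions made for illustration. Only the timeouts (60 seconds, 5 minutes, 20 minutes) and the sub-2-second p95 target come from the text; the text gives no p99 threshold, so p99 is reported but not enforced.

```python
import math
from dataclasses import dataclass

# Timeouts from the evaluation criteria, in seconds.
TIMEOUT_S = {"web": 60.0, "schedule": 5 * 60.0, "unload": 20 * 60.0}
WEB_P95_TARGET_S = 2.0  # sub-2-second p95 target for web queries


@dataclass
class QueryResult:
    """Hypothetical record for one replayed query."""
    workload: str     # "web", "schedule", or "unload"
    latency_s: float  # wall-clock latency in seconds
    succeeded: bool   # False if the engine returned an error


def percentile(values, pct: int) -> float:
    """Nearest-rank percentile (an assumed method; others exist)."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def evaluate(results):
    """Summarize each workload against its timeout and SLA rule."""
    report = {}
    for workload, timeout in TIMEOUT_S.items():
        rows = [r for r in results if r.workload == workload]
        if not rows:
            continue
        # A query counts as failed if it errored or ran past its timeout.
        failures = sum(
            1 for r in rows if not r.succeeded or r.latency_s > timeout
        )
        entry = {"queries": len(rows), "failures": failures}
        if workload == "web":
            latencies = [r.latency_s for r in rows]
            entry["p95_s"] = percentile(latencies, 95)
            entry["p99_s"] = percentile(latencies, 99)  # reported only
            entry["meets_sla"] = entry["p95_s"] < WEB_P95_TARGET_S
        else:
            # Schedules and unloads: zero failures allowed.
            entry["meets_sla"] = failures == 0
        report[workload] = entry
    return report
```

Feeding it a list of `QueryResult` records from a traffic replay returns one summary per workload, so the web percentile targets and the zero-failure rule for schedules and unloads can be tracked side by side.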
Results: 6-Week Soak Test & Beyond
The company began its e6data evaluation with data export workloads, where the cost advantages and stable throughput were immediately apparent. Encouraged by those results, they expanded testing to the customer-facing dashboard use case, where concurrency and low-latency requirements are far more demanding.
They replayed real production traffic (tens of millions of queries) to validate stability and performance. e6data met or exceeded the PoC goals across all workloads.
1. ~60% lower TCO: e6data's ability to operate at scale on the data lakehouse led to $3M in annualized savings once fully deployed.
2. Faster queries at high concurrency: p95 latencies fell from 30 seconds in early testing to 1.5-2 seconds. Even as concurrency approached 60 QPS, the performance and failure SLA criteria were met.
3. Security & compliance: The team was satisfied with the compliance and security documentation, and legal approval went smoothly. Meeting key requirements such as HIPAA compliance and private link connectivity increased their confidence in the solution and led to a strong, secure partnership.
“So that was the differentiator when e6data was able to match web (customer-facing) SLAs. As we all know, maintaining copies of data is a hard problem to have in production. You can't have a copy just for exports and a copy for the web dashboards. But the moment e6data was able to meet the web SLAs, it became a very easy decision to make one data lake that serves both web and non-web, using e6data as a query engine.”
- Engineering Manager, Global SaaS Provider