Engineering

Procedural Power, Set-speed: Inside e6data’s Froid-inspired UDF Engine

Picture the full expressive power of procedural code (variables, loops, even branching logic) living right inside your SQL, but without the notorious performance tax that usually comes with it. SQL User-Defined Functions (UDFs) were invented to give analysts that superpower, yet they often backfire when every call triggers a slow, row-by-row detour away from the query optimizer. At e6data, we decided that the trade-off was unacceptable, so we changed the rules. Inspired by Microsoft’s Froid framework and the wave of research that followed, our engine automatically inlines UDFs, fusing their logic back into the set-oriented query plan where it belongs. The result? Cleaner code, richer abstraction, and orders-of-magnitude faster execution, without asking developers to rewrite a single line. This post kicks off a series where we’ll demystify UDFs, expose the bottlenecks, and then dive deep into the algorithms that enable e6data to make “procedural” feel as fast as “pure SQL.”

Why do UDFs Matter?

A SQL User-Defined Function, or UDF, is exactly what its name suggests: a function that you write and register inside the database to perform custom logic. Every major relational engine has supported UDFs for years, and cloud platforms such as Azure SQL Database log tens of millions of UDF definitions and billions of daily invocations. Clearly, developers love them, but there’s more to the story.

Why do Developers Reach for UDFs?

| Benefit | What it buys you | Typical examples |
| --- | --- | --- |
| Modularity & Reuse | Write a rule once, call it everywhere. DRY for SQL. | currency_convert(), apply_discount() |
| Imperative Expressiveness | Loops, conditionals, nested calls; perfect for rules that don’t fit a single SELECT. | Tiered-tax calculations, complex eligibility checks |
| Maintainability & Readability | A named function (calculate_tax(…)) beats a 200-line inline expression. | Cleaner code reviews, easier onboarding |
| Data Proximity | Logic runs inside the engine; no ETL, no external app server. | Real-time scoring, in-database cleansing |


With benefits like these, UDFs have become a staple of real-world SQL development. Seasoned DBAs, however, know the catch: performance pain. To see why, we first need to distinguish the two broad flavors of UDFs.

What are the Differences between Inline and Multi-Statement UDFs?

Despite their elegance and convenience, UDFs aren’t all built alike. The moment a developer moves from a tidy one-liner to a block with variables and loops, the database stops treating the function as part of the set-based plan and starts executing it row-by-row. That line in the sand, between inline and multi-statement UDFs, is exactly where performance fortunes diverge. Before we dive into how e6data erases that penalty, let’s clarify what distinguishes these two flavors and why the choice can make or break a query’s speed.

| Type | What it looks like | How the engine usually treats it | Performance profile |
| --- | --- | --- | --- |
| Inline (single-statement) | A one-liner that returns the result of a single query or expression. | Expanded (“macro-inlined”) directly into the outer query. | Fast: the optimizer sees and optimizes it like normal SQL. |
| Multi-statement | A BEGIN … END block with variables, multiple queries, loops, or nested function calls. | Executed as a separate routine, often once per row. | Slow: imperative control flow defeats set-based optimization. |


If your function is a simple expression, life is good. The database can slide it into the main plan and keep everything set-oriented. The moment you add loops, branches, or multiple statements, you cross into multi-statement territory. That’s where the pain starts.

Why do Traditional UDFs Slow You Down?

  1. Context-Switch Overhead
    Every invocation bounces the engine from set-oriented execution to a procedural mini-interpreter and back. Thousands of hops = thousands of stalls.
  2. Row-by-Row (RBAR) Execution
    A multi-statement UDF inside a SELECT list often fires once per output row. A 100 K-row result becomes 100 K miniature procedures, usually single-threaded.
  3. Optimizer Blindness
    The planner can’t peek inside the function, so it can’t reorder joins, push predicates, or parallelize. Lacking estimates, it often chooses the safest and slowest plan: evaluate the UDF verbatim for every row.

In combination, these flaws can throttle queries by 10x, 100x, even 1000x compared with set-based SQL.
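
The once-per-row behavior is easy to observe outside any particular engine. The sketch below uses Python’s sqlite3 module as an illustrative stand-in (not e6data, and `tax_stub` is a hypothetical placeholder for a real multi-statement UDF body): it registers a scalar UDF and counts how many times the engine calls back into procedural code.

```python
import sqlite3

calls = {"n": 0}

def tax_stub(salary):
    """Hypothetical stand-in for a multi-statement UDF body."""
    calls["n"] += 1          # count every procedural call-out
    return salary * 0.1

conn = sqlite3.connect(":memory:")
conn.create_function("tax_stub", 1, tax_stub)
conn.execute("CREATE TABLE employee (emp_id INTEGER, salary INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?)",
                 [(i, 50_000 + i) for i in range(1_000)])

rows = conn.execute("SELECT emp_id, tax_stub(salary) FROM employee").fetchall()
print(len(rows), calls["n"])   # 1000 rows -> 1000 separate UDF invocations
```

One thousand output rows mean one thousand round trips between the set-oriented executor and the procedural callback; scale the table up and the overhead scales with it.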

Let’s see how a UDF plays out in practice by taking an example: a business rule wrapped in a multi-statement UDF.

CREATE OR REPLACE FUNCTION calc_progressive_tax(
        income        INTEGER,          -- taxable income
        slab_limits   INTEGER ARRAY,    -- width of each slab
        slab_rates    NUMERIC(5,2) ARRAY -- rate (%) for each slab
)
RETURNS NUMERIC(18,2)
LANGUAGE SQL
AS
$$
DECLARE
    idx        INTEGER;
    remaining  INTEGER;
    bracketAmt INTEGER;
    tax        NUMERIC(18,2);
BEGIN
    -- Guard‑rails
    IF (income IS NULL OR income <= 0) THEN
        RETURN 0;
    END IF;
    remaining := income;
    tax       := 0;
    idx       := 1;
    -- Walk the slabs one‑by‑one
    WHILE (idx <= CARDINALITY(slab_limits))
        DO
            BEGIN
                -- all income consumed → early exit
                IF (remaining <= 0) THEN
                    BREAK;
                END IF;
                -- amount taxed in this slab
                bracketAmt := LEAST(remaining, slab_limits[idx]);
                -- add slab’s contribution
                tax := tax + bracketAmt * slab_rates[idx] / 100;
                -- move to next slab
                remaining := remaining - bracketAmt;
                idx       := idx + 1;
            END;
        END WHILE;
    RETURN tax;
END
$$
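
To pin down the semantics, here is a direct Python port of the function (illustrative only, not how the engine executes it). Note that each iteration taxes LEAST(remaining, slab_limits[idx]), so the limits behave as slab widths rather than cumulative upper bounds.

```python
from decimal import Decimal

def calc_progressive_tax(income, slab_limits, slab_rates):
    """Python port of calc_progressive_tax above (illustrative)."""
    # Guard-rails
    if income is None or income <= 0:
        return Decimal("0")
    remaining, tax, idx = income, Decimal("0"), 0
    # Walk the slabs one-by-one
    while idx < len(slab_limits):
        if remaining <= 0:
            break                                  # all income consumed -> early exit
        bracket_amt = min(remaining, slab_limits[idx])
        tax += Decimal(bracket_amt) * Decimal(slab_rates[idx]) / 100
        remaining -= bracket_amt
        idx += 1
    return tax

# e.g. income 600000 with slabs [250000, 500000, 1000000] and rates [5, 20, 30]:
# 250000 @ 5% + 350000 @ 20% = 12500 + 70000 = 82500
```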


A seemingly innocent payroll query that calls the UDF:

SELECT
    e.emp_id,
    e.name,
    calc_progressive_tax(e.salary,
                         ARRAY[250000,500000,1000000],
                         ARRAY[5,20,30]) AS tax_due
FROM employee AS e
WHERE e.fiscal_year = 2025;


Under the hood, vanilla lakehouse engines still execute it like this, and the pain is familiar:

| Step | Phase | What happens | Context |
| --- | --- | --- | --- |
| 1 | Scan employee | Read first row (emp_id = 42). | Set-oriented |
| 2 | Context switch to UDF | Jump into the procedure interpreter, allocate locals. | Procedural / row-at-a-time |
| 3 | Loop over slabs | Run 1–3 assignment statements for each slab. | Procedural |
| 4 | Return tax | Single value bubbled back to the outer query. | Procedural |
| 5 | Switch back | Resume the main query, output the row. | Set-oriented |
| 6 | Repeat 1–5 | …for every remaining employee row. | — |


The result? 50,000 employees = 50,000 miniature procedures plus tens of thousands of context switches: a textbook RBAR collapse. Worse, the optimizer is blind to the slab logic, so it can’t push filters, can’t parallelize, and usually falls back to single-threaded nested loops.

Together, these issues can turn a neat, reusable abstraction into a 10x, 100x, or even 1000x slowdown. That’s why many DBAs warn, “Avoid UDFs in critical paths.”

Databricks and Snowflake do expose “SQL UDFs,” but only in the most limited, macro-like form: the body must be a single expression or query, with no variables, no loops, and no multi-statement logic (see “CREATE FUNCTION (SQL and Python)” in the Databricks documentation and “Scalar SQL UDFs” in the Snowflake documentation). As a result, developers quickly bump into those guardrails and fall back to external Python/Java UDFs or stored procedures, forfeiting optimizer insight and lakehouse performance.

e6data takes a different route: we are among the first lakehouse engines to bring full Froid-style inlining of multi-statement SQL UDFs into the platform. Instead of forcing users to shrink their logic (or move it out of SQL), we melt the procedural wrapper into a single relational plan the optimizer can chew on. No rewrites, no compromises.

For years, that warning felt like the end of the conversation: either abandon multi-statement UDFs or accept the drag they impose. But in database research circles, a quiet revolution was brewing. Starting with Microsoft’s landmark Froid paper, engineers showed that the real fix isn’t to delete UDFs, but to dissolve the procedural shell and fold its logic back into the relational plan. In other words, let the optimizer see everything, then do what it does best. We embraced and extended that idea inside our e6data engine. Instead of telling users to rewrite code, we rewrite the execution model: our inliner automatically transforms the example function above into a single, set-oriented expression the optimizer can streamline and parallelize at will.

WITH RECURSIVE "run" ("state", "prev_state", "rec", "income", "slab_limits", "slab_rates", "idx", "remaining", "bracketAmt", "tax", "__RETURN_VAR__", "return_value", "iterations") AS (
            SELECT 'entry', NULL, TRUE, "income", "slab_limits", "slab_rates", NULL, NULL, NULL, NULL, NULL, NULL, 0
            UNION ALL
            SELECT *
            FROM (SELECT CASE WHEN "income" IS NULL OR "income" <= 0 THEN 'IF_TRUE_0' ELSE 'IF_MERGE_0' END, 'entry', TRUE, "income", "slab_limits", "slab_rates", "idx", "remaining", "bracketAmt", "tax", "__RETURN_VAR__", NULL, "iterations" + 1
                        FROM "run"
                        WHERE "state" = 'entry'
                        UNION ALL
                        SELECT 'RETURN_BLOCK_0', 'IF_TRUE_0', TRUE, "income", "slab_limits", "slab_rates", "idx", "remaining", "bracketAmt", "tax", "__RETURN_VAR__#3", NULL, "iterations" + 1
                        FROM "run",
                            LATERAL (SELECT 0 AS "__RETURN_VAR__#3")
                        WHERE "state" = 'IF_TRUE_0'
                        UNION ALL
                        SELECT 'WHILE_CONDITION_0', 'IF_MERGE_0', TRUE, "income", "slab_limits", "slab_rates", "idx#2", "remaining#2", "bracketAmt", "tax#2", "__RETURN_VAR__", NULL, "iterations" + 1
                        FROM "run",
                            LATERAL (SELECT "income" AS "remaining#2"),
                            LATERAL (SELECT 0 AS "tax#2"),
                            LATERAL (SELECT 1 AS "idx#2")
                        WHERE "state" = 'IF_MERGE_0'
                        UNION ALL
                        SELECT 'RETURN_BLOCK_0', 'RETURN_BLOCK_0', FALSE, "income", "slab_limits", "slab_rates", "idx", "remaining", "bracketAmt", "tax", "__RETURN_VAR__", "__RETURN_VAR__", "iterations" + 1
                        FROM "run"
                        WHERE "state" = 'RETURN_BLOCK_0'
                        UNION ALL
                        SELECT CASE WHEN "idx" <= CARDINALITY("slab_limits") THEN 'WHILE_BODY_0' ELSE 'WHILE_MERGE_0' END, 'WHILE_CONDITION_0', TRUE, "income", "slab_limits", "slab_rates", "idx", "remaining", "bracketAmt", "tax", "__RETURN_VAR__", NULL, "iterations" + 1
                        FROM "run"
                        WHERE "state" = 'WHILE_CONDITION_0'
                        UNION ALL
                        SELECT CASE WHEN "remaining" <= 0 THEN 'WHILE_MERGE_0' ELSE 'IF_MERGE_1' END, 'WHILE_BODY_0', TRUE, "income", "slab_limits", "slab_rates", "idx", "remaining", "bracketAmt", "tax", "__RETURN_VAR__", NULL, "iterations" + 1
                        FROM "run"
                        WHERE "state" = 'WHILE_BODY_0'
                        UNION ALL
                        SELECT 'RETURN_BLOCK_0', 'WHILE_MERGE_0', TRUE, "income", "slab_limits", "slab_rates", "idx", "remaining", "bracketAmt", "tax", "__RETURN_VAR__#2", NULL, "iterations" + 1
                        FROM "run",
                            LATERAL (SELECT "tax" AS "__RETURN_VAR__#2")
                        WHERE "state" = 'WHILE_MERGE_0'
                        UNION ALL
                        SELECT 'WHILE_CONDITION_0', 'IF_MERGE_1', TRUE, "income", "slab_limits", "slab_rates", "idx#4", "remaining#4", "bracketAmt#3", "tax#4", "__RETURN_VAR__", NULL, "iterations" + 1
                        FROM "run",
                            LATERAL (SELECT LEAST("remaining", "slab_limits"["idx"]) AS "bracketAmt#3"),
                            LATERAL (SELECT "tax" + "bracketAmt#3" * "slab_rates"["idx"] / 100 AS "tax#4"),
                            LATERAL (SELECT "remaining" - "bracketAmt#3" AS "remaining#4"),
                            LATERAL (SELECT "idx" + 1 AS "idx#4")
                        WHERE "state" = 'IF_MERGE_1') AS "transitions"
            WHERE "iterations" < 100 AND "rec" = TRUE) 
  SELECT "return_value"
        FROM "run"
        WHERE "rec" = FALSE

Now the optimizer “sees” everything: it can parallelize the work and eliminate every per-row context switch. Execution time plummets, while developers keep writing clear, modular UDFs.
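
To build intuition for why the rewrite is faithful: the recursive CTE is simply a state machine walking the function’s control-flow graph, where each UNION ALL branch is one transition and each “row” carries the live variables. The Python simulation below (illustrative; branch targets follow the original function’s IF/WHILE semantics) steps through those same states and reproduces the procedural result.

```python
def run_state_machine(income, slab_limits, slab_rates):
    # Variables threaded through the recursive CTE, one "row" per step.
    v = {"income": income, "idx": None, "remaining": None, "tax": None, "ret": None}
    state = "entry"
    for _ in range(100):                       # mirrors the iterations < 100 guard
        if state == "entry":                   # IF (income IS NULL OR income <= 0)
            state = "IF_TRUE_0" if (v["income"] is None or v["income"] <= 0) else "IF_MERGE_0"
        elif state == "IF_TRUE_0":             # RETURN 0
            v["ret"] = 0; state = "RETURN_BLOCK_0"
        elif state == "IF_MERGE_0":            # initialize locals
            v.update(remaining=v["income"], tax=0, idx=1); state = "WHILE_CONDITION_0"
        elif state == "WHILE_CONDITION_0":     # idx <= CARDINALITY(slab_limits)?
            state = "WHILE_BODY_0" if v["idx"] <= len(slab_limits) else "WHILE_MERGE_0"
        elif state == "WHILE_BODY_0":          # remaining <= 0 -> BREAK
            state = "WHILE_MERGE_0" if v["remaining"] <= 0 else "IF_MERGE_1"
        elif state == "IF_MERGE_1":            # one slab's worth of assignments
            amt = min(v["remaining"], slab_limits[v["idx"] - 1])
            v.update(tax=v["tax"] + amt * slab_rates[v["idx"] - 1] / 100,
                     remaining=v["remaining"] - amt,
                     idx=v["idx"] + 1)
            state = "WHILE_CONDITION_0"
        elif state == "WHILE_MERGE_0":         # RETURN tax
            v["ret"] = v["tax"]; state = "RETURN_BLOCK_0"
        elif state == "RETURN_BLOCK_0":        # rec = FALSE row: surface return_value
            return v["ret"]
```

Because every transition is a plain relational expression over the row of variables, the whole machine can run inside one recursive CTE, and the per-row procedural interpreter disappears entirely.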

In our next posts, we’ll walk through how research breakthroughs like Microsoft’s Froid taught the industry to inline multi-statement UDFs (transforming them back into set-oriented plans) and how we built that capability natively into our e6data engine.

Stay tuned! The deep dive into the inlining algorithm is where the real fun begins!

Resource

Froid: Optimization of Imperative Programs in a Relational Database

July 18, 2025
Sweta Singh
Ruchir Raj




FAQs

Why do traditional multi-statement SQL UDFs slow queries down?
Each call jumps from set-oriented execution into a procedural interpreter, runs row-by-row (RBAR), and hides its logic from the optimizer, blocking join reordering, predicate pushdown, and parallelism, often making queries 10x-1000x slower.
What is e6data’s solution to the UDF performance problem?
e6data automatically inlines multi-statement UDFs, dissolving the procedural wrapper into a single relational expression that the optimizer treats like ordinary set-based SQL, wiping out context switches and RBAR execution.
Do developers have to rewrite their existing UDFs to benefit?
No. e6data rewrites the execution model, not your code, so existing multi-statement UDFs run faster without any changes.
How does e6data’s inliner change what the optimizer can “see”?
After transformation, the UDF body appears as one recursive CTE inside the outer query, letting the planner reorder, parallelize and push predicates just like regular SQL.
