Funnel Analysis in SQL

The right and wrong ways to write funnel analysis in SQL.

SQL is the standardized language of relational databases. And Postgres has become an increasingly popular choice for analytics databases, particularly with extensions like Timescale. Funnel-analysis is an essential component of many analytics solutions, yet documentation about it is sparse (and often lacking).

So how can one write simple, efficient queries for SQL analysis?…

What is Funnel Analysis?

Seems like a question worth answering. Imagine we are a store that sells tropical fruits. Suppose we have two customers, Bob and Alice.

Bob buys a papaya just in time for Summer, on June 1st. Alice does the same the next day, on June 2nd.

They are soon very happy with their purchase, especially Bob. So he comes back for more.

Bob then buys a mango on the 4th of July.

Suppose we mapped out their purchases on a table. It might look like this:

NameFruitDate
BobPapaya2025-06-01
AlicePapaya2025-06-02
BobMango2025-07-04

Great! So how does this relate to funnel queries. Well, suppose we wanted to know which customers did two things:

  1. Bought a Papaya
  2. Bought a Mango

We could do that with a query like this:

SELECT 'Papaya' AS step, COUNT(*) FROM events
UNION
SELECT 'Mango' AS step, COUNT(*) FROM events
ORDER BY COUNT DESC;

The result would look something like:

StepCount
Papaya2
Mango1

That seems accurate right?

Well, there’s a problem.

Suppose Bob actually bought a Mango before a Papaya. Our table would look like this:

NameFruitDate
BobMango2025-06-01
AlicePapaya2025-06-02
BobPapaya2025-07-04

Now, we run our same query, and get the same result:

StepCount
Papaya2
Mango1

But the problem is that Bob bought a papaya after buying a Mango. So the results should actually look like this:

StepCount
Papaya1
Mango0

The problem with the above query is that it doesn’t count for order. To perform funnel analysis, we’d want to know who buys a papaya and then mango.

So Now What?

That’s where Funnel Queries kick in. We need to be able to count how many people purchased a papaya and then a mango.

Despite its numerous strengths, SQL databases remain quite bad at performing sequential queries. Let me show you why:

Let’s try one approach—to use a left join.

WITH papaya_purchases AS (
    SELECT 
        name AS customer_name,
        date AS papaya_date
    FROM events
    WHERE fruit = 'Papaya'
),
mango_purchases AS (
    SELECT 
        name AS customer_name,
        date AS mango_date
    FROM events
    WHERE fruit = 'Mango'
)
SELECT 
    p.customer_name,
    p.papaya_date,
    m.mango_date
FROM papaya_purchases p
LEFT JOIN mango_purchases m
    ON p.customer_name = m.customer_name
    AND m.mango_date > p.papaya_date
WHERE m.mango_date IS NOT NULL;

Well, that result is accurate. Our output looks like what it’s supposed to:

StepCount
Papaya2
Mango0

But what if I wanted to keep track of other fruits? Like pineapples, guavas, and coconuts? I’d need to keep joining each table with another, causing what’s known as the space and time complexity problem.

Here’s a table illustrating this:

CustomerPapaya Purchase DateMango Purchase DateComparison Result
Bob2025-06-012025-07-04Valid (Mango > Papaya)
Bob2025-07-042025-06-01Invalid (Mango < Papaya)
Alice2025-06-02NULLNo Mango Purchase

Each papaya purchase is compared with every mango purchase for the same customer to determine if the mango purchase occurred after the papaya purchase; multiple comparisons are made for each customer. That’s fine for our use case, but what happens when we have more customers?

Number of FruitsNumber of Transactions per CustomerTotal Comparisons per Customer
21045
310120
410210
21004,950
3100161,700
4100392,122

As the number of fruits and transactions grows, the number of comparisons increases exponentially, making this strategy unfeasible for large (or medium-sized) datasets.

To avoid making comparisons for each customer and transactions, we can use a combination of window functions.

WITH funnel_counts AS (
    SELECT 
        name AS customer_name,
        MIN(CASE WHEN fruit = 'Papaya' THEN date ELSE NULL END) 
            OVER (
                PARTITION BY name 
                ORDER BY date ASC 
                ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
            ) AS timestamp_papaya,
        MIN(CASE WHEN fruit = 'Mango' THEN date ELSE NULL END) 
            OVER (
                PARTITION BY name 
                ORDER BY date ASC 
                ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
            ) AS timestamp_mango
    FROM events
)
SELECT 'Papaya' AS step, 
       COUNT(DISTINCT customer_name) AS customer_count
FROM funnel_counts
WHERE timestamp_papaya IS NOT NULL
UNION ALL
SELECT 'Mango' AS step, 
       COUNT(DISTINCT customer_name) AS customer_count
FROM funnel_counts
WHERE timestamp_mango IS NOT NULL 
  AND timestamp_mango > timestamp_papaya;

Let’s break it down:

The first part of the funnel query is a CTE that selects from the events table. We select the name and alias it as customer_name. Through the MIN function, we select the earliest date a customer bought a fruit, and then alias it as a field with the format timestamp_[fruitname].

WITH funnel_counts AS (
    SELECT 
        name AS customer_name,
        MIN(CASE WHEN fruit = 'Papaya' THEN date ELSE NULL END) 
            OVER (
                PARTITION BY name 
                ORDER BY date ASC 
                ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
            ) AS timestamp_papaya,
        MIN(CASE WHEN fruit = 'Mango' THEN date ELSE NULL END) 
            OVER (
                PARTITION BY name 
                ORDER BY date ASC 
                ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
            ) AS timestamp_mango
    FROM events
)

This table looks like this:

customer_nametimestamp_papayatimestamp_mango
Bob2025-07-042025-06-01
Alice2025-06-02NULL

The second part of the query involves counting the items in the funnel_counts CTE. We assign a label for the relevant fruit name (“Papaya” and “Mango”) in the SELECT statement and COUNT how many distinct customer_names completed each.

For each step of the funnel, we add conditions to narrow down the count. The first step requires that the customer bought a papaya; timestamp_papaya is not NULL.

The second step requires that the customer bought a mango, so timestamp_mango is not NULL. However, we also check that timestamp_mango is after timestamp_papaya—the customer must have bought a mango after a papaya. (The fact that a customer bought a papaya is implicitly checked for by the comparison operator).

SELECT 'Papaya' AS step, 
       COUNT(DISTINCT customer_name) AS customer_count
FROM funnel_counts
WHERE timestamp_papaya IS NOT NULL
UNION ALL
SELECT 'Mango' AS step, 
       COUNT(DISTINCT customer_name) AS customer_count
FROM funnel_counts
WHERE timestamp_mango IS NOT NULL 
  AND timestamp_mango > timestamp_papaya;

So our final result displays the amount of times each customer bought a papaya and mango sequentially, which is what a “funnel analysis” is!

StepCount
Papaya2
Mango0

Wrapping It Up

As we’ve shown above, funnel analysis in SQL isn’t too easy. The easiest solutions are not actually funnels, while using a series of joins is impractical for large databases.

The approach using window functions outlined above offers the best path forward for efficiency, but it isn’t the most syntactically concise or pretty either.

Nevertheless, SQL databases offer a number of advantages like high write volume and efficient queries, so it’s a tradeoff that many developers, like me me, are willing to make.

If you’re interested in a cookie-free analytics solution, take a look at my project, Cookieless.