Funnel Analysis in SQL
The right and wrong ways to write funnel analysis in SQL.

SQL is the standardized language of relational databases. And Postgres has become an increasingly popular choice for analytics databases, particularly with extensions like Timescale. Funnel-analysis is an essential component of many analytics solutions, yet documentation about it is sparse (and often lacking).
So how can one write simple, efficient queries for SQL analysis?…
What is Funnel Analysis?
Seems like a question worth answering. Imagine we are a store that sells tropical fruits. Suppose we have two customers, Bob and Alice.
Bob buys a papaya just in time for Summer, on June 1st. Alice does the same the next day, on June 2nd.
They are soon very happy with their purchase, especially Bob. So he comes back for more.
Bob then buys a mango on the 4th of July.
Suppose we mapped out their purchases on a table. It might look like this:
Name | Fruit | Date |
---|---|---|
Bob | Papaya | 2025-06-01 |
Alice | Papaya | 2025-06-02 |
Bob | Mango | 2025-07-04 |
Great! So how does this relate to funnel queries. Well, suppose we wanted to know which customers did two things:
- Bought a Papaya
- Bought a Mango
We could do that with a query like this:
SELECT 'Papaya' AS step, COUNT(*) FROM events
UNION
SELECT 'Mango' AS step, COUNT(*) FROM events
ORDER BY COUNT DESC;
The result would look something like:
Step | Count |
---|---|
Papaya | 2 |
Mango | 1 |
That seems accurate right?
Well, there’s a problem.
Suppose Bob actually bought a Mango before a Papaya. Our table would look like this:
Name | Fruit | Date |
---|---|---|
Bob | Mango | 2025-06-01 |
Alice | Papaya | 2025-06-02 |
Bob | Papaya | 2025-07-04 |
Now, we run our same query, and get the same result:
Step | Count |
---|---|
Papaya | 2 |
Mango | 1 |
But the problem is that Bob bought a papaya after buying a Mango. So the results should actually look like this:
Step | Count |
---|---|
Papaya | 1 |
Mango | 0 |
The problem with the above query is that it doesn’t count for order. To perform funnel analysis, we’d want to know who buys a papaya and then mango.
So Now What?
That’s where Funnel Queries kick in. We need to be able to count how many people purchased a papaya and then a mango.
Despite its numerous strengths, SQL databases remain quite bad at performing sequential queries. Let me show you why:
Let’s try one approach—to use a left join.
WITH papaya_purchases AS (
SELECT
name AS customer_name,
date AS papaya_date
FROM events
WHERE fruit = 'Papaya'
),
mango_purchases AS (
SELECT
name AS customer_name,
date AS mango_date
FROM events
WHERE fruit = 'Mango'
)
SELECT
p.customer_name,
p.papaya_date,
m.mango_date
FROM papaya_purchases p
LEFT JOIN mango_purchases m
ON p.customer_name = m.customer_name
AND m.mango_date > p.papaya_date
WHERE m.mango_date IS NOT NULL;
Well, that result is accurate. Our output looks like what it’s supposed to:
Step | Count |
---|---|
Papaya | 2 |
Mango | 0 |
But what if I wanted to keep track of other fruits? Like pineapples, guavas, and coconuts? I’d need to keep joining each table with another, causing what’s known as the space and time complexity problem.
Here’s a table illustrating this:
Customer | Papaya Purchase Date | Mango Purchase Date | Comparison Result |
---|---|---|---|
Bob | 2025-06-01 | 2025-07-04 | Valid (Mango > Papaya) |
Bob | 2025-07-04 | 2025-06-01 | Invalid (Mango < Papaya) |
Alice | 2025-06-02 | NULL | No Mango Purchase |
Each papaya purchase is compared with every mango purchase for the same customer to determine if the mango purchase occurred after the papaya purchase; multiple comparisons are made for each customer. That’s fine for our use case, but what happens when we have more customers?
Number of Fruits | Number of Transactions per Customer | Total Comparisons per Customer |
---|---|---|
2 | 10 | 45 |
3 | 10 | 120 |
4 | 10 | 210 |
2 | 100 | 4,950 |
3 | 100 | 161,700 |
4 | 100 | 392,122 |
As the number of fruits and transactions grows, the number of comparisons increases exponentially, making this strategy unfeasible for large (or medium-sized) datasets.
To avoid making comparisons for each customer and transactions, we can use a combination of window functions.
WITH funnel_counts AS (
SELECT
name AS customer_name,
MIN(CASE WHEN fruit = 'Papaya' THEN date ELSE NULL END)
OVER (
PARTITION BY name
ORDER BY date ASC
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
) AS timestamp_papaya,
MIN(CASE WHEN fruit = 'Mango' THEN date ELSE NULL END)
OVER (
PARTITION BY name
ORDER BY date ASC
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
) AS timestamp_mango
FROM events
)
SELECT 'Papaya' AS step,
COUNT(DISTINCT customer_name) AS customer_count
FROM funnel_counts
WHERE timestamp_papaya IS NOT NULL
UNION ALL
SELECT 'Mango' AS step,
COUNT(DISTINCT customer_name) AS customer_count
FROM funnel_counts
WHERE timestamp_mango IS NOT NULL
AND timestamp_mango > timestamp_papaya;
Let’s break it down:
The first part of the funnel query is a CTE that selects from the events table. We select the name and alias it as customer_name. Through the MIN
function, we select the earliest date a customer bought a fruit, and then alias it as a field with the format timestamp_[fruitname].
WITH funnel_counts AS (
SELECT
name AS customer_name,
MIN(CASE WHEN fruit = 'Papaya' THEN date ELSE NULL END)
OVER (
PARTITION BY name
ORDER BY date ASC
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
) AS timestamp_papaya,
MIN(CASE WHEN fruit = 'Mango' THEN date ELSE NULL END)
OVER (
PARTITION BY name
ORDER BY date ASC
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
) AS timestamp_mango
FROM events
)
This table looks like this:
customer_name | timestamp_papaya | timestamp_mango |
---|---|---|
Bob | 2025-07-04 | 2025-06-01 |
Alice | 2025-06-02 | NULL |
The second part of the query involves counting the items in the funnel_counts
CTE. We assign a label for the relevant fruit name (“Papaya” and “Mango”) in the SELECT
statement and COUNT how many distinct customer_names completed each.
For each step of the funnel, we add conditions to narrow down the count. The first step requires that the customer bought a papaya; timestamp_papaya
is not NULL
.
The second step requires that the customer bought a mango, so timestamp_mango
is not NULL
. However, we also check that timestamp_mango
is after timestamp_papaya
—the customer must have bought a mango after a papaya. (The fact that a customer bought a papaya is implicitly checked for by the comparison operator).
SELECT 'Papaya' AS step,
COUNT(DISTINCT customer_name) AS customer_count
FROM funnel_counts
WHERE timestamp_papaya IS NOT NULL
UNION ALL
SELECT 'Mango' AS step,
COUNT(DISTINCT customer_name) AS customer_count
FROM funnel_counts
WHERE timestamp_mango IS NOT NULL
AND timestamp_mango > timestamp_papaya;
So our final result displays the amount of times each customer bought a papaya and mango sequentially, which is what a “funnel analysis” is!
Step | Count |
---|---|
Papaya | 2 |
Mango | 0 |
Wrapping It Up
As we’ve shown above, funnel analysis in SQL isn’t too easy. The easiest solutions are not actually funnels, while using a series of joins is impractical for large databases.
The approach using window functions outlined above offers the best path forward for efficiency, but it isn’t the most syntactically concise or pretty either.
Nevertheless, SQL databases offer a number of advantages like high write volume and efficient queries, so it’s a tradeoff that many developers, like me me, are willing to make.
If you’re interested in a cookie-free analytics solution, take a look at my project, Cookieless.