SQL has come a long way since it was first introduced in the 1970s. This is part 2 of our mini-series on our favourite modern SQL language features.
Read Modern SQL: The Latest and the Greatest SQL Features (part 1). The first part covered the first 5 topics, and this article goes through the final 5.
- Filter results with QUALIFY
- RANGE and GROUPS in windows (not just ROWS!)
- EXCLUDE rows in a window frame
- NULL-safe comparison with IS [NOT] DISTINCT FROM
- Expressive join types (SEMI/ANTI, ASOF, LATERAL)
- Transpose tables with PIVOT and UNPIVOT
- Advanced GROUP BY options (ROLLUP, GROUPING SETS, CUBE)
- Simplified grouping with GROUP BY ALL
- First-class support for compound data types
- Recursive common table expressions
6. Transpose tables with PIVOT and UNPIVOT
A fairly common SQL operation we need to do is transpose some data (convert row values to column headers, and vice versa).
This is particularly important for pivoting long data into a wide table for humans to read more easily, and for unpivoting a wide table into a long table for a database to handle more easily.
It’s very common for a “pivot” to be implemented with CASE expressions, such as this query (DuckDB-compatible syntax) which aggregates sales into one row per year and one column per continent:
select
    sales_year,
    sum(case when continent = 'asia' then revenue end) as asia,
    sum(case when continent = 'africa' then revenue end) as africa,
    sum(case when continent = 'north_america' then revenue end) as north_america,
    sum(case when continent = 'south_america' then revenue end) as south_america,
    sum(case when continent = 'antarctica' then revenue end) as antarctica,
    sum(case when continent = 'australia' then revenue end) as australia,
    sum(case when continent = 'europe' then revenue end) as europe,
from sales
group by sales_year
order by sales_year
Many databases support a PIVOT clause to do this for us! For example, the above can be rewritten as something like the following (different databases have slightly different PIVOT semantics):
select *
from sales
pivot (
    sum(revenue)
    for continent in (
        'asia',
        'africa',
        'north_america',
        'south_america',
        'antarctica',
        'australia',
        'europe'
    )
    group by sales_year
)
If you’ve used PIVOT before, you know that there are some quirks to it which can make PIVOT queries grow in complexity. In most databases, you also need to explicitly list the values to pivot on as this defines the columns in the result set.
The DuckDB developers thought that the PIVOT clause could do with some improvement, so while DuckDB supports the “usual” PIVOT syntax, there’s also a “simplified” implementation which automatically expands the column’s distinct values into columns:
pivot sales
on continent
using sum(revenue)
group by sales_year
This is considerably simpler and more readable! DuckDB FTW 😄 If you use DuckDB more, you’ll find that DuckDB’s “friendly SQL” is a great implementation of what modern SQL should look like.
Databases that support PIVOT generally also support UNPIVOT, which is the converse: move values from columns into rows. We won’t go into more details here — check the docs for your database for more information.
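That said, to give a flavour of the syntax, here’s a hedged sketch using DuckDB’s simplified UNPIVOT form, assuming a hypothetical sales_wide table with one revenue column per continent:

unpivot sales_wide
on asia, africa, north_america, south_america, antarctica, australia, europe
into
    name continent
    value revenue

This folds the seven continent columns back into one (continent, revenue) pair per row.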
The PIVOT (and UNPIVOT) clause is available in many databases; check your database’s documentation for the exact syntax and semantics it supports.
7. Advanced GROUP BY options (ROLLUP, GROUPING SETS, CUBE)
Your analysts will love these!
The GROUP BY clause is one of the very first clauses you learn with SQL, but it’s far less common to learn about the three options that most databases support with it: ROLLUP, GROUPING SETS, and CUBE.
We know that GROUP BY will calculate the aggregate values for the specified columns. These three options will then calculate various subtotals and grand totals, too!
These options can be specified between GROUP BY and the columns in the grouping:
FROM ...
GROUP BY [ ROLLUP | GROUPING SETS | CUBE ] ( ... )
Some databases also support an alternative syntax, which is to write WITH ROLLUP or WITH CUBE after the list of columns.
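In MySQL, for instance, a two-column rollup looks something like this (a sketch; only the placement of WITH ROLLUP matters here):

select sales_year, product_id, sum(amount) as total_sold
from sales
group by sales_year, product_id with rollup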
PivotTable outputs with ROLLUP
One of the things we love about Excel’s PivotTables is that they automatically come with subtotals and grand totals for the groups, in addition to the low-level aggregates.
The ROLLUP option brings that convenience to SQL: the output will include additional rows which include subtotals for some of the columns. For example:
select
    sales_year,
    product_id,
    sum(amount) as total_sold
from sales
group by rollup (
    sales_year,
    product_id
)
order by sales_year, product_id
| sales_year | product_id | total_sold |
| --- | --- | --- |
| 2024 | X | 30388.33 |
| 2024 | Y | 24510.35 |
| 2024 | Z | 26366.21 |
| 2024 | null | 81264.89 |
| 2025 | X | 66347.84 |
| 2025 | Y | 36066.47 |
| 2025 | Z | 39434.05 |
| 2025 | null | 141848.36 |
| null | null | 223113.25 |
The rows with the NULL values are the subtotal rows. Like an Excel PivotTable, this only gives the subtotals “from right to left” — that is, we get the yearly subtotals, but we don’t get the product ID subtotals. This is where GROUPING SETS comes in.
Granular subtotals with GROUPING SETS
Instead of specifying a list of columns in GROUPING SETS, we specify a list of lists of columns. This defines each of the subtotals that we want to see in the result set, with an empty list () used for the grand total.
To get the same output as using ROLLUP in the previous example, we would explicitly list the combinations of columns that we want in the result set:
select
    sales_year,
    product_id,
    sum(amount) as total_sold
from sales
group by grouping sets (
    (),                       -- Grand totals
    (sales_year),             -- Yearly subtotals
    (sales_year, product_id)  -- Year by product aggregates
)
order by sales_year, product_id
This makes it very easy to configure precisely which summaries we want. For example, including the product ID subtotals is as easy as adding (product_id) as another grouping set:
select
    sales_year,
    product_id,
    sum(amount) as total_sold
from sales
group by grouping sets (
    (),
    (sales_year),
    (product_id),
    (sales_year, product_id)
)
order by sales_year, product_id
| sales_year | product_id | total_sold |
| --- | --- | --- |
| 2024 | X | 30388.33 |
| 2024 | Y | 24510.35 |
| 2024 | Z | 26366.21 |
| 2024 | null | 81264.89 |
| 2025 | X | 66347.84 |
| 2025 | Y | 36066.47 |
| 2025 | Z | 39434.05 |
| 2025 | null | 141848.36 |
| null | X | 96736.17 |
| null | Y | 60576.82 |
| null | Z | 65800.26 |
| null | null | 223113.25 |
All subtotals with CUBE
If you’ve worked in data analytics, you might know that precomputing the aggregates for all combinations of columns is a technique some BI layers use to optimise dashboard and reporting performance: historically, looking a value up in a large table has been much quicker than computing aggregates on the fly.
This is the whole purpose of OLAP cubes, and the CUBE option for GROUP BY does exactly this!
Adding the product ID subtotal to the previous GROUPING SET example meant that we were computing the subtotals for all combinations of the sales year and product ID. Rather than explicitly list all these combinations in GROUPING SETS, we can just specify the sales year and product ID in the CUBE option and it will derive all combinations for us:
select
    sales_year,
    product_id,
    sum(amount) as total_sold
from sales
group by cube (sales_year, product_id) -- Equivalent to the previous example
order by sales_year, product_id
With the improvement of OLAP capabilities in traditional databases, and the rise of OLAP-oriented data platforms, OLAP cubes aren’t needed as much anymore. Still a cool feature to know about though!
Use GROUPING_ID to distinguish genuine NULL values
You may be wondering how to distinguish “genuine” NULL values from the NULL values introduced by the subtotal rows. Databases that support these options will include a special function to accompany them, usually called GROUPING_ID (or something similar), which returns a number corresponding to the combination of columns the row belongs to:
select
    grouping_id(sales_year, product_id) as group_id,
    sales_year,
    product_id,
    sum(amount) as total_sold
from sales
group by cube (sales_year, product_id)
order by sales_year, product_id
| group_id | sales_year | product_id | total_sold |
| --- | --- | --- | --- |
| 0 | 2024 | X | 30388.33 |
| 0 | 2024 | Y | 24510.35 |
| 0 | 2024 | Z | 26366.21 |
| 1 | 2024 | null | 81264.89 |
| 0 | 2025 | X | 66347.84 |
| 0 | 2025 | Y | 36066.47 |
| 0 | 2025 | Z | 39434.05 |
| 1 | 2025 | null | 141848.36 |
| 2 | null | X | 96736.17 |
| 2 | null | Y | 60576.82 |
| 2 | null | Z | 65800.26 |
| 3 | null | null | 223113.25 |
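The number is a bit mask: reading the function’s arguments left to right, a bit is 1 when that column has been aggregated away in that row (so 3 = binary 11 marks the grand total). A hedged sketch that uses this to label each row:

select
    case grouping_id(sales_year, product_id)
        when 0 then 'year x product'
        when 1 then 'year subtotal'    -- product_id aggregated away
        when 2 then 'product subtotal' -- sales_year aggregated away
        when 3 then 'grand total'
    end as row_type,
    sales_year,
    product_id,
    sum(amount) as total_sold
from sales
group by cube (sales_year, product_id)
order by sales_year, product_id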
These GROUP BY options are supported by most databases that support OLAP workloads.
8. Simplified grouping with GROUP BY ALL
While we’re talking about GROUP BY, we should also give a special mention to GROUP BY ALL: instead of listing columns explicitly, this will infer which columns to group by from the corresponding SELECT clause!
For example, using GROUP BY ALL in the following query automatically figures out to include the sales year, product ID, and region ID as groups:
select
    sales_year,
    product_id,
    region_id,
    count(*) as sales_volume,
    sum(amount) as sales_value,
    avg(amount) as average_value
from sales
group by all
This is particularly helpful for development since it’s easy to adjust grouping columns on the fly, but we’d caution against using this in a production query: it’s too easy to accidentally include another column in the SELECT clause and break the grouping!
When it comes to production GROUP BY statements, explicit is better than implicit.
GROUP BY ALL is still fairly new, but most of the OLAP databases support it.
Many of these also support ORDER BY ALL, which orders by all columns in the result set!
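For example (a quick sketch), pairing the two gives a fully implicit but deterministic query:

select sales_year, product_id, count(*) as sales_volume
from sales
group by all
order by all -- sorts by sales_year, then product_id, then sales_volume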
9. First-class support for compound data types
Once upon a time, it was discouraged to store compound values (values composed of several values, such as a JSON object) in relational database columns.
For operational systems, this has been good advice since relational databases work best when we follow the relational model. However, compound data sometimes finds its way into a relational database anyway, and many databases are starting to offer first-class support for handling them.
Being able to transform data that isn’t only relational is a key aspect of modern SQL, particularly for analytical workloads.
Support for compound data types
The modern cloud data platforms all support several compound data types out of the box. The specifics will depend on the database, but they all have their own concept of:
- Arrays
- Maps
- Structs
There’s also nuance around the specific implementations of these types — for example, Snowflake has a generic VARIANT type, but also structured OBJECT and MAP types for various type constraints and compositions of types.
Functions for compound data types
In addition to these data types, modern databases also offer a suite of functions to operate on these compound data types.
For example, DuckDB has a page dedicated to the functions for each of its compound data types, such as:
- https://duckdb.org/docs/stable/sql/functions/array
- https://duckdb.org/docs/stable/sql/functions/map
- https://duckdb.org/docs/stable/sql/functions/struct.html
DuckDB in particular also has specialised syntax for constructing compound data types, which feels a lot like Python:
select
    [1, 2, 3] as list_,
    [1, 2, 3]::int[3] as array_,
    {'a': 1, 'b': false} as struct_,
    map(['m', 'n'], [2, 3]) as map_,
    '{"x": 4, "y": 5}'::json as json_,
Most databases support some level of compound data types; check your database’s documentation for the full details of its support.
10. Recursive common table expressions
Recursive CTEs aren’t a new addition to the SQL language, but they’re still a fairly mysterious concept to most people.
The best way to think about recursive CTEs is that they’re SQL’s equivalent of a while loop: until some condition is met, keep running a query.
There are two main reasons to use recursive CTEs:
- Generating rows of data
- Flattening hierarchies (or, more generally, graph traversal)
For example, generating a date axis is easy with a recursive CTE — the following generates a row for every day in 2026 (DuckDB syntax):
with recursive dates(dt) as (
    select '2026-01-01'::date
    union all
    select dt + interval '1 day'
    from dates
    where dt < '2026-12-31'::date
)
from dates -- DuckDB's FROM-first shorthand for "select * from dates"
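Hierarchy flattening follows the same shape. Here’s a hedged sketch, assuming a hypothetical employees table with id, name, and manager_id columns (NULL manager_id at the top of the tree):

with recursive org(id, name, path) as (
    select id, name, [name] as path
    from employees
    where manager_id is null
    union all
    select e.id, e.name, list_append(o.path, e.name)
    from employees e
    join org o on e.manager_id = o.id
)
from org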
This isn’t a tutorial for recursive CTEs so we won’t go into details here, but there are plenty of tutorials online if you want to wrap your head around these.
DuckDB-specific USING KEY option
If you’ve used recursive CTEs before, then you know how they usually work: each iteration appends the newest result set to all the previous iterations’ result sets.
This is okay for most use-cases, but there are times when we’d rather replace some of the intermediate results, rather than only append to them. This is particularly important when the recursive CTE is chunking through a lot of data, as this can gobble up a lot of space.
DuckDB has a special USING KEY option for recursive CTEs which allows you to define the “primary key” of the CTE. Then, each iteration will merge into the intermediate results, rather than append. This can be super helpful!
More details can be found in the corresponding blog post from the DuckDB devs.
Cool SQL projects powered by recursive CTEs
It may not seem like it, but recursive CTEs unlock a huge amount of potential. Some people have taken this to the extreme by doing things like:
- Generating the Mandelbrot set
- Building a Sudoku solver
- Building a functional DOOM emulator
- Building a functional Tetris game
As far as we know, all major SQL databases support recursive CTEs (with Databricks adding support half-way through 2025).
Fun fact: recursive CTEs make the SQL language Turing complete!
The best data pipelines know when to use these features
At Tasman, we write a lot of SQL — data is our business, and SQL is the language of data.
We don’t use all of these features daily, but knowing which feature to use when a scenario calls for it helps us to build clean and resilient pipelines:
- QUALIFY keeps the filtering of window function results next to the window function usage
- IS DISTINCT FROM saves a pipeline from failing at 3am because a null value has crept in
- GROUPING SETS powers custom aggregates in a few lines, rather than several near-identical queries
- Support for complex data types enables ingesting more sources while keeping transformation logic all together
- Recursive CTEs limit company hierarchy flattening to one join rather than 20
The best part? These databases optimise queries best when we use their language-specific features rather than rolling our own logically similar alternatives, which keeps runtimes and credit consumption down.
Are any of these features new to you? What modern SQL features would you add?