Home

Ranking results in a subquery using ROW_NUMBER()

December 16th, 2018

I was recently looking at some data that I wanted to rank. The problem was I did not want to see all the data in the table ranked. I just wanted to see a subset of the data, but with each row ranked among the entire dataset. Let me layout an example to illustrate the situation.

Let us imagine a table that contains the average wait time for attractions at Magic Kingdom in Walt Disney World. Our data might look like this:

Row Attraction Zone WaitTime
1 Astro Orbiter Tomorrowland 20
2 The Barnstormer Fantasyland 30
3 Big Thunder Mountain Railroad Frontierland 40
4 Country Bear Jamboree Frontierland 15
5 Dumbo the Flying Elephant Fantasyland 20
6 Enchanted Tales with Belle Fantasyland 25
7 Haunted Mansion Liberty Square 90
8 Jungle Cruise Adventureland 120
9 Pirate of the Caribbean Adventureland 90
10 Seven Dwarfs Mine Train Fantasyland 110
11 Space Mountain Tomorrowland 55
12 Splash Mountain Frontierland 60

Now let us say we want to rank these attractions by WaitTime. We might execute this query:

SELECT ATTRACTION, ZONE, WAITTIME, ROWNUM
FROM ATTRACTIONS ORDER BY WAITTIME DESC;

We could expect to get these results back:

Row Attraction Zone WaitTime Rownum
1 Jungle Cruise Adventureland 120 8
2 Seven Dwarfs Mine Train Fantasyland 110 10
3 Haunted Mansion Liberty Square 90 7
4 Pirate of the Caribbean Adventureland 90 9
5 Splash Mountain Frontierland 60 12
6 Space Mountain Tomorrowland 55 11
7 Big Thunder Mountain Railroad Frontierland 40 3
8 The Barnstormer Fantasyland 30 2
9 Enchanted Tales with Belle Fantasyland 25 6
10 Astro Orbiter Tomorrowland 20 1
11 Dumbo the Flying Elephant Fantasyland 20 5
12 Country Bear Jamboree Frontierland 15 4

Rownum represents the order that the row was selected in the query, so lets assume that each row was selected in alphabetical order.

Now instead of ranking all the attractions across the entire park lets rank just the ones in Fantasyland. So now our query is:

SELECT ATTRACTION, ZONE, WAITTIME, ROWNUM 
FROM ATTRACTIONS WHERE ZONE = 'Fantasyland' 
ORDER BY WAITTIME DESC;

We could expect to get these results back:

Row Attraction Zone WaitTime Rownum
1 Seven Dwarfs Mine Train Fantasyland 110 10
2 The Barnstormer Fantasyland 30 2
3 Enchanted Tales with Belle Fantasyland 25 6
4 Dumbo the Flying Elephant Fantasyland 20 5

Great! We have the attractions in Fantasyland ranked, but we have lost the information indicated how they rank against other rides in the other zones. This is where the window function ROW_NUMBER() comes in. Briefly, window functions are are SQL methods that calculate a single value for each row inside the “window” or a subset of data.

So if we try this query:

SELECT * FROM 
    SELECT ATTRACTION, ZONE, WAITTIME, ROW_NUMBER() 
        OVER(ORDER BY WAITTIME DESC)
        AS RANK FROM ATTRACTIONS)
WHERE ZONE = 'Fantasyland';

We should get:

Row Attraction Zone WaitTime RANK
1 Seven Dwarfs Mine Train Fantasyland 110 2
2 The Barnstormer Fantasyland 30 8
3 Enchanted Tales with Belle Fantasyland 25 9
4 Dumbo the Flying Elephant Fantasyland 20 11

Now we can see that the Seven Dwarfs Mine Train ride has the second highest wait time in the park. Thanks ROW_NUMBER()!

Window functions are really powerful when you are trying to do data analysis.

Check out this page SQL Window Functions Introduction to read more about them!