Aggregate functions

Overview

COUNT, SUM, AVG, MIN, and MAX are aggregate functions that can be used with or without the GROUP BY and HAVING clauses. When used without GROUP BY or HAVING, these functions perform the action that they are named after. For example, let’s look at our database from the joins page:

Orders
id description customer_id value

1

Water bottle

1

15.00

2

Key chain

1

7.50

3

Computer

3

2000.00

4

Thumb drive

3

25.00

5

Notebook

4

9.00

6

Shampoo

5.00

7

Paper

4.00

Customers
id first_name last_name email

1

Natalie

Wright

[email protected]

2

Ana

Sousa

[email protected]

3

Ben

Schwartz

[email protected]

4

Chen

Xi

[email protected]

5

Frank

Zhang

[email protected]

6

Tianchi

Liu

[email protected]

7

Jake

Jons

[email protected]

To count the number of rows in a table, for example, the orders table, you could run the following query.

SELECT COUNT(*) FROM orders;
COUNT(*)

7

As you can see, by default the column header for the resulting row is COUNT(*). To rename the column, you can create an alias.

SELECT COUNT(*) AS 'number of results' FROM orders;
number of results

7

COUNT only returns the number of rows returned by the query. If the query only returns 4 rows, 4 will be the result of COUNT.

SELECT COUNT(*) AS 'number of results' FROM orders WHERE value > 10;
number of results

3

In COUNT(*), the * is a wildcard that returns all columns from the selected table. In our examples with the orders table, this is the same as running the following query:

SELECT id, description, customer_id, value FROM orders;

You can, of course, selected a specific column, however, if there are any NULL values for the selected column, they will not be counted.

SELECT COUNT(customer_id) FROM orders;
COUNT(customer_id)

5

The rest of the functions work the same way. For example, SUM returns the sum of all of the values in the selected rows.

SELECT SUM(value) AS 'our sum' FROM orders WHERE value > 10;
our sum

2040.0

MIN, MAX, and AVG return the minimum, maximum, and average, respectively.

SELECT MIN(value) AS minimum, MAX(value) AS maximum, AVG(value) AS avg FROM orders WHERE value > 10;
minimum maximum avg

15.0

2000.0

680.0


GROUP BY

The capabilities of the aggregate functions are not realized until used in conjunction with the GROUP BY clause. The GROUP BY clause goes after the WHERE clause (if WHERE clause is present). GROUP BY forces the SQL engine to calculate the aggregate function after the data has been grouped by one or more columns. For example, both of the following queries work. The first calculates the sum of the value of all orders by the category of the order. The second does the same thing, but excludes all rows where value ⇐ 10 before performing the operations.

SELECT category, SUM(value) FROM orders GROUP BY category;
category|SUM(value)

beauty products|5.0

electronics|2025.0

misc|22.5

office|13.0

SELECT category, SUM(value) FROM orders WHERE value > 10 GROUP BY category;
category SUM(value)

electronics

2025.0

misc

15.0

This is powerful when you want to calculate values for many categories. For example, if we wanted to count the number of orders for each category, we could do so with the following query.

SELECT category, COUNT(*) FROM orders GROUP BY category;
category COUNT(*)

beauty products

1

electronics

2

misc

2

office

2

Or, you could get all of the statistics when grouped by category.

SELECT category, COUNT(*), SUM(value), MIN(value), MAX(value), AVG(value) FROM orders GROUP BY category;
category COUNT(*) SUM(value) MIN(value) MAX(value) AVG(value)

beauty products

1

5.0

5.0

5.0

5.0

electronics

2

2025.0

25.0

2000.0

1012.5

misc

2

22.5

7.5

15.0

11.25

office

2

13.0

4.0

9.0

6.5


HAVING

HAVING adds one more tool to the tool chest. HAVING is similar to the WHERE clause but instead of being applied to every row before grouping, it is applied on the aggregated data. A single query can have both a WHERE clause and a HAVING clause. The WHERE clause will filter the data before aggregation, and the HAVING clause will filter the data after aggregation. The following is an example that tries to demonstrate this slightly subtle difference.

In this example, the SUM of the value is calculated for each category, and afterwards, only results are included where the SUM of the value is greater than 10.

SELECT
	category,
	SUM(value)
FROM
	orders
GROUP BY
	category
HAVING
	value > 10;

The following are also equivalent:

SELECT
	category,
	SUM(value)
FROM
	orders
GROUP BY
	category
HAVING
	SUM(value) > 10;
SELECT
	category,
	SUM(value) AS 's_value'
FROM
	orders
GROUP BY
	category
HAVING
	s_value > 10;
category sum(value)

electronics

2025.0

misc

22.5

In this example, all rows where value ⇐ 10 are excluded, then the remaining rows are grouped by category, and the SUM of value is calculated for the groups.

SELECT
	category,
	SUM(value)
FROM
	orders
WHERE
	value > 10
GROUP BY
	category;
category sum(value)

electronics

2025.0

misc

15.0


Examples

All of the following examples are taken from the chinook database found at /anvil/projects/tdm/data/chinook/Chinook_Sqlite.sqlite


Find the average length (in seconds) of the tracks with genre Pop.

Click to see solution
SELECT
	AVG(Milliseconds / 1000.0) AS avg
FROM
	Track
WHERE
	genreId = 9;
avg

229.0341

Figure out what the longest track (in seconds) is for the "Bossa Nova" genre.

Click to see solution
SELECT
	GenreId
FROM
	Genre
WHERE
	Name = 'Bossa Nova';
GenreId

11

SELECT
	*,
	MAX(Milliseconds / 1000.0) AS Seconds
FROM
	Track
WHERE
	genreId = 11;
TrackId Name AlbumId MediaTypeId GenreId Composer Milliseconds Bytes UnitPrice Seconds

646

Samba Da Bênção

52

1

11

NA

409965

13490008

0.99

409.965

Get the average price per hour for Bossa Nova music (genreId of 11).

Click to see solution
SELECT
	AVG(UnitPrice / (Milliseconds / 1000.0 / 3600)) AS 'Price per Hour'
FROM
	Track
WHERE
	genreId = 11;
Price per Hour

17.7131539609759

Get the average time (in seconds) for tracks by genre.

Click to see solution
SELECT
	genreId,
	AVG(Milliseconds / 1000.0) AS 'Average seconds per track'
FROM
	Track
GROUP BY
	genreId;
GenreId Average seconds per track

1

283.9100

2

291.7554

3

309.7494

4

234.3538

5

134.6435

6

270.3598

7

232.8593

8

247.1778

9

229.0341

10

244.3709

You can take this 1 step further by getting the genre names. This requires using an inner join, however.

SELECT
	g.Name,
	track_time. 'Average seconds per track'
FROM
	Genre AS g
	INNER JOIN (
		SELECT
			genreId,
			AVG(Milliseconds / 1000.0) AS 'Average seconds per track'
		FROM
			Track
		GROUP BY
			genreId) AS track_time ON g.GenreId = track_time.GenreId
ORDER BY
	track_time. 'Average seconds per track' DESC;
Name Average seconds per track

Sci Fi & Fantasy

2911.7830

Science Fiction

2625.5491

Drama

2575.2838

TV Shows

2145.0410

Comedy

1585.2637

Metal

309.7494

Electronica/Dance

302.9858

Heavy Metal

297.4529

Classical

293.8676

Jazz

291.7554

Figure out the average price per track for each genre.

Click to see solution
SELECT
	genreId,
	AVG(UnitPrice) AS 'Average price per track'
FROM
	Track
GROUP BY
	genreId;
GenreId Average price per track

1

0.99

2

0.99

3

0.99

4

0.99

5

0.99

6

0.99

7

0.99

8

0.99

9

0.99

10

0.99

What is the average number of tracks per album?

Click to see solution
SELECT
	AVG(trackCount)
FROM (
	SELECT
		COUNT(*) AS trackCount
	FROM
		Track
	GROUP BY
		albumId) AS track_count;
AVG(trackCount)

10.0951

Figure out the average number of tracks per album genre.

Click to see solution
SELECT
	genreId,
	AVG(trackCount)
FROM (
	SELECT
		genreId,
		COUNT(*) AS trackCount
	FROM
		Track
	GROUP BY
		albumId) AS track_count
GROUP BY
	genreId;
genreId AVG(trackCount)

1

11.41379

2

10.00000

3

10.90625

4

14.43478

5

12.00000

6

13.85714

7

14.81579

8

15.00000

9

16.00000

10

10.75000

Of course, we can add the names of genres in, using joins.

SELECT
	Name,
	avg_track_count. 'Average Track Count'
FROM
	Genre AS g
	INNER JOIN (
		SELECT
			genreId,
			AVG(trackCount) AS 'Average Track Count'
		FROM (
			SELECT
				genreId,
				COUNT(*) AS trackCount
			FROM
				Track
			GROUP BY
				albumId) AS track_count
		GROUP BY
			genreId) AS avg_track_count ON g.GenreId = avg_track_count.genreId;
Name Average Track Count

Rock

11.41379

Jazz

10.00000

Metal

10.90625

Alternative & Punk

14.43478

Rock And Roll

12.00000

Blues

13.85714

Latin

14.81579

Reggae

15.00000

Pop

16.00000

Soundtrack

10.75000