TLDR: To count related records (and get a zero when there are none) use a LEFT JOIN
. To count related records that match a certain criteria (and get a zero when there are none) use a CASE
statement in the SELECT
fields.
I had a table of school terms and a table of enrolments. Here’s a super simplified example using an INNER JOIN
:
SELECT term.year, enrol.id
FROM term INNER JOIN enrol on term.term_id = enrol.term_id
year | id |
---|---|
… | … |
2021 | 69423 |
2021 | 694170 |
2023 | 69423 |
2023 | 69584 |
2024 | 69456 |
To count the enrolments per year was a simple matter of adding a COUNT
, changing it to a LEFT JOIN
and adding the GROUP BY
:
SELECT term.year, COUNT(enrol.id)
FROM term LEFT JOIN enrol on term.term_id = enrol.term_id
GROUP BY term.year
ORDER BY term.year DESC
year | |
---|---|
2024 | 1 |
2023 | 2 |
2022 | 0 |
2021 | 2 |
… | … |
Note the count of 0 in 2022, that’s what I want! But then when I tried to add a WHERE
clause to only get the enrolments where there was a possible flaw in the data, I stopped getting a zero count for the missing years:
SELECT term.year, COUNT(enrol.id)
FROM term LEFT JOIN enrol on term.term_id = enrol.term_id
WHERE enrol.raw_mark <> enrol.final_mark
GROUP BY term.year
ORDER BY term.year DESC
year | |
---|---|
2023 | 1 |
If I understand correctly, the WHERE
clause is removing the rows (including the rows that only contain a term.year) before they get counted.
The solution (thanks to my DB guru friend TC) is to move the logic from the WHERE
clause up into the SELECT
. Also, now that there are blanks instead of nulls in the second column, we can go back to a regular INNER JOIN
and still get the zero counts:
SELECT term.year,
COUNT(CASE WHEN enrol.raw_mark <> enrol.final_mark THEN 1 END)
FROM term INNER JOIN enrol on term.term_id = enrol.term_id
GROUP BY term.year
ORDER BY term.year DESC
year | |
---|---|
2024 | 0 |
2023 | 1 |
2022 | 0 |
2021 | 0 |
… | … |