OverviewTeaching: 30 min
Exercises: 30 minQuestions
How do you aggregate records in SQL?Objectives
to understand how to aggregate records in SQL
Aggregation allows us to combine results by grouping records based on value and calculating combined values in groups.
Let’s go to the articles table and find out how many entries there are. Using the wildcard simply counts the number of records (rows)
SELECT COUNT(*) FROM articles;
We can also find out how many authors have participated in these articles.
SELECT COUNT(*), SUM(author_count) FROM articles;
There are many other aggregate functions included in SQL including
Write a query that returns the total, average, minimum and maximum number of citations of all of the articles.
SELECT SUM(citation_count), AVG(citation_count), MIN(citation_count), MAX(citation_count) FROM articles;
Now, let’s see how many articles were published in each journal. We do this
GROUP BY clause
SELECT issns, COUNT( * ) FROM articles GROUP BY issns;
GROUP BY tells SQL what field or fields we want to use to aggregate the data.
If we want to group by multiple fields, we give
GROUP BY a comma separated list.
Write queries that return:
- How many citations that were counted each month a) in total; b) per journal.
- Average number of citations of each journal in each month.
- Can you modify the above queries combining them into one?
SELECT month, SUM(citation_count) FROM articles GROUP BY month;
SELECT month, issns, SUM(citation_count) FROM articles GROUP BY month, issns;
SELECT month, issns, AVG(citation_count) FROM articles GROUP BY month, issns;
SELECT month, issns, AVG(citation_count), SUM(citation_count) FROM articles GROUP BY month, issns;
In the previous lesson, we have seen the keywords
WHERE, allowing to
filter the results according to some criteria. SQL offers a mechanism to
filter the results based on aggregate functions, through the
For example, we can adapt the last request we wrote to only return information about journals with 10 or more published articles:
SELECT issns, COUNT( * ) FROM articles GROUP BY issns HAVING COUNT( * ) >= 10;
HAVING keyword works exactly like the
WHERE keyword, but uses
aggregate functions instead of database fields.
If you use
AS in your query to rename a column,
HAVING can use this
information to make the query more readable. For example, in the above
query, we can call the
COUNT(*) by another name, like
occurrences. This can be written this way:
SELECT issns, COUNT( * ) AS occurrences FROM articles GROUP BY issns HAVING occurrences >= 10;
Note that in both queries,
HAVING comes after
GROUP BY. One way to
think about this is: the data are retrieved (
SELECT), can be filtered
WHERE), then joined in groups (
GROUP BY); finally, we only select some
of these groups (
Write a query that returns, from the
articlestable, the average number of
citationsfor each journal, only for the journals with 5 or more citations on average.
SELECT issns, AVG(citation_count) FROM articles GROUP BY issns HAVING AVG(citation_count)>=5;
Write a query that returns, from the
articlestable: the total, average, min and max number of citations for each journal. Can you modify it so that it outputs only journals with more than 5 citations in average?
SELECT issns, SUM(citation_count), AVG(citation_count), MIN(citation_count), MAX(citation_count) FROM articles GROUP BY issns HAVING AVG(citation_count)>5;
Ordering aggregated results.
We can order the results of our aggregation by a specific column, including the aggregated column. Let’s count the number of articles published in each journal, ordered by the count
SELECT issns, COUNT( * ) FROM articles GROUP BY issns ORDER BY COUNT( * ) DESC;
Saving queries for future use
It is not uncommon to repeat the same operation more than once, for example for monitoring or reporting purposes. SQL comes with a very powerful mechanism to do this: views. Views are queries saved in the database. You query it as a (virtual) table, populated every time you query it.
Creating a view from a query requires to add
CREATE VIEW viewname AS
before the query itself. For example, if we want to save the query giving
the number of journals in a view, we can write
CREATE VIEW journal_counts AS SELECT issns, COUNT(*) FROM articles GROUP BY issns;
Now, we will be able to access these results with a much shorter notation:
SELECT * FROM journal_counts;
Assuming we do not need this view anymore, we can remove it from the database almost as we would a table:
DROP VIEW journal_counts;
You can also add a view using Create View in the View menu and see the results in the Views tab just like a table
Write a query that returns the number of articles published in each journal on each month, sorted from most popular journal to the ones with least publications each month starting from the most recent records. Save this query as a
CREATE VIEW journal_counts AS SELECT COUNT(*), month FROM articles GROUP BY issns, month ORDER BY count(*) desc, month desc;
SQL is ideal for aggregating database records