Practice Topics
- Aggregation with GROUP BY
- Filtering Using HAVING
- Basic Joins
Aggregation with GROUP BY
Overview
Aggregation
is when you collapse a detailed dataset into a summarized one. Picture an Excel file with a pivot table tab summarizing a more detailed data tab. Aggregation is the equivalent of the pivot table tab.
When you set up a pivot table in Excel, you have to drag fields from a dataset into the rows, columns, and values areas. The values area
is where you determine which measures you want to summarize
, then you set them as a SUM
, MIN
, MAX
, COUNT
, or AVERAGE
. You do the same thing when using aggregation in a SQL query -> identify the measures you want to summarize
, then wrap those fields with common aggregate functions such as SUM()
, COUNT()
, MIN()
, MAX()
, AVG()
, or COUNT(DISTINCT )
.
When setting up a pivot table in Excel, you also define the fields that you want to display and summarize by
in the rows area
. You do the same thing when using aggregation in a SQL query. In this case, you list the fields that you want to summarize by in the SELECT statement
(before the aggregate fields such as SUM()
, COUNT()
, etc.). Then, list those same fields in the GROUP BY
statement (after the FROM
and WHERE
statements). When any field listed in the GROUP BY
has a change in value, the aggregate fields will reset.
Example: Using the AdventureWorks2019.Sales.SalesOrderHeader
table, determine the number of invoices (hint: use COUNT(*) )
and the total sales amount (hint: use SUM(TotalDue) )
for each CustomerId
. Use a table alias name
and be sure to attach the alias to columns in the SELECT
and GROUP BY
.
SELECT a.CustomerID, COUNT(*) AS NumOfInvoices, SUM(a.TotalDue) AS TotalSalesAmount FROM AdventureWorks2019.Sales.SalesOrderHeader a GROUP BY a.CustomerID;
Self Practice
- Write a query to find the average
ListPrice
of all products in theAdventureWorks2019.Production.Product
table. - Write a query to count the number of employees in the
AdventureWorks2019.HumanResources.Employee
table. - Write a query to find the total number of products for each
Color
in theAdventureWorks2019.Production.Product
table. Group byColor
.
SELECT AVG(ListPrice) AS AverageListPrice FROM AdventureWorks2019.Production.Product;
SELECT COUNT(*) AS NumberOfEmployees FROM AdventureWorks2019.HumanResources.Employee;
SELECT Color, COUNT(*) AS NumberOfProducts FROM AdventureWorks2019.Production.Product GROUP BY Color;
Filtering Using HAVING
Overview
Filtering in the HAVING
statement exclusively applies to aggregation
. It is very similar to filtering in the WHERE
statement with one BIG exception -> this filter is applied at the very end
of an aggregate query. By contrast, filtering in the WHERE
statement applies to detailed records before aggregation is applied.
Imagine our Excel comparison from above. Filtering using HAVING
would be equivalent to filtering on the result of a pivot table. Filtering using WHERE
would be equivalent to filtering on the detailed data tab that the pivot table is built upon.
Example: Using the AdventureWorks2019.Sales.SalesOrderHeader
table, determine the number of invoices (hint: use COUNT(*) )
and the total sales amount (hint: use SUM(TotalDue) )
for each CustomerId
. Use a table alias name
and be sure to attach the alias to columns in the SELECT
, GROUP BY
, and HAVING
. Only return records with an aggregated total sales amount less than one million dollars.
SELECT a.CustomerID, COUNT(*) AS NumOfInvoices, SUM(a.TotalDue) AS TotalSalesAmount FROM AdventureWorks2019.Sales.SalesOrderHeader a GROUP BY a.CustomerID HAVING SUM(a.TotalDue) < 1000000;
Self Practice
- Write a query to find the number of
SalesOrderID
for eachCustomerID
in theAdventureWorks2019.Sales.SalesOrderHeader
table, but only include customers with more than 5 orders.
SELECT CustomerID, COUNT(SalesOrderID) AS NumberOfOrders FROM AdventureWorks2019.Sales.SalesOrderHeader GROUP BY CustomerID HAVING COUNT(SalesOrderID) > 5;
Basic Joins
Overview
Table joins
are used to combine two or more tables within the same SQL query. The purpose of brining multiple tables together within the same query is because most queries need to do one of two things:
- Increase the number of fields available to display in a query’s output.
- Filter records based on existing (or non-existing) scenarios found in related tables.
LastNames
to each invoice, you must retrieve them from the AdventureWorks2019.Person.Person
table. To do this, you would follow the below steps:
- Connect the
AdventureWorks2019.Sales.SalesOrderHeader
table to theAdventureWorks2019.Sales.Customer
table based on the common field,CustomerID
. - Connect the
AdventureWorks2019.Person.Person
table to theAdventureWorks2019.Sales.Customer
table based onPersonID
in theCustomer
table andBusinessEntityID
in thePerson
table.BusinessEntityID
is a generic field name forPersonID
, so we treat it as the same. - Select the LastName from the
AdventureWorks2019.Person.Person
table.
Example: As described above, display the CustomerID
and TotalDue
for each invoice from the AdventureWorks2019.Sales.SalesOrderHeader
table, then add customer FirstName
and LastName
from the AdventureWorks2019.Person.Person
table. Remember to use the AdventureWorks2019.Sales.Customer
as a bridge table to connect the SalesOrderHeader
to the Person
table.
SELECT a.CustomerID, a.TotalDue, c.FirstName, c.LastName FROM AdventureWorks2019.Sales.SalesOrderHeader a INNER JOIN AdventureWorks2019.Sales.Customer b ON a.CustomerID = b.CustomerID INNER JOIN AdventureWorks2019.Person.Person c ON b.PersonID = c.BusinessEntityID;
Self Practice
- Write a query to join the
AdventureWorks2019.Person.Person
table and theAdventureWorks2019.HumanResources.Employee
table onBusinessEntityID
and selectFirstName
,LastName
, andJobTitle
. - Write a query to join the
AdventureWorks2019.Sales.SalesOrderDetail
table and theAdventureWorks2019.Production.Product
table onProductID
and selectSalesOrderID
,ProductID
, andName
.
SELECT p.FirstName, p.LastName, e.JobTitle FROM AdventureWorks2019.Person.Person p INNER JOIN AdventureWorks2019.HumanResources.Employee e ON p.BusinessEntityID = e.BusinessEntityID;
SELECT sod.SalesOrderID, sod.ProductID, p.Name FROM AdventureWorks2019.Sales.SalesOrderDetail sod INNER JOIN AdventureWorks2019.Production.Product p ON sod.ProductID = p.ProductID;
Conclusion
As we wrap up this week’s SQL homework assignments, we’ve delved into more complex aspects of SQL, including aggregation
, GROUP BY
clauses, HAVING
clauses, and basic joins
. These exercises are designed to build your confidence and proficiency in SQL, equipping you with the skills necessary for more advanced queries in the future.
Remember, consistent practice is key to mastering SQL. If you find yourself needing extra help or wanting to deepen your understanding, don’t hesitate to sign up for our private or group training sessions. These sessions offer personalized guidance and a collaborative learning environment to further enhance your SQL skills. Stay tuned for next week’s assignments, and keep pushing your boundaries as you continue on your journey to becoming an SQL expert!
Pingback: SQL Week 3 Practice Problems - Data Coaching