Practice Topics
- Aggregation with GROUP BY
- Filtering Using HAVING
- Basic Joins
Aggregation with GROUP BY
Overview
Aggregation is when you collapse a detailed dataset into a summarized one. Picture an Excel file with a pivot table tab summarizing a more detailed data tab. Aggregation is the equivalent of the pivot table tab.
When you set up a pivot table in Excel, you have to drag fields from a dataset into the rows, columns, and values areas. The values area is where you determine which measures you want to summarize, then you set them as a SUM, MIN, MAX, COUNT, or AVERAGE. You do the same thing when using aggregation in a SQL query -> identify the measures you want to summarize, then wrap those fields with common aggregate functions such as SUM(), COUNT(), MIN(), MAX(), AVG(), or COUNT(DISTINCT ).
When setting up a pivot table in Excel, you also define the fields that you want to display and summarize by in the rows area. You do the same thing when using aggregation in a SQL query. In this case, you list the fields that you want to summarize by in the SELECT statement (before the aggregate fields such as SUM(), COUNT(), etc.). Then, list those same fields in the GROUP BY statement (after the FROM and WHERE statements). When any field listed in the GROUP BY has a change in value, the aggregate fields will reset.
Example: Using the AdventureWorks2019.Sales.SalesOrderHeader table, determine the number of invoices (hint: use COUNT(*) ) and the total sales amount (hint: use SUM(TotalDue) ) for each CustomerId. Use a table alias name and be sure to attach the alias to columns in the SELECT and GROUP BY.
SELECT a.CustomerID, COUNT(*) AS NumOfInvoices, SUM(a.TotalDue) AS TotalSalesAmount FROM AdventureWorks2019.Sales.SalesOrderHeader a GROUP BY a.CustomerID;
Self Practice
- Write a query to find the average
ListPriceof all products in theAdventureWorks2019.Production.Producttable. - Write a query to count the number of employees in the
AdventureWorks2019.HumanResources.Employeetable. - Write a query to find the total number of products for each
Colorin theAdventureWorks2019.Production.Producttable. Group byColor.
SELECT AVG(ListPrice) AS AverageListPrice FROM AdventureWorks2019.Production.Product;
SELECT COUNT(*) AS NumberOfEmployees FROM AdventureWorks2019.HumanResources.Employee;
SELECT Color, COUNT(*) AS NumberOfProducts FROM AdventureWorks2019.Production.Product GROUP BY Color;
Filtering Using HAVING
Overview
Filtering in the HAVING statement exclusively applies to aggregation. It is very similar to filtering in the WHERE statement with one BIG exception -> this filter is applied at the very end of an aggregate query. By contrast, filtering in the WHERE statement applies to detailed records before aggregation is applied.
Imagine our Excel comparison from above. Filtering using HAVING would be equivalent to filtering on the result of a pivot table. Filtering using WHERE would be equivalent to filtering on the detailed data tab that the pivot table is built upon.
Example: Using the AdventureWorks2019.Sales.SalesOrderHeader table, determine the number of invoices (hint: use COUNT(*) ) and the total sales amount (hint: use SUM(TotalDue) ) for each CustomerId. Use a table alias name and be sure to attach the alias to columns in the SELECT, GROUP BY, and HAVING. Only return records with an aggregated total sales amount less than one million dollars.
SELECT a.CustomerID, COUNT(*) AS NumOfInvoices, SUM(a.TotalDue) AS TotalSalesAmount FROM AdventureWorks2019.Sales.SalesOrderHeader a GROUP BY a.CustomerID HAVING SUM(a.TotalDue) < 1000000;
Self Practice
- Write a query to find the number of
SalesOrderIDfor eachCustomerIDin theAdventureWorks2019.Sales.SalesOrderHeadertable, but only include customers with more than 5 orders.
SELECT CustomerID, COUNT(SalesOrderID) AS NumberOfOrders FROM AdventureWorks2019.Sales.SalesOrderHeader GROUP BY CustomerID HAVING COUNT(SalesOrderID) > 5;
Basic Joins
Overview
Table joins are used to combine two or more tables within the same SQL query. The purpose of brining multiple tables together within the same query is because most queries need to do one of two things:
- Increase the number of fields available to display in a query’s output.
- Filter records based on existing (or non-existing) scenarios found in related tables.
LastNames to each invoice, you must retrieve them from the AdventureWorks2019.Person.Person table. To do this, you would follow the below steps:
- Connect the
AdventureWorks2019.Sales.SalesOrderHeadertable to theAdventureWorks2019.Sales.Customertable based on the common field,CustomerID. - Connect the
AdventureWorks2019.Person.Persontable to theAdventureWorks2019.Sales.Customertable based onPersonIDin theCustomertable andBusinessEntityIDin thePersontable.BusinessEntityIDis a generic field name forPersonID, so we treat it as the same. - Select the LastName from the
AdventureWorks2019.Person.Persontable.
Example: As described above, display the CustomerID and TotalDue for each invoice from the AdventureWorks2019.Sales.SalesOrderHeader table, then add customer FirstName and LastName from the AdventureWorks2019.Person.Person table. Remember to use the AdventureWorks2019.Sales.Customer as a bridge table to connect the SalesOrderHeader to the Person table.
SELECT a.CustomerID, a.TotalDue, c.FirstName, c.LastName FROM AdventureWorks2019.Sales.SalesOrderHeader a INNER JOIN AdventureWorks2019.Sales.Customer b ON a.CustomerID = b.CustomerID INNER JOIN AdventureWorks2019.Person.Person c ON b.PersonID = c.BusinessEntityID;
Self Practice
- Write a query to join the
AdventureWorks2019.Person.Persontable and theAdventureWorks2019.HumanResources.Employeetable onBusinessEntityIDand selectFirstName,LastName, andJobTitle. - Write a query to join the
AdventureWorks2019.Sales.SalesOrderDetailtable and theAdventureWorks2019.Production.Producttable onProductIDand selectSalesOrderID,ProductID, andName.
SELECT p.FirstName, p.LastName, e.JobTitle FROM AdventureWorks2019.Person.Person p INNER JOIN AdventureWorks2019.HumanResources.Employee e ON p.BusinessEntityID = e.BusinessEntityID;
SELECT sod.SalesOrderID, sod.ProductID, p.Name FROM AdventureWorks2019.Sales.SalesOrderDetail sod INNER JOIN AdventureWorks2019.Production.Product p ON sod.ProductID = p.ProductID;
Conclusion
As we wrap up this week’s SQL homework assignments, we’ve delved into more complex aspects of SQL, including aggregation, GROUP BY clauses, HAVING clauses, and basic joins. These exercises are designed to build your confidence and proficiency in SQL, equipping you with the skills necessary for more advanced queries in the future.
Remember, consistent practice is key to mastering SQL. If you find yourself needing extra help or wanting to deepen your understanding, don’t hesitate to sign up for our private or group training sessions. These sessions offer personalized guidance and a collaborative learning environment to further enhance your SQL skills. Stay tuned for next week’s assignments, and keep pushing your boundaries as you continue on your journey to becoming an SQL expert!