| SQL GROUP BY |
|
In this article I will describe the purpose and best practices of group by clause of select statement. Any reporting select query requires group by clause to aggregate some results by particular parameters. That is critical for overall system performance to follow some best practices of group by queries. GROUP BY purposeAggregate functions Group by can't be discussed without appealing to aggregate functions. So Aggregate functions are embedded transact-sql functions that iterate through the recordset and collect the result. The main aggregate functions compliant with ANSI SQL 92 are MIN, MAX, SUM, COUNT, AVG. For visual understanding let's create a test table and fill it in with some test data create table testgroupby ( So let's use aggregation functions against the testgroupby table. select COUNT (*), MIN(population), MAX( population), SUM(population) And you will see the next results
We see that aggregate functions iterated though all rows of testgroupby table and returned the result. Now if we want to see the aggregate values per country we need to use group by clause of select statement. In ANSI SQL 92, SQL Server 2000, SQL Server 2005, SQL Server 2008 and any other version of most dbmss this clause goes right after where clause (if exists, otherwise - right after from clause). So the query calculating countries' populations will look like select country, SUM(population) and the group by query output How GROUP BY worksThe columns specified in group by list divide table into groups and then aggregate functions are executed against each group separately as against the entire table in first example. in our case the table is divided into groups the next way
All fields listed in select statement when you use aggregate functions with group by statement must be either results of operations with aggregate functions (i.e. sum=SUM(population), field1= SUM(Population)/AVG(Population)) or appear in group by fields list. For instance we can't include fields ID or city in our previous select statement because the next error will occur Msg 8120, Level 16, State 1, Line 1 This is obvious, remember that we already divided the tables into groups and summarized population thus we left only one row per country. Now it is meaningless to address back to the cities. GROUP BY uses GROUP BY has a wide usage in reporting queries. Remember that you can use group by not only against a table like testgroupby in our examples but against any resultset. This means that if you have a select statement with joins and where clause you can always apply aggregate functions and group by clause to this statement. You can filter results of group by query using HAVING clause. This works the same as where clause but applies to the resultset after all aggregate functions are calculated and rows are merged. For instance select country, pop= SUM(population) both queries return the same data
but the first one is much more compact and readable as well as less resource consumptive even though in this particular case execution plans for both queries are the same. GROUP BY instead of DISCTINCTYou can always use group by when you need to gather distinct values of a table/resultset, the rule is simple - include all columns into group by clause select country again, the plans are the same. GROUP BY best practicesRemember that group by statement dramatically affects performance of the query and try to follow the next principlesKeep group by fields list as short as possible. That means that aggregations and GROUP BY clause must usually be formed in a nested select. For instance imagine we have another table - countrieslist create table countrieslist ( Now when we want to see the overall county info including its population the worst choice is the next query select Now lets try to put the aggregation and group by statement into the nested query (derived table)
In this particular case the second query plan is worse than the first but this is only because we deal with only 8 records totally. The information provided within this article applies to all database management systems compliant with ANSI SQL-92 not only transact-sql of sql server family (SQL Server 2005, SQL Server 2008) but also to Oracle, Sybase, MySQL etc.
Comments (0) |