Introduction to SQL
SQL stands for Structured Query Language and it forms the foundation for many of the tasks that an analyst does for their job. An analyst will use SQL on a frequent basis and it is an important skill to learn as you query relational databases in order to obtain the information that you need to analyze or need to create reports. I use Microsoft SQL Server 2014 at home and at work and the examples that I will use in these lessons can be downloaded and used freely from the Microsoft website (for instructions, see Geek’s Corner below).
SQL is easy to learn once you understand the basic structure of the language. In this article, I will teach you the basics of SQL, such as SELECT statements, the FROM clause, the basics of JOINS, the WHERE clause, how to use GROUP BY and ORDER BY, and how to select the TOP and DISTINCT records from a dataset. We will then look at some more advanced SQL topics such as UNIONs, the CASE expression, some advanced JOINS and GROUP BYs.
The SELECT Statement and the FROM Clause
At the heart of SQL is the SELECT statement. The SELECT statement allows you to retrieve records from a table (or dataset). The SELECT statement has the following syntax:
SELECT column FROM table
To see the SELECT statement in action, open up Microsoft SQL Server Management Studio and start a new query using the AdventureWorks2014 database. If you would like to select all of the columns from a table, instead of typing each columns name, you can use an asterisk ( * ) as a wildcard statement to indicate that you want to return all available columns from the table(s). It is good practice however to only include the columns that you want to return as including everything can have a negative impact on performance.
What does this query do? It returns all of the columns from the Person table where the first name is Jerry.
You can mix columns from a table with constants in your SELECT statement in order to produce more sophisticated outputs. You can also give aliases to columns or constants by adding a letter or name after each one. In the example below, Help From a Geek is repeated in each row in your output and the name of that column is HFG. The column Operator has been renamed to OperatorName. You don’t need to include the AS after your column name that you wish to alias as something else, but it helps to make your code more readable and is considered good practice. In the example below, we are using the AdventureWorksDW2014 database:
SELECT AccountType, Operator AS OperatorName, ‘Help From a Geek’ AS HFG FROM DimAccount
The Basics of Joins
The FROM clause in your query can be extended and made more powerful with joins. Joins allow you to bring in data from multiple related tables into your database and gives you fine control over how you want to relate those tables together. There are several different types of joins in SQL, but there are three basic ones that you need to understand:
When you are using a join, you need to specify the join criteria in your SQL statement. You do this by using the ON keyword. It is also good practice to use an alias in the join as well, and if people within your company typically alias tables the same way, you should stick to that convention (if the CountryCode table is usually aliased cc by others in your company, you should use cc when you alias the table as well).
The WHERE Clause
Using a WHERE clause in the SELECT statement will filter the resultset and conditionally return specific records. The WHERE clause can be used to insert, update, and delete data from a table or tables, but in our examples, we will look at using it to conditionally retrieve data. The WHERE clause is always used in combination with an operator of some sort (such as = (equals), <> or != (not equals), > (greater than), < (less than), >= (greater than or equal to), <= (less than or equal to), or BETWEEN (within the range specified).) Let’s look at an example where we are trying to return only the records with account types that are equal to Assets:
Let’s look at another example using a different operator and see how we can chain multiple conditions together in our WHERE clause. Let’s say that we want to only see records that are the AccountType of Assets with amounts that are greater than or equal to 15:
One of the most useful features when doing analytics in SQL is grouping. Grouping allows you to aggregate your results and perform advanced filtering on the data. In order to group your resultset, you use the GROUP BY clause in your SELECT statement. It is common for the GROUP BY clause to be used with an aggregation function and/or a HAVING clause. Let’s look at an example of a basic GROUP BY statement with a HAVING clause:
It is important to remember that all of the columns that appear in your SELECT statement column list that are not aggregated will have to appear in your GROUP BY clause.
Order By Clause
You can use the ORDER BY clause to determine the order of the records that are returned by your query. SQL defaults to sorting records in ascending order, but you can easily change it to descending order. Let’s take a look at a basic ORDER BY clause example:
When you are setting up your ORDER BY clause, it is best practice for you to use the name of the column(s). You can also use the column ordinal as well to order the results, although this doesn’t generate code that is easily readable and is not standard practice.
Selecting Distinct Records
There is a keyword in SQL that allows you to return the unique occurrences of the values in your resultset without having to resort to using the GROUP BY clause – it is called the DISTINCT statement. Let’s look at an example of how a DISTINCT statement works:
Selecting Top Records
The TOP expression is incredibly useful and I use it all of the time when building complex queries in order to limit my resultset and not put undo strain on the database. When you use the TOP expression in the SELECT statement (or in any other data manipulation statements), it allows you to restrict the number of results that are returned by our query to a set number or percent of rows. For example, the following query returns the top 10 records from the DimAccount table:
Alternatively, we can also return the top X percent of records from our query (as in the example below where we return the top 10 percent of records in the DimAccount table):