Introduction to SQL

SQL stands for Structured Query Language and it forms the foundation for many of the tasks that an analyst does for their job.  An analyst will use SQL on a frequent basis and it is an important skill to learn as you query relational databases in order to obtain the information that you need to analyze or need to create reports.  I use Microsoft SQL Server 2014 at home and at work and the examples that I will use in these lessons can be downloaded and used freely from the Microsoft website (for instructions, see Geek’s Corner below).

SQL is easy to learn once you understand the basic structure of the language.  In this article, I will teach you the basics of SQL, such as SELECT statements, the FROM clause, the basics of JOINS, the WHERE clause, how to use GROUP BY and ORDER BY, and how to select the TOP and DISTINCT records from a dataset.  We will then look at some more advanced SQL topics such as UNIONs, the CASE expression, some advanced JOINS and GROUP BYs.

The SELECT Statement and the FROM Clause

At the heart of SQL is the SELECT statement.  The SELECT statement allows you to retrieve records from a table (or dataset).  The SELECT statement has the following syntax:

SELECT column FROM table

To see the SELECT statement in action, open up Microsoft SQL Server Management Studio and start a new query using the AdventureWorks2014 database.  If you would like to select all of the columns from a table, instead of typing each columns name, you can use an asterisk ( * ) as a wildcard statement to indicate that you want to return all available columns from the table(s).  It is good practice however to only include the columns that you want to return as including everything can have a negative impact on performance.

What does this query do?  It returns all of the columns from the Person table where the first name is Jerry.

You can mix columns from a table with constants in your SELECT statement in order to produce more sophisticated outputs.  You can also give aliases to columns or constants by adding a letter or name after each one.  In the example below, Help From a Geek is repeated in each row in your output and the name of that column is HFG.  The column Operator has been renamed to OperatorName.  You don’t need to include the AS after your column name that you wish to alias as something else, but it helps to make your code more readable and is considered good practice. In the example below, we are using the AdventureWorksDW2014 database:

AccountType, Operator AS OperatorName, ‘Help From a Geek’ AS HFG
FROM DimAccount

The Basics of Joins

The FROM clause in your query can be extended and made more powerful with joins.  Joins allow you to bring in data from multiple related tables into your database and gives you fine control over how you want to relate those tables together.  There are several different types of joins in SQL, but there are three basic ones that you need to understand:

Inner Join
Venn diagram showing two tables and an inner join

Inner Join

Inner joins are the most common type of join.  Inner joins return the intersections of two different tables or datasets (as illustrated in the diagram).
Left Join
Venn diagram illustrating a left join with the left-most circle fully colored in

Left Join

Left joins return all of the records in the table on the left of the join and only those related records from the right table (as illustrated in the diagram).  It is also possible for you to perform a right join if you want to achieve the opposite affect, although this is much more rare.  It is common practice to simply flip the positions of the tables in your query when you want to achieve this outcome.
Full Join
Venn diagram illustrating a full join in sql where both circles are fully colored in

Full Join

Full joins return the union of both tables or datasets (as illustrated in the diagram).


When using a left join or a full join, you might get NULL columns back for rows that do not have a relationship in the non-primary table or dataset

When you are using a join, you need to specify the join criteria in your SQL statement.  You do this by using the ON keyword.  It is also good practice to use an alias in the join as well, and if people within your company typically alias tables the same way, you should stick to that convention (if the CountryCode table is usually aliased cc by others in your company, you should use cc when you alias the table as well).

INNER JOINs are also simply called JOINS and if you see a query with JOIN, you should understand that it is an INNER JOIN

The WHERE Clause

Using a WHERE clause in the SELECT statement will filter the resultset and conditionally return specific records.  The WHERE clause can be used to insert, update, and delete data from a table or tables, but in our examples, we will look at using it to conditionally retrieve data.  The WHERE clause is always used in combination with an operator of some sort (such as = (equals), <> or != (not equals), > (greater than), < (less than), >= (greater than or equal to), <= (less than or equal to), or BETWEEN (within the range specified).)  Let’s look at an example where we are trying to return only the records with account types that are equal to Assets:

Let’s look at another example using a different operator and see how we can chain multiple conditions together in our WHERE clause.  Let’s say that we want to only see records that are the AccountType of Assets with amounts that are greater than or equal to 15:

Grouping Results

One of the most useful features when doing analytics in SQL is grouping.  Grouping allows you to aggregate your results and perform advanced filtering on the data.  In order to group your resultset, you use the GROUP BY clause in your SELECT statement.  It is common for the GROUP BY clause to be used with an aggregation function and/or a HAVING clause.  Let’s look at an example of a basic GROUP BY statement with a HAVING clause:

It is important to remember that all of the columns that appear in your SELECT statement column list that are not aggregated will have to appear in your GROUP BY clause.

Order By Clause

You can use the ORDER BY clause to determine the order of the records that are returned by your query.  SQL defaults to sorting records in ascending order, but you can easily change it to descending order. Let’s take a look at a basic ORDER BY clause example:

When you are setting up your ORDER BY clause, it is best practice for you to use the name of the column(s).  You can also use the column ordinal as well to order the results, although this doesn’t generate code that is easily readable and is not standard practice.

Selecting Distinct Records

There is a keyword in SQL that allows you to return the unique occurrences of the values in your resultset without having to resort to using the GROUP BY clause – it is called the DISTINCT statement.  Let’s look at an example of how a DISTINCT statement works:

One of the most common uses of the DISTINCT keyword is to validate the unique key of a dataset or a table in order to make sure taht your understanding of the data matches what actually exists within the database. Take the DimDate table in the AdventureWorksDW database for example.  FullDateAlternateKey is defined, according to the name of the column, as the alternate key for that table.  Therefore, running the distinct set of FullDateAlternateKey produces a record count equal to COUNT (*) on the table.

Selecting Top Records

The TOP expression is incredibly useful and I use it all of the time when building complex queries in order to limit my resultset and not put undo strain on the database.  When you use the TOP expression in the SELECT statement (or in any other data manipulation statements), it allows you to restrict the number of results that are returned by our query to a set number or percent of rows.  For example, the following query returns the top 10 records from the DimAccount table:

Alternatively, we can also return the top X percent of records from our query (as in the example below where we return the top 10 percent of records in the DimAccount table):

About jvaudio

I have masters degrees in information systems management, project management, and computer science. I have bachelors degrees in technical management and finance.

I love to learn. I love to write. I love technology. I love math.

Visit My Website
View All Posts
Recommended Posts