Relational Database Management
Systems for Epidemiologists:
Outline
SQL Basics
Retrieving Data from a Table
What is SQL?
SQL is the standard language for creating, updating, deleting, and retrieving data stored in an RDBMS.
SQL is defined by rules of syntax (words and symbols allowed) and semantics (meaning).
Three versions:
ANSI-89 SQL
ANSI-92 SQL
SQL Syntax
SELECT fname, lname
FROM casetble
WHERE status='alive'
ORDER BY lname, fname;
Parts of the statement above:
Clauses: SELECT, FROM, WHERE, ORDER BY
Columns: fname, lname, status
Tables: casetble
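To see the clauses working together, the statement above can be run against a small made-up table. This is a minimal sketch that assumes SQLite (through Python's built-in sqlite3 module) as the DBMS; the rows and the column types of casetble are invented for illustration.

```python
import sqlite3

# In-memory SQLite database; the rows below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE casetble (fname TEXT, lname TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO casetble VALUES (?, ?, ?)",
    [
        ("Ann", "Smith", "alive"),
        ("Bob", "Jones", "dead"),
        ("Cal", "Adams", "alive"),
    ],
)

rows = conn.execute(
    """
    SELECT fname, lname
    FROM casetble
    WHERE status='alive'
    ORDER BY lname, fname;
    """
).fetchall()
print(rows)  # [('Cal', 'Adams'), ('Ann', 'Smith')]
```

WHERE drops the row whose status is not 'alive', and ORDER BY then sorts what is left by last name.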
Elements of SQL Style
SQL statements
can be in uppercase or lowercase (case insensitive).
can extend across multiple lines, as long as you do not split words or quoted strings in two.
To improve readability and maintenance,
Begin each SQL statement on a new line.
Use uppercase for SQL keywords (e.g., SELECT, NULL, CHARACTER).
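The style points above can be checked directly: the same statement written in uppercase across several lines and in lowercase on one line should return the same rows. A minimal sketch, assuming SQLite via Python's sqlite3 module and a hypothetical one-row casetble:

```python
import sqlite3

# Hypothetical table and row, used only to compare the two spellings.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE casetble (fname TEXT, lname TEXT, status TEXT)")
conn.execute("INSERT INTO casetble VALUES ('Ann', 'Smith', 'alive')")

# Uppercase keywords, one clause per line.
upper = conn.execute(
    """
    SELECT fname, lname
    FROM casetble
    WHERE status = 'alive';
    """
).fetchall()

# Same statement in lowercase on a single line.
lower = conn.execute(
    "select fname, lname from casetble where status = 'alive';"
).fetchall()

print(upper == lower)  # True
```

Only keywords and identifiers are case insensitive here. In SQLite the quoted value 'alive' is still compared case-sensitively by the = operator.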
SELECT
SELECT retrieves rows, columns, and derived values from one or more tables.
SELECT Example
SELECT fname, lname
SELECT Example
SELECT *
AS
The AS keyword can be used to create a column alias (an alternative name or identifier) that controls how column headings are displayed in a result.
Syntax:
SELECT column1 AS alias1, column2 AS alias2, ...
AS Example
SELECT fname AS "First Name", lname AS "Last Name"
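The effect of an alias shows up in the column headings of the result rather than in the data. The sketch below assumes SQLite via Python's sqlite3 module and a hypothetical patients table; the FROM clause is added here only so the statement can run.

```python
import sqlite3

# Hypothetical one-row patients table in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (fname TEXT, lname TEXT)")
conn.execute("INSERT INTO patients VALUES ('Ann', 'Smith')")

cur = conn.execute(
    'SELECT fname AS "First Name", lname AS "Last Name" FROM patients;'
)
# cursor.description has one entry per result column; item 0 is its heading.
headings = [col[0] for col in cur.description]
print(headings)        # ['First Name', 'Last Name']
print(cur.fetchall())  # [('Ann', 'Smith')]
```

Aliases that contain spaces must be enclosed in straight double quotes; curly quotes pasted from a word processor are not the standard identifier delimiter and will typically not work.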
DISTINCT
Query results often contain duplicate values for a particular column.
The DISTINCT keyword eliminates duplicate rows from a result.
Syntax:
SELECT DISTINCT column(s)
FROM table;
DISTINCT Example
SELECT DISTINCT fname, lname
Notes on DISTINCT
If the SELECT DISTINCT clause contains more than one column, the values of all the specified columns together determine the uniqueness of rows.
All NULLs are considered duplicates of each other (even though they technically never equal each other, since the values are unknown).
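Both notes can be seen in a small experiment: uniqueness is judged on the combination of the listed columns, and repeated NULLs collapse into one row. A minimal sketch, assuming SQLite via Python's sqlite3 module and invented rows:

```python
import sqlite3

# Hypothetical rows: one exact duplicate pair and two rows with a NULL fname.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (fname TEXT, lname TEXT)")
conn.executemany(
    "INSERT INTO patients VALUES (?, ?)",
    [
        ("Ann", "Smith"),
        ("Ann", "Smith"),
        ("Ann", "Jones"),
        (None, "Lee"),
        (None, "Lee"),
    ],
)

# Uniqueness is decided on the (fname, lname) combination.
pairs = conn.execute("SELECT DISTINCT fname, lname FROM patients").fetchall()
# With a single column, the two NULLs count as one value.
firsts = conn.execute("SELECT DISTINCT fname FROM patients").fetchall()

print(len(pairs))   # 3
print(len(firsts))  # 2
```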
ORDER BY
Rows in a query result are unordered, so the order in which they appear should be viewed as arbitrary.
The ORDER BY clause can be used to sort rows by a specified column (or columns) in ascending or descending order.
Syntax:
SELECT column(s)
FROM table
ORDER BY sort_column1 [ASC | DESC],
...
ORDER BY Example
SELECT fname, lname, city, state
FROM patients
ORDER BY state ASC,
city DESC;
ORDER BY Example
SELECT fname, lname, city, state
FROM patients
ORDER BY 4 ASC,
3 DESC;
ORDER BY Example
SELECT fname AS "First Name", lname AS "Last Name",
Notes on ORDER BY
The default for sorting is ascending order.
The sort column(s) do not need to appear in the query result.
If the ORDER BY columns do not identify each row uniquely, rows with duplicate values will be listed in arbitrary order.
A DBMS uses a collating sequence (or collation) to determine the order in which characters are sorted.
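Sorting by column name and by column position can be compared side by side. This is a sketch under assumptions: SQLite via Python's sqlite3 module, invented rows, and a sort on state ascending then city descending chosen for illustration.

```python
import sqlite3

# Hypothetical patients table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (fname TEXT, lname TEXT, city TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO patients VALUES (?, ?, ?, ?)",
    [
        ("Ann", "Smith", "Davis", "CA"),
        ("Bob", "Jones", "Reno", "NV"),
        ("Cal", "Adams", "Napa", "CA"),
    ],
)

select = "SELECT fname, lname, city, state FROM patients "
# Sort columns given by name.
by_name = conn.execute(select + "ORDER BY state ASC, city DESC").fetchall()
# Same sort, with columns given by their position in the SELECT list.
by_position = conn.execute(select + "ORDER BY 4 ASC, 3 DESC").fetchall()
# No ASC/DESC keyword: ascending is the default.
default_order = conn.execute(select + "ORDER BY lname").fetchall()

print(by_name == by_position)            # True
print([row[0] for row in by_name])       # ['Cal', 'Ann', 'Bob']
print([row[1] for row in default_order]) # ['Adams', 'Jones', 'Smith']
```

Positional sorting is shorter to type but breaks silently if the SELECT list is reordered, so column names are usually easier to maintain.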
WHERE
The WHERE clause can be used to filter unwanted rows out of a result (i.e., to return only the rows that satisfy a specified condition).
Syntax:
SELECT column(s)
FROM table
WHERE condition;
Types of Conditions
Condition          SQL Operators
Comparison         =, <>, <, <=, >, >=
Pattern Matching   LIKE
Range Filtering    BETWEEN
List Filtering     IN
Comparison Operators
Operator   Description
=          Equal to
<>         Not equal to
<          Less than
<=         Less than or equal to
>          Greater than
>=         Greater than or equal to
WHERE Example
SELECT caseid, fname, lname
FROM patients
WHERE Example
SELECT caseid, fname, lname,
MONTH(datevar) AS "Month",
DAY(datevar) AS "Day"
FROM patients
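The comparison operators listed earlier can be tried on a small table. The sketch below assumes SQLite via Python's sqlite3 module; the rows, the age column, and the caseids helper are hypothetical and exist only for this illustration.

```python
import sqlite3

# Hypothetical patients table; the age column is an assumption for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patients (caseid INTEGER, fname TEXT, lname TEXT, age INTEGER)"
)
conn.executemany(
    "INSERT INTO patients VALUES (?, ?, ?, ?)",
    [(1, "Ann", "Smith", 34), (2, "Bob", "Jones", 61), (3, "Cal", "Adams", 61)],
)

def caseids(condition):
    """Return the caseid values of the rows that satisfy the WHERE condition."""
    # The condition is a fixed string written in this script, not user input.
    sql = f"SELECT caseid FROM patients WHERE {condition} ORDER BY caseid"
    return [row[0] for row in conn.execute(sql)]

print(caseids("lname = 'Smith'"))   # [1]
print(caseids("lname <> 'Smith'"))  # [2, 3]
print(caseids("age < 61"))          # [1]
print(caseids("age >= 61"))         # [2, 3]
```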
Notes on WHERE
Occasionally, you may need to specify multiple conditions in a single WHERE clause.
You can use the AND, OR or NOT operators to combine two or more conditions into a compound condition.
AND, OR, and NOT operators are known as Boolean operators; they are designed to work with "truth" values (TRUE, FALSE, and UNKNOWN).
AND Example
SELECT caseid, fname, lname,
MONTH(datevar) AS "Month"
FROM patients
Truth Table for Two Conditions
AND       TRUE      FALSE     UNKNOWN
TRUE      TRUE      FALSE     UNKNOWN
FALSE     FALSE     FALSE     FALSE
UNKNOWN   UNKNOWN   FALSE     UNKNOWN
Truth Table for Two Conditions
OR        TRUE      FALSE     UNKNOWN
TRUE      TRUE      TRUE      TRUE
FALSE     TRUE      FALSE     UNKNOWN
UNKNOWN   TRUE      UNKNOWN   UNKNOWN
NOT Example
SELECT caseid, fname, lname,
MONTH(datevar) AS "Month"
FROM patients
NOT Truth Table
Condition   NOT Condition
TRUE        FALSE
FALSE       TRUE
UNKNOWN     UNKNOWN
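The three-valued truth tables can be reproduced by asking a DBMS to evaluate each combination. This sketch assumes SQLite via Python's sqlite3 module, where TRUE is stored as 1, FALSE as 0, and UNKNOWN is represented by NULL; the evaluate helper is a hypothetical convenience function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# SQLite represents TRUE as 1, FALSE as 0, and UNKNOWN as NULL (None in Python).
values = {"TRUE": 1, "FALSE": 0, "UNKNOWN": None}
names = {1: "TRUE", 0: "FALSE", None: "UNKNOWN"}

def evaluate(sql, *params):
    """Run a one-value query and translate 1/0/NULL back to a label."""
    return names[conn.execute(sql, params).fetchone()[0]]

# Print one line per (p, q) combination with the AND and OR results.
for p_name, p in values.items():
    for q_name, q in values.items():
        both = evaluate("SELECT ? AND ?", p, q)
        either = evaluate("SELECT ? OR ?", p, q)
        print(f"{p_name:8} {q_name:8} AND={both:8} OR={either}")

# NOT takes a single condition.
for p_name, p in values.items():
    print(f"NOT {p_name:8} = {evaluate('SELECT NOT ?', p)}")
```

The rows worth studying are the ones involving UNKNOWN: UNKNOWN AND FALSE is FALSE and UNKNOWN OR TRUE is TRUE, because the known operand already settles the answer.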
Equivalent Conditions
This condition    Is equivalent to:
NOT (p AND q)     (NOT p) OR (NOT q)
NOT (p OR q)      (NOT p) AND (NOT q)
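De Morgan's laws state that NOT (p AND q) is equivalent to (NOT p) OR (NOT q), and NOT (p OR q) is equivalent to (NOT p) AND (NOT q). They can be checked over every combination of TRUE, FALSE, and UNKNOWN. A minimal sketch, assuming SQLite via Python's sqlite3 module, with NULL standing in for UNKNOWN and SQLite's IS operator used as a NULL-safe equality test; the holds helper is hypothetical.

```python
import sqlite3
from itertools import product

conn = sqlite3.connect(":memory:")
truth_values = [1, 0, None]  # TRUE, FALSE, UNKNOWN (NULL)

def holds(sql, p, q):
    """Return True when the equivalence in sql evaluates to 1 for p and q."""
    return conn.execute(sql, {"p": p, "q": q}).fetchone()[0] == 1

# IS compares two values and treats NULL as equal to NULL.
law_and = "SELECT (NOT (:p AND :q)) IS ((NOT :p) OR (NOT :q))"
law_or = "SELECT (NOT (:p OR :q)) IS ((NOT :p) AND (NOT :q))"

results = [
    holds(law_and, p, q) and holds(law_or, p, q)
    for p, q in product(truth_values, repeat=2)
]
print(all(results))  # True

# Keeping AND on both sides is not an equivalence: it fails for p TRUE, q FALSE.
wrong = "SELECT (NOT (:p AND :q)) IS ((NOT :p) AND (NOT :q))"
print(holds(wrong, 1, 0))  # False
```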
Order of Evaluation
When you use multiple logical operators in a compound condition, the order of evaluation is:
(1) NOT
(2) AND
(3) OR
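In SQL, NOT is applied first, then AND, then OR, and parentheses override that order. The effect can be seen by evaluating the same operands with and without parentheses. A minimal sketch, assuming SQLite via Python's sqlite3 module, where 1 stands for TRUE and 0 for FALSE; the value helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def value(expression):
    """Evaluate a constant Boolean expression and return 1 or 0."""
    return conn.execute(f"SELECT {expression}").fetchone()[0]

# AND is evaluated before OR: 1 OR (0 AND 0).
print(value("1 OR 0 AND 0"))    # 1
# Parentheses force OR first.
print(value("(1 OR 0) AND 0"))  # 0

# NOT is evaluated before AND: (NOT 0) AND 0.
print(value("NOT 0 AND 0"))     # 0
# Parentheses force AND first.
print(value("NOT (0 AND 0)"))   # 1
```

When a compound condition mixes operators, adding parentheses makes the intended grouping explicit and costs nothing.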
LIKE
You can use the LIKE operator to match part of a character string (not numbers or date/times) rather than an exact value.
LIKE uses a pattern that values are matched against.
Pattern: quoted string that contains the literal characters to match and any combination of wildcards.
Wildcards: special characters used to match parts of a value.
Wildcard Operators
Operator   Matches
%          Any string of zero or more characters
_          Any one character
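The two wildcards behave differently: % matches any run of characters (including none), while _ matches exactly one character. A minimal sketch, assuming SQLite via Python's sqlite3 module and invented last names; the matches helper is hypothetical.

```python
import sqlite3

# Hypothetical last names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (lname TEXT)")
conn.executemany(
    "INSERT INTO patients VALUES (?)",
    [("Smith",), ("Smyth",), ("Smithson",), ("Jones",)],
)

def matches(pattern):
    """Return the last names that match the LIKE pattern, sorted."""
    sql = "SELECT lname FROM patients WHERE lname LIKE ? ORDER BY lname"
    return [row[0] for row in conn.execute(sql, (pattern,))]

# % matches zero or more characters after 'Sm'.
print(matches("Sm%"))    # ['Smith', 'Smithson', 'Smyth']
# _ matches exactly one character between 'Sm' and 'th'.
print(matches("Sm_th"))  # ['Smith', 'Smyth']
```

Case handling varies by DBMS. In SQLite, LIKE ignores case for ASCII letters by default, so 'sm%' would return the same rows here; other systems may be case sensitive.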
BETWEEN
Use the BETWEEN operator to determine whether a given value falls within a specified range.
BETWEEN works with character strings, numbers, and date/times.
The range is given as a low value and a high value separated by AND; both endpoints are included.
Equivalent Statements
SELECT fname, lname
FROM patients
WHERE zip BETWEEN 94510 AND 94515;
SELECT fname, lname
FROM patients
WHERE zip >= 94510 AND zip <= 94515;
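Because BETWEEN is inclusive, it returns the same rows as the pair of comparisons zip >= 94510 AND zip <= 94515. The sketch below checks this, assuming SQLite via Python's sqlite3 module and invented ZIP codes stored as integers.

```python
import sqlite3

# Hypothetical ZIP codes, including values just outside the range.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (fname TEXT, lname TEXT, zip INTEGER)")
conn.executemany(
    "INSERT INTO patients VALUES (?, ?, ?)",
    [
        ("Ann", "Smith", 94509),
        ("Bob", "Jones", 94510),
        ("Cal", "Adams", 94512),
        ("Dee", "Lee", 94515),
        ("Eve", "Park", 94516),
    ],
)

between = conn.execute(
    "SELECT zip FROM patients WHERE zip BETWEEN 94510 AND 94515 ORDER BY zip"
).fetchall()
comparisons = conn.execute(
    "SELECT zip FROM patients WHERE zip >= 94510 AND zip <= 94515 ORDER BY zip"
).fetchall()

print(between == comparisons)         # True
print([row[0] for row in between])    # [94510, 94512, 94515]
```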
List Filtering with IN
The IN operator can be used to display records with any value in a specified list for a particular column.
IN Example
SELECT fname, lname, state
FROM patients
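An IN list is shorthand for a chain of equality tests joined by OR. A minimal sketch, assuming SQLite via Python's sqlite3 module; the rows and the list of states ('CA', 'NV', 'OR') are invented for illustration.

```python
import sqlite3

# Hypothetical patients table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (fname TEXT, lname TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO patients VALUES (?, ?, ?)",
    [
        ("Ann", "Smith", "CA"),
        ("Bob", "Jones", "NV"),
        ("Cal", "Adams", "TX"),
        ("Dee", "Lee", "OR"),
    ],
)

in_list = conn.execute(
    """
    SELECT fname, lname, state
    FROM patients
    WHERE state IN ('CA', 'NV', 'OR')
    ORDER BY lname;
    """
).fetchall()

# Equivalent condition written as separate comparisons.
or_chain = conn.execute(
    """
    SELECT fname, lname, state
    FROM patients
    WHERE state = 'CA' OR state = 'NV' OR state = 'OR'
    ORDER BY lname;
    """
).fetchall()

print(in_list == or_chain)             # True
print([row[1] for row in in_list])     # ['Jones', 'Lee', 'Smith']
```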
IS NULL
Recall: NULLs represent missing or unknown values.
IS NULL can be used to find records with NULL values.
Syntax:
SELECT columns
FROM table
WHERE column IS NULL;
IS NULL Example
SELECT caseid,
fname, lname
FROM patients
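Because NULL means unknown, comparing a column to NULL with = is never true; IS NULL is the test that actually finds missing values. A minimal sketch, assuming SQLite via Python's sqlite3 module and invented rows; the caseids helper is hypothetical.

```python
import sqlite3

# Hypothetical rows; None is stored as NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (caseid INTEGER, fname TEXT, lname TEXT)")
conn.executemany(
    "INSERT INTO patients VALUES (?, ?, ?)",
    [(1, "Ann", "Smith"), (2, "Bob", None), (3, None, None)],
)

def caseids(condition):
    """Return the caseid values of the rows that satisfy the WHERE condition."""
    # The condition is a fixed string written in this script, not user input.
    sql = f"SELECT caseid FROM patients WHERE {condition} ORDER BY caseid"
    return [row[0] for row in conn.execute(sql)]

print(caseids("lname IS NULL"))      # [2, 3]
print(caseids("lname IS NOT NULL"))  # [1]
# = NULL evaluates to UNKNOWN for every row, so nothing is returned.
print(caseids("lname = NULL"))       # []
```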
Additional Operators & Functions
Arithmetic Operations (+, -, *, /)
Concatenation (||)
Extracting Text (SUBSTRING())
Changing Case (UPPER() and LOWER())
Trimming Characters (TRIM())
Length of a String (CHARACTER_LENGTH() or LEN())
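Function names for these operations differ between DBMSs. The sketch below assumes SQLite via Python's sqlite3 module, which spells the substring and length functions SUBSTR() and LENGTH(); the row is invented.

```python
import sqlite3

# Hypothetical one-row patients table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (fname TEXT, lname TEXT)")
conn.execute("INSERT INTO patients VALUES ('Ann', 'Smith')")

row = conn.execute(
    """
    SELECT fname || ' ' || lname,   -- concatenation
           SUBSTR(lname, 1, 3),     -- first three characters
           UPPER(lname),            -- change case
           LOWER(fname),
           TRIM('  Smith  '),       -- strip leading/trailing spaces
           LENGTH(lname),           -- number of characters
           2 + 3 * 4                -- arithmetic: * before +
    FROM patients;
    """
).fetchone()

print(row)  # ('Ann Smith', 'Smi', 'SMITH', 'ann', 'Smith', 5, 14)
```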
Next Time
Summarizing and Grouping Data
Aggregate functions (MIN, MAX, SUM, AVG, COUNT)
Grouping rows with GROUP BY
Filtering groups with HAVING
Joins