On this page you will find SQL Interview Questions with answers. These are commonly asked questions during SQL job interviews. They range from simple to hard.
A DBMS (Database Management System) is a program that creates, controls, and manages data. And a RDBMS (Relational Database Management System) is a database that stores data in tables (relations) that are composed of rows and columns. RDBMS examples include SQL Server, Oracle, MySQL, and others.
Note: the word 'relational' does NOT refer to the fact that tables are related to each other (which is the case), but a relation is simply a table.
SQL stands for Structured Query Language, which is a standard language to access and manage data in a database. MySQL, on the other hand, is a comprehensive relational database management system, that implements SQL as its query language.
A database is an organized form of data which can be easily retrieved, manipulated, stored, and managed by a DBMS (Database Management System).
In SQL Server, a schema is a collection of database objects, such as, tables, indexes, and triggers, that are connected to a user, i.e. the schema owner. A database can have multiple schemas which can be helpful in partitioning and organizing the data. Built-in SQL Server schemas include dbo, guest, sys, and INFORMATION_SCHEMA.
Note: The word schema is also often used to describe the design or structure of the database, i.e. the data model. An ERD (Entity Relationship Diagram) is commonly referred to as a 'database schema', but, we think the word 'data model' is more appropriate.
The categories are:
1. DDL (Data Definition Language)
Creates database objects and defines database structure.
It includes commands like CREATE
, ALTER
, and DROP
.
2. DML (Data Manipulation Language)
Retrieves, manipulates, and removes data from a database.
Commands include: SELECT
, INSERT
, UPDATE
, and DELETE
.
3. DCL (Data Control Language)
Provides access privileges to users in the database.
Commands include: GRANT
and REVOKE
.
4. TCL (Transaction Control Language)
Organizes SQL data changes into logical transactions.
Commands include BEGIN
, TRANSACTION
, COMMIT
, ROLLBACK
, and SAVEPOINT
.
A table is a data storage structure that is organized in rows and columns, much like a spreadsheet.
The columns are often called fields and rows that identified as records.
A field has a data type which specifies what data can and cannot be stored.
For example, a Customer table may have 3 fields: Id, FirstName and LastName.
Data records fill the rows, for example, Id = 10, FirstName = Gregory,
and LastName = McNeil. The table accepts an unlimited number of customer records.
Constraints specify rules that the data in the table must adhere to. These rules may be
applied to single or multiple columns in a table.
Examples of constraints include: NOT NULL
, CHECK
, DEFAULT
,
UNIQUE
, PRIMARY KEY
, and FOREIGN KEY
.
A primary key is one or more columns that uniquely identify a row. The primary key column value cannot be NULL and must be unique. Here's how primary keys are typically created:
/* Create table with a single field as primary key */ CREATE TABLE Customer ( Id INT IDENTITY, FirstName VARCHAR(50) NOT NULL, LastName VARCHAR(50) NOT NULL, CONSTRAINT PK_Custoer PRIMARY KEY(Id) ) /* Create table with multiple fields as primary key Don't use FirstName and LastName as PK in real-world solutions. */ CREATE TABLE Customer ( Id INT NOT NULL, FirstName VARCHAR(50) NOT NULL, LastName VARCHAR(50) NOT NULL, CONSTRAINT PK_Customer PRIMARY KEY(FirstName, LastName) )
Note: Single column primary keys (IDENTITY fields) are almost always preferred.
A foreign key is a column in one table that references the primary key in another table.
It is used to identify relationships between two tables and maintain referential integrity
between the tables. Foreign keys also help speed up JOIN operations (with proper indexing).
Here's how a foreign key is added to a table:
/* Add a foreign key constraint to SupplierId in the Product table */ ALTER TABLE Product ADD CONSTRAINT FK_ProductSupplier FOREIGN KEY (SupplierId) REFERENCS Supplier(Id)
A constraint on a column that ensures that all values in the table are unique. This means that the column will not accept any duplicate values. For instance, customers cannot have duplicate email addresses; they must be unique. Only one NULL value can exist.
Note: SQL Server enforces uniqueness by adding a unique index to the table.
An identity is a unique auto-generated number that is added when a new record is inserted into to a table. This is commonly used for the primary key on a table.
A JOIN operation matches rows from one table with another table based on their related columns (usually foreign keys and primary keys). It is used to merge data from two different tables. There are 4 JOIN types and 2 special types:
JOIN
The most common type of JOIN that returns all rows from two tables where the condition is satisfied. Also called INNER JOIN.
LEFT JOIN
Returns all the rows from the left table and only the matching rows from the right table where the condition is fulfilled.
RIGHT JOIN
Returns all the rows from the right table and only matching rows on the left table where the condition is fulfilled.
FULL JOIN
Returns all matching rows regardless whether they are from the left or the right table.
Special types:
SELF-JOIN
A JOIN in which a table is joined to itself. Self-join uses table aliases to give different names to the same table within the query.
CROSS-JOIN
Can be defined as the cartesian product of the two unrelated tables in a JOIN. Result sets are often very large; it has the same number of rows as the product of the number of rows in the two tables.
An index helps speed up database searches by quickly locate matching rows in a table.
Without the index SQL Server would have to go
through all rows in the table and check each row for a match (this is called a table-scan).
This would be very slow in a large database.
Data integrity refers to the quality and accuracy of the data stored in the database. Can be enforced with integrity constraints which ensures that no data is entered that does not meet the business rules.
Normalization is the process of creating a data model that minimizes redundancy and avoids data duplication. The aim of normalization is to have a particular data item occur only once in the entire database.
Denormalization is the loosening of normalization rules by carefully introducing redundancy and allowing certain duplicate data elements into the database. The aim of denormalization is to simplify SQL and to increase performance by limiting the number of JOIN operations.
Normal forms are normalization rules.
First Normal Form (1NF)
Removes all duplicate columns from a table. Tables must be created for related data and identify unique columns.
Second Normal Form (2NF)
Meets the requirements from 1NF. Subsets of data must be placed in separate tables that are identified with primary keys.
Third Normal Form (3NF)
This should meet the 2NF requirements. Columns that are not related to the primary key must be removed from the table.
Fourth Normal Form (4NF)
Meets all the 3NF requirements. Columns should not have multi-valued dependencies.
A view is a virtual table that is based on an SQL SELECT query. The table does not physically exist but the query creates it when the view is used. A view can have data from one or more tables.
A relationship between two tables where one row can have many matching rows in another table. For example, a customer can have multiple orders, or an author can publish many books.
A database query is code that retrieves information from a database, commonly a SELECT statement.
A subquery is a query within another query. Also known as a nested or inner query. Subqueries are always executed first and their results are returned to the main query.
A subquery that is related (correlated) to the outer query by referencing one or more columns from the outer query.
Query optimization is changing a query so that it returns the same result but with better performance by taking up fewer resources.
A stored procedure is a function with SQL statements and T-SQL logic to perform a (usually) complex operation in the database. It is reusable unit of code that can be executed as often as necessary.
Stored procedures are modular programming units that are created once
and can be reused many times. This allows for faster execution and reduced network traffic.
It can also provide better data security.
Disadvantages include: it requires T-SQL skills (which may be difficult to find), the code
is not portable to other database vendors, and including business logic in the database
is not always seen as the best architectural approach.
A trigger is T-SQL code that automatically executes in response to some table or database event. For example, when a new product is added to a shopping cart, a trigger can re-calculate the new total with tax and shipping costs.
Local variables are only visible and can only be used inside a function, trigger, or stored procedure. Global variables are visible and accessible throughout the system.
The DELETE command removes one or more rows from a table.
The operation is logged in the transaction log, and, if necessary, can be rolled back (undone).
TRUNCATE removes all rows from a table.
This operation is not logged in the transaction log
and therefore cannot be rolled back. Compared to DELETE, TRUNCATE is extremely fast.
A data warehouse is a central repository with data from multiple sources. The data is consolidated, transformed, and made available for data analytics and data mining processing.
A set of rules that determine how character data can be sorted and compared. Different languages have different collation requirements.
T-SQL, or Transact-SQL, is a proprietory extension to SQL that is available in SQL Server and Sybase. T-SQL is a procedural programming language with support for local variables, conditionals, string processing, exception handling, and more. Stored procedures, for example, are written with T-SQL.
An OLTP solution manages day-to-day transactions, such as data entry, e-commerce, and credit card processing. Usually these are many, but short transactions. OLTP solutions are write intensive with INSERT, UPDATE, and DELETE statements.
An OLAP solution manages a large database with aggregated historical data from different sources. The data is commonly stored in a multi-dimensional data model (star schema). These systems are mostly used for data analytics and data mining purposes. OLAP solutions are read intensive using SELECT statements primarily.
SQL clauses are used to limit the size of the result set by providing conditions to a query. Examples are WHERE and HAVING clauses.
A UNION combines the results of two queries with duplicate rows eliminated.
For the UNION to work, the columns in the two queries must exactly match.
An alias is an alternative name that can be given to a table or a column.
Below is an example with both types:
SELECT ProductName AS Product, UnitPrice AS Price FROM Product P JOIN Supplier S ON S.Id = P.SupplierId
Note: Product and Price are column aliases. P and S are table aliases.
Aggregate functions perform mathematical calculations and return single values. They are commonly used in GROUP BY or HAVING clauses. Here are some examples:
AVG()
- calculates averageCOUNT()
- returns total number or rowsMIN()
- calculates minimum valueMAX()
- calculates maximum valueFIRST()
- returns the first returned valueLAST()
- returns the last returned valueScalar functions also return single values. Here are some commonly used examples:
LEN()
- returns the total length of the fieldUCASE()
- converts value to uppercaseLCASE()
- converts value to lowercaseMID()
- extracts substringCONCAT()
- concatenates stringsRAND()
- generates random numberROUND()
- rounds off numberNOW()
- returns current dateFORMAT()
- formats value to display
The GETDATE()
function returns the current system date and time.
ACID stands for Atomicity, Consistency, Isolation, Durability.
Together they ensure that data transactions
are processed consistently and reliably.
Atomicity
means the transactions are treated as a unit, that is, they either succeed or fail.
In other words, if one part in a transaction fails then the entire transaction fails and
the state of the database is left unchanged.
Consistency
ensures that the data meets all the validation rules.
In other words, all data are stored in a correct state.
Isolation
refers to concurrency control.
The transactions occur in isolation from each other.
Durability
means that once a transaction is committed (written to disk), it will not be affected by
power loss, crash, or any sort of error.
Operators are special keywords or characters reserved for performing
particular operations. The operations fall into 6 categories:
Arithmetic
Perform mathematical operations on numerical data.
They include addition (+), subtraction (-), multiplication (*), division (/),
remainder or modulus (%), and more.
Logical
Evaluate expressions and return a True or False values.
Examples include: ALL, AND, ANY, ISNULL, EXISTS, BETWEEN, IN, LIKE, NOT, OR, and UNIQUE.
Comparison
Perform comparisons between two values. Examples include equal to (=), not equal to (!= or <>),
less than (<), greater than (>
), less than or equal to (<=), greater than or equal to (>=),
not less than (!<), not greater than (!>
), and others.
Bitwise
Preform bit manipulations between two expressions of type integer.
Examples include AND (& symbol), OR (|, ^), NOT (~), and others.
Compound
Perform operations on a variable before setting the variable's result to the operation's result.
Examples include: add equals (+=), subtract equals (-=), multiply equals (*=), divide equals (/=),
modulo equals (%=), and others.
String
Perform concatenation and pattern matching of strings. Examples include: + (String concatenation),
+= (String concatenation assignment), % (Wildcard), [] (Character(s) matches),
[^] (Character(s) not to match), _ (Wildcard match one character), and others.
A NULL value represents a value that is unavailable, unknown, or not applicable. It is not zero, an empty string, or a blank space.
The BETWEEN operator checks for values that fall in a value range, whereas the IN operator checks for values contained in a specific set of values.
SELECT * FROM Customer WHERE Id BETWEEN 50 AND 100 SELECT * FROM Customer WHERE Id IN (52,60,126,175)
The LIKE operator is used for pattern matching and can be used together
with these wildcards:
% - matches zero or more characters.
SELECT * FROM Product WHERE ProductName LIKE 'C%'
_ (underscore) - matches exactly one character.
SELECT * FROM Product WHERE ProductName LIKE 'Hanniba_'
A SELECT statement retrieves data from a database. The results are returned in a tabular form called a result set.
SELECT * FROM Supplier
A database cursor is a control structure that allows for traversal of records in a database. Cursors also facilitate processing after traversal, such as retrieval, addition and deletion of database records. Cursors can be viewed as a pointer to one row in a set of rows.
The ISNULL()
function converts NULL values to another value.
The function returns the value of the second parameter if the first parameter is NULL.
If the first parameter is anything other than NULL, it is left unchanged.
SQL Injection is a hack attack that exploits a vulnerability in the application code which allows attackers to access back-end operations and retrieve, updated, and destroy database data. Malicious SQL statements are usually inserted into input fields which are then concatenated by the program into an SQL string. This kind of attack can be mitigated by using paramaterized queries.
The RANK function assigns each row with a rank within an ordered
partition in the result set. If 2 or more rows are assigned the same rank, then the next
number in the ranking will be its previous rank plus the number of duplicate numbers.
For example, if we have 3 records at rank 4, then the next rank would be 7.
The DENSE_RANK function assigns a unique rank for each row within a partition
without any gaps (i.e. dense). Rankings are always in
consecutive order. If 2 or more rows are assigned the same rank, then the next rank
will be the next sequential number. For example, if we have 3 records at rank 4, then the
next rank would be 5.
A deadlock is an unwanted situation where two or more transactions are waiting indefinitely for one another to release the locks.
Comments are explanations in SQL or T-SQL code that are only
visible and readable by programmers. They provide details about
the SQL statements to make them easier to understand.
Single line comments can be written by starting at two consecutive hyphens (--)
.
Multiline comments must start with /*
and ends with */
.