Whether you're a beginner or looking to polish your skills, this blog post breaks down essential SQL concepts with clear explanations and practical examples of potential SQL interview questions. Prepare to your next SQL interview!
So you’re gearing up for a SQL interview? You’ve come to the right place! Whether you’re a fresh graduate or a professional, this blog will walk you through the essential SQL concepts you need to know to impress your interviewer. We’ll tackle common questions, look into detailed explanations, and each topic with clear, practical examples. Let’s get started!
SQL stands for Structured Query Language. It’s the standard language for interacting with relational databases (databases where data is stored in tables with relationships between them). Think of it as the language you use to talk to the database and tell it what information you need.
Example:
SELECT name, age FROM employees WHERE department = 'Sales';
This SQL statement tells the database to find the names and age of all employees in the “Sales” department from the “Employees” table.
Just like spoken languages, SQL has variations called dialects. These dialects are similar in core functionality but differ in some features and syntax. Popular dialects include:
Knowing the specific dialect used by the company you’re interviewing for is beneficial.
A DBMS is a software application that you use to interact with a database. It allows you to:
Popular DBMS examples include MySQL, PostgreSQL, and Oracle.
Example:
CustomerID | Name | Address |
---|---|---|
1 | John Doe | 123 Main St |
2 | Jane Smith | 456 Oak Ave |
SQL commands are categorized into sublanguages based on their purpose:
DDL (Data Definition Language): Used for defining database structures. Key commands:
DML (Data Manipulation Language): Used for manipulating data within tables. Key commands:
DCL (Data Control Language): Manages user permissions and access control. Key commands:
TCL (Transaction Control Language): Manages database transactions (a group of SQL statements that should be executed together). Key commands:
Understanding these sublanguages and their key commands is fundamental for working with SQL.
Joins are used to combine data from multiple tables based on a shared field. JOINs are fundamental for querying relational databases effectively. They allow you to retrieve related data from multiple tables as if it were in a single table.
There are different types of joins:
Example Scenario: Imagine you have a database with two tables:
Example:
SELECT Customers.Name, Orders.OrderID, Orders.OrderDate
FROM Customers
INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID;
The most common type. It retrieves rows only when there is a match in both tables based on the specified join condition.
Example:
SELECT c.Name, o.OrderID, o.OrderDate
FROM Customers c
INNER JOIN Orders o ON c.CustomerID = o.CustomerID;
Retrieves all rows from the left table (the first table in the join) and the matching rows from the right table. If there’s no match, it fills in NULLs for the right table’s columns.
Example:
SELECT c.Name, o.OrderID
FROM Customers c
LEFT JOIN Orders o ON c.CustomerID = o.CustomerID;
Retrieves all rows from the right table and the matching rows from the left table. If there’s no match, it fills in NULLs for the left table’s columns.
Example:
SELECT c.Name, o.OrderID
FROM Customers c
RIGHT JOIN Orders o ON c.CustomerID = o.CustomerID;
Retrieves all rows from both tables. If there’s no match in one table, NULLs are used.
Example:
SELECT c.Name, o.OrderID
FROM Customers c
FULL JOIN Orders o ON c.CustomerID = o.CustomerID;
Sometimes, you need to compare or combine data within the same table. A self-join is used to join a table to itself, usually using aliases to distinguish the “two copies” of the table.
Example:
SELECT e.Name AS EmployeeName, m.Name AS ManagerName
FROM Employees e
INNER JOIN Employees m ON e.ManagerID = m.EmployeeID;
JOINs are often based on the relationships between primary keys (unique identifiers in a table) and foreign keys (columns in one table that reference a primary key in another table). Foreign keys establish connections between tables, making JOINs more meaningful and efficient.
A subquery is simply a query within another query. Subqueries are used to retrieve data that will be used in the main query. Think of it like asking a question within a question.
Types of Subqueries: Where and How They’re Used
Example:
Employees:
EmployeeID | Name | DepartmentID | Salary |
---|---|---|---|
1 | John | 1 | 50000 |
2 | Jane | 1 | 55000 |
3 | Bob | 2 | 60000 |
4 | Alice | 2 | 62000 |
Departments:
DepartmentID | Name |
---|---|
1 | HR |
2 | IT |
Correlated Subquery: Depends on the outer query and cannot be executed independently. It’s like asking “What is the average salary of employees in this department?” where “this” refers to a department specified in the outer query.
SELECT Name, Salary
FROM Employees
WHERE Salary > (SELECT AVG(Salary) FROM Employees);
Result:
Name | Salary |
---|---|
Bob | 60000 |
Alice | 62000 |
This finds employees whose salaries are higher than the average salary of all employees. The subquery (SELECT AVG(Salary) FROM Employees)
calculates the average salary first, and then the main query uses that result for comparison.
Return a single value. Used in WHERE and HAVING clauses, as well as within other expressions.
Example:
SELECT productName
FROM Products
WHERE price = (SELECT MAX(price) FROM Products);
Return a single row. Used in WHERE and HAVING clauses when comparing to a single row.
Example:
SELECT *
FROM Employees
WHERE salary = (SELECT salary FROM Employees WHERE EmployeeID = 101);
Return multiple rows and columns. Often used in FROM clauses as if they were tables.
Example:
SELECT *
FROM (SELECT CustomerID, SUM(OrderAmount) AS TotalSpent FROM Orders GROUP BY CustomerID) AS HighSpenders
WHERE TotalSpent > 500;
Non-Correlated Subquery: Can be executed independently of the outer query. It’s like asking “What is the highest salary in the company?” and then using that result in the main query.
SELECT e1.Name, e1.Salary, e1.DepartmentID
FROM Employees e1
WHERE e1.Salary = (
SELECT MAX(e2.Salary)
FROM Employees e2
WHERE e2.DepartmentID = e1.DepartmentID
);
Result:
Name | Salary | DepartmentID |
---|---|---|
Jane | 55000 | 1 |
Alice | 62000 | 2 |
Aggregate functions perform calculations on a set of values and return a single value as a result. They are essential for summarizing data and getting insights.
Common Aggregate Functions:
COUNT()
: Counts the number of rows.SUM()
: Calculates the sum of values in a column.AVG()
: Calculates the average of values in a column.MAX()
: Finds the highest value in a column.MIN()
: Finds the lowest value in a column.The GROUP BY clause is used with aggregate functions to group rows with the same values in one or more columns.
Example:
SELECT department, AVG(salary) AS average_salary
FROM employees
GROUP BY department;
GROUP BY
clause groups employees by their departments, and the AVG(Salary)
function calculates the average salary for each group.
Both WHERE
and HAVING
are used to filter data, but they work at different stages of a query:
WHERE
: Filters rows before grouping or aggregation. Think of it like selecting specific books from your library before organizing them by genre.HAVING
: Filters groups after grouping or aggregation. It’s like selecting specific genres after you’ve already organized the books.Example:
SELECT Department, AVG(Salary) AS AverageSalary
FROM Employees
WHERE HireDate > '2022-01-01'
GROUP BY Department
HAVING AVG(Salary) > 70000;
WHERE
clause filters employees based on their hire date, and the HAVING
clause filters departments based on their average salary.
The DISTINCT
keyword is used to eliminate duplicate rows from the result set of a SELECT
query. It’s like selecting only one copy of each book from your library, even if you have multiple copies.
Example:
SELECT DISTINCT Department
FROM Employees;
Stored procedures are like pre-written scripts that you can store in the database and execute whenever you need them. They can accept input parameters, perform complex operations, and return results.
Advantages:
Example:
CREATE PROCEDURE GetEmployeesByDepartment
@DepartmentName VARCHAR(50)
AS
BEGIN
SELECT EmployeeName, Salary
FROM Employees
WHERE Department = @DepartmentName;
END;
Window functions perform calculations across a set of rows related to the current row, without grouping the rows into a single output like aggregate functions. Think of it as looking at a “window” of data around each row to perform calculations.
Common Window Functions:
ROW_NUMBER()
: Assigns a unique sequential number to each row within a partition.RANK()
: Assigns a rank to each row within a partition, allowing for ties (same rank for equal values).DENSE_RANK()
: Assigns a rank to each row without gaps, even if there are ties.LAG()
: Accesses data from a previous row.LEAD()
: Accesses data from a subsequent row.How would you rank rows in a SQL query? Example:
SELECT EmployeeName, Salary,
RANK() OVER (PARTITION BY Department ORDER BY Salary DESC) AS RankInDepartment
FROM Employees;
RANK()
window function. The PARTITION BY
clause divides the data into partitions (departments), and the ORDER BY
clause determines the ranking order.
CTEs are named temporary result sets that you can define within a single query. Think of them like creating a temporary table that exists only for the duration of the query. They help break down complex queries into smaller, more manageable parts, improving readability and organization.
Example:
WITH TopSalaries AS (
SELECT EmployeeName, Salary
FROM Employees
ORDER BY Salary DESC
LIMIT 10
)
SELECT *
FROM TopSalaries
WHERE Salary = 60000;
Query optimization is crucial for ensuring that your SQL queries run efficiently, especially when dealing with large datasets.
Optimization Techniques:
SELECT *
: Select only the necessary columns instead of retrieving all columns.WHERE
Clause Wisely: Filter data early in the query to reduce the amount of data processed.Example:
-- Inefficient query
SELECT *
FROM Employees
WHERE DepartmentID = (SELECT DepartmentID FROM Departments WHERE DepartmentName = 'Sales');
-- Optimized query with join
SELECT e.EmployeeName, e.Salary
FROM Employees e
INNER JOIN Departments d ON e.DepartmentID = d.DepartmentID
WHERE d.DepartmentName = 'Sales';
Both UNION
and UNION ALL
combine the results of two or more SELECT
statements, but:
UNION
: Removes duplicate rows from the final result set.UNION ALL
: Includes all rows, including duplicates.Example:
SELECT EmployeeName FROM Employees
UNION
SELECT CustomerName FROM Customers;
What is the purpose of a UNION operator? How is it different from a JOIN?
Difference from JOIN:
The CASE statement in SQL is used for conditional logic within a query. It allows you to perform different actions or return different values based on certain conditions.
Example:
You want to categorize customers based on their total spending:
SELECT
CustomerID,
SUM(TotalAmount) AS TotalSpent,
CASE
WHEN SUM(TotalAmount) > 1000 THEN 'High Value'
WHEN SUM(TotalAmount) BETWEEN 500 AND 1000 THEN 'Medium Value'
ELSE 'Low Value'
END AS CustomerCategory
FROM Orders
GROUP BY CustomerID;
Explanation: The CASE statement checks the TotalSpent and assigns a CustomerCategory based on the specified conditions.
Triggers are special stored procedures that automatically execute when a specific event occurs in the database. They are commonly used for:
Example:
CREATE TRIGGER UpdateSalaryHistory
ON Employees
AFTER UPDATE
AS
BEGIN
IF UPDATE(Salary)
BEGIN
INSERT INTO SalaryHistory (EmployeeID, OldSalary, NewSalary, UpdateDate)
SELECT inserted.EmployeeID, deleted.Salary, inserted.Salary, GETDATE()
FROM inserted
INNER JOIN deleted ON inserted.EmployeeID = deleted.EmployeeID;
END;
END;
These operators are particularly useful in correlated subqueries to check if a related record exists in another table.
IN Operator: Comparing to Multiple Values
The IN operator checks if a value exists within a set of values returned by a subquery.
Example:
SELECT * FROM Employees
WHERE DepartmentID IN (SELECT DepartmentID FROM Departments WHERE Location = 'New York');
Feature | Data Warehouse | OLTP Database |
---|---|---|
Purpose | Analytical reporting | Transaction processing |
Data | Historical, aggregated | Current, transactional |
Queries | Complex, analytical | Simple, transactional |
Schema | Denormalized, star schema | Normalized, relational |
Performance | Optimized for read operations | Optimized for write operations |
A view is a virtual table that doesn’t physically store data but presents a customized perspective of data from one or more base tables.
CREATE VIEW EmployeeDepartmentView AS
SELECT EmployeeName, DepartmentID
FROM Employees;
You can then query the view:
SELECT * FROM EmployeeDepartmentView;
A materialized view is a pre-computed view that stores the results of a query as a physical table.
Database partitioning is a technique used to divide large tables or indexes into smaller, more manageable pieces called partitions.
Example:
CREATE TABLE Orders (
OrderID INT PRIMARY KEY,
OrderDate DATE,
-- ... other columns
)
PARTITION BY RANGE (YEAR(OrderDate)) (
PARTITION p0 VALUES LESS THAN (2022),
PARTITION p1 VALUES LESS THAN (2023),
PARTITION p2 VALUES LESS THAN (2024),
PARTITION p3 VALUES LESS THAN MAXVALUE
);
Example:
CREATE TABLE Customers (
CustomerID INT PRIMARY KEY,
Region VARCHAR(50),
-- ... other columns
)
PARTITION BY LIST (Region) (
PARTITION NorthAmerica VALUES ('USA', 'Canada', 'Mexico'),
PARTITION Europe VALUES ('UK', 'France', 'Germany'),
PARTITION Asia VALUES ('China', 'India', 'Japan')
);
Example:
CREATE TABLE Orders (
OrderID INT PRIMARY KEY,
-- ... other columns
)
PARTITION BY HASH (OrderID)
PARTITIONS 4;
Transactions are fundamental to maintaining data integrity. The ACID properties guarantee reliable transaction processing.
ACID Properties in Detail:
Stored procedures offer many advantages, but it’s essential to use them effectively. Here are some best practices for writing maintainable and efficient stored procedures.
Stored Procedure Best Practices:
Example:
CREATE PROCEDURE UpdateCustomerInfo
@CustomerID INT,
@NewEmail VARCHAR(100) = NULL, -- Optional parameter
@NewPhone VARCHAR(20) = NULL -- Optional parameter
AS
BEGIN
BEGIN TRY
UPDATE Customers
SET Email = ISNULL(@NewEmail, Email), -- Update only if provided
Phone = ISNULL(@NewPhone, Phone) -- Update only if provided
WHERE CustomerID = @CustomerID;
END TRY
BEGIN CATCH
-- Handle errors and log information
END CATCH;
END;
Indexing is crucial for performance, but choosing the right type of index and optimizing its use can be challenging.
Advanced Indexing Strategies:
Database security is paramount. Here are some key practices for safeguarding your SQL databases.
SQL Security Best Practices:
Normalization is a process of organizing data in a database to:
Normalization involves creating multiple tables with relationships between them, ensuring that each piece of information is stored only once.
Here are some SQL interview questions for practice:
Write a SQL query to find the employees who have a salary greater than the average salary of their department.
SELECT e.EmployeeID, e.EmployeeName, e.Salary
FROM Employees e
INNER JOIN (
SELECT DepartmentID, AVG(Salary) AS AvgSalary
FROM Employees
GROUP BY DepartmentID
) AS DeptAvg ON e.DepartmentID = DeptAvg.DepartmentID
WHERE e.Salary > DeptAvg.AvgSalary;
Explanation:
Given a table “Orders” with columns (OrderID, CustomerID, OrderDate, TotalAmount), write a query to find the top 3 customers who have spent the most in the last quarter.
SELECT CustomerID, SUM(TotalAmount) AS TotalSpent
FROM Orders
WHERE OrderDate >= DATE_SUB(CURDATE(), INTERVAL 3 MONTH)
GROUP BY CustomerID
ORDER BY TotalSpent DESC
LIMIT 3;
Explanation:
You have a table “Products” with columns (ProductID, ProductName, CategoryID) and a table “Categories” with columns (CategoryID, CategoryName). Write a query to list all products along with their category names, even if a product doesn’t have a category assigned.
SELECT p.ProductName, c.CategoryName
FROM Products p
LEFT JOIN Categories c ON p.CategoryID = c.CategoryID;
Explanation:
Write a SQL query to find the second highest salary in the “Employees” table.
SELECT MAX(Salary) AS SecondHighestSalary
FROM Employees
WHERE Salary < (SELECT MAX(Salary) FROM Employees);
Explanation:
Write a query to delete all records from the ‘Temporary’ table without dropping the table itself.
TRUNCATE TABLE Temporary;
Explanation:
You have a table called ‘Events’ with columns (EventID, EventDate, EventType). Write a query to display the number of events of each type that occurred in the last week.
SELECT EventType, COUNT(*) AS EventCount
FROM Events
WHERE EventDate >= DATE_SUB(CURDATE(), INTERVAL 1 WEEK)
GROUP BY EventType;
Explanation:
Write a SQL query to find all customers who have placed orders for at least two different products.
SELECT CustomerID
FROM Orders
GROUP BY CustomerID
HAVING COUNT(DISTINCT ProductID) >= 2;
Explanation:
You have a table “Employees” with a column “HireDate.” Write a query to calculate the number of years each employee has been working for the company.
SELECT
EmployeeID,
EmployeeName,
DATEDIFF(YEAR, HireDate, GETDATE()) AS YearsOfService
FROM Employees;
Explanation:
Given a table ‘Transactions’ with columns (TransactionID, CustomerID, Amount, TransactionDate), write a query to find the average transaction amount for each customer on weekends.
SELECT CustomerID, AVG(Amount) AS AverageWeekendTransaction
FROM Transactions
WHERE DATEPART(WEEKDAY, TransactionDate) IN (1, 7)
GROUP BY CustomerID;
Explanation:
Write a query to find the employees who were hired in the same month and year as their manager.
SELECT e.EmployeeID, e.EmployeeName
FROM Employees e
INNER JOIN Employees m ON e.ManagerID = m.EmployeeID
WHERE MONTH(e.HireDate) = MONTH(m.HireDate)
AND YEAR(e.HireDate) = YEAR(m.HireDate);
Explanation:
You have a table “Students” with columns (StudentID, FirstName, LastName). Write a query to display the full names of all students in reverse order (LastName, FirstName).
SELECT LastName + ', ' + FirstName AS FullName
FROM Students;
Explanation:
Write a query to update the field “status” in table “applications” from 0 to 1 where the app_id is greater than 1000.
UPDATE applications
SET status = 1
WHERE app_id > 1000;
Explanation:
You have a table ‘Logs’ with columns (LogID, UserID, Action, Timestamp). Write a query to find the most recent action performed by each user.
SELECT UserID, Action, Timestamp
FROM (
SELECT UserID, Action, Timestamp,
ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY Timestamp DESC) AS RowNum
FROM Logs
) AS RankedLogs
WHERE RowNum = 1;
Explanation:
Write a query to find the products that have never been ordered.
Tables:
SELECT p.ProductName
FROM Products p
LEFT JOIN Orders o ON p.ProductID = o.ProductID
WHERE o.OrderID IS NULL;
Explanation:
Write a query to find the customers who have placed orders in every month of the current year.
Table: Orders: (OrderID, CustomerID, OrderDate, …)
SELECT CustomerID
FROM Orders
WHERE YEAR(OrderDate) = YEAR(GETDATE())
GROUP BY CustomerID
HAVING COUNT(DISTINCT MONTH(OrderDate)) = MONTH(GETDATE());
Explanation:
You have a table “Employees” with columns (EmployeeID, EmployeeName, ManagerID). Write a query to find the employees who are not managers (i.e., their EmployeeID doesn’t appear in the ManagerID column).
-- Using LEFT JOIN
SELECT e.EmployeeID, e.EmployeeName
FROM Employees e
LEFT JOIN Employees m ON e.EmployeeID = m.ManagerID
WHERE m.EmployeeID IS NULL;
-- Using NOT EXISTS
SELECT EmployeeID, EmployeeName
FROM Employees e
WHERE NOT EXISTS (
SELECT 1
FROM Employees m
WHERE m.ManagerID = e.EmployeeID
);
Explanation:
Write a query to calculate the running total of order amounts for each customer, ordered by order date.
Table: Orders: (OrderID, CustomerID, OrderDate, TotalAmount)
SELECT
OrderID,
CustomerID,
OrderDate,
TotalAmount,
SUM(TotalAmount) OVER (PARTITION BY CustomerID ORDER BY OrderDate) AS RunningTotal
FROM Orders;
Explanation:
You have a table “Calls” with columns (CallID, StartTime, EndTime). Write a query to find the overlapping calls, where the start time of one call falls within the duration of another call.
SELECT c1.*
FROM Calls c1
INNER JOIN Calls c2 ON c1.CallID <> c2.CallID
WHERE c1.StartTime BETWEEN c2.StartTime AND c2.EndTime;
Explanation:
Write a query to find the average number of orders placed by customers each day of the week.
Table: Orders: (OrderID, CustomerID, OrderDate, …)
SELECT
DAYNAME(OrderDate) AS DayOfWeek,
AVG(OrderCount) AS AverageOrders
FROM (
SELECT OrderDate, COUNT(*) AS OrderCount
FROM Orders
GROUP BY OrderDate
) AS DailyOrders
GROUP BY DayOfWeek
ORDER BY DayOfWeek;
Explanation:
Write a query to find the customers who have placed the highest number of orders in each product category.
Tables:
WITH CustomerOrderCounts AS (
SELECT c.CustomerID, cat.CategoryName, COUNT(*) AS OrderCount,
RANK() OVER (PARTITION BY cat.CategoryName ORDER BY COUNT(*) DESC) AS RankInCategory
FROM Customers c
JOIN Orders o ON c.CustomerID = o.CustomerID
JOIN Products p ON o.ProductID = p.ProductID
JOIN Categories cat ON p.CategoryID = cat.CategoryID
GROUP BY c.CustomerID, cat.CategoryName
)
SELECT CustomerID, CategoryName, OrderCount
FROM CustomerOrderCounts
WHERE RankInCategory = 1;
Explanation:
Write a query to find the missing values in a sequence of numbers (e.g., order IDs, invoice numbers).
Table: Orders: (OrderID, …)
-- Assuming OrderID is an integer sequence
WITH ExpectedOrderIDs AS (
SELECT generate_series(MIN(OrderID), MAX(OrderID)) AS ExpectedID
FROM Orders
)
SELECT ExpectedID
FROM ExpectedOrderIDs
EXCEPT
SELECT OrderID
FROM Orders;
Explanation:
You have a table “Employees” with a “Salary” column. Write a query to distribute the employees into three salary groups (Low, Medium,, High) based on equal-sized salary ranges.
This question requires dividing data into quantiles.
SELECT EmployeeID, EmployeeName, Salary,
CASE
WHEN Salary <= (SELECT PERCENTILE_CONT(0.33) WITHIN GROUP (ORDER BY Salary) FROM Employees) THEN 'Low'
WHEN Salary <= (SELECT PERCENTILE_CONT(0.66) WITHIN GROUP (ORDER BY Salary) FROM Employees) THEN 'Medium'
ELSE 'High'
END AS SalaryGroup
FROM Employees;
Explanation:
This question combines time manipulation, grouping, and aggregation.
Table: Calls: (CallID, StartTime, …)
SELECT
EXTRACT(HOUR FROM StartTime) AS CallHour,
COUNT(*) AS CallCount
FROM Calls
GROUP BY CallHour
ORDER BY CallCount DESC
LIMIT 1;
Explanation:
This question requires self-joining the table and comparing manager IDs.
Table: Employees: (EmployeeID, EmployeeName, ManagerID)
SELECT e1.EmployeeName, e2.EmployeeName
FROM Employees e1
INNER JOIN Employees e2 ON e1.ManagerID = e2.ManagerID AND e1.EmployeeID < e2.EmployeeID;
Explanation:
The key to acing SQL interview questions is consistent practice. Work through various SQL problems, experiment with different query approaches, and analyze the results.
Online SQL Practice Platforms:
Sample Databases:
Real-World Datasets: