SQL Interview Quick Guide: From Basics to Interview Ready

Whether you're a beginner or looking to polish your skills, this blog post breaks down essential SQL concepts with clear explanations and practical examples of potential SQL interview questions. Prepare to your next SQL interview!

SQL Interview Quick Guide: From Basics to Interview Ready

SQL Interview Preparation: Ace Your Next Interview

So you’re gearing up for a SQL interview? You’ve come to the right place! Whether you’re a fresh graduate or a professional, this blog will walk you through the essential SQL concepts you need to know to impress your interviewer. We’ll tackle common questions, look into detailed explanations, and each topic with clear, practical examples. Let’s get started!

 

1. What is SQL?

SQL stands for Structured Query Language. It’s the standard language for interacting with relational databases (databases where data is stored in tables with relationships between them). Think of it as the language you use to talk to the database and tell it what information you need.

  • Create and manage databases: Imagine building a library to store all your data.
  • Add new data (INSERT): Like adding new books to your library.
  • Update existing data (UPDATE): Correcting outdated information or adding new details.
  • Delete data (DELETE): Removing books you no longer need.
  • Retrieve specific data (SELECT): Finding the exact information you’re looking for.

Example:

SQL
SELECT name, age FROM employees WHERE department = 'Sales';

This SQL statement tells the database to find the names and age of all employees in the “Sales” department from the “Employees” table.

 

2. What are SQL Dialects?

Just like spoken languages, SQL has variations called dialects. These dialects are similar in core functionality but differ in some features and syntax. Popular dialects include:

  • MySQL: Open source, known for its speed and ease of use.
  • PostgreSQL: Open source, robust and feature-rich.
  • Microsoft SQL Server: Microsoft’s proprietary database system.
  • Oracle: Powerful enterprise-level database.

Knowing the specific dialect used by the company you’re interviewing for is beneficial.

 

3. What is a Database Management System (DBMS)?

A DBMS is a software application that you use to interact with a database. It allows you to:

  • Create and manage databases.
  • Define data structures (tables).
  • Insert, update, delete, and retrieve data.

Popular DBMS examples include MySQL, PostgreSQL, and Oracle.

 

4. What are Tables and Fields in SQL?

  • Table: An organized collection of data stored in rows and columns. Think of it like a spreadsheet.
  • Field (Column): Represents a specific type of information in a table. For instance, a “Customer” table might have fields like “CustomerID,” “Name,” “Address,” etc.
  • Record (Row): A single entry in a table. For example, a single customer’s information would be one record in the “Customer” table.

Example:

CustomerID Name Address
1 John Doe 123 Main St
2 Jane Smith 456 Oak Ave

 

5. What are the different subsets/Sublanguages of SQL?

SQL commands are categorized into sublanguages based on their purpose:

  • DDL (Data Definition Language): Used for defining database structures. Key commands:

    • CREATE TABLE: Creates a new table.
    • ALTER TABLE: Modifies an existing table.
    • DROP TABLE: Deletes a table.
    • TRUNCATE: Used to delete all records from a table while preserving its structure
    • RENAME: Used to rename a database object
  • DML (Data Manipulation Language): Used for manipulating data within tables. Key commands:

    • INSERT: Adds new data.
    • UPDATE: Modifies existing data.
    • DELETE: Removes data.
    • SELECT: Retrieves data (we’ll explore this in detail later).
  • DCL (Data Control Language): Manages user permissions and access control. Key commands:

    • GRANT: Gives permissions to users.
    • REVOKE: Takes away permissions.
  • TCL (Transaction Control Language): Manages database transactions (a group of SQL statements that should be executed together). Key commands:

    • COMMIT: Saves changes permanently.
    • ROLLBACK: Undoes changes made during a transaction.

Understanding these sublanguages and their key commands is fundamental for working with SQL.

 

6. What are Joins in SQL?

Joins are used to combine data from multiple tables based on a shared field. JOINs are fundamental for querying relational databases effectively. They allow you to retrieve related data from multiple tables as if it were in a single table.

There are different types of joins:

  • Inner Join: Returns rows where the shared field matches in both tables.
  • Left Join: Returns all rows from the left table and matching rows from the right table.
  • Right Join: Returns all rows from the right table and matching rows from the left table.
  • Full Join: Returns all rows from both tables, even if there’s no match in the shared field.

Example Scenario: Imagine you have a database with two tables:

  • Customers: Contains customer information (CustomerID, Name, Email, etc.)
  • Orders: Contains order information (OrderID, CustomerID, OrderDate, etc.)

Example:

SQL
SELECT Customers.Name, Orders.OrderID, Orders.OrderDate
FROM Customers
INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID;

 

Choosing the Right Join

INNER JOIN

The most common type. It retrieves rows only when there is a match in both tables based on the specified join condition.

Example:

SQL
SELECT c.Name, o.OrderID, o.OrderDate
FROM Customers c
INNER JOIN Orders o ON c.CustomerID = o.CustomerID;

 

LEFT JOIN (or LEFT OUTER JOIN)

Retrieves all rows from the left table (the first table in the join) and the matching rows from the right table. If there’s no match, it fills in NULLs for the right table’s columns.

Example:

SQL
SELECT c.Name, o.OrderID
FROM Customers c
LEFT JOIN Orders o ON c.CustomerID = o.CustomerID;
 

RIGHT JOIN (or RIGHT OUTER JOIN)

Retrieves all rows from the right table and the matching rows from the left table. If there’s no match, it fills in NULLs for the left table’s columns.

Example:

SQL
SELECT c.Name, o.OrderID
FROM Customers c
RIGHT JOIN Orders o ON c.CustomerID = o.CustomerID;
 

FULL JOIN (or FULL OUTER JOIN)

Retrieves all rows from both tables. If there’s no match in one table, NULLs are used.

Example:

SQL
SELECT c.Name, o.OrderID
FROM Customers c
FULL JOIN Orders o ON c.CustomerID = o.CustomerID;

 

Self-JOIN: Joining a Table to Itself

Sometimes, you need to compare or combine data within the same table. A self-join is used to join a table to itself, usually using aliases to distinguish the “two copies” of the table.

Example:

SQL
SELECT e.Name AS EmployeeName, m.Name AS ManagerName
FROM Employees e
INNER JOIN Employees m ON e.ManagerID = m.EmployeeID;

 

Extra Tip:

Understanding Primary and Foreign Keys in JOINs

JOINs are often based on the relationships between primary keys (unique identifiers in a table) and foreign keys (columns in one table that reference a primary key in another table). Foreign keys establish connections between tables, making JOINs more meaningful and efficient.

 

7. What is a Subquery?

A subquery is simply a query within another query. Subqueries are used to retrieve data that will be used in the main query. Think of it like asking a question within a question.

Types of Subqueries: Where and How They’re Used

  • Non-Correlated: Can run independently of the outer query.
  • Correlated: Depends on the outer query to run.

Example:

Employees:

EmployeeID Name DepartmentID Salary
1 John 1 50000
2 Jane 1 55000
3 Bob 2 60000
4 Alice 2 62000

Departments:

DepartmentID Name
1 HR
2 IT

Correlated Subquery: Depends on the outer query and cannot be executed independently. It’s like asking “What is the average salary of employees in this department?” where “this” refers to a department specified in the outer query.

SQL
SELECT Name, Salary
FROM Employees
WHERE Salary > (SELECT AVG(Salary) FROM Employees);

Result:

Name Salary
Bob 60000
Alice 62000

This finds employees whose salaries are higher than the average salary of all employees. The subquery (SELECT AVG(Salary) FROM Employees) calculates the average salary first, and then the main query uses that result for comparison.

  • Scalar Subqueries

Return a single value. Used in WHERE and HAVING clauses, as well as within other expressions.

Example:

SQL
SELECT productName 
FROM Products
WHERE price = (SELECT MAX(price) FROM Products);

  • Row Subqueries

Return a single row. Used in WHERE and HAVING clauses when comparing to a single row.

Example:

SQL
SELECT * 
FROM Employees 
WHERE salary = (SELECT salary FROM Employees WHERE EmployeeID = 101);

  • Table Subqueries

Return multiple rows and columns. Often used in FROM clauses as if they were tables.

Example:

SQL
SELECT * 
FROM (SELECT CustomerID, SUM(OrderAmount) AS TotalSpent FROM Orders GROUP BY CustomerID) AS HighSpenders
WHERE TotalSpent > 500;

Non-Correlated Subquery: Can be executed independently of the outer query. It’s like asking “What is the highest salary in the company?” and then using that result in the main query.

SQL
SELECT e1.Name, e1.Salary, e1.DepartmentID
FROM Employees e1
WHERE e1.Salary = (
    SELECT MAX(e2.Salary)
    FROM Employees e2
    WHERE e2.DepartmentID = e1.DepartmentID
);

Result:

Name Salary DepartmentID
Jane 55000 1
Alice 62000 2

 

8. What are Aggregate Functions?

Aggregate functions perform calculations on a set of values and return a single value as a result. They are essential for summarizing data and getting insights.

Common Aggregate Functions:

  • COUNT(): Counts the number of rows.
  • SUM(): Calculates the sum of values in a column.
  • AVG(): Calculates the average of values in a column.
  • MAX(): Finds the highest value in a column.
  • MIN(): Finds the lowest value in a column.

GROUP BY Clause: Grouping Data for Aggregation

The GROUP BY clause is used with aggregate functions to group rows with the same values in one or more columns.

Example:

SQL
SELECT department, AVG(salary) AS average_salary 
FROM employees
GROUP BY department;
This calculates the average salary for each department. The GROUP BY clause groups employees by their departments, and the AVG(Salary) function calculates the average salary for each group.

 

9. What is the Difference between WHERE and HAVING?

Both WHERE and HAVING are used to filter data, but they work at different stages of a query:

  • WHERE: Filters rows before grouping or aggregation. Think of it like selecting specific books from your library before organizing them by genre.
  • HAVING: Filters groups after grouping or aggregation. It’s like selecting specific genres after you’ve already organized the books.

Example:

SQL
SELECT Department, AVG(Salary) AS AverageSalary
FROM Employees
WHERE HireDate > '2022-01-01'
GROUP BY Department
HAVING AVG(Salary) > 70000;
This finds departments where the average salary of employees hired after January 1, 2022, is greater than $70,000. The WHERE clause filters employees based on their hire date, and the HAVING clause filters departments based on their average salary.

 

10. What is the Purpose of DISTINCT?

The DISTINCT keyword is used to eliminate duplicate rows from the result set of a SELECT query. It’s like selecting only one copy of each book from your library, even if you have multiple copies.

Example:

SQL
SELECT DISTINCT Department
FROM Employees;
This returns a list of all the unique departments in the “Employees” table, without repeating any department names.

 

11. What are Stored Procedures?

Stored procedures are like pre-written scripts that you can store in the database and execute whenever you need them. They can accept input parameters, perform complex operations, and return results.

Advantages:

  • Reusability: You can use the same stored procedure multiple times without rewriting the code.
  • Performance: Stored procedures are pre-compiled and optimized, so they execute faster.
  • Security: You can control access to data through stored procedures, enhancing database security.

Example:

SQL
CREATE PROCEDURE GetEmployeesByDepartment 
    @DepartmentName VARCHAR(50)
AS
BEGIN
    SELECT EmployeeName, Salary
    FROM Employees
    WHERE Department = @DepartmentName;
END;
This creates a stored procedure called “GetEmployeesByDepartment” that takes a department name as input and returns the names and salaries of employees in that department.

 

12. What are Window Functions?

Window functions perform calculations across a set of rows related to the current row, without grouping the rows into a single output like aggregate functions. Think of it as looking at a “window” of data around each row to perform calculations.

Common Window Functions:

  • ROW_NUMBER(): Assigns a unique sequential number to each row within a partition.
  • RANK(): Assigns a rank to each row within a partition, allowing for ties (same rank for equal values).
  • DENSE_RANK(): Assigns a rank to each row without gaps, even if there are ties.
  • LAG(): Accesses data from a previous row.
  • LEAD(): Accesses data from a subsequent row.

How would you rank rows in a SQL query? Example:

SQL
SELECT EmployeeName, Salary,
       RANK() OVER (PARTITION BY Department ORDER BY Salary DESC) AS RankInDepartment
FROM Employees;
This query ranks employees within each department based on their salary, using the RANK() window function. The PARTITION BY clause divides the data into partitions (departments), and the ORDER BY clause determines the ranking order.

 

13. What is the Purpose of Common Table Expressions (CTEs)?

CTEs are named temporary result sets that you can define within a single query. Think of them like creating a temporary table that exists only for the duration of the query. They help break down complex queries into smaller, more manageable parts, improving readability and organization.

Example:

SQL
WITH TopSalaries AS (
    SELECT EmployeeName, Salary
    FROM Employees
    ORDER BY Salary DESC
    LIMIT 10
)
SELECT * 
FROM TopSalaries
WHERE Salary = 60000;
This CTE, named “TopSalaries,” retrieves the top 10 highest-paid employees. The main query then selects from this CTE, filtering for employees in the “Marketing” department.

 

14. How Can You Optimize SQL Queries for Better Performance?

Query optimization is crucial for ensuring that your SQL queries run efficiently, especially when dealing with large datasets.

Optimization Techniques:

  • Use Indexes: Indexes speed up data retrieval by creating a lookup table.
  • Avoid SELECT *: Select only the necessary columns instead of retrieving all columns.
  • Use WHERE Clause Wisely: Filter data early in the query to reduce the amount of data processed.
  • Optimize Joins: Choose the appropriate join type and ensure that join conditions are based on indexed columns.
  • Use Stored Procedures: Pre-compiled and optimized stored procedures can execute faster than ad-hoc queries.
  • Analyze Query Plans: Use database tools to analyze the execution plan of your query and identify potential bottlenecks.

Example:

SQL
-- Inefficient query
SELECT * 
FROM Employees
WHERE DepartmentID = (SELECT DepartmentID FROM Departments WHERE DepartmentName = 'Sales'); 

-- Optimized query with join
SELECT e.EmployeeName, e.Salary
FROM Employees e
INNER JOIN Departments d ON e.DepartmentID = d.DepartmentID
WHERE d.DepartmentName = 'Sales';
The optimized query uses a join instead of a subquery, which can often be more efficient.

 

15. What is the Difference Between UNION and UNION ALL?

Both UNION and UNION ALL combine the results of two or more SELECT statements, but:

  • UNION: Removes duplicate rows from the final result set.
  • UNION ALL: Includes all rows, including duplicates.

Example:

SQL
SELECT EmployeeName FROM Employees
UNION
SELECT CustomerName FROM Customers;
This returns a list of unique names from both the “Employees” and “Customers” tables.

What is the purpose of a UNION operator? How is it different from a JOIN?

  • The UNION operator is used to combine the results of two or more SELECT statements into a single result set. It’s important to note that UNION removes duplicate rows from the final output.

Difference from JOIN:

  • JOIN: Combines columns from different tables based on a shared column.
  • UNION: Combines rows from different SELECT statements.

 

16. What is the purpose of the CASE statement in SQL? Provide an example.

The CASE statement in SQL is used for conditional logic within a query. It allows you to perform different actions or return different values based on certain conditions.

Example:

You want to categorize customers based on their total spending:

SQL
SELECT 
    CustomerID,
    SUM(TotalAmount) AS TotalSpent,
    CASE
        WHEN SUM(TotalAmount) > 1000 THEN 'High Value'
        WHEN SUM(TotalAmount) BETWEEN 500 AND 1000 THEN 'Medium Value'
        ELSE 'Low Value'
    END AS CustomerCategory
FROM Orders
GROUP BY CustomerID;

Explanation: The CASE statement checks the TotalSpent and assigns a CustomerCategory based on the specified conditions.

 

17. What are Triggers in SQL?

Triggers are special stored procedures that automatically execute when a specific event occurs in the database. They are commonly used for:

  • Maintaining data integrity: Enforcing business rules or preventing invalid data changes.
  • Auditing: Tracking changes made to data.
  • Performing actions based on data changes: For example, updating a related table when a row is inserted into another table.

Example:

SQL
CREATE TRIGGER UpdateSalaryHistory
ON Employees
AFTER UPDATE
AS
BEGIN
    IF UPDATE(Salary)
    BEGIN
        INSERT INTO SalaryHistory (EmployeeID, OldSalary, NewSalary, UpdateDate)
        SELECT  inserted.EmployeeID, deleted.Salary, inserted.Salary, GETDATE()
        FROM inserted
        INNER JOIN deleted ON inserted.EmployeeID = deleted.EmployeeID;
    END;
END;
This trigger “UpdateSalaryHistory” automatically inserts a record into the “SalaryHistory” table whenever an employee’s salary is updated.

 

18. EXISTS and NOT EXISTS Operators: Checking for Existence

  • EXISTS: Returns true if the subquery returns at least one row.
  • NOT EXISTS: Returns true if the subquery returns no rows.

These operators are particularly useful in correlated subqueries to check if a related record exists in another table.

IN Operator: Comparing to Multiple Values

The IN operator checks if a value exists within a set of values returned by a subquery.

Example:

SQL
SELECT * FROM Employees
WHERE DepartmentID IN (SELECT DepartmentID FROM Departments WHERE Location = 'New York');

 

19. Describe the concept of data warehousing and how it differs from traditional online transaction processing (OLTP) databases?

Data Warehousing:

  • A system designed for analytical and reporting purposes.
  • Stores historical data from various sources.
  • Optimized for complex queries and data aggregations.

Online Transaction Processing (OLTP):

  • Focuses on handling day-to-day business transactions.
  • Optimized for fast data inserts, updates, and retrievals.
  • Emphasizes data integrity and concurrency control.

Key Differences:

Feature Data Warehouse OLTP Database
Purpose Analytical reporting Transaction processing
Data Historical, aggregated Current, transactional
Queries Complex, analytical Simple, transactional
Schema Denormalized, star schema Normalized, relational
Performance Optimized for read operations Optimized for write operations

 

20. What are Views in Databases?

A view is a virtual table that doesn’t physically store data but presents a customized perspective of data from one or more base tables.

Key Points:

  • Virtual Table: A view doesn’t hold data itself; it retrieves data from the underlying tables when queried.
  • Customized Perspective: Views allow you to select specific columns, filter data, join tables, and even use aggregate functions to create a tailored view of the data.
  • Security: Views can be used to restrict access to sensitive data by exposing only specific columns or rows to certain users.

Example:

SQL
CREATE VIEW EmployeeDepartmentView AS
SELECT EmployeeName, DepartmentID
FROM Employees;

You can then query the view:

SQL
SELECT * FROM EmployeeDepartmentView;

Types of Views:

  • Simple View: Based on a single table.
  • Complex View: Involves multiple tables, joins, subqueries, or aggregate functions.
  • Materialized View: Physically stores the view’s data for faster access.

 

21. Explain about Materialized Views?

A materialized view is a pre-computed view that stores the results of a query as a physical table.

Advantages:

  • Improved Query Performance: Data is readily available, eliminating the need to execute the underlying query each time.
  • Reduced Query Complexity: Complex queries can be pre-computed and stored in the materialized view, simplifying subsequent queries.
  • Data Aggregation: Materialized views can be used to pre-aggregate data, making reporting and analysis faster.

Drawbacks:

  • Data Staleness: Materialized views can become outdated if the underlying data changes. Refresh mechanisms are needed to keep them synchronized.
  • Storage Overhead: They require additional storage space to store the pre-computed results.

 

22. Describe the concept of database partitioning and its benefits?

Database partitioning is a technique used to divide large tables or indexes into smaller, more manageable pieces called partitions.

Key Aspects:

  • Data Distribution: Partitions spread data across multiple storage units or servers.
  • Logical Division: Each partition is a subset of the data, typically based on a specific criterion.

Types of Partitioning:

1. Range Partitioning:

  • Divides data based on a range of values in a particular column (often a date or numeric column).
  • Common use cases: Archiving historical data, separating data by date ranges (e.g., monthly sales data).

Example:

SQL
CREATE TABLE Orders (
    OrderID INT PRIMARY KEY,
    OrderDate DATE,
    -- ... other columns
)
PARTITION BY RANGE (YEAR(OrderDate)) (
    PARTITION p0 VALUES LESS THAN (2022),
    PARTITION p1 VALUES LESS THAN (2023),
    PARTITION p2 VALUES LESS THAN (2024),
    PARTITION p3 VALUES LESS THAN MAXVALUE
);

2. List Partitioning:

  • Partitions data based on a list of specific values in a column.
  • Useful for categorizing data based on discrete values (e.g., regions, departments).

Example:

SQL
CREATE TABLE Customers (
    CustomerID INT PRIMARY KEY,
    Region VARCHAR(50),
    -- ... other columns
)
PARTITION BY LIST (Region) (
    PARTITION NorthAmerica VALUES ('USA', 'Canada', 'Mexico'),
    PARTITION Europe VALUES ('UK', 'France', 'Germany'),
    PARTITION Asia VALUES ('China', 'India', 'Japan')
);

3. Hash Partitioning:

  • Distributes data based on a hash function applied to a partitioning key.
  • Useful for evenly spreading data across partitions, especially when there’s no natural range or list to partition by.

Example:

SQL
CREATE TABLE Orders (
    OrderID INT PRIMARY KEY,
    -- ... other columns
)
PARTITION BY HASH (OrderID)
PARTITIONS 4;

Benefits of Partitioning:

  • Improved Query Performance: Queries that target specific partitions can run significantly faster.
  • Enhanced Manageability: Easier to perform maintenance tasks (backups, index rebuilds) on smaller partitions.
  • Increased Availability: Partitions can be distributed across multiple storage devices, improving data availability.

 

23. Transactions: Ensuring Data Consistency

Transactions are fundamental to maintaining data integrity. The ACID properties guarantee reliable transaction processing.

ACID Properties in Detail:

  • Atomicity: Ensures that a transaction is treated as a single, indivisible unit of work. Either all changes within the transaction are applied, or none are.
  • Consistency: Guarantees that a transaction brings the database from one valid state to another, preserving data integrity rules.
  • Isolation: Prevents concurrent transactions from interfering with each other. Changes made by one transaction are not visible to other transactions until the transaction is committed.
  • Durability: Ensures that once a transaction is committed, the changes are permanently stored and will survive system failures.

 

24. Stored Procedure Best Practices

Stored procedures offer many advantages, but it’s essential to use them effectively. Here are some best practices for writing maintainable and efficient stored procedures.

Stored Procedure Best Practices:

  • Modular Design: Break down complex logic into smaller, reusable procedures.
  • Parameterization: Use parameters to pass input values, improving security and preventing SQL injection.
  • Error Handling: Implement error handling using TRY…CATCH blocks to gracefully manage exceptions.
  • Naming Conventions: Use clear and consistent naming conventions for procedures and parameters.
  • Documentation: Document the purpose, parameters, and expected behavior of stored procedures.
  • Version Control: Track changes to stored procedures using version control systems.

Example:

SQL
CREATE PROCEDURE UpdateCustomerInfo 
    @CustomerID INT,
    @NewEmail VARCHAR(100) = NULL, -- Optional parameter
    @NewPhone VARCHAR(20) = NULL  -- Optional parameter
AS
BEGIN
    BEGIN TRY
        UPDATE Customers
        SET Email = ISNULL(@NewEmail, Email), -- Update only if provided
            Phone = ISNULL(@NewPhone, Phone)  -- Update only if provided
        WHERE CustomerID = @CustomerID;
    END TRY
    BEGIN CATCH
        -- Handle errors and log information
    END CATCH;
END;
This stored procedure updates customer information, allowing for optional parameters and error handling.

 

25. Indexing Strategies: Beyond the Basics

Indexing is crucial for performance, but choosing the right type of index and optimizing its use can be challenging.

Advanced Indexing Strategies:

  • Composite Indexes: Create indexes on multiple columns to optimize queries that filter on those columns together.
  • Covering Indexes: Include all columns needed for a query in the index to avoid accessing the actual table.
  • Filtered Indexes: Create indexes on a subset of data based on specific criteria.
  • Index Usage Statistics: Monitor index usage to identify unused or inefficient indexes.
  • Index Fragmentation: Manage index fragmentation to maintain performance.

 

26. Security Considerations: Protecting Your Data

Database security is paramount. Here are some key practices for safeguarding your SQL databases.

SQL Security Best Practices:

  • Least Privilege Principle: Grant users only the necessary permissions to perform their tasks.
  • Strong Passwords and Authentication: Enforce strong password policies and use secure authentication mechanisms.
  • Data Encryption: Encrypt sensitive data both in transit and at rest.
  • Regular Auditing: Track database activity and monitor for suspicious behavior.
  • Input Validation: Validate and sanitize user input to prevent SQL injection attacks.

 

27. What is the purpose of normalization in database design?

Normalization is a process of organizing data in a database to:

  • Minimize data redundancy: Reduce duplicated data.
  • Improve data integrity: Ensure data consistency and accuracy.
  • Optimize data storage: Use space more efficiently.

Normalization involves creating multiple tables with relationships between them, ensuring that each piece of information is stored only once.

More about Normalization

 

SQL Interview: Code-Focused Challenges

Here are some SQL interview questions for practice:

1. Find employees with above-average department salary

Write a SQL query to find the employees who have a salary greater than the average salary of their department.

SQL
SELECT e.EmployeeID, e.EmployeeName, e.Salary
FROM Employees e
INNER JOIN (
    SELECT DepartmentID, AVG(Salary) AS AvgSalary
    FROM Employees
    GROUP BY DepartmentID
) AS DeptAvg ON e.DepartmentID = DeptAvg.DepartmentID
WHERE e.Salary > DeptAvg.AvgSalary;

Explanation:

  • Subquery (DeptAvg): Calculates the average salary for each department using AVG(Salary) and groups them by DepartmentID.
  • Join: Joins the Employees table with the DeptAvg subquery results based on DepartmentID.
  • Filter: Selects employees whose salary (e.Salary) is greater than the average salary of their department (DeptAvg.AvgSalary).

 

2. Top 3 customers by spending in the last quarter

Given a table “Orders” with columns (OrderID, CustomerID, OrderDate, TotalAmount), write a query to find the top 3 customers who have spent the most in the last quarter.

SQL
SELECT CustomerID, SUM(TotalAmount) AS TotalSpent
FROM Orders
WHERE OrderDate >= DATE_SUB(CURDATE(), INTERVAL 3 MONTH)
GROUP BY CustomerID
ORDER BY TotalSpent DESC
LIMIT 3;

Explanation:

  • Filter Orders: Selects orders from the last quarter using DATE_SUB to calculate the date three months ago.
  • Group by Customer: Groups orders by CustomerID.
  • Sum Total Amount: Calculates the total amount spent by each customer using SUM(TotalAmount).
  • Order and Limit: Orders the results by TotalSpent in descending order and retrieves the top 3 using LIMIT.

 

3. List products with category names (including products without categories)

You have a table “Products” with columns (ProductID, ProductName, CategoryID) and a table “Categories” with columns (CategoryID, CategoryName). Write a query to list all products along with their category names, even if a product doesn’t have a category assigned.

SQL
SELECT p.ProductName, c.CategoryName
FROM Products p
LEFT JOIN Categories c ON p.CategoryID = c.CategoryID;

Explanation:

  • Left Join: A left join ensures that all products from the “Products” table are included in the result set.
  • Join Condition: Joins based on the shared CategoryID column.
  • Missing Categories: If a product has no matching category, the CategoryName column will be NULL.

 

4. Find the second highest salary

Write a SQL query to find the second highest salary in the “Employees” table.

SQL
SELECT MAX(Salary) AS SecondHighestSalary
FROM Employees
WHERE Salary < (SELECT MAX(Salary) FROM Employees);

Explanation:

  • Subquery: Finds the highest salary in the table.
  • Outer Query: Selects the maximum salary (MAX(Salary)) from the “Employees” table, excluding the highest salary identified by the subquery.

 

5. Delete all records from a table without dropping it

Write a query to delete all records from the ‘Temporary’ table without dropping the table itself.

SQL
TRUNCATE TABLE Temporary;

Explanation:

  • TRUNCATE is a DDL (Data Definition Language) command that quickly removes all rows from a table without logging individual deletions, making it faster than DELETE for clearing an entire table.

 

6. Count events by type in the last week

You have a table called ‘Events’ with columns (EventID, EventDate, EventType). Write a query to display the number of events of each type that occurred in the last week.

SQL
SELECT EventType, COUNT(*) AS EventCount
FROM Events
WHERE EventDate >= DATE_SUB(CURDATE(), INTERVAL 1 WEEK) 
GROUP BY EventType;

Explanation:

  • Date Filtering: Selects events from the last week using DATE_SUB to calculate the date one week ago.
  • Group by EventType: Groups events by EventType.
  • Count Events: Counts the number of events for each type using COUNT(*).

 

7. Find customers who ordered at least two different products

Write a SQL query to find all customers who have placed orders for at least two different products.

SQL
SELECT CustomerID
FROM Orders
GROUP BY CustomerID
HAVING COUNT(DISTINCT ProductID) >= 2;

Explanation:

  • Group by Customer: Groups orders by CustomerID.
  • Count Distinct Products: Counts the number of distinct products ordered by each customer using COUNT(DISTINCT ProductID).
  • Filter: Selects customers who have ordered at least two different products (>= 2).

 

8. Calculate years of service for each employee

You have a table “Employees” with a column “HireDate.” Write a query to calculate the number of years each employee has been working for the company.

SQL
SELECT 
    EmployeeID, 
    EmployeeName,
    DATEDIFF(YEAR, HireDate, GETDATE()) AS YearsOfService
FROM Employees;

Explanation:

  • DATEDIFF(YEAR, HireDate, GETDATE()) calculates the difference in years between the HireDate and the current date (GETDATE()).

 

9. Average weekend transaction amount per customer

Given a table ‘Transactions’ with columns (TransactionID, CustomerID, Amount, TransactionDate), write a query to find the average transaction amount for each customer on weekends.

SQL
SELECT CustomerID, AVG(Amount) AS AverageWeekendTransaction
FROM Transactions
WHERE DATEPART(WEEKDAY, TransactionDate) IN (1, 7)
GROUP BY CustomerID;

Explanation:

  • Filter by Weekends: Selects transactions that occurred on weekends using DATEPART(WEEKDAY, TransactionDate) and checking for Saturday (7) or Sunday (1).
  • Group by Customer: Groups transactions by CustomerID.
  • Calculate Average: Calculates the average transaction amount (AVG(Amount)) for each customer.

 

10. Find employees hired in the same month and year as their manager

Write a query to find the employees who were hired in the same month and year as their manager.

SQL
SELECT e.EmployeeID, e.EmployeeName
FROM Employees e
INNER JOIN Employees m ON e.ManagerID = m.EmployeeID
WHERE MONTH(e.HireDate) = MONTH(m.HireDate) 
  AND YEAR(e.HireDate) = YEAR(m.HireDate);

Explanation:

  • Self-Join: Joins the “Employees” table to itself, using aliases e for the employee and m for the manager.
  • Join Condition: Joins based on the employee’s ManagerID matching the manager’s EmployeeID.
  • Date Comparison: Filters for employees hired in the same month and year as their manager using MONTH() and YEAR() functions.

 

11. Display full names in reverse order

You have a table “Students” with columns (StudentID, FirstName, LastName). Write a query to display the full names of all students in reverse order (LastName, FirstName).

SQL
SELECT LastName + ', ' + FirstName AS FullName
FROM Students;

Explanation:

  • The + operator is used to concatenate the LastName, a comma, and the FirstName to create the full name.

 

12. Update status field for specific app_ids

Write a query to update the field “status” in table “applications” from 0 to 1 where the app_id is greater than 1000.

SQL
UPDATE applications
SET status = 1
WHERE app_id > 1000;

Explanation:

  • UPDATE sets the status field to 1.
  • WHERE specifies the condition for updating rows (only where app_id is greater than 1000).

 

13. Find the most recent action for each user

You have a table ‘Logs’ with columns (LogID, UserID, Action, Timestamp). Write a query to find the most recent action performed by each user.

SQL
SELECT UserID, Action, Timestamp
FROM (
    SELECT UserID, Action, Timestamp,
           ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY Timestamp DESC) AS RowNum
    FROM Logs
) AS RankedLogs
WHERE RowNum = 1;

Explanation:

  • Subquery (RankedLogs):
    • Partitions the data by UserID.
    • Assigns a row number (ROW_NUMBER()) to each action within the partition, ordered by Timestamp in descending order (most recent first).
  • Outer Query:
    • Selects rows where RowNum is 1, effectively retrieving the most recent action for each user.

 

14. Find products that have never been ordered

Write a query to find the products that have never been ordered.

Tables:

  • Products: (ProductID, ProductName, …)
  • Orders: (OrderID, CustomerID, ProductID, …)
SQL
SELECT p.ProductName
FROM Products p
LEFT JOIN Orders o ON p.ProductID = o.ProductID
WHERE o.OrderID IS NULL;

Explanation:

  • Left Join: Ensures that all products are included in the result, even if they haven’t been ordered.
  • NULL Check: The WHERE clause filters for rows where o.OrderID is NULL, indicating that there’s no matching order for that product.

 

15. Find customers who placed orders every month this year

Write a query to find the customers who have placed orders in every month of the current year.

Table: Orders: (OrderID, CustomerID, OrderDate, …)

SQL
SELECT CustomerID
FROM Orders
WHERE YEAR(OrderDate) = YEAR(GETDATE())
GROUP BY CustomerID
HAVING COUNT(DISTINCT MONTH(OrderDate)) = MONTH(GETDATE());

Explanation:

  • Filter for Current Year: Selects orders placed in the current year.
  • Group by Customer: Groups orders by CustomerID.
  • Count Distinct Months: Counts the distinct months in which each customer placed orders.
  • Compare with Current Month: Checks if the count of distinct months equals the current month, indicating orders in all months.

 

16. Find employees who are not managers

You have a table “Employees” with columns (EmployeeID, EmployeeName, ManagerID). Write a query to find the employees who are not managers (i.e., their EmployeeID doesn’t appear in the ManagerID column).

SQL
-- Using LEFT JOIN
SELECT e.EmployeeID, e.EmployeeName
FROM Employees e
LEFT JOIN Employees m ON e.EmployeeID = m.ManagerID
WHERE m.EmployeeID IS NULL;

-- Using NOT EXISTS
SELECT EmployeeID, EmployeeName
FROM Employees e
WHERE NOT EXISTS (
    SELECT 1
    FROM Employees m
    WHERE m.ManagerID = e.EmployeeID
);

Explanation:

  • LEFT JOIN Approach: The left join includes all employees. The WHERE clause filters for employees where m.EmployeeID is NULL, indicating they are not managers.
  • NOT EXISTS Approach: The subquery checks if the employee’s ID exists as a manager’s ID. The NOT EXISTS clause selects employees for whom the subquery returns no rows.

 

17. Calculate running total of order amounts for each customer

Write a query to calculate the running total of order amounts for each customer, ordered by order date.

Table: Orders: (OrderID, CustomerID, OrderDate, TotalAmount)

SQL
SELECT 
    OrderID, 
    CustomerID,
    OrderDate,
    TotalAmount,
    SUM(TotalAmount) OVER (PARTITION BY CustomerID ORDER BY OrderDate) AS RunningTotal
FROM Orders;

Explanation:

  • SUM(TotalAmount) OVER (PARTITION BY CustomerID ORDER BY OrderDate) calculates the running total of TotalAmount for each customer, ordered by OrderDate.

 

18. Find overlapping calls

You have a table “Calls” with columns (CallID, StartTime, EndTime). Write a query to find the overlapping calls, where the start time of one call falls within the duration of another call.

SQL
SELECT c1.*
FROM Calls c1
INNER JOIN Calls c2 ON c1.CallID <> c2.CallID
WHERE c1.StartTime BETWEEN c2.StartTime AND c2.EndTime;

Explanation:

  • Self-Join: Joins the “Calls” table to itself, using aliases c1 and c2.
  • Exclude Same Call: The c1.CallID <> c2.CallID condition prevents a call from being compared to itself.
  • Overlap Check: The WHERE clause checks if c1.StartTime falls between the StartTime and EndTime of another call (c2).

 

19. Average number of orders per day of the week

Write a query to find the average number of orders placed by customers each day of the week.

Table: Orders: (OrderID, CustomerID, OrderDate, …)

SQL
SELECT 
    DAYNAME(OrderDate) AS DayOfWeek,
    AVG(OrderCount) AS AverageOrders
FROM (
    SELECT OrderDate, COUNT(*) AS OrderCount
    FROM Orders
    GROUP BY OrderDate
) AS DailyOrders
GROUP BY DayOfWeek
ORDER BY DayOfWeek;

Explanation:

  • Inner Query (DailyOrders):
    • Calculates the number of orders placed on each day (OrderCount).
  • Outer Query:
    • Extracts the day of the week using DAYNAME(OrderDate).
    • Calculates the average number of orders for each day of the week using AVG(OrderCount).
    • Orders the results by DayOfWeek.

 

20. Find customers with highest order count in each category

Write a query to find the customers who have placed the highest number of orders in each product category.

Tables:

  • Customers: (CustomerID, CustomerName, …)
  • Orders: (OrderID, CustomerID, ProductID, …)
  • Products: (ProductID, ProductName, CategoryID, …)
  • Categories: (CategoryID, CategoryName, …)
SQL
WITH CustomerOrderCounts AS (
    SELECT c.CustomerID, cat.CategoryName, COUNT(*) AS OrderCount,
           RANK() OVER (PARTITION BY cat.CategoryName ORDER BY COUNT(*) DESC) AS RankInCategory
    FROM Customers c
    JOIN Orders o ON c.CustomerID = o.CustomerID
    JOIN Products p ON o.ProductID = p.ProductID
    JOIN Categories cat ON p.CategoryID = cat.CategoryID
    GROUP BY c.CustomerID, cat.CategoryName
)
SELECT CustomerID, CategoryName, OrderCount
FROM CustomerOrderCounts
WHERE RankInCategory = 1;

Explanation:

  • CTE (CustomerOrderCounts):
    • Joins all relevant tables to get customer, category, and order information.
    • Groups by CustomerID and CategoryName.
    • Counts the orders for each customer in each category (OrderCount).
    • Ranks customers within each category based on OrderCount using RANK() OVER (PARTITION BY … ORDER BY …).
  • Outer Query:
    • Selects the customer with the highest order count (rank = 1) in each category.

 

21. Find missing values in a sequence

Write a query to find the missing values in a sequence of numbers (e.g., order IDs, invoice numbers).

Table: Orders: (OrderID, …)

SQL
-- Assuming OrderID is an integer sequence
WITH ExpectedOrderIDs AS (
    SELECT generate_series(MIN(OrderID), MAX(OrderID)) AS ExpectedID
    FROM Orders
)
SELECT ExpectedID
FROM ExpectedOrderIDs
EXCEPT
SELECT OrderID
FROM Orders;

Explanation:

  • CTE (ExpectedOrderIDs): Generates a series of numbers from the minimum to the maximum OrderID using the generate_series() function (PostgreSQL specific).
  • EXCEPT Clause: The EXCEPT clause returns values that exist in the first set (expected IDs) but not in the second set (actual OrderIDs).

 

22. Distribute employees into salary groups

You have a table “Employees” with a “Salary” column. Write a query to distribute the employees into three salary groups (Low, Medium,, High) based on equal-sized salary ranges.

This question requires dividing data into quantiles.

SQL
SELECT EmployeeID, EmployeeName, Salary,
       CASE 
           WHEN Salary <= (SELECT PERCENTILE_CONT(0.33) WITHIN GROUP (ORDER BY Salary) FROM Employees) THEN 'Low'
           WHEN Salary <= (SELECT PERCENTILE_CONT(0.66) WITHIN GROUP (ORDER BY Salary) FROM Employees) THEN 'Medium'
           ELSE 'High'
       END AS SalaryGroup
FROM Employees;

Explanation:

  • The PERCENTILE_CONT() (or a similar function depending on your SQL dialect) is used to calculate the 33rd and 66th percentiles of salaries. The CASE statement assigns salary groups based on the calculated percentiles.

 

23. Write a query to find the busiest hour of the day in terms of the number of calls made.

This question combines time manipulation, grouping, and aggregation.

Table: Calls: (CallID, StartTime, …)

SQL
SELECT 
    EXTRACT(HOUR FROM StartTime) AS CallHour,
    COUNT(*) AS CallCount
FROM Calls
GROUP BY CallHour
ORDER BY CallCount DESC
LIMIT 1;

Explanation:

  • Extract Hour: Extracts the hour of the day from the StartTime using EXTRACT(HOUR FROM …).
  • Group by Hour: Groups calls by the extracted hour.
  • Count Calls: Counts the number of calls in each hour.
  • Order and Limit: Orders the results by call count in descending order and retrieves the top hour (busiest hour).

 

24. Write a query to find the pairs of employees who have the same manager.

This question requires self-joining the table and comparing manager IDs.

Table: Employees: (EmployeeID, EmployeeName, ManagerID)

SQL
SELECT e1.EmployeeName, e2.EmployeeName
FROM Employees e1
INNER JOIN Employees e2 ON e1.ManagerID = e2.ManagerID AND e1.EmployeeID < e2.EmployeeID;

Explanation:

  • Self-Join: Joins the “Employees” table to itself, using aliases e1 and e2.
  • Matching Managers: Joins based on employees having the same ManagerID.
  • Avoid Duplicates: The e1.EmployeeID < e2.EmployeeID condition prevents duplicate pairs and ensures each pair is listed only once.

 

Practice, Practice, Practice!

The key to acing SQL interview questions is consistent practice. Work through various SQL problems, experiment with different query approaches, and analyze the results.

Online SQL Practice Platforms:

Sample Databases:

  • Use sample databases like the Northwind database or create your own to test queries.

Real-World Datasets:

  • Explore publicly available datasets on Kaggle or government websites to practice with real-world data.