
MYNTRA DATA ANALYST INTERVIEW QUESTIONS
YOE: 1-5
CTC:
Q1. Find the second highest salary without using LIMIT, OFFSET, or TOP
Input Table: Employees

EmpID Name Salary

1 Alice 60000

2 Bob 80000

3 Charlie 75000

4 David 80000

5 Eve 90000

SQL Query:

SELECT MAX(Salary) AS Second_Highest_Salary

FROM Employees

WHERE Salary < (

SELECT MAX(Salary) FROM Employees

);

Expected Output:

Second_Highest_Salary

80000

Q2. Given a table of orders, write a query to find the running total (cumulative sum) of revenue for each day

Input Table: Orders

OrderID OrderDate Revenue

101 2024-01-01 100

102 2024-01-01 200

103 2024-01-02 150

104 2024-01-03 300

105 2024-01-03 100

SQL Query:

SELECT

OrderDate,

SUM(Revenue) AS Daily_Revenue,

SUM(SUM(Revenue)) OVER (ORDER BY OrderDate) AS Running_Total_Revenue

FROM Orders

GROUP BY OrderDate

ORDER BY OrderDate;

Expected Output:

OrderDate Daily_Revenue Running_Total_Revenue

2024-01-01 300 300

2024-01-02 150 450

2024-01-03 400 850

Q3. Write an SQL query to identify employees who earn more than their managers

Input Table: Employees

EmpID Name Salary ManagerID

1 Alice 70000 NULL

2 Bob 80000 1

3 Charlie 60000 1

4 David 90000 2

5 Eve 85000 2

SQL Query:

SELECT e.EmpID, e.Name, e.Salary, e.ManagerID

FROM Employees e

JOIN Employees m ON e.ManagerID = m.EmpID

WHERE e.Salary > m.Salary;

Expected Output:

EmpID Name Salary ManagerID

2 Bob 80000 1

4 David 90000 2

5 Eve 85000 2

Q4. Find the top N customers who made the highest purchases, ensuring no duplicates if customers have the same purchase amount

Input Table: Purchases

CustomerID CustomerName PurchaseAmount

1 Alice 5000

2 Bob 7000

3 Charlie 7000

4 David 6500

5 Eve 6000

Goal:

Get the Top N distinct purchase amounts and the corresponding customers (no
duplicates for same amount).
Let’s say N = 2, i.e., top 2 distinct purchase amounts.

SQL Query:

WITH RankedPurchases AS (

SELECT

CustomerID,

CustomerName,

PurchaseAmount,

DENSE_RANK() OVER (ORDER BY PurchaseAmount DESC) AS rnk

FROM Purchases

)

SELECT

CustomerID,

CustomerName,

PurchaseAmount

FROM RankedPurchases

WHERE rnk <= 2;

Expected Output:

CustomerID CustomerName PurchaseAmount

2 Bob 7000

3 Charlie 7000

4 David 6500

DENSE_RANK() ensures that equal amounts get the same rank, so you don’t
exclude tied top purchasers.
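
To see the difference in tie handling, the sketch below (run against the same Purchases table) computes all three ranking functions side by side; ROW_NUMBER() would arbitrarily drop one of the customers tied at 7000, and RANK() would skip rank 2 after the tie.

SELECT
    CustomerID,
    CustomerName,
    PurchaseAmount,
    ROW_NUMBER() OVER (ORDER BY PurchaseAmount DESC) AS row_num,    -- 1,2,3,4,5 (tie broken arbitrarily)
    RANK()       OVER (ORDER BY PurchaseAmount DESC) AS rank_val,   -- 1,1,3,4,5 (gap after the tie)
    DENSE_RANK() OVER (ORDER BY PurchaseAmount DESC) AS dense_val   -- 1,1,2,3,4 (no gap)
FROM Purchases;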

Q5. Identify consecutive login streaks for users where they logged in for at least three consecutive days

Input Table: UserLogins

UserID LoginDate

101 2024-05-01

101 2024-05-02

101 2024-05-03

101 2024-05-05

102 2024-05-10

102 2024-05-11

102 2024-05-12

102 2024-05-13

103 2024-05-15

103 2024-05-17

Goal:

Find users who logged in for at least 3 consecutive days.

SQL Query:

WITH RankedLogins AS (

SELECT

UserID,

LoginDate,

ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY LoginDate) AS rn

FROM UserLogins

),

Streaks AS (

SELECT

UserID,

LoginDate,

DATE_SUB(LoginDate, INTERVAL rn DAY) AS StreakGroup

FROM RankedLogins

),

GroupedStreaks AS (

SELECT

UserID,

MIN(LoginDate) AS StartDate,
MAX(LoginDate) AS EndDate,

COUNT(*) AS StreakLength

FROM Streaks

GROUP BY UserID, StreakGroup

)

SELECT *

FROM GroupedStreaks

WHERE StreakLength >= 3;

Expected Output:

UserID StartDate EndDate StreakLength

101 2024-05-01 2024-05-03 3

102 2024-05-10 2024-05-13 4

The idea is to subtract the sequential row number (ROW_NUMBER()) from the login date; for consecutive days the difference stays constant, so rows with the same difference belong to the same streak.
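
As an illustration, the intermediate StreakGroup values for user 101 look like this (a sketch, assuming MySQL's DATE_SUB; the derived table mirrors the RankedLogins CTE above):

-- LoginDate    rn   StreakGroup (LoginDate - rn days)
-- 2024-05-01   1    2024-04-30
-- 2024-05-02   2    2024-04-30
-- 2024-05-03   3    2024-04-30   <- same group => 3-day streak
-- 2024-05-05   4    2024-05-01   <- gap starts a new group
SELECT
    UserID,
    LoginDate,
    rn,
    DATE_SUB(LoginDate, INTERVAL rn DAY) AS StreakGroup
FROM (
    SELECT UserID, LoginDate,
           ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY LoginDate) AS rn
    FROM UserLogins
) AS t
WHERE UserID = 101;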

Q6. Write an efficient query to detect duplicate records in a table and delete only the extra duplicates, keeping one copy

Input Table: Customers

CustomerID Name Email

1 Alice [email protected]

2 Bob [email protected]

3 Alice [email protected]

4 Charlie [email protected]

5 Alice [email protected]

Goal:

• Identify duplicate rows based on Name + Email


• Keep only 1 copy, delete the rest

SQL Query (MySQL / PostgreSQL style):

-- Step 1: Delete duplicates but keep one row

DELETE FROM Customers

WHERE CustomerID NOT IN (

SELECT MIN(CustomerID)

FROM Customers

GROUP BY Name, Email

);

MIN(CustomerID) keeps the lowest-ID row in each (Name, Email) group, and the DELETE removes the rest.
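
One caveat: older MySQL versions reject a DELETE whose subquery reads the same table (error 1093). A common workaround, sketched below, wraps the subquery in a derived table:

DELETE FROM Customers
WHERE CustomerID NOT IN (
    SELECT keep_id
    FROM (
        SELECT MIN(CustomerID) AS keep_id
        FROM Customers
        GROUP BY Name, Email
    ) AS keepers   -- derived table materialises the IDs to keep, avoiding error 1093
);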

Output Table After Deletion:

CustomerID Name Email

1 Alice [email protected]

2 Bob [email protected]

4 Charlie [email protected]
SQL Query (if using a CTE in PostgreSQL):

WITH Duplicates AS (

SELECT *,

ROW_NUMBER() OVER (PARTITION BY Name, Email ORDER BY CustomerID) AS rn

FROM Customers

)

DELETE FROM Customers

WHERE CustomerID IN (

SELECT CustomerID FROM Duplicates WHERE rn > 1

);

Q7. You have a table with millions of records. How would you
optimize a slow-performing query that involves multiple joins
and aggregations?

Goal: Optimize a slow query on large data involving:


• Multiple joins

• Aggregations (SUM, COUNT, GROUP BY, etc.)

Optimization Techniques:

1. Add Proper Indexes

• Add indexes on JOIN keys and WHERE clause columns:

CREATE INDEX idx_customer_id ON Orders(customer_id);

CREATE INDEX idx_order_date ON Orders(order_date);

2. Use EXPLAIN Plan


• Analyze the query execution using:

EXPLAIN ANALYZE

SELECT ...

FROM ...

JOIN ...

WHERE ...

• Identify full scans, slow joins, missing indexes

3. Avoid SELECT *

• Always project only required columns.

-- Instead of this

SELECT * FROM Orders

-- Use this

SELECT OrderID, CustomerID, Revenue FROM Orders

4. Use CTEs or Temp Tables for Intermediate Results

WITH RevenuePerCustomer AS (

SELECT CustomerID, SUM(Revenue) AS TotalRev

FROM Orders

GROUP BY CustomerID

)

SELECT c.CustomerName, r.TotalRev

FROM RevenuePerCustomer r

JOIN Customers c ON c.CustomerID = r.CustomerID;

Avoids redundant computation and improves readability.

5. Join Order Matters

• Join smaller filtered datasets first to reduce row multiplication.


6. Partitioning and Batching

• If processing huge date-wise data, use table partitioning

• For maintenance scripts: batch delete or batch insert

7. Materialized Views or Summary Tables

• Precompute aggregations and store them:

CREATE MATERIALIZED VIEW MonthlyRevenue AS

SELECT CustomerID, MONTH(OrderDate) AS Month, SUM(Revenue) AS TotalRevenue

FROM Orders

GROUP BY CustomerID, MONTH(OrderDate);

8. Use Analytical Functions Instead of Subqueries (where applicable)
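
For example (a sketch against an Orders table; column names are illustrative), a correlated subquery that re-scans the table for every row can often be replaced by a single-pass window function:

-- Correlated subquery: the inner MAX() is evaluated per outer row
SELECT o.OrderID, o.CustomerID, o.Revenue
FROM Orders o
WHERE o.Revenue = (
    SELECT MAX(x.Revenue) FROM Orders x WHERE x.CustomerID = o.CustomerID
);

-- Window-function rewrite: one scan, then filter on the precomputed maximum
SELECT OrderID, CustomerID, Revenue
FROM (
    SELECT o.*, MAX(Revenue) OVER (PARTITION BY CustomerID) AS MaxRev
    FROM Orders o
) t
WHERE Revenue = MaxRev;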

Final Tip:

Combine EXPLAIN plans with index strategy, column selection, and CTEs to
dramatically improve performance.

Q8. Retrieve the first order for each customer, ensuring that
ties (multiple first orders on the same date) are handled
correctly

Input Table: Orders

OrderID CustomerID OrderDate Amount

1 101 2024-05-01 500

2 101 2024-05-01 450

3 101 2024-05-03 600

4 102 2024-05-02 700



5 103 2024-05-01 300

6 103 2024-05-01 350

Goal:

• Get each customer's first order(s)

• Handle ties (same earliest date → return all matching orders)

SQL Query using RANK():

WITH RankedOrders AS (

SELECT *,

RANK() OVER (PARTITION BY CustomerID ORDER BY OrderDate ASC) AS rnk

FROM Orders

)

SELECT OrderID, CustomerID, OrderDate, Amount

FROM RankedOrders

WHERE rnk = 1;

Output:

OrderID CustomerID OrderDate Amount

1 101 2024-05-01 500

2 101 2024-05-01 450

4 102 2024-05-02 700

5 103 2024-05-01 300



6 103 2024-05-01 350

RANK() handles ties. If you only want one row, use ROW_NUMBER() instead.
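
If only one first order per customer is wanted, a sketch with ROW_NUMBER() and a deterministic tie-breaker (here OrderID, as an assumption) would be:

WITH RankedOrders AS (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY CustomerID
               ORDER BY OrderDate ASC, OrderID ASC   -- OrderID breaks ties deterministically
           ) AS rn
    FROM Orders
)
SELECT OrderID, CustomerID, OrderDate, Amount
FROM RankedOrders
WHERE rn = 1;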

Q9. Find products that were never purchased by any customer

Input Tables:

Products

ProductID ProductName

1 Shoes

2 Bag

3 Watch

4 Hat

OrderItems

OrderID ProductID Quantity

101 1 2

102 3 1

103 1 1

Goal:

List all products from Products table that do not exist in OrderItems.
SQL Query using LEFT JOIN:

SELECT p.ProductID, p.ProductName

FROM Products p

LEFT JOIN OrderItems o ON p.ProductID = o.ProductID

WHERE o.ProductID IS NULL;

You can also use NOT IN or NOT EXISTS depending on performance and data
distribution.
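
A NOT EXISTS version, often preferred when the join column can contain NULLs, would look like this (sketch):

SELECT p.ProductID, p.ProductName
FROM Products p
WHERE NOT EXISTS (
    SELECT 1
    FROM OrderItems o
    WHERE o.ProductID = p.ProductID   -- no matching order item => never purchased
);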

Output:

ProductID ProductName

2 Bag

4 Hat

Q10. Given a table with customer transactions, find customers who made transactions in every month of the year

Input Table: Transactions

TransactionID CustomerID TransactionDate

12 101 2024-01-15

3 101 2024-02-10

... 101 2024-03-20

13 ... ...

14 101 2024-12-01

... 102 2024-01-05

15 102 2024-04-17

(Assume full data includes all 12 months for Customer 101 but not for 102)

Goal:

• Find customers who transacted in all 12 months of a given year.

SQL Query (PostgreSQL / MySQL):

SELECT CustomerID

FROM (

SELECT CustomerID,

EXTRACT(MONTH FROM TransactionDate) AS txn_month

FROM Transactions

WHERE EXTRACT(YEAR FROM TransactionDate) = 2024

GROUP BY CustomerID, EXTRACT(MONTH FROM TransactionDate)

) AS monthly_txns

GROUP BY CustomerID

HAVING COUNT(DISTINCT txn_month) = 12;

Output:

CustomerID

101

Ensures the customer has one or more transactions in all 12 distinct months.

Q11. Find employees who have the same salary as another employee in the same department

Input Table: Employees

EmpID EmpName Department Salary

1 Alice Sales 50000

2 Bob Sales 60000

3 Carol Sales 50000

4 David HR 45000

5 Eva HR 45000

6 Frank HR 47000

Goal:

• List employees whose salary matches someone else’s in the same department

SQL Query (Self Join):

SELECT e1.EmpID, e1.EmpName, e1.Department, e1.Salary

FROM Employees e1

JOIN Employees e2

ON e1.Department = e2.Department

AND e1.Salary = e2.Salary

AND e1.EmpID <> e2.EmpID;

Output:
EmpID EmpName Department Salary

1 Alice Sales 50000

3 Carol Sales 50000

4 David HR 45000

5 Eva HR 45000

The self join matches each employee with other employees in the same department who have the same salary but a different EmpID.
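
An equivalent approach without a self join uses a window count over (Department, Salary); anyone in a group of size greater than 1 shares a salary with a colleague (sketch):

SELECT EmpID, EmpName, Department, Salary
FROM (
    SELECT e.*,
           COUNT(*) OVER (PARTITION BY Department, Salary) AS cnt
    FROM Employees e
) t
WHERE cnt > 1;   -- group size > 1 means at least one other employee matches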

Q12. Retrieve the department with the highest total salary paid to employees

Input Table: Employees

EmpID EmpName Department Salary

1 Alice Sales 50000

2 Bob Sales 60000

3 Carol HR 45000

4 David HR 47000

5 Eva Tech 90000

6 Frank Tech 80000

Goal:

Return the department with the maximum total salary paid.


SQL Query (using ORDER BY + LIMIT or RANK()):

SELECT Department, SUM(Salary) AS TotalSalary

FROM Employees

GROUP BY Department

ORDER BY TotalSalary DESC

LIMIT 1;

Alternate using Window Function (if LIMIT is restricted):

WITH DeptSalaries AS (

SELECT Department, SUM(Salary) AS TotalSalary

FROM Employees

GROUP BY Department

),

RankedDepts AS (

SELECT *, RANK() OVER (ORDER BY TotalSalary DESC) AS rnk

FROM DeptSalaries

)

SELECT Department, TotalSalary

FROM RankedDepts

WHERE rnk = 1;

Output:

Department TotalSalary

Tech 170000
Q13. Rank orders based on order value for each customer
and return only the top 3 orders per customer

Input Table: Orders

OrderID CustomerID OrderValue

1 101 1000

2 101 1500

3 101 1200

4 101 800

5 102 700

6 102 1000

7 102 1200

8 102 1300

Goal:

• Rank orders for each customer by value (high to low)

• Return only top 3 orders per customer

SQL Query using RANK() or ROW_NUMBER():

WITH RankedOrders AS (

SELECT *,

RANK() OVER (PARTITION BY CustomerID ORDER BY OrderValue DESC) AS rnk

FROM Orders

)
SELECT OrderID, CustomerID, OrderValue

FROM RankedOrders

WHERE rnk <= 3;

Use ROW_NUMBER() if you want to break ties.

Output:

OrderID CustomerID OrderValue

2 101 1500

3 101 1200

1 101 1000

8 102 1300

7 102 1200

6 102 1000

Q14. Find the median salary of employees using SQL (without built-in median functions)

Input Table: Employees

EmpID EmpName Salary

1 Alice 50000

2 Bob 60000

3 Carol 55000

4 David 70000

5 Eva 75000

Goal:

Find the median salary, which is:

• Middle value (for odd count)

• Average of two middle values (for even count)

SQL Query (generic, DBMS-agnostic, using window functions):

WITH RankedSalaries AS (

SELECT Salary,

ROW_NUMBER() OVER (ORDER BY Salary) AS rn,

COUNT(*) OVER () AS total_count

FROM Employees

),

MedianCalc AS (

SELECT Salary, total_count, rn

FROM RankedSalaries

WHERE rn IN (FLOOR((total_count + 1) / 2.0), CEIL((total_count + 1) / 2.0))

-- picks the single middle row for odd counts and the two middle rows for even counts

)

SELECT ROUND(AVG(Salary), 2) AS MedianSalary

FROM MedianCalc;
Output:

MedianSalary

60000.00

Works for both odd and even numbers of rows.
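
If the restriction on built-in functions is lifted, PostgreSQL's ordered-set aggregate computes the median directly (sketch, PostgreSQL-specific):

SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY Salary) AS MedianSalary
FROM Employees;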

Q15. Pivot a table where each row represents a sales transaction and transform it into a summary format where each column represents a month

Input Table: Sales

TransactionID CustomerID SaleAmount SaleDate

1 101 1000 2024-01-15

2 101 1500 2024-02-12

3 101 700 2024-03-10

4 102 1200 2024-01-10

5 102 800 2024-03-21

Goal:

Pivot the table to show:

• Each CustomerID in one row

• Monthly sales in separate columns like Jan, Feb, Mar, etc.

SQL Query using CASE + SUM (standard SQL approach):


SELECT

CustomerID,

SUM(CASE WHEN MONTH(SaleDate) = 1 THEN SaleAmount ELSE 0 END) AS Jan,

SUM(CASE WHEN MONTH(SaleDate) = 2 THEN SaleAmount ELSE 0 END) AS Feb,

SUM(CASE WHEN MONTH(SaleDate) = 3 THEN SaleAmount ELSE 0 END) AS Mar

-- Extend up to Dec if needed

FROM Sales

GROUP BY CustomerID;

Use TO_CHAR(SaleDate, 'MM') for PostgreSQL or EXTRACT(MONTH FROM SaleDate) as needed.
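
For PostgreSQL, a sketch of the same pivot using EXTRACT and the FILTER clause (COALESCE keeps the 0 for months with no sales):

SELECT
    CustomerID,
    COALESCE(SUM(SaleAmount) FILTER (WHERE EXTRACT(MONTH FROM SaleDate) = 1), 0) AS Jan,
    COALESCE(SUM(SaleAmount) FILTER (WHERE EXTRACT(MONTH FROM SaleDate) = 2), 0) AS Feb,
    COALESCE(SUM(SaleAmount) FILTER (WHERE EXTRACT(MONTH FROM SaleDate) = 3), 0) AS Mar
FROM Sales
GROUP BY CustomerID;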

Output:

CustomerID Jan Feb Mar

101 1000 1500 700

102 1200 0 800

Q16. Find the moving average of sales for the last 7 days for
each product

Input Table: Sales

ProductID SaleDate SaleAmount

P1 2024-06-01 100

P1 2024-06-02 200

P1 2024-06-03 150

P1 2024-06-04 250

P1 2024-06-05 100

P1 2024-06-06 300

P1 2024-06-07 200

P1 2024-06-08 500

P2 2024-06-01 120

P2 2024-06-02 180

Goal:

For each ProductID and SaleDate, calculate the 7-day moving average of SaleAmount.

SQL Query using RANGE BETWEEN INTERVAL (if supported):

SELECT

ProductID,

SaleDate,

ROUND(AVG(SaleAmount) OVER (

PARTITION BY ProductID

ORDER BY SaleDate

RANGE BETWEEN INTERVAL 6 DAY PRECEDING AND CURRENT ROW

), 2) AS MovingAvg7Days

FROM Sales;

Alternate using ROWS if RANGE with intervals is unsupported (note that ROWS counts the previous 6 rows, which equals 6 days only when there is exactly one row per product per day):

SELECT

ProductID,
SaleDate,

ROUND(AVG(SaleAmount) OVER (

PARTITION BY ProductID

ORDER BY SaleDate

ROWS BETWEEN 6 PRECEDING AND CURRENT ROW

), 2) AS MovingAvg7Days

FROM Sales;

Output (sample for P1):

ProductID SaleDate MovingAvg7Days

P1 2024-06-01 100.00

P1 2024-06-02 150.00

P1 2024-06-03 150.00

P1 2024-06-04 175.00

P1 2024-06-05 160.00

P1 2024-06-06 183.33

P1 2024-06-07 185.71

P1 2024-06-08 242.86

Q17. Find the first and last occurrence of each event per user

Input Table: Events


UserID EventName EventTime

1 login 2024-05-01 10:00:00

1 login 2024-05-01 15:00:00

1 logout 2024-05-01 18:00:00

2 login 2024-05-02 09:00:00

2 login 2024-05-02 10:30:00

2 logout 2024-05-02 11:30:00

2 logout 2024-05-02 12:00:00

Goal:

Return each UserID, EventName, the first and last time that event occurred.

SQL Query using MIN() and MAX():

SELECT

UserID,

EventName,

MIN(EventTime) AS FirstOccurrence,

MAX(EventTime) AS LastOccurrence

FROM Events

GROUP BY UserID, EventName;

Output:

UserID EventName FirstOccurrence LastOccurrence

1 login 2024-05-01 10:00:00 2024-05-01 15:00:00



1 logout 2024-05-01 18:00:00 2024-05-01 18:00:00

2 login 2024-05-02 09:00:00 2024-05-02 10:30:00

2 logout 2024-05-02 11:30:00 2024-05-02 12:00:00

Q18. Identify users who have placed an order in two consecutive months but not in the third month

Input Table: Orders

UserID OrderDate

101 2023-01-15

101 2023-02-10

101 2023-03-05

102 2023-04-12

102 2023-05-18

103 2023-06-20

103 2023-07-25

103 2023-09-03

Goal:

Find users who ordered in 2 consecutive months, but not in the next (third) month.

Solution Logic:
1. Extract Year-Month from OrderDate.

2. Rank months per user.

3. Self-join or lag-lead to compare 3-month streaks.

SQL Query (using LAG + LEAD):

WITH MonthData AS (

SELECT

UserID,

DATE_TRUNC('month', OrderDate) AS MonthStart

FROM Orders

GROUP BY UserID, DATE_TRUNC('month', OrderDate)

),

WithLagLead AS (

SELECT

UserID,

MonthStart,

LAG(MonthStart, 1) OVER (PARTITION BY UserID ORDER BY MonthStart) AS PrevMonth,

LEAD(MonthStart, 1) OVER (PARTITION BY UserID ORDER BY MonthStart) AS NextMonth

FROM MonthData

)

SELECT DISTINCT UserID

FROM WithLagLead

WHERE

MonthStart = PrevMonth + INTERVAL '1 month'

AND (

NextMonth IS NULL OR NextMonth > MonthStart + INTERVAL '1 month'

);

This keeps months that have a consecutive previous month and no consecutive next month. Note that user 101 also qualifies under this reading (orders in Feb and Mar, none in April); to exclude users with three consecutive months, add a check on the month before the pair (for example, a second LAG).

Output:

UserID

102

103

Q19. Find the most frequently purchased product category by each user over the past year

Input Table: Orders

UserID OrderDate Category

101 2024-07-01 Electronics

101 2024-07-15 Fashion

101 2024-08-01 Electronics

102 2024-07-10 Grocery

102 2024-08-20 Grocery

102 2024-09-05 Fashion

103 2024-07-01 Fashion



103 2024-07-10 Fashion

103 2024-08-01 Fashion

Goal:

Find the top product category (by frequency) per user in the last 12 months.

SQL Query (using RANK()):

WITH LastYearOrders AS (

SELECT *

FROM Orders

WHERE OrderDate >= CURRENT_DATE - INTERVAL '12 months'

),

CategoryCounts AS (

SELECT

UserID,

Category,

COUNT(*) AS CategoryCount

FROM LastYearOrders

GROUP BY UserID, Category

),

Ranked AS (

SELECT *,

RANK() OVER (PARTITION BY UserID ORDER BY CategoryCount DESC) AS rnk

FROM CategoryCounts
)

SELECT UserID, Category, CategoryCount

FROM Ranked

WHERE rnk = 1;

Output:

UserID Category CategoryCount

101 Electronics 2

102 Grocery 2

103 Fashion 3

Q20. Write a query to generate a sequential ranking of products based on total sales, but reset the ranking for each year

Input Table: Sales

ProductID SaleAmount SaleDate

P1 500 2022-01-15

P2 700 2022-03-22

P1 300 2022-05-20

P3 800 2022-07-01

P1 1000 2023-02-10

P2 600 2023-03-15

P3 900 2023-04-12

Goal:
• Rank products by total sales per year.

• Rankings should restart each year.

• Use RANK() or DENSE_RANK() for ties.

Step-by-Step Logic:

1. Extract the year from SaleDate.

2. Aggregate total sales per product per year.

3. Use a window function (RANK() or DENSE_RANK()) to assign the rank within each year.

SQL Query:

WITH YearlySales AS (

SELECT

EXTRACT(YEAR FROM SaleDate) AS SalesYear,

ProductID,

SUM(SaleAmount) AS TotalSales

FROM Sales

GROUP BY EXTRACT(YEAR FROM SaleDate), ProductID

),

RankedProducts AS (

SELECT *,

RANK() OVER (PARTITION BY SalesYear ORDER BY TotalSales DESC) AS SalesRank


FROM YearlySales

)

SELECT *

FROM RankedProducts

ORDER BY SalesYear, SalesRank;

Output:

SalesYear ProductID TotalSales SalesRank

2022 P1 800 1

2022 P3 800 1

2022 P2 700 3

2023 P1 1000 1

2023 P3 900 2

2023 P2 600 3

If two products have the same total sales, RANK() assigns the same rank and skips
the next number. Use DENSE_RANK() if you want continuous ranking without gaps.
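
For reference, the DENSE_RANK() variant only changes the window function (sketch):

WITH YearlySales AS (
    SELECT
        EXTRACT(YEAR FROM SaleDate) AS SalesYear,
        ProductID,
        SUM(SaleAmount) AS TotalSales
    FROM Sales
    GROUP BY EXTRACT(YEAR FROM SaleDate), ProductID
)
SELECT *,
       DENSE_RANK() OVER (PARTITION BY SalesYear ORDER BY TotalSales DESC) AS SalesRank
FROM YearlySales
ORDER BY SalesYear, SalesRank;
-- For 2022 this ranks P1 and P3 at 1 and P2 at 2 (no gap).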
