Shopee Python-Pandas Test (45 Mins)

1. The dataset contains over 464,000 rows of listing information, with 12 columns covering items, shops, sales metrics and more. 2. The analysis found over 26,000 unique shops, over 1,000 preferred or cross-border shops, and over 100,000 products with zero sales. 3. The top categories and shops by unique product count and estimated revenue were identified. Duplicated listings within shops were also flagged and analysed further.

Uploaded by

Gyan Kumar
Shopee Python-Pandas Test (45 mins)

In this task, you'll be analysing listings data from our Shopee Platform.

You may use the installed PyCharm IDE, Sublime Text, or another Windows-native text editor. Please save
your Python source code on the desktop. You may use the internet for help.

The dataset is stored in the Test_Pandas.xlsx file. It contains listing information posted on Shopee.
One single listing corresponds to one row in the dataset.

The dataset has 12 columns, and 464433 rows.

Here are the brief descriptions of each column:

itemid - a unique ID of the product
shopid - a unique ID of the shop
item_name - product title
item_description - detailed product description
item_variation - the variations of a product (e.g. different colours or sizes, in a format like
{variation 1 name: variation 1 price, variation 2 name: variation 2 price})
price - the price at which the item is sold
stock - how many units are left in stock
category - the category the product belongs to
cb_option - 1 indicates the product is sold by a cross-border shop
is_preferred - 1 indicates the product is sold by a preferred shop
sold_count - how many units of the product have been sold
item_creation_date - when the product was uploaded by the seller
1. Use a pandas function to read in the Test_Pandas.xlsx file:
a. Assign the result to a variable named “data”
b. Assign all column names to a variable named “columns”
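
One possible way to approach task 1 is sketched below. The helper name `load_listings` is hypothetical, and the sketch assumes an Excel engine such as openpyxl is installed so that `pd.read_excel` can open .xlsx files.

```python
import pandas as pd


def load_listings(path="Test_Pandas.xlsx"):
    """Read the workbook and return the DataFrame plus its column names."""
    # read_excel loads the first sheet by default; that the listings
    # live on the first sheet is an assumption, not stated in the task.
    data = pd.read_excel(path)
    # A plain Python list of the column labels, in file order.
    columns = data.columns.tolist()
    return data, columns
```

Calling `data, columns = load_listings()` in the script then gives the two variable names the task asks for.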

2. Use pandas functions to find:
a. How many unique shops are in the dataset?
b. How many unique preferred and cross-border shops are in the dataset?
c. How many products have a sold count of zero?
d. How many products were created in the year 2018?
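
The four counts in task 2 could be computed along the following lines. This is a sketch under assumptions: the function names are hypothetical, the columns are assumed to be spelled `shopid`, `is_preferred`, `cb_option`, `sold_count` and `item_creation_date`, and part (b) is read as shops that are both preferred and cross-border (replace `&` with `|` for the "either" reading).

```python
import pandas as pd


def count_unique_shops(data):
    # nunique counts distinct non-null shop IDs.
    return data["shopid"].nunique()


def count_preferred_and_cb_shops(data):
    # Assumption: a shop qualifies when its listings carry both flags.
    mask = (data["is_preferred"] == 1) & (data["cb_option"] == 1)
    return data.loc[mask, "shopid"].nunique()


def count_zero_sold(data):
    # Each row is one listing, so summing the boolean mask counts products.
    return int((data["sold_count"] == 0).sum())


def count_created_in_year(data, year=2018):
    # Values that cannot be parsed become NaT and never match the year.
    dates = pd.to_datetime(data["item_creation_date"], errors="coerce")
    return int((dates.dt.year == year).sum())
```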

3. Use pandas functions to find:
a. The shopids of the top 3 preferred shops with the largest number of unique products
b. The top 3 categories with the largest number of unique cross-border products
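
For task 3, a filter followed by a grouped `nunique` and `nlargest` is one reasonable pattern. The sketch below uses hypothetical function names and assumes the columns are spelled `itemid`, `shopid`, `category`, `is_preferred` and `cb_option`.

```python
import pandas as pd


def top_preferred_shops(data, n=3):
    """Preferred shops ranked by their number of distinct item IDs."""
    preferred = data[data["is_preferred"] == 1]
    counts = preferred.groupby("shopid")["itemid"].nunique()
    # The result is a Series indexed by shopid, largest count first.
    return counts.nlargest(n)


def top_cb_categories(data, n=3):
    """Categories ranked by their number of distinct cross-border item IDs."""
    cross_border = data[data["cb_option"] == 1]
    counts = cross_border.groupby("category")["itemid"].nunique()
    return counts.nlargest(n)
```

The shop IDs or category names themselves are available from the index of the returned Series, for example `top_preferred_shops(data).index.tolist()`.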

4. Find the top 3 shopids with the highest revenue (Assumption: the product price has not been
changed.)
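
Since the price is assumed constant, revenue per listing can be estimated as price multiplied by sold count and then summed per shop. The following is a minimal sketch of that estimate; the function name is hypothetical and the column names `price`, `sold_count` and `shopid` are assumed.

```python
import pandas as pd


def top_revenue_shops(data, n=3):
    """Shops ranked by estimated revenue (price * sold_count, summed)."""
    # Estimated revenue of each listing under the fixed-price assumption.
    revenue = data["price"] * data["sold_count"]
    # Group the per-listing revenue by shop and keep the n largest totals.
    return revenue.groupby(data["shopid"]).sum().nlargest(n)
```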

5. Find the number of products that have more than 3 variations (do not include products with 3 or
fewer variations)
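
How task 5 is solved depends on how `item_variation` is actually stored, which the task only describes loosely. The sketch below assumes each cell holds a Python or JSON style dict literal as text, such as `{"Red": 10.0, "Blue": 12.0}`, and treats missing or unparseable cells as having zero variations. Both function names are hypothetical.

```python
import ast

import pandas as pd


def count_variations(value):
    """Return the number of variations encoded in one item_variation cell."""
    if isinstance(value, dict):
        return len(value)
    if not isinstance(value, str):
        return 0  # NaN, None or any other non-text value
    try:
        parsed = ast.literal_eval(value)
    except (ValueError, SyntaxError, TypeError):
        return 0  # text that is not a valid literal
    return len(parsed) if isinstance(parsed, dict) else 0


def count_products_with_many_variations(data, threshold=3):
    # Strictly greater than the threshold, so exactly 3 is excluded.
    n_variations = data["item_variation"].apply(count_variations)
    return int((n_variations > threshold).sum())
```

If the real cells use a different encoding, only `count_variations` would need to change.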

6. Use pandas functions to identify duplicated listings within each shop (if listings A and B in shop
S have exactly the same product title, detailed product description, and price, both listings A
and B are considered duplicated listings)
a. Mark duplicated listings with True and all others with False, and store the result
in a new column named “is_duplicated”
b. Find duplicated listings that have a sold count of less than 2 and store the result in a new
Excel file named “duplicated_listings.xlsx”
c. Find the shopid of the preferred shop that has the most duplicated listings
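
Task 6 maps fairly directly onto `DataFrame.duplicated` with `keep=False`, which flags every member of a duplicate group rather than only the later copies. The sketch below uses hypothetical function names and assumes the columns are spelled `shopid`, `item_name`, `item_description`, `price`, `sold_count` and `is_preferred`.

```python
import pandas as pd

# Listings count as duplicates when all four of these values match.
DUPLICATE_KEYS = ["shopid", "item_name", "item_description", "price"]


def mark_duplicates(data):
    """Return a copy of data with a boolean is_duplicated column."""
    marked = data.copy()
    # keep=False marks both listing A and listing B, as the task requires.
    marked["is_duplicated"] = marked.duplicated(subset=DUPLICATE_KEYS, keep=False)
    return marked


def low_selling_duplicates(marked, max_sold=2):
    """Duplicated listings whose sold count is below max_sold."""
    return marked[marked["is_duplicated"] & (marked["sold_count"] < max_sold)]


def preferred_shop_with_most_duplicates(marked):
    """shopid of the preferred shop with the most duplicated listings."""
    dup = marked[marked["is_duplicated"] & (marked["is_preferred"] == 1)]
    # idxmax raises ValueError if no preferred shop has duplicates.
    return dup["shopid"].value_counts().idxmax()
```

For part (b), the filtered frame can be written out with `low_selling_duplicates(marked).to_excel("duplicated_listings.xlsx", index=False)`, which again assumes an Excel engine such as openpyxl is available.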
