Supermarket Sales Analysis using Python & Tableau
Start using Python to answer business questions and get insights.
Hello world, this time I will practice data analysis on a supermarket-sales case study using Python and Tableau as a visualization dashboard.
Business Overview
Supermarket growth in most various cities is increasing and market competition is intense. Average supermarket sales growth rate in the past 3 months fell 4.5%
Business Questions
How we can increase sales with customer membership by 10% in 3 months?
Clean & Manipulate Data
- Data Observations
we can use df.info()
to do an overview of our data set.
We can see if no missing value has been found, and we need to merge Date and Time column, then change column type from object to datetime.
- Merge Column & Change Format
#Merge Date and Time column
df['Date time']=df['Date'] +" "+ df['Time']
#Change Date time column type
df['Date time'] = pd.to_datetime(df['Date time'])
df['Date'] = pd.to_datetime(df['Date'])
df[[‘Date time’]].info()
df[[‘Date time’]].head()
the data format is as desired, which is in the form of datetime.
- Add Columns
We will add some columns that that will be needed in the analysis process, such as add month column, hour column and revenue column.
This is the function of changing the data format that we did above. Because if the data is still in the form of object then we will not be able to process the data in a time base format.
# Add Month column
df['Month'] = df['Date time'].dt.month
# Add Hour column
df['Hour'] = df['Date time'].dt.hour
# Add Revenue column
df['Revenue'] = df['Unit price']*df['Quantity']
we will use the month column to see the monthly trend of revenue and the hour column to see the number of shopping activities that occur in hours. then we add a revenue column to validate the data.
- Remove Columns
- cogs column value is equal to revenue calculation
- tax column value is equal to gross income
- gross margin percentage columns are all equal
- we will use the revenue column instead of the total column
# Remove suspicious and unnecessary columns
df.drop(['Tax 5%', 'Total', 'cogs', 'gross margin percentage', 'gross income','Time’], axis=1, inplace=True)
This is an interesting discovery because some of the columns above have quite important roles in analysis, but the data is quite suspicious so we decide to discard them.
Therefore, validating the data also needs to be done, as we did above by adding the revenue column.
Since this is for practice, we will keep using this dataset because there is still a lot of information that can be displayed.
- Check & Remove Outliers
# Check outliers
check_col = ['Unit price','Quantity','Revenue']
df.boxplot(check_col, figsize=(7,7))
# Remove outliers
for x in ['Revenue']:
q75,q25 = np.percentile(df.loc[:,x],[75,25])
intr_qr = q75-q25
max = q75+(1.5*intr_qr)
min = q25-(1.5*intr_qr)
df.loc[df[x] < min,x] = np.nan
df.loc[df[x] > max,x] = np.nan
df = df.dropna(axis=0)
Exploratory Data Analysis
- Correlation between Columns
Revenue has positive correlation with Unit Price and Quantity. So we know that to increase revenue is mainly by increasing the number of sales.
- Overall Revenue Trend
By looking at the trend line, we can see that there was a decrease in revenue over the 3-month period. And there was a drop in February that we don’t know the cause of.
- Total Products Sales
Health and beauty was the least purchased product by people, with a total of 854 unit of goods. While Electronic accessories was the most purchased by people with 971 unit of goods. Next, we will look at the average price and the rating given by customers.
- Unit Price Per Products Line
The highest average sales price is at product Sports and travel with $58.76, while the lowest is at product Electronic accessories with $53.55. Now we can see that the average price for products electronic accessories is the cheapest, which can be one reason why this product is sold a lot. Then the rest we will see through the ratings given by customers.
- Rating Per Products Line
Overall, the total average rating of each product line is almost the same. Food and beverages gets the highest total average rating of 7.13, while Home and lifestyle receives the lowest of 6.87. Therefore, we must improve the quality of the products we sell and reorganize their location or sell products according to the trend.
- Total Revenue By City Branch
In February all branches experienced a decline in revenue. Then the branch in Mandalay experienced a declining revenue trend in 3 months and generated the lowest proportion of revenue at 33.2% with total revenue of $99.19K. To see the cause of why the Mandalay branch has decreased revenue, we will try to find out the average rating given by customers.
- Rating Per City Branch
From the graph above, we can find out that one of the causes of the decline in Mandalay branch revenue is due to poor service. But we still need to evaluate employees in all branches to improve their service.
- Total Revenue By Customer
Customers with membership contribute the most in revenue, with a proportion of 51.35% or $153.4K, followed by normal customers with 48.65% or $145.3
- Total Customer Transactions
Both members and normal customers have almost the same number of transactions. Membership customers have as many as 498 transactions, and normal customers have 493 transactions.
- Rating By Customer Type
Normal customers give a higher average rating than membership customers.
We get fascinating information from some of the customer charts above, because membership customers contribute the most revenue and transactions but give lower average ratings than normal customers.
We need to create a program so that membership customers get more benefits than normal customers, such as adding cash back or discounts. Let’s see what the majority of customers use to insert this campaign.
- Customer Transactions By Payment
Membership customers mostly use credit card payment types and then cash. While normal customers mostly use e-wallet then cash.
- Sales Transactions Per Hour
From the graph we can see that peak shopping occurs around 3 pm (15.00 hrs) and 7 pm (19.00 hrs) at all branches in the three cities.
Summary & Recommendation
Summary
- There is a decrease in revenue trend in 3 months.
- Across all branches in the 3 cities, product electronic accessories was purchased the most, while product health and beauty was purchased the least.
- Product sports and travel got the highest average sales price.
- Most memberships use credit card payments.
- Mandalay branch experienced a decrease in revenue and contributed the lowest in 3 months, and also received the lowest rating.
- Membership customers contribute the most revenue and transactions but give lower average ratings than normal customers.
- The most shopping activity is around 3 pm and 7 pm.
- In February all branches experienced a decline in revenue.
Recommendation
- To increase sales, we can add variations for product lines that get a lot of sales. And for product lines with fewer sales, we can sell trending products and reorganize the layout and appearance.
- Improve quality control to replace bad products quickly and only sell products in good condition.
- Improve service quality by conducting training programs for employees.
- Loyalty programs for customers, especially for membership customers to clearly get rewards or discounts compared to normal customers. for example, by adding cashback with credit card payments.
- We should display ads to maximize the likelihood of customers buying products around 3 pm and 7 pm when most customers are shopping.