Below is a complete, end-to-end explanation for a Food Delivery Business, written exactly the way it is expected in projects, design documents, interviews, and architecture reviews.
๐ Food Delivery Business – Data Modeling from OLTP to Star Schema
1️⃣ Transaction Model (Enterprise / OLTP Model)
๐ฏ Purpose
Support day-to-day operations
Optimized for fast inserts & updates
Highly normalized (3NF)
๐ Core Transaction Tables
Customers
customer_id (PK)
name
phone
email
address_id
created_date
Restaurants
restaurant_id (PK)
restaurant_name
cuisine_type
city
rating
Menu_Items
menu_item_id (PK)
restaurant_id (FK)
item_name
category
price
is_available
Orders
order_id (PK)
customer_id (FK)
restaurant_id (FK)
order_datetime
order_status
total_amount
Order_Items
order_item_id (PK)
order_id (FK)
menu_item_id (FK)
quantity
item_price
Delivery_Partners
delivery_partner_id (PK)
name
vehicle_type
join_date
Payments
payment_id (PK)
order_id (FK)
payment_mode
payment_status
paid_amount
2️⃣ Need for Reporting & Analytics
๐ Business Questions
Daily / monthly revenue?
Top restaurants by sales?
Most ordered food items?
Average delivery time?
Customer repeat rate?
City-wise performance?
Partner productivity?
3️⃣ Problems Reporting Directly from Transaction Model ❌
| Problem | Explanation |
|---|---|
| Too many joins | 8–10 joins per report |
| Poor performance | Tables optimized for writes |
| Complex queries | Hard for BI tools |
| Mixed grain | Orders vs order items |
| No historical tracking | Overwritten data |
| Business logic scattered | Revenue logic repeated |
๐ Conclusion:
Transaction models are NOT designed for analytics.
4️⃣ Why We Need a Data Warehouse ✅
Benefits
Optimized for read & aggregation
Simple query structure
Historical data support
Business-friendly schema
Faster dashboards
5️⃣ Define Grain & Granularity (MOST IMPORTANT)
๐ฏ Reporting Focus
Analyze food orders and revenue
✅ Fact Table Grain
ONE ROW per menu item per order per day
Formally:
Order × Menu Item × Date
๐ Granularity
Lowest level: Order Item
Time level: Day
Allows roll-up to:
Order
Restaurant
City
Month / Year
6️⃣ Map Business Requirements to Fact & Dimension Tables
| Business Requirement | Table Type | Table Name |
|---|---|---|
| Revenue analysis | Fact | Fact_Order_Sales |
| Time-based analysis | Dimension | Dim_Date |
| Customer analysis | Dimension | Dim_Customer |
| Restaurant performance | Dimension | Dim_Restaurant |
| Menu analysis | Dimension | Dim_Menu_Item |
| Location analysis | Dimension | Dim_Location |
| Delivery partner analysis | Dimension | Dim_Delivery_Partner |
| Payment analysis | Dimension | Dim_Payment |
7️⃣ Fact & Dimension Tables List
⭐ Fact Table
Fact_Order_Sales
Grain: One row per menu item per order per day
Measures
quantity_ordered
item_sales_amount
discount_amount
delivery_fee
total_sales_amount
delivery_time_minutes
Foreign Keys
date_key
customer_key
restaurant_key
menu_item_key
delivery_partner_key
location_key
payment_key
⭐ Dimension Tables
Dim_Date
date_key (PK)
full_date
day
month
quarter
year
is_weekend
Dim_Customer
customer_key (PK)
customer_id
name
gender
age_group
city
signup_date
is_current (SCD2)
Dim_Restaurant
restaurant_key (PK)
restaurant_id
restaurant_name
cuisine_type
rating
is_active
Dim_Menu_Item
menu_item_key (PK)
item_name
category
veg_flag
price
Dim_Location
location_key (PK)
city
state
country
zone
Dim_Delivery_Partner
delivery_partner_key (PK)
partner_name
vehicle_type
experience_level
Dim_Payment
payment_key (PK)
payment_mode
payment_provider
8️⃣ Star Schema Structure ⭐
Dim_Date
|
Dim_Customer — Fact_Order_Sales — Dim_Restaurant
|
Dim_Menu_Item
|
Dim_Location
|
Dim_Delivery_Partner
|
Dim_Payment
9️⃣ Why STAR Schema is Preferred over Snowflake ⭐⭐⭐
| Star Schema | Snowflake Schema |
|---|---|
| Fewer joins | More joins |
| Faster queries | Slower |
| Simple for BI | Complex |
| Business friendly | Technical |
| Easy maintenance | Harder |
๐ Rule:
Use Snowflake only when dimensions are extremely large or reused.
๐ Example Business Queries
1️⃣ Daily Revenue
SELECT d.full_date, SUM(f.total_sales_amount) revenue
FROM Fact_Order_Sales f
JOIN Dim_Date d ON f.date_key = d.date_key
GROUP BY d.full_date;
2️⃣ Top 5 Restaurants by Revenue
SELECT r.restaurant_name, SUM(f.total_sales_amount) revenue
FROM Fact_Order_Sales f
JOIN Dim_Restaurant r ON f.restaurant_key = r.restaurant_key
GROUP BY r.restaurant_name
ORDER BY revenue DESC
LIMIT 5;
3️⃣ Most Ordered Menu Items
SELECT m.item_name, SUM(f.quantity_ordered) qty
FROM Fact_Order_Sales f
JOIN Dim_Menu_Item m ON f.menu_item_key = m.menu_item_key
GROUP BY m.item_name
ORDER BY qty DESC;
1️⃣1️⃣ Common Mistakes ❌
Mixed grain in fact table
Storing customer name in fact
No date dimension
Snowflaking without reason
Missing surrogate keys
Overloading fact tables
No documentation of grain
๐ฏ Final Summary
OLTP → operations
Data Warehouse → analytics
Grain defines correctness
Star schema = performance + simplicity
Here’s a comprehensive SQL DDL for all six fact tables in the Food Delivery Data Warehouse, following best practices (surrogate keys, proper data types, foreign keys, and measures only in fact tables).
1️⃣ Fact_Order_Sales
CREATE TABLE Fact_Order_Sales (
sales_key BIGINT IDENTITY(1,1) PRIMARY KEY,
date_key INT NOT NULL,
customer_key INT NOT NULL,
restaurant_key INT NOT NULL,
menu_item_key INT NOT NULL,
delivery_partner_key INT,
location_key INT,
payment_key INT,
order_id VARCHAR(50) NOT NULL,
order_line_no INT NOT NULL,
quantity_sold INT NOT NULL,
item_price DECIMAL(10,2) NOT NULL,
discount_amount DECIMAL(10,2),
delivery_fee DECIMAL(10,2),
total_sales_amount DECIMAL(12,2) NOT NULL,
cost_amount DECIMAL(10,2),
profit_amount AS (total_sales_amount - cost_amount) PERSISTED,
CONSTRAINT fk_order_date FOREIGN KEY (date_key) REFERENCES Dim_Date(date_key),
CONSTRAINT fk_customer FOREIGN KEY (customer_key) REFERENCES Dim_Customer(customer_key),
CONSTRAINT fk_restaurant FOREIGN KEY (restaurant_key) REFERENCES Dim_Restaurant(restaurant_key),
CONSTRAINT fk_menu_item FOREIGN KEY (menu_item_key) REFERENCES Dim_Menu_Item(menu_item_key),
CONSTRAINT fk_delivery_partner FOREIGN KEY (delivery_partner_key) REFERENCES Dim_Delivery_Partner(delivery_partner_key),
CONSTRAINT fk_location FOREIGN KEY (location_key) REFERENCES Dim_Location(location_key),
CONSTRAINT fk_payment FOREIGN KEY (payment_key) REFERENCES Dim_Payment(payment_key)
);
2️⃣ Fact_Customer_Activity
CREATE TABLE Fact_Customer_Activity (
activity_key BIGINT IDENTITY(1,1) PRIMARY KEY,
date_key INT NOT NULL,
customer_key INT NOT NULL,
orders_count INT NOT NULL,
total_spend DECIMAL(12,2),
avg_order_value DECIMAL(10,2),
last_order_days INT,
is_repeat_customer BIT,
CONSTRAINT fk_cust_date FOREIGN KEY (date_key) REFERENCES Dim_Date(date_key),
CONSTRAINT fk_customer FOREIGN KEY (customer_key) REFERENCES Dim_Customer(customer_key)
);
3️⃣ Fact_Order_Forecast
CREATE TABLE Fact_Order_Forecast (
forecast_key BIGINT IDENTITY(1,1) PRIMARY KEY,
forecast_date_key INT NOT NULL,
restaurant_key INT NOT NULL,
location_key INT NOT NULL,
forecast_orders INT NOT NULL,
forecast_revenue DECIMAL(12,2),
model_version VARCHAR(50),
CONSTRAINT fk_forecast_date FOREIGN KEY (forecast_date_key) REFERENCES Dim_Date(date_key),
CONSTRAINT fk_forecast_restaurant FOREIGN KEY (restaurant_key) REFERENCES Dim_Restaurant(restaurant_key),
CONSTRAINT fk_forecast_location FOREIGN KEY (location_key) REFERENCES Dim_Location(location_key)
);
4️⃣ Fact_Menu_Item_Performance
CREATE TABLE Fact_Menu_Item_Performance (
menu_item_perf_key BIGINT IDENTITY(1,1) PRIMARY KEY,
date_key INT NOT NULL,
menu_item_key INT NOT NULL,
restaurant_key INT NOT NULL,
quantity_sold INT NOT NULL,
item_revenue DECIMAL(12,2),
item_cost DECIMAL(12,2),
item_profit AS (item_revenue - item_cost) PERSISTED,
popularity_rank INT,
CONSTRAINT fk_mip_date FOREIGN KEY (date_key) REFERENCES Dim_Date(date_key),
CONSTRAINT fk_mip_menu_item FOREIGN KEY (menu_item_key) REFERENCES Dim_Menu_Item(menu_item_key),
CONSTRAINT fk_mip_restaurant FOREIGN KEY (restaurant_key) REFERENCES Dim_Restaurant(restaurant_key)
);
5️⃣ Fact_Season_Sales
CREATE TABLE Fact_Season_Sales (
season_sales_key BIGINT IDENTITY(1,1) PRIMARY KEY,
date_key INT NOT NULL,
season_key INT NOT NULL,
location_key INT NOT NULL,
total_orders INT NOT NULL,
total_revenue DECIMAL(12,2),
avg_delivery_time DECIMAL(6,2),
CONSTRAINT fk_ss_date FOREIGN KEY (date_key) REFERENCES Dim_Date(date_key),
CONSTRAINT fk_ss_season FOREIGN KEY (season_key) REFERENCES Dim_Season(season_key),
CONSTRAINT fk_ss_location FOREIGN KEY (location_key) REFERENCES Dim_Location(location_key)
);
6️⃣ Fact_Delivery_Performance
CREATE TABLE Fact_Delivery_Performance (
delivery_perf_key BIGINT IDENTITY(1,1) PRIMARY KEY,
date_key INT NOT NULL,
order_id VARCHAR(50) NOT NULL,
delivery_partner_key INT NOT NULL,
restaurant_key INT NOT NULL,
location_key INT,
promised_time_minutes INT,
actual_time_minutes INT,
delay_minutes AS (actual_time_minutes - promised_time_minutes) PERSISTED,
sla_met_flag BIT,
CONSTRAINT fk_dp_date FOREIGN KEY (date_key) REFERENCES Dim_Date(date_key),
CONSTRAINT fk_dp_partner FOREIGN KEY (delivery_partner_key) REFERENCES Dim_Delivery_Partner(delivery_partner_key),
CONSTRAINT fk_dp_restaurant FOREIGN KEY (restaurant_key) REFERENCES Dim_Restaurant(restaurant_key),
CONSTRAINT fk_dp_location FOREIGN KEY (location_key) REFERENCES Dim_Location(location_key)
);
7️⃣ Fact_Store_Performance
CREATE TABLE Fact_Store_Performance (
store_perf_key BIGINT IDENTITY(1,1) PRIMARY KEY,
date_key INT NOT NULL,
restaurant_key INT NOT NULL,
total_orders INT NOT NULL,
total_revenue DECIMAL(12,2),
avg_order_value DECIMAL(10,2),
avg_rating DECIMAL(3,2),
cancellation_rate DECIMAL(5,2),
CONSTRAINT fk_sp_date FOREIGN KEY (date_key) REFERENCES Dim_Date(date_key),
CONSTRAINT fk_sp_restaurant FOREIGN KEY (restaurant_key) REFERENCES Dim_Restaurant(restaurant_key)
);
✅ Notes on DDL Design
Surrogate Keys for all fact tables (
BIGINT IDENTITY)Foreign keys reference dimension tables
Derived measures calculated in the fact table (
profit_amount,delay_minutes)Granularity enforced by primary key + business rules
Numeric measures only (no descriptive attributes)
If you want, I can also write the SQL DDL for all dimension tables to make this a fully deployable DW schema.
Do you want me to do that next?
No comments:
Post a Comment