Wednesday, January 21, 2026

22jan26 - Surrogate keys - when to use and not use

 

1️⃣ What is a Surrogate Key?

  • A surrogate key is an artificial, system-generated unique identifier for a row in a table.

  • Usually integer or bigint, auto-incremented or generated by the system.

  • Not derived from business data (unlike natural keys).

Example:

customer_key INT IDENTITY(1,1) -- surrogate key

2️⃣ When to Use Surrogate Keys

| Scenario | Reason / Benefit | Example |
|---|---|---|
| Data Warehouse / Fact-Dimension Modeling | Allows consistent foreign key joins, supports slowly changing dimensions (SCD), avoids issues when natural keys change | Dim_Customer with customer_key used in Fact_Order_Sales |
| Slowly Changing Dimensions (Type 2) | A surrogate key lets you track history without altering historical fact data | Customer changes address → new surrogate key row in Dim_Customer |
| Composite / Complex Natural Keys | Avoids long, multi-column foreign keys | Order detail identified by order_id + line_number → simpler to use order_line_key |
| Integration from Multiple Systems | Different source systems may use different IDs → a surrogate key standardizes references | Merging customer_id from two CRMs |
| Performance / Indexing | Integer surrogate keys are more efficient for joins than long string keys | Joining Dim_Product.product_key to the fact table instead of product_sku |

3️⃣ When NOT to Use Surrogate Keys

| Scenario | Reason / Alternative |
|---|---|
| Transactional OLTP tables where the natural key is stable and simple | Surrogate key adds no value; the natural key uniquely identifies the row. |
| Key has real-world meaning and is immutable | Using a surrogate key may cause unnecessary joins and duplication. |
| Small reference tables with static values | Surrogate key adds complexity without benefit. |
| Temporary / staging tables | Surrogate keys are unnecessary; use the natural key or a row hash. |
| Natural key is already numeric, short, and stable | Surrogate key is redundant. |

4️⃣ Key Guidelines

  1. Always use surrogate keys in data warehouse dimensions and fact table foreign keys.

  2. Avoid surrogate keys in OLTP tables with stable, meaningful natural keys.

  3. Never mix natural keys and surrogate keys for joins in the same context—it confuses lineage.

  4. Surrogate keys are particularly helpful for history tracking (SCD2) and integration of multiple sources
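The history-tracking guideline can be sketched as DDL. This is a minimal Type 2 dimension, assuming SQL Server-style IDENTITY; the housekeeping column names (valid_from, valid_to, is_current) are illustrative conventions, not fixed requirements:

```sql
-- Minimal SCD2 dimension: surrogate key + natural key + validity columns
CREATE TABLE Dim_Customer (
    customer_key   INT IDENTITY(1,1) PRIMARY KEY,  -- surrogate key
    customer_id    VARCHAR(20) NOT NULL,           -- natural key from source
    customer_name  VARCHAR(100),
    address        VARCHAR(200),
    valid_from     DATE NOT NULL,
    valid_to       DATE NULL,                      -- NULL = still current
    is_current     BIT  NOT NULL DEFAULT 1
);
```

When an attribute such as address changes, the current row is expired and a new row (with a new surrogate key) is inserted; historical fact rows keep pointing at the surrogate key that was current when they were loaded.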

22jan26 - Food Delivery Business – Data Modeling from OLTP to OLAP - V1

Below is a complete, end-to-end explanation for a Food Delivery Business, written the way it would appear in project design documents, interviews, and architecture reviews.


🍔 Food Delivery Business – Data Modeling from OLTP to Star Schema


1️⃣ Transaction Model (Enterprise / OLTP Model)

🎯 Purpose

  • Support day-to-day operations

  • Optimized for fast inserts & updates

  • Highly normalized (3NF)


📋 Core Transaction Tables

Customers

  • customer_id (PK)

  • name

  • phone

  • email

  • address_id

  • created_date

Restaurants

  • restaurant_id (PK)

  • restaurant_name

  • cuisine_type

  • city

  • rating

Menu_Items

  • menu_item_id (PK)

  • restaurant_id (FK)

  • item_name

  • category

  • price

  • is_available

Orders

  • order_id (PK)

  • customer_id (FK)

  • restaurant_id (FK)

  • order_datetime

  • order_status

  • total_amount

Order_Items

  • order_item_id (PK)

  • order_id (FK)

  • menu_item_id (FK)

  • quantity

  • item_price

Delivery_Partners

  • delivery_partner_id (PK)

  • name

  • vehicle_type

  • join_date

Payments

  • payment_id (PK)

  • order_id (FK)

  • payment_mode

  • payment_status

  • paid_amount


2️⃣ Need for Reporting & Analytics

📊 Business Questions

  • Daily / monthly revenue?

  • Top restaurants by sales?

  • Most ordered food items?

  • Average delivery time?

  • Customer repeat rate?

  • City-wise performance?

  • Partner productivity?


3️⃣ Problems Reporting Directly from Transaction Model ❌

| Problem | Explanation |
|---|---|
| Too many joins | 8–10 joins per report |
| Poor performance | Tables optimized for writes |
| Complex queries | Hard for BI tools |
| Mixed grain | Orders vs order items |
| No historical tracking | Overwritten data |
| Business logic scattered | Revenue logic repeated |

📌 Conclusion:

Transaction models are NOT designed for analytics.


4️⃣ Why We Need a Data Warehouse ✅

Benefits

  • Optimized for read & aggregation

  • Simple query structure

  • Historical data support

  • Business-friendly schema

  • Faster dashboards


5️⃣ Define Grain & Granularity (MOST IMPORTANT)

🎯 Reporting Focus

Analyze food orders and revenue


✅ Fact Table Grain

ONE ROW per menu item per order per day

Formally:

Order × Menu Item × Date

📌 Granularity

  • Lowest level: Order Item

  • Time level: Day

  • Allows roll-up to:

    • Order

    • Restaurant

    • City

    • Month / Year
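Because the fact grain is atomic, all of these roll-ups are plain GROUP BY aggregations. A sketch of the monthly roll-up, using the Fact_Order_Sales and Dim_Date names defined in this model (exact SQL dialect may vary):

```sql
-- Roll up the atomic grain (menu item × order × day) to month level
SELECT d.year, d.month,
       SUM(f.total_sales_amount) AS monthly_revenue
FROM Fact_Order_Sales f
JOIN Dim_Date d ON f.date_key = d.date_key
GROUP BY d.year, d.month
ORDER BY d.year, d.month;
```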


6️⃣ Map Business Requirements to Fact & Dimension Tables

| Business Requirement | Table Type | Table Name |
|---|---|---|
| Revenue analysis | Fact | Fact_Order_Sales |
| Time-based analysis | Dimension | Dim_Date |
| Customer analysis | Dimension | Dim_Customer |
| Restaurant performance | Dimension | Dim_Restaurant |
| Menu analysis | Dimension | Dim_Menu_Item |
| Location analysis | Dimension | Dim_Location |
| Delivery partner analysis | Dimension | Dim_Delivery_Partner |
| Payment analysis | Dimension | Dim_Payment |

7️⃣ Fact & Dimension Tables List

Fact Table

Fact_Order_Sales

Grain: One row per menu item per order per day

Measures

  • quantity_ordered

  • item_sales_amount

  • discount_amount

  • delivery_fee

  • total_sales_amount

  • delivery_time_minutes

Foreign Keys

  • date_key

  • customer_key

  • restaurant_key

  • menu_item_key

  • delivery_partner_key

  • location_key

  • payment_key


Dimension Tables

Dim_Date

  • date_key (PK)

  • full_date

  • day

  • month

  • quarter

  • year

  • is_weekend


Dim_Customer

  • customer_key (PK)

  • customer_id

  • name

  • gender

  • age_group

  • city

  • signup_date

  • is_current (SCD2)


Dim_Restaurant

  • restaurant_key (PK)

  • restaurant_id

  • restaurant_name

  • cuisine_type

  • rating

  • is_active


Dim_Menu_Item

  • menu_item_key (PK)

  • item_name

  • category

  • veg_flag

  • price


Dim_Location

  • location_key (PK)

  • city

  • state

  • country

  • zone


Dim_Delivery_Partner

  • delivery_partner_key (PK)

  • partner_name

  • vehicle_type

  • experience_level


Dim_Payment

  • payment_key (PK)

  • payment_mode

  • payment_provider


8️⃣ Star Schema Structure ⭐

Dim_Date -------------+
Dim_Customer ---------+
Dim_Restaurant -------+
Dim_Menu_Item --------+--> Fact_Order_Sales
Dim_Location ---------+
Dim_Delivery_Partner -+
Dim_Payment ----------+

Every dimension joins directly to the fact table; there are no dimension-to-dimension chains.

9️⃣ Why STAR Schema is Preferred over Snowflake ⭐⭐⭐

| Star Schema | Snowflake Schema |
|---|---|
| Fewer joins | More joins |
| Faster queries | Slower |
| Simple for BI | Complex |
| Business friendly | Technical |
| Easy maintenance | Harder |

📌 Rule:

Use Snowflake only when dimensions are extremely large or reused.


🔟 Example Business Queries

1️⃣ Daily Revenue

SELECT d.full_date, SUM(f.total_sales_amount) revenue
FROM Fact_Order_Sales f
JOIN Dim_Date d ON f.date_key = d.date_key
GROUP BY d.full_date;

2️⃣ Top 5 Restaurants by Revenue

SELECT r.restaurant_name, SUM(f.total_sales_amount) revenue
FROM Fact_Order_Sales f
JOIN Dim_Restaurant r ON f.restaurant_key = r.restaurant_key
GROUP BY r.restaurant_name
ORDER BY revenue DESC
LIMIT 5;

3️⃣ Most Ordered Menu Items

SELECT m.item_name, SUM(f.quantity_ordered) qty
FROM Fact_Order_Sales f
JOIN Dim_Menu_Item m ON f.menu_item_key = m.menu_item_key
GROUP BY m.item_name
ORDER BY qty DESC;
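One of the original business questions, customer repeat rate, can also be answered from this grain. A sketch that assumes order_id is carried in the fact table as a degenerate dimension (as in the DDL later in these notes):

```sql
-- Share of customers who placed more than one order
SELECT
    COUNT(CASE WHEN order_cnt > 1 THEN 1 END) * 100.0 / COUNT(*) AS repeat_rate_pct
FROM (
    SELECT f.customer_key, COUNT(DISTINCT f.order_id) AS order_cnt
    FROM Fact_Order_Sales f
    GROUP BY f.customer_key
) c;
```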

1️⃣1️⃣ Common Mistakes ❌

  1. Mixed grain in fact table

  2. Storing customer name in fact

  3. No date dimension

  4. Snowflaking without reason

  5. Missing surrogate keys

  6. Overloading fact tables

  7. No documentation of grain


🎯 Final Summary

  • OLTP → operations

  • Data Warehouse → analytics

  • Grain defines correctness

  • Star schema = performance + simplicity


Here is a comprehensive SQL DDL for all seven fact tables in the Food Delivery Data Warehouse, following best practices (surrogate keys, proper data types, foreign keys, and measures only in fact tables).


1️⃣ Fact_Order_Sales

CREATE TABLE Fact_Order_Sales (
    sales_key BIGINT IDENTITY(1,1) PRIMARY KEY,
    date_key INT NOT NULL,
    customer_key INT NOT NULL,
    restaurant_key INT NOT NULL,
    menu_item_key INT NOT NULL,
    delivery_partner_key INT,
    location_key INT,
    payment_key INT,
    order_id VARCHAR(50) NOT NULL,
    order_line_no INT NOT NULL,
    quantity_sold INT NOT NULL,
    item_price DECIMAL(10,2) NOT NULL,
    discount_amount DECIMAL(10,2),
    delivery_fee DECIMAL(10,2),
    total_sales_amount DECIMAL(12,2) NOT NULL,
    cost_amount DECIMAL(10,2),
    profit_amount AS (total_sales_amount - cost_amount) PERSISTED,
    CONSTRAINT fk_order_date FOREIGN KEY (date_key) REFERENCES Dim_Date(date_key),
    CONSTRAINT fk_customer FOREIGN KEY (customer_key) REFERENCES Dim_Customer(customer_key),
    CONSTRAINT fk_restaurant FOREIGN KEY (restaurant_key) REFERENCES Dim_Restaurant(restaurant_key),
    CONSTRAINT fk_menu_item FOREIGN KEY (menu_item_key) REFERENCES Dim_Menu_Item(menu_item_key),
    CONSTRAINT fk_delivery_partner FOREIGN KEY (delivery_partner_key) REFERENCES Dim_Delivery_Partner(delivery_partner_key),
    CONSTRAINT fk_location FOREIGN KEY (location_key) REFERENCES Dim_Location(location_key),
    CONSTRAINT fk_payment FOREIGN KEY (payment_key) REFERENCES Dim_Payment(payment_key)
);

2️⃣ Fact_Customer_Activity

CREATE TABLE Fact_Customer_Activity (
    activity_key BIGINT IDENTITY(1,1) PRIMARY KEY,
    date_key INT NOT NULL,
    customer_key INT NOT NULL,
    orders_count INT NOT NULL,
    total_spend DECIMAL(12,2),
    avg_order_value DECIMAL(10,2),
    last_order_days INT,
    is_repeat_customer BIT,
    CONSTRAINT fk_cust_date FOREIGN KEY (date_key) REFERENCES Dim_Date(date_key),
    CONSTRAINT fk_ca_customer FOREIGN KEY (customer_key) REFERENCES Dim_Customer(customer_key)  -- renamed: fk_customer already used by Fact_Order_Sales
);

3️⃣ Fact_Order_Forecast

CREATE TABLE Fact_Order_Forecast (
    forecast_key BIGINT IDENTITY(1,1) PRIMARY KEY,
    forecast_date_key INT NOT NULL,
    restaurant_key INT NOT NULL,
    location_key INT NOT NULL,
    forecast_orders INT NOT NULL,
    forecast_revenue DECIMAL(12,2),
    model_version VARCHAR(50),
    CONSTRAINT fk_forecast_date FOREIGN KEY (forecast_date_key) REFERENCES Dim_Date(date_key),
    CONSTRAINT fk_forecast_restaurant FOREIGN KEY (restaurant_key) REFERENCES Dim_Restaurant(restaurant_key),
    CONSTRAINT fk_forecast_location FOREIGN KEY (location_key) REFERENCES Dim_Location(location_key)
);

4️⃣ Fact_Menu_Item_Performance

CREATE TABLE Fact_Menu_Item_Performance (
    menu_item_perf_key BIGINT IDENTITY(1,1) PRIMARY KEY,
    date_key INT NOT NULL,
    menu_item_key INT NOT NULL,
    restaurant_key INT NOT NULL,
    quantity_sold INT NOT NULL,
    item_revenue DECIMAL(12,2),
    item_cost DECIMAL(12,2),
    item_profit AS (item_revenue - item_cost) PERSISTED,
    popularity_rank INT,
    CONSTRAINT fk_mip_date FOREIGN KEY (date_key) REFERENCES Dim_Date(date_key),
    CONSTRAINT fk_mip_menu_item FOREIGN KEY (menu_item_key) REFERENCES Dim_Menu_Item(menu_item_key),
    CONSTRAINT fk_mip_restaurant FOREIGN KEY (restaurant_key) REFERENCES Dim_Restaurant(restaurant_key)
);

5️⃣ Fact_Season_Sales

CREATE TABLE Fact_Season_Sales (
    season_sales_key BIGINT IDENTITY(1,1) PRIMARY KEY,
    date_key INT NOT NULL,
    season_key INT NOT NULL,
    location_key INT NOT NULL,
    total_orders INT NOT NULL,
    total_revenue DECIMAL(12,2),
    avg_delivery_time DECIMAL(6,2),
    CONSTRAINT fk_ss_date FOREIGN KEY (date_key) REFERENCES Dim_Date(date_key),
    CONSTRAINT fk_ss_season FOREIGN KEY (season_key) REFERENCES Dim_Season(season_key),
    CONSTRAINT fk_ss_location FOREIGN KEY (location_key) REFERENCES Dim_Location(location_key)
);

6️⃣ Fact_Delivery_Performance

CREATE TABLE Fact_Delivery_Performance (
    delivery_perf_key BIGINT IDENTITY(1,1) PRIMARY KEY,
    date_key INT NOT NULL,
    order_id VARCHAR(50) NOT NULL,
    delivery_partner_key INT NOT NULL,
    restaurant_key INT NOT NULL,
    location_key INT,
    promised_time_minutes INT,
    actual_time_minutes INT,
    delay_minutes AS (actual_time_minutes - promised_time_minutes) PERSISTED,
    sla_met_flag BIT,
    CONSTRAINT fk_dp_date FOREIGN KEY (date_key) REFERENCES Dim_Date(date_key),
    CONSTRAINT fk_dp_partner FOREIGN KEY (delivery_partner_key) REFERENCES Dim_Delivery_Partner(delivery_partner_key),
    CONSTRAINT fk_dp_restaurant FOREIGN KEY (restaurant_key) REFERENCES Dim_Restaurant(restaurant_key),
    CONSTRAINT fk_dp_location FOREIGN KEY (location_key) REFERENCES Dim_Location(location_key)
);

7️⃣ Fact_Store_Performance

CREATE TABLE Fact_Store_Performance (
    store_perf_key BIGINT IDENTITY(1,1) PRIMARY KEY,
    date_key INT NOT NULL,
    restaurant_key INT NOT NULL,
    total_orders INT NOT NULL,
    total_revenue DECIMAL(12,2),
    avg_order_value DECIMAL(10,2),
    avg_rating DECIMAL(3,2),
    cancellation_rate DECIMAL(5,2),
    CONSTRAINT fk_sp_date FOREIGN KEY (date_key) REFERENCES Dim_Date(date_key),
    CONSTRAINT fk_sp_restaurant FOREIGN KEY (restaurant_key) REFERENCES Dim_Restaurant(restaurant_key)
);

✅ Notes on DDL Design

  • Surrogate Keys for all fact tables (BIGINT IDENTITY)

  • Foreign keys reference dimension tables

  • Derived measures calculated in the fact table (profit_amount, delay_minutes)

  • Granularity enforced by primary key + business rules

  • Numeric measures only (no descriptive attributes)





22JAN26 - Best practices for Data modeling 1

  

---


## Data Modeling Best Practices – Headings


1. Start with Business Requirements (Not Tables)

2. Define Grain First (Critical Rule)

3. Choose the Right Model Type

4. Use Clear Naming Conventions

5. Design Clean Dimension Tables

6. Keep Fact Tables Lean

7. Enforce Data Integrity

8. Handle Dates Properly

9. Optimize for Query Performance

10. Plan for Data Growth

11. Document Everything

12. Validate with Real Data

13. Version Control Your Models

14. Avoid Over-Normalization in Warehouses

15. Design for BI & Reporting Tools


 

---------------------------------------------------------------------------------------------------------------------------


## 1. Start with Business Requirements (Not Tables)


**Best Practice**


* Understand **what questions the business wants answered**, not just source schemas.


**Why**


* Prevents over-engineering and unused columns/tables.


**How**


* Identify KPIs, reports, dashboards

* Define **facts**, **dimensions**, and **grain**

* Write requirements in plain language

  *Example: “Daily sales by product and region”*



---------------------------------------------------------------------------------------------------------------------------


## 2. Define Grain First (Critical Rule)


**Best Practice**


* Clearly define the **lowest level of detail** for each fact table.


**Why**


* Prevents aggregation errors and duplicate data.


**Example**


* ❌ “Sales table” (too vague)

* ✅ “One row per product, per order, per day”


---

---------------------------------------------------------------------------------------------------------------------------


## 3. Choose the Right Model Type


### a) OLTP (Transactional Systems)


* Use **3NF (Third Normal Form)**

* Minimize redundancy

* Optimize for **writes**


### b) OLAP / Data Warehouse


* Use **Dimensional Modeling**

* Star schema preferred

* Optimize for **reads & analytics**


**Rule of Thumb**


> Applications → normalized

> Analytics → dimensional


---

---------------------------------------------------------------------------------------------------------------------------


## 4. Use Clear Naming Conventions


**Best Practice**


* Use **business-friendly, consistent names**


**Examples**


* `customer_id` (not `custid`)

* `order_date` (not `dt1`)

* Tables: singular or plural—choose one and stick to it


**Why**


* Improves usability for analysts and BI tools.


---------------------------------------------------------------------------------------------------------------------------


## 5. Design Clean Dimension Tables


**Best Practices**


* Use **surrogate keys** (not natural keys)

* Keep dimensions **wide, not deep**

* Handle Slowly Changing Dimensions (SCD)


**SCD Types**


* Type 1 – overwrite (no history)

* Type 2 – add new row (full history) ⭐ most common

* Type 3 – limited history


---------------------------------------------------------------------------------------------------------------------------


## 6. Keep Fact Tables Lean


**Best Practices**


* Store only:


  * Foreign keys

  * Numeric measures

* Avoid text fields in facts


**Example**


* ✔ `sales_amount`, `quantity`

* ❌ `customer_name`, `product_description`


---

---------------------------------------------------------------------------------------------------------------------------


## 7. Enforce Data Integrity


**Best Practice**


* Use constraints where possible


**Examples**


* Primary keys

* Foreign keys

* NOT NULL for required fields


**Why**


* Prevents silent data corruption.


---

---------------------------------------------------------------------------------------------------------------------------


## 8. Handle Dates Properly


**Best Practice**


* Use a **Date Dimension**, not raw dates everywhere


**Benefits**


* Easy filtering by:


  * Year

  * Month

  * Quarter

  * Fiscal periods

  * Holidays
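With a date dimension, these filters become simple joins instead of per-query date arithmetic. A sketch reusing the food-delivery names from the earlier section (is_weekend assumed to be a 0/1 flag):

```sql
-- Weekday revenue by quarter, driven entirely by Dim_Date attributes
SELECT d.quarter, SUM(f.total_sales_amount) AS revenue
FROM Fact_Order_Sales f
JOIN Dim_Date d ON f.date_key = d.date_key
WHERE d.year = 2026
  AND d.is_weekend = 0
GROUP BY d.quarter;
```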


---

---------------------------------------------------------------------------------------------------------------------------


## 9. Optimize for Query Performance


**Best Practices**


* Index:


  * Foreign keys

  * Frequently filtered columns

* Use partitioning for large fact tables

* Avoid unnecessary joins


**Cloud Tip (AWS)**


* Redshift / Athena → sort keys & distribution keys

* RDS → proper indexing and query plans
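A sketch of the indexing tip in SQL Server syntax; the index names are illustrative:

```sql
-- Index the foreign keys that drive the most frequent joins and filters
CREATE INDEX ix_fact_sales_date     ON Fact_Order_Sales (date_key);
CREATE INDEX ix_fact_sales_customer ON Fact_Order_Sales (customer_key);

-- For very large fact tables, also partition on the date key so that
-- date-range queries scan only the relevant partitions.
```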


---

---------------------------------------------------------------------------------------------------------------------------


## 10. Plan for Data Growth


**Best Practice**


* Design for **scale from day one**


**How**


* Avoid hard-coding values

* Use surrogate keys

* Anticipate new dimensions & measures


---

---------------------------------------------------------------------------------------------------------------------------


## 11. Document Everything


**Best Practice**


* Maintain data model documentation


**Include**


* Table purpose

* Column definitions

* Grain

* Business rules


**Why**


* Reduces dependency on individuals.


---

---------------------------------------------------------------------------------------------------------------------------


## 12. Validate with Real Data


**Best Practice**


* Test models using **actual data volumes**


**Checks**


* Row counts

* Duplicates

* Aggregation accuracy

* Edge cases
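The row-count and duplicate checks can be scripted directly against the model. A sketch using the food-delivery fact table; staging_order_items is a hypothetical staging table standing in for the source:

```sql
-- Row-count reconciliation: fact rows should match source order lines
SELECT (SELECT COUNT(*) FROM staging_order_items) AS source_rows,
       (SELECT COUNT(*) FROM Fact_Order_Sales)    AS fact_rows;

-- Duplicate check against the declared grain (order × line)
SELECT order_id, order_line_no, COUNT(*) AS cnt
FROM Fact_Order_Sales
GROUP BY order_id, order_line_no
HAVING COUNT(*) > 1;
```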


---

---------------------------------------------------------------------------------------------------------------------------


## 13. Version Control Your Models


**Best Practice**


* Treat data models like code


**How**


* Store DDL in Git

* Track schema changes

* Use migration scripts


---

---------------------------------------------------------------------------------------------------------------------------


## 14. Avoid Over-Normalization in Warehouses


**Best Practice**


* Prefer **simplicity over theoretical purity**


**Why**


* BI tools and users prefer fewer joins.


---

---------------------------------------------------------------------------------------------------------------------------


## 15. Design for BI & Reporting Tools


**Best Practice**


* Models should be **self-service friendly**


**How**


* Meaningful column names

* Pre-calculated metrics where useful

* Consistent dimensions across facts (conformed dimensions)


---

---------------------------------------------------------------------------------------------------------------------------


## Golden Rule Summary


> **A good data model is easy to understand, hard to misuse, and flexible to change.**


------------------------------------------------------------------------------------------------------------------------------


 


Monday, January 19, 2026

20JAN26 - Enterprise Data Model (EDM) Steps

 

Task: Enterprise Data Model (EDM) steps to create a data warehouse for the Northwind database, with five reports defined down to their granule and granularity.

Data Warehouse Design for Northwind

(With Granule & Granularity)


1. Define Business Objective

Goal
Provide consistent enterprise-wide reporting for:

  • Sales

  • Products

  • Customers

  • Employees

  • Logistics

EDM Focus
Single version of truth across all reports.


2. Identify Business Processes

(Kimball + EDM aligned)

| Process | Description |
|---|---|
| Order Processing | Customer orders and order lines |
| Sales Analysis | Revenue & quantity |
| Product Analysis | Product & category trends |
| Customer Analysis | Buying behavior |
| Shipping Analysis | Delivery performance |

3. Define Enterprise Data Domains (EDM)

| Domain | Entities |
|---|---|
| Sales | Orders, Order Details |
| Customer | Customers |
| Product | Products, Categories, Suppliers |
| Employee | Employees |
| Time | Date |
| Logistics | Shippers |

4. Conceptual Enterprise Data Model (High-Level)

Core Entities

  • Customer

  • Order

  • Order Line

  • Product

  • Employee

  • Shipper

  • Date

Key Relationships

  • Customer → Order (1:M)

  • Order → Order Line (1:M)

  • Product → Order Line (1:M)

At EDM level, no granularity yet—only business meaning.


5. Identify the Fact & Define the Granule (MOST IMPORTANT STEP)

🎯 Fact Table: FactSales

🔹 Granule (Grain)

One record per product per order per day

This means:

  • Each row = 1 product line item in an order

  • Lowest level of detail available from source

🔹 Granularity

  • Daily

  • Order Line level

  • Transactional

✔ Supports all summary and detailed reports
✔ Prevents double counting
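At this grain each measure is computed exactly once per order line, which is why roll-ups cannot double count. In Northwind terms, using the Quantity, UnitPrice, and Discount columns of the Order Details table (SQL Server bracket quoting):

```sql
-- SalesAmount computed once at the atomic grain (one row per order line)
SELECT od.OrderID,
       od.ProductID,
       od.Quantity * od.UnitPrice * (1 - od.Discount) AS SalesAmount
FROM [Order Details] od;
```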


6. Logical EDM → Dimensional Model

Fact Table

FactSales

| Attribute | Description |
|---|---|
| OrderID | Degenerate dimension |
| DateKey | Order date |
| CustomerKey | FK |
| ProductKey | FK |
| EmployeeKey | FK |
| ShipperKey | FK |
| Quantity | Units sold |
| UnitPrice | Price |
| Discount | Discount |
| SalesAmount | Calculated |

Dimension Tables (Granularity Alignment)

| Dimension | Grain |
|---|---|
| DimCustomer | One row per customer |
| DimProduct | One row per product |
| DimEmployee | One row per employee |
| DimDate | One row per day |
| DimShipper | One row per shipper |

7. Granule vs Granularity (Clear Distinction)

| Term | Meaning | Example |
|---|---|---|
| Granule | What one row represents | One product in one order |
| Granularity | Level of detail | Daily, transactional |

8. Reports with Granule & Granularity Mapping


📊 Report 1: Executive Sales Summary

Required Granularity

  • Monthly / Yearly

Derived From Granule

  • Aggregated from order-line daily grain

KPIs

  • Total Revenue

  • Total Orders

  • Average Order Value

✔ Safe aggregation because base grain is atomic


📊 Report 2: Product Performance Report

Granularity

  • Product × Month

KPIs

  • Revenue per Product

  • Units Sold

  • Category Contribution %

Uses

  • DimProduct

  • FactSales


📊 Report 3: Customer Analysis Report

Granularity

  • Customer × Quarter

KPIs

  • Customer Lifetime Value

  • Avg Purchase Frequency

  • Repeat Purchase Rate

Grain Dependency

  • Needs order-line detail to avoid duplication


📊 Report 4: Employee Sales Performance

Granularity

  • Employee × Month

KPIs

  • Revenue per Employee

  • Orders Handled

  • Avg Discount %

Grain Rule

Employee linked at order level, rolled up from line items.


📊 Report 5: Shipping & Logistics Report

Granularity

  • Shipper × Month

KPIs

  • Avg Shipping Days

  • On-Time Delivery %

  • Orders Shipped

Grain

  • Calculated using order-line facts + order dates


9. Why Lowest Granularity Wins (EDM Rule)

| Benefit | Explanation |
|---|---|
| Flexibility | Supports future reports |
| Accuracy | No aggregation loss |
| Performance | Can pre-aggregate later |
| Governance | Single business definition |

10. EDM Traceability Matrix

| EDM Entity | DW Object | Grain |
|---|---|---|
| Order | FactSales | Order Line |
| Product | DimProduct | Product |
| Customer | DimCustomer | Customer |
| Employee | DimEmployee | Employee |
| Shipper | DimShipper | Shipper |

11. Final EDM-Driven Architecture

Northwind OLTP
     ↓
 Staging (Raw Data)
     ↓
 Enterprise Data Warehouse
     ↓
 Star Schema (FactSales)
     ↓
 Aggregates / Data Marts
     ↓
 Reports & KPIs

Key Exam / Interview Takeaway

Always define the grain before building the fact table.
Granule = what one row means
Granularity = level of detail

Dimensional Data Modeling Steps

 

  Dimensional Data Modeling  Steps – Northwind Data Warehouse

1. Understand Business Requirements

  • Identify business users (Sales, Finance, Operations).

  • Define key business questions (sales performance, customer behavior, product demand).

  • Finalize reporting needs and KPIs.


2. Identify Business Processes

Typical Northwind business processes:

  • Sales Orders

  • Product Sales

  • Shipping & Logistics

  • Customer Transactions

  • Supplier Procurement


3. Declare the Grain (Most Important Step)

  • Decide exactly what a single row in each fact table represents

  • Example:

    • One row per Order Line per Product per Day

This decision controls:

  • Level of detail

  • Storage size

  • Reporting flexibility


4. Identify Dimensions

Common dimensions from Northwind:

  • Date

  • Customer

  • Product

  • Employee

  • Supplier

  • Shipper

  • Category

  • Geography (Country, City, Region)


5. Identify Facts (Measures)

Numeric, additive values:

  • Sales Amount

  • Quantity Sold

  • Discount Amount

  • Freight Cost

  • Order Count


6. Design Star Schema

  • Fact table in the center

  • Dimension tables around it

  • Avoid snowflaking (keep dimensions denormalized)


7. Define Surrogate Keys

  • Create surrogate keys for all dimensions

  • Maintain Slowly Changing Dimensions (SCD Type 1 / Type 2 where required)


8. Design ETL Process

  • Extract data from Northwind OLTP

  • Clean, transform, and conform data

  • Load dimensions first, then facts
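The load order matters because fact rows look up surrogate keys that must already exist. A sketch of the two-step load; the staging_* tables and lookup columns here are illustrative, not a fixed ETL design:

```sql
-- Step 1: load dimensions (insert only customers not yet present)
INSERT INTO Dim_Customer (customer_id, company_name)
SELECT s.CustomerID, s.CompanyName
FROM staging_customers s
WHERE NOT EXISTS (SELECT 1 FROM Dim_Customer d WHERE d.customer_id = s.CustomerID);

-- Step 2: load facts, resolving surrogate keys via dimension lookups
INSERT INTO FactSales (DateKey, CustomerKey, ProductKey, Quantity, UnitPrice, Discount)
SELECT dd.DateKey, dc.CustomerKey, dp.ProductKey, st.Quantity, st.UnitPrice, st.Discount
FROM staging_order_lines st
JOIN Dim_Date dd     ON dd.FullDate    = st.OrderDate
JOIN Dim_Customer dc ON dc.customer_id = st.CustomerID
JOIN Dim_Product dp  ON dp.product_id  = st.ProductID;
```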


9. Validate with Sample Reports

  • Ensure the model supports all required reports

  • Check aggregations, filters, and drill-downs


10. Deploy & Optimize

  • Index fact tables

  • Partition by Date

  • Enable BI tools (Power BI / Tableau / QuickSight)



B. Core Star Schema (Northwind)

Fact Table

Fact_Sales

Columns:

  • Date_Key

  • Product_Key

  • Customer_Key

  • Employee_Key

  • Shipper_Key

  • Quantity

  • Unit_Price

  • Discount

  • Sales_Amount

  • Freight

Dimension Tables

  • Dim_Date

  • Dim_Product

  • Dim_Customer

  • Dim_Employee

  • Dim_Shipper

  • Dim_Supplier

  • Dim_Category



C. Five Business Reports with Grain & Granularity


1. Sales Performance Report

Business Question:
How much revenue is generated over time?

Fact Table: Fact_Sales
Dimensions: Date, Product, Category

Grain (Granule):
👉 One row per Product per Day

Granularity:

  • Daily → Monthly → Yearly

Measures:

  • Total Sales Amount

  • Total Quantity Sold


2. Customer Purchase Behavior Report

Business Question:
Which customers generate the most revenue?

Fact Table: Fact_Sales
Dimensions: Customer, Date, Geography

Grain:
👉 One row per Customer per Order per Day

Granularity:

  • Order level → Monthly → Yearly

Measures:

  • Sales Amount

  • Order Count


3. Product Performance Report

Business Question:
Which products are best and worst sellers?

Fact Table: Fact_Sales
Dimensions: Product, Category, Supplier

Grain:
👉 One row per Product per Order Line

Granularity:

  • Order Line → Product → Category

Measures:

  • Quantity Sold

  • Revenue

  • Discount Amount


4. Employee Sales Contribution Report

Business Question:
How well are employees performing in sales?

Fact Table: Fact_Sales
Dimensions: Employee, Date, Region

Grain:
👉 One row per Employee per Order

Granularity:

  • Order → Month → Year

Measures:

  • Total Sales

  • Orders Handled


5. Shipping & Logistics Cost Report

Business Question:
What is the shipping cost trend by shipper?

Fact Table: Fact_Sales
Dimensions: Shipper, Date, Geography

Grain:
👉 One row per Order per Shipper per Day

Granularity:

  • Daily → Monthly → Yearly

Measures:

  • Freight Cost

  • Number of Shipments



D. Summary Table

| Report | Grain | Granularity |
|---|---|---|
| Sales Performance | Product per Day | Day → Month → Year |
| Customer Behavior | Customer per Order | Order → Month |
| Product Performance | Product per Order Line | Line → Product |
| Employee Performance | Employee per Order | Order → Year |
| Shipping Cost | Order per Shipper | Day → Year |

EDM vs OLTP: Are They the Same?

 Short answer:

👉 No. EDM is NOT the same as transactional OLTP — but they are closely related.

Let’s clear this up cleanly and precisely (this is a very common interview confusion).


1. Simple One-Line Difference

  • EDM = Blueprint / logical model of enterprise data

  • OLTP = Physical system that runs transactions

Think: EDM is the design, OLTP is the implementation


2. How They Relate (But Are Not the Same)

| Aspect | EDM (Enterprise Data Model) | OLTP (Transactional System) |
|---|---|---|
| What it is | Logical / conceptual data model | Operational database |
| Purpose | Define what data exists and how it relates | Run business transactions |
| Usage | Architecture, governance, integration | Day-to-day operations |
| Normalization | Highly normalized (3NF) | Highly normalized (3NF) |
| Users | Data architects | Applications, end users |
| Focus | Structure-focused | Transaction-focused |

3. Why People Confuse EDM with OLTP

Because:

  • Both are normalized

  • Both use ER diagrams

  • Both model entities like Customer, Order, Product

  • OLTP databases are usually built from EDM

➡️ But they are not the same thing.


4. Correct Relationship (Very Important)

EDM (Logical Blueprint)
        ↓
OLTP Schema (Physical Implementation)
        ↓
ODS (Optional)
        ↓
Dimensional Model (DW / Data Mart)
  • EDM defines what should exist

  • OLTP implements it with tables, indexes, constraints

  • Dimensional Model reshapes it for analytics


5. Example (Northwind)

EDM View (Logical)

  • Customer

  • Order

  • Order Line

  • Product

  • Supplier

Relationships only — no performance details.


OLTP View (Physical)

  • Customers table

  • Orders table

  • Order_Details table

  • Foreign keys, indexes, constraints

  • Supports INSERT / UPDATE / DELETE


Dimensional View (Analytics)

  • Fact_Sales

  • Dim_Customer

  • Dim_Product

  • Dim_Date


6. Key Interview Statement (Use This)

EDM is not an OLTP system; it is the logical foundation from which OLTP systems are built.

or

OLTP databases are physical realizations of an enterprise or application data model, often derived from an EDM.


7. Final Verdict

  • ❌ EDM ≠ OLTP

  • ✅ EDM → guides OLTP design

  • ✅ OLTP → optimized for transactions

  • ✅ Dimensional Model → optimized for analytics


Monday, December 8, 2025

SQL : FIND SALES TREND

USE db1;


 

DROP TABLE IF EXISTS Orders;

 

CREATE TABLE Orders (
    OrderID INT PRIMARY KEY,
    Amount DECIMAL(10, 2) NOT NULL,
    OrderDate DATE NOT NULL,
    IsReturn INT NOT NULL  -- 0 for New Order, 1 for Return
);

 

-- sample data
INSERT INTO Orders (OrderID, Amount, OrderDate, IsReturn) VALUES
(1001, 45.00, '2025-12-01', 0),
(1002, 120.50, '2025-12-01', 0),
(1003, 25.99, '2025-12-02', 0),
(1004, 99.00, '2025-12-02', 0),
(1005, 15.00, '2025-12-02', 1),
(1006, 350.00, '2025-12-03', 0),
(1007, 75.25, '2025-12-03', 0),
(1008, 18.00, '2025-12-03', 0),
(1009, 22.00, '2025-12-03', 1),
(1010, 145.00, '2025-12-04', 0),
(1011, 55.00, '2025-12-04', 0),
(1012, 60.00, '2025-12-04', 0),
(1013, 70.00, '2025-12-04', 0),
(1014, 12.99, '2025-12-04', 0),
(1015, 88.00, '2025-12-05', 0),
(1016, 210.00, '2025-12-05', 0),
(1017, 10.00, '2025-12-05', 1),
(1018, 45.50, '2025-12-06', 0),
(1019, 105.00, '2025-12-06', 0),
(1020, 180.00, '2025-12-07', 0),
(1021, 50.00, '2025-12-07', 0),
(1022, 250.00, '2025-12-07', 1),
(1023, 33.00, '2025-12-08', 0),
(1024, 175.00, '2025-12-08', 0),
(1025, 110.00, '2025-12-09', 0),
(1026, 7.99, '2025-12-09', 0),
(1027, 150.00, '2025-12-10', 0),
(1028, 40.00, '2025-12-10', 0),
(1029, 25.00, '2025-12-10', 1),
(1030, 13.50, '2025-12-10', 0);


SELECT * FROM Orders;


-- Query to determine whether the daily order trend is increasing or decreasing


WITH DailyCounts AS (
    -- 1. Calculate the total number of orders for each date
    SELECT
        OrderDate,
        COUNT(OrderID) AS DailyOrderCount
    FROM Orders
    GROUP BY OrderDate
)
SELECT
    OrderDate,
    -- Order count from the previous day; 0 is the default for the first day
    LAG(DailyOrderCount, 1, 0) OVER (ORDER BY OrderDate) AS PreviousDayCount,
    DailyOrderCount,
    -- 2. Compare current count to previous count to determine the trend
    CASE
        WHEN DailyOrderCount > LAG(DailyOrderCount, 1, 0) OVER (ORDER BY OrderDate) THEN 'Increasing ⬆️'
        WHEN DailyOrderCount < LAG(DailyOrderCount, 1, 0) OVER (ORDER BY OrderDate) THEN 'Decreasing ⬇️'
        ELSE 'Stable/Same'
    END AS Trend
FROM DailyCounts
ORDER BY OrderDate;

23feb26 - cab service project -data modeling project

  The Full Data Modeling Implementation Steps Phase Description & Methodology Primary Tools Phase 1: Discovery Ask the 10-Section Questi...