Wednesday, January 21, 2026

22jan26 - Surrogate keys - when to use and not use

 

1️⃣ What is a Surrogate Key?

  • A surrogate key is an artificial, system-generated unique identifier for a row in a table.

  • Usually integer or bigint, auto-incremented or generated by the system.

  • Not derived from business data (unlike natural keys).

Example:

customer_key INT IDENTITY(1,1) -- surrogate key

2️⃣ When to Use Surrogate Keys

| Scenario | Reason / Benefit | Example |
|---|---|---|
| Data Warehouse / Fact-Dimension Modeling | Allows consistent foreign key joins, supports slowly changing dimensions (SCD), avoids issues when natural keys change | Dim_Customer with customer_key used in Fact_Order_Sales |
| Slowly Changing Dimensions (Type 2) | A surrogate key lets you track history without altering historical fact data | Customer changes address → new surrogate key row in Dim_Customer |
| Composite / Complex Natural Keys | Avoids long, multi-column foreign keys | Order detail identified by order_id + line_number → simpler to use order_line_key |
| Integration from Multiple Systems | Different source systems may use different IDs → a surrogate key standardizes references | Merging customer_id from two CRMs |
| Performance / Indexing | Integer surrogate keys are more efficient for joins than long string keys | Joining Dim_Product.product_key to the fact table instead of product_sku |

3️⃣ When NOT to Use Surrogate Keys

| Scenario | Reason / Alternative |
|---|---|
| Transactional OLTP tables where the natural key is stable and simple | Surrogate key adds no value; the natural key uniquely identifies the row. |
| Key has real-world meaning and is immutable | Using a surrogate key may cause unnecessary joins and duplication. |
| Small reference tables with static values | Surrogate key adds complexity without benefit. |
| Temporary / staging tables | Surrogate keys are unnecessary; use the natural key or a row hash. |
| Natural key is already numeric, short, and stable | Surrogate key is redundant. |

4️⃣ Key Guidelines

  1. Always use surrogate keys in data warehouse dimensions and fact table foreign keys.

  2. Avoid surrogate keys in OLTP tables with stable, meaningful natural keys.

  3. Never mix natural keys and surrogate keys for joins in the same context—it confuses lineage.

  4. Surrogate keys are particularly helpful for history tracking (SCD2) and integration of multiple sources
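The history-tracking guideline can be sketched as DDL. This is a minimal Type 2 dimension, assuming SQL Server-style IDENTITY; the housekeeping column names (valid_from, valid_to, is_current) are illustrative conventions, not fixed requirements:

```sql
-- Minimal SCD2 dimension: surrogate key + natural key + validity columns
CREATE TABLE Dim_Customer (
    customer_key   INT IDENTITY(1,1) PRIMARY KEY,  -- surrogate key
    customer_id    VARCHAR(20) NOT NULL,           -- natural key from source
    customer_name  VARCHAR(100),
    address        VARCHAR(200),
    valid_from     DATE NOT NULL,
    valid_to       DATE NULL,                      -- NULL = still current
    is_current     BIT  NOT NULL DEFAULT 1
);
```

When an attribute such as address changes, the current row is expired and a new row (with a new surrogate key) is inserted; historical fact rows keep pointing at the surrogate key that was current when they were loaded.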

22jan26 - Food Delivery Business – Data Modeling from OLTP to OLAP - V1

Below is a complete, end-to-end explanation for a Food Delivery Business, written the way it would appear in project design documents, interviews, and architecture reviews.


🍔 Food Delivery Business – Data Modeling from OLTP to Star Schema


1️⃣ Transaction Model (Enterprise / OLTP Model)

🎯 Purpose

  • Support day-to-day operations

  • Optimized for fast inserts & updates

  • Highly normalized (3NF)


📋 Core Transaction Tables

Customers

  • customer_id (PK)

  • name

  • phone

  • email

  • address_id

  • created_date

Restaurants

  • restaurant_id (PK)

  • restaurant_name

  • cuisine_type

  • city

  • rating

Menu_Items

  • menu_item_id (PK)

  • restaurant_id (FK)

  • item_name

  • category

  • price

  • is_available

Orders

  • order_id (PK)

  • customer_id (FK)

  • restaurant_id (FK)

  • order_datetime

  • order_status

  • total_amount

Order_Items

  • order_item_id (PK)

  • order_id (FK)

  • menu_item_id (FK)

  • quantity

  • item_price

Delivery_Partners

  • delivery_partner_id (PK)

  • name

  • vehicle_type

  • join_date

Payments

  • payment_id (PK)

  • order_id (FK)

  • payment_mode

  • payment_status

  • paid_amount


2️⃣ Need for Reporting & Analytics

📊 Business Questions

  • Daily / monthly revenue?

  • Top restaurants by sales?

  • Most ordered food items?

  • Average delivery time?

  • Customer repeat rate?

  • City-wise performance?

  • Partner productivity?


3️⃣ Problems Reporting Directly from Transaction Model ❌

| Problem | Explanation |
|---|---|
| Too many joins | 8–10 joins per report |
| Poor performance | Tables optimized for writes |
| Complex queries | Hard for BI tools |
| Mixed grain | Orders vs order items |
| No historical tracking | Overwritten data |
| Business logic scattered | Revenue logic repeated |

📌 Conclusion:

Transaction models are NOT designed for analytics.


4️⃣ Why We Need a Data Warehouse ✅

Benefits

  • Optimized for read & aggregation

  • Simple query structure

  • Historical data support

  • Business-friendly schema

  • Faster dashboards


5️⃣ Define Grain & Granularity (MOST IMPORTANT)

🎯 Reporting Focus

Analyze food orders and revenue


✅ Fact Table Grain

ONE ROW per menu item per order per day

Formally:

Order × Menu Item × Date

📌 Granularity

  • Lowest level: Order Item

  • Time level: Day

  • Allows roll-up to:

    • Order

    • Restaurant

    • City

    • Month / Year
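Because the fact grain is atomic, all of these roll-ups are plain GROUP BY aggregations. A sketch of the monthly roll-up, using the Fact_Order_Sales and Dim_Date names defined in this model (exact SQL dialect may vary):

```sql
-- Roll up the atomic grain (menu item × order × day) to month level
SELECT d.year, d.month,
       SUM(f.total_sales_amount) AS monthly_revenue
FROM Fact_Order_Sales f
JOIN Dim_Date d ON f.date_key = d.date_key
GROUP BY d.year, d.month
ORDER BY d.year, d.month;
```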


6️⃣ Map Business Requirements to Fact & Dimension Tables

| Business Requirement | Table Type | Table Name |
|---|---|---|
| Revenue analysis | Fact | Fact_Order_Sales |
| Time-based analysis | Dimension | Dim_Date |
| Customer analysis | Dimension | Dim_Customer |
| Restaurant performance | Dimension | Dim_Restaurant |
| Menu analysis | Dimension | Dim_Menu_Item |
| Location analysis | Dimension | Dim_Location |
| Delivery partner analysis | Dimension | Dim_Delivery_Partner |
| Payment analysis | Dimension | Dim_Payment |

7️⃣ Fact & Dimension Tables List

Fact Table

Fact_Order_Sales

Grain: One row per menu item per order per day

Measures

  • quantity_ordered

  • item_sales_amount

  • discount_amount

  • delivery_fee

  • total_sales_amount

  • delivery_time_minutes

Foreign Keys

  • date_key

  • customer_key

  • restaurant_key

  • menu_item_key

  • delivery_partner_key

  • location_key

  • payment_key


Dimension Tables

Dim_Date

  • date_key (PK)

  • full_date

  • day

  • month

  • quarter

  • year

  • is_weekend


Dim_Customer

  • customer_key (PK)

  • customer_id

  • name

  • gender

  • age_group

  • city

  • signup_date

  • is_current (SCD2)


Dim_Restaurant

  • restaurant_key (PK)

  • restaurant_id

  • restaurant_name

  • cuisine_type

  • rating

  • is_active


Dim_Menu_Item

  • menu_item_key (PK)

  • item_name

  • category

  • veg_flag

  • price


Dim_Location

  • location_key (PK)

  • city

  • state

  • country

  • zone


Dim_Delivery_Partner

  • delivery_partner_key (PK)

  • partner_name

  • vehicle_type

  • experience_level


Dim_Payment

  • payment_key (PK)

  • payment_mode

  • payment_provider


8️⃣ Star Schema Structure ⭐

Dim_Date -------------+
Dim_Customer ---------+
Dim_Restaurant -------+
Dim_Menu_Item --------+--> Fact_Order_Sales
Dim_Location ---------+
Dim_Delivery_Partner -+
Dim_Payment ----------+

Every dimension joins directly to the fact table; there are no dimension-to-dimension chains.

9️⃣ Why STAR Schema is Preferred over Snowflake ⭐⭐⭐

| Star Schema | Snowflake Schema |
|---|---|
| Fewer joins | More joins |
| Faster queries | Slower |
| Simple for BI | Complex |
| Business friendly | Technical |
| Easy maintenance | Harder |

📌 Rule:

Use Snowflake only when dimensions are extremely large or reused.


🔟 Example Business Queries

1️⃣ Daily Revenue

SELECT d.full_date, SUM(f.total_sales_amount) revenue
FROM Fact_Order_Sales f
JOIN Dim_Date d ON f.date_key = d.date_key
GROUP BY d.full_date;

2️⃣ Top 5 Restaurants by Revenue

SELECT r.restaurant_name, SUM(f.total_sales_amount) revenue
FROM Fact_Order_Sales f
JOIN Dim_Restaurant r ON f.restaurant_key = r.restaurant_key
GROUP BY r.restaurant_name
ORDER BY revenue DESC
LIMIT 5;

3️⃣ Most Ordered Menu Items

SELECT m.item_name, SUM(f.quantity_ordered) qty
FROM Fact_Order_Sales f
JOIN Dim_Menu_Item m ON f.menu_item_key = m.menu_item_key
GROUP BY m.item_name
ORDER BY qty DESC;
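One of the original business questions, customer repeat rate, can also be answered from this grain. A sketch that assumes order_id is carried in the fact table as a degenerate dimension (as in the DDL later in these notes):

```sql
-- Share of customers who placed more than one order
SELECT
    COUNT(CASE WHEN order_cnt > 1 THEN 1 END) * 100.0 / COUNT(*) AS repeat_rate_pct
FROM (
    SELECT f.customer_key, COUNT(DISTINCT f.order_id) AS order_cnt
    FROM Fact_Order_Sales f
    GROUP BY f.customer_key
) c;
```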

1️⃣1️⃣ Common Mistakes ❌

  1. Mixed grain in fact table

  2. Storing customer name in fact

  3. No date dimension

  4. Snowflaking without reason

  5. Missing surrogate keys

  6. Overloading fact tables

  7. No documentation of grain


🎯 Final Summary

  • OLTP → operations

  • Data Warehouse → analytics

  • Grain defines correctness

  • Star schema = performance + simplicity


Here is a comprehensive SQL DDL for all seven fact tables in the Food Delivery Data Warehouse, following best practices (surrogate keys, proper data types, foreign keys, and measures only in fact tables).


1️⃣ Fact_Order_Sales

CREATE TABLE Fact_Order_Sales (
    sales_key BIGINT IDENTITY(1,1) PRIMARY KEY,
    date_key INT NOT NULL,
    customer_key INT NOT NULL,
    restaurant_key INT NOT NULL,
    menu_item_key INT NOT NULL,
    delivery_partner_key INT,
    location_key INT,
    payment_key INT,
    order_id VARCHAR(50) NOT NULL,
    order_line_no INT NOT NULL,
    quantity_sold INT NOT NULL,
    item_price DECIMAL(10,2) NOT NULL,
    discount_amount DECIMAL(10,2),
    delivery_fee DECIMAL(10,2),
    total_sales_amount DECIMAL(12,2) NOT NULL,
    cost_amount DECIMAL(10,2),
    profit_amount AS (total_sales_amount - cost_amount) PERSISTED,
    CONSTRAINT fk_order_date FOREIGN KEY (date_key) REFERENCES Dim_Date(date_key),
    CONSTRAINT fk_customer FOREIGN KEY (customer_key) REFERENCES Dim_Customer(customer_key),
    CONSTRAINT fk_restaurant FOREIGN KEY (restaurant_key) REFERENCES Dim_Restaurant(restaurant_key),
    CONSTRAINT fk_menu_item FOREIGN KEY (menu_item_key) REFERENCES Dim_Menu_Item(menu_item_key),
    CONSTRAINT fk_delivery_partner FOREIGN KEY (delivery_partner_key) REFERENCES Dim_Delivery_Partner(delivery_partner_key),
    CONSTRAINT fk_location FOREIGN KEY (location_key) REFERENCES Dim_Location(location_key),
    CONSTRAINT fk_payment FOREIGN KEY (payment_key) REFERENCES Dim_Payment(payment_key)
);

2️⃣ Fact_Customer_Activity

CREATE TABLE Fact_Customer_Activity (
    activity_key BIGINT IDENTITY(1,1) PRIMARY KEY,
    date_key INT NOT NULL,
    customer_key INT NOT NULL,
    orders_count INT NOT NULL,
    total_spend DECIMAL(12,2),
    avg_order_value DECIMAL(10,2),
    last_order_days INT,
    is_repeat_customer BIT,
    CONSTRAINT fk_cust_date FOREIGN KEY (date_key) REFERENCES Dim_Date(date_key),
    CONSTRAINT fk_ca_customer FOREIGN KEY (customer_key) REFERENCES Dim_Customer(customer_key)  -- renamed: fk_customer already used by Fact_Order_Sales
);

3️⃣ Fact_Order_Forecast

CREATE TABLE Fact_Order_Forecast (
    forecast_key BIGINT IDENTITY(1,1) PRIMARY KEY,
    forecast_date_key INT NOT NULL,
    restaurant_key INT NOT NULL,
    location_key INT NOT NULL,
    forecast_orders INT NOT NULL,
    forecast_revenue DECIMAL(12,2),
    model_version VARCHAR(50),
    CONSTRAINT fk_forecast_date FOREIGN KEY (forecast_date_key) REFERENCES Dim_Date(date_key),
    CONSTRAINT fk_forecast_restaurant FOREIGN KEY (restaurant_key) REFERENCES Dim_Restaurant(restaurant_key),
    CONSTRAINT fk_forecast_location FOREIGN KEY (location_key) REFERENCES Dim_Location(location_key)
);

4️⃣ Fact_Menu_Item_Performance

CREATE TABLE Fact_Menu_Item_Performance (
    menu_item_perf_key BIGINT IDENTITY(1,1) PRIMARY KEY,
    date_key INT NOT NULL,
    menu_item_key INT NOT NULL,
    restaurant_key INT NOT NULL,
    quantity_sold INT NOT NULL,
    item_revenue DECIMAL(12,2),
    item_cost DECIMAL(12,2),
    item_profit AS (item_revenue - item_cost) PERSISTED,
    popularity_rank INT,
    CONSTRAINT fk_mip_date FOREIGN KEY (date_key) REFERENCES Dim_Date(date_key),
    CONSTRAINT fk_mip_menu_item FOREIGN KEY (menu_item_key) REFERENCES Dim_Menu_Item(menu_item_key),
    CONSTRAINT fk_mip_restaurant FOREIGN KEY (restaurant_key) REFERENCES Dim_Restaurant(restaurant_key)
);

5️⃣ Fact_Season_Sales

CREATE TABLE Fact_Season_Sales (
    season_sales_key BIGINT IDENTITY(1,1) PRIMARY KEY,
    date_key INT NOT NULL,
    season_key INT NOT NULL,
    location_key INT NOT NULL,
    total_orders INT NOT NULL,
    total_revenue DECIMAL(12,2),
    avg_delivery_time DECIMAL(6,2),
    CONSTRAINT fk_ss_date FOREIGN KEY (date_key) REFERENCES Dim_Date(date_key),
    CONSTRAINT fk_ss_season FOREIGN KEY (season_key) REFERENCES Dim_Season(season_key),
    CONSTRAINT fk_ss_location FOREIGN KEY (location_key) REFERENCES Dim_Location(location_key)
);

6️⃣ Fact_Delivery_Performance

CREATE TABLE Fact_Delivery_Performance (
    delivery_perf_key BIGINT IDENTITY(1,1) PRIMARY KEY,
    date_key INT NOT NULL,
    order_id VARCHAR(50) NOT NULL,
    delivery_partner_key INT NOT NULL,
    restaurant_key INT NOT NULL,
    location_key INT,
    promised_time_minutes INT,
    actual_time_minutes INT,
    delay_minutes AS (actual_time_minutes - promised_time_minutes) PERSISTED,
    sla_met_flag BIT,
    CONSTRAINT fk_dp_date FOREIGN KEY (date_key) REFERENCES Dim_Date(date_key),
    CONSTRAINT fk_dp_partner FOREIGN KEY (delivery_partner_key) REFERENCES Dim_Delivery_Partner(delivery_partner_key),
    CONSTRAINT fk_dp_restaurant FOREIGN KEY (restaurant_key) REFERENCES Dim_Restaurant(restaurant_key),
    CONSTRAINT fk_dp_location FOREIGN KEY (location_key) REFERENCES Dim_Location(location_key)
);

7️⃣ Fact_Store_Performance

CREATE TABLE Fact_Store_Performance (
    store_perf_key BIGINT IDENTITY(1,1) PRIMARY KEY,
    date_key INT NOT NULL,
    restaurant_key INT NOT NULL,
    total_orders INT NOT NULL,
    total_revenue DECIMAL(12,2),
    avg_order_value DECIMAL(10,2),
    avg_rating DECIMAL(3,2),
    cancellation_rate DECIMAL(5,2),
    CONSTRAINT fk_sp_date FOREIGN KEY (date_key) REFERENCES Dim_Date(date_key),
    CONSTRAINT fk_sp_restaurant FOREIGN KEY (restaurant_key) REFERENCES Dim_Restaurant(restaurant_key)
);

✅ Notes on DDL Design

  • Surrogate Keys for all fact tables (BIGINT IDENTITY)

  • Foreign keys reference dimension tables

  • Derived measures calculated in the fact table (profit_amount, delay_minutes)

  • Granularity enforced by primary key + business rules

  • Numeric measures only (no descriptive attributes)





22JAN26 - Best practices for Data modeling 1

  

---


## Data Modeling Best Practices – Headings


1. Start with Business Requirements (Not Tables)

2. Define Grain First (Critical Rule)

3. Choose the Right Model Type

4. Use Clear Naming Conventions

5. Design Clean Dimension Tables

6. Keep Fact Tables Lean

7. Enforce Data Integrity

8. Handle Dates Properly

9. Optimize for Query Performance

10. Plan for Data Growth

11. Document Everything

12. Validate with Real Data

13. Version Control Your Models

14. Avoid Over-Normalization in Warehouses

15. Design for BI & Reporting Tools


 

---------------------------------------------------------------------------------------------------------------------------


## 1. Start with Business Requirements (Not Tables)


**Best Practice**


* Understand **what questions the business wants answered**, not just source schemas.


**Why**


* Prevents over-engineering and unused columns/tables.


**How**


* Identify KPIs, reports, dashboards

* Define **facts**, **dimensions**, and **grain**

* Write requirements in plain language

  *Example: “Daily sales by product and region”*



---------------------------------------------------------------------------------------------------------------------------


## 2. Define Grain First (Critical Rule)


**Best Practice**


* Clearly define the **lowest level of detail** for each fact table.


**Why**


* Prevents aggregation errors and duplicate data.


**Example**


* ❌ “Sales table” (too vague)

* ✅ “One row per product, per order, per day”


---

---------------------------------------------------------------------------------------------------------------------------


## 3. Choose the Right Model Type


### a) OLTP (Transactional Systems)


* Use **3NF (Third Normal Form)**

* Minimize redundancy

* Optimize for **writes**


### b) OLAP / Data Warehouse


* Use **Dimensional Modeling**

* Star schema preferred

* Optimize for **reads & analytics**


**Rule of Thumb**


> Applications → normalized

> Analytics → dimensional


---

---------------------------------------------------------------------------------------------------------------------------


## 4. Use Clear Naming Conventions


**Best Practice**


* Use **business-friendly, consistent names**


**Examples**


* `customer_id` (not `custid`)

* `order_date` (not `dt1`)

* Tables: singular or plural—choose one and stick to it


**Why**


* Improves usability for analysts and BI tools.


---------------------------------------------------------------------------------------------------------------------------


## 5. Design Clean Dimension Tables


**Best Practices**


* Use **surrogate keys** (not natural keys)

* Keep dimensions **wide, not deep**

* Handle Slowly Changing Dimensions (SCD)


**SCD Types**


* Type 1 – overwrite (no history)

* Type 2 – add new row (full history) ⭐ most common

* Type 3 – limited history


---------------------------------------------------------------------------------------------------------------------------


## 6. Keep Fact Tables Lean


**Best Practices**


* Store only:


  * Foreign keys

  * Numeric measures

* Avoid text fields in facts


**Example**


* ✔ `sales_amount`, `quantity`

* ❌ `customer_name`, `product_description`


---

---------------------------------------------------------------------------------------------------------------------------


## 7. Enforce Data Integrity


**Best Practice**


* Use constraints where possible


**Examples**


* Primary keys

* Foreign keys

* NOT NULL for required fields


**Why**


* Prevents silent data corruption.


---

---------------------------------------------------------------------------------------------------------------------------


## 8. Handle Dates Properly


**Best Practice**


* Use a **Date Dimension**, not raw dates everywhere


**Benefits**


* Easy filtering by:


  * Year

  * Month

  * Quarter

  * Fiscal periods

  * Holidays
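With a date dimension, these filters become simple joins instead of per-query date arithmetic. A sketch reusing the food-delivery names from the earlier section (is_weekend assumed to be a 0/1 flag):

```sql
-- Weekday revenue by quarter, driven entirely by Dim_Date attributes
SELECT d.quarter, SUM(f.total_sales_amount) AS revenue
FROM Fact_Order_Sales f
JOIN Dim_Date d ON f.date_key = d.date_key
WHERE d.year = 2026
  AND d.is_weekend = 0
GROUP BY d.quarter;
```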


---

---------------------------------------------------------------------------------------------------------------------------


## 9. Optimize for Query Performance


**Best Practices**


* Index:


  * Foreign keys

  * Frequently filtered columns

* Use partitioning for large fact tables

* Avoid unnecessary joins


**Cloud Tip (AWS)**


* Redshift / Athena → sort keys & distribution keys

* RDS → proper indexing and query plans
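A sketch of the indexing tip in SQL Server syntax; the index names are illustrative:

```sql
-- Index the foreign keys that drive the most frequent joins and filters
CREATE INDEX ix_fact_sales_date     ON Fact_Order_Sales (date_key);
CREATE INDEX ix_fact_sales_customer ON Fact_Order_Sales (customer_key);

-- For very large fact tables, also partition on the date key so that
-- date-range queries scan only the relevant partitions.
```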


---

---------------------------------------------------------------------------------------------------------------------------


## 10. Plan for Data Growth


**Best Practice**


* Design for **scale from day one**


**How**


* Avoid hard-coding values

* Use surrogate keys

* Anticipate new dimensions & measures


---

---------------------------------------------------------------------------------------------------------------------------


## 11. Document Everything


**Best Practice**


* Maintain data model documentation


**Include**


* Table purpose

* Column definitions

* Grain

* Business rules


**Why**


* Reduces dependency on individuals.


---

---------------------------------------------------------------------------------------------------------------------------


## 12. Validate with Real Data


**Best Practice**


* Test models using **actual data volumes**


**Checks**


* Row counts

* Duplicates

* Aggregation accuracy

* Edge cases
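The row-count and duplicate checks can be scripted directly against the model. A sketch using the food-delivery fact table; staging_order_items is a hypothetical staging table standing in for the source:

```sql
-- Row-count reconciliation: fact rows should match source order lines
SELECT (SELECT COUNT(*) FROM staging_order_items) AS source_rows,
       (SELECT COUNT(*) FROM Fact_Order_Sales)    AS fact_rows;

-- Duplicate check against the declared grain (order × line)
SELECT order_id, order_line_no, COUNT(*) AS cnt
FROM Fact_Order_Sales
GROUP BY order_id, order_line_no
HAVING COUNT(*) > 1;
```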


---

---------------------------------------------------------------------------------------------------------------------------


## 13. Version Control Your Models


**Best Practice**


* Treat data models like code


**How**


* Store DDL in Git

* Track schema changes

* Use migration scripts


---

---------------------------------------------------------------------------------------------------------------------------


## 14. Avoid Over-Normalization in Warehouses


**Best Practice**


* Prefer **simplicity over theoretical purity**


**Why**


* BI tools and users prefer fewer joins.


---

---------------------------------------------------------------------------------------------------------------------------


## 15. Design for BI & Reporting Tools


**Best Practice**


* Models should be **self-service friendly**


**How**


* Meaningful column names

* Pre-calculated metrics where useful

* Consistent dimensions across facts (conformed dimensions)


---

---------------------------------------------------------------------------------------------------------------------------


## Golden Rule Summary


> **A good data model is easy to understand, hard to misuse, and flexible to change.**


------------------------------------------------------------------------------------------------------------------------------


 


Monday, January 19, 2026

20JAN26 - Enterprise Data Model (EDM) Steps

 

Task: Enterprise Data Model (EDM) steps to create a data warehouse for the Northwind database, with five reports defined down to their granule and granularity.

Data Warehouse Design for Northwind

(With Granule & Granularity)


1. Define Business Objective

Goal
Provide consistent enterprise-wide reporting for:

  • Sales

  • Products

  • Customers

  • Employees

  • Logistics

EDM Focus
Single version of truth across all reports.


2. Identify Business Processes

(Kimball + EDM aligned)

| Process | Description |
|---|---|
| Order Processing | Customer orders and order lines |
| Sales Analysis | Revenue & quantity |
| Product Analysis | Product & category trends |
| Customer Analysis | Buying behavior |
| Shipping Analysis | Delivery performance |

3. Define Enterprise Data Domains (EDM)

| Domain | Entities |
|---|---|
| Sales | Orders, Order Details |
| Customer | Customers |
| Product | Products, Categories, Suppliers |
| Employee | Employees |
| Time | Date |
| Logistics | Shippers |

4. Conceptual Enterprise Data Model (High-Level)

Core Entities

  • Customer

  • Order

  • Order Line

  • Product

  • Employee

  • Shipper

  • Date

Key Relationships

  • Customer → Order (1:M)

  • Order → Order Line (1:M)

  • Product → Order Line (1:M)

At EDM level, no granularity yet—only business meaning.


5. Identify the Fact & Define the Granule (MOST IMPORTANT STEP)

🎯 Fact Table: FactSales

🔹 Granule (Grain)

One record per product per order per day

This means:

  • Each row = 1 product line item in an order

  • Lowest level of detail available from source

🔹 Granularity

  • Daily

  • Order Line level

  • Transactional

✔ Supports all summary and detailed reports
✔ Prevents double counting
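At this grain each measure is computed exactly once per order line, which is why roll-ups cannot double count. In Northwind terms, using the Quantity, UnitPrice, and Discount columns of the Order Details table (SQL Server bracket quoting):

```sql
-- SalesAmount computed once at the atomic grain (one row per order line)
SELECT od.OrderID,
       od.ProductID,
       od.Quantity * od.UnitPrice * (1 - od.Discount) AS SalesAmount
FROM [Order Details] od;
```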


6. Logical EDM → Dimensional Model

Fact Table

FactSales

| Attribute | Description |
|---|---|
| OrderID | Degenerate dimension |
| DateKey | Order date |
| CustomerKey | FK |
| ProductKey | FK |
| EmployeeKey | FK |
| ShipperKey | FK |
| Quantity | Units sold |
| UnitPrice | Price |
| Discount | Discount |
| SalesAmount | Calculated |

Dimension Tables (Granularity Alignment)

| Dimension | Grain |
|---|---|
| DimCustomer | One row per customer |
| DimProduct | One row per product |
| DimEmployee | One row per employee |
| DimDate | One row per day |
| DimShipper | One row per shipper |

7. Granule vs Granularity (Clear Distinction)

| Term | Meaning | Example |
|---|---|---|
| Granule | What one row represents | One product in one order |
| Granularity | Level of detail | Daily, transactional |

8. Reports with Granule & Granularity Mapping


📊 Report 1: Executive Sales Summary

Required Granularity

  • Monthly / Yearly

Derived From Granule

  • Aggregated from order-line daily grain

KPIs

  • Total Revenue

  • Total Orders

  • Average Order Value

✔ Safe aggregation because base grain is atomic


📊 Report 2: Product Performance Report

Granularity

  • Product × Month

KPIs

  • Revenue per Product

  • Units Sold

  • Category Contribution %

Uses

  • DimProduct

  • FactSales


📊 Report 3: Customer Analysis Report

Granularity

  • Customer × Quarter

KPIs

  • Customer Lifetime Value

  • Avg Purchase Frequency

  • Repeat Purchase Rate

Grain Dependency

  • Needs order-line detail to avoid duplication


📊 Report 4: Employee Sales Performance

Granularity

  • Employee × Month

KPIs

  • Revenue per Employee

  • Orders Handled

  • Avg Discount %

Grain Rule

Employee linked at order level, rolled up from line items.


📊 Report 5: Shipping & Logistics Report

Granularity

  • Shipper × Month

KPIs

  • Avg Shipping Days

  • On-Time Delivery %

  • Orders Shipped

Grain

  • Calculated using order-line facts + order dates


9. Why Lowest Granularity Wins (EDM Rule)

| Benefit | Explanation |
|---|---|
| Flexibility | Supports future reports |
| Accuracy | No aggregation loss |
| Performance | Can pre-aggregate later |
| Governance | Single business definition |

10. EDM Traceability Matrix

| EDM Entity | DW Object | Grain |
|---|---|---|
| Order | FactSales | Order Line |
| Product | DimProduct | Product |
| Customer | DimCustomer | Customer |
| Employee | DimEmployee | Employee |
| Shipper | DimShipper | Shipper |

11. Final EDM-Driven Architecture

Northwind OLTP
     ↓
 Staging (Raw Data)
     ↓
 Enterprise Data Warehouse
     ↓
 Star Schema (FactSales)
     ↓
 Aggregates / Data Marts
     ↓
 Reports & KPIs

Key Exam / Interview Takeaway

Always define the grain before building the fact table.
Granule = what one row means
Granularity = level of detail

Dimensional Data Modeling Steps

 

  Dimensional Data Modeling  Steps – Northwind Data Warehouse

1. Understand Business Requirements

  • Identify business users (Sales, Finance, Operations).

  • Define key business questions (sales performance, customer behavior, product demand).

  • Finalize reporting needs and KPIs.


2. Identify Business Processes

Typical Northwind business processes:

  • Sales Orders

  • Product Sales

  • Shipping & Logistics

  • Customer Transactions

  • Supplier Procurement


3. Declare the Grain (Most Important Step)

  • Decide exactly what a single row in each fact table represents

  • Example:

    • One row per Order Line per Product per Day

This decision controls:

  • Level of detail

  • Storage size

  • Reporting flexibility


4. Identify Dimensions

Common dimensions from Northwind:

  • Date

  • Customer

  • Product

  • Employee

  • Supplier

  • Shipper

  • Category

  • Geography (Country, City, Region)


5. Identify Facts (Measures)

Numeric, additive values:

  • Sales Amount

  • Quantity Sold

  • Discount Amount

  • Freight Cost

  • Order Count


6. Design Star Schema

  • Fact table in the center

  • Dimension tables around it

  • Avoid snowflaking (keep dimensions denormalized)


7. Define Surrogate Keys

  • Create surrogate keys for all dimensions

  • Maintain Slowly Changing Dimensions (SCD Type 1 / Type 2 where required)


8. Design ETL Process

  • Extract data from Northwind OLTP

  • Clean, transform, and conform data

  • Load dimensions first, then facts
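The load order matters because fact rows look up surrogate keys that must already exist. A sketch of the two-step load; the staging_* tables and lookup columns here are illustrative, not a fixed ETL design:

```sql
-- Step 1: load dimensions (insert only customers not yet present)
INSERT INTO Dim_Customer (customer_id, company_name)
SELECT s.CustomerID, s.CompanyName
FROM staging_customers s
WHERE NOT EXISTS (SELECT 1 FROM Dim_Customer d WHERE d.customer_id = s.CustomerID);

-- Step 2: load facts, resolving surrogate keys via dimension lookups
INSERT INTO FactSales (DateKey, CustomerKey, ProductKey, Quantity, UnitPrice, Discount)
SELECT dd.DateKey, dc.CustomerKey, dp.ProductKey, st.Quantity, st.UnitPrice, st.Discount
FROM staging_order_lines st
JOIN Dim_Date dd     ON dd.FullDate    = st.OrderDate
JOIN Dim_Customer dc ON dc.customer_id = st.CustomerID
JOIN Dim_Product dp  ON dp.product_id  = st.ProductID;
```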


9. Validate with Sample Reports

  • Ensure the model supports all required reports

  • Check aggregations, filters, and drill-downs


10. Deploy & Optimize

  • Index fact tables

  • Partition by Date

  • Enable BI tools (Power BI / Tableau / QuickSight)



B. Core Star Schema (Northwind)

Fact Table

Fact_Sales

Columns:

  • Date_Key

  • Product_Key

  • Customer_Key

  • Employee_Key

  • Shipper_Key

  • Quantity

  • Unit_Price

  • Discount

  • Sales_Amount

  • Freight

Dimension Tables

  • Dim_Date

  • Dim_Product

  • Dim_Customer

  • Dim_Employee

  • Dim_Shipper

  • Dim_Supplier

  • Dim_Category



C. Five Business Reports with Grain & Granularity


1. Sales Performance Report

Business Question:
How much revenue is generated over time?

Fact Table: Fact_Sales
Dimensions: Date, Product, Category

Grain (Granule):
👉 One row per Product per Day

Granularity:

  • Daily → Monthly → Yearly

Measures:

  • Total Sales Amount

  • Total Quantity Sold


2. Customer Purchase Behavior Report

Business Question:
Which customers generate the most revenue?

Fact Table: Fact_Sales
Dimensions: Customer, Date, Geography

Grain:
👉 One row per Customer per Order per Day

Granularity:

  • Order level → Monthly → Yearly

Measures:

  • Sales Amount

  • Order Count


3. Product Performance Report

Business Question:
Which products are best and worst sellers?

Fact Table: Fact_Sales
Dimensions: Product, Category, Supplier

Grain:
👉 One row per Product per Order Line

Granularity:

  • Order Line → Product → Category

Measures:

  • Quantity Sold

  • Revenue

  • Discount Amount


4. Employee Sales Contribution Report

Business Question:
How well are employees performing in sales?

Fact Table: Fact_Sales
Dimensions: Employee, Date, Region

Grain:
👉 One row per Employee per Order

Granularity:

  • Order → Month → Year

Measures:

  • Total Sales

  • Orders Handled


5. Shipping & Logistics Cost Report

Business Question:
What is the shipping cost trend by shipper?

Fact Table: Fact_Sales
Dimensions: Shipper, Date, Geography

Grain:
👉 One row per Order per Shipper per Day

Granularity:

  • Daily → Monthly → Yearly

Measures:

  • Freight Cost

  • Number of Shipments



D. Summary Table

| Report | Grain | Granularity |
|---|---|---|
| Sales Performance | Product per Day | Day → Month → Year |
| Customer Behavior | Customer per Order | Order → Month |
| Product Performance | Product per Order Line | Line → Product |
| Employee Performance | Employee per Order | Order → Year |
| Shipping Cost | Order per Shipper | Day → Year |

EDM vs OLTP: Are They the Same?

 Short answer:

👉 No. EDM is NOT the same as transactional OLTP — but they are closely related.

Let’s clear this up cleanly and precisely (this is a very common interview confusion).


1. Simple One-Line Difference

  • EDM = Blueprint / logical model of enterprise data

  • OLTP = Physical system that runs transactions

Think: EDM is the design, OLTP is the implementation


2. How They Relate (But Are Not the Same)

| Aspect | EDM (Enterprise Data Model) | OLTP (Transactional System) |
|---|---|---|
| What it is | Logical / conceptual data model | Operational database |
| Purpose | Define what data exists and how it relates | Run business transactions |
| Usage | Architecture, governance, integration | Day-to-day operations |
| Normalization | Highly normalized (3NF) | Highly normalized (3NF) |
| Users | Data architects | Applications, end users |
| Focus | Structure-focused | Transaction-focused |

3. Why People Confuse EDM with OLTP

Because:

  • Both are normalized

  • Both use ER diagrams

  • Both model entities like Customer, Order, Product

  • OLTP databases are usually built from EDM

➡️ But they are not the same thing.


4. Correct Relationship (Very Important)

EDM (Logical Blueprint)
        ↓
OLTP Schema (Physical Implementation)
        ↓
ODS (Optional)
        ↓
Dimensional Model (DW / Data Mart)
  • EDM defines what should exist

  • OLTP implements it with tables, indexes, constraints

  • Dimensional Model reshapes it for analytics


5. Example (Northwind)

EDM View (Logical)

  • Customer

  • Order

  • Order Line

  • Product

  • Supplier

Relationships only — no performance details.


OLTP View (Physical)

  • Customers table

  • Orders table

  • Order_Details table

  • Foreign keys, indexes, constraints

  • Supports INSERT / UPDATE / DELETE


Dimensional View (Analytics)

  • Fact_Sales

  • Dim_Customer

  • Dim_Product

  • Dim_Date


6. Key Interview Statement (Use This)

EDM is not an OLTP system; it is the logical foundation from which OLTP systems are built.

or

OLTP databases are physical realizations of an enterprise or application data model, often derived from an EDM.


7. Final Verdict

  • ❌ EDM ≠ OLTP

  • ✅ EDM → guides OLTP design

  • ✅ OLTP → optimized for transactions

  • ✅ Dimensional Model → optimized for analytics


Monday, December 8, 2025

SQL : FIND SALES TREND

USE db1;


 

DROP TABLE IF EXISTS Orders;

 

CREATE TABLE Orders (
    OrderID INT PRIMARY KEY,
    Amount DECIMAL(10, 2) NOT NULL,
    OrderDate DATE NOT NULL,
    IsReturn INT NOT NULL  -- 0 for New Order, 1 for Return
);

 

-- sample data
INSERT INTO Orders (OrderID, Amount, OrderDate, IsReturn) VALUES
(1001, 45.00, '2025-12-01', 0),
(1002, 120.50, '2025-12-01', 0),
(1003, 25.99, '2025-12-02', 0),
(1004, 99.00, '2025-12-02', 0),
(1005, 15.00, '2025-12-02', 1),
(1006, 350.00, '2025-12-03', 0),
(1007, 75.25, '2025-12-03', 0),
(1008, 18.00, '2025-12-03', 0),
(1009, 22.00, '2025-12-03', 1),
(1010, 145.00, '2025-12-04', 0),
(1011, 55.00, '2025-12-04', 0),
(1012, 60.00, '2025-12-04', 0),
(1013, 70.00, '2025-12-04', 0),
(1014, 12.99, '2025-12-04', 0),
(1015, 88.00, '2025-12-05', 0),
(1016, 210.00, '2025-12-05', 0),
(1017, 10.00, '2025-12-05', 1),
(1018, 45.50, '2025-12-06', 0),
(1019, 105.00, '2025-12-06', 0),
(1020, 180.00, '2025-12-07', 0),
(1021, 50.00, '2025-12-07', 0),
(1022, 250.00, '2025-12-07', 1),
(1023, 33.00, '2025-12-08', 0),
(1024, 175.00, '2025-12-08', 0),
(1025, 110.00, '2025-12-09', 0),
(1026, 7.99, '2025-12-09', 0),
(1027, 150.00, '2025-12-10', 0),
(1028, 40.00, '2025-12-10', 0),
(1029, 25.00, '2025-12-10', 1),
(1030, 13.50, '2025-12-10', 0);


SELECT * FROM Orders;


-- Query to determine whether the daily order trend is increasing or decreasing


WITH DailyCounts AS (
    -- 1. Calculate the total number of orders for each date
    SELECT
        OrderDate,
        COUNT(OrderID) AS DailyOrderCount
    FROM Orders
    GROUP BY OrderDate
)
SELECT
    OrderDate,
    -- Order count from the previous day; 0 is the default for the first day
    LAG(DailyOrderCount, 1, 0) OVER (ORDER BY OrderDate) AS PreviousDayCount,
    DailyOrderCount,
    -- 2. Compare current count to previous count to determine the trend
    CASE
        WHEN DailyOrderCount > LAG(DailyOrderCount, 1, 0) OVER (ORDER BY OrderDate) THEN 'Increasing ⬆️'
        WHEN DailyOrderCount < LAG(DailyOrderCount, 1, 0) OVER (ORDER BY OrderDate) THEN 'Decreasing ⬇️'
        ELSE 'Stable/Same'
    END AS Trend
FROM DailyCounts
ORDER BY OrderDate;

23feb26 - cab service project -data modeling project

  The Full Data Modeling Implementation Steps Phase Description & Methodology Primary Tools Phase 1: Discovery Ask the 10-Section Questi...