Wednesday, January 21, 2026

22jan26 - Surrogate keys - when to use and not use

 

What is a Surrogate Key?

  • A surrogate key is an artificial, system-generated unique identifier for a row in a table.

  • Usually integer or bigint, auto-incremented or generated by the system.

  • Not derived from business data (unlike natural keys).

Example:

customer_key INT IDENTITY(1,1) -- surrogate key

2️⃣ When to Use Surrogate Keys

ScenarioReason / BenefitExample
Data Warehouse / Fact-Dimension ModelingAllows consistent foreign key joins, supports slowly changing dimensions (SCD), avoids issues when natural keys change.Dim_Customer with customer_key used in Fact_Order_Sales
Slowly Changing Dimensions (Type 2)A surrogate key lets you track history without altering historical fact data.Customer changes address → new surrogate key row in Dim_Customer
Composite / Complex Natural KeysAvoid long, multi-column foreign keys.Order detail identified by order_id + line_number → simpler to use order_line_key
Integration from Multiple SystemsDifferent source systems may use different IDs → surrogate key standardizes references.Merging customer_id from two CRMs
Performance / IndexingInteger surrogate keys are more efficient for joins than long string keys.Joining Dim_Product.product_key to fact table instead of product_sku

3️⃣ When NOT to Use Surrogate Keys

ScenarioReason / Alternative
Transactional OLTP Tables where natural key is stable and simpleSurrogate key adds no value; natural key can uniquely identify the row.
When key has real-world meaning and is immutableUsing a surrogate key may cause unnecessary joins and duplication.
Small reference tables with static valuesSurrogate key adds complexity without benefit.
Temporary / staging tablesSurrogate keys are unnecessary; use natural key or row hash.
Where natural key is already numeric, short, and stableSurrogate key is redundant

4️⃣ Key Guidelines

  1. Always use surrogate keys in data warehouse dimensions and fact table foreign keys.

  2. Avoid surrogate keys in OLTP tables with stable, meaningful natural keys.

  3. Never mix natural keys and surrogate keys for joins in the same context—it confuses lineage.

  4. Surrogate keys are particularly helpful for history tracking (SCD2) and integration of multiple sources

No comments:

Post a Comment

29jan26 - Physical model without a Logical one?

  Can you create a Physical model without a Logical one? The short answer: Yes. Technically, you can jump straight into a Physical Data Mod...