What is a Surrogate Key?
-
A surrogate key is an artificial, system-generated unique identifier for a row in a table.
-
Usually integer or bigint, auto-incremented or generated by the system.
-
Not derived from business data (unlike natural keys).
Example:
2️⃣ When to Use Surrogate Keys
| Scenario | Reason / Benefit | Example |
|---|---|---|
| Data Warehouse / Fact-Dimension Modeling | Allows consistent foreign key joins, supports slowly changing dimensions (SCD), avoids issues when natural keys change. | Dim_Customer with customer_key used in Fact_Order_Sales |
| Slowly Changing Dimensions (Type 2) | A surrogate key lets you track history without altering historical fact data. | Customer changes address → new surrogate key row in Dim_Customer |
| Composite / Complex Natural Keys | Avoid long, multi-column foreign keys. | Order detail identified by order_id + line_number → simpler to use order_line_key |
| Integration from Multiple Systems | Different source systems may use different IDs → surrogate key standardizes references. | Merging customer_id from two CRMs |
| Performance / Indexing | Integer surrogate keys are more efficient for joins than long string keys. | Joining Dim_Product.product_key to fact table instead of product_sku |
3️⃣ When NOT to Use Surrogate Keys
| Scenario | Reason / Alternative |
|---|---|
| Transactional OLTP Tables where natural key is stable and simple | Surrogate key adds no value; natural key can uniquely identify the row. |
| When key has real-world meaning and is immutable | Using a surrogate key may cause unnecessary joins and duplication. |
| Small reference tables with static values | Surrogate key adds complexity without benefit. |
| Temporary / staging tables | Surrogate keys are unnecessary; use natural key or row hash. |
| Where natural key is already numeric, short, and stable | Surrogate key is redundant |
4️⃣ Key Guidelines
-
Always use surrogate keys in data warehouse dimensions and fact table foreign keys.
-
Avoid surrogate keys in OLTP tables with stable, meaningful natural keys.
-
Never mix natural keys and surrogate keys for joins in the same context—it confuses lineage.
-
Surrogate keys are particularly helpful for history tracking (SCD2) and integration of multiple sources
No comments:
Post a Comment