Tuesday, May 12, 2026

12may26 - Mistakes in data modeling

 

1. The "Hard-Coded" Unit of Measure

In our manufacturing project, we built a schema for Raw Materials but ignored the detail that different global plants might use different measurement systems.

  • The Mistake: We modeled the Quantity column as a simple decimal and assumed every plant used metric units (kilograms), with no UnitID or UOM column to say otherwise.

  • The Consequence: When the company acquired a US-based plant using Imperial units, the system began mixing pounds and kilograms in the same column. Production batches failed because the formulas were chemically incorrect, leading to thousands of dollars in wasted physical inventory.

  • How we handled it: We performed a large data migration to add a UnitOfMeasure reference table, then implemented a calculation layer in the database that converts every input into a "Base Unit" while storing the original local unit for the plant’s UI. A sketch follows below.
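
A minimal SQL sketch of the fix. The table and column names (UnitOfMeasure, RawMaterial, ConversionFactor) are illustrative, not our exact production schema:

    -- Reference table: ConversionFactor converts a unit into the base
    -- unit for its measurement class (e.g. kilograms for mass).
    CREATE TABLE UnitOfMeasure (
        UnitID           INT PRIMARY KEY,
        UnitCode         VARCHAR(10)   NOT NULL,  -- 'KG', 'LB', 'L', ...
        MeasurementClass VARCHAR(20)   NOT NULL,  -- 'Mass', 'Volume', ...
        ConversionFactor DECIMAL(18,8) NOT NULL   -- multiply to get base unit
    );

    -- Once the migration has added RawMaterial.UnitID, a view exposes every
    -- quantity normalized to the base unit while the local value and unit
    -- stay available for each plant's UI.
    CREATE VIEW RawMaterialBaseQty AS
    SELECT r.MaterialID,
           r.Quantity                      AS LocalQuantity,
           u.UnitCode                      AS LocalUnit,
           r.Quantity * u.ConversionFactor AS BaseQuantity
    FROM   RawMaterial r
    JOIN   UnitOfMeasure u ON u.UnitID = r.UnitID;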


2. Overlooking "Time-Travel" (Temporal Data)

We ignored the detail that Product Costs are not static; they fluctuate with the market price of raw materials like steel or plastic.

  • The Mistake: We simply overwrote the UnitCost in the Parts table whenever the price changed, rather than creating a historical price table.

  • The Consequence: When the finance team tried to run a "Year-to-Date" profitability report, the system calculated January’s sales using May’s inflated prices. Every historical financial report was wrong, making it look like the company had lost money all year when it hadn't.

  • How we handled it: We moved the cost attribute out of the main Parts table and into a PartPriceHistory table. We used EffectiveDate and EndDate columns so the system could "look back in time" and join the correct price to the date the order was actually placed, as sketched below.
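
A minimal sketch of the history table and the date-range join, again with illustrative names (Parts, OrderLines). An open EndDate marks the current price:

    CREATE TABLE PartPriceHistory (
        PartID        INT           NOT NULL REFERENCES Parts(PartID),
        UnitCost      DECIMAL(12,4) NOT NULL,
        EffectiveDate DATE          NOT NULL,
        EndDate       DATE          NULL,      -- NULL = still current
        PRIMARY KEY (PartID, EffectiveDate)
    );

    -- Join each order line to the price that was in effect on its order date.
    SELECT o.OrderID, o.OrderDate, o.PartID, h.UnitCost
    FROM   OrderLines o
    JOIN   PartPriceHistory h
           ON  h.PartID = o.PartID
           AND o.OrderDate >= h.EffectiveDate
           AND (h.EndDate IS NULL OR o.OrderDate < h.EndDate);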


3. The "Flat" Equipment Hierarchy

In our asset management module, we ignored the detail that a "Machine" is actually a collection of smaller, replaceable components with their own maintenance schedules.

  • The Mistake: We modeled the Equipment table as a flat list, treating a "CNC Lathe" as a single entity rather than a parent with child components (like motors or sensors).

  • The Consequence: Maintenance teams couldn't track the specific life cycle of a $10,000 motor inside the machine. When the motor was moved to a different lathe, the service history was lost, leading to redundant repairs and a critical "unplanned downtime" event because a sensor's age was hidden.

  • How we handled it: We redesigned the model to support a Self-Referencing Relationship (a ParentEquipmentID column within the table). This allowed us to build a "Tree" structure where every sub-part could have its own history while still belonging to the main machine; see the sketch below.
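
A minimal sketch of the self-referencing table and a recursive query that walks one machine's component tree. WITH RECURSIVE is Postgres/MySQL syntax (SQL Server drops the RECURSIVE keyword), and the ID 42 is a made-up example:

    CREATE TABLE Equipment (
        EquipmentID       INT PRIMARY KEY,
        EquipmentName     VARCHAR(100) NOT NULL,
        ParentEquipmentID INT NULL REFERENCES Equipment(EquipmentID)  -- NULL = top-level machine
    );

    -- List every component nested under one machine, with its depth.
    WITH RECURSIVE ComponentTree AS (
        SELECT EquipmentID, EquipmentName, ParentEquipmentID, 0 AS Depth
        FROM   Equipment
        WHERE  EquipmentID = 42                 -- e.g. the CNC lathe
        UNION ALL
        SELECT e.EquipmentID, e.EquipmentName, e.ParentEquipmentID, t.Depth + 1
        FROM   Equipment e
        JOIN   ComponentTree t ON e.ParentEquipmentID = t.EquipmentID
    )
    SELECT * FROM ComponentTree;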

1. The "Single Point of Failure" for Parts

In our supply chain module, we ignored the detail that a single component could be sourced from multiple vendors at different lead times.

  • The Mistake: We modeled the Parts table with a single VendorID column, assuming a 1-to-1 relationship between a part and its supplier.

  • The Consequence: During a global shipping crisis, our primary vendor went dark. Because the system couldn't store "Alternative Vendors" or "Secondary Lead Times," the MRP (Material Requirements Planning) logic failed. It told the factory we were out of stock for weeks, even though a local supplier had the parts ready, leading to a complete production halt.

  • How we handled it: We broke the relationship out into a Part_Vendor_Xref table. This let us store multiple suppliers for one part, rank them by "Preference," and record a separate lead time for each, so the system could automatically suggest the backup vendor when the primary failed. A sketch follows below.
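
A minimal sketch of the cross-reference table and a query that picks the best available vendor per part. The IsActive flag on Vendors is an assumption for illustration:

    CREATE TABLE Part_Vendor_Xref (
        PartID         INT NOT NULL REFERENCES Parts(PartID),
        VendorID       INT NOT NULL REFERENCES Vendors(VendorID),
        PreferenceRank INT NOT NULL,   -- 1 = primary, 2 = first backup, ...
        LeadTimeDays   INT NOT NULL,
        PRIMARY KEY (PartID, VendorID)
    );

    -- Best available vendor per part: the lowest preference rank among
    -- vendors that are still active (assumed Vendors.IsActive flag).
    SELECT x.PartID, x.VendorID, x.LeadTimeDays
    FROM   Part_Vendor_Xref x
    JOIN   Vendors v ON v.VendorID = x.VendorID
    WHERE  v.IsActive = 1
      AND  x.PreferenceRank = (
               SELECT MIN(x2.PreferenceRank)
               FROM   Part_Vendor_Xref x2
               JOIN   Vendors v2 ON v2.VendorID = x2.VendorID
               WHERE  x2.PartID = x.PartID
                 AND  v2.IsActive = 1
           );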


5. Ignoring "Micro-Downtime" in IoT Streams

On the shop floor, we ignored the detail that sensors send "heartbeat" data every second, which creates massive volume compared to standard business records.

  • The Mistake: We tried to store every single raw sensor "ping" (Temperature: 70.1, 70.2...) in the same relational database used for our shipping labels.

  • The Consequence: Within three months, the database grew to several terabytes. Simple queries to check an order status—which used to take milliseconds—began taking 30 seconds to run because the server was choked by billions of rows of sensor data. The "Dashboard" the floor managers used started crashing daily.

  • How we handled it: We implemented a Medallion Architecture: the raw "Bronze" data stayed in a cheap data lake, and only aggregated data (e.g., average temperature per hour) moved into the relational database. This kept the main system lean while still letting engineers dig into the raw "noise" when a machine actually broke down. An example roll-up query follows below.
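
A minimal sketch of the hourly roll-up that feeds the relational side, in Postgres-flavored SQL with made-up table names (SensorReadingRaw, SensorReadingHourly):

    -- Roll the last hour of raw pings up into one row per sensor per hour.
    INSERT INTO SensorReadingHourly
        (SensorID, HourBucket, AvgValue, MinValue, MaxValue, SampleCount)
    SELECT SensorID,
           DATE_TRUNC('hour', ReadingTime) AS HourBucket,
           AVG(ReadingValue),
           MIN(ReadingValue),
           MAX(ReadingValue),
           COUNT(*)
    FROM   SensorReadingRaw
    WHERE  ReadingTime >= NOW() - INTERVAL '1 hour'
    GROUP BY SensorID, DATE_TRUNC('hour', ReadingTime);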


3. The "Missing Link" in Genealogy (Traceability)

We ignored the detail that a "Batch" of finished goods isn't just one thing—it’s the result of specific batches of raw materials used on a specific day.

  • The Mistake: We recorded that "Batch A" was completed on "Line 5," but we didn't model the relationship to the specific Lot Numbers of the raw chemicals poured into the machine that morning.

  • The Consequence: A supplier issued a recall on a contaminated chemical. Because our data model lacked "Recursive Traceability," we couldn't prove which finished products were "clean." As a result, we had to recall 100% of our monthly output instead of just the 5% that was actually affected, costing the company millions in unnecessary losses.

  • How we handled it: We added a Material_Issue_Log table that acted as the bridge. It captured the exact LotID of every raw material at the moment it was scanned into a WorkOrder. Now, if a raw material lot is flagged, one query finds the exact 200 boxes that need to be pulled from the shelves; a sketch follows below.
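
A minimal sketch of the bridge table and the recall query, with illustrative names (WorkOrders, RawMaterialLots, FinishedBatches) and a named-parameter placeholder for the flagged lot:

    CREATE TABLE Material_Issue_Log (
        IssueID     INT PRIMARY KEY,
        WorkOrderID INT NOT NULL REFERENCES WorkOrders(WorkOrderID),
        LotID       INT NOT NULL REFERENCES RawMaterialLots(LotID),
        IssuedAt    TIMESTAMP NOT NULL   -- moment the lot was scanned in
    );

    -- Recall query: every finished batch that consumed the flagged lot.
    SELECT DISTINCT b.BatchID
    FROM   FinishedBatches b
    JOIN   WorkOrders w         ON w.WorkOrderID = b.WorkOrderID
    JOIN   Material_Issue_Log m ON m.WorkOrderID = w.WorkOrderID
    WHERE  m.LotID = :recalledLotID;     -- parameter: the contaminated lot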

