Tuesday, June 25, 2024

CDM vs LDM vs PDM

Conceptual vs logical vs physical data models

by Sonny Rivera, Senior Analytics Evangelist at ThoughtSpot

Published Jun 2, 2023

Data modeling is not about creating diagrams for documentation sake. It’s about creating a shared understanding between the business and the data teams, building trust, and delivering value with data. It’s also an investment. An investment in your data systems' stability, reliability, and future adaptability. Like all valuable initiatives, it will require some additional effort upfront. Still, in the long term, the returns on investments in terms of communication, efficiency, data quality, and scalability are substantial.

In this article, you’ll see how conceptual, logical, and physical models can positively impact your business by understanding how they affect these crucial aspects of data modeling:

Shared understanding and communication
Data relationships, structure, and organization
Performance, scalability, and efficiency
Data quality and consistency
Adaptability, scalability, and future-proofing

You’ll also gain the knowledge you’ll need to build a world-class data model and get the buy-in from your stakeholders.

What is a conceptual data model?

A conceptual data model (CDM) operates at a high level, providing an overarching perspective on the organization's data needs. It defines a broad and simplified view of the data a business utilizes or plans to utilize in its daily operations. Conceptual data modeling aims to create a shared understanding of the business by capturing the essential concepts of a business process. These essential concepts are usually captured in an Entity Relationship Diagram (ERD) and the accompanying entity definitions.

Developing a conceptual data model assists your team and stakeholders in understanding the essentials and the big picture – what kind of data you're working with and how different data entities relate. In addition, it creates a shared understanding of the business process and a common language for all technical and non-technical members. While many data teams struggle with communication and trust with their stakeholders, the CDM can help by promoting effective communication.

A conceptual data model example that shows entities and relationships for a hotel.

Figure 1 - Example conceptual data model—hotel reservations

Definitions of the CDM entities and relationships.

Figure 2 - Example conceptual data model entity definitions

Benefits of a conceptual data model

So, now that you have a clear definition of a conceptual data model, you are probably thinking, what are the benefits? The benefits of a conceptual data model fall into two areas: creating shared understanding and improving communication and collaboration. Both lead to a greater sense of stakeholder trust, The conceptual model also ensures that a physical model is designed such that it not only produces correct results, but enables the business to ask valid questions. For example, can one customer have multiple reservations? Or is the customer and the guest always one-and-the-same?

Shared understanding

Conceptual data models use simple, business-oriented terminology to describe and define data objects and their relationships, creating a universal language that both technical and non-technical stakeholders can understand. They also provide visual representations which are easier to understand than written documents, especially for complex relationships. This shared understanding of the business process is crucial to building stakeholder trust and confidence.

Communication and collaboration

Conceptual models can serve as a focal point for discussions between stakeholders, data teams, and different lines of business, fostering collaboration and trust. In addition, they can help clarify requirements, identify gaps or redundancies, and promote alignment on business rules and policies. In short, communication and collaboration lead to trust, which you’ll need when making important decisions on cost, delivery, and functionality.

What is a logical data model?

A logical data model (LDM) contains representations that fully defines relationships in data, adding the details and structure of essential entities. It’s important to note that the LDM remains data platform agnostic because it focuses on business needs, flexibility, and portability.

The LDM includes the specific attributes of each entity, the relationships between entities, and the cardinality of those relationships. It gives your team a solid framework to follow as you build your systems, and it can be used to effectively and efficiently plan and implement cloud data warehouses, data marts, application databases, or data analysis datasets.

In the conceptual data model, the entities and relationships were all defined. The next step is to use data modeling best practices to go from a conceptual data model to a logical data model; one typically needs to perform the following steps:

Validate the CMD by verifying the entities and relationships are correct in the CDM.
Identify additional attributes for each entity in the model, identify the other attributes needed to complete the entity.
Identify candidate keys by reviewing all the attributes on the entity and identify which attributes uniquely identify the entity. There may be more than one. The goal is to identify which attributes are the best candidates for primary and unique keys.
Select primary keys in collaboration with your stakeholders select the primary key from candidate attributes from the list of candidate keys. The best practice is to have a single attribute that is the primary key, but that’s not always possible.
Normalize to the data model for your specific application. For this case, we are using 3rd Normal Form (3NF) which minimizes redundancy by removing repeating elements from our entity and ensuring all attributes depend only on the primary key. A common approach for analytics modeling is use a dimensional modeling. Unlike 3NF, where the goal is to eliminate redundancy, dimensional modeling involves denormalization to minimize joins and improve query performance. Pro Tip: When entities fail to meet the requirements for 3NF, the resolution is usually to create a new entity and relationship.
Create the ERD entities as normalizing may result in new entities, attributes, relationships, or redistributing attributes to the new entities.
Define the entity relationships and ensure that the relationships are correct by validating them with the stakeholders. Pro Tip: Entities are usually related to only a few other entities, so many relationships may indicate a design issue. For example, in Figure 3, the Hotel entity is connected to a Room entity via the “Has” relationship.
Define the cardinality of relationships by determining how many instances of one entity are related to instances in the related entity. The most common cardinalities are one-to-one, one-to-many, and many-to-many. Cardinality may also be optional. For example in Figure 3, you can see the one-to-many relationship between the Hotel and Room entities. This is often read as “a Hotel has many Rooms.”
Validate the model by verifying the logical data model against business requirements. This includes checking with stakeholders, end-users, and developers to ensure the model accurately represents the necessary data and its relationships.
Iterate and refine for optimal business value. Modeling is often an iterative process, so you might need to refine or update the logical data model based on feedback.

Chart showing an example of a logical data model using hotel reservation situation

Figure 3 - Logical data model example

Benefits of a logical data model

Just like we did in the conceptual data model, let’s review the primary benefits of a logical data model. The primary benefits of logical data modeling fall into two categories: agile data modeling and data organization. Both of which, maintain the scope of the initiative and improve for efficiency.

Agile data modeling

Logical data modeling is an iterative process that can also be agile. Since logical data models are technology-agnostic, they provide a conduit for iterating over the business logic and business process with the stakeholders. This flexibility ensures adaptability and scalability for future business needs.

Data relationships, structure, and organization

A good LDM presents a clear picture of the data's structure and interrelationships, making it easier to understand the system. The model encourages consistency and standardization to avoid redundancy and maintain data integrity. A logical data model ultimately serves as a blueprint for the physical data model and facilitates database design. It enables database administrators and software engineers to design the physical database efficiently and effectively.

Quality assurance

By identifying errors or inconsistencies in the early stages of system development, logical data models contribute to higher data quality, reliability, and lower cost. LDMs also validate the business requirements by aligning the data structure with business needs.

Of course, other benefits may apply to your organization, but remember that logical data modeling is crucial to better understanding data from a business perspective, better communication among stakeholders, and a solid foundation for physical database design.

What is a physical data model?

A physical data model (PDM) is a data model that represents relational data objects. It describes the technology-specific and database-specific implementation of the data model and is the last step in transforming from a logical data model to a working database. A physical data model includes all the needed physical details to build a database.

chart depicting an example of a physical data model using a hotel scenario

Figure 4 - Physical data model

The transition from a logical data model to a physical data model is an iterative process. It involves further refining the data model to achieve the desired database design. Good physical data design often requires an in-depth understanding of data platforms and modeling. Here are the general steps involved in this process:

Select the data platform where the data model will reside as it will affect future physical data design decisions. This is also important because the model will take advantage of platform specific capabilities.
Convert logical entities into physical tables. Each entity in your logical model needs to be converted into a table or even multiple tables. Recall that during the logical data modeling process, candidate keys were identified. Select a primary key for the table from those same candidate keys.
Define the columns by converting each attribute in the logical model into a column on the corresponding table. Additionally, define the data type for each column (integer, varchar, date, etc.).
Define the relationships between parent and child tables. This is achieved by creating a foreign key (FK) attribute in the child table that references the primary key (PK) attribute of the parent table.
Verify the tables are in 3NF which ensure eliminated data redundancy and data integrity. This may involve splitting or combining tables according to the 3NF normalization principles and specific capabilities of the data platform. In Figure 4, a table called Hotel_Customer was added to account for the many-to-many relationship between Hotel and Customer.
Define indexes and partitions by identifying the most commonly used attributes for sorting, filtering and join tables. Optimizing indexes and partitions vary greatly by data platform and use case, so it’s often a very iterative process. It’s best to KISS (keep it simple smarty) it when getting started on optimizing indexes and partitions.
Implement table constraints by defining primary keys, unique keys, null/not-null checks, and other logical constraints on the table. Logical constraints identified in the LDM should also be expressed in the physical data design even when the data platform does not enforce those constraints. Pro Tip: Modern cloud data platforms don’t usually enforce all constraints even though the constraints can be defined. Although the constraints may not be enforced, they are still valuable because they are used by downstream applications and analytics tools to enforce data integrity, create complex data analysis, and improve performance.
Implement the programmability aspects of your data model by creating views, stored procedures, triggers, streams, and tasks. Depending on the complexity of your data platform, you may need to implement views (virtual tables based on a query), stored procedures (pre-compiled groups of SQL analytics statements), triggers (automated actions in response to events), Streams (near real-time ingestion of events), and Tasks (automated data pipelines).
Model validation and standards should be validated with the stakeholders ensuring that PDM meets the business requirements defined in the logical model. Validation testing should also be performed on well defined test data. It’s common to validate and verify the physical data model by creating queries that answer the questions defined by the business.

While there are multiple steps and iterations in physical data modeling, having a well-defined LDM ensures the physical modeling process is efficient and effective. The PDM aims to create a model that accurately represents the business and data requirements and works efficiently within the chosen platform.

Benefits of a physical data model

Most databases are complex, regularly changing, and evolving, so what are the benefits of physical data modeling and having a physical data model (PDM)? The primary benefits for the data team are speed while the business teams benefit from scalability and future-proofing.

Speed and efficiency

The physical data model (PDM) is about how your data will be physically stored, structured, and securely accessed, and it’s optimized for the specific cloud data platform or on-premises database. PDMs ensure optimal configuration for cost, scale, storage, and performance. In a world with consumption-based data platforms, having a well-modeled physical data model ultimately saves time, money, and resources.

Scalability and future-proofing

A well-modeled physical data model makes it much easier to manage the inevitable changes as your business grows and evolves. PDMs provide a blueprint and scalable framework that can accommodate new data sources, business rules, or system integrations, resulting in lower system complexity, reduced obsolescence risk, and faster market speed.

Comparison of conceptual, logical, and physical data models

Table 1 summarizes the conceptual, logical, and physical data models described above.

Table 1 - Overview of the model types

Wrapping it all up

Should you be modeling data with conceptual, logical, and physical models? I strongly recommend that you do. Data modeling is an investment and will have substantial returns in the future, including increased speed to market, higher quality, lower cost, reduced product risks, and great trust and collaboration with your stakeholders.

Unfortunately, some data teams ignore best practices for data modeling in the modern data stack and continue to model using the One Big Table (OBT) approach. While OBTs are often chosen for the perceived simplicity and similarity of working with data in a grid or Excel, they miss the opportunity to maximize your ROI.

Data modeling should be agile and delivered in incremental releases reducing the time to market and costs. It’s a valuable business investment that ensures your organization's current and future success. In data, the old saying “pay me now, or pay me later” is really “pay me now, or pay me 10x later.”

On your next data project, ask yourself a few questions.

Am I building the next data asset or just the next tech debt?
Do stakeholders and I have a shared understanding of the business process and model?
Does our model support self-service analytics to drive greater value?
Will improved data quality improve trust and confidence in me or my team?
Can we reduce the complexity and future-proof our data assets?

So get out there and start building a world-class data product to create business value and delight your stakeholders!

Data models list

1. Banking data model ER diagrams

https://blog.cloudnloud.com/retail-banking-data-models

https://pk-group.weebly.com/banking-data-model.html

2. Vehicle sales data

3..

Thursday, June 20, 2024

-- My doubts list and links

is MARIA DB used by companies

------------------------

Data governance tools and links

Sailpoint

https://www.sailpoint.com/identity-library/enterprise-data-governance/

--> Pending topics list

1. what DATA PROFILING TOOLs are you using

2. snowflake features

3. AWS features Did you use ?

4. AZURE features Did you use ?

5. GCP features Did you use ?

---

6. what questions do you ask the client about the project

7. Data governance all topics

Wednesday, May 29, 2024

Star Schema Snowflake Schema - differences

When to use star schema and when to go for snowflake ?

If you have a few dimensions and low cardinality (less unique values in dims ) , but you require fast query execution, a star schema is the right choice.
However, if you have several dimensions and high cardinality, a snowflake schema will be a better scheme

.NO	Star Schema	Snowflake Schema
1.	In star schema, The fact tables and the dimension tables are contained.	While in snowflake schema, The fact tables, dimension tables as well as sub dimension tables are contained.
2.	Star schema is a top-down model.	While it is a bottom-up model.
3.	Star schema uses more space.	While it uses less space.
4.	It takes less time for the execution of queries.	While it takes more time than star schema for the execution of queries.
5.	In star schema, Normalization is not used.	While in this, Both normalization and denormalization are used.
6.	It’s design is very simple.	While it’s design is complex.
7.	The query complexity of star schema is low.	While the query complexity of snowflake schema is higher than star schema.
8.	It’s understanding is very simple.	While it’s understanding is difficult.
9.	It has less number of foreign keys.	While it has more number of foreign keys.
10.	It has high data redundancy.	While it has low data redundancy.

Basis of Distinction	Star Schema	Snowflake Schema
Definition and Meaning	Both fact tables and dimension tables are present in a star schema.	Dimension tables, sub-dimension tables, and fact tables are all included in a snowflake schema.
Type of Model	The star schema is a top-down type of model.	The snowflake schema is a bottom-up type of model.
Space	Star schema uses more space compared to Snowflake Schema.	Snowflake schema uses less space comparatively.
Joint Relations	In a star schema, relationships between tables are represented by a single join, resulting in a simple data structure for fast query performance and easy data analysis.	The snowflake schema has a complex data structure with multiple levels of relationships between tables, represented by multiple joins. This can make the data structure more difficult to understand and result in slower query performance.
Response Time for Queries	Star schemas have faster query execution times due to a single join of a fact table and its attributes in dimensional tables.	Snowflake schemas require complex joins between tables, which can slow down query processing and impact other OLAP products.
Normalization	In a star schema, dimension tables are not organized in a normalized form. They are typically denormalized and contain multiple levels of information about a particular subject in a single table.	Dimension tables in snowflake schema are normalized.
Design Complexity	Has a simpler design compared to snowflake schema.	More complex design compared to star schema.
Query Complexity	Star schemas have simpler query design due to the fact the table is joined to only one level of dimensional tables.	Snowflake schemas, on another hand, have a more complex query design due to the need for multiple joins between the fact table and its dimensional tables. This leads to additional overhead in query writing.
Understanding Complexity	It is simpler to understand compared to snowflake schema.	More complex to understand compared to star schema.
Foreign Keys	Have a lesser number of foreign keys.	Comparatively has more foreign keys.
Data Redundancy	The star schema stores redundant data in the dimension tables.	The snowflake design fully normalizes the dimension tables and prevents data redundancy,
Advantages	Simple and easy-to-understand data structure. Fast query performance due to the single join between the fact table and its dimensional tables. Suitable for large volumes of data Good for ad-hoc querying and data analysis.	Normalized data structure reduces redundancy and increases data integrity. Allows for more complex relationships between data. Allows for easier data maintenance and management. Good for more structured predictable querying.
Disadvantages	The star schema has a limited ability to depict complex relationships between data. Can suffer from data redundancy and decreased data integrity. May not be suitable for smaller volumes of data.	The more complex data structure can be harder to understand and work with. Multiple joins between tables can result in slower query performance. Requires more storage and processing resources due to the larger number of tables.

Advantages and Disadvantages of Star Schema

Star schema suits applications requiring simplicity and performance as it has fewer tables and joins.

The main demerit of the star schema is its denormalization and redundancy. This can lead to wide dimension tables that take up more storage space and rake up the risk of data inconsistency.

As the schema flattens the dimension hierarchies into a single table, their depth and breadth gets limited. This makes the star schema unsuitable for data warehouses having complex and dynamic dimensions that require accuracy and high data normalization.

Which schema is right for you?

The answer depends on your specific needs and requirements. If you’re looking for a simple, efficient cloud data warehouse solution, a star schema might be the best option. But if you need more flexibility to accommodate changing data requirements, a snowflake schema may be a better choice. No matter which schema you choose, ThoughtSpot can help you get the most out of your data. Most BI tools require a specific schema design to be used, ThoughtSpot has no such restrictions.

Star Schema	Snowflake Schema
Hierarchies for the dimensions are stored in the dimensional table.	Hierarchies are divided into separate tables.
It contains a fact table surrounded by dimension tables.	One fact table surrounded by dimension table which are in turn surrounded by dimension table
In a star schema, only single join creates the relationship between the fact table and any dimension tables.	A snowflake schema requires many joins to fetch the data.
Simple DB Design.	Very Complex DB Design.
Denormalized Data structure and query also run faster.	Normalized Data Structure.
High level of Data redundancy	Very low-level data redundancy
Single Dimension table contains aggregated data.	Data Split into different Dimension Tables.
Cube processing is faster.	Cube processing might be slow because of the complex join.
Offers higher performing queries using Star Join Query Optimization. Tables may be connected with multiple dimensions.	The Snowflake schema is represented by centralized fact table which unlikely connected with multiple dimensions.

logical data model vs. physical data model diff

Key differences: logical data model vs. physical data model

A logical data model is a data model that provides a detailed, structured description of data elements and the connections between them. It includes all entities — a specific object transferred from the real world (relevant to business) — and the relationships among them. These entities have defined their attributes as their characteristics.

What Is a Physical Data Model?

A physical data model specifies how the data model will be built in the database. It outlines all table structures, including column name, data types, column constraints, primary key and foreign key with indexes to the relevant table column, relationships between tables, stored procedures, and views.

	Logical data model	Physical data model
Platform-dependent database	No.	Yes.
Data structure	Entities, attributes, PKs, and FKs.	Database tables, rows, PKs, FKs, and data types.
Programmatical features	No.	Triggers and stored procedures.
Objective	Visualize business logic with data structures.	Organize data structure for database design.
Creators	Business analysts and data architects.	Software developers, programmers, and database administrators.
Complexity	Simple.	Complex.
When to use	To understand enterprise systems and business rules.	To plan, implement, and optimize data storage when you’re developing applications.

Data-modeling-made-easy

PAGES