Wednesday, May 29, 2024

Star Schema vs. Snowflake Schema - differences

 

When to use a star schema and when to go for a snowflake schema?

If you have few dimensions and low cardinality (few unique values in the dimensions) and you require fast query execution, a star schema is the right choice.

However, if you have many dimensions and high cardinality, a snowflake schema will be a better choice.


S.No. | Star Schema | Snowflake Schema
1. | A star schema contains fact tables and dimension tables. | A snowflake schema contains fact tables, dimension tables, and sub-dimension tables.
2. | It is a top-down model. | It is a bottom-up model.
3. | It uses more space. | It uses less space.
4. | Queries take less time to execute. | Queries take more time to execute than in a star schema.
5. | Normalization is not used. | Both normalization and denormalization are used.
6. | Its design is very simple. | Its design is complex.
7. | Query complexity is low. | Query complexity is higher than in a star schema.
8. | It is easy to understand. | It is difficult to understand.
9. | It has fewer foreign keys. | It has more foreign keys.
10. | It has high data redundancy. | It has low data redundancy.
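The join difference in the comparison above can be sketched with Python's built-in sqlite3 module. The table and column names (dim_product_star, dim_product_snow, dim_category, fact_sales) and the sample rows are invented for illustration only. The same question, total sales per category, needs one join in the star layout and two joins in the snowflake layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical schema: the star dimension keeps category_name inline,
# while the snowflake dimension moves it to a sub-dimension table.
cur.executescript("""
CREATE TABLE dim_product_star (
    product_id INTEGER PRIMARY KEY,
    product_name TEXT,
    category_name TEXT
);
CREATE TABLE dim_category (
    category_id INTEGER PRIMARY KEY,
    category_name TEXT
);
CREATE TABLE dim_product_snow (
    product_id INTEGER PRIMARY KEY,
    product_name TEXT,
    category_id INTEGER REFERENCES dim_category(category_id)
);
CREATE TABLE fact_sales (
    sale_id INTEGER PRIMARY KEY,
    product_id INTEGER,
    amount REAL
);
INSERT INTO dim_product_star VALUES
    (1, 'Laptop', 'Electronics'), (2, 'Phone', 'Electronics'), (3, 'Desk', 'Furniture');
INSERT INTO dim_category VALUES (10, 'Electronics'), (20, 'Furniture');
INSERT INTO dim_product_snow VALUES (1, 'Laptop', 10), (2, 'Phone', 10), (3, 'Desk', 20);
INSERT INTO fact_sales VALUES (1, 1, 1000.0), (2, 2, 500.0), (3, 3, 200.0), (4, 1, 1200.0);
""")

# Star schema: one join from the fact table to the dimension table.
star_query = """
SELECT p.category_name, SUM(f.amount)
FROM fact_sales f
JOIN dim_product_star p ON p.product_id = f.product_id
GROUP BY p.category_name
ORDER BY p.category_name
"""

# Snowflake schema: a second join is needed to reach the sub-dimension.
snowflake_query = """
SELECT c.category_name, SUM(f.amount)
FROM fact_sales f
JOIN dim_product_snow p ON p.product_id = f.product_id
JOIN dim_category c ON c.category_id = p.category_id
GROUP BY c.category_name
ORDER BY c.category_name
"""

star_rows = cur.execute(star_query).fetchall()
snowflake_rows = cur.execute(snowflake_query).fetchall()
print(star_rows)       # [('Electronics', 2700.0), ('Furniture', 200.0)]
print(snowflake_rows)  # same result, reached through two joins
```

Both queries return the same totals. The trade-off is that the star layout repeats 'Electronics' on every product row, while the snowflake layout stores it once.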

 

Basis of Distinction | Star Schema | Snowflake Schema
Definition and Meaning | Both fact tables and dimension tables are present in a star schema. | Dimension tables, sub-dimension tables, and fact tables are all included in a snowflake schema.
Type of Model | The star schema is a top-down type of model. | The snowflake schema is a bottom-up type of model.
Space | The star schema uses more space than the snowflake schema. | The snowflake schema uses comparatively less space.
Join Relations | In a star schema, relationships between tables are represented by a single join, resulting in a simple data structure for fast query performance and easy data analysis. | The snowflake schema has a complex data structure with multiple levels of relationships between tables, represented by multiple joins. This can make the data structure more difficult to understand and result in slower query performance.
Response Time for Queries | Star schemas have faster query execution times because the fact table reaches its attributes in the dimension tables through a single join. | Snowflake schemas require complex joins between tables, which can slow down query processing and impact other OLAP products.
Normalization | In a star schema, dimension tables are not organized in a normalized form. They are typically denormalized and contain multiple levels of information about a particular subject in a single table. | Dimension tables in a snowflake schema are normalized.
Design Complexity | Has a simpler design than the snowflake schema. | Has a more complex design than the star schema.
Query Complexity | Star schemas have a simpler query design because the fact table is joined to only one level of dimension tables. | Snowflake schemas, on the other hand, have a more complex query design because multiple joins are needed between the fact table and its dimension tables. This adds overhead to query writing.
Understanding Complexity | Simpler to understand than the snowflake schema. | More complex to understand than the star schema.
Foreign Keys | Has fewer foreign keys. | Has comparatively more foreign keys.
Data Redundancy | The star schema stores redundant data in the dimension tables. | The snowflake design fully normalizes the dimension tables and prevents data redundancy.

Advantages

Star schema:
  • Simple and easy-to-understand data structure.
  • Fast query performance due to the single join between the fact table and its dimension tables.
  • Suitable for large volumes of data.
  • Good for ad-hoc querying and data analysis.

Snowflake schema:
  • Normalized data structure reduces redundancy and increases data integrity.
  • Allows for more complex relationships between data.
  • Allows for easier data maintenance and management.
  • Good for more structured, predictable querying.

Disadvantages

Star schema:
  • Limited ability to depict complex relationships between data.
  • Can suffer from data redundancy and decreased data integrity.
  • May not be suitable for smaller volumes of data.

Snowflake schema:
  • The more complex data structure can be harder to understand and work with.
  • Multiple joins between tables can result in slower query performance.
  • Requires more processing resources due to the larger number of tables and joins.


Advantages and Disadvantages of Star Schema

The star schema suits applications that require simplicity and performance, because it has fewer tables and joins.

The main drawback of the star schema is its denormalization and redundancy. This can lead to wide dimension tables that take up more storage space and raise the risk of data inconsistency.

Because the schema flattens each dimension hierarchy into a single table, the depth and breadth of those hierarchies are limited. This makes the star schema unsuitable for data warehouses with complex and dynamic dimensions that require accuracy and a high degree of normalization.
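The inconsistency risk described above can be illustrated with a small pure-Python sketch. The rows, the field names, and the rename_first_match helper are all made up for the illustration. The helper simulates an update that reaches only one of the repeated copies of a category name.

```python
# Denormalized (star-style) product dimension: the category name is
# repeated on every product row.
star_dim = [
    {"product_id": 1, "product": "Laptop", "category": "Electronics"},
    {"product_id": 2, "product": "Phone", "category": "Electronics"},
    {"product_id": 3, "product": "Tablet", "category": "Electronics"},
]

# Normalized (snowflake-style) version: the name is stored once and
# products refer to it by key.
categories = {10: "Electronics"}
snow_dim = [
    {"product_id": 1, "product": "Laptop", "category_id": 10},
    {"product_id": 2, "product": "Phone", "category_id": 10},
    {"product_id": 3, "product": "Tablet", "category_id": 10},
]


def rename_first_match(rows, old, new):
    """Simulate a partial update that only touches the first matching row."""
    for row in rows:
        if row["category"] == old:
            row["category"] = new
            return


# A partial update leaves the denormalized dimension with two spellings.
rename_first_match(star_dim, "Electronics", "Consumer Electronics")
star_names = {row["category"] for row in star_dim}

# The normalized dimension has a single place to change.
categories[10] = "Consumer Electronics"
snow_names = {categories[row["category_id"]] for row in snow_dim}

print(sorted(star_names))  # ['Consumer Electronics', 'Electronics']
print(sorted(snow_names))  # ['Consumer Electronics']
```

In practice a star schema avoids this by reloading the whole dimension through the ETL process rather than patching individual rows.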


Which schema is right for you?

The answer depends on your specific needs and requirements. If you’re looking for a simple, efficient cloud data warehouse solution, a star schema might be the best option. But if you need more flexibility to accommodate changing data requirements, a snowflake schema may be a better choice. No matter which schema you choose, ThoughtSpot can help you get the most out of your data. Most BI tools require a specific schema design; ThoughtSpot has no such restriction.


Star Schema | Snowflake Schema
Hierarchies for the dimensions are stored in the dimension table. | Hierarchies are divided into separate tables.
It contains a fact table surrounded by dimension tables. | One fact table is surrounded by dimension tables, which are in turn surrounded by further dimension tables.
In a star schema, a single join creates the relationship between the fact table and any dimension table. | A snowflake schema requires many joins to fetch the data.
Simple DB design. | Very complex DB design.
Denormalized data structure; queries also run faster. | Normalized data structure.
High level of data redundancy. | Very low level of data redundancy.
A single dimension table contains aggregated data. | Data is split into different dimension tables.
Cube processing is faster. | Cube processing might be slow because of the complex joins.
Offers higher-performing queries using Star Join Query Optimization. Tables may be connected with multiple dimensions. | The snowflake schema is represented by a centralized fact table that is unlikely to be connected with multiple dimensions.

logical data model vs. physical data model diff

 

Key differences: logical data model vs. physical data model

A logical data model is a data model that provides a detailed, structured description of data elements and the connections between them. It includes all entities (specific real-world objects that are relevant to the business) and the relationships among them. Each entity has attributes that describe its characteristics.




What Is a Physical Data Model?

A physical data model specifies how the data model will be built in the database. It outlines all table structures, including column names, data types, column constraints, primary and foreign keys, indexes on the relevant table columns, relationships between tables, stored procedures, and views.



 

Feature | Logical data model | Physical data model
Platform-dependent database | No. | Yes.
Data structure | Entities, attributes, PKs, and FKs. | Database tables, rows, PKs, FKs, and data types.
Programmatic features | No. | Triggers and stored procedures.
Objective | Visualize business logic with data structures. | Organize data structure for database design.
Creators | Business analysts and data architects. | Software developers, programmers, and database administrators.
Complexity | Simple. | Complex.
When to use | To understand enterprise systems and business rules. | To plan, implement, and optimize data storage when you’re developing applications.
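As a rough sketch of the distinction, the same two entities can be written once as a logical model and once as a physical model. The Customer and Order entities, their attributes, and the index name are assumptions made up for this example. SQLite is used only because it ships with Python; a real physical model would target whatever database the project uses.

```python
import sqlite3
from dataclasses import dataclass


# Logical model: entities, attributes, and keys, with no storage details.
@dataclass
class Customer:
    customer_id: int  # primary key
    name: str


@dataclass
class Order:
    order_id: int     # primary key
    customer_id: int  # foreign key to Customer
    total: float


# Physical model: the same entities as platform-specific tables with
# column types, constraints, and an index.
PHYSICAL_DDL = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    total       REAL NOT NULL CHECK (total >= 0)
);
CREATE INDEX idx_order_customer ON customer_order(customer_id);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(PHYSICAL_DDL)

# List the objects the physical model created in the database.
objects = conn.execute(
    "SELECT type, name FROM sqlite_master "
    "WHERE name NOT LIKE 'sqlite_%' ORDER BY name"
).fetchall()
print(objects)
```

The dataclasses say nothing about indexes or CHECK constraints. Those decisions only appear once a specific database platform is chosen.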


Wednesday, March 13, 2024

OLAP & OLTP diff

 

Online Analytical Processing (OLAP)

Online Analytical Processing (OLAP) is a category of software tools used to analyze data for business decisions. OLAP provides an environment for getting insights from data retrieved from multiple database systems at once.


 

OLAP Examples

Any type of data warehouse system is an OLAP system. Some uses of OLAP systems are listed below.


Spotify's analysis of songs.

Netflix's movie recommendation system.



Benefits of OLAP Services

OLAP services help keep calculations consistent.

They can hold planning, analysis, and budgeting data for the business.


OLAP services provide a multidimensional view of data, which helps in applying operations to the data in various ways.
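The multidimensional view mentioned above can be sketched with plain SQL aggregation in sqlite3. The sales table, its two dimensions (region and quarter), and the figures are made up for illustration. A dedicated OLAP engine would offer richer cube operations than these GROUP BY queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, quarter TEXT, amount REAL);
INSERT INTO sales VALUES
    ('North', 'Q1', 100.0), ('North', 'Q2', 150.0),
    ('South', 'Q1',  80.0), ('South', 'Q2', 120.0);
""")

# Full detail: one cell per (region, quarter) combination.
by_region_quarter = conn.execute(
    "SELECT region, quarter, SUM(amount) FROM sales "
    "GROUP BY region, quarter ORDER BY region, quarter"
).fetchall()

# Roll up along the quarter dimension: totals per region.
by_region = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

# Roll up along the region dimension: totals per quarter.
by_quarter = conn.execute(
    "SELECT quarter, SUM(amount) FROM sales GROUP BY quarter ORDER BY quarter"
).fetchall()

print(by_region)   # [('North', 250.0), ('South', 200.0)]
print(by_quarter)  # [('Q1', 180.0), ('Q2', 270.0)]
```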


Drawbacks of OLAP Services

OLAP services require skilled professionals to handle the data because of the complex modeling procedure.


OLAP services are expensive to implement and maintain when datasets are large.


Data can be analyzed only after it has been extracted and transformed.


 

Online Transaction Processing (OLTP)

  OLTP administers the day-to-day transactions of an organization. 


OLTP Examples

A common example of an OLTP system is an ATM center.

An OLTP application maintains the ACID properties during data transactions.

OLTP is also used for online banking, online airline ticket booking, sending a text message, and adding a book to a shopping cart.
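To make the ACID point concrete, here is a minimal sketch of an ATM-style withdrawal using Python's sqlite3 module. The account and withdrawal tables and the withdraw function are hypothetical. The example shows atomicity: either both the withdrawal record and the balance change are saved, or neither is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("CREATE TABLE withdrawal (account_id INTEGER, amount INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 100)")
conn.commit()


def withdraw(conn, account_id, amount):
    """Record a withdrawal and debit the account as one atomic transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "INSERT INTO withdrawal VALUES (?, ?)", (account_id, amount)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, account_id),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; nothing was saved.
        return False


first = withdraw(conn, 1, 60)   # succeeds, balance becomes 40
second = withdraw(conn, 1, 60)  # would overdraw, so it is rolled back

balance = conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]
recorded = conn.execute("SELECT COUNT(*) FROM withdrawal").fetchone()[0]
print(first, second, balance, recorded)  # True False 40 1
```

The failed attempt leaves no stray row in the withdrawal table, because the insert was undone together with the rejected update.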

OLTP vs OLAP



Benefits of OLTP Services

OLTP services allow users to read, write, and delete data.


OLTP services support a growing number of users and transactions while giving real-time access to data.


Drawbacks of OLTP Services

OLTP has limited analysis capability.


OLTP has high maintenance costs because of frequent maintenance, backups, and recovery.


OLTP services are hampered when there is a hardware failure.


Difference between OLAP and OLTP

Category | OLAP (Online Analytical Processing) | OLTP (Online Transaction Processing)
Definition | It is well known as an online database query management system. | It is well known as an online database modifying system.
Data source | Consists of historical data from various databases. | Consists of only current operational data.
Method used | It makes use of a data warehouse. | It makes use of a standard database management system (DBMS).
Application | It is subject-oriented. Used for data mining, analytics, decision making, etc. | It is application-oriented. Used for business tasks.
Normalized | In an OLAP database, tables are not normalized. | In an OLTP database, tables are normalized (3NF).
Usage of data | The data is used in planning, problem-solving, and decision-making. | The data is used to perform day-to-day fundamental operations.
Task | It provides a multi-dimensional view of different business tasks. | It reveals a snapshot of present business tasks.
Purpose | It serves to extract information for analysis and decision-making. | It serves to insert, update, and delete information in the database.
Volume of data | A large amount of data is stored, typically in TB or PB. | The size of the data is relatively small (MB or GB) because historical data is archived.
Queries | Relatively slow as the amount of data involved is large. Queries may take hours. | Very fast as the queries operate on 5% of the data.
Update | The OLAP database is not often updated. As a result, data integrity is unaffected. | The data integrity constraint must be maintained in an OLTP database.
Backup and recovery | It needs backup only from time to time as compared to OLTP. | The backup and recovery process is maintained rigorously.
Processing time | The processing of complex queries can take a long time. | It is comparatively fast in processing because of simple and straightforward queries.
Types of users | This data is generally managed by the CEO, MD, and GM. | This data is managed by clerks, forex, and managers.
Operations | Mostly read and rarely write operations. | Both read and write operations.
Updates | With lengthy, scheduled batch operations, data is refreshed on a regular basis. | The user initiates data updates, which are brief and quick.
Nature of audience | The process is focused on the market. | The process is focused on the customer.
Database design | Design with a focus on the subject. | Design that is focused on the application.
Productivity | Improves the efficiency of business analysts. | Enhances the user’s productivity.
S.No. | OLAP | OLTP
1 | OLAP stands for online analytical processing. | OLTP stands for online transaction processing.
2 | It includes software tools that help in analyzing data, mainly for business decisions. | It helps in managing online database modification.
3 | It utilizes the data warehouse. | It utilizes traditional approaches of DBMS.
4 | It is popular as an online database query management system. | It is popular as an online database modifying system.
5 | OLAP employs the data warehouse. | OLTP employs a traditional DBMS.
6 | It holds old data from various databases. | It holds current operational data.
7 | Here, the tables are not normalized. | Here, the tables are normalized.
8 | It allows only read and hardly any write operations. | It allows both read and write operations.
9 | Here, complex queries are involved. | Here, the queries are simple.
