Wednesday, March 13, 2024

OLAP & OLTP diff

 

Online Analytical Processing (OLAP)

Online Analytical Processing (OLAP) is a category of software tools used to analyze data for business decisions. OLAP provides an environment for getting insights from data retrieved from multiple database systems at one time.


 

OLAP Examples

Any data warehouse system is an OLAP system. Typical uses of OLAP systems include:

Spotify's analysis of songs and listening habits

Netflix's movie recommendation system



Benefits of OLAP Services

OLAP services maintain consistency of information and calculations.

Business planning, analysis, and budgeting data can be stored and analyzed.

OLAP services provide a multidimensional view of data, which makes it possible to apply operations on the data in various ways.


Drawbacks of OLAP Services

OLAP services require trained professionals to handle the data because of the complex modeling procedure.

OLAP services are expensive to implement and maintain when datasets are large.

Data can be analyzed only after it has been extracted and transformed.


 

Online Transaction Processing (OLTP)

  OLTP administers the day-to-day transactions of an organization. 


OLTP Examples

An ATM center is a classic example of an OLTP application.

OLTP handles the ACID properties during data transactions via the application.

It is also used for online banking, online airline ticket booking, sending a text message, and adding a book to a shopping cart.

OLTP vs OLAP


Benefits of OLTP Services

OLTP services allow users to read, write, and delete data quickly.

OLTP services support a growing number of users and transactions while providing real-time access to data.


Drawbacks of OLTP Services

OLTP has limited analysis capability.

OLTP has high maintenance costs because of frequent maintenance, backups, and recovery.

OLTP services are hampered when there is a hardware failure.


Difference between OLAP and OLTP

| Category | OLAP (Online Analytical Processing) | OLTP (Online Transaction Processing) |
| --- | --- | --- |
| Definition | Well known as an online database query management system. | Well known as an online database modifying system. |
| Data source | Consists of historical data from various databases. | Consists of only operational, current data. |
| Method used | Makes use of a data warehouse. | Makes use of a standard database management system (DBMS). |
| Application | Subject-oriented. Used for data mining, analytics, decision making, etc. | Application-oriented. Used for business tasks. |
| Normalization | Tables are not normalized. | Tables are normalized (3NF). |
| Usage of data | The data is used in planning, problem-solving, and decision-making. | The data is used to perform day-to-day fundamental operations. |
| Task | Provides a multi-dimensional view of different business tasks. | Reveals a snapshot of present business tasks. |
| Purpose | Serves to extract information for analysis and decision-making. | Serves to insert, update, and delete information in the database. |
| Volume of data | A large amount of data, typically stored in TB or PB. | Relatively small, since historical data is archived; typically MB to GB. |
| Queries | Relatively slow because the amount of data involved is large; queries may take hours. | Very fast, as queries operate on a small fraction of the data. |
| Update | Not often updated, so data integrity is rarely affected. | Data integrity constraints must be maintained. |
| Backup and recovery | Needs backup only from time to time compared to OLTP. | The backup and recovery process is maintained rigorously. |
| Processing time | Complex queries can take a long time to process. | Comparatively fast, because queries are simple and straightforward. |
| Types of users | Data is generally used by CEOs, MDs, and GMs. | Data is managed by clerks and managers. |
| Operations | Mostly read, rarely write. | Both read and write operations. |
| Updates | Data is refreshed on a regular basis with lengthy, scheduled batch operations. | Data updates are initiated by the user and are brief and quick. |
| Nature of audience | The process is market-oriented. | The process is customer-oriented. |
| Database design | Designed with a focus on the subject. | Designed with a focus on the application. |
| Productivity | Improves the efficiency of business analysts. | Enhances the end user's productivity. |

Wednesday, March 6, 2024

Data modeling FAQ - Excel

 

Data governance my notes.pdf
Q1. Types of fact tables

1. Transactional fact tables
Transactional fact tables are designed to capture individual business events or transactions.

2. Periodic snapshot fact tables
Periodic snapshot fact tables provide a summarized view of metrics over regular time intervals.

3. Accumulating snapshot fact tables
Accumulating snapshot fact tables are designed to track the stages of a business process or workflow.

4. Factless fact tables

https://easy-data-modeling.blogspot.com/2024/03/types-of-fact-tables.html
Q45. Types of facts (measures) (pg 51)

1. Additive (can be added, like total qty, sales)
2. Semi-additive (cannot be added across all dimensions, but can be updated, like an account balance)
3. Non-additive (numerical but cannot be added, like today's price of a stock)
Q2. Granularity and granule

Granularity refers to the level of detail in data.

> Data with a high level of granularity contains a large number of individual pieces (columns) of information, such as individual records or measurements.

> Data with a low level of granularity contains a small number of individual pieces of information, such as summary or aggregated data. Data granularity affects how the data can be used and analyzed, and impacts the accuracy and usefulness of the results.
Q3. What are the 3 levels of data modeling?

1. Conceptual model - entities and relationships only (no attributes; diagram only)
2. Logical model
3. Physical model
Q4. Logical and physical model - differences?
Q5. Star schema & snowflake schema - differences?
Q6. Types of dimension tables (pg 30, 46, 48)

1) SCD (slowly changing)
2) RCD (rapidly changing)
3) JUNK
4) INFERRED
5) DEFERRED
6) CONFORMED (shared by multiple facts)
7) DEGENERATE (e.g., invoice number)
8) ROLE-PLAYING (same meaning, multiple names)
9) STATIC
10) DYNAMIC
11) SPARSE
12) DENSE
13) SHRUNKEN

 


1. Role-playing dimension
Role-playing dimensions are dimensions that are used more than once in a fact table, each time with a different meaning or role.
When two attributes of a source entity must connect to the same destination entity, a single relationship is not possible. In such cases we connect both columns of one table to the second table, each under a new role name; for example, connecting an address table to a customer table twice, as a bill-to address and a ship-to address.

2. Conformed dimension
Conformed dimensions are dimensions that are shared by multiple star schemas (multiple fact tables).

3. Degenerate dimension
A key column kept in the fact table that has no associated dimension table but is still used for reporting and analysis; such a column is called a degenerate dimension.
For example, in a fact table with customer_id, product_id, branch_id, employee_id, and bill_no, the bill_no column is a degenerate dimension.

4. Junk dimension
A junk dimension combines several low-cardinality flags and indicators into a single dimension table that covers all possible combinations of values that might occur in the fact table.


5. Static dimension
Static dimension values do not change with time; they are fixed components, such as time, date, country codes, and status codes.

6. Dynamic dimension
Dimension values that change dynamically based on data values and formulas.

7. Deferred dimension
Sometimes the fact table has to be loaded with data, such as a customer id, but the customer dimension has not yet been loaded with that customer id. In such cases we delay inserting those rows into the fact table; once the customer dimension is loaded with that customer id, we add the rows to the fact table. This type of delayed fact loading is called a deferred dimension.

8. Inferred dimension
When fact data is available to be loaded from the source but the related dimension table does not yet have the corresponding row. For example, a customer id appears in the sales orders table but not in the customer table. In such a case we add a placeholder entry to the customer table and set its 'inferred dimension' flag; later, when the original entry is loaded into the customer table, we clear the flag.

9. Slowly changing dimension (SCD)
An SCD is a dimension that contains relatively static data that can change slowly but unpredictably, rather than on a regular schedule.

SCD types:
Type 0 - Ignore any changes and audit them.
Type 1 - Overwrite the changes.
Type 2 - History is added as a new row (see the sketch below).
Type 3 - History is added as a new column.
Type 4 - History is kept in a separate history table (a new dimension table is added).
Type 6 - Combination of Types 1, 2, and 3.
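
A minimal SQL sketch of an SCD Type 2 change, assuming a hypothetical customer_dim table with effective-date, end-date, and current-flag columns:

-- customer_dim(customer_key, customer_id, city, eff_date, end_date, is_current)
-- Step 1: expire the current row for the changed customer
UPDATE customer_dim
SET end_date = CURRENT_DATE, is_current = 'N'
WHERE customer_id = 101 AND is_current = 'Y';

-- Step 2: insert a new row carrying the new attribute value
INSERT INTO customer_dim (customer_key, customer_id, city, eff_date, end_date, is_current)
VALUES (9001, 101, 'New City', CURRENT_DATE, NULL, 'Y');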


 
10. Rapidly changing dimension (RCD)
A rapidly changing dimension is a dimension with attributes whose values change very often.

11. Sparse dimension
A dimension table in which most of the column values are not loaded (most columns are empty).

12. Dense dimension
A dimension table in which most of the columns are fully loaded with values; such dimensions are called dense dimensions.

13. Shrunken dimension
A replica of the original dimension, but with fewer columns than the original table (a subset of it).
Q7. What is a factless fact table?

A factless fact table contains only foreign keys to dimensions, but no numeric facts. Factless fact tables are used to capture events or situations that have no measurable outcome.
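
A minimal sketch of a factless fact table, using a hypothetical student-attendance example (the dimension tables are assumed to exist):

-- Records the event "student attended class on a date"; there is no measure to store.
CREATE TABLE attendance_fact (
  date_key    INT NOT NULL REFERENCES date_dim(date_key),
  student_key INT NOT NULL REFERENCES student_dim(student_key),
  class_key   INT NOT NULL REFERENCES class_dim(class_key)
);

-- Counting rows answers questions such as "how many classes did each student attend?"
SELECT student_key, COUNT(*) AS classes_attended
FROM attendance_fact
GROUP BY student_key;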
Q8. What is a data mart?

A data mart is a subset of a data warehouse, purpose-built to support the analytical needs of a particular department, line of business, or geographic region.
Q9. Steps in the data warehouse life cycle

Requirements gathering
Requirements analysis
High-level design (HLD)
Low-level design (LLD)
Development - automating scripts
Testing
User acceptance testing (UAT)
Project release or deployment
Warranty support
Q10. Erwin questions
Q11. Erwin - data modeler workspace

The workspace is the area where tables are created and connected.
Q12. Erwin - domains

A domain is like a custom datatype: an independent model object that you can use to quickly assign properties to an attribute or column.
Q13. Erwin - types of relationships

Identifying relationship
Non-identifying relationship
Subtype relationship
Many-to-many relationship
Recursive relationship
Q14. Erwin - which Erwin features did you use?

1. Name hardening
2. Domains for data types
3. Naming standards (abbreviations)
4. Forward & reverse engineering
5. Merge data models
6. Complete compare
7. Subtype and supertype
Q14.1 Erwin - which feature is used to merge the subtype tables into a supertype?
Q15. Erwin - forward and reverse engineering steps

Forward engineering steps:
Actions > Schema > Generate scripts

Reverse engineering steps:
Actions > Reverse engineer > Logical/Physical >
Database (or) script file > Connect > OK
Q16. Erwin - complete compare feature

Complete Compare is a powerful tool that lets you view and resolve the differences between two models, or between a model and a database or script file.
Q17. Erwin - naming standards

Use the Naming Standard Editor to create naming standards and develop a glossary that contains glossary words, their abbreviations, and alternate abbreviations.

Access the Model Explorer > open the Naming Standard Editor > define naming standards for logical and physical models > create abbreviations.
Q18. Erwin - name hardening

Name hardening is a mechanism in Erwin Data Modeler that prevents physical model names from changing when logical names change.
Q19. Erwin - querying tool
Q20. Erwin - subject areas

Click Model, Subject Areas; the Subject Area Editor opens. Click New, and a new subject area is created.
Q21. Erwin - identifying and non-identifying relationships: differences

Identifying: represented by a solid line; the primary key of the parent migrates to the primary key area of the child entity or table. E.g., book & chapter tables, country & state tables.

Non-identifying: the child entity is not dependent on the parent entity for its identity and can exist without it. E.g., supplier & product list.
Q22. Fact and dimension table - differences

Fact tables contain numerical data (measures), while dimension tables provide context and background information.
Q23. What is a data warehouse?

A data warehouse is a centralized system that stores data from various sources for reporting and analysis.

Data warehouses can:
Consolidate data: store data from multiple sources as a single source of truth
Store historical data: store and analyze long-term historical data
Analyze data about a particular subject or functional area
Q24. What is an ODS (operational data store)?

An operational data store usually stores and processes data in real time. An ODS is connected to multiple data sources and pulls data into a central location.
Q25. OLTP & OLAP differences (pg 17, 53)

https://easy-data-modeling.blogspot.com/2024/03/olap-oltp-diff.html
Q30. What is the most likely reason an SQL query may run slowly?

Possible reasons for slow SQL queries:

  1. Lack of indexes: without indexes on filter and join columns, the database must scan entire tables.
  2. Complex joins: joining many large tables multiplies the work the optimizer and executor must do.
  3. Large data volume: the more rows scanned, the longer the query takes.
  4. Inefficient query structure: e.g., SELECT *, functions applied to indexed columns, or correlated subqueries.
  5. Database configuration: settings such as memory allocation, cache size, and connection limits can affect performance.
  6. Server load: high server load due to concurrent queries or insufficient resources (CPU, memory, I/O) can slow down execution times.
  7. Network latency: if the database is remote, network latency adds to the time it takes to execute a query.
  8. Outdated statistics: database optimizers rely on statistics to create execution plans; stale statistics lead to poor plans.
  9. Locking and blocking: if a query is waiting for locks held by other transactions, its execution is delayed.

Types of joins and their impact on execution time:

  1. INNER JOIN: generally efficient if both tables have indexes on the join columns.
  2. LEFT JOIN (or RIGHT JOIN): can be slower than an INNER JOIN because all records from the outer side must be returned regardless of matches.
  3. FULL OUTER JOIN: typically the slowest join type, as it combines the results of both left and right outer joins.
  4. CROSS JOIN: produces a Cartesian product of the two tables; it is usually slow and should be used with caution.
  5. SELF JOIN: not inherently slow, but performance depends on the table's size and the indexing of the join columns.

Optimization tips (see the sketch after this list):

  • Indexing: add indexes on columns used in WHERE clauses and joins.
  • Analyze execution plans: use tools provided by your database (like EXPLAIN in PostgreSQL) to see how a query is executed.
  • Refactor queries: simplify complex queries and remove unnecessary work.
  • Monitor server performance: watch CPU, memory, and I/O while queries run.
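
A minimal sketch of plan analysis and indexing, assuming a hypothetical orders table (EXPLAIN as in PostgreSQL/MySQL):

-- See how the database plans to execute the query
EXPLAIN
SELECT * FROM orders WHERE customer_id = 42;

-- If the plan shows a full table scan on a large table,
-- an index on the filter column usually helps:
CREATE INDEX idx_orders_customer ON orders (customer_id);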
 
 
Q42. What operations can be done on an OLAP cube? (example below)

1) ROLL UP
2) DRILL DOWN
3) SLICE & DICE
4) PIVOT
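
A minimal SQL sketch of a roll-up, assuming a hypothetical sales(region, product, amount) table; GROUP BY ROLLUP aggregates from (region, product) subtotals up to region totals and a grand total:

-- Subtotals per (region, product), per region, and a grand total
SELECT region, product, SUM(amount) AS total_sales
FROM sales
GROUP BY ROLLUP (region, product);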
Q43. Data quality checks (pg 28)

1. CONSISTENCY
2. ACCURACY
3. COMPLETENESS
4. AUDITABILITY
5. ORDERLINESS
6. UNIQUENESS
7. TIMELINESS
Q44. Types of indexes (pg 53) (examples below)

1. NORMAL INDEX
2. BITMAP INDEX
3. UNIQUE & NON-UNIQUE INDEX
4. CLUSTERED & NON-CLUSTERED
5. PARTITIONED & NON-PARTITIONED
6. EXPRESSION-BASED (function-based)
7. B-TREE
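
A few hedged examples, assuming a hypothetical emp table (syntax varies by database; BITMAP indexes are Oracle-specific, and the expression-based form is Oracle/PostgreSQL style):

CREATE INDEX idx_emp_dept ON emp (deptid);                 -- normal B-tree index
CREATE UNIQUE INDEX idx_emp_email ON emp (email);          -- unique index
CREATE BITMAP INDEX idx_emp_gender ON emp (gender);        -- bitmap (Oracle; good for low cardinality)
CREATE INDEX idx_emp_upper_name ON emp (UPPER(empname));   -- expression/function-based index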
 
Q47. Surrogate key (pg 33)

A surrogate key is a unique, sequential-number key column generated by the database rather than supplied by the client. It simplifies work for developers and for database operations.
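
A minimal sketch, using the ANSI identity-column syntax (auto-increment syntax varies by database; the table and columns are hypothetical):

-- customer_key is generated by the database; customer_id is the natural/business key
CREATE TABLE customer_dim (
  customer_key  INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  customer_id   VARCHAR(20) NOT NULL,
  customer_name VARCHAR(100)
);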
Q48. Normalization (pg 34) (see the sketch below)

1NF - no multiple values in a single cell (each cell holds one atomic value)
2NF - in 1NF, and every non-key attribute depends on the whole key (no partial dependency)
3NF - in 2NF, and there is no transitive dependency (non-key attributes do not depend on other non-key attributes)
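
A small illustration of removing a transitive dependency to reach 3NF, assuming a hypothetical emp table where city determines state:

-- Before (2NF, not 3NF): empid -> city -> state is a transitive dependency
-- emp(empid, empname, city, state)

-- After (3NF): move the city -> state dependency into its own table
CREATE TABLE city (
  city  VARCHAR(50) PRIMARY KEY,
  state VARCHAR(50)
);

CREATE TABLE emp (
  empid   INT PRIMARY KEY,
  empname VARCHAR(100),
  city    VARCHAR(50) REFERENCES city(city)
);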
Q49. SQL query: find the nth highest salary (pg 54)

Difference between ROW_NUMBER(), RANK(), and DENSE_RANK():

When values tie:
> ROW_NUMBER() still assigns sequential numbers (ties get different numbers)
> RANK() assigns the same rank to ties, and the next rank skips ahead (next rank = count of preceding rows + 1)
> DENSE_RANK() assigns the same rank to ties, and the next rank is the next sequential number

=========================================
a) ROW_NUMBER:

-- To find the 3rd highest salary using ROW_NUMBER --
select empid, salary from (
  select e.*, row_number() over (order by salary desc) as row_num
  from emp e )
where row_num = 3;


======================
b) Using RANK:

-- To show only the 2nd highest --
select * from
  (select e.*, rank() over (order by salary desc) as rnk
   from emp e )
where rnk = 2;

-- ** To display ranks ** --
select rank() over (order by salary desc) as rnk, e.*
from emp e;


======================


c) DENSE_RANK (best):

-- To show only the 2nd highest --
select * from
  (select e.*, dense_rank() over (order by salary desc) as rnk
   from emp e )
where rnk = 2;

-- ** To display ranks ** --
select dense_rank() over (order by salary desc) as rnk, e.*
from emp e;


======================

d) Using COUNT(DISTINCT) - for the Nth highest salary, compare with N-1
   (here 3 distinct salaries are greater, so this returns the 4th highest):

select empno, sal from emp e1
where 3 = (select count(distinct sal)
           from emp e2
           where e2.sal > e1.sal);
Q50. SQL query: find and delete duplicate rows

Three methods to find and delete duplicate rows:
===========================================
Method 1: using an id column (when the ID column is unique but the other column values are duplicated) - delete all rows other than the one with the minimum id per group:

delete from films
where film_id not in (select min(film_id) from films group by title, rel_date);
===========================================
Method 2: using a CTE (SQL Server style, where deleting from the CTE deletes from the underlying table):

with CTE as
(select id, name, email,
        row_number() over (partition by name, email order by name, email) as rownum
 from sales)
delete from CTE where rownum > 1;
===========================================
Method 3: using ROWID (Oracle):

delete from table1
where rowid not in (select min(rowid) from table1
                    group by col1, col2);
Q52. What are the inner join and outer join row counts for the tables below?

A      B
1      1
1      1
2      2
3
null   null

Inner join = count of matching rows (nulls never match each other)
Outer join = inner join + unmatched rows from the left side

inner = 6 rows
left outer = 6 + 1 = 7 rows
Q53. Which tools are used for data profiling?

Informatica Data Profiling
Open-source data quality and profiling tools
Q54. Which tools are used for schema comparison?

1) SQL Server Data Tools (SSDT)

2) SQLDBDiff: a powerful and intuitive tool that compares the schemas of SQL Server databases.
Q55. What are the ACID properties? (see the sketch below)

Atomicity - a transaction is all or nothing: either the entire transaction is executed, or none of it is.

Consistency - integrity constraints must be maintained, so that the database is consistent before and after the transaction.

Isolation - changes occurring in a particular transaction are not visible to any other transaction until that transaction has been committed.

Durability - once the transaction has completed execution, its updates and modifications are written to disk and are permanent.
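
A minimal sketch of atomicity (transaction syntax varies slightly by database; the accounts table is hypothetical): either both updates are committed or neither is.

BEGIN TRANSACTION;

UPDATE accounts SET balance = balance - 100 WHERE acct_id = 1;  -- debit
UPDATE accounts SET balance = balance + 100 WHERE acct_id = 2;  -- credit

-- If anything failed above, ROLLBACK would undo both updates.
COMMIT;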
Q56. How do you test a data model?

A data model can be tested using the following:
Unit testing: testing each component of the data model individually to ensure that it performs as expected.
Integration testing: testing how the different components of the data model work together, using test data.
Q51. What are the roles of a data modeler? (pg 61)

Get the data from the data source:
- identify and connect to the data source
- pick a shared dataset
- select a storage mode; identify query performance issues
- use or create a data flow

Profile the data:
- identify data anomalies
- examine data structures

Clean, transform, and load the data:
- resolve inconsistencies, unexpected values, null values, and data quality issues
- apply value replacements; identify the keys for joins
- identify and transform column data types
- apply naming conventions to columns
- perform data loading
- resolve data input errors

Model the data:
- design the data model
- define the tables
- configure table and column properties
- flatten parent-child hierarchies
- define role-playing dimensions
- define relationships and cardinality
- resolve many-to-many relationships
- design a common date table
- define the data-level granularity
Q52. What are the layers in a data lake?

1. Landing layer
2. Curated layer
3. Application data layer (summary data)
4. Sandbox layer
5. Target layer
Q53. Layers of big data architecture
Q54. Roles of a data architect

Data architect tasks and responsibilities

Typical responsibilities range from evaluating the current data architecture to keeping databases secure. Depending on your organisation and industry, your day-to-day tasks might include:

  • Translating business requirements into databases, data warehouses, and data streams.

  • Creating procedures to ensure data accuracy and accessibility.

  • Analysing, planning, and defining data architecture framework, including security, reference data, metadata, and master data.

  • Creating and implementing data management processes and procedures.

  • Collaborating with other teams within the organisation to devise and implement data strategies, build models, and assess shareholder needs and goals.

  • Researching data acquisition opportunities.

  • Developing application programming interfaces (APIs) to retrieve data.

 
Q56. Steps to prepare an STTM (source-to-target mapping)

You can map your data from a source of your choice to your desired destination by implementing the following steps:

Step 1: Define the attributes
Step 2: Map the attributes
Step 3: Transform the data
Step 4: Specify standard naming conventions
Step 5: Establish flows for data mapping
Step 6: Establish data transformation guidelines
Step 7: Test the mapping process
Step 8: Deploy the mapping process
Step 9: Maintain the mapping process

 
Q58. Which Azure features did you work with?
Q59. Which AWS features did you work with?
Q60. What is data vault modeling and what are its contents?

Q62. Steps for a dimensional model
Q63. Difference between a view and a materialized view
Q64. What is data lineage?
Q65. Types of NoSQL databases
Q66. Types of data models
Q67. Kimball and Inmon model matrix






Q61. What is data governance? (pg 109 and 116)

Data governance is the process of managing the availability, usability, integrity, and security of the data in enterprise systems, based on internal standards and policies that also control data usage.

Effective data governance ensures that data is consistent and trustworthy and doesn't get misused.

Data governance is increasingly critical as data privacy regulations expand and organizations rely more and more on data analytics to help optimize operations and drive business decision-making.

https://www.thoughtspot.com/data-trends/data-governance/how-to-implement-data-governance

https://data-modeling-made-easy.blogspot.com/2024/08/how-to-implement-data-governance.html

     

Advantages of ELT (Extract, Load, Transform)

ELT is better suited for larger data sets, situations where timeliness is crucial, or when you must keep raw data available for future analysis. For instance, ELT is faster than ETL because it loads data directly into the destination system and can perform transformations in parallel.

     

     

What is a Data Lake?

A data lake is a centralized repository that allows you to store all your structured and unstructured data at any scale. You can store your data as-is, without having to first structure it. (Source: Amazon Web Services, https://aws.amazon.com)

Data lakehouse

A data lakehouse is a data architecture that combines a data lake and a data warehouse to provide a flexible solution for business intelligence and data science. It allows organizations to use low-cost, flexible storage for all types of data, while also benefiting from the data management, schema, and governance of a warehouse.
    Data lakes and data warehouses are both storage systems for big data used by business analysts, data engineers, and data scientists. They have different architectures and serve different business needs:

    Data lakes
    Store raw, structured, and semi-structured data from internal and external sources. They are good for data analytics, machine learning, experimenting with data, and training AI models. Data lakes prioritize storage volume and cost over performance, and can store any type or volume of data in full fidelity. They also allow users to ingest data from any system at any speed, and process data in real time or batch mode.

Data warehouses
Store structured data from multiple sources. They are good for business intelligence (BI), data visualizations, reporting, and accessing reliable, quality data. Data warehouses are designed for the fastest query performance and are optimized to analyze relational data coming from transactional systems and line-of-business applications.


SQL: query to show the total salary of each department

SELECT id, empname, deptid, salary,
       SUM(salary) OVER (PARTITION BY deptid) AS sum_sal
FROM emp
ORDER BY id;

     ===========================

    # Id, empname, deptid, Salary, SUM_SAL

    '1', 'john', '1', '20000', '50000'

    '2', 'kem', '1', '30000', '50000'

    '3', 'chan', '2', '10000', '30000'

    '4', 'henry', '2', '20000', '30000'

    '5', 'bill', '3', '30000', '30000'

    '6', 'salt', '4', '20000', '20000'



SQL: query to show the sum of salary by dept and the cumulative salary within each dept

SELECT deptid, salary,
       SUM(salary) OVER (PARTITION BY deptid ORDER BY id) AS cumulative_sal
FROM emp
ORDER BY id;

    ===============================

    # deptid, salary, CUMulative_SAL

    '1', '20000', '20000'

    '1', '30000', '50000'

    '2', '10000', '10000'

    '2', '20000', '30000'

    '3', '30000', '30000'

    '4', '20000', '20000'




    =========================================================================

68. When to use a star schema and when to go for a snowflake schema?

If you have a few dimensions with low cardinality (few unique values in the dimensions) but require fast query execution, a star schema is the right choice.

However, if you have many dimensions with high cardinality, a snowflake schema is the better choice.

    =========================================================================
69. Features of Snowflake DB

Snowflake integrates seamlessly with popular cloud providers such as AWS, Azure, and Google Cloud Platform. It can scale up or down automatically, ensuring efficient data loading, integration, and analysis, and it allows multiple users to run numerous workloads simultaneously without concerns about resources.

https://airbyte.com/data-engineering-resources/snowflake-features

    https://docs.snowflake.com/en/user-guide/intro-supported-features

    https://staragile.com/blog/snowflake-features

    A Compact List of Snowflake Features

    1. Decoupling of storage and compute in Snowflake
    2. Auto-Resume, Auto-Suspend, Auto-Scale
    3. Workload Separation and Concurrency
    4. Snowflake Administration
    5. Cloud Agnostic
    6. Semi-structured Data Storage
    7. Data Exchange
    8. Time Travel
    9. Cloning
    10. Snowpark
    11. Snowsight
    12. Security Features
    13. Snowflake Pricing
    =========================================================================


ERWIN - how to maintain versions

    Model Versions
    A model version records the various developmental changes that the model has undergone. Each time you save any changes in a model, a new model version is created with a sequential version number. Therefore, each version is preserved and serves as a record of each set of changes made in a particular save.

    Two types of model versions exist:

    Delta Version: A delta version is created when you initially save a new model or when you save changes to an existing model.

If you do not want to maintain multiple versions, you can clear the Maintain multiple versions check box. If you clear the check box, erwin Data Modeler does not create a delta version for any incremental save; it updates the current version every time you save the model.

    Named Version:
    A named version is created from a delta version, or from another named version. A named version represents a milestone in the development of the model, or at least a version that you want to keep indefinitely. When you save the changes in a named version, the changes are saved in the same version.

    The Based upon field in the Edit Catalog pane displays the version number from which the Named version is created.

    =========================================================================


    Create a Delta Version

    You create a delta version when you save a model initially or when you incrementally save an existing model.

    Follow these steps:

    1. Click File, Mart, Open.

      The Open Model dialog opens.

    2. Select a model and click OK.

      The model opens.

    3. Make necessary changes to the model.
    4. Click File, Mart, Save.

      A delta version of the model is created with the incremental changes.

    =========================================================================

    Create a Named Version

    A named version of a model represents a milestone in the development of the model. You create a named version to keep that model version indefinitely.

    Follow these steps:

    1. Click File, Mart, Catalog Manager.

      The Catalog Manager opens.

    2. Select a model version, right-click and click Mark Version.

      A named version is created with a default name.

    3. Edit the name of the named version and press enter.

      A new named version is created in the catalog.

    =========================================================================

    Compare Model Versions

    Compare two model versions of a data model to view the differences between them.

    Follow these steps:

    1. Click File, Mart, Catalog Manager.

      The Catalog Manager dialog opens.

    2. Hold the Ctrl key and select two model versions of a model and click Version Compare.
    3. Click Compare.

      The Complete Compare wizard opens.

    4. Follow the instructions on the wizard pages to make your selection.

      The Resolve Difference dialog opens. Review the differences and use the tools on the toolbar to work with and manage the changes.

    5. Click Finish.

      The differences in the model versions are resolved and the Resolve Difference dialog closes.

    6. Click Close.

      The Complete Compare wizard closes.

    =========================================================================

    When to use a surrogate key
    A surrogate key is a unique identifier for a row in a database table that's often used when a natural key is not suitable as a primary key. Surrogate keys can be useful in many situations, including:
    • When natural keys change
      Natural keys can become useless in a data warehouse if they change in the source, such as when migrating to a new system. Surrogate keys, however, don't change while the row exists.
    • When natural keys are too long or complex
      Surrogate keys can be simpler than natural keys.
    • When natural keys aren't unique
      Surrogate keys can be used to create a unique primary key when natural keys aren't.
    • To maintain data warehouse information
      Surrogate keys can help maintain data warehouse information when dimensions change.
    • To improve query performance
      Surrogate keys, which are often simple integer values, can provide better performance during data processing and business queries.
    • To simplify references
      Surrogate keys can simplify references from one table to another and joins when tables are referenced in queries.
    • To deal with slowly changing dimensions
  Surrogate keys can be used to deal with slowly changing dimensions (SCDs), which are attributes of dimension tables that change over time.

    =========================================================================

What is data lineage? (and tools)


    Data lineage is the process of tracking the flow of data over time, providing a clear understanding of where the data originated, how it has changed, and its ultimate destination within the data pipeline.

    https://www.montecarlodata.com/blog-open-source-data-lineage-tools/

    =========================================================================

How do you calculate the size of each table?

What are the data types and their memory-size calculations?

https://data-modeling-made-easy.blogspot.com/2024/07/sql-data-types-and-sizes.html
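
A rough worked example (a sketch; exact sizes vary by database, row headers, and page overhead): estimate the row size as the sum of the column sizes, then multiply by the row count. The emp table below is hypothetical.

-- emp(id INT, empname VARCHAR(50), salary DECIMAL(10,2))
-- Row size: 4 bytes (INT) + up to ~52 bytes (VARCHAR(50) data + length bytes) + ~9 bytes (DECIMAL)
--           ~= 65 bytes per row
-- Table size for 1,000,000 rows: 65 x 1,000,000 ~= 65 MB (plus index and page overhead)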


    =========================================================================

    What is Data governance

• Data governance is
  • a set of processes, policies, roles, metrics, and standards
  • that help management keep
    • data secure, private, accurate, and usable throughout its life cycle.

    =========================================================================

    Contents of Data Governance :

1. Data Architecture Management
    2. Data Development
    3. Database operations Management
    4. Data security Management
    5. Reference & Master data Management
    6. DWH & BI Management
    7. Document and content Management
    8. Meta Data Management
    9. Data Quality Management
    =========================================================================
Data Governance Framework Steps:
=========================================================================

| Goals | Methods | People | Processes | Technology | Culture |
| --- | --- | --- | --- | --- | --- |
| Security | Policies | Sponsor | Issues management | Ingestion | Collaboration |
| Privacy | Guides | Owner | Change management | Cataloging | Crowd sourcing |
| Compliance | Guardrails | Steward | Quality management | Data preparation | Communication |
| Quality | Gates | Curator | Cataloging | Data analysis | Sharing |
| Integration | Code of ethics | Coach | Measurement | Pipeline management | Reuse |
| Metadata | Curating | Consumer | Monitoring |  |  |
| Retention | Coaching | Stakeholder |  |  |  |
| Risk |  |  |  |  |  |
| Impact |  |  |  |  |  |


    =========================================================

Explain GDPR & HIPAA in data governance.

• HIPAA (Health Insurance Portability and Accountability Act) is United States legislation.

    What are the primary HIPAA goals?

    • To limit the use of protected health information to those with a “need to know”
    • To penalize those who do not comply with confidentiality regulations

    What health information is protected?

    • Any healthcare information with an identifier that links a specific patient to healthcare information (name, social security number, telephone number, email address, street address, among others)

     
• GDPR (General Data Protection Regulation)

    The General Data Protection Regulation (GDPR) is a European law that protects individuals' personal information and their fundamental rights and freedoms

    =========================================================================
What is medallion architecture?

    ===============================================================

     
    ===============================================================

Where do we use data vault modeling?

A data vault is a data modeling approach and methodology used in enterprise data warehousing to handle complex and varying data structures. It combines the strengths of third normal form and star schema.
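
The core contents of a data vault model are hubs (business keys), links (relationships between hubs), and satellites (descriptive, historized attributes). A minimal sketch with hypothetical table and column names:

CREATE TABLE hub_customer (             -- hub: one row per business key
  customer_hk   CHAR(32) PRIMARY KEY,   -- hash of the business key
  customer_id   VARCHAR(20) NOT NULL,   -- business key
  load_date     TIMESTAMP,
  record_source VARCHAR(50)
);

CREATE TABLE sat_customer (             -- satellite: descriptive attributes over time
  customer_hk   CHAR(32) REFERENCES hub_customer(customer_hk),
  load_date     TIMESTAMP,
  customer_name VARCHAR(100),
  city          VARCHAR(50),
  PRIMARY KEY (customer_hk, load_date)
);

CREATE TABLE link_customer_order (      -- link: relationship between hubs
  link_hk       CHAR(32) PRIMARY KEY,
  customer_hk   CHAR(32) REFERENCES hub_customer(customer_hk),
  order_hk      CHAR(32),               -- would reference a hub_order table
  load_date     TIMESTAMP
);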
    ===============================================================

SQL: department-wise highest salary

select deptid, max(salary) from emp group by deptid;

===============================================================

>> To display the name, id, and salary of the highest-paid employee(s) in each department:

SELECT deptid, id, empname, salary FROM emp
WHERE (deptid, salary) IN
(
  select deptid, max(salary) from emp group by deptid
);
    ===============================================================





    ===============================================================


    Parquet data types list


    ===============================================================

    Bank dm data model diagram and sql

      -- MySQL Script generated by MySQL Workbench -- Thu May  8 12:17:48 2025 -- Model: New Model    Version: 1.0 -- MySQL Workbench Forward En...