Data governance my notes .pdf
What is Data governance
Data governance is the process of managing the ...
Effective data governance ensures that ...
Data governance is increasingly critical as organizations are ...
https://data-modeling-made-easy.blogspot.com/2024/08/how-to-implement-data-governance.html
- Advantages of ELT (Extract, Load, Transform)
- What is a Data Lake?
- Data lakehouse
SQL: query to show the total salary of each department

SELECT Id, empname, deptid, Salary,
       SUM(Salary) OVER (PARTITION BY deptid) AS SUM_SAL
FROM emp
ORDER BY id;

Id | empname | deptid | Salary | SUM_SAL
1 | john | 1 | 20000 | 50000
2 | kem | 1 | 30000 | 50000
3 | chan | 2 | 10000 | 30000
4 | henry | 2 | 20000 | 30000
5 | bill | 3 | 30000 | 30000
6 | salt | 4 | 20000 | 20000
SQL: query to show the cumulative (running) sum of salary within each department

SELECT deptid, salary,
       SUM(salary) OVER (PARTITION BY deptid ORDER BY id) AS CUMulative_SAL
FROM emp
ORDER BY id;

deptid | salary | CUMulative_SAL
1 | 20000 | 20000
1 | 30000 | 50000
2 | 10000 | 10000
2 | 20000 | 30000
3 | 30000 | 30000
4 | 20000 | 20000
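To check both window queries end to end, here is a minimal sketch that runs them with Python's built-in sqlite3 module. The table definition and the six rows are reconstructed from the result sets above. The sketch assumes an SQLite build with window-function support (3.25 or later), and the variable names dept_totals and running_totals are illustrative only.

```python
import sqlite3

# In-memory database holding the six employees implied by the result sets.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emp (id INTEGER PRIMARY KEY, empname TEXT, "
    "deptid INTEGER, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?, ?)",
    [
        (1, "john", 1, 20000),
        (2, "kem", 1, 30000),
        (3, "chan", 2, 10000),
        (4, "henry", 2, 20000),
        (5, "bill", 3, 30000),
        (6, "salt", 4, 20000),
    ],
)

# Without ORDER BY in the window, every row gets the whole department total.
dept_totals = conn.execute(
    "SELECT id, deptid, salary, "
    "SUM(salary) OVER (PARTITION BY deptid) AS sum_sal "
    "FROM emp ORDER BY id"
).fetchall()

# With ORDER BY in the window, the sum becomes a running total per department.
running_totals = conn.execute(
    "SELECT id, deptid, salary, "
    "SUM(salary) OVER (PARTITION BY deptid ORDER BY id) AS cumulative_sal "
    "FROM emp ORDER BY id"
).fetchall()

for total_row, running_row in zip(dept_totals, running_totals):
    print(total_row, running_row[3])
```

The only difference between the two queries is the ORDER BY inside OVER(...): it switches the window from "all rows in the partition" to "rows up to and including the current one".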
If you have few dimensions with low cardinality (few unique values per dimension) and you need fast query execution, a star schema is the right choice.
However, if you have many dimensions with high cardinality, a snowflake schema is the better choice.
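The structural difference between the two schemas can be sketched with a toy example. All table names, column names, and rows below are hypothetical and are not taken from any real model. In the star version the category name lives directly in the product dimension; in the snowflake version it is normalized into its own table, which costs one extra join per query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Star schema: one denormalized product dimension.
    CREATE TABLE dim_product_star (
        product_key INTEGER PRIMARY KEY, product_name TEXT, category_name TEXT);

    -- Snowflake schema: category is normalized into its own table.
    CREATE TABLE dim_category (
        category_key INTEGER PRIMARY KEY, category_name TEXT);
    CREATE TABLE dim_product_snow (
        product_key INTEGER PRIMARY KEY, product_name TEXT,
        category_key INTEGER REFERENCES dim_category(category_key));

    CREATE TABLE fact_sales (product_key INTEGER, amount INTEGER);

    INSERT INTO dim_product_star VALUES
        (1, 'pen', 'stationery'), (2, 'ink', 'stationery'), (3, 'mug', 'kitchen');
    INSERT INTO dim_category VALUES (10, 'stationery'), (20, 'kitchen');
    INSERT INTO dim_product_snow VALUES (1, 'pen', 10), (2, 'ink', 10), (3, 'mug', 20);
    INSERT INTO fact_sales VALUES (1, 5), (2, 7), (3, 4), (1, 3);
    """
)

# Star: a single join from the fact table to the dimension.
star_totals = conn.execute(
    """
    SELECT p.category_name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product_star p ON p.product_key = f.product_key
    GROUP BY p.category_name ORDER BY p.category_name
    """
).fetchall()

# Snowflake: the same question needs a second join through dim_category.
snow_totals = conn.execute(
    """
    SELECT c.category_name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product_snow p ON p.product_key = f.product_key
    JOIN dim_category c ON c.category_key = p.category_key
    GROUP BY c.category_name ORDER BY c.category_name
    """
).fetchall()

print(star_totals)
print(snow_totals)
```

Both queries return the same totals. The trade-off is fewer joins in the star version against less repeated text in the snowflake version, where each category name is stored once.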
https://airbyte.com/data-engineering-resources/snowflake-features#:~:text=Snowflake%20can%20seamlessly%20integrate%20with,simultaneously%20without%20concerns%20about%20resources.
https://docs.snowflake.com/en/user-guide/intro-supported-features
https://staragile.com/blog/snowflake-features
A Compact List of Snowflake Features
- Decoupling of storage and compute in Snowflake
- Auto-Resume, Auto-Suspend, Auto-Scale
- Workload Separation and Concurrency
- Snowflake Administration
- Cloud Agnostic
- Semi-structured Data Storage
- Data Exchange
- Time Travel
- Cloning
- Snowpark
- Snowsight
- Security Features
- Snowflake Pricing
erwin: how to maintain model versions
Model Versions
A model version records the various developmental changes that the model has undergone. Each time you save any changes in a model, a new model version is created with a sequential version number. Therefore, each version is preserved and serves as a record of each set of changes made in a particular save.
Two types of model versions exist:
Delta Version: A delta version is created when you initially save a new model or when you save changes to an existing model.
If you do not want to maintain multiple versions, you can clear the Maintain multiple versions check box. If you clear the check box, erwin Data Modeler does not create a delta version for any incremental save. Instead, erwin Data Modeler updates the current version every time you save the model.
Named Version: A named version represents a milestone in the development of the model and is kept indefinitely.
The Based upon field in the Edit Catalog pane displays the version number from which the Named version is created.
Create a Delta Version
You create a delta version when you save a model initially or when you incrementally save an existing model.
Follow these steps:
- Click File, Mart, Open.
The Open Model dialog opens.
- Select a model and click OK.
The model opens.
- Make necessary changes to the model.
- Click File, Mart, Save.
A delta version of the model is created with the incremental changes.
=========================================================================
Create a Named Version
A named version of a model represents a milestone in the development of the model. You create a named version to keep that model version indefinitely.
Follow these steps:
- Click File, Mart, Catalog Manager.
The Catalog Manager opens.
- Select a model version, right-click and click Mark Version.
A named version is created with a default name.
- Edit the name of the named version and press enter.
A new named version is created in the catalog.
Compare Model Versions
Compare two model versions of a data model to view the differences between them.
Follow these steps:
- Click File, Mart, Catalog Manager.
The Catalog Manager dialog opens.
- Hold the Ctrl key and select two model versions of a model and click Version Compare.
- Click Compare.
The Complete Compare wizard opens.
- Follow the instructions on the wizard pages to make your selection.
The Resolve Difference dialog opens. Review the differences and use the tools on the toolbar to work with and manage the changes.
- Click Finish.
The differences in the model versions are resolved and the Resolve Difference dialog closes.
- Click Close.
The Complete Compare wizard closes.
Advantages of surrogate keys
- Natural keys can become useless in a data warehouse if they change in the source, such as when migrating to a new system. Surrogate keys, however, don't change while the row exists.
- Surrogate keys can be simpler than natural keys.
- Surrogate keys can be used to create a unique primary key when natural keys aren't unique.
- Surrogate keys can help maintain data warehouse information when dimensions change.
- Surrogate keys, which are often simple integer values, can provide better performance during data processing and business queries.
- Surrogate keys can simplify references from one table to another and joins when tables are referenced in queries.
- Surrogate keys can be used to deal with slowly changing dimensions (SCDs), which are attributes of dimension tables that change over time.
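The slowly changing dimension point can be made concrete with a minimal sketch of a Type 2 SCD, where every change to a tracked attribute closes the current row and adds a new row under a fresh surrogate key. Everything here (the CustomerRow class, the apply_change helper, the customer "C-100" and its cities) is hypothetical and only illustrates the idea; it is not a production loading routine.

```python
from dataclasses import dataclass
from datetime import date
from itertools import count
from typing import List, Optional


@dataclass
class CustomerRow:
    customer_sk: int                 # surrogate key: stable for this row's lifetime
    customer_id: str                 # natural key from the source system
    city: str                        # tracked attribute that may change over time
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


_next_sk = count(1)  # simple integer sequence for surrogate keys


def apply_change(dim: List[CustomerRow], customer_id: str, city: str,
                 as_of: date) -> CustomerRow:
    """Type 2 update: close the current row and append a new version."""
    current = next(
        (r for r in dim if r.customer_id == customer_id and r.valid_to is None),
        None,
    )
    if current is not None:
        if current.city == city:
            return current           # nothing changed, keep the current row
        current.valid_to = as_of     # expire the old version
    row = CustomerRow(next(_next_sk), customer_id, city, as_of)
    dim.append(row)
    return row


dim: List[CustomerRow] = []
apply_change(dim, "C-100", "Pune", date(2024, 1, 1))
apply_change(dim, "C-100", "Mumbai", date(2024, 6, 1))

for r in dim:
    print(r)
```

Fact rows loaded before the move keep pointing at surrogate key 1 and fact rows loaded afterwards point at surrogate key 2, so history is preserved even though the natural key "C-100" is the same in both versions.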
What is data lineage, and which tools support it?
Data lineage is the process of tracking the flow of data over time, providing a clear understanding of where the data originated, how it has changed, and its ultimate destination within the data pipeline.
https://www.montecarlodata.com/blog-open-source-data-lineage-tools/
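As a rough illustration of what a lineage tool records, the sketch below stores lineage as a mapping from each dataset to its direct upstream inputs and walks it to find where a report's data originated. The dataset names and the helper functions upstream and origins are made up for this example and do not come from any particular tool.

```python
from typing import Dict, List, Set

# Each dataset maps to the datasets it is built from (its direct upstream inputs).
LINEAGE: Dict[str, List[str]] = {
    "sales_report": ["fact_sales"],
    "fact_sales": ["stg_orders", "dim_customer"],
    "stg_orders": ["crm.orders"],
    "dim_customer": ["crm.customers"],
}


def upstream(dataset: str, lineage: Dict[str, List[str]]) -> Set[str]:
    """Return every dataset that feeds into `dataset`, directly or indirectly."""
    seen: Set[str] = set()
    stack = list(lineage.get(dataset, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(lineage.get(node, []))
    return seen


def origins(dataset: str, lineage: Dict[str, List[str]]) -> Set[str]:
    """Return the original sources: upstream datasets with no inputs of their own."""
    return {d for d in upstream(dataset, lineage) if d not in lineage}


print(sorted(upstream("sales_report", LINEAGE)))
print(sorted(origins("sales_report", LINEAGE)))
```

Reversing the same mapping answers the impact-analysis question, that is, which downstream datasets are affected when a source changes.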
How do you calculate the size of each table?
What are the data types, and how is their storage size calculated?
https://data-modeling-made-easy.blogspot.com/2024/07/sql-data-types-and-sizes.html
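A back-of-the-envelope estimate multiplies the bytes per row by the expected row count. The sketch below assumes SQL Server-style fixed-width sizes (for example 4 bytes for INT and 8 for BIGINT) and treats VARCHAR(n) as its maximum length plus 2 bytes, so it gives an upper bound. It ignores row headers, page overhead, indexes, and compression, all of which differ by database engine. The helper names, the VARCHAR(50) length, and the row count are illustrative assumptions.

```python
from typing import List, Tuple

# Assumed fixed-width storage sizes in bytes (SQL Server-style); other engines differ.
FIXED_SIZES = {
    "TINYINT": 1,
    "SMALLINT": 2,
    "INT": 4,
    "BIGINT": 8,
    "DATE": 3,
    "DATETIME": 8,
}


def column_bytes(sql_type: str, length: int = 0) -> int:
    """Upper-bound storage for one column value, in bytes."""
    t = sql_type.upper()
    if t == "CHAR":
        return length        # fixed-length character data
    if t == "VARCHAR":
        return length + 2    # assumed: maximum length plus 2 bytes of length overhead
    return FIXED_SIZES[t]


def estimate_table_bytes(columns: List[Tuple[str, str, int]], row_count: int) -> int:
    """Rough table size: bytes per row multiplied by the number of rows."""
    per_row = sum(column_bytes(sql_type, length) for _, sql_type, length in columns)
    return per_row * row_count


# Hypothetical definition of the emp table; VARCHAR(50) is an assumed length.
EMP_COLUMNS = [
    ("id", "INT", 0),
    ("empname", "VARCHAR", 50),
    ("deptid", "INT", 0),
    ("salary", "INT", 0),
]

row_bytes = sum(column_bytes(t, n) for _, t, n in EMP_COLUMNS)
estimate = estimate_table_bytes(EMP_COLUMNS, 1_000_000)
print(row_bytes, "bytes per row;", estimate, "bytes for 1,000,000 rows")
```

For real capacity planning, compare an estimate like this with the size the database itself reports for an existing table, since actual storage depends on the engine's page layout and the real average length of variable-width columns.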
=========================================================================
What is Data governance
Data governance is a set of processes, policies, roles, metrics, and standards that help an organization keep its data secure, private, accurate, and usable throughout its life cycle.
Contents of Data Governance:
- Data Architecture Management
- Data Development
- Database Operations Management
- Data Security Management
- Reference & Master Data Management
- DWH & BI Management
- Document and Content Management
- Metadata Management
- Data Quality Management
Goals | Methods | People | Processes | Technology | Culture
Security | Policies | Sponsor | Issues Management | Ingestion | Collaboration
Privacy | Guides | Owner | Change Management | Cataloging | Crowdsourcing
Compliance | Guardrails | Steward | Quality Management | Data preparation | Communication
Quality | Gates | Curator | Cataloging | Data analysis | Sharing
Integration | Code of Ethics | Coach | Measurement | Pipeline Management | Reuse
Metadata | Curating | Consumer | Monitoring | |
Retention | Coaching | Stakeholder | | |
Risk | | | | |
Impact | | | | |
- HIPAA (Health Insurance Portability and Accountability Act) is United States legislation.
What are the primary HIPAA goals?
- To limit the use of protected health information to those with a “need to know”
- To penalize those who do not comply with confidentiality regulations
What health information is protected?
- Any healthcare information with an identifier that links a specific patient to healthcare information (name, social security number, telephone number, email address, street address, among others)
- GDPR (General Data Protection Regulation)