| | | |
| | | |
sno | Question | Answer | Links |
1 | Types of Fact tables | 1. Transactional Fact Tables - capture individual business events or transactions.
2. Periodic Snapshot Fact Tables - provide a summarized view of metrics at regular time intervals.
3. Accumulating Snapshot Fact Tables - track the stages of a business process or workflow.
4. Factless Fact Tables - contain only foreign keys and no numeric measures (see question 7). | https://easy-data-modeling.blogspot.com/2024/03/types-of-fact-tables.html |
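A minimal DDL sketch of the difference in grain between the first two types (sales_fact and account_balance_snapshot are illustrative names, not from the source):

-- Transactional fact: one row per individual business event
create table sales_fact (
  date_key      int,
  product_key   int,
  customer_key  int,
  invoice_no    varchar(20),    -- degenerate dimension
  quantity      int,
  sales_amount  decimal(12,2)
);

-- Periodic snapshot fact: one row per account per month-end
create table account_balance_snapshot (
  month_end_date_key int,
  account_key        int,
  closing_balance    decimal(12,2)   -- semi-additive measure
);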
45 | Types of facts (measures) (51) | 1. Additive - can be summed across all dimensions (e.g., total quantity, sales amount). 2. Semi-additive - can be summed across some dimensions but not across time (e.g., account balance). 3. Non-additive - numeric but cannot meaningfully be summed (e.g., today's stock price). A sketch of the semi-additive case follows below. | |
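Illustrative queries (using the hypothetical account_balance_snapshot table sketched above) showing why a balance is semi-additive:

-- Valid: sum balances across accounts for a single snapshot date
select month_end_date_key, sum(closing_balance) as total_balance
from account_balance_snapshot
where month_end_date_key = 20240131
group by month_end_date_key;

-- Not meaningful: summing the same account's balance across months;
-- use AVG or the latest value instead
select account_key, avg(closing_balance) as avg_monthly_balance
from account_balance_snapshot
group by account_key;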
2 | Granularity and Granule | Granularity refers to the level of detail in the data; a granule is a single unit of data at that level.
> High granularity (fine grain): a large number of detailed, individual pieces of information, such as individual records or measurements.
> Low granularity (coarse grain): a small number of summarized pieces of information, such as aggregated or summary data. Data granularity affects how the data can be used and analyzed, and impacts the accuracy and usefulness of the results. | |
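A small SQL sketch of grain, reusing the hypothetical sales_fact table from question 1: the detailed query is one row per transaction, the aggregated query is one row per product per day.

-- High granularity: individual transaction rows
select date_key, product_key, customer_key, invoice_no, quantity, sales_amount
from sales_fact;

-- Low granularity: summarized to one row per product per day
select date_key, product_key, sum(sales_amount) as daily_sales
from sales_fact
group by date_key, product_key;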
3 | what are 3 levels of Data modeling | 1. Conceptual model - entities and relationships only, no attributes (diagram only). 2. Logical model - entities, attributes, keys and relationships, independent of any database. 3. Physical model - tables, columns, data types, constraints and indexes for a specific database. | |
4 | Logical and physical model - Differences ? | | |
5 | Star schema & Snowflake schema - Differences ? | | |
6 | Types of Dimension tables (30, 46, 48) | 1) SCD (slowly changing)
2) RCD (rapidly changing)
3) Junk
4) Inferred
5) Deferred
6) Conformed (shared by multiple facts)
7) Degenerate (e.g., invoice number)
8) Role-playing (same dimension, multiple roles/names)
9) Static
10) Dynamic
11) Sparse
12) Dense
13) Shrunken
1. Role-playing dimension - a dimension used more than once in the same fact table, each time with a different meaning or role. When two attributes of a table need to point to the same destination entity, the dimension is joined twice under different role names, e.g., an Address dimension joined to the customer/fact table as Bill-to Address and as Ship-to Address.
2. Conformed dimension - a dimension shared by multiple fact tables / star schemas.
3. Degenerate dimension - a key column kept in the fact table that has no associated dimension table but is still used for reporting and analysis, e.g., bill_no in a fact table with customer_id, product_id, branch_id, employee_id, bill_no.
4. Junk dimension - a single dimension that combines several low-cardinality flags and indicators, covering all possible combinations of values that might occur in the fact table.
5. Static dimension - values do not change with time; fixed reference data such as country codes, status codes, date/time components.
6. Dynamic dimension - values change dynamically based on data or formulas.
7. Deferred dimension - sometimes a fact row arrives (e.g., with a customer_id) before the customer dimension has been loaded with that id. Loading of such fact rows is delayed, and they are inserted later once the dimension row exists. This delayed loading of the fact table is handled via deferred dimensions.
8. Inferred dimension - when fact data is available in the source but the related dimension row is missing (e.g., the customer_id is in the sales orders table but not yet in the customer table), a placeholder row is inserted into the dimension with an 'inferred' flag enabled; later, when the real dimension record arrives, it is updated and the flag is disabled.
9. Slowly changing dimension (SCD) - a dimension containing relatively static data that changes slowly and unpredictably rather than on a regular schedule. Type 0: ignore changes (keep original / audit only). Type 1: overwrite the changes. Type 2: history kept as a new row. Type 3: history kept as a new column. Type 4: history kept in a separate history table (mini-dimension). Type 6: combination of Types 1, 2 and 3. (See the SCD Type 2 sketch after this list.)
10. Rapidly changing dimension (RCD) - a dimension with attributes whose values change very often.
11. Sparse dimension - a dimension table in which most rows/attributes are not populated.
12. Dense dimension - a dimension table in which most attributes are populated (fully loaded).
13. Shrunken dimension - a replica (subset) of an original dimension with fewer columns or rows, conformed to the original. | |
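A minimal SCD Type 2 sketch, referenced from item 9 above (customer_dim, customer_stg and customer_dim_seq are hypothetical names; the sequence call is Oracle-style and the date function varies by database):

-- Step 1: expire the current dimension row when a tracked attribute (here, city) has changed
update customer_dim
set    end_date   = current_date,
       is_current = 'N'
where  is_current = 'Y'
and exists (select 1 from customer_stg s
            where s.customer_id = customer_dim.customer_id
            and   s.city <> customer_dim.city);

-- Step 2: insert a new current row for every staged customer that has no current row
-- (covers both the rows expired above and brand-new customers)
insert into customer_dim (customer_key, customer_id, city, start_date, end_date, is_current)
select customer_dim_seq.nextval, s.customer_id, s.city, current_date, null, 'Y'
from   customer_stg s
where  not exists (select 1 from customer_dim d
                   where d.customer_id = s.customer_id
                   and   d.is_current = 'Y');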
7 | What is factless fact table | A factless fact table is a table that contains only foreign keys to dimensions and no numeric facts (measures). Factless fact tables are used to capture events or situations that have no measurable outcome, e.g., student attendance. A small sketch follows below. | |
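A minimal sketch of a factless fact table for event tracking (student_attendance_fact and its columns are illustrative names, not from the source):

-- Only foreign keys, no measures
create table student_attendance_fact (
  date_key     int,
  student_key  int,
  class_key    int
);

-- The event is "measured" by counting rows
select class_key, count(*) as attendance_count
from student_attendance_fact
group by class_key;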
8 | What is Data Mart | Data mart is a subset of datawarehouse purpose-built to support the analytical needs of a particular department, or line of business, or geographic region. | |
9 | Steps in Datawarehouse life cycle | Requirements gathering > Requirements analysis > High level design (HLD) > Low level design (LLD) > Development (automating scripts) > Testing > User Acceptance Testing (UAT) > Project release / deployment > Warranty support | |
| | | |
10 | ERWIN qsns | | |
11 | Erwin - Data modeler workspace | The workspace is the diagram area where tables (entities) are created and connected | |
12 | Erwin - Domains | A domain is like a custom datatype: an independent model object that you can use to quickly assign a set of properties to an attribute or column | |
13 | Erwin - types of relationships | Identifying relationship. Non-identifying relationship. Subtype relationship. Many-to-many relationship. Recursive relationship. | |
14 | Erwin - which erwin features did you use | 1. Name hardening 2. Domains for Data types 3. Naming standards( abbrevations) 4. Fwd & reverse engineering 5. Merge data models 6. complete compare 7. sub type and super type | |
14.1 | Erwin - which feature used to merge the subtype tables to supertype | | |
15 | Erwin - Forward and reverse engineering steps | Forward engineering steps: Actions > Schema > Generate scripts
Reverse engineering steps: Actions > Reverse Engineer > Logical/Physical > Database (or) script file > Connect > OK | |
16 | Erwin - Complete compare feature | Complete Compare is a powerful tool that lets you view and resolve the differences between two models, or a model and a database or script file | |
17 | Erwin - Naming standards | Use the Naming Standard Editor to create naming standards and develop a glossary that contains glossary words, their abbreviations, and alternate abbreviations.
Access the Model Explorer > open the Naming Standard Editor > define naming standards for logical and physical models > create abbreviations | |
18 | Erwin - Name hardening | Name hardening is a mechanism in Erwin Data Modeler that prevents physical model names from changing when logical names change | |
19 | Erwin - querying tool | | |
20 | Erwin - subject areas | Click Model > Subject Areas; the Subject Area Editor opens. Click New to create a new subject area. | |
21 | Erwin - identifying and non identifying relationships - differences | Identifying: represented by a solid line; the primary key of the parent migrates into the primary key area of the child entity/table, so the child cannot exist without the parent. E.g., Book & Chapter tables, Country & State tables.
Non-identifying: represented by a dashed line; the parent key migrates as a non-key (foreign key) attribute, so the child entity is not dependent on the parent for its identity and can exist without it. E.g., Supplier & Product tables. A DDL sketch follows below. | |
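A DDL sketch of the difference (book, chapter, supplier and product are illustrative tables): in the identifying relationship the parent key becomes part of the child's primary key; in the non-identifying relationship it is only a foreign key column.

create table book (book_id int primary key, book_name varchar(200));
create table supplier (supplier_id int primary key, supplier_name varchar(200));

-- Identifying: a chapter is identified by (book_id, chapter_no) and cannot exist without its book
create table chapter (
  book_id    int not null references book(book_id),
  chapter_no int not null,
  title      varchar(200),
  primary key (book_id, chapter_no)
);

-- Non-identifying: a product has its own key and can exist without a supplier
create table product (
  product_id  int primary key,
  name        varchar(200),
  supplier_id int references supplier(supplier_id)
);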
22 | Fact and dimension table - differences | Fact tables contain numerical measures (and foreign keys to dimensions), while dimension tables provide the descriptive context and background information used to slice and filter those measures. | |
23 | What is Datawarehouse | A data warehouse is a centralized system that stores data from various sources for reporting and analysis.
Data warehouses can: consolidate data from multiple sources into a single source of truth; store and analyze long-term historical data; organize and analyze data by subject or functional area. | |
24 | what is ODS( operational data store) | An operational data store usually stores and processes data in real time. An ODS is connected to multiple data sources and pulls data into a central location. | |
25 | OLTP & OLAP - differences (pg 17) 53 | | https://easy-data-modeling.blogspot.com/2024/03/olap-oltp-diff.html |
| | | |
28 | | | |
| | | |
30 | what is most likely reason that sql query may run slowly | Possible Reasons for Slow SQL Queries - Lack of Indexes: if join or filter columns are not indexed, the database falls back to full table scans.
- Complex Joins: joining many large tables, or joining on non-indexed columns, increases execution time.
- Large Data Volume: scanning or returning very large data sets takes longer.
- Inefficient Query Structure: e.g., SELECT *, functions applied to indexed columns in the WHERE clause, correlated subqueries.
- Database Configuration: configuration settings such as memory allocation, cache size, and connection limits can affect performance.
- Server Load: high server load due to concurrent queries or insufficient resources (CPU, memory, I/O) can slow down execution times.
- Network Latency: if the database is remote, network latency can add to the time it takes to execute a query.
- Statistics Outdated: database optimizers rely on statistics to create execution plans; stale statistics lead to poor plans.
- Locking and Blocking: if a query is waiting for locks held by other transactions, it can slow down execution.
Types of Joins and Their Impact on Execution Time - INNER JOIN: generally efficient if both tables have indexes on the join columns.
- LEFT JOIN (or RIGHT JOIN): can be slower than an INNER JOIN because all records from one side must be returned whether or not they match.
- FULL OUTER JOIN: typically the slowest join type, as it combines the results of both left and right outer joins.
- CROSS JOIN: produces a Cartesian product; it is usually slow and should be used with caution.
- SELF JOIN: not inherently slow, but performance depends on the table size and indexing of the join columns.
Optimization Tips - Indexing: create indexes on columns used in joins and WHERE clauses.
- Analyze Execution Plans: use the tools provided by your database (like EXPLAIN in PostgreSQL) to see how the query is executed.
- Refactor Queries: simplify complex queries and avoid unnecessary columns and subqueries.
- Monitor Server Performance: watch CPU, memory and I/O usage. (See the EXPLAIN sketch below.)
| |
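A minimal sketch of diagnosing a slow query (EXPLAIN ANALYZE is PostgreSQL syntax; the orders table and its columns are hypothetical):

-- Inspect the execution plan to see whether the filter causes a full table scan
explain analyze
select * from orders where customer_id = 1001;

-- If the plan shows a sequential/full scan, index the filter column and re-check the plan
create index idx_orders_customer_id on orders (customer_id);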
| | | |
| | | |
42 | what operations can be done on OLAP cube | 1) ROLL UP 2) DRILL DOWN 3) SLICE & DICE 4) PIVOT | |
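A rough SQL approximation of roll-up and slice on a relational star schema (OLAP servers do these operations natively on the cube; sales_fact is the hypothetical table used above and date_dim with cal_year/cal_month columns is also illustrative):

-- Roll up: aggregate from month level up to year level
select d.cal_year, sum(f.sales_amount) as yearly_sales
from   sales_fact f
join   date_dim d on d.date_key = f.date_key
group by d.cal_year;

-- Slice: fix one dimension member (year 2024) and analyse the remaining dimensions
select d.cal_month, f.product_key, sum(f.sales_amount) as monthly_sales
from   sales_fact f
join   date_dim d on d.date_key = f.date_key
where  d.cal_year = 2024
group by d.cal_month, f.product_key;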
43 | data quality checks - pg 28 | 1. CONSISTENCY 2. ACCURACY 3. COMPLETENESS 4. AUDITABILITY 5. ORDERLINESS 6. UNIQUENESS 7. TIMELINESS | |
44 | types of indexes 53 | 1. NORMAL INDEX 2. BITMAP INDEX 3. UNIQUE & NON UNIQUE INDEX 4. CLUSTERED & NON CLUSTERED 5. PARTITIONED & NON PARTITIONED 6. EXPRESSION BASED 7. B- TREE | |
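Illustrative DDL for a few of these index types (emp is a hypothetical table; the bitmap syntax is Oracle-specific):

create index idx_emp_deptno on emp (deptno);             -- normal (B-tree) index
create unique index uq_emp_email on emp (email);         -- unique index
create bitmap index bix_emp_gender on emp (gender);      -- bitmap index for a low-cardinality column (Oracle)
create index idx_emp_upper_name on emp (upper(ename));   -- expression / function-based index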
| | | |
47 | surrogate key (33) | A surrogate key is a unique, sequential-number key column generated by the database (e.g., an identity column or sequence), not supplied by the source system or client. It simplifies joins and makes development and ETL operations easier. A sketch follows below. | |
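A small sketch of a database-generated surrogate key (identity-column syntax varies by database; customer_dim here is illustrative):

create table customer_dim (
  customer_key  int generated always as identity primary key,  -- surrogate key, generated by the database
  customer_id   varchar(20),                                    -- natural/business key coming from the source
  customer_name varchar(200)
);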
48 | Normalization (34) | 1NF - no multiple values in a single cell (atomic values only). 2NF - in 1NF, and every non-key attribute depends on the whole key (no partial dependency). 3NF - in 2NF, and no transitive dependency (non-key attributes do not depend on other non-key attributes). A 3NF sketch follows below. | |
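A small 3NF sketch (emp_raw, emp and dept are illustrative tables): in emp_raw, dept_name depends on dept_id rather than on the key emp_id (a transitive dependency), so it is moved to its own table.

-- Not in 3NF: dept_name is transitively dependent on emp_id via dept_id
create table emp_raw (
  emp_id    int primary key,
  emp_name  varchar(100),
  dept_id   int,
  dept_name varchar(100)
);

-- 3NF: the transitive dependency is removed
create table dept (dept_id int primary key, dept_name varchar(100));
create table emp (
  emp_id   int primary key,
  emp_name varchar(100),
  dept_id  int references dept(dept_id)
);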
49 | sql query : find nth highest salary - 54 | Difference between ROW_NUMBER(), RANK() and DENSE_RANK():
When values tie > ROW_NUMBER() still gives a sequential number > RANK() assigns the same rank to ties, and the next rank = (count of previous rows + 1), leaving gaps > DENSE_RANK() assigns the same rank to ties and the next rank is the next sequential number (no gaps)
========================================= a) Using ROW_NUMBER():
-- To find the 2nd highest salary --
select empid, salary from ( select e.*, row_number() over (order by salary desc) as row_num from emp e ) where row_num = 2 ; ====================== b) Using RANK():
-- To show only the 2nd highest -- select * from (select e.*, rank() over (order by salary desc) as rnk from emp e ) where rnk = 2;
-- ** To display ranks ** -- select rank() over (order by salary desc) as rnk, e.* from emp e ;
======================
c) Using DENSE_RANK() (best for nth highest)
-- To show only the 2nd highest -- select * from (select e.*, dense_rank() over (order by salary desc) as rnk from emp e ) where rnk = 2;
-- ** To display ranks ** -- select dense_rank() over (order by salary desc) as rnk, e.* from emp e ;
======================
d) Using count distinct: for the Nth highest compare with N-1, e.g. for the 3rd highest salary: select empno, sal from emp e1 where 2 = (select count(distinct sal) from emp e2 where e2.sal > e1.sal) | |
50 | sql query : find and delete duplicate rows | Three methods to find and delete duplicate rows ================================= Method 1. Using an id column (when the id column is unique but the other columns are duplicated): keep the row with the minimum id per group and delete the rest ================================
delete from films where film_id not in (select min(film_id) from films group by title, rel_date)
=========================================== Method 2. Using a CTE with ROW_NUMBER() (single statement) ============================================
with cte as (select id, name, email, row_number() over (partition by name, email order by name, email) as rn from sales)
delete from cte where rn > 1
=========================================== Method 3. Using ROWID (Oracle) ============================================
delete from table1 where rowid not in (select min(rowid) from table1 group by col1, col2) | |
| | | |
52 | what is inner join and outer join row count for the tables below? A: 1, 1, 2, 2, null; B: 1, 1, 2, 3, null | Inner join = count of matching rows; left outer join = inner join rows + unmatched rows from the left side.
Inner = 6 rows (2x2 matches for the 1s, 2x1 for the 2s; nulls never match). Left outer = 6 + 1 unmatched null row = 7 rows. A runnable sketch follows below. | |
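A runnable sketch of the counts above (tables a and b are illustrative):

create table a (val int);
create table b (val int);
insert into a values (1);
insert into a values (1);
insert into a values (2);
insert into a values (2);
insert into a values (null);
insert into b values (1);
insert into b values (1);
insert into b values (2);
insert into b values (3);
insert into b values (null);

select count(*) from a inner join b on a.val = b.val;   -- 6 rows (2x2 for the 1s, 2x1 for the 2s; nulls never match)
select count(*) from a left join b on a.val = b.val;    -- 7 rows (the 6 matches plus the unmatched null row from a)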
53 | Which tools used for Data profiling | 1) Informatica data profiling 2) Open-source data quality / profiling tools | |
54 | Which tools used for 'schema comparison' | 1) SQL Server Data Tools (SSDT)
2) SQLDBDiff: a tool that compares the schemas (and optionally the data) of two SQL Server databases | |
55 | What are ACID properties | Atomicity - a transaction is treated as a single unit: either all of its statements are executed, or none of them are.
Consistency - This means that integrity constraints must be maintained so that the database is consistent before and after the transaction
Isolation -Changes occurring in a particular transaction will not be visible to any other transaction until that transaction has been committed
Durability - once the transaction has completed execution, the updates and modifications to the database are stored in and written to disk and are permanent | |
56 | How to do Data model testing | A data model can be tested using the following: Unit testing - test each component of the data model individually to ensure it performs as expected. Integration testing - test how the different components of the data model work together, using test data. | |
51 | What are roles of data modeler (61) | 1. Get the data: identify and connect to the data sources, pick a shared dataset, select the storage mode, identify query performance issues, use or create a dataflow.
2. Profile the data: identify data anomalies, examine the data structures.
3. Clean, transform and load the data: resolve inconsistencies, unexpected values and null values; fix data quality issues; apply value replacements; identify the keys for joins; identify and convert column data types; apply naming conventions to columns; perform the data load; resolve data import errors.
4. Model the data: design the data model; define the tables; configure table and column properties; flatten parent-child hierarchies; handle role-playing dimensions; define relationships and cardinality; resolve many-to-many relationships; design a common date table; define the granularity level of the data. | |
52 | what are layers in Datalake | 1. landing layer 2. curated layer 3. application data layer ( summary data) 4. Sand box layer 5. Target layer | |
53 | Layers of big data architecture | .. | |
54 | Roles of data architect | Data architect tasks and responsibilities: typical responsibilities range from evaluating the current data architecture to keeping databases secure. Depending on the organisation and industry, day-to-day tasks might include analysing, planning, and defining the data architecture framework, including security, reference data, metadata, and master data.
| |
| | | |
56 | steps to prepare STTM | Steps involved in source-to-target mapping: Step 1: Define the attributes. Step 2: Map the attributes. Step 3: Transform the data. Step 4: Apply standard naming conventions. Step 5: Establish flows for data mapping. Step 6: Establish data transformation guidelines. Step 7: Test the mapping process. Step 8: Deploy the mapping process. Step 9: Maintain the mapping process. | |
| | | |
58 | Which Azure features did you work | | |
59 | Which AWS features did you work | | |
60 | What is Data vault modeling and contents | | |
| | |
| |
62 | steps for Dimensional model | | |
63 | Diff. View and materialized view | | |
64 | What is data lineage | | |
65 | Types of Nosql databases | | |
66 | Types of Data models | | |
67 | Kimball and Inmon model matrix | | |