Here’s a clean, interview-ready answer focused on one third-party tool (Informatica Data Quality), with step-by-step profiling and reporting. You can say this almost verbatim.
Tool used for data profiling
I use Informatica Data Quality (IDQ) to profile data files and generate data quality reports before loading them into the database.
Step-by-step: How I do data profiling in Informatica Data Quality
Step 1: Ingest the data file
Connect to the source file (CSV / Excel / flat file / S3 / database)
Define the source metadata in IDQ
Validate column names and data types
Purpose: Ensure the file is readable and structurally correct.
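IDQ handles ingestion through its source definitions in the Developer/Analyst tools, but the same structural check is easy to illustrate outside the tool. A minimal pandas sketch, where the file name and expected schema are made-up examples:

```python
# Minimal pandas sketch of the structural check done at ingest.
# "orders.csv" and EXPECTED_SCHEMA are illustrative assumptions.
import pandas as pd

EXPECTED_SCHEMA = {"order_id": "int64", "order_amount": "float64", "order_status": "object"}

df = pd.read_csv("orders.csv")

# Fail fast if columns are missing or typed differently than expected.
missing = set(EXPECTED_SCHEMA) - set(df.columns)
if missing:
    raise ValueError(f"Missing columns: {missing}")
for col, dtype in EXPECTED_SCHEMA.items():
    if str(df[col].dtype) != dtype:
        print(f"Type mismatch in {col}: expected {dtype}, got {df[col].dtype}")
```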
Step 2: Run Column Profiling
Use Column Profile in Informatica Analyst
Analyze each column for:
Data type distribution
Null and blank percentage
Min / Max values
Distinct count
Value frequency
Purpose: Understand the content and detect obvious issues.
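Column profiling in Informatica Analyst is point-and-click; to show what the profile actually computes, here is a rough pandas equivalent over a toy DataFrame (column names and values are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "order_id": [1, 2, 2, None],
    "order_amount": [250.0, 99.9, 99.9, -5.0],
})

# Per-column statistics: the same measures a column profile reports.
profile = {}
for col in df.columns:
    s = df[col]
    profile[col] = {
        "null_pct": round(s.isna().mean() * 100, 2),       # null / blank percentage
        "distinct": s.nunique(),                           # distinct count
        "min": s.min(),                                    # min value
        "max": s.max(),                                    # max value
        "top_values": s.value_counts().head(3).to_dict(),  # value frequency
    }
print(profile)
```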
Step 3: Run Data Domain & Pattern Analysis
Apply data domains (date, email, phone, numeric)
Use pattern analysis to detect invalid formats
Examples:
Invalid email formats
Date columns stored as strings
Mixed data types in one column
Purpose: Validate format and consistency.
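IDQ ships with prebuilt data domains; the sketch below approximates the same idea with a simple regex and a strict date parse. The email pattern is deliberately simplified, not RFC-complete, and the sample values are invented:

```python
import re
import pandas as pd

emails = pd.Series(["a@b.com", "not-an-email", "x@y.org"])
dates = pd.Series(["2024-01-05", "05/01/2024", "not a date"])

# Simple email-domain check: one @, no whitespace, a dot in the domain.
email_ok = emails.str.match(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
print("invalid emails:", emails[~email_ok].tolist())

# Detect date columns stored as strings by attempting a strict ISO parse;
# anything that fails the format comes back as NaT.
parsed = pd.to_datetime(dates, format="%Y-%m-%d", errors="coerce")
print("non-ISO dates:", dates[parsed.isna()].tolist())
```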
Step 4: Identify Duplicates
Use duplicate analysis on key fields
Identify exact and fuzzy duplicates (if needed)
Examples:
Same order_id appearing multiple times
Same customer with slight name variations
Purpose: Prevent double counting and incorrect metrics.
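Duplicate analysis is built into IDQ; as a rough stand-in, this sketch finds exact key duplicates with pandas and near-matches with Python's standard-library SequenceMatcher. The 0.85 similarity threshold and the sample rows are assumptions:

```python
from difflib import SequenceMatcher
import pandas as pd

orders = pd.DataFrame({
    "order_id": [101, 102, 101],
    "customer": ["Jon Smith", "John Smith", "Ann Lee"],
})

# Exact duplicates on the key field.
dupes = orders[orders.duplicated("order_id", keep=False)]
print("exact duplicate order_ids:\n", dupes)

# Fuzzy duplicates: flag customer names that are nearly identical.
names = orders["customer"].tolist()
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        if SequenceMatcher(None, names[i], names[j]).ratio() > 0.85:
            print("possible fuzzy match:", names[i], "~", names[j])
```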
Step 5: Apply Business Rules
Create business rule transformations
Examples:
Order amount > 0
Delivery time between 0 and 180 minutes
Order status ∈ allowed values
Purpose: Ensure data follows business logic, not just technical rules.
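In IDQ these become rule transformations; expressed as plain code, each rule is just a boolean check per row. A sketch using the three example rules above (column names and sample data assumed):

```python
import pandas as pd

df = pd.DataFrame({
    "order_amount": [250.0, -5.0],
    "delivery_minutes": [45, 300],
    "order_status": ["DELIVERED", "UNKNOWN"],
})

ALLOWED_STATUSES = {"PLACED", "DELIVERED", "CANCELLED"}

# Each rule yields a boolean Series: True means the row passes.
rules = {
    "amount_positive": df["order_amount"] > 0,
    "delivery_in_range": df["delivery_minutes"].between(0, 180),
    "status_allowed": df["order_status"].isin(ALLOWED_STATUSES),
}
for name, passed in rules.items():
    print(name, "failures:", int((~passed).sum()))
```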
Step 6: Generate Data Quality Score
Assign weights to rules (critical vs non-critical)
Calculate overall data quality score
Categorize issues:
Critical
Warning
Informational
Purpose: Measure readiness of the file.
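The scoring itself is simple weighted arithmetic. A sketch with made-up pass rates and weights, where critical rules are weighted 3 and non-critical rules 1 purely as an assumption:

```python
# rule name -> (pass_rate, weight); values are illustrative only.
rule_results = {
    "amount_positive":   (0.98, 3),  # critical
    "delivery_in_range": (0.95, 3),  # critical
    "status_allowed":    (0.90, 1),  # non-critical
}

# Overall score = weighted average of pass rates.
total_weight = sum(w for _, w in rule_results.values())
score = sum(rate * w for rate, w in rule_results.values()) / total_weight
print(f"overall data quality score: {score:.1%}")
```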
Step 7: Create Data Quality Report
Generate profiling reports from Informatica Analyst
Report includes:
Column statistics
Failed rules
Duplicate counts
Data quality score
Purpose: Provide transparency to stakeholders.
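Informatica Analyst generates these reports natively; to make the structure concrete, here is a hypothetical report payload assembled as JSON (every field name and value is illustrative):

```python
import json

# Hypothetical shape of a data quality report for one file.
report = {
    "file": "orders.csv",
    "column_statistics": {"order_amount": {"null_pct": 0.0, "min": -5.0, "max": 250.0}},
    "failed_rules": {"amount_positive": 1},
    "duplicate_count": 1,
    "data_quality_score": 0.96,
}
print(json.dumps(report, indent=2))
```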
Step 8: Go / No-Go decision
If critical rules fail → Reject the file
Notify source system / upstream team
Reload only after correction
Purpose: Prevent bad data from entering the warehouse.
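The gate itself reduces to a small decision function. A sketch assuming a 0.95 acceptance threshold, which is an invented example rather than an IDQ default:

```python
# Reject the file if any critical rule fails or the overall score
# falls below the acceptance threshold (assumed 0.95 for illustration).
def go_no_go(critical_failures: int, score: float, threshold: float = 0.95) -> str:
    if critical_failures > 0 or score < threshold:
        return "NO-GO: reject file, notify upstream team, reload after correction"
    return "GO: load into warehouse"

print(go_no_go(critical_failures=1, score=0.96))
```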
How I ensure the file is in good shape
A file is considered ready when schema validation passes, nulls and duplicates are within thresholds, business rules are satisfied, and the overall data quality score meets acceptance criteria.
Strong closing line (interview gold)
“Using Informatica Data Quality, I profile the data structurally, statistically, and against business rules, generate a data quality report, and enforce Go/No-Go criteria before loading the data.”