Data Quality Problems Explained: Why Bad Data Ruins Analysis
In 2018, British Airways suffered a massive data breach affecting 380,000 customers—payment card details, names, and addresses stolen. The breach was serious. But the aftermath revealed an equally serious problem: BA couldn't determine with certainty who was affected.
Their customer database had severe quality issues. Duplicate records. Inconsistent formats. Missing fields. Outdated contact information. When BA needed to notify affected customers quickly, they couldn't trust their own data. Some customers received multiple notifications (duplicates). Others received none (missing or wrong contact data). The data quality problems, invisible during normal operations, became catastrophic during crisis response.
The incident cost BA £20 million in regulatory fines—but the data quality issues likely cost far more in lost customer trust, inefficient operations, and wasted resources on an ongoing basis.
This illustrates a fundamental truth: data quality problems are invisible until they're catastrophic. Organizations operate with poor-quality data for years, making suboptimal decisions, wasting resources, and eroding trust, without recognizing the root cause. The cost of poor data quality is pervasive but diffuse—death by a thousand cuts rather than one obvious disaster.
"Garbage in, garbage out"—this principle is well-known but poorly heeded. Even the most sophisticated machine learning algorithms, elegant visualizations, and rigorous statistical methods cannot overcome fundamentally flawed input data. Data quality is the foundation. Without it, everything built on top is unstable.
This article explains data quality problems comprehensively: what data quality means, common quality issues organizations face, how these problems impact analysis and decisions, techniques for detecting quality problems, strategies for prevention and improvement, organizational approaches to data quality management, and practical frameworks for balancing quality with cost and speed.
Defining Data Quality: Dimensions and Standards
Data quality measures how well data serves its intended purpose. It is not a single characteristic; it has multiple dimensions.
The Six Core Dimensions of Data Quality
1. Accuracy
Definition: Data correctly represents the real-world entity or event it describes.
Examples of inaccuracy:
- Customer's address in database is wrong (they moved)
- Product price is incorrect (data entry error)
- Sensor reading is off (calibration problem)
- Transaction amount is wrong (software bug)
2. Completeness
Definition: All required data is present; no critical missing values.
Examples of incompleteness:
- Customer record missing email address
- Transaction record missing timestamp
- Product description missing key specifications
- Survey responses with blank required fields
3. Consistency
Definition: Same data represented identically across systems; no contradictions within or across datasets.
Examples of inconsistency:
- Customer has different addresses in sales vs. billing systems
- Product categorized differently in warehouse vs. website
- Date formats vary across fields (MM/DD/YYYY vs. DD-MM-YYYY)
- Customer name spelled differently in different records
4. Timeliness
Definition: Data is up-to-date and available when needed.
Examples of staleness:
- Inventory data from yesterday (a customer orders an item that appears to be in stock but isn't)
- Customer credit status from last month (no longer accurate)
- Dashboard showing week-old metrics (decisions based on outdated info)
- Product catalog not reflecting recent changes
5. Validity
Definition: Data conforms to defined formats, types, and business rules.
Examples of invalidity:
- Phone numbers with the wrong number of digits
- Dates like February 30th (impossible)
- Age values of 250 years (unrealistic)
- Email addresses without @ symbol
6. Uniqueness
Definition: No unintended duplicates; each real-world entity represented once.
Examples of duplication:
- Same customer in database twice with slight name variations
- Duplicate transaction records (charging customer twice)
- Same product listed under different SKUs
- Multiple employee records for same person
Additional Quality Dimensions
Beyond the core six, other dimensions matter in specific contexts:
| Dimension | Definition | Example Issue |
|---|---|---|
| Integrity | Relationships between data elements maintained correctly | Foreign key pointing to non-existent record |
| Precision | Appropriate level of detail | Storing $1,234.567891 when cent-level precision is sufficient |
| Believability | Data perceived as true and credible by users | Sales figures so far off that users distrust the entire system |
| Accessibility | Data available to authorized users when needed | Critical data locked in system only two people can access |
| Conformity | Data follows standards and conventions | Product codes don't match industry standards |
Common Data Quality Problems
Understanding typical problems helps recognize them in your data.
Problem 1: Missing Data
Description: Records have null, blank, or absent values where data should exist.
Causes:
- Fields not marked required in data entry forms
- Integration processes that don't map all fields
- Users skipping optional fields
- Data loss during migrations or transformations
- Sensors or systems failing to record values
Impact: Analysis excluding incomplete records may miss patterns or create bias. Algorithms can't process missing values without special handling.
Example: Customer survey with optional income field. 70% of respondents skip it. Analysis of income vs. product preference is impossible for most customers, and respondents who provide income may be systematically different (bias).
Problem 2: Duplicate Records
Description: Same real-world entity recorded multiple times with slight variations.
Causes:
- Data entry by multiple people without checking for existing records
- Merging data from multiple sources without de-duplication
- Variations in names, addresses (abbreviations, typos, formatting)
- Lack of unique identifiers
- System bugs creating duplicate records
Impact: Inflated counts, double-counting in metrics, multiple mailings to same person, customer confusion when contacted multiple times.
Example: CRM system has "John Smith, 123 Main St, NYC" and "J. Smith, 123 Main Street, New York City" as separate records. Both get marketing emails. Metric showing "unique customers" is overstated.
Problem 3: Inconsistent Formats and Standards
Description: Same type of data represented differently across records or systems.
Causes:
- No enforced standards for data entry
- Different systems with different conventions
- International operations with regional formats
- Historical changes in standards not applied retroactively
- Manual data entry without validation
Impact: Difficulty aggregating or joining data. Pattern matching fails. Manual cleanup required before analysis.
Examples:
- Phone numbers: "(555) 123-4567" vs. "555-123-4567" vs. "5551234567"
- Dates: "01/02/2026" (Jan 2 or Feb 1?), "2026-01-02", "January 2, 2026"
- Names: "Smith, John" vs. "John Smith" vs. "SMITH, JOHN"
- Units: Mixing metric and imperial measurements without labels
Problem 4: Incorrect Data
Description: Data present but factually wrong.
Causes:
- Human error: Typos, transposed digits, misreading
- Measurement error: Inaccurate instruments, calibration issues
- Processing errors: Software bugs, calculation mistakes
- Outdated data: Was correct when entered but situation changed
- Intentional errors: Users gaming metrics or entering fake data
Impact: Wrong decisions based on false information. Loss of trust when errors discovered.
Example: E-commerce site's inventory count is wrong (software bug during last update). Shows 50 units in stock; actually have 5. Website accepts orders it can't fulfill. Customer dissatisfaction, refunds, revenue loss.
Problem 5: Data Integration Issues
Description: Problems arising when combining data from multiple sources.
Causes:
- Different schemas and field names across systems
- Conflicting keys or identifiers
- Timing differences (one system updates hourly, another daily)
- Different business rules or definitions
- Transformations introducing errors
Impact: Integrated dataset has quality issues even if source systems are individually sound.
Example: Merging customer data from website (uses email as key) and store (uses phone as key). Can't reliably link records. Some customers duplicated, others with conflicting information.
Problem 6: Schema Evolution Problems
Description: Changes to data structures break downstream processes.
Causes:
- Adding or removing fields without coordinating with consumers
- Changing data types (string to number)
- Renaming fields
- Changing meaning of existing fields
- Lack of versioning or migration planning
Impact: Pipelines break, queries fail, reports show errors, applications crash.
Example: API changes field name from "customerId" to "customer_id". All downstream systems using old name now receive null values. Data appears to be missing customers entirely.
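A tiny illustration of how such a rename plays out downstream. The payload shape and field names here are hypothetical, but the failure mode matches the scenario above: the consumer keeps reading the old key and silently receives nothing.

```python
# Hypothetical payloads illustrating the rename described above.
old_payload = {"customerId": 42, "total": 99.0}
new_payload = {"customer_id": 42, "total": 99.0}   # upstream API renamed the field

def extract_customer(payload: dict):
    # The downstream consumer still reads the old field name.
    return payload.get("customerId")

print(extract_customer(old_payload))  # 42
print(extract_customer(new_payload))  # None -> customers appear to be "missing"
```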
Problem 7: Outliers and Anomalies
Description: Values far outside expected ranges—sometimes errors, sometimes legitimate edge cases.
Causes:
- Data entry errors (extra zeros, decimal place mistakes)
- Sensor malfunctions
- System glitches
- Actual rare events
- Test data mixed with production data
Impact: Statistical analyses distorted. Algorithms trained on outliers learn wrong patterns.
Example: Sales dataset shows transaction for $1,000,000 when typical transaction is $100. Investigation reveals data entry error—should have been $1,000. Including this in average sales calculation drastically skews results.
The Cost and Impact of Poor Data Quality
Data quality problems create tangible and intangible costs.
Direct Costs
1. Wasted operational expenses
Poor data leads to inefficiency:
- Sending marketing to wrong addresses (wasted postage)
- Manufacturing wrong quantities (inventory costs)
- Shipping to incorrect locations (logistics costs)
- Processing duplicate transactions (refunds, reconciliation)
Research by IBM estimated that poor data quality costs the US economy $3.1 trillion annually.
2. Analyst time spent on data cleaning
Studies consistently show analysts spend 50-80% of their time cleaning and preparing data before analysis.
This is hugely expensive:
- Senior analysts doing data janitor work
- Delayed insights (cleaning takes weeks before analysis starts)
- Frustration and burnout
3. Failed projects and initiatives
Data quality issues doom projects:
- Machine learning models trained on bad data perform poorly
- Data warehouse migration fails due to source data issues
- Business intelligence dashboard produces wrong numbers, is abandoned
- Customer segmentation based on incorrect data targets wrong people
4. Regulatory fines and legal costs
Poor data quality creates compliance violations:
- GDPR requires accurate personal data—errors mean violations
- Financial reporting errors from bad data (Sarbanes-Oxley violations)
- Healthcare data errors causing patient harm (HIPAA issues)
Indirect Costs
1. Bad decisions
Executives making strategic decisions based on flawed data:
- Expanding into market that doesn't exist (bad market research data)
- Cutting product that's actually profitable (incorrect cost allocation)
- Targeting wrong customer segments (flawed customer data)
2. Lost trust in data systems
Once users encounter errors, they stop trusting data:
- Decision-makers revert to intuition instead of data-driven approaches
- Reports are ignored or second-guessed constantly
- Data initiatives lose funding and support
Rebuilding trust is far harder than building it initially.
3. Customer dissatisfaction
Poor data quality directly affects customers:
- Wrong addresses mean delayed or missed deliveries
- Duplicate records mean multiple unwanted contacts
- Incorrect preferences mean irrelevant recommendations
- Outdated information means inappropriate service
4. Competitive disadvantage
Competitors with better data quality:
- Make faster, better decisions
- Build better models and predictions
- Serve customers more effectively
- Optimize operations more efficiently
Detecting Data Quality Problems
You can't fix problems you don't know exist. Detection is critical.
Technique 1: Data Profiling
Automated statistical analysis revealing data characteristics.
Metrics to examine:
- Completeness: % of null values per field
- Cardinality: Number of distinct values (detect if field should be unique but isn't)
- Distribution: Min, max, mean, median, standard deviation
- Patterns: Common formats, frequent values
- Outliers: Values far from typical ranges
Example: Profile customer age field. Find:
- 15% null values
- Min: -5 (impossible), Max: 250 (unrealistic)
- Mean: 145 (way too high—data quality issue)
Investigation reveals a systematic error: birth years were stored without the century (1945 stored as 45), and the downstream age calculation treats those two-digit values as literal birth years, producing impossible ages. Simple profiling exposed the problem.
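As a rough sketch of what this profiling looks like in practice, the snippet below computes those metrics with pandas. The DataFrame and column name are illustrative, not drawn from any particular system.

```python
# A minimal profiling sketch using pandas; the data and column name are illustrative.
import pandas as pd

def profile_numeric_column(df: pd.DataFrame, column: str) -> dict:
    """Summarize completeness, cardinality, range, and distribution for one field."""
    series = df[column]
    return {
        "null_pct": round(series.isna().mean() * 100, 1),  # completeness
        "distinct": series.nunique(),                       # cardinality
        "min": series.min(),
        "max": series.max(),
        "mean": series.mean(),
        "median": series.median(),
        "std": series.std(),
    }

# A suspicious age column like the one described above.
customers = pd.DataFrame({"age": [34, 45, None, -5, 250, 145, 29, None]})
print(profile_numeric_column(customers, "age"))
```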
Technique 2: Validation Rules and Constraints
Define business rules and check data against them.
Types of rules:
- Format constraints: Email must contain @, phone must be 10 digits
- Range constraints: Age between 0-120, price > 0
- Referential integrity: Foreign keys must reference existing records
- Business logic: Order date must be before ship date
- Uniqueness constraints: Email address appears only once
Implementation: Database constraints, validation in data entry forms, checks in data pipelines.
Example: Payment processing system enforces: amount > 0, currency code in ISO list, card number passes Luhn algorithm check. Invalid data rejected at entry, preventing propagation.
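A minimal sketch of such rules in plain Python. The currency list is deliberately incomplete and the function names are hypothetical; the Luhn check shown is the standard checksum described above.

```python
# Illustrative entry-time validation rules for the payment example above.
ISO_CURRENCIES = {"USD", "EUR", "GBP", "JPY"}  # stand-in for the full ISO 4217 list

def luhn_valid(card_number: str) -> bool:
    """Return True if the digits pass the Luhn checksum."""
    digits = [int(d) for d in card_number if d.isdigit()]
    if not digits:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:          # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def validate_payment(amount: float, currency: str, card_number: str) -> list[str]:
    """Collect rule violations instead of silently accepting bad data."""
    errors = []
    if amount <= 0:
        errors.append("amount must be positive")
    if currency not in ISO_CURRENCIES:
        errors.append(f"unknown currency code: {currency}")
    if not luhn_valid(card_number):
        errors.append("card number fails Luhn check")
    return errors

print(validate_payment(49.99, "USD", "4539 1488 0343 6467"))  # [] -> accepted
print(validate_payment(-5, "XYZ", "1234"))                    # three violations
```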
Technique 3: Duplicate Detection
Algorithms identifying similar or identical records representing same entity.
Approaches:
- Exact matching: All fields identical (misses variations)
- Fuzzy matching: String similarity algorithms (Levenshtein distance, Jaro-Winkler)
- Probabilistic matching: Weight multiple fields, calculate match probability
- Machine learning: Trained models predicting if records are duplicates
Challenge: Balancing false positives (flagging different entities as duplicates) vs. false negatives (missing actual duplicates).
Example: Customer database has:
- John Smith, 123 Main St, jsmith@email.com
- J Smith, 123 Main Street, jsmith@email.com
Fuzzy matching on name + exact matching on email + address normalization flags as likely duplicate. Manual review confirms—merge records.
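The sketch below approximates that matching logic using only the standard library. The similarity threshold and the choice of fields to compare are assumptions; production systems typically use dedicated fuzzy-matching libraries and probabilistic weights across many fields.

```python
# A rough duplicate-detection sketch; thresholds and field choices are illustrative.
from difflib import SequenceMatcher

def name_similarity(a: str, b: str) -> float:
    """Normalized string similarity between 0 and 1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def likely_duplicate(rec_a: dict, rec_b: dict, threshold: float = 0.7) -> bool:
    """Flag two records if emails match exactly and names are similar enough."""
    same_email = rec_a["email"].strip().lower() == rec_b["email"].strip().lower()
    similar_name = name_similarity(rec_a["name"], rec_b["name"]) >= threshold
    return same_email and similar_name

a = {"name": "John Smith", "address": "123 Main St", "email": "jsmith@email.com"}
b = {"name": "J Smith", "address": "123 Main Street", "email": "jsmith@email.com"}
print(likely_duplicate(a, b))  # True -> queue for manual review and merge
```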
Technique 4: Cross-System Reconciliation
Compare data across systems to identify inconsistencies.
Process:
- Select common entities (customers, transactions, products)
- Extract from multiple systems
- Compare values for same entity
- Flag discrepancies for investigation
Example: Reconcile sales recorded in POS system vs. inventory reduction in warehouse system. Mismatch indicates either sales data error, inventory tracking error, or theft. Daily reconciliation catches issues quickly.
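A minimal reconciliation sketch with pandas, using made-up table and column names. The pattern is the same at any scale: join the systems on a shared key and flag entities whose values disagree.

```python
# Compare POS sales against warehouse movements; data and columns are illustrative.
import pandas as pd

pos_sales = pd.DataFrame({"sku": ["A1", "B2", "C3"], "units_sold": [10, 4, 7]})
warehouse_moves = pd.DataFrame({"sku": ["A1", "B2", "C3"], "units_shipped": [10, 4, 5]})

# Join on the shared key and flag rows where the two systems disagree.
merged = pos_sales.merge(warehouse_moves, on="sku", how="outer")
merged["discrepancy"] = merged["units_sold"] != merged["units_shipped"]
print(merged[merged["discrepancy"]])  # C3: sold 7, shipped 5 -> investigate
```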
Technique 5: Trend Analysis and Anomaly Detection
Monitor metrics over time—sudden changes often indicate quality issues.
What to monitor:
- Completeness rates per field
- Record counts (sudden spike or drop)
- Value distributions (mean, variance changing)
- Format pattern shifts
- Error rates in validation checks
Example: Daily customer signups average 100. One day jumps to 10,000. Investigation reveals bot attack creating fake accounts. Without monitoring, fake data would pollute database.
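In its simplest form this check is a z-score against recent history, as in the sketch below. The numbers and the three-standard-deviation threshold are illustrative; real pipelines usually also account for seasonality.

```python
# A simple anomaly check on daily record counts; figures and threshold are assumed.
import statistics

daily_signups = [98, 102, 95, 110, 101, 99, 10_000]  # last value is suspicious

history, latest = daily_signups[:-1], daily_signups[-1]
mean = statistics.mean(history)
stdev = statistics.stdev(history)

z_score = (latest - mean) / stdev
if abs(z_score) > 3:  # flag values more than 3 standard deviations from recent normal
    print(f"Alert: {latest} signups today (z = {z_score:.1f}); investigate before loading.")
```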
Technique 6: User Feedback Loops
Frontline users encounter quality issues before analysts do.
Mechanisms:
- Easy reporting buttons ("report data issue")
- Regular check-ins with heavy data users
- Review of support tickets mentioning data problems
- Post-incident reviews when operational issues trace to data
Example: Sales reps report customers saying "that's not my address." Aggregate feedback reveals 20% of addresses in region are wrong. Investigation finds recent data migration introduced errors.
Preventing and Fixing Data Quality Problems
Prevention is vastly more effective than cure. But both are needed.
Prevention Strategy 1: Validation at Data Entry
Catch errors when data is created, not downstream.
Implementation:
- Form validation: Required fields, format checks, range validation
- Dropdown menus: For standardized values, prevent free-text entry
- Auto-completion: Suggest valid values as user types
- Confirmation screens: Show user what they entered, ask to confirm
- Real-time checks: Verify against external sources (address validation APIs)
Example: E-commerce checkout validates shipping address against postal service database in real-time. Invalid address? User prompted to correct before order submits. Prevents undeliverable shipments.
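A hedged sketch of what entry-point validation might look like for a generic form, combining required fields, an approved value list (the dropdown case), and format checks. The field names, patterns, and allowed values are all illustrative.

```python
# Illustrative form-validation rules applied before data is accepted into the system.
import re

FORM_RULES = {
    "email":   {"required": True,  "pattern": r"^[^@\s]+@[^@\s]+\.[^@\s]+$"},
    "country": {"required": True,  "allowed": {"US", "CA", "GB", "DE"}},  # dropdown values
    "phone":   {"required": False, "pattern": r"^\d{10}$"},
}

def validate_form(submission: dict) -> dict:
    """Return a map of field -> error message; an empty dict means the entry is accepted."""
    errors = {}
    for field, rules in FORM_RULES.items():
        value = (submission.get(field) or "").strip()
        if not value:
            if rules.get("required"):
                errors[field] = "required field is missing"
            continue
        if "allowed" in rules and value not in rules["allowed"]:
            errors[field] = f"value '{value}' is not in the approved list"
        if "pattern" in rules and not re.fullmatch(rules["pattern"], value):
            errors[field] = "format is invalid"
    return errors

print(validate_form({"email": "jane@example.com", "country": "US", "phone": "5551234567"}))  # {}
print(validate_form({"email": "not-an-email", "country": "Mars"}))  # two errors
```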
Prevention Strategy 2: Standardization and Conventions
Enforce consistent formats across organization.
Standards to establish:
- Naming conventions: How to represent names, addresses, product codes
- Date/time formats: ISO 8601 or other standard, consistently applied
- Unit standards: Always metric or always imperial, never mixed
- Code lists: Standard values for categories, statuses, types
- Identifiers: Unique ID schemes for key entities
Example: Company mandate: All dates stored as YYYY-MM-DD in UTC timezone. All systems comply. Eliminates date format ambiguity and timezone conversion errors.
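Converting local timestamps into that mandated form is straightforward with the standard library, as sketched below; the input format string and source timezone are assumptions about one upstream system.

```python
# Normalize a local timestamp to ISO 8601 in UTC; input format and timezone are assumed.
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def to_utc_iso(raw: str, fmt: str, source_tz: str) -> str:
    """Parse a local timestamp and re-emit it as ISO 8601 in UTC."""
    local = datetime.strptime(raw, fmt).replace(tzinfo=ZoneInfo(source_tz))
    return local.astimezone(timezone.utc).isoformat()

# "01/02/2026 14:30" entered by a New York office using US month-first format.
print(to_utc_iso("01/02/2026 14:30", "%m/%d/%Y %H:%M", "America/New_York"))
# -> 2026-01-02T19:30:00+00:00
```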
Prevention Strategy 3: Master Data Management (MDM)
Single source of truth for critical entities.
Concept: Key entities (customers, products, suppliers, employees) managed centrally. Other systems reference master data rather than maintaining own copies.
Benefits:
- One place to ensure quality
- Updates propagate to all systems
- Reduces inconsistencies and duplicates
- Clear ownership and governance
Example: Customer master contains authoritative customer records. CRM, billing, support, and marketing systems all read from and write to customer master. Changes in one system update master and propagate everywhere. Eliminates sync issues.
Prevention Strategy 4: Data Governance
Organizational structure ensuring accountability and standards.
Components:
- Data owners: Business experts responsible for specific data domains
- Data stewards: Day-to-day management and quality monitoring
- Policies: Standards, procedures, roles and responsibilities
- Quality metrics: KPIs tracking data quality over time
- Issue resolution processes: How to report and fix problems
Cultural element: Make data quality everyone's responsibility, not just IT or data team.
Fixing Strategy 1: Data Cleaning and Remediation
Systematic correction of known issues.
Techniques:
- De-duplication: Merge duplicate records using matching rules
- Standardization: Convert to standard formats (parse and reformat addresses, phone numbers)
- Correction: Fix known errors (typos, wrong values)
- Enrichment: Append missing data from external sources
- Validation and filtering: Remove or quarantine records failing quality checks
Tools: Data quality platforms (Informatica, Talend, Trifacta), custom scripts, SQL queries.
Example: Batch process runs nightly on customer database:
- Detect duplicates using fuzzy matching
- Merge duplicates, keeping most complete record
- Standardize all phone numbers to (XXX) XXX-XXXX format
- Validate addresses against postal database, flag invalid ones
- Report showing issues fixed and remaining problems
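Two of those steps, phone standardization and keeping the most complete record when merging, might look like the sketch below. The record shape and rules are simplified assumptions, not a full remediation job.

```python
# Simplified pieces of a nightly cleanup batch; record shape and rules are illustrative.
import re

def standardize_phone(raw: str) -> str | None:
    """Reduce any phone string to digits and reformat as (XXX) XXX-XXXX."""
    digits = re.sub(r"\D", "", raw or "")
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]                       # drop US country code
    if len(digits) != 10:
        return None                               # flag for manual review
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"

def most_complete(records: list[dict]) -> dict:
    """Of a duplicate group, keep the record with the fewest empty fields."""
    return max(records, key=lambda r: sum(1 for v in r.values() if v not in (None, "")))

print(standardize_phone("555.123.4567"))     # (555) 123-4567
print(most_complete([
    {"name": "J Smith", "email": "", "phone": "5551234567"},
    {"name": "John Smith", "email": "jsmith@email.com", "phone": "5551234567"},
]))
```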
Fixing Strategy 2: Root Cause Analysis
Don't just fix symptoms—identify and eliminate causes.
Process:
- Document quality issue
- Investigate: How did bad data enter system?
- Identify root cause (bad process, missing validation, user error, system bug)
- Implement fix at source
- Clean existing bad data
- Monitor to ensure fix worked
Example: Analysis shows 30% of product descriptions have incorrect specifications. Root cause investigation reveals:
- Vendors submit data via email
- Data entry team manually types into system
- No validation against vendor data
- Frequent typos and misinterpretation
Fix: Implement automated vendor portal where vendors enter data directly into system with validation rules. Reduces manual entry, catches errors at source. Existing data cleaned and validated with vendors.
Organizational Approaches to Data Quality Management
Sustainable quality requires organizational capability, not just technical fixes.
Approach 1: Centralized Data Quality Team
Dedicated team responsible for quality across organization.
Responsibilities:
- Define quality standards and metrics
- Build and maintain data quality tools
- Monitor quality dashboards
- Coordinate cleanup efforts
- Train organization on quality practices
Pros: Expertise concentration, consistent approaches, clear ownership.
Cons: Can become bottleneck, may be disconnected from business context.
When effective: Large organizations with complex data landscapes needing specialized skills.
Approach 2: Federated/Distributed Ownership
Quality managed by data domain owners with central governance.
Model:
- Business units own their data domains (sales owns customer data, supply chain owns inventory data)
- Central governance sets standards and policies
- Data stewards in each domain ensure quality
- Central team provides tools, training, and oversight
Pros: Business expertise applied to data, ownership clear, scales better.
Cons: Requires mature organization, coordination challenges.
When effective: Organizations where data quality is business-critical and domain knowledge essential.
Approach 3: Continuous Quality Monitoring
Ongoing measurement rather than periodic assessments.
Implementation:
- Dashboards: Real-time visibility into quality metrics
- Automated alerts: Notify when quality thresholds breached
- Quality gates: Prevent bad data from entering critical systems
- SLAs: Define acceptable quality levels for different data
Example: Data pipeline includes quality checks after each transformation step. Completeness, validity, consistency checked. If quality falls below threshold, pipeline pauses and alerts team. Bad data doesn't propagate to analytics or production systems.
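In its simplest form, such a gate is just a function that runs after a transformation and refuses to pass bad batches along. The checks, thresholds, and column names below are illustrative; tools like Great Expectations package the same idea with richer reporting.

```python
# A plain-Python quality gate between pipeline steps; thresholds and columns are assumed.
import pandas as pd

def quality_gate(df: pd.DataFrame) -> list[str]:
    """Run post-transformation checks; return failures that should pause the pipeline."""
    failures = []
    if df["customer_id"].isna().mean() > 0.01:               # completeness
        failures.append("customer_id completeness below 99%")
    if not df["order_total"].ge(0).all():                     # validity
        failures.append("negative order totals found")
    if df["order_id"].duplicated().any():                     # uniqueness
        failures.append("duplicate order_id values found")
    return failures

batch = pd.DataFrame({
    "order_id": [1, 2, 2],
    "customer_id": [10, None, 12],
    "order_total": [25.0, -3.0, 40.0],
})
issues = quality_gate(batch)
if issues:
    raise RuntimeError(f"Quality gate failed, halting load: {issues}")
```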
Approach 4: Quality by Design
Build quality into systems and processes from the start rather than fixing later.
Principles:
- Fail fast: Reject invalid data at entry point
- Minimize manual entry: Automate data capture when possible (APIs, integrations, sensors)
- Simplify: Fewer fields, clearer definitions, less room for error
- Validate continuously: Quality checks throughout data lifecycle
- Learn from failures: Every quality issue triggers process improvement
Example: Redesigning customer onboarding:
- Before: 50-field form, manual entry, minimal validation, 40% error rate
- After: Progressive profiling (collect data over time), API integrations pre-fill fields, real-time validation, dropdown menus for standard values, 5% error rate
Balancing Data Quality with Speed and Cost
Perfect data is impossible and unnecessary. Pragmatic approaches balance trade-offs.
Principle 1: Fit for Purpose
Quality required depends on use case.
High-quality requirements:
- Financial reporting (regulatory compliance)
- Medical records (patient safety)
- Machine learning training data (model accuracy depends on it)
- Customer-facing personalization (errors visible and harmful)
Lower-quality acceptable:
- Exploratory analysis (rough trends)
- Internal rough estimates
- Prototyping and experimentation
- Historical archive (not actively used)
Example: Customer email addresses—99.9% accuracy needed for transactional emails; 90% acceptable for one-time market research survey.
Principle 2: Risk-Based Prioritization
Focus quality efforts where impact is highest.
Prioritization matrix:
| Data | Impact of Poor Quality | Quality Priority |
|---|---|---|
| Customer payment info | Very high (financial, legal, trust) | Highest |
| Product catalog | High (revenue, customer experience) | High |
| Customer preferences | Medium (personalization less effective) | Medium |
| Internal logs | Low (debugging harder but not critical) | Low |
Invest most in highest priority data. Accept lower quality in low-priority areas.
Principle 3: Progressive Quality Improvement
Improve incrementally rather than attempting perfection immediately.
Approach:
- Baseline: Measure current quality
- Quick wins: Fix easiest high-impact issues first
- Prevent new issues: Stop degradation (validation at entry)
- Systematic cleanup: Gradually clean existing data
- Continuous monitoring: Track improvement, catch regressions
Example:
- Month 1: Add validation to data entry forms (prevent new bad data)
- Month 2: De-duplicate customer database (quick win, high impact)
- Month 3: Standardize address formats
- Month 4: Enrich missing email addresses from third-party data
- Month 5: Build ongoing monitoring dashboard
Each month delivers value. Avoids "boil the ocean" mega-project that never finishes.
Principle 4: Cost of Poor Quality Analysis
Justify quality investments by quantifying costs of poor quality.
Calculate:
- Operational waste from errors
- Analyst time cleaning data
- Failed initiatives due to data issues
- Customer impacts (lost sales, dissatisfaction)
- Compliance risks
Example calculation:
- 5 analysts @ $100k/year spending 60% of their time cleaning = $300k/year
- Marketing sends 1M emails/year, 20% to wrong addresses = $50k wasted
- Data quality issues caused 3 project failures last year = $500k
- Total cost of poor quality: ~$850k/year
- Investment to improve quality: $200k (new tools and processes)
- ROI: 4.25x, payback period: 3 months
Business case for quality investment becomes clear when costs are quantified.
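The same arithmetic, spelled out so the assumptions are explicit (every figure here is illustrative):

```python
# Back-of-envelope cost-of-poor-quality calculation from the example above.
analyst_cost    = 5 * 100_000 * 0.60     # analysts cleaning data      -> $300k
wasted_mailing  = 50_000                 # emails to wrong addresses   -> $50k
failed_projects = 500_000                # three failed initiatives    -> $500k

cost_of_poor_quality = analyst_cost + wasted_mailing + failed_projects  # $850k/year
investment = 200_000

roi = cost_of_poor_quality / investment                      # 4.25x
payback_months = investment / (cost_of_poor_quality / 12)    # ~2.8 months
print(f"ROI: {roi:.2f}x, payback: {payback_months:.1f} months")
```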
Tools and Technologies for Data Quality
Technology doesn't solve organizational problems, but it helps execute solutions.
Data Profiling Tools
Analyze data to reveal quality issues
Examples: Ataccama ONE, Informatica Data Quality, Talend Data Preparation
Capabilities: Statistical profiling, pattern detection, completeness analysis, relationship discovery.
Data Validation and Cleansing
Automate detection and correction of quality issues
Examples: Trifacta Wrangler, OpenRefine, custom Python/R scripts
Capabilities: Format standardization, de-duplication, validation rules, transformation.
Master Data Management Platforms
Manage golden records for key entities
Examples: Informatica MDM, SAP Master Data Governance, Microsoft Master Data Services
Capabilities: Single source of truth, data stewardship workflows, conflict resolution, data governance.
Data Quality Monitoring and Observability
Continuous tracking of quality metrics
Examples: Great Expectations, Datafold, Monte Carlo Data, custom dashboards
Capabilities: Automated testing, anomaly detection, alerting, quality SLAs.
Data Governance Platforms
Manage policies, ownership, lineage, and accountability
Examples: Collibra, Alation, Informatica Data Governance
Capabilities: Data catalog, business glossary, policy management, lineage tracking, workflow automation.
Conclusion: Quality as Foundation, Not Afterthought
The most sophisticated analytics, elegant machine learning, and beautiful visualizations are worthless if built on poor-quality data. Data quality is not a technical problem solved once—it's an ongoing organizational capability requiring people, processes, and technology working together.
The key insights:
1. Poor data quality is expensive and pervasive—costing organizations trillions in waste, bad decisions, and missed opportunities. Most organizations underestimate these costs because they're diffuse rather than concentrated.
2. Quality has multiple dimensions—accuracy, completeness, consistency, timeliness, validity, and uniqueness all matter. Measuring only one dimension gives incomplete picture.
3. Prevention is vastly more effective than cure—catching errors at data creation (validation, standardization, good process design) costs far less than cleaning bad data later. Build quality in, don't inspect it in.
4. Detection requires multiple approaches—data profiling, validation rules, duplicate detection, cross-system reconciliation, trend monitoring, and user feedback all play roles. No single technique catches all issues.
5. Organizational capability matters more than technology—clear ownership, governance, standards, accountability, and quality culture drive success. Tools enable good processes but can't replace them.
6. Balance quality with pragmatism—perfect data is impossible and unnecessary. Focus quality efforts where impact is highest, accept lower quality where stakes are low, improve incrementally rather than seeking perfection.
7. Quality builds trust, poor quality destroys it—once users encounter errors, they stop trusting data systems. Rebuilding trust is far harder than building it initially. Consistent quality over time creates data-driven culture; poor quality kills it.
As quality pioneer W. Edwards Deming emphasized: "You cannot inspect quality into a product." Similarly, you can't clean quality into data after the fact—you must build systems and processes that produce quality data from the start.
British Airways learned this lesson expensively. Your organization can learn it more cheaply by treating data quality as the foundation it is—not an afterthought to address when things go wrong.
Garbage in, garbage out. The corollary: Quality in, insight out. Your choice of investment determines which path you take.
References
Batini, C., & Scannapieco, M. (2016). Data and information quality: Dimensions, principles and techniques. Springer. https://doi.org/10.1007/978-3-319-24106-7
English, L. P. (1999). Improving data warehouse and business information quality: Methods for reducing costs and increasing profits. Wiley.
Haug, A., Zachariassen, F., & van Liempd, D. (2011). The costs of poor data quality. Journal of Industrial Engineering and Management, 4(2), 168–193. https://doi.org/10.3926/jiem.2011.v4n2.p168-193
IBM. (2016). Industrializing data quality: IBM Redbooks white paper. IBM Corporation.
Loshin, D. (2010). The practitioner's guide to data quality improvement. Morgan Kaufmann.
Nagle, T., Redman, T. C., & Sammon, D. (2017). Only 3% of companies' data meets basic quality standards. Harvard Business Review. https://hbr.org/2017/09/only-3-of-companies-data-meets-basic-quality-standards
Redman, T. C. (1998). The impact of poor data quality on the typical enterprise. Communications of the ACM, 41(2), 79–82. https://doi.org/10.1145/269012.269025
Redman, T. C. (2013). Data driven: Profiting from your most important business asset. Harvard Business Review Press.
Sebastian-Coleman, L. (2013). Measuring data quality for ongoing improvement: A data quality assessment framework. Morgan Kaufmann.
Wang, R. Y., & Strong, D. M. (1996). Beyond accuracy: What data quality means to data consumers. Journal of Management Information Systems, 12(4), 5–33. https://doi.org/10.1080/07421222.1996.11518099