Data Quality Problems Explained: Why Bad Data Ruins Analysis

In 2018, British Airways suffered a massive data breach affecting 380,000 customers—payment card details, names, and addresses stolen. The breach was serious. But the aftermath revealed an equally serious problem: BA couldn't determine with certainty who was affected.

Their customer database had severe quality issues. Duplicate records. Inconsistent formats. Missing fields. Outdated contact information. When BA needed to notify affected customers quickly, they couldn't trust their own data. Some customers received multiple notifications (duplicates). Others received none (missing or wrong contact data). The data quality problems, invisible during normal operations, became catastrophic during crisis response.

The incident cost BA £20 million in regulatory fines—but the data quality issues likely cost far more in lost customer trust, inefficient operations, and wasted resources on an ongoing basis.

This illustrates a fundamental truth: data quality problems are invisible until they're catastrophic. Organizations operate with poor-quality data for years, making suboptimal decisions, wasting resources, and eroding trust, without recognizing the root cause. The cost of poor data quality is pervasive but diffuse—death by a thousand cuts rather than one obvious disaster.

"Garbage in, garbage out"—this principle is well-known but poorly heeded. Even the most sophisticated machine learning algorithms, elegant visualizations, and rigorous statistical methods cannot overcome fundamentally flawed input data. Data quality is the foundation. Without it, everything built on top is unstable.

This article explains data quality problems comprehensively: what data quality means, common quality issues organizations face, how these problems impact analysis and decisions, techniques for detecting quality problems, strategies for prevention and improvement, organizational approaches to data quality management, and practical frameworks for balancing quality with cost and speed.


Defining Data Quality: Dimensions and Standards

Data quality measures how well data serves its intended purpose. It is not a single characteristic but a set of dimensions.

The Six Core Dimensions of Data Quality

1. Accuracy

Definition: Data correctly represents the real-world entity or event it describes.

Examples of inaccuracy:

  • Customer's address in database is wrong (they moved)
  • Product price is incorrect (data entry error)
  • Sensor reading is off (calibration problem)
  • Transaction amount is wrong (software bug)

2. Completeness

Definition: All required data is present; no critical missing values.

Examples of incompleteness:

  • Customer record missing email address
  • Transaction record missing timestamp
  • Product description missing key specifications
  • Survey responses with blank required fields

3. Consistency

Definition: Same data represented identically across systems; no contradictions within or across datasets.

Examples of inconsistency:

  • Customer has different addresses in sales vs. billing systems
  • Product categorized differently in warehouse vs. website
  • Date formats vary across fields (MM/DD/YYYY vs. DD-MM-YYYY)
  • Customer name spelled differently in different records

4. Timeliness

Definition: Data is up-to-date and available when needed.

Examples of staleness:

  • Inventory data from yesterday (a customer orders an item that appears to be in stock but isn't)
  • Customer credit status from last month (no longer accurate)
  • Dashboard showing week-old metrics (decisions based on outdated info)
  • Product catalog not reflecting recent changes

5. Validity

Definition: Data conforms to defined formats, types, and business rules.

Examples of invalidity:

  • Phone numbers with wrong number of digits
  • Dates like February 30th (impossible)
  • Age values of 250 years (unrealistic)
  • Email addresses without @ symbol

6. Uniqueness

Definition: No unintended duplicates; each real-world entity represented once.

Examples of duplication:

  • Same customer in database twice with slight name variations
  • Duplicate transaction records (charging customer twice)
  • Same product listed under different SKUs
  • Multiple employee records for same person

Additional Quality Dimensions

Beyond the core six, other dimensions matter in specific contexts:

  • Integrity: Relationships between data elements are maintained correctly. Example issue: a foreign key pointing to a non-existent record.
  • Precision: Data carries an appropriate level of detail. Example issue: storing $1,234.567891 when cents precision is sufficient.
  • Believability: Data is perceived as true and credible by users. Example issue: sales figures so far off that users distrust the entire system.
  • Accessibility: Data is available to authorized users when needed. Example issue: critical data locked in a system only two people can access.
  • Conformity: Data follows standards and conventions. Example issue: product codes that don't match industry standards.

Common Data Quality Problems

Understanding typical problems helps recognize them in your data.

Problem 1: Missing Data

Description: Records have null, blank, or absent values where data should exist.

Causes:

  • Fields not marked required in data entry forms
  • Integration processes that don't map all fields
  • Users skipping optional fields
  • Data loss during migrations or transformations
  • Sensors or systems failing to record values

Impact: Analysis excluding incomplete records may miss patterns or create bias. Algorithms can't process missing values without special handling.

Example: Customer survey with optional income field. 70% of respondents skip it. Analysis of income vs. product preference is impossible for most customers, and respondents who provide income may be systematically different (bias).

Problem 2: Duplicate Records

Description: Same real-world entity recorded multiple times with slight variations.

Causes:

  • Data entry by multiple people without checking for existing records
  • Merging data from multiple sources without de-duplication
  • Variations in names, addresses (abbreviations, typos, formatting)
  • Lack of unique identifiers
  • System bugs creating duplicate records

Impact: Inflated counts, double-counting in metrics, multiple mailings to same person, customer confusion when contacted multiple times.

Example: CRM system has "John Smith, 123 Main St, NYC" and "J. Smith, 123 Main Street, New York City" as separate records. Both get marketing emails. Metric showing "unique customers" is overstated.

Problem 3: Inconsistent Formats and Standards

Description: Same type of data represented differently across records or systems.

Causes:

  • No enforced standards for data entry
  • Different systems with different conventions
  • International operations with regional formats
  • Historical changes in standards not applied retroactively
  • Manual data entry without validation

Impact: Difficulty aggregating or joining data. Pattern matching fails. Manual cleanup required before analysis.

Examples:

  • Phone numbers: "(555) 123-4567" vs. "555-123-4567" vs. "5551234567"
  • Dates: "01/02/2026" (Jan 2 or Feb 1?), "2026-01-02", "January 2, 2026"
  • Names: "Smith, John" vs. "John Smith" vs. "SMITH, JOHN"
  • Units: Mixing metric and imperial measurements without labels

Problem 4: Incorrect Data

Description: Data present but factually wrong.

Causes:

  • Human error: Typos, transposed digits, misreading
  • Measurement error: Inaccurate instruments, calibration issues
  • Processing errors: Software bugs, calculation mistakes
  • Outdated data: Was correct when entered but situation changed
  • Intentional errors: Users gaming metrics or entering fake data

Impact: Wrong decisions based on false information. Loss of trust when errors discovered.

Example: E-commerce site's inventory count is wrong (software bug during last update). Shows 50 units in stock; actually have 5. Website accepts orders it can't fulfill. Customer dissatisfaction, refunds, revenue loss.

Problem 5: Data Integration Issues

Description: Problems arising when combining data from multiple sources.

Causes:

  • Different schemas and field names across systems
  • Conflicting keys or identifiers
  • Timing differences (one system updates hourly, another daily)
  • Different business rules or definitions
  • Transformations introducing errors

Impact: Integrated dataset has quality issues even if source systems are individually sound.

Example: Merging customer data from website (uses email as key) and store (uses phone as key). Can't reliably link records. Some customers duplicated, others with conflicting information.

Problem 6: Schema Evolution Problems

Description: Changes to data structures break downstream processes.

Causes:

  • Adding or removing fields without coordinating with consumers
  • Changing data types (string to number)
  • Renaming fields
  • Changing meaning of existing fields
  • Lack of versioning or migration planning

Impact: Pipelines break, queries fail, reports show errors, applications crash.

Example: API changes field name from "customerId" to "customer_id". All downstream systems using old name now receive null values. Data appears to be missing customers entirely.
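
Where consumers can't control the producer's schema, a small compatibility shim limits the blast radius of a rename. A minimal sketch in Python; the two field names come from the example above, everything else is illustrative:

    # Accept the old or new field name during a migration window instead of
    # silently reading nulls when the producer renames a field.
    def get_customer_id(record: dict) -> str:
        for key in ("customer_id", "customerId"):  # new name first, then legacy
            if key in record:
                return record[key]
        raise KeyError("customer id missing under any known field name")

    print(get_customer_id({"customerId": "C-1042"}))  # legacy payloads still work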

Problem 7: Outliers and Anomalies

Description: Values far outside expected ranges—sometimes errors, sometimes legitimate edge cases.

Causes:

  • Data entry errors (extra zeros, decimal place mistakes)
  • Sensor malfunctions
  • System glitches
  • Actual rare events
  • Test data mixed with production data

Impact: Statistical analyses distorted. Algorithms trained on outliers learn wrong patterns.

Example: Sales dataset shows transaction for $1,000,000 when typical transaction is $100. Investigation reveals data entry error—should have been $1,000. Including this in average sales calculation drastically skews results.


The Cost and Impact of Poor Data Quality

Data quality problems create tangible and intangible costs.

Direct Costs

1. Wasted operational expenses

Poor data leads to inefficiency:

  • Sending marketing to wrong addresses (wasted postage)
  • Manufacturing wrong quantities (inventory costs)
  • Shipping to incorrect locations (logistics costs)
  • Processing duplicate transactions (refunds, reconciliation)

Research by IBM estimated poor data quality costs the US economy $3.1 trillion annually.

2. Analyst time spent on data cleaning

Studies consistently show analysts spend 50-80% of their time cleaning and preparing data before analysis.

This is hugely expensive:

  • Senior analysts doing data janitor work
  • Delayed insights (cleaning takes weeks before analysis starts)
  • Frustration and burnout

3. Failed projects and initiatives

Data quality issues doom projects:

  • Machine learning models trained on bad data perform poorly
  • Data warehouse migration fails due to source data issues
  • Business intelligence dashboard produces wrong numbers, is abandoned
  • Customer segmentation based on incorrect data targets wrong people

4. Regulatory fines and legal costs

Poor data quality creates compliance violations:

  • GDPR requires accurate personal data—errors mean violations
  • Financial reporting errors from bad data (Sarbanes-Oxley violations)
  • Healthcare data errors causing patient harm (HIPAA issues)

Indirect Costs

1. Bad decisions

Executives making strategic decisions based on flawed data:

  • Expanding into market that doesn't exist (bad market research data)
  • Cutting product that's actually profitable (incorrect cost allocation)
  • Targeting wrong customer segments (flawed customer data)

2. Lost trust in data systems

Once users encounter errors, they stop trusting data:

  • Decision-makers revert to intuition instead of data-driven approaches
  • Reports are ignored or second-guessed constantly
  • Data initiatives lose funding and support

Rebuilding trust is far harder than building it initially.

3. Customer dissatisfaction

Poor data quality directly affects customers:

  • Wrong addresses mean delayed or missed deliveries
  • Duplicate records mean multiple unwanted contacts
  • Incorrect preferences mean irrelevant recommendations
  • Outdated information means inappropriate service

4. Competitive disadvantage

Competitors with better data quality:

  • Make faster, better decisions
  • Build better models and predictions
  • Serve customers more effectively
  • Optimize operations more efficiently

Detecting Data Quality Problems

You can't fix problems you don't know exist. Detection is critical.

Technique 1: Data Profiling

Automated statistical analysis revealing data characteristics.

Metrics to examine:

  • Completeness: % of null values per field
  • Cardinality: Number of distinct values (detect if field should be unique but isn't)
  • Distribution: Min, max, mean, median, standard deviation
  • Patterns: Common formats, frequent values
  • Outliers: Values far from typical ranges

Example: Profile customer age field. Find:

  • 15% null values
  • Min: -5 (impossible), Max: 250 (unrealistic)
  • Mean: 145 (way too high—data quality issue)

Investigation reveals a systematic error: birth years were stored as two digits (1945 stored as "45"), and the age calculation expands them into the wrong century, producing impossible values. Simple profiling exposed the problem.
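
Profiling like this takes only a few lines. A minimal sketch using pandas, with made-up sample values standing in for the customer table:

    import pandas as pd

    # Illustrative sample; a real profile would run over the full table
    df = pd.DataFrame({"age": [34, None, 45, -5, 250, 29, None, 145]})

    profile = {
        "null_pct": round(df["age"].isna().mean() * 100, 1),  # completeness
        "distinct": df["age"].nunique(),                      # cardinality
        "min": df["age"].min(),
        "max": df["age"].max(),
        "mean": round(df["age"].mean(), 1),
    }
    print(profile)

    # Flag values outside a plausible human range for manual review
    print(df[(df["age"] < 0) | (df["age"] > 120)])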

Technique 2: Validation Rules and Constraints

Define business rules and check data against them.

Types of rules:

  • Format constraints: Email must contain @, phone must be 10 digits
  • Range constraints: Age between 0-120, price > 0
  • Referential integrity: Foreign keys must reference existing records
  • Business logic: Order date must be before ship date
  • Uniqueness constraints: Email address appears only once

Implementation: Database constraints, validation in data entry forms, checks in data pipelines.

Example: Payment processing system enforces: amount > 0, currency code in ISO list, card number passes Luhn algorithm check. Invalid data rejected at entry, preventing propagation.
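
A sketch of those payment rules in Python. The field names and abbreviated currency whitelist are illustrative; the Luhn check itself is the standard algorithm:

    def luhn_valid(card_number: str) -> bool:
        if not card_number.isdigit():
            return False
        total = 0
        for i, ch in enumerate(reversed(card_number)):
            d = int(ch)
            if i % 2 == 1:      # double every second digit from the right
                d *= 2
                if d > 9:
                    d -= 9
            total += d
        return total % 10 == 0

    ISO_CURRENCIES = {"USD", "EUR", "GBP"}  # abbreviated whitelist

    def validate_payment(amount: float, currency: str, card_number: str) -> list[str]:
        errors = []
        if amount <= 0:
            errors.append("amount must be positive")
        if currency not in ISO_CURRENCIES:
            errors.append("unknown currency code")
        if not luhn_valid(card_number):
            errors.append("card number fails Luhn check")
        return errors  # empty list means the payment record is accepted

    print(validate_payment(49.99, "USD", "4539148803436467"))  # []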

Technique 3: Duplicate Detection

Algorithms identifying similar or identical records representing same entity.

Approaches:

  • Exact matching: All fields identical (misses variations)
  • Fuzzy matching: String similarity algorithms (Levenshtein distance, Jaro-Winkler)
  • Probabilistic matching: Weight multiple fields, calculate match probability
  • Machine learning: Trained models predicting if records are duplicates

Challenge: Balancing false positives (flagging different entities as duplicates) vs. false negatives (missing actual duplicates).

Example: Customer database has:

  1. John Smith, 123 Main St, jsmith@email.com
  2. J Smith, 123 Main Street, jsmith@email.com

Fuzzy matching on name + exact matching on email + address normalization flags as likely duplicate. Manual review confirms—merge records.
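
A minimal version of that matching logic, using only the standard library's difflib for string similarity (the 0.8 thresholds are illustrative, not tuned values):

    from difflib import SequenceMatcher

    def similarity(a: str, b: str) -> float:
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    def likely_duplicate(rec1: dict, rec2: dict) -> bool:
        # An exact email match is strong evidence on its own
        if rec1["email"] == rec2["email"]:
            return True
        # Otherwise require both name and address to be highly similar
        return (similarity(rec1["name"], rec2["name"]) > 0.8
                and similarity(rec1["address"], rec2["address"]) > 0.8)

    a = {"name": "John Smith", "address": "123 Main St", "email": "jsmith@email.com"}
    b = {"name": "J Smith", "address": "123 Main Street", "email": "jsmith@email.com"}
    print(likely_duplicate(a, b))  # True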

Technique 4: Cross-System Reconciliation

Compare data across systems to identify inconsistencies.

Process:

  • Select common entities (customers, transactions, products)
  • Extract from multiple systems
  • Compare values for same entity
  • Flag discrepancies for investigation

Example: Reconcile sales recorded in POS system vs. inventory reduction in warehouse system. Mismatch indicates either sales data error, inventory tracking error, or theft. Daily reconciliation catches issues quickly.
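
A sketch of such a reconciliation, assuming each system can export per-SKU daily totals as simple key-value pairs (SKUs and quantities are made up):

    pos_sales = {"SKU-1": 40, "SKU-2": 15, "SKU-3": 7}
    warehouse_reductions = {"SKU-1": 40, "SKU-2": 12, "SKU-4": 3}

    def reconcile(a: dict, b: dict) -> list[str]:
        # Compare every SKU present in either system and report mismatches
        discrepancies = []
        for sku in sorted(set(a) | set(b)):
            qty_a, qty_b = a.get(sku, 0), b.get(sku, 0)
            if qty_a != qty_b:
                discrepancies.append(f"{sku}: POS={qty_a}, warehouse={qty_b}")
        return discrepancies

    for issue in reconcile(pos_sales, warehouse_reductions):
        print(issue)  # SKU-2, SKU-3, and SKU-4 all flagged for investigation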

Technique 5: Trend Analysis and Anomaly Detection

Monitor metrics over time—sudden changes often indicate quality issues.

What to monitor:

  • Completeness rates per field
  • Record counts (sudden spike or drop)
  • Value distributions (mean, variance changing)
  • Format pattern shifts
  • Error rates in validation checks

Example: Daily customer signups average 100. One day jumps to 10,000. Investigation reveals bot attack creating fake accounts. Without monitoring, fake data would pollute database.
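
A minimal monitor for that signup example: compare today's count against a trailing window and alert on large deviations. The three-standard-deviation threshold is a common default, not a universal rule:

    import statistics

    recent_daily_signups = [98, 103, 95, 110, 101, 99, 104]  # trailing window
    today = 10_000

    mean = statistics.mean(recent_daily_signups)
    stdev = statistics.stdev(recent_daily_signups)

    # Alert when today's count is far outside the recent range
    if stdev > 0 and abs(today - mean) > 3 * stdev:
        print(f"ALERT: {today} signups vs typical {mean:.0f} (stdev {stdev:.1f})")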

Technique 6: User Feedback Loops

Frontline users encounter quality issues before analysts do.

Mechanisms:

  • Easy reporting buttons ("report data issue")
  • Regular check-ins with heavy data users
  • Review of support tickets mentioning data problems
  • Post-incident reviews when operational issues trace to data

Example: Sales reps report customers saying "that's not my address." Aggregate feedback reveals 20% of addresses in region are wrong. Investigation finds recent data migration introduced errors.


Preventing and Fixing Data Quality Problems

Prevention is vastly more effective than cure. But both are needed.

Prevention Strategy 1: Validation at Data Entry

Catch errors when data is created, not downstream.

Implementation:

  • Form validation: Required fields, format checks, range validation
  • Dropdown menus: For standardized values, prevent free-text entry
  • Auto-completion: Suggest valid values as user types
  • Confirmation screens: Show user what they entered, ask to confirm
  • Real-time checks: Verify against external sources (address validation APIs)

Example: E-commerce checkout validates shipping address against postal service database in real-time. Invalid address? User prompted to correct before order submits. Prevents undeliverable shipments.
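
A sketch of server-side entry validation with hypothetical field names and rules; a real form would pair this with client-side checks and, for addresses, an external validation service:

    import re

    def validate_signup(form: dict) -> dict[str, str]:
        errors = {}
        # Required fields must be present and non-blank
        for field in ("name", "email", "zip_code"):
            if not (form.get(field) or "").strip():
                errors[field] = "required"
        # Format checks only run once the field exists
        if "email" not in errors and not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", form["email"]):
            errors["email"] = "invalid format"
        if "zip_code" not in errors and not re.fullmatch(r"\d{5}", form["zip_code"]):
            errors["zip_code"] = "must be 5 digits"
        return errors  # empty dict means the submission is accepted

    print(validate_signup({"name": "Ana", "email": "ana@example.com", "zip_code": "10001"}))  # {}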

Prevention Strategy 2: Standardization and Conventions

Enforce consistent formats across organization.

Standards to establish:

  • Naming conventions: How to represent names, addresses, product codes
  • Date/time formats: ISO 8601 or other standard, consistently applied
  • Unit standards: Always metric or always imperial, never mixed
  • Code lists: Standard values for categories, statuses, types
  • Identifiers: Unique ID schemes for key entities

Example: Company mandate: all dates stored as YYYY-MM-DD and all timestamps in UTC. All systems comply, eliminating date format ambiguity and timezone conversion errors.
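
A sketch of normalizing mixed date strings to that standard. The list of accepted input formats is an assumption, and genuinely ambiguous inputs like "01/02/2026" are resolved by whichever format is tried first, so the order itself is a policy decision:

    from datetime import datetime

    INPUT_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d-%m-%Y", "%B %d, %Y"]

    def to_iso_date(raw: str) -> str:
        # Try each known format; reject anything unrecognized rather than guess
        for fmt in INPUT_FORMATS:
            try:
                return datetime.strptime(raw.strip(), fmt).strftime("%Y-%m-%d")
            except ValueError:
                continue
        raise ValueError(f"unrecognized date format: {raw!r}")

    print(to_iso_date("January 2, 2026"))  # 2026-01-02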

Prevention Strategy 3: Master Data Management (MDM)

Single source of truth for critical entities.

Concept: Key entities (customers, products, suppliers, employees) managed centrally. Other systems reference master data rather than maintaining own copies.

Benefits:

  • One place to ensure quality
  • Updates propagate to all systems
  • Reduces inconsistencies and duplicates
  • Clear ownership and governance

Example: Customer master contains authoritative customer records. CRM, billing, support, and marketing systems all read from and write to customer master. Changes in one system update master and propagate everywhere. Eliminates sync issues.

Prevention Strategy 4: Data Governance

Organizational structure ensuring accountability and standards.

Components:

  • Data owners: Business experts responsible for specific data domains
  • Data stewards: Day-to-day management and quality monitoring
  • Policies: Standards, procedures, roles and responsibilities
  • Quality metrics: KPIs tracking data quality over time
  • Issue resolution processes: How to report and fix problems

Cultural element: Make data quality everyone's responsibility, not just IT or data team.

Fixing Strategy 1: Data Cleaning and Remediation

Systematic correction of known issues.

Techniques:

  • De-duplication: Merge duplicate records using matching rules
  • Standardization: Convert to standard formats (parse and reformat addresses, phone numbers)
  • Correction: Fix known errors (typos, wrong values)
  • Enrichment: Append missing data from external sources
  • Validation and filtering: Remove or quarantine records failing quality checks

Tools: Data quality platforms (Informatica, Talend, Trifacta), custom scripts, SQL queries.

Example: Batch process runs nightly on customer database:

  1. Detect duplicates using fuzzy matching
  2. Merge duplicates, keeping most complete record
  3. Standardize all phone numbers to (XXX) XXX-XXXX format
  4. Validate addresses against postal database, flag invalid ones
  5. Report showing issues fixed and remaining problems
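
Step 3 of that batch might look like the following sketch, which parses US numbers defensively and flags anything it cannot normalize rather than guessing:

    import re

    def standardize_phone(raw: str) -> str | None:
        digits = re.sub(r"\D", "", raw)            # keep digits only
        if len(digits) == 11 and digits.startswith("1"):
            digits = digits[1:]                    # strip US country code
        if len(digits) != 10:
            return None                            # flag for manual review
        return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"

    for raw in ["555-123-4567", "(555) 123 4567", "+1 5551234567", "12345"]:
        print(raw, "->", standardize_phone(raw))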

Fixing Strategy 2: Root Cause Analysis

Don't just fix symptoms—identify and eliminate causes.

Process:

  1. Document quality issue
  2. Investigate: How did bad data enter system?
  3. Identify root cause (bad process, missing validation, user error, system bug)
  4. Implement fix at source
  5. Clean existing bad data
  6. Monitor to ensure fix worked

Example: Analysis shows 30% of product descriptions have incorrect specifications. Root cause investigation reveals:

  • Vendors submit data via email
  • Data entry team manually types into system
  • No validation against vendor data
  • Frequent typos and misinterpretation

Fix: Implement automated vendor portal where vendors enter data directly into system with validation rules. Reduces manual entry, catches errors at source. Existing data cleaned and validated with vendors.


Organizational Approaches to Data Quality Management

Sustainable quality requires organizational capability, not just technical fixes.

Approach 1: Centralized Data Quality Team

Dedicated team responsible for quality across organization.

Responsibilities:

  • Define quality standards and metrics
  • Build and maintain data quality tools
  • Monitor quality dashboards
  • Coordinate cleanup efforts
  • Train organization on quality practices

Pros: Expertise concentration, consistent approaches, clear ownership.

Cons: Can become bottleneck, may be disconnected from business context.

When effective: Large organizations with complex data landscapes needing specialized skills.

Approach 2: Federated/Distributed Ownership

Quality managed by data domain owners with central governance.

Model:

  • Business units own their data domains (sales owns customer data, supply chain owns inventory data)
  • Central governance sets standards and policies
  • Data stewards in each domain ensure quality
  • Central team provides tools, training, and oversight

Pros: Business expertise applied to data, ownership clear, scales better.

Cons: Requires mature organization, coordination challenges.

When effective: Organizations where data quality is business-critical and domain knowledge essential.

Approach 3: Continuous Quality Monitoring

Ongoing measurement rather than periodic assessments.

Implementation:

  • Dashboards: Real-time visibility into quality metrics
  • Automated alerts: Notify when quality thresholds breached
  • Quality gates: Prevent bad data from entering critical systems
  • SLAs: Define acceptable quality levels for different data

Example: Data pipeline includes quality checks after each transformation step. Completeness, validity, consistency checked. If quality falls below threshold, pipeline pauses and alerts team. Bad data doesn't propagate to analytics or production systems.
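
A minimal quality gate in that spirit, using pandas; the thresholds and column names are illustrative. Tools like Great Expectations package the same idea as declarative test suites:

    import pandas as pd

    THRESHOLDS = {"max_null_pct": 5.0, "min_rows": 1000}

    def quality_gate(df: pd.DataFrame, required_cols: list[str]) -> None:
        # Raise (pausing the pipeline) instead of letting bad data propagate
        if len(df) < THRESHOLDS["min_rows"]:
            raise RuntimeError(f"row count {len(df)} below minimum")
        for col in required_cols:
            null_pct = df[col].isna().mean() * 100
            if null_pct > THRESHOLDS["max_null_pct"]:
                raise RuntimeError(f"{col}: {null_pct:.1f}% nulls exceeds threshold")

    # quality_gate(transformed_df, ["customer_id", "order_date"])  # between steps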

Approach 4: Quality by Design

Build quality into systems and processes from the start rather than fixing later.

Principles:

  • Fail fast: Reject invalid data at entry point
  • Minimize manual entry: Automate data capture when possible (APIs, integrations, sensors)
  • Simplify: Fewer fields, clearer definitions, less room for error
  • Validate continuously: Quality checks throughout data lifecycle
  • Learn from failures: Every quality issue triggers process improvement

Example: Redesigning customer onboarding:

  • Before: 50-field form, manual entry, minimal validation, 40% error rate
  • After: Progressive profiling (collect data over time), API integrations pre-fill fields, real-time validation, dropdown menus for standard values, 5% error rate

Balancing Data Quality with Speed and Cost

Perfect data is impossible and unnecessary. Pragmatic approaches balance trade-offs.

Principle 1: Fit for Purpose

Quality required depends on use case.

High-quality requirements:

  • Financial reporting (regulatory compliance)
  • Medical records (patient safety)
  • Machine learning training data (model accuracy depends on it)
  • Customer-facing personalization (errors visible and harmful)

Lower-quality acceptable:

  • Exploratory analysis (rough trends)
  • Internal rough estimates
  • Prototyping and experimentation
  • Historical archive (not actively used)

Example: Customer email addresses—99.9% accuracy needed for transactional emails; 90% acceptable for one-time market research survey.

Principle 2: Risk-Based Prioritization

Focus quality efforts where impact is highest.

Prioritization matrix:

  • Customer payment info: very high impact of poor quality (financial, legal, trust). Priority: highest.
  • Product catalog: high impact (revenue, customer experience). Priority: high.
  • Customer preferences: medium impact (personalization less effective). Priority: medium.
  • Internal logs: low impact (debugging harder but not critical). Priority: low.

Invest most in highest priority data. Accept lower quality in low-priority areas.

Principle 3: Progressive Quality Improvement

Improve incrementally rather than attempting perfection immediately.

Approach:

  1. Baseline: Measure current quality
  2. Quick wins: Fix easiest high-impact issues first
  3. Prevent new issues: Stop degradation (validation at entry)
  4. Systematic cleanup: Gradually clean existing data
  5. Continuous monitoring: Track improvement, catch regressions

Example:

  • Month 1: Add validation to data entry forms (prevent new bad data)
  • Month 2: De-duplicate customer database (quick win, high impact)
  • Month 3: Standardize address formats
  • Month 4: Enrich missing email addresses from third-party data
  • Month 5: Build ongoing monitoring dashboard

Each month delivers value. Avoids "boil the ocean" mega-project that never finishes.

Principle 4: Cost of Poor Quality Analysis

Justify quality investments by quantifying costs of poor quality.

Calculate:

  • Operational waste from errors
  • Analyst time cleaning data
  • Failed initiatives due to data issues
  • Customer impacts (lost sales, dissatisfaction)
  • Compliance risks

Example calculation:

  • 5 analysts @ $100k/year spend 60% time cleaning = $300k/year
  • Marketing sends 1M emails/year, 20% to wrong addresses = $50k wasted
  • Data quality issues caused 3 project failures last year = $500k
  • Total cost of poor quality: ~$850k/year
  • Investment to improve quality: $200k (new tools and processes)
  • ROI: 4.25x, payback period: 3 months
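
The same arithmetic as a short script, using only the illustrative figures above:

    # All inputs are the example's made-up numbers, not benchmarks
    analyst_cost = 5 * 100_000 * 0.60          # $300k/yr of cleaning time
    wasted_mail = 50_000                       # bad-address email spend
    failed_projects = 500_000                  # three failures last year
    total_cost = analyst_cost + wasted_mail + failed_projects  # $850k/yr

    investment = 200_000
    roi = total_cost / investment              # 4.25x
    payback_months = investment / total_cost * 12  # ~2.8 months

    print(f"cost of poor quality: ${total_cost:,.0f}/yr, "
          f"ROI {roi:.2f}x, payback {payback_months:.1f} months")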

Business case for quality investment becomes clear when costs are quantified.


Tools and Technologies for Data Quality

Technology doesn't solve organizational problems, but it helps execute solutions.

Data Profiling Tools

Analyze data to reveal quality issues

Examples: Ataccama ONE, Informatica Data Quality, Talend Data Preparation

Capabilities: Statistical profiling, pattern detection, completeness analysis, relationship discovery.

Data Validation and Cleansing

Automate detection and correction of quality issues

Examples: Trifacta Wrangler, OpenRefine, custom Python/R scripts

Capabilities: Format standardization, de-duplication, validation rules, transformation.

Master Data Management Platforms

Manage golden records for key entities

Examples: Informatica MDM, SAP Master Data Governance, Microsoft Master Data Services

Capabilities: Single source of truth, data stewardship workflows, conflict resolution, data governance.

Data Quality Monitoring and Observability

Continuous tracking of quality metrics

Examples: Great Expectations, Datafold, Monte Carlo Data, custom dashboards

Capabilities: Automated testing, anomaly detection, alerting, quality SLAs.

Data Governance Platforms

Manage policies, ownership, lineage, and accountability

Examples: Collibra, Alation, Informatica Data Governance

Capabilities: Data catalog, business glossary, policy management, lineage tracking, workflow automation.


Conclusion: Quality as Foundation, Not Afterthought

The most sophisticated analytics, elegant machine learning, and beautiful visualizations are worthless if built on poor-quality data. Data quality is not a technical problem solved once—it's an ongoing organizational capability requiring people, processes, and technology working together.

The key insights:

1. Poor data quality is expensive and pervasive—costing organizations trillions in waste, bad decisions, and missed opportunities. Most organizations underestimate these costs because they're diffuse rather than concentrated.

2. Quality has multiple dimensions: accuracy, completeness, consistency, timeliness, validity, and uniqueness all matter. Measuring only one dimension gives an incomplete picture.

3. Prevention is vastly more effective than cure—catching errors at data creation (validation, standardization, good process design) costs far less than cleaning bad data later. Build quality in, don't inspect it in.

4. Detection requires multiple approaches—data profiling, validation rules, duplicate detection, cross-system reconciliation, trend monitoring, and user feedback all play roles. No single technique catches all issues.

5. Organizational capability matters more than technology—clear ownership, governance, standards, accountability, and quality culture drive success. Tools enable good processes but can't replace them.

6. Balance quality with pragmatism—perfect data is impossible and unnecessary. Focus quality efforts where impact is highest, accept lower quality where stakes are low, improve incrementally rather than seeking perfection.

7. Quality builds trust, poor quality destroys it—once users encounter errors, they stop trusting data systems. Rebuilding trust is far harder than building it initially. Consistent quality over time creates data-driven culture; poor quality kills it.

As quality pioneer W. Edwards Deming emphasized: "You cannot inspect quality into a product." Similarly, you can't clean quality into data after the fact; you must build systems and processes that produce quality data from the start.

British Airways learned this lesson expensively. Your organization can learn it more cheaply by treating data quality as the foundation it is—not an afterthought to address when things go wrong.

Garbage in, garbage out. The corollary: Quality in, insight out. Your choice of investment determines which path you take.


References

Batini, C., & Scannapieco, M. (2016). Data and information quality: Dimensions, principles and techniques. Springer. https://doi.org/10.1007/978-3-319-24106-7

English, L. P. (1999). Improving data warehouse and business information quality: Methods for reducing costs and increasing profits. Wiley.

Haug, A., Zachariassen, F., & van Liempd, D. (2011). The costs of poor data quality. Journal of Industrial Engineering and Management, 4(2), 168–193. https://doi.org/10.3926/jiem.2011.v4n2.p168-193

IBM. (2016). Industrializing data quality: IBM Redbooks white paper. IBM Corporation.

Loshin, D. (2010). The practitioner's guide to data quality improvement. Morgan Kaufmann.

Nagle, T., Redman, T. C., & Sammon, D. (2017). Only 3% of companies' data meets basic quality standards. Harvard Business Review. https://hbr.org/2017/09/only-3-of-companies-data-meets-basic-quality-standards

Redman, T. C. (1998). The impact of poor data quality on the typical enterprise. Communications of the ACM, 41(2), 79–82. https://doi.org/10.1145/269012.269025

Redman, T. C. (2013). Data driven: Profiting from your most important business asset. Harvard Business Review Press.

Sebastian-Coleman, L. (2013). Measuring data quality for ongoing improvement: A data quality assessment framework. Morgan Kaufmann.

Wang, R. Y., & Strong, D. M. (1996). Beyond accuracy: What data quality means to data consumers. Journal of Management Information Systems, 12(4), 5–33. https://doi.org/10.1080/07421222.1996.11518099

