Data Quality Problems Explained: Why Bad Data Ruins Analysis
In 2018, British Airways suffered a massive data breach affecting 380,000 customers—payment card details, names, and addresses stolen. The breach was serious. But the aftermath revealed an equally serious problem: BA couldn't determine with certainty who was affected.
Their customer database had severe quality issues. Duplicate records. Inconsistent formats. Missing fields. Outdated contact information. When BA needed to notify affected customers quickly, they couldn't trust their own data. Some customers received multiple notifications (duplicates). Others received none (missing or wrong contact data). The data quality problems, invisible during normal operations, became catastrophic during crisis response.
The incident cost BA £20 million in regulatory fines—but the data quality issues likely cost far more in lost customer trust, inefficient operations, and wasted resources on an ongoing basis.
This illustrates a fundamental truth: data quality problems are invisible until they're catastrophic. Organizations operate with poor-quality data for years, making suboptimal decisions, wasting resources, and eroding trust, without recognizing the root cause. The cost of poor data quality is pervasive but diffuse—death by a thousand cuts rather than one obvious disaster.
"Garbage in, garbage out"—this principle is well-known but poorly heeded. Even the most sophisticated machine learning algorithms, elegant visualizations, and rigorous statistical methods cannot overcome fundamentally flawed input data. Data quality is the foundation. Without it, everything built on top is unstable.
This article explains data quality problems comprehensively: what data quality means, common quality issues organizations face, how these problems impact analysis and decisions, techniques for detecting quality problems, strategies for prevention and improvement, organizational approaches to data quality management, and practical frameworks for balancing quality with cost and speed.
Defining Data Quality: Dimensions and Standards
Data quality measures how well data serves its intended purpose. It is not a single characteristic; it has multiple dimensions.
The Six Core Dimensions of Data Quality
1. Accuracy
Definition: Data correctly represents the real-world entity or event it describes.
Examples of inaccuracy:
- Customer's address in database is wrong (they moved)
- Product price is incorrect (data entry error)
- Sensor reading is off (calibration problem)
- Transaction amount is wrong (software bug)
2. Completeness
Definition: All required data is present; no critical missing values.
Examples of incompleteness:
- Customer record missing email address
- Transaction record missing timestamp
- Product description missing key specifications
- Survey responses with blank required fields
3. Consistency
Definition: Same data represented identically across systems; no contradictions within or across datasets.
Examples of inconsistency:
- Customer has different addresses in sales vs. billing systems
- Product categorized differently in warehouse vs. website
- Date formats vary across fields (MM/DD/YYYY vs. DD-MM-YYYY)
- Customer name spelled differently in different records
4. Timeliness
Definition: Data is up-to-date and available when needed.
Examples of staleness:
- Inventory data from yesterday (a customer orders an item that appears to be in stock but isn't)
- Customer credit status from last month (no longer accurate)
- Dashboard showing week-old metrics (decisions based on outdated info)
- Product catalog not reflecting recent changes
5. Validity
Definition: Data conforms to defined formats, types, and business rules.
Examples of invalidity:
- Phone numbers with the wrong number of digits
- Dates like February 30th (impossible)
- Age values of 250 years (unrealistic)
- Email addresses without @ symbol
6. Uniqueness
Definition: No unintended duplicates; each real-world entity represented once.
Examples of duplication:
- Same customer in database twice with slight name variations
- Duplicate transaction records (charging customer twice)
- Same product listed under different SKUs
- Multiple employee records for same person
Additional Quality Dimensions
Beyond the core six, other dimensions matter in specific contexts:
| Dimension | Definition | Example Issue |
|---|---|---|
| Integrity | Relationships between data elements maintained correctly | Foreign key pointing to non-existent record |
| Precision | Appropriate level of detail | Storing $1,234.567891 when cent-level precision is sufficient |
| Believability | Data perceived as true and credible by users | Sales figures so far off that users distrust the entire system |
| Accessibility | Data available to authorized users when needed | Critical data locked in system only two people can access |
| Conformity | Data follows standards and conventions | Product codes don't match industry standards |
Common Data Quality Problems
Understanding typical problems helps recognize them in your data.
Problem 1: Missing Data
Description: Records have null, blank, or absent values where data should exist.
Causes:
- Fields not marked required in data entry forms
- Integration processes that don't map all fields
- Users skipping optional fields
- Data loss during migrations or transformations
- Sensors or systems failing to record values
Impact: Analysis excluding incomplete records may miss patterns or create bias. Algorithms can't process missing values without special handling.
Example: Customer survey with optional income field. 70% of respondents skip it. Analysis of income vs. product preference is impossible for most customers, and respondents who provide income may be systematically different (bias).
Problem 2: Duplicate Records
Description: Same real-world entity recorded multiple times with slight variations.
Causes:
- Data entry by multiple people without checking for existing records
- Merging data from multiple sources without de-duplication
- Variations in names, addresses (abbreviations, typos, formatting)
- Lack of unique identifiers
- System bugs creating duplicate records
Impact: Inflated counts, double-counting in metrics, multiple mailings to same person, customer confusion when contacted multiple times.
Example: CRM system has "John Smith, 123 Main St, NYC" and "J. Smith, 123 Main Street, New York City" as separate records. Both get marketing emails. Metric showing "unique customers" is overstated.
Problem 3: Inconsistent Formats and Standards
Description: Same type of data represented differently across records or systems.
Causes:
- No enforced standards for data entry
- Different systems with different conventions
- International operations with regional formats
- Historical changes in standards not applied retroactively
- Manual data entry without validation
Impact: Difficulty aggregating or joining data. Pattern matching fails. Manual cleanup required before analysis.
Examples:
- Phone numbers: "(555) 123-4567" vs. "555-123-4567" vs. "5551234567"
- Dates: "01/02/2026" (Jan 2 or Feb 1?), "2026-01-02", "January 2, 2026"
- Names: "Smith, John" vs. "John Smith" vs. "SMITH, JOHN"
- Units: Mixing metric and imperial measurements without labels
Problem 4: Incorrect Data
Description: Data present but factually wrong.
Causes:
- Human error: Typos, transposed digits, misreading
- Measurement error: Inaccurate instruments, calibration issues
- Processing errors: Software bugs, calculation mistakes
- Outdated data: Was correct when entered but situation changed
- Intentional errors: Users gaming metrics or entering fake data
Impact: Wrong decisions based on false information. Loss of trust when errors discovered.
Example: E-commerce site's inventory count is wrong (software bug during last update). Shows 50 units in stock; actually have 5. Website accepts orders it can't fulfill. Customer dissatisfaction, refunds, revenue loss.
Problem 5: Data Integration Issues
Description: Problems arising when combining data from multiple sources.
Causes:
- Different schemas and field names across systems
- Conflicting keys or identifiers
- Timing differences (one system updates hourly, another daily)
- Different business rules or definitions
- Transformations introducing errors
Impact: Integrated dataset has quality issues even if source systems are individually sound.
Example: Merging customer data from website (uses email as key) and store (uses phone as key). Can't reliably link records. Some customers duplicated, others with conflicting information.
Problem 6: Schema Evolution Problems
Description: Changes to data structures break downstream processes.
Causes:
- Adding or removing fields without coordinating with consumers
- Changing data types (string to number)
- Renaming fields
- Changing meaning of existing fields
- Lack of versioning or migration planning
Impact: Pipelines break, queries fail, reports show errors, applications crash.
Example: API changes field name from "customerId" to "customer_id". All downstream systems using old name now receive null values. Data appears to be missing customers entirely.
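A tiny illustration of how such a rename plays out downstream. The payload shape and field names here are hypothetical, but the failure mode matches the scenario above: the consumer keeps reading the old key and silently receives nothing.

```python
# Hypothetical payloads illustrating the rename described above.
old_payload = {"customerId": 42, "total": 99.0}
new_payload = {"customer_id": 42, "total": 99.0}   # upstream API renamed the field

def extract_customer(payload: dict):
    # The downstream consumer still reads the old field name.
    return payload.get("customerId")

print(extract_customer(old_payload))  # 42
print(extract_customer(new_payload))  # None -> customers appear to be "missing"
```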
Problem 7: Outliers and Anomalies
Description: Values far outside expected ranges—sometimes errors, sometimes legitimate edge cases.
Causes:
- Data entry errors (extra zeros, decimal place mistakes)
- Sensor malfunctions
- System glitches
- Actual rare events
- Test data mixed with production data
Impact: Statistical analyses distorted. Algorithms trained on outliers learn wrong patterns.
Example: Sales dataset shows transaction for $1,000,000 when typical transaction is $100. Investigation reveals data entry error—should have been $1,000. Including this in average sales calculation drastically skews results.
The Cost and Impact of Poor Data Quality
Data quality problems create tangible and intangible costs.
Direct Costs
1. Wasted operational expenses
Poor data leads to inefficiency:
- Sending marketing to wrong addresses (wasted postage)
- Manufacturing wrong quantities (inventory costs)
- Shipping to incorrect locations (logistics costs)
- Processing duplicate transactions (refunds, reconciliation)
Research by IBM estimated that poor data quality costs the US economy $3.1 trillion annually.
2. Analyst time spent on data cleaning
Studies consistently show analysts spend 50-80% of their time cleaning and preparing data before analysis.
This is hugely expensive:
- Senior analysts doing data janitor work
- Delayed insights (cleaning takes weeks before analysis starts)
- Frustration and burnout
3. Failed projects and initiatives
Data quality issues doom projects:
- Machine learning models trained on bad data perform poorly
- Data warehouse migration fails due to source data issues
- Business intelligence dashboard produces wrong numbers, is abandoned
- Customer segmentation based on incorrect data targets wrong people
4. Regulatory fines and legal costs
Poor data quality creates compliance violations:
- GDPR requires accurate personal data—errors mean violations
- Financial reporting errors from bad data (Sarbanes-Oxley violations)
- Healthcare data errors causing patient harm (HIPAA issues)
Indirect Costs
1. Bad decisions
Executives making strategic decisions based on flawed data:
- Expanding into market that doesn't exist (bad market research data)
- Cutting product that's actually profitable (incorrect cost allocation)
- Targeting wrong customer segments (flawed customer data)
2. Lost trust in data systems
Once users encounter errors, they stop trusting data:
- Decision-makers revert to intuition instead of data-driven approaches
- Reports are ignored or second-guessed constantly
- Data initiatives lose funding and support
Rebuilding trust is far harder than building it initially.
3. Customer dissatisfaction
Poor data quality directly affects customers:
- Wrong addresses mean delayed or missed deliveries
- Duplicate records mean multiple unwanted contacts
- Incorrect preferences mean irrelevant recommendations
- Outdated information means inappropriate service
4. Competitive disadvantage
Competitors with better data quality:
- Make faster, better decisions
- Build better models and predictions
- Serve customers more effectively
- Optimize operations more efficiently
Detecting Data Quality Problems
You can't fix problems you don't know exist. Detection is critical.
Technique 1: Data Profiling
Automated statistical analysis revealing data characteristics.
Metrics to examine:
- Completeness: % of null values per field
- Cardinality: Number of distinct values (detect if field should be unique but isn't)
- Distribution: Min, max, mean, median, standard deviation
- Patterns: Common formats, frequent values
- Outliers: Values far from typical ranges
Example: Profile customer age field. Find:
- 15% null values
- Min: -5 (impossible), Max: 250 (unrealistic)
- Mean: 145 (way too high—data quality issue)
Investigation reveals a systematic error: birth years were stored without the century (1945 stored as 45), and the downstream age calculation treats those two-digit values as literal birth years, producing impossible ages. Simple profiling exposed the problem.
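As a rough sketch of what this profiling looks like in practice, the snippet below computes those metrics with pandas. The DataFrame and column name are illustrative, not drawn from any particular system.

```python
# A minimal profiling sketch using pandas; the data and column name are illustrative.
import pandas as pd

def profile_numeric_column(df: pd.DataFrame, column: str) -> dict:
    """Summarize completeness, cardinality, range, and distribution for one field."""
    series = df[column]
    return {
        "null_pct": round(series.isna().mean() * 100, 1),  # completeness
        "distinct": series.nunique(),                       # cardinality
        "min": series.min(),
        "max": series.max(),
        "mean": series.mean(),
        "median": series.median(),
        "std": series.std(),
    }

# A suspicious age column like the one described above.
customers = pd.DataFrame({"age": [34, 45, None, -5, 250, 145, 29, None]})
print(profile_numeric_column(customers, "age"))
```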
Technique 2: Validation Rules and Constraints
Define business rules and check data against them.
Types of rules:
- Format constraints: Email must contain @, phone must be 10 digits
- Range constraints: Age between 0-120, price > 0
- Referential integrity: Foreign keys must reference existing records
- Business logic: Order date must be before ship date
- Uniqueness constraints: Email address appears only once
Implementation: Database constraints, validation in data entry forms, checks in data pipelines.
Example: Payment processing system enforces: amount > 0, currency code in ISO list, card number passes Luhn algorithm check. Invalid data rejected at entry, preventing propagation.
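A minimal sketch of such rules in plain Python. The currency list is deliberately incomplete and the function names are hypothetical; the Luhn check shown is the standard checksum described above.

```python
# Illustrative entry-time validation rules for the payment example above.
ISO_CURRENCIES = {"USD", "EUR", "GBP", "JPY"}  # stand-in for the full ISO 4217 list

def luhn_valid(card_number: str) -> bool:
    """Return True if the digits pass the Luhn checksum."""
    digits = [int(d) for d in card_number if d.isdigit()]
    if not digits:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:          # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def validate_payment(amount: float, currency: str, card_number: str) -> list[str]:
    """Collect rule violations instead of silently accepting bad data."""
    errors = []
    if amount <= 0:
        errors.append("amount must be positive")
    if currency not in ISO_CURRENCIES:
        errors.append(f"unknown currency code: {currency}")
    if not luhn_valid(card_number):
        errors.append("card number fails Luhn check")
    return errors

print(validate_payment(49.99, "USD", "4539 1488 0343 6467"))  # [] -> accepted
print(validate_payment(-5, "XYZ", "1234"))                    # three violations
```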
Technique 3: Duplicate Detection
Algorithms identifying similar or identical records representing same entity.
Approaches:
- Exact matching: All fields identical (misses variations)
- Fuzzy matching: String similarity algorithms (Levenshtein distance, Jaro-Winkler)
- Probabilistic matching: Weight multiple fields, calculate match probability
- Machine learning: Trained models predicting if records are duplicates
Challenge: Balancing false positives (flagging different entities as duplicates) vs. false negatives (missing actual duplicates).
Example: Customer database has:
- John Smith, 123 Main St, jsmith@email.com
- J Smith, 123 Main Street, jsmith@email.com
Fuzzy matching on name + exact matching on email + address normalization flags as likely duplicate. Manual review confirms—merge records.
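The sketch below approximates that matching logic using only the standard library. The similarity threshold and the choice of fields to compare are assumptions; production systems typically use dedicated fuzzy-matching libraries and probabilistic weights across many fields.

```python
# A rough duplicate-detection sketch; thresholds and field choices are illustrative.
from difflib import SequenceMatcher

def name_similarity(a: str, b: str) -> float:
    """Normalized string similarity between 0 and 1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def likely_duplicate(rec_a: dict, rec_b: dict, threshold: float = 0.7) -> bool:
    """Flag two records if emails match exactly and names are similar enough."""
    same_email = rec_a["email"].strip().lower() == rec_b["email"].strip().lower()
    similar_name = name_similarity(rec_a["name"], rec_b["name"]) >= threshold
    return same_email and similar_name

a = {"name": "John Smith", "address": "123 Main St", "email": "jsmith@email.com"}
b = {"name": "J Smith", "address": "123 Main Street", "email": "jsmith@email.com"}
print(likely_duplicate(a, b))  # True -> queue for manual review and merge
```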
Technique 4: Cross-System Reconciliation
Compare data across systems to identify inconsistencies.
Process:
- Select common entities (customers, transactions, products)
- Extract from multiple systems
- Compare values for same entity
- Flag discrepancies for investigation
Example: Reconcile sales recorded in POS system vs. inventory reduction in warehouse system. Mismatch indicates either sales data error, inventory tracking error, or theft. Daily reconciliation catches issues quickly.
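A minimal reconciliation sketch with pandas, using made-up table and column names. The pattern is the same at any scale: join the systems on a shared key and flag entities whose values disagree.

```python
# Compare POS sales against warehouse movements; data and columns are illustrative.
import pandas as pd

pos_sales = pd.DataFrame({"sku": ["A1", "B2", "C3"], "units_sold": [10, 4, 7]})
warehouse_moves = pd.DataFrame({"sku": ["A1", "B2", "C3"], "units_shipped": [10, 4, 5]})

# Join on the shared key and flag rows where the two systems disagree.
merged = pos_sales.merge(warehouse_moves, on="sku", how="outer")
merged["discrepancy"] = merged["units_sold"] != merged["units_shipped"]
print(merged[merged["discrepancy"]])  # C3: sold 7, shipped 5 -> investigate
```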
Technique 5: Trend Analysis and Anomaly Detection
Monitor metrics over time—sudden changes often indicate quality issues.
What to monitor:
- Completeness rates per field
- Record counts (sudden spike or drop)
- Value distributions (mean, variance changing)
- Format pattern shifts
- Error rates in validation checks
Example: Daily customer signups average 100. One day jumps to 10,000. Investigation reveals bot attack creating fake accounts. Without monitoring, fake data would pollute database.
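In its simplest form this check is a z-score against recent history, as in the sketch below. The numbers and the three-standard-deviation threshold are illustrative; real pipelines usually also account for seasonality.

```python
# A simple anomaly check on daily record counts; figures and threshold are assumed.
import statistics

daily_signups = [98, 102, 95, 110, 101, 99, 10_000]  # last value is suspicious

history, latest = daily_signups[:-1], daily_signups[-1]
mean = statistics.mean(history)
stdev = statistics.stdev(history)

z_score = (latest - mean) / stdev
if abs(z_score) > 3:  # flag values more than 3 standard deviations from recent normal
    print(f"Alert: {latest} signups today (z = {z_score:.1f}); investigate before loading.")
```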
Technique 6: User Feedback Loops
Frontline users encounter quality issues before analysts do.
Mechanisms:
- Easy reporting buttons ("report data issue")
- Regular check-ins with heavy data users
- Review of support tickets mentioning data problems
- Post-incident reviews when operational issues trace to data
Example: Sales reps report customers saying "that's not my address." Aggregate feedback reveals 20% of addresses in region are wrong. Investigation finds recent data migration introduced errors.
Preventing and Fixing Data Quality Problems
Prevention is vastly more effective than cure. But both are needed.
Prevention Strategy 1: Validation at Data Entry
Catch errors when data is created, not downstream.
Implementation:
- Form validation: Required fields, format checks, range validation
- Dropdown menus: For standardized values, prevent free-text entry
- Auto-completion: Suggest valid values as user types
- Confirmation screens: Show user what they entered, ask to confirm
- Real-time checks: Verify against external sources (address validation APIs)
Example: E-commerce checkout validates shipping address against postal service database in real-time. Invalid address? User prompted to correct before order submits. Prevents undeliverable shipments.
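A hedged sketch of what entry-point validation might look like for a generic form, combining required fields, an approved value list (the dropdown case), and format checks. The field names, patterns, and allowed values are all illustrative.

```python
# Illustrative form-validation rules applied before data is accepted into the system.
import re

FORM_RULES = {
    "email":   {"required": True,  "pattern": r"^[^@\s]+@[^@\s]+\.[^@\s]+$"},
    "country": {"required": True,  "allowed": {"US", "CA", "GB", "DE"}},  # dropdown values
    "phone":   {"required": False, "pattern": r"^\d{10}$"},
}

def validate_form(submission: dict) -> dict:
    """Return a map of field -> error message; an empty dict means the entry is accepted."""
    errors = {}
    for field, rules in FORM_RULES.items():
        value = (submission.get(field) or "").strip()
        if not value:
            if rules.get("required"):
                errors[field] = "required field is missing"
            continue
        if "allowed" in rules and value not in rules["allowed"]:
            errors[field] = f"value '{value}' is not in the approved list"
        if "pattern" in rules and not re.fullmatch(rules["pattern"], value):
            errors[field] = "format is invalid"
    return errors

print(validate_form({"email": "jane@example.com", "country": "US", "phone": "5551234567"}))  # {}
print(validate_form({"email": "not-an-email", "country": "Mars"}))  # two errors
```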
Prevention Strategy 2: Standardization and Conventions
Enforce consistent formats across organization.
Standards to establish:
- Naming conventions: How to represent names, addresses, product codes
- Date/time formats: ISO 8601 or other standard, consistently applied
- Unit standards: Always metric or always imperial, never mixed
- Code lists: Standard values for categories, statuses, types
- Identifiers: Unique ID schemes for key entities
Example: Company mandate: All dates stored as YYYY-MM-DD in UTC timezone. All systems comply. Eliminates date format ambiguity and timezone conversion errors.
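Converting local timestamps into that mandated form is straightforward with the standard library, as sketched below; the input format string and source timezone are assumptions about one upstream system.

```python
# Normalize a local timestamp to ISO 8601 in UTC; input format and timezone are assumed.
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def to_utc_iso(raw: str, fmt: str, source_tz: str) -> str:
    """Parse a local timestamp and re-emit it as ISO 8601 in UTC."""
    local = datetime.strptime(raw, fmt).replace(tzinfo=ZoneInfo(source_tz))
    return local.astimezone(timezone.utc).isoformat()

# "01/02/2026 14:30" entered by a New York office using US month-first format.
print(to_utc_iso("01/02/2026 14:30", "%m/%d/%Y %H:%M", "America/New_York"))
# -> 2026-01-02T19:30:00+00:00
```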
Prevention Strategy 3: Master Data Management (MDM)
Single source of truth for critical entities.
Concept: Key entities (customers, products, suppliers, employees) managed centrally. Other systems reference master data rather than maintaining own copies.
Benefits:
- One place to ensure quality
- Updates propagate to all systems
- Reduces inconsistencies and duplicates
- Clear ownership and governance
Example: Customer master contains authoritative customer records. CRM, billing, support, and marketing systems all read from and write to customer master. Changes in one system update master and propagate everywhere. Eliminates sync issues.
Prevention Strategy 4: Data Governance
Organizational structure ensuring accountability and standards.
Components:
- Data owners: Business experts responsible for specific data domains
- Data stewards: Day-to-day management and quality monitoring
- Policies: Standards, procedures, roles and responsibilities
- Quality metrics: KPIs tracking data quality over time
- Issue resolution processes: How to report and fix problems
Cultural element: Make data quality everyone's responsibility, not just IT or data team.
Fixing Strategy 1: Data Cleaning and Remediation
Systematic correction of known issues.
Techniques:
- De-duplication: Merge duplicate records using matching rules
- Standardization: Convert to standard formats (parse and reformat addresses, phone numbers)
- Correction: Fix known errors (typos, wrong values)
- Enrichment: Append missing data from external sources
- Validation and filtering: Remove or quarantine records failing quality checks
Tools: Data quality platforms (Informatica, Talend, Trifacta), custom scripts, SQL queries.
Example: Batch process runs nightly on customer database:
- Detect duplicates using fuzzy matching
- Merge duplicates, keeping most complete record
- Standardize all phone numbers to (XXX) XXX-XXXX format
- Validate addresses against postal database, flag invalid ones
- Report showing issues fixed and remaining problems
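Two of those steps, phone standardization and keeping the most complete record when merging, might look like the sketch below. The record shape and rules are simplified assumptions, not a full remediation job.

```python
# Simplified pieces of a nightly cleanup batch; record shape and rules are illustrative.
import re

def standardize_phone(raw: str) -> str | None:
    """Reduce any phone string to digits and reformat as (XXX) XXX-XXXX."""
    digits = re.sub(r"\D", "", raw or "")
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]                       # drop US country code
    if len(digits) != 10:
        return None                               # flag for manual review
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"

def most_complete(records: list[dict]) -> dict:
    """Of a duplicate group, keep the record with the fewest empty fields."""
    return max(records, key=lambda r: sum(1 for v in r.values() if v not in (None, "")))

print(standardize_phone("555.123.4567"))     # (555) 123-4567
print(most_complete([
    {"name": "J Smith", "email": "", "phone": "5551234567"},
    {"name": "John Smith", "email": "jsmith@email.com", "phone": "5551234567"},
]))
```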
Fixing Strategy 2: Root Cause Analysis
Don't just fix symptoms—identify and eliminate causes.
Process:
- Document quality issue
- Investigate: How did bad data enter system?
- Identify root cause (bad process, missing validation, user error, system bug)
- Implement fix at source
- Clean existing bad data
- Monitor to ensure fix worked
Example: Analysis shows 30% of product descriptions have incorrect specifications. Root cause investigation reveals:
- Vendors submit data via email
- Data entry team manually types into system
- No validation against vendor data
- Frequent typos and misinterpretation
Fix: Implement automated vendor portal where vendors enter data directly into system with validation rules. Reduces manual entry, catches errors at source. Existing data cleaned and validated with vendors.
Organizational Approaches to Data Quality Management
Sustainable quality requires organizational capability, not just technical fixes.
Approach 1: Centralized Data Quality Team
Dedicated team responsible for quality across organization.
Responsibilities:
- Define quality standards and metrics
- Build and maintain data quality tools
- Monitor quality dashboards
- Coordinate cleanup efforts
- Train organization on quality practices
Pros: Expertise concentration, consistent approaches, clear ownership.
Cons: Can become bottleneck, may be disconnected from business context.
When effective: Large organizations with complex data landscapes needing specialized skills.
Approach 2: Federated/Distributed Ownership
Quality managed by data domain owners with central governance.
Model:
- Business units own their data domains (sales owns customer data, supply chain owns inventory data)
- Central governance sets standards and policies
- Data stewards in each domain ensure quality
- Central team provides tools, training, and oversight
Pros: Business expertise applied to data, ownership clear, scales better.
Cons: Requires mature organization, coordination challenges.
When effective: Organizations where data quality is business-critical and domain knowledge essential.
Approach 3: Continuous Quality Monitoring
Ongoing measurement rather than periodic assessments.
Implementation:
- Dashboards: Real-time visibility into quality metrics
- Automated alerts: Notify when quality thresholds breached
- Quality gates: Prevent bad data from entering critical systems
- SLAs: Define acceptable quality levels for different data
Example: Data pipeline includes quality checks after each transformation step. Completeness, validity, consistency checked. If quality falls below threshold, pipeline pauses and alerts team. Bad data doesn't propagate to analytics or production systems.
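In its simplest form, such a gate is just a function that runs after a transformation and refuses to pass bad batches along. The checks, thresholds, and column names below are illustrative; tools like Great Expectations package the same idea with richer reporting.

```python
# A plain-Python quality gate between pipeline steps; thresholds and columns are assumed.
import pandas as pd

def quality_gate(df: pd.DataFrame) -> list[str]:
    """Run post-transformation checks; return failures that should pause the pipeline."""
    failures = []
    if df["customer_id"].isna().mean() > 0.01:               # completeness
        failures.append("customer_id completeness below 99%")
    if not df["order_total"].ge(0).all():                     # validity
        failures.append("negative order totals found")
    if df["order_id"].duplicated().any():                     # uniqueness
        failures.append("duplicate order_id values found")
    return failures

batch = pd.DataFrame({
    "order_id": [1, 2, 2],
    "customer_id": [10, None, 12],
    "order_total": [25.0, -3.0, 40.0],
})
issues = quality_gate(batch)
if issues:
    raise RuntimeError(f"Quality gate failed, halting load: {issues}")
```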
Approach 4: Quality by Design
Build quality into systems and processes from the start rather than fixing later.
Principles:
- Fail fast: Reject invalid data at entry point
- Minimize manual entry: Automate data capture when possible (APIs, integrations, sensors)
- Simplify: Fewer fields, clearer definitions, less room for error
- Validate continuously: Quality checks throughout data lifecycle
- Learn from failures: Every quality issue triggers process improvement
Example: Redesigning customer onboarding:
- Before: 50-field form, manual entry, minimal validation, 40% error rate
- After: Progressive profiling (collect data over time), API integrations pre-fill fields, real-time validation, dropdown menus for standard values, 5% error rate
Balancing Data Quality with Speed and Cost
Perfect data is impossible and unnecessary. Pragmatic approaches balance trade-offs.
Principle 1: Fit for Purpose
Quality required depends on use case.
High-quality requirements:
- Financial reporting (regulatory compliance)
- Medical records (patient safety)
- Machine learning training data (model accuracy depends on it)
- Customer-facing personalization (errors visible and harmful)
Lower-quality acceptable:
- Exploratory analysis (rough trends)
- Internal rough estimates
- Prototyping and experimentation
- Historical archive (not actively used)
Example: Customer email addresses—99.9% accuracy needed for transactional emails; 90% acceptable for one-time market research survey.
Principle 2: Risk-Based Prioritization
Focus quality efforts where impact is highest.
Prioritization matrix:
| Data | Impact of Poor Quality | Quality Priority |
|---|---|---|
| Customer payment info | Very high (financial, legal, trust) | Highest |
| Product catalog | High (revenue, customer experience) | High |
| Customer preferences | Medium (personalization less effective) | Medium |
| Internal logs | Low (debugging harder but not critical) | Low |
Invest most in highest priority data. Accept lower quality in low-priority areas.
Principle 3: Progressive Quality Improvement
Improve incrementally rather than attempting perfection immediately.
Approach:
- Baseline: Measure current quality
- Quick wins: Fix easiest high-impact issues first
- Prevent new issues: Stop degradation (validation at entry)
- Systematic cleanup: Gradually clean existing data
- Continuous monitoring: Track improvement, catch regressions
Example:
- Month 1: Add validation to data entry forms (prevent new bad data)
- Month 2: De-duplicate customer database (quick win, high impact)
- Month 3: Standardize address formats
- Month 4: Enrich missing email addresses from third-party data
- Month 5: Build ongoing monitoring dashboard
Each month delivers value. Avoids "boil the ocean" mega-project that never finishes.
Principle 4: Cost of Poor Quality Analysis
Justify quality investments by quantifying costs of poor quality.
Calculate:
- Operational waste from errors
- Analyst time cleaning data
- Failed initiatives due to data issues
- Customer impacts (lost sales, dissatisfaction)
- Compliance risks
Example calculation:
- 5 analysts @ $100k/year spending 60% of their time cleaning = $300k/year
- Marketing sends 1M emails/year, 20% to wrong addresses = $50k wasted
- Data quality issues caused 3 project failures last year = $500k
- Total cost of poor quality: ~$850k/year
- Investment to improve quality: $200k (new tools and processes)
- ROI: 4.25x, payback period: 3 months
Business case for quality investment becomes clear when costs are quantified.
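The same arithmetic, spelled out so the assumptions are explicit (every figure here is illustrative):

```python
# Back-of-envelope cost-of-poor-quality calculation from the example above.
analyst_cost    = 5 * 100_000 * 0.60     # analysts cleaning data      -> $300k
wasted_mailing  = 50_000                 # emails to wrong addresses   -> $50k
failed_projects = 500_000                # three failed initiatives    -> $500k

cost_of_poor_quality = analyst_cost + wasted_mailing + failed_projects  # $850k/year
investment = 200_000

roi = cost_of_poor_quality / investment                      # 4.25x
payback_months = investment / (cost_of_poor_quality / 12)    # ~2.8 months
print(f"ROI: {roi:.2f}x, payback: {payback_months:.1f} months")
```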
Tools and Technologies for Data Quality
Technology doesn't solve organizational problems, but it helps execute solutions.
Data Profiling Tools
Analyze data to reveal quality issues
Examples: Ataccama ONE, Informatica Data Quality, Talend Data Preparation
Capabilities: Statistical profiling, pattern detection, completeness analysis, relationship discovery.
Data Validation and Cleansing
Automate detection and correction of quality issues
Examples: Trifacta Wrangler, OpenRefine, custom Python/R scripts
Capabilities: Format standardization, de-duplication, validation rules, transformation.
Master Data Management Platforms
Manage golden records for key entities
Examples: Informatica MDM, SAP Master Data Governance, Microsoft Master Data Services
Capabilities: Single source of truth, data stewardship workflows, conflict resolution, data governance.
Data Quality Monitoring and Observability
Continuous tracking of quality metrics
Examples: Great Expectations, Datafold, Monte Carlo Data, custom dashboards
Capabilities: Automated testing, anomaly detection, alerting, quality SLAs.
Data Governance Platforms
Manage policies, ownership, lineage, and accountability
Examples: Collibra, Alation, Informatica Data Governance
Capabilities: Data catalog, business glossary, policy management, lineage tracking, workflow automation.
Conclusion: Quality as Foundation, Not Afterthought
The most sophisticated analytics, elegant machine learning, and beautiful visualizations are worthless if built on poor-quality data. Data quality is not a technical problem solved once—it's an ongoing organizational capability requiring people, processes, and technology working together.
The key insights:
1. Poor data quality is expensive and pervasive—costing organizations trillions in waste, bad decisions, and missed opportunities. Most organizations underestimate these costs because they're diffuse rather than concentrated.
2. Quality has multiple dimensions—accuracy, completeness, consistency, timeliness, validity, and uniqueness all matter. Measuring only one dimension gives incomplete picture.
3. Prevention is vastly more effective than cure—catching errors at data creation (validation, standardization, good process design) costs far less than cleaning bad data later. Build quality in, don't inspect it in.
4. Detection requires multiple approaches—data profiling, validation rules, duplicate detection, cross-system reconciliation, trend monitoring, and user feedback all play roles. No single technique catches all issues.
5. Organizational capability matters more than technology—clear ownership, governance, standards, accountability, and quality culture drive success. Tools enable good processes but can't replace them.
6. Balance quality with pragmatism—perfect data is impossible and unnecessary. Focus quality efforts where impact is highest, accept lower quality where stakes are low, improve incrementally rather than seeking perfection.
7. Quality builds trust, poor quality destroys it—once users encounter errors, they stop trusting data systems. Rebuilding trust is far harder than building it initially. Consistent quality over time creates data-driven culture; poor quality kills it.
As quality pioneer W. Edwards Deming emphasized: "You cannot inspect quality into a product." Similarly, you can't clean quality into data after the fact—you must build systems and processes that produce quality data from the start.
British Airways learned this lesson expensively. Your organization can learn it more cheaply by treating data quality as the foundation it is—not an afterthought to address when things go wrong.
Garbage in, garbage out. The corollary: Quality in, insight out. Your choice of investment determines which path you take.
References
Batini, C., & Scannapieco, M. (2016). Data and information quality: Dimensions, principles and techniques. Springer. https://doi.org/10.1007/978-3-319-24106-7
English, L. P. (1999). Improving data warehouse and business information quality: Methods for reducing costs and increasing profits. Wiley.
Haug, A., Zachariassen, F., & van Liempd, D. (2011). The costs of poor data quality. Journal of Industrial Engineering and Management, 4(2), 168–193. https://doi.org/10.3926/jiem.2011.v4n2.p168-193
IBM. (2016). Industrializing data quality: IBM Redbooks white paper. IBM Corporation.
Loshin, D. (2010). The practitioner's guide to data quality improvement. Morgan Kaufmann.
Nagle, T., Redman, T. C., & Sammon, D. (2017). Only 3% of companies' data meets basic quality standards. Harvard Business Review. https://hbr.org/2017/09/only-3-of-companies-data-meets-basic-quality-standards
Redman, T. C. (1998). The impact of poor data quality on the typical enterprise. Communications of the ACM, 41(2), 79–82. https://doi.org/10.1145/269012.269025
Redman, T. C. (2013). Data driven: Profiting from your most important business asset. Harvard Business Review Press.
Sebastian-Coleman, L. (2013). Measuring data quality for ongoing improvement: A data quality assessment framework. Morgan Kaufmann.
Wang, R. Y., & Strong, D. M. (1996). Beyond accuracy: What data quality means to data consumers. Journal of Management Information Systems, 12(4), 5–33. https://doi.org/10.1080/07421222.1996.11518099