AI tools can generate fluent, confident-sounding answers on almost any topic. That fluency is useful and dangerous in equal measure. A response can cite a paper that does not exist, quote a statistic with the wrong number, or describe an expert position that no expert holds - and do all of this without any signal that something has gone wrong.

This checklist is a structured protocol for evaluating an AI-generated answer before you act on it, share it, or use it in work that matters. It applies to any large language model output: ChatGPT, Claude, Gemini, Copilot, Perplexity, or any system that generates natural-language responses.

It is not a complete guide to AI systems. It is a practical evaluation tool designed to be used in real time, in about two minutes, on any claim that carries consequence.

Who This Is For

This checklist is designed for:

  • Students using AI tools for research, writing, or study.
  • Educators teaching AI literacy or critical evaluation skills.
  • Knowledge workers using AI in professional research, decision support, or writing.
  • Anyone who receives an AI-generated answer and needs to decide how much to trust it.

You do not need a technical background to use this checklist. You need a willingness to spend two minutes before trusting something that sounds authoritative.

What the Checklist Does Not Cover

This checklist covers source and claim evaluation. It does not cover legal compliance, accessibility requirements, or the ethics of deploying AI in specific professional contexts. For high-stakes domains - medicine, law, mental health, financial decisions - this checklist is a starting point, not a substitute for domain expertise.

The AI Source Trust Checklist

Work through each check in order. A failing check is not automatically disqualifying, but it raises the verification requirement for everything that follows.

Check 1: Is a Source Present?

What to look for: Does the AI response name a specific source - a paper, book, dataset, institution, or named expert - for the claim being made?

Red flags: Vague phrases like "studies show," "experts agree," "research indicates," or "according to scientists" without naming any specific study, researcher, or institution. These constructions can be accurate summaries of real evidence or entirely fabricated consensus. You cannot tell which without a named source.

Pass: A named source is present (author name, paper title, journal, institution, year, or URL).
Fail: No source named. Evaluate the claim with higher skepticism or seek an independent source before proceeding.

Check 2: Is the Source Reachable?

What to look for: Does the named source actually exist and can you access it?

A common form of AI hallucination is the fabricated citation: real-sounding author names, plausible journal names, legitimate-looking titles, and invented DOIs or URLs. The citation looks authoritative but is entirely fictional.

How to check:

  • Search the exact title in Google Scholar, PubMed, or the institution's own search.
  • Paste the DOI or URL into a browser and confirm the page exists.
  • If only an author and topic are given, search the author's publication list.

Pass: The source is findable and the page or abstract matches what the AI described.
Fail: The source cannot be found. Treat the claim as unsupported until you locate independent evidence.
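
The DOI step above can be partly automated. The following is a minimal Python sketch, not a definitive tool: the function names `doi_to_url` and `doi_resolves` are hypothetical, the DOI in the usage line is a made-up placeholder, and the pattern only checks that a string is shaped like a modern DOI. It assumes the public doi.org resolver and uses only the standard library.

```python
import re
import urllib.error
import urllib.request

# Shape of a modern DOI: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(raw):
    """Return a doi.org resolver URL, or None if the text is not DOI-shaped."""
    doi = raw.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    doi = doi.strip()
    if not DOI_PATTERN.match(doi):
        return None
    return "https://doi.org/" + doi


def doi_resolves(raw, timeout=10.0):
    """Ask the resolver whether the DOI leads to a reachable page.

    False means "could not confirm", not "proven fake": some publisher
    sites reject automated requests even for real papers.
    """
    url = doi_to_url(raw)
    if url is None:
        return False
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "checklist-sketch"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, TimeoutError):
        return False


# Hypothetical placeholder DOI, used only to show the URL that gets built.
print(doi_to_url("doi:10.1234/example.5678"))
```

A DOI that resolves only shows that some page exists. You still need to open it and confirm that the title and authors match what the AI described.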

Check 3: Is the Source Current Enough for the Claim?

What to look for: Is the publication date of the source appropriate for the type of claim being made?

Currency matters differently depending on the claim type. A 2003 paper on Stoic philosophy can still be highly relevant. A 2003 paper on AI capabilities is outdated by years of accelerating development. A 2020 study on COVID-19 treatment protocols may have been superseded by later research and regulatory guidance.

Ask:

  • Is this a domain where evidence changes rapidly (medicine, technology, policy, economics)?
  • Could more recent research or events have changed the picture?
  • Does the AI response give a publication year?

Pass: The source date is appropriate for the claim type and domain.
Fail: The source is outdated for a fast-moving domain, or no date is given. Check for more recent evidence.

Check 4: What Type of Claim Is This?

Why this matters: Different claim types require different evidence standards. Conflating them is one of the most common ways AI responses mislead without technically lying.

Identify which of the following best describes the claim:

| Claim Type | Example | Evidence Standard |
| --- | --- | --- |
| Fact | "The Ebbinghaus forgetting curve was first published in 1885." | Verifiable from primary source or authoritative reference. |
| Statistic | "73% of employees report feeling disengaged." | Requires named study, sample size, methodology, and date. |
| Interpretation | "This suggests that spaced repetition improves long-term retention." | Requires supporting evidence; check whether interpretation is well-established or contested. |
| Forecast | "AI will automate 40% of jobs by 2030." | Requires named model or analysis; forecasts are inherently uncertain and vary widely. |
| Advice | "You should study in 25-minute focused sessions." | Requires evidence base for the advice and disclosure of assumptions about your situation. |
| Opinion | "The most important skill for knowledge workers is clear communication." | No verification needed, but should be labeled as perspective, not finding. |

Pass: You have identified the claim type and the response treats it appropriately (opinions are not presented as facts; forecasts include uncertainty; statistics include sample information).
Fail: A statistical claim is presented without sample size or methodology. A forecast is stated as certain. An interpretation is presented as proven fact. Require more specificity before proceeding.
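
One way to make the evidence standards concrete is to treat them as a lookup table. The sketch below is an illustration only: the dictionary paraphrases the evidence standards listed for each claim type in this check, and the names `EVIDENCE_STANDARD` and `missing_evidence` are hypothetical. Deciding whether a response actually provides each item remains a human judgment.

```python
# Evidence each claim type should carry, paraphrased from this check.
EVIDENCE_STANDARD = {
    "fact": ["primary or authoritative reference"],
    "statistic": ["named study", "sample size", "methodology", "date"],
    "interpretation": ["supporting evidence", "established-or-contested status"],
    "forecast": ["named model or analysis", "stated uncertainty"],
    "advice": ["evidence base", "assumptions about your situation"],
    "opinion": ["labeled as perspective"],
}


def missing_evidence(claim_type, provided):
    """List the required evidence items that the response did not supply."""
    required = EVIDENCE_STANDARD[claim_type.lower()]
    return [item for item in required if item not in provided]


# A statistic that names its study and date but nothing else.
gaps = missing_evidence("Statistic", {"named study", "date"})
print(gaps)  # ['sample size', 'methodology']
```

An empty list corresponds to a Pass on this check. Any remaining items are the specifics to ask for before proceeding.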

Check 5: Does the Citation Agree With the Claim?

What to look for: When you read the actual source, does it say what the AI said it says?

AI systems can cite real papers to support claims those papers do not make, cite papers that contradict the claim, or describe findings at a different confidence level than the original research. This is distinct from fabrication - the paper exists, but it has been misrepresented.

How to check:

  • Read the abstract and conclusion of the cited paper, not just the title.
  • Check whether the study population, methodology, or scope match the claim being made.
  • Check whether the paper's stated confidence level (effect size, p-value, replication status) matches how the AI characterized the finding.

Pass: The source supports the claim at roughly the confidence level the AI described.
Fail: The source says something different, less certain, or narrower than claimed. Correct the claim before using it.

Check 6: Cross-Check One Claim Against an Independent Source

What to do: For the most important claim in the response, find one independent source that agrees, disagrees, or qualifies it - without using the AI as an intermediary.

AI responses can be internally self-consistent while being consistently wrong. Cross-checking forces contact with evidence the AI did not select. One independent confirmation is often enough to establish reasonable confidence. One independent contradiction is a signal to investigate further.

Independent sources to use:

  • Google Scholar or PubMed for empirical claims.
  • Government or institutional databases for statistics and policy claims.
  • Primary documents (legislation, trial registrations, official reports) for legal or regulatory claims.
  • Major reference works (encyclopedias, canonical textbooks) for definitional or historical claims.

Pass: An independent source confirms, qualifies, or provides context for the claim.
Fail: No independent source confirms the claim, or independent sources contradict it. Lower your confidence in the claim.

Check 7: Are There Fabrication Signals?

What to look for: Specific patterns that commonly indicate fabricated citations or invented evidence.

Fabrication signals:

  • A named author who has no findable publication record in the relevant field.
  • A journal name that does not appear in publisher databases or is very similar to a real journal name (e.g., "Journal of Applied Cognitive Research" vs. "Applied Cognitive Psychology").
  • A DOI that resolves to a different paper, a broken page, or an error.
  • A URL that produces a 404, redirects to an unrelated page, or contains a plausible but non-existent path.
  • A publication year inconsistent with when the topic could have been studied (e.g., a 2019 study on a term that was not coined until 2022).
  • A statistic with suspiciously round numbers and no methodology note.

Pass: No fabrication signals present.
Fail: One or more fabrication signals detected. Discard the citation and find independent evidence if the claim matters.
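
If you are auditing many citations, it can help to record what each manual lookup found and derive the signals from that record. The following Python sketch assumes a hypothetical `CitationFindings` record whose fields you fill in by hand after searching. It covers the author, journal, DOI, URL, and date signals from the list above, and leaves out the round-number signal because that one is a judgment call.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CitationFindings:
    """Results of manual lookups for one citation (all field names hypothetical)."""
    author_has_publications: bool
    journal_in_publisher_database: bool
    doi_resolves_to_cited_paper: Optional[bool]  # None when no DOI was given
    url_loads_expected_page: Optional[bool]      # None when no URL was given
    publication_year: Optional[int] = None
    topic_earliest_year: Optional[int] = None    # e.g. year the term was coined


def fabrication_signals(f: CitationFindings) -> List[str]:
    """Return the fabrication signals present in the recorded findings."""
    signals = []
    if not f.author_has_publications:
        signals.append("author has no findable publication record")
    if not f.journal_in_publisher_database:
        signals.append("journal not found in publisher databases")
    if f.doi_resolves_to_cited_paper is False:
        signals.append("DOI is broken or resolves to a different paper")
    if f.url_loads_expected_page is False:
        signals.append("URL is broken or leads to an unrelated page")
    if (f.publication_year is not None
            and f.topic_earliest_year is not None
            and f.publication_year < f.topic_earliest_year):
        signals.append("publication year predates the topic")
    return signals


# A 2019 study on a term that was not coined until 2022.
findings = CitationFindings(True, True, None, None,
                            publication_year=2019, topic_earliest_year=2022)
print(fabrication_signals(findings))  # ['publication year predates the topic']
```

An empty list corresponds to a Pass. Any entry corresponds to a Fail, in which case the citation is discarded as described above.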

Check 8: Does This Topic Require Higher Evidence?

What to look for: Is this claim in a domain where errors carry significant risk?

Certain topics warrant stronger verification regardless of how confident the AI response sounds:

  • Health and medicine: Symptoms, diagnoses, treatments, medications, clinical recommendations.
  • Mental health: Psychological conditions, therapeutic approaches, crisis guidance.
  • Law and regulation: Legal obligations, rights, compliance requirements, jurisdiction-specific rules.
  • Finance: Investment decisions, tax guidance, financial risk calculations.
  • Safety: Emergency procedures, physical safety, product safety.
  • Credentials and identity: Claims about specific people, their qualifications, or their stated positions.

For claims in these domains: verify with a domain professional or authoritative institutional source before acting, regardless of how well the previous checks scored.

Pass: Not a high-stakes domain, or high-stakes domain with authoritative source verified.
Fail: High-stakes domain, claim not independently verified by a domain professional or authoritative institution.

Check 9: Are Uncertainty and Limitations Explicit?

What to look for: Does the AI response acknowledge where evidence is limited, contested, or absent?

Good responses to uncertain questions include phrases like "the evidence is mixed," "this is an active area of debate," "this has not been studied in your specific population," or "this reflects the current best understanding, which may change." Responses that present contested topics as settled, or that give confident answers to questions with genuinely uncertain answers, are not reflecting the evidence accurately.

Ask:

  • Is this a topic where experts genuinely agree, or is there active debate?
  • Does the AI response acknowledge complexity, exceptions, or context dependencies?
  • Does the response distinguish between "we don't know" and "this is not the case"?

Pass: Uncertainty is acknowledged where appropriate; the response does not overstate consensus.
Fail: A contested or uncertain topic is presented as settled. Investigate the actual state of evidence before proceeding.

Check 10: Does the Verification Depth Match the Decision Risk?

What to look for: Are you applying the right level of scrutiny given what you will do with this information?

Verification is not free. The goal is calibrated scrutiny, not maximum scrutiny for everything. A rough framework:

| Decision Risk | Example Use | Appropriate Verification |
| --- | --- | --- |
| Low | Background understanding, casual reading, initial exploration | Checks 1, 7 minimum. One cross-check if acting on the information. |
| Medium | Writing a professional document, teaching a concept, making a moderate decision | All 10 checks. Find one independent source for key claims. |
| High | Academic research, publication, policy decision, professional advice, clinical or legal context | All 10 checks. Primary source review. Domain expert verification. Do not rely on AI-generated citations alone. |

Pass: Your verification effort matches what you are actually going to do with the information.
Fail: High-stakes use with minimal verification. Stop and apply appropriate checks before proceeding.
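
The risk framework in this check can be written as a small lookup. This is a sketch under stated assumptions, not part of the checklist itself: the function names are hypothetical, and it reads "one cross-check" at low risk as Check 6, the independent cross-check.

```python
ALL_CHECKS = list(range(1, 11))  # Checks 1 through 10


def required_checks(risk, acting_on_it=False):
    """Return the check numbers to run for a given decision risk."""
    level = risk.strip().lower()
    if level == "low":
        checks = {1, 7}
        if acting_on_it:
            checks.add(6)  # assumption: "one cross-check" means Check 6
        return sorted(checks)
    if level in ("medium", "high"):
        return list(ALL_CHECKS)
    raise ValueError(f"unknown risk level: {risk!r}")


def extra_steps(risk):
    """Return verification steps beyond the numbered checks."""
    level = risk.strip().lower()
    if level == "medium":
        return ["independent source for key claims"]
    if level == "high":
        return ["primary source review", "domain expert verification"]
    return []


print(required_checks("low"))                     # [1, 7]
print(required_checks("low", acting_on_it=True))  # [1, 6, 7]
print(extra_steps("High"))  # ['primary source review', 'domain expert verification']
```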

Scoring Rubric

After working through the checklist, use this rubric to characterize your overall trust level in the AI response.

High Trust

All relevant checks pass. Source is named, reachable, current, and agrees with the claim. Claim type is correctly identified. No fabrication signals. Uncertainty is acknowledged. Appropriate for medium-risk decisions and professional use with documentation of sources used.

Medium Trust

Most checks pass with one or two minor gaps (e.g., source is present but abstract-only, currency is slightly dated, minor uncertainty not acknowledged). Use with documented caveats. Do not use for high-stakes decisions without additional verification.

Low Trust

Multiple checks fail. Source is absent, unreachable, misrepresented, or shows fabrication signals. Claim type is unclear or misleadingly presented. Do not use in professional, academic, or high-stakes contexts without independent sourcing. Suitable only for rough orientation with explicit caveats that the information is unverified.

Discard

Source confirmed fabricated (paper does not exist, author is fictional). Citation directly contradicts the claim. High-stakes domain claim with no independent verification pathway. Do not use or share.
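
For record-keeping across many claims, the rubric can be approximated in code. The sketch below is one possible reading rather than an official scoring rule: the function name `trust_level` and its parameters are hypothetical, and the cut-off of two failures for Medium Trust is an assumption drawn from "one or two minor gaps". A count cannot tell a minor gap from a serious one, so treat the result as a starting point for judgment.

```python
def trust_level(results, confirmed_fabricated=False,
                source_contradicts_claim=False,
                high_stakes_unverifiable=False):
    """Map check outcomes to a rubric level.

    results maps check number -> True (pass), False (fail),
    or None (not applicable or not run).
    """
    if confirmed_fabricated or source_contradicts_claim or high_stakes_unverifiable:
        return "Discard"
    failures = [number for number, passed in results.items() if passed is False]
    if not failures:
        return "High Trust"
    if len(failures) <= 2:  # assumption: "one or two minor gaps"
        return "Medium Trust"
    return "Low Trust"


# Checks 1-7 run; Check 2 failed, Check 6 was not run.
outcome = trust_level({1: True, 2: False, 3: True, 4: True,
                       5: True, 6: None, 7: True})
print(outcome)  # Medium Trust
```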

Worked Examples

Example 1: A Learning Science Claim

AI response: "According to a 2015 meta-analysis by Dunlosky et al., retrieval practice produces significantly larger learning gains than re-reading or highlighting, with effect sizes typically in the range of 0.5 to 0.8."

Running the checklist:

  • Check 1 (Source present): Pass. Named author, year, and study type provided.
  • Check 2 (Source reachable): Pass with a flag. Dunlosky et al. (2013, not 2015) is a real review published in Psychological Science in the Public Interest and findable in Google Scholar. The year is slightly off, which is a mild flag but not disqualifying.
  • Check 3 (Currency): Pass. Learning science evidence from 2013 on a well-established effect is appropriate.
  • Check 4 (Claim type): Statistical/empirical claim attributed to a meta-analysis. Requires a methodology check.
  • Check 5 (Citation agrees): The Dunlosky review does support retrieval practice over re-reading. Effect size range is broadly consistent with the literature, though the specific figures should be verified against the paper.
  • Check 7 (Fabrication): Author is real, study is real. Year error is minor and common. No fabrication indicators.
  • Check 8 (High stakes): No. Learning science for study or teaching purposes - medium risk at most.

Result: Medium Trust. The source is real and broadly supports the claim, but the year is wrong and the effect size figures are unverified. Verify the exact figures against the paper before citing in academic work. Appropriate for teaching or study planning purposes as stated.

Example 2: A Technology Claim

AI response: "A 2023 Stanford study found that large language models hallucinate factual information in 27% of responses when asked about recent events."

Running the checklist:

  • Check 1 (Source present): Partial pass. Institution named, year given, but no author, paper title, or URL.
  • Check 2 (Source reachable): Cannot be verified from this information alone. "Stanford study" is not a specific source. Searches for Stanford LLM hallucination 2023 return multiple relevant papers; none immediately match this statistic.
  • Check 3 (Currency): Pass for the domain if the source is real.
  • Check 4 (Claim type): Statistic. Requires sample methodology, task type, and which models were tested.
  • Check 5 (Citation agrees): Cannot check without a locatable source.
  • Check 7 (Fabrication): A precise-sounding statistic (27%) with no methodology and no author - moderate fabrication signals.

Result: Low Trust. The statistic cannot be verified from the information given. Do not cite this figure. Search independently for LLM hallucination rate studies if you need a sourced statistic, and use the specific paper's methodology and scope rather than a decontextualized percentage.

Printable Checklist Version

Copy or print the following for classroom or professional use:

AI SOURCE TRUST CHECKLIST - WhenNotesFly Editorial
whennotesfly.com/technology/ai-machine-learning/ai-source-trust-checklist

CLAIM BEING EVALUATED:
_____________________________________________

CLAIM TYPE (circle): Fact / Statistic / Interpretation / Forecast / Advice / Opinion

[ ] Check 1: A specific source is named (author, paper, institution, URL, or year).
[ ] Check 2: The source is findable and exists at the location or in the database cited.
[ ] Check 3: The source is current enough for the type of claim and domain.
[ ] Check 4: The claim type is identified and treated appropriately (stats have methodology; forecasts show uncertainty; opinions are not presented as facts).
[ ] Check 5: The actual source, when read, supports the claim at roughly the confidence level described.
[ ] Check 6: At least one independent source (not AI-generated) confirms or qualifies the claim.
[ ] Check 7: No fabrication signals are present (invented author, broken DOI, implausible date, suspiciously round statistic).
[ ] Check 8: If a high-stakes domain (health, law, finance, mental health, safety), verified with a domain professional or authoritative institution.
[ ] Check 9: Uncertainty and limitations are acknowledged where the evidence is genuinely mixed or limited.
[ ] Check 10: Verification depth matches the risk level of the decision this will inform.

RESULT:
[ ] High Trust - all relevant checks pass
[ ] Medium Trust - minor gaps, use with documented caveats
[ ] Low Trust - multiple failures, do not use without independent sourcing
[ ] Discard - source confirmed fabricated or directly contradictory

NOTES:
_____________________________________________

Using This Checklist in Educational Settings

Suggested Learning Activity: Claim Audit

Learning outcome: Students can apply a structured evaluation protocol to AI-generated claims and distinguish verified from unverified information.

Duration: 20-30 minutes

Materials: This checklist (printed or on screen), access to Google Scholar or a library database, a short AI-generated text on a topic relevant to the course.

Procedure:

  1. Have students or participants generate a short AI response on any factual topic using a tool of their choice (ChatGPT, Claude, Gemini, Perplexity, or similar).
  2. Ask them to identify the three most specific factual claims in the response.
  3. For each claim, work through Checks 1-7 of the checklist.
  4. For the highest-stakes claim, complete Checks 8-10.
  5. Assign a Trust level to each claim and explain the reasoning.
  6. Debrief: Which checks caught the most problems? Where was the AI reliable? Where was it not?

Variation for professional learning: Use a domain-relevant AI response (e.g., a summary of current regulations, a description of a technical process, a set of career advice). Focus Checks 8-10 on the specific risk level of the professional context.

Limitations of This Checklist

This checklist does not eliminate the need for domain expertise. A person with deep knowledge of a field will catch errors this checklist does not - they will recognize implausible claims, know which researchers hold which positions, and notice when a paper's findings have been superseded or contested in ways not visible in a single abstract.

This checklist is designed for the gap between expert knowledge and zero scrutiny. It provides structured prompts for people who want to evaluate AI output carefully but lack the domain depth to spot errors automatically.

It also does not address questions of AI bias, representational harm, or the ethics of AI deployment in specific contexts. Those are important questions that require frameworks beyond source evaluation.

Sources and Evidence Base

This checklist draws on the following evidence base and published frameworks:

  • Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the dangers of stochastic parrots: Can language models be too big? Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency.
  • Dunlosky, J., Rawson, K. A., Marsh, E. J., Nathan, M. J., & Willingham, D. T. (2013). Improving students' learning with effective study techniques. Psychological Science in the Public Interest, 14(1), 4-58.
  • Maynez, J., Narayan, S., Bohnet, B., & McDonald, R. (2020). On faithfulness and factuality in abstractive summarization. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics.
  • SIFT method: Caulfield, M. (2019). Web Literacy for Student Fact-Checkers. Pressbooks. (Stop, Investigate the source, Find better coverage, Trace claims.)
  • Stanford History Education Group. (2021). Civic Online Reasoning. sheg.stanford.edu
  • Wachsmuth, H., & Hou, Y. (2023). Towards a theory of the credibility of online sources. Computational Argumentation Research.
  • NAMLE (2019). Media Literacy Framework. namle.net

About This Resource

This checklist was researched and written by the WhenNotesFly editorial team. It is reviewed against current AI capabilities and updated when the evidence base or AI tool landscape changes materially. For corrections, updates, or classroom adaptation questions: editorial@whennotesfly.com

Last reviewed: May 2026. Published by WhenNotesFly.

License: This checklist may be reproduced for non-commercial educational use with attribution: "WhenNotesFly AI Source Trust Checklist, whennotesfly.com/technology/ai-machine-learning/ai-source-trust-checklist."

Frequently Asked Questions

What is an AI source trust checklist?

An AI source trust checklist is a structured set of evaluation steps for assessing whether an AI-generated answer is based on real, verifiable evidence. It covers checks like whether a source is named and reachable, whether the citation agrees with the claim, and whether fabrication signals are present.

How do I know if an AI is hallucinating?

Common hallucination signals include citations that cannot be found in academic databases, author names with no publication record, DOIs that resolve to different papers or broken pages, statistics with no methodology note, and confidence about topics where the evidence is actually contested. Check 7 of this checklist (fabrication signals) covers the most reliable indicators.

Can I use this checklist in the classroom?

Yes. This checklist includes a suggested 20-30 minute learning activity where students evaluate AI-generated claims using the 10-check protocol. It is designed to be photocopied or projected, and can be adapted to any subject area where AI-generated text is being used.

Does this checklist work for all AI tools?

Yes. The checklist is designed for any large language model output including ChatGPT, Claude, Gemini, Copilot, Perplexity, and similar systems. The core evaluation steps - source present, reachable, current, agrees with claim, free of fabrication signals - apply regardless of which tool produced the response.

What is the difference between a hallucination and a mistake?

In AI systems, hallucination refers specifically to the generation of confident-sounding output that has no basis in the training data or real-world evidence - fabricated citations, invented statistics, fictional expert quotes. A mistake is an error in reasoning or calculation where the underlying inputs are real. Both require scrutiny, but fabricated citations are a distinct failure mode specific to language models.