A researcher at a major technology company recently ran an experiment. She gave the same AI model the same task -- analyzing a dataset for market trends -- with two different prompts. The first said, "Analyze this data." The second said, "You are a senior market analyst. Analyze this sales dataset for Q3 trends, focusing on regional variations. Present your findings as three key insights with supporting data points, formatted as bullet points." The first prompt produced a vague, generic summary. The second produced analysis that her team described as "genuinely useful." Same model, same data, dramatically different results. The difference was not intelligence but instruction.

Prompt engineering is the practice of crafting inputs to AI language models to reliably produce high-quality outputs. The term can sound overly technical for what is essentially the craft of communicating clearly with a powerful but unusual kind of intelligence -- one that is highly capable but also literal, context-dependent, and sensitive to framing in ways that human communication typically is not. Understanding how to communicate effectively with AI systems is increasingly a fundamental professional skill, relevant to anyone who uses AI tools in their work.

This article covers the established best practices for prompting large language models, including specific techniques, the reasoning behind each, and examples of the difference between effective and ineffective prompts. It is organized from foundational principles through advanced techniques, appropriate for both new AI users and experienced practitioners looking to improve their results.


The Foundation: What Prompts Actually Do

"Prompting is the art of creating conditions where the model's statistical tendencies align with your actual needs. The better you understand what the model is doing, the more reliably you can shape it." -- Andrej Karpathy, 2023

| Technique | Description | When to Use | Example |
| --- | --- | --- | --- |
| Role assignment | Assign the model a specific professional identity to frame its response | When domain expertise, tone, or perspective matters | "You are a senior software architect reviewing code for production readiness." |
| Few-shot examples | Provide examples of the desired input-output pattern | When format or style must be very specific and hard to describe abstractly | Show three examples of the desired analysis format before asking for the actual analysis |
| Chain-of-thought | Ask the model to reason step by step before answering | Complex multi-step reasoning tasks; math; logical problems | "Think through this step by step before giving your final answer." |
| Output format specification | Specify exactly how the response should be structured | When the output must integrate with downstream processes or readers | "Format your response as three bullet points, each under 20 words." |
| Context provision | Give relevant background information the model cannot infer | Domain-specific tasks; tasks involving your specific situation | Provide product documentation before asking for feature prioritization advice |
| Constraint specification | Explicitly state what the model should not do | When default behavior produces unwanted content or scope | "Do not include caveats or disclaimers. Focus only on actionable recommendations." |

Before examining specific techniques, it helps to understand what prompts actually do to AI models, because that understanding clarifies why they work or fail.

Large language models generate responses one token (roughly one word or word fragment) at a time, with each token selected based on the probability distribution over possible next tokens given everything that came before. The prompt is the initial context that shapes these probability distributions. A vague prompt creates a wide probability distribution -- many responses are plausible -- and the model produces something statistically likely but not specifically aimed at your need. A well-crafted prompt constrains the probability distribution toward responses in the range you actually want.

This statistical nature of language model generation explains several important prompt engineering principles:

  • Specificity narrows the distribution: More specific prompts produce responses in a narrower range
  • Context shifts the distribution: Context (role, format, audience) shifts which responses are most probable
  • Examples constrain the distribution: Examples show the model what "right" looks like, which shapes subsequent generation
  • Format instructions shape structure: Format requirements constrain not just content but how it is organized

Core Principles of Effective Prompting

Principle 1: Be Specific About the Task

The single most consistent predictor of prompt quality is task specificity. Vague prompts produce vague responses; specific prompts produce specific responses.

Elements of task specification:

What you want the AI to do: The verb matters. "Analyze," "summarize," "critique," "rewrite," "explain," and "generate" all request different cognitive operations. Be explicit about which one you want.

What input you are providing: Describe the input, especially if it could be interpreted multiple ways. "The attached transcript" is less clear than "the transcript of a 20-minute customer interview conducted with a SaaS startup founder."

What output you need: How long? What format? What level of detail? What perspective or tone? Each of these specifications constrains the output toward something useful for your purpose.

Weak prompt: "Summarize this article."

Strong prompt: "Summarize this article in 3-5 bullet points that capture the key findings. Write for an executive audience with limited time who wants to understand the practical implications for their business."
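The difference between these two prompts can be made mechanical. The sketch below shows a hypothetical helper (the function and field names are illustrative, not any standard API) that assembles the elements of task specification -- task, format, length, audience -- into one prompt string:

```python
def build_prompt(task, audience=None, output_format=None, length=None):
    """Assemble a specific prompt from task, format, length, and audience.

    A hypothetical helper: the phrasing and field names are illustrative
    conventions, not a standard API.
    """
    parts = [task]
    if output_format:
        parts.append(f"Format: {output_format}.")
    if length:
        parts.append(f"Length: {length}.")
    if audience:
        parts.append(f"Audience: {audience}.")
    return " ".join(parts)

prompt = build_prompt(
    task="Summarize this article in bullet points that capture the key findings.",
    output_format="3-5 bullet points",
    length="each bullet under 25 words",
    audience="executives with limited time who want the practical business implications",
)
```

Even this trivial structure forces the prompt writer to answer the three specification questions (what task, what input framing, what output) before sending anything to a model.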

Principle 2: Provide Relevant Context

Language models have no inherent knowledge of your situation. They cannot know that you are a doctor asking about a medication for a patient rather than a curious layperson, that the email you want help writing is to your most difficult client, or that the audience for the presentation is skeptical of the technology being presented.

Context that consistently improves output quality:

Your role and expertise: "As a software architect reviewing this design proposal..." or "I'm a first-year medical student trying to understand..." allows the model to calibrate the depth and vocabulary of its response appropriately.

The audience: "Write this for a technically sophisticated audience who will ask hard questions about the methodology" produces a different result than "write this for a general audience at a community event."

The purpose: "I need this to be persuasive" produces different emphasis than "I need this to be comprehensive" or "I need this to help me think through the problem."

The constraints: Word limits, time constraints, available resources, and organizational constraints all affect what a good response looks like.

Example: A prompt for a product manager might read: "I'm preparing a one-page business case for internal stakeholders who are skeptical about the cost. I need to present the ROI argument for implementing customer service AI. The audience is financially oriented and will want hard numbers where possible. Draft this business case using the following cost data [data]."

This context shapes every aspect of an appropriate response: the tone, the emphasis on financial metrics, the acknowledgment of skepticism, the appropriate length.

Principle 3: Specify the Format You Need

Format instructions are among the most reliably effective prompt techniques because they directly constrain what the model produces.

Useful format specifications:

Length: "In 200 words or less," "in a single paragraph," "in a comprehensive analysis of at least 500 words," or simply "briefly" or "thoroughly." Length specifications prevent the common problem of responses that are either superficially brief or exhaustively detailed when you needed the opposite.

Structure: "As a numbered list," "as a table," "as headers and sub-bullets," "as a narrative prose summary." Structure specifications make output immediately usable rather than requiring reformatting.

Sections or components: "Include: an executive summary, three key findings, and recommended next steps." This ensures completeness and matches the output to a specific use case.

Perspective or voice: "Write in the active voice," "write from the customer's perspective," "write in a tone appropriate for a legal document."

Exclusions: "Do not include disclaimers," "avoid jargon," "do not recommend specific products." Exclusions prevent common model tendencies that are unhelpful for your specific purpose.

Principle 4: Use Examples (Few-Shot Prompting)

Providing examples of what you want -- showing the model what a good response looks like -- is one of the most powerful techniques in prompt engineering. This approach, called few-shot prompting, consistently outperforms detailed instructions alone for tasks with a specific desired style or format.

Why examples work better than instructions: Instructions describe what you want; examples demonstrate it. A model that has seen three examples of the specific type of analysis you are requesting is much better positioned to produce a similar analysis than one that has read a paragraph describing that type of analysis. Examples constrain the probability distribution toward the style and content of the examples.

Applying few-shot prompting:

For classification tasks: Provide labeled examples. "Classify the sentiment of these customer reviews as positive, negative, or neutral. Examples: 'Great product, fast shipping' = positive; 'The item broke after one week' = negative; 'It works as described, nothing more' = neutral. Now classify the following reviews: [reviews]."

For writing style tasks: Provide examples of the desired style. "Write three product descriptions in this style: [examples]. Now write a description for [product]."

For structured outputs: Provide an example of the desired output structure. "Extract the key information from each customer support ticket in this format: [example with filled-in format]. Now extract from the following tickets: [tickets]."
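The classification pattern above can be assembled programmatically. This is a minimal sketch; the "Review: ... -> label" layout is one common convention for few-shot examples, not a requirement:

```python
def few_shot_prompt(instruction, examples, query):
    """Build a few-shot classification prompt from labeled examples.

    `examples` is a list of (input_text, label) pairs. The trailing
    unfinished line invites the model to supply the next label.
    """
    lines = [instruction, "", "Examples:"]
    for text, label in examples:
        lines.append(f"Review: {text!r} -> {label}")
    lines.append("")
    lines.append(f"Review: {query!r} ->")
    return "\n".join(lines)

prompt = few_shot_prompt(
    "Classify the sentiment of each customer review as positive, negative, or neutral.",
    [
        ("Great product, fast shipping", "positive"),
        ("The item broke after one week", "negative"),
        ("It works as described, nothing more", "neutral"),
    ],
    "Arrived late but works fine",
)
```

Keeping examples in a data structure rather than a hand-written string also makes it easy to swap in better examples later, which matters given that example quality tends to outweigh example quantity.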

Principle 5: Break Complex Tasks into Steps

For complex, multi-step tasks, a single prompt that asks for everything simultaneously often produces worse results than a series of prompts that guide the model through intermediate steps.

This principle reflects how good reasoning works: complex problems benefit from decomposition into subproblems, sequential processing, and accumulation of intermediate results. A model that is asked to "analyze this market and produce a comprehensive competitive strategy" must simultaneously gather insights, synthesize them, and structure recommendations. A model that is first asked to "list the key competitors and their primary advantages," then asked to "identify the gaps that these competitors are not addressing," and then asked to "propose three strategic positions that could differentiate our product" is guided through a reasoning process that produces more reliable results.

Chain-of-thought prompting: Research published by Google in 2022 demonstrated that prompting language models to "think step by step" before answering complex questions substantially improves performance on reasoning tasks. Including the phrase "Let's think through this step by step" or "Show your reasoning before providing your conclusion" in prompts that require reasoning consistently improves output quality.

Example: For a complex analytical task, rather than a single prompt asking for the complete analysis, consider:

  1. "List the five most important trends affecting this industry based on the provided data."
  2. "For each trend, identify whether it represents an opportunity or a threat for a company with [specific characteristics]."
  3. "Based on this analysis, what are the three most important strategic priorities?"

Each step builds on the previous, and the intermediate outputs can be reviewed and corrected before proceeding.
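The stepwise pattern can be sketched as a short loop in which each prompt carries the previous questions and answers forward. The `call_model` stub below stands in for a real LLM API call so the sketch runs offline; in practice you would also inspect each intermediate answer before continuing:

```python
def call_model(prompt):
    """Stand-in for a real LLM API call; returns a canned response so the
    sketch runs without network access."""
    return f"[model response to: {prompt[:40]}...]"

steps = [
    "List the five most important trends affecting this industry based on the provided data.",
    "For each trend, identify whether it is an opportunity or a threat for our company.",
    "Based on this analysis, what are the three most important strategic priorities?",
]

context = []
for step in steps:
    # Feed the prior questions and answers back in so each step builds on the last.
    full_prompt = "\n\n".join(context + [step])
    answer = call_model(full_prompt)
    context.append(f"Q: {step}\nA: {answer}")
```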


Advanced Techniques

Role Prompting

Assigning the AI a specific role or persona can significantly affect the quality and character of responses in certain contexts. "You are an expert copywriter who specializes in direct-response marketing" produces different output than no role specification, especially for tasks where domain expertise matters.

When role prompting is most effective:

  • Tasks with clear domain expertise requirements (legal analysis, medical explanation, financial modeling)
  • Tasks where a specific perspective or voice is important (writing in a particular author's style, analyzing from a specific viewpoint)
  • Tasks where professional norms or standards apply (technical writing, formal communication)

Role prompting caveats: Role prompting does not give the model knowledge it doesn't have -- a model playing a "cybersecurity expert" does not have more cybersecurity knowledge than the same model without the role, though the role can help it organize and present existing knowledge more appropriately. Role prompting also does not override the model's safety training; a model playing a "character who knows how to hack systems" will not produce actually harmful hacking instructions.
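In chat-style APIs, the role assignment typically lives in a system message that frames every subsequent turn. The field names below ("role", "content") follow a widely used message convention, but the exact format varies by provider:

```python
# Chat-style message list: the system message carries the role assignment,
# the user message carries the actual task. Field names follow a common
# convention; check your provider's API documentation for the exact schema.
messages = [
    {
        "role": "system",
        "content": "You are a senior software architect reviewing code "
                   "for production readiness.",
    },
    {
        "role": "user",
        "content": "Review the attached module for error handling and "
                   "concurrency issues.",
    },
]
```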

Negative Instructions and Constraint Specification

Specifying what you do NOT want can be as valuable as specifying what you do want, because models sometimes have systematic tendencies that produce unhelpful outputs.

Common effective negative instructions:

  • "Do not begin your response with 'Of course!' or similar affirmations."
  • "Do not include disclaimers about seeking professional advice unless specifically relevant."
  • "Do not summarize what you are about to do; just do it."
  • "Avoid hedging language; state your conclusions directly."
  • "Do not include any information not directly requested."

These instructions address common model tendencies that are appropriate in some contexts but counterproductive in others.

Iterative Refinement

Prompt engineering is rarely a single-shot process. The most effective approach to complex or important tasks is iterative: produce an initial output, identify what is wrong or missing, and prompt for revisions that address those specific issues.

The revision prompt pattern: After an initial output, prompting with specific critiques produces more targeted improvements than trying to create a perfect initial prompt. "Revise the third paragraph to be more concise" or "add a section addressing how this approach handles edge cases" gives the model specific, actionable direction.

Building context across turns: In multi-turn conversations, the context of previous turns is available to the model. This allows building toward complex outputs gradually, with the ability to steer at each stage.
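The revision pattern can be sketched as a loop that appends each critique as a new conversational turn. `call_model` here is a stand-in for any function that takes a message list and returns text; the lambda at the bottom is a stub so the sketch runs offline:

```python
def refine(draft_prompt, critiques, call_model):
    """Run an initial prompt, then apply each critique as a follow-up turn.

    `call_model` is any function taking a message list and returning text;
    a real implementation would call an LLM API here.
    """
    messages = [{"role": "user", "content": draft_prompt}]
    output = call_model(messages)
    for critique in critiques:
        # Keep the prior draft in context, then steer with a specific critique.
        messages.append({"role": "assistant", "content": output})
        messages.append({"role": "user", "content": critique})
        output = call_model(messages)
    return output

# Stub model for illustration: echoes the most recent user turn.
result = refine(
    "Draft a one-page business case for customer service AI.",
    ["Revise the third paragraph to be more concise.",
     "Add a section addressing edge cases."],
    call_model=lambda msgs: f"[draft after: {msgs[-1]['content']}]",
)
```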

Temperature and Sampling Parameters

Many AI platforms allow control over "temperature" and related sampling parameters that affect how creative versus conservative model outputs are.

Temperature (typically 0-1 or 0-2 in different platforms): Lower temperature produces more predictable, consistent outputs. Higher temperature produces more varied, sometimes more creative outputs.

  • Low temperature (0.0-0.3): Best for tasks where consistency and reliability are important: code generation, data extraction, classification, factual question answering
  • Medium temperature (0.5-0.7): Best for most writing, analysis, and conversational tasks
  • High temperature (0.8-1.0+): Best for brainstorming, creative writing, and tasks where diversity of ideas is valued over consistency
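Temperature works by rescaling the model's raw token scores before they are converted into probabilities. The toy calculation below (a plain softmax over three hypothetical token scores) shows how a low temperature sharpens the distribution toward the top token and a high temperature flattens it:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores into a probability distribution. Lower temperature
    sharpens it toward the highest-scoring token; higher flattens it."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate next tokens
sharp = softmax_with_temperature(logits, 0.2)  # near-deterministic
flat = softmax_with_temperature(logits, 2.0)   # much more varied
```

With these numbers the low-temperature distribution puts almost all probability on the top token, while the high-temperature distribution spreads it nearly evenly, which is exactly why low settings suit extraction and classification and high settings suit brainstorming.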

Domain-Specific Prompt Patterns

For Code Generation

Effective code generation prompts specify:

  • Language and version: "Write this in Python 3.11 using only the standard library"
  • What the code must do: Specific input-output behavior
  • Constraints: Performance requirements, style guidelines, error handling requirements
  • What to avoid: Common pitfalls you want avoided
  • Test cases: Providing expected input-output pairs helps the model verify its own solution

Example code prompt: "Write a Python 3.11 function that takes a list of dictionaries and returns a new list sorted by the 'date' key in descending order. The date values are ISO 8601 formatted strings. Handle the case where the date key is missing by placing those items last. Include docstring and type hints."
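For comparison, one plausible implementation of what that prompt requests (the function name is illustrative) might look like:

```python
from typing import Any

def sort_by_date_desc(items: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return a new list sorted by the 'date' key in descending order.

    Date values are ISO 8601 formatted strings, which sort chronologically
    as plain strings. Items missing the 'date' key are placed last.
    """
    with_date = [d for d in items if "date" in d]
    without_date = [d for d in items if "date" not in d]
    with_date.sort(key=lambda d: d["date"], reverse=True)
    return with_date + without_date
```

Note how each clause of the prompt (new list, descending order, ISO 8601 strings, missing-key handling, type hints, docstring) maps to a visible decision in the code; that one-to-one correspondence is what a well-specified code prompt buys you.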

For Analysis and Research

Effective analysis prompts specify:

  • The analytical framework: "Analyze using [framework]" or "approach this as a [type of analysis]"
  • The level of critical engagement: "Evaluate strengths and weaknesses" vs. "assume validity and identify implications"
  • The sources of evidence: "Using only the provided document" vs. "drawing on your training knowledge"
  • The depth and completeness: "Identify the three most important factors" vs. "provide a comprehensive analysis of all factors"

For Writing Assistance

Effective writing assistance prompts specify:

  • The purpose and audience: What the writing must accomplish and for whom
  • The tone and voice: Formal/informal, technical/accessible, warm/authoritative
  • The length and format: Specific word count, structure, components
  • Style references or examples: "In the style of [example]" or "similar in tone to [example]"

Common Prompting Mistakes

Ambiguity: Prompts that can be interpreted multiple ways will be interpreted multiple ways -- and typically in ways that do not match what you wanted. Review prompts for ambiguous terms and resolve them explicitly.

Assuming context: Prompts that assume context the model does not have ("improve this based on our previous discussion" when there is no previous discussion in the current session) produce off-target responses.

Overly long, undifferentiated instructions: Prompts that include many instructions without clear structure or priority may have some instructions effectively ignored. Structure complex instructions clearly and prioritize the most important ones.

Asking for everything at once: Complex tasks decomposed into clear steps consistently outperform single complex prompts. The investment in decomposition is typically recovered through the quality improvement.

Not iterating: The first response is often the starting point for the best response. Treating prompting as a dialogue -- reviewing outputs, identifying what is wrong, prompting for specific improvements -- consistently produces better final results than trying to craft perfect first prompts.

Accepting outputs uncritically: AI language models produce outputs that may be incorrect, incomplete, outdated, or biased. Prompting skill includes the judgment to evaluate outputs critically rather than accepting them at face value.

As AI tools become central to knowledge work, prompting skill becomes a professional competency comparable to writing skill or spreadsheet skill -- basic proficiency is valuable, and genuine expertise produces substantially better results. The techniques in this article are not complex and do not require a technical background; they require only the discipline of thinking carefully about what you want and communicating it clearly.

See also: Practical AI Applications 2026, Training AI Models Explained, and Workflow Automation Ideas.


What Research Shows About Prompt Engineering Effectiveness

The systematic study of prompt engineering as a distinct discipline has accelerated significantly since 2022, producing quantitative evidence about which techniques deliver meaningful performance gains and under what conditions.

Jason Wei, a researcher at Google Brain, led the team that published "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models" (NeurIPS 2022). The study evaluated chain-of-thought prompting -- where models are explicitly asked to show their reasoning steps before producing a final answer -- across eight large language models and 23 benchmark datasets covering arithmetic, commonsense, and symbolic reasoning tasks. The key finding: chain-of-thought prompting produced accuracy improvements of up to 18.9 percentage points on the GSM8K math benchmark when applied to models with more than 100 billion parameters, but showed minimal improvement on smaller models. This parameter-count threshold finding has practical implications: chain-of-thought instruction phrases such as "Let's think step by step" reliably improve complex reasoning outputs on frontier models but should not be expected to help with smaller deployed models.

Jules White and colleagues at Vanderbilt University published "A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT" (2023), identifying 16 distinct prompt patterns -- structured prompt designs with consistent internal structure -- that addressed recurring prompting challenges across business domains. The catalog emerged from systematic analysis of prompts collected from developers, researchers, and business users across six months. Patterns including the "Persona Pattern" (assigning expert roles), "Recipe Pattern" (requesting step-by-step completion of an incomplete specification), and "Output Automater Pattern" (requesting automation scripts alongside explanations) were associated with consistently higher user satisfaction on complex tasks, measured across 340 prompt-task pairs evaluated by domain experts. White's research established that the most effective prompts share structural characteristics across domains: explicit context framing, output format specification, and role assignment together accounted for the majority of variance in output quality ratings.

Sander Schulhoff at the University of Maryland led the compilation of "The Prompt Report: A Systematic Survey of Prompting Techniques" (arXiv 2024), the most comprehensive systematic review of prompting research to date, covering 58 text-based prompting techniques and 40 techniques for non-text modalities (image, audio, video). The meta-analysis of 1,565 papers identified that few-shot prompting consistently outperforms zero-shot prompting by 8-15 percentage points on classification tasks, and that the quality of examples provided in few-shot prompts matters more than the quantity -- two high-quality examples typically outperform five mediocre ones. The review also documented a reproducibility problem in prompting research: the same prompting technique applied to the same task on the same model produced results that varied by up to 20 percentage points depending on minor wording differences in how the technique was implemented. Schulhoff's conclusion is that prompting is more sensitive to implementation details than the research literature typically acknowledges, and that practitioners should treat published prompting technique recommendations as starting points requiring validation in their specific deployment context.

JD Zamfirescu-Pereira at UC Berkeley published "Why Johnny Can't Prompt: How Non-AI Experts Try (and Fail) to Design LLM Prompts" (CHI 2023), a study that observed 20 non-expert participants attempting to design prompts for a chatbot application. The study found systematic patterns in how non-experts approach prompting that differ from expert approaches: non-experts relied heavily on natural-language instruction and failed to use examples even when examples would substantially improve performance; they abandoned promising prompt directions prematurely after single failures rather than iterating; and they consistently underestimated the value of explicit format specifications. Expert participants (defined as people with more than six months of deliberate prompt engineering experience) achieved task success rates approximately 60 percent higher than non-experts on the same prompting challenges, primarily through systematic use of few-shot examples, explicit output format specification, and iterative refinement. The study supports treating prompt engineering as a learnable skill with a meaningful competency gap between novice and expert practitioners.


Real-World Case Studies in Prompt Engineering Impact

GitHub Copilot and Structured Code Prompts. A 2022 study by researchers at GitHub, led by Sida Peng, Albert Xu, and colleagues, used a randomized controlled trial design with 95 professional developers to measure the productivity impact of AI-assisted coding. Developers assigned to use GitHub Copilot completed a JavaScript server implementation task 55.8 percent faster than the control group (median 71 minutes versus 160 minutes). The study found that prompt quality -- how developers phrased their natural language comments above functions to guide Copilot -- accounted for significant variation in suggestion quality within the treatment group. Developers who wrote descriptive function docstrings before implementation (providing Copilot with explicit context about inputs, outputs, and purpose) accepted suggestions at a 42 percent higher rate than developers who wrote minimal or no docstrings. The research, published as "The Impact of AI on Developer Productivity: Evidence from GitHub Copilot," established one of the first rigorous measurements of prompt quality's effect on downstream AI tool productivity in a professional setting.

Duolingo's GPT-4 Integration and Prompt Design. Duolingo, the language learning platform with over 500 million registered users, integrated GPT-4 into its "Duolingo Max" subscription in 2023, featuring a "Roleplay" exercise and "Explain My Answer" functionality. Edwin Bodge, Duolingo's principal engineer for AI features, described in a 2023 engineering blog post how prompt engineering accounted for the majority of development time. The team found that system prompts specifying the tutor persona (patient, encouraging, non-judgmental), the pedagogical constraints (never give away the answer directly; use the Socratic method), and the output format (respond only in the target language for roleplay; use simple vocabulary appropriate to the learner's level) were essential to producing educationally appropriate responses. Prompts without explicit pedagogical constraints produced responses that were linguistically correct but counterproductive for language learning -- providing direct translations rather than comprehension-building scaffolding. A/B testing of prompt variants showed that the explicit persona and constraint specification produced learner satisfaction scores 31 percent higher than prompts that relied on the model's default behavior.

Legal AI Prompt Standardization at Allen and Overy. Allen and Overy, a global law firm with 3,800 lawyers, became one of the first major law firms to deploy Harvey AI (an LLM-based legal research platform) at firm-wide scale in 2023. The firm's knowledge management team, led by practice innovation director Victoria Hobbs, developed a firm-wide prompt engineering guide for attorneys after an internal study found that unguided attorneys produced prompts that generated responses requiring substantial revision 68 percent of the time, while attorneys using structured prompt templates produced usable first drafts 71 percent of the time. The templates specified jurisdiction, area of law, applicable standards, audience for the output, and desired format -- the same elements that prompt engineering research identifies as critical for professional contexts. The firm reported in its 2024 annual innovation report that AI-assisted legal research reduced average research time on standard matters by 40 percent and that standardized prompt templates accounted for approximately half of that reduction, with the remaining half attributable to the underlying model capability.

Customer Service AI and Prompt Optimization at Klarna. Klarna's 2024 deployment of an AI customer service agent -- which by February 2024 was handling 2.3 million conversations per month and performing work equivalent to 700 full-time agents -- was underpinned by extensive prompt engineering for the system prompts governing agent behavior. Klarna's engineering team, in a technical writeup shared with the AI development community, described an iterative prompt development process spanning six months before launch, involving systematic measurement of resolution rate (the percentage of conversations resolved without human escalation), customer satisfaction scores, and average handle time. The team found that system prompts specifying escalation criteria (explicit rules for when the agent should hand off to a human), tone guidelines (empathetic but efficient), and information verification requirements (always confirm account details before processing changes) were the primary levers for improving resolution rate from an initial 42 percent to the published 66 percent. The prompt engineering investment -- six months of a four-person team -- was described by the company as the highest-ROI single technical investment in the product's development.


Frequently Asked Questions

What is prompt engineering and why does it matter?

Prompt engineering is the art and science of instructing AI models effectively. It matters because the same model produces vastly different results with different prompts: good prompting unlocks capabilities, while poor prompting wastes time. It is like learning to ask questions well, a skill that improves with practice.

What are basic principles of effective prompting?

Be specific with a clear task definition, provide context, specify the output format, give examples when helpful, break complex tasks into steps, set constraints and requirements, and iterate based on results. Clear instructions produce better outputs; vague prompts produce unpredictable results.

What is few-shot prompting and when to use it?

Few-shot prompting means providing examples of desired input-output pairs before asking for new output. Use it when a task needs a specific format, when transferring a style, or when instructions alone would be ambiguous. Examples teach the model what you want better than descriptions do. The contrast is with zero-shot prompting, which provides no examples.

How do you prompt for complex multi-step tasks?

Break the task into sequential prompts, explicitly request step-by-step thinking (chain-of-thought), provide intermediate checks, or use an autonomous agent system such as AutoGPT to execute steps on its own. For complex tasks: decompose, verify each step, and iterate. A single complex prompt often fails where a sequence of simpler prompts succeeds.

What are common prompt engineering mistakes?

Common mistakes include prompts that are too vague ("write about X"), omitting output format specification, overcomplicating (simpler is often better), not iterating, assuming the model knows context you have not provided, and not validating outputs. Good prompting rests on clarity, specificity, iteration, and validation.

How do you reduce hallucinations in LLM outputs?

Several strategies help: ask for sources or citations, prompt the model to say "I don't know" when uncertain, provide relevant context, request step-by-step reasoning, lower the temperature for more deterministic output, and validate outputs. Hallucinations cannot be eliminated entirely, because models predict plausibility rather than truth; validation is always required.

Should you learn prompt engineering or just use AI naturally?

It depends on how you use AI. For casual use, natural conversation works fine. For professional or regular use, learning the techniques pays off in better results, less frustration, and time savings. Prompt engineering is not rocket science, but strategic thinking helps, and the return on learning increases with how frequently you use these tools.