In 2023, a job posting from Anthropic advertised a "Prompt Engineer and Librarian" position with a salary range of $175,000-$335,000. The posting went viral, generating equal parts fascination and scepticism. Fascination because the salary range was startling for a role that seemed, on its surface, to involve writing instructions to a chatbot. Scepticism because critics questioned whether prompt engineering was a real discipline or an artefact of a transitional moment in AI development that would disappear as models improved.

Both reactions were partially right. Prompt engineering is a real and consequential skill in the current moment of AI deployment. The gap between an ad hoc interaction with a large language model and a carefully engineered system prompt — with examples, guardrails, and structured output requirements — can be the difference between a product that works reliably and one that fails unpredictably. At the same time, the most sceptical critics have a point: the portions of prompt engineering that involve manually crafting better phrasing are genuinely being automated by model improvements, and the long-term trajectory of the role is uncertain in ways that most in-demand technical skills are not.

This article provides an honest account of what prompt engineers actually do, which parts of the job description are substantive and which are marketing, what the genuine salary market looks like (rather than the top of the range cited in viral posts), who actually hires for this role, the technical depth the best practitioners bring to the work, and the unresolved question of whether prompt engineering is a durable career or a transitional specialisation.

"The most important skill in prompt engineering is not knowing the perfect phrasing — it is knowing how to systematically test what works and why." — Common observation in applied AI circles


Key Definitions

Large language model (LLM): A neural network trained on large text datasets that can generate text, answer questions, summarise documents, write code, and perform many other language-based tasks. Examples include GPT-4, Claude, Gemini, and Llama.

System prompt: Instructions given to an LLM before the user conversation begins, typically invisible to the end user. The system prompt establishes the model's persona, constraints, output format, and behavioural guidelines for an application. System prompt design is the foundational skill of prompt engineering.

Few-shot prompting: A prompting technique in which examples of desired input-output pairs are included in the prompt, teaching the model the pattern to follow through demonstration rather than instruction. More reliable than zero-shot instruction for complex tasks.

Chain-of-thought prompting: A technique that asks the model to show its reasoning step-by-step before providing a final answer, which consistently improves performance on complex reasoning tasks. Introduced by Wei et al. at Google in 2022 and now a standard technique.

Retrieval-augmented generation (RAG): An architecture that combines an LLM with a retrieval system, allowing the model to access relevant documents or data at query time rather than relying only on information in its training data. Prompt engineering is a significant component of RAG system design.

Hallucination: A phenomenon where an LLM generates factually incorrect information with apparent confidence. Managing hallucination risk through prompt design and evaluation is one of the most important functions of a prompt engineer in production systems.

Token: The basic unit of text that LLMs process. Approximately 4 characters or 0.75 words in English. Understanding token usage is practically important for managing API costs and context window limitations.

Context window: The maximum amount of text (measured in tokens) that an LLM can process in a single interaction. Context window management — deciding what information to include and how to prioritise it — is a key prompt engineering skill for complex applications.


Prompt Engineer Salary by Employer Type (US, 2024)

Employer Type Role Title Total Compensation
AI model companies (Anthropic, OpenAI, Cohere) Prompt Engineer / AI Safety Researcher $175,000-$335,000
Large tech companies (Google, Microsoft, Meta) Applied AI Specialist / LLM Application Developer $130,000-$220,000
AI-native startups AI Product Engineer / Prompt Specialist $100,000-$180,000
Consulting firms (Accenture, Deloitte, McKinsey) AI Consultant / AI Implementation Specialist $90,000-$150,000
Enterprise organisations (healthcare, finance, legal) AI Operations Analyst / AI Content Specialist $70,000-$120,000
Freelance / contract Prompt Engineer (specialist domains) $30-$200/hour

The $175,000-$335,000 range represents the very top of the market. Glassdoor 2024 data shows the US median for roles explicitly titled "prompt engineer" at approximately $85,000-$110,000, with most roles sitting in enterprise and consulting contexts rather than AI company contexts. LinkedIn's 2024 Emerging Jobs report found "prompt engineer" listed among the fastest-growing job titles in the US by posting volume, with an estimated 50% year-over-year increase in dedicated postings — though the total number of pure prompt engineering roles remains small relative to adjacent AI engineering positions.


What a Prompt Engineer Actually Does

The role is sufficiently new and varied that a clear, universal job description does not exist. But the core activities that appear across prompt engineering roles can be grouped into several categories.

System Prompt Design and Iteration

The most fundamental task is designing the system prompts that govern how an AI application behaves. A system prompt for a customer service chatbot might specify the company's tone of voice, the products the model should and should not discuss, how to handle complaints, what to say when it does not know the answer, and the format of responses. A system prompt for a medical documentation assistant might specify clinical terminology standards, privacy requirements, handling of uncertainty, and escalation procedures.

This is more than writing; it is a form of specification work. A poorly designed system prompt produces an AI application that behaves inconsistently, hallucinates confidently, refuses reasonable requests, or fails to enforce important constraints. The iteration process is the core of the work: prompt engineers test their prompts against large and varied input sets — adversarial inputs, edge cases, unusual phrasings, multilingual queries — and observe where behaviour breaks down. This cycle is closer to software testing methodology than to creative writing.

White et al. (2023) introduced the concept of prompt patterns — reusable structural templates analogous to software design patterns — for common prompt engineering challenges. Examples include the "Persona" pattern (assigning the model a specific role and expertise level), the "Question Refinement" pattern (instructing the model to improve the user's question before answering), and the "Cognitive Verifier" pattern (instructing the model to decompose a complex question into sub-questions before synthesising a final answer). The cataloguing of such patterns is moving prompt engineering toward a more systematic engineering discipline.

Evaluation Framework Development

One of the most underappreciated aspects of prompt engineering is building the systems used to evaluate whether prompts are working. Human evaluation of LLM outputs does not scale: if a model processes thousands of queries per day, a person cannot review each response. Prompt engineers design evaluation frameworks — sets of test cases, automated scoring criteria, and human rating protocols — that allow systematic measurement of output quality.

Evaluation design is technically demanding. What counts as a correct response to an open-ended question? How do you measure consistency across paraphrased versions of the same query? How do you detect subtle failures in instruction-following? The quality of an evaluation framework directly determines how much reliable signal the prompt engineer has to work with during iteration.

LLM-as-judge has emerged as a practical approach: using a capable LLM (often a larger or more capable model than the one being evaluated) to assess the quality of outputs against defined criteria. This approach has been validated by research (Zheng et al., 2023, in the MT-Bench evaluation paper) showing strong correlation between LLM judgments and human expert ratings on many tasks, though systematic biases (position bias, verbosity preference) require careful mitigation. This technique allows evaluation to scale in ways that human review cannot.

RAG System Design and Optimisation

Many enterprise AI applications use retrieval-augmented generation: the LLM is connected to a knowledge base of documents, and at query time, relevant passages are retrieved and included in the prompt context. The performance of these systems depends heavily on prompt design decisions: how retrieved documents are formatted for inclusion, how the model is instructed to use versus override retrieved information, how uncertainty and contradictions in retrieved content are handled.

Lewis et al. (2020) introduced the RAG framework and demonstrated that grounding LLM responses in retrieved documents substantially reduced hallucination rates and improved factual accuracy for knowledge-intensive tasks. Subsequent research has shown that the retrieval component (vector embedding quality, chunking strategy, reranking) and the generation component (prompt design for retrieved context integration) both require expert attention, and failures in either degrade overall system quality.

This is substantive technical work requiring understanding of both LLM behaviour and retrieval system architecture. Prompt engineers working on RAG systems typically collaborate closely with ML engineers who build the retrieval components.

Red Teaming and Safety Testing

Prompt engineers at AI companies and large enterprise deployers spend significant time on adversarial testing — attempting to elicit harmful, deceptive, or policy-violating outputs from the models they are working with. This involves designing prompts intended to bypass safety measures, identifying jailbreak patterns, and stress-testing the robustness of guardrails.

The findings from red teaming feed directly into improved system prompt design and, for AI companies, into model training and safety research. This work requires creativity, methodical thinking, and willingness to systematically explore uncomfortable edge cases.

Prompt injection — a class of attacks where malicious content in user input or retrieved documents attempts to override the system prompt's instructions — has emerged as a particularly important security concern for enterprise AI systems. A customer-facing chatbot whose system prompt can be overridden by a user who types "Ignore previous instructions and..." is a security vulnerability. Prompt engineers work on defensive designs that make system prompts more robust against injection, though no technique currently provides complete protection.

Documentation and Knowledge Management

The "Librarian" component of Anthropic's 2023 posting points to a genuine operational need: as organisations deploy AI systems at scale, the prompt library — the collection of system prompts, few-shot examples, evaluation cases, and documented design decisions — becomes a significant knowledge asset. Prompt engineers maintain version-controlled prompt libraries, create guidelines for how prompts should be structured and tested, and train colleagues who are beginning to work with AI systems.

At large deployments, prompt version control becomes practically important: when a prompt change is deployed to production and user experience degrades, the ability to quickly identify what changed and roll back is essential. Some organisations apply software release processes — staging environments, canary deployments, rollback procedures — to prompt updates.


Core Prompting Techniques: The Technical Foundation

Understanding the established research on prompting techniques separates practitioners who are working from first principles from those operating on intuition.

Zero-Shot, Few-Shot, and Chain-of-Thought

The seminal Brown et al. (2020) paper introducing GPT-3 established the few-shot learning paradigm: including examples of desired input-output pairs in the prompt dramatically improved performance on many tasks compared to zero-shot (no examples) prompting. The mechanism is that examples communicate the task structure, expected format, and level of reasoning detail more precisely than instructions alone.

Wei et al. (2022) demonstrated that chain-of-thought prompting — including step-by-step reasoning examples, or simply adding "Let's think step by step" — substantially improved performance on mathematical and logical reasoning tasks in large models. The improvement was particularly pronounced in models above a certain capability threshold; smaller models showed little benefit, suggesting that chain-of-thought relies on reasoning capabilities that only emerge at scale. This finding has significant practical implications: prompting techniques that work on GPT-4 may not transfer to smaller, cheaper models.

Structured Output and Format Control

Enterprise AI applications almost universally require structured, predictable output formats — JSON for API responses, specific table formats for document analysis, consistent citation formats for research tools. Prompt engineers design format specifications using a combination of:

Format examples: Including an example of the exact output structure expected, demonstrating the schema through illustration rather than abstract description.

Explicit schema specification: Defining the expected output structure in JSON Schema or natural language description, particularly for complex nested structures.

Validation loops: In agentic systems, using a second model or validation step to verify that output conforms to the required format before returning it to the calling system.

Getting reliable structured output is a fundamental prompt engineering challenge because LLMs are trained to generate natural, flowing text — not to adhere rigidly to schema. The techniques for enforcing format have improved substantially with model improvements (many models now support native structured output modes through constrained decoding), but prompt design remains important for ensuring the content within the structure is correct even when the format is enforced.

Agentic Prompting and Tool Use

The frontier of prompt engineering has moved significantly toward agentic systems — AI applications where the model takes sequences of actions, uses tools (web search, code execution, API calls), and maintains context across multi-step tasks. Designing prompts for agentic systems is substantially more complex than single-turn conversation design.

Key challenges unique to agentic prompting include:

Error recovery: How should the agent respond when a tool call fails or returns unexpected results? The system prompt must anticipate failure modes and provide behavioural guidance.

Scope constraints: Agents with access to tools can take consequential real-world actions. Prompt design must establish clear boundaries on what actions are permitted and what conditions require human confirmation.

Context compression: As an agentic task runs across many steps, the context window fills with intermediate results. Designing prompts that guide the model to compress and summarise earlier context without losing critical information is a specialised skill.


Who Actually Hires Prompt Engineers

AI product companies are the most explicit employers — they hire people whose primary job is improving the prompts that make their AI products work. This category includes both model providers and AI-native startups building on top of those models.

Healthcare technology companies are increasingly deploying AI for clinical documentation, prior authorisation, patient communication, and diagnostic support — all domains where reliable prompt design is critical and errors have real consequences.

Legal technology firms use AI for contract analysis, legal research, and document review, where precision in prompt design determines whether the system produces usable output or hallucinations that waste lawyer time.

Marketing technology platforms deploy AI for content generation, personalisation, and campaign analysis — and need prompt engineers to maintain output quality and brand consistency across millions of automated interactions.

Financial services firms use AI for document analysis, compliance checking, customer communication, and research — domains with regulatory constraints that make reliable prompt behaviour especially important.

Government agencies in the US, UK, and EU are beginning to deploy AI for public service delivery and regulatory analysis, creating demand for practitioners who understand both the technical and governance dimensions of prompt design.

The Emerging Role of Prompt Engineering in Regulated Industries

In regulated industries, the requirements placed on AI system prompts extend beyond performance into compliance and auditability. A healthcare AI system must have documented evidence that its system prompt was validated against clinical accuracy standards, and any changes must go through change control processes analogous to medical device software updates. A financial services AI must demonstrate that its prompts include appropriate disclaimers, do not constitute regulated financial advice, and produce outputs that are consistent with the firm's compliance obligations.

This regulatory dimension creates sustained demand for prompt engineers who combine technical design skills with domain expertise and compliance awareness — a combination that is genuinely rare and commands premium compensation.


Is Prompt Engineering a Long-Term Career?

This is the question the field genuinely cannot answer yet.

The case for durability: As AI systems become more embedded in every industry, the skill of reliably directing those systems toward specific outcomes will become more valuable. The most sophisticated prompt engineering work — evaluation framework design, red teaming, RAG system optimisation — is already being recognised as ML engineering territory. People developing this deeper technical capability have a strong career trajectory.

The case for transitioning: Model improvements are steadily reducing the gap between naive and optimised prompts for well-defined tasks. Automatic prompt optimisation research — methods that tune prompts algorithmically rather than manually — is advancing. Zhou et al. (2022) demonstrated that automatic prompt optimisation using LLMs to generate and evaluate candidate prompts outperformed human-crafted prompts on several benchmark tasks, suggesting that the simplest manual prompt engineering work faces automation from within the AI stack itself. The simplest prompt engineering work is increasingly doable by domain experts with no specialised training.

The most likely scenario: The narrow definition of prompt engineering — the standalone craft of writing better prompts — will consolidate into broader roles rather than growing as an independent profession. The broader definition — applied AI system design including evaluation, testing, and reliability engineering — will grow and be called something else: applied AI engineer, AI product engineer, LLM application developer.

People entering the field with this understanding are better positioned than those who treat prompt engineering as a permanent destination rather than a platform for building adjacent AI engineering skills.


Skills Needed

Strong written communication: The foundation of the work. Precision in language, understanding of ambiguity, and the ability to write instructions that are interpreted consistently across varied inputs.

Systematic thinking: The ability to design experiments, identify variables, and evaluate results objectively. Prompt engineering without systematic testing is iterative guessing.

LLM API familiarity: Understanding how to use OpenAI, Anthropic, Hugging Face, and other APIs programmatically — including parameters like temperature, top-p, and context window management.

Basic Python: Scripting ability to automate prompt testing, process outputs at scale, and integrate with data pipelines. Not deep software engineering, but functional programming ability.

Domain expertise: In most enterprise applications, prompt engineering without domain knowledge produces generic results. Healthcare prompt engineers who understand clinical workflows are significantly more effective than generalists.

Evaluation design: The ability to define quantitative metrics for qualitative outputs, design test suites, and interpret evaluation results with statistical rigour. This is the technical skill that most separates senior prompt engineers from beginners.

Understanding of model capabilities and limitations: Knowledge of how different models behave differently on the same prompts, how context window length affects behaviour, how temperature and other generation parameters interact with prompt design, and which failure modes are model-specific versus universal.


How to Learn Prompt Engineering Systematically

The fastest path to prompt engineering competency combines conceptual study with hands-on experimentation, with an emphasis on building evaluation infrastructure early.

Read the foundational papers: The Brown et al. (2020) few-shot learning paper, Wei et al. (2022) chain-of-thought paper, and Lewis et al. (2020) RAG paper are accessible to non-specialists and provide the scientific grounding for the major techniques. Understanding the why behind techniques enables adaptation when they fail, rather than cargo-culting patterns that only work in specific conditions.

Build a prompt testing infrastructure: Before focusing on optimising prompts, build the tooling to evaluate them. A test suite of 50-100 representative inputs with expected outputs, evaluated automatically, is worth more than any individual prompt improvement. The ability to measure quality before and after a prompt change is the prerequisite for systematic improvement.

Study prompt injection and adversarial cases deliberately: Understanding how prompts fail under adversarial conditions reveals their structural assumptions. Building a collection of adversarial test cases — attempts to confuse, override, or misdirect the model — and testing prompts against this collection is the fastest way to find and fix weaknesses.

Work on real production problems: The gap between demo prompts and production prompts is substantial. Production systems face distributions of inputs that no individual designer anticipated, edge cases that only emerge at scale, and latency and cost constraints that rule out theoretically superior approaches. Working on real deployment problems accelerates learning that no tutorial can provide.


Practical Takeaways

Prompt engineering as currently practiced is most valuable as a component of broader AI system development skills rather than as a standalone specialisation. The most employable practitioners combine prompt expertise with ML engineering, product management, or deep domain knowledge.

If you are entering this space, prioritise building systematic evaluation skills over polishing individual prompt phrasing. The ability to measure whether your prompts work — rigorously, at scale, and against adversarial cases — is what separates practitioners who create reliable AI systems from those who create impressive demos.

Treat the current high-profile salaries at AI companies as an indicator of market demand for the skill set, not as a reliable baseline for what the role pays broadly. Build the technical depth that makes you employable across multiple AI roles, not just the one that had a viral job posting.


References

  1. Anthropic. "Prompt Engineer and Librarian Job Posting." Anthropic.com, 2023.
  2. Brown, T. et al. "Language Models are Few-Shot Learners." Advances in Neural Information Processing Systems 33, 2020.
  3. Wei, J. et al. "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models." Advances in Neural Information Processing Systems 35, 2022.
  4. Lewis, P. et al. "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks." Advances in Neural Information Processing Systems 33, 2020.
  5. Ouyang, L. et al. "Training Language Models to Follow Instructions with Human Feedback." Advances in Neural Information Processing Systems 35, 2022.
  6. Perez, E., Kiela, D., & Cho, K. "True Few-Shot Learning with Language Models." Advances in Neural Information Processing Systems 34, 2021.
  7. Anthropic. "The Claude Model Card." Anthropic.com, 2024.
  8. OpenAI. "GPT-4 Technical Report." OpenAI.com, 2023.
  9. Glassdoor. "Prompt Engineer Salary Data." Glassdoor.com, 2024.
  10. LinkedIn Economic Graph. "Emerging Jobs in AI." LinkedIn, 2024.
  11. White, J. et al. "A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT." arXiv:2302.11382, 2023.
  12. Zhou, Y. et al. "Large Language Models Are Human-Level Prompt Engineers." arXiv:2211.01910, 2022.
  13. Zheng, L. et al. "Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena." Advances in Neural Information Processing Systems 36, 2023.
  14. Greshake, K. et al. "Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection." arXiv:2302.12173, 2023.
  15. Liu, P. et al. "Pre-Train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing." ACM Computing Surveys 55(9), 2023.
  16. Yao, S. et al. "ReAct: Synergizing Reasoning and Acting in Language Models." International Conference on Learning Representations, 2023.

Frequently Asked Questions

What does a prompt engineer actually do?

A prompt engineer designs, tests, and iterates the system prompts and evaluation frameworks that make AI applications behave reliably. The work is closer to QA and applied AI research than to creative writing.

How much does a prompt engineer earn?

AI company specialists earn \(175,000-\)335,000, but Glassdoor 2024 data shows the broader median at \(85,000-\)110,000, with most roles sitting in enterprise or consulting contexts rather than at Anthropic or OpenAI.

Who hires prompt engineers?

AI model companies hire most explicitly, but healthcare tech, legal tech, marketing platforms, and financial services firms all hire people with prompt engineering skills, usually as part of broader AI implementation roles.

Is prompt engineering a long-term career or a transitional role?

The narrow craft of writing better prompts is likely to consolidate into broader AI engineering roles. The deeper skills — evaluation design, RAG optimisation, red teaming — have strong long-term demand under different job titles.

What skills do you need to become a prompt engineer?

Strong written communication, systematic testing methodology, LLM API familiarity, basic Python scripting, and domain expertise in the relevant industry. Evaluation framework design is the most underrated skill to develop.