In 2023, a job posting from Anthropic advertised a "Prompt Engineer and Librarian" position with a salary range of $175,000-$335,000. The posting went viral, generating equal parts fascination and scepticism. Fascination because the salary range was startling for a role that seemed, on its surface, to involve writing instructions to a chatbot. Scepticism because critics questioned whether prompt engineering was a real discipline or an artefact of a transitional moment in AI development that would disappear as models improved.

Both reactions were partially right. Prompt engineering is a real and consequential skill in the current moment of AI deployment. The gap between an ad hoc interaction with a large language model and a carefully engineered system prompt, with examples, guardrails, and structured output requirements, can be the difference between a product that works reliably and one that fails unpredictably. At the same time, the most sceptical critics have a point: the portions of prompt engineering that involve manually crafting better phrasing are genuinely being automated by model improvements, and the long-term trajectory of the role is genuinely uncertain in ways that most in-demand technical skills are not.

This article provides an honest account of what prompt engineers actually do, which parts of the job description are substantive and which are marketing, what the genuine salary market looks like (rather than the top of the range cited in viral posts), who actually hires for this role, and the unresolved question of whether prompt engineering is a durable career or a transitional specialisation.

"The most important skill in prompt engineering is not knowing the perfect phrasing — it's knowing how to systematically test what works and why." — Common observation in applied AI circles


Key Definitions

Large language model (LLM): A neural network trained on large text datasets that can generate text, answer questions, summarise documents, write code, and perform many other language-based tasks. Examples include GPT-4, Claude, Gemini, and Llama.

System prompt: Instructions given to an LLM before the user conversation begins, typically invisible to the end user. The system prompt establishes the model's persona, constraints, output format, and behavioural guidelines for an application.

Few-shot prompting: A prompting technique in which examples of desired input-output pairs are included in the prompt, teaching the model the pattern to follow through demonstration rather than instruction.

Chain-of-thought prompting: A technique that asks the model to show its reasoning step-by-step before providing a final answer, which consistently improves performance on complex reasoning tasks.

Retrieval-augmented generation (RAG): An architecture that combines an LLM with a retrieval system, allowing the model to access relevant documents or data at query time rather than relying only on information in its training data. Prompt engineering is a significant component of RAG system design.


What a Prompt Engineer Actually Does

The role is sufficiently new and varied that a clear, universal job description does not exist. But the core activities that appear across prompt engineering roles — whether explicit or embedded in other titles — can be grouped into several categories.

System Prompt Design and Iteration

The most fundamental task is designing the system prompts that govern how an AI application behaves. A system prompt for a customer service chatbot might specify the company's tone of voice, the products the model should and should not discuss, how to handle complaints, what to say when it does not know the answer, and the format of responses. A system prompt for a medical documentation assistant might specify clinical terminology standards, privacy requirements, handling of uncertainty, and escalation procedures.

This is more than writing; it is a form of specification work. A poorly designed system prompt produces an AI application that behaves inconsistently, hallucinates confidently, refuses reasonable requests, or fails to enforce important constraints. A well-designed system prompt produces reliable, predictable behaviour that supports the application's intended purpose.

The iteration process is the core of the work. Prompt engineers test their prompts against large and varied input sets — adversarial inputs, edge cases, unusual phrasings, multilingual queries — and observe where behaviour breaks down. They then modify the prompt to address failures and retest. This cycle is closer to software testing methodology than to creative writing, and practitioners who are systematic about it produce markedly better results than those who rely on intuition alone.

Evaluation Framework Development

One of the most underappreciated aspects of prompt engineering is building the systems used to evaluate whether prompts are working. Human evaluation of LLM outputs does not scale: if a model processes thousands of queries per day, a person cannot review each response to check quality. Prompt engineers design and implement evaluation frameworks — sets of test cases, automated scoring criteria, and human rating protocols — that allow systematic measurement of output quality.

Evaluation design is technically demanding. What counts as a correct response to an open-ended question? How do you measure consistency across paraphrased versions of the same query? How do you detect subtle failures in instruction-following? These are not simple questions, and the quality of an evaluation framework directly determines how much reliable signal the prompt engineer has to work with during iteration.

At AI companies, this work overlaps significantly with ML evaluation research — a domain that requires both statistical understanding and deep familiarity with the failure modes of specific models.

RAG System Design and Optimisation

Many enterprise AI applications use retrieval-augmented generation: the LLM is connected to a knowledge base of documents, and at query time, relevant passages are retrieved and included in the prompt context. The performance of these systems depends heavily on prompt design decisions: how retrieved documents are formatted for inclusion, how the model is instructed to use versus override retrieved information, how uncertainty and contradictions in retrieved content are handled, and how the model should behave when relevant documents are not found.

This is substantive technical work that requires understanding both the LLM's behaviour and the structure of the retrieval system. Prompt engineers working on RAG systems typically collaborate closely with ML engineers who build the retrieval components.

Red Teaming and Safety Testing

Prompt engineers at AI companies and large enterprise deployers spend significant time on adversarial testing — attempting to elicit harmful, deceptive, or policy-violating outputs from the models they are working with. This involves designing prompts intended to bypass safety measures, identifying jailbreak patterns, and stress-testing the robustness of guardrails.

The findings from red teaming feed directly into improved system prompt design and, for AI companies, into model training and safety research. This work requires creativity, methodical thinking, and a willingness to systematically explore edge cases that are uncomfortable by design.

Documentation and Knowledge Management

The "Librarian" component of Anthropic's famous 2023 job posting points to a genuine operational need: as organisations deploy AI systems at scale, the prompt library — the collection of system prompts, few-shot examples, evaluation cases, and documented design decisions — becomes a significant knowledge asset that requires management.

Prompt engineers document their design decisions, maintain version-controlled prompt libraries, create guidelines for how prompts should be structured and tested within their organisation, and train colleagues who are beginning to work with AI systems. This knowledge management function is more significant in larger organisations where many people are writing prompts independently.


Salary Reality

The Anthropic posting that went viral in 2023 represented the very top of the market — a specialised AI safety company paying for rare expertise. Understanding the actual salary distribution requires looking at the broader population of roles.

AI companies (Anthropic, OpenAI, Cohere, AI21 Labs, Mistral): Specialist prompt engineering roles at these companies pay $150,000-$300,000+ in total compensation, reflecting both the specialised nature of the work and the general compensation inflation in AI hiring. These roles are rare and competitive.

Large technology companies deploying AI (Google, Microsoft, Meta, Salesforce, Adobe): Enterprise AI roles involving significant prompt engineering work — often titled "AI Engineer," "Applied AI Specialist," or "LLM Application Developer" rather than "Prompt Engineer" — pay $130,000-$220,000 in total compensation at most levels.

Consulting and professional services firms (Accenture, Deloitte, McKinsey): AI-focused roles in consulting that include prompt engineering work typically pay $90,000-$150,000, aligned with general consulting compensation bands rather than technology company pay scales.

Enterprise organisations deploying AI tools (healthcare, finance, legal, marketing): Roles in these sectors that involve AI implementation and prompt work are often part of broader technology or operations roles paying $70,000-$120,000, with the prompt engineering component being one of several responsibilities.

Freelance and contract work: Platforms like Upwork and Toptal show wide variation, from $30/hour for basic prompt writing to $150-200/hour for specialists with demonstrable track records in specific domains.

The honest summary: the $175,000-$335,000 range from the viral posting is real for top-of-market specialist roles at leading AI companies, but the median prompt engineering position — to the extent one exists — pays significantly less and is often embedded in broader AI engineering or content operations roles rather than existing as a standalone function.


Who Actually Hires Prompt Engineers

AI product companies are the most explicit employers — they hire people whose primary job is improving the prompts that make their AI products work. This category includes both the model providers and the hundreds of AI-native startups building on top of those models.

Healthcare technology companies are increasingly deploying AI for clinical documentation, prior authorisation, patient communication, and diagnostic support — all domains where reliable prompt design is critical and errors have real consequences.

Legal technology firms use AI for contract analysis, legal research, and document review, where precision in prompt design determines whether the system produces usable output or hallucinations that waste lawyer time.

Marketing technology platforms deploy AI for content generation, personalisation, and campaign analysis — and need prompt engineers to maintain output quality and brand consistency across millions of automated interactions.

Financial services firms use AI for document analysis, compliance checking, customer communication, and research — domains with regulatory constraints that make reliable prompt behaviour especially important.

Government agencies in the US, UK, and EU are beginning to deploy AI for public service delivery, benefits processing, and regulatory analysis, creating demand for practitioners who understand both the technical and governance dimensions of prompt design.


Is Prompt Engineering a Long-Term Career?

This is the question the field genuinely cannot answer yet, and honest practitioners will say so. The arguments on both sides are substantive.

The case for durability: As AI systems become more embedded in every industry, the skill of reliably directing those systems toward specific outcomes will become more valuable rather than less. The most sophisticated prompt engineering work — evaluation framework design, red teaming, RAG system optimisation — is already being recognised as ML engineering territory with a prompt specialisation, not as a stand-alone craft. People developing this deeper technical capability have a strong career trajectory.

The case for transitioning: Model improvements are steadily reducing the gap between naive and optimised prompts for well-defined tasks. Automatic prompt optimisation research (methods that tune prompts algorithmically rather than manually) is advancing. The simplest prompt engineering work — writing better instructions for a single task — is increasingly doable by domain experts with no specialised training, and may not sustain dedicated roles.

The most likely scenario: The narrow definition of prompt engineering — the standalone craft of writing better prompts — will likely consolidate into broader roles rather than growing as an independent profession. The broader definition — applied AI system design including evaluation, testing, and reliability engineering — will grow and be called something else (applied AI engineer, AI product engineer, LLM application developer).

People entering the field now with this understanding are better positioned than those who treat prompt engineering as a permanent destination rather than a platform for building adjacent AI engineering skills.


Skills Needed

Strong written communication: The foundation of the work. Precision in language, understanding of ambiguity, and the ability to write instructions that are interpreted consistently are core.

Systematic thinking: The ability to design experiments, identify variables, and evaluate results objectively. Prompt engineering without systematic testing is iterative guessing.

LLM API familiarity: Understanding how to use OpenAI, Anthropic, Hugging Face, and other APIs programmatically — including parameters like temperature, top-p, and context window management — is increasingly expected.

Basic Python: Scripting ability to automate prompt testing, process outputs at scale, and integrate with data pipelines. Not deep software engineering, but functional programming ability.

Domain expertise: In most enterprise applications, prompt engineering without domain knowledge produces generic results. Healthcare prompt engineers who understand clinical workflows, or legal technology practitioners who understand legal reasoning, are significantly more effective than generalists.


Practical Takeaways

Prompt engineering as currently practiced is most valuable as a component of broader AI system development skills rather than as a standalone specialisation. The most employable practitioners combine prompt expertise with ML engineering, product management, or deep domain knowledge.

If you are entering this space, prioritise building systematic evaluation skills over polishing individual prompt phrasing. The ability to measure whether your prompts work — rigorously, at scale, and against adversarial cases — is what separates practitioners who create reliable AI systems from those who create impressive demos.


References

  1. Anthropic. "Prompt Engineer and Librarian Job Posting." Anthropic.com, 2023.
  2. Brown, T. et al. "Language Models are Few-Shot Learners." Advances in Neural Information Processing Systems, 33, 2020.
  3. Wei, J. et al. "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models." Advances in Neural Information Processing Systems, 35, 2022.
  4. Lewis, P. et al. "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks." Advances in Neural Information Processing Systems, 33, 2020.
  5. Ouyang, L. et al. "Training Language Models to Follow Instructions with Human Feedback." Advances in Neural Information Processing Systems, 35, 2022.
  6. Perez, E., Kiela, D., & Cho, K. "True Few-Shot Learning with Language Models." Advances in Neural Information Processing Systems, 34, 2021.
  7. Anthropic. "The Claude Model Card." Anthropic.com, 2024.
  8. OpenAI. "GPT-4 Technical Report." OpenAI.com, 2023.
  9. Glassdoor. "Prompt Engineer Salary Data." Glassdoor.com, 2024.
  10. LinkedIn Economic Graph. "Emerging Jobs in AI." LinkedIn, 2024.
  11. White, J. et al. "A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT." arXiv preprint arXiv:2302.11382, 2023.
  12. Zhou, Y. et al. "Large Language Models Are Human-Level Prompt Engineers." arXiv preprint arXiv:2211.01910, 2022.

Frequently Asked Questions

What does a prompt engineer actually do?

A prompt engineer designs, tests, and refines the inputs given to large language models (LLMs) to produce reliable, accurate, and useful outputs. This includes writing system prompts, building evaluation frameworks, creating few-shot examples, testing edge cases, and iterating based on failure modes. The role sits between technical writing, QA, and applied AI research.

How much does a prompt engineer earn?

Salaries vary widely by employer and seniority. Specialist prompt engineer roles at AI companies like Anthropic, OpenAI, or Cohere have advertised \(175,000-\)300,000. Broader roles at enterprises using AI tools typically pay \(80,000-\)130,000 for people combining prompt work with adjacent duties. The high end is real but rare and usually requires strong adjacent technical skills.

Who hires prompt engineers?

AI companies hire for the role most explicitly. Enterprise technology departments, consulting firms, healthcare technology companies, legal tech firms, and marketing technology organisations also hire people with prompt engineering skills, often as part of broader AI integration or content operations roles rather than pure prompt engineering positions.

Is prompt engineering a long-term career or a transitional role?

This is genuinely contested. Optimists argue that as AI systems become more central to software, the specialised skill of reliably directing them will become more valuable. Sceptics note that model improvements and better tooling are automating the simple parts of the job, and that the role may consolidate into broader ML engineering, product management, or domain expert roles within 5-10 years.

What skills do you need to become a prompt engineer?

Strong written communication and structured thinking are essential — the core task is writing clearly and systematically. Technical familiarity with LLM APIs (OpenAI, Anthropic, Hugging Face), understanding of tokenisation and context windows, basic Python scripting for automation, and domain expertise in the relevant industry all add significant value over writing skill alone.