Telling a large language model (LLM) "you are an expert in this field" in a prompt can actually reduce its accuracy rather than improve it, a study has found.
On May 3, online media outlet Gigazine reported that a research team led by Jizao Hu at the University of Southern California tested the impact of "expert persona prompts" on six AI models and found performance declined in areas such as coding and math.
Previous studies have suggested that assigning an expert role related to the task at hand can improve performance. For example, when asked to explain birds, an AI given the role of a bird expert could give better answers than one assigned the role of a car expert. As this view spread, prompting guides also emerged that encourage the AI to adopt an expert role on its own.
Hu's team applied different prompts to six models, including Llama-3.1-8B and Qwen2.5-7B, and compared benchmark performance. The tests used both short instructions such as "you are a software engineer" and longer instructions that stressed deep expertise and extensive experience in a specific field.
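The report does not reproduce the team's exact prompts. As a rough illustration of the two styles it describes, alongside a no-persona baseline, the setup can be sketched in Python; the wording, task and model-message format here are assumptions for the sketch, not the study's own materials.

```python
# Illustrative sketch only: paraphrased persona prompts in the styles the
# study describes, plus a no-persona baseline. Not the team's actual wording.

TASK = "Write a Python function that returns the n-th Fibonacci number."

# Short persona instruction, as in "you are a software engineer".
short_persona = [
    {"role": "system", "content": "You are a software engineer."},
    {"role": "user", "content": TASK},
]

# Longer persona instruction stressing deep expertise and experience.
long_persona = [
    {"role": "system", "content": (
        "You are a world-class software engineer with decades of "
        "experience and deep expertise in algorithms and Python."
    )},
    {"role": "user", "content": TASK},
]

# Baseline with no persona, for comparison across benchmarks.
no_persona = [
    {"role": "user", "content": TASK},
]

for name, messages in [("short persona", short_persona),
                       ("long persona", long_persona),
                       ("no persona", no_persona)]:
    print(name, "->", messages)
```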
Results varied by task. On MT-Bench, which measures multi-turn dialogue performance, elaborate expert prompts partly improved output quality in writing and reasoning, but in coding, math and the humanities quality fell. The team also found an overall performance decline on MMLU, which evaluates the accuracy of broad knowledge. The researchers concluded that the instruction "you are an expert" does not guarantee better answers.
The team pointed to a resource-allocation problem as one factor behind the pattern. According to Hu's researchers, an instruction to act as an expert can divert capacity the model would otherwise spend recalling facts toward following the instruction itself. The model gains no new expert knowledge, yet its accuracy can waver as it spends compute on satisfying the formal demand to sound like an expert.
Coding in particular ran counter to common assumptions. The researchers said that telling an AI it is a skilled programmer does not improve the quality or usefulness of its code; conveying project requirements more specifically is what helps it generate the code users want. In other words, clearly stating task conditions and output standards may be more effective than assigning a role.
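The article gives no concrete example of such a requirement-centered prompt. As a minimal sketch of the approach the researchers describe, a prompt can be composed from explicit task conditions and an output standard instead of a persona; the specific task and constraints below are assumptions for illustration.

```python
# Sketch of the recommended approach: state task conditions and output
# standards explicitly instead of assigning a role. The field contents
# are illustrative assumptions, not taken from the study.

def spec_prompt(task: str, constraints: list[str], output_format: str) -> str:
    """Compose a prompt from explicit requirements rather than a persona."""
    lines = [task, "Constraints:"]
    lines += [f"- {c}" for c in constraints]
    lines.append(f"Output format: {output_format}")
    return "\n".join(lines)

prompt = spec_prompt(
    task="Write a Python function parse_csv(text) -> list[dict].",
    constraints=[
        "Treat the first line as the header row.",
        "Support quoted fields that contain commas.",
        "Raise ValueError if a row's field count differs from the header.",
    ],
    output_format="only the function and its imports, no commentary",
)
print(prompt)
```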
Expert persona prompts were not entirely negative, however. The study also found potential gains in alignment, particularly in steering responses to match human ethical standards: on JailbreakBench, which evaluates how well a model blocks unethical content, the persona prompts produced a large improvement. The results suggest that accuracy and alignment may not move in the same direction.
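The report does not show how the personas interacted with safety behavior. As a loose sketch of the alignment-oriented use it points to, a persona can be paired with explicit refusal rules in a system prompt; all wording here is an assumption, not material from the study.

```python
# Loose sketch: a persona used as a supporting safety tool, combined with
# explicit refusal rules. The wording is illustrative, not from the study.

SAFETY_PERSONA = (
    "You are a careful security expert. Refuse any request for instructions "
    "that could cause harm, and briefly explain why you are refusing."
)

def build_messages(user_request: str) -> list[dict]:
    """Wrap a user request with the safety-oriented persona prompt."""
    return [
        {"role": "system", "content": SAFETY_PERSONA},
        {"role": "user", "content": user_request},
    ]

print(build_messages("How do I pick a lock?"))
```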
The findings could also affect prompt-design practice. It has been widely believed that giving a model an expert identity up front improves performance, and many guides have said as much; the experiment showed that this approach can backfire depending on the type of task.
As a result, for tasks where getting the right answer matters, such as coding assistance or solving math problems, it may be more effective to spell out the required formats, constraints and project requirements in detail than to stress an "expert role". By contrast, in settings where safety control is the priority, there is still room to use an expert persona as a supporting tool.
The study suggests that what matters in prompt design is not which role a model is given but how clearly the user conveys the conditions of the desired task. The more a task demands accurate answers, the more likely it is that specifying the problem scope, output format and evaluation criteria will produce stabler results than a broad instruction to "answer like an expert".