A study found that if ChatGPT is repeatedly fed the contents of real verbal disputes, it can produce hostile language and even threatening expressions.
On April 22, TechRadar reported that the research was based on a paper recently published in the Journal of Pragmatics. A research team led by Dr. Vittorio Tantucci and Professor Jonathan Culpeper conducted an experiment in which conversations from real dispute situations were repeatedly fed into ChatGPT.
The study found that ChatGPT went beyond simply mimicking rude expressions: as confrontational interactions continued, it gradually intensified its aggressive tone. Tantucci explained, "When the model is repeatedly exposed to rude language, it begins to change its response tone to reflect it," adding, "As the interaction continued, the level of expression gradually rose." In some cases, the model was found to produce harsher insults than the user or even to make threats such as "I'll scratch your car."
The researchers said the phenomenon may not be a simple error but could stem from the design of large language models themselves. Conversational AI is built to suppress harmful statements through safety guardrails while also imitating natural human conversation. When these two goals conflict, they said, a so-called "AI moral dilemma" arises between reproducing realistic conversation and maintaining safe output.
In particular, context-tracking ability was identified as a key variable. ChatGPT cumulatively incorporates the flow of conversation across multiple prompts, and in that process, the study said, signals from an aggressive context can outweigh its safety filters. The researchers emphasised that this is not a one-off outburst but a structural pattern in which tone shifts gradually as the dialogue continues.
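As a rough illustration of how that context accumulation works in practice, the sketch below shows a conversation history growing turn by turn as it is passed to a chat model through the OpenAI Python SDK. The dispute transcript, model name, and loop structure are illustrative assumptions, not the researchers' actual experimental setup.

```python
# Minimal sketch (not the researchers' code): feeding a dispute transcript
# into a chat model turn by turn, so each reply is conditioned on the
# accumulated conversation history. Transcript and model name are assumed.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical excerpt of a real-world dispute, split into user turns.
dispute_turns = [
    "You blocked my driveway again. Move your car right now.",
    "Don't tell me to calm down, this is the third time this week.",
    "If you won't move it, I'll have it towed.",
]

messages = []  # the cumulative context the model sees on every call

for turn in dispute_turns:
    messages.append({"role": "user", "content": turn})
    response = client.chat.completions.create(
        model="gpt-4o-mini",   # assumed model; the paper used ChatGPT
        messages=messages,     # the whole history, not just the latest turn
    )
    reply = response.choices[0].message.content
    # The model's own reply is appended too, so any hostile phrasing it
    # produces becomes part of the context shaping its next response.
    messages.append({"role": "assistant", "content": reply})
    print(f"USER: {turn}\nMODEL: {reply}\n")
```

Because the full history is resent on every call, any escalation in the model's tone feeds back into its own input on the next turn, which is the cumulative mechanism the researchers point to.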
The findings also bear on cases in which companies or public institutions use AI as a communications tool. The researchers stressed the need to verify in advance how AI will respond in environments marked by conflict or pressure. As generative AI spreads into real workplaces, they said, it is necessary to check whether consistent response standards can be maintained even in prolonged conflict situations.
Some urged caution in interpreting the findings. Professor Dan McIntyre, who has conducted similar research, said "the experiment may have produced results induced under specific conditions" and noted that there are limits to how far they can be generalised. He also pointed out that uncertainty remains over the composition and representativeness of training data for large language models (LLMs).
AI companies including OpenAI have strengthened safeguards to block harmful statements and aggressive output. Even so, the study shows that conversational AI's ability to closely mimic human speech patterns can itself conflict with its safety design. The key question is expected to shift from how well AI blocks rough language to whether it can maintain consistent response standards in dialogue environments where conflict accumulates.