Google Gemini was rated as producing the most human-like writing among major artificial intelligence chatbots.
TechRadar on April 16 cited an experiment by Open Resource Application (ORA) that gave the same task to 12 widely used AI chatbots and compared the results. It reported that Gemini had the lowest detection rate.
The test asked each model to write a long-form article that reads as if it were written by a person. The texts were then run through three detection platforms (Grammarly, QuillBot and GPTZero) to see whether each was classified as AI-written or human-written.
Gemini delivered the most striking result. On Grammarly, text generated by Gemini was flagged far less often than text from other models, and on QuillBot it was not classified as AI-written at all. GPTZero, by contrast, identified most AI-generated text.
ORA pointed to differences in sentence structure and narrative development as Gemini's strengths. It said AI detectors generally pick up patterns such as predictable phrasing and repetitive structures, and that Gemini diverged from those patterns. An ORA spokesperson said tools such as GPTZero assess not only predictability but also overall structure. The spokesperson said models that develop ideas rather than recycle familiar phrases become much harder to identify.
ChatGPT, by contrast, scored relatively poorly in the same experiment. ORA attributed the low ranking to ChatGPT having been the first major AI model to reach the market, saying that many people already know its distinctive style, which makes it easier for detectors to identify. ORA added that many later models sounded like ChatGPT at first but have since begun developing their own styles.
The detection tools themselves also differed widely. Grammarly identified only 43.5 percent of all AI-generated content, the lowest rate among the tools, while GPTZero recognised about 99 percent, the highest. This means the same text can be judged human-written or AI-written depending on which tool is used.
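To make the arithmetic behind such figures concrete, here is a minimal sketch of how a per-detector detection rate and the disagreement between detectors could be computed. The detector names and verdicts below are hypothetical placeholders invented for illustration; they are not ORA's data, and the article does not describe how ORA actually scored its results.

```python
from typing import Dict, List

# Hypothetical verdicts for four AI-generated texts.
# True means the detector flagged the text as AI-written.
# These values are made up for illustration, not taken from the ORA test.
verdicts: Dict[str, List[bool]] = {
    "detector_a": [True, False, False, True],
    "detector_b": [True, True, True, True],
}


def detection_rate(flags: List[bool]) -> float:
    """Share of AI-generated texts that a detector flagged as AI-written."""
    if not flags:
        raise ValueError("need at least one verdict")
    return sum(flags) / len(flags)


def disagreements(results: Dict[str, List[bool]]) -> List[int]:
    """Indices of texts on which the detectors did not all give the same verdict."""
    per_text = zip(*results.values())
    return [i for i, column in enumerate(per_text) if len(set(column)) > 1]


rates = {name: detection_rate(flags) for name, flags in verdicts.items()}
for name, rate in rates.items():
    print(f"{name}: flagged {rate:.1%} of AI-generated texts")
print("texts with conflicting verdicts:", disagreements(verdicts))
```

In this toy data, the two detectors agree on the first and last texts and split on the middle two, which is the situation the article describes: whether a text passes depends on which tool checks it.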
Such differences can cause direct problems in real-world use. The article cited examples such as a student assignment passing one detector but being flagged by another, or office workers' documents drawing suspicion depending on which software is used. This shows that the criteria for judging the source and trustworthiness of online writing can vary by tool.
At the same time, AI writing is not converging on a single style but becoming more diverse. A recent study also suggested that about half of online content may have been generated by AI. As models diverge in style, detection methods that assume a single AI writing style are increasingly running into limits.
The test ultimately shows that Gemini is not simply writing well but producing output that reads more like human writing. Detection tools may improve and other models may develop in a similar direction, but for now the basis for clearly distinguishing human and AI writing is becoming less stable.