Saturday, February 21, 2026

Auckland researchers find blind spots in AI vision

University of Auckland scientists have put commercial AI systems – ChatGPT, Claude, Gemini – through clinical tests to investigate how close they are to achieving human-level vision.

“The AIs were excellent at recognising objects, but bad at performing much simpler-seeming tasks such as comparing the lengths of lines or comparing unfamiliar shapes,” said the senior author of the study, Dr Katherine Storrs.

Dr Storrs runs the Computational Perception Lab in the School of Psychology.

She believes the findings may have implications for scientists who use the AIs as models of human vision, and for robotics, where detailed spatial understanding, not just object recognition, is crucial.

Dr Katherine Storrs and PhD student Gene Tang.

Here’s the easy stuff: the AI tools recognise images such as an outline of a banana and a silhouette of a bear despite the lack of detail. Their training has equipped them to pick up on the shapes.

The scientists put the AI systems through tests a neurologist might use with a stroke patient.

The PhD student who led the research, Gene Tang, says artificial vision is doing amazing things, such as guiding self-driving cars and robots, and recognising faces more accurately than humans.

“While these innovations have gotten really good, none of them requires the full range of human visual abilities,” he said.

These are images the AI systems struggle with: recognising which of the bottom two blobby shapes matches the top one, and deciphering the overlapping shapes.

He says sometimes we need to judge the orientation of a single edge (is this picture hung on the wall perfectly level?); sometimes disentangle a cacophony of overlapping shapes (is there something hiding inside that bush?); and sometimes recognise semantic categories (is this painting by Cezanne or Picasso?).

Most of the AI tools people know, such as ChatGPT, Claude and Gemini, are both large language models and visual language models (VLMs): you can type into the chat box, but also show a photo and ask questions about it.
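To make that concrete, here is an illustrative sketch, not the study’s own test code, of how a line-comparison question like those in the research could be posed to a VLM programmatically. It assumes the OpenAI Python client; the model name and image file are placeholders.

    # Minimal sketch of a visual-language query, assuming the OpenAI
    # Python client; the model name and image path are placeholders.
    import base64
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Encode a local test image, e.g. two lines of different lengths.
    with open("line_comparison.png", "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    # Send the image alongside a text question in a single message.
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Which of the two lines in this image is longer?"},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    )
    print(response.choices[0].message.content)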

Pictures used to test the AI models, including two circles with gaps and vague drawings of spoons and a fork. The models performed poorly in assessing whether the gaps in the circles were in the same place, but were good at matching the two spoons, despite the lack of detail.

“They are great at recognising familiar objects, but poor at tasks like making fine-grained judgements about particular parts of an image, or comparing unfamiliar shapes,” says Mr Tang.

The VLMs reached the clinical threshold for visual deficits on over a dozen of the simple and intermediate tests.

“The particular pattern of deficits would be very unusual for a human,” says Dr Storrs.

“When people have selective problems with certain visual functions, it’s usually our processing of complex things like faces.”

A human showing this pattern would probably be diagnosed with visual form agnosia, a broad term for difficulties in interpreting visual shapes and patterns, which can be caused by damage to the visual cortex, she says.

Usually, however, it also involves difficulties in recognising objects.

There are case reports of people who successfully recognise real-world objects but struggle with seemingly “simpler” tasks like identifying geometric shapes or disentangling two overlapping shapes, which is similar to the pattern of difficulties shown by the VLMs.

“But this is an incredibly rare pattern of deficits in a human,” says Dr Storrs.

One explanation for the VLM limitations is that their training data include plenty of captions naming objects, but fewer that precisely describe the size, shape, and position of lines in an image – the sort of information needed to perform a lot of the “simpler” visual tasks, she says.

Their article, “Visual language models show widespread visual deficits on neuropsychological tests”, was published in Nature Machine Intelligence.
