University of Auckland scientists have put commercial AI systems (ChatGPT, Claude, Gemini) through clinical tests to investigate how close they are to achieving human-level vision.
"The AIs were excellent at recognising objects, but bad at performing much simpler-seeming tasks such as comparing the lengths of lines or comparing unfamiliar shapes," said the senior author of the study, Dr Katherine Storrs.
Dr Storrs runs the Computational Perception Lab in the School of Psychology.
She believes there may be implications for scientists using the AIs as models of human vision and in robotics where detailed spatial understanding, not just object recognition, is crucial.


The scientists put the AI systems through tests a neurologist might use with a stroke patient.
The PhD student who led the research, Gene Tang, says artificial vision is doing amazing things such as guiding self-driving cars and robots and recognising faces more accurately than humans.
"While these innovations have gotten really good, none of them requires the full range of human visual abilities," he said.

He says sometimes we need to judge the orientation of a single edge (is this picture hung on the wall perfectly level?); sometimes disentangle a cacophony of overlapping shapes (is there something hiding inside that bush?); and sometimes recognise semantic categories (is this painting by Cezanne or Picasso?).
Most of the AI tools people know, such as ChatGPT, Claude and Gemini, are both Large Language Models and Visual-Language Models (VLMs); you can type into the chat box, but also show a photo and ask questions about it.

"They are great at recognising familiar objects, but poor at tasks like making fine-grained judgements about particular parts of an image, or comparing unfamiliar shapes," says Mr Tang.
The VLMs reached the clinical threshold for visual deficits on over a dozen of the simple and intermediate tests.
"The particular pattern of deficits would be very unusual for a human," says Dr Storrs.
"When people have selective problems with certain visual functions, it's usually our processing of complex things like faces."
A human showing this pattern would probably be diagnosed with visual form agnosia, a broad term for difficulties in interpreting visual shapes and patterns, which can be caused by brain damage to the visual cortex, she says.
Usually, however, it also involves difficulties in recognising objects.
There are case reports of people who successfully recognise real-world objects but struggle with seemingly "simpler" tasks like identifying geometric shapes or disentangling two overlapping shapes, which is similar to the pattern of difficulties shown by the VLMs.
"But this is an incredibly rare pattern of deficits in a human," says Dr Storrs.
One explanation for the VLM limitations is that their training data include plenty of named objects but fewer captions that precisely describe the size, shape, and position of lines in an image, the sort of information needed to perform a lot of the "simpler" visual tasks, she says.
Their article, "Visual language models show widespread visual deficits on neuropsychological tests", was published in Nature Machine Intelligence.


