Antara Chakraborty, M. Optom. FIAO

Assistant Professor, KD Institute of Optometry, Ahmedabad, India

 

In recent years, Artificial Intelligence (AI) has progressed from simple image recognition to complex visual reasoning, increasingly mimicking how the human brain processes vision. But as machines come to resemble us more closely, a compelling question arises: might they hallucinate like us? Just as the human brain can fabricate or misinterpret visual information, AI systems can produce confident but erroneous outputs due to distorted visual encoding, attention failures, or deceptive inputs. These shared vulnerabilities call into question our concept of perception itself, whether biological or artificial. (1–5)

Mimicking the Visual Cortex

Human vision is organised hierarchically, beginning with basic feature detection and progressing to semantic interpretation. A study that mapped fMRI (functional magnetic resonance imaging) responses from the visual cortex onto deep neural network models found a close correspondence: low-level feature processing aligned with early cortical regions, whereas 3D and semantic processing aligned with higher-order areas. (1) This suggests that deep neural networks are functionally aligned with the brain's visual architecture.
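For readers who want a concrete sense of how such brain-to-model comparisons are made, the sketch below runs a representational similarity analysis on synthetic data: it builds a dissimilarity matrix for each network layer and each cortical region and then correlates them, so a layer “matches” the region whose dissimilarity structure it best reproduces. The layer names, array shapes, and random data are illustrative assumptions, not the pipeline used in the cited study.

```python
# Minimal RSA-style comparison of model layers to brain regions (synthetic data).
# All shapes, names, and values are illustrative assumptions.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_images = 50

# Hypothetical activations: each entry is (n_images x n_features or n_voxels).
layer_acts = {"early_layer": rng.normal(size=(n_images, 256)),
              "late_layer":  rng.normal(size=(n_images, 512))}
brain_acts = {"V1": rng.normal(size=(n_images, 300)),
              "IT": rng.normal(size=(n_images, 300))}

def rdm(acts):
    # Representational dissimilarity matrix: pairwise correlation distance
    # between the responses to every pair of images.
    return pdist(acts, metric="correlation")

for layer, acts in layer_acts.items():
    for region, brain in brain_acts.items():
        rho, _ = spearmanr(rdm(acts), rdm(brain))
        print(f"{layer} vs {region}: Spearman rho = {rho:.3f}")
```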

The Anatomy of Hallucination

Despite this alignment, AI frequently hallucinates. Large Vision-Language Models (LVLMs) may produce detailed, confident responses that do not match the actual input. Analysis of their middle layers shows that hallucinations arise during critical stages such as “visual enrichment” and “semantic refinement,” in which object information is translated into language; when the attention mechanisms at these stages fail, hallucinated content appears. (3)
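As a toy illustration of this failure mode (not the diagnostic used in the cited work), the sketch below flags generated tokens that are only weakly grounded in the image: if a word is produced while its attention over the image patches stays close to uniform, it is treated as a possible hallucination. The attention matrix, tokens, and threshold are invented for the example.

```python
# Toy check for weakly grounded tokens via cross-attention mass (synthetic data).
# The attention values, tokens, and threshold below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
tokens = ["a", "dog", "on", "a", "surfboard"]
n_patches = 16

# Hypothetical text-to-image cross-attention: rows = generated tokens,
# columns = image patches; each row sums to 1.
attn = rng.dirichlet(np.ones(n_patches), size=len(tokens))
attn[4] = np.full(n_patches, 1.0 / n_patches)  # "surfboard": diffuse, weak grounding

THRESHOLD = 2.0 / n_patches  # peak attention barely above a uniform spread

for token, row in zip(tokens, attn):
    peak = row.max()
    if peak < THRESHOLD:
        print(f"'{token}' peaks at {peak:.3f} over image patches -> possible hallucination")
```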

To address this, researchers developed VaLiD (Visual Layer Fusion Contrastive Decoding), a training-free approach that targets hallucination at its source: the visual encoding stage. VaLiD fuses representations from several visual layers, weighted by uncertainty scores, to preserve accurate features and reduce information distortion. It then applies a contrastive decoding mechanism that prioritises text outputs consistent with the genuine visual information. This technique considerably improves model reliability and lowers hallucination rates across benchmark datasets, and it does so without additional training cost, making it practical for real-world deployment. (2)
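A minimal sketch of these two ideas follows; it illustrates the general mechanism rather than the authors' implementation. Visual features from several encoder layers are fused with weights derived from a per-layer entropy score (more certain layers count more), and the next-token logits are then shifted toward the fused view with a contrastive adjustment. The shapes, the entropy proxy, the stand-in language-model head, and the alpha parameter are all assumptions made for the example.

```python
# Sketch of uncertainty-weighted visual layer fusion + contrastive decoding.
# Illustrative only; shapes, the entropy proxy, W_head, and alpha are assumptions.
import numpy as np

rng = np.random.default_rng(2)
n_layers, n_tokens, dim, vocab = 4, 8, 32, 100

# Hypothetical visual features from several encoder layers: (layers, tokens, dim).
layer_feats = rng.normal(size=(n_layers, n_tokens, dim))

def layer_entropy(feats):
    # Crude uncertainty proxy: mean entropy of a softmax over each token's features.
    p = np.exp(feats) / np.exp(feats).sum(axis=-1, keepdims=True)
    return -(p * np.log(p + 1e-12)).sum(axis=-1).mean()

# Lower-entropy (more certain) layers receive larger fusion weights.
entropies = np.array([layer_entropy(f) for f in layer_feats])
weights = np.exp(-entropies)
weights /= weights.sum()
fused_feats = np.tensordot(weights, layer_feats, axes=1)  # (tokens, dim)

W_head = rng.normal(size=(dim, vocab)) * 0.1  # stand-in language-model head (assumed)

def decode_logits(visual_feats):
    # Pool the visual features and project them to vocabulary logits.
    return visual_feats.mean(axis=0) @ W_head

logits_std = decode_logits(layer_feats[-1])   # last-layer-only view
logits_fused = decode_logits(fused_feats)     # uncertainty-weighted fused view

alpha = 0.5  # contrast strength (assumed)
contrastive = (1 + alpha) * logits_fused - alpha * logits_std
next_token = int(np.argmax(contrastive))
print("chosen token id:", next_token)
```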

Human vs AI Vision and Hallucinations

Table 1 combines insights from neuroscience and AI research to show how hallucinations arise, what triggers them, and how each system responds to visual deception.

Aspect | Human Vision | AI Vision Models
Processing Architecture | Visual cortex (V1 through the ventral/dorsal streams) | Convolutional Neural Networks (CNNs) + Transformers with hierarchical layers
Hallucination Trigger | Vision loss, neurological dysfunction | Middle-layer distortion, attention mismatch
Correction Mechanism | Neuroplasticity, external feedback | VaLiD: visual layer fusion
Deception Handling | Contextual inference, multisensory support | Limited; fails under synthetic traps

Table 1: Similarities and differences between human visual processing and AI vision models.

Beyond Accuracy: Why True Vision Requires Meaning

While LVLMs perform well on typical image tasks, their accuracy drops sharply when they are presented with deceptive or ambiguous visuals, indicating a lack of contextual depth in their reasoning. (4) This is similar to how the human brain can misfire under ambiguity, as seen in conditions such as Charles Bonnet Syndrome, where visual gaps are filled with imagined images. A broader neuroscientific perspective emphasises that vision is more than data decoding; it is grounded in context, memory, and meaning. (5) To fully emulate human perception, AI must progress from pattern recognition to cognitive comprehension, accounting not just for what is seen but also for why and how it is perceived.

Conclusion

From understanding brain-based scene processing to correcting AI hallucinations, detecting visual deception, and integrating cognitive insights, the road ahead is clear: we must build AI that not only sees but understands. (1–5) The more we decode human vision, the closer we get to designing machines that perceive ethically, accurately, and adaptively.

In the end, vision is not just about seeing but interpreting reality. And whether through a cortex or a codebase, both humans and machines are still learning how to see the truth.

References

  1. Dwivedi, K., Bonner, M. F., Cichy, R. M., & Roig, G. (2021). Unveiling functions of the visual cortex using task-specific deep neural networks. PLoS Computational Biology, 17(8), e1009267.
  2. Wang, J., Gao, Y., & Sang, J. (2024). VaLiD: Mitigating the hallucination of large vision language models by visual layer fusion contrastive decoding. arXiv preprint arXiv:2411.15839.
  3. Jiang, Z., Chen, J., Zhu, B., Luo, T., Shen, Y., & Yang, X. (2025). Devils in middle layers of large vision-language models: Interpreting, detecting and mitigating object hallucinations via attention lens. In Proceedings of the Computer Vision and Pattern Recognition Conference (pp. 25004-25014).
  4. Ping, L., Gu, Y., & Feng, L. (2024). Measuring the visual hallucination in ChatGPT on visually deceptive images. Preprint.
  5. Zhang, M., Tang, E., Ding, H., & Zhang, Y. (2024). Artificial intelligence and the future of communication sciences and disorders: A bibliometric and visualization analysis. Journal of Speech, Language, and Hearing Research, 67(11), 4369-4390.