Insights from infancy for the future of AI

BY CAO MENGXUE and WANG ZHIWEI | 07-17-2025
Chinese Social Sciences Today

The early phase of human development may offer inspiring insights into AI studies. Photo: TUCHONG


At around nine months of age, a profound transformation quietly unfolds in the cradle. American developmental psychologist Michael Tomasello refers to this pivotal moment as the “nine-month revolution” in human cognitive development—a turning point at which infants begin to shift from isolated individuals to participants in shared intentionality, laying the foundation for cultural learning. We refer to this process as the “nine-month transformation,” as it signifies not merely biological growth, but a reorganization of the cognitive architecture itself. As artificial intelligence (AI) strives to cross the chasm from being a mere tool to becoming a genuine partner, this early phase of human development may offer crucial insights.


‘Nine-month transformation’ and the cognitive big bang

The “nine-month transformation” in human cognitive development is marked by the emergence of four core abilities: joint attention, intention understanding, pointing behavior, and imitation. These nascent capacities serve as the four foundational pillars of mental development, enabling infants to build bridges to language, culture, and social life.


Joint Attention: From “You and Me” to “Us and the World.” Joint attention refers to the ability of two individuals to focus on the same external object while being mutually aware of each other’s engagement. Though seemingly simple, this ability represents a major milestone in cognitive development. It signals the infant’s dawning awareness that others have minds and that attention can be shared. Within the triadic interaction among infant, caregiver, and the environment, infants begin to take an active role in directing communication. They intentionally draw others’ attention to things they find meaningful and gauge others’ reactions in return. This capacity plays a pivotal role in language acquisition: By linking the sounds they hear with the objects and events they see, infants begin to map out correspondences between phonological forms, information structures, and real-world referents. In this sense, joint attention offers a kind of cognitive coordinate system—a three-dimensional map that aligns linguistic form, structural pattern, and conceptual meaning.
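The word-to-referent mapping described above can be illustrated with a toy "cross-situational" learner. This is a hypothetical sketch, not a model from the article: it simply counts how often a heard word co-occurs with an attended object across joint-attention episodes, and the consistent pairings win out.

```python
from collections import defaultdict

def cross_situational_learner(episodes):
    """Count word-referent co-occurrences across joint-attention episodes.

    Each episode pairs the words the infant hears with the objects
    both partners are jointly attending to; pairings that recur
    across episodes accumulate the highest counts.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for words, objects in episodes:
        for w in words:
            for o in objects:
                counts[w][o] += 1
    # Map each word to the referent it co-occurred with most often.
    return {w: max(refs, key=refs.get) for w, refs in counts.items()}

# Hypothetical episodes: (words heard, objects in joint attention).
episodes = [
    (["look", "ball"], ["ball", "mat"]),
    (["ball", "roll"], ["ball"]),
    (["soft", "mat"], ["mat"]),
]
lexicon = cross_situational_learner(episodes)
```

Here "ball" is heard in two episodes but only the ball itself is present in both, so the ambiguity of the first episode is resolved by the second; no single episode suffices, which is the point of the shared "coordinate system."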


Understanding Intentions: The Little Detective Behind the Behavior. As infants grow, they gradually come to realize that others’ actions are not merely physical movements but expressions of underlying mental states and goals. This shift transforms the infant from passive observer to active interpreter of behavior, laying the foundation for complex social interaction. Rather than simply imitating others, infants begin to anticipate their intentions, engage in cooperative activity, and develop the ability to “read minds.” Around the same time, infants begin to detect patterns in vocal cues. For example, prosodic features help signal intention: Falling intonation may be interpreted as default or background information, while rising or emphatic tones often serve to highlight novelty or invite attention.
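The "little detective" idea of inferring goals behind movements is often formalized as Bayesian inverse planning. The following is a minimal illustrative sketch under invented priors and likelihoods, not a claim about infants' actual computations:

```python
def infer_intention(priors, likelihoods, observed_action):
    """Infer the goal behind an observed action via Bayes' rule.

    priors:      P(goal) before the action is seen
    likelihoods: P(action | goal)
    Returns the posterior P(goal | observed_action).
    """
    joint = {g: priors[g] * likelihoods[g].get(observed_action, 0.0)
             for g in priors}
    total = sum(joint.values())
    return {g: p / total for g, p in joint.items()}

# Hypothetical scenario: a hand reaches toward a cup.
priors = {"wants_drink": 0.5, "wants_to_tidy": 0.5}
likelihoods = {
    "wants_drink":   {"reach_for_cup": 0.9, "reach_for_cloth": 0.1},
    "wants_to_tidy": {"reach_for_cup": 0.3, "reach_for_cloth": 0.7},
}
posterior = infer_intention(priors, likelihoods, "reach_for_cup")
```

The observer never sees the goal directly; it treats the movement as evidence and asks which underlying intention would most likely have produced it, which is exactly the shift from seeing physical motion to reading mental states.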


Pointing Behavior: The First Words in the Cradle. With the emergence of subjective awareness, infants begin to point at objects, often accompanied by eye contact or vocalizations. These gestures go beyond mere requests (imperative pointing) to include declarative pointing—in which infants attempt to draw others into joint attention. Pointing is a crucial precursor to language. When ignored by caregivers, infants often persist or escalate their efforts, demonstrating an expectation of social reciprocity. Pointing marks the beginning of mental transparency: It reflects the social drive that underpins language itself.


Imitation: The Beginning of Social Play. Imitation is a powerful mechanism through which infants learn social behavior. Yet it is not mere mimicry—it is a form of role-play that involves enacting the mental state of another. Through imitation, higher-order social abilities like humor begin to emerge. Infants might make funny faces or deliberately break small rules to amuse adults. This “intentional imitation” signals entry into a mode of learning that is deeply social. It is also the key that opens the door to culture. Through repeated interactive exchanges—“you do, I learn”—infants absorb social customs, communication norms, and behavioral conventions. This lays the groundwork for the phonetic imitation crucial to early language acquisition.


Pathways and philosophies of AI

In the contemporary landscape of AI, two fundamentally divergent paths have emerged: One prioritizes output optimization, focusing on finding the best solutions; the other emphasizes process modeling, aiming to reconstruct the generative logic of intelligence itself. Which path holds greater promise for achieving true intelligence? The answer may lie in the transformative leap that occurs in human infants at around nine months of age.


One dominant approach in AI relies on black-box methodologies grounded in large-scale data training. These systems emphasize input-output mappings to achieve optimal solutions, often at the expense of process transparency or model interpretability. Large language models developed through this paradigm can produce human-like linguistic output across a range of tasks. Yet they lack genuine comprehension of meaning and do not possess a coherent, structured model of the world. Their capabilities stem from efficient pattern extraction and recombination, rather than from internally constructed cognitive models.


The success of this black-box approach can be misleading: It suggests that if a system can mimic human behavior, it must possess human-like intelligence. British-Canadian cognitive psychologist Geoffrey Hinton, often called the “Godfather of AI,” argues that if language models can perform complex tasks, we need not overly concern ourselves with their internal mechanisms. He even proposes that such models might constitute an entirely new kind of cognitive architecture. In contrast, a second school of thought emphasizes process construction and cognitive modeling. This approach holds that true intelligence is not defined by the mapping from input to output alone, but rather by the complex and dynamic processing that occurs in between. Meta’s Chief AI Scientist, Yann LeCun, asserts that for AI to reach the cognitive level of animals—or humans—it must learn in ways akin to how infants make sense of the world.


What the “nine-month transformation” reveals is not a simple accumulation of behavioral skills through experience, but a fundamental reorganization of the cognitive architecture itself. For AI to move toward authentic intelligence, it must not only reproduce results but also understand and replicate processes; it must not merely imitate superficial behaviors but internalize their underlying mechanisms. Although this path may be slower and more challenging, it more closely mirrors the nature of human cognition.


Three core lessons for AI

The early development of infant intelligence offers three key insights into the nature of intelligence and its implications for AI.


First, intelligence is driven by qualitative transformation, rather than quantitative accumulation. Cognitive breakthroughs in infancy are not the result of increasing input, but of reorganizing the mind’s internal structure. Human intelligence is characterized not by information processing per se, but by the ability to perceive and understand others’ intentions, focus, and emotional states. Thus, AI systems that merely replicate human language or behavior fall short. Real intelligence must involve inferring, interpreting, and even anticipating others’ mental states in interactions.


Human cognition is inherently “multimodal”: We naturally integrate speech, visuals, gestures, and actions. For AI, however, integrating such multimodal inputs demands more than synchronization algorithms. It also requires developing preferences and inferential abilities to determine which types of information should be integrated—and how. To become truly intelligent, AI must develop the ability to understand minds in interactions—not just decode words literally.


Second, intelligence emerges through active construction, not passive reaction. Infants transition from being reactive to external stimuli to becoming proactive participants in social interaction. They initiate interactions through gaze, vocalization, and gesture to capture attention and share interests—transforming learning into an intentional, constructive process. 


While passive systems merely optimize responses based on existing inputs, constructive systems actively generate new interactions and novel informational linkages. For AI to achieve this shift—from “perception-response” to “interaction-construction”—it must do more than respond optimally. It must pose questions, provoke inquiry, articulate intentions, and create meaning collaboratively in interaction.
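The contrast between a passive responder and a constructive questioner can be made concrete with a toy active learner. This is an illustrative sketch under invented hypotheses and questions: instead of waiting for input, the system picks the question whose answer is expected to reduce its uncertainty the most.

```python
import math

def entropy(dist):
    """Shannon entropy (bits) of a probability distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def pick_query(belief, answer_model):
    """Choose the question whose expected answer most reduces uncertainty.

    belief:       current P(hypothesis)
    answer_model: answer_model[q][h] = the answer that would be given
                  to question q if hypothesis h were true
    """
    def expected_entropy(q):
        # Group hypotheses by the answer they predict for question q.
        groups = {}
        for h, p in belief.items():
            groups.setdefault(answer_model[q][h], []).append(p)
        # Expected entropy of the posterior after hearing the answer.
        total = 0.0
        for probs in groups.values():
            mass = sum(probs)
            post = {i: p / mass for i, p in enumerate(probs)}
            total += mass * entropy(post)
        return total

    return min(answer_model, key=expected_entropy)

# Hypothetical setup: four equally likely hypotheses, two questions.
belief = {"h1": 0.25, "h2": 0.25, "h3": 0.25, "h4": 0.25}
answer_model = {
    "splits_evenly":   {"h1": "yes", "h2": "yes", "h3": "no", "h4": "no"},
    "splits_unevenly": {"h1": "yes", "h2": "no",  "h3": "no", "h4": "no"},
}
best = pick_query(belief, answer_model)
```

A passive system would simply answer whatever it is asked; this learner instead initiates the exchange, preferring the evenly splitting question because its answer is guaranteed to halve the hypothesis space, a small-scale analogue of the infant's shift from "perception-response" to "interaction-construction."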


Third, reconceptualizing intelligence as a “process” rather than a “black box” necessitates two essential qualities: interpretability and evolvability. Cognitive development in infants is a dynamic and continuous process characterized by ongoing adaptation and reconstruction, rather than a one-off outcome of target fitting. The “nine-month transformation” reveals that human intelligence features processual evolution, wherein new capacities do not arise by direct acquisition, but emerge spontaneously via self-organization through sustained interaction with the environment and others.


Most current AI systems prioritize outcome optimization. Yet truly robust intelligence must undergo a dynamic, developmental process of learning. Internally, this demands a flexible representational system capable of continuous adaptation and enrichment. Externally, it requires an environment that provides diverse and flexible opportunities for interaction. Only through the interplay of these forces can true intelligence emerge and evolve.


From the “nine-month transformation” seen in infants, we glimpse a path toward truly human-like AI: one that evolves from isolated information processors into socially aware cognitive agents; from one-way responders into co-constructors of knowledge. In the future, AI may evolve beyond a mere tool for executing human commands to become a collaborator capable of understanding human intentions, co-constructing knowledge, and engaging meaningfully in social interactions.


Cao Mengxue is an associate professor at the Institute of Linguistics at the Chinese Academy of Social Sciences (CASS). Wang Zhiwei is affiliated with the Key Laboratory of Linguistics at CASS.


Edited by REN GUANHONG