Is ChatGPT making my baby look ridiculous in the language-learning ring?

“Mom”, “dad”, “baby”… At just 14 months, my baby has learned to jabber around ten words and to repeat “no” tirelessly. You might expect us to shed a tear with each new addition to his vocabulary (his first “poop” was applauded, we must admit). In fact, we barely notice the newcomers, drowned as they are in diction exercises worthy of Armande Altaï on Star Academy (the 2001 era, and we own it). In the middle of the “dibiditapatoutabouba” soup, “bird” comes across as a happy accident.

While watching my baby laboriously stammer “ta” for “cat” on waking very early one morning, I remembered a meeting in 2018 with Yann Le Cun, director of artificial intelligence research at Meta (FAIR). This pioneer of deep learning explained that he was collaborating with neurolinguist Emmanuel Dupoux, a specialist in infant learning, to try to unravel the mysteries of babies’ learning power. The idea is that taking inspiration from babies would make it possible to build more efficient AI. It takes a child about three years to produce complex language. Where would a machine be after 14 months of training? In the language-learning ring, would my baby take an uppercut from today’s algorithms or, on the contrary, would he knock out language models like ChatGPT, developed by OpenAI?

“Chien” or “choin”?

“Human language is of unparalleled complexity, and the only agent that learns language effectively is the baby,” explains Marvin Lavechin, a specialist in artificial intelligence and language acquisition models who worked in Emmanuel Dupoux’s team. Before expressing himself in a complex form of language, the baby goes through universal stages. “The child first produces vowels, then syllables, which are more complicated to pronounce in terms of mouth motor skills,” explains Séverine Alonso-Bekier, a psychomotor therapist. Next, he strings syllables together and begins to form words, then to build sentences. Three is the age at which a child masters complex language: structured sentences, with notions of space and time. He no longer simply names an object. “He’ll say ‘my toy in the bedroom.’ He masters a certain number of parameters,” she continues.

You might think that three years is a long time to learn a language. But picture yourself immersed in Japan for three years with no dictionary or translator: you would not come out bilingual. You would barely be able to tell certain sounds apart and understand a few others. The child’s achievement is a remarkable cognitive feat. Except that ChatGPT can write a master’s-level (bac+5) philosophy dissertation, a courtroom plea, a political speech… It even passes the competitive entrance exams for France’s grandes écoles. A three-year-old does none of that. Still less my 14-month-old baby, who calls his father “mommy” every other time. Is the match already over? Not so fast.

Knowing a word means many things: associating a sound with an object, knowing that the word “chien” (“dog”) stands for the animal; or being able to recognize that the word “chien” is French, without necessarily understanding it. “We ran tests. We give the algorithm the word ‘chien’ and a word that sounds French but is not, ‘choin’, for example. Does the algorithm identify the word ‘chien’ as belonging to the French language? We find that they learn exponentially slower than children,” explains Marvin Lavechin. The algorithms need infinitely more data to reach the same result. “There is an ocean between the speed of learning in babies and in AI,” observes the researcher. And the data babies learn from is far more complex.

A slow machine devoid of logic

Today, the algorithms are trained on audiobooks: clearly articulated speech, with no background noise or variation in sound. A child, by contrast, learns in a noisy environment where two conversations can take place at once and where outside noise drowns out voices. I sometimes talk to my baby from the kitchen while a New Order vinyl crackles next to him. When another adult is present, I don’t always take the time to articulate as I would alone with him. Is a machine comfortable with this type of data?

In fact, researchers have tried to put the AI in the same learning position as the baby. They placed microphones on very young children, from birth to two or three years old, and collected all the speech the infant heard during the day. “We retrieved this data and trained our learning models on these recordings,” says Marvin Lavechin. The algorithm has to cope with all kinds of situations: a mother telling her child a story, with speech that is very close and well articulated; a mother talking while the television plays in the background. “On this data, the algorithms fall flat on their faces in spectacular fashion,” smiles the AI specialist.

Within a few years, children intuitively know how to conjugate verbs, and they acquire the basics of physics simply by observing the world. “After two months, babies understand the notion of object permanence. When an object is hidden, it has not disappeared. In the meantime, they have had to understand that the world is three-dimensional, that objects can be in front of other objects. Around the age of 8 months, they understand that an object that is not supported will fall. Gravity, the effect of inertia…”, Yann Le Cun pointed out during that 2018 conversation. Today’s machines can produce language and text, after training on gigantic quantities of data, but they lack even the most basic logic. Do we need to spell out who emerges victorious from the fight?
