In the race toward Artificial General Intelligence, DeepMind has taken the lead (or has it?). The company recently released details of its new model, Gato, a single agent that can carry out around 600 distinct tasks, from playing games to captioning images. Gato arrives as a competitor to OpenAI's GPT-3.
Gato, or "A Generalist Agent," as DeepMind calls it, is inspired by recent advances in large-scale language modeling and extends them to problems beyond text inputs. The agent is described as a multi-modal, multi-task, multi-embodiment generalist policy capable of many tasks (but are they being done well?).
General Artificial Intelligence, or Artificial General Intelligence (AGI), is the ability of an intelligent agent to learn and carry out any intellectual task that a human can. It has been the aim of AI since its earliest applications, and although Hollywood has made the idea sound scary, it may well hold answers about the origins of our own intelligence.
This idea has spurred a race among engineers and researchers to build the perfect agent, giving us revolutionary architectures such as GPT-3, DALL-E 2, PaLM, LaMDA, Chinchilla, and Flamingo! However, most of these models target limited use cases revolving around text and speech; Gato strives to be something more.
DeepMind is the British artificial-intelligence subsidiary of Alphabet Inc., Google's parent company. Since its founding in 2010, the research lab has thoroughly revolutionized the reinforcement learning scene.
The lab transformed the field largely by training agents to excel at real games, its first milestone being an agent that mastered Atari games. In contrast to pre-existing "game-playing" systems like IBM's Deep Blue or Watson, the model involved no hard-coding toward a specific goal; instead, it had the flexibility to learn the task itself using a combination of convolutional neural networks and Q-learning.
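The learning rule behind that combination can be shown in its simplest tabular form. The sketch below is a minimal, illustrative Q-learning update (the toy state/action sizes and hyperparameters are assumptions, not DeepMind's), which the Atari agents extended by replacing the table with a convolutional network:

```python
import numpy as np

# Minimal tabular Q-learning update; toy sizes and hyperparameters are illustrative.
n_states, n_actions = 5, 2
q_table = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.99  # learning rate, discount factor

def q_update(state, action, reward, next_state):
    """Move Q(s, a) toward the bootstrapped target r + gamma * max_a' Q(s', a')."""
    target = reward + gamma * q_table[next_state].max()
    q_table[state, action] += alpha * (target - q_table[state, action])

q_update(state=0, action=1, reward=1.0, next_state=2)
print(q_table[0, 1])  # 0.1 after one update from a zero-initialized table
```

A deep Q-network keeps exactly this target but learns Q(s, a) with a neural network over raw pixels instead of a lookup table.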
At its core, Gato is a single network acting as a multi-modal, multi-task, multi-embodiment generalist policy. As the authors at DeepMind put it, "The same network with the same weights can play Atari, caption images, chat, stack blocks with a real robot arm and much more, deciding based on its context whether to output text, joint torques, button presses, or other tokens."
Gato works by normalizing all the different inputs and data streams from its various tasks and serializing them into flat sequences of tokens. Once everything is expressed in this common token format, a single network can handle language, images, game playing, and interaction with mechanical objects.
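The serialization step can be sketched as follows. This is a toy illustration of the flat-token idea, not the exact scheme from the paper: the vocabulary layout, bin count, and value range below are assumptions.

```python
# Toy sketch: serialize heterogeneous inputs (text ids, continuous control
# values) into one flat stream of integer tokens. Layout is illustrative.
TEXT_VOCAB = 32_000  # ids [0, TEXT_VOCAB) reserved for text tokens
NUM_BINS = 1024      # continuous values discretized into this many bins

def tokenize_text(token_ids):
    return list(token_ids)  # already integers in [0, TEXT_VOCAB)

def tokenize_continuous(values, low=-1.0, high=1.0):
    # Clip, uniformly bin each float, and shift past the text vocabulary
    # so the two modalities never collide in token space.
    tokens = []
    for v in values:
        frac = (min(max(v, low), high) - low) / (high - low)
        bin_id = min(int(frac * NUM_BINS), NUM_BINS - 1)
        tokens.append(TEXT_VOCAB + bin_id)
    return tokens

# One example: a text prompt followed by robot joint torques, concatenated
# into a single flat sequence a transformer can consume.
sequence = tokenize_text([17, 902, 5]) + tokenize_continuous([0.0, -0.5, 1.0])
print(sequence)  # [17, 902, 5, 32512, 32256, 33023]
```

The key design point is that after this step the model no longer cares where a token came from; text and torques are just positions in one shared vocabulary.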
Acting is achieved by sampling these tokens into action vectors autoregressively, one token at a time. Once all tokens constituting the action vector have been sampled (as defined by the environment's action specification), the action is decoded and sent to the environment, producing a new observation.
The process then repeats, with the model conditioning on all prior observations and actions that fit inside its 1024-token context window.
Going through the paper and the authors' notes, DeepMind's Gato does not significantly advance the state of the art in text generation, image captioning, Atari play, or robotic control individually. What it does is bring everything together, managing to do a bit of everything with a single model.
The network is an excellent step toward AGI and what every general AI system strives to be. With only 1.18 billion parameters, compared to GPT-3's 175 billion and the "small" 70-billion-parameter Chinchilla, the network sets a new benchmark for the efficiency of multi-purpose networks.
With both research giants having developed and refined their respective models for years, one might wonder whether we are actually making progress toward AGI and its availability to "humanity." A fundamental problem with creating a "general" intelligence, however, is the ability to learn without explicit, structured data or a system feeding it information, and neither network appears to address this.
Secondly, regarding the availability of these networks to "humanity," both remain, now and for the foreseeable future, gated behind giant corporations.
Both networks, in their current state, achieve impressive feats in their respective tasks, but are these capabilities ready to be incorporated into the consumer fabric of our society? Both still require considerable work on filtering to keep their outputs free of explicit, unacceptable language and behavior.
Interested in reading more about new and upcoming AI architectures? Check out our blog on DALL-E 2.