OpenAI · Article · 7 November 2018

Learning concepts with energy functions


Many hallmarks of human intelligence, such as generalizing from limited experience, abstract reasoning and planning, analogical reasoning, creative problem solving, and capacity for language require the ability to consolidate experience into _concepts_, which act as basic building blocks of understanding and reasoning. Our technique enables agents to learn and extract concepts from tasks, then use these concepts to solve other tasks in various domains. For example, our model can use concepts learned in a two-dimensional particle environment to let it carry out the same task on a three-dimensional physics-based robotic environment—without retraining in the new environment.

This work uses energy functions to let our agents learn to _classify_ and _generate_ simple concepts, which they can use to solve tasks like navigating between two points in dissimilar environments. Examples of concepts include visual (“red” or “square”), spatial (“inside”, “on top of”), temporal (“slow”, “after”), and social (“aggressive”, “helpful”) relations, among others. These concepts, once learned, act as basic building blocks of an agent’s understanding and reasoning, as shown in other research from DeepMind and Vicarious.

To create the energy function, we mathematically represent concepts as energy models. The idea of energy models is rooted in physics, with the intuition that observed events and states represent low-energy configurations.

We define an energy function E(x, a, w) for each concept in terms of:

* The state of the world the model observes (x).

* An attention mask (a) over entities in that state.

* A continuous-valued vector (w), used as conditioning, that specifies the concept for which energy is being calculated.
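To make the signature concrete, here is a hypothetical toy energy function, not the paper's learned model: a hand-written E(x, a, w) that is zero when the attended entities sit at a target point encoded by w, and grows as they move away.

```python
import numpy as np

def energy(x, a, w):
    """Toy energy E(x, a, w) for an illustrative 'be at the target point' concept.

    x: (n, 2) array of entity positions (the observed state of the world)
    a: (n,) attention mask (non-negative weights over entities)
    w: (2,) concept vector, here interpreted as a target location

    Returns a single non-negative number: 0 when every attended
    entity sits exactly at the target, larger otherwise.
    """
    sq_dist = np.sum((x - w) ** 2, axis=1)   # per-entity squared distance to target
    return float(np.dot(a, sq_dist))         # attention-weighted total

x = np.array([[0.0, 0.0], [1.0, 1.0]])
a = np.array([1.0, 0.0])                     # attend only to the first entity
w = np.array([0.0, 0.0])                     # concept: "be at the origin"

print(energy(x, a, w))                       # 0.0 -> concept satisfied
print(energy(x, np.array([0.0, 1.0]), w))    # 2.0 -> attended entity is off-target
```

The same state can satisfy or violate a concept depending on where attention is placed, which is exactly the distinction the attention mask exists to capture.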

States of the world are composed of sets of entities and their properties and positions (like the dots below, which have both position and color properties). Attention masks, used for “identification”, represent a model’s focus on some set of entities. The energy model outputs a single positive number indicating whether the concept is satisfied (when energy is zero) or not (when energy is high). A concept is satisfied when an attention mask is focused on a set of entities that represent a concept, which requires both that the entities are in the correct positions (modification of x, or generation) and that the right entities are being focused on (modification of a, or identification).
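The two ways of satisfying a concept can be sketched with a hand-written quadratic energy (an illustrative stand-in for the learned network, with analytic gradients): generation descends the energy with respect to the state x, while identification adjusts the attention mask a.

```python
import numpy as np

def energy(x, a, w):
    # Toy quadratic energy: attention-weighted squared distance to target w.
    return float(a @ np.sum((x - w) ** 2, axis=1))

def generate(x, a, w, lr=0.1, steps=200):
    """Generation: modify the state x by gradient descent on E."""
    x = x.copy()
    for _ in range(steps):
        grad_x = 2 * a[:, None] * (x - w)      # dE/dx for the quadratic energy
        x -= lr * grad_x
    return x

def identify(x, a, w, lr=0.01, steps=200):
    """Identification: modify the attention a by gradient descent on E,
    keeping it a valid (non-negative, normalized) mask."""
    a = a.copy()
    for _ in range(steps):
        grad_a = np.sum((x - w) ** 2, axis=1)  # dE/da
        a -= lr * grad_a
        a = np.clip(a, 0.0, None)
        a /= a.sum()
    return a

w = np.array([0.0, 0.0])
x = np.array([[3.0, 4.0], [1.0, -2.0]])

x_new = generate(x, np.array([1.0, 1.0]), w)
print(np.round(x_new, 3))   # both attended entities pulled onto the target

a_new = identify(x, np.array([0.5, 0.5]), w)
print(np.round(a_new, 3))   # attention shifts to the entity nearest w
```

Both procedures descend the same scalar energy; only the variable being optimized differs, which is the sense in which one function supports both generation and identification.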

We construct the energy function as a neural network based on the relational network architecture, which allows it to take an arbitrary number of entities as input. The parameters of this energy function are what is being optimized by our training procedure; other functions are derived implicitly from the energy function.
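A relational-style energy network can be sketched as follows: a shared function g scores every ordered pair of entities, the scores are summed (making the output invariant to entity count and order), and a readout f maps the sum to a single non-negative energy. The weights, dimensions, and the omission of the attention mask here are simplifications for illustration, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared pairwise network g and readout f, as tiny one-layer maps (toy sizes).
D, H = 4, 16                                  # entity feature dim, hidden dim
Wg = rng.normal(size=(2 * D + 3, H)) * 0.1    # +3 for the concept vector w
Wf = rng.normal(size=(H, 1)) * 0.1

def relational_energy(entities, w):
    """E(x, w): sum a shared function g over all ordered entity pairs,
    then map the pooled result through f and a softplus so E >= 0.

    entities: (n, D) array -- works for any number n of entities.
    w: (3,) concept vector, concatenated into every pair's input.
    """
    n = entities.shape[0]
    total = np.zeros(H)
    for i in range(n):
        for j in range(n):
            pair = np.concatenate([entities[i], entities[j], w])
            total += np.maximum(pair @ Wg, 0.0)   # g with a ReLU nonlinearity
    out = total @ Wf                              # readout f
    return float(np.log1p(np.exp(out[0])))        # softplus: non-negative energy

w = rng.normal(size=3)
print(relational_energy(rng.normal(size=(5, D)), w))  # 5 entities
print(relational_energy(rng.normal(size=(8, D)), w))  # 8 entities, same network
```

Because the pairwise scores are pooled by summation, the same fixed set of weights handles scenes with any number of entities, which is the property the paragraph above relies on.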

This approach lets us use energy functions to learn a single network that can perform both generation and recognition. This allows us to cross-employ concepts learned from generation to identification, and vice versa. (Note: this effect is already observed in animals via mirror neurons.)

Our training data is composed of trajectories of (attention mask, state), which we generate ahead of time for the specific concepts we’d like our model to learn. We train our model by giving it a set of demonstrations (typically 5) for a given concept set, and then give it a new environment (X0) and ask it to predict the next state (X1) and next attention mask (a). We optimize the energy function such that the next state and next attention mask found in the training data are assigned low energy values. Similar to generative models like variational autoencoders, the model is incentivized to learn values that usefully compress aspects of the task. We trained our model using a variety of concepts involving visual, spatial, proximal, and temporal relations, as well as quantification, in a two-dimensional particle environment.
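One piece of this procedure can be sketched in isolation: inferring a concept vector w from a handful of demonstration (state, attention) pairs by gradient descent, so the demonstrations receive low energy. The quadratic energy and the concept ("attend to the point (2, 3)") are hand-written stand-ins for the learned network and dataset, chosen so the sketch stays self-contained.

```python
import numpy as np

def energy(x, a, w):
    # Toy stand-in energy: attention-weighted squared distance to w.
    return float(a @ np.sum((x - w) ** 2, axis=1))

def infer_concept(demos, lr=0.05, steps=500):
    """Fit w by gradient descent so all demonstrated (x, a) pairs get low energy."""
    w = np.zeros(2)
    for _ in range(steps):
        grad = np.zeros(2)
        for x, a in demos:
            grad += -2 * np.sum(a[:, None] * (x - w), axis=0)  # dE/dw
        w -= lr * grad / len(demos)
    return w

# Five demonstrations of the assumed concept "attend to the point (2, 3)":
rng = np.random.default_rng(1)
demos = []
for _ in range(5):
    x = rng.normal(size=(4, 2)) * 5
    x[0] = [2.0, 3.0]                     # one entity placed at the target
    a = np.array([1.0, 0.0, 0.0, 0.0])    # the demonstration attends to it
    demos.append((x, a))

w = infer_concept(demos)
print(np.round(w, 2))                     # recovers roughly [2. 3.]
```

The shared structure across the five demonstrations is exactly what survives the fit: w compresses what the demonstrations have in common, mirroring the compression incentive described above.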

Spatial region concepts: given demonstration 2D points (left), the energy function over point placement is inferred (middle); stochastic gradient descent over the energy is then used to generate new points (right).

We evaluated our approach across a suite of tasks designed to see how well our single system could learn to identify and generate things united by the same concept; our system can learn to classify and generate specific sets of spatial relationships, or can navigate entities through a scene in a specific way, or can develop good judgements for concepts like quantity (one, two, three, or more than three) or proximity.

Quantity Concept: demonstration attention is placed on one, two, three, or more than three entities. Inference is used to generate attention masks of similar quantity

Proximity concepts: demonstration events bring attention to the entity closest or furthest from the marker, or bring the marker closest or furthest to an entity of a particular color (left). Inference is used to generate attention masks for the closest or furthest entity (recognition), or to place the marker closest or furthest from an entity (generation) (right).
