Our Dota 2 result shows that self-play can catapult the performance of machine learning systems from far below human level to superhuman, given sufficient compute. In the span of a month, our system went from barely matching a high-ranked player to beating the top pros and has continued to improve since then. Supervised deep learning systems can only be as good as their training datasets, but in self-play systems, the available data improves automatically as the agent gets better.

The Next Input update

The Next Input

11 Aug 2017

Dota 2

Rewatch live event

3 Aug 2017

Gathering human feedback

View code

27 Jul 2017

Better exploration with parameter noise

Read code | Read paper

20 Jul 2017

Proximal Policy Optimization

View code | Read paper

17 Jul 2017

Robust adversarial inputs

We’ve created images that reliably fool neural network classifiers when viewed from varied scales and perspectives. This challenges a claim from last week that self-driving cars would be hard to trick maliciously since they capture images from multiple scales, angles, perspectives, and the like.
