Relevant if you build with AI tools, APIs, or coding agents.
MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering
Quick editorial signal
- Track this as an OpenAI update, not just a standalone headline.
- Useful for builders who need to understand API, coding, or workflow changes.
- Likely worth revisiting after people have used the release in practice.
October 10, 2024
Publication
MLE-bench
Evaluating Machine Learning Agents on Machine Learning Engineering
We introduce MLE-bench, a benchmark for measuring how well AI agents perform at machine learning engineering. To this end, we curate 75 ML engineering-related competitions from Kaggle, creating a diverse set of challenging tasks that test real-world ML engineering skills such as training models, preparing datasets, and running experiments. We establish human baselines for each competition using Kaggle's publicly available leaderboards. We use open-source agent scaffolds to evaluate several frontier language models on our benchmark, finding that the best-performing setup, OpenAI's o1-preview with AIDE scaffolding, achieves at least the level of a Kaggle bronze medal in 16.9% of competitions. In addition to our main results, we investigate various forms of resource-scaling for AI agents and the impact of contamination from pre-training. We open-source our benchmark code to facilitate future research in understanding the ML engineering capabilities of AI agents.
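The headline number works as follows: an agent's best submission to each competition is scored against that competition's archived Kaggle leaderboard, and the benchmark reports the fraction of competitions where the score clears the bronze-medal bar. Below is a minimal sketch of that grading logic, assuming a flat top-40% bronze rule for illustration; Kaggle's actual medal thresholds vary with leaderboard size, and these class and function names are hypothetical, not the benchmark's real API.

```python
from dataclasses import dataclass


@dataclass
class Competition:
    name: str
    leaderboard: list[float]       # archived historical scores, best first
    higher_is_better: bool = True  # some metrics (e.g. RMSE) are lower-is-better


def bronze_cutoff(comp: Competition) -> float:
    # Simplification: treat "bronze" as placing in the top 40% of the
    # historical leaderboard. Real Kaggle medal rules depend on field size.
    k = max(1, int(len(comp.leaderboard) * 0.40))
    return comp.leaderboard[k - 1]


def earns_bronze(comp: Competition, agent_score: float) -> bool:
    cutoff = bronze_cutoff(comp)
    if comp.higher_is_better:
        return agent_score >= cutoff
    return agent_score <= cutoff


def medal_rate(results: list[tuple[Competition, float]]) -> float:
    # Headline metric: fraction of competitions where the agent's best
    # submission scores at or above the bronze cutoff.
    return sum(earns_bronze(c, s) for c, s in results) / len(results)


if __name__ == "__main__":
    comp = Competition(
        name="toy-competition",
        leaderboard=sorted([0.85, 0.81, 0.80, 0.79, 0.77], reverse=True),
    )
    print(earns_bronze(comp, 0.82))    # True: beats the top-40% cutoff of 0.81
    print(medal_rate([(comp, 0.82)]))  # 1.0
```

Grading against a frozen historical leaderboard is what makes the human baseline cheap: no fresh human runs are needed, only the agent's score and the archived standings.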
Authors
Chan Jun Shern, Neil Chowdhury, Oliver Jaffe, James Aung, Dane Sherburn, Evan Mays, Giulio Starace, Kevin Liu, Leon Maksin, Tejal Patwardhan, Lilian Weng, Aleksander Madry
* o1
* Software & Engineering
* Learning Paradigms
* Reasoning & Policy