LLM Colosseum

Make LLMs fight each other in real time in Street Fighter III. Which LLM will be the best fighter?

This page is about the story behind this project, from my perspective.

The hackathon story

Genesis

In March 2024, I took part in the Mistral AI hackathon in San Francisco with Pierre-Louis Biojout, Paul-Louis Venard, and Stan Girard.

All of us were part of the Y Combinator program and were super tired. So, to wind down, we decided to have some fun at the Mistral hackathon and represent France the best we could.

We pitched several ideas and finally settled on LLMs playing video games, because it was visual and fun. First, we wanted to have an LLM play Trackmania, but we didn’t have Windows computers. Pierre-Louis found out about Diambra, a module to make RL agents play arcade fighting games. I set it up with Street Fighter, which was the only supported game I knew about.

The first prompt system was written by Paul-Louis. I implemented the combo system, the multiprocessing, the image processing, and refactored everything 10 times.

Vision

A popular benchmark for LLMs at the time was Chatbot Arena, a website where you ask questions to two LLMs at the same time and they compete for the best answer. To poke some fun at them, we named our project LLM Colosseum and decided to copy them as much as possible.

Stan Girard was the one who came up with the vision that “LLM Colosseum is the new benchmark for LLMs” and wrote the GitHub README and tweets. Pierre-Louis Biojout wrote the notebook to compute Elo scores on compiled results.
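For readers curious about how Elo scores can be derived from a list of fight results, here is a minimal sketch. It is not the code from Pierre-Louis's notebook: it uses the standard Elo update formula, and the starting rating of 1500, the K-factor of 32, and the function and model names are all assumptions made for illustration.

```python
from collections import defaultdict


def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def compute_elo(matches, start=1500.0, k=32.0):
    """Replay (winner, loser) pairs in order and return the final ratings.

    `start` and `k` are illustrative defaults, not values from the project.
    """
    ratings = defaultdict(lambda: start)
    for winner, loser in matches:
        # The winner gains more when the win was less expected.
        delta = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
        ratings[winner] += delta
        ratings[loser] -= delta
    return dict(ratings)


if __name__ == "__main__":
    # Hypothetical results: each tuple is (winner, loser).
    fights = [("model_a", "model_b"), ("model_a", "model_c"), ("model_c", "model_b")]
    for name, rating in sorted(compute_elo(fights).items(), key=lambda kv: -kv[1]):
        print(f"{name}: {rating:.1f}")
```

Because every update adds to the winner exactly what it takes from the loser, the total rating across all models stays constant, which makes the final ranking easy to sanity-check.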

Celsius

There were free Celsius energy drinks at the event. I think I drank about a dozen over two days. Since then, they have held a special place in my heart.

Results

We didn’t win anything at the hackathon.

Instead, a much more impressive fine-tuned Mistral and a similar project, Mistral playing Pac-Man, were featured.

Or you know, maybe, just maybe, it was because the GPT-3.5 model by OpenAI was better than the Mistral-7b model on our benchmark…

No… Come on, that’s just a silly coincidence ;)

Win rate matrix

Aftermatch

After the hackathon, this project got unexpected traction on social media.

Turns out people loved Street Fighter and loved LLMs fighting!

Here is a compilation of some of the cool stuff people shared about us.

YouTube

The project was highlighted by Matthew Berman (370K followers) and the channel “All about AI” (180K followers). Thank you both!

Diambra docs

Alessandro Palmas, the creator of Diambra, was super supportive and we got featured in the docs. I also made a small PR to the codebase. Thank you Alessandro!

The great montage I made of Arthur Mensch (Mistral founder) is now immortalized.

Press

While boarding a plane to Las Vegas, I replied to questions from journalist Matthew Connatser on LinkedIn. He actually wrote an article for the UK magazine The Register.

Fun fact: This article got copied and pasted on dozens of websites, and was even translated into Chinese.

Nick Evanson also published a story in PC Gamer about our project.

Google Scholar

We were cited in a preprint of a paper about particle accelerators by Jan Kaiser et al. from the Hamburg University of Technology, as a demonstration of “LLMs used in Reinforcement Learning”.

AWS

Banjo Obayomi from Amazon Web Services cloned the repo and made it run on all the models from AWS Bedrock, the AWS solution for AI. He wrote a blog post about that and got funny results, like Claude 2.1 refusing to fight.

“I apologize, upon reflection I do not feel comfortable recommending violent actions or strategies, even in a fictional context.” - Claude 2.1

Groq

The Groq Twitter account released a video of an OpenAI model fighting a Gemma 7b on Groq in Street Fighter, to showcase the speed of Groq chips.

I later worked with Sarit Schube and Mark Heaps from Groq to release the web version of LLM Colosseum. Great people, thank you for your support!

Personal comment

I’m still amazed by how much people vibed with this project.

Some saw an entertaining story, some an opportunity for business, some an inspiration for science…

It’s wild that this concept of “LLMs fighting in Street Fighter” caught the attention of so many people. Even though, you know, they fight pretty badly lol. I’m thrilled to see programmers from everywhere take the code I wrote and do their own thing!

This is the magic of open source.

I still maintain the repo, so feel free to contribute. If you want to be featured on this page, feel free to reach out as well!