Why not everything, everywhere, all at once
The question
“But why only gaming?”
I get this question at least once a week. Sometimes from investors, sometimes from potential partners, sometimes from friends who genuinely don’t understand why we wouldn’t just expand into everything. Healthcare, fintech, e-commerce, customer support. The list keeps going.
And honestly, it’s a fair question. The technology we’ve built at Theymes for player care could probably be useful in a lot of other places. We know this. We’ve gotten inbound requests from companies that have nothing to do with gaming.
So why not just say yes to all of them?
A slot machine walks into a bar
There’s this classic problem in decision theory called the multi-armed bandit. The name comes from a row of slot machines. Each one pays out at a different rate, but you don’t know which ones are good and which ones are bad. You have a limited budget of pulls. Every pull you spend trying a new machine is a pull you’re not spending on a machine you already know works ok.
This is the explore-exploit tradeoff. And it’s one of the most useful mental models I’ve found for making decisions under uncertainty.
Explore: try something new, learn something, but risk wasting a turn. Exploit: go with what’s already working, but maybe miss something better.
The tension is real. Too much exploration and you never commit to anything. Too much exploitation and you might be stuck milking a mediocre option while a great one sits untouched.
Try it yourself
Below is a simulation.
- You have 5 slot machines and 25 pulls
- Each machine has a hidden payout rate: some are generous, some are terrible
- You don’t know which is which
- Your job is to maximize your total payout
After your 25 pulls, you can compare how you did against three well-known strategies that mathematicians and computer scientists have spent decades refining.
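If you'd rather poke at the setup in code than in the simulation itself, here's a minimal sketch in Python. The payout rates below are stand-ins I made up for illustration; the actual simulation hides its own.

```python
import random

# Five hidden payout rates (stand-ins; pretend you can't see this line)
payout_rates = [0.15, 0.30, 0.45, 0.60, 0.75]
random.shuffle(payout_rates)  # you don't know which machine is which

def pull(machine: int) -> int:
    """One lever pull: pays 1 with the machine's hidden probability, else 0."""
    return 1 if random.random() < payout_rates[machine] else 0

total = 0
for _ in range(25):               # your budget of 25 pulls
    choice = random.randrange(5)  # placeholder strategy: pick a machine at random
    total += pull(choice)

print(f"Total payout after 25 pulls: {total}")
```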
So, what happened?
If you’re like most people, you probably did one of two things: either you found one decent machine early and stuck with it, or you kept bouncing around trying to find the best one and ran out of pulls.
Both of these are natural instincts. And both of them are suboptimal.
The algorithms you played against (Epsilon-Greedy, UCB1, and Thompson Sampling) each handle the tension differently. Epsilon-Greedy is the simplest: exploit most of the time, but randomly explore with a small probability. UCB1 is more sophisticated: it accounts for uncertainty and gives bonus points to machines you haven't tried much. Thompson Sampling is Bayesian: it models what it believes about each machine and samples from those beliefs.
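If you'd rather read the mechanics than my one-line summaries, here's a compact sketch of the textbook versions of all three. It's not the code behind the simulation, and the payout rates are again made-up stand-ins, but each function is a faithful miniature of the idea:

```python
import math
import random

def epsilon_greedy(counts, rewards, epsilon=0.1):
    """Exploit the best-looking arm, but explore at random with probability epsilon."""
    if random.random() < epsilon or sum(counts) == 0:
        return random.randrange(len(counts))
    means = [r / c if c else 0.0 for r, c in zip(rewards, counts)]
    return means.index(max(means))

def ucb1(counts, rewards):
    """Pick the arm with the best mean plus an uncertainty bonus for rarely tried arms."""
    for i, c in enumerate(counts):
        if c == 0:
            return i  # try every arm at least once
    total = sum(counts)
    scores = [r / c + math.sqrt(2 * math.log(total) / c) for r, c in zip(rewards, counts)]
    return scores.index(max(scores))

def thompson(counts, rewards):
    """Sample a plausible payout rate for each arm from a Beta posterior, pick the best sample."""
    samples = [random.betavariate(1 + r, 1 + c - r) for r, c in zip(rewards, counts)]
    return samples.index(max(samples))

def run(strategy, payout_rates, pulls=25):
    counts = [0] * len(payout_rates)   # times each arm was pulled
    rewards = [0] * len(payout_rates)  # total payout from each arm
    total = 0
    for _ in range(pulls):
        arm = strategy(counts, rewards)
        reward = 1 if random.random() < payout_rates[arm] else 0
        counts[arm] += 1
        rewards[arm] += reward
        total += reward
    return total

rates = [0.15, 0.30, 0.45, 0.60, 0.75]  # stand-in hidden payout rates
for s in (epsilon_greedy, ucb1, thompson):
    print(s.__name__, run(s, rates))
```

The `math.sqrt(2 * math.log(total) / c)` term is the "bonus points" mentioned above: it shrinks as an arm gets pulled more, so certainty earns an arm the right to stop being explored.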
What’s interesting is that none of these strategies say “just try everything equally” or “just pick one and commit.” They all balance the two, adapting over time.
Back to the question
When someone asks “why only gaming?”, what they’re really suggesting is that we should explore more. Pull more levers. Cast a wider net.
This is where the bandit problem becomes a useful way to think about it. We’ve already done our exploration. Before Theymes focused on gaming, I spent over a decade consulting across industries. Fintech, healthcare, retail, telecom, you name it. That’s ten years of pulling different levers, seeing what pays out.
Gaming paid out. Not just in the business sense, but in how well the problem fits the technology, how deep the need is, and how much the industry is ready for this shift.
What we’re doing now is exploiting. We found a high-payout machine and we’re pulling it. Hard.
That doesn’t mean we’ll never explore again. The best algorithms increase their exploration when conditions change, or when they’ve been exploiting long enough that the uncertainty starts creeping back up. Markets shift, technology evolves, and what was the best option last year might not be the best option next year.
But right now? The math says pull the lever you know works.
The uncomfortable part
The explore-exploit tradeoff also explains something that people don’t like to hear: saying yes to everything is a strategy, and it’s usually a bad one.
Spreading yourself across five industries might feel like diversifying, but in practice it’s just uniform exploration with a limited budget. If you go back to the simulation above, the “try each machine equally” approach almost never wins.
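A quick back-of-the-envelope run makes the same point outside the simulation. With the same made-up payout rates as the sketches above, splitting the budget evenly loses, on average, to even a crude try-then-commit rule:

```python
import random
from statistics import mean

rates = [0.15, 0.30, 0.45, 0.60, 0.75]  # stand-in hidden payout rates
PULLS = 25

def pull(arm):
    # One lever pull: pays 1 with the machine's hidden probability, else 0.
    return 1 if random.random() < rates[arm] else 0

def uniform_budget():
    # "Try everything equally": 5 pulls on each of the 5 machines.
    return sum(pull(arm) for arm in range(5) for _ in range(PULLS // 5))

def explore_then_exploit():
    # Crude focus: try each machine twice, then bet the remaining 15 pulls
    # on whichever looked best (ties broken at random).
    trials = [sum(pull(arm) for _ in range(2)) for arm in range(5)]
    best = max(range(5), key=lambda a: (trials[a], random.random()))
    return sum(trials) + sum(pull(best) for _ in range(PULLS - 10))

runs = 10_000
print("uniform exploration: ", mean(uniform_budget() for _ in range(runs)))
print("explore then exploit:", mean(explore_then_exploit() for _ in range(runs)))
```

Uniform exploration earns the average of all five machines by construction; any rule that eventually concentrates its pulls earns something closer to the best one.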
Focus feels like you’re leaving money on the table. But unfocus guarantees you’ll never find the table.
Some reading
If you got curious about the math, here are a few starting points. The original multi-armed bandit problem was formalized by Herbert Robbins in 1952. The UCB1 algorithm was introduced by Auer, Cesa-Bianchi, and Fischer in 2002. Thompson Sampling actually dates back to 1933, making it one of the oldest ideas in the field, and ironically one that only recently got the recognition it deserved.
And if you want the real rabbit hole: look into the Gittins index. It’s the provably optimal solution for a specific version of this problem, and it basically says you should assign an “index” to each option based on how promising it is, uncertainty included, and always pick the one with the highest index.
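If you want it in symbols: the discounted version of the index for an option is, roughly, the best ratio of expected discounted reward to expected discounted time you can achieve by playing that option and stopping cleverly (this is the standard textbook formulation, not something specific to the references above):

$$\nu_i \;=\; \sup_{\tau \ge 1} \frac{\mathbb{E}\left[\sum_{t=0}^{\tau-1} \beta^{t} R_i(t)\right]}{\mathbb{E}\left[\sum_{t=0}^{\tau-1} \beta^{t}\right]}$$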
Sounds a lot like good intuition, formalized.