Have you ever played Metazooa?
On the surface it looks simple. The game chooses an animal, and you try to guess it. But instead of asking yes/no questions, you are constrained to taxonomy. Every guess is a species, and every answer is the lowest common ancestor (LCA) between your guess and the secret animal.
If the game is thinking of a lion and you guess a whale, the response is "Mammalia". Helpful, but not that helpful.
After playing a few rounds, a natural question comes up:
Is there an objectively best animal to start with?
Not a good guess. Not an intuitive guess. The optimal one.
It turns out this question drops us straight into information theory. In particular, a concept Claude Shannon introduced back in 1948 gives a surprisingly clean answer.
---
Framing the Problem
At any point in Metazooa, the game state is a tree of remaining possibilities. Each leaf is a species, and each internal node is a taxonomic group.
When the AI makes a guess, the LCA you return partitions the remaining animals into groups. Some guesses barely narrow things down. Others cut the search space dramatically.
So the problem is not "what animal seems reasonable?" but rather:
Which guess reduces uncertainty the most, on average?
That is exactly what information theory is good at answering.
---
Entropy and Uncertainty
Shannon entropy measures uncertainty. If there are N equally likely animals remaining, the entropy is:
$$ H = \log_2(N) $$

You can think of this as the minimum number of yes/no questions needed to identify the correct animal. In the Metazooa dataset I analyzed, there are 328 species, which gives:

$$ H_{\text{initial}} \approx 8.36 \text{ bits} $$
This is a hard lower bound. No strategy can do better than this on average.
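As a quick check, this number is a one-liner in Python:

```python
# Entropy of a uniform distribution over N possibilities, in bits.
import math

N = 328  # species in the analyzed dataset
print(math.log2(N))  # ≈ 8.358
```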
Each guess reduces uncertainty. The amount by which it reduces it is called information gain:

$$ \text{Information Gain} = H_{\text{before}} - H_{\text{after}} $$
So the best starting guess is simply the one that maximizes expected information gain over all possible LCA responses.
---
Computing Expected Information Gain
For a given guess, you can simulate all possible LCAs it might produce. Each possible LCA corresponds to a group of animals that would remain consistent with that feedback.
The expected entropy after making a guess is:
$$ H_{\text{after}} = \sum_i P(i) \cdot \log_2(|G_i|) $$

where:

- $G_i$ is the set of animals consistent with feedback $i$
- $P(i)$ is the probability of receiving that feedback
Once this is computed, information gain is just the difference from the initial entropy.
Repeat this for every possible starting guess, and the optimal choice falls straight out of the math.
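The repository linked at the end contains the full implementation; as a rough illustration, here is a minimal, self-contained Python sketch of the same idea, using a made-up five-species taxonomy (the species list, lineages, and helper names are all hypothetical):

```python
# A minimal sketch of expected information gain over a toy taxonomy.
import math

# Each species maps to its lineage, from the root down (toy data).
LINEAGE = {
    "lion":  ["Animalia", "Mammalia", "Carnivora", "Felidae"],
    "tiger": ["Animalia", "Mammalia", "Carnivora", "Felidae"],
    "wolf":  ["Animalia", "Mammalia", "Carnivora", "Canidae"],
    "whale": ["Animalia", "Mammalia", "Cetacea"],
    "eagle": ["Animalia", "Aves"],
}

def lca(a, b):
    """Lowest common ancestor: the deepest node the two lineages share."""
    shared = None
    for x, y in zip(LINEAGE[a], LINEAGE[b]):
        if x != y:
            break
        shared = x
    return shared

def expected_info_gain(guess, candidates):
    """Expected information gain of a guess, assuming a uniform prior."""
    groups = {}
    for animal in candidates:
        # Guessing the secret animal ends the game, so it is its own group.
        key = animal if animal == guess else lca(guess, animal)
        groups.setdefault(key, []).append(animal)
    n = len(candidates)
    h_after = sum(len(g) / n * math.log2(len(g)) for g in groups.values())
    return math.log2(n) - h_after

candidates = list(LINEAGE)
for guess in candidates:
    print(f"{guess:>6}: {expected_info_gain(guess, candidates):.3f} bits")
```

Note that singleton groups contribute zero to the remaining entropy, since $\log_2(1) = 0$.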
---
Strategies Compared
Using this framework, I evaluated three different strategies:
1. Minmax: optimizes for the worst case. You will never do too badly, no matter which animal was chosen.
2. Entropy: optimizes the average case by maximizing expected information gain.
3. Hybrid: a tunable mix of the two approaches.
Minmax and entropy are philosophically very different. Minmax plans for the most adversarial scenario. Entropy assumes all animals are equally likely and plays the averages.
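To make the contrast concrete, here is a small Python sketch of the two scoring rules applied to hypothetical partitions (the guess names and group sizes are invented for illustration):

```python
# Toy comparison of the two scoring rules on hypothetical partitions.
import math

# Group sizes each candidate guess would split 8 remaining animals into.
partitions = {
    "guess_a": [4, 2, 2],
    "guess_b": [6, 1, 1],
}

def minmax_score(sizes):
    """Worst case: size of the largest surviving group (lower is better)."""
    return max(sizes)

def entropy_score(sizes):
    """Expected information gain in bits, uniform prior (higher is better)."""
    n = sum(sizes)
    h_after = sum(s / n * math.log2(s) for s in sizes)
    return math.log2(n) - h_after

for guess, sizes in partitions.items():
    print(guess, minmax_score(sizes), round(entropy_score(sizes), 3))
```

Minmax looks only at the largest surviving group, while entropy weighs every group by its probability.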
So you might expect them to disagree.
They mostly do not.
---
Results
Minmax (Empirical)
When optimizing purely for worst case performance, the best starting guesses cluster tightly together:
- Bison: 4.78 average guesses
- Water buffalo: 4.79
- Dog: 4.79
- Yak: 4.79
- Skunk, otters, foxes: all within a few hundredths
What is striking here is not which animal wins, but how flat the curve is. Once you are guessing a mid-sized, well-connected mammal, performance differences become very small.
Entropy (Theoretical)
The entropy based analysis produces a very similar ranking.
The top information gain guesses are:
- Mink / Ferret: ~1.193 bits
- Weasel / Otters: ~1.19 bits
- Wolverine / Badger / Raccoon: close behind
These values predict an expected game length of about 7.0 guesses against the initial uncertainty of 8.36 bits.
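Under the simplifying assumption that each guess keeps gaining roughly the same number of bits, the arithmetic is:

$$ \frac{8.36 \text{ bits}}{1.19 \text{ bits per guess}} \approx 7.0 \text{ guesses} $$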
The overlap with the minmax results is not accidental. Animals that perform well in the worst case also tend to sit near the top in expected information gain.
(If you are curious, the full rankings and raw numbers are available in the source code linked below.)
---
Why This Works
Animals near the structural center of the taxonomy tend to split the tree cleanly. They are not too general and not too specialized.
In other words:
Good Metazooa guesses are about balance, not intuition.
Information theory does not just explain the game. It predicts how it behaves in practice.
---
Try It Yourself
To make this easier to explore, I built an interactive tool. With it, you can:
- Choose a strategy (Minmax, Entropy, or Hybrid)
- Restrict the problem to a clade like "Mammalia" or "Carnivora"
- See which species is theoretically optimal as a starting guess

You can also play the original game at metazooa.com.
---
Takeaways
- Information theory is practical, not abstract
- Entropy is about asking good questions, not just counting possibilities
- Different objectives lead to different strategies
- When theory and experiments line up, you have found something fundamental
The next time you play Metazooa, try opening with a mink, a ferret, or even a bison.
It might not feel clever, but mathematically, it is about as efficient as possible.
---
Want to dive deeper? The full analysis and source code, including the Python scripts used to compute information gain and compare strategies, are available here: github.com/alexjercan/metazooa