Introduction / Background

For people unfamiliar with the ‘2vec’ kinds of models, the idea is to go from a one-hot encoded representation (each Pokemon corresponds to a 1 in an otherwise entirely 0 vector) to a lower-dimensional one that has some useful information ‘baked’ into it.
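To make the one-hot idea concrete, here is a minimal sketch. The four-Pokemon roster and the 2-dimensional embedding values are made up for illustration; a real embedding matrix would be learned from data.

```python
# Hypothetical toy roster; the real metagame has hundreds of Pokemon.
roster = ["Clefable", "Charizard", "Bisharp", "Spinda"]

def one_hot(name, roster):
    """Return a vector with a single 1 at the Pokemon's index and 0 elsewhere."""
    return [1 if p == name else 0 for p in roster]

# Made-up 2-dimensional embedding matrix: one row per Pokemon in the roster.
embedding = [
    [0.9, 0.1],  # Clefable
    [0.2, 0.8],  # Charizard
    [0.7, 0.3],  # Bisharp
    [0.0, 0.5],  # Spinda
]

def embed(name):
    """Multiply the one-hot vector by the embedding matrix, which selects one row."""
    v = one_hot(name, roster)
    dims = len(embedding[0])
    return [sum(v[i] * embedding[i][d] for i in range(len(roster))) for d in range(dims)]

print(one_hot("Bisharp", roster))  # 4 numbers, one per Pokemon in the roster
print(embed("Bisharp"))            # 2 numbers, the lower-dimensional representation
```

Multiplying a one-hot vector by a matrix just picks out a single row, which is why the embedding can be thought of as a lookup table.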

Word2Vec uses the idea that a word is defined by the context it appears in. Here is a helpful link that explains Word2Vec in case you’re interested. Expanding a bit, this means that words appearing in the same contexts convey similar meaning. Taken this way, it’s not super unreasonable to expand this idea to Pokemon, since Pokemon that fill similar ‘roles’ within a team would probably find themselves having similar teammates. I will be using data from battles hosted on Pokemon Showdown (PS), an online Pokemon battling simulator.

Model and Data Generation

This part is a bit technical, so if you don’t care about the gory details of “how”, please skip to the Results section.


The model I’ve described above (a word/Pokemon being ‘defined’ by the words appearing in its context, without consideration of position) is referred to as a Continuous Bag-of-Words approach. There are a lot of different ways to fit such a model. The way I’ve done it is to use a two-hidden-layer neural network to get two matrices of ‘encodings’ and ‘decodings’, and then average the resulting vectors element-wise to get the final embedding.
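As a rough illustration of the Continuous Bag-of-Words idea, here is a simplified sketch. It is not the actual network used here: the roster, the weights, and the function names `predict` and `final_embedding` are all hypothetical, and the weights are hard-coded rather than trained.

```python
import math

roster = ["Clefable", "Charizard", "Bisharp", "Spinda"]  # hypothetical toy roster
idx = {name: i for i, name in enumerate(roster)}

# Made-up weights (a trained model would learn these): one 2-d vector per Pokemon.
encodings = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.0, 0.5]]
decodings = [[0.5, 0.3], [0.4, 0.6], [0.1, 0.9], [0.2, 0.1]]

def predict(context):
    """CBOW forward pass: average the context encodings, score every Pokemon, softmax."""
    dims = len(encodings[0])
    # Averaging ignores the order of the teammates (the 'bag' in bag-of-words).
    hidden = [sum(encodings[idx[p]][d] for p in context) / len(context) for d in range(dims)]
    scores = [sum(h * w for h, w in zip(hidden, row)) for row in decodings]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def final_embedding(name):
    """Element-wise average of the Pokemon's encoding and decoding vectors."""
    i = idx[name]
    return [(e + d) / 2 for e, d in zip(encodings[i], decodings[i])]

probs = predict(["Clefable", "Bisharp"])  # one probability per Pokemon in the roster
vec = final_embedding("Clefable")         # the averaged 2-d embedding
```

Training would adjust `encodings` and `decodings` so that `predict` assigns high probability to the Pokemon that actually appears alongside the given teammates.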

Data Generation - Theory

In an ideal world, I would have access to the actual teams that were used on PS in a given month. In the real world, I do not have access to this data, so I have to estimate it.

The monthly usage statistics files give access to the marginal (e.g. \(P(Clefable \text{ on team})\)) and pairwise conditional (e.g. \(P(Charizard \text{ on team | } Clefable \text{ on team})\)) probabilities for all Pokemon in the metagame. The problem is that the actual probability of a team cannot be inferred from these, so (incorrect) simplifying assumptions had to be made. This next section goes into more technical detail about why this happens and what assumptions end up being made, so unless you really care about statistics, skip ahead and take for granted that you can get the probability of a certain team composition.
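To pin down what these two quantities mean, here is a small sketch that computes them from a made-up list of teams. The teams and the helper names `marginal` and `conditional` are hypothetical. In practice the raw teams are not available and only the resulting numbers appear in the usage statistics.

```python
# Made-up teams of three, purely for illustration (real teams have six Pokemon).
teams = [
    {"Clefable", "Charizard", "Bisharp"},
    {"Clefable", "Bisharp", "Spinda"},
    {"Charizard", "Spinda", "Bisharp"},
    {"Clefable", "Charizard", "Spinda"},
]

def marginal(a):
    """P(a on team): the share of all teams that contain a."""
    return sum(a in t for t in teams) / len(teams)

def conditional(a, b):
    """P(a on team | b on team): the share of teams containing b that also contain a."""
    with_b = [t for t in teams if b in t]
    return sum(a in t for t in with_b) / len(with_b)

p_clefable = marginal("Clefable")
p_charizard_given_clefable = conditional("Charizard", "Clefable")
```

With the raw teams in hand, the probability of a full team could simply be counted. Having only the marginals and pairwise conditionals is what forces the approximations discussed next.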

A joint probability, \(P(A, B, C)\), can (repeatedly) be broken down into a marginal and a conditional probability, as shown below:
\[ P(A, B, C) = P(A, B | C) * P(C) \]
Using Pokemon in place of the generic events, you get
\[ P(Spinda, Clefable, Bisharp) = P(Spinda, Clefable | Bisharp) * P(Bisharp) \]
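To spell out the ‘repeatedly’ part: applying the same rule once more to the conditional joint gives \(P(A, B | C) = P(A | B, C) * P(B | C)\), so the full breakdown is
\[ P(A, B, C) = P(A | B, C) * P(B | C) * P(C) \]
Note that the first factor conditions on two events at once, which is more than a pairwise conditional.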

From the usage statistics, we have the marginal probability for Bisharp. What we are missing is the conditional joint \(P(Spinda, Clefable | Bisharp)\); all we have are \(P(Spinda | Bisharp)\) and \(P(Clefable | Bisharp)\). A potential workaround is to simply use the product of the two conditionals as an approximation of the joint conditional which, from a statistical standpoint, means that I am assuming the two events are conditionally independent. Specifically, this assumption means that, gi