Is There Gender Equality in the Pokémon Universe?

With more than 13 million subscribers, Reddit’s Data Is Beautiful is one of the main online forums on data visualization. Last week I came across the fun dataset of the current DataViz monthly challenge: Information on All 802 Pokemon. Having a quick look at the data, I was surprised to discover a percentage_male variable. I wasn’t aware that Pokémon have genders. So I decided to dig further into this gender dimension of the Pokémon universe.

I learnt that, since the second generation, a Pokémon can be either male or female. For example, when a little Pikachu hatches from its egg, it has a 50% chance of being male (and a 50% chance of being female). Some Pokémon are more likely to be male, some are more likely to be female, and some have no gender (as I had always thought). Take Squirtle, for example. According to the dataset, it has an 88.1% chance of being male. I therefore count Squirtle among the Pokémon more likely to be male.

This surprise led me to ask a simple question: is there gender equality in the Pokémon universe?

Catch ’Em All

First, let’s get the data and classify each Pokémon according to its most probable gender.

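library(tidyverse) # read_csv(), dplyr verbs, replace_na()
library(DT)        # datatable()
library(treemap)   # treemap()
library(d3r)       # d3_nest()
library(sunburstR) # sunburst()

pokemon <- read_csv("input/pokemon.csv") #data from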

pokemon_gender <- pokemon %>%
  select(percentage_male, generation, name) %>%
  mutate(gender = case_when(percentage_male == 0.0 ~ "Pokémon more likely to be FEMALE",
                            percentage_male == 11.2 ~ "Pokémon more likely to be FEMALE",
                            percentage_male == 24.6 ~ "Pokémon more likely to be FEMALE",
                            percentage_male == 50.0 ~ "Pokémon with equal likelihood of being FEMALE OR MALE",
                            percentage_male == 75.4 ~ "Pokémon more likely to be MALE",
                            percentage_male == 88.1 ~ "Pokémon more likely to be MALE",
                            percentage_male == 100.0 ~ "Pokémon more likely to be MALE"),
         gender = replace_na(gender, "Pokémon with NO GENDER"), #NA is for genderless
         generation = case_when(generation == 1 ~ "from Generation I",
                                generation == 2 ~ "from Generation II",
                                generation == 3 ~ "from Generation III",
                                generation == 4 ~ "from Generation IV",
                                generation == 5 ~ "from Generation V",
                                generation == 6 ~ "from Generation VI",
                                generation == 7 ~ "from Generation VII")) %>%
  count(gender, generation, name) #mutate(n = 1) would also work

datatable(select(pokemon_gender, name, generation, gender), rownames = FALSE, 
          options = list(pageLength = 5, dom = 'ftpi'))
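
The exact-match classification above works because percentage_male only takes seven distinct values in this dataset. As a hedged alternative, here is a minimal sketch that classifies by threshold instead, so it would not silently drop an unexpected value. The toy table is hypothetical: only the Pikachu (50) and Squirtle (88.1) figures come from the text, and the other two rows are made-up placeholders.

```r
library(dplyr)

# Toy stand-in for the real dataset (same column names as pokemon.csv).
# "example_low" and "example_none" are hypothetical placeholder rows.
toy <- data.frame(
  name = c("example_low", "Pikachu", "Squirtle", "example_none"),
  percentage_male = c(24.6, 50, 88.1, NA),
  stringsAsFactors = FALSE
)

toy_gender <- toy %>%
  mutate(gender = case_when(
    is.na(percentage_male) ~ "Pokémon with NO GENDER", # NA is for genderless
    percentage_male < 50   ~ "Pokémon more likely to be FEMALE",
    percentage_male == 50  ~ "Pokémon with equal likelihood of being FEMALE OR MALE",
    percentage_male > 50   ~ "Pokémon more likely to be MALE"
  ))

print(toy_gender)
```

The is.na() test comes first because comparisons with NA return NA, which case_when() treats as "no match".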

Visualize ’Em All

A simple treemap allows us to visualize our hierarchical dataset.

pokemon_tm <- treemap(pokemon_gender,
                      index = c("gender", "generation", "name"),
                      vSize = "n",
                      palette = "Pastel1",
                      title = "Pokémon Genders over the Generations")

Looks like gender imbalance to me. The Pokémon more likely to be FEMALE cell is less than half the size of the Pokémon more likely to be MALE cell.
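
To put numbers on that impression rather than eyeballing cell sizes, the counts per class can be tallied directly. This is only a sketch on a made-up six-row table (the rows and the shortened labels are hypothetical, not the real figures). On the real data the same pipeline could be applied to pokemon_gender, which already carries an n column.

```r
library(dplyr)

# Hypothetical stand-in for pokemon_gender: one row per Pokémon, n = 1.
toy_gender <- data.frame(
  gender = c("FEMALE", "MALE", "MALE", "MALE", "EQUAL", "NONE"),
  n = 1,
  stringsAsFactors = FALSE
)

gender_share <- toy_gender %>%
  count(gender, wt = n, name = "total") %>%               # Pokémon per class
  mutate(share = round(100 * total / sum(total), 1)) %>%  # percentage of all rows
  arrange(desc(total))

print(gender_share)
```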

The sunburstR package, an htmlwidget for creating d3.js sequence sunbursts, allows us to better explore the Pokémon gender distribution over the generations.

We get a simple but effective interactive visualization, which I published online using flexdashboard. Note that some modifications have to be made manually inside the HTML output.

Click on the image to open the interactive page.

pokemon_tm_nest <- d3_nest(
  pokemon_tm$tm[,c("gender", "generation", "name", "vSize", "color")],
  value_cols = c("vSize", "color"))

sb <- sunburst(
  data = pokemon_tm_nest,
  valueField = "vSize",
  legend = list(w = 400),
  legendOrder = c("Pokémon more likely to be FEMALE", 
                  "Pokémon more likely to be MALE", 
                  "Pokémon with equal likelihood of being FEMALE OR MALE",
                  "Pokémon with NO GENDER"),
  count = TRUE,
  sumNodes = FALSE,
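  # assumed reconstruction: reuse the treemap colour stored on each node
  colors = htmlwidgets::JS("function(d){return d3.select(this).datum().data.color;}"),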
  withD3 = TRUE)

sb <- htmlwidgets::onRender(sb, "
  function(el, x) {
    // have legend as default
    d3.select(el).select('.sunburst-togglelegend').property('checked', true);
    d3.select(el).select('.sunburst-legend').style('visibility', '');
  }
")
## code to copy and paste into the equivalent HTML output

// Fade all but the current sequence, and show it in the breadcrumb trail.
  function mouseover(d) {

    var percentage = (100 * d.value / totalSize).toPrecision(2); // precision 2 - lgnbhl mod
    var percentageString = percentage + "%";
    if (percentage < 0.13) { // conditionality added
      percentageString = "";
    }

    var countString = [
        '<span style = "font-size:.7em">',
        d3Format.format("1.2s")(d.value) + ' Pokémon on 801', // on 801 Pokémon
        '</span>'
      ].join('');
    if (percentage < 0.13) { // conditionality added
      countString = "";
    }
    // the rest of mouseover() is left as generated in the HTML output

Thanks for reading!

  • For updates of recent blog posts, follow me on Twitter.
  • To reproduce my data analysis, go to my GitHub page.
  • Curious about what I can do for your organisation? Have a look at my Project page.
