# Gender Analysis of the Pokémon Universe

Last week I came accross the funny dataset of the last Data Is Beautiful monthly challenge: Information on All 802 Pokemon. Having a quick look at the data, I discovered, to my biggest surprized, a `percentage_male` variable. I wasn’t aware that Pokémon have genders. So I decided to dig further into this gender dimension of the Pokémon universe.

I learnt that since the second generation, Pokémon could either be male or female. For example, when a little Pikachu get out of its egg, he has 50% percent of being male (and 50% of being female). Some Pokémons have more chances to be male, some to be female and some have no genders (as I always thought). Take Squirtle for example. According to the dataset, he has 88.1% of chances to be male. Squirtle is therefore a male Pokémon.

This surprise led me to ask a simple question: is there gender equality in the Pokémon universe?

## Catch ’Em All

Firstly, let’s get the data and classify the Pokémon according to their more probable gender.

``````library(tidyverse)

pokemon <- read_csv("pokemon.csv") #data from https://www.kaggle.com/rounakbanik/pokemon

pokemon_gender <- pokemon %>%
select(percentage_male, generation, name) %>%
mutate(gender = case_when(percentage_male == 0.0 ~ "Pokémon more likely to be FEMALE",
percentage_male == 11.2 ~ "Pokémon more likely to be FEMALE",
percentage_male == 24.6 ~ "Pokémon more likely to be FEMALE",
percentage_male == 50.0 ~ "Pokémon with equal likelihood of being FEMALE OR MALE",
percentage_male == 75.4 ~ "Pokémon more likely to be MALE",
percentage_male == 88.1 ~ "Pokémon more likely to be MALE",
percentage_male == 100.0 ~ "Pokémon more likely to be MALE"),
gender = replace_na(gender, "Pokémon with NO GENDER"), #NA is for genderless
generation = case_when(generation == 1 ~ "from Generation I",
generation == 2 ~ "from Generation II",
generation == 3 ~ "from Generation III",
generation == 4 ~ "from Generation IV",
generation == 5 ~ "from Generation V",
generation == 6 ~ "from Generation VI",
generation == 7 ~ "from Generation VII")) %>%
count(gender, generation, name) #mutate(n = 1) would also work

pokemon_gender
``````
``````## # A tibble: 801 x 4
##    gender                           generation        name           n
##    <chr>                            <chr>             <chr>      <int>
##  1 Pokémon more likely to be FEMALE from Generation I Chansey        1
##  2 Pokémon more likely to be FEMALE from Generation I Clefable       1
##  3 Pokémon more likely to be FEMALE from Generation I Clefairy       1
##  4 Pokémon more likely to be FEMALE from Generation I Jigglypuff     1
##  5 Pokémon more likely to be FEMALE from Generation I Jynx           1
##  6 Pokémon more likely to be FEMALE from Generation I Kangaskhan     1
##  7 Pokémon more likely to be FEMALE from Generation I Nidoqueen      1
##  8 Pokémon more likely to be FEMALE from Generation I Nidoran♀       1
##  9 Pokémon more likely to be FEMALE from Generation I Nidorina       1
## 10 Pokémon more likely to be FEMALE from Generation I Ninetales      1
## # ... with 791 more rows
``````

## Visualize ’Em all

A simple treemap allows us to visualize our hierarchical dataset.

``````library(treemap)

pokemon_tm <- treemap(pokemon_gender,
index = c("gender", "generation", "name"),
vSize = "n",
palette = "Pastel1",
title = "Pokémon Genders over the Generations")
``````

Looks like gender imbalance to me. The ```Pokémon more likely to be FEMALE``` cell is less than half the size of the ```Pokémon more likely to be MALE``` cell.

The `sunburstR` package, a htmlwidget to create d3.js sequence sunbursts, allows us to better explore the Pokémon gender repartition over the generations.

``````library(sunburstR)
library(d3r)
library(htmlwidgets)

pokemon_tm_nest <- d3_nest(
pokemon_tm\$tm[,c("gender", "generation", "name", "vSize", "color")],
value_cols = c("vSize", "color")
)

sb <- sunburst(
data = pokemon_tm_nest,
valueField = "vSize",
legend = list(w = 400),
legendOrder = c("Pokémon more likely to be FEMALE",
"Pokémon more likely to be MALE",
"Pokémon with equal likelihood of being FEMALE OR MALE",
"Pokémon with NO GENDER"),
count = TRUE,
sumNodes = FALSE,
colors = htmlwidgets::JS("function(d){return d3.select(this).datum().data.color;}"),
withD3 = TRUE)

sb <- htmlwidgets::onRender(sb,
#ref: https://github.com/timelyportfolio/sunburstR/issues/15
"function(el,x){
// have legend as default
d3.select(el).select('.sunburst-togglelegend').property('checked', true);
d3.select(el).select('.sunburst-legend').style('visibility', '');
}"
)
``````

Some modification have to be made manually inside the HTML output.

``````## code to copy past into the equivalent html output

// Fade all but the current sequence, and show it in the breadcrumb trail.
function mouseover(d) {

var percentage = (100 * d.value / totalSize).toPrecision(2); // precision 2 - lgnbhl mod
var percentageString = percentage + "%";
if (percentage < 0.13) { // conditionality added
percentageString = "";
}

var countString = [
'<span style = "font-size:.7em">',
d3Format.format("1.2s")(d.value) + ' Pokémon on 801', // on 801 Pokémon
'</span>'
].join('');
if (percentage < 0.13) { // conditionality added
countString = d.data.name;
}
``````

We get a simple but effective interactive visualization, which I published online using `flexdashboard`. Click on the image to open the interactive page.