Introducing my new R package {BFS}

BFS ggplot2 plotly

How to easily access national Swiss data, explore it and create a map of Switzerland.

2019-11-07

In this article, I will show how to use my BFS package to easily search and download datasets from the Swiss Federal Statistical Office. We will then quickly explore a dataset and plot a map of Swiss municipalities, the lowest level of administrative division in Switzerland.

As always, the code is fully reproducible: you can get it from my GitHub account or from my online RStudio Cloud session.

Getting the data

To use my BFS package, we begin by downloading the metadata of all available datasets in the Swiss Federal Statistical Office catalogue. The BFS metadata is available in German (“de”), French (“fr”), Italian (“it”) and English (“en”) [1].

# install/load needed R packages
library(tidyverse)
library(scales)
library(colorspace)
library(plotly)
library(RSwissMaps)
library(BFS)

# setting light theme
theme_set(theme_light())

# Get BFS metadata in German
meta_de <- bfs_get_catalog_data(language = "de")

meta_de
# A tibble: 350 x 5
   title         language published        url_bfs        url_px      
   <chr>         <chr>    <chr>            <chr>          <chr>       
 1 Gebäude nach~ de       Gebäude nach in~ https://www.b~ https://www~
 2 Gebäude nach~ de       Gebäude nach Ka~ https://www.b~ https://www~
 3 Privathausha~ de       Privathaushalte~ https://www.b~ https://www~
 4 Ständige Woh~ de       Ständige Wohnbe~ https://www.b~ https://www~
 5 Wohnungen na~ de       Wohnungen nach ~ https://www.b~ https://www~
 6 Wohnungen na~ de       Wohnungen nach ~ https://www.b~ https://www~
 7 Wohnungen na~ de       Wohnungen nach ~ https://www.b~ https://www~
 8 Betrag, Bezü~ de       Betrag, Bezüger~ https://www.b~ https://www~
 9 Stipendien: ~ de       Stipendien: Bez~ https://www.b~ https://www~
10 Hotellerie: ~ de       Hotellerie: Ang~ https://www.b~ https://www~
# ... with 340 more rows

The German catalogue returned above lists 350 BFS datasets.

The challenge today is to plot a detailed map of Switzerland containing all 2’212 Swiss municipalities, or “Gemeinden” in German. Let’s find out whether any dataset title contains the word “gemeinde”.

meta_de_gemeinde <- meta_de %>%
  filter(str_detect(title, "gemeinde"))

meta_de_gemeinde
# A tibble: 1 x 5
  title         language published         url_bfs        url_px      
  <chr>         <chr>    <chr>             <chr>          <chr>       
1 Ausländische~ de       Ausländische Gre~ https://www.b~ https://www~
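
One caveat about the filter above: str_detect() is case-sensitive by default, so the lowercase pattern “gemeinde” only matches titles where the word sits inside a compound such as “Arbeitsgemeinde”. A title containing the capitalized noun “Gemeinde” would be missed. Here is a minimal sketch of a case-insensitive search, using made-up titles rather than the real catalogue:

```r
library(stringr)

# Hypothetical titles for illustration only (not taken from the BFS catalogue)
titles <- c(
  "Pendler nach Arbeitsgemeinde",
  "Wohnungen nach Gemeinde",
  "Hotellerie nach Kanton"
)

# Case-sensitive (default): misses the capitalized "Gemeinde"
strict <- str_detect(titles, "gemeinde")
strict

# Case-insensitive: matches both spellings
relaxed <- str_detect(titles, regex("gemeinde", ignore_case = TRUE))
relaxed
```

The same regex(..., ignore_case = TRUE) wrapper can be dropped into the filter() call on meta_de if a broader search is wanted.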

Only one dataset title matches. It covers cross-border workers (“Ausländische Grenzgänger/innen” in German), which is the topic I am interested in. Let’s download it using the bfs_get_data() function.

meta_de_gemeinde$title[1] # print title
[1] "Ausländische Grenzgänger/innen nach Arbeitsgemeinde und Geschlecht"
# browseURL(meta_de_gemeinde$url_bfs[1]) # open related webpage
# for reproducibility, use BFS number "px-x-0302010000_101"
data_bfs <- bfs_get_data(number_bfs = "px-x-0302010000_101", language = "de")
data_bfs
# A tibble: 670,140 x 4
   Arbeitsgemeinde Geschlecht         Quartal `Ausländische Grenzgäng~
   <chr>           <chr>              <chr>                      <dbl>
 1 Schweiz         Geschlecht - Total 1996Q1                   146704 
 2 Schweiz         Geschlecht - Total 1996Q2                   145023.
 3 Schweiz         Geschlecht - Total 1996Q3                   143656.
 4 Schweiz         Geschlecht - Total 1996Q4                   141277.
 5 Schweiz         Geschlecht - Total 1997Q1                   140218.
 6 Schweiz         Geschlecht - Total 1997Q2                   137131.
 7 Schweiz         Geschlecht - Total 1997Q3                   137321.
 8 Schweiz         Geschlecht - Total 1997Q4                   135938.
 9 Schweiz         Geschlecht - Total 1998Q1                   135810.
10 Schweiz         Geschlecht - Total 1998Q2                   135849.
# ... with 670,130 more rows

Note that the development version of BFS leverages the new pins package to save all downloaded Swiss datasets in the same cache folder, accessible using the bfs_open_dir() function.

That’s all for my new BFS package.

Exploring the data

Using the Tidyverse workflow, we can now perform a quick exploratory data analysis.

Let’s begin with a glimpse at the data.

glimpse(data_bfs)
Rows: 670,140
Columns: 4
$ Arbeitsgemeinde                  <chr> "Schweiz", "Schweiz", "Schw~
$ Geschlecht                       <chr> "Geschlecht - Total", "Gesc~
$ Quartal                          <chr> "1996Q1", "1996Q2", "1996Q3~
$ `Ausländische Grenzgänger/innen` <dbl> 146704.0, 145022.6, 143656.~

The dataset contains the number of cross-border workers by quarter (Quartal), Swiss municipality of work (Arbeitsgemeinde) and gender (Geschlecht).

Note that the value is weighted: each worker counts for a value between 0 and 1 according to the number of hours worked (see more here). It is therefore more appropriate to speak of “cross-border work”: two men each working half time in a Swiss municipality count as much as one woman working full time as a cross-border worker.
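
To make the weighting concrete, here is a small sketch with invented weights (the actual weighting scheme is defined by the BFS; the numbers below are only for illustration). Two half-time workers add up to the same value as one full-time worker:

```r
# Hypothetical employment weights between 0 and 1 (1 = full time)
men_weights   <- c(0.5, 0.5)  # two men working half time
women_weights <- c(1)         # one woman working full time

men_total   <- sum(men_weights)
women_total <- sum(women_weights)

# Share of men in the total cross-border work of this toy municipality, in percent
share_men <- men_total / (men_total + women_total) * 100
share_men
```

In this toy municipality, men and women each account for half of the cross-border work, even though there are twice as many men.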

I am curious to learn more about the gender ratio of cross-border work by municipality and its evolution over the years. Let’s build a new gender_ratio variable: the share of men in total cross-border work, in percent.

data_bfs_ratio <- data_bfs %>%
  tidyr::pivot_wider(names_from = "Geschlecht", values_from = `Ausländische Grenzgänger/innen`) %>%
  rename(quarter = Quartal, 
         municipality = Arbeitsgemeinde,
         man = Mann, 
         woman = Frau, 
         gender_total = `Geschlecht - Total`) %>%
  mutate(municipality = str_remove_all(municipality, "\\.|^\\- ")) %>% # cleaning
  mutate(gender_ratio = man / gender_total * 100) %>%
  arrange(desc(quarter))

data_bfs_ratio
# A tibble: 223,380 x 6
   municipality       quarter gender_total      man    woman gender_ratio
   <chr>              <chr>          <dbl>    <dbl>    <dbl>        <dbl>
 1 Schweiz            2021Q2      347748.  224921.  122827.          64.7
 2 Zürich             2021Q2       10548.    7720.    2828.          73.2
 3 Aeugst am Albis    2021Q2           0        0        0          NaN  
 4 Affoltern am Albis 2021Q2          11.9      8.8      3.1         73.9
 5 Bonstetten         2021Q2           1.7      0.9      0.9         52.9
 6 Hausen am Albis    2021Q2           0.5      0.5      0          100  
 7 Hedingen           2021Q2          39.8     37.8      2           95.0
 8 Kappel am Albis    2021Q2           0        0        0          NaN  
 9 Knonau             2021Q2           0        0        0          NaN  
10 Maschwanden        2021Q2           0        0        0          NaN  
# ... with 223,370 more rows

We see that men account for 64.7% of cross-border work in Switzerland in the 2nd quarter of 2021 (but 73.2% in Zürich).
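
Two details of the code above can be sketched on toy data. The raw labels below are assumptions chosen for illustration (the real BFS labels may look different). The regular expression drops every dot and a leading “- ”, which is why “St. Gallen” becomes “St Gallen”. And a municipality without any cross-border work gives 0 / 0, which R returns as NaN and which mean(..., na.rm = TRUE) skips.

```r
library(stringr)

# Hypothetical raw labels; the real BFS labels may look different
raw_labels <- c("- Bern", "......Knonau", "- St. Gallen")
clean_labels <- str_remove_all(raw_labels, "\\.|^\\- ")
clean_labels

# A municipality without cross-border work gives 0 / 0, i.e. NaN
man <- c(0, 8.8, 37.8)
gender_total <- c(0, 11.9, 39.8)
gender_ratio <- man / gender_total * 100
gender_ratio

# na.rm = TRUE drops NaN as well as NA before averaging
mean_ratio <- mean(gender_ratio, na.rm = TRUE)
mean_ratio
```

If every value in a group is NaN, the mean is still NaN, so such municipalities simply stay empty in later steps.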

Does it mean we have strong cantonal gender disparities in terms of cross-border work?

# Create table to join later to bfs_data
# ref: https://en.wikipedia.org/wiki/Data_codes_for_Switzerland#Cantons
cantons <- tibble::tribble(
  ~canton, ~code, ~id_can,
  "Aargau", "AG", 19,
  "Appenzell Innerrhoden", "AI", 15,
  "Appenzell Ausserrhoden", "AR", 16,
  "Bern", "BE", 2,
  "Basel-Landschaft", "BL", 13,
  "Basel-Stadt", "BS", 12,
  "Fribourg", "FR", 10,
  "Genève", "GE", 25,
  "Glarus", "GL", 8,
  "Graubünden", "GR", 18,
  "Jura", "JU", 26,
  "Luzern", "LU", 3,
  "Neuchâtel", "NE", 24,
  "Nidwalden", "NW", 7,
  "Obwalden", "OW", 6,
  "St Gallen", "SG", 17,
  "Schaffhausen", "SH", 14,
  "Solothurn", "SO", 11,
  "Schwyz", "SZ", 5,
  "Thurgau", "TG", 20,
  "Ticino", "TI", 21,
  "Uri", "UR", 4,
  "Vaud", "VD", 22,
  "Valais", "VS", 23,
  "Zug", "ZG", 9,
  "Zürich", "ZH", 1,
)

data_bfs_ratio_annualized <- data_bfs_ratio %>%
  mutate(year = str_extract(quarter, "^.{4}"),
         year = as.numeric(year)) %>%
  group_by(municipality, year) %>%
  summarise(gender_ratio_annualized = mean(gender_ratio, na.rm = TRUE)) %>%
  ungroup() %>%
  mutate(gender_per = gender_ratio_annualized/100)

data_bfs_ratio_annualized %>%
  inner_join(cantons, by = c("municipality" = "canton")) %>% # join table
  ggplot(aes(x = year, y = gender_per, color = code)) +
  geom_line() +
  geom_line(data = filter(data_bfs_ratio_annualized, municipality == "Schweiz"),
             color = "red2", linetype = "dashed", size = 1) +
  scale_y_continuous(labels = percent) +
  annotate("text", x = 2016, y = 0.54, size = 3.5, label = "National gender ratio") +
  geom_segment(aes(x = 2016, y = 0.551, xend = 2016, yend = 0.635), 
               color = "black", size = 0.2, 
               arrow = arrow(length = unit(0.2, "cm"))) +
  labs(title = "Proportion of Men in Cross-Border Workforce",
       subtitle = "Switzerland, 1995-2019",
       color = "Canton",
       x = "", y = "",
       caption = "quarter annualized - Data source: BFS")

It looks like the gender ratio differs from one Swiss canton to another. However, the plot is hard to read, and it is difficult to sort the cantons into groups.

Let’s run a time-series clustering to get the categories. I will reuse some code from Bruno Rodrigues’s excellent blog post to perform a time-series k-means clustering.

set.seed(1111)

# Only since 2007 as missing values before for some cantons
data_bfs_wide <- data_bfs_ratio_annualized %>%
  inner_join(cantons, by = c("municipality" = "canton")) %>% # join table
  filter(year > 2006) %>%
  select(municipality, year, gender_ratio_annualized) %>%
  pivot_wider(names_from = year, values_from = gender_ratio_annualized)

wss <- map_dbl(1:6, ~{kmeans(select(data_bfs_wide, -municipality), .)$tot.withinss})

elbow_df <- as.data.frame(cbind("n_clust" = 1:6, "wss" = wss))

ggplot(elbow_df) +
  geom_line(aes(y = wss, x = n_clust))
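
As a sketch of what the elbow plot measures: tot.withinss is the total within-cluster sum of squares, which tends to shrink as clusters are added, so one looks for the point where the curve flattens. The matrix below is simulated, not the BFS data, and nstart = 10 is my own addition to make the random starts more stable.

```r
set.seed(1111)

# Simulated data: 3 groups of 8 "cantons", 13 yearly values each
toy <- rbind(
  matrix(rnorm(8 * 13, mean = 55, sd = 1), nrow = 8),
  matrix(rnorm(8 * 13, mean = 65, sd = 1), nrow = 8),
  matrix(rnorm(8 * 13, mean = 75, sd = 1), nrow = 8)
)

# Total within-cluster sum of squares for k = 1, ..., 6
wss <- sapply(1:6, function(k) {
  kmeans(toy, centers = k, nstart = 10)$tot.withinss
})

# With a single cluster, wss equals the total sum of squares around the column means
total_ss <- sum(scale(toy, scale = FALSE)^2)
all.equal(wss[1], total_ss)

round(wss)
```

Since the toy data has three well-separated groups, the drop in wss should be steep up to three clusters and small afterwards.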

The optimal number of clusters seems to be four. Let’s cluster our time series into four groups.

clusters <- kmeans(select(data_bfs_wide, -municipality), centers = 4)

gg_plot <- data_bfs_wide %>% 
  mutate(cluster = clusters$cluster) %>%
  pivot_longer(cols = c(-municipality, -cluster), 
               names_to = "year", 
               values_to = "gender ratio") %>%
  mutate(cluster = as.factor(cluster)) %>%
  rename(canton = municipality) %>%
  ggplot() +
  geom_line(aes(y = `gender ratio`, x = year, 
                group = canton, colour = cluster), 
            show.legend = FALSE) +
  facet_wrap(~cluster, nrow = 1) +
  scale_color_brewer(palette = "Set2") +
  scale_x_discrete(breaks = seq(2007, 2019, by = 3)) +
  guides(color = "none") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title = "Proportion of Men in Cross-Border Workforce in Swiss Cantons",
       x = "")

plotly::ggplotly(gg_plot) %>%
  hide_legend()

Hover over the interactive plot above to discover the canton’s name and gender ratio for each time series in the four clusters. Note that we only kept data from 2007 onwards, as some cantons had missing values in earlier years.

That’s all for the exploration of the cantonal level. What about the municipal level?

Mapping time!

Let’s plot the gender ratio of cross-border work by Swiss municipality for 2019.

With David Zumbach’s great RSwissMaps package, it is possible to create a map of Switzerland with only a few lines of code.

# the BFS ids of all municipalities are included in RSwissMaps
# the municipality data inside RSwissMaps is taken from year 2016
bfs_id_mun <- RSwissMaps::mun.template(year = 2016)

# keep the 2019 values and attach the BFS id of each municipality by name
data_bfs_2019 <- data_bfs_ratio_annualized %>%
  left_join(bfs_id_mun, by = c("municipality" = "name")) %>%
  filter(year == 2019)

mun.plot(data_bfs_2019$bfs_nr, 
         data_bfs_2019$gender_per, 
         year = 2016) +
  scale_fill_viridis_c(labels = percent, direction = -1) +
  theme(legend.position = "right") +
  labs(title = "Proportion of Men in Cross-Border Workforce in Switzerland",
       subtitle = "More women in green-yellow, 2019",
       fill = "",
       caption = "Quarterly annualized - Data Source: BFS")

The data shows that municipalities in central Switzerland also have some cross-border workers. While cross-border work is mainly done by men, in a few Swiss municipalities women account for the majority of it.
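
Because the join above matches municipalities by name, any municipality whose spelling differs between the BFS data and the map template ends up without an id and cannot be drawn. A minimal sketch of how one could list such cases with anti_join(), using invented names and ids rather than the real mun.template() output:

```r
library(dplyr)

# Invented data standing in for the annualized BFS table
bfs_toy <- tibble(
  municipality = c("Knonau", "Hedingen", "Musterdorf"),
  gender_per   = c(0.60, 0.95, 0.70)
)

# Invented template standing in for the map ids (ids are made up)
template_toy <- tibble(
  name   = c("Knonau", "Hedingen"),
  bfs_nr = c(1001, 1002)
)

# Rows of the data with no matching name in the template
unmatched <- anti_join(bfs_toy, template_toy, by = c("municipality" = "name"))
unmatched
```

Running the same check on the real tables would show which names need harmonizing before mapping.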

Let me know what you think about my new BFS package and feel free to contribute or make a pull request here.

Thanks for reading!


  1. English and Italian have fewer datasets available.↩︎

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Luginbuhl (2019, Nov. 7). Introducing my new R package {BFS}. Retrieved from felixluginbuhl.com/blog/posts/2019-11-07-swiss-data/

BibTeX citation

@misc{luginbuhl2019introducing,
  author = {Luginbuhl, Felix},
  title = {Introducing my new R package {BFS}},
  url = {felixluginbuhl.com/blog/posts/2019-11-07-swiss-data/},
  year = {2019}
}