How to easily access national Swiss data, explore it and create a map of Switzerland.
In this article, I will show how to use my BFS package to easily search and download datasets from the Swiss Federal Statistical Office (BFS). We will then quickly explore a dataset and plot a map of Swiss municipalities, the lowest level of administrative division in Switzerland.
As always, the code is fully reproducible: you can get it from my GitHub account or from my online RStudio Cloud session.
To use my BFS package, we begin by downloading information about all available datasets in the Swiss Federal Statistical Office catalog. We can get the BFS metadata in German (“de”), French (“fr”), Italian (“it”) and English (“en”)1.
# install/load needed R packages
library(tidyverse)
library(scales)
library(colorspace)
library(plotly)
library(RSwissMaps)
library(BFS)
# setting light theme
theme_set(theme_light())
# Get BFS metadata in German
meta_de <- bfs_get_catalog_data(language = "de")
meta_de
# A tibble: 350 x 5
title language published url_bfs url_px
<chr> <chr> <chr> <chr> <chr>
1 Gebäude nach~ de Gebäude nach in~ https://www.b~ https://www~
2 Gebäude nach~ de Gebäude nach Ka~ https://www.b~ https://www~
3 Privathausha~ de Privathaushalte~ https://www.b~ https://www~
4 Ständige Woh~ de Ständige Wohnbe~ https://www.b~ https://www~
5 Wohnungen na~ de Wohnungen nach ~ https://www.b~ https://www~
6 Wohnungen na~ de Wohnungen nach ~ https://www.b~ https://www~
7 Wohnungen na~ de Wohnungen nach ~ https://www.b~ https://www~
8 Betrag, Bezü~ de Betrag, Bezüger~ https://www.b~ https://www~
9 Stipendien: ~ de Stipendien: Bez~ https://www.b~ https://www~
10 Hotellerie: ~ de Hotellerie: Ang~ https://www.b~ https://www~
# ... with 340 more rows
The German catalog printed above lists 350 BFS datasets.
The challenge today is to plot a detailed map of Switzerland showing all 2,212 Swiss municipalities (“Gemeinden” in German). Let’s find out whether any dataset title contains the word “gemeinde”.
meta_de_gemeinde <- meta_de %>%
filter(str_detect(title, "gemeinde"))
meta_de_gemeinde
# A tibble: 1 x 5
title language published url_bfs url_px
<chr> <chr> <chr> <chr> <chr>
1 Ausländische~ de Ausländische Gre~ https://www.b~ https://www~
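Note that str_detect() is case-sensitive, so the search above only matches the lowercase “gemeinde” found inside words such as “Arbeitsgemeinde”; a title starting with “Gemeinde” would be missed. A minimal sketch of a case-insensitive search with stringr’s regex(ignore_case = TRUE), run on a few made-up titles rather than on the real catalog:

```r
library(tibble)
library(dplyr)
library(stringr)

# Hypothetical titles, made up for illustration (not real BFS catalog entries)
titles <- tibble(title = c(
  "Ausländische Grenzgänger/innen nach Arbeitsgemeinde und Geschlecht",
  "Gemeinden nach Kanton",
  "Hotellerie: Angebot und Nachfrage"
))

# Case-sensitive: only the lowercase "gemeinde" inside "Arbeitsgemeinde" matches
case_sensitive <- titles %>%
  filter(str_detect(title, "gemeinde"))

# Case-insensitive: also catches the title starting with "Gemeinden"
case_insensitive <- titles %>%
  filter(str_detect(title, regex("gemeinde", ignore_case = TRUE)))

nrow(case_sensitive)
nrow(case_insensitive)
```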
Only one dataset matches: the one on cross-border workers (“Ausländische Grenzgänger/innen” in German), in the first row, which is exactly the one I am interested in. Let’s download it using the bfs_get_data() function.
# browseURL(meta_de_gemeinde$url_bfs[1]) # open related URL page
# for reproducibility, the BFS number is "px-x-0302010000_101"
meta_de_gemeinde$title[1] # print title
[1] "Ausländische Grenzgänger/innen nach Arbeitsgemeinde und Geschlecht"
data_bfs <- bfs_get_data(number_bfs = "px-x-0302010000_101", language = "de")
data_bfs
# A tibble: 670,140 x 4
Arbeitsgemeinde Geschlecht Quartal `Ausländische Grenzgäng~
<chr> <chr> <chr> <dbl>
1 Schweiz Geschlecht - Total 1996Q1 146704
2 Schweiz Geschlecht - Total 1996Q2 145023.
3 Schweiz Geschlecht - Total 1996Q3 143656.
4 Schweiz Geschlecht - Total 1996Q4 141277.
5 Schweiz Geschlecht - Total 1997Q1 140218.
6 Schweiz Geschlecht - Total 1997Q2 137131.
7 Schweiz Geschlecht - Total 1997Q3 137321.
8 Schweiz Geschlecht - Total 1997Q4 135938.
9 Schweiz Geschlecht - Total 1998Q1 135810.
10 Schweiz Geschlecht - Total 1998Q2 135849.
# ... with 670,130 more rows
Note that the development version of BFS leverages the pins package to save all downloaded Swiss datasets in the same cache folder, accessible with the bfs_open_dir() function.
That’s all for my new BFS package.
Using the tidyverse workflow, we can now perform a quick exploratory data analysis.
Let’s begin with a glimpse at the data.
glimpse(data_bfs)
Rows: 670,140
Columns: 4
$ Arbeitsgemeinde <chr> "Schweiz", "Schweiz", "Schw~
$ Geschlecht <chr> "Geschlecht - Total", "Gesc~
$ Quartal <chr> "1996Q1", "1996Q2", "1996Q3~
$ `Ausländische Grenzgänger/innen` <dbl> 146704.0, 145022.6, 143656.~
The dataset contains the number of cross-border workers by quarter (Quartal), Swiss municipality (Arbeitsgemeinde) and gender (Geschlecht).
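The Quartal labels follow a “YYYYQn” pattern (e.g. “1996Q1”). Assuming every label has exactly that shape, a small base R sketch can split them into a year and a quarter number and turn them into the first day of each quarter, which can be handy for plotting on a date axis:

```r
# Quarter labels as they appear in the Quartal column (assumed "YYYYQn" format)
quarters <- c("1996Q1", "1996Q2", "2021Q2")

# Split each label into a numeric year (characters 1-4) and quarter (character 6)
year    <- as.integer(substr(quarters, 1, 4))
quarter <- as.integer(substr(quarters, 6, 6))

# First day of each quarter as a Date: Q1 -> January, Q2 -> April, etc.
quarter_start <- as.Date(sprintf("%d-%02d-01", year, (quarter - 1L) * 3L + 1L))

quarter_start
```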
Notice that the value in the Ausländische Grenzgänger/innen column is a weighted value: each worker gets a weight between 0 and 1 according to the number of hours he or she works (see more here). It is therefore more appropriate to speak about “cross-border work”: two men each working half time in a Swiss municipality count as much as one woman working full time as a cross-border worker.
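To make the weighting concrete, here is a toy calculation. It assumes, for illustration only, that a worker’s weight is simply their share of a full-time position; the exact BFS weighting rules are not reproduced here. Two half-time men then add up to the same amount of work as one full-time woman:

```r
# Hypothetical workers; weight = assumed share of a full-time position (0 to 1)
workers <- data.frame(
  gender = c("man", "man", "woman"),
  weight = c(0.5, 0.5, 1.0)
)

# Summing the weights gives the amount of cross-border work by gender
work_by_gender <- tapply(workers$weight, workers$gender, sum)
work_by_gender
```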
I am curious to learn more about the gender ratio of cross-border work by municipality and its evolution over the years. Let’s build a new gender_ratio variable, defined as the share of men (in %) in the total cross-border work.
data_bfs_ratio <- data_bfs %>%
tidyr::pivot_wider(names_from = "Geschlecht", values_from = `Ausländische Grenzgänger/innen`) %>%
rename(quarter = Quartal,
municipality = Arbeitsgemeinde,
man = Mann,
woman = Frau,
gender_total = `Geschlecht - Total`) %>%
mutate(municipality = str_remove_all(municipality, "\\.|^\\- ")) %>% # cleaning
mutate(gender_ratio = man / gender_total * 100) %>%
arrange(desc(quarter))
data_bfs_ratio
# A tibble: 223,380 x 6
municipality quarter gender_total man woman gender_ratio
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 Schweiz 2021Q2 347748. 224921. 122827. 64.7
2 Zürich 2021Q2 10548. 7720. 2828. 73.2
3 Aeugst am Albis 2021Q2 0 0 0 NaN
4 Affoltern am Albis 2021Q2 11.9 8.8 3.1 73.9
5 Bonstetten 2021Q2 1.7 0.9 0.9 52.9
6 Hausen am Albis 2021Q2 0.5 0.5 0 100
7 Hedingen 2021Q2 39.8 37.8 2 95.0
8 Kappel am Albis 2021Q2 0 0 0 NaN
9 Knonau 2021Q2 0 0 0 NaN
10 Maschwanden 2021Q2 0 0 0 NaN
# ... with 223,370 more rows
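As a quick sanity check of the formula, we can recompute a few ratios from the rounded values printed above (so the last digit could in principle differ from the exact data). This also shows where the NaN values come from: municipalities without any cross-border work have 0 / 0, which R evaluates to NaN:

```r
# Rounded values copied from the 2021Q2 rows printed above
man          <- c(Schweiz = 224921, `Zürich` = 7720,  Knonau = 0)
gender_total <- c(Schweiz = 347748, `Zürich` = 10548, Knonau = 0)

# Same formula as in the pipeline: share of men in %, rounded to one decimal
gender_ratio <- round(man / gender_total * 100, 1)
gender_ratio

# 0 / 0 gives NaN, hence the NaN rows for municipalities without cross-border work
is.nan(gender_ratio[["Knonau"]])
```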
We see that the gender ratio (share of men) of cross-border workers in Switzerland for the 2nd quarter of 2021 is 64.7% (but 73.2% in Zürich).
Does this mean there are strong cantonal disparities in the gender ratio of cross-border work?
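Before comparing cantons, note how the annualized ratio is computed in the code that follows: it is the plain mean of the quarterly ratios, with NaN quarters dropped by na.rm = TRUE. The toy sketch below, with made-up numbers, shows two consequences: a year where every quarter is NaN stays NaN, and a year with missing quarters is averaged over the quarters that remain. It is an unweighted mean of ratios, not the ratio of yearly totals:

```r
# Hypothetical quarterly gender ratios (%) for one municipality
quarterly <- data.frame(
  quarter      = c("2019Q1", "2019Q2", "2019Q3", "2019Q4", "2020Q1", "2020Q2"),
  gender_ratio = c(60, 70, NaN, 65, NaN, NaN)
)
quarterly$year <- as.numeric(substr(quarterly$quarter, 1, 4))

# Yearly mean of the quarterly ratios, ignoring NaN quarters
annual <- sapply(split(quarterly$gender_ratio, quarterly$year),
                 mean, na.rm = TRUE)
annual
```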
# Create a table of cantons, to be joined later to data_bfs_ratio_annualized
# ref: https://en.wikipedia.org/wiki/Data_codes_for_Switzerland#Cantons
cantons <- tibble::tribble(
~canton, ~code, ~id_can,
"Aargau", "AG", 19,
"Appenzell Innerrhoden", "AI", 16,
"Appenzell Ausserrhoden", "AR", 15,
"Bern", "BE", 2,
"Basel-Landschaft", "BL", 13,
"Basel-Stadt", "BS", 12,
"Fribourg", "FR", 10,
"Genève", "GE", 25,
"Glarus", "GL", 8,
"Graubünden", "GR", 18,
"Jura", "JU", 26,
"Luzern", "LU", 3,
"Neuchâtel", "NE", 24,
"Nidwalden", "NW", 7,
"Obwalden", "OW", 6,
"St Gallen", "SG", 17,
"Schaffhausen", "SH", 14,
"Solothurn", "SO", 11,
"Schwyz", "SZ", 5,
"Thurgau", "TG", 20,
"Ticino", "TI", 21,
"Uri", "UR", 4,
"Vaud", "VD", 22,
"Valais", "VS", 23,
"Zug", "ZG", 9,
"Zürich", "ZH", 1
)
data_bfs_ratio_annualized <- data_bfs_ratio %>%
mutate(year = str_extract(quarter, "^.{4}"),
year = as.numeric(year)) %>%
group_by(municipality, year) %>%
summarise(gender_ratio_annualized = mean(gender_ratio, na.rm = TRUE)) %>%
ungroup() %>%
mutate(gender_per = gender_ratio_annualized/100)
data_bfs_ratio_annualized %>%
inner_join(cantons, by = c("municipality" = "canton")) %>% # join table
ggplot(aes(x = year, y = gender_per, color = code)) +
geom_line() +
geom_line(data = filter(data_bfs_ratio_annualized, municipality == "Schweiz"),
color = "red2", linetype = "dashed", linewidth = 1) +
scale_y_continuous(labels = percent) +
annotate("text", x = 2016, y = 0.54, size = 3.5, label = "National gender ratio") +
annotate("segment", x = 2016, y = 0.551, xend = 2016, yend = 0.635,
color = "black", linewidth = 0.2,
arrow = arrow(length = unit(0.2, "cm"))) +
labs(title = "Proportion of Men in Cross-Border Workforce",
subtitle = "Switzerland, since 1996",
color = "Canton",
x = "", y = "",
caption = "Quarterly data annualized - Data source: BFS")
Gender ratio levels seem to differ between Swiss cantons. However, the plot is hard to read, and it is difficult to group the cantons into categories.
Let’s run a time-series clustering to get these categories. I will reuse some code from an excellent blog post by Bruno Rodrigues to perform a time-series k-means clustering.
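The idea behind this kind of clustering is simple: after pivoting to a wide table, each canton becomes one row, i.e. one point whose coordinates are its yearly ratios, and k-means groups rows whose trajectories are close in Euclidean distance. A minimal sketch with six made-up cantons (plain stats::kmeans(), not a dedicated time-series method such as dynamic time warping):

```r
set.seed(1111)

# Hypothetical yearly gender ratios (%) for six made-up cantons, 2007-2012:
# three with a high share of men, three with a lower one
years  <- 2007:2012
high   <- t(replicate(3, 80 + rnorm(length(years), sd = 1)))
low    <- t(replicate(3, 55 + rnorm(length(years), sd = 1)))
series <- rbind(high, low)
dimnames(series) <- list(paste0("canton_", 1:6), years)

# Each row (one canton's whole time series) is a point in 6-dimensional space;
# nstart = 10 keeps the best of 10 random initialisations
fit <- kmeans(series, centers = 2, nstart = 10)
fit$cluster
```

All series must cover the same years without missing values, which is why the post keeps only the years from 2007 onward.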
set.seed(1111)
# keep years since 2007 only, as some cantons have missing values before
data_bfs_wide <- data_bfs_ratio_annualized %>%
inner_join(cantons, by = c("municipality" = "canton")) %>% # join table
filter(year > 2006) %>%
select(municipality, year, gender_ratio_annualized) %>%
pivot_wider(names_from = year, values_from = gender_ratio_annualized)
wss <- map_dbl(1:6, ~{kmeans(select(data_bfs_wide, -municipality), .)$tot.withinss})
elbow_df <- as.data.frame(cbind("n_clust" = 1:6, "wss" = wss))
ggplot(elbow_df) +
geom_line(aes(y = wss, x = n_clust))
The optimal number of clusters seems to be four, where the curve shows an elbow. Let’s cluster our time series into four groups.
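The elbow plot above relies on the total within-cluster sum of squares (tot.withinss) returned by kmeans(). Since k-means starts from random centres, the default nstart = 1 can give a bumpy curve. The sketch below, on simulated data with four well-separated groups, uses nstart = 25 and computes the relative drop in WSS for each additional cluster; the drop should become small once k passes the true number of groups:

```r
set.seed(1111)

# Simulated data: four well-separated groups of 5 series, 6 years each
group_means <- c(50, 60, 70, 80)
series <- do.call(rbind, lapply(group_means, function(m) {
  matrix(m + rnorm(5 * 6, sd = 1), nrow = 5)
}))

# Total within-cluster sum of squares for k = 1..6, best of 25 starts each
wss <- sapply(1:6, function(k) kmeans(series, centers = k, nstart = 25)$tot.withinss)

# Relative drop in WSS when going from k to k + 1 clusters
rel_drop <- -diff(wss) / wss[-length(wss)]
round(rel_drop, 2)
```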
clusters <- kmeans(select(data_bfs_wide, -municipality), centers = 4)
gg_plot <- data_bfs_wide %>%
mutate(cluster = clusters$cluster) %>%
pivot_longer(cols = c(-municipality, -cluster),
names_to = "year",
values_to = "gender ratio") %>%
mutate(cluster = as.factor(cluster)) %>%
rename(canton = municipality) %>%
ggplot() +
geom_line(aes(y = `gender ratio`, x = year,
group = canton, colour = cluster),
show.legend = FALSE) +
facet_wrap(~cluster, nrow = 1) +
scale_color_brewer(palette = "Set2") +
scale_x_discrete(breaks = seq(2007, 2019, by = 3)) +
guides(color = "none") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
labs(title = "Proportion of Men in Cross-Border Workforce in Swiss Cantons",
x = "")
plotly::ggplotly(gg_plot) %>%
hide_legend()
Hover your mouse over the interactive plot above to see the canton’s name and its gender ratio for each time series in the four clusters. Note that we only kept data from 2007 onward, as some cantons have missing values for the previous years.
That’s all for the exploration of the cantonal level. What about the municipal level?
Let’s plot the gender ratio of cross-border work by Swiss municipality for the year 2019.
With the great RSwissMaps package by David Zumbach, we can create a map of Switzerland with only a few lines of code.
# RSwissMaps ships the BFS IDs and names of all municipalities;
# its municipality data is taken from the year 2016
bfs_id_mun <- RSwissMaps::mun.template(year = 2016)
data_bfs_2019 <- data_bfs_ratio_annualized %>%
filter(year == 2019) %>%
left_join(bfs_id_mun, by = c("municipality" = "name"))
mun.plot(data_bfs_2019$bfs_nr,
data_bfs_2019$gender_per,
year = 2016) +
scale_fill_viridis_c(labels = percent, direction = -1) +
theme(legend.position = "right") +
labs(title = "Proportion of Men in Cross-Border Workforce in Switzerland",
subtitle = "More women in green-yellow, 2019",
fill = "",
caption = "Quarterly data annualized - Data source: BFS")
The map shows that municipalities in central Switzerland also employ some cross-border workers. While cross-border work is mainly done by men, a few Swiss municipalities have more women than men among their cross-border workers.
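Joining on municipality names has a catch: municipalities that merged or were renamed after 2016 have no match in the 2016 template, so left_join() gives them bfs_nr = NA and their values cannot be placed on the map. A quick check with dplyr’s anti_join(), shown here on made-up names and hypothetical IDs, lists the rows that fail to match:

```r
library(dplyr)

# Hypothetical names in the BFS data vs. a hypothetical 2016 template
bfs_names <- tibble::tibble(municipality = c("Zürich", "Bonstetten", "Neue Gemeinde"))
template  <- tibble::tibble(name   = c("Zürich", "Bonstetten", "Knonau"),
                            bfs_nr = c(101, 102, 103))  # made-up IDs

# Rows of the BFS data without a match in the template
unmatched <- anti_join(bfs_names, template, by = c("municipality" = "name"))
unmatched$municipality
```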
Let me know what you think about my new BFS package, and feel free to contribute or make a pull request here.
English and Italian have fewer datasets available.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Luginbuhl (2019, Nov. 7). Introducing my new R package {BFS}. Retrieved from felixluginbuhl.com/blog/posts/2019-11-07-swiss-data/
BibTeX citation
@misc{luginbuhl2019introducing, author = {Luginbuhl, Felix}, title = {Introducing my new R package {BFS}}, url = {felixluginbuhl.com/blog/posts/2019-11-07-swiss-data/}, year = {2019} }