Introducing My Second R Package, {bfsdata}

The {bfsdata} package makes the data from the Swiss Federal Statistical Office (or BFS for “Bundesamt für Statistik”) easily accessible to R users. It lets you search, download and read BFS datasets directly from the R console.

How to use {bfsdata}

The {bfsdata} package consists of three functions:

Imagine you want to run an exploratory analysis of students at Swiss universities. You can check whether any dataset title in the BFS database contains the word “student” with the bfs_search function.

devtools::install_github("lgnbhl/bfsdata")
library(bfsdata)
bfs_search("student", langage = "en")
# or alternatively: bfs_search("student", "", "en") 
## [1] University students by year, ISCED fields, gender and level of studies                                               
## [2] University students by year, ISCED fields, nationality and level of studies                                          
## [3] University of applied sciences and teacher education students by year, ISCED fields, gender and level of studies     
## [4] University of applied sciences and teacher education students by year, ISCED fields, nationality and level of studies

You found that four English dataset titles contain the word “student”.

Note that the same search could be run on the BFS online database, with the same result, as shown in this screenshot:

When you call bfs_search, a data frame called bfsMetadata is created in your Global Environment. It is the Excel file from the BFS website, accessible through the “list of the cubes” link, read with the {readxl} package. A second data frame, named bfsMetadataSubset, contains the result of your search.
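To make the search step concrete, here is a minimal sketch of a title search over a metadata table. It is not the package's actual implementation: the search_titles helper and the toy metadata data frame are hypothetical, with two titles and links copied from the output above and one placeholder row added for contrast.

```r
# Toy metadata table (hypothetical); the real bfsMetadata comes from the BFS Excel file.
metadata <- data.frame(
  Title = c(
    "University students by year, ISCED fields, gender and level of studies",
    "University students by year, ISCED fields, nationality and level of studies",
    "Placeholder row without the keyword"
  ),
  Link = c("px-x-1502040100_131", "px-x-1502040100_132", "placeholder-link"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the rows whose title matches the keyword, ignoring case.
# The keyword is treated as a regular expression by grepl().
search_titles <- function(metadata, keyword) {
  hits <- grepl(keyword, metadata$Title, ignore.case = TRUE)
  metadata[hits, , drop = FALSE]
}

metadataSubset <- search_titles(metadata, "student")
print(metadataSubset$Title)
```

Keeping the full table and the subset as two separate objects mirrors the bfsMetadata and bfsMetadataSubset pair described above.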

Let’s have a look at it.

str(bfsMetadataSubset)
## Classes 'tbl_df', 'tbl' and 'data.frame':    4 obs. of  5 variables:
##  $ Title              : chr  "University students by year, ISCED fields, gender and level of studies" "University students by year, ISCED fields, nationality and level of studies" "University of applied sciences and teacher education students by year, ISCED fields, gender and level of studies" "University of applied sciences and teacher education students by year, ISCED fields, nationality and level of studies"
##  $ Timespan           : chr  "1980-2016" "1990-2016" "1997-2016" "1997-2016"
##  $ Last Update        : chr  "30.03.2017" "30.03.2017" "30.03.2017" "30.03.2017"
##  $ Link               : chr  "px-x-1502040100_131" "px-x-1502040100_132" "px-x-1502040400_161" "px-x-1502040400_162"
##  $ Languages available: chr  "de, fr, it, en" "de, fr, it, en" "de, fr, it, en" "de, fr, it, en"

The first dataset of our search seems interesting. We can download it (by typing row = 1, for the first result of the bfs_search function) and give it an optional name (the dataset is named “bfsData” by default) with the bfs_download function:

bfs_download(row = 1, name = "bfsData_student") 
# or alternatively: bfs_download(1)

The bfs_download function downloads the BFS dataset, stored online in PX format, and reads it into your R session with the {pxR} package. It also saves the dataset in both CSV and PX formats in the inst/extdata directory of the package.

Okay, let’s have a glimpse of it.

library(tidyverse)
glimpse(bfsData_student)
## Observations: 15,540
## Variables: 5
## $ Studienstufe <fctr> First university degree or diploma, Bachelor, Ma...
## $ Geschlecht   <fctr> Male, Male, Male, Male, Male, Female, Female, Fe...
## $ ISCED.Field  <fctr> Education science, Education science, Education ...
## $ Jahr         <fctr> 1980, 1980, 1980, 1980, 1980, 1980, 1980, 1980, ...
## $ value        <dbl> 545, 0, 0, 93, 13, 946, 0, 0, 70, 52, 1380, 0, 0,...

Looks promising! We could use it to explore the relations between gender and academic fields, i.e. the ISCED.Field variable.

head(levels(bfsData_student$ISCED.Field)) # of 41 ISCED fields
## [1] "Education science"                              
## [2] "Teacher training without subject specialisation"
## [3] "Teacher training with subject specialisation"   
## [4] "Fine arts"                                      
## [5] "Music and performing arts"                      
## [6] "Religion and theology"
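As a first step toward comparing gender across fields, the counts can be summed per field and gender. The sketch below uses a small hypothetical data frame, toy, with invented values and the same column names as bfsData_student. It only illustrates the aggregation and says nothing about the real BFS figures.

```r
# Hypothetical toy data with the same column names as bfsData_student.
toy <- data.frame(
  Geschlecht  = c("Male", "Female", "Male", "Female", "Male", "Female"),
  ISCED.Field = c("Medicine", "Medicine", "Medicine", "Medicine",
                  "Fine arts", "Fine arts"),
  Jahr        = c("1980", "1980", "1981", "1981", "1980", "1980"),
  value       = c(10, 5, 12, 8, 3, 4)
)

# Sum the student counts over all years, for each field and gender.
totals <- aggregate(value ~ ISCED.Field + Geschlecht, data = toy, FUN = sum)
print(totals)
```

With the real data, replacing toy with bfsData_student should give one row per combination of field and gender, assuming the columns are as shown by glimpse above.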

Note that you can access the metadata of your BFS dataset by reading the PX file directly in R. The bfs_download function also saves it in your Global Environment. Here are three examples of metadata taken from the bfsData_student_px object.

print(bfsData_student_px$TITLE.en.)
## $value
## [1] "University students by Year, ISCED Field, Gender and Level of study"
print(bfsData_student_px$CONTACT.en.)
## $value
## [1] "Section Educational Processes, e-mail  <a href=mailto:[email protected]>[email protected]</a>"
print(bfsData_student_px$DATABASE.en.)
## $value
## [1] "FSO - STAT-TAB / Federal Statistical Office, 2010 Neuchâtel / Switzerland / ©  Federal Statistical Office"

We can now make a function that plots the number of students each year since 1980, by gender and by ISCED field.

bfs_plot <- function(academicField) {
  library(tidyverse)
  library(lubridate)
  library(scales)
  # Keep one academic field and turn Jahr ("year" in German) into a Date
  # set to 1 January of each year
  df <- bfsData_student %>%
    filter(ISCED.Field == academicField) %>%
    mutate(year = ymd(paste0(Jahr, "-01-01")))
  ggplot(data = df, aes(x = year, y = value, colour = Geschlecht, linetype = Studienstufe)) +
    geom_line() +
    scale_x_date(breaks = date_breaks("4 years"),
                 labels = date_format("%Y"),
                 # Hadley: https://github.com/tidyverse/ggplot2/issues/1090
                 limits = c(df$year[[2]], NA)) +
    scale_color_discrete(name = "Gender") +
    scale_linetype_discrete(name = "Level of studies") +
    labs(x = "", y = "Number of students",
         title = "Student Gender Gap in Swiss Universities",
         subtitle = paste0("Number of students in ", academicField, ", by gender and level of studies"),
         caption = "Author: Félix Luginbühl (@lgnbhl); Data source: BFS") +
    theme_light() +
    theme(plot.title = element_text(size = 16, face = "bold"),
          plot.caption = element_text(size = 9, color = "darkgrey"))
}
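The year conversion used in the function can be checked in isolation. The sketch below assumes a hypothetical factor, jahr, standing in for the Jahr column, and uses only base R. The point is that the factor labels, not the internal integer codes, are what must be turned into dates.

```r
# Hypothetical stand-in for the Jahr column, which is a factor of year labels.
jahr <- factor(c("1980", "1981", "2016"))

# as.character() returns the labels ("1980"), not the integer codes (1, 2, 3).
year <- as.Date(paste0(as.character(jahr), "-01-01"))

print(year)
print(as.integer(format(year, "%Y")))
```

Calling as.integer() directly on the factor would return the codes instead of the years, which is why the labels are pasted into a full date string first.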

Okay, let’s try this bfs_plot function with some academic fields.

bfs_plot("Management and administration")

bfs_plot("Medicine")

bfs_plot("Sociology and cultural studies")

bfs_plot("Political sciences and civics")