BFS R package

Search and download data from the Federal Statistical Office

Félix Luginbühl


  1. A use case example
  2. Motivations behind the BFS package
  3. Exploring the data catalog with BFS
  4. All you need is bfs_get_data()

… and your questions

1. A use case example

2. Motivations behind the BFS package

Motivations behind the BFS package

  • Leveraging the full R ecosystem: Shiny, R Markdown, etc.
  • Data reproducibility and transparency
  • Speed: get BFS data with 1 line of code

3. Exploring the data catalog with BFS

Exploring the data catalog manually

STAT-TAB - Interactive tables

Exploring the data catalog with BFS

library(BFS) #install.packages("BFS")

# get the data catalogue
catalog_data_en <- bfs_get_catalog_data(language = "en")

# A tibble: 180 × 7
   title  language publication_date    url_bfs url_px guid  catalog_date       
   <chr>  <chr>    <dttm>              <chr>   <chr>  <chr> <dttm>             
 1 Provi… en       2023-04-04 08:30:00 https:… https… bfsR… 2023-04-04 08:30:00
 2 Perma… en       2022-10-06 08:30:00 https:… https… bfsR… 2023-04-04 08:30:00
 3 Priva… en       2022-10-06 08:30:00 https:… https… bfsR… 2023-04-04 08:30:00
 4 Death… en       2022-09-26 08:30:00 https:… https… bfsR… 2023-04-04 08:30:00
 5 Divor… en       2022-09-26 08:30:00 https:… https… bfsR… 2023-04-04 08:30:00
# ℹ 175 more rows

Exploring the data catalog with BFS

  • Choose a dataset using filter() from dplyr.
library(BFS) #install.packages("BFS")
library(dplyr) #install.packages("dplyr")

catalog_data_en <- bfs_get_catalog_data(language = "en")

# search for a recent dataset
catalog_data_uni <- catalog_data_en %>%
  filter(title == "University students by year, ISCED field, sex and level of study")

## # A tibble: 1 × 5
##   title                                           langu…¹ publi…² url_bfs url_px
##   <chr>                                           <chr>   <chr>   <chr>   <chr> 
## 1 University students by year, ISCED field, sex … en      Univer… https:… https…
## # … with abbreviated variable names ¹​language, ²​published
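
Matching the full title with == fails as soon as the title differs by one character. A minimal sketch of a partial, case-insensitive match is shown here on a mock catalog so that it runs offline. The data frame catalog_mock and its placeholder titles are assumptions for illustration; only the first title comes from the slides.

```r
# Mock catalog standing in for the result of bfs_get_catalog_data().
# Only the first title is taken from the slides; the others are placeholders.
catalog_mock <- data.frame(
  title = c(
    "University students by year, ISCED field, sex and level of study",
    "Placeholder dataset A",
    "Placeholder dataset B"
  ),
  language = "en",
  stringsAsFactors = FALSE
)

# Keep the rows whose title contains the search term, ignoring case
is_hit <- grepl("university students", catalog_mock$title, ignore.case = TRUE)
hits <- catalog_mock[is_hit, ]

print(hits$title)
```

The same condition can be used inside dplyr::filter(), for example filter(grepl("university students", title, ignore.case = TRUE)).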

Exploring the data catalog with BFS

  • Get the BFS dataset with bfs_get_data().
library(BFS) #install.packages("BFS")
library(dplyr) #install.packages("dplyr")

catalog_data_en <- bfs_get_catalog_data(language = "en")

catalog_data_uni <- catalog_data_en %>%
  filter(title == "University students by year, ISCED field, sex and level of study")

# get the data
df_uni <- bfs_get_data(url_bfs = catalog_data_uni$url_bfs, language = "en")

## # A tibble: 17,640 × 5
##    Year    `ISCED Field`     Sex    `Level of study`                     Unive…¹
##    <chr>   <chr>             <chr>  <chr>                                  <dbl>
##  1 1980/81 Education science Male   First university degree or diploma       545
##  2 1980/81 Education science Male   Bachelor                                   0
##  3 1980/81 Education science Male   Master                                     0
##  4 1980/81 Education science Male   Doctorate                                 93
##  5 1980/81 Education science Male   Further education, advanced studies…      13
## # … with 17,630 more rows, and abbreviated variable name ¹​`University students`

Exploring the data catalog with BFS

  • Get additional information from the dataset footnotes with bfs_get_data_comments().
library(BFS) #install.packages("BFS")
library(dplyr) #install.packages("dplyr")

catalog_data_en <- bfs_get_catalog_data(language = "en")

catalog_data_uni <- catalog_data_en %>%
  filter(title == "University students by year, ISCED field, sex and level of study")

df_uni <- bfs_get_data(url_bfs = catalog_data_uni$url_bfs, language = "en")

# get data comments
comments <- bfs_get_data_comments(url_bfs = catalog_data_uni$url_bfs, language = "en")

[1] "To ensure that the presentations from cubes containing the 'level of studies' variable are easy to understand, all post-graduate studies are included under the heading 'Continuing and further education', which are additionally published under the following headings:\r\n- Continuing education\r\n- Specialised and further studies\r\n- Postgraduate studies (until 2004)\r\nFor the definition of other levels of study, please refer to the Definitions chapter.\r\n\r\n"
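
The comment comes back as a single string with Windows line breaks (\r\n), which is hard to read when printed. A minimal sketch of splitting it into lines, using an excerpt of the comment above; the variable names comments_excerpt and comment_lines are assumptions for illustration.

```r
# Excerpt of the comment text shown above, including its "\r\n" line breaks
comments_excerpt <- paste0(
  "which are additionally published under the following headings:\r\n",
  "- Continuing education\r\n",
  "- Specialised and further studies\r\n",
  "- Postgraduate studies (until 2004)\r\n"
)

# Split on the literal "\r\n" sequence and drop empty lines
comment_lines <- strsplit(comments_excerpt, "\r\n", fixed = TRUE)[[1]]
comment_lines <- comment_lines[nzchar(comment_lines)]

# Print one line per row
writeLines(comment_lines)
```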

4. All you need is bfs_get_data()

All you need is bfs_get_data()

  • bfs_get_data() uses pxweb R package functions under the hood to query the Swiss Federal Statistical Office PXWEB API.

All you need is bfs_get_data()

  • Better reproducibility (and stability) with number_bfs.
# open webpage

All you need is bfs_get_data()

  • Better reproducibility (and stability) with number_bfs.
BFS::bfs_get_data(
  number_bfs = "px-x-1502040100_131",
  language = "en"
)
# A tibble: 18,060 × 5
   Year    `ISCED Field`   Sex   `Level of study` `University students`
   <chr>   <chr>           <chr> <chr>                            <dbl>
 1 1980/81 Education scie… Male  First universit…                   545
 2 1980/81 Education scie… Male  Bachelor                             0
 3 1980/81 Education scie… Male  Master                               0
 4 1980/81 Education scie… Male  Doctorate                           93
 5 1980/81 Education scie… Male  Further educati…                    13
# ℹ 18,050 more rows

All you need is bfs_get_data()

  • Change the language
BFS::bfs_get_data(
  number_bfs = "px-x-1502040100_131",
  language = "de"
)
# A tibble: 18,060 × 5
   Jahr    `ISCED Fach`  Geschlecht Studienstufe Studierende an den u…¹
   <chr>   <chr>         <chr>      <chr>                         <dbl>
 1 1980/81 Erziehungswi… Mann       Lizenziat/D…                    545
 2 1980/81 Erziehungswi… Mann       Bachelor                          0
 3 1980/81 Erziehungswi… Mann       Master                            0
 4 1980/81 Erziehungswi… Mann       Doktorat                         93
 5 1980/81 Erziehungswi… Mann       Weiterbildu…                     13
# ℹ 18,050 more rows
# ℹ abbreviated name: ¹​`Studierende an den universitären Hochschulen`

All you need is bfs_get_data()

  • Clean names with janitor::clean_names() (snake case)
BFS::bfs_get_data(
  number_bfs = "px-x-1502040100_131",
  language = "de",
  clean_names = TRUE
)
# A tibble: 18,060 × 5
   jahr    isced_fach    geschlecht studienstufe studierende_an_den_u…¹
   <chr>   <chr>         <chr>      <chr>                         <dbl>
 1 1980/81 Erziehungswi… Mann       Lizenziat/D…                    545
 2 1980/81 Erziehungswi… Mann       Bachelor                          0
 3 1980/81 Erziehungswi… Mann       Master                            0
 4 1980/81 Erziehungswi… Mann       Doktorat                         93
 5 1980/81 Erziehungswi… Mann       Weiterbildu…                     13
# ℹ 18,050 more rows
# ℹ abbreviated name: ¹​studierende_an_den_universitaren_hochschulen

All you need is bfs_get_data()

  • Query specific categories with the query argument (NULL returns all categories).
BFS::bfs_get_data(
  number_bfs = "px-x-1502040100_131",
  language = "de",
  clean_names = TRUE,
  query = NULL
)

All you need is bfs_get_data()

  • Use bfs_get_metadata() to get the dimension codes and value codes needed to build a query.
metadata <- BFS::bfs_get_metadata(
  number_bfs = "px-x-1502040100_131",
  language = "de"
)
str(metadata) # inspect the codes, values and value texts
tibble [4 × 6] (S3: tbl_df/tbl/data.frame)
 $ code       : chr [1:4] "Jahr" "ISCED Fach" "Geschlecht" "Studienstufe"
 $ text       : chr [1:4] "Jahr" "ISCED Fach" "Geschlecht" "Studienstufe"
 $ values     :List of 4
  ..$ : chr [1:43] "0" "1" "2" "3" ...
  ..$ : chr [1:42] "0" "1" "2" "3" ...
  ..$ : chr [1:2] "0" "1"
  ..$ : chr [1:5] "0" "1" "2" "3" ...
 $ valueTexts :List of 4
  ..$ : chr [1:43] "1980/81" "1981/82" "1982/83" "1983/84" ...
  ..$ : chr [1:42] "Erziehungswissenschaft" "Ausbildung von Lehrkräften ohne Fachspezialisierung" "Ausbildung von Lehrkräften mit Fachspezialisierung" "Bildende Kunst" ...
  ..$ : chr [1:2] "Mann" "Frau"
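
The values column holds the codes a query expects, and valueTexts holds the matching labels at the same positions. A minimal sketch of looking up codes by label is shown here on a mock object so that it runs offline. The object metadata_mock only mimics the structure printed above, the year labels are generated rather than downloaded, and the helper codes_for() is hypothetical and not part of the BFS package.

```r
# Mock of two rows of the metadata shown above (assumption: the 43 year
# labels run from 1980/81 onwards, with codes "0" to "42")
start_years <- 1980:2022
metadata_mock <- data.frame(
  code = c("Jahr", "Geschlecht"),
  stringsAsFactors = FALSE
)
metadata_mock$values <- list(as.character(0:42), c("0", "1"))
metadata_mock$valueTexts <- list(
  sprintf("%d/%02d", start_years, (start_years + 1) %% 100),
  c("Mann", "Frau")
)

# Hypothetical helper: translate value labels into value codes
codes_for <- function(metadata, code, texts) {
  i <- match(code, metadata$code)
  if (is.na(i)) stop("Unknown dimension code: ", code)
  pos <- match(texts, metadata$valueTexts[[i]])
  if (anyNA(pos)) {
    stop("Unknown value text(s): ", paste(texts[is.na(pos)], collapse = ", "))
  }
  metadata$values[[i]][pos]
}

# Build a query list from readable labels
query <- list(
  "Jahr" = codes_for(metadata_mock, "Jahr", c("2020/21", "2021/22")),
  "Geschlecht" = "*" # "*" selects all values
)
str(query)
```

With the real metadata object, the resulting list could be passed to the query argument of bfs_get_data().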

All you need is bfs_get_data()

  • Manually create BFS query dimensions.
BFS::bfs_get_data(
  number_bfs = "px-x-1502040100_131",
  language = "en",
  query = list(
    "Jahr" = c("40", "41"),
    "ISCED Fach" = c("0"),
    "Geschlecht" = c("*"), # Use "*" to select all
    "Studienstufe" = c("2", "3")
  )
)
# A tibble: 8 × 5
  Year    `ISCED Field`    Sex   `Level of study` `University students`
  <chr>   <chr>            <chr> <chr>                            <dbl>
1 2020/21 Education scien… Male  Master                             151
2 2020/21 Education scien… Male  Doctorate                          121
3 2020/21 Education scien… Fema… Master                             555
4 2020/21 Education scien… Fema… Doctorate                          306
5 2021/22 Education scien… Male  Master                             143

All you need is bfs_get_data()

  • Return variable codes and value codes instead of text labels.
BFS::bfs_get_data(
  number_bfs = "px-x-1502040100_131",
  language = "en",
  query = list(
    "Jahr" = c("40", "41"),
    "ISCED Fach" = c("0"),
    "Geschlecht" = c("*"), # Use "*" to select all
    "Studienstufe" = c("2", "3")
  ),
  column_name_type = "code", # "text" by default
  variable_value_type = "code" # "text" by default
)
# A tibble: 8 × 5
  Jahr  `ISCED Fach` Geschlecht Studienstufe `University students`
  <chr> <chr>        <chr>      <chr>                        <dbl>
1 40    0            0          2                              151
2 40    0            0          3                              121
3 40    0            1          2                              555
4 40    0            1          3                              306
5 41    0            0          2                              143
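
Coded output is stable across languages but harder to read. A minimal sketch of mapping codes back to labels with a named vector, using the five rows printed above and the Geschlecht labels from the metadata ("0" is "Mann", "1" is "Frau"). The names coded, sex_labels and Geschlecht_text are assumptions for illustration.

```r
# The first five rows of the coded result shown above, rebuilt by hand
coded <- data.frame(
  Jahr = c("40", "40", "40", "40", "41"),
  Geschlecht = c("0", "0", "1", "1", "0"),
  Studienstufe = c("2", "3", "2", "3", "2"),
  students = c(151, 121, 555, 306, 143),
  stringsAsFactors = FALSE
)

# Lookup table from value code to value text, taken from the metadata output
sex_labels <- c("0" = "Mann", "1" = "Frau")

# Index the named vector by code to get the label for each row
coded$Geschlecht_text <- unname(sex_labels[coded$Geschlecht])

print(coded)
```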

All you need is bfs_get_data()

  • Documentation:
  • Source code:
# open function documentation in R

5. Questions

Thank you for your attention!

  • BFS documentation:
  • Swiss City Statistics app:
  • LinkedIn: