Search and download data from the Swiss Federal Statistical Office
The BFS
package allows to search and download public data from the Swiss Federal Statistical Office (BFS stands for Bundesamt für Statistik in German) API in a dynamic and reproducible way.
Installation
install.packages("BFS")
You can also install the development version from Github.
devtools::install_github("lgnbhl/BFS")
Usage
Get the data catalog
Retrieve the list of publicly available datasets from the data catalog in any language (“de”, “fr”, “it” or “en”) by calling bfs_get_catalog_data()
.
catalog_data_en <- bfs_get_catalog_data(language = "en")
catalog_data_en
## # A tibble: 180 × 7
## title language publication_date url_bfs url_px guid catalog_date
## <chr> <chr> <dttm> <chr> <chr> <chr> <dttm>
## 1 Provis… en 2023-04-04 08:30:00 https:… https… bfsR… 2023-04-04 08:30:00
## 2 Perman… en 2022-10-06 08:30:00 https:… https… bfsR… 2023-04-04 08:30:00
## 3 Privat… en 2022-10-06 08:30:00 https:… https… bfsR… 2023-04-04 08:30:00
## 4 Deaths… en 2022-09-26 08:30:00 https:… https… bfsR… 2023-04-04 08:30:00
## 5 Divorc… en 2022-09-26 08:30:00 https:… https… bfsR… 2023-04-04 08:30:00
## 6 Live b… en 2022-09-26 08:30:00 https:… https… bfsR… 2023-04-04 08:30:00
## 7 Marria… en 2022-09-26 08:30:00 https:… https… bfsR… 2023-04-04 08:30:00
## 8 Acquis… en 2022-08-25 08:30:00 https:… https… bfsR… 2023-04-04 08:30:00
## 9 Acquis… en 2022-08-25 08:30:00 https:… https… bfsR… 2023-04-04 08:30:00
## 10 Demogr… en 2022-08-25 08:30:00 https:… https… bfsR… 2023-04-04 08:30:00
## # ℹ 170 more rows
English and Italian data catalogs offer a limited list of datasets. For the full list please get the French (“fr”) or German (“de”) data catalogs.
Download a dataset in any language
The function bfs_get_data()
allows you to download any dataset for the data catalog.
To download a specific dataset, you can either use the url_bfs
or the number_bfs
.
The url_bfs
argument refers to the offical webpage of a dataset. Find below an example getting the url_bfs
for a specific dataset using the R package “dplyr”.
library(dplyr)
url_bfs_uni_students <- catalog_data_en |>
dplyr::filter(title == "University students by year, ISCED field, sex and level of study") |>
dplyr::pull(url_bfs)
url_bfs_uni_students
## [1] "https://www.bfs.admin.ch/content/bfs/en/home/statistiken/kataloge-datenbanken/daten.assetdetail.24367729.html"
Then you can download the full dataset using url_bfs
.
# https://www.bfs.admin.ch/content/bfs/en/home/statistiken/kataloge-datenbanken/daten.assetdetail.16324907.html
df_uni <- bfs_get_data(url_bfs = url_bfs_uni_students, language = "en")
## Downloading large query (in 4 batches):
## | | | 0% | |================== | 25% | |=================================== | 50% | |==================================================== | 75% | |======================================================================| 100%
df_uni
## # A tibble: 18,060 × 5
## Year `ISCED Field` Sex `Level of study` `University students`
## <chr> <chr> <chr> <chr> <dbl>
## 1 1980/81 Education science Male First university degr… 545
## 2 1980/81 Education science Male Bachelor 0
## 3 1980/81 Education science Male Master 0
## 4 1980/81 Education science Male Doctorate 93
## 5 1980/81 Education science Male Further education, ad… 13
## 6 1980/81 Education science Female First university degr… 946
## 7 1980/81 Education science Female Bachelor 0
## 8 1980/81 Education science Female Master 0
## 9 1980/81 Education science Female Doctorate 70
## 10 1980/81 Education science Female Further education, ad… 52
## # ℹ 18,050 more rows
It is recommended to privilege the use of the number_bfs
argument for stability and reproducibility.
You can manually find the number_bfs
by opening the official webpage and looking for the “FSO number”, as showed below.
# open Uni students dataset webpage
browseURL(url_bfs_uni_students)
Then you can download the dataset using number_bfs
.
bfs_get_data(number_bfs = "px-x-1502040100_131", language = "en")
You can also access additional information about the dataset by running bfs_get_data_comments()
.
bfs_get_data_comments(number_bfs = "px-x-1502040100_131", language = "en")
## Downloading large query (in 4 batches):
## | | | 0% | |================== | 25% | |=================================== | 50% | |==================================================== | 75% | |======================================================================| 100%
## # A tibble: 1 × 4
## row_no col_no comment_type comment
## <int> <int> <chr> <chr>
## 1 NA 4 column_comment "To ensure that the presentations from cubes con…
Query specific elements
You may get an error message if the query is too large.
:
Error
Too large query. 24030 and the maximum number of values
The smallest batch size is 5000. that can be downloaded through the API is
You may also have an error if the API calls too many requests.
in pxweb_advanced_get(url = url, query = query, verbose = verbose) :
Error Requests (RFC 6585) (HTTP 429). Too Many
One solution is to query only specific elements of the dataset to download less data. Here an example.
First you want to get the variable names, i.e. code
, and categories, i.e. values
, of your dataset.
# choose a BFS number and language
metadata <- bfs_get_metadata(number_bfs = "px-x-1502040100_131", language = "en")
str(metadata)
## tibble [4 × 7] (S3: tbl_df/tbl/data.frame)
## $ code : chr [1:4] "Jahr" "ISCED Fach" "Geschlecht" "Studienstufe"
## $ text : chr [1:4] "Year" "ISCED Field" "Sex" "Level of study"
## $ values :List of 4
## ..$ : chr [1:43] "0" "1" "2" "3" ...
## ..$ : chr [1:42] "0" "1" "2" "3" ...
## ..$ : chr [1:2] "0" "1"
## ..$ : chr [1:5] "0" "1" "2" "3" ...
## $ valueTexts :List of 4
## ..$ : chr [1:43] "1980/81" "1981/82" "1982/83" "1983/84" ...
## ..$ : chr [1:42] "Education science" "Teacher training without subject specialisation" "Teacher training with subject specialisation" "Fine arts" ...
## ..$ : chr [1:2] "Male" "Female"
## ..$ : chr [1:5] "First university degree or diploma" "Bachelor" "Master" "Doctorate" ...
## $ time : logi [1:4] TRUE NA NA NA
## $ elimination: logi [1:4] NA TRUE TRUE TRUE
## $ title : chr [1:4] "University students by Year, ISCED Field, Sex and Level of study" "University students by Year, ISCED Field, Sex and Level of study" "University students by Year, ISCED Field, Sex and Level of study" "University students by Year, ISCED Field, Sex and Level of study"
Then you can manually select the dimensions of the dataset you want to query.
# Manually create BFS query dimensions
# Use `code` and `values` elements
# Use "*" to select all
dimensions <- list(
"Jahr" = c("40", "41"),
"ISCED Fach" = c("0"),
"Geschlecht" = c("*"),
"Studienstufe" = c("2", "3"))
# Query BFS data with specific dimensions
BFS::bfs_get_data(
number_bfs = "px-x-1502040100_131",
language = "en",
query = dimensions
)
## # A tibble: 8 × 5
## Year `ISCED Field` Sex `Level of study` `University students`
## <chr> <chr> <chr> <chr> <dbl>
## 1 2020/21 Education science Male Master 151
## 2 2020/21 Education science Male Doctorate 121
## 3 2020/21 Education science Female Master 555
## 4 2020/21 Education science Female Doctorate 306
## 5 2021/22 Education science Male Master 143
## 6 2021/22 Education science Male Doctorate 115
## 7 2021/22 Education science Female Master 599
## 8 2021/22 Education science Female Doctorate 318
Catalog of tables
A lot of tables are not accessible through the official API, but they are still present in the official BFS website. You can search for specific tables using bfs_get_catalog_tables()
.
catalog_tables_en_students <- bfs_get_catalog_tables(language = "en", title = "students")
catalog_tables_en_students
## # A tibble: 5 × 7
## title language publication_date url_bfs url_table guid catalog_date
## <chr> <chr> <dttm> <chr> <chr> <chr> <dttm>
## 1 Stud… en 2023-04-05 00:00:00 https:… https://… bfsR… 2023-04-05 00:00:00
## 2 Stud… en 2023-04-05 00:00:00 https:… https://… bfsR… 2023-04-05 00:00:00
## 3 Stud… en 2023-03-28 08:30:00 https:… https://… bfsR… 2023-04-05 00:00:00
## 4 Stud… en 2023-03-28 08:30:00 https:… https://… bfsR… 2023-04-05 00:00:00
## 5 Stud… en 2023-03-28 08:30:00 https:… https://… bfsR… 2023-04-05 00:00:00
Most of the BFS tables are Excel or CSV files. For example, you can use the “openxlsx” R package to read a specific Excel table using the url_table
column.
library(dplyr)
library(openxlsx)
tables_bfs_uni_students <- catalog_tables_en_students |>
dplyr::slice(3) |>
dplyr::pull(url_table)
df_table <- tryCatch(expr = openxlsx::read.xlsx(tables_bfs_uni_students, startRow = 1),
error = function(e) "Failed reading table") |>
dplyr::as_tibble()
df_table
## # A tibble: 6 × 2
## Students.at.Universities.and.Institutes.of.Technology.(UIT).2022/23:.S…¹ X2
## <chr> <chr>
## 1 "Definitions " Defi…
## 2 "Tab 1" Entr…
## 3 "Tab 2" Stud…
## 4 "Tab 3" Stud…
## 5 "Tab 4" Stud…
## 6 "Tab 5" Fore…
## # ℹ abbreviated name:
## # ¹`Students.at.Universities.and.Institutes.of.Technology.(UIT).2022/23:.Standard.Tables`
Main dependencies of the package
Under the hood, this package is using the pxweb package to query the Swiss Federal Statistical Office PXWEB API. PXWEB is an API structure developed by Statistics Sweden and other national statistical institutions (NSI) to disseminate public statistics in a structured way.
You can clean the column names of the datasets automatically using janitor::clean_names()
by adding the argument clean_names = TRUE
in the bfs_get_data()
function.
Other information
This package is in no way officially related to or endorsed by the Swiss Federal Statistical Office (BFS).
Contribute
Any contribution is strongly appreciated. Feel free to report a bug, ask any question or make a pull request for any remaining issue.