Download a book into a data frame using the URL of a Wikisource
table of contents page. The table of contents page should link to all the
Wikisource pages that make up the book. The text of each of these
pages is downloaded using the wikisource_page()
function.
wikisource_book(url, cleaned = TRUE)
Argument | Description |
---|---|
url | The URL of a Wikisource content page listing the pages that make up the book. |
cleaned | A logical value (TRUE or FALSE) indicating whether the Wikisource pages should be cleaned. |
A five-column tbl_df (a type of data frame; see the tibble or dplyr packages) with one row for each line of the text or texts, and the following columns:
text: A character column
title: A character column with the title of the Wikisource summary page
page: An integer column with a number for the text from each Wikisource page downloaded
language: A character column with a two-letter string referring to the language of the text
url: A character column with the URL of the Wikisource page of the text
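As a rough illustration of working with a result of this shape, the sketch below builds a small stand-in data frame by hand and regroups its lines by page. The column names (text, title, page, language, url), the sample lines, and the example.org URLs are assumptions made for illustration; nothing here is downloaded from Wikisource.

```r
# Hypothetical stand-in for the value returned by wikisource_book().
# Column names and contents are assumed for illustration only.
book <- data.frame(
  text = c("CHAPTER I", "first line of page one",
           "CHAPTER II", "first line of page two"),
  title = "Candide",
  page = c(1L, 1L, 2L, 2L),
  language = "en",
  url = c("https://example.org/page1", "https://example.org/page1",
          "https://example.org/page2", "https://example.org/page2"),
  stringsAsFactors = FALSE
)

# Count how many lines of text came from each page.
lines_per_page <- table(book$page)

# Rebuild the full text of each page by joining its lines with newlines.
page_text <- tapply(book$text, book$page, paste, collapse = "\n")
```

Because there is one row per line of text, grouping on the page column is one simple way to get back to page-level text.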
The download may fail if the paths of the Wikisource pages listed on the content page differ substantially from the URL path of the content page itself.
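Since a download can fail, it may help to guard each call when fetching several books in one run. The following is a minimal sketch using base R's tryCatch(); safe_download() and its download_fun argument are hypothetical helpers introduced here and are not part of the package.

```r
# Hypothetical helper: call a download function and return NULL with a
# warning, instead of stopping, when the call raises an error.
safe_download <- function(url, download_fun) {
  tryCatch(
    download_fun(url),
    error = function(e) {
      warning("Download failed for ", url, ": ", conditionMessage(e),
              call. = FALSE)
      NULL
    }
  )
}

# Possible usage (not run: needs network access and wikisource_book()):
# results <- lapply(c(fr, es), safe_download, download_fun = wikisource_book)
# books <- do.call(rbind, Filter(Negate(is.null), results))
```

Returning NULL for a failed URL lets the successful downloads be kept and combined, while the warning records which URL did not work.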
if (FALSE) {
  # download Voltaire's "Candide"
  wikisource_book("https://en.wikisource.org/wiki/Candide")

  # download "Candide" in French and Spanish
  library(purrr)

  fr <- "https://fr.wikisource.org/wiki/Candide,_ou_l%E2%80%99Optimisme/Garnier_1877"
  es <- "https://es.wikisource.org/wiki/C%C3%A1ndido,_o_el_optimismo"
  books <- map_df(c(fr, es), wikisource_book)
}