Download a book into a data frame, given the URL of a Wikisource table-of-contents page. The table-of-contents page should link to all the Wikisource pages that make up the book. The text of each linked page is downloaded with the wikisource_page() function.

wikisource_book(url, cleaned = TRUE)

Arguments

url

The URL of a Wikisource content page listing the pages that make up the book.

cleaned

A logical value indicating whether the downloaded Wikisource pages should be cleaned. Defaults to TRUE.

Value

A five-column tbl_df (a type of data frame; see the tibble or dplyr packages) with one row for each line of the text or texts, and the following columns:

text

A character column with the lines of the text

title

A character column with the title of the Wikisource summary page

page

An integer column numbering the text from each Wikisource page downloaded

language

A character column with a two-letter string referring to the language of the text

url

A character column with the url of the Wikisource page of the text
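
The column layout above lends itself to simple per-page summaries. The following is a minimal sketch in base R, assuming a small hand-built data frame as a stand-in for the real return value: the rows, the placeholder text, and the example.org URLs are invented for illustration and are not actual output of wikisource_book().

```r
# Hypothetical stand-in for the result of wikisource_book(); the values are
# invented and only the column layout follows the documentation above.
book <- data.frame(
  text = c("first line of page 1", "second line of page 1",
           "first line of page 2"),
  title = "Candide",
  page = c(1L, 1L, 2L),
  language = "en",
  url = c("https://example.org/page-1", "https://example.org/page-1",
          "https://example.org/page-2"),
  stringsAsFactors = FALSE
)

# Count how many lines of text came from each Wikisource page
lines_per_page <- table(book$page)

# Reassemble the text of the first page as a single string
page_one <- paste(book$text[book$page == 1L], collapse = "\n")
```

Because each row is one line of text, grouping on page (or on url) recovers the unit that was downloaded from each Wikisource page.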

Details

The download may fail if the Wikisource paths listed on the content page differ strongly from the URL path of the content page itself.
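
When downloading several books in one run, it may help to keep a single failed download from stopping the whole script. The sketch below assumes that a failed download surfaces as an ordinary R error, which this page does not confirm. safe_download() is a hypothetical helper, not part of the package; it takes the downloading function as an argument so that it can be tried without network access.

```r
# Hypothetical helper (not part of the package): call `downloader(url)` and
# return NULL with a message instead of stopping when an error is signalled.
safe_download <- function(url, downloader) {
  tryCatch(
    downloader(url),
    error = function(e) {
      message("Download failed for ", url, ": ", conditionMessage(e))
      NULL
    }
  )
}

# Intended use, assuming wikisource_book() is available in the session:
# book <- safe_download("https://en.wikisource.org/wiki/Candide", wikisource_book)
```

A NULL result can then be filtered out before the remaining data frames are combined.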

Examples

if (FALSE) {
# download Voltaire's "Candide"
wikisource_book("https://en.wikisource.org/wiki/Candide")

# download "Candide" in French and Spanish
library(purrr)

fr <- "https://fr.wikisource.org/wiki/Candide,_ou_l%E2%80%99Optimisme/Garnier_1877"
es <- "https://es.wikisource.org/wiki/C%C3%A1ndido,_o_el_optimismo"
books <- map_df(c(fr, es), wikisource_book)
}