BTW, if anyone wants training data from Project Gutenberg to do WHATEVER, there's a single file you can download in one click containing ALL the cleaned UTF-8 .txt files on the site, updated weekly. No scraping necessary! https://gutenberg.org/cache/epub/feeds/txt-files.tar.zip (9.9GB)
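
If you'd rather not unpack the whole thing to disk, here's a rough Python sketch for streaming the texts straight out of the download. Going by the .tar.zip name, I'm assuming the zip wraps a single plain .tar whose members are UTF-8 .txt files; I haven't checked the actual layout, so adjust as needed. The function name `iter_gutenberg_texts` is just something I made up for illustration.

```python
import tarfile
import zipfile
from typing import Iterator, Tuple


def iter_gutenberg_texts(archive_path: str) -> Iterator[Tuple[str, str]]:
    """Yield (member_name, text) for each .txt file in a .tar.zip archive.

    Assumption: the zip holds one uncompressed .tar of UTF-8 .txt files.
    """
    with zipfile.ZipFile(archive_path) as zf:
        tar_names = [n for n in zf.namelist() if n.endswith(".tar")]
        if not tar_names:
            raise ValueError("no .tar member found inside the zip")
        with zf.open(tar_names[0]) as tar_stream:
            # "r|" reads the tar as a forward-only stream, so the inner
            # tar is never written to disk.
            with tarfile.open(fileobj=tar_stream, mode="r|") as tar:
                for member in tar:
                    if not (member.isfile() and member.name.endswith(".txt")):
                        continue
                    handle = tar.extractfile(member)
                    if handle is None:
                        continue
                    # Replace undecodable bytes instead of crashing on one bad file.
                    yield member.name, handle.read().decode("utf-8", errors="replace")
```

Usage would be something like `for name, text in iter_gutenberg_texts("txt-files.tar.zip"): ...`, handling one book at a time so memory stays flat.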