
this thread explains how to digitize a book quickly and easily.
search archive.org for "booksculture00mabiuoft" for the book.

lesson #21! we've finally become an _adult_... good for us...

***

ok, at last the cleaning is done, so now for some fun...

here's the new code for this lesson:
http://zenmarkuplanguage.com/grapes121.py http://zenmarkuplanguage.com/grapes121.txt
so let's review that script and its intent, ok?

we have an edited version of the book, containing its runheads and pagenumbers, but we'll want to output a version of it _without_ runheads and pagenumbers. easy enough. it's done. (we've been creating this as an interim text for many of the checks we have done.) you will find this version in the top part of the output.

that version still retains the original linebreaks and pagebreaks; how about a version without 'em? done. you see this version in the bottom part of the output. (we used .html, since a browser ignores the linebreaks in the source.)

the .html is kind of primitive. there's no linked table-of-contents, and there's "tag abuse" in that we used no "blockquote" tags, or "header" tags, but it will do fine for the time being. in the next lesson, we'll do versions for .epub/.mobi.

***

further, i will revisit the matter of cleaning the text, doing a debriefing on the experiment we just ran, and showing how to consolidate all the steps that we took. to this purpose, i might take on another book, so if you have any suggestions for what book you want me to do, please feel welcome to let me know, frontchannel or back.

***

over at distributed proofreaders, they consider the "proofing" part to be "easy"... it is _post-processing_ where they have their big logjam.

which is ridiculous, as we have just demonstrated. once the text is clean, creating e-books should be a matter of a button-click.

given the (poor) quality of thought typical on this list, somebody might respond with "that was an easy book", hoping you'll infer that button-click generation would be "impossible", or at least "difficult", for books which are more complex in nature.

i tell you that's bullshit, and if you want me to write the code to prove it, i will. because i already have... in more than one language...

***

have a nice weekend... :+)

-bowerbird
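
***

for readers who'd like to see the idea without downloading the script, here is a minimal sketch in python of the two outputs described above. it is _not_ the code in grapes121.py, and it rests on assumptions the post doesn't confirm: that pages in the edited text are separated by a form-feed character, that the first line of each page is the runhead (carrying the pagenumber), and that paragraphs are separated by blank lines. the function names and the sample text are made up for illustration.

```python
import html
import re

PAGE_BREAK = "\f"  # assumption: pages are separated by a form feed


def strip_runheads(text):
    """Return the page bodies, dropping each page's first line (assumed runhead)."""
    bodies = []
    for page in text.split(PAGE_BREAK):
        lines = page.strip("\n").split("\n")
        bodies.append("\n".join(lines[1:]).strip("\n"))
    return bodies


def to_html(bodies):
    """Join the pages, then turn blank-line-separated paragraphs into <p> tags."""
    joined = "\n".join(bodies)  # a paragraph split by a pagebreak is rejoined here
    out = []
    for para in re.split(r"\n\s*\n", joined):
        flat = " ".join(para.split())  # collapse the original linebreaks
        if flat:
            out.append("<p>" + html.escape(flat) + "</p>")
    return "\n".join(out)


# made-up sample: two pages, each starting with a hypothetical runhead line
sample = (
    "12 SAMPLE RUNHEAD\n"
    "\n"
    "first paragraph, which\n"
    "runs over two lines.\n"
    "\n"
    "second paragraph that\n"
    "\f"
    "SAMPLE RUNHEAD 13\n"
    "\n"
    "continues on the next page.\n"
)

bodies = strip_runheads(sample)
print("\n\f\n".join(bodies))  # "top part": linebreaks and pagebreaks kept
print(to_html(bodies))        # "bottom part": paragraphs only
```

the sketch skips things a real book would need, such as rejoining words hyphenated across a line-end, and it will wrongly merge two paragraphs when one happens to end exactly at the bottom of a page.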