let's get some tutorials from the xhtml crowd

the xhtml crowd needs to do some tutorial threads. you guys seem to have all the answers. so spill 'em! :+)
***
i've been showing people to proceed from "raw" o.c.r. but evidently that's old-fashioned... nowadays we just wave our hands in the air, and out pops finished .html. i'm sure all of those poor people over at d.p. would be full of gratitude if you were only to elucidate for them. so teach us the incantations and spells, oh wise ones! (pointers to all those "existing tools" will be good too.)
***
meanwhile, alex, what's the story with this script?
http://www-kenh.archive.org/download/artofbook00holm/artofbook00holm_abbyy.h...
i don't need it myself, because i wrote an app for that. but i seem to remember someone was whining for it...
-bowerbird

Bowerbird@aol.com wrote:
the xhtml crowd needs to do some tutorial threads.
you guys seem to have all the answers. so spill 'em! :+)
Please tell me you didn't infer all that from the discussion that ensued from Jim's thoughts on fixing paragraph separation on the Kindle?
i've been showing people to proceed from "raw" o.c.r.
but evidently that's old-fashioned... nowadays we just wave our hands in the air, and out pops finished .html.
Along with a straw man, apparently.
(pointers to all those "existing tools" will be good too.)
For me the tools are Vim, Perl, xsltproc, Calibre, kindlegen, git, xmllint and elbow grease. Thank goodness I'm not claiming to have produced a magic button that can produce nineteen formats before breakfast from an input language used by no one. And, for the record, I'd also welcome any further discussion of, or pointers to, tools, techniques and strategies for producing ebooks. As a PG newbie, I've found the "do what you want" attitude to be one of the biggest barriers to entry. I prefer instruction, guidance and examples of best practice. (Another barrier was my own dumb choice of a 700 page two volume work to start on. Ho hum.)

On Thu, 15 Dec 2011, Paul Flo Williams wrote:
And, for the record, I'd also welcome any further discussion of, or pointers to, tools, techniques and strategies for producing ebooks. As a PG newbie, I've found the "do what you want" attitude to be one of the biggest barriers to entry. I prefer instruction, guidance and examples of best practice. (Another barrier was my own dumb choice of a 700 page two volume work to start on. Ho hum.)
Yes, that has happened many times before. A new volunteer takes on a project that is too daunting, and then burns out before it is done. There is a learning curve involved. Previous attempts to provide instruction and guidance have tended to be large and sprawling, such as: http://www.gutenberg.org/wiki/Gutenberg:Volunteers%27_FAQ Unless you are very persistent, I'd suggest starting with something easier to manage. --Andrew

Andrew Sly wrote:
On Thu, 15 Dec 2011, Paul Flo Williams wrote:
(Another barrier was my own dumb choice of a 700 page two volume work to start on. Ho hum.)
Yes, that has happened many times before. A new volunteer takes on a project that is too daunting, and then burns out before it is done. There is a learning curve involved,
Unless you are very persistent, I'd suggest starting with something easier to manage.
It's been a 15-month labour of love, but I'm nearly there now!

On Sun, 18 Dec 2011, Paul Flo Williams wrote:
Andrew Sly wrote:
Unless you are very persistent, I'd suggest starting with something easier to manage.
It's been a 15-month labour of love, but I'm nearly there now!
In that case, congratulations! And please don't be afraid to ask for help or feedback on what you've done. Even though I've produced many texts for PG, I still like to get a fresh set of eyes on a file before I submit it; someone else may see something I've overlooked. --Andrew

On Thu, December 15, 2011 1:32 am, Paul Flo Williams wrote: [snip irrelevant context]
For me the tools are Vim, Perl, xsltproc, Calibre, kindlegen, git, xmllint and elbow grease. Thank goodness I'm not claiming to have produced a magic button that can produce nineteen formats before breakfast from an input language used by no one.
And, for the record, I'd also welcome any further discussion of, or pointers to, tools, techniques and strategies for producing ebooks.
Judging by the list of tools you're using, I would venture to say you're a *nix guy. I don't know if my tool set will help you out, because for e-book creation I work almost exclusively on Windows (much to my shame). But this is what I use:
Most important is ABBYY FineReader. Not only does FineReader do very good OCR, but the user interface has a side-by-side feature where a page image is displayed next to the recognized text. FineReader has a global search and replace function, a stemming dictionary, user defined dictionaries, and will highlight words that the OCR was "uncertain" about. I do all my spell checking in FineReader, and do not export the document until I have paged through the entire document checking the layout (sometimes FineReader gets confused about what is, and is not, a paragraph when you have a lot of really short paragraphs in a row).
Once I have done all the proof-reading I can in FineReader, I export the document as simple HTML. The HTML produced by FineReader is class 2 tag soup (SGML), so my next step is to convert the FineReader output to XHTML. At the same time, it would be nice if I could guess at some of the structures in the book other than paragraphs, such as blockquotes and headers. FineReader can't seem to intuit these structures, but it does produce an inordinate number of <font> tags. If the original document had chapter titles in a font larger or different from the common font, I figured that maybe those are headers.
In the end I wrote a program that takes FineReader output, converts it to XHTML, and attempts to add some structure based on varying font sizes. It's not perfect and I'm sure it introduces errors, but it does seem that the absolute number of errors is reduced. I named the program fr2html.exe. If you want the 'C' code, I'd be happy to send it to you. Or, because it makes extensive use of DomCApi on sourceforge, I could add it as a sub project there.
I then open the resultant HTML file in Microsoft Web Developer. This program has a split screen view so I can edit the HTML directly yet see the formatted output. It also does validation on the HTML as I work. I don't do degraded text.
To get my work product into Project Gutenberg I post the result into some public repository on the web. Then I used to send e-mail to Michael Hart to the effect of "there it is; if you want to add it to Project Gutenberg, go get it." I don't know if he ever did, but that's not my problem. Now that Mr. Hart is gone, I don't know who I should notify to do an end-run around the whitewashers.
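For anyone curious what the font-size heuristic described above might look like, here is a rough sketch in Python. It is not fr2html (that program is written in C and its source is not shown in this thread); it only illustrates the general idea. It assumes, without having checked real FineReader output, that font sizes arrive as <font size="N"> attributes, and the names FontParagraphParser and guess_structure are made up for this example.

```python
from collections import Counter
from html import escape
from html.parser import HTMLParser


class FontParagraphParser(HTMLParser):
    """Collect (text, dominant font size) for each <p> in tag-soup HTML.

    Assumption: sizes appear as <font size="N"> with an integer N.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.paragraphs = []     # finished (text, size) pairs
        self._text = []          # text chunks of the current paragraph
        self._sizes = Counter()  # characters seen per font size
        self._font_stack = []    # sizes of currently open <font> tags
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush()        # in tag soup a new <p> closes the old one
            self._in_p = True
        elif tag == "font":
            size = dict(attrs).get("size")
            ok = size is not None and size.isdigit()
            self._font_stack.append(int(size) if ok else None)

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()
        elif tag == "font" and self._font_stack:
            self._font_stack.pop()

    def handle_data(self, data):
        if not self._in_p:
            return
        self._text.append(data)
        # Innermost open <font> that actually carries a size, if any.
        known = [s for s in self._font_stack if s is not None]
        if known and data.strip():
            self._sizes[known[-1]] += len(data.strip())

    def _flush(self):
        text = " ".join("".join(self._text).split())
        if text:
            size = self._sizes.most_common(1)[0][0] if self._sizes else None
            self.paragraphs.append((text, size))
        self._text, self._sizes, self._in_p = [], Counter(), False

    def close(self):
        super().close()
        self._flush()            # emit a trailing unclosed paragraph


def guess_structure(soup_html):
    """Return XHTML fragments, promoting larger-font paragraphs to <h2>."""
    parser = FontParagraphParser()
    parser.feed(soup_html)
    parser.close()

    # The "common" font is the size that covers the most characters.
    weight = Counter()
    for text, size in parser.paragraphs:
        if size is not None:
            weight[size] += len(text)
    body_size = weight.most_common(1)[0][0] if weight else None

    lines = []
    for text, size in parser.paragraphs:
        bigger = None not in (size, body_size) and size > body_size
        tag = "h2" if bigger else "p"
        lines.append(f"<{tag}>{escape(text)}</{tag}>")
    return "\n".join(lines)
```

Feeding it a few unclosed <P> elements in which one line uses a larger font than the rest should yield an <h2> followed by plain <p> elements. Like the approach described above, a heuristic of this kind will misfire on some books (a large-type epigraph would be promoted to a header, for instance), so the output still needs checking by hand.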

On Thu, December 15, 2011 12:55 am, Bowerbird@aol.com wrote:
the xhtml crowd needs to do some tutorial threads.
you guys seem to have all the answers. so spill 'em! :+)
http://www.hwg.org/opcenter/gutenberg/tutorials.html (circa 2000)
http://www.passkeysoft.com/HTMLeBooks.html (circa 2002)
http://gnpnet.com/free--html--tutorial--ebook--learn--dreamweaver--frontpage...
http://www.unrulyguides.com/2011/10/formatting-101-basics/
http://www.unrulyguides.com/2011/07/html-tutorial-ebook-template-for-kindle-...
http://guidohenkel.com/2010/12/take-pride-in-your-ebook-formatting/
http://www.paulsalvette.com/2011/08/xhtml-tutorial-ebook-formatting.html
http://www.paulsalvette.com/2011/08/turning-manuscript-into-clean-xhtml.html
http://www.jedisaber.com/eBooks/formatsource.shtml
http://www.expertrating.com/courseware/HTMLCourse/HTML_tutorial.asp
participants (4)
- Andrew Sly
- Bowerbird@aol.com
- Lee Passey
- Paul Flo Williams