this week, we looked at the e-book file-formats
which were auto-generated for our sample book.

recall that this is _not_ just some "random" book.
it's a book that has had a very high profile in the
history of book-digitization for the entire planet.

it was also an extremely simple book to mark up,
as it's almost entirely paragraphs or headers, but
d.p. managed to whiff their chance outright, and
that hack-job has sat at p.g., unfixed, for 7 years.

***

it's quite easy to make vague accusations against
project gutenberg, alleging low-quality e-texts...

and people do.  sometimes with sheer nonsense.
(some who _were_ on this list, some who still are.)

it's also extremely easy to _ignore_ vague claims.

especially when some of 'em are sheer nonsense.

so that's how the whitewashers treat vague claims,
_all_ the vague claims -- as nothing but nonsense.

it's much harder to clearly document your criticism.

first, it takes work; and then your audience has to
pay attention in order to understand the problems.

i did the work.  so...  were you paying any attention?

i hope the whitewashers pay attention to my demo too,
and realize what they have to do to prevent shoddy
work like this from being posted in the p.g. library,
which includes telling distributed proofreaders that
the postprocessors will need to step up their game.

i hope that happens.  but i doubt that it will.

it's just too easy, once you start ignoring criticism,
to continue down the comfortable path of keeping
your head in the sand.

but if someone wants to send me a positive sign,
it'd sure be easy to correct the flaws in pg#16736.

(like it woulda been easy to do it right originally.)

***

i hear someone out there asking "couldn't marcello
improve his script, so it would make this file work?"

well, yeah, he could.  and he probably will, now that
i've laid this embarrassing example in front of you...

but his converter ain't the only tool out there that'll
_expect_ headers to be tagged with an [h] header tag.

you do not have to accept the strictest definition of
a "paragraph" that someone (like lee) can dream up,
but you certainly can't tag a header as a "paragraph"
and still expect that anybody will take you seriously.
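
(to make that point concrete, here's a little sketch
-- my own illustration, _not_ marcello's actual code --
of a converter step that builds a table-of-contents by
scanning for header tags.  feed it a header that was
tagged as a paragraph, and the t.o.c. comes up empty.)

    import re

    def build_toc(html_text):
        # collect the text of every <h1>-<h6>, in order
        return [re.sub(r"<[^>]+>", "", m.group(0)).strip()
                for m in re.finditer(r"<h[1-6][^>]*>.*?</h[1-6]>",
                                     html_text, re.DOTALL | re.IGNORECASE)]

    good = "<h2>chapter i</h2><p>the first paragraph...</p>"
    bad = "<p class='chapter'>chapter i</p><p>the first paragraph...</p>"

    print(build_toc(good))   # ['chapter i']
    print(build_toc(bad))    # []  -- the header vanished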

and, in general, it just won't work to try to overcome
gross stupidity at the tag level with "clever scripting".
as the saying goes, just when you think you've built
a fool-proof system, the world devises a bigger fool.

so even if marcello plugs this one hole in the boat,
it won't matter if postprocessors punch a new one.
and if the postprocessors can do anything they want,
poking a hole in the boat is exactly what they will do.

***

and, if you want the bigger picture, well here it is...

lots of books now sit in the d.p. postprocessing queue.

lots and lots and lots and lots of them.

what i am showing here in this current thread is that
post-processing doesn't need to be all that difficult...
i'm basically boiling it down to a single python script.

the d.p. books as they come out of f2 are not as clean
as the text i am working with here.  they have "notes"
in them, and pseudo-markup, and other artifacts, but
they aren't really _that_ far away from what i'm using,
once you take the step of seeing that for yourself.
on the other hand, if you continue to see that job as
"difficult and time-consuming", it continues to be so.

at any rate, that's the bigger picture...

***

i find it interesting that nobody at all has weighed in
on the quality (or lack of it) inherent in the .html code
that's generated by the python script in the latest demo.

this crew regularly lambastes html that is generated by
ms-word, and calibre, and sigil, and everything else, but
not a peep about a converter they can actually influence...

it's as if they don't even _want_ a high-quality converter.

***

z.m.l. input file:
>   http://zenmagiclove.com/grapes009.txt

python script to create .html output:
>   http://zenmagiclove.com/cgi-bin/tday2011.py

and next week we go back to the python script and work
on adding routines for more-complicated structures...

remember how people used to insist that light-markup
was incapable of handling anything but simple books?
i wish i had a nickel for every time someone wrote that.

and when i asked them to show me a hundred books
that were "too complicated" for me to do, guess what?
nobody showed me 100.  or even a dozen.  or even one!

now, surely it must be the case that there are books
so complicated that my z.m.l. cannot handle them!
heck, i _know_ it's true.  i can point to a dozen myself!
but i can't point to many more.  and i don't think that
you can point to many more either.  but just try...  ok?

-bowerbird