re: poofing and tarking

craig said:
you poofing friggleschnitz!
mommy, mommy, craig is starting a flamewar! he called me a poofing friggleschnitz, mommy! i demand that he be banned! ;+)

***

david said:
One and the same, yes.
great! :+) and you did a little bit of work on... what was it?... plucker, right? ;+) if you are prepared to tackle the i.m.d.b., 15,000 e-texts should be a piece of cake...
...much like bringing 5 friends into a video store and trying to agree on one movie for everyone to watch. Not going to happen ;)
i wish things were that inconsequential. but the project gutenberg e-texts make up the most important e-library, historically. although even now it's starting to be dwarfed by other efforts, it would be a proper tribute to michael if it were to be well-maintained...
That being said, I'd be interested in seeing a list of the tools people know of, are working on, or have worked with in the past that can take a 7-bit ascii PG text and convert it into other formats.
"convert" is a rather loose and unspecific word, wouldn't you say? :+) nonetheless, i'll cut to the chase... the main problem with the e-texts is their formatting is _so_ inconsistent. so before you can do anything useful with them, you must write routines that can resolve their inconsistency. the inconsistency is very maddening, because it's so pointless. although some is understandable, considering how many hands created the e-texts, the sadder truth is that much of it could have been prevented; however, mr. newby and company simply fail to grasp the negative consequences of the inconsistency, and thus never made it their priority to minimize it. the good news is you _can_ write routines that will fix the problem. it is _not_ impossible, just thorny; the biggest expenditure of time is a quality-control check to make sure that you knew every inconsistency. their variety will amaze and astound. subsequent conversion to any format is straightforward once you have done the job of resolving the inconsistency. you don't even have to do that job, if you don't want to, you can just go to david moynihan at blackmask and get his files, as he has edited out almost all the inconsistency, which is what then allowed him to make a half-dozen versions of most e-texts in the entire library. if you're looking for explicit info, ron burkey did a converter called "gutenmark", and his website at http://www.sandroid.org/gutenmark does a good job of documenting the inconsistency he faced on the way, before he gave up the effort, saying:
the more perfect my automated conversions became, the farther (in my own mind) I seemed to be from having a perfect conversion.
i think that's a nice way of saying that the more he learned about the e-texts, the more he found out how bad they are, from the standpoint of consistency...

there is also some basic information at: palmdigitalmedia.com/dropbook/converting

but i'd guess that at this point in time, moynihan will have the most expertise about the problems you would be facing. much of it might be inside his noggin, but i do know he has a _lot_ of macros that undoubtedly embed gobs of wisdom. and, more to the point, david has shown, incontrovertibly, that mass conversions to a plethora of formats are fully possible. recently, david even _offered_ his files to project gutenberg, but -- as far as i know -- his gift was spurned, for some bizarre reason i'll never be able to grasp.

oh yeah, i've written some routines that squash out most of the inconsistency, and there's a way you could pry 'em out of me -- namely, if you got support for my z.m.l. (zen markup language) built into plucker. it's a simple rule-set; you could probably have it up-and-running in a couple days... backchannel me if you're interested. :+)

once you've vanquished the inconsistency, there are other concerns, which might or might not be a problem for you, including:

1. errors in the e-texts, lots of them.
2. styling lost or converted to all-caps.
3. information about images discarded.
4. image filenames are often not unique.
5. accents lost in many foreign e-texts.
6. a confusing redundancy of some books.
7. attacks levied if you reveal problems.

oh yeah, also make sure that you are always working with the freshest e-texts available, as i'm not sure if they make an announcement whenever they make corrections to an e-text; they just quietly substitute in the new file...

***

i would welcome you here, but i am on my way out the door _very_ soon... :+) there are a handful of tarking naugshlocks here _so_ unworthy of my help they made me decide to decline to do any work for project gutenberg, in spite of its great historical importance and my highest regard for the genius of michael hart. i'm sure others, like you, will cover my absence, while i will be happy grazing greener pastures...

at any rate, have a nice day... ;+)

-bowerbird
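For reference, a minimal sketch (in python) of the kind of normalization routine bowerbird describes above. The specific fixes it applies (line-ending cleanup, header/footer stripping, paragraph rewrapping) are illustrative assumptions, not his actual rules:

    import re

    def normalize_pg_text(raw):
        """One possible normalization pass over a plain-ascii e-text (sketch)."""
        # unify line endings first.
        text = raw.replace("\r\n", "\n").replace("\r", "\n")
        # strip the legal header/footer if the usual markers appear.
        # (the marker wording varies between e-texts, which is itself
        # one of the inconsistencies complained about above.)
        start = re.search(r"\*\*\* ?START OF.*", text, re.IGNORECASE)
        if start:
            text = text[start.end():]
        end = re.search(r"\*\*\* ?END OF.*", text, re.IGNORECASE)
        if end:
            text = text[:end.start()]
        # re-join hard-wrapped lines so each paragraph is one line,
        # keeping blank lines as the paragraph breaks.
        paras = [" ".join(p.split()) for p in re.split(r"\n\s*\n", text)]
        return "\n\n".join(p for p in paras if p)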

and you did a little bit of work on... what was it?... plucker, right? ;+)
Quite a bit more than "a little", but yes, that's me.
if you are prepared to tackle the i.m.d.b., 15,000 e-texts should be a piece of cake...
Yep, once the structures are laid out for the classifications of works covered by Gutenberg. Much of this has already been done by hundreds of contributors over the years.
although even now it's starting to be dwarfed by other efforts, it would be a proper tribute to michael if it were to be well-maintained...
What other efforts are you alluding to? Why not help the people who insist on reinventing a fleet of new wheels to collaborate with existing projects that have similar or identical goals?
"convert" is a rather loose and unspecific word, wouldn't you say? :+)
Yes, and specifically chosen for that reason. Gutenberg etexts are nonspecific, and "converting" them means taking a slightly different approach depending on what I'm converting: poems, plays, books, and so on, for each work. You can't use a single rigid approach for all works.
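As a rough illustration of that type-sensitive dispatch (the verse heuristic and its threshold here are invented for the example, not Plucker's actual logic):

    import re

    def looks_like_verse(text):
        """crude heuristic: verse tends toward short, deliberately broken lines."""
        lines = [ln for ln in text.splitlines() if ln.strip()]
        if not lines:
            return False
        return sum(len(ln) for ln in lines) / len(lines) < 40  # invented threshold

    def convert(text):
        if looks_like_verse(text):
            # line breaks are part of the work, so keep every one of them.
            return "<br/>\n".join(text.splitlines())
        # prose: the paragraph, not the hard-wrapped line, is the unit.
        paras = re.split(r"\n\s*\n", text)
        return "\n\n".join("<p>" + " ".join(p.split()) + "</p>"
                           for p in paras if p.strip())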
the main problem with the e-texts is their formatting is _so_ inconsistent. so before you can do anything useful with them, you must write routines that can resolve their inconsistency.
And this is exactly what the Distributed Proofreaders project proposes to solve, and they've been pretty successful thus far, IIRC.
the inconsistency is very maddening, because it's so pointless. although some of it is understandable, considering how many hands created the e-texts, the sadder truth is that much of it could have been prevented; however, mr. newby and company simply fail to grasp the negative consequences of the inconsistency, and thus never made it their priority to minimize it.
I've had a lot of luck stepping out of the box and analyzing the text based on the "style" of the text, versus the actual content itself. I was approached by someone who is doing a paper and his PhD thesis on exactly this kind of approach. Basically (with my expertise and help), he's taking the bulk of Gutenberg, importing every word from every work into a database, and then running his own algorithms across the entire collection to pull out the styles of known authors.

For example, with his approach, you can determine that a work claiming to be by "A. Einstein" is by the same author as one claiming to be by "Albert Einstein" (S. Clemens -> Mark Twain -> Samuel Clemens, etc.). From there, you can then begin correcting the inaccuracies in the titling, authoring, and inflection of the work itself, including basic things like sentence structure, spelling, and so on.

I've extended the schema quite a bit to allow some other interesting queries to be run ("Show me all works larger than 100 pages, written by male authors between the years 1951 and 1957"). With that done, it is a (relatively) simple matter to convert the 7-bit ascii text to something more manageable, such as structured XML plus an associated DTD to turn that into something else.
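To make that concrete, here is a sketch of the sort of schema and query described, using sqlite; every table and column name is invented for illustration and not taken from the actual thesis work:

    import sqlite3

    conn = sqlite3.connect("gutenberg.db")
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS authors (
            id     INTEGER PRIMARY KEY,
            name   TEXT,   -- canonical form, e.g. 'Samuel Clemens'
            gender TEXT    -- part of the extended metadata
        );
        CREATE TABLE IF NOT EXISTS aliases (
            alias     TEXT,  -- e.g. 'Mark Twain', 'S. Clemens'
            author_id INTEGER REFERENCES authors(id)
        );
        CREATE TABLE IF NOT EXISTS works (
            id        INTEGER PRIMARY KEY,
            title     TEXT,
            author_id INTEGER REFERENCES authors(id),
            pages     INTEGER,
            year      INTEGER
        );
    """)

    # the kind of query the extended schema allows:
    rows = conn.execute("""
        SELECT w.title, a.name
          FROM works w JOIN authors a ON a.id = w.author_id
         WHERE w.pages > 100
           AND a.gender = 'male'
           AND w.year BETWEEN 1951 AND 1957
    """).fetchall()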
the good news is you _can_ write routines that will fix the problem. it is _not_ impossible, just thorny; the biggest expenditure of time is a quality-control check to make sure that you've caught every inconsistency. their variety will amaze and astound.
And I assume you've done this? And your routines are made public somewhere, so others can improve and correct them? I don't recall seeing a URL to download your code or routines. Can you reply back with that, so we can take a look?
you don't even have to do that job if you don't want to; you can just go to david moynihan at blackmask and get his files, as he has edited out almost all the inconsistency, which is what then allowed him to make a half-dozen versions of most e-texts in the entire library.
And where is his code? Where are his "routines"? I don't see them on his site at all. I'll send him an email later this week to see if he wants to contribute those back. All of the talk about how "easy" this is, is completely irrelevant if nobody actually contributes that knowledge back so others can improve on and benefit from it. If you're not willing to do this, then our conversation stops here. There is no point in continuing the discussion if you intend to retain "control" of this kind of logic within your own circle of projects.
if you're looking for explicit info, ron burkey did a converter called "gutenmark", and his website at http://www.sandroid.org/gutenmark does a good job of documenting the inconsistency he faced on the way, before he gave up the effort, saying:
I've talked to Ron before via email, and described some of my needs for improvements to his tool. He's no longer maintaining it, so it is up to me (if I choose) to update his code and improve it further.
recently, david even _offered_ his files to project gutenberg, but -- as far as i know -- his gift was spurned, for some bizarre reason i'll never be able to grasp.
What was that "bizarre reason"? Is he still on this list? Did anyone else obtain his code? Does it exist out there for download?
oh yeah, i've written some routines that squash out most of the inconsistency, and there's a way you could pry 'em out of me -- namely, if you got support for my z.m.l. (zen markup language) built into plucker. it's a simple rule-set; you could probably have it up-and-running in a couple days... backchannel me if you're interested. :+)
Not interested. Our code is freely available. If you want someone to support "your" format, then you'll probably have to take that first step by justifying and documenting it. The only page I could find describing the format is here: http://czt.sourceforge.net/zml/ and I assume that's not your project or code. If it is anything other than HTML, it would require significant re-engineering of the core parser components used in Plucker, and a lot of testing to make sure it didn't break anything in the existing parser in the process. In other words, not a couple of days of effort, as you suggest.
once you've vanquished the inconsistency, there are other concerns, which might or might not be a problem for you, including:

1. errors in the e-texts, lots of them.
What kind of errors? Incorrect hyphens? Broken paragraphs? Missing end quotes? (this is common)
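For what it's worth, the missing-end-quote case is easy to screen for mechanically. A sketch, with the caveat that speech quoted across several paragraphs legitimately leaves quotes open, so this flags suspects rather than proving errors:

    import re

    def paragraphs_with_odd_quotes(text):
        """return indexes of paragraphs whose double quotes don't pair up."""
        suspects = []
        for i, para in enumerate(re.split(r"\n\s*\n", text)):
            if para.count('"') % 2 == 1:
                suspects.append(i)
        # multi-paragraph quotations re-open the quote at each paragraph,
        # so these need a human eye, not an automatic fix.
        return suspects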
2. styling lost or converted to all-caps.
Impossible to regain unless you have the original work in hand to see whether actual CAPS were used or not. Maybe the "errors" were intentional. Many authors use poetic license to express their thoughts, and sometimes those things break the rules of grammar and spelling.
3. information about images discarded.
Same, see above.
4. image filenames are often not unique.
How do you mean? You mean 1.jpg 1.jpg 1.jpg appearing in three places, but intended to represent 3 _different_ images? Where do you see this inconsistency? Give me an example of a Gutenberg work that shows this. I'd like to verify it for myself.
5. accents lost in many foreign e-texts.
Seems to be a problem with the auditor/editor's charset, or with support for those charsets in their editor. I agree that the original nature and charset of the document should be retained. How do you express a Cyrillic text in 7-bit ascii? You can't.
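A tiny illustration of that loss, assuming the flattening worked by decomposing accented characters and then dropping everything outside 7-bit ascii:

    import unicodedata

    def flatten_to_ascii(s):
        # decompose accented characters, then drop anything non-ascii.
        return (unicodedata.normalize("NFD", s)
                .encode("ascii", "ignore")
                .decode("ascii"))

    print(flatten_to_ascii("déjà vu"))  # -> 'deja vu' (the accents silently vanish)
    print(flatten_to_ascii("Пушкин"))   # -> ''        (cyrillic has no ascii fallback)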
6. a confusing redundancy of some books.
Such as?
7. attacks levied if you reveal problems.
Are you revealing the "problems" in a condescending way, or in a constructive way? The way you approach the "Hey, this is broke" process is very telling as to how you will be received and responded to.
oh yeah, also make sure that you are always working with the freshest e-texts available, as i'm not sure if they make an announcement whenever they make corrections to an e-text; they just quietly substitute in the new file...
...which is exactly why you should have your own mirror of Gutenberg, or a subset of it, as you work on the pieces.
there are a handful of tarking naugshlocks here _so_ unworthy of my help they made me decide to decline to do any work for project gutenberg, in spite of its great historical importance and my highest regard for the genius of michael hart. i'm sure others, like you, will cover my absence, while i will be happy grazing greener pastures...
If you are "moving on", then it behooves you to try to contribute what you've learned (in terms of knowledge, code, or "routines") back to those who will continue to contribute and learn. We're only here to help the next generation learn and improve. If we're not leaving anything here by which others can remember us and grow themselves; if we're not teaching others as we learn ourselves, then what is the point? David A. Desrosiers desrod@gnu-designs.com http://gnu-designs.com