
craig said:
you poofing friggleschnitz!
mommy, mommy, craig is starting a flamewar! he called me a poofing friggleschnitz, mommy! i demand that he be banned! ;+)

***

david said:
One and the same, yes.
great! :+) and you did a little bit of work on... what was it?... plucker, right? ;+) if you are prepared to tackle the i.m.d.b., 15,000 e-texts should be a piece of cake...
...much like bringing 5 friends into a video store, and trying to agree on one movie for everyone to watch. Not going to happen ;)
i wish things were that inconsequential. but the project gutenberg e-texts make up the most important e-library, historically. although even now it's starting to be dwarfed by other efforts, it would be a proper tribute to michael if it were to be well-maintained...
That being said, I'd be interested in seeing a list of the tools people know of, or are working on, or have worked with in the past, that can be used to take a 7-bit ascii text PG work, and convert it into other formats.
"convert" is a rather loose and unspecific word, wouldn't you say? :+) nonetheless, i'll cut to the chase...

the main problem with the e-texts is their formatting is _so_ inconsistent. so before you can do anything useful with them, you must write routines that can resolve their inconsistency.

the inconsistency is very maddening, because it's so pointless. although some is understandable, considering how many hands created the e-texts, the sadder truth is that much of it could have been prevented; however, mr. newby and company simply fail to grasp the negative consequences of the inconsistency, and thus never made it their priority to minimize it.

the good news is you _can_ write routines that will fix the problem. it is _not_ impossible, just thorny; the biggest expenditure of time is a quality-control check to make sure that you knew every inconsistency. their variety will amaze and astound.

subsequent conversion to any format is straightforward once you have done the job of resolving the inconsistency. you don't even have to do that job, if you don't want to, you can just go to david moynihan at blackmask and get his files, as he has edited out almost all the inconsistency, which is what then allowed him to make a half-dozen versions of most e-texts in the entire library.

if you're looking for explicit info, ron burkey did a converter called "gutenmark", and his website at http://www.sandroid.org/gutenmark does a good job of documenting the inconsistency he faced on the way, before he gave up the effort, saying:
the more perfect my automated conversions became, the farther (in my own mind) I seemed to be from having a perfect conversion.
i think that's a nice way of saying that the more he learned about the e-texts, the more he found out how bad they are, from the standpoint of consistency...

there is also some basic information at:
palmdigitalmedia.com/dropbook/converting

but i'd guess that at this point in time, moynihan will have the most expertise about the problems you would be facing. much of it might be inside his noggin, but i do know he has a _lot_ of macros that undoubtedly embed gobs of wisdom.

and, more to the point, david has shown, incontrovertibly, that mass conversion to a plethora of formats is fully possible. recently, david even _offered_ his files to project gutenberg, but -- as far as i know -- his gift was spurned, for some bizarre reason i'll never be able to grasp.

oh yeah, i've written some routines that squash out most of the inconsistency, and there's a way you could pry 'em out of me -- namely, if you got support for my z.m.l. (zen markup language) built into plucker. it's a simple rule-set; you could probably have it up-and-running in a couple days... backchannel me if you're interested. :+)

once you've vanquished the inconsistency, there are other concerns, which might or might not be a problem to you, including:
1. errors in the e-texts, lots of them.
2. styling lost or converted to all-caps.
3. information about images discarded.
4. image filenames are often not unique.
5. accents lost in many foreign e-texts.
6. a confusing redundancy of some books.
7. attacks levied if you reveal problems.

oh yeah, also make sure that you are always working with the freshest e-texts available, as i'm not sure if they make an announcement whenever they make corrections to an e-text; they just quietly substitute in the new file...

***

i would welcome you here, but i am on my way out the door _very_ soon... :+)

there are a handful of tarking naugshlocks here _so_ unworthy of my help they made me decide to decline to do any work for project gutenberg, in spite of its great historical importance and my highest regard for the genius of michael hart.

i'm sure others, like you, will cover my absence, while i will be happy grazing greener pastures...

at any rate, have a nice day... ;+)

-bowerbird