a case of deliberate sabotage by a p.g. volunteer

jim,
i just took a look at "wings of the dove" -- pg#29452, which you post-processed -- and i'm troubled by what i discovered there. ok, maybe "troubled" is a bit melodramatic, much like the subject-line on this post, but i don't really think it's all _that_ farfetched...
what i found is that, in the .html version of the book, you showed the italics properly... good job. in the .txt version, however, you deliberately deleted the italics markers which formatters had invested considerable work in inserting... bad job!
the text version -- this was the 8-bit file -- was also missing a handful of diacritics in it:
Seen at a foreign table d'hôte, he suggested
Brünig (several cases of this one)
word--à bientôt!--across
You're blasé, but you're not enlightened.
wasn't it, à peu près, what all Matcham were
inespérées, were pure manna
in some books, those missing diacritics would be a big issue. here, they're fairly uncommon, and thus do not really constitute a very big deal.
but the missing italics? they are a major problem. and it's a problem that _you_ introduced yourself. you're supposed to use underscores for the italics. (go ahead, read the instructions, it says it clearly.) you're definitely _not_ supposed to remove them!
and i must say, it takes a lot of gall for you to do this deliberate sabotage of the plain-text file and _then_ come here to complain because that file is inferior... of course it's inferior! you made it so! you went out of your way to make it substandard. if i would've done that, i'd be ashamed of myself.
i'm also disturbed that the whitewashers allowed this intentionally-disfigured file into the library... but that's another matter, a fight for another day.
-bowerbird

I assure you, Bowerbird, that contrary to your comments I did not "deliberately" disfigure the text file, and I would appreciate it if you retracted your comments. In any case the "formatters" you refer to would be myself. An army of one. Also, I never rewrap books of my own volition, but only as required for a submission to be accepted by PG.
What you see posted by PG is not necessarily the same thing as I would choose to submit to PG [nor identical to what I did in fact submit to PG], which in my case would at this point probably be HTML, although I can imagine that at some point, with good tools, TEI might be more interesting to me. If you are unhappy with HTML as an input submission format then I recommend writing a simple parser for HTML that changes the HTML choice of tags to the tags you prefer. If you wrote such a parser I suspect you could contribute it to PG, where it would represent a positive contribution to the many volunteers like myself who would prefer to be submitting in HTML format in the first place. In practice HTML encodes most of what I as a volunteer would choose to spend my time and energy transcribing, but I wish it had a little more power, such as the ability to unambiguously encode authorfirstname, authorlastname, chapter divisions, etc.
What I do do for PG represents considerable sacrifice to myself and my family, as I am sure my wife and children would be only too happy to attest. If you think you have something positive to contribute to PG, please do so. Abusing me for my choice of which sacrifices I am willing to make, or not willing to make, does not represent a contribution to PG, nor does it encourage my continuing contributions to PG.
The EPUB was not generated by me, nor do I have any great knowledge of the EPUB format.
I assume that some other volunteer at PG has written a tool to automatically generate EPUB from HTML, and that volunteer did so with some choice of margins you do not prefer, or which doesn't work well with your choice of machine. I don't know how to fix this problem, but it does point out the advantages of TEI, which allows one document to encode the various "hints" necessary for attractive rendering of the one TEI input file into various output rendering language targets.
I also did not generate the MOBI, but I use MOBI files all the time with my favorite reader machine. The MOBI that some volunteer at PG, not me, has generated looks beautiful on my choice of machine, which also allows me to change the size of the font and the margins to my liking, which tends to depend on the time of day - by midnight my eyes get tired and then I tend to like a larger font and smaller margins. Which is why I like reflow formats and reader machines - they allow me to easily "fix" many of the day-to-day "poor choices" that someone else has made which would otherwise get in the way of MY being able to enjoy the book the way *I* want. Presumably this other volunteer DID generate the MOBI file in a way that looked attractive to him or her on his or her choice of machines, which needn't be identical to my preferences - especially since my preferences tend to change with the time of day!
My machine also works well with PDF files, except that I can't fix issues like when the person or process generating the PDF uses a "poor" choice of font, or a poor choice of margins, when read on my machine. I can sometimes work around these problems by holding my machine in landscape mode and displaying only half a page of PDF at a time, but it tends to be awkward and painful to hold the machine sideways for a length of time, and PDF often doesn't like to be read half a page at a time - since it is a page layout language, not a half page layout language.
Which is why I tend to prefer reflow formats like MOBI or HTML over PDF.
However, at the very least the acidity of Bowerbird's remarks reaffirms my contention that PG needs to allow volunteers like myself to submit files in the volunteer's choice of file formats, NOT Bowerbird's. In which case I could have offered PG my efforts in one file format, and PG could have chosen to accept or reject that offering. If PG chose to accept that offering then hopefully neither Bowerbird nor any other volunteer would abuse me for my efforts, which PG has then already acknowledged. Rather, that volunteer would (hopefully) acknowledge that PG had already accepted my contribution, and in turn if they felt they could make further positive contributions to this book, or any other book, in that file format or in any other file format, then they would be free to do so.
Unfortunately, there is not a universal sense within the PG community as to what does or does not represent a positive contribution, which in turn leads to that unhappy state of affairs which Bowerbird is only too aptly demonstrating today. Again I ask that PG seriously consider allowing volunteers to submit books using only ONE file format if they choose to do so, rather than requiring multiple file formats.
Better yet, pick YOUR OWN book to transcribe and contribute to PG, rather than abusing ME for MY efforts on MY choice of books!

On Mon, Sep 21, 2009 at 2:47 PM, James Adcock <jimad@msn.com> wrote:
> If you think you have something positive to contribute to PG, please do so. Abusing me for my choice of which sacrifices I am willing to make, or not willing to make, does not represent a contribution to PG, nor does it encourage my continuing contributions to PG.
Which is why I've killfiled Bowerbird, and I believe that PG should permanently eject him from their mailing lists.
> However, at the very least the acidity of Bowerbird's remarks reaffirms my contention that PG needs to allow volunteers like myself to submit files in the volunteer's choice of file formats, NOT Bowerbird's.
That's not a rational argument. Whatever the base file formats are, Project Gutenberg, like most archives, needs to pick one or a small set of them so that the people who use Project Gutenberg can know what they need to read the files. A PG text reader can't be expected to understand any file that anyone cares to use, and nobody can be expected to understand Word 95 files and similar garbage that infects indiscriminate archives.
--
Kie ekzistas vivo, ekzistas espero.

> Whatever the base file formats are, Project Gutenberg, like most archives, needs to pick one or a small set of them so that the people who use Project Gutenberg can know what they need to read the files.
Again, this is confusing input file formats with output file formats. PG could choose to allow HTML as an acceptable input file format because PG can easily write a tool to convert HTML to their choice of PG TXT file format, including standardizing on such issues as whether italics ought to be rendered in PG TXT files as *star* or +plus+ or _underscore_ or SHOUT. Or better yet, maybe PG could allow these kinds of choices to be made by an output filter, so that text readers for the blind could have something more compatible with their prosodic emphasis machines. Or better still, maybe the output filters could actually implement some of the "proper" prosodic emphasis markings for the more popular blind reader machines in order to maximize their capabilities.
In my experience what happens is just the opposite of what you might expect. The first-time user of PG picks up a PG TXT file because they think it represents the "lowest common denominator" for their machine, and so they think "it must surely work". What they find instead is that what gets displayed on their machine is a total hash of line breaks in nonsensical locations, and random garbage marks, and then they conclude PG is archaic brain-dead stuff made by people who are clueless, and they give up and go away. Or alternatively they post stupid stuff on public forums like "gee I like all these free books from PG and I read them all the time even though they have these random line-breaks stuck in all over the place" -- which in turn makes the PG volunteers look like clueless idiots.
There are other sites which take PG texts and do intelligent things like "tell me what kind of machine you are reading on and I will suggest which of the many file formats will probably display to your liking on your machine", which I think in practice tends to result in happier customers.
Right now PG is still basically assuming that the average PG "customer" is a die-hard hacker running some flavor of a *nix machine in a college environment. Which is probably [somewhat] true of the people submitting books, but not at all true of the people who would just like to read them.
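[Editorial sketch] The output-filter idea discussed above (one HTML input, with the plain-text italics convention chosen at conversion time) could look roughly like the following. This is only a minimal sketch under stated assumptions, not a tool PG is known to use: the names `ITALIC_MARKERS`, `ItalicsFilter`, and `html_to_text` are hypothetical, only `<i>` and `<em>` are treated as italics, and all other tags are simply dropped while their text is kept. It relies on Python's standard `html.parser` module.

```python
from html.parser import HTMLParser

# Hypothetical table of plain-text italics conventions (open, close markers).
# "shout" is handled separately by upper-casing the italic text.
ITALIC_MARKERS = {
    "underscore": ("_", "_"),
    "star": ("*", "*"),
    "plus": ("+", "+"),
}


class ItalicsFilter(HTMLParser):
    """Collects text from HTML, marking <i>/<em> spans in the chosen style."""

    def __init__(self, style="underscore"):
        if style != "shout" and style not in ITALIC_MARKERS:
            raise ValueError(f"unknown italics style: {style!r}")
        # convert_charrefs=True turns entities such as &eacute; into characters.
        super().__init__(convert_charrefs=True)
        self.style = style
        self.depth = 0      # nesting depth of italic tags
        self.parts = []     # output fragments, joined at the end

    def handle_starttag(self, tag, attrs):
        if tag in ("i", "em"):
            # Emit the opening marker only for the outermost italic tag.
            if self.depth == 0 and self.style != "shout":
                self.parts.append(ITALIC_MARKERS[self.style][0])
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in ("i", "em") and self.depth > 0:
            self.depth -= 1
            if self.depth == 0 and self.style != "shout":
                self.parts.append(ITALIC_MARKERS[self.style][1])

    def handle_data(self, data):
        if self.depth > 0 and self.style == "shout":
            data = data.upper()
        self.parts.append(data)


def html_to_text(html, style="underscore"):
    """Return the text of an HTML fragment with italics in the given style."""
    parser = ItalicsFilter(style)
    parser.feed(html)
    parser.close()  # flushes any text still buffered by the parser
    return "".join(parser.parts)


if __name__ == "__main__":
    sample = "wasn't it, <i>&agrave; peu pr&egrave;s</i>, what all Matcham were"
    for style in ("underscore", "star", "plus", "shout"):
        print(style, "->", html_to_text(sample, style))
```

A filter of this shape keeps the diacritics and the emphasis in the single HTML source, and defers the underscore-versus-star question to whoever generates the plain-text file. Rewrapping and paragraph handling are deliberately left out of the sketch.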
participants (3)
- Bowerbird@aol.com
- David Starner
- James Adcock