
once again, someone over at distributed proofreaders has raised the issue of people selling hard-copy of p.g. e-texts at amazon.com... so a brand-new newcomer, "exilefromgroggs", posted this:
I would suggest an alternative strategy. Set PG up as an Amazon seller (yes, seller!). Charge $1/£1 per book for books that are bought that way, and point out that 100% of the money that PG receives will be ploughed back into the project to make further books available. In the seller's blurb for the book, point out that the books can, in fact, be downloaded for free from PG's website, and include a link. I would imagine that being up-front and open, and being clearly linked to a "good cause" would mean that PG would rapidly become a high-profile seller in its own right through Amazon, and would take the wind out of the sails of people who are trying to make a fast buck for no effort.
because, you understand, that's always the exact way that those d.p. volunteers picture these amazon resellers -- as "people who are trying to make a fast buck for no effort." i fully understand that most of these resellers put as little effort as possible into churning out the "product" they sell. but the truth of the matter is that it takes time and money to turn an e-text into a printed book, and even more work to handle distribution, and the profits aren't _that_ great... but nobody bothered to tell that to the brand new newcomer. because doing so would force the existing volunteers there to confront the fact that their workflow doesn't have a way to create a decent print product. the ascii text is unstyled. the .html product is formatted for a web-browser, not print. the e-book products are formless blobs that barely work in the machines for which they're intended, let alone for print. that's why the vast majority of the resellers start with the text-file, instead of the .html file. so all of the work that post-processors put into making the .html file "look good" is just wasted energy. every one of these html-books is a "snowflake" -- unique unto itself, per the post-processor -- and it's far too much work to try to figure out each one, so you could port it to print. or, for that matter, anything else, including e-books, which is why those turn out rather badly. with a library this big, there has to be some standardization, or else you'll be unable to keep the files updated over time... of course, it's _easy_ to say that now, with all the experience we have from having the books _not_ being _convertible_ to the formats we want. but some of us said it all _years_ ago, that letting each post-processor go off in their own direction was a sure-fire way to make sure their work'd be short-lived. -bowerbird p.s. there is also _another_ newcomer over there who wants to program an html5 web-app that interacts with the d.p. site, and nobody is bothering to tell him not to waste his time and energy, because even if he codes it up, the powers-that-be will ignore it. p.p.s. ...waving back to lucy... :+)

...that's why the vast majority of the resellers start with the text-file, instead of the .html file. so all of the work that post-processors put into making the .html file "look good" is just wasted energy.
Whether or not making the html file "look good" is wasted energy, the html typically also contains a lot of glyph-related coding which is not correctly specified in the txt file, which is why the stuff sold by the vast majority of resellers ends up bearing little resemblance to what the author actually wrote, and/or the publisher actually printed. Also, the complaint I've seen before at DP is when some DP participant scarfs off a copy of the book prior to when DP sends it to PG, such that the DP volunteers' efforts are "stolen" before the book ever even sees the light of day. I would hope that more people would understand that once a book is posted on PG then it becomes "fair game" [regardless of how "moral" any particular person feels such republishing is at that point.]

On 10/11/2011 06:16 AM, Jim Adcock wrote:
Whether or not making the html file "look good" is wasted energy
It is not only wasted energy but actual damage.
DP is run like a tea party and produces pretty embroidery, but that embroidery only fits the producer's tea table.
There's very little knowledge at DP about computer technology or book design nor any desire to acquire it. (There are a few knowledgeable individuals, but they are drowned out by the crowd.)
At the same time DP is obsessed about rank and procedure: They have `General Managers´ and `DP Boards´ and `Codes of Conduct´ and a bigger part of the software is concerned with ranking proofers than with assisting those proofers with their jobs.
DP has imploded and there's no indication that they will or want to reform themselves. Time to start over.
I'm interested in and willing to offer all technical support I'm capable of to a new DP built around these guidelines:
1. Use one master format for every book. (There will be a small set of master formats to choose from.)
2. Minimize formatting. Make books that are usable across a wide variety of devices, not books that look exactly like the paper edition.
3. Use a resource control system (like git) for posting and maintenance. PG will host the master repository and the public can pull from it. A group of `committers´ can push. Every committer can have his own group of aides and pull from them.
4. Use already scanned material: IA, Google, Gallica etc.
5. Important works first. Don't bother with those embarrassing amateurish works DP turns out by the hundreds.
6. Accept unicode only.
-- Marcello Perathoner webmaster@gutenberg.org

Okay, so I'm curious... I think I understand where all the other points are coming from, but why 4.? Why would you want to stop people from scanning books that aren't already available online? And I'm also slightly curious about 5., I guess. Who would decide what's "important", and how? Jana On Oct 11, 2011, at 11:39, Marcello Perathoner wrote:
On 10/11/2011 06:16 AM, Jim Adcock wrote:
Whether or not making the html file "look good" is wasted energy
It is not only wasted energy but actual damage.
DP is run like a tea party and produces pretty embroidery, but that embroidery only fits the producer's tea table.
There's very little knowledge at DP about computer technology or book design nor any desire to acquire it. (There are a few knowledgeable individuals, but they are drowned out by the crowd.)
At the same time DP is obsessed about rank and procedure: They have `General Managers´ and `DP Boards´ and `Codes of Conduct´ and a bigger part of the software is concerned with ranking proofers than with assisting those proofers with their jobs.
DP has imploded and there's no indication that they will or want to reform themselves. Time to start over.
I'm interested in and willing to offer all technical support I'm capable of to a new DP built around these guidelines:
1. Use one master format for every book. (There will be a small set of master formats to choose from.)
2. Minimize formatting. Make books that are usable across a wide variety of devices, not books that look exactly like the paper edition.
3. Use a resource control system (like git) for posting and maintenance. PG will host the master repository and the public can pull from it. A group of `committers´ can push. Every committer can have his own group of aides and pull from them.
4. Use already scanned material: IA, Google, Gallica etc.
5. Important works first. Don't bother with those embarrassing amateurish works DP turns out by the hundreds.
6. Accept unicode only.

On 10/11/2011 02:36 PM, Jana Srna wrote:
I think I understand where all the other points are coming from, but why 4.? Why would you want to stop people from scanning books that aren't already available online?
Because it saves time, and you won't end up doing all those long forgotten (for good reason) books you can buy for 1 cent.
And I'm also slightly curious about 5., I guess. Who would decide what's "important", and how?
There are lots of lists you could consult. I've seen one Schnitzler coming out of DP lately but not much else. Schnitzler, Wittgenstein, Freud, Hofmannsthal are important. The campfire girls of mars, which we churn out by the hundreds, are not important. -- Marcello Perathoner webmaster@gutenberg.org

On 10/11/2011 9:16 AM, Marcello Perathoner wrote:
On 10/11/2011 02:36 PM, Jana Srna wrote:
I think I understand where all the other points are coming from, but why 4.? Why would you want to stop people from scanning books that aren't already available online?
Because it saves time, and you won't end up doing all those long forgotten (for good reason) books you can buy for 1 cent.
One thing I learned in all my time at DP is that there is always *someone* who finds even the most obscure books important. I've always felt that the "classics" will certainly be made available, if not by PG* then by various academic or profit-making organizations. What will be overlooked are the more obscure works. The things that only a few people will value but who will be extremely glad to have them available. Yet another school book on American History? The student doing comparison of how views of American history changed over time will be delighted to have a decent sized corpus to work with. Yet another silly children's series book? Well, the website called something like Not Quite Nancy Drew seems to value them highly and has folks who are definitely interested in them. Obscure and detailed articles from the journal of the American Society of Civil Engineers about the train tunnels into and out of Manhattan? Those got a thank you letter from someone who had been searching for that material for years. Changing the subject, I believe that one of the reasons the DP-EU and DP-Canada sites have not (yet) had the success of the original DP has to do with the difficulty in building a community of volunteers. Getting a critical mass takes time (last I looked, DP-Canada was making nice progress) and some other ingredient that I don't understand. Volume comes with number of volunteers. DP has many faults, it is true, not least of which is slowness to change, but starting up a new site and expecting it to quickly achieve the volumes that DP currently does is unrealistic. On the other hand, there are so many books out there waiting to be turned into proper ebooks that having a dozen sites like DP would only benefit our (PG's) purpose of making lots of ebooks freely available. JulietS *It does get frustrating to make new, better proofed, better illustrated versions of classics only to have the PG website continue to list the older, more problematic versions more prominently.

On 10/12/11 3:01 AM, Juliet Sutherland wrote:
One thing I learned in all my time at DP is that there is always *someone* who finds even the most obscure books important. I've always felt that the "classics" will certainly be made available, if not by PG* then by various academic or profit-making organizations. What will be overlooked are the more obscure works. The things that only a few people will value but who will be extremely glad to have them available.
Hear, hear!
Changing the subject, I believe that one of the reasons the DP-EU and DP-Canada sites have not (yet) had the success of the original DP has to do with the difficulty in building a community of volunteers. Getting a critical mass takes time (last I looked, DP-Canada was making nice progress) and some other ingredient that I don't understand. Volume comes with number of volunteers. DP has many faults, it is true, not
A bunch of reasons, other than critical mass (which is obviously a factor), I can think of:
- the last time I looked at DP-Canada and DP-EU their workflow was vastly different from that of DP. Which is not necessarily a bad thing, given the problematic workflow of DP. It is just that I find it hard to remember what should be done in each stage of DP itself, let alone that I will remember the variations of Canada and EU.
- in the case of PG-EU its sole reason for existence is the poor Unicode support at DP and DP-Canada. In terms of the size of the public domain, the 70+ countries have a public domain that can be described as a subset of the public domain of the USA and 50+ countries together.
- unnecessary opaqueness: if these three variations of DP only would announce to their various communities what works are in progress at their sister sites, you'd see more cross-pollination.
Regards, Walter

On 10/12/2011 03:01 AM, Juliet Sutherland wrote:
One thing I learned in all my time at DP is that there is always *someone* who finds even the most obscure books important. I've always felt that the "classics" will certainly be made available, if not by PG* then by various academic or profit-making organizations.
My vision is to have every gadget on this earth sold preloaded with a corpus of the world's finest literature, in all languages. I'm not interested in having all classics somewhere on the net, I want all classics in one place, done the same way, so they can be handled as a whole corpus. I want a corpus that every company can pick up so that every new tablet, every new phone, every new tv set, every new mp3 player comes with a lifetime of the finest reading preloaded. Every person on this earth has a right to carry all public domain in his pocket.
*It does get frustrating to make new, better proofed, better illustrated versions of classics only to have the PG website continue to list the older, more problematic versions more prominently.
The PG website lists the most downloaded books first. If people don't download your `better proofed, better illustrated´ version, then there is something wrong with it. The older versions are done in a much simpler way and thus work much better on those devices people actually read books on. Note that many users don't go thru the PG search facility but thru search engines and links posted everywhere on the net, so the way PG lists books is not that important for book performance. The lesson here is: the crazy formatting of DP acts like DRM. It prevents those books from working on many devices. And people only post links to books that actually worked for them. Of course no one at DP wants to hear that, as it is much easier to blame PG. -- Marcello Perathoner webmaster@gutenberg.org

On Wed, Oct 12, 2011 at 5:50 AM, Marcello Perathoner <marcello@perathoner.de> wrote:
The PG website lists the most downloaded books first. If people don't download your `better proofed, better illustrated´ version, then there is something wrong with it.
So you're saying that people download both versions, compare them, and then redownload the better version?
The lesson here is: the crazy formatting of DP acts like DRM. It prevents those books from working on many devices. And people only post links to books that actually worked for them.
Or people post links to the books that existed when they created the link. Or they post a link to the first version that came up on the PG search.
Of course no one at DP wants to hear that, as it is much easier to blame PG.
Also because it's illogical and unsupported by evidence. -- Kie ekzistas vivo, ekzistas espero.

On 10/12/2011 07:23 PM, David Starner wrote:
On Wed, Oct 12, 2011 at 5:50 AM, Marcello Perathoner <marcello@perathoner.de> wrote:
The PG website lists the most downloaded books first. If people don't download your `better proofed, better illustrated´ version, then there is something wrong with it.
So you're saying that people download both versions, compare them, and then redownload the better version?
Compare them and share a link to the version that worked.
The lesson here is: the crazy formatting of DP acts like DRM. It prevents those books from working on many devices. And people only post links to books that actually worked for them.
Or people post links to the books that existed when they created the link. Or they post a link to the first version that came up on the PG search.
There's nothing we can do about that. Or can we go and change other people's links? What do you propose here? Should PG stop offering the "most popular" category just because the most popular books were not produced by David Starner? Every new book gets a tremendous boost, being posted on Facebook, Twitter and the "recent" RSS feed at PG. Also there is a "recent" category that people can select while searching. If your new book can't overcome some older edition in spite of that big boost, then obviously there's something wrong with it. -- Marcello Perathoner webmaster@gutenberg.org

On Wed, Oct 12, 2011 at 2:19 PM, Marcello Perathoner <marcello@perathoner.de> wrote:
On 10/12/2011 07:23 PM, David Starner wrote:
So you're saying that people download both versions, compare them, and then redownload the better version?
Compare them and share a link to the version that worked.
I don't think there's evidence that most people do that.
What do you propose here?
That when there's multiple copies of a book, PG should list the new one first.
Should PG stop offering the "most popular" category just because the most popular books were not produced by David Starner?
You remember "Hand Shadows to Be Thrown upon the Wall"? Yeah baby, all mine. (Okay, with a little help from Heather Martino.) Which, I will note, shows that what people want is not necessarily the "Most Important" books.
Every new book gets a tremendous boost, being posted on Facebook, Twitter and the "recent" RSS feed at PG. Also there is a "recent" category that people can select while searching.
If your new book can't overcome some older edition in spite of that big boost, then obviously there's something wrong with it.
Obviously. There's no chance you're letting your biases dictate your interpretation of the evidence. -- Kie ekzistas vivo, ekzistas espero.

On 10/12/2011 08:48 PM, David Starner wrote:
Compare them and share a link to the version that worked.
I don't think there's evidence that most people do that.
What do you think they do instead? Share the link that didn't work?
What do you propose here?
That when there's multiple copies of a book, PG should list the new one first.
And what evidence do you offer to prove that the newer version is better? The user has always had the choice of sorting alphabetically, per popularity or per release date. Now you want to take one choice away from the user. Weren't you in favor of letting people always do what they wanted?
You remember "Hand Shadows to Be Thrown upon the Wall"? Yeah baby, all mine. (Okay, with a little help from Heather Martino.) Which, I will note, shows that what people want is not necessarily the "Most Important" books.
That one was popular only because a popular blog deep-linked into our servers. When I fixed the popularity algorithm to only count primary file downloads, not image downloads, it fell off the top 100 quite suddenly.
Obviously. There's no chance you're letting your biases dictate your interpretation of the evidence.
Maybe mine, maybe yours ... -- Marcello Perathoner webmaster@gutenberg.org
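The popularity fix described in this message (count only primary file downloads, not image downloads) can be illustrated with a minimal sketch. This is not PG's actual code: the function name, the list of "primary" extensions, the book numbers, and the log format are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical list of "primary" file types; the thread does not say
# which extensions PG's real algorithm treats as primary.
PRIMARY_EXTENSIONS = (".txt", ".htm", ".html", ".epub", ".mobi", ".pdf")


def count_primary_downloads(requests):
    """Count downloads per book, skipping images and other auxiliary files.

    `requests` is an iterable of (book_id, path) pairs, for example
    parsed from a web server log (the log format is assumed here).
    """
    counts = Counter()
    for book_id, path in requests:
        # str.endswith accepts a tuple of suffixes.
        if path.lower().endswith(PRIMARY_EXTENSIONS):
            counts[book_id] += 1
    return counts


# Made-up sample: book 101 is deep-linked for its images,
# book 202 is downloaded twice as a whole book.
sample_log = [
    (101, "101-h/101-h.htm"),
    (101, "101-h/images/fig01.png"),
    (101, "101-h/images/fig02.png"),
    (101, "101-h/images/fig03.png"),
    (202, "202.txt"),
    (202, "202.epub"),
]

counts = count_primary_downloads(sample_log)
ranking = counts.most_common()  # most downloaded book first
```

Counting every request would put book 101 ahead with four hits; counting only primary files puts book 202 ahead, which is the kind of sudden drop the message describes for an image-heavy, deep-linked book.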

The PG website lists the most downloaded books first. If people don't download your `better proofed, better illustrated´ version, then there is something wrong with it.
It takes a pretty sophisticated user of PG, like probably someone who has actually created books for PG, to understand that a more recent version, and therefore one that has been downloaded LESS, is probably going to be the better quality effort for people to read. Why yes, there IS something "wrong" with that "better proofed, better illustrated" version: namely that PG continues to denigrate that new version and continues to "advertise" the old defective version. This is a self-defeating strategy: PG says it wants volunteers to rework the old and crufty versions -- but then makes sure that almost no customer actually reads these newly decrufted versions!

On 10/14/2011 12:35 AM, Jim Adcock wrote:
The PG website lists the most downloaded books first. If people don't download your `better proofed, better illustrated´ version, then there is something wrong with it.
It takes a pretty sophisticated user of PG, like probably someone who has actually created books for PG, to understand that a more recent version, and therefore one that has been downloaded LESS, is probably going to be the better quality effort for people to read.
It takes a user sophisticated enough to click on a link that says "sort by release date".
Why yes, there IS something "wrong" with that "better proofed, better illustrated" version: namely that PG continues to denigrate that new version and continues to "advertise" the old defective version. This is a self-defeating strategy: PG says it wants volunteers to rework the old and crufty versions -- but then makes sure that almost no customer actually reads these newly decrufted versions!
A new edition on PG has to face the same challenges as every product that enters an already saturated market. If your edition is better by a significant margin it will rise to the top eventually. The contention here is that often new editions are better only in the narcissistic POV of the producer. A few missed spelling errors in an old edition don't hurt anybody, but an ebook that doesn't display right on your device of choice because of misguided markup decisions hurts a lot. I am a sophisticated enough reader and I often download all editions of a book to many devices and *more times than not* the older ones look better on my Kindle or Nexus. -- Marcello Perathoner webmaster@gutenberg.org

On Fri, Oct 14, 2011 at 5:32 AM, Marcello Perathoner <marcello@perathoner.de> wrote:
A new edition on PG has to face the same challenges as every product that enters an already saturated market. If your edition is better by a significant margin it will rise to the top eventually.
Actually, companies typically (a) replace one model with another to reduce competition and (b) make sure to make clear the differences between the models they have on the market.
an ebook that doesn't display right on your device of choice because of misguided markup decisions hurts a lot.
That wasn't the theory you were espousing with TEI-Lite; as I recall you told us that everything was there, and we should just accept it looked like crap in HTML. -- Kie ekzistas vivo, ekzistas espero.

On 10/14/2011 01:29 PM, David Starner wrote:
A new edition on PG has to face the same challenges as every product that enters an already saturated market. If your edition is better by a significant margin it will rise to the top eventually.
Actually, companies typically (a) replace one model with another to reduce competition and (b) make sure to make clear the differences between the models they have on the market.
But PG is not like a company that designs its products. It is much more like a store that offers products from different companies side by side. Why should PG prefer your book over the same book produced by somebody else? Especially if the public prefers the other book? It has always been PG's policy to never delete anything, except for copyright status changes.
an ebook that doesn't display right on your device of choice because of misguided markup decisions hurts a lot.
That wasn't the theory you were espousing with TEI-Lite; as I recall you told us that everything was there, and we should just accept it looked like crap in HTML.
It looked like crap when you did it. Done right it looks like this: http://www.gnutenberg.de/pgtei/0.5/examples/candide/4650-h.html http://www.gnutenberg.de/pgtei/0.5/examples/candide/4650-pdf.pdf -- Marcello Perathoner webmaster@gutenberg.org

"Marcello" == Marcello Perathoner <marcello@perathoner.de> writes:
>>> an ebook that doesn't display right on your device of choice because of misguided markup decisions hurts a lot.
>> That wasn't the theory you were espousing with TEI-Lite; as I recall you told us that everything was there, and we should just accept it looked like crap in HTML.
Marcello> It looked like crap when you did it. Done right it looks like this:
Marcello> http://www.gnutenberg.de/pgtei/0.5/examples/candide/4650-h.html
Marcello> http://www.gnutenberg.de/pgtei/0.5/examples/candide/4650-pdf.pdf
If the output looks right only when the author of the software uses it, at least one would infer that it is not well documented. Carlo

On 10/14/2011 04:48 PM, Carlo Traverso wrote:
If the output looks right only when the author of the software uses it, at least one would infer that it is not well documented.
Or you could infer that people like complaining better than studying documentation. -- Marcello Perathoner webmaster@gutenberg.org

Why should PG prefer your book over the same book produced by somebody else? Especially if the public prefers the other book?
It is not "the public" that prefers the older cruftier version. Rather that is the person "hiding behind the mirror" at PG who keeps pushing the older cruftier version onto the public by overtly "advertising" that older version. Look, if you prefer the old crufty versions -- then *don't* ask volunteers to make new versions that fix that old cruftiness!

A few missed spelling errors in an old edition don't hurt anybody, but an ebook that doesn't display right on your device of choice because of misguided markup decisions hurts a lot.
In the case I'm thinking of, the "most popular version" doesn't display right on ebook readers, is missing passages of text, and contains several hundred minor errors and a dozen or so more major errors.

On Wed, Oct 12, 2011 at 10:50 PM, Marcello Perathoner <marcello@perathoner.de> wrote:
My vision is to have every gadget on this earth sold preloaded with a corpus of the world's finest literature, in all languages.
The only problem with this vision is that it's not what people want. Speaking with the head of Kobo last week, he says that they no longer put free books onto their devices, because too many people complained that it took up the space they wanted for their own choice of books. Most people just wanted to delete them immediately. Until device capacity is infinite, I can't see it happening.

I have to say this was my own experience with my Kobo. The thought was nice, but if I want to read one of the classics, I'd rather go download it from PG or IA than have it already cluttering up my device on first boot. Alex On Wed, Oct 12, 2011 at 1:43 PM, Zara Baxter <zbaxter@gmail.com> wrote:
On Wed, Oct 12, 2011 at 10:50 PM, Marcello Perathoner <marcello@perathoner.de> wrote:
My vision is to have every gadget on this earth sold preloaded with a corpus of the world's finest literature, in all languages.
The only problem with this vision is that it's not what people want.
Speaking with the head of Kobo last week, he says that they no longer put free books onto their devices, because too many people complained that it took up the space they wanted for their own choice of books. Most people just wanted to delete them immediately.
Until device capacity is infinite, I can't see it happening. _______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/mailman/listinfo/gutvol-d

Alex Buie <abuie@kwdservices.com> writes:
I have to say this was my own experience with my Kobo. The thought was nice, but if I want to read one of the classics, I'd rather go download it from PG or IA than have it already cluttering up my device on first boot.
My 2-year-old ebook reader (CyBook) came with several hundred books preloaded on its internal storage medium; I did not delete them and there is still space to add some more. I'm happy that they "filled" the reader with a nice collection. One day, I'll probably remove this all, or not, because I'm lazy. I usually have my books on the external micro-SD card. -- Karl Eichwalder

On Wed, Oct 12, 2011 at 08:00:19PM +0200, Karl Eichwalder wrote:
Alex Buie <abuie@kwdservices.com> writes:
I have to say this was my own experience with my Kobo. The thought was nice, but if I want to read one of the classics, I'd rather go download it from PG or IA than have it already cluttering up my device on first boot.
My 2-year-old ebook reader (CyBook) came with several hundred books preloaded on its internal storage medium; I did not delete them and there is still space to add some more. I'm happy that they "filled" the reader with a nice collection. One day, I'll probably remove this all, or not, because I'm lazy.
I usually have my books on the external micro-SD card.
We worked with the Kobo people, and it was a pleasure that they were so enlightened. Unfortunately, their software really sucks. So, deleting files, and overall file management, and navigation of content....all pretty well sucks. Don't get me started on iTunes for the iDevices. That sucks in a much more proprietary way. Kindle and Nook are both miserable, too. If it were EASIER to manage your content (especially exchanging among various devices), it would be easier to mass delete etc. But Kobo really takes the lead in sucky software, in my experience. I can think of some nice things to say about all of these devices, too, but the content management aspects are universally disappointing. -- Greg

On Oct 12, 2011, at 1:15 PM, Greg Newby wrote:
Don't get me started on iTunes for the iDevices. That sucks in a much more proprietary way.
On my iDevice, I can visit PG’s site, click the .epub (or, seeing I have the Kindle app — which rocks for reading — the .mobi) link, and tap a button to download the file *on the device*. Then it’s there, and I can read it just like that. When I next sync with iTunes, it notices that I now have the book on my device, and it backs it up. I need do nothing more at all. Please enlighten me on what the problem is here. :) -- b

On Wed, Oct 12, 2011 at 01:21:54PM -0500, Benjamin Klein wrote:
On Oct 12, 2011, at 1:15 PM, Greg Newby wrote:
Don't get me started on iTunes for the iDevices. That sucks in a much more proprietary way.
On my iDevice, I can visit PG's site, click the .epub (or, seeing I have the Kindle app -- which rocks for reading -- the .mobi) link, and tap a button to download the file *on the device*. Then it's there, and I can read it just like that. When I next sync with iTunes, it notices that I now have the book on my device, and it backs it up. I need do nothing more at all.
Please enlighten me on what the problem is here. :)
Why don't you start by emailing the file from your device (or from within iTunes) to someone you'd like to share it with.

On Oct 12, 2011, at 1:27 PM, Greg Newby wrote:
Why don't you start by emailing the file from your device (or from within iTunes) to someone you'd like to share it with.
I guess I could, but I don't see what the point would be. Until you said this, it never occurred to me that I might try to email the file to someone. I would have emailed a link. -- b

Greg Newby <gbnewby@pglaf.org> writes:
We worked with the Kobo people, and it was a pleasure that they were so enlightened.
Unfortunately, their software really sucks. So, deleting files, and overall file management, and navigation of content....all pretty well sucks.
I guess those devices are mostly suitable if you intend to read book after book from the first to the last page. Then it is not too bad. For advanced usage scenarios, you will need a "smartphone" or some other general purpose mini-computer (either netbook or tablet)--but those devices consume more power, are more expensive, and are not that light. -- Karl Eichwalder

On 10/12/2011 07:43 PM, Zara Baxter wrote:
On Wed, Oct 12, 2011 at 10:50 PM, Marcello Perathoner <marcello@perathoner.de> wrote:
My vision is to have every gadget on this earth sold preloaded with a corpus of the world's finest literature, in all languages.
The only problem with this vision is that it's not what people want.
Speaking with the head of Kobo last week, he says that they no longer put free books onto their devices, because too many people complained that it took up the space they wanted for their own choice of books. Most people just wanted to delete them immediately.
The Kobo is a lowest end machine with just 1 GB free. To implement my vision, some time will pass, and with memory doubling every 1.5 years, the smallest iPhone you will buy in 5 years will have 128 GB. All PG Epubs without images take up 6 GB. (+ 15 GB with images) How was that about taking too much space? In 10 years the smallest iPhone you can buy will have 1 TB. -- Marcello Perathoner webmaster@gutenberg.org
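The doubling arithmetic in the message above can be checked with a small sketch. The 1.5-year doubling period, the 6 GB and 15 GB corpus sizes, and the 1 GB Kobo figure come from the message; the 16 GB starting capacity is an assumption (the thread does not state one), and "+ 15 GB with images" is read here as additional space on top of the 6 GB, which is also an assumption.

```python
def projected_capacity_gb(start_gb: float, years: float, doubling_years: float = 1.5) -> float:
    """Capacity after `years` if it doubles every `doubling_years` years."""
    return start_gb * 2 ** (years / doubling_years)


# Figures quoted in the message above.
CORPUS_TEXT_GB = 6     # all PG EPUBs without images
CORPUS_IMAGES_GB = 15  # read as *additional* space for images (assumption)
KOBO_FREE_GB = 1       # free space on the Kobo

# ASSUMPTION: smallest phone capacity in 2011; not stated in the thread.
START_GB = 16


def fits(capacity_gb: float, corpus_gb: float = CORPUS_TEXT_GB + CORPUS_IMAGES_GB) -> bool:
    """True if the whole corpus (text plus images) fits in the given capacity."""
    return capacity_gb >= corpus_gb


if __name__ == "__main__":
    print(f"Kobo today: corpus fits = {fits(KOBO_FREE_GB)}")
    for years in (0, 5, 10):
        capacity = projected_capacity_gb(START_GB, years)
        print(f"+{years:>2} years: {capacity:8.0f} GB, corpus fits = {fits(capacity)}")
```

Changing START_GB shifts every projection by the same factor, so the round numbers quoted in the message depend on which baseline one picks.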

Of course no one at DP wants to hear that, as it is much easier to blame PG.
Sadly, no matter who is to blame, even "simple things" like the "typesetting" "conventions" of paragraph division are handled poorly. For example, the two most common conventions are: 1) Indent first word of paragraph and *no* vertical whitespace. or: 2) One line of vertical whitespace between paragraphs and first word is *not* indented. [[and yes I know it's not quite as simple as all that]] And "we" [whoever "we" is] can't even seem to accomplish the one *or* the other. Such as, in a not uncommon example, a book comes out of "the mill" with *two* vertical lines of whitespace between paragraphs *and* the first word is indented -- twice wronged!
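The two conventions described above could be enforced mechanically, roughly as in the following sketch. The function names and the four-space indent are hypothetical choices for illustration, and the splitter treats any indented line as a paragraph start, so it would mangle verse or block quotes; as the message says, it's not quite as simple as all that.

```python
def split_paragraphs(text: str) -> list[str]:
    """Split text into paragraphs.

    A new paragraph starts after a blank line or at an indented line,
    so input following either convention (or both at once) is accepted.
    """
    paragraphs, current = [], []
    for line in text.splitlines():
        starts_new = not line.strip() or line[:1] in (" ", "\t")
        if starts_new and current:
            paragraphs.append(" ".join(current))
            current = []
        if line.strip():
            current.append(line.strip())
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


def render(paragraphs: list[str], style: str, indent: str = "    ") -> str:
    """Render with exactly one convention: 'indent' or 'spaced', never both."""
    if style == "indent":   # convention 1: indent, no vertical whitespace
        return "\n".join(indent + p for p in paragraphs)
    if style == "spaced":   # convention 2: one blank line, no indent
        return "\n\n".join(paragraphs)
    raise ValueError(f"unknown style: {style!r}")


if __name__ == "__main__":
    # The "twice wronged" case: two blank lines *and* an indent.
    messy = "  First para\ncontinues.\n\n\n  Second para."
    print(render(split_paragraphs(messy), "spaced"))
```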

Because it saves time, and you won't end up doing all those long forgotten (for good reason) books you can buy for 1 cent.
My criteria has always been that if "enough" people want to read it, it's worth doing. Where "enough" is a pretty small number -- if people collectively spend more time reading it than I spend making it, it's probably a "success." Because I'm not putting in the effort in the first place unless [at least on some masochistic level] I'm enjoying the effort. Surprisingly, to me at least, looking at first 30-day download counts of recent additions it seems like just about everything meets this test. "Weak" performers would seem to be the less-common foreign language books, but I don't think people reading this thread would agree that would be a reason not to "want" them.

On Tue, 11 Oct 2011 11:39:55 +0200, Marcello Perathoner wrote:
1. Use one master format for every book. (There will be a small set of master formats to choose from.)
What would you suggest as at least part of that set? TEI? reStructuredText? or Z.M.L.? ;-) But generally speaking: yes, sounds good to me.
2. Minimize formatting. Make books that are usable across a wide variety of devices, not books that look exactly like the paper edition.
As long as it won't preclude others from doing that afterwards, sounds fine to me.
3. Use a resource control system (like git) for posting and maintenance. PG will host the master repository and the public can pull from it. A group of `committers´ can push. Every committer can have his own group of aides and pull from them.
Very sensible feature.
4. Use already scanned material: IA, Google, Gallica etc.
You probably mean: we're not going to spend time and energy on getting our own scans good, just jump through the hoops of IA and we're good.
5. Important works first. Don't bother with those embarrassing amateurish works DP turns out by the hundreds.
I would leave that up to the volunteers themselves. If anything, I tend to gravitate towards the obscure, because the 'important' works will be digitised anyway.
6. Accept unicode only.
But of course.
7. Make it really distributed in order to facilitate the various copyright regimes. Right now the DP versions for Canada and Europe are hopelessly out of sync software-wise with the main DP site. The latter also gets me back to scanning: I am not aware of repositories comparable to IA in 50+ and 70+ countries. I can see good reasons not to take that bit on as well, however. Regards, Walter

On 10/11/2011 02:57 PM, Walter van Holst wrote:
On Tue, 11 Oct 2011 11:39:55 +0200, Marcello Perathoner wrote:
1. Use one master format for every book. (There will be a small set of master formats to choose from.)
What would you suggest as at least part of that set?
TEI? reStructuredText?
Those are my pet formats but you could use other formats as well. They have to be free and documented and have a toolchain that can build at least HTML and plain text (more is better).
or Z.M.L.? ;-)
Do you mean this one? http://www.liminalzone.org/ZML or do you mean the ZML.com scam site?
7. Make it really distributed in order to facilitate the various copyright regimes. Right now the DP versions for Canada and Europe are hopelessly out of sync software-wise with the main DP site.
You mean DP-US is hopelessly behind the other ones because they can't do unicode.
The latter also gets me back to scanning: I am not aware of repositories comparable to IA in 50+ and 70+ countries. I can see good reasons not to take that bit on as well, however.
Many European governments have started scanning. -- Marcello Perathoner webmaster@gutenberg.org

On Tue, 11 Oct 2011 15:38:14 +0200, Marcello Perathoner wrote:
or Z.M.L.? ;-)
Do you mean this one?
I was just trolling, never mind.
7. Make it really distributed in order to facilitate the various copyright regimes. Right now the DP versions for Canada and Europe are hopelessly out of sync software-wise with the main DP site.
You mean DP-US is hopelessly behind the other ones because they can't do unicode.
No, mind you, I wrote 'out of sync'. Which doesn't say who is behind who in what terms.
The latter also gets me back to scanning: I am not aware of repositories comparable to IA in 50+ and 70+ countries. I can see good reasons not to take that bit on as well, however.
Many European governments have started scanning.
Yes, and like all government bureaucracies they're not acting particularly fast. Or worse, like the Royal Library here in the Netherlands, have entered into onerous agreements with Google. Regards, Walter

On Tue, Oct 11, 2011 at 5:39 AM, Marcello Perathoner <marcello@perathoner.de> wrote:
At the same time DP is obsessed about rank and procedure:
Monsieur Maximilien Robespierre, I choose to stand by rank and procedure rather than follow you. -- Kie ekzistas vivo, ekzistas espero.

On 10/11/2011 03:08 PM, David Starner wrote:
On Tue, Oct 11, 2011 at 5:39 AM, Marcello Perathoner <marcello@perathoner.de> wrote:
At the same time DP is obsessed about rank and procedure:
Monsieur Maximilien Robespierre, I choose to stand by rank and procedure rather than follow you.
David, calling me names is against the DP `Code of Conduct´. You better go and read it again until you are sure you understand it. -- Marcello Perathoner webmaster@gutenberg.org

On Tue, Oct 11, 2011 at 9:45 AM, Marcello Perathoner <marcello@perathoner.de> wrote:
David, calling me names is against the DP `Code of Conduct´. You better go and read it again until you are sure you understand it.
It's a good thing your system won't have a code of conduct, then. The point stands; rank and procedure beats tyranny.
I've seen one Schnitzler coming out of DP lately but not much else. Schnitzler, Wittgenstein, Freud, Hofmannsthal are important.
I get it; Germans are important. Everyone else, not so much.
The campfire girls of mars, which we churn out by the hundreds, are not important.
Except that we can churn them out by the hundreds. We have the English speaking manpower and processing simple text is trivial. If you can get the same manpower to work on Hofmannsthal, all the better for you. -- Kie ekzistas vivo, ekzistas espero.

On Oct 11, 2011, at 16:45, David Starner wrote:
On Tue, Oct 11, 2011 at 9:45 AM, Marcello Perathoner <marcello@perathoner.de> wrote:
I've seen one Schnitzler coming out of DP lately but not much else. Schnitzler, Wittgenstein, Freud, Hofmannsthal are important.
I get it; Germans are important. Everyone else, not so much.
Except that among those four "Germans", there are exactly four Austrians... So I'm guessing Marcello's suggestions were directed at me personally. And I'm choosing to take them as a vote of confidence, because I've provided books by all four of them. (; Jana

On 10/11/2011 04:45 PM, David Starner wrote:
I've seen one Schnitzler coming out of DP lately but not much else. Schnitzler, Wittgenstein, Freud, Hofmannsthal are important.
I get it; Germans are important. Everyone else, not so much.
They are Austrian. All born in Vienna, except Freud who worked there most of his life. David, if you want to throw dirt at me, you have to aim better. With your last post you only showed to everybody the general direction of your prejudices. P.S. I'm Italian, I only work in Germany. -- Marcello Perathoner webmaster@gutenberg.org

On Tue, Oct 11, 2011 at 11:24 AM, Marcello Perathoner <marcello@perathoner.de> wrote:
David, if you want to throw dirt at me, you have to aim better.
I think this shows exactly why your plan is doomed. You don't have the gravitas to lead people anywhere. Rank and procedure permit people to work together.
With your last post you only showed to everybody the general direction of your prejudices.
What, that I oppose any narrow list of what's important? That I oppose taking people and telling them you must work on the stuff I consider important, instead of the stuff you want to? -- Kie ekzistas vivo, ekzistas espero.

What, that I oppose any narrow list of what's important? That I oppose taking people and telling them you must work on the stuff I consider important, instead of the stuff you want to?
I find that I can only muster the courage to tackle a book that 1) *I* consider important *and* 2) is a book I really want to do. I find it too hard to keep proof-re-reading a book that I am really not that interested in. This is what DP is good at: getting people to casually attack in little bits books that no one is really very interested in anyway. The problem is, is that eventually *someone* needs to take "ownership" of the book to push it out the door. Often that doesn't happen, and 100+ hours of volunteer work get flushed -- or at least stuck indefinitely on the throne for several years.

On Tue, Oct 11, 2011 at 4:52 PM, James Adcock <jimad@msn.com> wrote:
I find it too hard to keep proof-re-reading a book that I am really not that interested in. This is what DP is good at: getting people to casually attack in little bits books that no one is really very interested in anyway. The problem is, is that eventually *someone* needs to take "ownership" of the book to push it out the door. Often that doesn't happen, and 100+ hours of volunteer work get flushed -- or at least stuck indefinitely on the throne for several years.
The problem is, I don't see what people are willing to PP as having much connection to importance. Right now, I've got a work by Dostoevsky sitting in PP Available for over a year. A number of other works I PMed sitting in PP Available for a long time I did because they're Important Works. Had they been the Campfire Girls, they'd already be in PG. And looking at the list, I find the restriction to Google Books or the Internet Archive to be silly. Most of what DP does is already from them. You want to exclude the ever-popular SF serials? Why? You want to prevent someone from chasing down a particular work that's not on Google Books? What have you gained? -- Kie ekzistas vivo, ekzistas espero.

I'm interested in and willing to offer all technical support I'm capable of to a new DP built around these guidelines....
I would be supportive of such an effort. I think you will run into problems in practice getting acceptance for "how restrictive" "restrictive" should be. But, when rules are made they are sure to chafe someone, or everyone, someplace.

At the same time DP is obsessed about rank and procedure: They have `General Managers´ and `DP Boards´ and `Codes of Conduct´ and a bigger part of the software is concerned with ranking proofers than with assisting those proofers with their jobs.
Definitely agree.
1. Use one master format for every book. (There will be a small set of master formats to choose from.)
This is great, and I would love if we could have ZML be one of the allowed master formats, although I'm aware there's a bit of history there... http://www.z-m-l.com/
2. Minimize formatting. Make books that are usable across a wide variety of devices, not books that look exactly like the paper edition.
This is important. Flowery HTML is nice but is nowhere near necessary or prudent, imo.
3. Use a resource control system (like git) for posting and maintenance. PG will host the master repository and the public can pull from it. A group of `committers´ can push. Every committer can have his own group of aides and pull from them.
I _really_ like this idea!
4. Use already scanned material: IA, Google, Gallica etc.
No comment specifically on this one, but I concur.
5. Important works first. Don't bother with those embarrassing amateurish works DP turns out by the hundreds.
I don't agree quite so much here, but to each his own.
6. Accept unicode only.
Absolutely. Alex On Tue, Oct 11, 2011 at 5:39 AM, Marcello Perathoner <marcello@perathoner.de> wrote:
On 10/11/2011 06:16 AM, Jim Adcock wrote:
Whether or not making the html file "look good" is wasted energy
It is not only wasted energy but actual damage.
DP is run like a tea party and produces pretty embroidery, but that embroidery only fits the producer's tea table.
There's very little knowledge at DP about computer technology or book design nor any desire to acquire it. (There are a few knowledgeable individuals, but they are drowned out by the crowd.)
At the same time DP is obsessed about rank and procedure: They have `General Managers´ and `DP Boards´ and `Codes of Conduct´ and a bigger part of the software is concerned with ranking proofers than with assisting those proofers with their jobs.
DP has imploded and there's no indication that they will or want to reform themselves. Time to start over.
I'm interested in and willing to offer all technical support I'm capable of to a new DP built around these guidelines:
1. Use one master format for every book. (There will be a small set of master formats to choose from.)
2. Minimize formatting. Make books that are usable across a wide variety of devices, not books that look exactly like the paper edition.
3. Use a resource control system (like git) for posting and maintenance. PG will host the master repository and the public can pull from it. A group of `committers´ can push. Every committer can have his own group of aides and pull from them.
4. Use already scanned material: IA, Google, Gallica etc.
5. Important works first. Don't bother with those embarrassing amateurish works DP turns out by the hundreds.
6. Accept unicode only.
-- Marcello Perathoner webmaster@gutenberg.org _______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/mailman/listinfo/gutvol-d

On 10/11/2011 05:18 PM, Alex Buie wrote:
This is great, and I would love if we could have ZML be one of the allowed master formats, although I'm aware there's a bit of history there... http://www.z-m-l.com/
If you search the archives you'll find a post I made years ago about why this particular `language´ is not suited for ebook production. Here's the full story about the proponent of that `language´: http://www.gnutenberg.de/pgtei/0.5/examples/bowerbird/poo.html -- Marcello Perathoner webmaster@gutenberg.org

This looks like quite an... interesting read. Was this produced from tei? -- Alex Buie Network Coordinator / Server Engineer KWD Services, Inc Media and Hosting Solutions +1(703)445-3391 +1(480)253-9640 +1(703)919-8090 abuie@kwdservices.com On Tue, Oct 11, 2011 at 11:53 AM, Marcello Perathoner <marcello@perathoner.de> wrote:
On 10/11/2011 05:18 PM, Alex Buie wrote:
This is great, and I would love if we could have ZML be one of the allowed master formats, although I'm aware there's a bit of history there... http://www.z-m-l.com/
If you search the archives you'll find a post I made years ago about why this particular `language´ is not suited for ebook production.
Here's the full story about the proponent of that `language´:
http://www.gnutenberg.de/pgtei/0.5/examples/bowerbird/poo.html
-- Marcello Perathoner webmaster@gutenberg.org

On 10/11/2011 06:06 PM, Alex Buie wrote:
This looks like quite an... interesting read. Was this produced from tei?
Actually it was produced in an XML that was transformed into TEI. All source files are here: http://www.gnutenberg.de/pgtei/0.5/examples/bowerbird/ This is the source: http://www.gnutenberg.de/pgtei/0.5/examples/bowerbird/poo.bb -- Marcello Perathoner webmaster@gutenberg.org

On Tue, October 11, 2011 9:53 am, Marcello Perathoner wrote:
On 10/11/2011 05:18 PM, Alex Buie wrote:
This is great, and I would love if we could have ZML be one of the allowed master formats, although I'm aware there's a bit of history there... http://www.z-m-l.com/
If you search the archives you'll find a post I made years ago about why this particular `language´ is not suited for ebook production.
Here's the full story about the proponent of that `language´:
http://www.gnutenberg.de/pgtei/0.5/examples/bowerbird/poo.html
Now Marcello, there is no need to resort to ad hominem. The fact of the matter is that even if BowerBird had consistently behaved in an exemplary and respectful fashion, z.m.l. is still inadequate as a markup language. Part of the problem is that we have no real specification of what z.m.l. /is/. BB has provided tons of "examples," (some of which are now inconsistent as apparently the language is still evolving) but no definitive declaration of what the language allows, and disallows, and how the elements are to be used. This is why I have dubbed z.m.l., SML -- Spousal Markup Language: there are rules, but you have to figure them out on your own, and they are subject to change on a whim. The original PG philosophy was that the text was the only thing that mattered, and all markup was superfluous. It quickly became apparent that at least emphasis needed to be indicated and so it was decided that italicized text would be indicated in UPPER CASE. Unfortunately, people began to discover that there were books which contained upper case text which was not intended as emphasis, so the /new/ standard became to use underscores to indicate italicization (only those of us old enough to have learned to type on typewriters will recall that the mechanical convention of typing was to underline what would otherwise be italicized). There are a number of constructs in other markup languages which z.m.l. does not support. BowerBird's response is that support for those constructs is unnecessary as e-books simply do not require them. This is, of course, the same argument as the one that /all/ markup is unnecessary, the line is simply drawn in a different place, and BowerBird becomes the ultimate arbiter of what is, and is not, needed in e-books. It's hard to know what markup will be necessary to preserve any specific work of literature. 
Thus, what is really needed is an eXtensible Markup Language, such as TEI, which captures everything we know about now, and can be extended when we encounter something new. z.m.l. fails on both these counts.

...It quickly became apparent that at least emphasis needed to be indicated and so it was decided that italicized text would be indicated in UPPER CASE. Unfortunately, people began to discover that there were books which contained upper case text which was not intended as emphasis...
There are also books which contain italics which are not intended as emphasis, but, oh well....

On Tue, October 11, 2011 2:39 pm, James Adcock wrote:
...It quickly became apparent that at least emphasis needed to be indicated and so it was decided that italicized text would be indicated in UPPER CASE. Unfortunately, people began to discover that there were books which contained upper case text which was not intended as emphasis...
There are also books which contain italics which are not intended as emphasis, but, oh well....
Precisely, which is why it's important to have a markup language complete enough to deal with uppercase and italics. That's also why it's important to have an eXtensible Markup Language. It's easy to say now that the distinction between uppercase and italics is important, but apparently somebody missed it. I don't know what the next unanticipated text construct will be, but it would be nice to be able to deal with it when we encounter it.

Excuse Me, but are UPPER CASE letters actually formatting that need be marked up? Then again, we could also choose an appropriate unicode font and use glyph encoding in the master format. Of course you will need one hell of an analyzer to get it to work on any particular e-book reader. C'mon, kiddies!! We have "BEEN THERE AND BACK AGAIN" regards Keith. On 12.10.2011 at 00:57, Lee Passey wrote:
On Tue, October 11, 2011 2:39 pm, James Adcock wrote:
...It quickly became apparent that at least emphasis needed to be indicated and so it was decided that italicized text would be indicated in UPPER CASE. Unfortunately, people began to discover that there were books which contained upper case text which was not intended as emphasis...
There are also books which contain italics which are not intended as emphasis, but, oh well....
Precisely, which is why it's important to have a markup language complete enough to deal with uppercase and italics. That's also why it's important to have an eXtensible Markup Language. It's easy to say now that the distinction between uppercase and italics is important, but apparently somebody missed it. I don't know what the next unanticipated text construct will be, but it would be nice to be able to deal with it when we encounter it.

They don't require markup. But if they exist, then you can't use uppercase AS markup. That is, you can take "The _italic_ word" and mark it up as "The ITALIC word", as was the very original tradition, but that fails when you try to mark up "The _italic_ CAPITAL word". On Wed, Oct 12, 2011 at 3:29 AM, Keith J. Schultz <schultzk@uni-trier.de> wrote:
Excuse Me, but are UPPER CASE letters actually formatting that need be marked up.
On Tue, October 11, 2011 2:39 pm, James Adcock wrote:
...It quickly became apparent that at least emphasis needed to be indicated and so it was decided that italicized text would be indicated in UPPER CASE. Unfortunately, people began to discover that there were books which contained upper case text which was not intended as emphasis...

Sparr, ALLOW, me to explain THAT Upper Case can be used for emphasis, yet it requires no mark up! YET, emphasis need not always have to be marked up as italic. _Also, the use of underscore can be used for other stylistic means. _Yet, semantics and style vary from author to author. regards Keith. P.S. nothing italic in the above!! ;-))) On 12.10.2011 at 23:39, Sparr wrote:
They don't require markup. But if they exist, then you can't use uppercase AS markup.
That is, you can take "The _italic_ word" and mark it up as "The ITALIC word", as was the very original tradition, but that fails when you try to mark up "The _italic_ CAPITAL word".
On Wed, Oct 12, 2011 at 3:29 AM, Keith J. Schultz <schultzk@uni-trier.de> wrote:
Excuse Me, but are UPPER CASE letters actually formatting that need be marked up.
On Tue, October 11, 2011 2:39 pm, James Adcock wrote:
...It quickly became apparent that at least emphasis needed to be indicated and so it was decided that italicized text would be indicated in UPPER CASE. Unfortunately, people began to discover that there were books which contained upper case text which was not intended as emphasis...

I should probably have sent my reply in an HTML email so my message would be unambiguous. Let me try again. At the dawn of Project Gutenberg, during a period lost to the mists of time, if someone encountered "The *italic* word" in a book, they would have typed "The ITALIC word" into the digital version. Eventually it became obvious that this was a bad way to mark up italics because under that scheme there was no unambiguous way to mark up "The *italic* CAPITAL word". Unambiguous encoding is a goal of many of the people here, so this was resolved by moving to other forms of markup, including using underscores to represent italics. On Wed, Oct 12, 2011 at 6:12 PM, Keith J. Schultz <schultzk@uni-trier.de> wrote:
Sparr,
ALLOW, me to explain THAT Upper Case can be used for emphasis, yet it requires no mark up! YET, emphasis need not always have to be marked up as italic. _Also, the use of underscore can be used for other stylistic means. _Yet, semantics and style vary from author to author.
regards Keith.
P.S. nothing italic in the above!! ;-)))
On 12.10.2011 at 23:39, Sparr wrote:
They don't require markup. But if they exist, then you can't use uppercase AS markup.
That is, you can take "The _italic_ word" and mark it up as "The ITALIC word", as was the very original tradition, but that fails when you try to mark up "The _italic_ CAPITAL word".
On Wed, Oct 12, 2011 at 3:29 AM, Keith J. Schultz <schultzk@uni-trier.de> wrote:
Excuse Me, but are UPPER CASE letters actually formatting that need be marked up.
On Tue, October 11, 2011 2:39 pm, James Adcock wrote:
...It quickly became apparent that at least emphasis needed to be indicated and so it was decided that italicized text would be indicated in UPPER CASE. Unfortunately, people began to discover that there were books which contained upper case text which was not intended as emphasis...
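The ambiguity Sparr describes can be shown in a few lines. This is only a sketch: the run representation and the function names are hypothetical, not any format PG actually used, and the underscore encoder assumes the text itself contains no literal underscores.

```python
# A source line is modelled as a list of (text, is_italic) runs.
Run = tuple[str, bool]


def encode_uppercase(runs: list[Run]) -> str:
    """The very original convention: italics become UPPER CASE."""
    return "".join(text.upper() if italic else text for text, italic in runs)


def encode_underscore(runs: list[Run]) -> str:
    """The later convention: italics are wrapped in underscores."""
    return "".join(f"_{text}_" if italic else text for text, italic in runs)


# Two *different* printed lines: in the first CAPITAL is plain upper case,
# in the second "capital" is a lower-case word set in italics.
plain_capital = [("The ", False), ("italic", True), (" CAPITAL word", False)]
italic_capital = [("The ", False), ("italic", True), (" ", False),
                  ("capital", True), (" word", False)]

if __name__ == "__main__":
    # Upper-case encoding collapses both lines to the same string,
    # so the original cannot be recovered from it.
    print(encode_uppercase(plain_capital))
    print(encode_uppercase(italic_capital))
    # Underscore encoding keeps them apart.
    print(encode_underscore(plain_capital))
    print(encode_underscore(italic_capital))
```

Because two distinct inputs map to one output, no decoder for the upper-case scheme can be correct for both, which is the sense in which the encoding is not unambiguous.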

ALLOW, me to explain THAT Upper Case can be used for emphasis, yet it requires no mark up! YET, emphasis need not always have to be marked up as italic. _Also, the use of underscore can be used for other stylistic means. _Yet, semantics and style vary from author to author.
I think if one goes back to typographic usage in previous centuries one finds that use of italic vs. non-italic is intended as *contrastive* not *emphasis* -- that the notion of "*emphasis*" is yet-another html design mistake. For example a brief non-italic section might be found within an italic section to show *contrast* -- it certainly wasn't intended that the non-italic section be UN-"emphasized" -- nor was the long-italic section intended to imply the entire section was to be *emphasized* -- rather the long italic section was simply intended to be *contrastive.* Sometimes when contrast was printed it *was* intended to be read as emphasis, but that reading is supplied in context by the reader not by the typography. As often the italic is simply intended to represent *some* kind of *difference.* Only recently have we found the desire of writers to make everything explicitly *emphatic* for the "sake of the reader" -- with html unfortunately only too happy to oblige!

Hi Jim, the philosophical discussion of what emphasis is or what is contrastive is quite futile, here. Since you are so fond of history: I believe the \emph command in TeX predates HTML. regards Keith On 14.10.2011 at 01:15, Jim Adcock wrote:
ALLOW, me to explain THAT Upper Case can be used for emphasis, yet it requires no mark up! YET, emphasis need not always have to be marked up as italic. _Also, the use of underscore can be used for other stylistic means. _Yet, semantics and style vary from author to author.
I think if one goes back to typographic usage in previous centuries one finds that use of italic vs. non-italic is intended as *contrastive* not *emphasis* -- that the notion of "*emphasis*" is yet-another html design mistake. For example a brief non-italic section might be found within an italic section to show *contrast* -- it certainly wasn't intended that the non-italic section be UN-"emphasized" -- nor was the long-italic section intended to imply the entire section was to be *emphasized* -- rather the long italic section was simply intended to be *contrastive.* Sometimes when contrast was printed it *was* intended to be read as emphasis, but that reading is supplied in context by the reader not by the typography. As often the italic is simply intended to represent *some* kind of *difference.* Only recently have we found the desire of writers to make everything explicitly *emphatic* for the "sake of the reader" -- with html unfortunately only too happy to oblige!

Read carefully; I'm agreeing with you almost completely, I'm just exploring some of the subtleties. On Thu, October 13, 2011 5:15 pm, Jim Adcock wrote:
I think if one goes back to typographic usage in previous centuries one finds that use of italic vs. non-italic is intended as *contrastive* not *emphasis*
I think you're probably right. I can see old Johann himself being confronted by one of his typesetters saying "our client wants this text differentiated from its surrounding gothic text, how do we do it", and Johann replying "just use those new leads we got from Aldine Press in Italy".
-- that the notion of "*emphasis*" is yet-another html design mistake.
Well, whether it is a mistake is by no means a consensus opinion, but I tend to agree with you. HTML 2.0 was primarily presentational markup; it was used to indicate how things should look. As HTML has evolved, its "owners" realized that it would be more powerful if it were semantic instead, indicating what linguistic meaning a phrase had irrespective of presentation. I suspect there was some sort of knee-jerk reaction where someone said "<i> is presentational, we need semantic markup, italics are used for emphasis, let's replace <i> with <em>." If you look closely at this last sentence, you will see a syllogistic fallacy akin to "All men are mortal, Socrates is mortal, therefore, all men are Socrates."
For example a brief non-italic section might be found within an italic section to show *contrast* -- it certainly wasn't intended that the non-italic section be UN-"emphasized" -- nor was the long-italic section intended to imply the entire section was to be *emphasized* -- rather the long italic section was simply intended to be *contrastive.*
This is a fairly cogent explanation of the issue. I have no problem with the <em> tag, but I /do/ have a problem with the notion that it is a replacement for <i>, and I think that <i> should /not/ be deprecated. Italics are frequently used in text to indicate a character's thought process; and while it's possible to think /emphatically/, not all thoughts are emphatic, nor are all things emphasized thoughts. The distinction between italics and emphasis, however, /is/ useful. The important thing to remember about HTML in particular, and XML in general, is that it is not designed to be used by humans, it is designed to be used by computers; and one of those uses is in synthetically generated speech. A text-to-speech engine should make no distinction between italicized and non-italicized text when the italics are used to indicate the name of a ship, but /should/ add stress when the italics are used to indicate emphasis.
Sometimes when contrast was printed it *was* intended to be read as emphasis, but that reading is supplied in context by the reader not by the typography.
True, but it's important to recognize that the typographical contrast was required for the reader to recognize that a contextual interpretation was necessary. At that point a relatively well trained human is required to indicate what interpretation is appropriate. When no human is available (as when you have a text-to-speech engine) things start to get messy. Maybe Watson could intuit context from bare contrast, but for the foreseeable future I don't see how you can proceed without a human in the loop. But semantic markup allows a human to indicate the interpretation concretely before the computer gets involved. Therefore, I think that "The <i>Queen Mary</i> sailed <em>last</em> night!" is both appropriate and desirable markup. (As an exercise for the reader, how would one mark up the foregoing text if it were thought rather than said?)
As often the italic is simply intended to represent *some* kind of *difference.* Only recently have we found the desire of writers to make everything explicitly *emphatic* for the "sake of the reader" -- with html unfortunately only too happy to oblige!
Certainly this happens, but I don't think HTML is at fault; it's abuse of HTML that is at fault. Much of the blame goes to the users, who try to make HTML behave like PDF, trying to specify exactly how they want the document to look without regard to what the document /is/, but some of the blame goes to the W3C whose advice is "use <em> instead of <i>", when in fact it should be "use <em> for emphasis and <i> to contrast a span of text from the surrounding text." The solution to the problem is not to avoid HTML--the same problems will arise with any markup language. The solution is to teach people how to use HTML correctly. HTML for e-books is actually fairly simple to learn and use if people would just overcome the knee-jerk reaction against it that seems to be deeply ingrained in a lot of people.
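As a rough illustration of the text-to-speech point above, here is a minimal Python sketch. It assumes a hypothetical speech front end that understands a made-up "[stress]" marker; the marker and the names StressMarker and to_speech_text are illustrative only and are not part of any real TTS engine. The idea is simply that <em> changes what gets spoken while <i> does not.

```python
from html.parser import HTMLParser


class StressMarker(HTMLParser):
    """Collect text, wrapping only <em> content in a (hypothetical) stress marker."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # <i> is deliberately ignored: it marks contrast, not spoken stress.
        if tag == "em":
            self.parts.append("[stress]")

    def handle_endtag(self, tag):
        if tag == "em":
            self.parts.append("[/stress]")

    def handle_data(self, data):
        self.parts.append(data)


def to_speech_text(html):
    """Return the text a speech engine would receive, with stress markers."""
    parser = StressMarker()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


sample = "The <i>Queen Mary</i> sailed <em>last</em> night!"
print(to_speech_text(sample))
```

With only <i> or only <em> available, the ship's name and the stressed word would be indistinguishable to such a tool, which is the argument for keeping both.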

On Fri, Oct 14, 2011 at 2:59 PM, Lee Passey <lee@novomail.net> wrote:
... the W3C whose advice is "use <em> instead of <i>", when in fact it should be "use <em> for emphasis and <i> to contrast a span of text from the surrounding text."
I think they actually recommend using CSS where <em> isn't appropriate.

On 10/14/2011 3:32 PM, Scott Olson wrote:
On Fri, Oct 14, 2011 at 2:59 PM, Lee Passey <lee@novomail.net> wrote:
... the W3C whose advice is "use <em> instead of <i>", when in fact it should be "use <em> for emphasis and <i> to contrast a span of text from the surrounding text."
I think they actually recommend using CSS where <em> isn't appropriate.
Well, I think that's what they /should/ recommend, but I don't think that's the message that's getting out. And the messages that /are/ getting out are mixed. For example, the CSS 2.1 specification notes: "CSS gives so much power to the "class" attribute, that authors could conceivably design their own "document language" based on elements with almost no associated presentation (such as DIV and SPAN in HTML) and assigning style information through the "class" attribute. Authors should avoid this practice since the structural elements of a document language often have recognized and accepted meanings and author-defined classes may not." In other words, don't use <span> or <div> where some other element already covers it. Or perhaps more explicitly, don't use <span class="italic"> when you could use <i>. I guess it could be argued that since <i> is deprecated the foregoing note doesn't apply, but that's by no means clear. Another example: Over the course of the past year I've been fighting with Adobe's RoboHelp to create an HTML-based help system. In the WYSIAWYG editor Adobe has created, when you select a span of text and select "italicize" the HTML code gets created with <em>, not <span class="italic">. The same is true of bold, which is automatically encoded using <strong>. If non-emphasized font changes are supposed to be marked up with <span> and not <em> or <strong>, it seems that Adobe didn't get the memo. So, by show of hands: how many here believe that you should always use <em> when you want an italicized presentation? Now I'm not saying you're wrong -- in fact, I think you're right. But there's a whole lot of confusion about this which is very pervasive, and a lot more clear public statements from the w3c are necessary.

Hi All,
Lee states some interesting facts and thoughts:
1) there is no true consensus
2) semantic markup (em or \emph) is there to show emphasis, not italics
3) if em is used and its output is italic, there is no visible difference
4) the distinction is made purely by the reader or speaker.
So what we are left with is that the discussion is philosophical. YET the actual problem is not truly seen. We have two problems here:
1) production of a text
2) converting an already printed text to a marked-up electronic version
During 1 it is better to use a command like emphasis so that when the end product is released there is a consistent use of emphasis throughout the text. In 2, however, we can ONLY assume that emphasis was intended by the author/producer/publisher! It is inappropriate to use the semantic markup of emphasis unless we intend to impose an interpretation on the text.
Last but not least, if an emphasis command is implemented properly, there should be other typographical changes to the emphasized text so that it contrasts more with its surroundings and stands out! In other words, you should be able to see the difference between italics and emphasis.
I will reiterate once more: emphasis is not an HTML design mistake (though maybe its implementation is). TeX had the \emph command before HTML. It can be redefined (though you should not) to suit the author's needs and likes.
regards Keith.
On 14.10.2011 at 22:59, Lee Passey wrote:
Read carefully; I'm agreeing with you almost completely, I'm just exploring some of the subtleties.
On Thu, October 13, 2011 5:15 pm, Jim Adcock wrote:
I think if one goes back to typographic usage in previous centuries one finds that use of italic vs. non-italic is intended as *contrastive* not *emphasis*
I think you're probably right. I can see old Johann himself being confronted by one of his typesetters saying "our client wants this text differentiated from its surrounding gothic text, how do we do it", and Johann replying "just use those new leads we got from Aldine Press in Italy".
-- that the notion of "*emphasis*" is yet-another html design mistake.
Well, whether it is a mistake is by no means a consensus opinion, but I tend to agree with you. HTML 2.0 was primarily presentational markup; it was used to indicate how things should look. As HTML has evolved, its "owners" realized that it would be more powerful if it were semantic instead, indicating what linguistic meaning a phrase had irrespective of presentation. I suspect there was some sort of knee-jerk reaction where someone said "<i> is presentational, we need semantic markup, italics are used for emphasis, let's replace <i> with <em>." If you look closely at this last sentence, you will see a syllogistic fallacy akin to "All men are mortal, Socrates is mortal, therefore, all men are Socrates."
_______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/mailman/listinfo/gutvol-d

I mention this because I think there are some on this list who are probably "privacy freaks" who would rather NOT have Amazon keep a copy of every personal document you send to your Kindle via Amazon. This is a "preference item" you can choose to turn OFF as described below: === ...You can control these new features from the Manage Your Kindle page at www.amazon.com/manageyourkindle where you can see a list of your archived documents, re-deliver documents to your Kindle, delete any document from archive, or even turn off archiving for your account. Learn more about the Kindle Personal Documents Service from our help pages at www.amazon.com/kindlepersonaldocuments.

Excuse Me, but are UPPER CASE letters actually formatting that need be marked up...
...We have "BEEN THERE AND BACK AGAIN"
I suggest: http://tinyurl.com/3vozpku The practice of typography: modern methods of book composition By Theodore Low De Vinne, 1904 as a reminder that previous generations of authors, editors, and typesetters have all gone through these issues before, and had worked out among themselves what the minimal set of tools and markups were which were necessary to express the will of authors in a clear, clean and artistic manner. "We" have indeed been there before -- but it is not at all clear to me that the current generation of us "typesetters" creating e-books have nearly gotten "back there again." PS: Even in 1904 it was well recognized that there was lots of trash typesetting and book publishing "style" that one ought not feel obligated to emulate! Conversely what was good style in 1904 is still good style now.

On Thu, 13 Oct 2011 14:37:03 -0700, Jim Adcock wrote:
I suggest:
The practice of typography: modern methods of book composition By Theodore Low De Vinne, 1904
Now, that looks like a book I'd love to work on at DP. Are there scans of it available? It is in the public domain in the USA, isn't it? Regards, Walter

On Fri, Oct 14, 2011 at 9:49 AM, Walter van Holst <walter.van.holst@xs4all.nl> wrote:
On Thu, 13 Oct 2011 14:37:03 -0700, Jim Adcock wrote:
I suggest:
The practice of typography: modern methods of book composition By Theodore Low De Vinne, 1904
Now, that looks like a book I'd love to work on at DP. Are there scans of it available? It is in the public domain in the USA, isn't it?
The Internet Archive has 3 copies from 3 different sources. -- André Engels, andreengels@gmail.com

"Walter" == Walter van Holst <walter.van.holst@xs4all.nl> writes:
Walter> On Thu, 13 Oct 2011 14:37:03 -0700, Jim Adcock wrote:
>> I suggest:
>> http://tinyurl.com/3vozpku
>> The practice of typography: modern methods of book composition
>> By Theodore Low De Vinne, 1904
Walter> Now, that looks like a book I'd love to work on at DP. Are there scans of it available? It is in the public domain in the USA, isn't it?
There are better images at the Internet Archive, but the book was already cleared not long ago (otherwise I would have asked for clearance myself). Carlo

Hi Jim, Thanks for the link! But my 'We have "Been there and back again"' was directed at the whole discussion here on the list! It is a quote I have used before in similar discussions. regards Keith. On 13.10.2011 at 23:37, Jim Adcock wrote:
Excuse Me, but are UPPER CASE letters actually formatting that need be marked up...
...We have "BEEN THERE AND BACK AGAIN"
I suggest:
The practice of typography: modern methods of book composition By Theodore Low De Vinne, 1904
as a reminder that previous generations of authors, editors, and typesetters have all gone through these issues before, and had worked out among themselves what the minimal set of tools and markups were which were necessary to express the will of authors in a clear, clean and artistic manner. "We" have indeed been there before -- but it is not at all clear to me that the current generation of us "typesetters" creating e-books have nearly gotten "back there again."
PS:
Even in 1904 it was well recognized that there was lots of trash typesetting and book publishing "style" that one ought not feel obligated to emulate! Conversely what was good style in 1904 is still good style now.

Well, I guess it comes down to what one semantically considers to be emphasis! At least we can now discuss whether we need semantic markup or just "syntactic" markup, that is, whether a feature should be marked as italic, UPPER CASE, or bold, or as emphasized. My opinion is that we just need the "syntactic" markup. Leave it up to the reader or producer of the e-book to change the form to whatever s/he might consider appropriate! regards Keith On 11.10.2011 at 22:39, James Adcock wrote:
...It quickly became apparent that at least emphasis needed to be indicated and so it was decided that italicized text would be indicated in UPPER CASE. Unfortunately, people began to discover that there were books which contained upper case text which was not intended as emphasis...
There are also books which contain italics which are not intended as emphasis, but, oh well....

Hi Lee, On 11.10.2011 at 18:41, Lee Passey wrote: [snip cite from Marcello]
Now Marcello, there is no need to resort to ad hominem. The fact of the matter is that even if BowerBird had consistently behaved in an exemplary and respectful fashion, z.m.l. is still inadequate as a markup language.
Part of the problem is that we have no real specification of what z.m.l. /is/. BB has provided tons of "examples," (some of which are now inconsistent as apparently the language is still evolving) but no definitive declaration of what the language allows, and disallows, and how the elements are to be used. This is why I have dubbed z.m.l., SML -- Spousal Markup Language: there are rules, but you have to figure them out on your own, and they are subject to change on a whim.
I have read the specs for TEI, and I have also read how text is supposed to be marked up with TEI. Talk about "Spousal Markup" and inconsistencies. TEI is far too bloated to be useful.
The original PG philosophy was that the text was the only thing that mattered, and all markup was superfluous. It quickly became apparent that at least emphasis needed to be indicated and so it was decided that italicized text would be indicated in UPPER CASE. Unfortunately, people began to discover that there were books which contained upper case text which was not intended as emphasis, so the /new/ standard became to use underscores to indicate italicization (only those of us old enough to have learned to type on typewriters will recall that the mechanical convention of typing was to underline what would otherwise be italicized).
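To make the plain-text conventions described above concrete, here is a small Python sketch of how a tool might recover italics from the underscore convention. The function name and the regular expression are assumptions made for illustration, not PG's or DP's actual tooling, and the pattern is deliberately naive (it would also match underscores used for other purposes).

```python
import re

# Naive pattern: an underscore, some text with no underscore or newline, an underscore.
ITALIC = re.compile(r"_([^_\n]+)_")


def underscores_to_italics(text):
    """Rewrite _spans_ as <i>spans</i>; everything else is left untouched."""
    return ITALIC.sub(r"<i>\1</i>", text)


print(underscores_to_italics("The _Queen Mary_ sailed LAST night!"))
```

Note that the UPPER CASE word passes through unchanged: a program cannot tell whether "LAST" was italic in the source, emphasized, or simply printed in capitals, which is the ambiguity that led to the underscore convention.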
There are a number of constructs in other markup languages which z.m.l. does not support. BowerBird's response is that support for those constructs is unnecessary as e-books simply do not require them. This is, of course, the same argument as the one that /all/ markup is unnecessary, the line is simply drawn in a different place, and BowerBird becomes the ultimate arbiter of what is, and is not, needed in e-books.
It's hard to know what markup will be necessary to preserve any specific work of literature. Thus, what is really needed is an eXtensible Markup Language, such as TEI, which captures everything we know about now, and can be extended when we encounter something new. z.m.l. fails on both these counts.
I do agree that some markup is required and a decent spec is needed. But there is a problem with an extensible language: more often than not it gives the user several avenues for extending the language to cover any particular feature. This causes inconsistencies in the usage of the markup, and different users end up without a consistent way of marking up said features. I would propose a modular language that is extended when the need arises, with only one way to represent a particular feature. The modular approach has the advantage that the tool chain can be modular in design: it need not be rewritten to support a new "feature", but just needs a new module to handle it. Furthermore, the master format should preserve the original structure of the text while not being so restrictive as to not allow the restructuring of the text. That is, these markup elements can be ignored or interpreted differently in order to support another output format. regards Keith.
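One way to picture the modular tool chain proposed above is a registry that allows exactly one handler per feature. The Python sketch below is only an illustration under that assumption; the feature names, the HANDLERS registry and the HTML output are invented for the example and do not describe any existing PG or DP tool.

```python
HANDLERS = {}  # feature name -> the single function that renders it


def register(feature):
    """Decorator that adds a handler module and refuses a second one for the same feature."""
    def decorator(func):
        if feature in HANDLERS:
            raise ValueError(f"feature {feature!r} already has a handler")
        HANDLERS[feature] = func
        return func
    return decorator


@register("italic")
def render_italic(text):
    return f"<i>{text}</i>"


@register("smallcaps")
def render_smallcaps(text):
    return f'<span class="smallcaps">{text}</span>'


def render(feature, text):
    """Render a marked-up span; features without a handler fall back to plain text."""
    handler = HANDLERS.get(feature)
    return handler(text) if handler else text


print(render("italic", "Queen Mary"))
print(render("gesperrt", "unknown feature"))
```

Supporting a new feature then means registering one more function rather than rewriting the renderer, and an output format that does not care about a feature can simply leave it unregistered.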

"Keith J. Schultz" <schultzk@uni-trier.de> writes:
TEI is far too bloated to be useful.
TEI is not bloated; its features are there if you need them. But you can easily start with a subset, which is simpler than most of the HTML the DP people are used to doing. Start with basic sectioning (<div>, <head>) and block "tags" (<p>) and submit your "pages" to a revision control system (svn, git) and you are done. Basic inline tags are not that difficult either... Somebody else will check it out and do the rest. Marcello already wrote a tutorial that's probably easier and faster to read than the (evolving) DP Guidelines.
Furthermore, the master format should preserve the original structure of the text while not being so restrictive to not allow the restructuring of the text. That is these mark up elements can be ignored or interpreted differently in order to support another output format.
Not sure what actually you are talking about, but with TEI you can perfectly preserve the original structure and even the layout of a page, and also add semantical or grammatical markup, if wanted. Later on, the processing software will ignore the markup you are not interested in. -- Karl Eichwalder
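The claim that processing software can ignore markup it is not interested in can be sketched in a few lines of Python. The fragment below uses the element names mentioned in this thread (<div>, <head>, <p>) plus the TEI inline elements <hi> and <emph>; it is a simplified assumption, since a real TEI file also has a teiHeader and an XML namespace that this sketch leaves out.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<div>
<head>Chapter I</head>
<p>The <hi rend="italic">Queen Mary</hi> sailed <emph>last</emph> night!</p>
</div>"""


def to_plain_text(tei):
    """Keep headings and paragraphs; silently drop all inline markup."""
    root = ET.fromstring(tei)
    blocks = []
    for elem in root.iter():
        if elem.tag in ("head", "p"):
            # itertext() yields the text of the element and of every child,
            # so <hi> and <emph> are ignored without losing their content.
            blocks.append("".join(elem.itertext()).strip())
    return "\n\n".join(blocks)


print(to_plain_text(SAMPLE))
```

A second converter could read the same master file and honour <hi> and <emph>, which is the sense in which one master format can feed several output formats.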

Marcello already wrote a tutorial that's probably easier and faster to read than the (evolving) DP Guidelines.
Do you have a link to this? Alex On Wed, Oct 12, 2011 at 1:50 PM, Karl Eichwalder <ke@gnu.franken.de> wrote:
"Keith J. Schultz" <schultzk@uni-trier.de> writes:
TEI is far too bloated to be useful.
TEI is not bloated; its features are there if you need them. But you can easily start with a subset, which is simpler than most of the HTML the DP people are used to doing.
Start with basic sectioning (<div>, <head>) and block "tags" (<p>) and submit your "pages" to a revision control system (svn, git) and you are done. Basic inline tags are not that difficult either... Somebody else will check it out and do the rest.
Marcello already wrote a tutorial that's probably easier and faster to read than the (evolving) DP Guidelines.
Furthermore, the master format should preserve the original structure of the text while not being so restrictive to not allow the restructuring of the text. That is these mark up elements can be ignored or interpreted differently in order to support another output format.
Not sure what actually you are talking about, but with TEI you can perfectly preserve the original structure and even the layout of a page, and also add semantical or grammatical markup, if wanted. Later on, the processing software will ignore the markup you are not interested in.
-- Karl Eichwalder

Alex Buie <abuie@kwdservices.com> writes:
Marcello already wrote a tutorial that's probably easier and faster to read than the (evolving) DP Guidelines.
Do you have a link to this?
I think it is here: http://pgtei.pglaf.org/marcello/0.4/doc/20000-h.html -- it covers more than just the basics. As a proofer/formatter you can ignore the discussions about the teiHeader and the processing tools. -- Karl Eichwalder

On Tue, October 11, 2011 3:39 am, Marcello Perathoner wrote: [snip]
I'm interested in and willing to offer all technical support I'm capable of to a new DP built around these guidelines:
OK Marcello, I'm with you. How do we start?
1. Use one master format for every book. (There will be a small set of master formats to choose from.)
Makes sense to me. Of course, we need to pick master formats that allow for the preservation of as much semantic "goodness" as possible. And we will all need to be willing to compromise. There are some things I'm adamant about (only true paragraphs can be called paragraphs) and other things I can compromise on (name of classes, use of spans, etc.). And as we compromise, I think we should favor solutions that preserve data over those that omit it, even if preservation is somewhat harder.
2. Minimize formatting. Make books that are usable across a wide variety of devices, not books that look exactly like the paper edition.
I agree 100%. Using HTML as an example, all styling should be restricted to external style sheets, and the document should be marked so that it will look acceptable in the majority of HTML user agents (aka browsers) even if the style sheet is lost.
3. Use a resource control system (like git) for posting and maintenance. PG will host the master repository and the public can pull from it. A group of `committers´ can push. Every committer can have his own group of aides and pull from them.
I have never used git, so I cannot comment on it directly. I tend to prefer CVS as it stores the files on the server in the file system rather than in a database, which makes it simpler for me to do back-end compilation, evaluation, composition, etc. Git (or subversion which I do not like at all) have the advantage of being accessible via HTTP or FTP, as well as a proprietary port and protocol.
4. Use already scanned material: IA, Google, Gallica etc.
I don't think this is absolutely necessary. What I /do/ think is absolutely necessary is that the scanned material must be publicly available. So, if someone wants to work on a book from scans that are /not/ publicly available, it should be required that the person get those scans into a public archive first.
5. Important works first. Don't bother with those embarrassing amateurish works DP turns out by the hundreds.
I agree in principle, but think this could be a very difficult position to enforce. I also think that a lot of these dime novels are being churned out because the more important works have already been "done," however amateurishly. I think if we abandoned the notion of a work being "done," and let people work on whatever excites them, even if it's something that has existed in the corpus for decades, attention will return to the important works naturally.
6. Accept unicode only.
Yes, using utf-8 encoding. About a year ago I was able to snag the domain name "ebookcoop.net." I would be willing to donate the name to the cause if we can get an ip address to attach it to. I have been very impressed with the work that Mr. Frank has done at fadedpage.net. Is this a project we could leverage? Roger, would you like to join us?
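For point 6, a submission check could be as small as the following Python sketch. The function name is hypothetical and the check only verifies that the bytes decode as UTF-8; it says nothing about whether the right characters were used.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the raw bytes of a submitted file decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8("André".encode("utf-8")))    # bytes written as UTF-8
print(is_valid_utf8("André".encode("latin-1")))  # same name saved as Latin-1
```

A check like this would catch a Latin-1 file before it entered the repository, instead of leaving the encoding to be guessed later.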

Lee wrote:
I have been very impressed with the work that Mr. Frank has done at fadedpage.net. Is this a project we could leverage? Roger, would you like to join us?
Lee, you must mean the fadedpage.net of old, when it was a site to explore ebook production. That employed several advanced features that users found useful and could have been a prototype for a real site, but I coded it myself and ran it on a non-commercial, personal server. Neither the quality of the code nor the bandwidth to the hardware were scalable, and it wasn't my intention to run a full production site anyway. At the time, I thought some of what I learned might be useful to DP. I thought the features that really made a difference might be picked up by DP and possibly integrated. I don't expect that to happen. That's the old site. The current fadedpage.net is a test site with exactly two users so I'm guessing it's not the one you were referring to. Also, it is dedicated to the books that some deem not worth producing. I disagree with the proposed restriction to only "important books". The books I'd like to save on fadedpage are the ones written for young readers: the boys' and girls' series books that shaped the minds of young people before television--reading about "Automobile Boys" or "Radio Girls"--or the school stories. Not great literature, but that doesn't mean it's not an important part of our history. I think those other "important books" will be preserved just fine without my help. I'd like to take some small part of America's reading experience and present it to the modern reader in a better way than I've seen so far. Perhaps because I'm a high-school teacher, the series books and the school stories appeal to me. They are on the opposite end of the spectrum from the books proposed for the "start over" site. Finally, I note that a technical proposal was made for a new site. Even if I agreed with all the technical points, there is what feels like a showstopper to me that's equally if not more important. Running a cooperative workflow of volunteers involves motivating people. 
A leader of such an effort can't just tell them what to do, or sometimes even how to do it. They need to gain their good will, evoke their interests, stimulate their creativity, and communicate with them in an open manner. Who is going to do that in the proposed organization? I see the shortcomings of DP as well as anyone; they need not be enumerated here. Should there be other sites? Yes, I think so. Many of them. And though I'm strictly an amateur programmer, perhaps I'll be able to contribute to some site, somewhere. For completely different reasons it won't be DP and it won't be the "start over" site. So, Lee, thanks for asking but I'm not going to be part of the project as proposed. --Roger

On 10/18/2011 06:03 AM, Roger Frank wrote:
Also, it is dedicated to the books that some deem not worth producing. I disagree with the proposed restriction to only "important books". The books I'd like to save on fadedpage are the ones written for young readers: the boys' and girls' series books that shaped the minds of young people before television--reading about "Automobile Boys" or "Radio Girls"--or the school stories. Not great literature, but that doesn't mean it's not an important part of our history.
Why you'd want to preserve a book that takes longer to digitize than it took to write is beyond me. Those cringeworthy books were turned out by the hundreds by faceless writer farms. They read like Crisco recipes. Those bland, insipid books only prepare children for the bland, insipid television shows they will watch contentedly for the rest of their lives. Little boxes, on the hillside, ...
I think those other "important books" will be preserved just fine without my help.
I'm not interested in `preserving´ books. Google and IA already do that *much* better than we do. I'm not interested in long files of books that gather dust in some disused library basement. I'm only interested in books that people actually read. And not crappy books either, because there's no difference, reading a crappy book or watching a crappy show on tv. So while all important books may already be preserved, they are not as easy to get and download as PG books. I want a set of the best books of all kinds and languages to be downloadable to people's pockets with minimal effort and cost. And I want gadget producers and lots of other sites to offer that download.
Running a cooperative workflow of volunteers involves motivating people. A leader of such an effort can't just tell them what to do, or sometimes even how to do it. They need to gain their good will, evoke their interests, stimulate their creativity, and communicate with them in an open manner. Who is going to do that in the proposed organization?
What good is a leader if she *doesn't* tell people what to do ??? Michael's way of (non)-leading has failed spectacularly. PG has collected a big heap of books -- that's true -- but the heap is too inhomogeneous to be of any real use. Nobody uses PG books as they are. Every site I know puts PG books into their own master format before redistributing them. Wikipedia instead, by enforcing strict guidelines -- vilified by many -- in a lot less time, has become omnipresent. It's hard to come across a site that doesn't link to wikipedia. Every advance in technology has caught PG with its pants down. The switch to HTML had to be done by manually redoing all the books, because plain text turned out to be non-processable. The HTML that DP has produced for years and is still producing does not work well on the new portable reading devices. The next advance in technology will leave PG out of the game completely because PG has failed to deploy the least of technological safeguards, eg. a master format that can be made to transform into other unheard-of-today formats -- Marcello Perathoner webmaster@gutenberg.org

The HTML that DP has produced for years and is still producing does not work well on the new portable reading devices.
HTML does have the problem of not being perfectly matched to what we want to do -- it has many features that arguably shouldn't be used in coding books -- but are -- and it is missing other features necessary to do the job well, most notably (to me) its weaknesses in coding poetry. BUT, a more serious problem is that PG "coders" don't agree on what should or shouldn't be coded. You can't create a better standard until there is much better agreement about what should be coded -- and what shouldn't be. Finally, authors and publishers do things in books where it isn't at all clear (to me at least) why they did what they did -- and how do you code those things, other than literally? Not to mention the problem of figuring out how to code that which appears to be coming from an incompetent original typesetter.

On 18.10.2011 at 20:45, Jim Adcock wrote:
HTML does have the problem of not being perfectly matched to what we want to do -- it has many features that arguably shouldn't be used in coding books -- but are -- and it is missing other features necessary to do the job well, most notably (to me) its weaknesses in coding poetry.
HTML actually offers very fine control of the output today. By "do the job well" do you mean ease of use and effort or what individuals produce due to lack of competence!
BUT, a more serious problem is that PG "coders" don't agree on what should or shouldn't be coded. You can't create a better standard until there is much better agreement about what should be coded -- and what shouldn't be.
"PG coders" will never agree on a standard; their wants are far too diverse.
What you need is a group that defines the standard and controls it. This group must be knowledgeable in computer science foremost and have a good understanding of layout and a feeling for literature.
Finally, authors and publishers do things in books where it isn't at all clear (to me at least) why they did what they did -- and how do you code those things, other than literally? Not to mention the problem of figuring out how to code that which appears to be coming from an incompetent original typesetter.
The most important rule when working effectively with a computer system is not to ask why, but to ask what do I do to get the result I want. Like I mentioned, you have to encode as close to the original as possible, without interpretation, and later change or ignore it during further processing. regards Keith.

By "do the job well" do you mean ease of use and effort or what individuals produce due to lack of competence!
By "do the job well" I mean having features that are well-matched to what book coders actually need to code, and when book coders want to produce something that renders "correctly" on an HTML, EPub, and/or Kindle device that rendering well-represents the efforts of the coder, and can do so without the coder having to resort to extraordinary tricks, hacks, and work-arounds to accomplish common book coding tasks. There are quite a number of books out there that intelligently recount the problems with trying to use HTML to code books, as opposed to HTML sites. For EPub see the books by Castro, Deuchler. For Kindle see the book by Tallent.
Like I mentioned you have to encode as close to the original without interpretation, and later change or ignore it during further processing.
Somewhat agreed, but the book coder always has to make *some* decisions to avoid simply slavishly creating a PDF/bitmap photocopy -- which is what Google Books is *already* doing, for better or for worse.

So, what it comes down to is the lack of good tools. regards Keith.
On 19.10.2011 at 19:24, James Adcock wrote:
By "do the job well" do you mean ease of use and effort or what individuals produce due to lack of competence!
By "do the job well" I mean having features that are well-matched to what book coders actually need to code, and when book coders want to produce something that renders "correctly" on an HTML, EPub, and/or Kindle device that rendering well-represents the efforts of the coder, and can do so without the coder having to resort to extraordinary tricks, hacks, and work-arounds to accomplish common book coding tasks. There are quite a number of books out there that intelligently recount the problems with trying to use HTML to code books, as opposed to HTML sites. For EPub see the books by Castro, Deuchler. For Kindle see the book by Tallent.

By "do the job well" do you mean ease of use and effort or what individuals produce due to lack of competence!
By "do the job well" I mean having features that are well-matched to what book coders actually need to code
So, what it comes down to is the lack of good tools.
Well, I guess a "good tool" could mean, for argument's sake, a portable cross compiler which takes, say, a subset-superset variation on "HTML" which is well-matched to our coding tasks, and which cross-compiles that code to four different flavors [of "HTML"]:
1) "Standard" HTML for HTML desktop browsers,
2) "HTML" for EPub devices,
3) "HTML" for Mobi (for Kindles),
4) and Txt70, for, well, I'm not sure who actually uses Txt70 anymore.
Where #2 actually needs the entire EPub source file set, and #3 is probably best implemented by creating a Mobi-flavored variation of the "EPub" source file set [which Kindlegen can then generate a mobi file from].
Part of the problem is that "HTML" is interpreted pretty differently by the major HTML desktop browsers [to the extent they even agree] than by EPUB devices, which have two major flavors -- Adobe-derived vs. Apple -- and differently again by Mobi devices (which in practice nowadays means Kindle devices).
I am not wedded to "HTML" -- it's just that we [in theory] have good cross compilers for it, people "know it" or at least *think* they know it [for better or for worse!] and there are a zillion tools and books for working with it.
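To make the cross-compiler idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the list of (kind, text) blocks standing in for a master format, the function names, and the markup chosen for each target are assumptions for illustration only, not anything PG or DP actually uses. In particular, giving the Mobi target plain line breaks for verse is just an assumed example of a per-device difference, and a real EPub or Mobi target would need a whole file set rather than one string.

```python
import html
import textwrap

# Hypothetical master format: an ordered list of (kind, text) blocks.
BOOK = [
    ("title", "An Example Book"),
    ("para", "It was a dark & stormy night; the rain fell in torrents, "
             "except at occasional intervals."),
    ("verse", "Tyger Tyger, burning bright,\nIn the forests of the night;"),
]


def render_markup(blocks, flavor):
    """Render blocks as HTML-like markup for one target flavor."""
    out = []
    for kind, text in blocks:
        safe = html.escape(text)  # escape &, <, > and quotes once, up front
        if kind == "title":
            out.append(f"<h1>{safe}</h1>")
        elif kind == "para":
            out.append(f"<p>{safe}</p>")
        elif kind == "verse":
            lines = safe.split("\n")
            if flavor == "mobi":
                # Assumption: this target gets plain line breaks, no CSS classes.
                out.append("<p>" + "<br/>".join(lines) + "</p>")
            else:
                body = "".join(f'<p class="line">{ln}</p>' for ln in lines)
                out.append(f'<div class="poem">{body}</div>')
        else:
            raise ValueError(f"unknown block kind: {kind}")
    return "\n".join(out)


def render_txt70(blocks):
    """Render blocks as plain text wrapped at 70 columns."""
    out = []
    for kind, text in blocks:
        if kind == "title":
            out.append(text.upper())
        elif kind == "para":
            out.append(textwrap.fill(text, width=70))
        elif kind == "verse":
            # Verse keeps its own line breaks and is indented, not rewrapped.
            out.append("\n".join("    " + ln for ln in text.split("\n")))
        else:
            raise ValueError(f"unknown block kind: {kind}")
    return "\n\n".join(out)


def compile_all(blocks):
    """Produce all four targets from the single master source."""
    return {
        "html": render_markup(blocks, "html"),
        "epub": render_markup(blocks, "epub"),
        "mobi": render_markup(blocks, "mobi"),
        "txt70": render_txt70(blocks),
    }


if __name__ == "__main__":
    for name, text in compile_all(BOOK).items():
        print(f"--- {name} ---")
        print(text)
```

The point of the sketch is only that the coder marks a passage as verse once, and each target decides how to present it, so device quirks live in the renderers instead of in thousands of individual book files.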

On 10/18/11 11:51 AM, Marcello Perathoner wrote:
Why you'd want to preserve a book that takes longer to digitize than it took to write is beyond me. Those cringeworthy books were turned out by the hundreds by faceless writer farms. They read like Crisco recipes.
As long as no one is forcing you to spend your precious time on them, what is wrong with others spending theirs? It doesn't detract any value from the books you enjoy. Or the books I enjoy, which are probably an entirely different subset of all books again. Regards, Walter

On 10/18/2011 09:49 PM, Walter van Holst wrote:
On 10/18/11 11:51 AM, Marcello Perathoner wrote:
Why you'd want to preserve a book that takes longer to digitize than it took to write is beyond me. Those cringeworthy books were turned out by the hundreds by faceless writer farms. They read like Crisco recipes.
As long as no one is forcing you to spend your precious time on them, what is wrong with others spending theirs? It doesn't detract any value from the books you enjoy. Or the books I enjoy, which are probably an entirely different subset of all books again.
If DP's queues were empty and people eager to get more things to PP, I'd concur with you. As it is, I must disagree. -- Marcello Perathoner webmaster@gutenberg.org

It would be amazing if you could provide a pack for kindle on your website. For someone who's good with computers this is not such a big deal, however, the average user would value this a lot.
Help request that just came in. Owning devices that can carry thousands of books makes people change their attitude. The weight is the same if you carry 1 or 1,000. -- Marcello Perathoner webmaster@gutenberg.org

It would be amazing if you could provide a pack for kindle on your website. For someone who's good with computers this is not such a big deal, however, the average user would value this a lot.
freekindlebooks.org has made an effort to identify the books that would make such a good book "pack." Feel free to crib those efforts. The freekindlebooks list is the classic books and authors most often read by US audiences. OR, you could just crib on the "Magic Catalog" effort to include a list of best books which could be downloaded "on demand" simply by clicking on a book title from within a Kindle (or an advanced EPub device -- to the extent a particular EPub device supports dynamic downloading of books for free)

On Tue, Oct 18, 2011 at 4:58 PM, Marcello Perathoner <marcello@perathoner.de> wrote:
If DP's queues were empty and people eager to get more things to PP, I'd concur with you. As it is, I must disagree.
They are eager to get more Campfire Girls to PP. Looking at what has been waiting for a PPer for over a year, I see that Immanuel Kant (in German) has been waiting for over three years, as has Alexandre Dumas (in Finnish). Looking at English, The complete works of Richard Crashaw (vol 1 of 2) has been waiting over two years, Translations from the German, Vol. III - Musæus, Tieck, Richter has been waiting just short of two years, Minor poets of the Caroline period, Vol. III, Poems of James Russell Lowell, and Seven Plays of Lady Gregory have all been waiting over a year and a half. The oldest EASY Juvenile in the queue is less than a month old. -- Kie ekzistas vivo, ekzistas espero.

On 10/18/2011 11:18 PM, David Starner wrote:
They are eager to get more Campfire Girls to PP.
But of course! If, at a children's party, you put on the table carrot sticks and chocolate, milk and coke, the coke and the chocolate will be gone as sure as the carrots and milk will be left. -- Marcello Perathoner webmaster@gutenberg.org

On Thu, Oct 20, 2011 at 9:25 AM, Marcello Perathoner <marcello@perathoner.de> wrote:
But of course! If, at a children's party, you put on the table carrot sticks and chocolate, milk and coke, the coke and the chocolate will be gone as sure as the carrots and milk will be left.
If you want a top-down organization that proofs what you think worthy, rather than a bottom-up outfit that proofs what people want to proof and read, then your goals would seem a better fit with academia: an authority with the power to decide on a canon and to coerce people working for pay or academic whuffie to prepare the texts. Though I might add that DP manages to tackle a surprising number of works that are indisputably worthy and not at all fun to proof -- like the Baburnama, or the Kashf-al-Mahjub, or Haeckel's Report on the Radiolaria. (I think all of those might be held up in PP; we've just about wrassled P3 to the ground, next are F2 and PP). -- Karen Lofstrom

On Thu, Oct 20, 2011 at 3:25 PM, Marcello Perathoner <marcello@perathoner.de> wrote:
On 10/18/2011 11:18 PM, David Starner wrote:
They are eager to get more Campfire Girls to PP.
But of course! If, at a children's party, you put on the table carrot sticks and chocolate, milk and coke, the coke and the chocolate will be gone as sure as the carrots and milk will be left.
Volunteers are adults. It doesn't mean they eat any better, but it does mean they can tell you where to shove your carrot sticks as they leave to go volunteer elsewhere. -- Kie ekzistas vivo, ekzistas espero.

On 10/20/2011 11:11 PM, David Starner wrote:
Volunteers are adults. It doesn't mean they eat any better, but it does mean they can tell you where to shove your carrot sticks as they leave to go volunteer elsewhere.
So they are adults! Adults can be made to do things if they are told those things are important. But DP chose to tell them that more Campfire Girls are as important as more Kant. Also PPers might start working on difficult stuff if they are not distracted by an endless supply of easy Campfire Girls. -- Marcello Perathoner webmaster@gutenberg.org

On Thu, Oct 20, 2011 at 4:16 PM, Marcello Perathoner <marcello@perathoner.de> wrote:
So they are adults! Adults can be made to do things if they are told those things are important.
*Made* to do things? Maybe if you're going to start paying them. People volunteer their time towards things that interest them, or things that *they* think are important. You can't just force them to only work on what *you* think is important. If my motivation for contributing to DP is the thrill I get from proofreading Campfire Girls stories, what possible reason would I have to stay if you take away the Campfire Girls stories?

On Thu, Oct 20, 2011 at 12:16 PM, Marcello Perathoner <marcello@perathoner.de> wrote:
Also PPers might start working on difficult stuff if they are not distracted by an endless supply of easy Campfire Girls.
PPers as well as proofers work on the sorts of books that they enjoy, or feel competent to do. If you ordered one of the PPers who work on the Not Quite Nancy Drew books to work on the Kashf-al-Mahjub, I suspect you'd have an unhappy PPer and a sub-standard output. I did a lot of the proofing on the Kashf-al-Mahjub. The work was demanding. What I knew of Islamic history and philosophy, as well as Arabic and Persian languages, was absolutely essential. -- Karen Lofstrom

On 10/21/2011 01:26 AM, Karen Lofstrom wrote:
PPers as well as proofers work on the sorts of books that they enjoy, or feel competent to do.
Ain't it the job of DP to increase proofers' competence? Offer some training? Get them interested in new topics? Aren't the proofers interested in actually *learning* something? Yours is only half of the story, the second half. The first half is: Over the years, DP has actively steered away people who were interested in serious literature. With all that obsession about page counts and facsimile-formatting. Also chick lit is using up the processing power and bandwidth that people need to work on serious literature. In consequence people interested in serious literature, whose work was stuck for years in the queue, left in disgust. It is not that "volunteers" in general are interested in chick lit. It is that most volunteers interested in serious literature have left. That's the grave you dug, now go lie in it. -- Marcello Perathoner webmaster@gutenberg.org

On Fri, Oct 21, 2011 at 8:00 AM, Marcello Perathoner <marcello@perathoner.de> wrote:
Yours is only half of the story, the second half. The first half is:
Over the years, DP has actively steered people away that were interested in serious literature. With all that obsession about page counts and facsimile-formatting.
I don't see how obsession about page counts will deter the person interested in working on Russian literature for one second. Nor why such a serious person can't ignore page counts; I've managed to do so.
Also chick lit is using up the processing power and bandwidth that people need to work on serious literature. In consequence people interested in serious literature, whose work was stuck for years in the queue, left in disgust.
Chick lit is not the word you were looking for. In any case, what have we just completed? Looking right now at the list of completed DP works, I see The Poetical Works of Robert Bridges; Blackwood 373 - 1846.11; England, Canada and the Great War; Birds and Man; The Influence of the Organ in History; Notes & Queries 1851.07.26; and Cleopatra's Needle: A History of the London Obelisk among the last 20. So obviously we're doing something other than chick lit. Perhaps you should blame history for using the processing power? In fact, I think that needs repeating. We have a very limited supply of Campfire Girls and friends, they go through quickly and are easy to process. They're having very little impact on the system. Who has the largest English language queues in P3? History. Science. Religion. Fiction, Literature and Poetry currently have no wait to get into P3. Moreover, I'd like to see a real list of "serious literature", this time looking at literature available in English in 1922. Because that's what DP does, mostly: English. And non-English material is in its own queues. Nothing is blocking German literature, but German, and we apparently don't have that many people who like working with Fraktur. -- Kie ekzistas vivo, ekzistas espero.

Also chick lit is using up the processing power....
Hey, I did a little "chick lit" book outside of DP -- at Michael's Request. D.H. Lawrence "The Rainbow" http://www.gutenberg.org/ebooks/28948 "chick lit" can be good lit too! And yes there are still a lot of "good books" out there to be done -- *and* which are also "fun books" to do. One just needs to put a little effort into the looking!

Well, I would not have any problem doing work with Fraktur. I learned to read German in Fraktur and Sütterlin. Fraktur is not actually that hard or demanding. The bigger problem is that books in Fraktur often use archaic/obsolete German grammar and spelling. I should mention that I do not like the way DP does things, so I am not planning on getting involved with them. Other projects I am willing to take a shot at. regards Keith.
On 21.10.2011 at 17:54, David Starner wrote:
Moreover, I'd like to see a real list of "serious literature", this time looking at literature available in English in 1922. Because that's what DP does, mostly: English. And non-English material is in its own queues. Nothing is blocking German literature, but German, and we apparently don't have that many people who like working with Fraktur.

But of course! If, at a children's party, you put on the table carrot sticks and chocolate, milk and coke, the coke and the chocolate will be gone as sure as the carrots and milk will be left.
I can't speak for other people who have proofed at DP, and I am not currently proofing at DP, but, in practice, this is how things used to go for me: "Gee I am tired and I have a headache but I still feel like I want to make a contribution to PG. Since I have low energy right now I'm not going to work on my own projects, why don't I go to DP and see what I can contribute for the next hour. Hm, they have a lot of really hard uninteresting texts with really bad OCRs which no one is ever going to read anyway. I don't think I feel like working on those right now, in part because their proofing tools are not very good and it's going to make my headache worse. OK, now here's a novel that actually looks like something someone might actually read some day, and the OCR is competent -- not that great but not that bad, so I guess I can 'knock off' a bunch of P1 pages pretty quickly and pretty accurately and make at least *some* kind of contribution today. I'm not going to work on the higher rounds because I don't feel like I'm at the top of my form right now and I'm afraid I'll let a bunch of stupid mistakes go by...." Now, I'm still looking for the coke and the chocolate -- what volunteers *actually* get at PG *and* DP is typically a swift kick for their efforts.

If DP's queues were empty and people eager to get more things to PP, I'd concur with you. As it is, I must disagree.
I have strongly criticized DP's queue management choices in the past. What I see now is: http://www.pgdp.net/c/stats/stats_central.php where my central concern, namely that the gap between books worked on and books completed was continuing to grow, representing a large and growing percentage of "volunteer effort being wasted" "stuck on queue", seems to be better addressed. I.e., looking at the graph, back in 10/2008 something like 33% of the total volunteer effort was "stuck on queue", but now the percent of volunteer effort "stuck on queue" is down to a more respectable 25%, and it seems like that is being managed to be about 8,000 books. They seem to have understood the need to put more effort into postprocessing -- once you have made a large investment in getting a book to a near-done stage it is really important to push it out the door. I don't think DP has changed anything else, so what I think they have done is simply reduce the rate of books accepted into the system in the first place, increasing the (implied) queue of books digitized but not yet in proofreading. What this does, imho, is simply chase away volunteers who want to proof the lower rounds. But that is probably better than simply wasting those volunteer efforts by having too much stuck on queue in later rounds.

On Wed, 19 Oct 2011, Jim Adcock wrote:
really important to push them out the door. I don't think DP has changed anything else so what I think they have done is simply reduced the rate of books accepted into the system in the first place, increasing the (implied) queue of books digitized but not begun proofreading.
On the contrary, I'm aware of a decent amount of change at DP over the last few years. I've seen a more concerted effort to encourage users to work in higher rounds, and give them help to do so. I've seen increased flexibility in having projects repeat and skip rounds. If you look back far enough, P2 was a problem that was focused on, and then brought into a pretty good balance. At that point P3 seemed an insurmountable problem with a huge backlog. Then the efforts above brought that backlog down sooner than anyone expected. I gather the focus now is on F2 and PP. Of course it is not a perfect system, it never was, and never will be. There is still lots of room for improvement. Also, it is always changing, so it can be hard to get a grasp on just what the current state of affairs is. Sometimes just the numbers don't tell the whole story. --Andrew

On Tue, Oct 18, 2011 at 5:51 AM, Marcello Perathoner <marcello@perathoner.de> wrote:
I'm only interested in books that people actually read.
Then we've pretty much completed the list of English books. The exceptions are exceptions because they're hard; Newton's Principia is not a trivial work. Far less so is the OED.
And not crappy books either, because there's no difference, reading a crappy book or watching a crappy show on tv.
Which contradicts your other goals; if you want it to be universal, you have to give people what they want, not what you think they should want.
Running a cooperative workflow of volunteers involves motivating people. A leader of such an effort can't just tell them what to do, or sometimes even how to do it. They need to gain their good will, evoke their interests, stimulate their creativity, and communicate with them in an open manner. Who is going to do that in the proposed organization?
What good is a leader if she *doesn't* tell people what to do ???
You get to tell me what to do when you pay my salary. You have no idea how to lead volunteers.
Wikipedia instead, by enforcing strict guidelines -- vilified by many -- in a lot less time, has become omnipresent. It's hard to come across a site that doesn't link to wikipedia.
You'll note that Wikipedia doesn't have a leader, and is pretty anarchic as a whole. Following the Wikipedia example would encourage PGDP to internally set better standards by consensus, not listen to one leader. Secondly, Wikipedia works by letting people work on what they want. There's a great deal of information on various TV shows, random rock bands, minor sports stars, D&D, etc.
The next advance in technology will leave PG out of the game completely because PG has failed to deploy the least of technological safeguards, eg. a master format that can be made to transform into other unheard-of-today formats
I was all for TEI-Lite, until you made it clear you had no interest in my concerns about what it needed. You'll sell a master format the day you balance the concerns of the parties who have to use it. You'll never get people using a master format if you go about it by ordering people to use your format. -- Kie ekzistas vivo, ekzistas espero.

On 18.10.2011 at 21:56, David Starner wrote: [snip, snip]
You'll note that Wikipedia doesn't have a leader, and is pretty anarchic as a whole. Following the Wikipedia example would encourage PGDP to internally set better standards by consensus, not listen to one leader.
Wiki does not have a leader? O.K., no particular leader, but leaders. I know a professor, a leader in research on Dante. He had a very hard time getting an entry changed even though it was factually wrong. Wiki is not so anarchic as one tends to think, and for good reasons, too.
Secondly, Wikipedia works by letting people work on what they want. There's a great deal of information on various TV shows, random rock bands, minor sports stars, D&D, etc.
Yet, there will always be those that consider ALL this information superfluous. Others, culturally irrelevant. Others, an important record. Others, just entertaining. regards Keith.

On 18.10.2011 at 11:51, Marcello Perathoner wrote:
Michael's way of (non)-leading has failed spectacularly. PG has collected a big heap of books -- that's true -- but the heap is too inhomogeneous to be of any real use. Nobody uses PG books as they are. Every site I know puts PG books into their own master format before redistributing them.
Here, you are not being fair to Michael. He had a vision and turned it into reality. Of no REAL USE! Others are USING them!
Wikipedia instead, by enforcing strict guidelines -- vilified by many -- in a lot less time, has become omnipresent. It's hard to come across a site that doesn't link to wikipedia.
Every advance in technology has caught PG with its pants down. The switch to HTML had to be done by manually redoing all the books, because plain text turned out to be non-processable. The HTML that DP has produced for years and is still producing does not work well on the new portable reading devices.
Again, the plain vanilla texts were his vision. From the very beginning I had advocated a different format. Plain vanilla has its caveats. It is one of the very few formats that have survived over 30 years. And it is still used and useful. Do not get me wrong, I am not a fan of PVT.
The next advance in technology will leave PG out of the game completely because PG has failed to deploy the least of technological safeguards, eg. a master format that can be made to transform into other unheard-of-today formats
True, PG should develop a master format. Yet any format will become dated, and a conversion to a more modern one will hardly ever be possible automatically. (Ever tried converting databases to a newer format?) Of course, if you continually update you can avoid this problem, but that always uses considerable resources.
If you can, go find some really old HTML files from the early days and see if you like what comes out. I doubt it. regards Keith.
participants (22)
- Alex Buie
- Andre Engels
- Andrew Sly
- Benjamin Klein
- Bowerbird@aol.com
- David Starner
- Greg Newby
- James Adcock
- Jana Srna
- Jim Adcock
- Juliet Sutherland
- Karen Lofstrom
- Karl Eichwalder
- Keith J. Schultz
- Lee Passey
- Marcello Perathoner
- Roger Frank
- Scott Olson
- Sparr
- traverso@posso.dm.unipi.it
- Walter van Holst
- Zara Baxter