
At 07:10 PM 10/19/2004 -0800, you wrote:
Steve Thomas writes:
Most users of PG don't go around grumbling about the lack of XML or the ability to output as PDF. They're just stoked to be able to find the text online.
That's why they're users of PG. If they need XML or PDF, they go elsewhere. And frankly, I've heard many complaints about how hard it is to process PG texts and how much information is lost.
I don't want to add to the flame war here, but I can say this, which has been said here before. Sometimes I find a PG text that I would like more information about, so I go to Google and search for it. In almost all cases, I find tons of sites that have converted the books into HTML or a similar format. blackmask.com immediately comes to mind, but there are lots of others. Many don't give credit to PG at all. My point is that yes, I agree with gutenberg9443: I would much rather have plain text first and worry about the rest later. But many people have no need to complain to PG about plain text only, for the simple reason that they can search Google for almost any title and find a more nicely formatted version.
I would like to see PG eventually go to XML, not because I particularly like the format but because the new DAISY standard for digital talking books for the blind uses a form of XML. It should, in theory, be possible to convert HTML to DAISY, but how well that would work I don't know. If anyone wants to analyze a set of DAISY files, go to http://bookshare.org/ and search for an early PG title. I say "early" because they apparently quit adding the newer titles. I think there might be a demo link on there just for public domain books.
I will make one other comment on accents. Yes, I can see the importance of 8-bit files. I have a local mirror of almost all of PG on my system, and I finally switched to getting the 8-bit files only for works in languages other than English. However, since I am blind and read with speech, the accents really don't matter to me, since the synthesizer doesn't pronounce them anyway. If it sees a letter in the high ASCII range, it skips that letter. This is especially bad, for example, with the works of Tolkien because accents are used so heavily.
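For what it's worth, here is a rough sketch of one way to work around the skipped-letter problem: fold each accented letter down to its plain base letter before the text reaches the synthesizer. This is only an illustration, not anything PG or any screen reader actually ships. The function name fold_accents is made up for this example, and it assumes the 8-bit file has already been decoded into a Python string (for instance by opening it with encoding="latin-1", which is itself an assumption about how a given file was encoded).

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Return text with accented letters replaced by their base letters.

    Hypothetical helper for illustration only.
    """
    # NFKD normalization splits a precomposed letter such as "é"
    # into the base letter "e" followed by a combining accent mark.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks and keep everything else unchanged.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    print(fold_accents("Eärendil and Lúthien"))  # prints "Earendil and Luthien"
```

Letters that have no decomposition (for example "ø" or "ß") pass through unchanged, so a real tool would probably need a small substitution table for those as well.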