
I personally like Marcello's efforts pretty well, but let me accept his challenge and use his examples to illustrate the problems that I *personally* find as a reader of PG texts -- that I *in reality* find with PG's current efforts -- as well as the need for better input markup languages than we are currently using:
Go here:
http://www.gnutenberg.de/pgtei/0.5/examples/candide/4650-pdf.pdf
and tell me what you don't like about the title page.
What I don't like about the title page is that it doesn't show up correctly on my choice of machine, because my choice of machine assumes the existence of spine information. Thus the "Title" shows up on my machine as "4650-pdf" and the "Author" shows up as "4650-pdf". So when I come back to my machine two weeks from now and search for this book by title, I cannot find it. And when I search for it by author, I still cannot find it.
Other than that, this PDF text, to my surprise, shows up beautifully on my machine. I would, in practice, be willing to read this text. The choice of sans-serif font looks weird, and I would like to be able to change it, but of course I can't because this is PDF. Other than that, I would be happy to read this as a book representing a good effort from PG. Further, I would be able to download this file via the airwaves while stuck waiting at an airport, for example, and read this book there. In my opinion these results represent PG well as an electronic publishing house.
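
The "4650-pdf" Title and Author are what one would expect if a reader falls back to the file name whenever a file carries no title or author metadata. That is only a guess about the machine's behaviour, not something it documents; the sketch below illustrates the assumed fallback, and `display_fields` and its `metadata` dictionary are made-up names:

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def display_fields(url, metadata=None):
    """Hypothetical: return (title, author) as a reader might list a file.

    Assumption: when no metadata is available, both fields fall back
    to the file name without its extension.
    """
    metadata = metadata or {}
    stem = PurePosixPath(urlparse(url).path).stem
    title = metadata.get("title") or stem
    author = metadata.get("author") or stem
    return title, author


base = "http://www.gnutenberg.de/pgtei/0.5/examples/candide/"
for name in ("4650-pdf.pdf", "4650-h.html", "4650-0.txt"):
    print(name, "->", display_fields(base + name))
```

Under this assumption, supplying even a minimal title and author alongside each file would be enough to make the book findable again by either field.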
And then go here:
http://www.gnutenberg.de/pgtei/0.5/examples/candide/4650-h.html
to verify that it looks the same in HTML.
I can verify that it neither looks the same nor even shows up on my choice of machine at all, because my machine doesn't support HTML as a native file format. I can, if I am lucky, access this file via the airwaves using the machine's built-in web browser while stuck waiting at the airport, but I cannot store the results as a file, because my machine doesn't support HTML as a built-in file type. So I can read it on the ground, but I probably won't be able to read it in the air, and if I use my browser to access some other web site then I will probably lose this book. [Well, I take that back -- when I actually TRY to read this file via the airwaves as described above, it crashes my machine, requiring a hard reboot.]
Assuming I am not at an airport, but rather at home with my desktop computers, I can spend about 5 minutes of my time running cross-rendering software, which converts one output file format to another, to change this HTML to MOBI format, which IS a native file format of my reader machine. The results then show up on my machine pretty beautifully. Except that, since HTML lacks spine information, the Title now shows up as "4650-h" and the Author now shows up as "4650-h". Which means again, if I come back to my machine in two weeks, I will not be able to find this book.
However, other than that, I like these results -- now that I have cross-rendered HTML to MOBI. The results are attractive. I CAN change font size. The font displayed is an attractive and appropriate serif font. The pages reflow correctly. The links work for navigation. I can switch the machine to landscape mode and everything reflows correctly, supporting the capabilities of my machine. This would in practice be my favorite choice of file format for my machine -- even though I can only access it initially from my house via a desktop machine and I have to waste five minutes of my time translating output file formats. In my opinion these results represent PG well as an electronic publishing house.
And then go here:
http://www.gnutenberg.de/pgtei/0.5/examples/candide/4650-0.txt
to see how it looks in TXT.
To my surprise, I CAN take this UTF-8 TXT formatted file, transfer it to my favorite machine, and it DOES open up, correctly interpreting the UTF-8 encoding. [You learn something new every day!] This file also lacks spine information, so now the Author shows up as "4650-0" and the Title shows up as "4650-0", which means once again, if I come back to this machine in two weeks, I will not be able to find this book.
Since this file was rendered char72 under the assumption of a fixed pitch font, and since my machine doesn't use fixed pitch fonts, the end result looks silly and amateurish. The "Printers Ornament" renders as laughable junk. The fixed char72 line breaks make the text in practice unreadable unless I choose an impossibly tiny font -- which then still makes the text in practice unreadable. Gratuitous underscores are sprinkled liberally "everywhere" in the text, making the text an unreadable hash.
I would not read this text if paid $100 to do so. If I paid good money for this text I would ask for double my money back. This is my least favorite file format. And since, as noted, the missing spine information means that I would not be able to find this book again in two weeks, in this case that would be a *blessing*! In my opinion, if I were a first-time "customer" of PG who made the mistake of choosing this file format to download to read on my brand of machine, I would conclude that PG consists of a bunch of clueless clowns and I would never return to the PG site again.
My Opinions Only -- but I would hope this illustrates how IN PRACTICE a real-world customer's opinion of PG will be filtered through the perception of their choice of reading machine -- and in turn through how well the PG file format they happen to download matches the capabilities of their machine.
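
To make the char72 and underscore complaints about the TXT file concrete, here is a minimal sketch of the kind of cleanup a reflowing reader would need to do before displaying such a file. It assumes that blank lines separate paragraphs and that `_word_` marks emphasis; the function names are made up for illustration, and a naive pass like this would also mangle verse, tables and anything else that depends on the original line breaks:

```python
import re


def reflow(text):
    """Join hard-wrapped lines inside each paragraph.

    Assumption: paragraphs are separated by one or more blank lines.
    """
    paragraphs = re.split(r"\n\s*\n", text.strip())
    return "\n\n".join(" ".join(p.split()) for p in paragraphs)


def strip_underscore_emphasis(text):
    """Drop _..._ emphasis markers but keep the emphasised words."""
    return re.sub(r"_([^_\n]+)_", r"\1", text)


sample = (
    "It was a _very_ long\n"
    "line of text.\n"
    "\n"
    "Second\n"
    "paragraph."
)
print(strip_underscore_emphasis(reflow(sample)))
```

Reflowing first and stripping underscores second lets the emphasis pattern match phrases that were split across two hard-wrapped lines.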
And without the spine information, none of this really works well with my machine.