re: [gutvol-d] lest the message be missed

jon said:
But I believe it is also essential to preserve all accented Latin and non-accented characters found in *all* books.
once again, the minutiae is being brought to the surface. why doesn't anyone here respond to the main message? because you have no response, that's why. the main point doesn't correspond to your petty-politics of throwing mud at michael, so y'all continue to try to shift the emphasis.
But I believe it is also essential to preserve all accented Latin and non-accented characters found in *all* books.
we know that's what you believe, jon. you've said it over and over and over. and i have said, over and over and over, that _i_ believe it is _not_ essential, not in *all* books. so there we have it. i will do things my way, and i expect that you will do things your way. fine! let's leave the other people here alone!
as usual, you look only at the _benefits_, without factoring _costs_ into the equation. the _cost_ of including high-bit characters is the e-text then _breaks_ for some users, ones who are using viewer-programs that are not encoding-savvy, or who don't have all of the correct fonts on their computer. or other reasons i haven't come across yet.
if the unicode people had done their job right, and made unicode follow the mac philosophy -- "it just works" -- i would be up there on the unicode bandwagon with you and your friends. but it doesn't "just work", not for everyone -- not yet -- and until it does, i don't want to talk about it. and _after_ it does, i don't want to talk about it _then_, either, i just wanna use it and have it work. for everyone.
wanna do something useful? _make_it_work_! not just on the new machines, with certain browsers and not any other viewer-programs -- on _every_ machine, with _every_ program.
but until then, just stop bugging all of us about it. we've heard it, too often, and we are unconvinced. and buddy, you are _not_ going to convince us by repeating the same old argument _again_, or by asserting your beliefs again and again...
with all the time i've wasted discussing this stupid topic for the 829th time, i could have cleaned up the rest of that "my antonia" text.
go away. oh never mind, i will...
-bowerbird

At 11:33 AM 3/8/2005, you wrote:
jon said:
But I believe it is also essential to preserve all accented Latin and non-accented characters found in *all* books.
once again, the minutiae is being brought to the surface.
why doesn't anyone here respond to the main message?
Maybe because it got lost between all the other stuff you wrote? Ah, I see you mean:
you can take an average p-book from scans to e-book in one evening.
Well that's great, so start going:) Frank

Bowerbird wrote:
jon said:
But I believe it is also essential to preserve all accented Latin and non-accented characters found in *all* books.
once again, the minutiae is being brought to the surface.
The devil is in the details.
as usual, you look only at the _benefits_, without factoring _costs_ into the equation.
On the other hand, there are certain minimum requirements for every project. As a corollary of an adage I've given earlier: "If a job is to be done, it is to be done right."
the _cost_ of including high-bit characters is the e-text then _breaks_ for some users, ones who are using viewer-programs that are not encoding-savvy, or who don't have all of the correct fonts on their computer.
All web browsers today, and most of the more advanced formats, such as PDF, support the full Unicode set. That's the future. Embrace it, don't fight it. There's a saying: "I focus on the future since that's where I'm going to spend the rest of my life."
if the unicode people had done their job right, and made unicode follow the mac philosophy -- "it just works" -- i would be up there on the unicode bandwagon with you and your friends.
This is a specious argument. The Unicode working group is doing their job right because before Unicode things were a *real* mess and were NOT working. There is a clear need to unify the world's character sets and to create universal text encoding formats (e.g. UTF-8). There is still some controversy regarding some Han scripts, but by and large Unicode has been successful at its stated goals.
wanna do something useful? _make_it_work_! not just on the new machines, with certain browsers and not any other viewer-programs -- on _every_ machine, with _every_ program.
Throwing out important accented characters is unacceptable. Period. The author/publisher considered it important enough to spend the $$$ to include these characters (in the 19th century it took more effort to print books with accented and foreign characters.) It adds richness to the text, and it is hard to argue that the characters are not somehow an integral part of the text. Anyway, it is trivial, as *you said yourself*, to autoconvert text with accented characters to 7-bit ASCII text. So you *can* make your system work for the folk using legacy systems. It is far better to do the job right for the long-term future, than to compromise it for the short-term (legacy hardware and software that is rapidly becoming obsolete.)
but until then, just stop bugging all of us about it. we've heard it, too often, and we are unconvinced.
Who's "we"? It would not surprise me if the majority of PG and DP volunteers consider it important (or at least a very good idea) to reproduce the full character set in all Public Domain texts, especially now that it is easy to do (both by UTF-8/16 encoding, and using character entities in XML/XHTML/TEI.) Hopefully a few of the PGers and DPers will give their thoughts on this particular topic.
and buddy, you are _not_ going to convince us by repeating the same old argument _again_, or by asserting your beliefs again and again...
Who's "us"?
with all the time i've wasted discussing this stupid topic for the 829th time, i could have cleaned up the rest of that "my antonia" text.
If it weren't important *to you*, you would not have replied. I can only interpret your vociferous replies to mean that you consider permanently dumping accented characters to be an *important* requirement to implement your system. That's why I have used the word "inconvenient" since that's the only reason I can think of. But if you have another reason why you believe it is o.k. to dump accented characters for most English-language PG texts, let us know. You've not given a good reason why they should not be reproduced. (The argument of meeting legacy needs is not a compelling argument since, as you said (and I'm repeating what I said above), one can autoconvert a Master document with accented characters to 7-bit ASCII for use by legacy users. Thus, you can meet the needs of these people *and* the needs and preferences of future generations by preserving the non-ASCII characters. Instead, you inexplicably want to permanently remove accented characters from the digital *Master* versions of most public domain English-language digital texts.)
There are a lot of aspects to Public Domain texts that are "inconvenient" and which prevent easy digitizing. We figure out how to overcome these "inconveniences" and produce a high-quality product, rather than take short-term shortcuts so we can avoid dealing with them. Distributed Proofreaders is one example of not giving in to the "convenient", but rather figuring out how to do it right in a reasonably efficient way.
Anyway, why the rush to digitize (make structured digital texts) out of page scans, to the point you are willing to sacrifice textual accuracy and quality? So long as the page scans are available for posterity, they can be transcribed any time, and done more carefully and thoughtfully. To me, the most critical thing is to make archival-quality scans of public domain texts and get them online via IA and similar organizations.
In the meanwhile, the most popular of these texts can be carefully and methodically converted to Structured Digital Texts (SDT). There are about 1000 very classic Public Domain works (part of the pre-DP PG collection) that should be redone to at least the quality of the "My Antonia" demo project (for those who have not seen it, it is at: http://www.openreader.org/myantonia/ ). It is still an early "beta", but it's been a real learning experience for several of us working on it.
Jon
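[Editorial sketch, not part of the original message.] The autoconversion mentioned above, from a Master text with accented characters to 7-bit ASCII for legacy viewers, might look roughly like the following in Python. This is only a minimal sketch under stated assumptions: the function name `to_ascii` and the `placeholder` parameter are hypothetical, and nothing here is taken from any tool discussed in this thread. It relies on Unicode NFKD decomposition, so characters with no decomposition (for example the ligature œ) cannot be mapped automatically and are replaced by the placeholder.

```python
import unicodedata


def to_ascii(text: str, placeholder: str = "?") -> str:
    """Derive a 7-bit ASCII rendition of `text` (hypothetical helper).

    The accented Master text is left untouched; this only produces a
    lossy copy for viewers that cannot handle non-ASCII characters.
    """
    # NFKD splits an accented letter into base letter + combining mark(s).
    decomposed = unicodedata.normalize("NFKD", text)
    out = []
    for ch in decomposed:
        if unicodedata.combining(ch):
            # Drop the combining accent, keeping the base letter.
            continue
        # Anything still outside ASCII has no automatic mapping here.
        out.append(ch if ord(ch) < 128 else placeholder)
    return "".join(out)


if __name__ == "__main__":
    print(to_ascii("My Ántonia"))
    print(to_ascii("naïve café"))
```

The conversion runs in one direction only: the ASCII copy can always be regenerated from the Master, but the accents cannot be recovered from the ASCII copy. A fuller version would presumably add a hand-made table for characters that do not decompose.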

Bowerbird@aol.com wrote:
once again, the minutiae is being brought to the surface.
... the minutiae *are* brought to the surface. If we are going to show off in Latin, we had better get our numeri right. -- Marcello Perathoner webmaster@gutenberg.org

participants (4)
- Bowerbird@aol.com
- Frank van Drogen
- Jon Noring
- Marcello Perathoner