re: [gutvol-d] RFC: Posting Page Scans in DJVU Format

jon said:
This example illustrates that as we begin considering how the PG collection can become more useful to more users in the future, PG will have to require the digital texts to be more carefully and strictly structured
thankfully, that does _not_ require heavy markup, just a consistent dedication to an intelligent design. -bowerbird

Bowerbird wrote:
Jon wrote:
This example illustrates that as we begin considering how the PG collection can become more useful to more users in the future, PG will have to require the digital texts to be more carefully and strictly structured
thankfully, that does _not_ require heavy markup, just a consistent dedication to an intelligent design.
Agreed on both the "heavy" markup (depending on how "heavy" is defined, where I know we have a big difference of viewpoint), and agreed on the final point of dedication to an intelligent design, which implies consistency -- and a group dedication to uniformity.
It does raise the question, though, of what features, functions, user groups, etc., we'd like the PG collection of, say, 2020, to support (or integrate with).
Michael Hart has brought up his vision for machine translation into other languages. This may impose a set of requirements.
Several of us have discussed the collection as being more than just an assemblage of independent and autonomous texts -- we'd like to be able to interlink them, and link them to other digital content repositories such as the many hosted at the Internet Archive. This imposes various requirements on document structure, metadata/identifiers, deeplinking capability, etc.
Taking this even further, some are advocating that the PG texts be easier to integrate into the various social-enhancing technologies now maturing, such as blogs, forums, community and advocacy groups, social networking tools, citizen journalism, etc. This probably will impose a few requirements.
Regarding various user groups, PG may want to expand beyond providing a "good read" for personal entertainment, and support the needs of education, library, scholarly, and research uses. This adds various requirements, including improved textual fidelity, more complete and consistent (authoritative) metadata/cataloging, etc. (Interlinking and deeplinking, as mentioned above, factor into the needs of these other user groups.)
Another thing of interest is increased presentation capability of the collection -- making it more flexible for all kinds and types of presentation, including text-to-speech (to improve accessibility). I know that if some people from DAISY stopped by, they would list for us a few requirements that will enhance the overall accessibility of the collection -- most of these requirements will also enhance visual presentation flexibility. (Essentially, well-structured and sufficiently granular markup.)
Certainly others can add more items to this "futures list".
If anyone wants to know why I take various hard-nosed positions regarding PG collection development, they are largely based on meeting the requirements, as I understand them, for the items listed above. I want to see the PG collection, and any digital text collection, be able to meet the needs of most human endeavors, rather than being limited in scope.
I am very happy that DP is evolving, in my opinion, in the right direction. For example, their plan to implement PGTEI (or a similar well-constrained and structurally-oriented XML markup vocabulary) is an important component in meeting the requirements for "many-uses".
Jon

Any PG volunteers here with experience representing characters in old Greek using unicode? Or perhaps someone at DP? I have a book I'm preparing which has about 12 places where a couple Greek words are quoted, and a few letters I'm uncertain of. Andrew

If it's just a handful of words, what about transliterating them into English characters, as described in http://www.gutenberg.org/howto/greek/
----- Original Message -----
From: "Andrew Sly" <sly@victoria.tc.ca>
To: "Project Gutenberg Volunteer Discussion" <gutvol-d@lists.pglaf.org>
Sent: Tuesday, July 26, 2005 11:13 PM
Subject: [gutvol-d] Polytonic Greek
Any PG volunteers here with experience representing characters in old Greek using unicode? Or perhaps someone at DP?
I have a book I'm preparing which has about 12 places where a couple Greek words are quoted, and a few letters I'm uncertain of.
Andrew

On Tue, Jul 26, 2005 at 11:13:27PM -0700, Andrew Sly wrote:
Any PG volunteers here with experience representing characters in old Greek using unicode? Or perhaps someone at DP?
I worked on a book at PGDP-EU a few weeks ago with a lot of such things.
Title: Histoire des Grecs (Tome 1 sur 3)
Author: Victor Duruy
http://dp.rastko.net/tools/proofers/proof.php?project=projectID41e6e8e1a3a9c&proofstate=avail_2
I did not want to learn how to "type" Greek in a standard way, so I developed my own coding scheme in ASCII/latin1 and a Perl script to do the transformation afterwards. Example: I typed "A" for "CAPITAL ALPHA", which got transformed into the right Unicode character. Most of the time the OCR got it right (when the letters looked like latin letters). I tagged Greek between {{...}} so that my script could recognize where it started and ended. When there were "accents" (tones) I used a LaTeX notation (à la \'A) or latin1 characters if easier to read (Á). I checked everything after transformation to HTML in a web browser.
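For anyone curious what such a transformation might look like, here is a minimal sketch in Python (not the Perl script mentioned above, which is not shown in this thread). The letter table, the names `GREEK`, `to_greek` and `convert`, the support for only \' and \` accents, and the final-sigma rule are all assumptions made for illustration; the actual coding scheme may differ.

```python
import re
import unicodedata

# Hypothetical ASCII-to-Greek letter table (an assumption, not the real scheme).
GREEK = {
    "a": "α", "b": "β", "g": "γ", "d": "δ", "e": "ε", "z": "ζ",
    "h": "η", "q": "θ", "i": "ι", "k": "κ", "l": "λ", "m": "μ",
    "n": "ν", "x": "ξ", "o": "ο", "p": "π", "r": "ρ", "s": "σ",
    "t": "τ", "u": "υ", "f": "φ", "c": "χ", "y": "ψ", "w": "ω",
}

# LaTeX-style accent commands mapped to Unicode combining marks.
ACCENTS = {"'": "\u0301", "`": "\u0300"}  # acute, grave

# Either an accent command followed by a letter, or a bare letter.
TOKEN = re.compile(r"\\(['`])([A-Za-z])|([A-Za-z])")

# Greek passages are tagged as {{...}} in the proofread text.
SPAN = re.compile(r"\{\{(.*?)\}\}", re.S)


def _letter(c: str) -> str:
    """Map one ASCII letter to Greek, keeping its case; unknown letters pass through."""
    g = GREEK.get(c.lower(), c)
    return g.upper() if c.isupper() else g


def to_greek(coded: str) -> str:
    """Convert the contents of one {{...}} span to Unicode Greek."""
    def repl(m: re.Match) -> str:
        if m.group(3):
            return _letter(m.group(3))
        return _letter(m.group(2)) + ACCENTS[m.group(1)]

    text = TOKEN.sub(repl, coded)
    # Compose base letter + combining mark into a single precomposed character.
    text = unicodedata.normalize("NFC", text)
    # Assumed convenience: a lowercase sigma at the end of a word becomes final sigma.
    return re.sub(r"σ\b", "ς", text)


def convert(document: str) -> str:
    """Replace every {{...}} span with Greek; text outside the spans is untouched."""
    return SPAN.sub(lambda m: to_greek(m.group(1)), document)
```

With this table, `convert(r"the word {{l\'ogos}}")` turns only the tagged part into Greek and leaves the surrounding text alone. Breathings, circumflexes and iota subscripts would need further accent codes, so the result still has to be checked by eye, as described above.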
I have a book I'm preparing which has about 12 places where a couple Greek words are quoted, and a few letters I'm uncertain of.
For this little I probably can (try to) do it for you. I guess you know your document will have to be coded in some Unicode format in the end (for example: utf-8). Caveat: I never studied ancient Greek, I just "recognize" shapes. Greek friends of mine can (try to) validate the final product if you like. I guess we can continue this conversation in private mail, unless other people on the list are interested.
participants (5)
- Al Haines (shaw)
- Andrew Sly
- Bowerbird@aol.com
- Jon Noring
- Sebastien Blondeel