re: Re: [gutvol-d] presentation *is* structure (it's right in front of your eyes)

thank you jon, for weighing in...

*** jon said:
Those who understand and speak of XML know that XML is not in and of itself a specific markup vocabulary.
those who _know_ x.m.l. do know that, right. but some of the people who _speak_ of x.m.l. do _not_ seem to know it, and they gloss over all the difficulties without much comprehension. they think as long as something is "in x.m.l.", it's gonna have all these magical properties, when the truth of the matter is that you must put a lot of sweat into it to get most of them, sometimes more sweat than they're even worth.
There are many extraordinarily successful applications of XML. One of the most recent that a lot of people recognize and use is RSS, used for blog feeds and the like.
on that you are correct. any time that you want to exchange data between incongruent applications, x.m.l. _can_ be a good solution. (it's not _necessarily_ good, a lot of complications can occur that mess things up regardless, but the _potential_ is certainly there.) but even on this "successful" use in the case of r.s.s. and blog feeds, there is -- as i am sure you know -- a great deal of "controversy" concerning whether r.s.s. is the best way of doing it, or "atom" is... and there are additional controversies about _which_ r.s.s. version is the _best_ one. and even when all those things get sorted out, what bloggers might find is they have simply reinvented the wheel previously known as an announcement listserve, where a missive is sent out to a group of subscribers and simultaneously added to a cumulative website, in which case a whole lot of work was done for no real good reason. but hey, as long as everyone had fun along the way, i guess that's ok.
ZML is an example of a "regularized plain text" system to represent certain important textual document structures in a way which is fully machine-readable. I could easily create an XML-based markup vocabulary clone of the ZML system to represent the very same structures.
you say that often. but you've never really told us what the point is. even if it's possible to represent a simple system in a complex one, nothing is gained. you've only lost the benefit you had of simplicity. and indeed, that's my essence: use the most simple system possible.
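just so we're all looking at the same thing, here's a made-up illustration (the x.m.l. tag names are invented for this example, not any real vocabulary). a chapter heading in regularized plain text is simply:

    CHAPTER I. MR. SHERLOCK HOLMES.

(with, say, four blank lines above it and two below.) the very same structure in a hypothetical x.m.l. clone:

    <chapter>
      <title>MR. SHERLOCK HOLMES.</title>
      ...
    </chapter>

identical information. one of them your grandmother can read and type.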
Definitely. But what we require is to be able to machine-read and machine-process the structure and semantics of a textual document.
right, and my "machine" (i.e., app) can read and process the structure. (and we really need to handle "structure" and "semantics" separately, because semantics is a _lot_ more complex, and much too thorny to just toss off so casually. but i'll have more to say on that later...)
Even if humans can figure this out with a simple visual glance at the content in a high-typographic-quality presentation, that does not automatically mean it is easy for machines to do likewise.
let's put aside the question of how "easy" it is for a machine to do it. what i have said here, and will say elsewhere, is my routines _can_. and when i release the proof, other people will know that it's possible, and they'll then be able to write their own routines that can do it too. then everyone will wonder why they thought it was so difficult before.
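to give the flavor of it -- and i stress this is a toy sketch in python, _not_ my actual routines, which stack up many more cues -- a first-pass header-finder can be as dumb as this:

    # toy sketch only: call a line a header when it is short,
    # surrounded by blank lines, and set in all-caps.
    def looks_like_header(lines, i):
        line = lines[i].strip()
        if not line or len(line) > 60:
            return False
        blank_above = i == 0 or not lines[i - 1].strip()
        blank_below = i == len(lines) - 1 or not lines[i + 1].strip()
        return blank_above and blank_below and line == line.upper()

once people see that a few dumb heuristics like these get you most of the way there, the "difficulty" evaporates.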
It is also not easy to codify because visual presentation is "fuzzy" (pun not intended), sometimes relying on surrounding context to precisely define the document structure.
well, you can go on and on about all the reasons why it is difficult. but once people are doing it, routinely, those "reasons" won't matter.
We have to remember that there is a lot of variance in the conventions (both historical and geographical) used in typographic layouts to visually represent structure and semantics.
so someone will modify their routines to work with those conventions.
Not only that, in some cases texts don't even follow conventions, especially when there are oddities in the content where no convention has been firmly established.
"oddities" are only "oddities" until someone figures out their pattern. because if there is no pattern, then nobody understood the structure in the first place, so there's no way to mark it up using _any_ system.
And as previously noted, sometimes the context must be factored in to fully ascertain structure and semantics.
ok, _now_ you're finally getting into the "semantic" part. if the only way you can understand how to mark up the text is to actually _understand_ the content, that is _semantic_. and yes, you need a high level of "intelligence" -- either human or artificial, and the artificial kind ain't here yet -- to do that markup, which means that you need humans to do it, and that's why it's costly.

and even if you've got a lot of volunteer labor to throw at the task, it might not be enough, because this job is also _complex_ to boot. so you can't just use any volunteers, they have to be highly skilled. and to top it all off, it's time-consuming, so it's even more costly. that's why there are very high costs to doing semantic markup, much higher than the costs of (even manual) structural markup.

and you know what the real kicker is? even though the _costs_ are sky-high, the _benefits_ of semantic markup ain't that great. certainly not from the standpoint of the average reader, anyway. (some scholars might make out, if you coded what they want.) hey, it's great that the machine can now tell you with certainty that the reason "new york times" has been rendered in italics is because it's a newspaper. but the reader _already_knew_that_. the writer made it clear in the course of setting the context. i will get to more examples down below, but you get the drift...
The "Gedanken" test I use for the minimum requirements of machine-readable markup (or system such as ZML) for textual documents is if a text-to-speech engine is potentially capable of communicating the structure and semantics of the content to a blind listener (who is unfamiliar with any print conventions -- they've never heard the terms 'italic' or 'bold')
i doubt you'd find a blind person who's never heard those terms. but go on...
so they can, in real-time (i.e., a one-time linear audio presentation), gain the same level of comprehension as a sighted person (familiar with typographic conventions) would in reading a high-quality print version of the text. Pass this test, and the markup will likely be pretty good for just about any purpose in addition to accessibility.
not only will a text-to-speech engine be "potentially capable" of communicating the content to a blind person, i actually intend to build such an engine right into my viewer-program. whether or not it delivers the _semantics_ of the content is wholly dependent on whether you put that information _into_ the file in the first place. and -- of course -- that's true of _any_ markup system. but z.m.l. will have a way to put it in, yes, and if you do, then there'll be a way to get it out as well. you'll have to specify exactly _how_ the text-to-speech engine should vocalize this info. but any way you can do it, i can too.
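how might that vocalizing be specified? here's a toy sketch (python again, every name invented for illustration -- this is _not_ the viewer-program's code):

    # toy sketch: map structure to audible cues, so the listener
    # hears "chapter", a pause, added stress -- never the words
    # "italic" or "bold".
    def speak(text, **cues):
        print(cues or "", text)   # stand-in for a real t.t.s. engine

    def vocalize(kind, text):
        if kind == "chapter-heading":
            speak("chapter. " + text, pause_after=2.0)
        elif kind == "emphasis":
            speak(text, pitch_shift=10)   # audible stress
        else:
            speak(text)

the listener gets the _structure_ in her ear, and the words "italic" and "bold" never come up -- which is exactly your test.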
Is ZML or other type of "regularized plain text" (or the XML-based ZML markup vocabulary analog) sufficient to pass this test?
yes. that's what i've been saying all along. that's what the test-suite is all about, baby.
The system only needs to be as complicated as needed to represent the needed document structures and content semantics in a machine-readable way such that it passes the test described above.
if you can do it, i can too.
The $64,000 question therefore is what structure and semantics need to be represented in a machine-readable way, and to what degree of precision.
different people will require different degrees of "precision". my target-population is the one michael has always targeted.
Maybe ZML (and its markup analog) is sufficient, maybe it isn't.
of course, we can say that about any system, can't we... ;+)
I gather from those here who have first-hand experience handling large numbers of the various types of texts in Project Gutenberg that ZML (or any other type of "regularized plain text" system) does not have sufficient granularity to pass the "test."
well, that's how i read the feelings of everyone here who has chimed in so far on the matter, except myself and maybe a couple of other people in varying degrees. but i note once again, for the record, that no one has yet given me a list of "hard e-texts" that they think might give my z.m.l. a run for its money on difficulty. so we really don't have an answer to that yet, do we?
Of course, we can argue whether the test as I describe above is too strict, or maybe not even on-target.
well, my primary aim is sighted people, so your test is not "on-target", but that's ok, i understand what your point is. i should note, however, that blind people seem to me to be the most delighted group of users that project gutenberg has, and are probably the people _most_ appreciative of plain text. all this in spite of the fact that there is _no_ semantic markup -- and very little structural markup either -- in the e-texts. no, it appears the magic formula for _that_ has been simple -- get everything else _out_of_the_way_ of the words themselves. i will let you think about that...
But keep in mind this is what the *accessibility community* wants in machine-readable textual documents, and what they are working towards in their activities -- they've wholeheartedly embraced XML-based approaches, for example.
they've been misled to believe the promises just like everyone else.
To wave one's hand in dismissal
it is dishonest to try to imply i am "waving my hand in dismissal". please don't do that.
and say they are being unrealistic or stupid,
i, of course, have never said anything like that. don't say that i have. please don't do that.
or that they don't really matter in our decision-making,
it is unseemly of you to put those kind of words in _my_ mouth. please don't do that.
is a pretty bigoted and "blind" position (pun intended) to take
which is what makes it so distasteful. so just stop it. please don't do that.
-- it is also stupid since meeting their needs for structure and semantics has many other benefits as well.
enough, jon. please don't do that.
I might ask a few text-to-speech experts I know at DAISY to look at the ZML system and tell me if it has sufficient structural granularity for high-quality text-to-speech purposes.
the judgement of bureaucrats doesn't impress me. i'll listen to the reports of blind users themselves.
As far as I am concerned, if they come back and say "no it doesn't", then I would recommend that PG should not consider ZML for its Master format
i'm not seeking your endorsement, jon, so please feel free to make any recommendation to project gutenberg that you want concerning what they should consider for their master format.
but maybe consider ZML for its plain text output versions.
whatever.
Bold lines which appear by themselves in the flow of text are sometimes used for structures other than headers.
my routines are not so brain-dead as to be confused by that. but thanks for enlightening me.
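and for the record, the fix is context, and it's not rocket science. another toy sketch, an invented heuristic rather than my actual code:

    # toy sketch: only promote a standalone bold line to a header
    # when the context agrees -- e.g., the text after it starts a
    # fresh sentence instead of continuing one.
    def bold_line_is_header(lines, i):
        after = next((l.strip() for l in lines[i + 1:] if l.strip()), "")
        continues_sentence = after[:1].islower()
        return len(lines[i].strip()) < 60 and not continues_sentence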
There are many other similar weirdities involved with italicized text, indented text, etc., that we see in visual layouts of texts.
please do let me know about any mistakes that my routines make on any e-text in the library if you review my program, as i am sure there are "weirdities" i've not yet come across.
Context often has to be considered to unambiguously discern the structure behind a visual cue. For example, one convention often used is that the names of ships are to be italicized. Thus, if a machine is to distinguish the name of a ship from linguistically emphasized text, it has to look at the context.
that's a very good example, jon, so i'll discuss it a bit. my approach is to have the o.c.r. program _retain_text_styling_. so if the ship-name was italicized in the original book, it would continue to be italicized in the o.c.r. text (assuming recognition), and that would carry through all the editing to the final version. unless the person creating the digital version were to indicate that those italics represented a ship-name, they would remain as simple italics, and an end-user would be on her own to know why. _just_like_she's_on_her_own_when_she_reads_a_paper-book_.

you might consider it to be some huge problem that the reader doesn't know _exactly_why_ something is being italicized, but i don't think it is, because they virtually always figure it out... even a blind reader can figure it out. heck, even in the e-texts with the italics stripped out, the blind reader can figure it out. if you asked any of those readers -- sighted or blind -- how much money they would pay to have that information supplied, to assess how much _value_ they place on it, they would laugh in your face. and that's _all_ you need to know about _that_ cost-benefit ratio.

in the _rare_ case where that information _might_ be valuable, i have ways to mark it. and as soon as you show me those cases, and show me exactly how your x.m.l. markup provides a solution, i will be quite happy to show you exactly how i would do it too.
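and so "i have ways to mark it" isn't just hand-waving, here's an invented illustration -- _not_ final z.m.l. syntax, just the idea:

    plain italics, carried straight through from the o.c.r.:

        they shipped out on the _titanic_ that spring.

    the rare case where the semantics are worth recording
    (notation invented for this example only):

        they shipped out on the _titanic_{ship-name} that spring.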
No, I'd say it is more accurate to say "for reading by eyesight, structure is represented by visual presentation cues."
you're talking more about _output_ here. whereas i am talking about _input_ instead. i'm talking about how to examine the p-book -- specifically, the o.c.r. that results from it -- to automatically determine the structure of the text. that structured text can then be rendered visually (on-screen or paper) or via text-to-speech. when i talk about "presentation", i'm talking about the p-book that we work with as our original source.

however, in an aside, i've never even heard this _discussed_ yet, not here or anywhere else for that matter, but the time has come where we can expect to start seeing (or should i say "hearing") books that have been "input" using voice-recognition technology. in other words, the age of scanning might come to an abrupt end, or taper off significantly, when people start creating e-books by reading a book aloud into a voice-recognition system. they are remarkably improved these days, according to everything i read, plus their cost might fall _considerably_ in the near future too, and the number of people who might be willing to "enter" a book in this manner is probably far greater than those willing to scan.

of course, it will take a new kind of software program to "fix" the transcription errors that will occur using this input method, but maybe that's already a part of these systems, i don't know... not making any predictions here, just keeping my eye open for it. what this might mean for blind people, i don't even have to say...
Remember, there are different types of presentation of text, not only visual.
the mac has had text-to-speech for well over a decade now, jon, right in the system. i've already put it in some of my e-book apps.
To focus on visual as the only form of presentation that matters is being very short-sighted (pun intended).
good pun, if there can be said to be such a thing... ;+) but making the point to me is totally unnecessary.
And I've stated the core question to answer is: "Is ZML (or any other system of regularized plain text) sufficient to represent document structure and semantics for Project Gutenberg Master texts?"
that _is_ the right question.
I assume Bowerbird is saying "yes"
there's no reason to "assume" that i am saying "yes". i've actually _said_it_, over and over and over again. and built a test-suite to prove it.
and many others here are saying "No".
well, most everyone who has spoken up has said "no". (dale and maybe james have given a limp "perhaps".) and there might be some lurkers who i have convinced. but by and large, all the loudmouths have loudly said "no".
I answer the question with a "No".
well, thanks for putting yourself firmly on the record jon. again.
Amusingly, Networker, a very insightful ebook expert who often posts to The eBook Community, calls ZML a type of ITF, "Impoverished Text Format", to indicate ZML has insufficient granularity -- it is "impoverished".
well, heck, jon, if the only thing i'd ever heard about z.m.l. was the one-sided "descriptions" you've given it over there, i would think that it sounded like a ludicrous idea too. networker will come around when he sees the real thing. everyone will. after all, the proof _is_ in the pudding... -bowerbird

Bowerbird@aol.com wrote:
and indeed, that's my essence: use the most simple system possible.
Use the most simple tool that does the job, but don't use a simpler one. If you had done any research on ebooks before proclaiming yourself a demi-god, you may have noticed that your toy markup language is woefully underpowered. You don't even handle the very first page of Sherlock Holmes. Mark this up in ZML. Note that "Being a reprint" is a subtitle to "PART I" and not a paragraph. The same goes for "MR. SHERLOCK HOLMES". Note also that "JOHN H. WATSON" is emphasized, although it's the only part of the title that's not italic.

---

PART I.

_Being a reprint from the reminiscences of_ JOHN H. WATSON, M.D., _late of the Army Medical Department._

CHAPTER I. MR. SHERLOCK HOLMES.

IN the year 1878 I took my degree of Doctor of Medicine of the University of London, and proceeded to Netley to go through the course prescribed for surgeons in the army. ...

---
but i note once again, for the record, that no one has yet given me a list of "hard e-texts" that they think might give my z.m.l. a run for its money on difficulty. so we really don't have an answer to that yet, do we?
How about doing your homework yourself? The world at large was not created to do your bidding. Go, find a slew of difficult texts, mark them up, fix your program and show us what you can do. But, please, stop whining about us not doing your work.
of course, it will take a new kind of software program to "fix" the transcription errors that will occur using this input method, but maybe that's already a part of these systems, i don't know...
Again, researching your stuff before starting a colossal handwave is out of the question.

--
Marcello Perathoner
webmaster@gutenberg.org