re: Re: [gutvol-d] presentation *is* structure (it's right in front of your eyes)

thank you jon, for weighing in...

*** jon said:
Those who understand and speak of XML know that XML is not in and of itself a specific markup vocabulary.
those who _know_ x.m.l. do know that, right. but some of the people who _speak_ of x.m.l. do _not_ seem to know it, and they gloss over all the difficulties without much comprehension. they think as long as something is "in x.m.l.", it's gonna have all these magical properties, when the truth of the matter is that you must put a lot of sweat into it to get most of them, sometimes more sweat than they're even worth.
There are many extraordinarily successful applications of XML. One of the most recent that a lot of people recognize and use is RSS, used for blog feeds and the like.
on that you are correct. any time that you want to exchange data between incongruent applications, x.m.l. _can_ be a good solution. (it's not _necessarily_ good, a lot of complications can occur that mess things up regardless, but the _potential_ is certainly there.) but even on this "successful" use in the case of r.s.s. and blog feeds, there is -- as i am sure you know -- a great deal of "controversy" concerning whether r.s.s. is the best way of doing it, or "atom" is... and there are additional controversies about _which_ r.s.s. version is the _best_ one. and even when all those things get sorted out, what bloggers might find is they have simply reinvented the wheel previously known as an announcement listserve, where a missive is sent out to a group of subscribers and simultaneously added to a cumulative website, in which case a whole lot of work was done for no real good reason. but hey, as long as everyone had fun along the way, i guess that's ok.
ZML is an example of a "regularized plain text" system to represent certain important textual document structures in a way which is fully machine-readable. I could easily create an XML-based markup vocabulary clone of the ZML system to represent the very same structures.
you say that often. but you've never really told us what the point is. even if it's possible to represent a simple system in a complex one, nothing is gained. you've only lost the benefit you had of simplicity. and indeed, that's my essence: use the most simple system possible.
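just so we're all looking at the same thing, here's a made-up illustration (the x.m.l. tag names are invented for this example, not any real vocabulary). a chapter heading in regularized plain text is simply:

    CHAPTER I. MR. SHERLOCK HOLMES.

(with, say, four blank lines above it and two below.) the very same structure in a hypothetical x.m.l. clone:

    <chapter>
      <title>MR. SHERLOCK HOLMES.</title>
      ...
    </chapter>

identical information. one of them your grandmother can read and type.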
Definitely. But what we require is to be able to machine-read and machine-process the structure and semantics of a textual document.
right, and my "machine" (i.e., app) can read and process the structure. (and we really need to handle "structure" and "semantics" separately, because semantics is a _lot_ more complex, and much too thorny to just toss off so casually. but i'll have more to say on that later...)
Even if humans can figure this out with a simple visual glance at the content in a high-typographic-quality presentation, that does not automatically mean it is easy for machines to do likewise.
let's put aside the question of how "easy" it is for a machine to do it. what i have said here, and will say elsewhere, is my routines _can_. and when i release the proof, other people will know that it's possible, and they'll then be able to write their own routines that can do it too. then everyone will wonder why they thought it was so difficult before.
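to give the flavor of it -- and i stress this is a toy sketch in python, _not_ my actual routines, which stack up many more cues -- a first-pass header-finder can be as dumb as this:

    # toy sketch only: call a line a header when it is short,
    # surrounded by blank lines, and set in all-caps.
    def looks_like_header(lines, i):
        line = lines[i].strip()
        if not line or len(line) > 60:
            return False
        blank_above = i == 0 or not lines[i - 1].strip()
        blank_below = i == len(lines) - 1 or not lines[i + 1].strip()
        return blank_above and blank_below and line == line.upper()

once people see that a few dumb heuristics like these get you most of the way there, the "difficulty" evaporates.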
It is also not easy to codify because visual presentation is "fuzzy" (pun not intended), sometimes relying on surrounding context to precisely define the document structure.
well, you can go on and on about all the reasons why it is difficult. but once people are doing it, routinely, those "reasons" won't matter.
We have to remember that there is a lot of variance in the conventions (both historical and geographical) used in typographic layouts to visually represent structure and semantics.
so someone will modify their routines to work with those conventions.
Not only that, in some cases texts don't even follow conventions, especially when there are oddities in the content where no convention has been firmly established.
"oddities" are only "oddities" until someone figures out their pattern. because if there is no pattern, then nobody understood the structure in the first place, so there's no way to mark it up using _any_ system.
And as previously noted, sometimes the context must be factored in to fully ascertain structure and semantics.
ok, _now_ you're finally getting into the "semantic" part. if the only way you can understand how to mark up the text is to actually _understand_ the content, that is _semantic_. and yes, you need a high level of "intelligence" -- either human or artificial, and the artificial kind ain't here yet -- to do that markup, which means that you need humans to do it, and that's why it's costly.

and even if you've got a lot of volunteer labor to throw at the task, it might not be enough, because this job is also _complex_ to boot. so you can't just use any volunteers, they have to be highly skilled. and to top it all off, it's time-consuming, so it's even more costly. that's why there are very high costs to doing semantic markup, much higher than the costs of (even manual) structural markup.

and you know what the real kicker is? even though the _costs_ are sky-high, the _benefits_ of semantic markup ain't that great. certainly not from the standpoint of the average reader, anyway. (some scholars might make out, if you coded what they want.) hey, it's great that the machine can now tell you with certainty that the reason "new york times" has been rendered in italics is because it's a newspaper. but the reader _already_knew_that_. the writer made it clear in the course of setting the context. i will get to more examples down below, but you get the drift...
The "Gedanken" test I use for the minimum requirements of machine-readable markup (or system such as ZML) for textual documents is if a text-to-speech engine is potentially capable of communicating the structure and semantics of the content to a blind listener (who is unfamiliar with any print conventions -- they've never heard the terms 'italic' or 'bold')
i doubt you'd find a blind person who's never heard those terms. but go on...
so they can, in real-time (i.e., a one-time linear audio presentation), gain the same level of comprehension as a sighted person (familiar with typographic conventions) would in reading a high-quality print version of the text. Pass this test, and the markup will likely be pretty good for just about any purpose in addition to accessibility.
not only will a text-to-speech engine be "potentially capable" of communicating the content to a blind person, i actually intend to build such an engine right into my viewer-program. whether or not it delivers the _semantics_ of the content is wholly dependent on whether you put that information _into_ the file in the first place. and -- of course -- that's true of _any_ markup system. but z.m.l. will have a way to put it in, yes, and if you do, then there'll be a way to get it out as well. you'll have to specify exactly _how_ the text-to-speech engine should vocalize this info. but any way you can do it, i can too.
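how might that vocalizing be specified? here's a toy sketch (python again, every name invented for illustration -- this is _not_ the viewer-program's code):

    # toy sketch: map structure to audible cues, so the listener
    # hears "chapter", a pause, added stress -- never the words
    # "italic" or "bold".
    def speak(text, **cues):
        print(cues or "", text)   # stand-in for a real t.t.s. engine

    def vocalize(kind, text):
        if kind == "chapter-heading":
            speak("chapter. " + text, pause_after=2.0)
        elif kind == "emphasis":
            speak(text, pitch_shift=10)   # audible stress
        else:
            speak(text)

the listener gets the _structure_ in her ear, and the words "italic" and "bold" never come up -- which is exactly your test.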
Is ZML or other type of "regularized plain text" (or the XML-based ZML markup vocabulary analog) sufficient to pass this test?
yes. that's what i've been saying all along. that's what the test-suite is all about, baby.
The system only needs to be as complicated as needed to represent the needed document structures and content semantics in a machine-readable way such that it passes the test described above.
if you can do it, i can too.
The $64,000 question therefore is what structure and semantics need to be represented in a machine-readable way, and to what degree of precision.
different people will require different degrees of "precision". my target-population is the one michael has always targeted.
Maybe ZML (and its markup analog) is sufficient, maybe it isn't.
of course, we can say that about any system, can't we... ;+)
I gather from those here who have first-hand experience handling large numbers of the various types of texts in Project Gutenberg that ZML (or any other type of "regularized plain text" system) does not have sufficient granularity to pass the "test."
well, that's how i read the feelings of everyone here who has chimed in so far on the matter, except myself and maybe a couple of other people in varying degrees. but i note once again, for the record, that no one has yet given me a list of "hard e-texts" that they think might give my z.m.l. a run for its money on difficulty. so we really don't have an answer to that yet, do we?
Of course, we can argue whether the test as I describe above is too strict, or maybe not even on-target.
well, my primary aim is sighted people, so your test is not "on-target", but that's ok, i understand what your point is. i should note, however, that blind people seem to me to be the most delighted group of users that project gutenberg has, and are probably the people _most_ appreciative of plain text. all this in spite of the fact that there is _no_ semantic markup -- and very little structural markup either -- in the e-texts. no, it appears the magic formula for _that_ has been simple -- get everything else _out_of_the_way_ of the words themselves. i will let you think about that...
But keep in mind this is what the *accessibility community* wants in machine-readable textual documents, and what they are working towards in their activities -- they've wholeheartedly embraced XML-based approaches, for example.
they've been misled to believe the promises just like everyone else.
To wave one's hand in dismissal
it is dishonest to try to imply i am "waving my hand in dismissal". please don't do that.
and say they are being unrealistic or stupid,
i, of course, have never said anything like that. don't say that i have. please don't do that.
or that they don't really matter in our decision-making,
it is unseemly of you to put those kind of words in _my_ mouth. please don't do that.
is a pretty bigoted and "blind" position (pun intended) to take
which is what makes it so distasteful. so just stop it. please don't do that.
-- it is also stupid since meeting their needs for structure and semantics has many other benefits as well.
enough, jon. please don't do that.
I might ask a few text-to-speech experts I know at DAISY to look at the ZML system and tell me if it has sufficient structural granularity for high-quality text-to-speech purposes.
the judgement of bureaucrats doesn't impress me. i'll listen to the reports of blind users themselves.
As far as I am concerned, if they come back and say "no it doesn't", then I would recommend that PG should not consider ZML for its Master format
i'm not seeking your endorsement, jon, so please feel free to make any recommendation to project gutenberg that you want concerning what they should consider for their master format.
but maybe consider ZML for its plain text output versions.
whatever.
Bold lines which appear by themselves in the flow of text are sometimes used for structures other than headers.
my routines are not so brain-dead as to be confused by that. but thanks for enlightening me.
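and for the record, the fix is context, and it's not rocket science. another toy sketch, an invented heuristic rather than my actual code:

    # toy sketch: only promote a standalone bold line to a header
    # when the context agrees -- e.g., the text after it starts a
    # fresh sentence instead of continuing one.
    def bold_line_is_header(lines, i):
        after = next((l.strip() for l in lines[i + 1:] if l.strip()), "")
        continues_sentence = after[:1].islower()
        return len(lines[i].strip()) < 60 and not continues_sentence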
There are many other similar weirdities involved with italicized text, indented text, etc., that we see in visual layouts of texts.
please do let me know about any mistakes that my routines make on any e-text in the library if you review my program, as i am sure there are "weirdities" i've not yet come across.
Context often has to be considered to unambiguously discern the structure behind a visual cue. For example, one convention often used is that the names of ships are to be italicized. Thus, if a machine is to distinguish the name of a ship from linguistically emphasized text, it has to look at the context.
that's a very good example, jon, so i'll discuss it a bit. my approach is to have the o.c.r. program _retain_text_styling_. so if the ship-name was italicized in the original book, it would continue to be italicized in the o.c.r. text (assuming recognition), and that would carry through all the editing to the final version. unless the person creating the digital version were to indicate that those italics represented a ship-name, they would remain as simple italics, and an end-user would be on her own to know why. _just_like_she's_on_her_own_when_she_reads_a_paper-book_.

you might consider it to be some huge problem that the reader doesn't know _exactly_why_ something is being italicized, but i don't think it is, because they virtually always figure it out... even a blind reader can figure it out. heck, even in the e-texts with the italics stripped out, the blind reader can figure it out. if you asked any of those readers -- sighted or blind -- how much money they would pay to have that information supplied, to assess how much _value_ they place on it, they would laugh in your face. and that's _all_ you need to know about _that_ cost-benefit ratio.

in the _rare_ case where that information _might_ be valuable, i have ways to mark it. and as soon as you show me those cases, and show me exactly how your x.m.l. markup provides a solution, i will be quite happy to show you exactly how i would do it too.
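and so "i have ways to mark it" isn't just hand-waving, here's an invented illustration -- _not_ final z.m.l. syntax, just the idea:

    plain italics, carried straight through from the o.c.r.:

        they shipped out on the _titanic_ that spring.

    the rare case where the semantics are worth recording
    (notation invented for this example only):

        they shipped out on the _titanic_{ship-name} that spring.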
No, I'd say it is more accurate to say "for reading by eyesight, structure is represented by visual presentation cues."
you're talking more about _output_ here. whereas i am talking about _input_ instead. i'm talking about how to examine the p-book -- specifically, the o.c.r. that results from it -- to automatically determine the structure of the text. that structured text can then be rendered visually (on-screen or paper) or via text-to-speech. when i talk about "presentation", i'm talking about the p-book that we work with as our original source.

however, in an aside, i've never even heard this _discussed_ yet, not here or anywhere else for that matter, but the time has come where we can expect to start seeing (or should i say "hearing") books that have been "input" using voice-recognition technology. in other words, the age of scanning might come to an abrupt end, or taper off significantly, when people start creating e-books by reading a book aloud into a voice-recognition system. they are remarkably improved these days, according to everything i read, plus their cost might fall _considerably_ in the near future too, and the number of people who might be willing to "enter" a book in this manner is probably far greater than those willing to scan.

of course, it will take a new kind of software program to "fix" the transcription errors that will occur using this input method, but maybe that's already a part of these systems, i don't know... not making any predictions here, just keeping my eye open for it. what this might mean for blind people, i don't even have to say...
Remember, there are different types of presentation of text, not only visual.
the mac has had text-to-speech for well over a decade now, jon, right in the system. i've already put it in some of my e-book apps.
To focus on visual as the only form of presentation that matters is being very short-sighted (pun intended).
good pun, if there can be said to be such a thing... ;+) but making the point to me is totally unnecessary.
And I've stated the core question to answer is: "Is ZML (or any other system of regularized plain text) sufficient to represent document structure and semantics for Project Gutenberg Master texts?"
that _is_ the right question.
I assume Bowerbird is saying "yes"
there's no reason to "assume" that i am saying "yes". i've actually _said_it_, over and over and over again. and built a test-suite to prove it.
and many others here are saying "No".
well, most everyone who has spoken up has said "no". (dale and maybe james have given a limp "perhaps".) and there might be some lurkers who i have convinced. but by and large, all the loudmouths have loudly said "no".
I answer the question with a "No".
well, thanks for putting yourself firmly on the record jon. again.
Amusingly, Networker, a very insightful ebook expert who often posts to The eBook Community, calls ZML a type of ITF, "Impoverished Text Format", to indicate ZML has insufficient granularity -- it is "impoverished".
well, heck, jon, if the only thing i'd ever heard about z.m.l. was the one-sided "descriptions" you've given it over there, i would think that it sounded like a ludicrous idea too. networker will come around when he sees the real thing. everyone will. after all, the proof _is_ in the pudding... -bowerbird

Bowerbird@aol.com wrote:
and indeed, that's my essence: use the most simple system possible.
Use the most simple tool that does the job, but don't use a simpler one. If you had done any research on ebooks before proclaiming yourself a demi-god, you may have noticed that your toy markup language is woefully underpowered. You don't even handle the very first page of Sherlock Holmes. Mark this up in ZML. Note that "Being a reprint" is a subtitle to "PART I" and not a paragraph. The same goes for "MR. SHERLOCK HOLMES". Note also that "JOHN H. WATSON" is emphasized, although it's the only part of the title that's not italic.

---

PART I.

_Being a reprint from the reminiscences of_ JOHN H. WATSON, M.D., _late of the Army Medical Department._

CHAPTER I. MR. SHERLOCK HOLMES.

IN the year 1878 I took my degree of Doctor of Medicine of the University of London, and proceeded to Netley to go through the course prescribed for surgeons in the army. ...

---
but i note once again, for the record, that no one has yet given me a list of "hard e-texts" that they think might give my z.m.l. a run for its money on difficulty. so we really don't have an answer to that yet, do we?
How about doing your homework yourself? The world at large was not created to do your bidding. Go, find a slew of difficult texts, mark them up, fix your program and show us what you can do. But, please, stop whining about us not doing your work.
of course, it will take a new kind of software program to "fix" the transcription errors that will occur using this input method, but maybe that's already a part of these systems, i don't know...
Again, researching your stuff before starting a colossal handwave is out of the question.

--
Marcello Perathoner
webmaster@gutenberg.org