
Hi,

On 10.03.2006 at 08:37, Dave Fawthrop wrote:
On Thu, 9 Mar 2006 18:43:18 -0500, D Garcia <donovan@abs.net> wrote:
|On Thursday 09 March 2006 11:05 am, Bowerbird@aol.com wrote:
|> but what _is_ different, and undeniably so,
|> is that google has a _huge_ corpus of text
|<snip>
|> the information about the world represented
|> by _billions_ of web-pages out in cyberspace
|> could lead to the gleaning of vast knowledge.
|> (so much so that it could become very scary.)
|
|I can see this revealing (or at least quantifying) the disturbingly high rate
|of spelling and grammatical errors. Billions and billions of them, to
|paraphrase Sagan, or more likely (with sincere apologies to Kubrick) ...
|"My God ... it's full of shit."
|
|Speaking of the web, of course. :)
Clearly we are "progressing" back to the days of Shakespeare, when spelling was much more varied and he spelled his name in several different ways. Not having a dictionary of "correct" spelling available did his work no harm. Discuss. ;-)

It did him no harm, and it does humans no harm. But machines have no knowledge! They need a dictionary. Humans can recognize all these variants through their experience and knowledge; a machine has to be given this knowledge. This is not a trivial task.
The Cobuild dictionary was the first dictionary that was completely corpus-based, but a lot of human labor went into it as well.

Btw., Shakespeare's works were not written down by Shakespeare himself but were transcribed during performances of the plays. Hence the varied folios and spellings.

Keith.
--
Dave Fawthrop <dave hyphenologist co uk>
Freedom of Speech, Expression, Religion, and Democracy are the keys to Civilization, together with legal acceptance of Fundamental Human rights.
_______________________________________________
gutvol-d mailing list
gutvol-d@lists.pglaf.org
http://lists.pglaf.org/listinfo.cgi/gutvol-d