re: [gutvol-d] google and the translation thing

keith said:
The method is not new. It was used successfully for weather reports already.
the "method" might not be new. but what _is_ different, and undeniably so, is that google has a _huge_ corpus of text with which to implement the method now, possibly the "secret sauce" to make it work. this asset, and its bearing on the problem, should not be underestimated. and indeed, that huge corpus could exert all manner of effects on a wide variety of knowledge tasks. the information about the world represented by _billions_ of web-pages out in cyberspace could lead to the gleaning of vast knowledge. (so much so that it could become very scary.) -bowerbird

On Thursday 09 March 2006 11:05 am, Bowerbird@aol.com wrote:
but what _is_ different, and undeniably so, is that google has a _huge_ corpus of text <snip> the information about the world represented by _billions_ of web-pages out in cyberspace could lead to the gleaning of vast knowledge. (so much so that it could become very scary.)
I can see this revealing (or at least quantifying) the disturbingly high rate of spelling and grammatical errors. Billions and billions of them, to paraphrase Sagan, or more likely (with sincere apologies to Kubrick) ... "My God ... it's full of shit." Speaking of the web, of course. :)

On Thu, 9 Mar 2006 18:43:18 -0500, D Garcia <donovan@abs.net> wrote:
|On Thursday 09 March 2006 11:05 am, Bowerbird@aol.com wrote:
|> but what _is_ different, and undeniably so,
|> is that google has a _huge_ corpus of text
|<snip>
|> the information about the world represented
|> by _billions_ of web-pages out in cyberspace
|> could lead to the gleaning of vast knowledge.
|> (so much so that it could become very scary.)
|
|I can see this revealing (or at least quantifying) the disturbingly high rate
|of spelling and grammatical errors. Billions and billions of them, to
|paraphrase Sagan, or more likely (with sincere apologies to Kubrick) ...
|"My God ... it's full of shit."
|
|Speaking of the web, of course. :)
Clearly we are "progressing" back to the days of Shakespeare when spelling was much more varied, and he spelled his name in several different ways. Not having a dictionary of "correct" spelling available did his work no harm. Discuss. ;-)
-- Dave Fawthrop <dave hyphenologist co uk>
Freedom of Speech, Expression, Religion, and Democracy are the keys to Civilization, together with legal acceptance of Fundamental Human rights.

Hi,
On 10.03.2006 at 08:37, Dave Fawthrop wrote:
On Thu, 9 Mar 2006 18:43:18 -0500, D Garcia <donovan@abs.net> wrote:
|On Thursday 09 March 2006 11:05 am, Bowerbird@aol.com wrote:
|> but what _is_ different, and undeniably so,
|> is that google has a _huge_ corpus of text
|<snip>
|> the information about the world represented
|> by _billions_ of web-pages out in cyberspace
|> could lead to the gleaning of vast knowledge.
|> (so much so that it could become very scary.)
|
|I can see this revealing (or at least quantifying) the disturbingly high rate
|of spelling and grammatical errors. Billions and billions of them, to
|paraphrase Sagan, or more likely (with sincere apologies to Kubrick) ...
|"My God ... it's full of shit."
|
|Speaking of the web, of course. :)
Clearly we are "progressing" back to the days of Shakespeare when spelling was much more varied, and he spelled his name in several different ways. Not having a dictionary of "correct" spelling available did his work no harm. Discuss. ;-)
It did him no harm, and humans no harm. But machines are knowledgeless! They need a dictionary. Humans, through their experience and knowledge, can recognize all this. A machine has to be given this knowledge. This is not a trivial task.
The Cobuild dictionary was the first dictionary that was completely corpus-based, but a lot of human labor went into it as well. Btw, all of Shakespeare's works were not written down by himself, but were transcribed during the plays. Therefore the varied portfolios and spellings. Keith.
_______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/listinfo.cgi/gutvol-d

In article <A9A9A706-5415-4C9A-A8EE-4EDE665370FD@uni-trier.de>, "Keith J. Schultz" <schultzk@uni-trier.de> writes
Btw, all of Shakespeare's works were not written down by himself, but were transcribed during the plays. Therefore the varied portfolios and spellings.
You mean the various quartos. Some may have been bootleg copies for the use of rival theatre companies but the First Folio was produced from working copies of the plays owned by Shakespeare's theatre company. -- Philip Baker

Hi there,
On 09.03.2006 at 17:05, Bowerbird@aol.com wrote:
keith said:
The method is not new. It was used successfully for weather reports already.
the "method" might not be new.
but what _is_ different, and undeniably so, is that google has a _huge_ corpus of text with which to implement the method now, possibly the "secret sauce" to make it work.
Just the opposite is the case. Believe me, as a computational linguist. For decades it was said that with faster computers and bigger corpora MT would have its breakthrough. What has happened? Vaporware and no results. It simply does not work. Language cannot be successfully modeled. Languages are neither regularly formed nor well formed.
this asset, and its bearing on the problem, should not be underestimated. and indeed, that huge corpus could exert all manner of effects on a wide variety of knowledge tasks.
the information about the world represented by _billions_ of web-pages out in cyberspace could lead to the gleaning of vast knowledge. (so much so that it could become very scary.)
All AI projects so far have failed, and the failure has been admitted, including the claim that knowledge can be extracted from corpora. Language does not constitute meaning or knowledge. It just transports it. That is why a good deal of NLP work is done in the field of knowledge representation. Do you realize that violets were originally the color BROWN and not blue!!! (see Goethe). A translator today would translate Goethe's braun (brown) as blue, since that is what people would expect!!! Keith.
-bowerbird
participants (5)
- Bowerbird@aol.com
- D Garcia
- Dave Fawthrop
- Keith J. Schultz
- Philip Baker