Newsgroups: comp.ai.nat-lang
Path: cantaloupe.srv.cs.cmu.edu!nntp.club.cc.cmu.edu!bb3.andrew.cmu.edu!newsfeed.pitt.edu!portc02.blue.aol.com!howland.erols.net!ix.netcom.com!i17
From: i17@netcom.com (Vivian Waugh)
Subject: Re: Semantic Distances and Bigram Smoothing
Message-ID: <i17EE2xqs.AHB@netcom.com>
Organization: Flat Normal
X-Newsreader: TIN [version 1.2 PL2]
References: <33D93418.B05@devinci.fr> <Pine.HPP.3.95L.970726215146.18900K-100000@tw800.eng.cam.ac.uk>
Date: Tue, 29 Jul 1997 12:08:52 GMT
Lines: 38
Sender: i17@netcom13.netcom.com


Nik Cunniffe (njc1001@eng.cam.ac.uk) wrote:

: ... words the don't
: I wish to apply the metric 
:  (1 - cosine of the angle between their associated vectors).

: to smoothing rare bigram statistics.  I don't expect to be able to beat
: Discounting followed by Backoff, but the goodness of the model (as
: measured by test-set perplexity) will be a good way of testing my semantic
: distances (I have a few notions of how to calculate them). 


What dimensionality is the space these vectors are in?

Why is the length of your vector not of interest in your metric?
Ignoring length effectively places the semantic values on the surface of a
unit hypersphere, which means that the space is bounded and that once the
distance between two points falls within the resolution of your datatype,
the points are considered to be equivalent.
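
To make the length point concrete, here is a rough sketch of the quoted
metric in Python. The function name and the two vectors are made up for
illustration (they are not from Nik's post), and I am assuming plain
real-valued vectors of equal dimension:

```python
import math

def cosine_distance(u, v):
    """Return 1 - cos(angle between u and v); undefined for zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)

# Hypothetical vectors for two words (made-up numbers).
u = [3.0, 0.0, 1.0]
v = [1.0, 2.0, 0.0]

d = cosine_distance(u, v)

# Stretching u by a positive factor changes its length but not its direction,
# so the metric should give the same answer up to floating-point rounding.
d_scaled = cosine_distance([10.0 * a for a in u], v)
```

Here d and d_scaled agree up to rounding, which is the sense in which
length drops out: any positive rescaling of a vector is invisible to the
metric, and the value always lies between 0 and 2.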

What "semantic spaces" and "distance metrics" have others thought of,
and more importantly, when a complex bunch of actual knowledge (er, words...)
is actually sprayed into one of these mapping schemes... WHAT IS THE
DIMENSIONALITY WHICH SEEMS TO BE TURNING UP... or what is the distribution
of dimensionalities, considering that some subregions may be quite 2-D, while
others may have many, many dimensions, or simply be unrepresentable this way?

And finally, what is the interpretation of this dimensionality?
Can the dimensions be given names after they have been statistically found?


Ian. 5:03 29jul97. "In a connectionist model, distance is not euclidean."
-- 

   6+ Trillion Dollars GNP in the USA alone, and NO DECENT SOFTWARE YET ?!

     For example:  What are the NSA's supposed decoding-systems like?
                   Where are the WORLD-SIMULATORS run by the BIG boys?
                   Where are the computer-scripted TV-shows and cartoons?

      Or 'perhaps' what I am saying is that somewhere quite nearabouts,
                           SUCH THINGS DO EXIST.
   These are not 'missing links' at all, they are 'denied but in active use'.

     So what's with the technological "eclipsations" going on Around Here?
    Or is it actually a matter of BEING TOLD LIES ABOUT WHAT THIS PLACE IS?

         It is obvious to me that this place is being implemented by 
          VERY 'High Technology', in the Counter-Clarke-wise sense:

   'TECHNOLOGY' IS INDISTINGUISHABLE FROM SUFFICIENTLY LOW FORMS OF MAGIC.
