SoatDev IT Consulting
SoatDev IT Consulting
  • About us
  • Expertise
  • Services
  • How it works
  • Contact Us
  • News
  • August 8, 2023
  • Rss Fetcher

Natural language processing represents words as high-dimensional vectors, on the order of 100 dimensions. For example, the glove-wiki-gigaword-50 set of word vectors contains 50-dimensional vectors, and the the glove-wiki-gigaword-200 set of word vectors contains 200-dimensional vectors.

The intent is to represent words in such a way that the angle between vectors is related to similarity between words. Closely related words would be represented by vectors that are close to parallel. On the other hand, words that are unrelated should have large angles between them. The metaphor of two independent things being orthogonal holds almost literally as we’ll illustrate below.

Cosine similarity

For vectors x and y in two dimensions,

mathbf{x} cdot mathbf{y} = ||mathbf{a} || ,||mathbf{b} || , cos(theta)

where θ is the angle between the vectors. In higher dimensions, this relation defines the angle θ in terms of the dot product and norms:

cos(theta) = frac{mathbf{x} cdot mathbf{y}}{ ||mathbf{a} || ,||mathbf{b} || }

The right-hand side of this equation is the cosine similarity of x and y. NLP usually speaks of cosine similarity rather than θ, but you could always take the inverse cosine of cosine similarity to compute θ. Note that cos(0) = 1, so small angles correspond to large cosines.

Examples

For our examples we’ll use gensim with word vectors from the glove-twitter-200 model. As the name implies, this data set maps words to 200-dimensional vectors.

First some setup code.

    import gensim
    import numpy as np
    word_vectors = api.load("glove-twitter-200")

    def norm(word):
         v = word_vectors[word]
        return np.dot(v, v)**0.5

    def cosinesim(word0, word1):
        v = word_vectors[word0]
        w = word_vectors[word1]
        return np.dot(v, w)/(norm(word0)*norm(word1))

Using this mode, the cosine similarity between “dog” and “cat” is 0.832, which corresponds to about a 34° angle. The cosine similarity between “dog” and “wrench” is 0.145, which corresponds to an angle of 82°. A dog is more like a cat than like a wrench.

The similarity between “dog” and “leash” is 0.487, not because a dog is like a leash, but because the word “leash” is often used in the same context as the word “dog.” The similarity between “cat” and “leash” is only 0.328 because people speaking of leashes are more likely to also be speaking about a dog than a cat.

The cosine similarity between “uranium” and “walnut” is only 0.0054, corresponding to an angle of 89.7°. The vectors associated with the two words are very nearly orthogonal because the words are orthogonal in the metaphorical sense.

Note that opposites are somewhat similar. Uranium is not the opposite of walnut because things have to have something in common to be opposites. The cosine similarity of “expensive” and “cheap” is 0.706. Both words are adjectives describing prices and so in some sense they’re similar, though they have opposite valence. “Expensive” has more in common with “cheap” than with “pumpkin” (similarity 0.192).

The similarity between “admiral” and “general” is 0.305, maybe less than you’d expect. But the word “general” is kinda general: it can be used in more contexts than military office. If you add the vectors for “army” and “general”, you get a vector that has cosine similarity 0.410 with “admiral.”

Related posts

  • Named entity recognition
  • Natural language processing on unnatural language

The post Angles between words first appeared on John D. Cook.

Previous Post
Next Post

Recent Posts

  • Google launches Doppl, a new app that lets you visualize how an outfit might look on you
  • Why a16z VC believes that Cluely, the ‘cheat on everything’ startup, is the new blueprint for AI startups
  • TechCrunch All Stage: Learn how AI can supercharge your MVPs with Chris Gardner
  • Apple updates the rules for its EU App Store by adding more complicated fees
  • Travis Kalanick is trying to buy Pony AI — and Uber might help

Categories

  • Industry News
  • Programming
  • RSS Fetched Articles
  • Uncategorized

Archives

  • June 2025
  • May 2025
  • April 2025
  • February 2025
  • January 2025
  • December 2024
  • November 2024
  • October 2024
  • September 2024
  • August 2024
  • July 2024
  • June 2024
  • May 2024
  • April 2024
  • March 2024
  • February 2024
  • January 2024
  • December 2023
  • November 2023
  • October 2023
  • September 2023
  • August 2023
  • July 2023
  • June 2023
  • May 2023
  • April 2023

Tap into the power of Microservices, MVC Architecture, Cloud, Containers, UML, and Scrum methodologies to bolster your project planning, execution, and application development processes.

Solutions

  • IT Consultation
  • Agile Transformation
  • Software Development
  • DevOps & CI/CD

Regions Covered

  • Montreal
  • New York
  • Paris
  • Mauritius
  • Abidjan
  • Dakar

Subscribe to Newsletter

Join our monthly newsletter subscribers to get the latest news and insights.

© Copyright 2023. All Rights Reserved by Soatdev IT Consulting Inc.