SoatDev IT Consulting
  • July 27, 2023
  • Rss Fetcher

Named entity recognition (NER) is a natural language processing task: pulling named things out of text. It sounds trivial at first. Just create a giant list of named things and compare against that.

But suppose, for example, University of Texas is on your list. If Texas is also on your list, do you report that you have a named entity inside a named entity? And how do you handle The University of Texas? Do you put it on your list as well? What about UT? Can you tell from context whether UT stands for University of Texas, University of Toronto, or the state of Utah?
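To make the nesting problem concrete, here is a minimal sketch of the naive list-matching approach. The list of names and the sample sentence are made up for illustration, and the matching is plain case-sensitive string search, not anything spaCy does.

```python
import re

# Hypothetical list of known names, for illustration only.
GAZETTEER = ["University of Texas", "The University of Texas", "Texas", "UT"]


def find_matches(text, names):
    """Return sorted (start, end, name) triples for every whole-word match."""
    matches = []
    for name in names:
        pattern = r"\b" + re.escape(name) + r"\b"
        for m in re.finditer(pattern, text):
            matches.append((m.start(), m.end(), name))
    return sorted(matches)


def nested_pairs(matches):
    """Return (outer, inner) name pairs where one match lies inside a longer one."""
    pairs = []
    for outer in matches:
        for inner in matches:
            contains = outer[0] <= inner[0] and inner[1] <= outer[1]
            longer = (outer[1] - outer[0]) > (inner[1] - inner[0])
            if contains and longer:
                pairs.append((outer[2], inner[2]))
    return pairs


# Made-up sample sentence.
sample = "She studied at The University of Texas before moving to Utah."
hits = find_matches(sample, GAZETTEER)
for outer, inner in nested_pairs(hits):
    print(f"{inner!r} is nested inside {outer!r}")
```

Every name on the list that appears in the sentence is reported, so a single mention of the university produces three overlapping hits. The list alone gives no way to choose among them, and it says nothing about what UT means in a given context.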

Searching for Rice University would be even more fun. The original name of the school was The William Marsh Rice Institute for the Advancement of Letters, Science, and Art. I don’t know whether the name was ever officially changed. A friend who went to Rice told me they had a ridiculous cheer that spelled out every letter in the full name. And of course rice could refer to a grain.

Let’s see what happens when we run the following sentence through spaCy looking for named entities.

Researchers from the University of Texas at Austin organized a pickleball game with their colleagues from Rice University on Tuesday.

I deliberately did not capitalize the definite article in front of University of Texas because I suspected spaCy might include the article if it were capitalized but not otherwise. It included the article in either case.

The results depend on the language model used. When I used en_core_web_trf it included at Austin as part of the university name.

When I used the smaller en_core_web_sm model it pulled out Austin as a separate entity.
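A comparison like this can be sketched as follows, assuming spaCy and both models are installed. The helper names (`find_entities`, `format_entities`, `compare_models`) are my own for illustration. The entities reported may differ from those described above depending on the spaCy and model versions.

```python
SENTENCE = (
    "Researchers from the University of Texas at Austin organized a "
    "pickleball game with their colleagues from Rice University on Tuesday."
)


def find_entities(text, model_name):
    """Return (text, label) pairs for the entities one spaCy model finds."""
    import spacy  # imported here so the formatting helper works without spaCy

    nlp = spacy.load(model_name)  # raises OSError if the model is not installed
    return [(ent.text, ent.label_) for ent in nlp(text).ents]


def format_entities(pairs):
    """Render (text, label) pairs as one aligned line per entity."""
    width = max((len(text) for text, _ in pairs), default=0)
    return "\n".join(f"{text:<{width}}  {label}" for text, label in pairs)


def compare_models(text, model_names=("en_core_web_trf", "en_core_web_sm")):
    """Print the entities each model reports for the same text."""
    for name in model_names:
        print(name)
        print(format_entities(find_entities(text, name)))
        print()
```

Calling `compare_models(SENTENCE)` prints one block of entities per model. Each model has to be downloaded first, for example with `python -m spacy download en_core_web_sm`.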

The tag ORG stands for organization and DATE obviously stands for date. GPE is a little less obvious, standing for geopolitical entity.
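If a tag is unfamiliar, spaCy can describe it through `spacy.explain`. The sketch below wraps that call and falls back to the three meanings given above when spaCy is not installed. The `describe` helper and the fallback table are my own additions.

```python
# Fallback meanings taken from the text above, used only if spaCy is unavailable.
LABEL_MEANINGS = {
    "ORG": "organization",
    "DATE": "date",
    "GPE": "geopolitical entity",
}


def describe(label):
    """Return a short description of an entity label."""
    try:
        import spacy

        description = spacy.explain(label)  # returns None for unknown terms
    except ImportError:
        description = None
    return description or LABEL_MEANINGS.get(label, "unknown label")


for label in ("ORG", "DATE", "GPE"):
    print(label, "->", describe(label))
```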

When I changed Rice University to simply Rice, spaCy still recognized Rice as an organization. When I changed it to rice with no capitalization, it did not recognize it as an organization.
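The capitalization experiment can be reproduced with a sketch along these lines. It assumes an already loaded spaCy pipeline is passed in as `nlp`; the helper names are hypothetical, and the labels you get may vary with the model and version.

```python
TEMPLATE = (
    "Researchers from the University of Texas at Austin organized a "
    "pickleball game with their colleagues from {school} on Tuesday."
)
VARIANTS = ["Rice University", "Rice", "rice"]


def make_sentences(variants=VARIANTS):
    """Map each spelling of the school name to the sentence that uses it."""
    return {v: TEMPLATE.format(school=v) for v in variants}


def label_of(nlp, sentence, target):
    """Return the label of the entity whose text equals target, or None."""
    doc = nlp(sentence)
    for ent in doc.ents:
        if ent.text == target:
            return ent.label_
    return None


def run_experiment(nlp):
    """Print the label, if any, assigned to each spelling of the school name."""
    for variant, sentence in make_sentences().items():
        print(f"{variant!r}: {label_of(nlp, sentence, variant)}")
```

With spaCy installed, `run_experiment(spacy.load("en_core_web_sm"))` runs all three variants through the small model. A result of `None` means the model did not report that exact span as an entity.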

The other day I stress tested spaCy by giving it some text from Chaucer’s Canterbury Tales. Even though spaCy is trained on Modern English, it did better than I would have expected on Middle English.

Using the en_core_web_trf model, it recognized Engelond and Caunterbury as cities.

When I switched to en_core_web_sm it still recognized Caunterbury as a city, but tagged Engelond as a person.

 

The post Named entity recognition first appeared on John D. Cook.

© Copyright 2023. All Rights Reserved by Soatdev IT Consulting Inc.