Random indexing spaces for bridging the human and data webs

Jose Quesada, Ralph Brandao-Vidal, Lael Schooler

Research output: Contribution to journal (Conference article)

1 Scopus citation

Abstract

A wide gap separates the information that people can work with online from the information that computers can work with. Because most of the web is plain text while the Semantic Web requires structured information (RDF), bridging the two worlds is an important current research topic. Here we propose a web service that uses a Random Indexing (RI) semantic space trained on the plain text of the one million most central Wikipedia concepts. The space provides vectors for each of the equivalent DBpedia concepts and for any text or webpage. It can also provide a hashed version of the RI vector that works as a unique handle, as URIs do, with the additional advantage that it represents the meaning of the text. As a result, any page (previously readable only by humans) is now integrated with the Semantic Web graph through links to one of its most central parts, DBpedia.
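
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The dimensionality, the number of nonzero entries, the window size, the toy corpus, and all function names are assumptions made for illustration. The sign-bit signature in particular is only one possible reading of "a hashed version of the RI vector"; the abstract does not specify the hashing scheme.

```python
import hashlib
import math
import random
from collections import defaultdict

DIM = 512      # dimensionality of the space (illustrative value, not from the paper)
NONZERO = 8    # number of +1/-1 entries per index vector (illustrative value)


def index_vector(token, dim=DIM, nonzero=NONZERO):
    """Sparse ternary index vector, derived deterministically from the token."""
    seed = int.from_bytes(hashlib.sha256(token.encode("utf-8")).digest()[:8], "big")
    rng = random.Random(seed)
    vec = [0] * dim
    for pos in rng.sample(range(dim), nonzero):
        vec[pos] = rng.choice((-1, 1))
    return vec


def train(sentences, window=2, dim=DIM):
    """Each word accumulates the index vectors of its neighbours within the window."""
    context = defaultdict(lambda: [0] * dim)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vec = context[word]
                    for k, v in enumerate(index_vector(tokens[j], dim)):
                        vec[k] += v
    return dict(context)


def text_vector(text, space, dim=DIM):
    """Vector for arbitrary text: the sum of the context vectors of its known words."""
    total = [0] * dim
    for word in text.lower().split():
        for k, v in enumerate(space.get(word, ())):
            total[k] += v
    return total


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def signature(vec):
    """Assumed hashing scheme: one sign bit per dimension, rendered as hex."""
    bits = "".join("1" if v > 0 else "0" for v in vec)
    return format(int(bits, 2), "0{}x".format(len(vec) // 4))


# Toy corpus standing in for the Wikipedia text used in the paper.
corpus = [
    "the semantic web stores structured rdf data",
    "wikipedia articles are plain text for human readers",
]
space = train(corpus)
page = text_vector("plain text about the semantic web", space)
concept = text_vector("semantic web", space)
print(cosine(page, concept), signature(page)[:16])
```

In this sketch, linking a page to a concept would amount to picking the concept vector with the highest cosine similarity to the page vector. Because the signature is built from sign bits, texts with similar vectors tend to share most of their bits, which is one way a handle could reflect meaning rather than location.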

Original language: English (US)
Pages (from-to): 47-58
Number of pages: 12
Journal: CEUR Workshop Proceedings
Volume: 611
State: Published - Dec 1 2010
Externally published: Yes
Event: 2nd Workshop on Inductive Reasoning and Machine Learning on the Semantic Web, IRMLeS 2010 - Held in Conjunction with the 7th Extended Semantic Web Conference, ESWC 2010 - Heraklion, Greece
Duration: May 31 2010 to May 31 2010

Keywords

  • Identifiers
  • Literals
  • RDF
  • Resources
  • Statistical semantics
  • Structured information
  • Text mining

ASJC Scopus subject areas

  • Computer Science (all)
