YAGO

Introduction

This web is a grab bag of code and commentary, mostly centered around YAGO (Yet Another Great Ontology) and related efforts (eg, YAGO-SUMO).

Origins

YAGO is a huge, mechanically generated ontology. It draws on, or is linked to, these resources:

  • DBpedia

    "DBpedia is a crowd-sourced community effort to extract structured information from Wikipedia and make this information available on the Web. DBpedia allows you to ask sophisticated queries against Wikipedia, and to link the different data sets on the Web to Wikipedia data."

  • GeoNames

    "The GeoNames geographical database covers all countries and contains over eight million placenames ..."

  • WordNet

    "WordNet is a large lexical database of English. Nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations."

YAGO-SUMO integrates YAGO with SUMO (the Suggested Upper Merged Ontology), adding some axiomatic rigor. Conveniently, SUMO has also been linked to WordNet.

Contents

YAGO2s (version 2.5.3) is distributed in two formats (TSV and Turtle). Each is packaged as a 7-Zip archive of about 2.2 GB (yago2s_{tsv,ttl}.7z). Extracting the Turtle archive yields a large folder (~18.5 GB) containing 25 Turtle (*.ttl) files.
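
As a quick sanity check after extraction, the folder's file count and total size can be compared against the figures above. The following is a minimal sketch, not part of the YAGO distribution; the function names and the folder name "yago2s_ttl" are hypothetical.

```python
from pathlib import Path


def summarize_ttl_folder(folder):
    """Return (file_count, total_bytes) for the *.ttl files directly in folder."""
    files = sorted(Path(folder).glob("*.ttl"))
    total_bytes = sum(f.stat().st_size for f in files)
    return len(files), total_bytes


def format_gb(num_bytes):
    """Format a byte count in decimal gigabytes (1 GB = 10**9 bytes)."""
    return f"{num_bytes / 1e9:.1f} GB"


# Example use (the folder name is an assumption, adjust to your extraction path):
#   count, size = summarize_ttl_folder("yago2s_ttl")
#   print(count, "files,", format_gb(size))
```

If the extraction went as described, this should report 25 files and roughly 18.5 GB, though the exact size depends on whether sizes are counted in decimal or binary units.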

Each file begins with introductory comments and directives (eg, @base, @prefix), followed by a series of RDF triples. Collectively, YAGO2s contains about 311 million RDF triples, distributed as follows (sizes in millions of triples):

  Size  Description          Predicate(s)
  ====  ===========          ============
  0.45  DBpedia Classes      owl:equivalentClass
  1.14  DBpedia Instances    owl:sameAs
  8.97  Facts                eg, actedIn, created, dealsWith, diedIn, directed
  0.00  GeoNames Class Ids   hasGeonamesClassId
  0.00  GeoNames Classes     rdfs:subClassOf
 32.22  GeoNames Data        eg, hasGeonamesEntityId, hasLatitude, hasLongitude
  0.11  GeoNames Entity Ids  hasGeonamesEntityId
  0.00  GeoNames Glosses     ???
  2.72  Important Types      rdf:type
 27.01  Labels               eg, hasFamilyName, hasGivenName, hasGloss
  6.69  Literal Facts        eg, diedOnDate, endedOnDate, happenedOnDate
  2.70  Meta Facts           eg, byTransport, happenedIn, happenedOnDate
  0.79  Ml. Class Labels     rdfs:label
  8.16  Ml. Instance Labels  rdfs:label
  0.00  Schema               eg, hasConfidence, rdfs:domain, rdfs:range
  0.01  Simple Taxonomy      owl:disjointWith, rdfs:subClassOf
  5.44  Simple Types         rdf:type
107.65  Sources              extractionSource, extractionTechnique
  0.00  Statistics           hasNumber, hasNumberOfThings, wasCreatedOnDate
  0.83  Taxonomy             rdfs:subClassOf
 43.98  Transitive Type      rdf:type
 18.04  Types                rdf:type
 43.82  Wikipedia Info       hasWikipediaArticleLength, hasWikipediaUrl, linksTo
  0.18  WordNet Domains      hasWordnetDomain, rdf:type, rdfs:label
  0.07  WordNet Ids          hasSynsetId
======
310.98  total (millions of RDF triples)
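
Per-predicate tallies like those in the table can be approximated by streaming a file line by line. The sketch below is a rough illustration under loud assumptions: it treats each triple as a single line of whitespace-separated subject, predicate, and object, and it skips blank lines, "#" comments, and "@" directives. It is not a real Turtle parser (multi-line statements and ";" or "," abbreviations would be miscounted), and the entity names in the sample are made up.

```python
from collections import Counter


def count_predicates(lines):
    """Tally the second field of each triple-like line.

    Assumes one triple per line; skips blanks, '#' comments, '@' directives.
    """
    counts = Counter()
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith("@"):
            continue
        parts = line.split(None, 2)  # subject, predicate, rest of line
        if len(parts) < 3:
            continue  # not a complete triple under this sketch's assumptions
        counts[parts[1]] += 1
    return counts


# Illustrative sample only; the entities and class below are invented.
sample = """\
@base <http://yago-knowledge.org/resource/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
# an introductory comment
<Some_Actor> <actedIn> <Some_Film> .
<Some_Actor> rdf:type <Some_Class> .
<Other_Actor> <actedIn> <Some_Film> .
"""

predicate_counts = count_predicates(sample.splitlines())
```

For a real file, passing an open file handle (for example, open(path, encoding="utf-8")) keeps memory use flat, which matters at these sizes. For exact counts, a proper RDF library would be the safer choice.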


This wiki page is maintained by Rich Morin, an independent consultant specializing in software design, development, and documentation. Please feel free to email comments, inquiries, suggestions, etc!

Topic revision: r1 - 28 Apr 2014, VickiBrown