This web is a grab bag of code and commentary, mostly centered around YAGO (Yet Another Great Ontology) and related efforts (eg, YAGO-SUMO).


YAGO is a huge, mechanically-generated ontology, based on:

  • DBpedia

    "DBpedia is a crowd-sourced community effort to extract structured information from Wikipedia and make this information available on the Web. DBpedia allows you to ask sophisticated queries against Wikipedia, and to link the different data sets on the Web to Wikipedia data."

  • GeoNames

    "The GeoNames geographical database covers all countries and contains over eight million placenames ..."

  • WordNet

    "WordNet is a large lexical database of English. Nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations."

YAGO-SUMO integrates YAGO with SUMO (the Suggested Upper Merged Ontology), adding some axiomatic rigor. Conveniently, SUMO has also been linked to WordNet.


YAGO2s (version 2.5.3) is distributed in two formats (TSV and Turtle). Each of these is archived and compressed (via 7-zip) into a 2.2 GB file (yago2s_{tsv,ttl}.7z). Uncompressing the Turtle archive yields a large folder (~18.5 GB) containing 25 Turtle (*.ttl) files.
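
As a quick sanity check after uncompressing, the folder's contents can be inventoried. The following is a minimal sketch, not part of the YAGO distribution; the function name and the folder name in the usage note are hypothetical.

```python
from pathlib import Path


def summarize_ttl_files(folder):
    """Return (file name, size in GB) pairs for each *.ttl file, largest first."""
    sizes = [(p.name, p.stat().st_size / 1e9) for p in Path(folder).glob("*.ttl")]
    return sorted(sizes, key=lambda item: item[1], reverse=True)


def total_gb(rows):
    """Add up the sizes returned by summarize_ttl_files()."""
    return sum(gb for _, gb in rows)
```

Calling summarize_ttl_files("yago2s_ttl") on the uncompressed folder (the name is assumed here) should report 25 files, and total_gb() on the result should land near 18.5 if the extraction completed.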

Each file contains introductory comments and definitions (eg, @base, @prefix), followed by a series of RDF triples. Collectively, YAGO2s contains about 311 million RDF triples, distributed as follows (sizes are in millions of triples):

  Size  Description          Predicate(s)
  ====  ===========          ============
  0.45  DBpedia Classes      owl:equivalentClass
  1.14  DBpedia Instances    owl:sameAs
  8.97  Facts                eg, actedIn, created, dealsWith, diedIn, directed
  0.00  GeoNames Class Ids   hasGeonamesClassId
  0.00  GeoNames Classes     rdfs:subClassOf
 32.22  GeoNames Data        eg, hasGeonamesEntityId, hasLatitude, hasLongitude
  0.11  GeoNames Entity Ids  hasGeonamesEntityId
  0.00  GeoNames Glosses     ???
  2.72  Important Types      rdf:type
 27.01  Labels               eg, hasFamilyName, hasGivenName, hasGloss
  6.69  Literal Facts        eg, diedOnDate, endedOnDate, happenedOnDate
  2.70  Meta Facts           eg, byTransport, happenedIn, happenedOnDate
  0.79  Ml. Class Labels     rdfs:label
  8.16  Ml. Instance Labels  rdfs:label
  0.00  Schema               eg, hasConfidence, rdfs:domain, rdfs:range
  0.01  Simple Taxonomy      owl:disjointWith, rdfs:subClassOf
  5.44  Simple Types         rdf:type
107.65  Sources              extractionSource, extractionTechnique
  0.00  Statistics           hasNumber, hasNumberOfThings, wasCreatedOnDate
  0.83  Taxonomy             rdfs:subClassOf
 43.98  Transitive Type      rdf:type
 18.04  Types                rdf:type
 43.82  Wikipedia Info       hasWikipediaArticleLength, hasWikipediaUrl, linksTo
  0.18  WordNet Domains      hasWordnetDomain, rdf:type, rdfs:label
  0.07  WordNet Ids          hasSynsetId
310.98  total (millions of RDF triples)
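
Counts like those in the Predicate(s) column can be approximated by tallying predicates in a file. The following is a rough sketch, assuming each triple sits on a single line as "subject predicate object ." with no Turtle abbreviations (";" or ","); that layout is an assumption, and a real Turtle parser is the safer choice if it does not hold. The function name and the sample lines are made up for illustration.

```python
from collections import Counter


def tally_predicates(lines):
    """Count triples per predicate, assuming one 'subject predicate object .' per line."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments, and directives such as @base and @prefix.
        if not line or line.startswith("#") or line.startswith("@"):
            continue
        parts = line.split(None, 2)  # subject, predicate, remainder (object and final dot)
        if len(parts) == 3 and parts[2].endswith("."):
            counts[parts[1]] += 1
    return counts


# Illustrative input only; these lines are not copied from a YAGO2s file.
sample = [
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
    "# a comment line",
    "<Some_Entity>\trdf:type\t<some_class> .",
    '<Some_Entity>\trdfs:label\t"Some Entity"@eng .',
    "<Other_Entity>\trdf:type\t<some_class> .",
]
sample_counts = tally_predicates(sample)
```

To scan a real file, pass an open file handle instead of a list, so that a multi-gigabyte file is read line by line rather than loaded into memory.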

