YAGO Predicates

Although there are millions of subjects and objects in YAGO, no more than 37 unique predicates are used in any single file. Indeed, the entire ontology uses only about a hundred unique predicates. Here is the full list, with usage counts and expansions:

  Predicate Name   Count     MC     Label     PT     I     Expansion
  yago:actedIn   127513     -   -   -   -   acted in
  yago:byTransport   55871     -   -   -   -   by transport
  rdfs:comment   13     V   -   -   -   comment
  yago:created   278455     -   -   -   -   created
  yago:dealsWith   947     -   -   -   -   deals with
  yago:diedIn   54174     -   -   -   -   died in
  yago:diedOnDate   361933     DV   Zdate   -   -   died on date
  yago:directed   41811     -   -   -   -   directed
  owl:disjointWith   42     -   -   -   -   disjoint with
  rdfs:domain   117     -   -   -   -   domain
  yago:edited   5946     -   -   -   -   edited
  yago:endedOnDate   1     DV   Zdate   -   -   ended on date
  owl:equivalentClass   450345     -   -   -   -   equivalent class
  yago:exports   579     -   -   -   -   exports
  yago:extractionSource   35883403     -   -   -   -   extraction source
  yago:extractionTechnique   4348291     V   -   -   -   extraction technique
  yago:graduatedFrom   30389     -   -   -   -   graduated from
  yago:happenedIn   194161     -   -   -   -   happened in
  yago:happenedOnDate   209031     DV   Zdate   -   -   happened on date
  yago:hasAcademicAdvisor   3340     -   -   -   -   has academic advisor
  yago:hasAirportCode   5715     IV   Zac   -   Y   has airport code
  yago:hasArea   129920     V   -   dbl   -   has area
  yago:hasBudget   717     V   -   dbl   -   has budget
  yago:hasCapital   1937     -   -   -   -   has capital
  yago:hasChild   17695     -   -   -   -   has child
  yago:hasConfidence   76     V   -   dbl   -   has confidence
  yago:hasCurrency   667     -   -   -   -   has currency
  yago:hasDuration   54741     V   -   dbl   -   has duration
  yago:hasEconomicGrowth   141     V   -   dbl   -   has economic growth
  yago:hasExpenses   174     V   -   dbl   -   has expenses
  yago:hasExport   175     V   -   dbl   -   has export
  yago:hasFamilyName   838667     V   -   -   Y   has family name
  yago:hasGDP   241     V   -   dbl   -   has GDP
  yago:hasGender   923364     V   -   -   -   has gender
  yago:hasGeonamesClassId   672     IV   Zgci   -   Y   has GeoNames Class ID
  yago:hasGeonamesEntityId   7267069     IV   Zgei   int   Y   has GeoNames Entity ID
  yago:hasGini   130     V   -   dbl   -   has Gini coefficient
  yago:hasGivenName   827681     V   -   -   Y   has given name
  yago:hasGloss   504     V   -   -   Y   has gloss
  yago:hasHeight   50863     V   -   dbl   -   has height
  yago:hasImdb   9     IV   Zimdb   int   Y   has IMDB
  yago:hasImport   172     V   -   dbl   -   has import
  yago:hasInflation   155     V   -   dbl   -   has inflation
  yago:hasISBN   10917     IV   Zisbn   -   Y   has ISBN
  yago:hasLanguageCode   156     IV   Zlc   -   Y   has language code
  yago:hasLatitude   7554186     V   -   dbl   -   has latitude
  yago:hasLength   9010     V   -   dbl   -   has length
  yago:hasLongitude   7554653     V   -   dbl   -   has longitude
  yago:hasMotto   1619     V   -   -   Y   has motto
  yago:hasMusicalRole   32681     -   -   -   -   has musical role
  yago:hasNumber   96     V   -   int   -   has number
  yago:hasNumberOfPeople   230745     V   -   int   -   has number of people
  yago:hasNumberOfThings   1     V   -   int   -   has number of things
  yago:hasOfficialLanguage   964     IV   Zol   -   Y   has official language
  yago:hasPages   15616     V   -   int   -   has pages
  yago:hasPopulationDensity   39408     V   -   dbl   -   has population density
  yago:hasPoverty   84     V   -   dbl   -   has poverty
  yago:hasPredecessor   20574     -   -   -   -   has predecessor
  yago:hasRevenue   4114     V   -   dbl   -   has revenue
  yago:hasSuccessor   18005     -   -   -   -   has successor
  yago:hasSynsetId   68861     IV   Zsi   int   Y   has Synset ID
  yago:hasThreeLetterLanguageCode   2574     IV   Ztllc   -   Y   has three-letter language code
  yago:hasTLD   270     IV   Ztld   -   Y   has top-level domain
  yago:hasUnemployment   145     V   -   dbl   -   has unemployment
  yago:hasWebsite   226393     -   -   -   -   has website
  yago:hasWeight   21256     V   -   dbl   -   has weight
  yago:hasWikipediaArticleLength   2886878     V   -   -   -   has Wikipedia article length
  yago:hasWikipediaUrl   2886878     -   -   -   -   has Wikipedia URL
  yago:hasWonPrize   73763     -   -   -   -   has won prize
  yago:hasWordnetDomain   87235     -   -   -   -   has WordNet domain
  yago:holdsPoliticalPosition   6029     -   -   -   -   holds political position
  yago:imports   391     -   -   -   -   imports
  yago:influences   26306     -   -   -   -   influences
  yago:isAffiliatedTo   497263     -   -   -   -   is affiliated to
  yago:isCitizenOf   46060     -   -   -   -   is citizen of
  yago:isConnectedTo   33834     -   -   -   -   is connected to
  yago:isInterestedIn   465     -   -   -   -   is interested in
  yago:isKnownFor   500     -   -   -   -   is known for
  yago:isLeaderOf   10700     -   -   -   -   is leader of
  yago:isLocatedIn   1262926     -   -   -   -   is located in
  yago:isMarriedTo   26325     -   -   -   -   is married to
  yago:isPoliticianOf   3752     -   -   -   -   is politician of
  yago:isPreferredMeaningOf   419764     V   -   -   Y   is preferred meaning of
  rdfs:label   7019605     IV   Zrl   -   Y   label
  yago:linksTo   38048450     -   -   -   -   links to
  yago:livesIn   33628     -   -   -   -   lives in
  yago:occursSince   553125     DV   Zdate   -   -   occurs since
  yago:occursUntil   337123     DV   Zdate   -   -   occurs until
  yago:owns   26551     -   -   -   -   owns
  yago:participatedIn   16833     -   -   -   -   participated in
  yago:playsFor   412388     -   -   -   -   plays for
  skos:prefLabel   418563     IV   Zspl   -   Y   preferred label
  rdfs:range   117     -   -   -   -   range
  owl:sameAs   1144824     -   -   -   -   same as
  yago:startedOnDate   9     DV   Zdate   -   -   started on date
  rdfs:subClassOf   7619833     -   -   -   -   sub-class of
  rdfs:subPropertyOf   18     -   -   -   -   sub-property of
  rdf:type   61165323     -   -   -   -   type
  yago:wasBornIn   218757     -   -   -   -   was born in
  yago:wasBornOnDate   805594     DV   Zdate   -   -   was born on date
  yago:wasCreatedOnDate   722657     DV   Zdate   -   -   was created on date
  yago:wasDestroyedOnDate   43975     DV   Zdate   -   -   was destroyed on date
  yago:worksAt   5134     -   -   -   -   works at
  yago:wroteMusicFor   24294     -   -   -   -   wrote music for
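
Usage counts like the ones above can be gathered with a simple tally. Here is a minimal sketch, assuming the triples have already been parsed into (subject, predicate, object) tuples; the function name and the sample data are hypothetical, not part of the bi_prep scripts:

```python
from collections import Counter


def count_predicates(triples):
    """Return a Counter mapping each predicate to its number of uses."""
    return Counter(predicate for _subject, predicate, _obj in triples)


# Hypothetical sample triples (not real YAGO data).
sample = [
    ("<Person_A>", "yago:actedIn", "<Film_X>"),
    ("<Person_B>", "yago:actedIn", "<Film_X>"),
    ("<Person_A>", "yago:wasBornOnDate", '"1970-01-01"'),
]

counts = count_predicates(sample)
for predicate, count in sorted(counts.items()):
    print(predicate, count)
```

The number of unique predicates in a file is then just len(counts).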

Legend

By default, the table is sorted in ascending order by Predicate name. However, column headings may be clicked to alter the sort order. Clicks cycle the table through ascending, descending, and default sort orders.

Mapping Codes

The "MC" (Mapping Code) column above indicates reasons for mapping the RDF triple to a property on a Neo4j node (as opposed to a relation to another node). The Batch Importer preparation scripts (bi_prep/*) use these reasons to determine how each RDF triple should be handled, eg:

  • DV (Date value)

    Date Values can be organized for rapid access in the database
    (eg, using a Year/Month/Day tree with ordered links at each level).
    This can dramatically speed up chronological searches.

  • IV (ID value)

    ID values (eg, ISBN) are guaranteed to be unique within a namespace. So, they can be used to expedite searching.

  • V (Value)

    Objects which begin with a double quote are literal values
    (eg, dates, numbers, strings) rather than references to other entities.
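
The Year/Month/Day tree mentioned under DV can be illustrated outside the database. This is only a sketch, using nested dictionaries in place of Neo4j nodes and sorted keys in place of ordered links. It assumes complete, positive-year dates of the form YYYY-MM-DD; the function names are hypothetical:

```python
from collections import defaultdict


def build_date_tree(dated_items):
    """Group (iso_date, item) pairs into a year -> month -> day tree."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for iso_date, item in dated_items:
        year, month, day = (int(part) for part in iso_date.split("-"))
        tree[year][month][day].append(item)
    return tree


def items_in_year_range(tree, first_year, last_year):
    """Yield items in chronological order for an inclusive range of years."""
    for year in sorted(tree):
        if first_year <= year <= last_year:
            for month in sorted(tree[year]):
                for day in sorted(tree[year][month]):
                    yield from tree[year][month][day]


# Hypothetical sample data.
tree = build_date_tree([
    ("1900-05-02", "a"),
    ("1850-01-01", "b"),
    ("1900-01-15", "c"),
])
print(list(items_in_year_range(tree, 1900, 1900)))
```

A chronological search only has to descend into the years (and, with a finer-grained variant, the months and days) that fall within the requested range, rather than scanning every dated item.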

Labels

Date and ID value predicates cause the creation of a "helper" entity with an identifying Neo4j label (eg, Zdate, Zgci, Zgei, Zimdb, Zisbn, Zlc, Ztld). Other value predicates are stored as node properties; any remaining predicates are stored as relations.
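
The dispatch rule described here can be sketched as a small lookup. The four table rows below are copied from the predicate table; the function name and the return values ("helper", "property", "relation") are hypothetical and are not taken from the bi_prep scripts:

```python
# predicate -> (mapping code, helper label); None stands for "-" in the table.
PREDICATES = {
    "yago:wasBornOnDate": ("DV", "Zdate"),
    "yago:hasISBN": ("IV", "Zisbn"),
    "yago:hasGDP": ("V", None),
    "yago:actedIn": (None, None),
}


def handling_for(predicate):
    """Return (kind, label) describing how a triple with this predicate is stored."""
    mapping_code, label = PREDICATES[predicate]
    if mapping_code in ("DV", "IV"):
        # Date and ID values get a labeled "helper" entity.
        return ("helper", label)
    if mapping_code == "V":
        # Other values become properties on the subject's node.
        return ("property", None)
    # Everything else becomes a relation to another node.
    return ("relation", None)


for predicate in PREDICATES:
    print(predicate, handling_for(predicate))
```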

Property Types (PT) and Indexing (I)

The default Property Type (PT) is string, but some values may be stored as dbl (64-bit floating point) or int (32-bit signed integer).

A 'Y' in the Indexing (I) column means that indexing has been requested.
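
As a rough illustration of the Property Types above, here is a sketch of a converter. It assumes the literal has already been stripped of its surrounding quotes and any datatype suffix; the function name is hypothetical, and the range check simply enforces the 32-bit signed limit stated above:

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def convert_value(text, property_type=None):
    """Convert literal text according to the PT column (dbl, int, or default string)."""
    if property_type == "dbl":
        return float(text)
    if property_type == "int":
        number = int(text)
        if not INT32_MIN <= number <= INT32_MAX:
            raise ValueError(f"{text} does not fit in a 32-bit signed integer")
        return number
    # No PT entry in the table means the value stays a string.
    return text


print(convert_value("3.5", "dbl"), convert_value("42", "int"), convert_value("abc"))
```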

Node Headers

Here is a formatted display of the node headers we give to SBI:

  Node Name     Type   Index  
  ns_name     ID     Xns_name  
  ns     string     Xns  
  name     string     Xname  
  labels     label      
  comment     string      
  extraction_technique     string      
  has_area     double      
  has_budget     double      
  has_confidence     double      
  has_duration     double      
  has_economic_growth     double      
  has_expenses     double      
  has_export     double      
  has_family_name     string     Xhas_family_name  
  has_GDP     double      
  has_gender     string      
  has_Gini_coefficient     double      
  has_given_name     string     Xhas_given_name  
  has_gloss     string     Xhas_gloss  
  has_height     double      
  has_import     double      
  has_inflation     double      
  has_latitude     double      
  has_length     double      
  has_longitude     double      
  has_motto     string     Xhas_motto  
  has_number     int      
  has_number_of_people     int      
  has_number_of_things     int      
  has_pages     int      
  has_population_density     double      
  has_poverty     double      
  has_revenue     double      
  has_unemployment     double      
  has_weight     double      
  has_Wikipedia_article_length     string      
  is_preferred_meaning_of     string     Xis_preferred_meaning_of  
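
The property rows above appear to follow directly from the predicate table: the name is the Expansion with spaces replaced by underscores, "dbl" is spelled out as "double", and an index named "X" plus the property name is requested when the I column holds a 'Y'. Here is a sketch of that derivation. The rules are inferred from the two tables rather than taken from the bi_prep scripts, and the function name is hypothetical:

```python
# PT column value -> type name used in the node headers; None stands for "-".
TYPE_NAMES = {"dbl": "double", "int": "int", None: "string"}


def node_header(expansion, property_type=None, indexed=False):
    """Return a (name, type, index) row for one value predicate."""
    name = expansion.replace(" ", "_")
    index = "X" + name if indexed else ""
    return (name, TYPE_NAMES[property_type], index)


print(node_header("has family name", indexed=True))
print(node_header("has Gini coefficient", "dbl"))
print(node_header("has pages", "int"))
```

The first four rows (ns_name, ns, name, labels) do not correspond to predicates, so they would have to be supplied separately.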

Themes

It could be useful (eg, for filtering) to record the YAGO theme (ie, source file) for each Predicate. If the Predicate is mapped to a relation, this can be accomplished by adding a theme Property.

Otherwise, things get a bit ugly. My current notion is to create a meta-Property of the form yago:hasGDP:theme, but that may not be convenient to use in queries. Suggestions, anyone?
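
To make the meta-Property notion concrete, here is a small sketch that models a node's properties as a Python dictionary. The function name and the theme value are hypothetical placeholders:

```python
def with_theme(properties, predicate, value, theme):
    """Return a copy of properties with the value and a companion ':theme' entry."""
    updated = dict(properties)
    updated[predicate] = value
    updated[f"{predicate}:theme"] = theme
    return updated


# "someTheme" is a placeholder, not an actual YAGO theme name.
node = with_theme({}, "yago:hasGDP", 1.5e12, "someTheme")
print(node)
```

One likely source of inconvenience: a property key containing colons would probably need to be quoted with backticks wherever it appears in a Cypher query.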


This wiki page is maintained by Rich Morin, an independent consultant specializing in software design, development, and documentation. Please feel free to email comments, inquiries, suggestions, etc!

Topic revision: r18 - 18 Sep 2014, RichMorin