<noautolink>

---+!! YAGO - Syntax

%TOC%

YAGO is based on [[WP:Resource_Description_Framework][RDF]] (Resource Description Framework), a family of specifications from the [[WP:World_Wide_Web_Consortium][W3C]] (World Wide Web Consortium). The distribution I'm importing is encoded in [[WP:Turtle_(syntax)][Turtle]] (Terse RDF Triple Language).

Neo4j and Turtle both support [[WP:Unicode][Unicode]], but there are some minor character set issues which must be handled. There are also some namespace and related issues. This page is only a descriptive summary; see the conversion code for definitive information.

---++ Issues

---+++ =@base= and =@prefix=

YAGO's Turtle (=*.ttl=) files use =@base= and =@prefix= directives, allowing them to shorten URIs in the RDF triples. Using the Unix command line, I did a simple and quick sanity check, making sure that there were no cross-file usage conflicts:

<verbatim>
$ head -50 *.ttl | egrep '^@' | sort | uniq -c
  25 @base <http://yago-knowledge.org/resource/> .
  25 @prefix dbp: <http://dbpedia.org/ontology/> .
  ...
</verbatim>

The =@prefix= notation is convenient, but not fully utilized in YAGO. So, I extended and regularized things a bit, changing =@base= to an explicit =@prefix=, adding more prefixes, etc:

<verbatim>
dbpo:  http://dbpedia.org/ontology/
dbpr:  http://dbpedia.org/resource/
dbpy:  http://dbpedia.org/class/yago/
owl:   http://www.w3.org/2002/07/owl#
rdf:   http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs:  http://www.w3.org/2000/01/rdf-schema#
skos:  http://www.w3.org/2004/02/skos/core#
wpen:  http://en.wikipedia.org/wiki/
xsd:   http://www.w3.org/2001/XMLSchema#
yago:  http://yago-knowledge.org/resource/
</verbatim>

*Note:* =yago= appears to be used far more than any other prefix, so my tentative plan (to save space) is to make it the default.

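The prefix list above amounts to a lookup table. As a minimal sketch (in Python, which may not be the language of the actual conversion code), the helpers below show how a prefixed name could be expanded to a full URI and how a full URI could be shortened again. The names =PREFIXES=, =expand=, and =compact= are hypothetical, and the longest-match rule is an assumption made for illustration:

```python
# Prefix table copied from the list above.
# The helper names below are hypothetical, not taken from the conversion code.
PREFIXES = {
    "dbpo": "http://dbpedia.org/ontology/",
    "dbpr": "http://dbpedia.org/resource/",
    "dbpy": "http://dbpedia.org/class/yago/",
    "owl":  "http://www.w3.org/2002/07/owl#",
    "rdf":  "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "wpen": "http://en.wikipedia.org/wiki/",
    "xsd":  "http://www.w3.org/2001/XMLSchema#",
    "yago": "http://yago-knowledge.org/resource/",
}


def expand(name):
    """Turn 'prefix:local' into a full URI; return unknown input unchanged."""
    prefix, sep, local = name.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    return name


def compact(uri):
    """Shorten a full URI to 'prefix:local', preferring the longest namespace."""
    best = None
    for prefix, namespace in PREFIXES.items():
        if uri.startswith(namespace):
            if best is None or len(namespace) > len(PREFIXES[best]):
                best = prefix
    if best is None:
        return uri  # no known namespace matches; leave the URI as it is
    return best + ":" + uri[len(PREFIXES[best]):]


if __name__ == "__main__":
    print(compact("http://yago-knowledge.org/resource/hasLatitude"))
    print(expand("rdfs:label"))
```

None of the namespaces in the table is a prefix of another, so the longest-match rule only matters if overlapping namespaces are added later.
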
---+++ Identifiers

By default, Cypher identifiers use a rather restricted character set:

<verbatim>
/^[A-Za-z_][0-9A-Za-z_]*$/
</verbatim>

This conflicts with YAGO's use of colons (eg, =yago:hasLatitude=) and could conflict with other (eg, Unicode) characters, as well. However, the only YAGO-based identifiers I'm using are names of relations and properties. These are derived from my expansions of YAGO [[Predicates][predicates]], so I can ensure that there are no character set problems.

---+++ Provenance

YAGO2s is divided into 25 "themes" (eg, =yagoGeonamesData=), each of which has a unique provenance (ie, nature, origin). I may capture this information (eg, in a =theme= property).

Some YAGO names (eg, =geoclass_reefs=) have a prefix (=geoclass=) which further characterizes their provenance. I may split off this information in a future revision.

---+++ Values

RDF's literal objects can contain both values and metadata (eg, units). I plan to strip off the metadata and store it in a companion property (=*_M=), eg:

<verbatim>
RDF Literal              prop    prop_M
-----------              ----    ------
42, "42"^^xsd:integer    42      'number'
"1.85"^^<m>              1.85    'number (^^<m>)'
"1.2"^^xsd:a             1.2     'number (^^xsd:a)'
"a"@b                    'a'     'string (@b)'
</verbatim>

<!--
   * Set GH = https://github.com
-->

%ZB%
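
The value and metadata split shown in the Values table can be sketched roughly as follows. This is an illustrative Python sketch, not the conversion code: the function =split_literal= and the set =PLAIN_NUMERIC= are hypothetical, only =xsd:integer= is confirmed by the table as a datatype that yields a bare ='number'=, and escaped quotes inside literals are not handled.

```python
import re

# Datatypes reported simply as 'number'. Only xsd:integer appears in the
# table above; the other entries are assumptions.
PLAIN_NUMERIC = {"xsd:integer", "xsd:decimal", "xsd:double", "xsd:float"}

# A quoted lexical form, optionally followed by ^^datatype or @language.
LITERAL = re.compile(
    r'^"(?P<lex>.*)"(?:\^\^(?P<dtype>\S+)|@(?P<lang>[A-Za-z0-9-]+))?$',
    re.S,
)


def to_number(text):
    """Return an int or float parsed from text, or None if it is not numeric."""
    try:
        return int(text)
    except ValueError:
        pass
    try:
        return float(text)
    except ValueError:
        return None


def split_literal(literal):
    """Split a Turtle literal into (prop, prop_M) as in the table above."""
    literal = literal.strip()
    match = LITERAL.match(literal)
    if match is None:
        # Bare token such as 42 (no quotes, no suffix).
        number = to_number(literal)
        if number is not None:
            return number, "number"
        return literal, "string"

    lex = match.group("lex")
    dtype = match.group("dtype")
    lang = match.group("lang")

    if lang:
        return lex, "string (@%s)" % lang
    if dtype:
        number = to_number(lex)
        if number is None:
            return lex, "string (^^%s)" % dtype
        if dtype in PLAIN_NUMERIC:
            return number, "number"
        return number, "number (^^%s)" % dtype
    return lex, "string"


if __name__ == "__main__":
    for sample in ['42', '"42"^^xsd:integer', '"1.85"^^<m>', '"1.2"^^xsd:a', '"a"@b']:
        print(sample, "->", split_literal(sample))
```

Keeping the original suffix (eg, =^^<m>=) verbatim inside the metadata string means the RDF literal can still be reconstructed from the two properties.
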
Topic revision: r5 - 04 Apr 2016,
Copyright © by the contributing authors. All material on this wiki is the property of the contributing authors.