The downloads are provided as N-Triples and N-Quads, where the N-Quads version contains additional provenance information for each statement. All files are bzip2 packed. In addition to the RDF version of the data, we also provide a tabular version of some of the core DBpedia data sets as CSV and JSON files. -- http://wiki.dbpedia.org/Downloads2014
For each class in the DBpedia ontology (such as Person, Radio Station, Ice Hockey Player, or Band), we provide a single CSV/JSON file which contains all instances of this class. Each instance is described by its URI, an English label and a short abstract, the mapping-based infobox data describing the instance (extracted from the English edition of Wikipedia), geo-coordinates, and external links. -- http://wiki.dbpedia.org/DBpediaAsTables
DBpedia is considered the Semantic Web mirror of Wikipedia. Over time, Wikipedia articles are revised, which makes the data in DBpedia outdated. The main objective of DBpedia Live is to keep DBpedia always in synchronization with Wikipedia. -- http://wiki.dbpedia.org/DBpediaLive
Finally, it appears that the entire working data of DBpedia is available for download, should one be so inclined.
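Since the downloads are bzip2-packed N-Triples files, a first look at one of them could be taken with a short script along these lines. This is only a minimal sketch: the function names `iter_triples` and `iter_bz2_triples` are made up for illustration, the regular expression covers only the simple `<subject> <predicate> object .` shape (blank nodes are ignored), and the object is returned as raw text rather than being decoded.

```python
import bz2
import io
import re

# Matches one simple N-Triples statement: <subject> <predicate> object .
# Assumption: subjects and predicates are always IRIs in angle brackets;
# lines that do not fit this shape (e.g. blank-node subjects) are skipped.
TRIPLE_RE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(.+?)\s*\.\s*$')


def iter_triples(stream):
    """Yield (subject, predicate, raw_object) tuples from a text stream."""
    for raw in stream:
        line = raw.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and comments
        match = TRIPLE_RE.match(line)
        if match:
            yield match.group(1), match.group(2), match.group(3)


def iter_bz2_triples(path):
    """Stream triples from a bzip2-packed file without unpacking it to disk."""
    with bz2.open(path, 'rt', encoding='utf-8') as handle:
        yield from iter_triples(handle)


if __name__ == '__main__':
    sample = io.StringIO(
        '<http://dbpedia.org/resource/Berlin> '
        '<http://www.w3.org/2000/01/rdf-schema#label> "Berlin"@en .\n'
    )
    for subject, predicate, obj in iter_triples(sample):
        print(subject, predicate, obj)
```

For a real file, something like `iter_bz2_triples('labels_en.nt.bz2')` would be the entry point; the `.bz2` suffix on the file name is an assumption. Streaming matters here because several of the files hold tens of millions of triples.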
dbpedia_2014.owl contains ~30K lines of Resource Description Framework (RDF) data, encoded in Extensible Markup Language (XML). See DBpedia - A Large-scale, Multilingual Knowledge Base Extracted from Wikipedia for details.
long_abstracts_en.nt contains ~4.6M triples.
Objects are full abstracts of Wikipedia articles.
geo_coordinates_en.nt contains ~2.1M triples.
Objects are geographic coordinates.
homepages_en.nt contains ~0.6M triples.
Objects are homepages of persons, organizations, etc.
images_en.nt contains ~6.9M triples.
Objects are links to main and thumbnail images.
infobox_properties_en.nt contains ~68M triples.
Objects are properties, extracted from Wikipedia infoboxes.
infobox_property_definitions_en.nt contains ~0.1M triples.
Objects are definitions of properties / predicates used in infoboxes.
short_abstracts_en.nt contains ~4.6M triples.
Objects are short abstracts (max. 500 characters long) of articles.
labels_en.nt contains ~11M triples.
Objects are article titles, in the corresponding language.
The data sets are available in N-Triples (*.nt), N-Quads (*.nq) and Turtle (*.ttl) format. Only the N-Quads format (*.nq) includes provenance information. Here is a summary of the file types:
mappingbased_properties_en.nt contains ~33M triples.
Objects are Infobox properties, extracted via mapping.
mappingbased_properties_cleaned_en.nt contains ~33M triples.
Objects are Infobox properties, extracted via mapping and cleaned.
specific_mappingbased_properties_en.nt contains ~0.8M triples.
Objects are Infobox properties from the mapping-based extraction,
using convenient units of measurement.
instance_types_en.nt contains ~28M triples.
Triples are of the form $object rdf:type $class.
instance_types_heuristic_en.nt contains ~3.1M triples.
Triples are of the form $object rdf:type $class,
generated per Paulheim/Bizer: Type Inference on Noisy RDF Data.
article_categories_en.nt contains ~19M triples.
Links relate articles to categories, using the SKOS vocabulary.
category_labels_en.nt contains ~1.1M triples.
Objects are labels for categories, in various languages.
skos_categories_en.nt contains ~4.5M triples.
Information about which concept is a category
and how categories are related, using the SKOS Vocabulary.
page_ids_en.nt contains ~13M triples.
Objects are article page IDs.
revision_ids_en.nt contains ~13M triples.
Objects are article revision IDs.
revision_uris_en.nt contains ~13M triples.
Objects are article revision URIs.
disambiguations_en.nt contains ~1.4M triples.
Links are extracted from Wikipedia disambiguation pages.
external_links_en.nt contains ~7M triples.
Links from articles to external web pages.
interlanguage_links_en.nt contains ~29M triples.
Dataset linking a DBpedia resource to the same resource in other languages and in Wikidata.
iri_same_as_uri_en.nt contains ~0.9M triples.
Links (owl:sameAs) between the IRI and URI formats of DBpedia resources.
Only extracted when the IRI and URI differ.
wikipedia_links_en.nt contains ~44M triples.
Dataset linking DBpedia resources to corresponding articles.
old_interlanguage_links_en.nt contains ~0.2M triples.
Remaining interlanguage links, extracted directly from Wikipedia articles.
page_links_en.nt contains ~153M triples.
Dataset containing internal links between DBpedia instances.
out_degree_en.nt contains ~11M triples.
Number of links from a Wikipedia article to another Wikipedia article.
page_length_en.nt contains ~11M triples.
Number of characters contained in an article's source.
persondata_en.nt contains ~7.9M triples.
Information about persons (date and place of birth, etc.).
redirects_en.nt contains ~6.4M triples.
Dataset containing redirects between articles in Wikipedia.
surface_forms_bg.nt contains ~???M triples.
Texts used to refer to Wikipedia articles.
redirects_transitive_en.nt contains ~7.9M triples.
Dataset in which multiple redirects have been resolved and redirect cycles have been removed.
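To make "multiple redirects have been resolved and redirect cycles have been removed" concrete, here is a small sketch of one way such a resolution could work. It is not DBpedia's extraction code: the function name `resolve_redirects`, the plain dictionary input, and the choice to drop every page whose chain runs into a cycle are all assumptions made for illustration.

```python
def resolve_redirects(redirects):
    """Map each redirecting page to its final target.

    `redirects` maps a source page to its direct redirect target.
    Pages whose redirect chain runs into a cycle are left out of the result.
    """
    resolved = {}
    for source in redirects:
        seen = {source}
        target = redirects[source]
        # Follow the chain until we reach a page that is not itself a redirect.
        while target in redirects:
            if target in seen:
                target = None  # the chain loops back on itself: drop it
                break
            seen.add(target)
            target = redirects[target]
        if target is not None:
            resolved[source] = target
    return resolved


if __name__ == '__main__':
    # Hypothetical page titles, for illustration only.
    direct = {
        'NYC': 'New_York',
        'New_York': 'New_York_City',
        'Loop_A': 'Loop_B',
        'Loop_B': 'Loop_A',
    }
    print(resolve_redirects(direct))
```

In this sketch both `NYC` and `New_York` end up pointing at `New_York_City`, while the two `Loop_` pages disappear because following them never reaches a non-redirect page.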