Reference: directory format
This page documents the use of files for storing Paradicms collection data.
Directory structure
Collection data in the Paradicms directory format consists of:
- a single root directory, typically the root of a GitHub repository
- a set of files in the root directory, corresponding to singleton instances of classes in the Paradicms logical data models
- a set of subdirectories of the root directory, each corresponding to a class from one of the Paradicms logical data models
- a set of files in each class subdirectory, corresponding to instances of the class
Listing the root directory of the template repository illustrates the structure:
./app-configuration.yaml
./dc-collection.yaml
./dc-image/Difference engine.yaml
./dc-image/Donald Knuth.yaml
./dc-image/Douglas Engelbart.yaml
./dc-image/Linus Torvalds.yaml
./dc-image/Linux.png
./dc-image/Linux.yaml
./dc-image/TeX.png
./dc-image/TeX.yaml
./dc-image/Tim Berners-Lee.yaml
./dc-image/World Wide Web.png
./dc-image/World Wide Web.yaml
./dc-image/oN-Line System.yaml
./dc-physical-object/Difference engine.md
./dc-physical-object/Linux.md
./dc-physical-object/TeX.md
./dc-physical-object/World Wide Web.md
./dc-physical-object/oN-Line System.md
./foaf-person/Charles Babbage.yaml
./foaf-person/Donald Knuth.yaml
./foaf-person/Douglas Engelbart.yaml
./foaf-person/Linus Torvalds.yaml
./foaf-person/Tim Berners-Lee.yaml
Singleton files
In the listing above, ./dc-collection.yaml corresponds to a singleton instance of DcCollection, the Dublin Core logical model of a Paradicms Collection conceptual model.
The stem (dc-collection) of the file is named after the class, or a variant of the class name:
- DcCollection: the exact class name (camel case) documented in the logical data models reference
- dc_collection: snake case variant of the class name
- dc-collection: spinal case variant of the class name
Class directories
In the listing above, ./foaf-person and ./dc-physical-object are class directories corresponding to the FoafPerson and DcPhysicalObject classes in the Paradicms logical data models, respectively.
The class directories can also be named with variants of the class names:
- FoafPerson: the exact class name (camel case) documented in the logical data models reference
- foaf_person: snake case variant of the class name
- foaf-person: spinal case variant of the class name
The listing above uses spinal case (foaf-person).
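The naming variants for singleton files and class directories can be sketched in a few lines of Python. This is only an illustration of the convention described above, not Paradicms code: the function names name_variants and class_for_name are hypothetical, and the sketch assumes class names are plain camel case with each word starting on an uppercase letter.

```python
import re
from typing import List, Optional, Set


def name_variants(class_name: str) -> Set[str]:
    """Return the camel, snake, and spinal case spellings of a class name."""
    # Split on word starts: "FoafPerson" -> ["Foaf", "Person"]
    words = [word.lower() for word in re.findall(r"[A-Z][a-z0-9]*", class_name)]
    return {class_name, "_".join(words), "-".join(words)}


def class_for_name(name: str, class_names: List[str]) -> Optional[str]:
    """Find the class whose variants include a file stem or directory name."""
    for class_name in class_names:
        if name in name_variants(class_name):
            return class_name
    return None  # not a recognized class name or variant
```

For example, class_for_name("dc-physical-object", ["DcPhysicalObject", "FoafPerson"]) resolves the directory name to DcPhysicalObject.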
Files in a class directory
./foaf-person/Linus Torvalds.yaml in the listing above is a YAML file describing an instance of the class FoafPerson. The "File formats" section below documents the format of these files in detail.
Image data
Image data (.jpg, .png) should sit directly alongside the associated Image metadata file, in the directory corresponding to an Image implementation, as in these files from the listing above:
./dc-image/World Wide Web.png
./dc-image/World Wide Web.yaml
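As a rough sketch of this pairing convention, the following Python matches each metadata file with an image data file that shares its directory and stem. The function name pair_images and the two suffix sets are assumptions made for illustration; the set of extensions Paradicms actually recognizes may differ.

```python
from pathlib import PurePosixPath

# Assumed suffix sets, for illustration only
IMAGE_SUFFIXES = {".jpg", ".png"}
METADATA_SUFFIXES = {".json", ".md", ".yaml", ".yml"}


def pair_images(paths):
    """Map each metadata file to the image data file with the same directory and stem."""
    parsed = [PurePosixPath(path) for path in paths]
    # Index image data files by (directory, stem)
    images = {
        (path.parent, path.stem): path
        for path in parsed
        if path.suffix.lower() in IMAGE_SUFFIXES
    }
    pairs = {}
    for path in parsed:
        if path.suffix.lower() in METADATA_SUFFIXES:
            image = images.get((path.parent, path.stem))
            if image is not None:
                pairs[str(path)] = str(image)
    return pairs
```

Metadata files without a sibling image, such as dc-image/Donald Knuth.yaml in the listing, are simply left out of the result.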
File formats
Files in the directory structure can use a variety of formats. The following sections document the formats as well as the process of converting the formats to RDF for use in Paradicms. Converting each file in the directory tree to RDF produces a set of RDF graphs that can be consumed by Paradicms apps.
JSON files
As noted above, the directory structure and naming conventions mean that every file is associated with a class in the Paradicms logical data models. Each class has an associated JSON-LD context.
Paradicms converts JSON (.json) files to RDF by interpreting them as JSON-LD. A JSON file is expected to have a single top-level object ({}). Paradicms adds the JSON-LD context (as a @context key) corresponding to the file's associated class to this top-level object before interpreting the latter as JSON-LD. The JSON-LD context maps keys in the JSON object, such as creator, to RDF predicate IRIs, in this case http://purl.org/dc/terms/creator.
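The context injection step can be illustrated with a minimal Python sketch. EXAMPLE_CONTEXT is a hypothetical, heavily abridged stand-in for a class's real JSON-LD context (only the creator mapping is taken from this page), and to_json_ld is an illustrative name rather than a Paradicms function. The resulting object would then be handed to a JSON-LD processor, which is not shown here.

```python
import json

# Hypothetical, abridged context; the real per-class contexts are defined by Paradicms
EXAMPLE_CONTEXT = {
    "creator": "http://purl.org/dc/terms/creator",
    "title": "http://purl.org/dc/terms/title",
}


def to_json_ld(json_text: str, context: dict) -> dict:
    """Parse a JSON file's text and add the class's context to the top-level object."""
    obj = json.loads(json_text)
    if not isinstance(obj, dict):
        raise ValueError("expected a single top-level JSON object")
    # Put @context first, then the file's own keys unchanged
    return {"@context": context, **obj}
```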
YAML files
YAML (.yaml or .yml) files are treated similarly to JSON, since the two formats have very similar data models. The YAML file is parsed, a @context key is added to the top-level object, and the file is interpreted as JSON-LD.
A line from such a YAML file might read:
sameAs: http://www.wikidata.org/entity/Q34253
Markdown files
The following code block shows an abridged version of the Markdown file dc-physical-object/Linux.md:
---
creator: md-foaf-person:Linus%20Torvalds
sameAs: http://www.wikidata.org/entity/Q388
title: Linus Torvalds begins work on the Linux kernel
---
Frustrated by the limitations of existing operating systems and curious about kernel development, Linus Torvalds begins work on what eventually becomes the Linux kernel.
A Markdown file consists of:
- an optional YAML front matter block delimited by --- and ---
- paragraphs under a labeled # heading (not present in the example)
- paragraphs not under a labeled # heading (Frustrated ... in the example)
In order to convert a Markdown file to RDF, it is first converted to JSON through the following process:
- If present, YAML front matter is converted as-is to a root JSON object; otherwise an empty root JSON object ({}) is synthesized.
- Markdown paragraphs are added to the root JSON object following these rules:
  - A paragraph under a Markdown heading with the format # [Your heading](#anykey) is converted to an HTML string, and either
    - Assigned to anykey in the root JSON object if anykey does not exist in that object
    - Concatenated to the existing string value of anykey if it does
  - A paragraph under no heading, like the Frustrated ... paragraph in the example above, has the implicit key description, and otherwise follows the rules of paragraphs with explicit keys.
  - When a paragraph's key (explicit or implicit) was already present in the YAML front matter, the combined metadata and paragraph text(s) are treated as an instance of a Text class, with the paragraph text(s) forming the value.
Converting dc-physical-object/Linux.md would result in the following JSON:
{
  "creator": "md-foaf-person:Linus%20Torvalds",
  "description": "Frustrated by the limitations of existing operating systems and curious about kernel development, Linus Torvalds begins work on what eventually becomes the Linux kernel.",
  "title": "Linus Torvalds begins work on the Linux kernel"
}
Note the creator property's IRI value: md-foaf-person:Linus%20Torvalds refers to the FoafPerson instance in the file foaf-person/Linus Torvalds.yaml from the listing above. The file extension is dropped and the space in the remaining file stem is URL-encoded to %20 in order to conform to IRI rules.
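The Markdown-to-JSON rules can be sketched in Python as follows. This is a simplified illustration under several assumptions, not the Paradicms implementation: front matter is read as flat "key: value" lines rather than with a real YAML parser, a paragraph is turned into HTML by wrapping its escaped text in <p> tags, and the Text-class rule for keys already present in the front matter is not covered (the sketch raises NotImplementedError instead). The names parse_front_matter and markdown_to_json are hypothetical.

```python
import html
import re

# Matches labeled headings such as "# [Your heading](#anykey)"
HEADING_RE = re.compile(r"^#+\s*\[[^\]]*\]\(#(?P<key>[^)]+)\)\s*$")


def parse_front_matter(lines):
    """Split off a '---' delimited block; return (root object, remaining lines)."""
    root = {}
    if not lines or lines[0] != "---":
        return root, lines  # no front matter: empty root object
    end = lines.index("---", 1)  # ValueError if the block is never closed
    for line in lines[1:end]:
        if line.strip():
            # Split on the first colon only, so IRI values keep their colons
            key, _, value = line.partition(":")
            root[key.strip()] = value.strip()
    return root, lines[end + 1:]


def markdown_to_json(text):
    lines = [line.rstrip() for line in text.splitlines()]
    root, body = parse_front_matter(lines)
    front_matter_keys = set(root)
    key = "description"  # implicit key for paragraphs under no heading
    buffer = []
    for line in body + [""]:  # the trailing blank line flushes the last paragraph
        heading = HEADING_RE.match(line)
        if heading or not line:
            if buffer:
                if key in front_matter_keys:
                    raise NotImplementedError("Text-class rule is not covered by this sketch")
                paragraph = "<p>" + html.escape(" ".join(buffer), quote=False) + "</p>"
                # Assign if the key is new, otherwise concatenate
                root[key] = root.get(key, "") + paragraph
                buffer = []
            if heading:
                key = heading.group("key")
        else:
            buffer.append(line.strip())
    return root
```

Because the sketch follows the rule that paragraphs become HTML strings, its description value carries <p> tags, as in the Turtle output for this file.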
The Markdown-derived JSON is then converted to RDF by interpreting it as JSON-LD, in the same manner JSON files are converted to RDF. The dc-physical-object/Linux.md Markdown would thus produce the following small RDF graph (in Turtle format):
<urn:directory:ComputerScienceInventions:dc-physical-object:Linux> a dcmitype:PhysicalObject ;
    dcterms:creator <urn:directory:ComputerScienceInventions:person:Linus%20Torvalds> ;
    dcterms:description "<p>Frustrated by the limitations of existing operating systems and curious about kernel development, Linus Torvalds begins work on what eventually becomes the Linux kernel.</p>" ;
    dcterms:title "Linus Torvalds begins work on the Linux kernel" .
Implied RDF
The DcPhysicalObject RDF graph above contains more information than could be directly mapped from the contents of dc-physical-object/Linux.md. For example, the file does not explicitly state an rdf:type, yet it is present in the RDF graph.
Paradicms has a number of rules for inferring parts of the graph associated with a file:
- The subject of a file (urn:directory:ComputerScienceInventions:dc-physical-object:Linux in the RDF graph above) is automatically synthesized from
  - a dataset identifier, which is usually the name of the GitHub repository (here ComputerScienceInventions)
  - the class corresponding to the directory where the file resides (here DcPhysicalObject)
  - the file stem (here Linux)
- The rdf:type of the subject (the a dcmitype:PhysicalObject statement) is the class corresponding to the directory where the file resides (dc-physical-object).
- If no label property (e.g., dcterms:title) is specified in the file, it is adapted from the file stem (Linux.md -> "Linux").
- All Works are assumed to belong to at least one Collection. If the directory has a single Collection that doesn't explicitly link to any Works, then it is implicitly linked to all defined Works. If no Collection is defined, one is synthesized, and it links to all Works.
- An image file (.jpg, .png, et al.) placed alongside a metadata file with the same stem (image/Linux.png and image/Linux.md) is assumed to be the src of that Image.
- An Image that is not explicitly referenced will be implicitly associated with a Collection, Work, or other instance with the same file stem.
Most of these rules can be overridden by explicitly specifying a property: adding a src to Image, for example, or including a label property in a Work file instead of allowing the label to be inferred from the file stem. The rules are provided for convenience.
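The subject and label rules can be sketched as follows. The functions subject_iri and inferred_label are hypothetical names, and the sketch assumes that the class segment of the subject is the directory name exactly as written (as in the dc-physical-object example subject) and that the file stem is percent-encoded; the actual implementation may normalize these differently. The "title" key stands in for any label property.

```python
from pathlib import PurePosixPath
from urllib.parse import quote


def subject_iri(dataset_id: str, file_path: str) -> str:
    """Synthesize a subject from the dataset identifier, class directory, and file stem."""
    path = PurePosixPath(file_path)
    # Percent-encode the stem so that spaces become %20
    return "urn:directory:{}:{}:{}".format(dataset_id, path.parent.name, quote(path.stem))


def inferred_label(file_path: str, properties: dict) -> str:
    """Use an explicit label if the file has one, otherwise fall back to the file stem."""
    return properties.get("title") or PurePosixPath(file_path).stem
```

An explicit title in the file overrides the fallback, which mirrors the way the inference rules yield to explicitly specified properties.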