Reference: directory format
This page documents the use of files for storing Paradicms collection data.
Directory structure
Collection data in the Paradicms directory format consists of:
- a single root directory, typically the root of a GitHub repository
- a set of files in the root directory, corresponding to singleton instances of classes in the Paradicms logical data models
- a set of subdirectories of the root directory, each corresponding to a class from one of the Paradicms logical data models
- a set of files in each class subdirectory, corresponding to instances of the class
Listing the root directory of the template repository illustrates the structure:
./app-configuration.yaml
./dc-collection.yaml
./dc-image/Difference engine.yaml
./dc-image/Donald Knuth.yaml
./dc-image/Douglas Engelbart.yaml
./dc-image/Linus Torvalds.yaml
./dc-image/Linux.png
./dc-image/Linux.yaml
./dc-image/TeX.png
./dc-image/TeX.yaml
./dc-image/Tim Berners-Lee.yaml
./dc-image/World Wide Web.png
./dc-image/World Wide Web.yaml
./dc-image/oN-Line System.yaml
./dc-physical-object/Difference engine.md
./dc-physical-object/Linux.md
./dc-physical-object/TeX.md
./dc-physical-object/World Wide Web.md
./dc-physical-object/oN-Line System.md
./foaf-person/Charles Babbage.yaml
./foaf-person/Donald Knuth.yaml
./foaf-person/Douglas Engelbart.yaml
./foaf-person/Linus Torvalds.yaml
./foaf-person/Tim Berners-Lee.yaml
Singleton files
In the listing above, ./dc-collection.yaml corresponds to a singleton instance of DcCollection, the Dublin Core logical model of the Paradicms Collection conceptual model.
The stem (dc-collection) of the file is named after the class, or a variant of the class name:
- DcCollection: the exact class name (camel case) documented in the logical data models reference
- dc_collection: snake case variant of the class name
- dc-collection: spinal case variant of the class name
Class directories
In the listing above, ./foaf-person and ./dc-physical-object are class directories corresponding to the FoafPerson and DcPhysicalObject classes in the Paradicms logical data models, respectively.
The class directories can also be named with variants of the class names:
- FoafPerson: the exact class name (camel case) documented in the logical data models reference
- foaf_person: snake case variant of the class name
- foaf-person: spinal case variant of the class name
The listing above uses spinal case (foaf-person).
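The naming rules for singleton files and class directories can be sketched in Python. This is a minimal illustration, not the Paradicms implementation: the function names are hypothetical, and the list of known classes is inferred from the directory names in the listing above.

```python
import re
from typing import List, Optional, Set


def name_variants(class_name: str) -> Set[str]:
    """Return the camel, snake, and spinal case spellings of a class name."""
    # Split before each capital letter: "FoafPerson" -> ["Foaf", "Person"]
    words = [w.lower() for w in re.findall(r"[A-Z][a-z0-9]*", class_name)]
    return {class_name, "_".join(words), "-".join(words)}


def resolve_class(name: str, known_classes: List[str]) -> Optional[str]:
    """Map a directory name or singleton file stem to a known class name."""
    for class_name in known_classes:
        if name in name_variants(class_name):
            return class_name
    return None


# Assumed class names, inferred from the directory listing (not authoritative).
KNOWN_CLASSES = ["DcCollection", "DcImage", "DcPhysicalObject", "FoafPerson"]

print(resolve_class("foaf-person", KNOWN_CLASSES))
print(resolve_class("dc_collection", KNOWN_CLASSES))
```

A name that matches none of the variants resolves to None, leaving the caller to decide how to treat unrecognized files and directories.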
Files in a class directory
./foaf-person/Linus Torvalds.yaml in the listing above is a YAML file describing an instance of the class FoafPerson. The "File formats" section below documents the format of these files in detail.
Image data
Image data (.jpg, .png) should sit directly alongside the associated Image metadata file, in the directory corresponding to an Image implementation, as in these files from the listing above:
./dc-image/World Wide Web.png
./dc-image/World Wide Web.yaml
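The pairing of image data with its metadata file can be sketched as a match on directory and file stem. This is an illustrative sketch only: the function name and the two suffix sets are assumptions, not part of Paradicms.

```python
from pathlib import PurePosixPath

# Assumed suffix sets for this sketch.
IMAGE_SUFFIXES = {".jpg", ".png"}
METADATA_SUFFIXES = {".json", ".md", ".yaml", ".yml"}


def pair_images(paths):
    """Map each metadata file to the image file sharing its directory and stem."""
    parsed = [PurePosixPath(p) for p in paths]
    # Index image files by (directory, stem).
    images = {
        (p.parent, p.stem): p for p in parsed if p.suffix.lower() in IMAGE_SUFFIXES
    }
    pairs = {}
    for p in parsed:
        if p.suffix.lower() in METADATA_SUFFIXES:
            image = images.get((p.parent, p.stem))
            if image is not None:
                pairs[str(p)] = str(image)
    return pairs


listing = [
    "dc-image/Donald Knuth.yaml",
    "dc-image/Linux.png",
    "dc-image/Linux.yaml",
    "dc-image/World Wide Web.png",
    "dc-image/World Wide Web.yaml",
]
print(pair_images(listing))
```

Metadata files without a sibling image, such as dc-image/Donald Knuth.yaml here, are simply left out of the result.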
File formats
Files in the directory structure can use a variety of formats. The following sections document the formats as well as the process of converting the formats to RDF for use in Paradicms. Converting each file in the directory tree to RDF produces a set of RDF graphs that can be consumed by Paradicms apps.
JSON files
As noted above, the directory structure and naming conventions mean that every file is associated with a class in the Paradicms logical data models. Each class has an associated JSON-LD context.
Paradicms converts JSON (.json) files to RDF by interpreting them as JSON-LD. A JSON file is expected to have a single top-level object ({}). Paradicms adds the JSON-LD context (as a @context key) corresponding to the file's associated class to this top-level object before interpreting the latter as JSON-LD. The JSON-LD context maps keys in the JSON object, such as creator, to RDF predicate IRIs, in this case http://purl.org/dc/terms/creator.
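The context-injection step might look like the following sketch. The context shown is a hypothetical, heavily abridged stand-in for the real per-class contexts; only the creator mapping is taken from the text above, and the function name is an assumption.

```python
import json

# Hypothetical abridged context for one class. Only the "creator" IRI
# comes from the documentation; the rest is assumed for illustration.
EXAMPLE_CONTEXT = {
    # "@type": "@id" tells JSON-LD to read the value as an IRI, not a string.
    "creator": {"@id": "http://purl.org/dc/terms/creator", "@type": "@id"},
    "title": "http://purl.org/dc/terms/title",
}


def to_json_ld(json_text, context):
    """Parse a JSON file's text and attach the class's JSON-LD context."""
    obj = json.loads(json_text)
    if not isinstance(obj, dict):
        raise ValueError("expected a single top-level JSON object")
    obj["@context"] = context
    return obj


document = to_json_ld('{"title": "Linux"}', EXAMPLE_CONTEXT)
print(json.dumps(document, indent=2))
```

The resulting object can then be handed to any JSON-LD processor to obtain RDF.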
YAML files
YAML (.yaml or .yml) files are treated similarly to JSON, since the two formats have very similar data models. The YAML file is parsed, a @context key is added to the top-level object, and the file is interpreted as JSON-LD.
sameAs: http://www.wikidata.org/entity/Q34253
Markdown files
The following code block shows an abridged version of the Markdown file dc-physical-object/Linux.md:
---
creator: md-foaf-person:Linus%20Torvalds
sameAs: http://www.wikidata.org/entity/Q388
title: Linus Torvalds begins work on the Linux kernel
---
Frustrated by the limitations of existing operating systems and curious about kernel development, Linus Torvalds begins work on what eventually becomes the Linux kernel.
A Markdown file consists of:
- an optional YAML front matter block delimited by --- and ---
- paragraphs under a labeled # heading (not present in the example)
- paragraphs not under a labeled # heading (Frustrated ... in the example)
In order to convert a Markdown file to RDF, it is first converted to JSON through the following process:
- If present, YAML front matter is converted as-is to a root JSON object; otherwise an empty root JSON object ({}) is synthesized.
- Markdown paragraphs are added to the root JSON object following these rules:
  - A paragraph under a Markdown heading with the format # [Your heading](#anykey) is converted to an HTML string, and either
    - Assigned to anykey in the root JSON object if anykey does not exist in that object
    - Concatenated to the existing string value of anykey if it does
  - A paragraph under no heading, like the Frustrated ... paragraph in the example above, has the implicit key description, and otherwise follows the rules of paragraphs with explicit keys.
  - When a paragraph's key (explicit or implicit) was already present in the YAML front matter, the combined metadata and paragraph text(s) are treated as an instance of a Text class, with the paragraph text(s) forming the value.
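The conversion process above can be approximated with the standard library alone. This is a rough sketch under several stated assumptions, not the Paradicms converter: front matter is parsed as flat "key: value" lines rather than with a real YAML parser, wrapping a paragraph in <p> stands in for Markdown-to-HTML rendering, the Text-class case is left unimplemented, and the provenance key in the example is hypothetical.

```python
import html
import re

# Matches headings of the form "# [Your heading](#anykey)" and captures anykey.
HEADING = re.compile(r"^#\s+\[[^\]]*\]\(#([^)]+)\)\s*$")


def markdown_to_json(text):
    lines = text.splitlines()
    root = {}
    # Front matter: simplified flat "key: value" parsing (assumption).
    if lines and lines[0].strip() == "---":
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
        for line in lines[1:end]:
            if line.strip():
                name, _, value = line.partition(":")
                root[name.strip()] = value.strip()
        lines = lines[end + 1:]

    bodies = {}
    key = "description"  # implicit key for paragraphs under no heading
    paragraph = []
    for line in lines + [""]:  # the trailing blank line flushes the last paragraph
        heading = HEADING.match(line.strip())
        if heading or not line.strip():
            if paragraph:
                rendered = "<p>" + html.escape(" ".join(paragraph), quote=False) + "</p>"
                # Assign if the key is new, otherwise concatenate.
                bodies[key] = bodies.get(key, "") + rendered
                paragraph = []
            if heading:
                key = heading.group(1)
        else:
            paragraph.append(line.strip())

    for name, value in bodies.items():
        if name in root:
            # Key also present in front matter: the Text-class rule applies.
            raise NotImplementedError("Text-class case is omitted from this sketch")
        root[name] = value
    return root


EXAMPLE = """---
creator: md-foaf-person:Linus%20Torvalds
title: Linus Torvalds begins work on the Linux kernel
---
Frustrated by the limitations.

# [Notes](#provenance)
First note.

Second note.
"""
print(markdown_to_json(EXAMPLE))
```

Keeping paragraph text in a separate dictionary until the end makes the collision with front matter keys explicit, which is where a fuller implementation would build the Text instance.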
Converting dc-physical-object/Linux.md would result in the following JSON:
{
"creator": "md-foaf-person:Linus%20Torvalds",
"description": "Frustrated by the limitations of existing operating systems and curious about kernel development, Linus Torvalds begins work on what eventually becomes the Linux kernel.",
"title": "Linus Torvalds begins work on the Linux kernel"
}
Note the creator property's IRI value: md-foaf-person:Linus%20Torvalds refers to the FoafPerson instance described by the file foaf-person/Linus Torvalds.yaml. The file extension is dropped and the space in the remaining file stem is URL-encoded to %20 in order to conform to IRI rules.
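The stem encoding can be reproduced with Python's standard library. The helper name below is hypothetical, and percent-encoding every reserved character (safe="") is an assumption of this sketch.

```python
from pathlib import PurePosixPath
from urllib.parse import quote


def stem_to_iri_segment(file_name):
    """Drop the file extension and percent-encode the remaining stem."""
    return quote(PurePosixPath(file_name).stem, safe="")


print(stem_to_iri_segment("foaf-person/Linus Torvalds.yaml"))  # Linus%20Torvalds
```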
The Markdown-derived JSON is then converted to RDF by interpreting it as JSON-LD, in the same manner JSON files are converted to RDF. The dc-physical-object/Linux.md Markdown would thus produce the following small RDF graph (in Turtle format):
@prefix dcmitype: <http://purl.org/dc/dcmitype/> .
@prefix dcterms: <http://purl.org/dc/terms/> .

<urn:directory:ComputerScienceInventions:dc-physical-object:Linux> a dcmitype:PhysicalObject ;
    dcterms:creator <urn:directory:ComputerScienceInventions:person:Linus%20Torvalds> ;
    dcterms:description "<p>Frustrated by the limitations of existing operating systems and curious about kernel development, Linus Torvalds begins work on what eventually becomes the Linux kernel.</p>" ;
    dcterms:title "Linus Torvalds begins work on the Linux kernel" .
Implied RDF
The DcPhysicalObject RDF graph above contains more information than could be directly mapped from the contents of dc-physical-object/Linux.md. For example, the file does not explicitly state an rdf:type, yet it is present in the RDF graph.
Paradicms has a number of rules for inferring parts of the graph associated with a file:
- The subject of a file (urn:directory:ComputerScienceInventions:dc-physical-object:Linux in the RDF graph above) is automatically synthesized from
  - a dataset identifier, which is usually the name of the GitHub repository (here ComputerScienceInventions)
  - the class corresponding to the directory where the file resides (here DcPhysicalObject)
  - the file stem (here Linux)
- The rdf:type of the subject (the "a dcmitype:PhysicalObject" statement) is the class corresponding to the directory where the file resides (dc-physical-object).
- If no label property (e.g., dcterms:title) is specified in the file, it is adapted from the file stem (Linux.md -> "Linux").
- All Works are assumed to belong to at least one Collection. If the directory has a single Collection that doesn't explicitly link to any Works, then it is implicitly linked to all defined Works. If no Collection is defined, one is synthesized, and it links to all Works.
- An image file (.jpg, .png, et al.) placed alongside a metadata file with the same stem (image/Linux.png and image/Linux.md) is assumed to be the src of that Image.
- An Image that is not explicitly referenced will be implicitly associated with a Collection, Work, or other instance with the same file stem.
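The subject and label rules can be sketched as two small functions. The names are hypothetical, and two details are assumptions: the URN uses the directory name exactly as it appears on disk (as in the example subject above), and the file stem is percent-encoded.

```python
from pathlib import PurePosixPath
from urllib.parse import quote


def infer_subject(dataset_id, relative_path):
    """Build a subject URN from the dataset id, class directory, and file stem."""
    path = PurePosixPath(relative_path)
    return ":".join(
        ["urn", "directory", dataset_id, path.parent.name, quote(path.stem, safe="")]
    )


def infer_label(relative_path, explicit_label=None):
    """Use the explicit label if the file has one, else fall back to the stem."""
    if explicit_label is not None:
        return explicit_label
    return PurePosixPath(relative_path).stem


print(infer_subject("ComputerScienceInventions", "dc-physical-object/Linux.md"))
print(infer_label("dc-physical-object/Linux.md"))
```

The explicit_label parameter mirrors the override behavior: an inferred value is used only when the file does not state the property itself.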
Most of these rules can be overridden by explicitly specifying a property: adding a src to Image, for example, or including a label property in a Work file instead of allowing the label to be inferred from the file stem. The rules are provided for convenience.