Skip to main content

Reference: directory format

This page documents the use of files for storing Paradicms collection data.

Directory structure

Collection data in the Paradicms directory format consists of:

  • a single root directory, typically the root of a GitHub repository
  • a set of files in the root directory, corresponding to singleton instances of classes in the Paradicms logical data models
  • a set of subdirectories of the root directory, corresponding to a class from one of the Paradicms logical data models
  • a set of files in each class subdirectory, corresponding to instances of the class

Listing the root directory of the template repository illustrates the structure:

./app-configuration.yaml
./dc-collection.yaml
./dc-image/Difference engine.yaml
./dc-image/Donald Knuth.yaml
./dc-image/Douglas Engelbart.yaml
./dc-image/Linus Torvalds.yaml
./dc-image/Linux.png
./dc-image/Linux.yaml
./dc-image/TeX.png
./dc-image/TeX.yaml
./dc-image/Tim Berners-Lee.yaml
./dc-image/World Wide Web.png
./dc-image/World Wide Web.yaml
./dc-image/oN-Line System.yaml
./dc-physical-object/Difference engine.md
./dc-physical-object/Linux.md
./dc-physical-object/TeX.md
./dc-physical-object/World Wide Web.md
./dc-physical-object/oN-Line System.md
./foaf-person/Charles Babbage.yaml
./foaf-person/Donald Knuth.yaml
./foaf-person/Douglas Engelbart.yaml
./foaf-person/Linus Torvalds.yaml
./foaf-person/Tim Berners-Lee.yaml

Singleton files

In the listing above, ./dc-collection.yaml corresponds to a singleton instance of DcCollection, the Dublin Core logical model of a Paradicms Collection conceptual model.

The stem (dc-collection) of the file is named after the class, or a variant of the class name:

  • DcCollection: the exact class name (camel case) documented in the logical data models reference
  • dc_collection: snake case variant of the class name
  • dc-collection: spinal case variant of the class name

Class directories

In the listing above, ./foaf-person and ./dc-physical-object are class directories corresponding to the FoafPerson and DcPhysicalObject classes in the Paradicms logical data models, respectively.

The class directories can also be named with variants of the class names:

  • FoafPerson: the exact class name (camel case) documented in the logical data models reference
  • foaf_person: snake case variant of the class name
  • foaf-person: spinal case variant of the class name

The listing above uses spinal case (foaf-person).

Files in a class directory

./foaf-person/Linus Torvalds.md in the listing above is a Markdown file describing an instance of the class FoafPerson. The "File format" section below documents the format of these files in detail.

Image data

Image data (.jpg, .png) should sit directly alongside the associated Image (metadata) in directory corresponding to an Image implementation, as in these files from the listing above:

./dc-image/World Wide Web.png
./dc-image/World Wide Web.yaml

File formats

Files in the directory structure can use a variety of formats. The following sections document the formats as well as the process of converting the formats to RDF for use in Paradicms. Converting each file in the directory tree to RDF produces a set of RDF graphs that can be consumed by Paradicms apps.

JSON files

As noted above, the directory structure and naming conventions mean that every file is associated with a class in the Paradicms logical data models. Each class has an associated JSON-LD context.

Paradicms converts JSON (.json) files to RDF by interpreting them JSON-LD. A JSON file is expected to have a single top-level object ({}). Paradicms adds the JSON-LD context (as a @context key) corresponding to the file's associated class to this top-level object before interpreting the latter as JSON-LD. The JSON-LD context maps keys in the JSON object, such as creator, to RDF predicate IRIs, in this case http://purl.org/dc/terms/creator.

YAML files

YAML (.yaml or .yml) files are treated similarly to JSON, since the two formats have very similar data model. The YAML file is parsed, a @context key is added to the top-level object, and the file is interpreted as JSON-LD.

foaf-person/Linus Torvalds.yaml
sameAs: http://www.wikidata.org/entity/Q34253

Markdown files

The following code block shows an abridged version of the Markdown file dc-physical-object/Linux.md:

dc-physical-object/Linux.md
---
creator: md-foaf-person:Linus%20Torvalds
sameAs: http://www.wikidata.org/entity/Q388
title: Linus Torvalds begins work on the Linux kernel
---

Frustrated by the limitations of existing operating systems and curious about kernel development, Linus Torvalds begins work on what eventually becomes the Linux kernel.

A Markdown file consists of:

  1. an optional YAML front matter block delimited by --- and ---
  2. paragraphs under a labeled # heading (not present in the example)
  3. paragraphs not under a labeled # heading (Frustrated ... in the example)

In order to convert a Markdown file to RDF, it is first converted to JSON through the following process:

  • If present, YAML front matter is converted as-is to a root JSON object; otherwise an empty root JSON object ({}) is synthesized.
  • Markdown paragraphs are added to the root JSON object following these rules:
    • A paragraph under a Markdown heading with the format # [Your heading](#anykey) is converted to an HTML string, and either
      • Assigned to anykey in the root JSON object if anykey does not exist in that object
      • Concatenated to the existing string value of anykey if it does
    • A paragraph under no heading, like the Frustrated ... paragraph in the example above, has the implicit key description, and otherwise follows the rules of paragraphs with explicit keys.
    • When a paragraph's key (explicit or implicit) was already present in the YAML front matter, the combined metadata and paragraph text(s) are treated as an instance of a Text class, with the paragraph text(s) forming the value.

Converting dc-physical-work/Linux.md would result in the following JSON:

{
"creator": "md-foaf-person:Linus%20Torvalds",
"description": "Frustrated by the limitations of existing operating systems and curious about kernel development, Linus Torvalds begins work on what eventually becomes the Linux kernel.",
"title": "Linus Torvalds begins work on the Linux kernel"
}

Note the creator property's IRI value: md-foaf-person:Linus%20Torvalds refers to the FoafPerson instance in the Markdown file foaf-person/Linus Torvalds.md. The file extension is dropped and the space in the remaining file stem is URL-encoded to %20 in order to conform to IRI rules.

The Markdown-derived JSON is then converted to RDF by interpreting it as JSON-LD, in the same manner JSON files are converted to RDF. The dc-physical-object/Linux.md Markdown would thus produce the following small RDF graph (in Turtle format):

<urn:directory:ComputerScienceInventions:dc-physical-object:Linux> a dcmitype:PhysicalObject ;
dcterms:creator <urn:directory:ComputerScienceInventions:person:Linus%20Torvalds> ;
dcterms:description "<p>Frustrated by the limitations of existing operating systems and curious about kernel development, Linus Torvalds begins work on what eventually becomes the Linux kernel.</p>" ;
dcterms:title "Linus Torvalds begins work on the Linux kernel" .
}

Implied RDF

The DcPhysicalObject RDF graph above contains more information than could be directly mapped from the contents of dc-physical-object/Linux.md. For example, the file does not explicitly state an rdf:type, yet it is present in the RDF graph.

Paradicms has a number of rules for inferring parts of the graph associated with a file:

  • The subject of a file (urn:directory:ComputerScienceInventions:dc-physical-object:Linux in the RDF graph above) is automatically synthesized from
    • a dataset identifier, which is usually the name of the GitHub repository (here ComputerScienceInventions)
    • the class corresponding to the directory where the file resides (here DcPhysicalObject)
    • the file stem (here Linux)
  • The rdf:type of the subject (the a dcmitype:PhysicalObject statement) is the class corresponding to the directory where the file resides (dc-physical-object).
  • If no label property (e.g., dcterms:title) is specified in the file, it is adapted from the file stem (Linux.md -> "Linux").
  • All Works are assumed to belong to at least one Collection. If the directory has a single Collection doesn't explicitly link to any Works, then it is implicitly linked to all defined Works. If no Collection is defined, one is synthesized, and it links to all Works.
  • An image file (.jpg, .png, et al.) placed alongside a metadata file with the same stem (image/Linux.png and image/Linux.md) is assumed to be the src of that Image
  • An Image that has is not explicitly referenced will be implicitly associated with a Collection, Work, or other instance with the same file stem

Most of these rules can be overridden by explicitly specifying a property: adding a src to Image, for example, or including a label property in a Work file instead of allowing the label to be inferred from the file stem. The rules are provided for convenience.