SQR-020: Expressing LSST Project Metadata with JSON-LD

  • Jonathan Sick

Latest Revision: 2018-01-08

Note

This technote is a work in progress.

1   Abstract

This technote explores how JSON-LD (Linked Data) can be used to describe a variety of LSST project artifacts, including source code and documents. We provide specific examples using standard vocabularies (http://schema.org and CodeMeta) and explore whether custom terms are needed to support LSST use cases.

2   Technotes

Technotes are documents published by LSST as websites. Technotes are also software-like, with code repositories and continuous integration services.

2.1   Technote as a Report type

To describe a technote simply as a document, we can use the schema.org Report type. Report extends the Article type, which in turn extends CreativeWork, and it specifically provides a reportNumber field. We’ll use that reportNumber field to hold the technote’s handle, such as SQR-020 for this technote.

This example shows how SQR-006 can be described in JSON-LD as a schema.org Report:

{
  "@context": "http://schema.org",
  "@type": "Report",
  "@id": "https://sqr-006.lsst.io",
  "name": "The LSST the Docs Platform for Continuous Documentation Delivery",
  "reportNumber": "SQR-006",
  "url": "https://sqr-006.lsst.io",
  "license": "http://creativecommons.org/licenses/by/4.0/",
  "description": "The document describes the backend services that deploy documentation to the web whenever the Jenkins continuous integration system is triggered. The service is specifically designed for the Eups and Scons-based build tooling used by the LSST Stack.",
  "author": [
    {
      "@type": "Person",
      "name": "Jonathan Sick"
    }
  ],
  "copyrightHolder": {
    "@type": "Organization",
    "name": "Association of Universities for Research in Astronomy, Inc."
  },
  "copyrightYear": 2016,
  "dateModified": "2016-07-28",
  "citation": [
    {
      "@type": "Book",
      "name": "Site Reliability Engineering: How Google Runs Production Systems",
      "author": [
        {
          "@type": "Person",
          "name": "Niall Richard Murphy",
          "position": 1
        },
        {
          "@type": "Person",
          "name": "Jennifer Petoff",
          "position": 2
        },
        {
          "@type": "Person",
          "name": "Chris Jones",
          "position": 3
        },
        {
          "@type": "Person",
          "name": "Betsy Beyer",
          "position": 4
        }
      ],
      "publisher": "O'Reilly Media",
      "copyrightYear": 2016
    }
  ]
}

The Report type allows for more terms than we’ve used here. For example, the articleBody field can contain the full plain-text contents of the document. This could be useful for full-text search services.
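As a rough illustration of how such a record might be generated, the following Python sketch builds a minimal Report node and optionally attaches articleBody. The function name make_report_metadata and the placeholder body text are assumptions made for this example; they are not part of any LSST tooling.

```python
import json


def make_report_metadata(handle, title, url, body_text=None):
    """Build a minimal schema.org Report node for a technote.

    Hypothetical helper for illustration only.
    """
    node = {
        "@context": "http://schema.org",
        "@type": "Report",
        "@id": url,
        "name": title,
        "reportNumber": handle,
        "url": url,
    }
    if body_text is not None:
        # articleBody holds the plain-text contents, e.g. for full-text search.
        node["articleBody"] = body_text
    return node


metadata = make_report_metadata(
    handle="SQR-006",
    title="The LSST the Docs Platform for Continuous Documentation Delivery",
    url="https://sqr-006.lsst.io",
    # Placeholder text, not the real body of SQR-006.
    body_text="Placeholder for the plain-text body of the technote.",
)
print(json.dumps(metadata, indent=2))
```

Leaving body_text unset omits articleBody entirely, which keeps the record small when full-text search is not needed.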

2.2   Technote with additional SoftwareSourceCode terms

Technotes are also code, as we mentioned previously. We can incorporate metadata about the technote’s underlying code infrastructure by adding the schema.org SoftwareSourceCode type. JSON-LD allows nodes to have an array of types, so that our technote can effectively become both a document and a code project, with terms from both types.
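The multi-type idea can be sketched in Python as follows. The helper add_type is hypothetical and only shows one way to turn a single-typed node into a multi-typed one; it relies on the fact that JSON-LD accepts either a string or an array of strings for @type.

```python
def add_type(node, new_type):
    """Return a copy of a JSON-LD node that also carries new_type.

    Hypothetical helper for illustration only.
    """
    current = node.get("@type", [])
    # Normalize @type to a list, since it may be a single string.
    types = [current] if isinstance(current, str) else list(current)
    if new_type not in types:
        types.append(new_type)
    updated = dict(node)  # shallow copy; the input node is left unchanged
    updated["@type"] = types
    return updated


report = {
    "@context": "http://schema.org",
    "@type": "Report",
    "reportNumber": "SQR-006",
}
technote = add_type(report, "SoftwareSourceCode")
# With both types present, terms from either type can be used on the node.
technote["codeRepository"] = "https://github.com/lsst-sqre/sqr-006"
```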

For code projects, we can also use the expanded term vocabulary from the CodeMeta project. CodeMeta and SoftwareSourceCode provide these terms, among others:

  • programmingLanguage: Name of the dominant programming language. We should use GitHub Linguist’s languages.yml as a controlled vocabulary for language names.
  • codeRepository: Link to the GitHub repository.
  • contIntegration (CodeMeta): URL of the continuous integration service dashboard.
  • readme (CodeMeta): URL of the README file.
  • developmentStatus (CodeMeta): Description of development status. Use terms from http://repostatus.org.
  • maintainer (CodeMeta): The Person that is responsible for maintaining the repository.
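A minimal Python sketch of how these terms might be assembled is shown below. The function make_code_terms is hypothetical, and the REPOSTATUS_TERMS set is a hand-copied assumption about the status names listed at http://repostatus.org; it should be checked against that site before being relied upon. Validating programmingLanguage against Linguist’s languages.yml is left out of this sketch.

```python
# Assumed status names from repostatus.org; verify against the site.
REPOSTATUS_TERMS = {
    "Concept", "WIP", "Suspended", "Abandoned",
    "Active", "Inactive", "Unsupported", "Moved",
}


def make_code_terms(repo_url, language, ci_url, readme_url, status,
                    maintainer_name):
    """Build the code-related terms for a technote node.

    Hypothetical helper for illustration only.
    """
    if status not in REPOSTATUS_TERMS:
        raise ValueError("Unknown development status: {!r}".format(status))
    return {
        "programmingLanguage": language,
        "codeRepository": repo_url,
        "contIntegration": ci_url,
        "readme": readme_url,
        "developmentStatus": status,
        "maintainer": {"@type": "Person", "name": maintainer_name},
    }


code_terms = make_code_terms(
    repo_url="https://github.com/lsst-sqre/sqr-006",
    language="reStructuredText",
    ci_url="https://travis-ci.org/lsst-sqre/sqr-006",
    readme_url="https://github.com/lsst-sqre/sqr-006/blob/master/README.rst",
    status="Inactive",
    maintainer_name="Jonathan Sick",
)
```

The resulting dictionary can be merged into an existing node with dict.update, provided the node’s @context also includes the CodeMeta context so that the CodeMeta-only terms resolve.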

This example shows SQR-006 described as a combined Report and SoftwareSourceCode:

{
  "@context": [
    "https://raw.githubusercontent.com/codemeta/codemeta/2.0-rc/codemeta.jsonld",
    "http://schema.org"
  ],
  "@type": ["Report", "SoftwareSourceCode"],
  "@id": "https://sqr-006.lsst.io",
  "name": "The LSST the Docs Platform for Continuous Documentation Delivery",
  "reportNumber": "SQR-006",
  "url": "https://sqr-006.lsst.io",
  "license": "http://creativecommons.org/licenses/by/4.0/",
  "description": "The document describes the backend services that deploy documentation to the web whenever the Jenkins continuous integration system is triggered. The service is specifically designed for the Eups and Scons-based build tooling used by the LSST Stack.",
  "programmingLanguage": "reStructuredText",
  "codeRepository": "https://github.com/lsst-sqre/sqr-006",
  "contIntegration": "https://travis-ci.org/lsst-sqre/sqr-006",
  "readme": "https://github.com/lsst-sqre/sqr-006/blob/master/README.rst",
  "developmentStatus": "Inactive",
  "maintainer": {
    "@type": "Person",
    "name": "Jonathan Sick"
  },
  "author": [
    {
      "@type": "Person",
      "name": "Jonathan Sick"
    }
  ],
  "copyrightHolder": {
    "@type": "Organization",
    "name": "Association of Universities for Research in Astronomy, Inc."
  },
  "copyrightYear": 2016,
  "dateModified": "2016-07-28",
  "citation": [
    {
      "@type": "Book",
      "name": "Site Reliability Engineering: How Google Runs Production Systems",
      "author": [
        {
          "@type": "Person",
          "name": "Niall Richard Murphy",
          "position": 1
        },
        {
          "@type": "Person",
          "name": "Jennifer Petoff",
          "position": 2
        },
        {
          "@type": "Person",
          "name": "Chris Jones",
          "position": 3
        },
        {
          "@type": "Person",
          "name": "Betsy Beyer",
          "position": 4
        }
      ],
      "publisher": "O'Reilly Media",
      "copyrightYear": 2016
    }
  ]
}