Active

Policy on the Assignment of Digital Object Identifiers (DOIs) to Personal Scholarly Works


Abstract

This policy defines how the author assigns DOIs to personal scholarly digital objects in accordance with the FAIR principles. It specifies eligibility criteria, metadata requirements, versioning and governance processes, and preservation practices to ensure works are findable, accessible, interoperable, and reusable.

Visitor Summary: Trusted Metadata Architecture

When you access a work on this site that has been assigned a DOI, you are interacting with a Scholarly Digital Object that is preserved, machine-readable, and rigorously described.

Key Technical Guarantees:

  1. Authoritative Identifiers: We utilize a DOI-First Policy. If a DOI exists, all machine-readable metadata (JSON-LD, Microformats) treats the DOI URL (e.g., https://doi.org/...) as the canonical identifier (@id), ensuring seamless integration with the global academic graph.
  2. Multi-Layered Metadata: To ensure maximum interoperability, every work is simultaneously published with:
    • Schema.org (JSON-LD): For search engines and knowledge graphs.
    • MODS 3.8 (XML): For library systems and Fedora/Islandora repositories.
    • Highwire Press: For Google Scholar, Zotero, and Mendeley.
    • PRISM & Dublin Core: For aggregators and syndication.
  3. Provenance & Integrity: All citations are verified against a local Unified Citation Cache (SQLite-backed) to prevent link rot and ensure metadata consistency. Changes are tracked via a transparent Version History system.
  4. FAIR Signposting & Discovery: The site emits typed signposting links (RFC 8288) as <link> tags in page <head> and generates static RFC 9264 JSON Link Sets (linkset.json) for scholarly objects at build time. A discovery catalog (FAIRiCat) is published at /.well-known/faircat.json. Linksets are generated deterministically via getSignpostingLinks(), validated with Zod, and integrated into the build pipeline (scripts/generate-faircat.mjs, scripts/verify-signposting.*). This enforces parity between HTML link relations and JSON Link Sets and improves automated discovery.

1. Preamble & Statement of Purpose

This document establishes the governance framework for assigning Digital Object Identifiers (DOIs) to scholarly digital objects created by the author. It serves as a personal commitment to the FAIR Guiding Principles (Findable, Accessible, Interoperable, and Reusable) [1]. It also references work from the RDA Data Foundation & Terminology (DFT) effort to align terminology and object modeling [2], and preemptively addresses common funder/institution requirements such as the Horizon Europe data-management guidance [3].

Unlike standard blogs, this platform operates as a Scholarly Knowledge Graph. It does not merely host content; it emits structured, semantic data that allows machines to parse, index, and cite the work with the same rigor as a formal academic journal. This architecture aligns with the Data Citation Roadmap for scholarly repositories outlined by Fenner et al. [4], ensuring that landing pages serve as persistent, metadata-rich entry points.

1.1. Scope

This policy applies to Scholarly Digital Objects (articles, software, datasets, policies) that meet the rigor required for permanent archiving.


2. To be Findable

The first step in (re)using data is to find them. Metadata and data should be easy to find for both humans and computers.

F1: (Meta)data are assigned a globally unique and persistent identifier

Principle: A globally unique, persistent, and resolvable identifier (PID) is the fundamental requirement for findability.

Implementation:

  • Primary PID: All eligible objects are assigned a DOI via Zenodo (DataCite) [5].
  • Local Canonical ID: The platform generates stable, canonical URLs that persist even if underlying infrastructure changes.
  • Sitemap Integration: A build-time system injects accurate lastmod dates derived from content frontmatter into the XML sitemap, ensuring search engines crawl the most recent versions.
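
The sitemap step can be sketched as follows. This is a minimal illustration, not the site's build code: the `PageFrontmatter` shape, its field names, and the `sitemapEntry` helper are all assumptions introduced here.

```typescript
// Hypothetical frontmatter shape; the field names are assumptions, not the site's schema.
interface PageFrontmatter {
  slug: string;
  datePublished: string; // ISO 8601 date, e.g. "2024-01-01"
  dateModified?: string; // ISO 8601 date, optional
}

// Escape the characters that are unsafe inside XML text content.
function escapeXmlText(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Build one <url> element; lastmod prefers dateModified and falls back to datePublished.
function sitemapEntry(baseUrl: string, page: PageFrontmatter): string {
  const lastmod = page.dateModified ?? page.datePublished;
  const loc = escapeXmlText(`${baseUrl}/${page.slug}/`);
  return `<url><loc>${loc}</loc><lastmod>${lastmod}</lastmod></url>`;
}
```

Deriving lastmod from frontmatter rather than file timestamps keeps the value stable across rebuilds and fresh checkouts.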

F2: Data are described with rich metadata

Principle: The richer the metadata, the more likely an object is to be discovered.

Implementation: This platform utilizes a Multi-Layer Metadata System to describe every object:

  1. Descriptive: Title, Abstract, Keywords, Reading Time, Hub/Tier classification.
  2. Administrative: License (CC BY 4.0), Rights Holder, Version, Publication/Modification Dates.
  3. Structural: MODS 3.8 XML generation for library cataloging and PRISM metadata for publishing aggregators.
  4. Bibliographic: Highwire Press tags ensure immediate, accurate ingestion by reference managers (Zotero, Mendeley).
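
As an illustration of the bibliographic layer, the sketch below renders a few Highwire Press meta tags. The `ScholarlyMeta` shape and the `highwireTags` helper are hypothetical; the tag names (`citation_title`, `citation_author`, `citation_publication_date`, `citation_doi`) are the ones commonly consumed by Google Scholar and reference managers.

```typescript
// Hypothetical input shape for illustration only.
interface ScholarlyMeta {
  title: string;
  authors: string[];       // one entry per author
  publicationDate: string; // e.g. "2024/05/17"
  doi?: string;            // bare DOI without resolver prefix
}

// Escape a value for use inside a double-quoted HTML attribute.
function escapeAttr(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;");
}

// Emit one <meta> tag per field; authors get one tag each.
function highwireTags(meta: ScholarlyMeta): string[] {
  const tag = (tagName: string, content: string): string =>
    `<meta name="${tagName}" content="${escapeAttr(content)}">`;
  const tags = [tag("citation_title", meta.title)];
  for (const author of meta.authors) {
    tags.push(tag("citation_author", author));
  }
  tags.push(tag("citation_publication_date", meta.publicationDate));
  if (meta.doi) {
    tags.push(tag("citation_doi", meta.doi));
  }
  return tags;
}
```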

F3: Metadata clearly and explicitly include the identifier of the data they describe

Principle: The metadata record must unambiguously link to the specific digital object.

Implementation:

  • DOI-First Resolution: The system implements an Authoritative @id Strategy. When generating Schema.org JSON-LD, if a valid DOI is present in the frontmatter, the @id of the CreativeWork or ScholarlyArticle node is set to the DOI URL, not the local page URL. This hard-links the semantic data to the persistent identifier.
  • Cross-Walking: The local Unified Citation Cache ensures that the DOI recorded in the local frontmatter matches the DOI registered in the external repository (Zenodo).
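
The DOI-first rule can be sketched as below. This is an illustration under assumptions: the `WorkFrontmatter` shape, the loose `DOI_PATTERN` check, and the helper names are hypothetical and are not taken from the site's generator.

```typescript
// Hypothetical frontmatter shape for illustration.
interface WorkFrontmatter {
  title: string;
  doi?: string; // bare DOI, e.g. "10.1234/example"
}

// Loose syntactic check: "10." + registrant code + "/" + non-empty suffix.
const DOI_PATTERN = /^10\.\d{4,9}\/\S+$/;

// Prefer the DOI URL as the canonical identifier; fall back to the page URL.
function canonicalId(fm: WorkFrontmatter, pageUrl: string): string {
  return fm.doi && DOI_PATTERN.test(fm.doi)
    ? `https://doi.org/${fm.doi}`
    : pageUrl;
}

// Build a minimal Schema.org node whose @id follows the DOI-first rule.
function toJsonLd(fm: WorkFrontmatter, pageUrl: string): Record<string, string> {
  return {
    "@context": "https://schema.org",
    "@type": "ScholarlyArticle",
    "@id": canonicalId(fm, pageUrl),
    name: fm.title,
    url: pageUrl, // the landing page stays reachable via url
  };
}
```

Keeping `url` separate from `@id` lets consumers merge the node with other records about the same DOI while still finding the local landing page.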

F4: (Meta)data are registered or indexed in a searchable resource

Principle: Metadata must be placed in resources indexed by search engines.

Implementation:

  • Global Indexing: Via Zenodo, metadata is registered with DataCite (and searchable in DataCite Commons), harvested by OpenAIRE, and exposed for indexing by Google Scholar.
  • Local Discovery: The site exposes Atom 1.0 feeds (bilingual en/ru) containing rich metadata and RDF-compliant license links, allowing aggregators to discover content updates immediately.

3. To be Accessible

Once users find the required data, they need to know how the data can be accessed.

A1: (Meta)data are retrievable by their identifier using a standardized communications protocol

Principle: The PID should be resolvable using a standard, open protocol.

Implementation:

  • Protocol: HTTPS is enforced for all resources.
  • Accessibility Standards: The platform adheres to WCAG 2.2 Level AA. This ensures that the “Access” in “Accessible” applies to all humans, including those using assistive technologies. Features include semantic HTML5, aria-live regions for dynamic content, and skip-links.

A2: Metadata are accessible, even when the data are no longer available

Principle: Metadata should persist even if the object is removed.

Implementation:

  • Tombstoning: In the event of retraction, the local page is replaced with a “Tombstone” notice, but the MODS and JSON-LD metadata records remain accessible to verify the existence and retraction status of the work.
  • Archival Redundancy: Zenodo guarantees metadata persistence independent of this website’s uptime.

4. To be Interoperable

The data usually need to be integrated with other data and interoperate with applications.

I1: (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation

Principle: Metadata should be structured for machine parsing.

Implementation: The platform emits metadata in multiple formal languages simultaneously, adhering to the best practices for academic tagging described by Verrelli [6]:

  • JSON-LD: Serialized using the Schema.org vocabulary.
  • XML: Serialized using the MODS 3.8 standard.
  • microformats2: HTML classes (h-entry, h-cite, u-uid) allow parsers to extract data directly from the DOM.
  • RDF: Atom feeds include <link rel="license"> pointing to RDF URIs for Creative Commons licenses.

I2: (Meta)data use vocabularies that follow FAIR principles

Principle: Values within metadata should use terms from standard vocabularies.

Implementation:

  • Controlled Vocabularies: Keywords and subjects are mapped to Wikidata entities where possible [7].
  • Person Identifiers: Authors are identified via ORCID, ISNI, and VIAF URIs rather than just text strings.
  • License URIs: Licenses are referenced via canonical SPDX or Creative Commons URIs (e.g., https://creativecommons.org/licenses/by/4.0/).

I3: (Meta)data include qualified references to other (meta)data

Principle: Metadata should be richly interlinked.

Implementation:

  • Unified Citation System: The platform maintains a local SQLite cache of all cited works. This allows the generation of qualified references (e.g., dcterms:references, citation_reference) that are verified against Crossref and DataCite.
  • Bidirectional Linking: Metadata records include IsIdenticalTo relations linking the local representation to the archival DOI.
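
A qualified reference of this kind might be modeled as below. The field names follow the relatedIdentifier property of the DataCite Metadata Schema; the `buildRelations` helper and the choice to emit cited DOIs as `References` relations are assumptions made for illustration.

```typescript
// Subset of DataCite's relatedIdentifier property used in this sketch.
interface RelatedIdentifier {
  relatedIdentifier: string;
  relatedIdentifierType: "DOI" | "URL";
  relationType: "IsIdenticalTo" | "References";
}

// Hypothetical helper: relations for the metadata record of one archived work.
// The local landing page is declared identical to the archived object, and
// each cited DOI becomes a qualified "References" relation.
function buildRelations(localUrl: string, citedDois: string[]): RelatedIdentifier[] {
  const identical: RelatedIdentifier = {
    relatedIdentifier: localUrl,
    relatedIdentifierType: "URL",
    relationType: "IsIdenticalTo",
  };
  const references = citedDois.map((doi): RelatedIdentifier => ({
    relatedIdentifier: doi,
    relatedIdentifierType: "DOI",
    relationType: "References",
  }));
  return [identical, ...references];
}
```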

5. To be Reusable

The ultimate goal of FAIR is to optimise the reuse of data.

R1: (Meta)data are richly described with a plurality of accurate and relevant attributes

Principle: Provide sufficient context for reuse.

R1.1: (Meta)data are released with a clear and accessible data usage license

Implementation:

  • Default License: CC BY 4.0 for content; CC0 for metadata.
  • Machine-Readability: The license is embedded in JSON-LD (license), Highwire tags, MODS XML (<accessCondition>), and Atom feeds (RFC 4946).
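
To illustrate how a single license URI can be projected into several of these formats, here is a minimal sketch. The `licenseRenderings` helper is hypothetical, and the `type="use and reproduction"` value on `<accessCondition>` is one conventional choice rather than a confirmed detail of the site's output.

```typescript
// Canonical Creative Commons URI for the default content license.
const CC_BY_4 = "https://creativecommons.org/licenses/by/4.0/";

// Hypothetical helper: render one license URI in three of the formats named above.
function licenseRenderings(licenseUri: string) {
  return {
    // Schema.org JSON-LD fragment: the license property holds the URI.
    jsonLd: { license: licenseUri },
    // MODS: xlink:href requires the xlink namespace to be declared on the root element.
    mods: `<accessCondition type="use and reproduction" xlink:href="${licenseUri}"/>`,
    // Atom License Extension (RFC 4946): a link element with rel="license".
    atom: `<link rel="license" href="${licenseUri}"/>`,
  };
}
```

Generating every rendering from one value avoids the case where a feed and a landing page disagree about the license.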

R1.2: (Meta)data are associated with detailed provenance

Implementation:

  • Version History: A dedicated Version History Manager tracks changes, diffs, and archival snapshots (Wayback Machine) for every document.
  • Image Provenance: An Image Metadata Preservation Service ensures that embedded XMP/IPTC rights metadata (Creator, Copyright) is preserved during image optimization and format conversion (e.g., JPEG to AVIF).

R1.3: (Meta)data meet domain-relevant community standards

Implementation:

  • Security: The platform adheres to OWASP ASVS Level 3 requirements for input validation and output encoding, ensuring the integrity of the delivered data.
  • Bibliographic: Citation exports are provided in BibTeX and RIS formats, conforming to standard academic workflows.
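
A citation export along these lines could look like the sketch below. The `CitableWork` shape and both helpers are assumptions for illustration; the sketch does not escape BibTeX special characters, which a production exporter would need to handle.

```typescript
// Hypothetical record shape for illustration.
interface CitableWork {
  key: string;       // BibTeX citation key
  title: string;
  authors: string[]; // "Family, Given" form
  year: number;
  url: string;
  doi?: string;
}

// Render a BibTeX @misc entry; multiple authors are joined with " and ".
function toBibTeX(work: CitableWork): string {
  const fields: [string, string][] = [
    ["title", work.title],
    ["author", work.authors.join(" and ")],
    ["year", String(work.year)],
    ["url", work.url],
  ];
  if (work.doi) fields.push(["doi", work.doi]);
  const body = fields.map(([field, value]) => `  ${field} = {${value}}`).join(",\n");
  return `@misc{${work.key},\n${body}\n}`;
}

// Render a RIS record: two-letter tag, two spaces, hyphen, space, then the value.
function toRis(work: CitableWork): string {
  const lines = [
    "TY  - GEN",
    `TI  - ${work.title}`,
    ...work.authors.map((author) => `AU  - ${author}`),
    `PY  - ${work.year}`,
    `UR  - ${work.url}`,
  ];
  if (work.doi) lines.push(`DO  - ${work.doi}`);
  lines.push("ER  - ");
  return lines.join("\r\n");
}
```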

6. Metadata Coherence Strategy

To ensure consistency across the distributed web of data (Local Site ↔ Zenodo ↔ Search Engines):

  1. Canonical Source: The local frontmatter serves as the single source of truth.
  2. Build-Time Validation: A Metadata Validator runs during the build pipeline to enforce the presence of required fields (e.g., checking that a DOI is present if the content type is “Research”).
  3. Conflict Detection: The harvester detects if a local reference ID conflicts with an external authoritative ID (e.g., mismatched DOIs) and halts the build to prevent data corruption.
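
The validation and conflict-detection steps can be sketched as follows. This is a simplified illustration: the `ContentRecord` shape, the slug-keyed lookup of externally registered DOIs, and both function names are assumptions, not the site's validator.

```typescript
// Hypothetical record shape; "Research" is the content type named in the policy.
interface ContentRecord {
  slug: string;
  contentType: string;
  doi?: string;
}

// Collect every problem instead of stopping at the first one.
// registeredDois maps slug -> DOI as reported by the external repository.
function findMetadataErrors(
  records: ContentRecord[],
  registeredDois: { [slug: string]: string }
): string[] {
  const errors: string[] = [];
  for (const record of records) {
    if (record.contentType === "Research" && !record.doi) {
      errors.push(`${record.slug}: Research content requires a DOI`);
    }
    const external = Object.prototype.hasOwnProperty.call(registeredDois, record.slug)
      ? registeredDois[record.slug]
      : undefined;
    // DOIs are case-insensitive, so compare in lower case.
    if (record.doi && external && record.doi.toLowerCase() !== external.toLowerCase()) {
      errors.push(`${record.slug}: local DOI ${record.doi} conflicts with registered DOI ${external}`);
    }
  }
  return errors;
}

// Halt the build by throwing when any error is present.
function assertBuildable(records: ContentRecord[], registeredDois: { [slug: string]: string }): void {
  const errors = findMetadataErrors(records, registeredDois);
  if (errors.length > 0) {
    throw new Error(`Metadata validation failed:\n${errors.join("\n")}`);
  }
}
```

Reporting all errors in one pass means a failed build lists every record that needs attention rather than one per run.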

7. Implementation and Governance

7.1. Decision Authority

The author retains sole discretion over DOI assignment.

7.2. Quality Assurance

Automated CI/CD pipelines verify:

  • Schema validity (JSON-LD, MODS).
  • Link integrity (no 404s).
  • Accessibility compliance (WCAG).
  • Security headers (CSP).

7.3. Retraction

Retractions are handled transparently via the Version History system and metadata updates to the DOI registrar, ensuring the “tombstone” page remains discoverable.

7.4. FAIR Signposting System

The site implements a FAIR Signposting pipeline that emits RFC 8288 typed links in page <head> and generates RFC 9264 Link Sets (linkset.json) at build time for blog, legal, family, and author pages. Link Sets are validated against a Zod schema during the build and written to dist/.../linkset.json. A discovery catalog (FAIRiCat) is produced at /.well-known/faircat.json by scripts/generate-faircat.mjs. The signposting pipeline ensures relations such as cite-as, author, license, describedby, item, collection, type, and linkset are emitted and includes automated parity checks via scripts/verify-signposting-parity.mjs as part of secure builds (npm run build:cf:stats).
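
The relationship between the two outputs can be sketched as below: one list of typed links is rendered both as HTML `<link>` tags and as an RFC 9264 JSON Link Set, and a parity check compares the two. The `SignpostLink` shape and all three helpers are hypothetical; the actual getSignpostingLinks() signature and Zod schema are not shown in this policy.

```typescript
// Hypothetical in-memory form of one typed link.
interface SignpostLink {
  rel: string;
  href: string;
  type?: string;
}

// RFC 9264 JSON shapes: a link context has an "anchor" plus one member per
// relation type, each holding an array of target objects.
interface LinkTarget {
  href: string;
  type?: string;
}
interface LinkContext {
  anchor: string;
  [rel: string]: string | LinkTarget[];
}

// Render the links as <link> tags for the page <head> (values assumed pre-escaped).
function toLinkTags(links: SignpostLink[]): string[] {
  return links.map((link) => {
    const typeAttr = link.type ? ` type="${link.type}"` : "";
    return `<link rel="${link.rel}" href="${link.href}"${typeAttr}>`;
  });
}

// Render the same links as a Link Set document with a single link context.
function toLinkset(anchor: string, links: SignpostLink[]): { linkset: LinkContext[] } {
  const context: LinkContext = { anchor };
  for (const link of links) {
    const target: LinkTarget = link.type
      ? { href: link.href, type: link.type }
      : { href: link.href };
    const existing = context[link.rel];
    if (Array.isArray(existing)) existing.push(target);
    else context[link.rel] = [target];
  }
  return { linkset: [context] };
}

// Parity check: both representations must carry the same (rel, href) pairs.
function inParity(links: SignpostLink[], doc: { linkset: LinkContext[] }): boolean {
  const fromHtml = links.map((link) => `${link.rel} ${link.href}`).sort();
  const fromJson: string[] = [];
  for (const context of doc.linkset) {
    for (const rel of Object.keys(context)) {
      const targets = context[rel];
      if (Array.isArray(targets)) {
        for (const target of targets) fromJson.push(`${rel} ${target.href}`);
      }
    }
  }
  return fromHtml.join("\n") === fromJson.sort().join("\n");
}
```

Generating both outputs from one source list makes parity hold by construction; the separate check then guards against drift if either renderer changes.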

7.5. MODS Generation & Validation

The MODS generation system produces MODS 3.8 XML for all content collections and includes DOI identifiers (<identifier type="doi">) when present. Outputs are placed under public/mods/{lang}/{collection}/{slug}.mods. The build uses an esbuild pre-bundling strategy (with fallbacks) to run the TypeScript generator (src/lib/mods-generator.ts) via scripts/generate-mods-metadata.mjs. Validation is performed by scripts/validate-mods.mjs (fast-xml-parser + xmllint) and is executed during secure builds.
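
A heavily reduced sketch of such a generator is shown below. It is not the code in src/lib/mods-generator.ts: the `modsRecord` and `modsOutputPath` helpers are hypothetical, and a real record would carry many more elements. The output path pattern is the one stated above.

```typescript
// Escape the characters that are unsafe inside XML text content.
function escapeModsText(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Build a minimal MODS 3.8 record; the DOI identifier is emitted only when present.
function modsRecord(title: string, doi?: string): string {
  const identifier = doi
    ? `\n  <identifier type="doi">${escapeModsText(doi)}</identifier>`
    : "";
  return (
    `<mods xmlns="http://www.loc.gov/mods/v3" version="3.8">\n` +
    `  <titleInfo><title>${escapeModsText(title)}</title></titleInfo>` +
    identifier +
    `\n</mods>`
  );
}

// Output location following the public/mods/{lang}/{collection}/{slug}.mods pattern.
function modsOutputPath(lang: string, collection: string, slug: string): string {
  return `public/mods/${lang}/${collection}/${slug}.mods`;
}
```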

7.6. WebFinger Identity & Discovery

A WebFinger endpoint is available at /.well-known/webfinger, implemented as a Cloudflare Pages Function (functions/.well-known/webfinger.ts). It serves JRD (application/jrd+json) responses derived from the canonical person entity (src/data/person_david_osipov.ts) and supports acct:, mailto:, and https: resources with optional rel filtering. Responses include subject, aliases, properties, and links (for example: me, profile-page, avatar, mailto, pgpkey, updates-from). Security headers and CORS for the endpoint are configured in public/_headers.
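
The rel filtering behavior can be sketched as follows, based on RFC 7033: when one or more rel parameters are supplied, only matching links are returned and the rest of the JRD is unchanged. The helper names and the example record are hypothetical and do not reproduce functions/.well-known/webfinger.ts.

```typescript
// JRD shapes as described in RFC 7033.
interface JrdLink {
  rel: string;
  href?: string;
  type?: string;
}
interface Jrd {
  subject: string;
  aliases?: string[];
  properties?: { [uri: string]: string | null };
  links: JrdLink[];
}

// Accept only the resource URI schemes named in this policy.
function isSupportedResource(resource: string): boolean {
  return /^(acct|mailto|https):/i.test(resource);
}

// Apply optional rel filtering; an empty rel list means "return all links".
function filterByRel(jrd: Jrd, rels: string[]): Jrd {
  if (rels.length === 0) return jrd;
  return {
    ...jrd,
    links: jrd.links.filter((link) => rels.indexOf(link.rel) !== -1),
  };
}

// Hypothetical record with placeholder values.
const exampleJrd: Jrd = {
  subject: "acct:user@example.org",
  aliases: ["https://example.org/"],
  links: [
    { rel: "me", href: "https://example.org/" },
    { rel: "http://webfinger.net/rel/profile-page", type: "text/html", href: "https://example.org/about/" },
  ],
};
```

A request carrying `rel=me` would then be answered with `filterByRel(exampleJrd, ["me"])`, which keeps the subject and aliases and returns only the first link.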

References

  1. article
    Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., et al. The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3(1).
    Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J. W., da Silva Santos, L. B., Bourne, P. E., Bouwman, J., Brookes, A. J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C. T., Finkers, R., Gonzalez-Beltran, A., Gray, A. J. G., Groth, P., Goble, C., Grethe, J. S., Heringa, J., Hoen, P. A. C. 't, Hooft, R., Kuhn, T., Kok, R., Kok, J., Lusher, S. J., Martone, M. E., Mons, A., Packer, A. L., Persson, B., Rocca-Serra, P., Roos, M., van Schaik, R., Sansone, S. A., Schultes, E., Sengstag, T., Slater, T., Strawn, G., Swertz, M. A., Thompson, M., van der Lei, J., van Mulligen, E., Velterop, J., Waagmeester, A., Wittenburg, P., Wolstencroft, K.
  2. report
    Berg-Cross, G., Ritz, S., Wittenburg, P. RDA Data Foundation and Terminology (DFT): Results RFC.

Version History

This document has 6 recorded versions.