Visitor Summary: Trusted Metadata Architecture
When you access a work on this site assigned a DOI, you are interacting with a Scholarly Digital Object that is preserved, machine-readable, and rigorously described.
Key Technical Guarantees:
- Authoritative Identifiers: We utilize a DOI-First Policy. If a DOI exists, all machine-readable metadata (JSON-LD, Microformats) treats the DOI URL (e.g., https://doi.org/...) as the canonical identifier (@id), ensuring seamless integration with the global academic graph.
- Multi-Layered Metadata: To ensure maximum interoperability, every work is simultaneously published with:
  - Schema.org (JSON-LD): For search engines and knowledge graphs.
  - MODS 3.8 (XML): For library systems and Fedora/Islandora repositories.
  - Highwire Press: For Google Scholar, Zotero, and Mendeley.
  - PRISM & Dublin Core: For aggregators and syndication.
- Provenance & Integrity: All citations are verified against a local Unified Citation Cache (SQLite-backed) to prevent link rot and ensure metadata consistency. Changes are tracked via a transparent Version History system.
- FAIR Signposting & Discovery: The site emits typed signposting links (RFC 8288) as <link> tags in the page <head> and generates static RFC 9264 JSON Link Sets (linkset.json) for scholarly objects at build time. A discovery catalog (FAIRiCat) is published at /.well-known/faircat.json. Link Sets are generated deterministically via getSignpostingLinks(), validated with Zod, and integrated into the build pipeline (scripts/generate-faircat.mjs, scripts/verify-signposting.*). This enforces parity between HTML link relations and JSON Link Sets and improves automated discovery.
1. Preamble & Statement of Purpose
This document establishes the governance framework for assigning Digital Object Identifiers (DOIs) to scholarly digital objects created by the author. It serves as a personal commitment to the FAIR Guiding Principles (Findable, Accessible, Interoperable, and Reusable) [1]. It also references work from the RDA Data Foundation & Terminology (DFT) effort to align terminology and object modeling [2], and preemptively addresses common funder/institution requirements such as the Horizon Europe data-management guidance [3].
Unlike standard blogs, this platform operates as a Scholarly Knowledge Graph. It does not merely host content; it emits structured, semantic data that allows machines to parse, index, and cite the work with the same rigor as a formal academic journal. This architecture aligns with the Data Citation Roadmap for scholarly repositories outlined by Fenner et al. [4], ensuring that landing pages serve as persistent, metadata-rich entry points.
1.1. Scope
This policy applies to Scholarly Digital Objects (articles, software, datasets, policies) that meet the rigor required for permanent archiving.
2. To be Findable
The first step in (re)using data is to find them. Metadata and data should be easy to find for both humans and computers.
F1: (Meta)data are assigned a globally unique and persistent identifier
Principle: A globally unique, persistent, and resolvable identifier (PID) is the fundamental requirement for findability.
Implementation:
- Primary PID: All eligible objects are assigned a DOI via Zenodo (DataCite) [5].
- Local Canonical ID: The platform generates stable, canonical URLs that persist even if underlying infrastructure changes.
- Sitemap Integration: A build-time system injects accurate lastmod dates derived from content frontmatter into the XML sitemap, ensuring search engines crawl the most recent versions.
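The sitemap step can be illustrated with a minimal sketch. The frontmatter field names (slug, datePublished, dateModified) and the example.org origin are assumptions made for illustration, not the site's actual schema.

```typescript
// Hypothetical frontmatter shape; the real field names may differ.
interface Frontmatter {
  slug: string;
  datePublished: string; // ISO 8601 date, e.g. "2024-01-02"
  dateModified?: string; // optional; falls back to datePublished
}

const SITE_ORIGIN = "https://example.org"; // placeholder origin

// Escape the five XML special characters so URLs are safe inside <loc>.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Derive <lastmod> from frontmatter and fail loudly on unparseable dates.
function lastmodFor(fm: Frontmatter): string {
  const raw = fm.dateModified ?? fm.datePublished;
  const parsed = new Date(raw);
  if (Number.isNaN(parsed.getTime())) {
    throw new Error(`Invalid date in frontmatter for "${fm.slug}": ${raw}`);
  }
  return parsed.toISOString().slice(0, 10); // YYYY-MM-DD
}

function sitemapEntry(fm: Frontmatter): string {
  const loc = escapeXml(`${SITE_ORIGIN}/${fm.slug}/`);
  return `<url><loc>${loc}</loc><lastmod>${lastmodFor(fm)}</lastmod></url>`;
}
```

Throwing on an invalid date, rather than silently omitting lastmod, keeps a bad frontmatter value from reaching the published sitemap.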
F2: Data are described with rich metadata
Principle: The richer the metadata, the more likely an object is to be discovered.
Implementation: This platform utilizes a Multi-Layer Metadata System to describe every object:
- Descriptive: Title, Abstract, Keywords, Reading Time, Hub/Tier classification.
- Administrative: License (CC BY 4.0), Rights Holder, Version, Publication/Modification Dates.
- Structural: MODS 3.8 XML generation for library cataloging and PRISM metadata for publishing aggregators.
- Bibliographic: Highwire Press tags ensure immediate, accurate ingestion by reference managers (Zotero, Mendeley).
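As a rough illustration of the bibliographic layer, the sketch below renders a handful of Highwire Press style meta tags. The WorkMeta shape and the function name are hypothetical, and only a small subset of the commonly used citation_* tags is shown.

```typescript
// Hypothetical input shape for illustration only.
interface WorkMeta {
  title: string;
  authors: string[];
  publicationDate: string; // YYYY-MM-DD
  doi?: string;
}

// Escape characters that would break a double-quoted HTML attribute.
function escapeAttr(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

function highwireTags(work: WorkMeta): string[] {
  const tags: Array<[string, string]> = [["citation_title", work.title]];
  // One citation_author tag per author, in byline order.
  for (const author of work.authors) {
    tags.push(["citation_author", author]);
  }
  // Scholar-style date format uses slashes: YYYY/MM/DD.
  tags.push(["citation_publication_date", work.publicationDate.replace(/-/g, "/")]);
  if (work.doi) {
    tags.push(["citation_doi", work.doi]);
  }
  return tags.map(
    ([name, content]) => `<meta name="${name}" content="${escapeAttr(content)}">`
  );
}
```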
F3: Metadata clearly and explicitly include the identifier of the data they describe
Principle: The metadata record must unambiguously link to the specific digital object.
Implementation:
- DOI-First Resolution: The system implements an Authoritative @id Strategy. When generating Schema.org JSON-LD, if a valid DOI is present in the frontmatter, the @id of the CreativeWork or ScholarlyArticle node is set to the DOI URL, not the local page URL. This hard-links the semantic data to the persistent identifier.
- Cross-Walking: The local Unified Citation Cache ensures that the DOI recorded in the local frontmatter matches the DOI registered in the external repository (Zenodo).
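The DOI-first rule can be sketched as follows. The PageMeta shape, the function names, and the simplified DOI pattern are assumptions for illustration; the DOI in the usage line is a placeholder, not a registered identifier.

```typescript
// Hypothetical page metadata; field names are illustrative.
interface PageMeta {
  url: string; // local canonical page URL
  title: string;
  doi?: string; // bare DOI such as "10.1234/example"
}

// Simplified shape check: "10.", a 4-9 digit prefix, a slash, and a suffix.
const DOI_PATTERN = /^10\.\d{4,9}\/\S+$/;

// Prefer the DOI URL as @id; fall back to the page URL when no valid DOI exists.
function canonicalId(page: PageMeta): string {
  const doi = page.doi?.trim();
  return doi && DOI_PATTERN.test(doi) ? `https://doi.org/${doi}` : page.url;
}

function toJsonLd(
  page: PageMeta,
  type: "CreativeWork" | "ScholarlyArticle" = "ScholarlyArticle"
): Record<string, unknown> {
  return {
    "@context": "https://schema.org",
    "@type": type,
    "@id": canonicalId(page),
    url: page.url, // the landing page stays reachable via "url"
    name: page.title,
  };
}

console.log(
  JSON.stringify(
    toJsonLd({ url: "https://example.org/post/", title: "Example", doi: "10.1234/example" }),
    null,
    2
  )
);
```

Keeping the local address in url while @id carries the DOI lets consumers merge this node with the record held by the registrar without losing the landing page.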
F4: (Meta)data are registered or indexed in a searchable resource
Principle: Metadata must be placed in resources indexed by search engines.
Implementation:
- Global Indexing: Via Zenodo, metadata is registered with DataCite and becomes discoverable through DataCite Commons, OpenAIRE, and Google Scholar.
- Local Discovery: The site exposes Atom 1.0 feeds (bilingual en/ru) containing rich metadata and RDF-compliant license links, allowing aggregators to discover content updates immediately.
3. To be Accessible
Once the user finds the required data, they need to know how the data can be accessed.
A1: (Meta)data are retrievable by their identifier using a standardized communications protocol
Principle: The PID should be resolvable using a standard, open protocol.
Implementation:
- Protocol: HTTPS is enforced for all resources.
- Accessibility Standards: The platform adheres to WCAG 2.2 Level AA. This ensures that the “Access” in “Accessible” applies to all humans, including those using assistive technologies. Features include semantic HTML5, aria-live regions for dynamic content, and skip-links.
A2: Metadata are accessible, even when the data are no longer available
Principle: Metadata should persist even if the object is removed.
Implementation:
- Tombstoning: In the event of retraction, the local page is replaced with a “Tombstone” notice, but the MODS and JSON-LD metadata records remain accessible to verify the existence and retraction status of the work.
- Archival Redundancy: Zenodo guarantees metadata persistence independent of this website’s uptime.
4. To be Interoperable
The data usually need to be integrated with other data and interoperate with applications.
I1: (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation
Principle: Metadata should be structured for machine parsing.
Implementation: The platform emits metadata in multiple formal languages simultaneously, adhering to the best practices for academic tagging described by Verrelli [6]:
- JSON-LD: Serialized using the Schema.org vocabulary.
- XML: Serialized using the MODS 3.8 standard.
- Microformats 2.0: HTML classes (h-entry, h-cite, u-uid) allow parsers to extract data directly from the DOM.
- RDF: Atom feeds include <link rel="license"> pointing to RDF URIs for Creative Commons licenses.
I2: (Meta)data use vocabularies that follow FAIR principles
Principle: Values within metadata should use terms from standard vocabularies.
Implementation:
- Controlled Vocabularies: Keywords and subjects are mapped to Wikidata entities where possible [7].
- Person Identifiers: Authors are identified via ORCID, ISNI, and VIAF URIs rather than just text strings.
- License URIs: Licenses are referenced via canonical SPDX or Creative Commons URIs (e.g., https://creativecommons.org/licenses/by/4.0/).
I3: (Meta)data include qualified references to other (meta)data
Principle: Metadata should be richly interlinked.
Implementation:
- Unified Citation System: The platform maintains a local SQLite cache of all cited works. This allows the generation of qualified references (e.g., dcterms:references, citation_reference) that are verified against CrossRef and DataCite.
- Bidirectional Linking: Metadata records include IsIdenticalTo relations linking the local representation to the archival DOI.
5. To be Reusable
The ultimate goal of FAIR is to optimise the reuse of data.
R1: (Meta)data are richly described with a plurality of accurate and relevant attributes
Principle: Provide sufficient context for reuse.
R1.1: (Meta)data are released with a clear and accessible data usage license
Implementation:
- Default License: CC BY 4.0 for content; CC0 for metadata.
- Machine-Readability: The license is embedded in JSON-LD (license), Highwire tags, MODS XML (<accessCondition>), and Atom feeds (RFC 4946).
R1.2: (Meta)data are associated with detailed provenance
Implementation:
- Version History: A dedicated Version History Manager tracks changes, diffs, and archival snapshots (Wayback Machine) for every document.
- Image Provenance: An Image Metadata Preservation Service ensures that embedded XMP/IPTC rights metadata (Creator, Copyright) is preserved during image optimization and format conversion (e.g., JPG to AVIF).
R1.3: (Meta)data meet domain-relevant community standards
Implementation:
- Security: The platform adheres to OWASP ASVS L3 standards for input validation and output encoding, ensuring the integrity of the delivered data.
- Bibliographic: Citation exports are provided in BibTeX and RIS formats, conforming to standard academic workflows.
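A minimal sketch of the two export formats is shown below. The Citation shape, the GEN and @misc entry types, and the function names are assumptions; a production exporter would also need to escape braces and other special characters in BibTeX values.

```typescript
// Hypothetical citation record for illustration.
interface Citation {
  key: string; // BibTeX citation key
  title: string;
  authors: string[]; // "Family, Given" form
  year: number;
  doi?: string;
}

// RIS: one "TAG  - value" pair per line, terminated by an ER tag.
function toRis(c: Citation): string {
  const lines: string[] = ["TY  - GEN"];
  for (const author of c.authors) {
    lines.push(`AU  - ${author}`);
  }
  lines.push(`TI  - ${c.title}`, `PY  - ${c.year}`);
  if (c.doi) {
    lines.push(`DO  - ${c.doi}`);
  }
  lines.push("ER  - ");
  return lines.join("\r\n") + "\r\n";
}

// BibTeX: authors are joined with " and "; values are brace-delimited.
function toBibtex(c: Citation): string {
  const fields: Array<[string, string]> = [
    ["title", c.title],
    ["author", c.authors.join(" and ")],
    ["year", String(c.year)],
  ];
  if (c.doi) {
    fields.push(["doi", c.doi]);
  }
  const body = fields.map(([name, value]) => `  ${name} = {${value}}`).join(",\n");
  return `@misc{${c.key},\n${body}\n}`;
}
```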
6. Metadata Coherence Strategy
To ensure consistency across the distributed web of data (Local Site ↔ Zenodo ↔ Search Engines):
- Canonical Source: The local frontmatter serves as the single source of truth.
- Build-Time Validation: A Metadata Validator runs during the build pipeline to enforce the presence of required fields (e.g., checking that a DOI is present if the content type is “Research”).
- Conflict Detection: The harvester detects if a local reference ID conflicts with an external authoritative ID (e.g., mismatched DOIs) and halts the build to prevent data corruption.
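The validation and conflict-detection steps might look like the sketch below. The Frontmatter fields, function names, and error messages are hypothetical; the only rules taken from the text are that Research content requires a DOI and that mismatched DOIs stop the build.

```typescript
// Hypothetical frontmatter subset used by the checks.
interface Frontmatter {
  slug: string;
  title?: string;
  contentType?: string;
  doi?: string;
}

// Collect every problem instead of stopping at the first one.
function validateFrontmatter(fm: Frontmatter): string[] {
  const errors: string[] = [];
  if (!fm.title || fm.title.trim() === "") {
    errors.push(`${fm.slug}: missing title`);
  }
  if (fm.contentType === "Research" && !fm.doi) {
    errors.push(`${fm.slug}: Research content requires a DOI`);
  }
  return errors;
}

// DOIs are case-insensitive and often appear as full URLs, so compare a normalized form.
function normalizeDoi(doi: string): string {
  return doi
    .trim()
    .replace(/^https?:\/\/(dx\.)?doi\.org\//i, "")
    .toLowerCase();
}

// Throwing here is what halts the build in this sketch.
function assertNoDoiConflict(slug: string, localDoi: string, registeredDoi: string): void {
  if (normalizeDoi(localDoi) !== normalizeDoi(registeredDoi)) {
    throw new Error(
      `${slug}: local DOI ${localDoi} does not match registered DOI ${registeredDoi}`
    );
  }
}
```

Normalizing before comparison avoids false conflicts between, for example, a bare DOI in frontmatter and the https://doi.org/ form returned by a registry.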
7. Implementation and Governance
7.1. Decision Authority
The author retains sole discretion over DOI assignment.
7.2. Quality Assurance
Automated CI/CD pipelines verify:
- Schema validity (JSON-LD, MODS).
- Link integrity (no 404s).
- Accessibility compliance (WCAG).
- Security headers (CSP).
7.3. Retraction
Retractions are handled transparently via the Version History system and metadata updates to the DOI registrar, ensuring the “tombstone” page remains discoverable.
7.4. FAIR Signposting System
The site implements a FAIR Signposting pipeline that emits RFC 8288 typed links in page <head> and generates RFC 9264 Link Sets (linkset.json) at build time for blog, legal, family, and author pages. Link Sets are validated against a Zod schema during the build and written to dist/.../linkset.json. A discovery catalog (FAIRiCat) is produced at /.well-known/faircat.json by scripts/generate-faircat.mjs. The signposting pipeline ensures relations such as cite-as, author, license, describedby, item, collection, type, and linkset are emitted and includes automated parity checks via scripts/verify-signposting-parity.mjs as part of secure builds (npm run build:cf:stats).
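To make the parity idea concrete, the sketch below derives both the HTML link elements and an RFC 9264 style Link Set from one list of typed links, then checks that the two agree. All names here are hypothetical and do not reproduce getSignpostingLinks() or the site's Zod schema; validation is done with plain TypeScript to keep the example dependency-free.

```typescript
interface TypedLink { rel: string; href: string; type?: string }
interface LinkTarget { href: string; type?: string }
// One link context object: an anchor plus one array of targets per relation type.
interface LinkContext { anchor: string; [rel: string]: string | LinkTarget[] }
interface Linkset { linkset: LinkContext[] }

function escapeAttr(value: string): string {
  return value.replace(/&/g, "&amp;").replace(/"/g, "&quot;").replace(/</g, "&lt;");
}

// HTML form: one <link> element per typed link.
function toHtmlLinks(links: TypedLink[]): string[] {
  return links.map((l) => {
    const type = l.type ? ` type="${escapeAttr(l.type)}"` : "";
    return `<link rel="${escapeAttr(l.rel)}" href="${escapeAttr(l.href)}"${type}>`;
  });
}

// JSON form: group targets by relation type under a single anchor.
function toLinkset(anchor: string, links: TypedLink[]): Linkset {
  const context: LinkContext = { anchor };
  for (const link of links) {
    const target: LinkTarget = link.type
      ? { href: link.href, type: link.type }
      : { href: link.href };
    const existing = context[link.rel];
    if (Array.isArray(existing)) {
      existing.push(target);
    } else {
      context[link.rel] = [target];
    }
  }
  return { linkset: [context] };
}

// Parity: the set of (rel, href) pairs must be identical in both representations.
function hasParity(links: TypedLink[], linkset: Linkset): boolean {
  const fromJson = new Set<string>();
  for (const ctx of linkset.linkset) {
    for (const [rel, targets] of Object.entries(ctx)) {
      if (!Array.isArray(targets)) continue; // skips the "anchor" string
      for (const t of targets) fromJson.add(`${rel} ${t.href}`);
    }
  }
  const fromHtml = new Set(links.map((l) => `${l.rel} ${l.href}`));
  return (
    fromHtml.size === fromJson.size &&
    Array.from(fromHtml).every((pair) => fromJson.has(pair))
  );
}
```

Deriving both outputs from the same array makes divergence unlikely by construction; the explicit check still catches cases where one output is edited or cached independently.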
7.5. MODS Generation & Validation
The MODS generation system produces MODS 3.8 XML for all content collections and now includes DOI identifiers (<identifier type="doi">) when present. Outputs are placed under public/mods/{lang}/{collection}/{slug}.mods. The build uses an esbuild pre-bundling strategy (with fallbacks) to run the TypeScript generator (src/lib/mods-generator.ts) via scripts/generate-mods-metadata.mjs. Validation is performed by scripts/validate-mods.mjs (fast-xml-parser + xmllint) and is executed during secure builds.
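For orientation, a heavily reduced sketch of a MODS record with a DOI identifier and the output path convention follows. It is not the code in src/lib/mods-generator.ts; the Work shape and function names are assumptions, and a real record would carry far more elements.

```typescript
// Hypothetical minimal input.
interface Work {
  title: string;
  doi?: string;
}

function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Emit a minimal MODS 3.8 record; the DOI identifier is added only when present.
function modsRecord(work: Work): string {
  const lines: string[] = [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<mods xmlns="http://www.loc.gov/mods/v3" version="3.8">',
    `  <titleInfo><title>${escapeXml(work.title)}</title></titleInfo>`,
  ];
  if (work.doi) {
    lines.push(`  <identifier type="doi">${escapeXml(work.doi)}</identifier>`);
  }
  lines.push("</mods>");
  return lines.join("\n");
}

// Mirrors the public/mods/{lang}/{collection}/{slug}.mods layout described above.
function modsOutputPath(lang: string, collection: string, slug: string): string {
  return `public/mods/${lang}/${collection}/${slug}.mods`;
}
```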
7.6. WebFinger Identity & Discovery
A WebFinger endpoint is available at /.well-known/webfinger, implemented as a Cloudflare Pages Function (functions/.well-known/webfinger.ts). It serves JRD (application/jrd+json) responses derived from the canonical person entity (src/data/person_david_osipov.ts) and supports acct:, mailto:, and https: resources with optional rel filtering. Responses include subject, aliases, properties, and links (for example: me, profile-page, avatar, mailto, pgpkey, updates-from). Security headers and CORS for the endpoint are configured in public/_headers.
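The lookup and rel-filtering behaviour can be sketched independently of Cloudflare. The PERSON record, its example.org identifiers, and the resolveWebfinger name are placeholders rather than the contents of src/data/person_david_osipov.ts; the status codes follow the usual WebFinger convention of 400 for a missing resource parameter and 404 for an unknown one.

```typescript
interface JrdLink { rel: string; href?: string; type?: string }
interface Jrd {
  subject: string;
  aliases?: string[];
  properties?: Record<string, string | null>;
  links?: JrdLink[];
}

// Placeholder identity record for illustration only.
const PERSON: Jrd = {
  subject: "acct:user@example.org",
  aliases: ["https://example.org/", "mailto:user@example.org"],
  links: [
    { rel: "me", href: "https://example.org/" },
    { rel: "http://webfinger.net/rel/profile-page", type: "text/html", href: "https://example.org/about/" },
    { rel: "http://webfinger.net/rel/avatar", href: "https://example.org/avatar.png" },
  ],
};

// Takes the raw query string (without "?") and returns a status plus optional JRD body.
function resolveWebfinger(query: string): { status: number; body?: Jrd } {
  const params = new URLSearchParams(query);
  const resource = params.get("resource");
  if (!resource) {
    return { status: 400 };
  }
  const known = [PERSON.subject, ...(PERSON.aliases ?? [])];
  if (known.indexOf(resource) === -1) {
    return { status: 404 };
  }
  // "rel" may repeat; when present, only matching links are returned.
  const rels = params.getAll("rel");
  const links = (PERSON.links ?? []).filter(
    (link) => rels.length === 0 || rels.indexOf(link.rel) !== -1
  );
  return { status: 200, body: { ...PERSON, links } };
}
```

A real handler would serialize body with the application/jrd+json content type; rel filtering narrows only the links array, while subject and aliases are returned unchanged.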