This document is a dissertation written at the University of Hamburg, with applications in the Earth sciences in the context of CMIP6 and the Earth System Grid Federation.
Abstract: The rise of what is colloquially termed data-intensive science poses challenges for data management and end-user applications. The observed increase in the volume, variety and number of data objects requires the large data infrastructures used for research data management today to further automate their operational workflows. In the Earth sciences and beyond, data infrastructures typically rely on distributed middleware services, which must become more scalable and provide more trustworthy and precise information. A widely discussed approach to addressing these challenges is to employ persistent identifiers, which are traditionally used in the library sciences and scholarly publishing. Such identifiers give globally unique names to digital objects, making them referenceable, and possibly accessible, independently of their actual storage location. Both human and machine agents can also use them to retrieve essential state information about the objects.
However, the concept of persistent identification has not yet evolved far enough to address these data infrastructure challenges adequately. This thesis therefore presents a conceptual framework for understanding persistent identifiers in this new context, facilitating solutions that unify access to state information and increase interoperability between distributed identifier systems. Based on a formal model, the framework defines distinct classification criteria that clarify the differences between identifier usage scenarios and can help shape suitable policies for identifier providers. To facilitate interoperability and support scenarios geared towards machine agents, the framework further advocates the use of types to structure state information and to construct digital object collections with unified operations. Existing solutions are shown either to partially match the conceptual framework or to be adequately extensible, and exemplary Earth science data management usage scenarios can be enabled through its mechanisms. The framework contributes to ongoing international efforts to establish a coherent digital object infrastructure driven by practical needs. In the context of Linked Data, it can provide a foundational unification layer and foster the adoption of persistent identifiers for web-based applications.
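To make the basic idea of location-independent identification from the abstract more concrete, the following is a minimal Python sketch under stated assumptions. It uses a toy in-memory registry; the class names `PIDRecord` and `PIDRegistry`, their fields (`location`, `checksum`), and the identifier prefix `example` are hypothetical illustrations. They are not the API of the Handle System or any DOI service, and not the formal model developed in this thesis.

```python
from dataclasses import dataclass, field


@dataclass
class PIDRecord:
    """State information kept for one digital object (hypothetical fields)."""
    location: str                 # current storage URL; may change over time
    checksum: str                 # fixity information a machine agent could verify
    extra: dict = field(default_factory=dict)  # any further state information


class PIDRegistry:
    """Toy in-memory resolver standing in for a real identifier service."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix
        self._records: dict[str, PIDRecord] = {}
        self._counter = 0

    def mint(self, record: PIDRecord) -> str:
        # Unique within this toy registry: prefix plus a never-reused suffix.
        self._counter += 1
        pid = f"{self.prefix}/{self._counter:06d}"
        self._records[pid] = record
        return pid

    def resolve(self, pid: str) -> PIDRecord:
        # Human and machine agents retrieve state information by identifier.
        return self._records[pid]

    def relocate(self, pid: str, new_location: str) -> None:
        # Moving the data only updates the record; the identifier stays the same.
        self._records[pid].location = new_location


if __name__ == "__main__":
    registry = PIDRegistry("example")
    pid = registry.mint(PIDRecord("https://old.example.org/data/tas.nc", "md5:0123abcd"))
    registry.relocate(pid, "https://new.example.org/archive/tas.nc")
    print(pid, registry.resolve(pid).location)
    # prints: example/000001 https://new.example.org/archive/tas.nc
```

The design point illustrated is that references held by users or other services point at the identifier, not at the storage URL. Relocating the data therefore changes only the registry entry, and every holder of the identifier still resolves to the current location and state information.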