Persistent Identifier (PID) Definition

I’m interested in learning how other people and organizations define the term persistent identifier (PID). Here’s the current definition we use:

A PID is a digital identifier that is globally unique, persistent, machine resolvable, has an associated metadata schema, identifies an entity (e.g., individual researcher, publication, award, digital research output, organization) in perpetuity, and is frequently used to disambiguate between entities.

This part, "has an associated metadata schema", is a bit sensitive for me personally, @carly.robinson. Handles, for instance, are just pointers: they don’t come with a metadata schema, but they are still PIDs…

Also, "in perpetuity" is kind of a big statement: PIDs do break and need fixing and redirecting…

Eugene

I agree that the metadata schema requirement may be up for debate. ARKs, for example, don’t necessarily need to have metadata attached.

Thanks for raising this, @carly.robinson! As mentioned in the call yesterday, I (rather lazily, admittedly!) usually use the Wikipedia definition, or a version of it: "A persistent identifier (PID) is a long-lasting reference to a digital object. Typically, such an identifier is not only persistent but actionable." It would be great to agree on a PID Forum community definition that we could all use! I’ve just invited feedback on Twitter too…

Thank you for inviting feedback on Twitter! Agreed, it would be great if there was a community definition.

Thanks so much for sharing! I really appreciate the comments from you and @sheila.rabun about the associated metadata schema being a sensitive part of the definition. It would be great to hear more about use cases where an associated metadata schema isn’t needed, e.g. when using Handles or ARKs rather than another type of PID.

For our use cases, the associated metadata is key for creating connections between PIDs. For example, we assign DOIs to datasets. In the dataset DOI metadata we want to include the ORCID iDs for the data creator, the ROR ID for the data creator’s affiliation, the ROR ID for the funding organization, an Award DOI for the funding, associated journal article DOIs in related identifiers, etc. Without the associated metadata schema, we wouldn’t be able to create those connections.
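As a rough illustration of how those connections live in the record itself, here is a minimal Python sketch. The record structure, field names and relation labels (`hasCreator`, `fundedBy`, etc.) are hypothetical and only loosely inspired by the fields listed above; they are not the DataCite Metadata Schema, and every identifier value is a placeholder.

```python
from dataclasses import dataclass, field


# Hypothetical, simplified record structure; NOT the actual DataCite Metadata Schema.
@dataclass
class Creator:
    name: str
    orcid: str            # ORCID iD of the data creator
    affiliation_ror: str  # ROR ID of the creator's affiliation


@dataclass
class DatasetRecord:
    doi: str
    creators: list[Creator]
    funder_ror: str       # ROR ID of the funding organization
    award_doi: str        # DOI identifying the award
    related_article_dois: list[str] = field(default_factory=list)


def pid_links(record: DatasetRecord) -> list[tuple[str, str, str]]:
    """Return (subject, relation, object) triples connecting PIDs via the record."""
    links = []
    for creator in record.creators:
        links.append((record.doi, "hasCreator", creator.orcid))
        links.append((creator.orcid, "affiliatedWith", creator.affiliation_ror))
    links.append((record.doi, "fundedBy", record.funder_ror))
    links.append((record.doi, "fundedUnder", record.award_doi))
    for article_doi in record.related_article_dois:
        links.append((record.doi, "relatedTo", article_doi))
    return links


# Placeholder identifiers only; none of these refer to real entities.
record = DatasetRecord(
    doi="10.1234/example-dataset",
    creators=[Creator("A. Researcher", "0000-0000-0000-0000", "https://ror.org/00example1")],
    funder_ror="https://ror.org/00example2",
    award_doi="10.1234/example-award",
    related_article_dois=["10.1234/example-article"],
)
for link in pid_links(record):
    print(link)
```

The dataset DOI on its own would still resolve, but without the record none of these links to ORCID iDs, ROR IDs, the award DOI or the article DOI could be derived from it.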

One example is licensed data, where copies are available in multiple places and we don’t want the metadata exposed broadly, so as not to confuse users. In our Abacus Dataverse (https://abacus.library.ubc.ca/), we share more than 40,000 data files with Handles only. For research data and other research objects, we do mint DOIs: more than 260,000 of them in the last few years.

E.

The definition that DiSSCo (Distributed System of Scientific Collections) uses is: “a persistent identifier is a string (functioning as a symbol/name) that identifies a digital object. The identifier can be persistently and reliably resolved to digitally actionable meaningful information about the identified digital object.”

We don’t say that it is globally unique, but it must be, otherwise reliable resolution can’t occur. Also, we don’t mention metadata because that is a characteristic of the object, not of its identifier. As Eugene (@0000-0002-5119-2271) said, Handles (which are the PIDs DiSSCo will use) are just pointers, but only once they have a PID Record associated with them. Before that, they’re just names, like my name “Alex”, which doesn’t tell you anything about me or how or where to find me.

Metadata is often needed, of course, to tell you something about the identified thing and to make connections to other things, as has been mentioned already.

It’s also not true to say that PIDs exist in perpetuity. They exist only for as long as they are needed, which can be a very long time (>100 years in the case of DiSSCo). How long is a policy decision related to the purpose for which they are being used. There are use cases, for example, where workflows create huge numbers of PIDs for intermediate results during multiple parameter sweeps and data sweep executions, and these PIDs don’t need to be retained beyond the workflow runs.

However, for as long as PIDs do exist, they must persistently identify the correct thing and resolve reliably and stably. It is the identifier that is persistent, not necessarily the thing to which it resolves, although in many use cases that is also the case.
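The distinctions drawn in this post (a name versus a resolvable PID, a stable identifier versus a movable target, and a lifetime set by policy rather than "perpetuity") can be sketched as a toy in-memory registry. Everything here is hypothetical: the class and method names, the `retention` parameter and the made-up `12345/...` Handle-style identifiers are illustrative assumptions, not the API of the Handle System or how DiSSCo implements anything.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class PidEntry:
    target: str                          # where the PID currently resolves to
    expires: Optional[datetime] = None   # None means "keep indefinitely"
    previous_targets: list[str] = field(default_factory=list)


class ToyPidRegistry:
    """Hypothetical registry; not the API of any real PID service."""

    def __init__(self) -> None:
        self._entries: dict[str, PidEntry] = {}

    def mint(self, pid: str, target: str, now: datetime,
             retention: Optional[timedelta] = None) -> None:
        # Uniqueness: an identifier may only be bound once, or resolution is ambiguous.
        if pid in self._entries:
            raise ValueError(f"{pid} is already registered")
        expires = now + retention if retention is not None else None
        self._entries[pid] = PidEntry(target, expires)

    def move(self, pid: str, new_target: str) -> None:
        # The object moved: the identifier stays the same, only its target changes.
        entry = self._entries[pid]
        entry.previous_targets.append(entry.target)
        entry.target = new_target

    def resolve(self, pid: str, now: datetime) -> Optional[str]:
        entry = self._entries.get(pid)
        if entry is None:
            return None  # never registered: just a name, nothing to resolve to
        if entry.expires is not None and now >= entry.expires:
            return None  # retired under the (hypothetical) retention policy
        return entry.target


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
registry = ToyPidRegistry()
registry.mint("12345/specimen-1", "https://old.example.org/s1", now=t0)
registry.mint("12345/run-42-step-3", "https://example.org/runs/42/3",
              now=t0, retention=timedelta(days=7))
registry.move("12345/specimen-1", "https://new.example.org/s1")

later = t0 + timedelta(days=30)
print(registry.resolve("12345/specimen-1", now=later))     # https://new.example.org/s1
print(registry.resolve("12345/run-42-step-3", now=later))  # None
print(registry.resolve("Alex", now=later))                 # None
```

The long-lived specimen PID keeps resolving after its target moves, the intermediate workflow PID stops resolving once its seven-day window has passed, and an unregistered string such as "Alex" never resolves at all.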

Hi @carly.robinson, the DOI Foundation needed a broad definition in anticipation of a wide variety of use cases. This is what we came up with: A DOI is an identifier of an entity on digital networks. It provides a system for persistent and actionable identification and interoperable exchange of managed information on digital networks.

Thanks, @Jonathan_DOI! How do you define or think about persistence? Related to @Hardisty’s great comment: we were tying persistence to identifying an entity in perpetuity, but maybe that isn’t the correct way to think about it.