Choosing entity_id and entity_uri for external references
When you annotate data with an external resource using
HERD.add_ref, each reference records two
fields that identify the external term:
entity_id
    A compact identifier (a CURIE) of the form prefix:identifier (e.g. NCBITaxon:10090). The prefix names the registry or ontology, and the identifier is the term's accession within it.
entity_uri
    The full URL that the entity_id resolves to: a persistent, dereferenceable web address for that exact term.
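As a small illustration of the prefix:identifier form, the sketch below splits a CURIE at its first colon. The helper name split_curie and the Curie tuple are hypothetical and are not part of the HERD API. The check is purely structural and does not confirm that the prefix is actually registered anywhere.

```python
from typing import NamedTuple


class Curie(NamedTuple):
    """The two parts of a compact identifier (hypothetical helper type)."""
    prefix: str
    identifier: str


def split_curie(entity_id: str) -> Curie:
    """Split 'prefix:identifier' at the first colon.

    Raises ValueError when either part is missing. This only checks the
    shape of the string, not whether the prefix exists in a registry.
    """
    prefix, sep, identifier = entity_id.partition(":")
    if not sep or not prefix or not identifier:
        raise ValueError(
            f"not a CURIE of the form prefix:identifier: {entity_id!r}"
        )
    return Curie(prefix, identifier)


curie = split_curie("NCBITaxon:10090")
print(curie.prefix, curie.identifier)
```

A bare accession such as "42" has no prefix, so split_curie rejects it.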
Recommended practice
- Use a CURIE for entity_id. Prefer an identifier whose prefix is registered with bioregistry.io. The Bioregistry is a comprehensive registry of prefixes that maps each CURIE to a canonical, resolvable URL, which avoids the ambiguity of the many overlapping identifier schemes (e.g. NCBITaxon vs. taxonomy vs. NCBI_TAXON).
- Use the resolved URL for entity_uri. The entity_uri should be the URL that the CURIE resolves to. You can look this up by resolving the CURIE through the Bioregistry: visiting https://bioregistry.io/<entity_id> (for example https://bioregistry.io/NCBITaxon:10090) redirects to the canonical provider URL, which is the value to store in entity_uri.
Keeping entity_id and entity_uri consistent in this way means a reader can both
recognize the registry from the compact entity_id and dereference the entity_uri to land
on an authoritative description of the term.
Commonly used registries
All of the registries below are registered with the Bioregistry. The entity_uri column shows
the canonical URL the example entity_id resolves to.
| Prefix | Use for | Common NWB field(s) | Example entity_id | Example entity_uri |
|---|---|---|---|---|
| | Species | | | |
| | Organizations / institutions | | | |
| | People (researchers) | | | |
| | Brain regions (cross-species) | Brain-region location fields [1] | | |
| | Brain regions (Allen Mouse Brain Atlas) | Brain-region location fields [1] | | |
| | Brain regions (Allen Human Brain Atlas) | Brain-region location fields [1] | | |
| | Dandisets | (identifies the dataset as a whole) | | |
Example
# the species of the subject, mapped to NCBI Taxonomy
herd.add_ref(
    container=nwbfile.subject,
    attribute="species",
    key="Mus musculus",
    entity_id="NCBITaxon:10090",
    entity_uri="http://purl.obolibrary.org/obo/NCBITaxon_10090",
)
Resources without individually resolvable URLs
Some resources do not provide a dereferenceable URL for each individual term. For example, many brain atlases (such as the macaque D99 atlas) publish a single document or download for the whole atlas rather than one persistent URL per region.
In that case:
- Put the URL of the resource as a whole in entity_uri (e.g. the atlas's landing or download page).
- Put the resource's identifier for the specific term (for example, the brain area ID used by the atlas) in entity_id.
This keeps every reference dereferenceable to something authoritative (the resource) while still recording the precise term identifier, even when a per-term URL does not exist.
# a region from an atlas that has no per-region URL: identify the region by its
# atlas-specific ID and point entity_uri at the atlas itself
herd.add_ref(
    container=electrodes_table,
    attribute="location",
    key="area_42",
    entity_id="42",
    entity_uri="https://afni.nimh.nih.gov/pub/dist/atlases/macaque/D99_macaque/",
)
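The choice between a per-term URL and a resource-level URL can be captured in a small helper. This is only a sketch: reference_fields is a hypothetical function, not part of HERD. It returns the two keyword arguments discussed on this page, falling back to the resource URL when no per-term URL exists.

```python
from typing import Dict, Optional


def reference_fields(
    term_id: str,
    resource_url: str,
    term_url: Optional[str] = None,
) -> Dict[str, str]:
    """Pick entity_id and entity_uri for one external term.

    term_id      -- a CURIE when one exists, otherwise the resource's own ID
    resource_url -- URL of the resource as a whole (landing or download page)
    term_url     -- per-term URL, or None when the resource has none
    """
    entity_uri = term_url if term_url is not None else resource_url
    return {"entity_id": term_id, "entity_uri": entity_uri}


# Atlas region without a per-region URL: fall back to the atlas URL.
d99_fields = reference_fields(
    term_id="42",
    resource_url="https://afni.nimh.nih.gov/pub/dist/atlases/macaque/D99_macaque/",
)
print(d99_fields)
```

The resulting dictionary could then be unpacked into a call such as herd.add_ref(container=..., attribute=..., key=..., **d99_fields).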
See also
HERD for the full API, and
HERD.add_ref for adding references.