![A single taxonomy structure shown on two screens, the first in a spreadsheet, and the second in a custom taxonomy management tool in Sanity Studio.](https://cdn.sanity.io/images/7v0qvet6/production/66de94e98e4755bbf2139eaf75d385646148801f-3018x1538.png?w=375)
Sanity Studio Taxonomy Migration
In recent articles, I've discussed using AI to generate taxonomy terms and using a purpose-driven design process for shaping those terms into a taxonomy that meets business and user needs. None of this does any good, however, if your shiny new taxonomy sits moldering in a spreadsheet and never makes it to your production environment. (You would be alarmed by how often this happens. Alarmed.)
In this article I'll share the process I use to migrate a taxonomy from a spreadsheet to the taxonomy management tool in Sanity CMS. I realize, of course, that not everyone's into Sanity. So it goes. While this example will focus on a Sanity implementation, the principles and core elements of the migration approach are not platform dependent. If you need a starting point for migrating taxonomy terms on a different platform, read on—I'll give guidance along the way on how you can adapt this process to other contexts.
Embracing Standards
Whatever tech stack you use, embracing open, shared standards is a cornerstone of creating interoperable, reusable content ecosystems. The interoperability that comes with the use of standards is particularly important to the implementation of a taxonomy for (at least) two reasons.
First, taxonomies are living documents. To remain effective, they must change and evolve—sometimes growing, sometimes shrinking or consolidating to more effectively meet business needs. Embracing a standards-based approach helps you support this inherent change by making updates and evolution predictable and by allowing you to support your taxonomy with different tooling as your feature set, team, and business require.
Second, taxonomies often serve as the "connective tissue" between knowledge systems. Taxonomy is literally the shared vocabulary that allows different systems to mean the same thing when they say a particular thing. When your own taxonomy is based on predictable, shared standards, you'll be more likely to communicate directly with other standards-based applications, services, and vocabularies.
The Migration Script
The script we'll use to migrate terms from spreadsheet to CMS-based management tool relies on adherence to the W3C's Simple Knowledge Organization System (SKOS) standard in both the source spreadsheet and the destination tool. This is just one example of how following standards can make moving between environments smooth and painless (or at least less painful). If I eventually need to move this taxonomy out of Sanity in order to take advantage of the features of a standalone taxonomy management tool, working with a standards-based structure will make that move much less painful in turn.
At a high level, this script:
- defines a concept scheme based on script settings
- creates concepts by extracting preferred labels from the taxonomy hierarchy, mapping metadata to SKOS keys used by the Taxonomy Manager Plugin, and adding `broader` concept relationships based on hierarchy
- adds concepts to concept scheme array
- creates a single transaction to import into the designated Sanity content store
The core logical problem that makes this script more than just "content migration" happens in step two. This is where we map the hierarchy from the end-user-friendly spatial layout of the spreadsheet to a set of relationships that represent the formal, standards-based relationships the Taxonomy Manager plugin expects (and enforces).
```javascript
// 3c. Map concepts to the hierarchy
if (conceptLevel == 0) {
  // Top Concept
  hierarchy = [currentConceptRef] // reset hierarchy with new top concept
  previous = currentConceptRef // set previous to the current concept _ref for the next iteration
  topConceptSet = true
} else if (conceptLevel == 1 && topConceptSet == false) {
  // L1 concept with no top concept
  hierarchy = [currentConceptRef] // reset hierarchy with new concept
  previous = currentConceptRef
} else if (
  conceptLevel - 1 == previous.level || // Child concept
  conceptLevel == previous.level || // Sibling concept
  conceptLevel < previous.level // Ancestor concept
) {
  mapChildConcept(
    hierarchy,
    currentConcept,
    broaderDetected,
    conceptLevel,
    currentConceptRef,
    toRemove,
  )
} else if (conceptLevel > previous.level + 1) {
  // Orphan concept: two or more levels lower than previous term
  throw new Error('Inconsistent Hierarchy: Orphan term.')
}
```
As this code block in the mapping function moves through each term in the supplied `.csv`, it tracks which terms immediately precede it and assigns a level of hierarchy based on the relative position of each term. This allows us to detect child, sibling, and ancestor concepts, as well as new instances of Top Concepts or first-level concepts where no Top Concept is present. These relationships are then added as references to the `broader` array of the term in question.
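To see the core idea in isolation, here is a minimal sketch (not the script itself) that derives `broader` relationships from an ordered list of terms and their levels. The `{label, level}` row shape and the `assignBroader` name are assumptions made for illustration. The sketch also simplifies by treating level 0 as the top of the hierarchy without distinguishing Top Concepts from first-level concepts, and it stores plain labels where the real script stores Sanity references.

```javascript
// Hypothetical helper: rows arrive in spreadsheet order; level 0 is the top of the hierarchy.
function assignBroader(rows) {
  const ancestors = [] // ancestors[n] holds the most recent term seen at level n
  return rows.map(({label, level}) => {
    if (level > ancestors.length) {
      // A level was skipped, so there is no parent at level - 1
      throw new Error(`Inconsistent hierarchy: orphan term "${label}".`)
    }
    ancestors.length = level // discard the sibling and deeper branches we just left
    const broader = level > 0 ? [ancestors[level - 1]] : []
    ancestors[level] = label
    return {label, broader}
  })
}

const exampleConcepts = assignBroader([
  {label: 'Vessels', level: 0},
  {label: 'Sailboats', level: 1},
  {label: 'Sloops', level: 2},
  {label: 'Motorboats', level: 1},
])
console.log(exampleConcepts)
```

Keeping one "most recent term" per level is enough to resolve child, sibling, and ancestor cases in a single pass, which is why row order in the spreadsheet matters.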
If you're importing your taxonomy into a different platform, the details of how you connect and upload to that system will change, but the basic structure of extracting the hierarchy and mapping it to a SKOS concept scheme should be adaptable—if not something you can just drop into your implementation (provided you're using standards-compliant tools, of course). Check out the full script on Sanity Exchange for annotations and explanations of the additional functions and processes that support this core structure.
The Workflow
I explored purpose-driven taxonomy design in a recent article. That post provides a high-level process for deciding what terms to include in your taxonomy and how to structure them. Once you've defined, tested, and vetted your taxonomy, you can use these steps to migrate it into your instance of Sanity CMS.
Preparing the Taxonomy
The migration script is designed to work with a `.csv` file exported from this Google Sheets template. Download the "Taxonomy Template" tab (or whatever you choose to rename it) as a `.csv` and add its location as the `sourceFile`.
Taxonomy Preparation Tips
- Use only one term per row. This is modeled in the spreadsheet sample data. The script will throw an error if there is more than one term per row or if there is no term specified for a row that otherwise contains metadata.
- Don't repeat terms. All your concept labels must be unique.
- Use Top Concepts or L1 only, but not both. L1s will be nested under the nearest top concept if top concepts are used. Read more about Top Concepts in the Sanity Taxonomy Manager docs.
- Include definitions! Definitions are helpful for your authors and indexers, as well as to any other "agents" (a-hem: 🤖) that may use your terms. Provide examples and scope notes, too, where appropriate. Read more about scope notes and examples in the W3C's SKOS Primer.
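The first two tips are easy to check before you run a migration. The following is a rough pre-flight sketch, assuming each row has already been parsed into an array of cell strings, with the hierarchy columns first and metadata columns after them. The `validateRows` name, the row shape, and the `levelColumns` parameter are all hypothetical and are not part of the migration script.

```javascript
// Hypothetical pre-flight check. Each row is an array of cell strings;
// the first `levelColumns` cells are hierarchy columns, the rest are metadata.
function validateRows(rows, levelColumns) {
  const errors = []
  const seen = new Set()
  rows.forEach((row, index) => {
    const rowNumber = index + 1
    const terms = row.slice(0, levelColumns).filter((cell) => cell.trim() !== '')
    const hasMetadata = row.slice(levelColumns).some((cell) => cell.trim() !== '')
    if (terms.length > 1) {
      errors.push(`Row ${rowNumber}: more than one term`)
    } else if (terms.length === 0) {
      // Fully empty rows are ignored; metadata with no term is an error
      if (hasMetadata) errors.push(`Row ${rowNumber}: metadata without a term`)
    } else {
      const label = terms[0].trim()
      if (seen.has(label)) errors.push(`Row ${rowNumber}: duplicate term "${label}"`)
      seen.add(label)
    }
  })
  return errors
}
```

Collecting every problem into a list, rather than stopping at the first one, lets you fix the spreadsheet in a single pass.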
Script Settings
Once you have the `.csv` saved and the path updated, add a title for the taxonomy scheme and an optional (but recommended) description.
```javascript
// 🚨 Change these to your values
const sourceFile = 'topic-taxonomy.csv' // required; expects format provided by template linked in description
const conceptSchemeName = 'Topic Taxonomy' // required
const conceptSchemeDescription = 'A taxonomy of topics used in Sanity Studio' // optional
const baseIri = 'https://studio.sanity.io/taxonomy/' // required; your studio domain is usually a good choice
```
You'll also add a `baseIri`. This is the default URI (Uniform Resource Identifier) for your concepts and concept schemes. Unique identifiers allow for the clear and unambiguous identification of concepts across namespaces, for example between `https://shipparts.com/vocab/Bow` and `https://wrappingsupplies.com/vocab/Bow`. The base URIs of these concepts are `https://shipparts.com/vocab/` and `https://wrappingsupplies.com/vocab/`, respectively.
- In most cases, it makes sense for your base URI to be a directory or subdirectory of your website.
- In all cases, the URI you choose should be in a domain that you control.
The `baseIri` can be changed later if you need to adjust namespaces after import.
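As a small illustration of how a base URI and a concept identifier combine into a full identifier, here is a sketch using Node's built-in `URL` class. The `conceptIri` helper is hypothetical, and it assumes a slash-terminated namespace like the examples above. How the Taxonomy Manager plugin composes identifiers internally may differ.

```javascript
// Hypothetical helper: join a base IRI and a concept identifier.
// Assumes a slash-style namespace (not a '#' fragment namespace).
function conceptIri(baseIri, conceptId) {
  const base = baseIri.endsWith('/') ? baseIri : `${baseIri}/`
  return new URL(encodeURIComponent(conceptId), base).href
}

console.log(conceptIri('https://shipparts.com/vocab/', 'Bow'))
console.log(conceptIri('https://wrappingsupplies.com/vocab/', 'Bow'))
```

The two calls share the identifier `Bow` but resolve to different URIs, which is the disambiguation the namespace provides.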
Running the Script
To connect the script with your Sanity Content Lake instance, you'll need to include an `.env` file with:
- your Sanity project ID
- the name of your dataset
- an API token with write access
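A quick check like the one below can catch a missing value before any request is made. This is only a sketch: the variable names `SANITY_PROJECT_ID`, `SANITY_DATASET`, and `SANITY_TOKEN` are hypothetical placeholders, so use whatever names the script's own documentation specifies.

```javascript
// Hypothetical variable names; substitute the names the script actually expects.
const REQUIRED_ENV = ['SANITY_PROJECT_ID', 'SANITY_DATASET', 'SANITY_TOKEN']

function readSanityEnv(env = process.env) {
  const missing = REQUIRED_ENV.filter((name) => !env[name])
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`)
  }
  // Return the three values under short, client-friendly keys
  return {
    projectId: env.SANITY_PROJECT_ID,
    dataset: env.SANITY_DATASET,
    token: env.SANITY_TOKEN,
  }
}
```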
With that in place and dependencies installed, run `node taxo-tool.js` to migrate your taxonomy.
![An animation showing the draft taxonomy in a spreadsheet, a script running in a terminal window, and then the taxonomy available in Sanity Studio.](https://cdn.sanity.io/images/7v0qvet6/production/91378ba52ce0d33295379c6c74863f66d9db0819-2080x998.gif)
Add flags to test or remove the entire scheme (and all its concepts):
- `--test` prints the schema name, the first three concepts, and the total concept count to the terminal. No data is sent to Sanity.
- `--remove` removes the taxonomy as configured in the script from your Studio. Use this if you need to make major changes and would rather start fresh.
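If you're adapting this pattern for another platform, flag handling can stay very small. The sketch below shows one way to read the two flags from `process.argv`. The `parseFlags` helper is an assumption for illustration, and the actual script may parse its arguments differently.

```javascript
// Sketch only: reads the two documented flags from a Node argv array.
function parseFlags(argv) {
  const flags = new Set(argv.slice(2)) // skip the node binary and the script path
  return {test: flags.has('--test'), remove: flags.has('--remove')}
}

const {test: dryRun, remove: removeScheme} = parseFlags(process.argv)
console.log({dryRun, removeScheme})
```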
The script is idempotent, so if you update your schema and run it again, it will only change concepts that have been modified. Note that concept and scheme IDs are based on a hash of the concept preferred label (or the scheme title), so if you change labels or the scheme title, you will end up with duplicate entries. In this case you may just want to delete the entire scheme and migrate a fresh version.
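To make the link between labels and IDs concrete, here is a minimal sketch of a label-derived identifier. It assumes SHA-256 truncated to 12 hex characters and a hypothetical `idFromLabel` helper. The script's actual hashing scheme may well differ, but the consequence is the same.

```javascript
const {createHash} = require('node:crypto')

// Assumption: SHA-256 truncated to 12 hex characters. The real script's
// hashing scheme is not shown in this article and may differ.
function idFromLabel(label) {
  return createHash('sha256').update(label).digest('hex').slice(0, 12)
}

console.log(idFromLabel('Bow') === idFromLabel('Bow')) // same label, same ID
console.log(idFromLabel('Bow') === idFromLabel('Bows')) // edited label, different ID
```

Because the ID depends only on the label, rerunning the import with unchanged labels targets the same documents, while an edited label produces a new ID and therefore a new document alongside the old one.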
Next Steps
My hope is that this post helps those of you using Sanity—or another standards-compliant CMS—avoid the embarrassing fate of being stuck with taxonomy shelfware.
The next step, of course, is to start tagging content. In a future post, I'll explore some ways you can (responsibly) use LLMs to help with that. I'll also discuss how your structure and term definitions (you are defining your terms, right? right??? 🧐) can help you create consistent, reliable results using purposefully coordinated, resource-appropriate models.