Sanity Studio Taxonomy Migration

In recent articles, I've discussed using AI to generate taxonomy terms and using a purpose-driven design process for shaping those terms into a taxonomy that meets business and user needs. None of this does any good, however, if your shiny new taxonomy sits moldering in a spreadsheet and never makes it to your production environment. (You would be alarmed by how often this happens. Alarmed.)

In this article I'll share the process I use to migrate a taxonomy from a spreadsheet to the taxonomy management tool in Sanity CMS. I realize, of course, that not everyone's into Sanity. So it goes. While this example will focus on a Sanity implementation, the principles and core elements of the migration approach are not platform dependent. If you need a starting point for migrating taxonomy terms on a different platform, read on—I'll give guidance along the way on how you can adapt this process to other contexts.

Embracing Standards

Whatever tech stack you use, embracing open, shared standards is a cornerstone of creating interoperable, reusable content ecosystems. The interoperability that comes with the use of standards is particularly important to the implementation of a taxonomy for (at least) two reasons.

First, taxonomies are living documents. To remain effective, they must change and evolve—sometimes growing, sometimes shrinking or consolidating to more effectively meet business needs. Embracing a standards-based approach helps you support this inherent change by making updates and evolution predictable and by allowing you to support your taxonomy with different tooling as your feature set, team, and business require.

Second, taxonomies often serve as the "connective tissue" between knowledge systems. Taxonomy is literally the shared vocabulary that allows different systems to mean the same thing when they say a particular thing. When your own taxonomy is based on predictable, shared standards, you'll be more likely to communicate directly with other standards-based applications, services, and vocabularies.

The Migration Script

The script we'll use to migrate terms from spreadsheet to CMS-based management tool relies on adherence to the W3C's Simple Knowledge Organization System (SKOS) standard in both the source spreadsheet and the destination tool. This is just one example of how following standards can make moving between environments smooth and painless (or at least less painful). If I eventually need to move this taxonomy out of Sanity in order to take advantage of the features of a standalone taxonomy management tool, working with a standards-based structure will make that move much less painful in turn.

At a high level, this script:

  1. defines a concept scheme based on script settings
  2. creates concepts by extracting preferred labels from the taxonomy hierarchy, mapping metadata to SKOS keys used by the Taxonomy Manager Plugin, and adding `broader` concept relationships based on hierarchy
  3. adds concepts to the concept scheme array
  4. creates a single transaction to import into the designated Sanity content store
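To make those steps a little more concrete, here is a minimal sketch of the kind of documents steps 1 through 3 might produce. This is not the script itself. The type and field names (`skosConceptScheme`, `skosConcept`, `prefLabel`, `broader`, `topConcepts`, `concepts`) reflect my understanding of the Taxonomy Manager plugin's schema and should be checked against the plugin; `makeId`, `ref`, and `buildDocuments` are hypothetical helpers invented for this illustration.

```javascript
// Hypothetical stand-in for the script's ID generation: a simple slug.
const makeId = (label) => label.toLowerCase().replace(/[^a-z0-9]+/g, '-')

// Build a Sanity-style reference object pointing at another document.
const ref = (id) => ({ _type: 'reference', _ref: id, _key: id })

// rows: [{ label, broaderLabel }]; broaderLabel is omitted for top concepts.
function buildDocuments(schemeName, rows) {
  // Step 2: one concept document per row, with a `broader` reference if it has a parent.
  const concepts = rows.map(({ label, broaderLabel }) => ({
    _id: makeId(label),
    _type: 'skosConcept', // assumed type name
    prefLabel: label,
    broader: broaderLabel ? [ref(makeId(broaderLabel))] : [],
  }))

  // Steps 1 and 3: the scheme document, with concepts added by reference.
  const scheme = {
    _id: makeId(schemeName),
    _type: 'skosConceptScheme', // assumed type name
    title: schemeName,
    topConcepts: concepts.filter((c) => c.broader.length === 0).map((c) => ref(c._id)),
    concepts: concepts.filter((c) => c.broader.length > 0).map((c) => ref(c._id)),
  }

  // Step 4 would hand this list to a single transaction for import.
  return [scheme, ...concepts]
}
```

Calling `buildDocuments('Topic Taxonomy', [{ label: 'Boats' }, { label: 'Sail Boats', broaderLabel: 'Boats' }])` returns three plain objects: the scheme, then the two concepts.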

The core logical problem that makes this script more than just "content migration" happens in step two. This is where we map the hierarchy from the end-user-friendly spatial layout of the spreadsheet to a set of relationships that represent the formal, standards-based relationships the Taxonomy Manager plugin expects (and enforces).

// 3c. Map concepts to the hierarchy
  if (conceptLevel == 0) {
    // Top Concept
    hierarchy = [currentConceptRef] // reset hierarchy with new top concept
    previous = currentConceptRef // set previous to the current concept _ref for the next iteration
    topConceptSet = true
  } else if (conceptLevel == 1 && topConceptSet == false) {
    // L1 concept with no top concept
    hierarchy = [currentConceptRef] // reset hierarchy with new concept
    previous = currentConceptRef
  } else if (
    conceptLevel - 1 == previous.level || // Child concept
    conceptLevel == previous.level || // Sibling concept
    conceptLevel < previous.level // Ancestor concept
  ) {
    mapChildConcept(
      hierarchy,
      currentConcept,
      broaderDetected,
      conceptLevel,
      currentConceptRef,
      toRemove,
    )
  } else if (conceptLevel > previous.level + 1) {
    // Orphan concept: two or more levels lower than previous term
    throw new Error('Inconsistent Hierarchy: Orphan term.')
  }

As this block of the mapping function moves through each term in the supplied .csv, it compares the term's level with that of the term immediately before it and places the term in the hierarchy based on that relative position. This lets us detect child, sibling, and ancestor concepts, as well as new Top Concepts and first-level concepts that appear where no Top Concept is present. The resulting parent relationships are then added as references to the `broader` array of the term in question.
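If it helps to see the idea in isolation, here is a simplified sketch of the same kind of mapping, written as a standalone function. It is not the script's `mapChildConcept`: the `assignBroader` name and the `{ id, level }` row shape are hypothetical, and for brevity it treats only level 0 as a root, so it leaves out the "first-level concept with no Top Concept" case the script handles.

```javascript
// rows: concepts in spreadsheet order, each with an id and a 0-based level.
function assignBroader(rows) {
  const path = [] // path[n] is the most recent concept seen at level n

  return rows.map(({ id, level }) => {
    if (level > path.length) {
      // The term sits two or more levels below the previous one: no parent to attach to.
      throw new Error(`Inconsistent hierarchy: orphan term "${id}".`)
    }

    // Truncate the path so it holds only this term's ancestors.
    // This covers the child, sibling, and "back up to an ancestor level" cases alike.
    path.length = level

    const broader = level > 0 ? [path[level - 1]] : []
    path.push(id)
    return { id, broader }
  })
}
```

The design choice here is to keep a running path of ancestors rather than a single `previous` pointer, which lets one branch cover all three relative positions.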

If you're importing your taxonomy into a different platform, the details of how you connect and upload to that system will change, but the basic structure of extracting the hierarchy and mapping it to a SKOS concept scheme should be adaptable—if not something you can just drop into your implementation (provided you're using standards-compliant tools, of course). Check out the full script on Sanity Exchange for annotations and explanations of the additional functions and processes that support this core structure.

The Workflow

I explored purpose-driven taxonomy design in a recent article. That post provides a high-level process for deciding what terms to include in your taxonomy and how to structure them. Once you've defined, tested, and vetted your taxonomy, you can use these steps to migrate it into your instance of Sanity CMS.

Preparing the Taxonomy

The migration script is designed to work with a .csv file exported from this Google Sheets template. Download the "Taxonomy Template" tab (or whatever you choose to rename it) as a .csv and add its location as the sourceFile.

Script Settings

Once you have the .csv saved and the path updated, add a title for the taxonomy scheme and an optional (but recommended) description.

// 🚨 Change these to your values

const sourceFile = 'topic-taxonomy.csv' // required; expects format provided by template linked in description
const conceptSchemeName = 'Topic Taxonomy' // required
const conceptSchemeDescription = 'A taxonomy of topics used in Sanity Studio' // optional
const baseIri = 'https://studio.sanity.io/taxonomy/' // required; your studio domain is usually a good choice

You'll also add a baseIri. This is the default URI (Uniform Resource Identifier) for your concepts and concept schemes. Unique identifiers allow for the clear and unambiguous identification of concepts across namespaces, for example between https://shipparts.com/vocab/Bow and https://wrappingsupplies.com/vocab/Bow. The base URIs of these concepts are https://shipparts.com/vocab/ and https://wrappingsupplies.com/vocab/, respectively.

The baseIri can be changed later if you need to adjust namespaces after import.
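As a small illustration of how a base IRI and a concept identifier combine into a full concept URI, here is a sketch using Node's built-in `URL` class. The `conceptUri` function is hypothetical, and the migration script and plugin may assemble identifiers differently.

```javascript
// Join a base IRI and a concept identifier into a full concept URI.
function conceptUri(baseIri, conceptId) {
  // Make sure the base ends in a slash so the identifier is appended, not substituted.
  const base = baseIri.endsWith('/') ? baseIri : `${baseIri}/`
  return new URL(encodeURIComponent(conceptId), base).href
}

console.log(conceptUri('https://shipparts.com/vocab/', 'Bow'))
// → https://shipparts.com/vocab/Bow
console.log(conceptUri('https://wrappingsupplies.com/vocab/', 'Bow'))
// → https://wrappingsupplies.com/vocab/Bow
```

The two concepts share the identifier `Bow` but remain distinct because their namespaces differ.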

Running the Script

To connect the script with your Sanity Content Lake instance, you'll need to include an .env file with:

With that in place and dependencies installed, run node taxo-tool.js to migrate your taxonomy.

Figure: Migrating the taxonomy to the Sanity Taxonomy Manager plugin. (An animation showing the draft taxonomy in a spreadsheet, a script running in a terminal window, and then the taxonomy available in Sanity Studio.)

Add flags to test or remove the entire scheme (and all its concepts):

The script is idempotent, so if you update your taxonomy and run it again, it will only change concepts that have been modified. Note that concept and scheme IDs are based on a hash of the concept's preferred label (or the scheme title), so if you change labels or the scheme title, you will end up with duplicate entries. In this case you may just want to delete the entire scheme and migrate a fresh version.
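To see why label changes produce duplicates, consider this sketch of label-derived IDs. The hash algorithm and ID format the script actually uses aren't specified here, so `idFromLabel`, SHA-256, and the 12-character truncation are all assumptions made for illustration.

```javascript
const { createHash } = require('node:crypto')

// Hypothetical: derive a stable document ID from a preferred label.
function idFromLabel(prefLabel) {
  return createHash('sha256').update(prefLabel).digest('hex').slice(0, 12)
}

// Same label on every run gives the same ID, so a re-run targets the same document.
console.log(idFromLabel('Bow') === idFromLabel('Bow')) // → true

// A renamed label hashes to a new ID, so the old document is left behind.
console.log(idFromLabel('Bow') === idFromLabel('Bow (ship)')) // → false
```

Because the ID is a pure function of the label, the importer has no way to know that a relabeled concept is "the same" concept as before.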

Next Steps

My hope is that this post helps those of you using Sanity—or another standards-compliant CMS—avoid the embarrassing fate of being stuck with taxonomy shelfware.

The next step, of course, is to start tagging content. In a future post, I'll explore some ways you can (responsibly) use LLMs to help with that. I'll also discuss how your structure and term definitions (you are defining your terms, right? right??? 🧐) can help you create consistent, reliable results using purposefully coordinated, resource-appropriate models.