Organizing Information for Retrieval

On Tables of Contents, Indices, Glossaries, Citations, and Search

This week, I’ve been reviewing a past project’s documentation. It’s a bit bumpy.

Problem with Existing Documentation

I struggle with it for a variety of reasons:

  • I didn’t participate in the generation of the project documentation, so I don’t have insight into how and why the team chose those specific things to document in those specific ways.
  • It is unclear what is published and what was abandoned.
  • There are minimal wayfinding tools.
  • There’s almost certainly context floating in the ether.

I’m not expecting a complete transcription of all conversations. Instead I’m digging into how we’re documenting or could be documenting projects.

For example, meeting minutes should never be transcriptions but instead a synthesis of what was discussed and the resulting action items.

In other words, I’m observing that this project’s documentation (and many others) is unclear on the level of abstraction. To me, a directory structure implies that items at the root should be more general and, as you drill into the subfolders, they can be more specific. But is that a shared assumption? Have I taken the time to share my assumption?

Enter the Hivemind

As I was workshopping this blog post with a coworker friend, they provided the following insight:

I think there’s something about losing the imperative to organize making the imperative for quality or standards or consistency seem optional as well.

And I think there’s something about the daily flood of email inboxes or social media feeds that make people forget that there is also this function where you write things down in order for it to become formal and lasting. There’s a transience problem as well.

Both of us have been reading and referencing Cal Newport’s A World Without Email; in particular, the phrase hyperactive hive mind.

And Sarah’s right: we don’t separate what we want to be formal and lasting from what we consider transient. Who deletes their email when you can archive it?

And I think the hyperactive hive mind has played a role in the disorganization of information.

Why? Because the hive mind’s favored approach to retrieving information is to ask the hive mind.

Someone in the hive mind has tucked away an email thread that can elucidate the answer to the question. Another person remembers writing something somewhere and can track it down for you. And if you’re lucky, someone in that hive mind keeps a personal wiki and can recall anything that they’ve stuck in there.

I digress.

Information Retrieval Tools

I want to circle back to my specific problem with the project’s documentation, and filter it through a lens of information retrieval. Note, I don’t have a library science degree, but I do have a lot of experience with information management.

I think of five information retrieval tools that I use a lot:

  1. Table of Contents
  2. Index
  3. Glossary
  4. Citations
  5. Search

Table of Contents

At the front of most reference books, you’ll find a table of contents. Most every Role Playing Game (RPG) of more than a handful of pages has a table of contents. Digital documents also provide mechanisms for exposing a table of contents.

In a way, the tree command in Unix can generate a table of contents for a directory. This is what I used in . A directory’s table of contents does assume descriptive filenames.
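
The idea of a directory doubling as a table of contents can be approximated in a few lines of Python. This is only a sketch, not the tree command itself: the function name directory_toc and the two-space indent are illustrative assumptions, folders are listed before files as an arbitrary choice, and it does not guard against symlink loops.

```python
from pathlib import Path


def directory_toc(root, indent="  "):
    """Return an indented outline of every file and folder below root."""
    root = Path(root)
    lines = [root.name + "/"]

    def walk(folder, depth):
        # Folders first, then files; each group sorted by name.
        entries = sorted(
            folder.iterdir(),
            key=lambda p: (p.is_file(), p.name.lower()),
        )
        for entry in entries:
            suffix = "/" if entry.is_dir() else ""
            lines.append(indent * depth + entry.name + suffix)
            if entry.is_dir():
                walk(entry, depth + 1)

    walk(root, 1)
    return "\n".join(lines)
```

As with tree, the outline is only as informative as the names it prints, which is why descriptive filenames matter.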

A useful table of contents requires thoughtful construction—and management of—a directory or document’s hierarchy. That thoughtfulness comes by creating a process that gives consideration. For example, the “Publish” activity in .

Index

At the back of most reference books, you’ll find an index. Words identified as important are added to the index, with references to the pages where that indexed word or phrase is deemed important. Who identifies the words? Who deems the importance of a page relative to that word? That depends. A well-groomed index is a functional work of beauty.

Taking the time to craft an index demonstrates that someone’s attending to the corpus and gleaning insight into the relevant terms of that corpus. A reader referencing the index can see the terms that the indexer found important and relevant.
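
To make the mechanics concrete, here is a minimal sketch of the lookup half of indexing. It assumes the corpus is already split into numbered pages and that a person has already chosen the terms; build_index and format_index are hypothetical names, and the code does none of the judgment about which pages truly matter for a term.

```python
import re


def build_index(pages, terms):
    """Map each chosen term to the sorted page numbers that mention it."""
    index = {}
    for term in terms:
        # Whole-word, case-insensitive match, so "index" skips "indexes".
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        hits = sorted(n for n, text in pages.items() if pattern.search(text))
        if hits:
            index[term] = hits
    return index


def format_index(index):
    """Render the index alphabetically, one 'term: pages' line per entry."""
    return "\n".join(
        f"{term}: {', '.join(str(n) for n in index[term])}"
        for term in sorted(index, key=str.lower)
    )
```

The list of terms is the interesting input here: it is where the indexer’s attention to the corpus shows up, and no amount of code supplies it.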

Glossary

You might find a glossary in a reference book. Where an index entry for a term might point you to the page that contextualizes the term, the glossary provides the definition. Depending on the length of the definition, you might overload an index with glossary-type behavior.

For a project, a glossary is an invaluable tool to help foster an understanding of the scope of the corpus. By explaining the terms in the context of the project, you’re creating a more inclusive project. Maybe a term the project uses is different from the usage to which I’m accustomed; the glossary helps highlight that difference.

In fact, the presence of a glossary entry that does not have an expected definition can lead to clarifying conversations and a deeper understanding.

I’ve also used glossaries to help me better craft search queries. By scanning the glossary, I can key in on a few terms and the language used around those terms to refine my search.

Citations

I include citations as another means of information retrieval. When I’m writing documentation for a project, I’ll cite sources outside the project. When I do that, I’m often saying that this bit of information is supported by another source.

The citation gives the reader an opportunity to look to materials that the citing author deemed noteworthy for contributing to that particular moment in the document. In addition, in looking at a corpus’s citations, you can get a sense of the variety of sources they used to inform the project.

This is one reason why I include an page; I don’t expect that to be overly useful but it might be useful to see who I take the time to cite. Note to self, I just looked at that page, and maybe I should bring some domain level organization to it.

Search

This is the brute force method of information retrieval. At its most basic, you enter a word or phrase and find all instances. If there’s a sizeable result set, you’ll need to dig deeper into the context.

Depending on the tools, you can refine your search. You can often specify the type of file, creation or modification date ranges, or other facets. The effectiveness of your search benefits from an understanding of the corpus; a well-groomed table of contents and index provide an excellent mechanism for understanding that corpus.
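
A brute force search with a couple of facets might look like the following sketch. It is an illustration under assumptions rather than a description of any particular tool: the function name search, the suffix facet, and the modified_after facet are all made up for this example, and it simply reads every file under a folder as text.

```python
from datetime import datetime
from pathlib import Path


def search(root, phrase, suffix=None, modified_after=None):
    """Return (path, line number, line) for every line containing phrase."""
    needle = phrase.lower()
    results = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        # Facet: restrict by file type, for example ".txt".
        if suffix is not None and path.suffix != suffix:
            continue
        # Facet: skip files last modified before the given datetime.
        if modified_after is not None:
            modified = datetime.fromtimestamp(path.stat().st_mtime)
            if modified < modified_after:
                continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line.lower():
                results.append((path, number, line.strip()))
    return results
```

Every hit still has to be read in context, and choosing a good phrase or facet depends on already knowing something about the corpus.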

This is one reason I provide a which includes a , an index of and . It’s also why I added the subtle navigation links at the top of each post. All of those tools provide different means of orienting to the corpus of Take on Rules.

Realization

All of this is a lead-up to the paragraph that sprang into my mind before I started writing this post:

Organizing information for groups has the same considerations as organizing information across time and space. The almighty search sought to shatter the need for organization, but along the way, it lost itself to the desirous ways of surveillance capitalism.

I know a bit about Search Engine Optimization (SEO), and there’s always a conversation about content organization and keywords. At the same time, search engine providers must navigate the hostile waters of people trying to game the system that the providers put in place.

In the case of a project, with a narrow corpus, you may not want to rely on the same search engine providers’ tooling; after all, the goals of their service may not be congruent with how you went about crafting/dumping your information.

In other words, you may not want to fully cede information retrieval to a general purpose search engine. So you might want to consider how you’re writing your documentation.

tl;dr - Craft your documentation to help people orient to the domain of that which you documented. To be effective, this takes time and discipline. And search should be your method of last resort.

And if you’re reading this, I am smirking. I have the words indexes and indices in the same document. My college professor would be proud that I continue to pluralize index as indices. But in the era of search engines, the verb “to index” emerged. And indexes sounds right for that case. Cheers, Merritt!