This article is my first read through of the code changes we added to a Hyku implementation for the University of Louisville. My task is to get an understanding of what it might take to extract a general solution.
The goal of the solution is to allow Hyku implementations to have Uniform Resource Locators (URLs 📖) that are “human-readable”. In most implementations the URLs are derived from the object’s opaque identifier.
Many folks want the URL to look like https://my.hyku/collections/the-jeremy-friesen-collection
instead of https://my.hyku/collections/dr23zy19b
(or whatever opaque configuration you choose with your Hyrax minter configuration.)
Secondary goals are around implementation and ownership costs. What effort does it take to add to your application? Both data migrations and code changes. Also, what effort does it take to upgrade your application once you have this customization in place.
What follows is my quick read-through and how I did my quick read-through.
Checking the Repository
Based on conversations and reviews with the team, the foundational commit for preparing for extracting the functionality was in Secure Hash Algorithm (SHA 📖) 6f2e01f4eca1a5cf68ccd29c60f56740ace88a3b
.
It follows that any commits since that point in time may also be relevant for extraction.
In the local repository, I pulled down the latest main
branch.
Then ran the following:
git log 6f2e01f4~.. --reverse --no-merges --pretty=format: \
'| [[https://gitlab.com/notch8/louisville-hyku/-/commit/%H][%h]] | %s |' | \
pbcopy
Let’s break that down:
git log
- The Git 📖 command to run.
6f2e01f4~..
- All commits since and including
6f2e01f4
; the short form of the longer aforemenntions SHA. --reverse
- I want the “top” of the list to be the foundational commit.
--no-merges
- Skip showing merge commits; these are the commits that merged the relevant commit into main.
--pretty=format: '| [[https://gitlab.com/notch8/louisville-hyku/-/commit/%H][%h]] | %s |' |
- Render the log in specified format (e.g.
| link to commit | subject |
). Why this format? Because I’m going to copy it into an Org-Mode table format. The pipe character (e.g.|
) is the table cell demarcation, and the[[url][label]]
is to create a web-link. pbcopy
- Notice this last pipe character is not within the apostrophe (e.g.
'
); I’m “piping” the output of the log to my paste buffer using thepbcopy
command.
The results are the body of the Table 241: Commits that might be related to the preliminary slug extraction.
SHA | Subject |
---|---|
6f2e01f4 | Add custom slugs for works and collections |
8e9b56d1 | Remove duplicated form field |
639c8d2b | i149 - fix timing issue that cause child ocr search bug |
0717aebf | 150 - manifest bug prevented uv from displaying when work had no identifier |
50aea645 | i154 - fix nested collections |
6f0dff81 | i153 - fix collection thumbnail dropdown options |
c9500c56 | i151 - item count on collections |
faca33ce | i152 - Collection links would lead user to unauthorize page |
42d323e3 | I155 - allow users to still use ids in url path |
1f0ec86a | update-deploy |
91abe8f0 | 151 - revert changes because alternative introduced bug |
03169413 | 153 - collection thumbnail bug |
94d8df8c | Adding SLUG documentation |
Another way that I’m “orienting” to the code is to see what all files we introduced in the foundational 6f2e01f4
commit. I ran the following to look at the files: git show --stat 6f2e01f4
. The output does not show me the specific change simply what files changed.
By default each file includes metadata on the size of change. For purposes of reviewing, I don’t need that.
Orienting to the Foundational Commit
I want to understand what the initial change was; it’s scope and then work from there.
Below are the files touched as part of the 6f2e01f4
commit.
We have 47 changed files to consider:
- Gemfile
- Gemfile.lock
- app/actors/hyrax/actors/create_with_files_actor.rb
- app/indexers/app_indexer.rb
- app/indexers/collection_indexer.rb
- app/models/art.rb
- app/models/collection.rb
- app/models/custom_slugs/custom_slugs_decorator.rb
- app/models/custom_slugs/custom_slugs_term.rb
- app/models/custom_slugs/manipulations.rb
- app/models/custom_slugs/slug_README.md
- app/models/custom_slugs/slug_behavior.rb
- app/models/custom_slugs/slug_indexer.rb
- app/models/custom_slugs/slug_solr_attributes.rb
- app/models/generic_work.rb
- app/models/image.rb
- app/models/solr_document.rb
- app/models/text.rb
- app/presenters/hyrax/iiif_manifest_presenter.rb
- app/presenters/hyrax/presents_attributes_decorator.rb
- app/services/hyrax/manifest_builder_service_decorator.rb
- app/views/hyrax/base/_form_share.html.erb
- spec/factories/art.rb
- spec/factories/collections.rb
- spec/factories/generic_works.rb
- spec/factories/images.rb
- spec/factories/text.rb
- spec/features/oai_pmh_spec.rb
- spec/features/show_page_theme_spec.rb
- spec/forms/hyrax/art_form_spec.rb
- spec/forms/hyrax/forms/collection_form_spec.rb
- spec/forms/hyrax/generic_work_form_spec.rb
- spec/forms/hyrax/image_form_spec.rb
- spec/forms/hyrax/text_form_spec.rb
- spec/indexers/app_indexer_spec.rb
- spec/indexers/collection_indexer_spec.rb
- spec/models/art_spec.rb
- spec/models/collection_spec.rb
- spec/models/image_spec.rb
- spec/models/text_spec.rb
- spec/presenters/hyku/work_show_presenter_spec.rb
- spec/requests/institution_visibility_spec.rb
- spec/routing/curation_concerns_url_spec.rb
- spec/support/custom_slugs/form_includes_custom_slugs.rb
- spec/support/custom_slugs/form_requires_custom_slugs.rb
- spec/support/custom_slugs/indexes_custom_slugs.rb
- spec/support/custom_slugs/object_includes_custom_slugs.rb
Let’s dig into the changes from 6f2e01f4
; I’m using git show 6f2e01f4
to show me the diff of what changed as part of this commit.
I’m not going to print out the results as that’s many hundreds of lines.
To get a handle on this I want to look at the second level categories (e.g. spec/support
, spec/models
, app/models
, app/presenters
, etc.). I’m looking for the patterns of change. For this work, I can check the sub-directories git show 6f2e01f4 -- app/models
.
To find the categories I run the following command: git show --stat 6f2e01f4 | rg "^ \w+/\w+" --only-matching | uniq
.
git show --stat 6f2e01f4
- Lists touched files for this commit.
rg "^ \w+/\w+" --only-matching
- From the above list, generate a list of each of those files top-level directory and immediate sub-directory.
uniq
- Make a unique list from the above directory.
Below is the list to consider and further exploration:
- app/actors
- This change looks to be un-related to the specifics of slug work; it instead solved an issue with the ActorStack processing.
- app/indexers
- We add
include CustomSlugs::SlugIndexer
into two application specific indexers. Something to note and document. - app/models
- We add some new modules and include them in the models. This is a significant chunk of the modifications for slugs.
- app/presenters
- These changes have two purposes. First, casting to slug or ids for
Hyrax::ManifestPresenter#member_ids
. Second, some changes that may not be related to the extraction. Also, it’s unclear if theapp/presenters/hyrax/iiif_manifest_presenter.rb
is copied in and overridden or is generated or is a new class for this project. - app/views
- This looks to be a bug fix for an upstream Hyrax issue; Something to consider.
- spec/factories
- Adjust how are referencing creating, via FactoryBot, a slug (via the
identifier
property). - spec/features
- Changes to the spec to move from visit
work.id
to visitwork.to_param
; this is likely a good upstream contribution. - spec/forms
- Adding some specs that leverage the shared examples
"custom_slugs"
and"require_slugs"
something to look forward to inspec/support
. - spec/indexers
- Adding some specs that leverage the shared examples
"indexes_custom_slugs"
. - spec/models
- Adding some specs that leverage the shared examples
"object includes slugs"
. - spec/presenters
- Disabling specs that are very specific to how identifiers were generated in
spec/factories
. The original specs are rather dubious. Something to change upstream. - spec/requests
- Changes to the spec to move from visit
work.id
to visitwork.to_param
; this is likely a good upstream contribution. - spec/routing
- Adding specs demontrating that we can route either to a work by it’s Fedora ID or by it’s slug.
- spec/support
- The shared examples that appear ripe for consideration.
This does omit the Gemfile
and Gemfile.lock
.
Moving from the Foundational Commit Through History
With 6f2e01f4
quickly scanned, I want to see how the other commits relate.
- 6f2e01f4 Add custom slugs for works and collections
- [Include with mods] Our foundational work; as mentioned above
- 8e9b56d1 Remove duplicated form field
- [Include] This commit only modifies
app/models/custom_slugs/custom_slugs_decorator.rb
, which is a foundational file. - 639c8d2b i149 - fix timing issue that cause child ocr search bug
- [Skip] This does not appear related to slug work.
- 0717aebf 150 - manifest bug prevented uv from displaying when work had no identifier
- [Include] This commit only modifies
app/models/custom_slugs/custom_slugs_decorator.rb
, which is a foundational file. - 50aea645 i154 - fix nested collections
- [Include] This commit only modifies
app/models/custom_slugs/custom_slugs_decorator.rb
, which is a foundational file. - 6f0dff81 i153 - fix collection thumbnail dropdown options
- [Include] This commit only modifies
app/models/custom_slugs/custom_slugs_decorator.rb
, which is a foundational file. - c9500c56 i151 - item count on collections
- [Skip] On the surface this looks like something to include. But
91abe8f0
later reverts this change. - faca33ce i152 - Collection links would lead user to unauthorize page
- [Include?] This looks to be an amalgamation of both slug work and specific work for the client.
- 42d323e3 I155 - allow users to still use ids in url path
- [Include with mod?] This commit ensures we cast the given “ID”; there are some high-level changes that we might be able to instead handle lower in a not-often-leveraged configuration spot. The feature is necessary, but I might want to revisit the general implementation if we’re extracting code.
- 1f0ec86a update-deploy
- [Skip] not relevant for slugs.
- 91abe8f0 151 - revert changes because alternative introduced bug
- [Include with mod?] this demonstrated required places to change. My question is, should those overwritten methods rely on two different methods (e.g.
id
andto_param
); further work is necessary to minimize directo overrides. - 03169413 153 - collection thumbnail bug
- [Include] this demonstrates functionality necessary to fix a bug.
- 94d8df8c Adding SLUG documentation
- [Skip with a caveat] The over-arching README is something to consider for the gem.
In quickly reading through the code most of the commits are relevant for extraction. There is some work to consider for how best to extract and minimize overrides.
Conclusion
This is my first pass at reading through the code changes. That’s not quite true. I did help pair for quite a bit of this work. The purpose of the write-up is two-fold: to orient to the changes and write down my process. From this analysis, I’ll start looking towards specific implementation details.
Throughout this work, I leveraged Git and rg, looking only at the changes via the command-line.
Next Steps
I’m going to let my mind mull on this overnight. I’m going to “prime” the wandering with a bit more pondering.
I want to get all of the relevant code into a single commit; mostly so I can see the overlay of changes via diffing tools. To do this I
would create a branch from main
, fire up an interactive Git rebase
session starting from the foundational commit 6f2e01f4
.
I’d drop the commits I marked as [Skip]
and squash together everything relevant. Then I’d look at what else I might remove from the one commit, make the changes and amend
my commit.
Then I’ll need to look at what we have and articulate a game plan for things that should go into Hyku, Hyrax, or a new gem (Hyrax::LittleSlugger
perhaps).
The above outlined tasks seem like a great exercise for tomorrow morning.