Next steps for Walters Art Museum data

Today I attempted to refactor the Walters Art Museum provider API script (see this GitHub issue). While working on this refactor, I noticed that I could neither use the testing sandbox provided by the API nor create a user account to receive an API key. We have tried reaching out a number of times over the past year to ask for the CC Search API key to no avail.

As it stands, we have no way of confirming that the API will be accessible once this DAG is turned on. We only have 16,948 records in the catalog/API (confirmed in both places). The last update to the API codebase was made on August 7th, 2015, and the last update to any of our data was December 1st, 2020. The media that our data references still exists AFAICT.

Given all this context, I propose that we:

  1. Create a one-off script to populate height, width, filesize, and filetype (see the filesize/filetype and height/width backfill GitHub issues). This can likely be done without an API key using the direct image URLs we have in our database.
  2. Move the Walters provider script into the Retired DAGs directory and decommission the DAG.
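
To make step 1 more concrete, here is a rough sketch of how the four fields could be derived from nothing but the image bytes behind a direct URL. This is not the actual backfill script: the function names (`fetch_image`, `image_metadata`, `_jpeg_size`) are hypothetical, it only handles PNG and JPEG, and a real script would more likely lean on an imaging library and write the results back to the catalog.

```python
import struct
import urllib.request


def fetch_image(url: str, timeout: float = 30.0) -> bytes:
    """Download the raw bytes behind a direct image URL (no API key needed)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def _jpeg_size(data: bytes):
    """Walk JPEG segments until a start-of-frame marker gives the dimensions."""
    i = 2  # skip the 2-byte start-of-image marker
    while i + 9 <= len(data):
        if data[i] != 0xFF:
            return None, None  # malformed stream, give up
        marker = data[i + 1]
        if marker == 0xFF:
            i += 1  # fill byte before a marker
            continue
        # SOF markers are 0xC0-0xCF, excluding DHT (C4), JPG (C8), and DAC (CC).
        if 0xC0 <= marker <= 0xCF and marker not in (0xC4, 0xC8, 0xCC):
            height, width = struct.unpack(">HH", data[i + 5:i + 9])
            return width, height
        # Any other segment: its length field includes the two length bytes.
        (segment_length,) = struct.unpack(">H", data[i + 2:i + 4])
        i += 2 + segment_length
    return None, None


def image_metadata(data: bytes) -> dict:
    """Derive filesize, filetype, width, and height from raw image bytes."""
    meta = {"filesize": len(data), "filetype": None, "width": None, "height": None}
    if data[:8] == b"\x89PNG\r\n\x1a\n" and len(data) >= 24:
        # PNG: width and height are big-endian 32-bit ints at offsets 16 and 20.
        meta["filetype"] = "png"
        meta["width"], meta["height"] = struct.unpack(">II", data[16:24])
    elif data[:2] == b"\xff\xd8":
        meta["filetype"] = "jpg"
        meta["width"], meta["height"] = _jpeg_size(data)
    return meta
```

A backfill run would then call something like `image_metadata(fetch_image(url))` for each of the 16,948 records and update only the fields that are currently empty, leaving records with unrecognized formats for manual review.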

It does not seem likely that the API will become accessible to us again in the near future. The backfills described above would at least allow us to have the minimum data we’d like to have now as part of our ongoing data normalization effort and allow us to continue to serve the data we have in the API.

What do y’all think?

#data-normalization, #provider