Shepherd: Automating Cross-repo Code Changes

August 22nd 2018

At NerdWallet, we've fully embraced a modern microservice architecture. Our website is made up of dozens of React apps that pull content and data from a variety of internal services. This way of working has clear benefits: teams can work independently, iterate quickly, and deploy frequently with fault isolation. When working with so many codebases, it’s desirable to extract common components into reusable packages: we have shared libraries that provide server-side rendering, data fetching, caching, React components, and more. This is great for design consistency and reducing the amount of boilerplate that needs to go into a new app. However, it can make updating shared libraries difficult:

  • The person updating a library must communicate the change to consumers of the library

  • The consumer must understand how to make the change and subsequently update their code

  • The consumer must test, merge, and deploy those changes

Even for relatively minor changes, this process could easily stretch into hundreds of man-hours across all of our teams. This exact problem was called out in a recent and widely-shared blog post from Segment in which they discussed why they moved back to a monolith:

"Testing and deploying changes to these shared libraries impacted all of our destinations. It began to require considerable time and effort to maintain. Making changes to improve our libraries, knowing we’d have to test and deploy dozens of services, was a risky proposition."

The time needed to update apps when shared code changes is time that could be better spent innovating on our product. As an engineer, my natural inclination is to automate everything that can possibly be automated.

Facebook's jscodeshift is a great example of a tool that automates code changes within a codebase. A person can write a "codemod" that makes changes at the file level, and then jscodeshift will apply that codemod to all applicable files in a codebase. However, this still leaves a person with a lot of manual work to do: they have to identify which repositories to change, clone them to their machine, make the changes, commit and push the changes, and open a pull request for each one. For a change that needs to take place across hundreds of repositories, this still isn't a huge gain in efficiency.

The process of discovering which code needs to be changed, making the changes, and submitting the changes back to teams for review doesn't have to be manual; it's a perfect candidate for automation as well. Enter Shepherd.

Shepherd

Shepherd is an open-source CLI tool developed at NerdWallet that coordinates the application of code changes across all of our repositories, from checking them out, to making the changes, to submitting pull requests. Migrations can be performed with any language or tool you like, and Shepherd is agnostic to languages used in repositories. It's also agnostic to the type of version control system being used: it ships with support for GitHub, but all interactions with repositories have been abstracted into a generic adapter, so it would be easy to add support for additional services like GitLab or Bitbucket.
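
To make the adapter idea more concrete, here is a rough Python sketch of what such an abstraction could look like. It is not Shepherd's actual interface: the names RepoAdapter, find_candidates, open_pull_request, InMemoryAdapter, and run_migration are all hypothetical, chosen only to show how host-specific logic can sit behind a generic interface.

```python
from abc import ABC, abstractmethod
from typing import List, Tuple


class RepoAdapter(ABC):
    """Hypothetical interface; Shepherd's real adapter API may differ."""

    @abstractmethod
    def find_candidates(self) -> List[str]:
        """Return names of repositories that might need the migration."""

    @abstractmethod
    def open_pull_request(self, repo: str, title: str, body: str) -> str:
        """Open a pull request and return an identifier for it."""


class InMemoryAdapter(RepoAdapter):
    """Toy adapter that records pull requests instead of calling a host."""

    def __init__(self, repos: List[str]) -> None:
        self.repos = repos
        self.opened: List[Tuple[str, str, str]] = []

    def find_candidates(self) -> List[str]:
        return list(self.repos)

    def open_pull_request(self, repo: str, title: str, body: str) -> str:
        self.opened.append((repo, title, body))
        return f"{repo}#{len(self.opened)}"


def run_migration(adapter: RepoAdapter, title: str, body: str) -> List[str]:
    """Drive a migration using only the generic interface."""
    return [
        adapter.open_pull_request(repo, title, body)
        for repo in adapter.find_candidates()
    ]
```

Because run_migration only touches the abstract methods, supporting another host in a design like this would mean writing one more subclass rather than changing the driver.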

Migrations are written declaratively in a YAML file. Individual steps are written as shell commands, meaning you can use your favorite Unix tools when they're sufficient, but also call more complex scripts with node or python as needed. You can use arbitrary code to decide which repositories need migrations and to generate potentially complex pull request messages.

A simple example

ESLint has deprecated extensionless .eslintrc config files in favor of explicit extensions like .eslintrc.yml or .eslintrc.json. However, all of our JavaScript apps were created from a template that used a .eslintrc file. To keep our apps up to date, we wanted to rename those files to .eslintrc.yml. This is an easy enough change for one repo, but manually making that change across 80+ relevant repositories would take an inordinate amount of time for any one engineer. Thankfully, with Shepherd, it's easy to automate the entire process. Here's a simplified version of the migration spec for this process:


id: 2018.07.16-eslintrc-yml
title: Rename all .eslintrc files to .eslintrc.yml
adapter:
  type: github
  search_query: org:NerdWallet path:/ filename:.eslintrc
hooks:
  should_migrate:
    - ls .eslintrc
    - git log -1 --format=%cd | grep 2018 --silent
  apply: mv .eslintrc .eslintrc.yml
  pr_message: echo 'Hey! This PR renames `.eslintrc` to `.eslintrc.yml`'

Let's walk through this example. id specifies a unique identifier for this migration; Shepherd uses it internally to track state and also as the branch name. title is used to build a commit message and pull request title. adapter specifies which version control adapter should be used, as well as options for that adapter. This example uses the github adapter with GitHub's code search qualifiers to find repositories that have a .eslintrc file in the root. Any repository that contains a file matching this query will be considered a candidate for this migration.

The hooks section allows you to define "lifecycle" hooks that Shepherd will call throughout the process of applying a migration. The should_migrate hooks let you perform additional filtering of repositories after they are checked out. In this case, we have a sanity check that .eslintrc actually exists, and we also check that this repository's most recent commit was in 2018. We do this to avoid creating noise on old repositories that might not be maintained or in use anymore. If any of those commands exits with a non-zero exit code, that check is considered failed and the repository will not have the migration applied to it.
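
When the filtering logic outgrows a one-liner, the same checks can live in a script. The following is a minimal Python sketch of the two checks above; the function names are hypothetical, and the only contract it relies on is the one just described: exit code 0 means "migrate", anything else means "skip".

```python
import subprocess
from pathlib import Path


def has_eslintrc(repo_dir: str) -> bool:
    """True if an extensionless .eslintrc exists at the repository root."""
    return (Path(repo_dir) / ".eslintrc").is_file()


def last_commit_year(repo_dir: str) -> int:
    """Year of the most recent commit, read from `git log`."""
    out = subprocess.run(
        ["git", "log", "-1", "--format=%cd", "--date=format:%Y"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return int(out.strip())


def exit_code(has_file: bool, year: int, required_year: int = 2018) -> int:
    """0 means 'migrate this repository'; non-zero means 'skip it'."""
    return 0 if has_file and year == required_year else 1
```

A hook script built on this sketch would finish with something like sys.exit(exit_code(has_eslintrc("."), last_commit_year("."))), since each command runs from the root of the checked-out repository.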

The apply hook specifies the commands to call to actually apply a migration to each repository. Here, we're simply renaming a file with mv. However, this step could be considerably more complex, as we'll touch on later.

Finally, the pr_message hook allows you to dynamically generate the message that will be used for each pull request. In this case, the message is a simple static string, but in more complex migrations you could build a message that lists specific things that might need human attention, such as specific dependencies that might need a major version bump. This hook will take anything written to standard out by each step and concatenate the output into a pull request message.
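
As an illustration of a dynamic message, here is a small Python sketch that builds a pull request body with an optional checklist. The function name build_pr_message and its parameters are made up for this example; the only behavior assumed from Shepherd is that whatever the hook prints to standard out ends up in the message.

```python
from typing import List


def build_pr_message(renamed_from: str, renamed_to: str,
                     needs_attention: List[str]) -> str:
    """Return a pull request body, with a checklist if anything needs review."""
    lines = [f"This PR renames `{renamed_from}` to `{renamed_to}`."]
    if needs_attention:
        lines.append("")
        lines.append("Please check the following by hand:")
        # GitHub renders "- [ ]" items as checkboxes in pull request bodies.
        lines.extend(f"- [ ] {item}" for item in needs_attention)
    return "\n".join(lines)
```

A hook script would simply print() the returned string.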

That's it! Now we just need five simple commands to go from checking out relevant repositories to submitting a pull request. Assuming your shepherd.yml spec is in a directory called eslint-migration, this is what you'd run:


shepherd checkout ./eslint-migration
shepherd apply ./eslint-migration
shepherd commit ./eslint-migration
shepherd push ./eslint-migration
shepherd pr ./eslint-migration

Your repos will now have pull requests waiting for review.
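
If you find yourself running the five steps back to back, they can be wrapped in a small script. This Python sketch is not part of Shepherd; shepherd_commands and run_all are hypothetical helpers that just shell out to the commands shown above and stop at the first failure.

```python
import subprocess
from typing import List

# The five subcommands, in the order used above.
STEPS = ["checkout", "apply", "commit", "push", "pr"]


def shepherd_commands(migration_dir: str) -> List[List[str]]:
    """Build the argument lists for each step without running anything."""
    return [["shepherd", step, migration_dir] for step in STEPS]


def run_all(migration_dir: str) -> None:
    """Run every step in order; check=True aborts on the first failure."""
    for cmd in shepherd_commands(migration_dir):
        subprocess.run(cmd, check=True)
```

In practice it is often worth pausing between steps to inspect the changes before committing and pushing, so running the commands by hand remains a reasonable default.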

A complex use case: upgrading to React 16

The first major use case we had for Shepherd was during our process of updating apps to React 16. We want our developers to be able to take advantage of the latest and greatest React features. However, we have a lot of React apps and components, and manually updating them all to be React 16 compatible was a daunting task. We identified it as a task that could be automated at least in part: things like updating dependencies and running some of the React codemods are easy to do automatically. Some things couldn't be fully automated. For instance, some of our internal React components required breaking changes for React 16 compatibility, and we couldn't always update the usages of those components automatically. However, we were able to identify dependencies that needed a major version bump and specifically call those out in the pull request. This made it easier for the human who ultimately had to review and test the pull request.

Here's most of the migration spec that we used to build this migration:


id: 2018.07.20-react-16
title: Update all packages to be React 16 compatible
adapter:
  type: github
  search_query: org:NerdWallet path:/ filename:package.json react
hooks:
  should_migrate:
    - node $SHEPHERD_MIGRATION_DIR/should_migrate.js
    - git log -1 --format=%cd | grep 2018 --silent # Only migrate things that have seen commits in 2018
  apply:
    # Run nw-react tool
    - node $SHEPHERD_MIGRATION_DIR/apply.js
    # Add prop-types dependency
    - npm install prop-types
    # Update test-utils usage
    - npm uninstall react-addons-test-utils
    - git grep -l react-addons-test-utils -- ':!package.json' ':!package-lock.json' | xargs sed -i '' 's/react-addons-test-utils/react-dom\/test-utils/g'
    # Update to @nerdwallet/react-router if on react-router@1
    - $SHEPHERD_MIGRATION_DIR/update_react_router.sh
    # Update to react-transition-group
    - $SHEPHERD_MIGRATION_DIR/update_transition_group.sh
    # Regenerate package-lock
    - rm -rf node_modules
    - rm -f package-lock.json
    - npm install
  pr_message: $SHEPHERD_MIGRATION_DIR/node_modules/.bin/nw-react react@16 --pr

In our previous example, all the migration logic was located in the migration spec itself. This React 16 example shows how it's possible to use other tools (CLIs from npm, Node scripts, and shell scripts) to build complex logic. You'll notice that some of the apply commands contain $SHEPHERD_MIGRATION_DIR. By default, the working directory for each command is set to the root of the repository being worked on. This makes it easy to use Unix utilities in each repository, but it means that you need the absolute path to any auxiliary scripts that live alongside your shepherd.yml spec. Shepherd exposes the absolute path of the directory containing your shepherd.yml file via the $SHEPHERD_MIGRATION_DIR environment variable, allowing you to run auxiliary scripts.
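
The same variable is useful inside a script when it needs to read a data file stored next to the spec. Here is a minimal Python sketch; the helper name migration_file and the example file name are hypothetical, while SHEPHERD_MIGRATION_DIR is the variable described above.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def migration_file(name: str, env: Optional[Mapping[str, str]] = None) -> Path:
    """Resolve a file that lives alongside shepherd.yml.

    The working directory is the repository being migrated, so relative
    paths would point into that repository rather than the migration folder.
    """
    env = os.environ if env is None else env
    return Path(env["SHEPHERD_MIGRATION_DIR"]) / name
```

For example, migration_file("rules.json") would locate a hypothetical rules.json stored in the migration directory, regardless of which repository is currently being processed.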

This is also a good example of being able to programmatically generate useful pull request messages. By looking at our dependencies and the dependencies of those dependencies, we were able to figure out which packages required major version bumps and which ones we couldn't update automatically. We then created todo lists that called out those packages, which the reviewer could then use to keep track of their progress.
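
The nw-react tool itself is private, so the following Python sketch is not its implementation. It only illustrates the general idea of flagging dependencies whose declared version is below a required major version. The function names are hypothetical, and the version parsing is deliberately crude: it reads the first number in the range and ignores compound ranges such as "15 || 16".

```python
import re
from typing import Dict, List, Optional


def major_version(version_range: str) -> Optional[int]:
    """First integer in a range like '^15.6.1'; None for tags like 'latest'."""
    match = re.search(r"\d+", version_range)
    return int(match.group()) if match else None


def needs_major_bump(current: Dict[str, str],
                     required_major: Dict[str, int]) -> List[str]:
    """List dependencies whose declared major is below the required one."""
    flagged = []
    for name, min_major in sorted(required_major.items()):
        found = major_version(current.get(name, ""))
        if found is not None and found < min_major:
            flagged.append(f"{name}: {current[name]} -> {min_major}.x")
    return flagged
```

The resulting list could be rendered as a checklist in the pull request message so that reviewers can tick items off as they go.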

This example and the supporting scripts that it references are available in the Shepherd GitHub repository. While it contains a fair amount of NerdWallet-specific code, including references to a private package called nw-react that holds some internal tools, it should help illustrate the power of Shepherd.

Conclusion

When I first started working on Shepherd this summer, I didn’t fully realize the impact that this kind of tool could have on the way that work is done, and neither did anyone else on my team. It took a few months of building this tool and using it within NerdWallet to really internalize how much of an impact it could have. Now, it’s not uncommon to overhear people asking “Could we automate this with Shepherd?” around the office. Hopefully, this blog post will get you to start asking that same question.

Shepherd is of course not a one-size-fits-all solution. If you don’t have very many places where code is used, it might be more trouble than it’s worth to write an automated change. There are also certain classes of problems that are naturally more suited to automation than others. For example, Shepherd would be poorly suited to executing complex refactors. It's also worth noting that monorepos can help mitigate the problems that make this kind of tool necessary in the first place. However, there are lots of problems that are suited for automation, and Shepherd can help you address them while maintaining the benefits of a microservice architecture.

Even though we’ve been using it successfully at NerdWallet for a few months now, Shepherd still has a lot of room to grow! For instance, it’s still somewhat tedious to share migrations - we think it would be cool to be able to give Shepherd an npm package or a link to a gist and have it automatically fetch the migration. Shepherd is also relatively slow, as changes are executed sequentially across every repository. It would be awesome to be able to parallelize multiple repos on one machine, or even across multiple machines. Check out some of our GitHub issues to see some of the other ideas we have in mind.

If you're interested in learning more about the specifics of how Shepherd works or want to walk through a more complete example, check out the tutorial on GitHub. If you have any questions or comments, feel free to reach out via the comments or a GitHub issue!

Want to spend your time innovating on hard problems and not updating your utility libraries? Check out the engineering team here at NerdWallet.