Migrating a large Drupal 7 site to headless Drupal 8 and Gatsby

By gatsby, 1 July, 2021

In late 2020 we set out to migrate a large, traditional Drupal 7 site to a Gatsby front-end fed by a Drupal 8 back-end. We ran into a few roadblocks along the way, but the community has been an extraordinary help, and there's been a lot of interest and work put into getting the Drupal + Gatsby integration rock solid. Here are a few of the key things we learned along the way:

1. Content migration

One of the first steps in any Drupal 7 to Drupal 8 upgrade is setting up the content migration paths. Luckily, Drupal 8 makes this very easy by way of the Migrate API. You can define content mappings in YAML files, or even have Drupal 8 migrate content directly from Drupal 7's database via a SQL connection. In our case we decided to go with a more hands-on migration path, defining the data mappings ourselves, because of a substantial change to our data model between the D7 and D8 sites. That being said, anyone still working on the migration process should consider the out-of-the-box D7 → D8 mappings whenever possible.
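For readers taking the hands-on route, a custom migration is defined in a small YAML file. The sketch below is purely illustrative — the migration id, source node type and field mappings are placeholders, not the mappings from our actual data model:

```yaml
# A minimal, hypothetical migration definition (e.g. migrations/example_article.yml
# in a custom module). Names and mappings are placeholders.
id: example_article
label: Example article migration from Drupal 7
source:
  plugin: d7_node        # reads straight from the Drupal 7 database connection
  node_type: article
process:
  title: title           # simple 1:1 field mappings
  status: status
  created: created
  uid:
    plugin: default_value
    default_value: 1     # reassign all migrated content to a single author
destination:
  plugin: 'entity:node'
  default_bundle: article
```

Migrations defined this way are typically run with the Migrate Tools drush commands (e.g. drush migrate:import example_article).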

2. Sourcing content

Now that our content was in the new Drupal 8 site, we needed to decide how we wanted to source our data from Drupal. The choice largely depends on how the Gatsby site and Drupal interact, but for most people either GraphQL or JSON:API will be the best fit. There are pros and cons to both APIs.

GraphQL

GraphQL is a good choice if your site isn't too big (you are pulling fewer than 10k nodes) and needs to push data back to Drupal.

  • ✅ you have complete control over what data is requested, potentially reducing the time spent sourcing data during builds
  • ✅ your schema matches the API on the Drupal side, allowing for clean write operations back to Drupal through GraphQL mutations
  • ✅ you get a great DevX; Gatsby loves GraphQL
  • ❌ hard to cache, which can make sourcing a large number of nodes take a while
  • ❌ not a core module, so there's a bit less support on the Drupal side
  • ❌ not integrated with the official Gatsby source plugin for Drupal

JSON:API

JSON:API is the out-of-the-box Drupal API for exposing content. It is flexible, fast and reliable. Its addition to Drupal core in 8.7 solidified it as the go-to API for accessing data.

  • ✅ highly cacheable, which improves source times (and there are proposed changes coming to make caching even better)
  • ✅ easy to configure: just enable the core module and install the Gatsby plugin
  • ❌ not as easy as GraphQL for setting up write operations from the Gatsby side (e.g. POST, PATCH)
  • ❌ the Gatsby schema is not shared with Drupal, causing redundancies in data normalization

We chose JSON:API because it had better community support and was more efficient at sourcing a site of our size. The rest of this article will assume you are using JSON:API, but most of it is still relevant to a site sourcing via GraphQL.
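For reference, the basic JSON:API sourcing setup on the Gatsby side is only a few lines in gatsby-config.js (the URL below is a placeholder):

```javascript
// gatsby-config.js — minimal gatsby-source-drupal setup; baseUrl is a placeholder.
module.exports = {
  plugins: [
    {
      resolve: `gatsby-source-drupal`,
      options: {
        baseUrl: `https://cms.example.com`, // your Drupal 8 site
        apiBase: `jsonapi`,                 // default JSON:API path prefix
      },
    },
  ],
}
```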

3. Performance

Remember when I said JSON:API is fast and cacheable? While that is true, depending on the way you set up Gatsby or Drupal, you might not always get all the benefits of caching JSON:API has to offer. In our case, we started out by only allowing authenticated users with explicit permissions to access content on our Drupal site, and made gatsby-source-drupal fetch content as an authenticated user. The benefit of this was tight control over what content was available to anonymous users on our Drupal site.
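For context, authenticated sourcing is just an extra option on the source plugin — something along these lines (the credentials and environment variable names are placeholders):

```javascript
// gatsby-config.js (excerpt) — sourcing content as an authenticated user.
// Removing basicAuth puts you back on anonymous, cacheable requests.
{
  resolve: `gatsby-source-drupal`,
  options: {
    baseUrl: `https://cms.example.com`,
    basicAuth: {
      username: process.env.DRUPAL_API_USER,
      password: process.env.DRUPAL_API_PASSWORD,
    },
  },
},
```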

Authenticated fetching has one big drawback: it side-steps all caching. Drupal (and its partner in crime Varnish) will only effectively cache content for anonymous users (with the exception of the Dynamic Page Cache). What this means in practice is that fetching a single listing page (e.g. /jsonapi/node/article) might take 50ms without authentication, but with it you could be waiting 1000ms or more for a response. That translates to roughly a 20x difference in the time gatsby-source-drupal spends sourcing content from your Drupal site.

To make matters worse, Drupal (and by extension Varnish) will dump all cached responses for any page matching /jsonapi/node any time a node is updated. This means that a single change to any node on the site can make your content source times 20x slower. This patch makes the cache purge bundle-specific, so that a change to an Article node, for example, won't invalidate the cache for a Blog node.

With this information in mind, we switched back to anonymous fetching and found workarounds for the content we needed to access as an authenticated user. We are working on a patch to the way gatsby-source-drupal downloads content to help it avoid this issue, but until that patch is merged it's important to be wary of bypassing the cache when you don't intend to.

4. Previews and drafting

Another sticking point for us when initially considering the move was drafting and previewing content before publishing to the live site. We knew there had been some work done to get the gatsby-source-drupal plugin working with Gatsby Preview, but due to the sheer size of our site we had issues implementing it: several content changes from multiple editors happening at the same time on the Drupal side meant that the preview environment was constantly rebuilding.

We decided to roll our own previewing setup, which we think works pretty well for the time being, but we'd like to see gatsby-source-drupal's support for Gatsby Preview improve to the point where we can switch back. Our solution uses the same page templates that render our site, but sources and transforms the data client-side by leveraging JSON:API. We make authenticated requests on the content editor's behalf to fetch the latest revision data in Drupal required for a particular template, and then feed that data into the page template (a rough sketch follows the list below). With this method we were able to improve upon the experience with Gatsby Preview (GP):

  • save-to-preview time went from approximately 2 minutes to less than 5 seconds
  • concurrent editing doesn't affect previews whereas each edit on GP will trigger a rebuild of the preview environment
  • easy to preview any translation whereas gatsby-source-drupal still needs work to support GP of translations
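As a rough illustration of the approach (not our production code), the client-side fetch for a preview looks something like this. It assumes the editor is already logged in to Drupal in the same browser, that CORS is configured accordingly, and that core JSON:API resource versioning is available for requesting the working copy; the entity type, field and helper names are placeholders:

```javascript
// Hypothetical sketch: fetch the latest (unpublished) revision of a node via
// JSON:API using the editor's existing Drupal session, then hand the result
// to the same template that renders the live page.
async function fetchPreviewData(uuid) {
  const url =
    `https://cms.example.com/jsonapi/node/article/${uuid}` +
    `?resourceVersion=rel:working-copy&include=field_image`;

  const response = await fetch(url, {
    credentials: `include`, // reuse the logged-in editor's session cookie
    headers: { Accept: `application/vnd.api+json` },
  });

  if (!response.ok) {
    throw new Error(`Preview fetch failed: ${response.status}`);
  }

  // Normalize into the shape the page template normally receives from
  // Gatsby's GraphQL layer, then render the template with it client-side.
  const { data, included } = await response.json();
  return { node: data, included };
}
```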

It's important to note that whilst it made more sense for us to run our own previews, Gatsby Preview with gatsby-source-drupal is going to make sense for most people. If your site has roughly 1k nodes or fewer, is usually only being edited by one person, and doesn't need translation previews, then it will be much faster to set up Gatsby Preview than to build your own preview system. We are actively working to support Gatsby Preview with gatsby-source-drupal, and it will eventually be powerful enough to make previewing fast and reliable for sites of all sizes.

5. Multilingual

After getting a basic proof of concept up and running, it was time to start thinking about translation. When we started migrating, the out-of-the-box gatsby-source-drupal plugin didn't support "Content Translation" (the core Drupal module for translations), which meant we weren't able to source content written in anything but our Drupal site's default language (in our case English).

However, through collaboration with the Gatsby team and the Drupal community, we were able to get a PR merged that enabled this functionality. Now fetching any translation of your content is as easy as updating your gatsby-config.js file to tell the plugin which languages you want to use.
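In current versions of the plugin this is done through the languageConfig option — roughly as below, where the languages and entity types are just examples:

```javascript
// gatsby-config.js (excerpt) — enabling translation sourcing.
// Languages and entity types are examples only.
{
  resolve: `gatsby-source-drupal`,
  options: {
    baseUrl: `https://cms.example.com`,
    languageConfig: {
      defaultLanguage: `en`,
      enabledLanguages: [`en`, `fr`, `es`],
      translatableEntities: [`node--article`, `node--page`],
      nonTranslatableEntities: [`file--file`],
    },
  },
},
```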

6. Images

Deciding how best to integrate our client's vast library of images was the final piece of the puzzle. At first we tried the standard gatsby-plugin-image setup, whereby gatsby-source-drupal downloads all the images from Drupal and transforms them for you during your build. This approach worked great and made it easy for our team to get started whilst we were working on a subset of the site's full dataset. However, once we started trying full builds, it was clear we needed something else: downloading upwards of 20,000 images for our local development environments was not going to make sense.
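For reference, the standard setup we started with looks roughly like this in a page template — images come down as local files at build time and are processed by Gatsby's sharp plugins (the content type and field name are placeholders):

```javascript
// Page template sketch for the "download everything" approach.
// nodeArticle / field_image are placeholders for your own types and fields.
import * as React from "react";
import { graphql } from "gatsby";
import { GatsbyImage, getImage } from "gatsby-plugin-image";

export const query = graphql`
  query ($id: String!) {
    nodeArticle(id: { eq: $id }) {
      title
      relationships {
        field_image {
          localFile {
            childImageSharp {
              gatsbyImageData(width: 800, placeholder: BLURRED)
            }
          }
        }
      }
    }
  }
`;

export default function ArticlePage({ data }) {
  const image = getImage(data.nodeArticle.relationships.field_image.localFile);
  return <GatsbyImage image={image} alt={data.nodeArticle.title} />;
}
```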

So we turned to Cloudinary, a media transformation API and CDN. Using Cloudinary meant we could enable the skipFileDownloads flag in gatsby-source-drupal and then leverage their remote image fetching service to source and transform the images for us the first time they are requested. Now our team can view all the images locally without having to wait 30 minutes to download them from Drupal. It's the best of both worlds.
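In practice that means two small changes — telling gatsby-source-drupal not to download files, and building Cloudinary "fetch" URLs from the original Drupal file URLs. The cloud name and helper below are placeholders:

```javascript
// gatsby-config.js: set `skipFileDownloads: true` in the gatsby-source-drupal
// options so file entities keep their remote Drupal URLs.

// Hypothetical helper: build a Cloudinary "fetch" URL that pulls and transforms
// the remote Drupal image the first time it is requested, then serves it from
// Cloudinary's CDN afterwards.
const CLOUD_NAME = `my-cloud`; // placeholder Cloudinary account name

function cloudinaryFetchUrl(drupalImageUrl, width = 800) {
  const transforms = `f_auto,q_auto,w_${width}`; // auto format/quality, resize
  return `https://res.cloudinary.com/${CLOUD_NAME}/image/fetch/${transforms}/${drupalImageUrl}`;
}

// e.g. cloudinaryFetchUrl("https://cms.example.com/sites/default/files/hero.jpg", 1200)
```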

It's worth noting that although we still recommend Cloudinary on larger sites with lots of images for the time being, we're spearheading some development on a module that would allow the image transformation process to stay in Drupal and integrate cleanly with gatsby-plugin-image. gatsby-plugin-image has a lot to offer, so just using out-of-the-box Cloudinary with a standard img tag leaves a lot to be desired.


Hopefully you found some of these points useful for your own migration process. If you have any questions, feel free to join the #gatsby channel in the Drupal Slack and hit me up @David Disch.
