How I migrated from Posterous to Octopress while keeping page rank

When Posterous announced their impending closure (which takes place tomorrow, Apr. 30, 2013), I immediately began searching for a new blogging platform. I wanted a self-hosted setup on my own domain, so I would never again be subject to another blog service’s site closure. I settled on Octopress (based on Jekyll), due to its hacker-friendly command-line workflow, integration with Git, ability to publish to Amazon S3, and easily customizable default theme.

Migration from Posterous required three major steps:

  1. Exporting posts from Posterous
  2. Importing posts into Octopress
  3. Setting up redirection from Posterous to my new blog

Follow along below to back up your own Posterous blog.

Export posts from Posterous

This part was quite straightforward. I used the Backup tool in Posterous’s control panel to create a .zip archive of my blog’s data. There’s a delay of a few minutes to hours between requesting a backup and the backup becoming available, so check back periodically. Although Posterous shuts down tomorrow, it’s not too late to request a backup of your old Posterous space. According to the Posterous shutdown announcement, the backup feature will be available until May 31, 2013. If you haven’t done so already, back up your old space ASAP.

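Once you have your backup, a bit of postprocessing is needed because Posterous’s generated XML in wordpress_export_1.xml appears to strip extended characters outside of 7-bit ASCII. Even simple things like em and en dashes might be replaced with “???”. The commands below build a replacement export from head.xml and the individual post files under posts/, which appear to keep those characters intact, then append the closing tags: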

cd /path/to/space-[numbers, name, etc.]
cat head.xml posts/*.xml > fixed_export.xml
echo '</channel></rss>' >> fixed_export.xml
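
The same repair can be sketched in Ruby, which may be handy if you want to check the result before importing. This is only an illustration: the directory layout (head.xml and posts/*.xml) comes from the commands above, while rebuild_export and damaged_lines are hypothetical helper names that are not part of any Posterous or Octopress tooling.

```ruby
# Sketch only: rebuild the export from head.xml and the per-post files.
# rebuild_export and damaged_lines are hypothetical names for illustration.

def rebuild_export(space_dir, out_name = "fixed_export.xml")
  head = File.read(File.join(space_dir, "head.xml"), encoding: "UTF-8")
  # Sort explicitly so the post order is deterministic.
  posts = Dir.glob(File.join(space_dir, "posts", "*.xml")).sort
  out_path = File.join(space_dir, out_name)
  File.open(out_path, "w:UTF-8") do |out|
    out.write(head)
    posts.each { |path| out.write(File.read(path, encoding: "UTF-8")) }
    # Close the elements that head.xml leaves open.
    out.puts("</channel></rss>")
  end
  out_path
end

# Returns the 1-based numbers of lines that still contain "???",
# the marker left behind where extended characters were stripped.
def damaged_lines(path)
  File.foreach(path, encoding: "UTF-8").each_with_index
      .select { |line, _| line.include?("???") }
      .map { |_, index| index + 1 }
end
```

Running damaged_lines on both wordpress_export_1.xml and the rebuilt file gives a rough idea of how much text was recovered. A literal “???” in a post will also be reported, so treat the result as a hint rather than proof.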

Import posts into Octopress or Jekyll

Importing, editing, and fixing links within dozens of posts by hand seemed rather tedious and boring, so I wrote a script in Ruby to do the job for me. This script loads the fixed-up RSS feed generated in the previous step, then generates files under source/_posts. Images within posts are downloaded into post-specific directories, organized hierarchically by date. Encoded versions of videos are downloaded from Posterous (this is likely to stop working on Apr. 30!). Links to other posts on the same blog will be adjusted to point to their new location on the new blog.
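
To give a feel for the bookkeeping involved, here is a minimal sketch of how a post’s date and title could map to a file under source/_posts and to a new permalink, and how links to old posts could be rewritten. It is not the actual posterous_import.rb: the helper names are hypothetical, the .html extension is an assumption, and the permalink pattern assumes Octopress’s default /blog/:year/:month/:day/:title/ setting.

```ruby
require "date"

# Hypothetical helpers for illustration; the real script may differ.

# Turn a post title into a URL-safe slug.
def slugify(title)
  title.downcase.gsub(/[^a-z0-9]+/, "-").gsub(/\A-+|-+\z/, "")
end

# Jekyll expects post files named YYYY-MM-DD-slug.ext under _posts.
# The .html extension is an assumption (Posterous bodies are HTML).
def post_path(date, slug)
  File.join("source", "_posts", "#{date.strftime('%Y-%m-%d')}-#{slug}.html")
end

# Assumes the default Octopress permalink: /blog/:year/:month/:day/:title/
def new_permalink(date, slug)
  date.strftime("/blog/%Y/%m/%d/") + slug + "/"
end

# Replace href targets found in url_map (old URL => new permalink);
# links to other sites are left untouched.
def rewrite_links(html, url_map)
  html.gsub(/href="([^"]+)"/) do
    old_url = Regexp.last_match(1)
    %(href="#{url_map.fetch(old_url, old_url)}")
  end
end
```

For example, a post dated Apr. 29, 2013 with the slug old-post would be written to source/_posts/2013-04-29-old-post.html and served at /blog/2013/04/29/old-post/ under these assumptions.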

To use the script, save it as posterous_import.rb in your Octopress or Jekyll blog’s base directory, then run the script with the path to fixed_export.xml generated in the last step:

cd /path/to/new/blog
./posterous_import.rb /path/to/space-[numbers, name, etc.]/fixed_export.xml

Complete usage information is included in the script.

Set up redirection from Posterous

Posterous doesn’t explicitly support redirection to a new blog. It does, however, allow one to use a custom domain with a Posterous blog. Once you set up a custom domain with Posterous, all of your old XYZ.posterous.com URLs will redirect to the custom domain, which is expected to point to Posterous’s own servers. We can cleverly exploit this behavior to try to transfer search engine rankings to our new custom blog. Unfortunately there may not be enough time left for Google to crawl your old blog and find the redirections before Posterous shuts down.

Set up permalink redirectors

Since Octopress defaults to permalinks with dates, it’s necessary to redirect the top-level Posterous shortlinks to the correct location on the new blog. I added an option to posterous_import.rb to do just this (--links must be the first argument to the script):

cd /path/to/new/blog
./posterous_import.rb --links /path/to/space-[numbers, name, etc.]/fixed_export.xml

This will create a directory under source/ for each post in the Posterous backup. Within each of those directories an index.html file will be generated that contains a redirection to the post’s new location.
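
A static redirect page of this kind might look like the sketch below. The exact markup the script emits is not shown here, so the meta refresh plus rel="canonical" combination is an assumption, and redirect_page and write_redirect are hypothetical names.

```ruby
require "fileutils"

HTML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

# Escape characters that are unsafe inside HTML text and attributes.
def escape_html(text)
  text.gsub(/[&<>"]/, HTML_ESCAPES)
end

# Build a tiny page that sends browsers (meta refresh) and
# search engines (rel="canonical") to the post's new URL.
def redirect_page(new_url)
  safe = escape_html(new_url)
  <<~HTML
    <!DOCTYPE html>
    <html>
    <head>
    <meta charset="utf-8">
    <link rel="canonical" href="#{safe}">
    <meta http-equiv="refresh" content="0; url=#{safe}">
    <title>Moved</title>
    </head>
    <body><p>This post has moved to <a href="#{safe}">#{safe}</a>.</p></body>
    </html>
  HTML
end

# Write source/<old_slug>/index.html so the old top-level URL keeps working.
def write_redirect(source_dir, old_slug, new_url)
  dir = File.join(source_dir, old_slug)
  FileUtils.mkdir_p(dir)
  path = File.join(dir, "index.html")
  File.write(path, redirect_page(new_url))
  path
end
```

Because these are ordinary static files, they deploy along with the rest of the generated site and need no server-side configuration.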

Redirect rss.xml to atom.xml

I recommend setting up a 301 redirection from /rss.xml to /atom.xml. Octopress generates /atom.xml, while Posterous used /rss.xml. The file formats may not be the same, but most, if not all, feed readers can handle all the major feed formats.

This will allow subscribed RSS readers to find your new blog’s feed (until Posterous stops redirecting tomorrow). I used the S3 management console to create the redirection for my blog; your own hosting solution will have its own method.

Point Posterous at the custom domain

Here’s where the magic happens. In Posterous’s control panel, click Spaces at the top, click the gear icon next to your blog’s entry under Your Spaces, then click Space Settings in the popup menu. The first entry on the settings page is “Name Your Space”. Click the big Edit button to the right of your blog’s name and URL. At the bottom of the new page, there is a “Custom Domains” section. Here you can tell Posterous your blog is hosted at the domain of your choosing.

Thankfully, the Posterous engineers decided not to verify that the domain we enter points to Posterous. Instead of entering a domain that points back to Posterous, we’ll enter the domain that points to our new blog. At this point, all requests to XYZ.posterous.com/url will be redirected to our new blog.
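
One way to confirm the redirect is in place is to request an old URL without following redirects and inspect the status code and Location header. The sketch below uses Ruby’s standard Net::HTTP. The helper names are hypothetical, and treating 301 and 308 as the permanent statuses is my choice for this check, not something Posterous documents.

```ruby
require "net/http"
require "uri"

# Issue a HEAD request without following redirects.
# Returns [status_code, location_header_or_nil].
def fetch_redirect(url)
  uri = URI.parse(url)
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.head(uri.request_uri)
  end
  [response.code.to_i, response["location"]]
end

# True when the response is a permanent redirect whose target
# lives on the expected host (the new blog's domain).
def permanent_redirect_to?(status, location, expected_host)
  return false unless [301, 308].include?(status) && location
  URI.parse(location).host == expected_host
rescue URI::InvalidURIError
  false
end
```

To use it, pass the result of fetch_redirect for one of your old XYZ.posterous.com post URLs to permanent_redirect_to? along with your new blog’s hostname. A true result means the redirect points where you intended.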

Verify it works

If you’re just completing this process now, it’s likely that Google and other search engines won’t crawl the redirections in time. However, since I migrated my blog several weeks ago, I can check to see whether my new blog has replaced the old one in search queries. I’m hoping that Googlebot retains the 301 results after Posterous shuts down, for the sake of all those links on news sites to my old blog.

My old blog used to rank on the first page of results for some searches related to home automation. Sure enough, when I try these queries now, the new blog shows up instead, and my new blog’s traffic has risen to match the old blog’s levels. At this point the new blog is running smoothly.