Watch on YouTube: youtu.be/ScBoaQw84Ks


Overview

This is based on Migrate from Blogger to Jekyll, from many moons ago.

You're going to

  • export your BlogSpot blog (posts and comments) to XML
  • translate it from XML to HTML for static blog engines
  • set up redirects to maintain traffic and SEO

It used to be that in order to redirect without losing your Google PageRank (SEO), you had to use a custom domain so that the redirect would stay on the same domain. Supposedly that's not true anymore but, whatever, we'll play it safe. Plus it means more is handled by you and your servers rather than by Google anyway.

To be clear, the full migration path involves 3 redirects:

  1. The custom domain redirect which is handled by HTTP 301 + Location (done by blogspot automatically)

    oldblog.blogspot.com
        -> oldblog.mysite.com
    
  2. The template variable redirect handled by a <meta> refresh + canonical <link> in <head> (inserted by you into a classic template)

    oldblog.mysite.com/post-url
        -> newblog.mysite.com/redirect?blogger=http://oldblog.mysite.com/post-url
    
  3. Your server redirect handled by HTTP 301 + Location (via nginx, NodeJS, ruby, python, etc)

    newblog.mysite.com/redirect?blogger=http://oldblog.mysite.com/post-url
        -> newblog.mysite.com/post-url
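
Putting the three hops together, here is a minimal JavaScript sketch that traces where one post URL ends up. The hostnames and the sample post path are placeholders, and traceRedirects is a made-up helper for illustration, not part of any tool mentioned here.

```javascript
'use strict';

// Placeholder hostnames; substitute your own.
const hosts = {
  custom: 'oldblog.mysite.com',  // custom domain attached to the old blog
  target: 'newblog.mysite.com',  // where the new static blog lives
};

// Returns every URL a visitor passes through, in order.
function traceRedirects(startUrl, h = hosts) {
  // Hop 1: Blogger's own 301 swaps the original host for the custom domain.
  const onCustom = new URL(startUrl);
  onCustom.hostname = h.custom;

  // Hop 2: the <meta> refresh in the classic template appends the
  // permalink, unescaped, to the /redirect URL on the new blog.
  const viaRedirect = `http://${h.target}/redirect?blogger=${onCustom.href}`;

  // Hop 3: the new server answers with a 301 to the same path on itself.
  const finalUrl = `http://${h.target}${onCustom.pathname}`;

  return [startUrl, onCustom.href, viaRedirect, finalUrl];
}

console.log(
  traceRedirects('http://oldblog.blogspot.com/2012/05/some-post.html').join('\n    -> ')
);
```

Only the path survives the whole trip, which is why the new blog has to publish each post at the same path the old one used.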
    

Remember: once you embark on this journey, you MUST keep the redirects in place for several months (or as long as you can).

Most visitors will probably never update their bookmarks, but most of them find your site again through search rather than through bookmarks anyway.

You see, there was some talk about how non-HTTP cross-domain redirects could be malicious and even talk about future browsers ignoring cross-domain meta refreshes in HTML. I think it's all bunk and we'll be fine.

Setting up a custom domain

If you don't already have a domain, I'd recommend purchasing from name.com or gandi.net

  1. Log in at Blogger.com (you should end up at http://blogger.com/home)
  2. Click on the title of the blog you wish to migrate
  3. At the bottom of the bar on the left, click Settings
  4. Under Publishing select Blog Address and Add a custom domain
  5. You will need to add 2 CNAME records through your domain registrar. (In the screencast I show how this is done with http://name.com, but it will vary from site to site)
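
Once the records have had time to propagate, you can check them from NodeJS. This is only a sketch: cnamePointsAt is a made-up helper, and ghs.google.com is my assumption about the target Blogger asks for, so use whatever values the Blogger settings page actually shows you.

```javascript
'use strict';
const dns = require('node:dns').promises;

// Assumption: the target Blogger tells you to point the blog subdomain at.
const EXPECTED_TARGET = 'ghs.google.com';

// Resolves the CNAME records for `hostname` and reports whether any of
// them equals `expected`. `resolve` is injectable so the comparison can
// be tried without touching the network.
async function cnamePointsAt(
  hostname,
  expected = EXPECTED_TARGET,
  resolve = (h) => dns.resolveCname(h)
) {
  let records;
  try {
    records = await resolve(hostname);
  } catch (err) {
    return false; // no CNAME yet (or the lookup failed)
  }
  // Compare case-insensitively and ignore a trailing dot.
  const normalize = (name) => name.toLowerCase().replace(/\.$/, '');
  return records.some((r) => normalize(r) === normalize(expected));
}

// Usage (hits real DNS, so it is left commented out):
// cnamePointsAt('oldblog.mysite.com').then(console.log);
```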

Back up your blog in full in both "upgraded" and "classic" formats

Back on your blog's settings page we'll download all posts, comments, settings, etc as they currently stand (for your safety).

  1. Click Settings -> Other
  2. At the top, under Blog tools, you'll see Export blog; click it
  3. Rename the file from blog-dd-mm-yyyy.xml to something like blog-dd-mm-yyyy.current.xml

The "classic" template is what I actually tested with and I know it works. With the "upgraded" template style, it's not clear to me how to use the variables we'll need later on.

  1. Click Template on the left bar
  2. Scroll to the bottom and click Revert to classic template

Now to export again (this is the one we'll use)

  1. Go back to Settings on the left, and then back to Other
  2. Click Export blog again (your blog is now in a different format, btw)
  3. Rename the file to something like blog-dd-mm-yyyy.classic.xml

Import to Ruhoh (for example)

Head over to http://nodejs.org, then download and install NodeJS. After that, translate your backup XML file into HTML with YAML front matter using blogger2jekyll:

npm install -g blogger2jekyll
blogger2jekyll ~/Downloads/blog-*.classic.xml ~/blogger-posts/

Note: I used xmllint --format to help me look at the XML blob and figure out what it meant.
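
In case "HTML with YAML front matter" is unfamiliar, here is roughly what one translated post file looks like and how you might build one by hand. This is a sketch, not blogger2jekyll's actual code: toPostFile and the field names (title, date, permalink) are assumptions for illustration, and the real tool's output keys may differ.

```javascript
'use strict';

// Builds the text of one post file: a YAML header between '---' lines,
// followed by the post body as plain HTML.
function toPostFile(post) {
  const frontMatter = [
    '---',
    // JSON strings are also valid double-quoted YAML scalars, so
    // JSON.stringify takes care of quoting and escaping.
    `title: ${JSON.stringify(post.title)}`,
    `date: ${JSON.stringify(post.published)}`,
    `permalink: ${JSON.stringify(post.path)}`,
    '---',
  ].join('\n');
  return `${frontMatter}\n${post.html}\n`;
}

console.log(toPostFile({
  title: 'Hello "World"',
  published: '2013-01-19T00:00:00Z',
  path: '/2013/01/hello-world.html',
  html: '<p>My first post.</p>',
}));
```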

Now head to http://rvm.io, install RVM, and make sure you end up with ruby-1.9.3.

source ~/.rvm/scripts/rvm
rvm reload
rvm use default
ruby --version

See Installing Ruby on Ubuntu 12.04 if you run into any trouble with the default install.

Time to get ruhoh up and rolling as per http://ruhoh.com

pushd ~
git clone 'git://github.com/ruhoh/blog.git' 'blog-v2'
pushd blog-v2
git checkout "2.0.alpha"
bundle install
bundle exec rackup -p 9292

Check out your empty blog at http://localhost:9292, then hit ctrl+c in the terminal to kill rackup.

rsync -avhP ~/blogger-posts/ ~/blog-v2/posts/
bundle exec rackup -p 9292

Check out your fancy blog at http://localhost:9292, then hit ctrl+c to exit.

You should really see Hosting your blog on ruhoh.com, because you still need to configure a few things before you're ready to rumble in the blog jungle!

Get hosted

Now it's time to get your site hosted. If you don't have a server yet, I recommend either thrustvps or chunkhost.

bundle exec ruhoh compile
rsync -avhP ./compiled/ user@server:/var/www/blog.example.com/
ssh user@server
npm install -g blogger2jekyll
sudo blogger2jekyll-server 80 /var/www/blog.example.com/

See the appendix for instructions on how to host with Nginx.
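
If you'd rather handle the /redirect step in your own NodeJS server, the core of it is small. This is not the source of blogger2jekyll-server, just a sketch of the redirect half under my own assumptions: redirectTarget and handle are made-up names, and serving the static files is left out entirely.

```javascript
'use strict';

// Pulls the post path out of '/redirect?blogger=http://old.host/post-url'.
// Returns null when the request is not a redirect we understand.
function redirectTarget(requestUrl) {
  // The base is only there so the relative request URL can be parsed.
  const url = new URL(requestUrl, 'http://localhost');
  if (!/^\/redirect\/?$/.test(url.pathname)) return null;

  const original = url.searchParams.get('blogger');
  if (!original) return null;

  try {
    return new URL(original).pathname; // keep the path, drop the old host
  } catch (err) {
    return null; // 'blogger' was not an absolute URL
  }
}

// Request handler: 301 to the post path, or 422 if the request is malformed.
function handle(req, res) {
  const target = redirectTarget(req.url);
  if (target === null) {
    res.writeHead(422, { 'Content-Type': 'text/plain' });
    res.end('Could not process the redirect\n');
    return;
  }
  res.writeHead(301, { Location: target });
  res.end();
}

// To try it: require('node:http').createServer(handle).listen(8080);
```

The handler uses the same status codes as the Nginx config in the appendix, so either one can sit behind the template redirect.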

Contact me if you need further help. There is such a broad range of environments, I can't explain it all here.

Finalize the redirects

You need to get to the template editor:

  1. Go back to your Blog settings
  2. Click Template on the left bar
  3. Scroll to the bottom, where the editor is

Now you need to put some template code up in the <head>. It will not work in <body>!

Here's an example of the code. Note, however, that you need to replace all four occurrences of localhost:8080 with your domain!

<head>
  ...

  <MainPage>
    <meta
      http-equiv=refresh
      content="0; url=http://localhost:8080/">
    <link
      rel="canonical"
      href="http://localhost:8080/" />
  </MainPage>

  <ItemPage><Blogger>
    <meta
      http-equiv=refresh
      content="0; url=http://localhost:8080/redirect?blogger=<$BlogItemPermalinkURL$>">
    <link
      href="http://localhost:8080/redirect?blogger=<$BlogItemPermalinkURL$>" 
      rel="canonical"/>
  </Blogger></ItemPage>

  ...
</head>
<body>
  ...
  <!--script>
    location.href = 'http://blog.coolaj86.com/?blogger=' + location.href; //location.pathname.split('/').pop()
  </script-->
</body>
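
It's easy to miss one of the four localhost:8080 occurrences when editing by hand, so here is a small sketch that does the swap and counts how many it found. fillTemplate is a made-up helper and newblog.mysite.com is a placeholder; paste the result into the template editor yourself.

```javascript
'use strict';

// Replaces every occurrence of `placeholder` with `domain` and reports
// how many were found. split/join is used instead of String.replace so
// that '$' characters in the template tags are never treated specially.
function fillTemplate(template, domain, placeholder = 'localhost:8080') {
  const parts = template.split(placeholder);
  return { html: parts.join(domain), count: parts.length - 1 };
}

// The <head> snippet, trimmed down to the lines that carry the placeholder.
const snippet = [
  '<MainPage>',
  '  <meta http-equiv=refresh content="0; url=http://localhost:8080/">',
  '  <link rel="canonical" href="http://localhost:8080/" />',
  '</MainPage>',
  '<ItemPage><Blogger>',
  '  <meta http-equiv=refresh',
  '    content="0; url=http://localhost:8080/redirect?blogger=<$BlogItemPermalinkURL$>">',
  '  <link rel="canonical"',
  '    href="http://localhost:8080/redirect?blogger=<$BlogItemPermalinkURL$>" />',
  '</Blogger></ItemPage>',
].join('\n');

const filled = fillTemplate(snippet, 'newblog.mysite.com');
console.log(`replaced ${filled.count} occurrences`);
console.log(filled.html);
```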

Once you hit save, use http://validator.w3.org/checklink to make sure that everything was moved over and redirected properly.
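
For a quick spot check of a single URL from your own machine, something like the following may help. It's a sketch with assumptions: checkRedirect is a made-up helper, it relies on the fetch built into newer NodeJS versions, and it only sees HTTP-level redirects (hops 1 and 3), because the <meta> refresh in hop 2 lives inside the HTML.

```javascript
'use strict';

// Requests `from` without following redirects and reports whether the
// server answers with a 301 whose Location resolves to `expected`.
// `fetchImpl` is injectable so the logic can be tried without a network.
async function checkRedirect(from, expected, fetchImpl = globalThis.fetch) {
  const res = await fetchImpl(from, { redirect: 'manual' });
  const location = res.headers.get('location');
  // Location may be relative ('/post-url'), so resolve it against `from`.
  const resolved = location ? new URL(location, from).href : null;
  return {
    ok: res.status === 301 && resolved === new URL(expected, from).href,
    status: res.status,
    location: resolved,
  };
}

// Usage (hits the network, so it is left commented out):
// checkRedirect(
//   'http://newblog.mysite.com/redirect?blogger=http://oldblog.mysite.com/post-url',
//   'http://newblog.mysite.com/post-url'
// ).then(console.log);
```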

BTW, if you don't want to take my word for what the heck that blob means, feel free to check my references:

Note: I tried using just $BlogItemURL$, but it didn't work.

Be Happy

Now you sit and await the probably-will-never-come day when you can remove your old blag.

Links I couldn't have done without

Appendix

Nginx

You'll probably want to put this in /etc/nginx/conf.d/blogger-redirect.inc or directly into /etc/nginx/sites-available/blog.example.com.

# matches '/redirect' and '/redirect/'
location ~ ^/redirect/?$ {

  # match a querystring 'blogger=<anything>//<anything-not-slash>/(all-the-leftovers)'
  if ($args ~ blogger=.*//[^/]*/(?<postUrl>.*)){

    # '^' matches any url (we've already limited ourselves to /redirect above)
    # '$postUrl' is the variable as matched above
    # the trailing '?' means 'throw away the original query arguments'
    rewrite ^ /$postUrl? permanent;

    # the arguments matched but the rewrite failed? that's a server error
    return 500;
  }

  # Couldn't process the redirect because it didn't match the pattern
  return   422;
}
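
If you want to see what that pattern captures before touching a live server, the same expression can be tried in NodeJS. This is only an approximation of what Nginx does (it uses PCRE, not the JavaScript regex engine), though for a pattern this simple I'd expect them to agree. postUrlFromArgs is a made-up name.

```javascript
'use strict';

// Same expression as in the 'if' above, written as a JavaScript regex.
const pattern = /blogger=.*\/\/[^/]*\/(?<postUrl>.*)/;

// Returns the captured post path (without a leading slash), or null
// when the query string does not match, which is the 422 case in Nginx.
function postUrlFromArgs(args) {
  const match = pattern.exec(args);
  return match ? match.groups.postUrl : null;
}

const samples = [
  'blogger=http://oldblog.mysite.com/2012/05/post.html',
  'blogger=http://oldblog.mysite.com', // no path after the host
  'something=else',
];
for (const args of samples) {
  console.log(args, '=>', postUrlFromArgs(args));
}
```

Note that the capture has no leading slash, which is why the rewrite line puts one in front of $postUrl.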

TODO yet another tutorial

Bad Words

If you're wondering why I didn't mention Apache or PHP in any of this... just do us all a favor and get with the program! What is this, 1996? Stop it. No. Just, no!


By AJ ONeal

Published

2013-01-19

