I recently migrated my blog from Octopress 2 to Octopress 3 and thought I’d provide a write-up on the migration process. The process covers changing over with the least amount of fuss. I provide code to help automate things as well as code to help sanitize your data. This process can be completed in 6 steps.

Background

Jekyll is a static-file publishing platform. It’s designed to help you create a “Plain Old Web Site” with a consistent look and feel. You write a plain-text document of content, run a program, and Jekyll wraps the page with template styling, footers, headers, etc. It helps turn “a heap of text documents” into a coherent site. You define the templates, which serve as the rules for transforming your plain-old text files into “the content of web pages.”

From this, it should be evident that a “blog” is really just a site constructed from uniformly styled / formatted plain-text files that underwent a templating process. A side perk of this design is that since your content lives in plain text files you can remove your dependency on a database for holding the content. You won’t need to worry about repeated reloads slowing down the site, etc. Also, because your site doesn’t have to do a query to the database, the content will load very quickly.
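
To make the templating idea above concrete, here is a minimal sketch in Python. It is not how Jekyll is implemented (Jekyll uses Liquid layouts); the LAYOUT string and the render_page name are assumptions made up for illustration.

```python
from string import Template

# Hypothetical layout; real Jekyll layouts are Liquid templates, not string.Template.
LAYOUT = Template(
    "<html><head><title>$title</title></head>\n"
    "<body><header>My Site</header>\n"
    "<main>$content</main>\n"
    "<footer>Footer text</footer></body></html>"
)


def render_page(title, content):
    """Wrap one page's content in the shared layout."""
    return LAYOUT.substitute(title=title, content=content)
```

Every page passed through render_page gets the same header and footer, which is the "consistent look and feel" part of the story.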

Obviously, if you take the basic non-blogging-platform Jekyll and use it as a blogging platform, it might fall short in terms of features. You might yearn for a command to start a new file with this look-and-feel or for a shorthand for referencing images. A set of extensions to its capabilities was written by Brandon Mathis and shared as a “fork” off of Jekyll. As Mr. Mathis discovered / authored / integrated contributions to his “fork” of Jekyll it became more useful. Eventually the Mathis-originated version of Jekyll deserved its own name, Octopress.

Problems

I was a fairly early convert to Octopress. However, staying up to date was implemented in a somewhat complex way. One of the present maintainers (Brandon Mathis) summed the matter up as:

What’s wrong?

If I’m being harsh, I’ll tell you that as it is now, Octopress is basically some guy’s Jekyll blog you can fork and modify. The first, and most obvious flaw, is that Octopress is distributed through Git. I want to punch through a wall when I think about users wrestling with merge conflicts from updating their sites. It’s absurd.

Tough talk from the maintainer.

As I started on my work hiatus I went to octopress.org, found the post containing the citation above, and saw that the last update date was: JAN 15TH, 2015. I was quite scared that there was going to be no way to move forward, that Octopress was basically abandoned.

To my surprise, Octopress 3.0 did emerge, did see the light of day, and is usable. The team adopted the approach of using Jekyll as the base software and wrapping the Octopress extensions up as a library (or Ruby “gem”) that can be rolled in. This guide will take you through migrating your blog from Octopress 2 to Octopress 3.

Step 1: Install Jekyll and Make an Initial Commit

Follow the quick install guide at Jekyll’s landing page or in its more extensive installation guide, and create a new site in a new directory. We’ll migrate the old content momentarily.

Initialize this directory with git.

git init; git add . ; git commit -m "Jekyll install"

should do the work for you.

Step 2: Preliminary Customization

Files ending in .md in the “top-level” directory are “static pages.” By default, you’re given an about page as well as an index page. I’d recommend customizing your about page and I’d recommend editing (and then removing) the welcome-to-jekyll page in _posts just to prove that you can make changes.

Your _config.yml also probably should be changed from defaults: name, contact information, tagline, etc.

To verify these changes, use the built-in web server.

bundle; bundle exec jekyll s

This will start a local web server on port 4000 that you can browse to.

If your changes look correct, create a commit.

Customization (Advanced, Skippable)

I do my work on a VPS. I keep my development workspaces on a server run by Digital Ocean, which I connect to over the network. This means that any computer is basically a keyboard and screen for my “thinking computer” which lives in a data center with backups “in the cloud.”

Consequently, when I start up a local server I need it to run on a different hostname than localhost. To make this change I provide --host vps-host-domain-name. Similarly, the --port can be changed as well.

Step 3: Import!

This is surprisingly easy. Take the old posts in your old Octopress site’s _posts directory and dump them in the new. Add them all and make a commit.
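
If you’d rather script the copy than drag files around, the following is a minimal sketch. The function name copy_posts and both directory arguments are assumptions for illustration; point them at wherever your old and new _posts directories actually live.

```python
import shutil
from pathlib import Path


def copy_posts(old_posts_dir, new_posts_dir):
    """Copy every file from the old _posts directory into the new one.

    Returns the sorted names of the files that were copied.
    """
    src = Path(old_posts_dir)
    dst = Path(new_posts_dir)
    dst.mkdir(parents=True, exist_ok=True)  # create the target if it is missing
    copied = []
    for post in sorted(src.iterdir()):
        if post.is_file():
            shutil.copy2(post, dst / post.name)  # copy2 keeps file timestamps
            copied.append(post.name)
    return copied
```

Note that this overwrites files with the same name in the target directory, so run it before you start editing posts in the new site.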

You’ll see your web server recognize the new content and rebuild (or attempt to rebuild) the site. Congratulations, your data has been migrated. It’s possible something will go wrong here; in fact it’s all but guaranteed. Don’t worry, the problems are easy to fix.

Step 4: Data Sanitization

It’s possible that the content you used in Octopress 2 is not supported in Octopress 3. Or, perhaps you used a character in a way that Jekyll’s templating engine thinks has significance and it’s confused. This section will guide you through reasoning about these.

You’re going to want to fix all of these; I’ll start with the most likely culprits.

The {% img %} tag of Octopress

Even writing {% above, just now, triggered a build error. It’s OK, we’ll fix this!

In the Jekyll templating language, Liquid, curly brace and percents trigger a call to plugins, Ruby programming, or template capabilities. If you have calls that aren’t supported by Jekyll, which you probably do since you were using Octopress, you have to clean those up.
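
To find out which tags are going to cause trouble before the build tells you one at a time, a rough scan like the one below can help. It is a sketch only: KNOWN_TAGS is an illustrative, non-exhaustive guess at what a stock Jekyll install understands, and count_unknown_tags is a hypothetical name.

```python
import re
from collections import Counter

# Captures the tag name in "{% name ... %}" (also "{%- name").
TAG_PATTERN = re.compile(r"\{%-?\s*(\w+)")

# Assumption: an illustrative, NOT exhaustive, set of tags a stock Jekyll
# install understands. Adjust it for your own site and plugins.
KNOWN_TAGS = {
    "if", "elsif", "else", "endif", "unless", "endunless",
    "case", "when", "endcase", "for", "endfor", "break", "continue",
    "assign", "capture", "endcapture", "comment", "endcomment",
    "raw", "endraw", "include", "include_relative",
    "highlight", "endhighlight", "link", "post_url",
}


def count_unknown_tags(text):
    """Count tag names in text that are not listed in KNOWN_TAGS."""
    names = TAG_PATTERN.findall(text)
    return Counter(name for name in names if name not in KNOWN_TAGS)
```

Run it over the contents of each post and add up the counters. The scan is naive: it also counts tags that appear inside raw blocks or code samples.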

My most popular newly-invalidated tag was the img tag that helped style and center images. I had the choice of removing all of those… or coding in support for the img “plugin.” I opted for the latter. Here’s the code I used (Jekyll loads plugin files like this one from the site’s _plugins directory):

module Jekyll
  class OldOctopressImgTag < Liquid::Tag
    def initialize(tag_name, text, tokens)
      super
      # Arguments: class, path, optional height and width, then alt text
      args = text.strip.split(/\s+/)

      @class_name = args.shift
      @path = args.shift

      # Dimensions are only read when at least one word of alt text follows
      if args.length >= 3
        @height = args.shift.to_i
        @width = args.shift.to_i
      end

      # Strip the surrounding double quotes so they don't end up in the attribute
      @alt_text = args.join(" ").gsub(/\A"|"\z/, "")
    end

    def render(context)
      output = "<img class='"
      output += @class_name
      output += "' src='#{@path}' "

      output = height_and_width(output)
      # The HTML attribute is "alt", not "alt_text"
      output += "alt='#{@alt_text}' />"
    end

    def height_and_width(s="")
      return s if @height.nil? || @width.nil?
      s + "height='#{@height}' width='#{@width}' "
    end
  end
end

Liquid::Template.register_tag('img', Jekyll::OldOctopressImgTag)

You can use this img command with:


{% img style-class path-to-image height width "alt-text" %}

After adding a plugin you’ll need to restart any running web servers, since plugins are only read in at start time. As the server starts up you’ll see whether you’ve cleared these issues. Alternatively, run jekyll build and see if you’re clean of errors.
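
If you’d rather take the other road and remove the img tags entirely, something like the following sketch could rewrite them as plain HTML. This is not what I did; the names IMG_TAG, img_tag_to_html and replace_img_tags are made up for illustration, and the argument order (class, path, height, width, alt text) is assumed to match the usage shown above.

```python
import re
from html import escape

# Matches "{% img ... %}" and captures the arguments between the delimiters.
IMG_TAG = re.compile(r"\{%\s*img\s+(.*?)\s*%\}")


def img_tag_to_html(match):
    """Turn one matched img tag into a plain <img> element."""
    args = match.group(1).split()
    if len(args) < 2:
        return match.group(0)  # not enough arguments; leave the tag untouched
    class_name, path, rest = args[0], args[1], args[2:]
    size = ""
    # Like the plugin, only read dimensions when alt text follows them;
    # unlike the plugin, also require both values to be numeric.
    if len(rest) >= 3 and rest[0].isdigit() and rest[1].isdigit():
        size = f'height="{rest[0]}" width="{rest[1]}" '
        rest = rest[2:]
    alt = escape(" ".join(rest).strip("\"'"), quote=True)
    return (
        f'<img class="{escape(class_name, quote=True)}" '
        f'src="{escape(path, quote=True)}" {size}alt="{alt}" />'
    )


def replace_img_tags(text):
    """Replace every img tag found in text."""
    return IMG_TAG.sub(img_tag_to_html, text)
```

As with any bulk rewrite, commit first so that git diff shows exactly what changed.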

Remove other Meta-Tags

In several posts I used {{ or }} as a parenthetical. Those characters are used by Liquid and thus a no-no. Using grep to find them, fix them, and commit the changes is wise. I replaced them with HTML entity codes instead. For the record, you can tell Liquid to stop processing a block by using its “raw” directive.

Also, in some places I posted LaTeX formatted code which also uses those same characters.
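
As a rough stand-in for the grep step, the sketch below lists the offending lines and shows the entity replacement. Both function names are hypothetical. escape_braces is meant to be applied to a snippet you’ve picked out by hand, not to a whole post, since it would also mangle any Liquid you do want.

```python
def find_double_braces(text):
    """Return (line_number, line) pairs for lines containing {{ or }}."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if "{{" in line or "}}" in line
    ]


def escape_braces(snippet):
    """Replace each curly brace in snippet with its HTML entity code."""
    return snippet.replace("{", "&#123;").replace("}", "&#125;")
```

Entity codes are only rendered as braces in ordinary text; inside code blocks the raw directive is usually the better fit.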

Optional: Normalize your Categories

Over the years I’d been inconsistent with my category tags. As a result, in my custom categories page (code at bottom) I had both e.g. ajax and Ajax. I wrote a quick Python script to clean those files up. My advice is to make sure your working files are all committed and then apply this program. As a caveat, the effectiveness of this program will be impacted by how clean your data are. Nevertheless, it might give you a leg up if you have to do some clean-up. This script ensures all categories are “- Capitalized”. This is written in Python3. Lastly, this is not robust, clean code. It was write-once ;)

#!/usr/bin/env python3

# Reads post file names from standard input, e.g. `ls _posts/2017* | review-file.py`

import os.path
import sys

DIR = "./_posts"

def build_fullpath(file_name):
    # Accept both "name.markdown" and "_posts/name.markdown"
    return os.path.join(DIR, os.path.basename(file_name))

def capitalize_categories_in(f):

    # Build path name
    current_file_path = build_fullpath(f)

    # Find categories block's offsets
    with open(current_file_path) as current_file:
        lines = current_file.readlines()
    if 'categories:\n' not in lines:
        return
    start = lines.index('categories:\n', 1) + 1
    end = lines.index('---\n', 1)
    if end < start:
        # "categories:" appeared outside the front matter; leave the file alone
        return

    # Replace categories to be "Single capitalized and rest lower-cased";
    # any other front-matter lines in the range are left untouched
    replacement_categories = [
        "- " + line[2:].capitalize() if line.startswith("- ") else line
        for line in lines[start:end]
    ]

    # Write the file back in place
    the_new = lines[0:start] + replacement_categories + lines[end:]
    with open(current_file_path, "w") as out:
        out.writelines(the_new)

# Read every name on stdin, whether separated by spaces or newlines
for fi in sys.stdin.read().split():
    capitalize_categories_in(fi)
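
Before running a clean-up like the one above, it can help to see which categories actually differ only by case. The following is a minimal sketch under the same assumption as the script, namely that categories are written as a “- name” list directly under a categories: line; the function names are made up for illustration.

```python
from collections import defaultdict


def categories_in(post_text):
    """Return the "- name" entries listed under "categories:" in a post."""
    lines = post_text.splitlines()
    if "categories:" not in lines:
        return []
    start = lines.index("categories:") + 1
    found = []
    for line in lines[start:]:
        if not line.startswith("- "):
            break  # first non-list line ends the categories block
        found.append(line[2:].strip())
    return found


def case_variants(all_categories):
    """Group spellings that differ only by letter case, e.g. ajax / Ajax."""
    groups = defaultdict(set)
    for name in all_categories:
        groups[name.lower()].add(name)
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}
```

Feed case_variants the combined output of categories_in for every post; an empty result means there is nothing to normalize.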

These are the major techniques at your disposal. The end goal is to get your Jekyll build to run cleanly.

Step 5: Migrate the Images

I recursively copied my images directory from my old Octopress directory to the new. git add the images directory and commit. Done!

To tell Jekyll to use the permalink style that your Octopress 2 style site was indexed with, add the following to _config.yml:

# Add to preserve permalinks from Octopress site
permalink: /blog/:year/:month/:day/:title/

This will require a restart of your server.
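
To spot-check that old links will still resolve, you can compute the URL a post should end up with. The sketch below is a simplification and not Jekyll’s own logic: it takes the date and title from the file name only and ignores any date or permalink set in a post’s front matter. The permalink_for name and the list of file extensions are assumptions.

```python
import re

# Dated post file names look like YYYY-MM-DD-title.ext
POST_NAME = re.compile(r"^(\d{4})-(\d{2})-(\d{2})-(.+)\.(?:md|markdown|html)$")


def permalink_for(file_name, pattern="/blog/:year/:month/:day/:title/"):
    """Fill the permalink pattern from a dated post file name."""
    match = POST_NAME.match(file_name)
    if match is None:
        raise ValueError(f"not a dated post file name: {file_name}")
    year, month, day, title = match.groups()
    return (
        pattern.replace(":year", year)
        .replace(":month", month)
        .replace(":day", day)
        .replace(":title", title)
    )
```

For example, permalink_for("2017-05-01-my-post.markdown") gives /blog/2017/05/01/my-post/, which you can compare against a URL from your old site.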

Step 6: Advanced Customization

Custom Templates

To customize the default theme, you’re expected to copy its template OUT of the Gem directory and put it in your site directory. Your copy takes precedence over the default theme’s version and thus you can customize the look and feel of the site.

For example, consider the case that I wanted to modify the footer.

$ grep theme _config.yml
theme: minima

As you can see, I’m running the default theme, “minima.” So I ask bundler where my “minima” is installed:

$ bundle show minima
/home/user/.gem/ruby/2.4.1/gems/minima-2.1.1

I copy footer.html out of the _includes directory inside the path reported by bundle show minima and place it in my own _includes/footer.html file. Thus my file will override the theme-provided file by the same name. The Jekyll customization page covers this well.
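
The copy step can be scripted if you expect to override several files. This is a minimal sketch; override_theme_file is a hypothetical helper, and theme_dir is whatever path bundle show minima printed on your machine.

```python
import shutil
from pathlib import Path


def override_theme_file(theme_dir, site_dir, relative_path):
    """Copy one theme file into the site so the local copy takes precedence."""
    source = Path(theme_dir) / relative_path
    target = Path(site_dir) / relative_path
    if target.exists():
        # Refuse to clobber a file you may already have customized
        raise FileExistsError(f"{target} already exists; not overwriting it")
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, target)
    return target
```

With the paths from this post, the call would be override_theme_file("/home/user/.gem/ruby/2.4.1/gems/minima-2.1.1", ".", "_includes/footer.html").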

Custom CSS

Add custom CSS in assets/main.scss

Custom Pages

Customizing your templates or using their helper “includes” follows the general pattern described above in “Custom Templates.” You take something out of a gem, put it locally which overrides the default behavior, and then you customize.

I wound up adding the following:

  • categories.html
  • _includes/
  • _includes/footer.html
  • _includes/google_search.html
  • _includes/post_body.html
  • _includes/header.html
  • _layouts/
  • _layouts/home.html
  • _templates/
  • _templates/page
  • _templates/draft
  • _templates/post

I’ve included their bodies below (for searchability) but they’ll upset the flow of this post. Consequently let me wind things up and paste them at the absolute end of this post.

Deployment & Conclusion

From there on out, the work is learning Liquid’s template rules so that you can customize your site further. I’ve found Liquid’s documentation to be approachable and easy. It operates like most template frameworks (e.g. ERB, Handlebars) but has a rich set of transformations (“filters”) which make template-layer transformation easy.

I wound up creating another web host that serves my content. So I do a “git push” to a git repository on another server. That server is my web server. Deployment happens by means of a git post-receive hook. After I push to that repository, the hook code runs which effectively jekyll builds the content into a new directory on the remote site. I’ll include that code here as well.

$ cat post-receive
GIT_REPO=/full/path/to/repo
TMP_GIT_CLONE=/full/path/to/build/dir/for/jekyll
PUBLIC_WWW=/full/path/to/directory/served/by/nginx/or/other/webserver

export PATH=/home/user/.rbenv/bin:$PATH
/home/user/.rbenv/bin/rbenv local 2.4.1
/home/user/.rbenv/shims/bundle

git clone $GIT_REPO $TMP_GIT_CLONE
cd $TMP_GIT_CLONE
/home/user/.rbenv/shims/bundle exec jekyll build -s $TMP_GIT_CLONE -d $PUBLIC_WWW
rm -Rf $TMP_GIT_CLONE
exit

This gets Jekyll + Octopress up and running. To use Octopress’ handy scripts like octopress new post or octopress isolate <filename> consult octopress --help. Jekyll + Octopress are a great combination.

And that’s it! Enjoy your screamin’ fast site!

Customized Files

categories.html

---
layout: page
permalink: /categories/
title: Categories
---

<div id="archives">

{% assign sorted_categories = site.categories |sort %}
{% for category in sorted_categories %}
  <div class="archive-group">
    {% capture category_name %}{{ category | first }}{% endcapture %}
    <h3 class="category-head">
      {{ category_name | capitalize }} ({{ site.categories[category_name] | size }})
      <a class="expand" href="#">&nbsp;Expand</a>
    </h3>
    <a name="{{ category_name | slugify }}"></a>
    {% assign sorted_posts = site.categories[category_name] | sort %}
    {% for post in sorted_posts %}
    <article class="archive-item hidden">
      <h4><a href="{{ site.baseurl }}{{ post.url }}">{{post.title}}</a></h4>
    </article>
    {% endfor %}

  </div>
{% endfor %}
</div>

<script type="application/javascript">
Array.from(document.querySelectorAll("a.expand")).forEach(elem => {
  elem.addEventListener('click', e => {
    e.preventDefault();

    let expandLink = e.target;
    let articles = expandLink.parentElement.parentElement.querySelectorAll("article");

    articles.forEach(article => article.classList.toggle("hidden"));
    expandLink.classList.toggle("expanded-category-link");
  });
});

</script>

_includes/footer.html

<footer class="site-footer">

  <div class="wrapper">

    <h2 class="footer-heading">{{ site.title | escape }}</h2>

    <div class="footer-col-wrapper">
      <div class="footer-col footer-col-1">
        <ul class="contact-list">
          <li>
            {% if site.author %}
              {{ site.author | escape }}
            {% else %}
              {{ site.title | escape }}
            {% endif %}
            </li>
            {% if site.email %}
            <li><a href="mailto:{{ site.email }}">{{ site.email }}</a></li>
            {% endif %}
        </ul>
      </div>

      <div class="footer-col footer-col-2">
        <ul class="social-media-list">
          {% if site.github_username %}
          <li>
            {% include icon-github.html username=site.github_username %}
          </li>
          {% endif %}

          {% if site.twitter_username %}
          <li>
            {% include icon-twitter.html username=site.twitter_username %}
          </li>
          {% endif %}
        </ul>
      </div>

      <div class="footer-col footer-col-3">
        <p>{{ site.description | escape }}</p>
      </div>
    </div>

  </div>

</footer>

<script>
  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
  m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
  })(window,document,'script','https://www.google-analytics.com/analytics.js','ga');

  ga('create', 'YOUR GOOGLE ANALYTICS ID HERE', 'auto');
  ga('send', 'pageview');

</script>

_includes/google_search.html

<form action="http://google.com/search" method="get" id="search-form">
  <fieldset role="search">
    <input type="hidden" name="q" value="site:stevengharms.com/" />
    <input class="search" type="text" name="q" results="0" placeholder="Search"/>
  </fieldset>
</form>

_includes/post_body.html

<article class="post" itemscope itemtype="http://schema.org/BlogPosting">
  <header class="post-header">
    <h1 class="post-title" itemprop="name headline">{{ include.content.title | escape }}</h1>
    <p class="post-meta">
      <a href="{{ include.content.url | relative_url }}">
        <time datetime="{{ include.content.date | date_to_xmlschema }}" itemprop="datePublished">
          {% assign date_format = site.minima.date_format | default: "%b %-d, %Y" %}
          {{ include.content.date | date: date_format }}
        </time>
      </a>
      {% if include.content.author %}
        • <span itemprop="author" itemscope itemtype="http://schema.org/Person"><span itemprop="name">{{ include.content.author }}</span></span>
      {% endif %}</p>
  </header>

  <div class="post-content" itemprop="articleBody">
    {% assign word_limit = 100 %}
    {% capture word_count %}{{include.content.content | number_of_words | minus: word_limit}}{% endcapture %}

    {% if include.content.content contains "<!-- more -->" %}
      {% assign words = include.content.content | split: "<!-- more -->"  %}
      {{ words[0] }}
      <a class="preview post-link" href="{{ include.content.url | relative_url }}"><em>Read more</em></a>
    {% elsif word_count contains "-" %}
      {{ include.content.content }}
    {% elsif include.content.noabbrev %}
      {{ include.content.content }}
    {% else %}
      {{ include.content.content | truncatewords: word_limit -}}...
      <a class="preview post-link" href="{{ include.content.url | relative_url }}"><em>Continue</em></a>
    {% endif %}
  </div>

  {% if site.disqus.shortname %}
    {% include disqus_comments.html %}
  {% endif %}
</article>

_includes/header.html

<header class="site-header" role="banner">

  <div class="wrapper">
    {% assign default_paths = site.pages | map: "path" %}
    {% assign page_paths = site.header_pages | default: default_paths %}
    <a class="site-title" href="{{ "/" | relative_url }}">{{ site.title | escape }}</a>


    {% if page_paths %}
      <nav class="site-nav">
        <input type="checkbox" id="nav-trigger" class="nav-trigger" />
        <label for="nav-trigger">
          <span class="menu-icon">
            <svg viewBox="0 0 18 15" width="18px" height="15px">
              <path fill="#424242" d="M18,1.484c0,0.82-0.665,1.484-1.484,1.484H1.484C0.665,2.969,0,2.304,0,1.484l0,0C0,0.665,0.665,0,1.484,0 h15.031C17.335,0,18,0.665,18,1.484L18,1.484z"/>
              <path fill="#424242" d="M18,7.516C18,8.335,17.335,9,16.516,9H1.484C0.665,9,0,8.335,0,7.516l0,0c0-0.82,0.665-1.484,1.484-1.484 h15.031C17.335,6.031,18,6.696,18,7.516L18,7.516z"/>
              <path fill="#424242" d="M18,13.516C18,14.335,17.335,15,16.516,15H1.484C0.665,15,0,14.335,0,13.516l0,0 c0-0.82,0.665-1.484,1.484-1.484h15.031C17.335,12.031,18,12.696,18,13.516L18,13.516z"/>
            </svg>
          </span>
        </label>

        <div class="trigger">
          {% include google_search.html %}
          {% for path in page_paths %}
            {% assign my_page = site.pages | where: "path", path | first %}
            {% if my_page.title %}
            <a class="page-link" href="{{ my_page.url | relative_url }}">{{ my_page.title | escape }}</a>
            {% endif %}
          {% endfor %}
        </div>
      </nav>
    {% endif %}
  </div>
</header>

_layouts/home.html

---
layout: default
---

<div class="home">

  <h1 class="page-heading">Posts</h1>

  {{ content }}

  <ul class="post-list">
    {% assign i = 0 %}
    {% for post in site.posts %}
      {% if i < 3 %}
        {% include post_body.html content=post %}
        <hr/>
      {% else %}
        <li>
          {% assign date_format = site.minima.date_format | default: "%b %-d, %Y" %}
          <span class="post-meta">{{ post.date | date: date_format }}</span>

          <h2>
            <a class="post-link" href="{{ post.url | relative_url }}">{{ post.title | escape }}</a>
          </h2>
        </li>
      {% endif %}
      {% assign i = i | plus: 1 %}
    {% endfor %}
  </ul>

  <p class="rss-subscribe">subscribe <a href="{{ "/feed.xml" | relative_url }}">via RSS</a></p>

</div>

_templates/page

---
layout: {{ layout }}
title: {{ title }}
---

_templates/draft

---
layout: {{ layout }}
title: {{ title }}
---

_templates/post

---
layout: {{ layout }}
title: {{ title }}
date: {{ date }}
---