Reclaiming My Facebook Content
The process of de-Facebooking my life has taken multiple stabs (1, 2, 3, 4, 5), but I have successfully accomplished my goal:
- Delete Facebook Account
- Migrate important life posts that I rue giving to Facebook to a platform of my own…
- …while not losing the pictures or datestamps of the posts (i.e. don’t ruin the historical record)
One can find my former-Facebook posts at the Ex-Facebook Category or by visiting my posts page and looking for the posts marked with the inverted Facebook logo.
If you’re interested in the technical strategy, read on.
Due to changes that I can’t predict in what Facebook publishes as part of your “take-it-with-you” export, I’ll not be providing pure code but rather a strategy. You’ll need beginner-to-intermediate programming experience to do this. I used Python 3.
Assets
- Facebook will let you export your data as JSON. Get this.
- Facebook will let you export your data as HTML. Get this as well. Make your life easy.
- Identify your output platform. In my case, it’s the hugo static site generator.
- Extract the contents of the JSON and HTML exports. I’ll call those directories `$JSON_ROOT` and `$HTML_ROOT` below (see the setup sketch after this list).
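For the sketches that follow, I’ll assume the two extracted exports and the hugo content directory are captured as path constants. These locations are hypothetical; substitute wherever you actually unzipped things.

```python
# Setup sketch: point these at wherever you extracted the two exports and at
# your hugo content directory. All three paths are hypothetical examples.
from pathlib import Path

JSON_ROOT = Path("~/facebook-export-json").expanduser()
HTML_ROOT = Path("~/facebook-export-html").expanduser()
OUTPUT_ROOT = Path("~/my-hugo-site/content/posts").expanduser()
```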
Strategy
- The JSON export has `posts/your_posts_n.json`. I had 3 of these. The goal will be something like, in Python pseudocode:

      for json_post in list_of_posts:
          render_into_output_template(json_post)

  Based on the data in each `json_post` you will create whatever assets your new publication platform needs in order to display your content. Since `json_post` is structured JSON data, we don’t have to worry about vagaries that we might have to sweat by working with the HTML exports. The HTML will be handy later on, though.
- Pseudocode
  - Read in a file of JSON in `$JSON_ROOT/posts/your_posts_n.json`
  - This is an `Array` of JSON `Objects` representing posts; save that as `list_of_posts`
  - Define a function `render_into_output_template` that receives a `json_post` (a minimal sketch of this driver loop follows this list)
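Here is a minimal sketch of that driver loop, assuming the constants from the setup sketch above and the `render_into_output_template` function described in the next section. In the export I received, each `your_posts_n.json` was a top-level JSON array; yours may differ.

```python
# Driver-loop sketch. In the export I received, each your_posts_n.json file
# was a top-level JSON array of post objects; adjust if yours differs.
import json
from pathlib import Path

def load_posts(json_root: Path) -> list:
    """Read every posts/your_posts_n.json file and return one combined list."""
    list_of_posts = []
    for posts_file in sorted((json_root / "posts").glob("your_posts_*.json")):
        with open(posts_file, encoding="utf-8") as f:
            list_of_posts.extend(json.load(f))
    return list_of_posts

if __name__ == "__main__":
    for json_post in load_posts(JSON_ROOT):          # JSON_ROOT from the setup sketch
        render_into_output_template(json_post)       # defined in the next section
```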
- Tasks of `render_into_output_template`
  - Extract the `timestamp` field
  - Open the HTML equivalent of your JSON file (`$JSON_ROOT/posts/your_posts_1.json` means open `$HTML_ROOT/posts/your_posts_1.html`)
    - Search across all `div`s with a `class` of `"pam"`, thus: `div.pam`. Admittedly, this is a very brittle approach and may fail as Facebook changes their export algorithm. Adjustment might be required.
    - Find the node with the same `datestamp` as above (conversion from Epoch seconds will be required)
    - Save that HTML string as `body` (a sketch of this lookup follows this list)
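A sketch of that lookup, assuming the beautifulsoup4 package. The `DATE_FORMAT` string is a guess: inspect how your own HTML export renders post dates and adjust it; this is part of the brittleness noted above.

```python
# Sketch of matching a JSON post to its HTML node via the rendered date.
# Requires beautifulsoup4. DATE_FORMAT is a guess: open your HTML export and
# check exactly how Facebook prints post dates, then adjust the format string.
from datetime import datetime
from bs4 import BeautifulSoup

DATE_FORMAT = "%b %d, %Y, %I:%M %p"   # hypothetical; match your export

def find_post_body(html_file, timestamp):
    """Return the HTML string of the div.pam whose date matches the post's timestamp."""
    rendered_date = datetime.fromtimestamp(timestamp).strftime(DATE_FORMAT)
    with open(html_file, encoding="utf-8") as f:
        soup = BeautifulSoup(f, "html.parser")
    for node in soup.select("div.pam"):              # brittle, as noted above
        if rendered_date in node.get_text():
            return str(node)
    return None                                      # no match; handle upstream
```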
- OK, so you have the HTML body as `body` and the `timestamp`. You’re good to start creating your new owned asset.
- Here’s how it works on my hugo configuration:
  - Create a directory based on the `timestamp`, e.g.: `/content/posts/YYYY-MM-DD-HH-MM`
  - Populate `/content/posts/YYYY-MM-DD-HH-MM/index.md`
    - I did this by defining a template using the “Jinja” Python library (a sketch follows this list)
    - This template is what my hugo expects to see for a post
    - Thus, for each `body` I extracted, I put that in the (as hugo calls it) “content” section of my file
    - In each `index.md` file I also created the required (as hugo calls it) “front-matter” section. The main focus was to put my `timestamp` into the `date:` field so that the timestamp on the post matched the directory name `YYYY-MM-DD-HH-MM` that the `index.md` is in. I do this so that my default `ls` command sorts correctly.
    - 🎉 You now have your text content in memory. Don’t write it yet, but do celebrate!
  - Populate `/content/posts/YYYY-MM-DD-HH-MM/images`
    - Take a look at the `body` you saved above. That’s HTML. Use an HTML parser like “beautiful soup” to find all the `<img>`, `<video>`, or other media links, and extract their `src` attributes (a sketch of this step follows this list). That will be the location, within your HTML export, on your hard disk where the image is. For example, in my export the path looked like: `<video src="photos_and_videos/videos/obscure_file_name.mp4"...`. This `src` value may require further massaging before it is written into your file. Read on for more about path massaging.
    - Now that you know the relative path and the file name, copy the file from `$HTML_ROOT/photos_and_videos...` into your `/content/posts/YYYY-MM-DD-HH-MM/images/` directory
    - Because of the way hugo treats `images` directories within a given post (see: page bundle), it treats `/content/posts/YYYY-MM-DD-HH-MM/` as a single unit. From within the `index.md` you can access the file as `src="/images/posts/YYYY-MM-DD-HH-MM/images/obscure_file_name.mp4"`
    - HOWEVER, this can be customized in LOTS of ways. Maybe you need to change the pathing prefix of your links, etc. It’s hard to be prescriptive here since hugo and web site ownership both allow for a lot of custom definitions, redirects, etc. Your situation will have some wrinkles I can’t predict. You might need to revisit the previous step where you captured the `src` path to images and make some adjustments.
    - Write the filled-in template with your massaged data to `/content/posts/YYYY-MM-DD-HH-MM/index.md`
    - 🎉 You now have your page bundle: a new directory, whose name is based on the `timestamp` you got from the JSON payload; inside of this, there is an `index.md` with appropriate front-matter along with the (massaged where necessary) HTML content extracted from your Facebook HTML export. Within the `index.md` files, you refer to assets that are locally contained within the `images` subdirectory.
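A sketch of that copy-and-rewrite step, again with beautiful soup. The rewritten `src` prefix is a placeholder; as noted above, the right value depends entirely on your hugo configuration.

```python
# Media-copy sketch: pull src attributes out of the saved body with
# BeautifulSoup, copy each file from the HTML export into the bundle's
# images/ directory, and rewrite the reference. The rewritten prefix is a
# placeholder; adjust it to whatever your hugo setup actually serves.
import shutil
from pathlib import Path
from bs4 import BeautifulSoup

def copy_media_and_rewrite(body: str, html_root: Path, bundle_dir: Path) -> str:
    """Return body with <img>/<video> srcs pointed at the local images/ copies."""
    soup = BeautifulSoup(body, "html.parser")
    images_dir = bundle_dir / "images"
    images_dir.mkdir(parents=True, exist_ok=True)
    for node in soup.find_all(["img", "video"]):
        src = node.get("src")
        if not src:
            continue
        source_file = html_root / src                # e.g. photos_and_videos/videos/...
        if source_file.exists():
            shutil.copy2(source_file, images_dir / source_file.name)
        node["src"] = f"images/{source_file.name}"   # placeholder prefix; massage as needed
    return str(soup)
```

Once the body has been massaged this way, writing the filled-in template is a single `write_text()` call on `bundle_dir / "index.md"`.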
  - Look in the local `hugo server` website. You should see your posts rendered with their images.
- Adjust. Realistically, you’re not likely to get this right on the first try. Consider importing the JSON of a single post first. Iterate on getting a page-bundle-compatible output directory, with an `index.md` and image assets copied. Once you’re sure that’s right, try it against your full data set.
Conclusion
While it’s not as convenient as if I had written a universal importer for you, this is the strategy I used to export and migrate my content. I’m so much happier not having Facebook properties in my life. I hope you can find your way to that place, too.