Reclaiming My Facebook Content
- 5 minutes read - 915 words
The process of de-Facebooking my life has taken multiple stabs (1, 2, 3, 4, 5), but I have successfully accomplished my goals:
- Delete Facebook Account
- Migrate important life posts that I rue giving to Facebook to a platform of my own…
- …while not losing the pictures or datestamps of the posts (i.e. don’t ruin the historical record)
One can find my former-Facebook posts at: Ex-Facebook Category or by
visiting my posts page and looking for the posts marked
with the inverted Facebook logo.
If you’re interested in the technical strategy, read on.
Due to changes that I can’t predict in what Facebook publishes as part of your “take-it-with-you” export, I’ll not be providing pure code but rather a strategy. You’ll need beginner-to-intermediate programming experience to do this. I used Python 3.
Assets
- Facebook will let you export your data as JSON. Get this.
- Facebook will let you export your data as HTML. Get this as well. Make your life easy.
- Identify your output platform. In my case, it’s the hugo static site generator.
- Extract the contents of the JSON and HTML directories. I’ll call those $JSON_ROOT and $HTML_ROOT below.
Strategy
- The JSON export has posts/your_posts_n.json. I had 3 of these. The goal will be something like this, in Python pseudocode:

      for json_post in list_of_posts:
          render_into_output_template(json_post)

  Based on the data in each json_post, you will create whatever assets your new publication platform needs in order to display your content. Since json_post is structured JSON data, we don’t have to worry about the vagaries that we might have to sweat when working with the HTML exports. The HTML will be handy later on, though.
- Pseudocode
  - Read in a file of JSON in $JSON_ROOT/posts/your_posts_n.json
  - This is an Array of JSON Objects representing posts; save that as list_of_posts
  - Define a function render_into_output_template that receives json_post
- Tasks of render_into_output_template
  - Extract the timestamp field
  - Open the HTML equivalent of your JSON file ($JSON_ROOT/posts/your_posts_1.json means open $HTML_ROOT/posts/your_posts_1.html)
    - Search across all divs with class of "pam", thus: div.pam. Admittedly, this is a very brittle approach and may fail as Facebook changes their export algorithm. Adjustment might be required.
    - Find the node whose datestamp matches the timestamp above (conversion from Epoch seconds will be required)
    - Save that HTML string as body
  - OK, so you have the HTML body as body and the timestamp. You’re good to start creating your new owned asset
  - Here’s how it works on my hugo configuration
    - Create a directory based on the timestamp, e.g.: /content/posts/YYYY-MM-DD-HH-MM
    - Populate /content/posts/YYYY-MM-DD-HH-MM/index.md
      - I did this by defining a template using the “Jinja” Python library
      - This template is what my hugo expects to see for a post
      - Thus for each body I extracted, I put that in the (as hugo calls it) “content” section of my file
      - In each index.md file I also created the required (as hugo calls it) “front-matter” section. The main focus was to put my timestamp into the date: field so that the timestamp on the post matched the directory name YYYY-MM-DD-HH-MM that the index.md is in. I do this so that my default ls command sorts correctly.
      - 🎉 You now have your text content in memory. Don’t write it yet, but do celebrate!
    - Populate /content/posts/YYYY-MM-DD-HH-MM/images
      - Take a look at the body you saved above. That’s HTML. Use an HTML parser like “beautiful soup” to find all the <img>, <video> or other media links. Extract their src attributes. Each src is the location, within your HTML export on your hard disk, of the media file. For example, in my export the path looked like: <video src="photos_and_videos/videos/obscure_file_name.mp4".... This src value may require further massaging before it is written into your file. Read on for more about path massaging
      - Now that you know the relative path and the file name, copy the file from $HTML_ROOT/photos_and_videos... into your /content/posts/YYYY-MM-DD-HH-MM/images/ directory
      - Because of the way hugo treats images directories within a given post (see: page bundle), it treats /content/posts/YYYY-MM-DD-HH-MM/ as a single unit. From within the index.md you can access the file as src="/images/posts/YYYY-MM-DD-HH-MM/images/obscure_file_name.mp4"
      - HOWEVER, this can be customized in LOTS of ways. Maybe you need to change the pathing prefix of your links, etc. It’s hard to be prescriptive here since hugo and web site ownership both allow for a lot of custom definitions, redirects, etc. Your situation will have some wrinkles I can’t predict. You might need to revisit the previous step, where you captured the src path to images, and make some adjustments.
      - Write the filled-in template with your massaged data to /content/posts/YYYY-MM-DD-HH-MM/index.md
      - 🎉 You now have your page bundle: a new directory, whose name is based on the timestamp you got from the JSON payload; inside of this, there is an index.md with appropriate front-matter along with the (massaged where necessary) HTML content extracted from your Facebook HTML export. Within the index.md files, you refer to assets that are locally contained within the images subdirectory
    - Look in the local hugo server website. You should see your posts rendered with their images.
  - Adjust. Realistically, you’re not likely to get this right on the first try. Consider making a JSON import of one post. Iterate on getting a page-bundle-compatible output directory, with index.md with image assets copied. Once you’re sure that’s right, try it against your full data set
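
To make the outer loop from the strategy concrete, here is a minimal sketch rather than the exact code behind this migration. It assumes the export layout described above (posts/your_posts_n.json under $JSON_ROOT, with a same-named .html file under $HTML_ROOT). The names html_twin and migrate are hypothetical, and passing the HTML file path as a second argument to render_into_output_template is an addition for illustration. The renderer here is only a placeholder.

```python
import json
from pathlib import Path


def html_twin(json_file, json_root, html_root):
    """Map $JSON_ROOT/posts/x.json to $HTML_ROOT/posts/x.html."""
    relative = Path(json_file).relative_to(json_root)
    return Path(html_root) / relative.with_suffix(".html")


def render_into_output_template(json_post, html_file):
    """Placeholder: the HTML lookup and templating described above go here."""
    print(json_post.get("timestamp"), html_file)


def migrate(json_root, html_root):
    """Feed every post in every your_posts_*.json file to the renderer."""
    count = 0
    for json_file in sorted(Path(json_root).glob("posts/your_posts_*.json")):
        # Each file is a JSON Array of Objects, one Object per post.
        list_of_posts = json.loads(json_file.read_text(encoding="utf-8"))
        html_file = html_twin(json_file, json_root, html_root)
        for json_post in list_of_posts:
            render_into_output_template(json_post, html_file)
            count += 1
    return count
```

Calling migrate with your two export directories walks every post once, which makes it a convenient place to start with a single post and iterate.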
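
The page-bundle steps can be sketched in the same spirit. This is an illustration under stated assumptions, not the original implementation: it swaps in the standard library's html.parser for Beautiful Soup and string.Template for Jinja so that it runs without installing anything. The front-matter fields, the UTC conversion, the relative images/ link prefix, and the names MediaSrcCollector, bundle_name and write_page_bundle are all hypothetical choices you will likely need to adjust for your own site.

```python
import shutil
from datetime import datetime, timezone
from html.parser import HTMLParser
from pathlib import Path
from string import Template

# Hypothetical front matter; adjust to whatever your hugo theme expects.
INDEX_TEMPLATE = Template(
    "---\n"
    'title: "$title"\n'
    "date: $date\n"
    "---\n"
    "\n"
    "$body\n"
)


class MediaSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> and <video> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag in ("img", "video"):
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def bundle_name(timestamp):
    """Turn Epoch seconds into YYYY-MM-DD-HH-MM (UTC is an assumption)."""
    moment = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d-%H-%M")


def write_page_bundle(timestamp, body, html_root, content_root):
    """Create <content_root>/posts/<bundle>/ holding index.md and images/."""
    bundle = Path(content_root) / "posts" / bundle_name(timestamp)
    images = bundle / "images"
    images.mkdir(parents=True, exist_ok=True)

    collector = MediaSrcCollector()
    collector.feed(body)
    for src in collector.sources:
        source_file = Path(html_root) / src
        if source_file.is_file():
            shutil.copy2(source_file, images / source_file.name)
        # Point the link at the copy inside the bundle. The right prefix
        # depends on your site configuration; this one is a guess.
        body = body.replace(src, f"images/{source_file.name}")

    date = datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat()
    index_md = INDEX_TEMPLATE.substitute(
        title=f"Facebook post {bundle.name}", date=date, body=body
    )
    (bundle / "index.md").write_text(index_md, encoding="utf-8")
    return bundle
```

Running write_page_bundle against a single post first, then checking the result in the local hugo server, matches the adjust-and-iterate advice above.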
Conclusion
While it’s not as convenient as me having written a universal importer for you, this is the strategy I used to export and migrate my content. I’m so much happier not having Facebook properties in my life. I hope you can find your way to that place, too.