Reclaiming My Facebook Content
The process of de-Facebooking my life has taken multiple stabs (1, 2, 3, 4, 5), but I have successfully accomplished my goal:
- Delete Facebook Account
- Migrate important life posts that I rue giving to Facebook to a platform of my own…
- …while not losing the pictures or datestamps of the posts (i.e. don’t ruin the historical record)
One can find my former-Facebook posts at the Ex-Facebook Category or by visiting my posts page and looking for the posts marked with the inverted Facebook logo.
If you’re interested in the technical strategy, read on.
Due to changes that I can’t predict in what Facebook publishes as part of your “take-it-with-you” export, I’ll not be providing pure code but rather a strategy. You’ll need beginner-to-intermediate programming experience to do this. I used Python 3.
Assets
- Facebook will let you export your data as JSON. Get this.
- Facebook will let you export your data as HTML. Get this as well. Make your life easy.
- Identify your output platform. In my case, it’s the hugo static site generator.
- Extract the contents of the JSON and HTML exports. I’ll call those directories `$JSON_ROOT` and `$HTML_ROOT` below (see the setup sketch after this list).
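For the sketches that follow, I’ll assume the two extracted exports and the hugo content directory are captured as path constants. These locations are hypothetical; substitute wherever you actually unzipped things.

```python
# Setup sketch: point these at wherever you extracted the two exports and at
# your hugo content directory. All three paths are hypothetical examples.
from pathlib import Path

JSON_ROOT = Path("~/facebook-export-json").expanduser()
HTML_ROOT = Path("~/facebook-export-html").expanduser()
OUTPUT_ROOT = Path("~/my-hugo-site/content/posts").expanduser()
```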
Strategy
- The JSON export has `posts/your_posts_n.json`. I had 3 of these. The goal will be something like, in Python pseudocode:

      for json_post in list_of_posts:
          render_into_output_template(json_post)

  Based on the data in each `json_post` you will create whatever assets your new publication platform needs in order to display your content. Since `json_post` is structured JSON data, we don’t have to worry about vagaries that we might have to sweat by working with the HTML exports. The HTML will be handy later on, though.
- Pseudocode
  - Read in a file of JSON in `$JSON_ROOT/posts/your_posts_n.json`
  - This is an `Array` of JSON `Objects` representing posts; save that as `list_of_posts`
  - Define a function `render_into_output_template` that receives a `json_post` (a minimal sketch of this driver loop follows this list)
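Here is a minimal sketch of that driver loop, assuming the constants from the setup sketch above and the `render_into_output_template` function described in the next section. In the export I received, each `your_posts_n.json` was a top-level JSON array; yours may differ.

```python
# Driver-loop sketch. In the export I received, each your_posts_n.json file
# was a top-level JSON array of post objects; adjust if yours differs.
import json
from pathlib import Path

def load_posts(json_root: Path) -> list:
    """Read every posts/your_posts_n.json file and return one combined list."""
    list_of_posts = []
    for posts_file in sorted((json_root / "posts").glob("your_posts_*.json")):
        with open(posts_file, encoding="utf-8") as f:
            list_of_posts.extend(json.load(f))
    return list_of_posts

if __name__ == "__main__":
    for json_post in load_posts(JSON_ROOT):          # JSON_ROOT from the setup sketch
        render_into_output_template(json_post)       # defined in the next section
```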
- Tasks of `render_into_output_template`
  - Extract the `timestamp` field
  - Open the HTML equivalent of your JSON file (`$JSON_ROOT/posts/your_posts_1.json` means open `$HTML_ROOT/posts/your_posts_1.html`)
    - Search across all `div`s with a `class` of `"pam"`, thus: `div.pam`. Admittedly, this is a very brittle approach and may fail as Facebook changes their export algorithm. Adjustment might be required.
    - Find the node with the same `datestamp` as above (conversion from Epoch seconds will be required)
    - Save that HTML string as `body` (a sketch of this lookup follows this list)
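A sketch of that lookup, assuming the beautifulsoup4 package. The `DATE_FORMAT` string is a guess: inspect how your own HTML export renders post dates and adjust it; this is part of the brittleness noted above.

```python
# Sketch of matching a JSON post to its HTML node via the rendered date.
# Requires beautifulsoup4. DATE_FORMAT is a guess: open your HTML export and
# check exactly how Facebook prints post dates, then adjust the format string.
from datetime import datetime
from bs4 import BeautifulSoup

DATE_FORMAT = "%b %d, %Y, %I:%M %p"   # hypothetical; match your export

def find_post_body(html_file, timestamp):
    """Return the HTML string of the div.pam whose date matches the post's timestamp."""
    rendered_date = datetime.fromtimestamp(timestamp).strftime(DATE_FORMAT)
    with open(html_file, encoding="utf-8") as f:
        soup = BeautifulSoup(f, "html.parser")
    for node in soup.select("div.pam"):              # brittle, as noted above
        if rendered_date in node.get_text():
            return str(node)
    return None                                      # no match; handle upstream
```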
- OK, so you have the HTML body as `body` and the `timestamp`. You’re good to start creating your new owned asset.
- Here’s how it works on my hugo configuration:
  - Create a directory based on the `timestamp`, e.g.: `/content/posts/YYYY-MM-DD-HH-MM`
  - Populate `/content/posts/YYYY-MM-DD-HH-MM/index.md`
    - I did this by defining a template using the “Jinja” Python library (a sketch follows this list)
    - This template is what my hugo expects to see for a post
    - Thus, for each `body` I extracted, I put that in the (as hugo calls it) “content” section of my file
    - In each `index.md` file I also created the required (as hugo calls it) “front-matter” section. The main focus was to put my `timestamp` into the `date:` field so that the timestamp on the post matched the directory name `YYYY-MM-DD-HH-MM` that the `index.md` is in. I do this so that my default `ls` command sorts correctly.
    - 🎉 You now have your text content in memory. Don’t write it yet, but do celebrate!
  - Populate `/content/posts/YYYY-MM-DD-HH-MM/images`
    - Take a look at the `body` you saved above. That’s HTML. Use an HTML parser like “beautiful soup” to find all the `<img>`, `<video>`, or other media links, and extract their `src` attributes (a sketch of this step follows this list). That will be the location, within your HTML export, on your hard disk where the image is. For example, in my export the path looked like: `<video src="photos_and_videos/videos/obscure_file_name.mp4"...`. This `src` value may require further massaging before it is written into your file. Read on for more about path massaging.
    - Now that you know the relative path and the file name, copy the file from `$HTML_ROOT/photos_and_videos...` into your `/content/posts/YYYY-MM-DD-HH-MM/images/` directory
    - Because of the way hugo treats `images` directories within a given post (see: page bundle), it treats `/content/posts/YYYY-MM-DD-HH-MM/` as a single unit. From within the `index.md` you can access the file as `src="/images/posts/YYYY-MM-DD-HH-MM/images/obscure_file_name.mp4"`
    - HOWEVER, this can be customized in LOTS of ways. Maybe you need to change the pathing prefix of your links, etc. It’s hard to be prescriptive here since hugo and web site ownership both allow for a lot of custom definitions, redirects, etc. Your situation will have some wrinkles I can’t predict. You might need to revisit the previous step where you captured the `src` path to images and make some adjustments.
    - Write the filled-in template with your massaged data to `/content/posts/YYYY-MM-DD-HH-MM/index.md`
    - 🎉 You now have your page bundle: a new directory, whose name is based on the `timestamp` you got from the JSON payload; inside of this, there is an `index.md` with appropriate front-matter along with the (massaged where necessary) HTML content extracted from your Facebook HTML export. Within the `index.md` files, you refer to assets that are locally contained within the `images` subdirectory.
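A sketch of that copy-and-rewrite step, again with beautiful soup. The rewritten `src` prefix is a placeholder; as noted above, the right value depends entirely on your hugo configuration.

```python
# Media-copy sketch: pull src attributes out of the saved body with
# BeautifulSoup, copy each file from the HTML export into the bundle's
# images/ directory, and rewrite the reference. The rewritten prefix is a
# placeholder; adjust it to whatever your hugo setup actually serves.
import shutil
from pathlib import Path
from bs4 import BeautifulSoup

def copy_media_and_rewrite(body: str, html_root: Path, bundle_dir: Path) -> str:
    """Return body with <img>/<video> srcs pointed at the local images/ copies."""
    soup = BeautifulSoup(body, "html.parser")
    images_dir = bundle_dir / "images"
    images_dir.mkdir(parents=True, exist_ok=True)
    for node in soup.find_all(["img", "video"]):
        src = node.get("src")
        if not src:
            continue
        source_file = html_root / src                # e.g. photos_and_videos/videos/...
        if source_file.exists():
            shutil.copy2(source_file, images_dir / source_file.name)
        node["src"] = f"images/{source_file.name}"   # placeholder prefix; massage as needed
    return str(soup)
```

Once the body has been massaged this way, writing the filled-in template is a single `write_text()` call on `bundle_dir / "index.md"`.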
  - Look in the local `hugo server` website. You should see your posts rendered with their images.
- Adjust. Realistically, you’re not likely to get this right on the first try. Consider importing the JSON of a single post first. Iterate on getting a page-bundle-compatible output directory, with an `index.md` and image assets copied. Once you’re sure that’s right, try it against your full data set.
Conclusion
While it’s not as convenient as if I had written a universal importer for you, this is the strategy I used to export and migrate my content. I’m so much happier not having Facebook properties in my life. I hope you can find your way to that place, too.