Introduction

I just discovered ArchiveBox in my GitHub feed.

ArchiveBox lets you store snapshots of webpages as they appeared at a specific time.

It is still new to me, but from what I have seen, my workflow will look something like this:

  • to store copies of interesting webpages that I may want to read again later, i.e., my bookmarks, and then use these archives:
    • as a backup link when the original page is outdated
    • to compare how a webpage has changed over time (diff)
    • as a list of my interesting links
  • to periodically monitor changes to webpages I want to follow over time, e.g., my public social profiles or this website

To keep this easy to maintain, I want to only update a Google Spreadsheet and never touch a shell again.

My setup

First, write some links in a Google Spreadsheet document.

Then, publish the document to the web in CSV format.

Finally, create a script that fetches the links from the published CSV and runs the archiver against those URLs.
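To make the idea concrete, here is a minimal sketch of such a script in Python. It is not the Makefile used in this post, only an illustration under a few assumptions: CSV_URL is a hypothetical placeholder for the published CSV link, every cell starting with http:// or https:// is treated as a link, and ArchiveBox is assumed to run as a docker-compose service named archivebox.

```python
import csv
import io
import subprocess
import urllib.request

# Hypothetical placeholder: replace with the published CSV link of your spreadsheet.
CSV_URL = "https://example.com/links.csv"


def fetch_csv(url):
    """Download the published CSV and decode it as UTF-8 text."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def extract_urls(csv_text):
    """Return every cell that looks like an http(s) URL, in order, without duplicates."""
    urls = []
    for row in csv.reader(io.StringIO(csv_text)):
        for cell in row:
            cell = cell.strip()
            if cell.startswith(("http://", "https://")) and cell not in urls:
                urls.append(cell)
    return urls


def archive(urls):
    """Hand each URL to ArchiveBox.

    Assumption: ArchiveBox is a docker-compose service named "archivebox".
    """
    for url in urls:
        subprocess.run(
            ["docker-compose", "run", "--rm", "archivebox", "add", url],
            check=True,
        )
```

Calling archive(extract_urls(fetch_csv(CSV_URL))) would then run one full pass: download the sheet, pick out the links, and archive each of them.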

Here is my custom Makefile:

And my adapted docker-compose.yml file:

Run make loop inside tmux, or use any other way of keeping a process running in the background.
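For readers who prefer to see the looping idea spelled out, here is a rough Python sketch of a periodic runner. It is an illustration only, not the loop target from my Makefile: the function name run_periodically, the interval, and the max_runs and sleep parameters (there mainly to make the loop easy to try out) are all assumptions.

```python
import time


def run_periodically(task, interval_seconds, max_runs=None, sleep=time.sleep):
    """Call task() repeatedly, waiting interval_seconds between runs.

    With max_runs=None the loop never ends, which is what you want
    inside tmux. A failing run is reported and does not stop the loop.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        try:
            task()
        except Exception as error:
            print(f"run failed: {error}")
        runs += 1
        # Skip the final wait when a run limit has been reached.
        if max_runs is None or runs < max_runs:
            sleep(interval_seconds)
    return runs
```

For example, run_periodically(my_archive_pass, 3600) would call a hypothetical my_archive_pass function once an hour until the process is stopped.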

Result

I now only need to add new links to a Google Spreadsheet and let my script do the rest.