===============
== bacardi55 ==
===============
ἕν οἶδα ὅτι οὐδὲν οἶδα

An always up to date BlogRoll page


Nota: This post is tagged as a long post, meaning you may want to prepare yourself a coffee or a drink of your choice before you start reading this page :).

Introduction

I’ve finally taken the time to work on a better way to update my blogroll page. As I wrote previously, my first export was manual. As a reminder, it went like this:

  • Export the OPML file from Miniflux
  • Remove unwanted sections from the OPML file
  • Convert the OPML file to JSON - for this I used the online service BeautifyTools
  • Add the json file to the data directory in hugo
  • Leverage hugo’s data template capability, which can read files like json (or yaml, toml, csv) and use them directly in templates as data
  • Use a hugo shortcode to include the blog list from a markdown file

But it was too manual for my taste, and probably the reason I haven’t updated it since. The recent conversation among a few bloggers (myself included) reminded me of this “flaw”, so I decided to change that. It went further than expected, and I’m happy about the new process.

The idea

As I’ve explained before, my feed reader contains more than bloggers’ feeds. I have private feeds (eg: webmention.io or reddit), a couple of youtube feeds, some news and sport related ones, etc… And I don’t want to share them all on the blogroll page because it wouldn’t make sense. Only bloggers (or adjacent) need to be in there. I don’t care about the subject of the blog though, so I don’t exclude anyone from my 2 main categories: “English Bloggers” and “French Bloggers”.

In short, I needed to first remove feeds from unwanted categories.

Next was the display of the page itself. After the first export, I transformed the redacted opml file to json and used this json file as the data for hugo (via shortcodes) to create the page (read the previous post about it for more info). It was nice, but annoying to do. So my next idea was to use the opml file itself as the data in the hugo shortcode. Turns out it is possible, as long as the file extension is xml and not opml.

Step 2 was to change how hugo builds the page, using an xml file instead of a json file.

And I thought that was it, but then I realized it would not be difficult to also automate the export of the opml file instead of doing it manually via the UI, so I added that too. From there, adding this to the automated build workflow on sourcehut was a piece of cake :).

Let’s deep dive!

The opml format

Just as a side note, this is what the exported file looks like (opml v2 standard format):

<?xml version="1.0" encoding="UTF-8"?>
<opml version="2.0">
    <head>
        <title>Miniflux</title>
        <dateCreated>Thu, 11 Jul 2024 08:15:29 UTC</dateCreated>
    </head>
    <body>
        <outline text="English Blogger">
            <outline title="title of the feed" text="description of the feed" xmlUrl="<url to the feed>" htmlUrl="<url of the web page>" type="rss"></outline>
            <outline title="title of the feed" text="description of the feed" xmlUrl="<url to the feed>" htmlUrl="<url of the web page>" type="rss"></outline>
            […]
        </outline>
        <outline text="French Blogger">
            […]
        </outline>
        […]
    </body>
</opml>

Within the <body> tag, the first-level elements (<outline text="English Blogger">) are the categories. Within those categories are the feeds themselves.
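This structure can be walked with python’s built-in ElementTree module. Here is a minimal, self-contained sketch (the sample data below is made up for illustration):

```python
import xml.etree.ElementTree as ET

# Made-up sample mirroring the structure described above.
sample = """<opml version="2.0">
  <body>
    <outline text="English Blogger">
      <outline title="Some blog" xmlUrl="https://example.com/feed" htmlUrl="https://example.com" type="rss"/>
    </outline>
    <outline text="News">
      <outline title="Some news site" xmlUrl="https://news.example.com/feed" htmlUrl="https://news.example.com" type="rss"/>
    </outline>
  </body>
</opml>"""

root = ET.fromstring(sample)
# First-level <outline> elements are categories; their children are feeds.
categories = {
    category.get("text"): [feed.get("title") for feed in category]
    for category in root.find("body")
}
print(categories)  # {'English Blogger': ['Some blog'], 'News': ['Some news site']}
```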

V1: Manual export

So starting with a feeds.opml file, I needed to remove the categories I didn’t want.

I decided to code this little tool in python (because why not), and instead of installing extra dependencies to manage opml files, I simply used the xml library built into python. I know there are extra libraries to help, but for my use case they didn’t bring any value, and generating an opml file with “just” the standard xml python library was easy. So easy that the script is less than 40 lines:

import xml.etree.ElementTree as ET

## Configuration ##
blogroll_title = "Title of the blogroll"
# List of categories to keep:
to_keep = ["English Blogger", "French Blogger"]
# Path to the original opml file:
opml_file = '/path/to/feeds.opml'
# Export to hugo data directory (must be xml extension, not opml)
output_data_file = '/path/to/hugo/data/blogs.xml'
# Export to hugo static directory to link the file directly in the page for download:
output_opml_file = '/path/to/hugo/static/files/blogs.opml'
## /configuration ##

tree = ET.parse(opml_file)
root = tree.getroot()

opml = ET.Element('opml')
opml.set('version', '2.0')
head = ET.SubElement(opml, 'head')
ET.SubElement(head, 'title').text = blogroll_title
ET.SubElement(head, 'dateCreated').text = root.find('head').find('dateCreated').text

body = ET.SubElement(opml, 'body')

for o in root.find("body"):
    if o.get('text') in to_keep:
        t = ET.SubElement(body, 'outline')
        t.text = o.get('text')
        for child in o:
            feed = ET.SubElement(t, 'outline')
            feed.set('title', child.get('title'))
            feed.set('text', child.get('text'))
            feed.set('xmlUrl', child.get('xmlUrl'))
            feed.set('htmlUrl', child.get('htmlUrl'))
            feed.set('type', child.get('type'))


tree = ET.ElementTree(opml)
tree.write(output_data_file, encoding='utf-8', xml_declaration=True)
tree.write(output_opml_file, encoding='utf-8', xml_declaration=True)

Simple and reusable. All you need for this script is to configure 5 small things:

  • blogroll_title which contains the title of the opml
  • to_keep which contains the different categories to keep from the analyzed opml file
  • opml_file is the full path to the feeds.opml file (including its filename)
  • output_data_file is the full path to the file to create for hugo to use as data. Must be a .xml file.
  • output_opml_file is the full path to the file to create for hugo static file.

output_data_file and output_opml_file will be exactly the same; only their names (and locations) differ. That’s because one is used by hugo as a data source and must have an xml extension, while the other is a file that I’m now making available to download from the blogroll page itself, so it needs to be in the static folder. I could have exported it once and copied it over, but this sounded easier (I am lazy)…
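Since both files are written from the same in-memory tree, they end up byte-for-byte identical. A quick sketch (with placeholder paths, not the real ones from the script) illustrating the double write:

```python
import filecmp
import os
import tempfile
import xml.etree.ElementTree as ET

# Minimal tree standing in for the filtered blogroll.
opml = ET.Element('opml', version='2.0')
ET.SubElement(opml, 'body')
tree = ET.ElementTree(opml)

# Placeholder paths; in the real setup these point to hugo's data/ and static/files/ directories.
tmp = tempfile.mkdtemp()
output_data_file = os.path.join(tmp, 'blogs.xml')
output_opml_file = os.path.join(tmp, 'blogs.opml')

tree.write(output_data_file, encoding='utf-8', xml_declaration=True)
tree.write(output_opml_file, encoding='utf-8', xml_declaration=True)

# Only the name and location differ; the content is identical.
assert filecmp.cmp(output_data_file, output_opml_file, shallow=False)
```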

And that’s it: you get as output a nice opml file (two for the price of one!) that can be used as the data source to generate the page.

Hugo updates to use the new xml file

My first version used a json file instead of this xml file. But transforming the format wasn’t necessary, as hugo can handle xml files as a data source just as well. After adding the xml file in the right place (the data folder within hugo), I simply had to change my custom shortcode to parse and display the data. Refer to my previous post for more information about hugo shortcodes, but the new one looks like this:

{{ $filename := $.Get 0 }}
{{ $data := index .Site.Data $filename }}
{{ with $data.body.outline }}
  {{ range . }}
    <div>
      <h3 id="{{ index . "#text" | urlize }}">{{ index . "#text" }} ({{ .outline | len }})</h3>
      <ul>
        {{ range .outline }}
          <li><a href="{{ index . "-htmlUrl" }}">{{ index . "-title" }}</a></li>
        {{ end }}
      </ul>
    </div>
  {{ end }}
{{ end }}

Placed in layouts/shortcodes/blogroll.html.

And that was it for displaying the data!
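As a side note on the keys used in the shortcode: hugo exposes XML attributes with a “-” prefix and element text under “#text”. A rough python equivalent of that mapping (an illustration only, not hugo’s actual implementation):

```python
import xml.etree.ElementTree as ET

def to_hugo_map(elem):
    """Rough sketch of how hugo exposes an XML element as a map:
    attributes get a "-" prefix, element text goes under "#text"."""
    data = {"-" + key: value for key, value in elem.attrib.items()}
    if elem.text and elem.text.strip():
        data["#text"] = elem.text.strip()
    children = [to_hugo_map(child) for child in elem]
    if children:
        data["outline"] = children  # all nested elements here are <outline>
    return data

# A category element as produced by the export script (feed data made up):
category = ET.fromstring(
    '<outline>English Blogger'
    '<outline title="Some blog" htmlUrl="https://example.com" /></outline>'
)
m = to_hugo_map(category)
print(m["#text"])                 # English Blogger
print(m["outline"][0]["-title"])  # Some blog
```

This is why the shortcode reads the category name with `index . "#text"` and the feed attributes with `index . "-title"` and `index . "-htmlUrl"`.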

At this stage, I thought I was done because I achieved the goal I had: simplify the creation of the blogroll page by automating the data creation from time to time…

V2: Semi Automated

But that wasn’t enough (apparently), so I decided to take it one step further. What about not having to think about exporting this file from time to time, and having the script download it for me instead?

Turns out, there is a nice python client to work with the miniflux API! Granted, it adds a dependency to the script, but the gain here is worth it (IMHO). So I edited my script to make it download the file using the API, removing yet another step!

First, I installed the client using pip in a virtual env:

python -m venv "./opml"
cd "./opml"
./bin/pip install miniflux
./bin/python path/to/python/script.py

Then I edited the python script to:

import os
import miniflux
import xml.etree.ElementTree as ET

## Configuration ##
# List of categories to keep:
to_keep = ["English Blogger", "French Blogger"]
# Export to hugo data directory (must be xml extension, not opml)
output_data_file = '/path/to/hugo/data/blogs.xml'
# Export to hugo static directory to link the file directly in the page for download:
output_opml_file = '/path/to/hugo/static/files/blogs.opml'
## /configuration ##

client = miniflux.Client(os.environ["MINIFLUX_URL"], api_key=os.environ["MINIFLUX_API_TOKEN"])
opml = client.export_feeds()

root = ET.fromstring(opml)

opml = ET.Element('opml')
opml.set('version', '2.0')
head = ET.SubElement(opml, 'head')
ET.SubElement(head, 'title').text = "Bacardi55's blogroll"
ET.SubElement(head, 'dateCreated').text = root.find('head').find('dateCreated').text

body = ET.SubElement(opml, 'body')

for o in root.find("body"):
    if o.get('text') in to_keep:
        t = ET.SubElement(body, 'outline')
        t.text = o.get('text')
        for child in o:
            feed = ET.SubElement(t, 'outline')
            feed.set('title', child.get('title'))
            feed.set('text', child.get('text'))
            feed.set('xmlUrl', child.get('xmlUrl'))
            feed.set('htmlUrl', child.get('htmlUrl'))
            feed.set('type', child.get('type'))


tree = ET.ElementTree(opml)
tree.write(output_data_file, encoding='utf-8', xml_declaration=True)
tree.write(output_opml_file, encoding='utf-8', xml_declaration=True)

For the script to run, you need 2 environment variables:

  • MINIFLUX_URL: the URL to miniflux (eg: https://miniflux.domain.tld)
  • MINIFLUX_API_TOKEN: the API token generated in miniflux

To test, you can just use:

export MINIFLUX_URL="https://miniflux.example.com" && export MINIFLUX_API_TOKEN="my super token"

And then run the script in the same terminal.
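Note that the script crashes with a KeyError if one of the variables is missing. A small guard (a hypothetical addition, not part of the script above) could fail with a clearer message:

```python
import os

def check_env(env, required=("MINIFLUX_URL", "MINIFLUX_API_TOKEN")):
    """Hypothetical helper: fail fast with a clear message when a
    required environment variable is missing or empty."""
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise SystemExit("Missing environment variable(s): " + ", ".join(missing))

# Would be called as check_env(os.environ) at the top of the script.
check_env({"MINIFLUX_URL": "https://miniflux.example.com",
           "MINIFLUX_API_TOKEN": "my super token"})
```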

Now the script exports a fresh opml file from my miniflux instance and then generates the opml and xml files in the right places.

Goal achieved… Right?… Right?

No, it was still not enough, because I thought: “wait a minute, if it is fully automated now, why not run that script every time the blog is built within the sourcehut automated build?”. So the story was not over.

V3: Fully Automated

I’ve written a lengthy post about my blog deployment workflow, so I’ll try to keep it short this time.

The first step was to upload the miniflux secrets (URL and API token) to sourcehut. I uploaded a simple file using sourcehut secrets that just contains the following:

export MINIFLUX_URL="https://miniflux.example.com"
export MINIFLUX_API_TOKEN="my super secret token"

Then, I edited the .build.yml to add a new task that takes care of creating the xml and opml files:

[…]
tasks:
  […]
  - blogroll: |
      set +x
      . ~/.miniflux_secrets
      set -x
      cd ~/writting-deploy && ./build_blogroll.sh      
[…]

You can see the full build file on sourcehut.

You can see here that I’m not launching the python script directly but a shell script. The reason is that I needed to install the miniflux python client first (it isn’t available as an archlinux package or in the AUR) via a virtual environment before being able to launch the python script. Because of this, I also simplified the python script (see below) so that it doesn’t care about where to export the file; the shell script handles that itself. It was easier that way.

The build_blogroll.sh script looks like this:

#!/bin/bash

source ./dirconfig.sh

echo "Starting blogroll generation"

echo "Creating temp directory: ${temp:?}"
mkdir "${temp:?}" || exit

echo "Creating virtual environment"
python -m venv "${temp:?}/opml"
cd "${temp:?}/opml"
echo "Installing miniflux python client"
./bin/pip install miniflux
./bin/python "${blog_blogroll_script:?}" || exit

echo "Moving generated file to data directory"
mv ./blogs.xml "${blog:?}/${blog_blogroll_data:?}/blogs.xml" || exit
echo "Copying generated file to static directory"
cp "${blog:?}/${blog_blogroll_data:?}/blogs.xml" "${blog:?}/${blog_blogroll_opml:?}/blogs.opml" || exit

echo "Cleaning temp dir ${temp}"
cd ~
rm -rf "${temp:?}" || exit

echo "Blogroll files generated successfully"

It loads some directory paths from the dirconfig.sh file (read the mentioned post about my blog deployment for more info), creates a python virtual environment, installs the miniflux python client and starts the script to generate the file. Then, the file is copied into the hugo data directory as blogs.xml and into the hugo static/files directory as blogs.opml.

The extracted version of the dirconfig.sh is:

[…]
blog_blogroll_script="/home/build/writting-deploy/generate_blogroll_data.py"
blog_blogroll_data="data" # relative to hugo root path
blog_blogroll_opml="static/files" # relative to hugo root path
[…]

You can see the full dirconfig file here.

As said above, I’ve edited the python script to its final (for now^^) version:

import os
import miniflux
import xml.etree.ElementTree as ET

## Configuration ##
# List of categories to keep:
to_keep = ["English Blogger", "French Blogger"]
## /configuration ##

client = miniflux.Client(os.environ["MINIFLUX_URL"], api_key=os.environ["MINIFLUX_API_TOKEN"])
opml = client.export_feeds()

root = ET.fromstring(opml)

opml = ET.Element('opml')
opml.set('version', '2.0')
head = ET.SubElement(opml, 'head')
ET.SubElement(head, 'title').text = "Bacardi55's blogroll"
ET.SubElement(head, 'dateCreated').text = root.find('head').find('dateCreated').text

body = ET.SubElement(opml, 'body')

for o in root.find("body"):
    if o.get('text') in to_keep:
        t = ET.SubElement(body, 'outline')
        t.text = o.get('text')
        for child in o:
            feed = ET.SubElement(t, 'outline')
            feed.set('title', child.get('title'))
            feed.set('text', child.get('text'))
            feed.set('xmlUrl', child.get('xmlUrl'))
            feed.set('htmlUrl', child.get('htmlUrl'))
            feed.set('type', child.get('type'))


tree = ET.ElementTree(opml)
tree.write("blogs.xml", encoding='utf-8', xml_declaration=True)

Now only the to_keep configuration is needed. blogs.xml is generated and then copied to two places by the shell script that launched this python script.

And now my watch has ended.

Conclusion

I had fun overcomplicating my blog deployment workflow even more (see all related posts) by adding the blogroll update to it. The good thing is that I can now forget about my blogroll page; I know it will stay up to date with my feed reader. The only change I may need to make is to add/remove/rename the categories to keep in my export if I change my categorization within my feed reader. Otherwise, I don’t have to care about it anymore :].



Contact

If you find any issue or have any question about this article, feel free to reach out to me via webmentions, email, mastodon, matrix or even IRC, see the About page for details.
