Thinking It Through: Smarter RSS Scheduling

I read RSS through email, using a Python program called Newspipe. I have a cron script set up that automatically runs Newspipe every hour to hit my feed list.

(This script also checks to see if I'm online, and if not, tries to connect through my phone over Bluetooth to do the RSS and mail fetch.)

Anyone who uses RSS a lot knows the woes. Particularly annoying are duplicate stories. I've tried several different readers, including things like Bloglines, and none of them solve this problem completely. The problem generally seems to be on the server's end. Maybe it's because they make small corrections to the stories, but a lot of times I don't see any difference in the text.

One of the worst offenders is the Washington Post. What the hell are you guys doing over there? There is one and only one column I want to read there, which is "Poet's Choice" by Robert Pinsky. I have to keep tabs on what the School of Quietude (see Silliman) is up to. But I get literally the entire history of the column and the rest of the book review section every time I hit the feed.

Reading the RSS mail in Gnus makes all this much more bearable, because stories with the same subject lines are threaded, like any mailing list discussion would be. This is a big step up from what happens with Bloglines or any other dedicated RSS reader I've used.

But with my number of feeds exceeding 160 (Justin wasn't lying), duplication is finally getting to me, because I tend to enter and read these mail groups multiple times per day. So despite the threading I see a lot of these articles multiple times. And a lot of the early drafts of these articles, multiple times.

The thing is, I only need to read a lot of these feeds once a day. Many of them I only need to read once per week. Some only once per month! If I know that a publication only publishes once a week, why am I checking it every hour?

Also, one particular set of feeds is troublesome: the Boston Globe's. I have a paper subscription to the Sunday Globe, so I don't want to see the RSS version on Sunday.

I started out with the plan to write separate cron scripts for hourly, daily, weekly, and monthly checking, each with its own OPML file. When I subscribe to a new feed (yes, sadly, I am still adding new feeds to the list), I would then add it to the appropriate place.

Then I went back and read over the many cool options that Newspipe has, and discovered that delay is an option that can be applied to groups of feeds within the main OPML file. So, instead of writing different scripts, I need only divide my feeds into groups, and set the appropriate value for that attribute for each group. I don't remember seeing this kind of individualized setting in any other reader that I've tried so far.

I have created the following groups in my OPML file. The delay values are in minutes, so "Monthly" here really means every 28 days.

    <outline text="Hourly" delay="60">
    <outline text="Daily" delay="1440">
    <outline text="Weekly" delay="10080">
    <outline text="Monthly" delay="40320">
    <outline text="Not Sunday">
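For anyone checking the arithmetic, here is a minimal sketch of where those numbers come from. It assumes the delay attribute is counted in minutes, which is what the values above imply. The names INTERVALS, delay_minutes, and outline_tag are made up for illustration and are not part of Newspipe.

```python
from datetime import timedelta

# Assumption: delay is expressed in minutes, as the values above imply
# (60 for the hourly group).
INTERVALS = {
    "Hourly": timedelta(hours=1),
    "Daily": timedelta(days=1),
    "Weekly": timedelta(weeks=1),
    "Monthly": timedelta(weeks=4),  # 28 days, matching delay="40320"
}


def delay_minutes(interval):
    """Convert a timedelta to a whole number of minutes."""
    return int(interval.total_seconds() // 60)


def outline_tag(name):
    """Build the opening outline tag for one of the groups above."""
    return f'<outline text="{name}" delay="{delay_minutes(INTERVALS[name])}">'


if __name__ == "__main__":
    for name in INTERVALS:
        print(outline_tag(name))
```

A calendar month is a little longer than four weeks, so a 28-day delay will drift slightly earlier each month. For feeds that barely change, that seems harmless.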

I haven't figured out the Not Sunday group yet, but there are a couple of options that together should be able to do this.
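If those options don't pan out, one fallback might be a weekday check in the cron wrapper itself. This is only a sketch of that idea, not a Newspipe feature: should_fetch is a hypothetical helper, and it assumes the wrapper gets to decide which groups are fetched on a given run.

```python
from datetime import date

SUNDAY = 6  # date.weekday() numbers Monday as 0 and Sunday as 6


def should_fetch(group_name, today):
    """Return False for the "Not Sunday" group on Sundays, True otherwise."""
    if group_name == "Not Sunday":
        return today.weekday() != SUNDAY
    return True


if __name__ == "__main__":
    for group in ("Hourly", "Not Sunday"):
        print(group, should_fetch(group, date.today()))
```

The catch is that anything published on Sunday would still be sitting in the feed on Monday, so this only delays the Sunday stories rather than hiding them.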

I have high hopes that this will result in a leaner, meaner me.

Tags: newspipe, opml, python, rss, thinking it through
