I read RSS through email, using a Python program called Newspipe. I have a cron
script set up that automatically runs Newspipe every hour to hit my feed list.
(This script also checks to see if I'm online, and if not, tries to connect via
Bluetooth over my phone to do the RSS and mail fetch.)
Anyone who uses RSS a lot knows the woes. Particularly annoying are duplicate
stories. I've tried several different readers, including things like Bloglines,
and none of them solve this problem completely. The problem generally seems to
be on the server's end. Maybe it's because they make small corrections to the
stories, but a lot of times I don't see any difference in the text.
One of the worst offenders is the Washington Post. What the hell
are you guys doing over there? There is one and only one column I want to read
there, which is "Poet's Choice" by Robert Pinsky. I have to keep tabs on what
the School of Quietude (see Silliman) is up to. But I get literally the entire
history of the column and the rest of the book review section every time I hit
the feed.
Reading the RSS mail in Gnus makes all this much more bearable, because stories
with the same subject lines are threaded, like any mailing list discussion
would be. This is a big step up from what happens with Bloglines or any other
dedicated RSS reader I've used.
But with my number of feeds exceeding 160 (Justin wasn't lying), duplication is
finally getting to me, because I tend to enter and read these mail groups
multiple times per day. So despite the threading I see a lot of these articles
multiple times. And a lot of the early drafts of these articles, multiple
times.
The thing is, I only need to read a lot of these feeds once a day. Many of them
I only need to read once per week. Some only once per month! If I know that a
publication only publishes once a week, why am I checking it every hour?
Also, there is one particular set of feeds that can be troubling sometimes:
the Boston Globe's. I have a paper subscription to the Sunday Globe, so I don't
want to see the RSS version on Sunday.
I started out with the plan to write separate cron scripts for hourly, daily,
weekly, and monthly checking, each with its own OPML file. When I subscribe to
a new feed (yes, sadly, I am still adding new feeds to the list), I would then
add it to the appropriate place.
Then I went back and read over the many cool options that Newspipe has, and
found that delay is an option that can be applied to groups
of feeds within the main OPML file. So, instead of writing different scripts, I
need only divide my feeds into groups, and set the appropriate value for that
attribute for each group. I don't remember seeing this kind of individualized
setting in any other reader that I've tried so far.
I have created the following groups in my OPML file. The delay values are in
minutes, so "Monthly" is really every four weeks (28 days).
<outline text="Hourly" delay="60">
<outline text="Daily" delay="1440">
<outline text="Weekly" delay="10080">
<outline text="Monthly" delay="40320">
<outline text="Not Sunday">
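For anyone checking the arithmetic, here is a minimal sketch of where those numbers come from, assuming delay is counted in minutes (Hourly being 60 suggests so). The build_opml helper, the example feed title, and the example.com URL are made up for illustration and are not part of Newspipe. The xmlUrl attribute is the usual OPML way to name a feed, and I am assuming Newspipe reads it that way.

```python
import xml.etree.ElementTree as ET

MINUTES_PER_HOUR = 60
MINUTES_PER_DAY = 24 * MINUTES_PER_HOUR

# Group name -> delay, assumed to be in minutes (values match the groups above).
GROUP_DELAYS = {
    "Hourly": MINUTES_PER_HOUR,       # 60
    "Daily": MINUTES_PER_DAY,         # 1440
    "Weekly": 7 * MINUTES_PER_DAY,    # 10080
    "Monthly": 28 * MINUTES_PER_DAY,  # 40320, i.e. four weeks
}


def build_opml(feeds_by_group):
    """Build an OPML tree with one delay-tagged outline per group.

    feeds_by_group maps a group name to a list of (title, url) pairs.
    This is a hypothetical helper, not something Newspipe provides.
    """
    opml = ET.Element("opml", version="1.1")
    body = ET.SubElement(opml, "body")
    for group, delay in GROUP_DELAYS.items():
        outline = ET.SubElement(body, "outline", text=group, delay=str(delay))
        for title, url in feeds_by_group.get(group, []):
            ET.SubElement(outline, "outline", text=title, xmlUrl=url)
    return opml


if __name__ == "__main__":
    # Placeholder feed; swap in real titles and URLs.
    tree = build_opml({"Weekly": [("Example weekly column", "http://example.com/column.xml")]})
    print(ET.tostring(tree, encoding="unicode"))
```

Keeping the delays in one table means a new feed only has to be dropped into the right group, which is the whole point of doing this inside a single OPML file.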
I haven't figured out the Not Sunday group yet, but there are a couple options
that together should be able to do this.
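One fallback that does not depend on Newspipe's options at all would be to do the day-of-week check in the wrapper script itself. This is only a sketch of that idea: should_fetch is a hypothetical helper, not a Newspipe feature, and it assumes the wrapper knows which group it is about to fetch.

```python
import datetime

SUNDAY = 6  # datetime.date.weekday() counts Monday as 0 and Sunday as 6


def should_fetch(group, today=None):
    """Return False for the "Not Sunday" group on Sundays, True otherwise.

    Hypothetical helper for a wrapper script; Newspipe itself is not involved.
    """
    if today is None:
        today = datetime.date.today()
    return not (group == "Not Sunday" and today.weekday() == SUNDAY)
```

The cron script could call should_fetch("Not Sunday") before handing that group's feeds to Newspipe, and skip them when it returns False.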
I have high hopes that this will result in a leaner, meaner me.