Looking for a feed parser

1 07 2005

One of the planned features for The Open Source Zone is an RSS aggregator that could be used to fetch and aggregate news channels from project websites, blogs, Freshmeat announcements and the like.

Obviously, I want to reuse the best Open Source foundations available for accomplishing this task. Looking around for Java-based solutions, I found Rome and the Jakarta FeedParser.

The latter seems somewhat more mature and it already includes “an advanced networking layer which meets the requirements necessary for providing XML aggregations services over HTTP. This includes support for If-None-Match (ETags), If-Modified-Since (HTTP 304 Not Modified), gzip content encoding (compression), User Agent modification, non-infinite timeouts, event callbacks for download progress, support for setting HTTP Referrer headers, maximum content downloads (no files larger than N bytes), ability to use custom HTTP methods (HEAD, GET, PUT, POST) etc.”
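For a sense of what the first few of those features amount to on the wire, here is a minimal sketch of a conditional, gzip-aware fetch with a timeout and a size cap. It is written against the JDK's plain java.net.HttpURLConnection, not FeedParser's own API, which is not reproduced here. The class name, the User-Agent string and the ten-second timeouts are placeholders of my own.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.GZIPInputStream;

public class ConditionalFeedFetch {

    /** Request headers for a feed poll; cache validators are added only when we have them. */
    public static Map<String, String> requestHeaders(String etag, String lastModified) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("User-Agent", "ExampleAggregator/0.1"); // hypothetical agent name
        headers.put("Accept-Encoding", "gzip");
        if (etag != null) {
            headers.put("If-None-Match", etag);
        }
        if (lastModified != null) {
            headers.put("If-Modified-Since", lastModified);
        }
        return headers;
    }

    /** Returns the feed body, or null when the server answers 304 Not Modified. */
    public static byte[] fetch(URL url, String etag, String lastModified, int maxBytes)
            throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setConnectTimeout(10_000); // non-infinite timeouts, in milliseconds
        conn.setReadTimeout(10_000);
        for (Map.Entry<String, String> h : requestHeaders(etag, lastModified).entrySet()) {
            conn.setRequestProperty(h.getKey(), h.getValue());
        }
        try {
            if (conn.getResponseCode() == HttpURLConnection.HTTP_NOT_MODIFIED) {
                return null; // our cached copy is still current
            }
            InputStream in = conn.getInputStream();
            // We asked for gzip ourselves, so we must decompress ourselves.
            if ("gzip".equalsIgnoreCase(conn.getContentEncoding())) {
                in = new GZIPInputStream(in);
            }
            try (InputStream body = in) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[8192];
                int n;
                while ((n = body.read(buf)) != -1) {
                    // Enforce the "no files larger than N bytes" limit.
                    if (out.size() + n > maxBytes) {
                        throw new IOException("Feed larger than " + maxBytes + " bytes");
                    }
                    out.write(buf, 0, n);
                }
                return out.toByteArray();
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

A real aggregator would also read the ETag and Last-Modified response headers and store them for the next poll; the sketch leaves that part out.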

It also supports autodiscovery and apparently it is being used by Rojo, so it’s not vaporware.
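Autodiscovery here means finding a site's feed from its HTML page, usually through a link element with rel="alternate" and an RSS or Atom media type. The following is only a rough sketch of that idea using regular expressions from the JDK; it is not how FeedParser implements it, the class and method names are made up for illustration, and a regex will miss oddly quoted or malformed markup that a real HTML parser would handle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FeedAutodiscovery {

    // Matches each <link ...> tag in the page, regardless of attribute order.
    private static final Pattern LINK_TAG =
            Pattern.compile("<link\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    /** Returns the quoted value of the named attribute inside one tag, or null. */
    private static String attr(String tag, String name) {
        Pattern p = Pattern.compile(
                "\\s" + name + "\\s*=\\s*[\"']([^\"']*)[\"']", Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(tag);
        return m.find() ? m.group(1) : null;
    }

    /** Collects the href of every rel="alternate" link that advertises RSS or Atom. */
    public static List<String> findFeeds(String html) {
        List<String> feeds = new ArrayList<>();
        Matcher m = LINK_TAG.matcher(html);
        while (m.find()) {
            String tag = m.group();
            String rel = attr(tag, "rel");
            String type = attr(tag, "type");
            String href = attr(tag, "href");
            boolean feedType = "application/rss+xml".equalsIgnoreCase(type)
                    || "application/atom+xml".equalsIgnoreCase(type);
            if (href != null && "alternate".equalsIgnoreCase(rel) && feedType) {
                feeds.add(href);
            }
        }
        return feeds;
    }
}
```

Relative href values would still need to be resolved against the page URL before fetching.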

On the other hand, a suitable networking layer is available for Rome as a subproject. Moreover, there is at least one implementation of a persistence mechanism for feeds (Aqueduct-Prevayler) while there doesn’t seem to be one for FeedParser.

Everything considered, I’d be inclined to start experimenting with FeedParser, unless you, my dear readers, have some suggestions to make. In which case, please leave a comment.

Update: my first brush with FeedParser didn’t exactly inspire much confidence. There is no downloadable distribution, so you have to use Subversion, and the SVN URL on the website is wrong (hint: the correct one seems to be http://svn.apache.org/repos/asf/jakarta/commons/proper/feedparser/trunk/). No build instructions are provided either. It looks like it uses Maven :(. Luckily, a plain Ant build file is also included, and I managed to build a JAR file.




5 responses

1 07 2005
Nick Lothian

(I’m the author of the Feed Fetcher networking module, but I’ve also contributed minor things to FeedParser). I’m sure Kevin Burton will read this, but anyway… Go with ROME. The distribution is more mature, it has more contributors, it has better docs, it has fewer dependencies (and they are defined) and it is as fast or faster in most circumstances (see http://www.mackmo.com/syndbench/index.html).

FeedParser does have an OPML parser and autodetection of feeds, which ROME doesn’t have.

The ROME feed fetcher has an extensive set of features, too. See http://wiki.java.net/bin/view/Javawsxml/RomeFetcher

If it tips the balance, ROME has an ant build maintained by “Mr Bile Blog” Hani himself…

2 07 2005
Charles Miller

We recently switched javablogs.com over to use Rome, and while we only use the parser part of it (we have our own fetcher and persistence layer), it’s worked very well for us so far.

31 07 2005
George Papayiannis

I’m in a similar position to yours, trying to decide between ROME and FeedParser.

Out of curiosity, which did you end up choosing?

2 08 2005

Nothing at the moment. I’ve put the project on the back burner for a while, but I’m about to tackle it once more and I plan on testing Rome.

5 03 2008

We use Rome (parser only) for 1,000,000 feeds/day and are quite happy with it.
