Parsing Feeds with Clojure and Rome

For me, one reason for considering Clojure was that modern web standards keep getting more complicated, so implementing and maintaining them for, say, Common Lisp takes a lot of time. Even though I don't think this is a good thing, I can't do much about it, so I have to get used to it somehow: either by putting in more work or by using something else. I could use ABCL or jScheme, but Clojure is, as I already wrote, a very interesting Lisp dialect, and I see no reason not to use it.

Today I tried to parse feeds with Clojure. There are a lot of feed-parser libraries for Java out there, but surprisingly, most of them seem to have disadvantages or are no longer maintained. I decided to use Rome. It has a good QuickStart, which already covers most of what I want to do so far (just for testing purposes). There is also a JavaDoc for it.

OK, first I start Clojure. I have already set the $CLASSPATH variable to an appropriate value, and I use rlwrap. Note that I will break some lines to fit them into the WordPress layout:

$ rlwrap java -cp $CLASSPATH clojure.lang.Repl
Clojure
user=>

Then I import all the classes I need:

user=> (import '(com.sun.syndication.feed.synd SyndFeed SyndContentImpl SyndEntryImpl))
nil
user=> (import '(com.sun.syndication.io SyndFeedInput XmlReader))
nil
user=> (import '(java.net URL))
nil
user=>

Then I download the feed:

user=> (def input (new SyndFeedInput))
#'user/input
user=> (def feed (. input build (new XmlReader
 (new URL "http://matthias.benkard.de/journal/feed/"))))
#'user/feed
user=>

I hope Matthias won't be peeved that I raise his costs ;-)
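The two REPL steps above can also be wrapped in a small function. This is only a sketch: `fetch-feed` is a name I made up for illustration, and it assumes the same Rome classes as above are on the classpath. It uses the `(Class. args)` and `(.method obj args)` interop forms, which mean the same as `(new Class args)` and `(. obj method args)`.

```clojure
(import '(com.sun.syndication.io SyndFeedInput XmlReader)
        '(java.net URL))

;; fetch-feed is a hypothetical helper name, not part of Rome.
(defn fetch-feed
  "Reads the feed at url-string and returns the SyndFeed built by Rome."
  [url-string]
  ;; with-open closes the XmlReader once the feed has been built.
  (with-open [reader (XmlReader. (URL. url-string))]
    (.build (SyndFeedInput.) reader)))

(comment
  ;; Usage at the REPL, with the feed URL used in this post:
  (def feed (fetch-feed "http://matthias.benkard.de/journal/feed/")))
```

Closing the reader right away should be fine here, since `build` hands back a finished feed object rather than a stream.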

So, now that we have fetched the feed, we can go on to get the first entry and its contents:

user=> (def entries (. feed getEntries))
#'user/entries
user=> (def entry0 (. entries get 0))
#'user/entry0
user=> (def content (. entry0 getContents))
#'user/content
user=> (def content0 (. content get 0))
#'user/content0
user=>

So far, so good. Now let's see what the content contains:

user=> (. content0 getValue)
"\r\n            <div xmlns=\"http://www.w3.org/1999/xhtml\">\r\n
                \r\n<p><a href=\"http://freitag.de/\">Der Freitag</a>
 ist laut dem <a href=\"http://www.spiegelfechter.com/wordpress/475/%e
2%80%9eder-freitag%e2%80%9c-auferstanden-aus-ruinen\">Spiegelfechter</a>
 die „letzte ‚links-intellektuelle‘ Wochenzeitung“.  Ist er ein
 Printmedium, das es sich zu abonnieren lohnt?  Vielleicht gar das
 einzige? \r\n</p>\r\n            </div>\r\n        "
user=>

This looks good. Seems to be the content of http://matthias.benkard.de/journal/63.

user=> (. feed getDescription)
"\n        Geschwafel eines libertärsozialistischen Geeks\n    "

Leet.
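The manual steps above (getEntries, get 0, getContents, getValue) can be applied to every entry at once. The following is a rough sketch, not something from the Rome QuickStart: `entry-summaries`, `sample-xml` and `sample-feed` are names I made up, and the tiny Atom document is invented so that the example can be tried without downloading anything. It assumes Rome is on the classpath as before.

```clojure
(import '(com.sun.syndication.io SyndFeedInput)
        '(java.io StringReader))

;; entry-summaries is a hypothetical helper name, not part of Rome.
(defn entry-summaries
  "Returns one map per feed entry, holding the entry's title and the
  values of all its content elements."
  [feed]
  (for [entry (.getEntries feed)]
    {:title    (.getTitle entry)
     :contents (vec (map #(.getValue %) (.getContents entry)))}))

;; An invented, minimal Atom 1.0 document used only for this example.
(def sample-xml
  (str "<feed xmlns='http://www.w3.org/2005/Atom'>"
       "<title>Example</title>"
       "<id>urn:example:feed</id>"
       "<updated>2009-01-01T00:00:00Z</updated>"
       "<entry>"
       "<title>First</title>"
       "<id>urn:example:1</id>"
       "<updated>2009-01-01T00:00:00Z</updated>"
       "<content type='html'>&lt;p&gt;Hello&lt;/p&gt;</content>"
       "</entry>"
       "</feed>"))

;; SyndFeedInput can build a feed from any java.io.Reader.
(def sample-feed
  (.build (SyndFeedInput.) (StringReader. sample-xml)))

;; for is lazy, so doall forces the whole sequence here.
(doall (entry-summaries sample-feed))
```

The same function should work on the `feed` fetched earlier, since both are SyndFeed objects.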

One Response to Parsing Feeds with Clojure and Rome

  1. This really does look surprisingly good. I like the simple API. (It’s a Java library, after all. The API could have been much more baroque.)
