Will "the Mighty" Strohl

Fixing the Core Blog Module XML Feed Output

I built a module a while ago called WillStrohl.FeedHandler.  It was built specifically for what I am about to talk about, but it can be used to make adjustments to nearly any XML or RSS feed.  Basically, it consumes an existing XML feed and allows you to change whatever you want in it, while supplying you with a new XML feed URL to pass on to your website visitors and other XML feed consumers.  I mainly use this module with the XML feed from DotNetNuke® sites, and most often with the Blog Module.

The XML output of the RSS engine in DNN has some known issues.  I won’t go into them all, as they are obvious when you try to consume the feed.  Here are a few things that I fix in my blog’s XML before allowing you to consume it in your RSS readers.

Standalone Ampersands

First, I have found that the rendered XML will occasionally be littered with standalone ampersands.  I have not researched why this happens.  I just know that it does.  Why are these a problem?  Well, an unescaped ampersand can invalidate the XML.  Many readers and other consuming services may have a problem reading the feed at all.  Even if you get past the reading issue, the content might not render properly.  Using the WillStrohl.FeedHandler module, I change the output with a little regular expression magic.

RegEx: (\s+)&(\s+)
Replacement: $1&amp;$2

The preceding regular expression looks for any ampersand that has one or more whitespace characters both before and after it.  It then replaces the ampersand with its HTML equivalent (&amp;), adding the whitespace back the way it was.  The reason I do that is because \s will match line breaks as well as regular spaces.  In order to not cause any weird problems, it is necessary to put the whitespace back the way that we found it.
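For anyone who wants to try this replacement outside of the module, here is a minimal sketch in Python.  It is not the module's code.  The function name and the sample text are made up for illustration, and Python writes the backreferences as \1 and \2 where the replacement above uses $1 and $2.

```python
import re

# Same pattern as above: an ampersand with whitespace on both sides.
STANDALONE_AMP = re.compile(r"(\s+)&(\s+)")


def escape_standalone_ampersands(xml_text: str) -> str:
    """Replace each standalone ampersand with &amp; and keep the
    surrounding whitespace (spaces or line breaks) exactly as found."""
    # \1 and \2 put back the whitespace captured before and after the "&".
    return STANDALONE_AMP.sub(r"\1&amp;\2", xml_text)


if __name__ == "__main__":
    # Hypothetical feed fragment, only for demonstration.
    sample = "<title>Tips & Tricks</title>\n<description>A &\nB &amp; C</description>"
    print(escape_standalone_ampersands(sample))
```

Ampersands that are already part of an entity, such as &amp;, are left alone because they are not followed by whitespace.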

Fix Image Paths

One thing that the FCK editor in DNN does is put relative image paths, instead of the full path, into the HTML markup that is saved as your blog post.  This of course makes sense to a developer integrating the module, as the HTML is supposed to be rendered only on the website.  This changes when RSS is enabled and used, but the typical developer doesn’t account for that, and why would they?  In most cases, they should not worry about that.  However, as a result, the device or service that consumes your RSS feed will show broken images.

Here is the regular expression that I use to fix this using the same module.

RegEx: \ssrc="/Portals/
Replacement*:  src="http://www.willstrohl.com/Portals/

* Please note that the replacement above has a preceding space in it.  It puts back the whitespace that \s matched.

The above regular expression looks for any instance of the src attribute whose value starts with /Portals/, meaning it doesn’t have the domain name in it.  It then replaces the found text with the domain added.  The result is that when your feed is consumed, the consuming service or device knows exactly where the images are found.
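The same fix can be sketched in Python as follows.  Again, this is only an illustration and not the module's code.  The function name, the site_root parameter, and the sample markup are assumptions made for the example.

```python
import re

# Domain taken from the replacement shown above.
SITE_ROOT = "http://www.willstrohl.com"

# Same pattern as above: whitespace, then a src attribute starting at /Portals/.
RELATIVE_SRC = re.compile(r'\ssrc="/Portals/')


def absolutize_image_paths(xml_text: str, site_root: str = SITE_ROOT) -> str:
    """Prefix site-relative /Portals/ image paths with the site's domain."""
    # The replacement starts with a space because \s consumed the whitespace
    # in front of src.  A function is used so that site_root is inserted
    # literally, with no backslash processing.
    return RELATIVE_SRC.sub(lambda match: f' src="{site_root}/Portals/', xml_text)


if __name__ == "__main__":
    # Hypothetical image tag, only for demonstration.
    print(absolutize_image_paths('<img alt="logo" src="/Portals/0/logo.png" />'))
```

Paths that already include a domain do not match the pattern, so running the replacement twice does no harm.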

Remove Extra Domains

This is a problem that I believe is limited to me and my use of PageBlaster.  However, just in case you have the same problem, I am going to talk about it anyway.  Because of the way that I am using PageBlaster on this site, the URLs in my XML feed end up with the domain inserted twice.  Using the following regular expression, I remove the extra domain.

RegEx: http://www\.willstrohl\.comhttp://www\.willstrohl\.com
Replacement: http://www.willstrohl.com
... OR ...
RegEx: (http://www\.willstrohl\.com){2}
Replacement: http://www.willstrohl.com
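Here is a small Python sketch of the shorter form of that expression.  It is not the module's code, and the function name and sample link are made up for the example.

```python
import re

DOMAIN = "http://www.willstrohl.com"

# Same as the second pattern above: the domain repeated exactly twice in a row.
# re.escape() produces the escaped dots for us.
DOUBLED_DOMAIN = re.compile("(" + re.escape(DOMAIN) + "){2}")


def remove_extra_domain(xml_text: str) -> str:
    """Collapse a doubled domain back into a single one."""
    return DOUBLED_DOMAIN.sub(DOMAIN, xml_text)


if __name__ == "__main__":
    # Hypothetical feed link with the doubled domain, only for demonstration.
    broken = "<link>http://www.willstrohl.comhttp://www.willstrohl.com/Blog</link>"
    print(remove_extra_domain(broken))
```

Both forms of the expression match the same text.  The {2} version just avoids typing the domain twice.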

That’s it!  Everything else seems to work just fine.  I just pass the new feed URL that the WillStrohl.FeedHandler module provides to FeedBurner, and then allow people to consume the RSS feeds that FeedBurner produces.


