Comments on: My brother’s (feed’s) keeper
http://weblog.philringnalda.com/2005/02/18/my-brothers-feeds-keeper/feed/



a digital magpie



Updated: 2016-10-24T13:44:47Z

 



By: James


It looks like that bug in Textpattern’s Atom code is still around as of 1.0RC1. I’ll file a ticket.




By: Jacques Distler


(X)HTML (+MathML) named entities in feeds are exactly the reason I wrote the Numeric Entities plugin for MovableType and the accompanying MathML::Entities Perl module.

Evan Nemerson wrote a PHP implementation.

So there’s no real excuse any more for sending HTML entities in an RSS/Atom feed. If your blogging tool doesn’t convert them to numeric character references (or, possibly, utf-8 characters, depending on what encoding it uses for your feed), then your blogging tool is broken.
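For illustration, here is a minimal Python sketch of the conversion described above: named HTML entities become numeric character references before the text goes into a feed. It is not the plugin's or the Perl module's actual code. The function name `to_numeric_references` is hypothetical, and the sketch only knows the HTML 4 entity names in Python's `html.entities` table, so MathML entities are not covered.

```python
import re
from html.entities import name2codepoint

# The five entities XML itself predefines; these are safe in any feed.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches a named entity reference such as &rarr; or &mdash;
_NAMED_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def to_numeric_references(text):
    """Replace known HTML named entities with numeric character references.

    Hypothetical helper: XML's predefined entities and unknown names
    are left untouched.
    """
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return "&#%d;" % name2codepoint[name]

    return _NAMED_ENTITY.sub(replace, text)


converted = to_numeric_references("Home &rarr; Archive &amp; more")
```

Leaving `&amp;` and friends alone matters: they are already legal XML, and rewriting them would only make the feed harder to read.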




By: Pete Prodoehl


Sigh, I know, I have a few feeds in FoF that I never see because they’re invalid… I’ve chosen to ignore the problem, which may not be the best solution. I suppose public shaming is a possible solution.




By: Joe Clark


I thought the only XML characters you *had* to escape were less-than/greater-than and ampersand?

Anyway, does that headline display incorrectly in any known browser? It doesn’t on Mac.

Anyway2, what kind of validator does one use to determine well-formedness?




By: Mark


feedvalidator.org will catch this and many other common mistakes.




By: Mark


Must. Control. Fist. Of. Death.




By: Mark


In Netscape’s RSS 0.91, the HTML entities were defined in the DTD, and were therefore valid to include verbatim in your feed. Userland’s RSS 0.91 removed the DTD, and therefore broke this very useful feature, and we’ve been DTD-less in syndication land ever since.
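A rough sketch of what a DTD buys you, using Python's standard `xml.etree.ElementTree`. The one-line internal DTD subset here is a stand-in assumption for the much larger entity set the Netscape DTD declared, and `title_text` is a hypothetical helper, not part of any feed library.

```python
import xml.etree.ElementTree as ET

# Same element twice: once with an internal DTD subset that declares
# the rarr entity, once with no DTD at all.
WITH_DTD = (
    '<!DOCTYPE title [<!ENTITY rarr "&#8594;">]>'
    '<title>before &rarr; after</title>'
)
WITHOUT_DTD = '<title>before &rarr; after</title>'


def title_text(document):
    """Return the root element's text, or None if the document is not well-formed."""
    try:
        return ET.fromstring(document).text
    except ET.ParseError:
        return None


declared = title_text(WITH_DTD)       # entity is expanded to the arrow
undeclared = title_text(WITHOUT_DTD)  # parser rejects the document
```

With the declaration in place the parser expands the reference; without it, the same markup is simply not XML.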




By: Phil Ringnalda


Wups, this version of this post does rather look like I’m trying to shame the authors into doing better. The first version did a much better job of explaining that I was writing in public rather than email only partly because people ignore my email, and mostly because I thought it much more likely that the authors of the software would see it here. For most feed problems I trip over, that’s where the problem really should be stopped. Then when I rewrote the whole thing, after accidentally closing the wrong tab, it lost that flavor.




By: Phil Ringnalda


The only things you have to escape are less- and greater-than in element content, quotes in attributes, and any ampersand that doesn’t signal the start of an entity reference to a defined entity. In HTML, you have a DTD (either explicitly referenced or implied) that defines all the named HTML entities, but in RSS you don’t, so &rarr; is undefined, and because XML does a lot more with entity references than HTML, that has to be a fatal error. Any & not followed by a defined entity means it’s time to halt and catch fire (or, for most aggregators, time to switch to the liberal and forgiving non-XML parser or to fix the error and parse again).

If there’s any browser that fails to display it, they should be shamed into oblivion: in HTML, failing to recognize an HTML entity is unacceptable. It’s just in XML where failing to recognize it is a sign of a fatal error in the input, rather than the processor.

And while feedvalidator.org is the place to really check your feed, for problems like that just loading your feed directly in your browser will usually tell you. At least in Gecko, there’s a bug that will keep it from reporting to you that you have a character which isn’t defined in your encoding, so Dunstan would be out of luck, but if you feed them XML, browsers are happy to tell you about undefined entities.
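If you would rather not rely on a browser, any strict XML parser will report the same fatal errors. A minimal sketch using Python's standard library follows; `well_formedness_error` is a hypothetical helper name. It only checks well-formedness, not feed validity, so it is no substitute for feedvalidator.org.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(xml_text):
    """Return None if xml_text parses as XML, else the parser's error message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as error:
        return str(error)
    return None


# A numeric character reference and the predefined entities are fine.
ok = well_formedness_error("<title>a &#8594; b, c &gt; d &amp; e</title>")

# An undefined named entity (no DTD declares rarr here) is fatal.
undefined_entity = well_formedness_error("<title>a &rarr; b</title>")

# So is a bare ampersand that does not start a reference at all.
bare_ampersand = well_formedness_error("<title>AT&T</title>")
```

An aggregator that wants to be forgiving would catch the same error and fall back to a liberal parser, but the input is still broken.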




By: Jacques Distler


Escaping &rarr; to &amp;rarr; in your feed is good. Recoding &rarr; to &#8594; is better. The latter will actually display as the desired → character.

Numeric character references are always safe. If you’re using utf-8, so is typing → (you, of course, have an easy way to just type “→”, don’t you? ;-).
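To make the difference concrete, here is a small Python sketch comparing what an XML parser hands back for three ways of writing the arrow: escaped as &amp;rarr;, recoded as &#8594;, and typed directly in a utf-8 document. The `<title>` wrapper and the `parsed_text` helper are assumptions for the demo only.

```python
import xml.etree.ElementTree as ET


def parsed_text(fragment):
    """Wrap fragment in a <title> element, encode as utf-8, return the parsed text."""
    document = "<title>%s</title>" % fragment
    return ET.fromstring(document.encode("utf-8")).text


# Escaped: well-formed, but the reader gets the literal characters "&rarr;".
escaped = parsed_text("&amp;rarr;")

# Recoded as a numeric character reference: the reader gets the arrow.
recoded = parsed_text("&#8594;")

# Typed directly in a utf-8 document: also the arrow.
typed = parsed_text("\u2192")
```

All three are well-formed, which is why escaping counts as good; only the last two survive as the character you meant.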