Simple XML is Simpler!

Genius!

It sucks that it isn't available for PHP 4. For this site, at the moment, I'm forced to use PHP 4, since PHP 5 breaks WordPress permalinks for some reason. It's not WordPress's fault, though. It's just the crappiness of this shared hosting.

Anyway, I've used SimpleXML for most of my XML parsing in PHP, mainly because I'm lazy. This time I wanted the site to run here under PHP 4, so that wasn't an option. But PHP's XML Parser shouldn't be too bad, right?

Here's how to go through all the <item> tags in an RSS feed with SimpleXML and place them into an associative array:

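$sxml = new SimpleXMLElement($xml);

$items = array();
$i = 0;
foreach($sxml->channel->item as $item) {
	// $item_title etc. hold the tag names to read, e.g. 'title'.
	// Cast to string so the array holds plain text, not SimpleXMLElement objects.
	$items[$i]['title'] = (string) $item->$item_title;
	$items[$i]['link'] = (string) $item->$item_link;
	$items[$i]['desc'] = (string) $item->$item_desc;
	$items[$i]['date'] = (string) $item->$item_date;
	$i++;
}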

Now in XML Parser:

// Parse the whole feed into two flat arrays: $vals holds every node in
// document order, $index maps each tag name to its positions in $vals.
$parser = xml_parser_create('UTF-8');
xml_parser_set_option($parser, XML_OPTION_SKIP_WHITE, 1);
xml_parse_into_struct($parser, $xml, $vals, $index);
xml_parser_free($parser);

$feed = array();
$items = array();

// If an item tag has the same name as the channel-level tag (e.g. TITLE),
// the first match belongs to the channel, so it is skipped via an offset.
$length = ($item_title == $title) ? count($index[$item_title]) - 1 : count($index[$item_title]);

$titleoffset = ($item_title == $title) ? 1 : 0;
$linkoffset = ($item_link == $link) ? 1 : 0;
$descoffset = ($item_desc == $desc) ? 1 : 0;
$dateoffset = ($item_date == $date) ? 1 : 0;

// Assumes the i-th occurrence of each tag belongs to the i-th item.
for($i=0;$i<$length;$i++) {
	$items[$i]['title'] = $vals[$index[$item_title][$i+$titleoffset]]['value'];
	$items[$i]['link'] = $vals[$index[$item_link][$i+$linkoffset]]['value'];
	$items[$i]['desc'] = $vals[$index[$item_desc][$i+$descoffset]]['value'];
	$items[$i]['date'] = $vals[$index[$item_date][$i+$dateoffset]]['value'];
}

That's only about 10 more lines, plus some ridiculous-looking arrays.

This also doesn't show that you have to run strtoupper($tagname) on every tag name you look up, since XML Parser upper-cases tag names by default.
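
If the strtoupper() calls get annoying, case folding can be switched off with XML_OPTION_CASE_FOLDING instead. A minimal sketch, assuming a made-up feed in $xml (not this site's) and a hypothetical $titles array, with the tag looked up in its original lower case:

```php
<?php
// Hypothetical sample feed, just for illustration.
$xml = '<rss><channel><title>Feed</title>'
     . '<item><title>First</title></item>'
     . '<item><title>Second</title></item>'
     . '</channel></rss>';

$parser = xml_parser_create('UTF-8');
// Case folding is on by default, which upper-cases every tag name.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
xml_parser_set_option($parser, XML_OPTION_SKIP_WHITE, 1);
xml_parse_into_struct($parser, $xml, $vals, $index);
xml_parser_free($parser);

// Keys in $index now keep the case used in the document.
$titles = array();
foreach ($index['title'] as $pos) {
	$titles[] = $vals[$pos]['value'];
}
```

Note that the channel's own <title> still shows up first in the list, so the offset trick above is still needed.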

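One thing I couldn't get SimpleXML to do is read elements with a colon in them, like <dc:creator> or <content:encoded>. A simple fix was to run str_replace("<dc:creator>","<creator>",$xml) before parsing, with a matching replace for the closing </dc:creator> tag so the XML stays well-formed.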
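
For what it's worth, SimpleXML can reach prefixed elements without rewriting the XML, by passing the namespace URI to children(). A minimal sketch under PHP 5, using a made-up feed and a hypothetical $creators array; the dc URI is the standard Dublin Core one, everything else is illustrative:

```php
<?php
// Hypothetical sample feed, just for illustration.
$xml = '<rss xmlns:dc="http://purl.org/dc/elements/1.1/"><channel>'
     . '<item><title>First</title><dc:creator>Alice</dc:creator></item>'
     . '<item><title>Second</title><dc:creator>Bob</dc:creator></item>'
     . '</channel></rss>';

$sxml = new SimpleXMLElement($xml);

$creators = array();
foreach ($sxml->channel->item as $item) {
	// children() with a namespace URI returns only the elements in that namespace.
	$dc = $item->children('http://purl.org/dc/elements/1.1/');
	$creators[] = (string) $dc->creator;
}
```

The URI has to match the xmlns:dc declaration in the feed itself, so it is worth checking the feed's root element before hard-coding it.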

Uggh. At least the annoying part is finally out of the way, I hope.

Update: Haha, that doesn't even work. The values come out nowhere near the correct order. But I totally forgot about MagpieRSS, so I'm using that instead.
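
A plausible reason for the wrong order (an assumption, not something checked against this feed): $index collects every occurrence of a tag in the whole document, so an <image> title or an item missing a tag shifts every later lookup. A rough sketch that walks $vals in document order instead, using a made-up feed and a hypothetical $wanted map of tag names:

```php
<?php
// Hypothetical sample feed: the second item has no <link>.
$xml = '<rss><channel><title>Feed</title><link>http://example.com/</link>'
     . '<item><title>First</title><link>http://example.com/1</link></item>'
     . '<item><title>Second</title></item>'
     . '<item><title>Third</title><link>http://example.com/3</link></item>'
     . '</channel></rss>';

$parser = xml_parser_create('UTF-8');
xml_parser_set_option($parser, XML_OPTION_SKIP_WHITE, 1);
xml_parse_into_struct($parser, $xml, $vals, $index);
xml_parser_free($parser);

// Map upper-cased tag names to the array keys wanted in each item.
$wanted = array('TITLE' => 'title', 'LINK' => 'link');

$items = array();
$current = null; // null while outside an <item>

foreach ($vals as $node) {
	if ($node['tag'] == 'ITEM' && $node['type'] == 'open') {
		// Entering an item: start collecting its fields.
		$current = array();
	} elseif ($node['tag'] == 'ITEM' && $node['type'] == 'close') {
		// Leaving an item: store what was collected.
		$items[] = $current;
		$current = null;
	} elseif ($current !== null && $node['type'] == 'complete' && isset($wanted[$node['tag']])) {
		// A wanted tag inside the current item; empty elements have no 'value'.
		$current[$wanted[$node['tag']]] = isset($node['value']) ? $node['value'] : '';
	}
}
```

Because each value is attached to the item it was found in, the channel's own title and link are ignored and a missing tag only affects its own item.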

4 Comments

  1. Ryan Parman says:

    Since you're talking about parsing RSS, I thought I might throw SimplePie out there. If you do decide to take a look, I'd recommend checking out the latest trunk version from SVN, as we're oh-so-close to releasing 1.0.

    http://simplepie.org

  2. ah_skeet says:

    Great. Thanks. Just grabbed the trunk.

    Just looking at the demo code, it looks really nice.

    I like it.

  3. This does work, with a little more code:
    str_replace("","",$xml)

    $xml=str_replace("","",$xml);
    $xml=str_replace("","",$xml);

    Or at least it does for me, when working with the iTunes music feed. Thanks for the idea, and for reminding me that it is just plain text prior to being loaded into SimpleXML.

    Thanks,
    Brad
    http://www.iPhods.com

  4. [...] an idea for parsing out those <dc:creator> or <dc:date> using PHP and [...]
