PHP RSS Parsers

So, the next thing to build in my site is an RSS parser. I'm uploading video files to a hosting provider, who generates their own unique ID for each video. However, when users browse my site they want to play these videos, which means that I need to know the hosting provider's ID for the video the user wants to watch.

My video host provides an RSS feed of my uploaded content, which includes their IDs. So I need a scheduled task which will poll this feed and parse it so the IDs can be written back to my database.

The first try was using Magpie RSS, which took about 10 minutes to download, install, and get working.

require_once 'rss_fetch.inc';
$url = 'http://myprovider.com/feed.rss';
$rss = fetch_rss($url);
$rss is now an object holding all the data from the provider's RSS feed; its channel and items properties are ordinary arrays and can be accessed like any other array:
print "Channel: ".$rss->channel['title'];

The big drawback is that the array only contains the data from the XML elements in the feed - it doesn't include the attributes. My provider's feed conforms to Yahoo's Media RSS specification, which includes a <media:thumbnail> element where all the thumbnail information is held in attributes:
<media:thumbnail url="http://www.foo.com/keyframe.jpg" width="75" height="50" time="12:05:01.123" />

So the next candidate was SimplePie. This proved to be much more flexible and capable, if a bit more complex to get up and running, but I got there in the end:
require_once('simplepie.inc');
$rss = new SimplePie();
$rss->set_feed_url('http://myprovider.com/feed.rss');
$rss->enable_cache(false);
$rss->init();

foreach ($rss->get_items() as $item) {
  $user_data = $item->get_item_tags('http://myprovider.com/WebService/userdata/0.1/', 'id');
  $video_id = $user_data[0]['data'];

  $thumbnail = $item->get_item_tags('http://search.yahoo.com/mrss/', 'thumbnail');
  $thumbnail_url = $thumbnail[0]['attribs']['']['url'];
}
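
For comparison, the same attribute lookup can be sketched without either library, using PHP's built-in SimpleXML extension. This is a minimal illustration rather than what the site actually uses: the inline feed and the "Clip one" title are made up, and only the media:thumbnail line and the Media RSS namespace URI come from the examples above.

```php
<?php
// Hypothetical sample feed; the real provider's layout may differ.
$xml = <<<XML
<?xml version="1.0"?>
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <title>Example</title>
    <item>
      <title>Clip one</title>
      <media:thumbnail url="http://www.foo.com/keyframe.jpg" width="75" height="50" time="12:05:01.123" />
    </item>
  </channel>
</rss>
XML;

$feed = simplexml_load_string($xml);
$thumbnails = array();
foreach ($feed->channel->item as $item) {
    // children() with a namespace URI returns the media:* child elements.
    $media = $item->children('http://search.yahoo.com/mrss/');
    if (isset($media->thumbnail)) {
        // attributes() with no argument returns the un-namespaced attributes.
        $attrs = $media->thumbnail->attributes();
        $thumbnails[(string) $item->title] = (string) $attrs['url'];
    }
}
print_r($thumbnails);
```

The array is keyed by item title purely for readability; in practice the provider's video ID would be the more useful key.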

Drupal has a very powerful module called Token, which makes various pieces of data available to other modules. As an example, the Pathauto module will create friendly URLs for any piece of content based on some feature of that content, using token substitution. By default, a new piece of content will be available at the URL http://www.mysite.com/node/5 (assuming you have enabled clean URLs). However, you can create a Pathauto rule which creates an alias to make that content available at http://www.mysite.com/reviews/run_fatboy_run - much more SEO-friendly.

This pathauto rule looks like this:
reviews/[title-raw]

So far, so good. However, what happens when you want to use some feature of the content other than the title? What other data is available?

It turns out this is pretty easy, once you know where to look. In the modules/token directory is an include file for each module which exposes tokens. I wanted to use the user name as a token, so checked in token_user.inc, and found
$values['user-raw'] = $account->uid ? $account->name : variable_get('anonymous', 'Anonymous');


So I've now created a Pathauto rule for user account page paths:
users/[user-raw]
Every user now gets a nice URL to their account page, like http://www.mysite.com/users/bob
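
Conceptually, token substitution is just a placeholder swap. The sketch below is a simplified illustration of the idea, not the Token or Pathauto implementation; replace_tokens() is a hypothetical helper, and the values array stands in for whatever the real modules build from the node or account.

```php
<?php
/**
 * Replace [token] placeholders in a pattern with supplied values.
 * Hypothetical helper for illustration; not the Token module's API.
 */
function replace_tokens($pattern, array $values) {
    $pairs = array();
    foreach ($values as $name => $value) {
        $pairs['[' . $name . ']'] = $value;
    }
    if (!$pairs) {
        // Nothing to substitute, so return the pattern untouched.
        return $pattern;
    }
    // strtr() swaps every known placeholder in one pass; unknown ones are left alone.
    return strtr($pattern, $pairs);
}

echo replace_tokens('users/[user-raw]', array('user-raw' => 'bob')), "\n"; // users/bob
```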

Incidentally, the token_user.inc file also includes a bunch of help text in its implementation of hook_token_list, so it looks as though there is support for displaying this information somewhere in the Drupal admin interface. I can't find it anywhere though, and can't find any documentation around it. The API.txt file says "This function is used to provide help and inline documentation for all of the possible replacement tokens.", but gives no clue as to where this help and inline documentation may appear. All answers on a postcard please...

I've now got a bunch of structured data in a relational database, and need to get it into Drupal. I've created my content types using CCK. Fortunately, there's a great overview here which highlights some of the core Drupal tables I need to populate which aren't immediately apparent.

Everything is pretty straightforward, apart from populating deltas for repeating fields in CCK.

An example will help here.

Suppose I have a content type of person, who can have multiple phone numbers. CCK lets me add a field of type text, which is defined as a repeating field. To the end user, this looks as though you can simply enter as many phone numbers as you like for this person.

In the database, CCK creates a table called content_type_person, which holds simple fields (i.e. fields where there can only be one value - things like first name, surname). It also creates a separate table for repeating fields, so phone numbers will be stored in content_field_contact_telephone. This new table is keyed on vid and nid (i.e. the current version of the node this person belongs to), and delta. Delta is a zero-based counter which starts again at 0 for each node and increments for each additional value on that nid.

Again, an example will help.

I create a new person, which is internally given nid 1 and vid 1, and added to content_type_person. If I add three phone numbers, these are added to content_field_contact_telephone with nid and vid of 1, and deltas of 0, 1 and 2. A second person with nid 2 would get deltas 0, 1 and 2 again:

nid     vid     delta     phone
1       1       0         01234 567890
1       1       1         01234 567891
1       1       2         01234 567892
2       2       0         01234 567893
2       2       1         01234 567894
2       2       2         01234 567895


I've got hundreds of people, and several phone numbers for each. I can create the content node OK and get the right nid and vid, but how do I then insert all these into the database, and get nice zero-based incrementing delta values?
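
One way to get the deltas is to work them out in PHP before inserting: walk the rows in order and keep a running counter per nid. This is a minimal sketch, assuming the source rows arrive as arrays with nid, vid and phone keys. The function name and row shape are hypothetical, and the actual INSERT into the CCK table is left out.

```php
<?php
/**
 * Add a zero-based 'delta' to each row, restarting the count for every nid.
 * Hypothetical helper; the row shape (nid, vid, phone) is an assumption.
 */
function assign_deltas(array $rows) {
    $next = array(); // next delta to hand out, per nid
    $out = array();
    foreach ($rows as $row) {
        $nid = $row['nid'];
        if (!isset($next[$nid])) {
            $next[$nid] = 0;
        }
        $row['delta'] = $next[$nid]++;
        $out[] = $row;
    }
    return $out;
}

// Sample data matching the table above.
$rows = array(
    array('nid' => 1, 'vid' => 1, 'phone' => '01234 567890'),
    array('nid' => 1, 'vid' => 1, 'phone' => '01234 567891'),
    array('nid' => 1, 'vid' => 1, 'phone' => '01234 567892'),
    array('nid' => 2, 'vid' => 2, 'phone' => '01234 567893'),
    array('nid' => 2, 'vid' => 2, 'phone' => '01234 567894'),
    array('nid' => 2, 'vid' => 2, 'phone' => '01234 567895'),
);

$with_deltas = assign_deltas($rows);
foreach ($with_deltas as $row) {
    echo $row['nid'], ' ', $row['vid'], ' ', $row['delta'], ' ', $row['phone'], "\n";
}
```

Because the counter is kept per nid, the rows don't need to be sorted by person first; each phone number simply gets the next free delta for its node.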

Drupal Tutorials

I'm starting to get the hang of Drupal now, largely thanks to Robert Safuto's video tutorials (especially the one about CCK and Views) - nice one Robert!

A few weeks ago, I decided that the best foundation for the site I'm working on would be Drupal. After some discussions around the scale of the site and the likely complexity of our customisations, this seemed to be the natural choice. One of the key factors in this decision was the perceived strength of Drupal's architecture. As a fairly seasoned developer, I want an architecture which enables me to customise existing functionality and develop new functionality in a consistent and predictable way, with consistent and predictable results.

My last project used osCommerce as a foundation, and while I've been able to build a good (if fairly simple) e-commerce site with it, the architecture is somewhat inconsistent. Different parts of the system have varying coding styles and variable naming conventions (which isn't surprising, given its open source roots), meaning that I have to adopt different coding styles and mindsets when debugging or enhancing different parts. Not insurmountable, but not a good foundation for great endeavours.

The other key reasons for choosing Drupal were:
  • product maturity
  • rich third party module library
  • very active developer community, including conferences
  • large-scale implementations (like MTV UK)
  • open source (i.e. debuggable, extensible, customisable and free!)
So today I've been getting into the nuts and bolts of it. In our sprint planning, we've included some basic user profile functionality, as I thought this would be a good place to introduce myself to the Drupal architecture. Wrong, wrong, and wrong.

One of our requirements is to support multiple user types, each of which will have different attributes. A good analogy (although not an actual example) would be to have
  • car drivers (who might want to add their car make and model to their profile)
  • mechanics (who might want to add specialisms such as engine tuning to their profile)
The core Drupal profile module doesn't support this, but a quick run through the modules list reveals Advanced Profile, which does. However:
  • Advanced Profile requires Panels, CCK and Node Profile
  • Node Profile requires Node Family and Subforms
Although it's all pretty easy to install and configure, I'm now trying to get my head around what each of these modules brings to the party, and I suspect this is making my supposed "gentle introduction" more difficult than it really needs to be. Still, I'm sure it will start to make sense tomorrow.