Blogspot lets users export a blog as an XML file, and I thought it would be a nice exercise to import that XML data into a MySQL database and display the blog using the feature-free framework (which is starting to get a few features). The XML file stores everything as an "entry". An entry may be a blog post, an edit of a post, a comment, or some configuration data. First things first: make a database table to hold just the blog postings. My schema ends up looking like this:
CREATE TABLE `blogs` (
`blogs_id` int(11) NOT NULL auto_increment,
`title` char(99) collate latin1_general_ci NOT NULL,
`body` text collate latin1_general_ci NOT NULL,
`date` datetime NOT NULL,
`id` char(100) collate latin1_general_ci NOT NULL,
`hidden` tinyint(4) NOT NULL default '1',
PRIMARY KEY (`blogs_id`),
KEY `id` (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci;
A bit of PHP code using the SimpleXML extension handled most of the XML parsing. I say most because the exported file contains XML elements with colons and dashes in their names, for example <thr:in-reply-to>, which SimpleXML doesn't seem to handle through ordinary property access. Oh well, a bit of regex on the initial XML string took care of the colons and dashes.
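Here is a rough sketch of that regex step, not the exact code I used. The sample XML, the function name flattenTagNames, and the choice of underscore as the replacement character are all assumptions made for illustration. It only rewrites element names, so the namespace declarations and attributes stay as they are.

```php
<?php
// Hypothetical sample standing in for a small piece of the export file.
$xml = <<<XML
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:thr="http://purl.org/syndication/thread/1.0">
  <entry>
    <title>Some Article</title>
    <thr:in-reply-to ref="post-1"/>
  </entry>
</feed>
XML;

/**
 * Replace colons and dashes in element names with underscores,
 * e.g. <thr:in-reply-to> becomes <thr_in_reply_to>.
 * Assumption: any "<" inside text content is escaped, so every match
 * really is the start of an opening or closing tag.
 */
function flattenTagNames(string $xml): string
{
    return preg_replace_callback(
        '/(<\/?)([A-Za-z_][A-Za-z0-9_.:-]*)/',
        function (array $m): string {
            // $m[1] is "<" or "</", $m[2] is the element name.
            return $m[1] . strtr($m[2], ':-', '__');
        },
        $xml
    );
}

$feed = simplexml_load_string(flattenTagNames($xml));

foreach ($feed->entry as $entry) {
    $title = (string) $entry->title;
    // The renamed element is now reachable as an ordinary property.
    $replyTo = isset($entry->thr_in_reply_to)
        ? (string) $entry->thr_in_reply_to['ref']
        : null;
    echo $title, ' replies to ', $replyTo ?? '(nothing)', PHP_EOL;
}
```

SimpleXML can also reach namespaced elements through its children() method, given the namespace, which would avoid editing the XML string at all. The regex route is simply the one described here.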
Back to the schema: do you see the "id" field? Let me explain. The "id" is a unique string based upon the title of a blog post. Take a blog title, strip the non-alphanumeric characters, replace spaces with _, and you have an id. Why? I don't like the way most blogs create URLs for posts, and I want to create links based upon a unique representation of a blog entry's title.
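The id rule could be sketched roughly like this. The function name makeId is made up for the example, and two details are assumptions rather than part of the rule as stated: leading and trailing spaces are trimmed, and a run of several spaces becomes a single underscore.

```php
<?php
/**
 * Build a URL-friendly id from a blog title (hypothetical helper).
 */
function makeId(string $title): string
{
    // Strip everything except ASCII letters, digits, and spaces.
    $clean = preg_replace('/[^A-Za-z0-9 ]/', '', $title);

    // Assumption: trim the ends and collapse each run of spaces
    // into one underscore so the id has no doubled or dangling "_".
    return preg_replace('/ +/', '_', trim($clean));
}

echo makeId('Hello, World!'), PHP_EOL;
echo makeId('  PHP & MySQL:  a  test '), PHP_EOL;
```

Since the id is never longer than the title, it fits the char(100) column as long as the title fits in char(99). The sketch does not check whether two different titles collapse to the same id, so uniqueness would still need to be checked against the table.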
It is my opinion that the more directories an item is in, the more focused and unique its content is, and the rarer it should be in search results. Similarly, I find that a webpage should be related to its parent directories in a way not based on date. For example, an item that would normally be accessed at www.example.com/blogs/2009/03/05/some_article has a directory depth of 4. Using the code I wrote, the article would be displayed as www.example.com/blogs/some_article and would have a directory depth of 1. Something with a depth of 1 is more important, although theoretically more vague, than something with a depth of 4. Again, this is just my personal opinion.
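To make the depth arithmetic concrete, here is a small sketch that counts the directories in a URL path. The function name directoryDepth is hypothetical, and it assumes the last path segment is the item itself rather than a directory.

```php
<?php
/**
 * Count the directories that sit above the final item in a URL path
 * (hypothetical helper; expects only the path part, not the host).
 */
function directoryDepth(string $path): int
{
    // Split on "/" and drop the empty pieces left by leading,
    // trailing, or doubled slashes.
    $segments = array_filter(
        explode('/', $path),
        function ($segment) {
            return $segment !== '';
        }
    );

    // Everything except the last segment counts as a directory.
    return max(0, count($segments) - 1);
}

echo directoryDepth('/blogs/2009/03/05/some_article'), PHP_EOL;
echo directoryDepth('/blogs/some_article'), PHP_EOL;
```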