2009-03-03
Blogspot lets users export their blogs as an XML file, and I thought it would be a good experience to import that XML data into a MySQL database and display the blog using the feature-free framework (which is starting to get a few features). The XML file stores everything as an "entry". An entry may be a blog post, an edit of a post, a comment, or some configuration data. First things first: make a database table to hold just the blog postings. My schema ends up looking like this:
CREATE TABLE `blogs` (
  `blogs_id` int(11) NOT NULL auto_increment,
  `title` char(99) collate latin1_general_ci NOT NULL,
  `body` text collate latin1_general_ci NOT NULL,
  `date` datetime NOT NULL,
  `id` char(100) collate latin1_general_ci NOT NULL,
  `hidden` tinyint(4) NOT NULL default '1',
  PRIMARY KEY (`blogs_id`),
  KEY `id` (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci;

A bit of PHP code using the SimpleXML functions handled most of the XML parsing. I say most because the exported file contains XML elements with colons and dashes in their names, for example <thr:in-reply-to>, which SimpleXML doesn't seem to handle. Oh well, a bit of regex on the initial XML string took care of the colons and dashes.
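
To make the regex step concrete, here is a minimal sketch of the idea. It is written in Python rather than the PHP used for the actual import, and the pattern, the function name flatten_tag_names, and the sample feed are all assumptions made for illustration, not the original code. It rewrites only element names, turning colons and dashes into underscores, and leaves attributes and text alone.

```python
import re
import xml.etree.ElementTree as ET

# Matches the start of an opening or closing tag plus its (optionally prefixed) name,
# e.g. "<thr:in-reply-to" or "</thr:total". Declarations such as "<?xml" and
# comments such as "<!--" do not match because they do not start with a name.
TAG_NAME = re.compile(r"<(/?)([A-Za-z_][\w.\-]*(?::[\w.\-]+)?)")

def flatten_tag_names(xml_text):
    """Replace ':' and '-' in element names with '_' (hypothetical helper)."""
    def fix(match):
        slash, name = match.groups()
        return "<" + slash + name.replace(":", "_").replace("-", "_")
    return TAG_NAME.sub(fix, xml_text)

# A made-up fragment shaped loosely like an exported entry.
sample = (
    '<feed xmlns:thr="http://purl.org/syndication/thread/1.0">'
    '<entry><title>Re: hello-world</title>'
    '<thr:in-reply-to ref="post-1"/></entry></feed>'
)

cleaned = flatten_tag_names(sample)
root = ET.fromstring(cleaned)

# The renamed element can now be looked up by a plain name.
reply = root.find("entry/thr_in_reply_to")
print(reply.get("ref"))
```

A text-level rewrite like this is a blunt tool: it would also touch anything that looks like a tag inside a comment or CDATA section, so it suits a one-off import better than general-purpose XML handling.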

Back to the schema: do you see the "id" field? Let me explain. The "id" is a unique string based upon the title of a blog post. Take a blog title, strip the non-alphanumeric characters, replace spaces with _, and you have an id. Why? I don't like the way most blogs create URLs for posts, and I want to create links based upon a unique representation of a blog entry's title.
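
The title-to-id rule can be sketched in a few lines. This is a Python illustration, not the PHP from the site, and the function name title_to_id is hypothetical. Two details are assumptions that go beyond the rule as stated: runs of spaces collapse into a single underscore, and the result is lowercased.

```python
import re

def title_to_id(title):
    """Turn a post title into a URL-friendly id (hypothetical helper)."""
    # Strip everything except ASCII letters, digits, and spaces.
    kept = re.sub(r"[^A-Za-z0-9 ]", "", title)
    # Assumption: collapse runs of spaces and trim the ends before joining with "_".
    joined = "_".join(kept.split())
    # Assumption: lowercase the result so URLs are case-consistent.
    return joined.lower()

print(title_to_id("Importing Blogspot XML into MySQL!"))
print(title_to_id("  Some   Article?  "))
```

Note that a rule like this does not guarantee uniqueness by itself: two titles that differ only in punctuation map to the same id, so an import would need to check for collisions.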

It is my opinion that the more directories an item is in, the more focused and unique its content is, and the rarer it should be in search results. Similarly, I find that a webpage should be related to its parent directories in a way not based on date. For example, an item that would normally be accessed at www.example.com/blogs/2009/03/05/some_article has a directory depth of 4. Using the code I wrote, the article would be displayed as www.example.com/blogs/some_article and would have a directory depth of 1. Something with a depth of 1 is more important, although theoretically more vague, than something with a depth of 4. Again, this is just my personal opinion.