Challenges Facing Automatic WordPress Database Updating and Sourcing

Keeping the WordPress database up to date automatically is an important consideration, and one that quickly pushes WordPress beyond a blogging platform and into the territory of a more general content management system. There are several challenges to consider, both in updating the database itself and in sourcing the data to fill it.

Projects going forward will work against the database model of WordPress version 3 onwards. If updating is to be done safely, then everything WordPress normally handles when it puts data away needs to be handled too, including the correct storage of taxonomies and their related data.
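To make that concrete, WordPress stores taxonomy data across three linked tables: `wp_terms`, `wp_term_taxonomy` and `wp_term_relationships`. The sketch below shows the shape of that relationship using Python's built-in sqlite3 as a stand-in for MySQL; the column lists are trimmed down from the real schema, and the helper function is a hypothetical illustration, not WordPress code.

```python
import sqlite3

# In-memory stand-in for MySQL; columns trimmed to the essentials of the
# WordPress 3 taxonomy tables.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE wp_terms (term_id INTEGER PRIMARY KEY, name TEXT, slug TEXT);
CREATE TABLE wp_term_taxonomy (
    term_taxonomy_id INTEGER PRIMARY KEY,
    term_id INTEGER, taxonomy TEXT, count INTEGER DEFAULT 0);
CREATE TABLE wp_term_relationships (object_id INTEGER, term_taxonomy_id INTEGER);
""")

def attach_term(post_id, name, slug, taxonomy="category"):
    """Store a term the way WordPress lays it out: a row in each of the
    three taxonomy tables, reusing the term if it already exists."""
    row = db.execute("SELECT term_id FROM wp_terms WHERE slug = ?",
                     (slug,)).fetchone()
    term_id = row[0] if row else db.execute(
        "INSERT INTO wp_terms (name, slug) VALUES (?, ?)",
        (name, slug)).lastrowid
    row = db.execute(
        "SELECT term_taxonomy_id FROM wp_term_taxonomy "
        "WHERE term_id = ? AND taxonomy = ?", (term_id, taxonomy)).fetchone()
    tt_id = row[0] if row else db.execute(
        "INSERT INTO wp_term_taxonomy (term_id, taxonomy) VALUES (?, ?)",
        (term_id, taxonomy)).lastrowid
    db.execute("INSERT INTO wp_term_relationships VALUES (?, ?)",
               (post_id, tt_id))
    db.execute("UPDATE wp_term_taxonomy SET count = count + 1 "
               "WHERE term_taxonomy_id = ?", (tt_id,))
    db.commit()

attach_term(101, "News", "news")
attach_term(102, "News", "news")   # second post reuses the existing term
```

Skipping any one of these tables, or forgetting to bump the `count` column, is exactly the kind of detail WordPress normally takes care of and a direct database update must not.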

A simple model for updating a live WordPress system would be to use two databases: one holding the data that is actually online, the other holding the updated version. That allows data to be dumped straight into the offline database, with the luxury of falling back to the earlier one if the update fails. To update a WordPress database efficiently, though, incremental and transactional models need to be considered. No one wants to update a record that is being changed online at the same moment, and equally the live site must not be held up while a download and update takes place. I wonder, too, whether MySQL can be talked to directly while this is happening, or whether it is just a 'using a hammer' case of uploading a dump file into it. The first approach is great when the data is small and incremental; the other is useful when there are megabytes of data to update.
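The transactional, incremental case can be sketched briefly. Here sqlite3 stands in for MySQL (with a real MySQL server the same pattern applies through a driver, or via `START TRANSACTION` in SQL); the table contents and the failing statement are made-up examples. The point is that a batch of small updates either lands completely or leaves the live data untouched:

```python
import sqlite3

# sqlite3 as a stand-in for the live MySQL database.
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_title TEXT)")
live.execute("INSERT INTO wp_posts VALUES (1, 'Original title')")
live.commit()

def apply_increment(db, statements):
    """Apply a batch of small updates atomically: if any statement
    fails, the live data is left exactly as it was (the fall-back)."""
    try:
        with db:  # opens a transaction; commits on success, rolls back on error
            for sql, params in statements:
                db.execute(sql, params)
        return True
    except sqlite3.Error:
        return False  # rollback has already happened automatically

ok = apply_increment(live, [
    ("UPDATE wp_posts SET post_title = ? WHERE ID = ?", ("Updated title", 1)),
])
bad = apply_increment(live, [
    ("UPDATE wp_posts SET post_title = ? WHERE ID = ?", ("Half-done", 1)),
    ("INSERT INTO no_such_table VALUES (1)", ()),   # deliberate failure
])
```

After the failed batch the title is still "Updated title", not "Half-done", which is the behaviour wanted when updating alongside live traffic.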

Another challenge, one that calls for still more databases, is gathering the data in whatever form it is available. The most obvious source is the web page itself, although RSS feeds, data feeds and other forms may be necessary. Collecting this information by screen scraping is not something web site owners look kindly upon, so great care must be taken when doing it: certainly no one should be seen hammering a site with hundreds of page views while collecting information. It needs to be done in a natural, efficient and productive way.
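One minimal way to avoid hammering a site is to enforce a delay between requests. The sketch below is a simple throttle; the interval values are made-up examples (a real crawl would use seconds between requests, and ideally respect robots.txt as well):

```python
import time

class Throttle:
    """Enforce a minimum delay between successive requests so a crawl
    never hammers the target site."""
    def __init__(self, min_interval=2.0):
        self.min_interval = min_interval
        self._last = None

    def wait(self):
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()

throttle = Throttle(min_interval=0.2)   # short interval just for demonstration
start = time.monotonic()
for _ in range(3):
    throttle.wait()       # a real crawler would fetch one page here
elapsed = time.monotonic() - start
```

Three throttled "requests" take at least two full intervals, which is the natural, unhurried pacing the paragraph above argues for.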

How can the data actually be extracted? Scraping can achieve this, and the main choices are either to traverse the structure of the HTML, possibly using the DOM model, or to use the power of grep-style regular expressions. These are currently the most common methods, but both need a little work and both suffer from any changes made to the site's pages, templates or source at any time.
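The two approaches can be contrasted on a toy page. The markup and the `h2 class="title"` pattern below are invented for illustration; the DOM-style version walks the element structure with Python's standard-library parser, while the grep-style version matches the raw text:

```python
import re
from html.parser import HTMLParser

SAMPLE = ('<html><body>'
          '<h2 class="title">First post</h2><p>Some text</p>'
          '<h2 class="title">Second post</h2>'
          '</body></html>')

# DOM-style: walk the element structure and collect text inside
# <h2 class="title"> elements.
class TitleExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self._in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2" and ("class", "title") in attrs:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.titles.append(data)

parser = TitleExtractor()
parser.feed(SAMPLE)

# Grep-style: a regular expression over the raw markup. Quicker to write,
# but even a change of attribute order in the template breaks it.
regex_titles = re.findall(r'<h2 class="title">(.*?)</h2>', SAMPLE)
```

Both find the same titles here, which illustrates the trade-off: the regex is shorter, the parser is more tolerant of template changes, and either one breaks if the site is restructured badly enough.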
