When you are working with data from Databases and Spreadsheets you really shouldn’t overlook the CSV format. Although it is an older container type the CSV format can provide the ability to capture and manage data between many different sources.
I have been working with a number of Government datasets in the past year and many are provided in XML which is nice and others are provided in CSV format which is also good but when it comes time to apply theses datasets to final content there is a lot of work that must be done to get things right.
I have found that by making use of the CSV format I am able to perform tasks that would be difficult or impossible when working in other formats but there are limitations and cautions you must take when preparing data for use in your final projects.
When working with or building CSV data I suggest that you have at least the following tools.
Spread Sheet software Open Office is fine
A word processor that can handle extended find and replace including multi-line replacements. I am using Komodo Edit at this time.
A MySql import export tool that can handle a variety of formats including XML and CSV.
Getting Started
The first caution is file length and field size. If you are presented with an unknown dataset that is under 250k in size then open it up in notepad or wordpad and take a look if not use an ap that can open a larger file as txt.
It is important to look at it as text first to understand what you have. If you have a HUGE file then find a tool to trim it.
The first line if you are lucky should have the layout of the data with column titles if not then you are flying blind.
Next you want to take a quick look (with word wrap set to off) and see if you have any large text fields. If you are importing the data right into MySql then the size of the field is not as important because you can set it to largetext but if you are importing into a spreadsheet you risk truncating your data.
Manipulating Your Data
I work with a lot of data that doesn’t quite fit into a standard webpage but I know what I begin with will eventually endup being imported into WordPress to generate Posts.
When I import these datasets I need to format a Post Title, Category, Tags, Postdates and a content block along with using the remaining data as Custom Fields to be displayed in the general area of the loop.
To format all of this information I need to duplicate columns for the whole database and then order, merge and format them to get the results I need.
For instance one project I had included information about a specific recreation site at a geographic location. To format the title I needed to copy the
site name|city|state
Then merge these fields to come up with something like
Big Mountain Park – Culver City, Nevada
(not a real location but you get the idea)
To do so I selected those columns in Calc and copied them in order next to each other but also placed a place holder column between the fields such as this.
site name|del1|city|del2|state
I then exported to a CSV Text file and did a search / replace in Komodo Edit of
|del1| … replace with –
and
|del2| … replace with , space
When I brought it back into Calc I had my title generated for all of the rows in the dataset.
So this is pretty simple editing but by using a few different applications I was able to generate everything I needed.
You can get more creative and do things like a sample content area by using some of the fields in your dataset and adding generic content.
Come and visit|del1|Title|del2|it is a great place to bring the family blah blah blah.
Final Thoughts
When working with any datasets remember to take advantage of all the options you have to make your work easier. In the example above it may be easy enough to do everything in PHP and generate a new dataset with all the features you need for your final content but you can also do it with basic tools and just a few copy and pastes.
The idea is be creative and make use of what you have.