I’ve been working on a project that includes migrating existing data from Microsoft Excel spreadsheets to a SQL Server 2005 database. The volume of data isn’t terribly large, but it’s large enough that entering it manually into SQL Server would be mind-numbing, not to mention the likelihood of introducing errors. Also, since I’m using the Express version of SQL Server on my development machine, neither the Data Transformation Services tool nor SQL Server Integrated Services are available to handle the job.
Being a programmer, and a lazy one at that, I thought I’d create a quick Windows app to automate the process. I’ve never had much success working with the XML tools built into Excel, so I instead settled on exporting CSV (Comma Separated Values) files that I could then easily parse in the C# code.
The problem is that several of the fields of data are text and, you guessed it, some contain commas. Now, Excel wraps any such fields in quotes during the export, so a (fictitious) row like this:
|1||Foo||Bar||Variable length text that, of course, might have commas||2009-10-26|
1,Foo,Bar,"Variable length text that, of course, might have commas",2009-10-26
in the CSV file. Further obfuscating things, after running
String.Split() on this row and passing in a comma as the split character, the resulting array looks like this:
|3||“Variable length text that|
|5||might have commas”|
There are any number of approaches that could be used to deal with this. I thought briefly about trying to use regular expressions to format each row before splitting it into an array. One issue is that some of the text not only has commas imbedded but also quotes. This would quickly devolve into a mess of lookaheads and lookbehinds and would almost certainly not work on all the data because regular expressions are not particularly adept at matching nested patterns like this.
Then I had the idea to export the spreadsheets using a separator character other than a comma. If I could choose a character that I know doesn’t appear in any of the data, I’d be all set. But after spending several minutes in Excel scouring through the Save As dialog and also Tools > Options, I began to realize that Microsoft apparently didn’t think anyone would ever need a CSV with anything other than a comma separator.
After wrestling for awhile back in the C# code, to no avail of course, I finally did what I should have done at the beginning, I opened up a browser and spent some time with Google searching for a way to change the separator character in Excel. Turns out it’s actually really easy. It’s just not located in Excel, or anywhere else in Microsoft Office. It’s in the Windows Control Panel.
Reprinted from here:
- Click the Windows Start menu.
- Click Control Panel.
- Open the Regional and Language Options dialog box.
- Click the Regional Options Tab.
- Click Customize.
- Type a new separator in the List separator box.
- Click OK twice.
I changed the character to a caret (^), which I knew didn’t occur in the data (regex is useful for some things), and adjusted the C# code accordingly and within a few minutes I had all the data migrated into SQL Server.