Welcome to my first blog post. Hi, I’m Ray (psst! that’s the name of this website).
This is an easy exercise that I occasionally come across: parsing comma-separated values (CSV). I’ve seen a lot of code on the Web purporting to do this, but a lot of it doesn’t work in practice. This is my take on parsing CSVs.
Note: This is also a real-life application of stacks (but we don’t care about that right now).
1. Start simple
If you have a CSV string, such as the one below, it’s simple to split the string by the commas. You get a five-item array.
one,two,three,four,five
Sometimes the CSV will have double-quotes surrounding each item. You split the string and trim the quotes. This is the code I see most often, which is also fine.
"one","two","three","four","five"
2. Commas gone wild
The reason for surrounding each item in quotes is so that it can contain a comma without breaking the format.
Items that don’t contain a comma don’t need to be wrapped in quotes.
In this case, if you split by commas you get an array of seven items, but there should only be five.
one,"two,three","four,five",six,"seven"
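To see the problem concretely, here is a small Python sketch of the split-and-trim approach. The helper name `split_and_trim` is made up for illustration. It copes with the fully quoted line from section 1 but cuts the line above into seven pieces.

```python
def split_and_trim(line):
    # Naive approach: split on every comma, then strip surrounding quotes.
    return [piece.strip('"') for piece in line.split(",")]


simple = split_and_trim('"one","two","three","four","five"')
broken = split_and_trim('one,"two,three","four,five",six,"seven"')

print(len(simple))  # 5
print(len(broken))  # 7, because the commas inside the quotes were split too
print(broken)
```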
You can see that commas are “special” characters that separate the values.
We’ve added double-quotes as special characters for surrounding values. A comma inside a quoted value is not a special character.
If you have a double-quote inside a value, it will be escaped so that it doesn’t affect the CSV format. This is often done with a backslash, as in this post. Some CSV dialects double the quote instead.
one,"two,three","four,\"five\"",six,"seven"
As you can see, the split and trim doesn’t work any more. We need to find another solution.
3. The disclaimer
I’ve seen some solutions using regular expressions and others with complicated loop-di-loops.
This is my take on the problem. It’s what is known as “keep it simple, stupid”, or as I call it, “easy-mode”.
4. The solution
This is where we revisit the stack. A stack is simply a list where the last thing added is the first thing removed. Think of it as a stack of plates, except that in this case we’re going to be adding and removing characters. That’s called pushing and popping.
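If the idea is new, here is a tiny sketch of pushing and popping. It assumes Python, where a plain list can act as a stack: `append` pushes and `pop` removes the most recently added item.

```python
stack = []

# Push three characters, one at a time.
stack.append("a")
stack.append("b")
stack.append("c")

# Pop returns the last thing added first.
first_out = stack.pop()
second_out = stack.pop()

print(first_out)   # c
print(second_out)  # b
print(stack)       # ['a']
```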
Okay.
We start off with an empty stack and we’re going to look at each character in the CSV string from left to right. We’ll also want an array for the output items.
Remember that we only care about commas, quotes and backslashes.
one,"two,three","four,\"five\"",six,"seven"
We have one item in the array.
"two,three","four,\"five\"",six,"seven"
We have two items in the array.
"four,\"five\"",six,"seven"
We have three items in the array.
six,"seven"
We have four items in the array.
"seven"
5. The end
You should have five items in the array. The method is long-winded but you won’t be doing it by hand.
one
two,three
four,"five"
six
seven
I have deliberately overlooked a few issues to keep the walk-through straightforward, but I have addressed them in the code sample. Of course, you will likely have language-specific considerations to address as well.
I hope someone finds this post useful. I’ll try to think up something more interesting next time.
6. The code
The sample code differs a little from the walk-through but you can adjust it as you will.
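As a rough illustration, here is a minimal sketch of the stack idea in Python. It is one possible reading of the walk-through, not a definitive implementation. The function name `parse_csv_line` and the choice to raise `ValueError` on an unclosed quote are assumptions made for this example. The stack only holds the "open" special characters (a quote or a backslash), so a comma counts as a separator only when the stack is empty.

```python
def parse_csv_line(line):
    items = []    # finished values (the output array)
    current = []  # characters of the value being built
    stack = []    # open special characters: '"' or '\\'

    for ch in line:
        if stack and stack[-1] == "\\":
            stack.pop()          # the escape is used up
            current.append(ch)   # keep the escaped character as-is
        elif ch == "\\":
            stack.append(ch)     # the next character is escaped
        elif ch == '"':
            if stack and stack[-1] == '"':
                stack.pop()      # closing quote
            else:
                stack.append(ch) # opening quote
        elif ch == "," and not stack:
            # A comma outside quotes ends the current value.
            items.append("".join(current))
            current = []
        else:
            # Ordinary character, or a comma inside quotes.
            current.append(ch)

    if stack:
        # Assumption: treat an unclosed quote or dangling backslash as an error.
        raise ValueError("unterminated quote or trailing backslash")

    items.append("".join(current))
    return items


# Raw string so the backslashes reach the parser untouched.
result = parse_csv_line(r'one,"two,three","four,\"five\"",six,"seven"')
print(result)  # ['one', 'two,three', 'four,"five"', 'six', 'seven']
```

This sketch handles a single line only. Values that span several lines, or dialects that escape a quote by doubling it, would need extra handling.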
Posted on Wed 27th Apr 2016
Modified on Sat 5th Aug 2023