Autodetect delimiter in the csv/tsv files #269

akash-rajput · 2019-11-15T06:47:52Z

Is your feature request related to a problem? Please describe.
If a data source sends files that use different delimiters, it should be possible to detect the delimiter automatically.

Describe the solution you'd like
A simple string comparison over the first few lines could count the occurrences of each candidate character and pick, as the delimiter, the one whose count is consistent with the column count.
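
To make the idea concrete, here is a minimal sketch of that counting heuristic. The function name `detectDelimiter`, its default candidate list, and the `maxLines` parameter are all assumptions for illustration; none of this is part of csv-parse, and the sketch ignores quoted fields entirely.

```javascript
// Hypothetical helper, not part of csv-parse: guess the delimiter by counting
// candidate characters on the first few non-empty lines of a sample.
function detectDelimiter(sample, candidates = [',', ';', '\t', '|'], maxLines = 5) {
  const lines = sample
    .split(/\r?\n/)
    .filter((line) => line.length > 0)
    .slice(0, maxLines);

  let best = null;
  let bestCount = 0;
  for (const candidate of candidates) {
    // Number of occurrences of the candidate on each sampled line.
    const counts = lines.map((line) => line.split(candidate).length - 1);
    // A plausible delimiter occurs at least once and equally often on every line.
    const consistent =
      counts.length > 0 && counts[0] > 0 && counts.every((c) => c === counts[0]);
    // Prefer the candidate that yields the most columns; ties keep the earlier candidate.
    if (consistent && counts[0] > bestCount) {
      best = candidate;
      bestCount = counts[0];
    }
  }
  return best; // null when no candidate fits
}
```

A delimiter character that appears inside a quoted field would skew the counts, so a real implementation would need to be quote-aware.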

Describe alternatives you've considered
N/A

Additional context
N/A

wdavidw · 2019-11-15T08:37:35Z

So the idea could be this: if the existing delimiter option, a new auto_delimiter option, or a combination of both (as in the examples below) is set to an array of candidate delimiter characters, or to true (which would expand to the most common delimiters), auto-detection is activated, and the first character matching the set defines the delimiter for the rest of the data set. Is that right?

Setting delimiter to true activates auto-detection:

parse("a,b|c\n1,2|3", {delimiter: true}, function(err, data){
  data.should.eql([
    ["a", "b|c"],
    ["1", "2|3"],
  ])
})

auto_delimiter provides a list of potentially accepted delimiters:

parse("a,b|c\n1,2|3", {delimiter: true, auto_delimiter: ["|", ","]}, function(err, data){
  data.should.eql([
    ["a,b", "c"],
    ["1,2", "3"],
  ])
})

Any comments ?

ajaz-ur-rehman · 2020-10-18T15:09:47Z

What if the delimiter isn't commonly used and is just a random character like ^ ?
Can we somehow detect any delimiter like Google Sheets or Excel?
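
One possible way to handle an arbitrary delimiter such as ^ is to treat every punctuation-like character on the first line as a candidate, instead of relying on a fixed list. The sketch below is only a heuristic under that assumption; `guessAnyDelimiter` is a hypothetical name, and this is not how Google Sheets or Excel are known to do it.

```javascript
// Hypothetical helper: consider any non-letter, non-digit, non-quote,
// non-space character from the first line as a possible delimiter.
function guessAnyDelimiter(sample, maxLines = 5) {
  const lines = sample
    .split(/\r?\n/)
    .filter((line) => line.length > 0)
    .slice(0, maxLines);
  if (lines.length === 0) return null;

  // Distinct candidate characters taken from the first line.
  const candidates = [...new Set(lines[0])].filter(
    (ch) => !/[\p{L}\p{N}"' ]/u.test(ch)
  );

  let best = null;
  let bestCount = 0;
  for (const ch of candidates) {
    const counts = lines.map((line) => line.split(ch).length - 1);
    // Keep the character that occurs equally often on every line,
    // preferring the one that occurs most.
    if (counts.every((c) => c === counts[0]) && counts[0] > bestCount) {
      best = ch;
      bestCount = counts[0];
    }
  }
  return best; // null when nothing consistent is found
}
```

Characters that occur regularly inside values, such as a decimal point or a date separator, can fool this heuristic, so it would likely need extra rules for real data.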

wdavidw · 2020-10-19T06:54:23Z

I am personally quite uncomfortable with this issue because it implies storing the first few lines in memory and going back over them once we decide on a delimiter. It feels more appropriate to write a dedicated stream transform, plugged in just before csv-parse, that determines the delimiter.
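
A rough sketch of what such a transform could look like, assuming a Node.js stream: it buffers input until it has seen a few complete lines, emits a custom `delimiter` event, then passes all data through unchanged. The class name `DelimiterSniffer`, the event name, and the options are hypothetical and not part of csv-parse.

```javascript
const { Transform } = require('node:stream');

// Hypothetical transform, not part of csv-parse: sniffs the delimiter from the
// first lines, announces it through a 'delimiter' event, then becomes a pass-through.
class DelimiterSniffer extends Transform {
  constructor({ candidates = [',', ';', '\t', '|'], sampleLines = 5 } = {}) {
    super();
    this.candidates = candidates;
    this.sampleLines = sampleLines;
    this.buffered = [];
    this.detected = false;
  }

  _transform(chunk, encoding, callback) {
    if (this.detected) return callback(null, chunk); // pass-through after detection
    this.buffered.push(chunk);
    const sample = Buffer.concat(this.buffered).toString('utf8');
    // Decide once enough line breaks have been seen.
    if (sample.split('\n').length - 1 >= this.sampleLines) this._decide(sample);
    callback();
  }

  _flush(callback) {
    // Input ended before enough lines arrived: decide with what we have.
    if (!this.detected) this._decide(Buffer.concat(this.buffered).toString('utf8'));
    callback();
  }

  _decide(sample) {
    this.detected = true;
    const lines = sample.split(/\r?\n/).filter(Boolean).slice(0, this.sampleLines);
    let best = null;
    let bestCount = 0;
    for (const candidate of this.candidates) {
      const counts = lines.map((line) => line.split(candidate).length - 1);
      const consistent = counts.length > 0 && counts.every((c) => c === counts[0]);
      if (consistent && counts[0] > bestCount) {
        best = candidate;
        bestCount = counts[0];
      }
    }
    this.emit('delimiter', best); // null when nothing matched
    const data = Buffer.concat(this.buffered);
    this.buffered = [];
    if (data.length > 0) this.push(data); // replay the buffered bytes untouched
  }
}
```

A consumer could then create the parser inside the event handler, for example by piping the sniffer into csv-parse with its existing delimiter option set to the detected value. The memory cost is limited to the sampled lines, and csv-parse itself stays unchanged.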

ajaz-ur-rehman · 2020-10-19T07:26:28Z

You are right. That makes more sense.

akash-rajput added the enhancement label Nov 15, 2019
