This repository has been archived by the owner on Jun 28, 2021. It is now read-only.

Autodetect delimiter in the csv/tsv files #269

Open
akash-rajput opened this issue Nov 15, 2019 · 4 comments

Comments

@akash-rajput

Is your feature request related to a problem? Please describe.
If a data source sends files that use different delimiters, it should be possible to detect the delimiter automatically.

Describe the solution you'd like
A simple string comparison over the first few lines can find the character whose number of occurrences matches the column count, which identifies a suitable delimiter.

Describe alternatives you've considered
N/A

Additional context
N/A
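
The string-comparison idea in the request could look roughly like the sketch below. It is only an illustration: `detectDelimiter`, its default candidate list and the five-line sample size are assumptions made here, not part of csv-parse, and the count ignores quoted fields that contain a delimiter character.

```javascript
// Hypothetical helper, not a csv-parse API.
// Returns the candidate that occurs the same non-zero number of times on
// every sampled line, preferring the one that yields the most columns.
// On a tie, the candidate listed first wins. Returns null if none qualifies.
function detectDelimiter(sample, candidates = [",", ";", "\t", "|"], maxLines = 5) {
  const lines = sample
    .split(/\r?\n/)
    .filter((line) => line.length > 0)
    .slice(0, maxLines);

  let best = null;
  let bestCount = 0;
  for (const candidate of candidates) {
    // Occurrences of the candidate on each sampled line.
    const counts = lines.map((line) => line.split(candidate).length - 1);
    const consistent = counts.length > 0 && counts.every((n) => n === counts[0]);
    if (consistent && counts[0] > bestCount) {
      best = candidate;
      bestCount = counts[0];
    }
  }
  return best;
}
```

For example, `detectDelimiter("a;b;c\n1;2;3")` returns `";"`, and a sample with no consistent candidate returns `null`.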

@wdavidw
Member

wdavidw commented Nov 15, 2019

So the idea could be that auto-detection is activated when the existing delimiter option, a new auto_delimiter option, or a combination of both (as in the examples below) is set to an array of delimiter characters or to true (converted to the most common delimiters). The first character matching the set then defines the delimiter for the rest of the data set, right?

Setting delimiter to true activates auto-detection:

parse("a,b|c\n1,2|3", {delimiter: true}, function(err, data){
  data.should.eql([
    ["a", "b|c"],
    ["1", "2|3"],
  ])
})

auto_delimiter provides a list of potentially accepted delimiters:

parse("a,b|c\n1,2|3", {delimiter: true, auto_delimiter: ["|", ","]}, function(err, data){
  data.should.eql([
    ["a,b", "c"],
    ["1,2", "3"],
  ])
})

Any comments?

@ajaz-ur-rehman
Contributor

What if the delimiter isn't a commonly used one and is just some arbitrary character like ^?
Can we somehow detect any delimiter like Google Sheets or Excel?

@wdavidw
Member

wdavidw commented Oct 19, 2020

I am personally quite uncomfortable with this issue because it implies storing the first few lines in memory and going backward once we decide on a delimiter. It feels more appropriate to write a dedicated stream transform, plugged in just before csv-parse, that determines the delimiter.
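
A dedicated transform along those lines might look like the sketch below. Everything in it is an assumption for illustration: the `DelimiterSniffer` class, the `pickDelimiter` helper, the `"delimiter"` event and the default candidates are invented here and do not exist in csv-parse. It buffers incoming chunks until a few newlines have been seen, guesses the delimiter from that sample, announces it, and then forwards the buffered bytes and all later chunks untouched.

```javascript
const { Transform } = require("stream");

// Hypothetical helper: pick the candidate that occurs the same non-zero
// number of times on each sampled line. Quoted fields are ignored.
function pickDelimiter(sample, candidates, maxLines) {
  const lines = sample
    .split(/\r?\n/)
    .filter((line) => line.length > 0)
    .slice(0, maxLines);
  let best = null;
  let bestCount = 0;
  for (const candidate of candidates) {
    const counts = lines.map((line) => line.split(candidate).length - 1);
    const consistent = counts.length > 0 && counts.every((n) => n === counts[0]);
    if (consistent && counts[0] > bestCount) {
      best = candidate;
      bestCount = counts[0];
    }
  }
  return best;
}

// Hypothetical pass-through stream that sniffs the delimiter from the head
// of the input and emits it in a "delimiter" event before releasing any data.
class DelimiterSniffer extends Transform {
  constructor({ candidates = [",", ";", "\t", "|"], sampleLines = 5 } = {}) {
    super();
    this.candidates = candidates;
    this.sampleLines = sampleLines;
    this.chunks = [];      // raw chunks held back until a decision is made
    this.newlines = 0;     // newline bytes seen so far
    this.decided = false;
    this.delimiter = null;
  }

  _transform(chunk, _encoding, callback) {
    if (this.decided) {
      callback(null, chunk); // after the decision, act as a plain pass-through
      return;
    }
    this.chunks.push(chunk);
    for (const byte of chunk) {
      if (byte === 0x0a) this.newlines += 1;
    }
    if (this.newlines >= this.sampleLines) this._decide();
    callback();
  }

  _flush(callback) {
    // Input ended before enough lines arrived: decide on what we have.
    if (!this.decided) this._decide();
    callback();
  }

  _decide() {
    const head = Buffer.concat(this.chunks);
    this.chunks = [];
    this.decided = true;
    this.delimiter = pickDelimiter(head.toString("utf8"), this.candidates, this.sampleLines);
    this.emit("delimiter", this.delimiter);
    this.push(head); // release the buffered bytes unchanged
  }
}
```

Since csv-parse takes its delimiter option when the parser is created, one way to use this would be to create and pipe the parser from inside a `"delimiter"` listener, for example `sniffer.once("delimiter", (delimiter) => sniffer.pipe(parse({ delimiter })))`, falling back to a default when the event carries `null`. Only the sampled head is held in memory, and the parser itself never has to go backward.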

@ajaz-ur-rehman
Contributor

You are right. That makes more sense.
