Skip lines future (Feature Suggestion) #738 #1021

bhuvaneshwararaja · 2023-09-24T15:41:29Z

Issues No : #738

Description:

In this feature enhancement, Added the capability to skip a defined number of lines when converting a CSV (Comma-Separated Values) file to JSON format. This functionality offers users more flexibility when dealing with CSV files with header information or metadata at the beginning, allowing for a more efficient and precise conversion process.

Changes Made:

Introduced a new configuration parameter, skipFirstNLines, which allows users to specify the number of lines to skip at the beginning of the CSV file during the conversion to JSON. This parameter supports values from 0 and above.
Implemented validation for the skipFirstNLines configuration parameter to ensure that it does not negatively impact the parsing process, even if it is set to 0 or given a negative value.
Enhanced the user interface of the player.html file by adding an input field that allows users to input the desired value for skipFirstNLines.
Updated the player.js script to retrieve and utilize the skipFirstNLines input from the user and incorporate it into the parser's configuration.

Test cases:

Included test cases to validate the functionality of the skipFirstNLines feature, ensuring that it behaves correctly under various scenarios.

Test 1: "Skip First N number of lines , with header",

Test 2: "Skip First N number of lines , without header",

Test 3: "Skip First N lines with value as 0 with header false",

Test 4: "Skip First N lines with value as 0 with header true",

Test 5: "Skip First N lines with value less then 0, with header true",

Test 6: "Without Skip First N Lines config",

pokoli · 2023-09-25T08:23:51Z

What you are proposing is already implemented using the preview config option.

bhuvaneshwararaja · 2023-09-25T08:48:47Z

Hi @pokoli ,
I have gone through the documentation (https://www.papaparse.com/docs) for the preview options
and got to know that will help to parse the expected number of rows in csv
example: If we have 20 rows in csv, If the preview value is 10 . Out of 20 only 10 rows will be parsed .

But this PR will add a config to skip the N number of rows in the beginning of the csv.
Example: If we have 20 rows in csv , If skipNLines value is 10 . Out of 20 , first 10 rows will be skipped .

Please check it once again , it is relevant to the issue #738 as well.

pokoli · 2023-09-25T09:18:44Z

Ok, then we need to document this new flag to explain which is the ussage.

I think the flag will be better named as skipFirstLines.

Could you have a look at the test suite? it seems you introduced some failing tests.

bhuvaneshwararaja · 2023-09-25T10:13:24Z

Hi @pokoli , Thanks
Sure i will check those test suite , I will update the PR with doc

bhuvaneshwararaja · 2023-09-25T17:29:55Z

Hi @pokoli , I have updated the PR

papaparse.js

tests/test-cases.js

papaparse.js

janisdd · 2023-10-10T06:49:57Z

Just a small note: This does not work for quoted fields...

1,"B\nB",C,A,"B\nB",C,A,"B\nB",C,A,"B\nB",C
2,"B\nB",C,A,"B\nB",C,A,"B\nB",C,A,"B\nB",C
3,"B\nB",C,A,"B\nB",C,A,"B\nB",C,A,"B\nB",C
4,"B\nB",C,A,"B\nB",C,A,"B\nB",C,A,"B\nB",C
5,"B\nB",C,A,"B\nB",C,A,"B\nB",C,A,"B\nB",C

5 lines with 12 columns

Papa.parse('1,"B\nB",C,A,"B\nB",C,A,"B\nB",C,A,"B\nB",C\n2,"B\nB",C,A,"B\nB",C,A,"B\nB",C,A,"B\nB",C\n3,"B\nB",C,A,"B\nB",C,A,"B\nB",C,A,"B\nB",C\n4,"B\nB",C,A,"B\nB",C,A,"B\nB",C,A,"B\nB",C\n5,"B\nB",C,A,"B\nB",C,A,"B\nB",C,A,"B\nB",C', 
{skipFirstNLines: 5})

skips the "real" first line (5 because \n after each line).

pokoli · 2023-10-10T07:03:38Z

@janisdd It will be great if you can fill an issue/merge request to make this work with quotes also.

janisdd · 2023-10-10T20:18:19Z

Well, no... I don't think this feature should be added to the library. Here are my thoughts:

it is not clear how it interacts with other settings (e.g. header, skipEmptyLines, to be fair, the interaction of other options is not good documented either)
can be solved by 1 line of code (see issue) and also works for quoted fields (via parsing everything)
even if implemented to work with quoted fields, one has to parse the first N lines and then throw them away
- this is because we only know it's a quoted field when fully parsed
- so, skipping is essentially just parsing + discarding -> can be done by the user (in 1 line)

However, here are some notes if someone wants to implement it anyway:

have a look at how the setting preview is implemented
... essentially just not add the row in

PapaParse/papaparse.js

Line 1742 in e2d570e

function pushRow(row)

as long as some skip counter is too small

I'm not sure if this would also work for streaming/chunking (or whatever, I just use the normal parsing). But as far as i know there is only one core logic/function for the actual parsing... which means this should also work for streaming

jkruke · 2024-03-06T05:47:28Z

I think this feature hasn't been implemented as wanted in the referenced issue #738 !
The example CSV was:

This is a data file generated by some old software.
Next line will contain a headers of parameters.
Temperature, Humidity, Voltage
22.5, 45.5, 220
23.0, 44.0, 219

Expected output should be (to be fair, it wasn't precisely specified):

Temperature, Humidity, Voltage
22.5, 45.5, 220
23.0, 44.0, 219

But the actual output with skipFirstNLines=2 is (according to the test cases):

This is a data file generated by some old software.
23.0

Analogue to the following test case in the code:
https://github.com/mholt/PapaParse/pull/1021/files#diff-e0ce8cb4901057c1880bee545909a64f38c7383b4d41982d6f2db9a8ec81eac7R1588

{
		description: "Skip First N number of lines , with header and 3 rows",
		input: 'a,b,c,d\n1,2,3,4\n4,5,6,7',
		config: { header: true, skipFirstNLines: 1 },
		expected: {
			data: [{a: '4', b: '5', c: '6', d: '7'}],
			errors: []
		}
	}

This test case is not realistic because it does not reflect a CSV with some preamble lines to be skipped.
should be rather: input: 'to-be-ignored\na,b,c,d\n1,2,3,4', and expected.data: [{a: '1', b: '2', c: '3', d: '4'}].

If someone wanted to skip rows of the record set, it should not be done during the parsing phase, but later when working with the data sets by doing some simple postprocessing such as data = data.slice(1).

@bhuvaneshwararaja, @pokoli could you please take a look at it and give me some feedback about my thoughts? :)

pokoli closed this Sep 25, 2023

pokoli reopened this Sep 25, 2023

bhuvaneshwararaja force-pushed the master branch from 6c63dec to b133cfb Compare September 25, 2023 17:27

pokoli requested changes Sep 26, 2023

View reviewed changes

papaparse.js Outdated Show resolved Hide resolved

tests/test-cases.js Outdated Show resolved Hide resolved

tests/test-cases.js Outdated Show resolved Hide resolved

papaparse.js Outdated Show resolved Hide resolved

papaparse.js Show resolved Hide resolved

bhuvaneshwararaja force-pushed the master branch from b133cfb to 28dd2de Compare September 26, 2023 14:05

Skip lines future (Feature Suggestion) mholt#738

9ee3a20

bhuvaneshwararaja force-pushed the master branch from 28dd2de to 9ee3a20 Compare September 26, 2023 15:39

bhuvaneshwararaja requested a review from pokoli October 7, 2023 05:59

pokoli approved these changes Oct 9, 2023

View reviewed changes

pokoli merged commit e2d570e into mholt:master Oct 9, 2023
3 checks passed

pokoli mentioned this pull request Oct 9, 2023

Skip lines future (Feature Suggestion) #738

Closed

jkruke mentioned this pull request Mar 7, 2024

skipFirstNLines should not take the first line as header #1045

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Skip lines future (Feature Suggestion) #738 #1021

Skip lines future (Feature Suggestion) #738 #1021

bhuvaneshwararaja commented Sep 24, 2023 •

edited

pokoli commented Sep 25, 2023

bhuvaneshwararaja commented Sep 25, 2023

pokoli commented Sep 25, 2023

bhuvaneshwararaja commented Sep 25, 2023

bhuvaneshwararaja commented Sep 25, 2023

janisdd commented Oct 10, 2023

pokoli commented Oct 10, 2023

janisdd commented Oct 10, 2023

jkruke commented Mar 6, 2024 •

edited

Skip lines future (Feature Suggestion) #738 #1021

Skip lines future (Feature Suggestion) #738 #1021

Conversation

bhuvaneshwararaja commented Sep 24, 2023 • edited

Issues No : #738

Description:

Changes Made:

Test cases:

pokoli commented Sep 25, 2023

bhuvaneshwararaja commented Sep 25, 2023

pokoli commented Sep 25, 2023

bhuvaneshwararaja commented Sep 25, 2023

bhuvaneshwararaja commented Sep 25, 2023

janisdd commented Oct 10, 2023

pokoli commented Oct 10, 2023

janisdd commented Oct 10, 2023

jkruke commented Mar 6, 2024 • edited

bhuvaneshwararaja commented Sep 24, 2023 •

edited

jkruke commented Mar 6, 2024 •

edited