This repository has been archived by the owner on Jun 28, 2021. It is now read-only.

Allow unescaped commas in "last" field #193

Closed
ryanwhite04 opened this issue May 17, 2018 · 6 comments

Comments

@ryanwhite04

Some formats that claim partial compatibility with CSV, such as ASS, assume it's OK to have unescaped commas in the last field, because the number of fields is already known once the header line has been parsed.

You can see this in the ASS specification

The format line specifies how SSA will interpret all following Event lines. The field names must be spelled correctly, and are as follows:
Marked, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
The last field will always be the Text field, so that it can contain commas.

and here

The information fields in each line are separated by a commas.
This makes it illegal to use commas in character names and style names (SSA prevents you putting commas in these). It also makes it quite easy to load chunks of an SSA script into a spreadsheet as a CSV file, and chop out columns of information you need for another subtitling program.

To parse files like this (assuming you've already separated the data into "Chunks"), it would be useful to have a flag that treats every comma in the last field as escaped, where the last field is determined by the number of fields counted in the header row.

I couldn't find an option for this in the documentation. Setting "relax_column_count" to true comes close: the content after each unescaped comma in the final field is added as extra "default" columns. That isn't ideal, though, and it seems to drop some of the text.

1, 2, 3, 4, 5
a, b, c, d, foo bar
a, b, c, d, Lorem Ipsum, dolores umbridge, something latin
a, b, c, d, upcoming unescaped commas!, one, two, three, oh no!

parsed with

{
  columns: true,
  ltrim: true,
  relax_column_count: true,
}

returns

[ 
  { '1': 'a', '2': 'b', '3': 'c', '4': 'd', '5': 'foo bar' },
  { '1': 'a', '2': 'b', '3': 'c', '4': 'd', '5': 'Lorem Ipsum', undefined: 'something latin' },
  { '1': 'a', '2': 'b', '3': 'c', '4': 'd', '5': 'upcoming unescaped commas!', undefined: 'oh no!' }
]
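
A possible explanation for the missing text (a guess based on the output above, not checked against the csv-parse source): if each value is stored under the header name at the same position, every value past the fifth gets the key "undefined", and each one overwrites the one before it. The `header`, `fields`, and `record` names below are illustrative only.

```javascript
// Assumption: keys are looked up by position in the header array.
// This mimics the observed output; it is not csv-parse's actual code.
const header = ['1', '2', '3', '4', '5'];
const fields = ['a', 'b', 'c', 'd', 'Lorem Ipsum', 'dolores umbridge', 'something latin'];

const record = {};
fields.forEach((value, i) => {
  // For i >= header.length, header[i] is undefined, which JavaScript
  // coerces to the string key "undefined"; each write replaces the last.
  record[header[i]] = value;
});

console.log(record);
```

Under that assumption only the last extra value ('something latin') survives, which matches the second record above, where 'dolores umbridge' has disappeared.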

Ideally parsing the same csv with

{
  columns: true,
  ltrim: true,
  ignore_final_field_commas: true, // But obviously a better name...
}

would return

[ 
  { '1': 'a', '2': 'b', '3': 'c', '4': 'd', '5': 'foo bar' },
  { '1': 'a', '2': 'b', '3': 'c', '4': 'd', '5': 'Lorem Ipsum, dolores umbridge, something latin' },
  { '1': 'a', '2': 'b', '3': 'c', '4': 'd', '5': 'upcoming unescaped commas!, one, two, three, oh no!' }
]

just to make it work with frustratingly close attempts at csv formats.
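
Until something like this exists, the requested behavior can be sketched outside csv-parse in plain JavaScript. This is a minimal sketch under some assumptions: `splitKeepingLastField` and `parseWithGreedyLastField` are hypothetical helper names, not csv-parse APIs, and the sketch ignores quoting and escaping entirely, so it only suits input where the leading fields never contain commas.

```javascript
// Hypothetical helper, not part of csv-parse: split a line on commas, but
// stop after `count - 1` splits so the remainder stays in the last field.
function splitKeepingLastField(line, count) {
  const fields = [];
  let rest = line;
  while (fields.length < count - 1) {
    const comma = rest.indexOf(',');
    if (comma === -1) break; // fewer fields than the header declares
    fields.push(rest.slice(0, comma));
    rest = rest.slice(comma + 1);
  }
  fields.push(rest);
  return fields;
}

// Parse header-first text into objects keyed by the header names.
// Leading whitespace is trimmed from each field, similar to ltrim.
function parseWithGreedyLastField(text) {
  const lines = text.split(/\r?\n/).filter((line) => line.length > 0);
  const header = lines[0].split(',').map((name) => name.trimStart());
  return lines.slice(1).map((line) => {
    const values = splitKeepingLastField(line, header.length);
    const record = {};
    header.forEach((name, i) => {
      // Short rows leave the missing trailing fields undefined.
      record[name] = values[i] === undefined ? undefined : values[i].trimStart();
    });
    return record;
  });
}

const input = [
  '1, 2, 3, 4, 5',
  'a, b, c, d, foo bar',
  'a, b, c, d, Lorem Ipsum, dolores umbridge, something latin',
  'a, b, c, d, upcoming unescaped commas!, one, two, three, oh no!',
].join('\n');

const records = parseWithGreedyLastField(input);
console.log(records);
```

For the sample data above, the fifth field of each record keeps its commas, matching the "would return" output.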

@wdavidw
Member

wdavidw commented May 18, 2018

This is very similar to the last issue #192 opened 2 or 3 days ago. What do you think?

@ryanwhite04
Author

I don't think they are the same. The other one is about properly escaping commas and quotes, whereas mine is a feature request to let csv-parse handle ASS files, accepting unescaped quotes and commas as long as they are in the "Last Field".

It's probably not worth putting a lot of effort into this feature request, though, because I don't know of any other formats that allow unescaped commas and quotes in the last field of otherwise CSV-compatible files.

@wdavidw
Member

wdavidw commented May 18, 2018

The subject is different but the solution I propose at the end seems similar, no?

@wdavidw
Member

wdavidw commented Sep 13, 2019

Closing due to lack of activity.

@wdavidw closed this as completed Sep 13, 2019

@jgod

jgod commented Jan 17, 2021

This would be nice

@wdavidw
Member

wdavidw commented Jan 17, 2021

I have a working implementation of this feature. It needs more testing, but it seems to work. I will only release it on one condition: that we find a better name than ignore_final_field_commas.


3 participants