Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add unicode chars to formula escape #387

Merged
merged 3 commits into from May 9, 2023
Merged

Add unicode chars to formula escape #387

merged 3 commits into from May 9, 2023

Conversation

tomemelko
Copy link
Contributor

This PR consists of two commits:

  • A performance improvement for formula escaping, doing switch vs [].includes resulted in a 4.5x speed up on my machine (https://jsperf.app/wupoqu)
  • Addition of unicode equivalent characters for =, +, -, and @ in formula escaping

@tomemelko
Copy link
Contributor Author

@wdavidw do you mind taking a look?

@atlanteh as a CC since I'm changing their code

@wdavidw
Copy link
Member

wdavidw commented Apr 23, 2023

@tomemelko I am ok with the changes, only the tests must be enriched. Could you do this ?

Also, do we have a reference or specification about those escape characters ? I wish to enrich the comments with it.

@tomemelko
Copy link
Contributor Author

Hey @wdavidw what needs to be improved in the tests? I can try to do whatever you need, I just need to know what you want.

The other escape characters are just the Unicode equivalents. Apple Numbers does Unicode Normalization which will translate the Unicode characters to their ASCII equivalents, leading to the CSV injection.

For example the Unicode equals sign ('=') will be translated to the ascii equals sign ('=') and then the formula will be interpreted by Apple Numbers.

You can test this (if you have access to a Mac) by writing the string =2+2 to a CSV file and then opening the CSV file in Numbers. You will see 4 in the only cell, meaning the formula has been interpreted.

@wdavidw
Copy link
Member

wdavidw commented Apr 28, 2023

About the test:

The option.escape_formulas.coffee must be updated with the new characters.

About the documentation:

The other escape characters are just the Unicode equivalents.

Yes, I understood, what is relevant is to reference where those characters are documented as relevant/special characters. I personally have no idea about those and I don't wish to blindly commit code which I don't know anything about. For example, you mention that "Apple Numbers does Unicode Normalization which will translate the Unicode characters to their ASCII equivalents". I wish to know where this is written.

@arnagar
Copy link

arnagar commented Apr 28, 2023

It might be worth allowing changing the characters to custom characters as it might be useful (for example for people using csv for number reports for negative numbers as - will be escaped so the escape_formulas will be useless for them

@tomemelko
Copy link
Contributor Author

@wdavidw

Tests

I could've sworn that I had already updated the test! No wonder I was confused, I really thought I had updated them already. Sorry! It should be updated now.

Documentation

CSV Injection

You can read about CSV Injection and the special formula characters here:

Apple Numbers Unicode Normalization

I cannot find documentation that Apple Numbers is doing Unicode Normalization, but I do have empirical evidence from testing on a Mac. Like I mentioned, if you write the string =2+2 (which has the unicode '=' character) to a CSV file and open it in Numbers it will interpret the formula.

@arnagar

I could do that as a separate PR, but as it stands right now escape_formula will already escape negative numbers, this PR doesn't change that behavior.

@wdavidw wdavidw merged commit 1fc177c into adaltas:master May 9, 2023
@tomemelko tomemelko deleted the add-unicode-chars-to-formula-escape branch May 10, 2023 14:05
salceson added a commit to evidenceprime/node-csv that referenced this pull request Sep 22, 2023
* chore: latest dependencies

* fix: uncaught errors with large stream chunks (fix adaltas#386)

* chore(release): publish

 - csv-demo-browser@0.1.6
 - csv-demo-cjs@0.2.4
 - csv-demo-eslint@0.1.10
 - csv-demo-esm@0.0.18
 - csv-issues-cjs@0.1.5
 - csv-issues-esm@0.0.9
 - csv-demo-ts-moduleresolution-node16-cjs@0.2.4
 - csv-demo-ts-module-node16@0.2.4
 - csv-demo-webpack-ts@0.1.6
 - csv-demo-webpack@0.1.8
 - csv-generate@4.2.3
 - csv-parse@5.3.7
 - csv-stringify@6.3.1
 - csv@6.2.9
 - stream-transform@3.2.3

* test(csv-stringify): fix legacy

* chore(release): publish

 - csv-demo-browser@0.1.7
 - csv-demo-cjs@0.2.5
 - csv-demo-eslint@0.1.11
 - csv-demo-esm@0.0.19
 - csv-issues-cjs@0.1.6
 - csv-issues-esm@0.0.10
 - csv-demo-ts-moduleresolution-node16-cjs@0.2.5
 - csv-demo-ts-module-node16@0.2.5
 - csv-demo-webpack-ts@0.1.7
 - csv-demo-webpack@0.1.9
 - csv-generate@4.2.4
 - csv-parse@5.3.8
 - csv-stringify@6.3.2
 - csv@6.2.10
 - stream-transform@3.2.4

* build: remove trailing slash in home url

* chore: latest dependencies

* fix(csv): fixed CJS types under modern `modernResolution` options (adaltas#388)

* fix(csv): remove ts files in cjs dist

* chore(release): publish

 - csv-demo-browser@0.1.8
 - csv-demo-cjs@0.2.6
 - csv-demo-eslint@0.1.12
 - csv-demo-esm@0.0.20
 - csv-issues-cjs@0.1.7
 - csv-issues-esm@0.0.11
 - csv-demo-ts-moduleresolution-node16-cjs@0.2.6
 - csv-demo-ts-module-node16@0.2.6
 - csv-demo-webpack-ts@0.1.8
 - csv-demo-webpack@0.1.10
 - csv-generate@4.2.5
 - csv-parse@5.3.9
 - csv-stringify@6.3.3
 - csv@6.2.11
 - stream-transform@3.2.5

* docs: minor upercase modification

* chore: latest dependencies

* chore(release): publish

 - csv-demo-browser@0.1.9
 - csv-demo-cjs@0.2.7
 - csv-demo-eslint@0.1.13
 - csv-demo-esm@0.0.21
 - csv-issues-cjs@0.1.8
 - csv-issues-esm@0.0.12
 - csv-demo-ts-moduleresolution-node16-cjs@0.2.7
 - csv-demo-ts-module-node16@0.2.7
 - csv-demo-webpack-ts@0.1.9
 - csv-demo-webpack@0.1.11
 - csv-generate@4.2.6
 - csv-parse@5.3.10
 - csv-stringify@6.3.4
 - csv@6.2.12
 - stream-transform@3.2.6

* feat: add unicode chars to formula escape (adaltas#387)

* fix(csv-stringify): use switch in formula escaping

* fix(csv-stringify): add unicode character equivalents in formula sanitization

* chore: update tests

* docs(csv-stringify): escape formulas references

* chore(release): publish

 - csv-demo-browser@0.1.10
 - csv-demo-cjs@0.2.8
 - csv-demo-eslint@0.1.14
 - csv-demo-esm@0.0.22
 - csv-issues-cjs@0.1.9
 - csv-issues-esm@0.0.13
 - csv-demo-ts-moduleresolution-node16-cjs@0.2.8
 - csv-demo-ts-module-node16@0.2.8
 - csv-demo-webpack-ts@0.1.10
 - csv-demo-webpack@0.1.12
 - csv-stringify@6.4.0
 - csv@6.3.0

* feat(csv-parse): add `columns` property in `Info` object type (adaltas#390)

* fix(ts): Add `columns` property in `Info` object type

* Add disabled options to columns type

* build(csv-parse): build and write test after info ts definition

* chore(release): publish

 - csv-demo-browser@0.1.11
 - csv-demo-cjs@0.2.9
 - csv-demo-esm@0.0.23
 - csv-issues-cjs@0.1.10
 - csv-issues-esm@0.0.14
 - csv-demo-ts-moduleresolution-node16-cjs@0.2.9
 - csv-demo-ts-module-node16@0.2.9
 - csv-demo-webpack-ts@0.1.11
 - csv-demo-webpack@0.1.13
 - csv-parse@5.4.0
 - csv@6.3.1

* docs: update build badge urls

* docs(csv-generate): comment indentation in samples

* refactor(csv-issues-cjs): code format

* refactor(csv-issues-cjs): remove unused arguments

* test(csv-issues-cjs): fix stdout maxBuffer length exceeded

* test(csv-issues-esm): use spawn instead of exec

* fix: commonjs types, run tsc and lint to validate changes (adaltas#397)

* fix: types weren't working for commonjs. Run tsc and lint to validate changes

* chore: needs to work on linux and BSD

* chore: latest dependencies

* chore(release): publish

 - csv-demo-browser@0.1.12
 - csv-demo-cjs@0.2.10
 - csv-demo-eslint@0.1.15
 - csv-demo-esm@0.0.24
 - csv-issues-cjs@0.1.11
 - csv-issues-esm@0.0.15
 - csv-demo-ts-moduleresolution-node16-cjs@0.2.10
 - csv-demo-ts-module-node16@0.2.10
 - csv-demo-webpack-ts@0.1.12
 - csv-demo-webpack@0.1.14
 - csv-generate@4.2.7
 - csv-parse@5.4.1
 - csv-stringify@6.4.1
 - csv@6.3.2
 - stream-transform@3.2.7

* feat(csv-issues-cjs): 399 issue

* fix(csv-demo-ts-cjs-node16): upgrade module definition after latest typescript

* feat(csv-parse): new comment_no_infix option (fix adaltas#325)

* test(csv-issues-esm): reproduce issue adaltas#391

* refactor(csv-stringify): rename variable in sample

* test(csv-issues-cjs): reproduce issue 327

* chore(release): publish

 - csv-demo-browser@0.1.13
 - csv-demo-cjs@0.2.11
 - csv-demo-eslint@0.1.16
 - csv-demo-esm@0.0.25
 - csv-issues-cjs@0.2.0
 - csv-issues-esm@0.0.16
 - csv-demo-ts-cjs-node16@0.2.11
 - csv-demo-ts-module-node16@0.2.11
 - csv-demo-webpack-ts@0.1.13
 - csv-demo-webpack@0.1.15
 - csv-generate@4.2.8
 - csv-parse@5.5.0
 - csv-stringify@6.4.2
 - csv@6.3.3
 - stream-transform@3.2.8

* docs(csv-parse): comment_no_infix sample

---------

Co-authored-by: David Worms <david@adaltas.com>
Co-authored-by: Petter <petter@petterhaggholm.net>
Co-authored-by: Mateusz Burzyński <mateuszburzynski@gmail.com>
Co-authored-by: Tom Emelko <tom.emelko@gmail.com>
Co-authored-by: Elia Maino <eliamaino@gmail.com>
Co-authored-by: David Tanner <darthtanner@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants