cspell Ignores Many Words In .NET Dictionary #589
The issue here seems to be that the second set of words is only permitted when
A few things:

`gzcat cshart.text.gz | less`

In most cases,
Very enlightening. I had read that document before, but mistook the opening line to apply only to the text being checked, not also to the dictionaries themselves.
One thing I still don't understand, though (and I am probably missing something again here), is why `cspell trace` doesn't find "net" + "fx" for "netfx" in the dotnet dictionary even when compound words are allowed, but does find "annotation" + "dialog" for "annotationdialog." Is this on account of the default minimum word length of 4?
@Jason3S, not sure if you happen to know the answer to this question off the top of your head, but it's definitely not worth your time if not.
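The hypothesis in the question can be pictured with a toy compound checker in which every segment must be a dictionary word and at least `min_len` characters long. This is a minimal sketch under that assumption only: `can_compound` and `min_len` are hypothetical names, and this is not how cspell actually walks its dictionary trie.

```python
def can_compound(word: str, dictionary: set[str], min_len: int = 4) -> bool:
    """Toy model (hypothetical, not cspell's real algorithm): can `word` be
    split into dictionary words, each at least `min_len` characters long?"""
    word = word.lower()
    n = len(word)
    # reachable[i] is True when word[:i] can be built from allowed segments.
    reachable = [False] * (n + 1)
    reachable[0] = True
    for end in range(1, n + 1):
        for start in range(end):
            segment = word[start:end]
            if reachable[start] and len(segment) >= min_len and segment in dictionary:
                reachable[end] = True
                break
    return reachable[n]


words = {"net", "fx", "annotation", "dialog"}
print(can_compound("AnnotationDialog", words))  # True: "annotation" + "dialog"
print(can_compound("NetFx", words))             # False: "net" and "fx" are under 4 chars
print(can_compound("NetFx", words, min_len=2))  # True once short segments are allowed
```

Under this model the answer would be yes, but whether cspell applies such a limit to compound segments is not established in this thread.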
There are two different versions of the tool that compiles the dictionaries. The old version "filters" out characters and text that would not be checked. It splits words on CamelCase boundaries and other non-letter characters, so it could take some sample code and make a dictionary out of it. But this approach introduced problems, like splitting in the wrong place and introducing lots of word segments that only made sense when combined with the original text.

The new version expects the word list to be cleaner. I have been moving most of the natural language dictionaries to use the new tool, but since the output format is not compatible with CSpell 4, it has to be a major version bump. The new format handles case and accents to allow for strict and loose checking, which is why I started with the natural language dictionaries: there were a lot of requests to be able to ignore accents.

I have not started moving over the other word lists yet. I could use your help if you are willing. I'll convert one as an example.
PR #702 is an example.
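To make the two mechanisms described in the reply above concrete, here is a rough sketch, not the cspell-tools implementation. `split_like_old_tool` approximates the old "split on CamelCase and non-letter characters" behavior, and `loose_key` shows one common way (Unicode NFD decomposition plus dropping combining marks) to build a case- and accent-insensitive key for loose checking. Both names are hypothetical, and the regexes are assumptions.

```python
import re
import unicodedata


def split_like_old_tool(text: str) -> list[str]:
    """Hypothetical approximation of the old compiler's splitting:
    break on non-letter characters, then on CamelCase boundaries.
    ASCII-only for simplicity; the real cspell-tools rules may differ."""
    segments = []
    for chunk in re.split(r"[^A-Za-z]+", text):
        if not chunk:
            continue
        # In order of preference: an acronym run before a capitalized word
        # ("XML" in "XMLHttp"), a capitalized or lowercase word, or a caps run.
        segments.extend(re.findall(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+", chunk))
    return segments


def loose_key(word: str) -> str:
    """Case- and accent-insensitive key: decompose accented letters (NFD),
    drop the combining marks, then lowercase."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()


for entry in ["AnnotationDialog", "Zone_HeaderStyle", "propertyChangedEventDescr", "NetFx"]:
    print(entry, "->", split_like_old_tool(entry))
print(loose_key("Estée"))  # estee
```

Note how `NetFx` becomes the two short fragments `Net` and `Fx`, the kind of segment that only makes sense next to the original text.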
I am happy to take a stab at this. I noticed in your PR that you removed some special characters from the dictionary. Are there particular special characters that are allowed in the new format? I also feel like I'm missing the connection between the two versions of the tool and the discrepancy in behavior, given that all of my examples were in the .NET dictionary.
Thank you. The key part is:

```diff
# Moved source files into `src` and use `cspell-tools-cli compile --split`
- "build": "cspell-tools compile \"companies.txt\" -o .",
- "test": "head -n 100 \"companies.txt\" | cspell -v -c ./cspell-ext.json --local=* --languageId=* stdin",
+ "build": "cspell-tools-cli compile --split \"src/companies.txt\" -o .",
+ "test": "head -n 100 \"src/companies.txt\" | cspell -v -c ./cspell-ext.json --local=* --languageId=* stdin",
```

The other changes:

```diff
# Remove unnecessary ]
- Phillips]
+ Phillips
# This was to fix an old encoding issue. Everything should be UTF-8.
- The Estée Lauder Companies Inc.
+ The Estée Lauder Companies Inc.
# Fix a missing space.
- The Jones Financial Companies,L.L.L.P.
+ The Jones Financial Companies, L.L.L.P.
```
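Since the new compiler expects cleaner input, a quick pre-check over a word list can catch the kinds of problems fixed in the diff above. This is a minimal, hypothetical sketch (`lint_word_list` is not part of cspell-tools); the comma rule is a heuristic and would also flag legitimate entries such as `1,000`.

```python
import re


def lint_word_list(lines: list[str]) -> list[tuple[int, str]]:
    """Return (line_number, problem) pairs for a few simple word-list issues."""
    problems = []
    for number, line in enumerate(lines, start=1):
        entry = line.rstrip("\n")
        # A stray bracket, like the trailing "]" on "Phillips]".
        if entry.count("[") != entry.count("]"):
            problems.append((number, "unbalanced bracket"))
        # A comma immediately followed by a non-space, like "Companies,L.L.L.P.".
        if re.search(r",(?=\S)", entry):
            problems.append((number, "missing space after comma"))
        # U+FFFD usually means the text was decoded with the wrong encoding.
        if "\ufffd" in entry:
            problems.append((number, "replacement character (encoding problem?)"))
    return problems


sample = ["Phillips]", "The Jones Financial Companies,L.L.L.P.", "Phillips"]
for number, problem in lint_word_list(sample):
    print(f"line {number}: {problem}")
```

To check a real file, read it explicitly as UTF-8 (for example, `open("src/companies.txt", encoding="utf-8").readlines()`) and pass the lines in.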
@Jason3S, apologies, I completely missed your reply. I have a feeling I may be too late to be helpful here, but are there any dictionaries that still use the old version of the dictionary compiler? |
Dot NET has not been done yet. See #705.
Based on dotnet being shown as done in the linked #705 above, I believe this issue can be closed.
When asked to trace words present in the "dotnet" dictionary, cspell v5.9.0 correctly reports that "Apply," "Appx," "dotnettools," and "unplated" are present, but incorrectly reports that "AnnotationDialog," "NetFx," "propertyChangedEventDescr," "XmlUndefinedPrefix," "Zone_HeaderStyle," and many others are missing, likely resulting in thousands of false positives. See streetsidesoftware/cspell#1626 for the .cspell.json in use.