c++ track standard library symbols from cppreference symbol index #1167

david-fong · 2022-06-03T06:31:54Z

(continuing from #1101)

cppreference.com has a "symbol index" page listing the names of symbols (i.e. functions, constants, etc.). The API can be found here, and the JSON source is here. The content is under a Creative Commons licence, so it is okay to use.

If parsing the symbols from this page is automated, it will be easy to update for future standard library revisions (as opposed to #1101, where I went through cppreference manually). Also, in #1101, I didn't yet know that there is a user configuration to also check three-letter words, so I skipped several three-letter words from cppreference, thinking they would never be needed.

Do note that some larger subsections of the standard library are listed separately from the "root" page / API response, such as the contents of std::ranges; those links above are only of the root content.

The fix for this issue will replace most, but not all, of the dictionaries added in #1101. For example, the jargon, names of people, and ecosystem / tooling dictionaries are not covered by cppreference's symbol index.

One point for discussion: do you think things like cregex_iterator should be registered as cregex or cregex_iterator? Or maybe a better example is the comp_ellint_1, comp_ellint_1f, comp_ellint_1l, comp_ellint_2, etc. I can't think of a strong argument for one over the other off the top of my head. In #1101, I went on a case-by-case basis, using the split approach where it would save adding many dictionary entries with common parts, such as in the ellint case, and otherwise using the full thing if there were no other symbols with common parts.
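To make the case-by-case approach from #1101 concrete, here is a minimal sketch of how it could be automated. It is only an illustration, not what #1101 actually did: the function name `choose_entries`, the `min_shared` threshold, and the choice to skip parts that are not purely alphabetic (such as `1` or `1f`) are all assumptions.

```python
from collections import Counter


def choose_entries(symbols, min_shared=2):
    """Pick dictionary entries: split parts when shared, else the full symbol.

    Hypothetical heuristic: a symbol is split when at least one of its
    parts occurs in `min_shared` or more symbols.
    """
    # Count in how many symbols each underscore-separated part occurs.
    part_counts = Counter()
    for symbol in symbols:
        for part in {p for p in symbol.split("_") if p}:
            part_counts[part] += 1

    entries = set()
    for symbol in symbols:
        parts = [p for p in symbol.split("_") if p]
        if any(part_counts[p] >= min_shared for p in parts):
            # Shared parts: register the alphabetic parts once each
            # (assumption: suffixes like "1" or "1f" are not worth adding).
            entries.update(p for p in parts if p.isalpha())
        else:
            # No part is shared with another symbol: keep the full name.
            entries.add(symbol)
    return sorted(entries)


symbols = ["comp_ellint_1", "comp_ellint_1f", "comp_ellint_2", "cregex_iterator"]
print(choose_entries(symbols))
```

With this input, the three `comp_ellint_*` symbols collapse into the two entries `comp` and `ellint`, while `cregex_iterator` stays whole because neither of its parts appears in another symbol of the list.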


Jason3S · 2022-06-03T06:55:29Z

> One point for discussion: do you think things like cregex_iterator should be registered as cregex or cregex_iterator? Or maybe a better example is the comp_ellint_1, comp_ellint_1f, comp_ellint_1l, comp_ellint_2, etc. I can't think of a strong argument for one over the other off the top of my head. In #1101, I went on a case-by-case basis, using the split approach where it would save adding many dictionary entries with common parts, such as in the ellint case, and otherwise using the full thing if there were no other symbols with common parts.

This is a challenge where a bit of preprocessing might be necessary. Too many entries like comp_ellint_1, comp_ellint_1f, ... make the dictionary unnecessarily large, but at the same time we want to avoid adding misspellings or strange words that exist in the reference.

One rule of thumb: if all of the parts of a symbol (split on _) already exist in the dictionary, the symbol can be dropped. Do not assume that the English dictionary is loaded.
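As a rough sketch of that rule of thumb (the helper names `is_redundant` and `filter_symbols` are made up for illustration, and ignoring purely numeric parts such as the `1` in comp_ellint_1 is an assumption):

```python
def is_redundant(symbol, known_words):
    """Return True if every part of the symbol is already a known word."""
    # Split on "_" and ignore empty and purely numeric parts (assumption).
    parts = [p for p in symbol.lower().split("_") if p and not p.isdigit()]
    return bool(parts) and all(p in known_words for p in parts)


def filter_symbols(symbols, known_words):
    """Keep only the symbols that still need a dictionary entry."""
    return [s for s in symbols if not is_redundant(s, known_words)]


# `known` stands in for the words already in the C++ dictionary itself,
# not an English dictionary.
known = {"regex", "iterator", "comp"}
symbols = ["regex_iterator", "cregex_iterator", "comp_ellint_1"]
kept = filter_symbols(symbols, known)
print(kept)
```

Here regex_iterator is dropped because both of its parts are already known, while cregex_iterator and comp_ellint_1 are kept because cregex and ellint are not.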

As a first pass, this might not be necessary.

david-fong mentioned this issue Jun 29, 2022

What should/does the cpp dictionary contain #54

Closed
