[WIP] tools: Add pygments import script #100

iamkroot · 2018-12-18T16:40:30Z

Currently, we can retrieve the regex patterns from lexers for the required
tokens of all languages not found in the coAST schema.

TODO:

- Identify all the required Token types, and corresponding coAST entities
- Write proper abstraction to handle regex -> keyword conversion
- (Optional) Add the filenames property to Language schema

Will close #96

iamkroot · 2018-12-21T17:29:37Z

I feel like the number of lines is getting too big. Will probably break up the script into two or three files.

iamkroot · 2018-12-21T17:44:48Z

Also, I'm not really satisfied with the extraction logic for the keywords. I'm currently going word by word, handling each regex metacharacter and its behaviour separately, which is obviously not very sustainable and leaves out many edge cases.
To verify that keywords have been extracted properly, we simply match each keyword against the original pattern it was extracted from. As of now, the script fails for about 100 languages. This can be improved drastically by doing either of the following:

- manually handle each edge case (easily leads to bloated code, which will be hard to maintain/update)
- make a nice parser/abstraction

I've been trying, rather unsuccessfully, to do the second one using regexes, but I'm not very skilled at that, so I couldn't figure out the proper logic to do so. If someone can help out, it would be greatly appreciated 😃
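For reference, the check of each extracted keyword against the pattern it came from can be sketched roughly as follows. This is a minimal illustration using only the standard `re` module; the function name `verify_keywords` and the sample pattern are assumptions, not code from this PR.

```python
import re


def verify_keywords(pattern, keywords):
    """Split keywords into those that fully match `pattern` and those that do not."""
    compiled = re.compile(pattern)
    matched, rejected = [], []
    for keyword in keywords:
        # fullmatch() requires the whole keyword to be consumed by the pattern,
        # so a keyword that only matches a prefix is rejected.
        if compiled.fullmatch(keyword):
            matched.append(keyword)
        else:
            rejected.append(keyword)
    return matched, rejected


# Hypothetical pattern in the style of a lexer keyword rule.
ok, bad = verify_keywords(r"(if|else|while)\b", ["if", "elif", "while"])
```

A non-empty `rejected` list would flag a language whose extraction needs another look.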

iamkroot · 2019-03-09T18:38:43Z

I guess most of the hard part is done now. I've hit a snag with dumping the YAML files: the PyYAML package sorts keys alphabetically before dumping. There's already a PR to fix this at yaml/pyyaml#254, so we might have to wait for that to be merged, but even that will only help on Python >= 3.6, where dicts preserve insertion order. The alternative is to use wimglenn/oyaml, but I would prefer not to add another dependency for this.
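One dependency-free stopgap, sketched here purely as an illustration and not something this PR does: if the definitions are flat mappings of strings and lists of strings, a few lines can emit them in insertion order. The helper name `dump_ordered` and the sample data are hypothetical. Scalars are quoted with `json.dumps`, relying on JSON strings being valid YAML double-quoted scalars, and keys are assumed to be plain identifiers that need no quoting.

```python
import json


def dump_ordered(mapping):
    """Emit a flat mapping as YAML text, keeping the dict's insertion order.

    Only string values and lists of strings are handled in this sketch.
    """
    lines = []
    for key, value in mapping.items():  # dicts keep insertion order (Python 3.7+)
        if isinstance(value, (list, tuple)):
            lines.append(f"{key}:")
            # Quote every item so keywords like "true" or "null" stay strings.
            lines.extend(f"  - {json.dumps(item)}" for item in value)
        else:
            lines.append(f"{key}: {json.dumps(value)}")
    return "\n".join(lines) + "\n"


# Hypothetical language definition; "identifier" stays ahead of "keywords".
text = dump_ordered({"identifier": "python", "keywords": ["if", "true"]})
```

Anything nested or containing unusual keys would still need a real YAML library.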

tools: Add pygments import script

f9e73aa

Currently, we can retrieve the regex patterns for the required tokens of all languages not found in the coAST schema. Closes coala#96

jayvdb added process/wip size/XS difficulty/medium labels Dec 18, 2018

jayvdb assigned iamkroot Dec 18, 2018

tools: Add keywords extraction logic for pygments

e7b6db4

Many edge cases are yet to be covered. For now, the script simply skips over all the languages for which it was unable to parse the patterns properly.
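A rough sketch of what "skip what can't be parsed" might look like, assuming only the simplest pattern shape: a single group of literal alternatives, optionally wrapped in `\b`. The function name and the set of rejected characters are hypothetical and not taken from the script in this PR.

```python
# Characters that signal regex constructs this sketch does not understand.
_UNSUPPORTED = set("()[]{}*+?.^$\\")


def extract_keywords(pattern):
    """Return the literal alternatives of a simple pattern, or None if unsupported."""
    # Drop word-boundary anchors; they do not contribute characters.
    body = pattern.replace(r"\b", "")
    # Strip one enclosing group, either capturing or non-capturing.
    if body.startswith("(?:") and body.endswith(")"):
        body = body[3:-1]
    elif body.startswith("(") and body.endswith(")"):
        body = body[1:-1]
    # Anything still containing a metacharacter is beyond this sketch: skip it.
    if not body or any(ch in _UNSUPPORTED for ch in body):
        return None
    return body.split("|")


# Hypothetical patterns: the first is handled, the second is skipped.
simple = extract_keywords(r"\b(if|else|while)\b")
skipped = extract_keywords(r"[a-z]+")
```

A caller would treat `None` as "skip this language for now", which matches the behaviour described in the commit message.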

jayvdb added size/S and removed size/XS labels Dec 21, 2018

iamkroot added 4 commits December 23, 2018 22:42

tools: pygments: Verify extracted keyword using regex

05173cc

A simple way to check the correctness of a keyword is to ensure that it matches the regex pattern it was extracted from.

tools: Break up pygments import script

896c3e4

pygments: Use words.words instead of words.get

0cba7a9

pygments: Update coast lang defs from lexers

a47c373

jayvdb added size/M and removed size/S labels Jan 16, 2019

iamkroot added 2 commits March 6, 2019 10:53

pygments-parser: Raise exception on failure

9bc0f90

pygments: Overhaul processing logic

0238800

iamkroot added 2 commits March 10, 2019 11:06

pygments: Save definitions to YAML files

2603dc2

pygments: Use sre_yield to generate regexes

6466539
