I think there is an issue with the TeraTerm lexer which causes Pygments to guess it for many code snippets (more than 50% of the random snippets I try).
Here's an example of text that is correctly identified as C in Pygments 2.3.1 but identified as TeraTerm in Pygments 2.4.0 and above (including 2.4.2):
```python
from pygments.lexers import guess_lexer

TEST_C = '''#include <stdio.h>
#include <stdlib.h>

int main(void);

int main(void) {
    uint8_t x = 42;
    uint8_t y = x + 1;

    /* exit 1 for success! */
    return 1;
}
'''

print(guess_lexer(TEST_C))
```
I think it might be because of this special scoring logic:
`pygments/pygments/lexers/teraterm.py`, lines 152 to 158 in c3fdd7b
I'm not quite sure what the fix is here; it seems like `analyse_text` is used to pick between filename matches, but in this case, where no filename is provided, a large number of samples will be guessed as TeraTerm, since most languages use some of the commands it's looking for:

`pygments/pygments/lexers/teraterm.py`, lines 57 to 97 in c3fdd7b
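To illustrate the failure mode I suspect, here is a toy model of score-based guessing. It does not use Pygments or reproduce the real TeraTerm code: the two scorers, their names, the word list, and the 0.1 / 0.6 values are all made up for illustration. It only shows how one scorer that returns a high value for very common words can beat a more specific scorer when there is no filename to narrow the candidates.

```python
import re


def score_c_like(text):
    # Hypothetical scorer: modest confidence when a C include directive appears.
    if re.search(r'^\s*#include\s*<\w+\.h>', text, re.MULTILINE):
        return 0.1
    return 0.0


def score_command_words(text):
    # Hypothetical scorer: high confidence if ANY common command word appears.
    # The word list and the 0.6 value are assumptions, not the real lexer's.
    if re.search(r'\b(?:exit|return|include|int|if|for|while)\b', text):
        return 0.6
    return 0.0


# Hypothetical candidate set; with no filename, every scorer competes.
SCORERS = {"C": score_c_like, "CommandLang": score_command_words}


def guess(text):
    """Return the name of the scorer with the highest score."""
    return max(SCORERS, key=lambda name: SCORERS[name](text))


C_SNIPPET = "#include <stdio.h>\nint main(void) { return 1; }\n"
PY_SNIPPET = "for x in range(3):\n    print(x)\n"

print(guess(C_SNIPPET))   # the generic scorer outranks the C-specific one
print(guess(PY_SNIPPET))  # unrelated code is claimed by the generic scorer too
```

If this is roughly what is happening, one possible direction would be for a lexer that matches on such common words to return a much lower score, so that more specific signals from other lexers can win.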