fix(ts/js) use identifier to match potential keywords #2519

Merged
merged 10 commits into from May 7, 2020
Changes from 6 commits
28 changes: 18 additions & 10 deletions docs/language-guide.rst
@@ -64,17 +64,19 @@ and most interesting parsing happens inside tags.
Keywords
--------

In the simple case language keywords are defined in a string, separated by space:
In the simple case, language keywords can be defined with a space-separated string:

::

{
keywords: 'else for if while'
}

Some languages have different kinds of "keywords" that might not be called as such by the language spec
but are very close to them from the point of view of a syntax highlighter. These are all sorts of "literals", "built-ins", "symbols" and such.
To define such keyword groups the attribute ``keywords`` becomes an object each property of which defines its own group of keywords:
Some languages have different kinds of "keywords" that might not be called as
such by the language spec but are very close to them from the point of view of a
syntax highlighter. These are all sorts of "literals", "built-ins", "symbols"
and such. To define such keyword groups the attribute ``keywords`` becomes an
object each property of which defines its own group of keywords:

::

@@ -85,19 +87,25 @@ To define such keyword groups the attribute ``keywords`` becomes an object each
}
}

The group name becomes then a class name in a generated markup enabling different styling for different kinds of keywords.
The group name becomes the class name in the generated markup, enabling different
theming for different kinds of keywords.

To detect keywords highlight.js breaks the processed chunk of code into separate words — a process called lexing.
The "word" here is defined by the regexp ``[a-zA-Z][a-zA-Z0-9_]*`` that works for keywords in most languages.
Different lexing rules can be defined by the ``lexemes`` attribute:
To detect keywords highlight.js breaks the processed chunk of code into separate
words — a process called lexing. By default "words" are matched with the regexp
``\w+``, and that works well for many languages. Different lexing rules can be
defined by the magic ``$pattern`` attribute:

::

{
lexemes: '-[a-z]+',
keywords: '-import -export'
keywords: {
$pattern: /-[a-z]+/, // allow keywords with dash in them
keyword: '-import -export'
}
}

Note: The older ``lexemes`` setting has been deprecated in favor of using
``keywords.$pattern``. They are functionally identical.
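As a rough illustration of what the pattern controls, the sketch below lexes a
snippet with two different patterns and reports which matches are keywords.
``findKeywords`` is a hypothetical helper written for this example; it is not
part of highlight.js and only approximates the behaviour described above.

```javascript
// Hypothetical helper, not highlight.js API: collect every "word" matched by
// `pattern` that also appears in the space-separated `keywordString`.
function findKeywords(code, pattern, keywordString) {
  const known = new Set(keywordString.split(' '));
  // Copy the pattern with the global flag so exec() can walk the whole input.
  const re = new RegExp(pattern.source, 'g');
  const found = [];
  let match;
  while ((match = re.exec(code)) !== null) {
    // Guard against patterns that can match the empty string.
    if (match[0] === '') { re.lastIndex++; continue; }
    if (known.has(match[0])) found.push(match[0]);
  }
  return found;
}

const code = '-import(foo). -export([bar/1]).';

// With \w+ the dash is never part of a word, so no keyword is found.
const withDefault = findKeywords(code, /\w+/, '-import -export');
// With a dash-aware pattern both directives are recognised.
const withDash = findKeywords(code, /-[a-z]+/, '-import -export');
```

The point of the comparison is that the keyword list alone is not enough: the
pattern decides which substrings are even considered as candidates.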

Sub-modes
---------
17 changes: 11 additions & 6 deletions docs/mode-reference.rst
@@ -241,14 +241,19 @@ and ``endSameAsBegin: true``.

.. _lexemes:

lexemes
^^^^^^^
lexemes (now keywords.$pattern)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**type**: regexp

A regular expression that extracts individual lexemes from language text to find :ref:`keywords <keywords>` among them.
Default value is ``hljs.IDENT_RE`` which works for most languages.
A regular expression that extracts individual "words" from the code to compare
against :ref:`keywords <keywords>`. The default value is ``\w+`` which works for
many languages.

Note: It's now recommended that you use ``keywords.$pattern`` instead of
``lexemes``, as this makes it easier to keep your keyword pattern associated
with your keywords themselves, particularly if your keyword configuration is a
constant that you repeat multiple times within different modes of your grammar.
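The situation this note describes might look like the fragment below. It is
only a sketch: ``IDENT_RE``, ``KEYWORDS``, ``SUBST`` and the ``Example``
grammar are made-up names, loosely modelled on the grammar changes in this diff.

```javascript
// Illustrative grammar fragment; all names here are invented for the example.
const IDENT_RE = /[a-zA-Z_][a-zA-Z0-9_.]*[!?]?/;

// The pattern travels with the keyword list, so every mode that reuses
// KEYWORDS picks up the same lexing rule without a separate `lexemes` entry.
const KEYWORDS = {
  $pattern: IDENT_RE,
  keyword: 'and do end fn when',
  literal: 'true false nil'
};

// A sub-mode that shares the keyword configuration.
const SUBST = {
  className: 'subst',
  begin: /#\{/, end: /\}/,
  keywords: KEYWORDS
};

// The top-level mode shares it too.
const grammar = {
  name: 'Example',
  keywords: KEYWORDS,
  contains: [SUBST]
};
```

With the older style, each of the two modes would have needed its own
``lexemes`` entry, and forgetting one would silently change how keywords are
matched in that mode.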

.. _keywords:

@@ -259,8 +264,8 @@ keywords

Keyword definition comes in two forms:

* ``'for while if else weird_voodoo|10 ... '`` -- a string of space-separated keywords with an optional relevance over a pipe
* ``{'keyword': ' ... ', 'literal': ' ... '}`` -- an object whose keys are names of different kinds of keywords and values are keyword definition strings in the first form
* ``'for while if|0 else weird_voodoo|10 ... '`` -- a string of space-separated keywords with an optional relevance over a pipe
* ``{keyword: ' ... ', literal: ' ... ', $pattern: /\w+/ }`` -- an object that describes multiple sets of keywords and the pattern used to find them

For detailed explanation see :doc:`Language definition guide </language-guide>`.
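To make the string form concrete, here is a small sketch that splits such a
definition into words and relevance values. ``parseKeywordString`` is a
hypothetical function, and the default relevance of 1 for entries without a
``|n`` suffix is an assumption of this example rather than something stated in
this document.

```javascript
// Hypothetical parser for the string form; highlight.js's own compiler may
// differ in details such as the default relevance (assumed to be 1 here).
function parseKeywordString(definition, defaultRelevance = 1) {
  const result = {};
  for (const entry of definition.trim().split(/\s+/)) {
    if (!entry) continue; // skip the empty entry produced by an empty string
    const [word, relevance] = entry.split('|');
    result[word] = relevance === undefined ? defaultRelevance : Number(relevance);
  }
  return result;
}

const parsed = parseKeywordString('for while if|0 else weird_voodoo|10');
// parsed.if is 0, parsed.weird_voodoo is 10, the rest use the default.
```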

8 changes: 4 additions & 4 deletions src/highlight.js
@@ -131,8 +131,8 @@ const HLJS = function(hljs) {
}

let last_index = 0;
top.lexemesRe.lastIndex = 0;
let match = top.lexemesRe.exec(mode_buffer);
top.keywordPatternRe.lastIndex = 0;
let match = top.keywordPatternRe.exec(mode_buffer);
let buf = "";

while (match) {
@@ -148,8 +148,8 @@ const HLJS = function(hljs) {
} else {
buf += match[0];
}
last_index = top.lexemesRe.lastIndex;
match = top.lexemesRe.exec(mode_buffer);
last_index = top.keywordPatternRe.lastIndex;
match = top.keywordPatternRe.exec(mode_buffer);
}
buf += mode_buffer.substr(last_index);
emitter.addText(buf);
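
The two hunks above show only fragments of the scanning loop. A standalone
sketch of the same idea follows; it is not the code from ``src/highlight.js``.
``scanKeywords``, the ``isKeyword`` callback and the returned token list are
stand-ins invented here for the internal keyword lookup and emitter.

```javascript
// Standalone sketch of the scanning loop. `keywordPatternRe` must carry the
// global flag, otherwise exec() would keep returning the first match.
function scanKeywords(buffer, keywordPatternRe, isKeyword) {
  const tokens = [];
  let lastIndex = 0;
  let buf = '';
  keywordPatternRe.lastIndex = 0;
  let match = keywordPatternRe.exec(buffer);

  while (match) {
    // Plain text between the previous match and this one.
    buf += buffer.substring(lastIndex, match.index);
    if (isKeyword(match[0])) {
      // Flush pending text, then emit the keyword on its own.
      if (buf) tokens.push({ text: buf });
      buf = '';
      tokens.push({ keyword: match[0] });
    } else {
      buf += match[0];
    }
    lastIndex = keywordPatternRe.lastIndex;
    match = keywordPatternRe.exec(buffer);
  }
  // Whatever follows the last match is plain text as well.
  buf += buffer.substring(lastIndex);
  if (buf) tokens.push({ text: buf });
  return tokens;
}

const tokens = scanKeywords(
  'if (x) return y;',
  /\w+/g,
  (word) => word === 'if' || word === 'return'
);
```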
14 changes: 8 additions & 6 deletions src/languages/1c.js
@@ -5,7 +5,7 @@ Description: built-in language 1C:Enterprise (v7, v8)
Category: enterprise
*/

export default function(hljs){
export default function(hljs) {

// общий паттерн для определения идентификаторов
var UNDERSCORE_IDENT_RE = '[A-Za-zА-Яа-яёЁ_][A-Za-zА-Яа-яёЁ_0-9]+';
@@ -446,9 +446,12 @@ export default function(hljs){
// meta : инструкции препроцессора, директивы компиляции
var META = {
className: 'meta',
lexemes: UNDERSCORE_IDENT_RE,

begin: '#|&', end: '$',
keywords: {'meta-keyword': KEYWORD + METAKEYWORD},
keywords: {
$pattern: UNDERSCORE_IDENT_RE,
'meta-keyword': KEYWORD + METAKEYWORD
},
contains: [
COMMENTS
]
@@ -463,7 +466,6 @@ export default function(hljs){
// function : объявление процедур и функций
var FUNCTION = {
className: 'function',
lexemes: UNDERSCORE_IDENT_RE,
variants: [
{begin: 'процедура|функция', end: '\\)', keywords: 'процедура функция'},
{begin: 'конецпроцедуры|конецфункции', keywords: 'конецпроцедуры конецфункции'}
@@ -474,9 +476,9 @@ export default function(hljs){
contains: [
{
className: 'params',
lexemes: UNDERSCORE_IDENT_RE,
begin: UNDERSCORE_IDENT_RE, end: ',', excludeEnd: true, endsWithParent: true,
keywords: {
$pattern: UNDERSCORE_IDENT_RE,
keyword: 'знач',
literal: LITERAL
},
@@ -496,8 +498,8 @@ export default function(hljs){
return {
name: '1C:Enterprise',
case_insensitive: true,
lexemes: UNDERSCORE_IDENT_RE,
keywords: {
$pattern: UNDERSCORE_IDENT_RE,
keyword: KEYWORD,
built_in: BUILTIN,
class: CLASS,
2 changes: 1 addition & 1 deletion src/languages/armasm.js
@@ -21,8 +21,8 @@ export default function(hljs) {
name: 'ARM Assembly',
case_insensitive: true,
aliases: ['arm'],
lexemes: '\\.?' + hljs.IDENT_RE,
keywords: {
$pattern: '\\.?' + hljs.IDENT_RE,
meta:
//GNU preprocs
'.2byte .4byte .align .ascii .asciz .balign .byte .code .data .else .end .endif .endm .endr .equ .err .exitm .extern .global .hword .if .ifdef .ifndef .include .irp .long .macro .rept .req .section .set .skip .space .text .word .arm .thumb .code16 .code32 .force_thumb .thumb_func .ltorg '+
2 changes: 1 addition & 1 deletion src/languages/avrasm.js
@@ -9,8 +9,8 @@ export default function(hljs) {
return {
name: 'AVR Assembly',
case_insensitive: true,
lexemes: '\\.?' + hljs.IDENT_RE,
keywords: {
$pattern: '\\.?' + hljs.IDENT_RE,
keyword:
/* mnemonic */
'adc add adiw and andi asr bclr bld brbc brbs brcc brcs break breq brge brhc brhs ' +
2 changes: 1 addition & 1 deletion src/languages/bash.js
@@ -81,8 +81,8 @@ export default function(hljs) {
return {
name: 'Bash',
aliases: ['sh', 'zsh'],
lexemes: /\b-?[a-z\._]+\b/,
keywords: {
$pattern: /\b-?[a-z\._]+\b/,
keyword:
'if then else elif fi for while in do done case esac function',
literal:
2 changes: 1 addition & 1 deletion src/languages/basic.js
@@ -11,8 +11,8 @@ export default function(hljs) {
case_insensitive: true,
illegal: '^\.',
// Support explicitly typed variables that end with $%! or #.
lexemes: '[a-zA-Z][a-zA-Z0-9_\$\%\!\#]*',
keywords: {
$pattern: '[a-zA-Z][a-zA-Z0-9_\$\%\!\#]*',
keyword:
'ABS ASC AND ATN AUTO|0 BEEP BLOAD|10 BSAVE|10 CALL CALLS CDBL CHAIN CHDIR CHR$|10 CINT CIRCLE ' +
'CLEAR CLOSE CLS COLOR COM COMMON CONT COS CSNG CSRLIN CVD CVI CVS DATA DATE$ ' +
6 changes: 3 additions & 3 deletions src/languages/clojure.js
@@ -7,8 +7,11 @@ Category: lisp
*/

export default function(hljs) {
var SYMBOLSTART = 'a-zA-Z_\\-!.?+*=<>&#\'';
var SYMBOL_RE = '[' + SYMBOLSTART + '][' + SYMBOLSTART + '0-9/;:]*';
var globals = 'def defonce defprotocol defstruct defmulti defmethod defn- defn defmacro deftype defrecord';
var keywords = {
$pattern: SYMBOL_RE,
'builtin-name':
// Clojure keywords
globals + ' ' +
@@ -41,8 +44,6 @@ export default function(hljs) {
'lazy-seq spread list* str find-keyword keyword symbol gensym force rationalize'
};

var SYMBOLSTART = 'a-zA-Z_\\-!.?+*=<>&#\'';
var SYMBOL_RE = '[' + SYMBOLSTART + '][' + SYMBOLSTART + '0-9/;:]*';
var SIMPLE_NUMBER_RE = '[-+]?\\d+(\\.\\d+)?';

var SYMBOL = {
@@ -86,7 +87,6 @@ export default function(hljs) {
};
var NAME = {
keywords: keywords,
lexemes: SYMBOL_RE,
className: 'name', begin: SYMBOL_RE,
starts: BODY
};
2 changes: 1 addition & 1 deletion src/languages/crystal.js
@@ -11,6 +11,7 @@ export default function(hljs) {
var CRYSTAL_METHOD_RE = '[a-zA-Z_]\\w*[!?=]?|[-+~]\\@|<<|>>|[=!]~|===?|<=>|[<>]=?|\\*\\*|[-/+%^&*~|]|//|//=|&[-+*]=?|&\\*\\*|\\[\\][=?]?';
var CRYSTAL_PATH_RE = '[A-Za-z_]\\w*(::\\w+)*(\\?|\\!)?';
var CRYSTAL_KEYWORDS = {
$pattern: CRYSTAL_IDENT_RE,
keyword:
'abstract alias annotation as as? asm begin break case class def do else elsif end ensure enum extend for fun if ' +
'include instance_sizeof is_a? lib macro module next nil? of out pointerof private protected rescue responds_to? ' +
@@ -187,7 +188,6 @@ export default function(hljs) {
return {
name: 'Crystal',
aliases: ['cr'],
lexemes: CRYSTAL_IDENT_RE,
keywords: CRYSTAL_KEYWORDS,
contains: CRYSTAL_DEFAULT_CONTAINS
};
2 changes: 1 addition & 1 deletion src/languages/csp.js
@@ -11,8 +11,8 @@ export default function(hljs) {
return {
name: 'CSP',
case_insensitive: false,
lexemes: '[a-zA-Z][a-zA-Z0-9_-]*',
keywords: {
$pattern: '[a-zA-Z][a-zA-Z0-9_-]*',
keyword: 'base-uri child-src connect-src default-src font-src form-action ' +
'frame-ancestors frame-src img-src media-src object-src plugin-types ' +
'report-uri sandbox script-src style-src',
2 changes: 1 addition & 1 deletion src/languages/d.js
@@ -30,6 +30,7 @@ export default function(hljs) {
* @type {Object}
*/
var D_KEYWORDS = {
$pattern: hljs.UNDERSCORE_IDENT_RE,
keyword:
'abstract alias align asm assert auto body break byte case cast catch class ' +
'const continue debug default delete deprecated do else enum export extern final ' +
@@ -245,7 +246,6 @@ export default function(hljs) {

return {
name: 'D',
lexemes: hljs.UNDERSCORE_IDENT_RE,
keywords: D_KEYWORDS,
contains: [
hljs.C_LINE_COMMENT_MODE,
10 changes: 5 additions & 5 deletions src/languages/elixir.js
@@ -9,14 +9,15 @@ Website: https://elixir-lang.org
export default function(hljs) {
var ELIXIR_IDENT_RE = '[a-zA-Z_][a-zA-Z0-9_.]*(\\!|\\?)?';
var ELIXIR_METHOD_RE = '[a-zA-Z_]\\w*[!?=]?|[-+~]\\@|<<|>>|=~|===?|<=>|[<>]=?|\\*\\*|[-/+%^&*~`|]|\\[\\]=?';
var ELIXIR_KEYWORDS =
'and false then defined module in return redo retry end for true self when ' +
var ELIXIR_KEYWORDS = {
$pattern: ELIXIR_IDENT_RE,
keyword: 'and false then defined module in return redo retry end for true self when ' +
'next until do begin unless nil break not case cond alias while ensure or ' +
'include use alias fn quote require import with|0';
'include use alias fn quote require import with|0'
};
var SUBST = {
className: 'subst',
begin: '#\\{', end: '}',
lexemes: ELIXIR_IDENT_RE,
keywords: ELIXIR_KEYWORDS
};
var NUMBER = {
@@ -174,7 +175,6 @@ export default function(hljs) {

return {
name: 'Elixir',
lexemes: ELIXIR_IDENT_RE,
keywords: ELIXIR_KEYWORDS,
contains: ELIXIR_DEFAULT_CONTAINS
};
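
The mechanical change applied to grammars such as this one (a ``lexemes``
value moved into ``keywords.$pattern``, with string keywords wrapped in an
object) can be sketched as a small transformation. ``migrateLexemes`` is a
hypothetical helper written for illustration; it is not part of this pull
request or of highlight.js, and rejecting a mode that sets both values is an
assumption of the sketch.

```javascript
// Hypothetical migration helper: moves a mode's legacy `lexemes` value into
// `keywords.$pattern` and returns a new mode object (the input is not mutated).
function migrateLexemes(mode) {
  // Nothing to migrate, or no keyword set to attach the pattern to.
  if (mode.lexemes === undefined || mode.keywords === undefined) return mode;

  const { lexemes, keywords, ...rest } = mode;
  if (typeof keywords === 'string') {
    // A plain string becomes the `keyword` group of a keywords object.
    return { ...rest, keywords: { $pattern: lexemes, keyword: keywords } };
  }
  if (keywords.$pattern !== undefined) {
    throw new Error('mode defines both lexemes and keywords.$pattern');
  }
  return { ...rest, keywords: { $pattern: lexemes, ...keywords } };
}

const before = { relevance: 0, lexemes: '-[a-z]+', keywords: '-import -export' };
const after = migrateLexemes(before);
// after.keywords is { $pattern: '-[a-z]+', keyword: '-import -export' }
```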
9 changes: 5 additions & 4 deletions src/languages/erlang.js
@@ -136,11 +136,12 @@ export default function(hljs) {
relevance: 0,
excludeEnd: true,
returnBegin: true,
lexemes: '-' + hljs.IDENT_RE,
keywords:
'-module -record -undef -export -ifdef -ifndef -author -copyright -doc -vsn ' +
keywords: {
$pattern: '-' + hljs.IDENT_RE,
keyword: '-module -record -undef -export -ifdef -ifndef -author -copyright -doc -vsn ' +
'-import -include -include_lib -compile -define -else -endif -file -behaviour ' +
'-behavior -spec',
'-behavior -spec'
},
contains: [PARAMS]
},
NUMBER,
2 changes: 1 addition & 1 deletion src/languages/excel.js
@@ -10,9 +10,9 @@ export default function(hljs) {
name: 'Excel formulae',
aliases: ['xlsx', 'xls'],
case_insensitive: true,
lexemes: /[a-zA-Z][\w\.]*/,
// built-in functions imported from https://web.archive.org/web/20160513042710/https://support.office.com/en-us/article/Excel-functions-alphabetical-b3944572-255d-4efb-bb96-c6d90033e188
keywords: {
$pattern: /[a-zA-Z][\w\.]*/,
built_in: 'ABS ACCRINT ACCRINTM ACOS ACOSH ACOT ACOTH AGGREGATE ADDRESS AMORDEGRC AMORLINC AND ARABIC AREAS ASC ASIN ASINH ATAN ATAN2 ATANH AVEDEV AVERAGE AVERAGEA AVERAGEIF AVERAGEIFS BAHTTEXT BASE BESSELI BESSELJ BESSELK BESSELY BETADIST BETA.DIST BETAINV BETA.INV BIN2DEC BIN2HEX BIN2OCT BINOMDIST BINOM.DIST BINOM.DIST.RANGE BINOM.INV BITAND BITLSHIFT BITOR BITRSHIFT BITXOR CALL CEILING CEILING.MATH CEILING.PRECISE CELL CHAR CHIDIST CHIINV CHITEST CHISQ.DIST CHISQ.DIST.RT CHISQ.INV CHISQ.INV.RT CHISQ.TEST CHOOSE CLEAN CODE COLUMN COLUMNS COMBIN COMBINA COMPLEX CONCAT CONCATENATE CONFIDENCE CONFIDENCE.NORM CONFIDENCE.T CONVERT CORREL COS COSH COT COTH COUNT COUNTA COUNTBLANK COUNTIF COUNTIFS COUPDAYBS COUPDAYS COUPDAYSNC COUPNCD COUPNUM COUPPCD COVAR COVARIANCE.P COVARIANCE.S CRITBINOM CSC CSCH CUBEKPIMEMBER CUBEMEMBER CUBEMEMBERPROPERTY CUBERANKEDMEMBER CUBESET CUBESETCOUNT CUBEVALUE CUMIPMT CUMPRINC DATE DATEDIF DATEVALUE DAVERAGE DAY DAYS DAYS360 DB DBCS DCOUNT DCOUNTA DDB DEC2BIN DEC2HEX DEC2OCT DECIMAL DEGREES DELTA DEVSQ DGET DISC DMAX DMIN DOLLAR DOLLARDE DOLLARFR DPRODUCT DSTDEV DSTDEVP DSUM DURATION DVAR DVARP EDATE EFFECT ENCODEURL EOMONTH ERF ERF.PRECISE ERFC ERFC.PRECISE ERROR.TYPE EUROCONVERT EVEN EXACT EXP EXPON.DIST EXPONDIST FACT FACTDOUBLE FALSE|0 F.DIST FDIST F.DIST.RT FILTERXML FIND FINDB F.INV F.INV.RT FINV FISHER FISHERINV FIXED FLOOR FLOOR.MATH FLOOR.PRECISE FORECAST FORECAST.ETS FORECAST.ETS.CONFINT FORECAST.ETS.SEASONALITY FORECAST.ETS.STAT FORECAST.LINEAR FORMULATEXT FREQUENCY F.TEST FTEST FV FVSCHEDULE GAMMA GAMMA.DIST GAMMADIST GAMMA.INV GAMMAINV GAMMALN GAMMALN.PRECISE GAUSS GCD GEOMEAN GESTEP GETPIVOTDATA GROWTH HARMEAN HEX2BIN HEX2DEC HEX2OCT HLOOKUP HOUR HYPERLINK HYPGEOM.DIST HYPGEOMDIST IF IFERROR IFNA IFS IMABS IMAGINARY IMARGUMENT IMCONJUGATE IMCOS IMCOSH IMCOT IMCSC IMCSCH IMDIV IMEXP IMLN IMLOG10 IMLOG2 IMPOWER IMPRODUCT IMREAL IMSEC IMSECH IMSIN IMSINH IMSQRT IMSUB IMSUM IMTAN INDEX INDIRECT INFO INT INTERCEPT INTRATE IPMT IRR 
ISBLANK ISERR ISERROR ISEVEN ISFORMULA ISLOGICAL ISNA ISNONTEXT ISNUMBER ISODD ISREF ISTEXT ISO.CEILING ISOWEEKNUM ISPMT JIS KURT LARGE LCM LEFT LEFTB LEN LENB LINEST LN LOG LOG10 LOGEST LOGINV LOGNORM.DIST LOGNORMDIST LOGNORM.INV LOOKUP LOWER MATCH MAX MAXA MAXIFS MDETERM MDURATION MEDIAN MID MIDBs MIN MINIFS MINA MINUTE MINVERSE MIRR MMULT MOD MODE MODE.MULT MODE.SNGL MONTH MROUND MULTINOMIAL MUNIT N NA NEGBINOM.DIST NEGBINOMDIST NETWORKDAYS NETWORKDAYS.INTL NOMINAL NORM.DIST NORMDIST NORMINV NORM.INV NORM.S.DIST NORMSDIST NORM.S.INV NORMSINV NOT NOW NPER NPV NUMBERVALUE OCT2BIN OCT2DEC OCT2HEX ODD ODDFPRICE ODDFYIELD ODDLPRICE ODDLYIELD OFFSET OR PDURATION PEARSON PERCENTILE.EXC PERCENTILE.INC PERCENTILE PERCENTRANK.EXC PERCENTRANK.INC PERCENTRANK PERMUT PERMUTATIONA PHI PHONETIC PI PMT POISSON.DIST POISSON POWER PPMT PRICE PRICEDISC PRICEMAT PROB PRODUCT PROPER PV QUARTILE QUARTILE.EXC QUARTILE.INC QUOTIENT RADIANS RAND RANDBETWEEN RANK.AVG RANK.EQ RANK RATE RECEIVED REGISTER.ID REPLACE REPLACEB REPT RIGHT RIGHTB ROMAN ROUND ROUNDDOWN ROUNDUP ROW ROWS RRI RSQ RTD SEARCH SEARCHB SEC SECH SECOND SERIESSUM SHEET SHEETS SIGN SIN SINH SKEW SKEW.P SLN SLOPE SMALL SQL.REQUEST SQRT SQRTPI STANDARDIZE STDEV STDEV.P STDEV.S STDEVA STDEVP STDEVPA STEYX SUBSTITUTE SUBTOTAL SUM SUMIF SUMIFS SUMPRODUCT SUMSQ SUMX2MY2 SUMX2PY2 SUMXMY2 SWITCH SYD T TAN TANH TBILLEQ TBILLPRICE TBILLYIELD T.DIST T.DIST.2T T.DIST.RT TDIST TEXT TEXTJOIN TIME TIMEVALUE T.INV T.INV.2T TINV TODAY TRANSPOSE TREND TRIM TRIMMEAN TRUE|0 TRUNC T.TEST TTEST TYPE UNICHAR UNICODE UPPER VALUE VAR VAR.P VAR.S VARA VARP VARPA VDB VLOOKUP WEBSERVICE WEEKDAY WEEKNUM WEIBULL WEIBULL.DIST WORKDAY WORKDAY.INTL XIRR XNPV XOR YEAR YEARFRAC YIELD YIELDDISC YIELDMAT Z.TEST ZTEST'
},
contains: [
9 changes: 5 additions & 4 deletions src/languages/gcode.js
@@ -8,9 +8,11 @@
export default function(hljs) {
var GCODE_IDENT_RE = '[A-Z_][A-Z0-9_.]*';
var GCODE_CLOSE_RE = '\\%';
var GCODE_KEYWORDS =
'IF DO WHILE ENDWHILE CALL ENDIF SUB ENDSUB GOTO REPEAT ENDREPEAT ' +
'EQ LT GT NE GE LE OR XOR';
var GCODE_KEYWORDS = {
$pattern: GCODE_IDENT_RE,
keyword: 'IF DO WHILE ENDWHILE CALL ENDIF SUB ENDSUB GOTO REPEAT ENDREPEAT ' +
'EQ LT GT NE GE LE OR XOR'
};
var GCODE_START = {
className: 'meta',
begin: '([O])([0-9]+)'
@@ -61,7 +63,6 @@ export default function(hljs) {
// Some implementations (CNC controls) of G-code are interoperable with uppercase and lowercase letters seamlessly.
// However, most prefer all uppercase and uppercase is customary.
case_insensitive: true,
lexemes: GCODE_IDENT_RE,
keywords: GCODE_KEYWORDS,
contains: [
{
Expand Down