add docs
joshgoebel committed May 2, 2020
1 parent 582968b commit 503bf51
Showing 6 changed files with 48 additions and 27 deletions.
28 changes: 18 additions & 10 deletions docs/language-guide.rst
@@ -64,17 +64,19 @@ and most interesting parsing happens inside tags.
Keywords
--------

-In the simple case language keywords are defined in a string, separated by space:
+In the simple case, language keywords can be defined as a single space-separated string:

::

{
keywords: 'else for if while'
}

-Some languages have different kinds of "keywords" that might not be called as such by the language spec
-but are very close to them from the point of view of a syntax highlighter. These are all sorts of "literals", "built-ins", "symbols" and such.
-To define such keyword groups the attribute ``keywords`` becomes an object each property of which defines its own group of keywords:
+Some languages have different kinds of "keywords" that might not be called as
+such by the language spec but are very close to them from the point of view of a
+syntax highlighter. These are all sorts of "literals", "built-ins", "symbols"
+and such. To define such keyword groups, the ``keywords`` attribute becomes an
+object in which each property defines its own group of keywords:

::

@@ -85,19 +87,25 @@ To define such keyword groups the attribute ``keywords`` becomes an object each
}
}

-The group name becomes then a class name in a generated markup enabling different styling for different kinds of keywords.
+The group name becomes the class name in the generated markup, enabling different
+theming for different kinds of keywords.

-To detect keywords highlight.js breaks the processed chunk of code into separate words, a process called lexing.
-The "word" here is defined by the regexp ``[a-zA-Z][a-zA-Z0-9_]*`` that works for keywords in most languages.
-Different lexing rules can be defined by the ``lexemes`` attribute:
+To detect keywords, highlight.js breaks the processed chunk of code into separate
+words, a process called lexing. By default "words" are matched with the regexp
+``\w+``, which works well for many languages. Different lexing rules can be
+defined with the special ``$pattern`` key inside ``keywords``:

::

{
-lexemes: '-[a-z]+',
-keywords: '-import -export'
+keywords: {
+$pattern: /-[a-z]+/, // allow keywords that contain a dash
+keyword: '-import -export'
+}
}

+Note: The older ``mode.lexemes`` setting has been deprecated in favor of using
+``keywords.$pattern``. They are functionally identical.

Sub-modes
---------
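Why `$pattern` matters for the `-import`/`-export` example above can be seen by comparing which "words" each pattern pulls out of the same line. This is a minimal sketch; `extractWords` is a hypothetical helper, not part of highlight.js, that simply lists every match of a pattern:

```javascript
// Hypothetical helper (not a highlight.js API): list the "words" a keyword pattern extracts.
function extractWords(pattern, code) {
  // matchAll() requires a global regexp, so add the "g" flag if it is missing.
  const flags = pattern.flags.includes("g") ? pattern.flags : pattern.flags + "g";
  const re = new RegExp(pattern.source, flags);
  return Array.from(code.matchAll(re), (m) => m[0]);
}

const code = "-import foo -export bar";
console.log(extractWords(/\w+/, code));     // ['import', 'foo', 'export', 'bar']
console.log(extractWords(/-[a-z]+/, code)); // ['-import', '-export']
```

With the default `\w+`, the leading dash is never part of a word, so a keyword written as `-import` could never be matched; the custom pattern keeps the dash.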
17 changes: 11 additions & 6 deletions docs/mode-reference.rst
@@ -241,14 +241,19 @@ and ``endSameAsBegin: true``.

.. _lexemes:

-lexemes
-^^^^^^^
+lexemes (now keywords.$pattern)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**type**: regexp

-A regular expression that extracts individual lexemes from language text to find :ref:`keywords <keywords>` among them.
-Default value is ``hljs.IDENT_RE`` which works for most languages.
+A regular expression that extracts individual "words" from the code to compare
+against :ref:`keywords <keywords>`. The default value is ``\w+``, which works for
+many languages.

+Note: It's now recommended that you use ``keywords.$pattern`` instead of
+``lexemes``, as this makes it easier to keep your keyword pattern associated
+with your keywords themselves, particularly if your keyword configuration is a
+constant that you reuse in several different modes of your grammar.

.. _keywords:

@@ -259,8 +264,8 @@ keywords

Keyword definition comes in two forms:

-* ``'for while if else weird_voodoo|10 ... '`` -- a string of space-separated keywords with an optional relevance over a pipe
-* ``{'keyword': ' ... ', 'literal': ' ... '}`` -- an object whose keys are names of different kinds of keywords and values are keyword definition strings in the first form
+* ``'for while if|0 else weird_voodoo|10 ... '`` -- a string of space-separated keywords, each with an optional relevance after a pipe
+* ``{keyword: ' ... ', literal: ' ... ', $pattern: /\w+/ }`` -- an object that describes multiple sets of keywords and the pattern used to find them

For detailed explanation see :doc:`Language definition guide </language-guide>`.

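The two keyword forms listed above can be illustrated with a small parser. This is a hypothetical sketch: `parseKeywords` and `parseKeywordString` are illustrative names, not highlight.js exports. It maps each keyword to a `[className, relevance]` pair, treats a plain string as the `keyword` group, and skips the `$pattern` key. It assumes a default relevance of 1; highlight.js's own `compileKeywords` may score some common words differently.

```javascript
// Hypothetical sketch: compile keyword definitions into a lookup table.
// Assumption: a keyword without "|n" gets relevance 1.
function parseKeywordString(className, str, caseInsensitive) {
  const table = {};
  for (const entry of str.split(" ")) {
    if (!entry) continue; // ignore empty pieces produced by repeated spaces
    const [word, relevance] = entry.split("|");
    const key = caseInsensitive ? word.toLowerCase() : word;
    table[key] = [className, relevance === undefined ? 1 : Number(relevance)];
  }
  return table;
}

function parseKeywords(keywords, caseInsensitive = false) {
  // Form 1: a plain string is treated as the "keyword" group.
  if (typeof keywords === "string") {
    return parseKeywordString("keyword", keywords, caseInsensitive);
  }
  // Form 2: an object of groups; $pattern is the lexing regexp, not a group.
  const table = {};
  for (const [className, str] of Object.entries(keywords)) {
    if (className === "$pattern") continue;
    Object.assign(table, parseKeywordString(className, str, caseInsensitive));
  }
  return table;
}

console.log(parseKeywords("for while if|0 else weird_voodoo|10"));
console.log(parseKeywords({ keyword: "-import -export", literal: "true false", $pattern: /-[a-z]+/ }));
```

In the first call `if` ends up with relevance 0 and `weird_voodoo` with 10; in the second, `$pattern` never appears in the table because it only tells the lexer how to find words.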
8 changes: 4 additions & 4 deletions src/highlight.js
@@ -131,8 +131,8 @@ const HLJS = function(hljs) {
}

let last_index = 0;
-top.lexemesRe.lastIndex = 0;
-let match = top.lexemesRe.exec(mode_buffer);
+top.keywordPatternRe.lastIndex = 0;
+let match = top.keywordPatternRe.exec(mode_buffer);
let buf = "";

while (match) {
@@ -148,8 +148,8 @@ const HLJS = function(hljs) {
} else {
buf += match[0];
}
-last_index = top.lexemesRe.lastIndex;
-match = top.lexemesRe.exec(mode_buffer);
+last_index = top.keywordPatternRe.lastIndex;
+match = top.keywordPatternRe.exec(mode_buffer);
}
buf += mode_buffer.substr(last_index);
emitter.addText(buf);
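The renamed `keywordPatternRe` drives the loop above: each match is looked up as a keyword, and everything else is collected as plain text. Below is a simplified, hedged sketch of that loop. The name `tokenize` and the token objects are assumptions, since the real code feeds an emitter rather than returning an array. Note that `exec` only advances when the regexp has the `g` flag; without it, `lastIndex` stays at 0 and the loop never ends.

```javascript
// Hypothetical sketch of the keyword loop: split a buffer into text and keyword tokens.
// keywordPatternRe must have the "g" flag, otherwise exec() never advances.
function tokenize(buffer, keywordPatternRe, keywords) {
  const tokens = [];
  let buf = "";      // plain text collected since the last keyword
  let lastIndex = 0; // where the previous match ended
  keywordPatternRe.lastIndex = 0; // start from the beginning even if the regexp was used before
  let match = keywordPatternRe.exec(buffer);
  while (match) {
    buf += buffer.substring(lastIndex, match.index); // text between matches
    // Own-property check so words like "toString" are not found on Object.prototype.
    if (Object.prototype.hasOwnProperty.call(keywords, match[0])) {
      if (buf) tokens.push({ text: buf });
      buf = "";
      tokens.push({ keyword: match[0], className: keywords[match[0]][0] });
    } else {
      buf += match[0]; // a "word" that is not a keyword stays plain text
    }
    lastIndex = keywordPatternRe.lastIndex;
    match = keywordPatternRe.exec(buffer);
  }
  buf += buffer.substring(lastIndex); // trailing text after the last match
  if (buf) tokens.push({ text: buf });
  return tokens;
}

const keywords = { "-import": ["keyword", 1], "-export": ["keyword", 1] };
console.log(tokenize("-import foo -export", /-[a-z]+/g, keywords));
```

For `"-import foo -export"` this yields a keyword token, the text ` foo `, and a second keyword token.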
2 changes: 1 addition & 1 deletion src/languages/javascript.js
@@ -18,7 +18,7 @@ export default function(hljs) {
end: /\/[A-Za-z0-9\\._:-]+>|\/>/
};
var KEYWORDS = {
-$lexemes: ECMAScript.IDENT_RE,
+$pattern: ECMAScript.IDENT_RE,
keyword: ECMAScript.KEYWORDS.join(" "),
literal: ECMAScript.LITERALS.join(" "),
built_in: ECMAScript.BUILT_INS.join(" ")
2 changes: 1 addition & 1 deletion src/languages/typescript.js
@@ -34,7 +34,7 @@ export default function(hljs) {
"abstract"
];
var KEYWORDS = {
-$lexemes: ECMAScript.IDENT_RE,
+$pattern: ECMAScript.IDENT_RE,
keyword: ECMAScript.KEYWORDS.concat(TS_SPECIFIC_KEYWORDS).join(" "),
literal: ECMAScript.LITERALS.join(" "),
built_in: ECMAScript.BUILT_INS.concat(TYPES).join(" ")
18 changes: 13 additions & 5 deletions src/lib/mode_compiler.js
@@ -206,18 +206,26 @@ export function compileLanguage(language) {
// __beforeBegin is considered private API, internal use only
mode.__beforeBegin = null;

-let kw_lexemes = null;
+mode.keywords = mode.keywords || mode.beginKeywords;
+
+let kw_pattern = null;
if (typeof mode.keywords === "object") {
-kw_lexemes = mode.keywords.$lexemes;
-delete mode.keywords.$lexemes;
+kw_pattern = mode.keywords.$pattern;
+delete mode.keywords.$pattern;
}

-mode.keywords = mode.keywords || mode.beginKeywords;
if (mode.keywords) {
mode.keywords = compileKeywords(mode.keywords, language.case_insensitive);
}

-mode.lexemesRe = langRe(mode.lexemes || kw_lexemes || /\w+/, true);
+// both are not allowed
+if (mode.lexemes && kw_pattern) {
+throw new Error("ERR: Prefer `keywords.$pattern` to `mode.lexemes`, BOTH are not allowed. (see mode reference) ");
+}
+
+// `mode.lexemes` was the old standard before we added and now recommend
+// using `keywords.$pattern` to pass the keyword pattern
+mode.keywordPatternRe = langRe(mode.lexemes || kw_pattern || /\w+/, true);

if (parent) {
if (mode.beginKeywords) {
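The resolution rule added to `compileLanguage` reads as: use `mode.lexemes`, else `keywords.$pattern`, else `/\w+/`, and reject a mode that sets both. Below is a hedged sketch of just that rule. `resolveKeywordPattern` is a hypothetical name, and the flags are simplified to `g` (the real `langRe` helper may add others). As a design choice of this sketch only, it copies the keywords object instead of deleting `$pattern` from it, so a keywords constant shared by several modes keeps its pattern.

```javascript
// Hypothetical sketch of how the keyword pattern is chosen for a mode.
function resolveKeywordPattern(mode) {
  let keywords = mode.keywords || mode.beginKeywords;
  let kwPattern = null;
  if (keywords && typeof keywords === "object") {
    // Copy the groups without $pattern instead of deleting it from a possibly shared object.
    const { $pattern, ...groups } = keywords;
    kwPattern = $pattern || null;
    keywords = groups;
  }
  // Setting both the deprecated mode.lexemes and keywords.$pattern is an error, as in the commit.
  if (mode.lexemes && kwPattern) {
    throw new Error("Prefer `keywords.$pattern` to `mode.lexemes`; both are not allowed.");
  }
  // Priority: mode.lexemes, then keywords.$pattern, then the /\w+/ default.
  const source = mode.lexemes || kwPattern || /\w+/;
  const patternSource = source instanceof RegExp ? source.source : String(source);
  // Global flag so exec() can walk through a buffer (flags simplified for this sketch).
  return { keywords, keywordPatternRe: new RegExp(patternSource, "g") };
}

const shared = { $pattern: /-[a-z]+/, keyword: "-import -export" };
const resolved = resolveKeywordPattern({ keywords: shared });
console.log(resolved.keywordPatternRe, Object.keys(resolved.keywords));
```

Calling it again with the same `shared` object still finds `$pattern`, because the sketch never mutates its input.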
