Problem with `tokenize` or regexs #635

rrthomas · 2024-04-22T12:28:38Z

I ran into this issue when the functx:lines function didn't seem to work for me, giving me an extra blank line after each line.

But I can find a simple reproducer using an example from: https://www.altova.com/xpath-xquery-reference/fn-tokenize which says:

For example:
fn:tokenize("abracadabra", "(ab)|(a)") returns ("", "r", "c", "d", "r", "")

But with fontoxpath:

var fontoxpath = require("fontoxpath")

console.log(fontoxpath.evaluateXPathToStrings(
  'fn:tokenize("abracadabra", "(ab)|(a)")',
  null,
  undefined,
  undefined,
  {language: fontoxpath.evaluateXPath.XQUERY_3_1_LANGUAGE}))

Output:

[
  '',          'ab',
  'undefined', 'r',
  'undefined', 'a',
  'c',         'undefined',
  'a',         'd',
  'ab',        'undefined',
  'r',         'undefined',
  'a',         ''
]

Looks like the captures are being incorrectly returned as part of the results of tokenize.
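
For comparison, the shape of that output matches what plain JavaScript `String.prototype.split` returns when the separator regex contains capturing groups: each group's capture is spliced into the result, with `undefined` for a group that did not take part in the match. A minimal sketch, with no fontoxpath involved and assuming nothing about its internals:

```javascript
// Plain JavaScript only; this does not call fontoxpath.
// With capturing groups, split() interleaves the captures with the pieces.
const parts = "abracadabra".split(/(ab)|(a)/);
console.log(parts.length); // 16 entries, as in the output above

// Rewriting the groups as non-capturing leaves only the pieces between matches.
const nonCapturing = "abracadabra".split(/(?:ab)|(?:a)/);
console.log(nonCapturing); // [ '', 'r', 'c', 'd', 'r', '' ]
```

The second result is the sequence the Altova reference gives for this call.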

DrRataplan · 2024-05-14T15:36:37Z

Hey Reuben,

Sorry for the long wait! Many changes: I'm no longer with Fonto, but I'm still involved in development!

Got it: we use regular JS regexes here, which indeed output capture groups... I made a fix, which I'll PR shortly!

Kind regards,

Martin
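
For readers hitting the same behaviour elsewhere, one way to tokenize with a JavaScript regex without leaking capture groups is to walk the matches and slice the input between them. This is only an illustrative sketch: `tokenizeIgnoringGroups` is a hypothetical helper, not the fix from the PR, and it treats the pattern as JavaScript regex syntax rather than XPath regex syntax.

```javascript
// Hypothetical helper, not part of fontoxpath.
// Assumes `pattern` is valid JavaScript regex source and never matches
// the empty string (fn:tokenize treats such patterns as an error).
function tokenizeIgnoringGroups(input, pattern) {
  const regex = new RegExp(pattern, "g");
  const tokens = [];
  let lastIndex = 0;
  for (const match of input.matchAll(regex)) {
    // Keep only the text between separators; captures are never consulted.
    tokens.push(input.slice(lastIndex, match.index));
    lastIndex = match.index + match[0].length;
  }
  tokens.push(input.slice(lastIndex));
  return tokens;
}

console.log(tokenizeIgnoringGroups("abracadabra", "(ab)|(a)"));
// [ '', 'r', 'c', 'd', 'r', '' ]
```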

rrthomas · 2024-05-14T20:26:41Z

Many thanks @DrRataplan!

rrthomas added a commit to rrthomas/ruth that referenced this issue Apr 22, 2024

FunctX: add a workaround to functx:lines for a fontoxpath bug

4de590f

See FontoXML/fontoxpath#635

DrRataplan mentioned this issue May 14, 2024

Fix tokenize with capture groups #638

Merged

bwrrp closed this as completed in #638 May 16, 2024
