Readium cfi and epubjs cfi #1358

panjadyboy · 2023-09-15T07:09:18Z

I am developing applications using epubjs ,but data is in readium cfi . Can we convert readium cfi of same to epubjs cfi

panjadyboy · 2023-09-15T07:10:02Z

Can anyone please help me on this ?

danielweck · 2023-09-15T07:25:43Z

EPUB CFI is an open standard, it should be implemented identically by Readium, EPUB.js, etc.
There may of course be bugs in any of these implementations ...

danielweck · 2023-09-15T11:46:57Z

in other words, cross-exchange of EPUB CFI expressions should work out-of-the-box. If you encounter problems then it's likely a bug in either of the CFI implementations you are relying on.
Once you find a specific bug, file it here at EPUB.js and at the Readium CFI GitHub repository, so that the developers can investigate. Until you find a specific bug to troubleshoot, the problem is too vague. The developers must be able to reproduce a particular behaviour in order to debug their code.

johnfactotum · 2023-09-22T06:46:17Z

It seems that the CFI implementation in Epub.js is bugged : #470.

Additionally, Epub.js doesn't seem to handle escaping with ^ at all.

danielweck · 2023-09-22T10:44:00Z

What about this implementation? https://github.com/fread-ink/epub-cfi-resolver

johnfactotum · 2023-09-22T11:31:08Z

It seems to have one serious bug, namely fread-ink/epub-cfi-resolver#14, that is, it always has a leading /2 referencing the root element, which is incorrect. Apart from that it looks solid.

By the way, Calibre also has the exact same issue (or at least the CFI it uses for storing annotations and bookmarks has this bug; I'm not sure if/how it uses CFIs in other places). See johnfactotum/foliate#849. There's also another Calibre bug described in the comments (johnfactotum/foliate#849 (comment)) but I haven't really investigate the cause of that one.

danielweck · 2023-09-25T16:14:16Z

Thank you @johnfactotum I noticed these bugs / inconsistencies, and I was wondering about your own FoliateJS implementation. I am trying to get a sense of what CFI libraries (written in Javascript, TypeScript or other dialect that transpiles to JS) currently implement parsing (from string expression) and generation (from DOM Range) most "correctly" (as demonstrated by unit test coverage).

johnfactotum · 2023-09-26T06:56:30Z

Good list, Daniel. There's also the old sample prototype implementation from epub-specs: https://github.com/w3c/epub-specs/blob/20140228/src/samples/cfi/epubcfi.js

It would a good idea to make a test file containing some common edge cases. But below is my quick assessment based purely on looking at the codebases of each project.

Parsing

As noted in epub-cfi-resolver's readme, escaped characters in assertions can trip up parsers that try to match !, /, and [ without accounting for context.

readium-cfi-js, epub-cfi-resolver, and foliate-js have tests for this.
I couldn't find any tests for CFI in Vivliostyle and Bibi. Just by looking at the code, Vivliostyle should be good.
Bibi's approach seems to be correct, though it would only recognize alphanumeric IDs, and text location assertions must be URI component encoded.
Epub.js doesn't handle this correctly.

Generating

I couldn't find where Bibi is generating CFI.

There are mainly two issues when generating. The first issue is still related to escaping characters in assertions.

epub-cfi-resolver and foliate-js have tests for this.
I couldn't find anything on this in readium-cfi-js.
It appears that Vivliostyle only escapes text location assertions but not ID assertions.
Epub.js: doesn't do this.

The second issue is the treatment of character data (adjacent text nodes should be treated as a single chunk, etc.).

readium-cfi-js, epub-cfi-resolver, and foliate-js have tests for this.
Vivliostyle: looks good.
Epub.js: not sure if I'm understanding the code, but kind of looks incorrect to me. See the .textNodes() method.

Applying/locating

The main problem is again with character data.

readium-cfi-js, epub-cfi-resolver, and foliate-js have tests for this.
Vivliostyle: looks good.
Bibi: from a quick glance at the code, it doesn't seem to handle this.
Epub.js: Specific CFI Problems via the XPATH reference #470.

johnfactotum · 2023-09-26T09:47:08Z

For the sake of completeness, here's Calibre's implementation: https://github.com/kovidgoyal/calibre/blob/master/src/pyj/read_book/cfi.pyj. It's written in a Python-like language called RapydScript that transpiles to JavaScript. It says that it's based on Peter Sorotkin's code (which is the sample implementation linked in my comment above).

Also of interest is this project: https://github.com/valeriangalliat/kobo-highlights-to-calibre, which contains a description of some quirks of the CFI implementations in Kobo and Calibre.

danielweck · 2023-09-26T11:28:44Z

Thanks John, very useful. Also see: w3c/epubcheck#150 (comment)

mikespub mentioned this issue Mar 13, 2024

Intity Epubreader mikespub-org/intity-epubjs-reader#2

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Readium cfi and epubjs cfi #1358

Readium cfi and epubjs cfi #1358

panjadyboy commented Sep 15, 2023

panjadyboy commented Sep 15, 2023

danielweck commented Sep 15, 2023

danielweck commented Sep 15, 2023

johnfactotum commented Sep 22, 2023

danielweck commented Sep 22, 2023

johnfactotum commented Sep 22, 2023

danielweck commented Sep 25, 2023 •

edited

johnfactotum commented Sep 26, 2023

johnfactotum commented Sep 26, 2023

danielweck commented Sep 26, 2023

Readium cfi and epubjs cfi #1358

Readium cfi and epubjs cfi #1358

Comments

panjadyboy commented Sep 15, 2023

panjadyboy commented Sep 15, 2023

danielweck commented Sep 15, 2023

danielweck commented Sep 15, 2023

johnfactotum commented Sep 22, 2023

danielweck commented Sep 22, 2023

johnfactotum commented Sep 22, 2023

danielweck commented Sep 25, 2023 • edited

johnfactotum commented Sep 26, 2023

Parsing

Generating

Applying/locating

johnfactotum commented Sep 26, 2023

danielweck commented Sep 26, 2023

danielweck commented Sep 25, 2023 •

edited