Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Readium cfi and epubjs cfi #1358

Open
panjadyboy opened this issue Sep 15, 2023 · 10 comments
Open

Readium cfi and epubjs cfi #1358

panjadyboy opened this issue Sep 15, 2023 · 10 comments

Comments

@panjadyboy
Copy link

I am developing applications using epubjs ,but data is in readium cfi . Can we convert readium cfi of same to epubjs cfi

@panjadyboy
Copy link
Author

Can anyone please help me on this ?

@danielweck
Copy link

EPUB CFI is an open standard, it should be implemented identically by Readium, EPUB.js, etc.
There may of course be bugs in any of these implementations ...

@danielweck
Copy link

in other words, cross-exchange of EPUB CFI expressions should work out-of-the-box. If you encounter problems then it's likely a bug in either of the CFI implementations you are relying on.
Once you find a specific bug, file it here at EPUB.js and at the Readium CFI GitHub repository, so that the developers can investigate. Until you find a specific bug to troubleshoot, the problem is too vague. The developers must be able to reproduce a particular behaviour in order to debug their code.

@johnfactotum
Copy link
Contributor

It seems that the CFI implementation in Epub.js is bugged : #470.

Additionally, Epub.js doesn't seem to handle escaping with ^ at all.

@danielweck
Copy link

What about this implementation? https://github.com/fread-ink/epub-cfi-resolver

@johnfactotum
Copy link
Contributor

It seems to have one serious bug, namely fread-ink/epub-cfi-resolver#14, that is, it always has a leading /2 referencing the root element, which is incorrect. Apart from that it looks solid.

By the way, Calibre also has the exact same issue (or at least the CFI it uses for storing annotations and bookmarks has this bug; I'm not sure if/how it uses CFIs in other places). See johnfactotum/foliate#849. There's also another Calibre bug described in the comments (johnfactotum/foliate#849 (comment)) but I haven't really investigate the cause of that one.

@danielweck
Copy link

danielweck commented Sep 25, 2023

Thank you @johnfactotum I noticed these bugs / inconsistencies, and I was wondering about your own FoliateJS implementation. I am trying to get a sense of what CFI libraries (written in Javascript, TypeScript or other dialect that transpiles to JS) currently implement parsing (from string expression) and generation (from DOM Range) most "correctly" (as demonstrated by unit test coverage).

@johnfactotum
Copy link
Contributor

Good list, Daniel. There's also the old sample prototype implementation from epub-specs: https://github.com/w3c/epub-specs/blob/20140228/src/samples/cfi/epubcfi.js

It would a good idea to make a test file containing some common edge cases. But below is my quick assessment based purely on looking at the codebases of each project.

Parsing

As noted in epub-cfi-resolver's readme, escaped characters in assertions can trip up parsers that try to match !, /, and [ without accounting for context.

  • readium-cfi-js, epub-cfi-resolver, and foliate-js have tests for this.
  • I couldn't find any tests for CFI in Vivliostyle and Bibi. Just by looking at the code, Vivliostyle should be good.
  • Bibi's approach seems to be correct, though it would only recognize alphanumeric IDs, and text location assertions must be URI component encoded.
  • Epub.js doesn't handle this correctly.

Generating

I couldn't find where Bibi is generating CFI.

There are mainly two issues when generating. The first issue is still related to escaping characters in assertions.

  • epub-cfi-resolver and foliate-js have tests for this.
  • I couldn't find anything on this in readium-cfi-js.
  • It appears that Vivliostyle only escapes text location assertions but not ID assertions.
  • Epub.js: doesn't do this.

The second issue is the treatment of character data (adjacent text nodes should be treated as a single chunk, etc.).

  • readium-cfi-js, epub-cfi-resolver, and foliate-js have tests for this.
  • Vivliostyle: looks good.
  • Epub.js: not sure if I'm understanding the code, but kind of looks incorrect to me. See the .textNodes() method.

Applying/locating

The main problem is again with character data.

@johnfactotum
Copy link
Contributor

For the sake of completeness, here's Calibre's implementation: https://github.com/kovidgoyal/calibre/blob/master/src/pyj/read_book/cfi.pyj. It's written in a Python-like language called RapydScript that transpiles to JavaScript. It says that it's based on Peter Sorotkin's code (which is the sample implementation linked in my comment above).

Also of interest is this project: https://github.com/valeriangalliat/kobo-highlights-to-calibre, which contains a description of some quirks of the CFI implementations in Kobo and Calibre.

@danielweck
Copy link

Thanks John, very useful. Also see: w3c/epubcheck#150 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants