Extracting the text per rendered section / block using epub js? #1376
Thank you for providing that documentation. After looking it over, it looks like I need the current render instance to hook in to some of these items (like the layout) --- Would I be able to utilize my current epub rendition object to "hook in" and obtain all of the info necessary to form that mapping construction and extract the text sections? I'm just trying to find the "tie-in" piece between my current rendition object and making the mapping to extract from. Thank you for your time. |
The mapping object is available from Also worth noting that Epub.js calculates the ranges by splitting the text on the space character, which could be problematic when the columns ("pages") start or end on hyphens or other non-whitespace characters. |
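To make the hyphen caveat concrete, here is a minimal sketch of space-based splitting. It is an illustration only, not Epub.js's own code, and `splitBySpace` is a hypothetical helper name. It shows that a hyphenated word stays a single unit, so a column break that falls in the middle of it cannot be represented exactly.

```javascript
// Illustration only: splits a string on the space character and records
// the character offsets of each chunk. Not taken from Epub.js.
function splitBySpace(text) {
  const ranges = [];
  let start = 0;
  let pos = text.indexOf(" ");
  while (pos !== -1) {
    // Skip empty chunks produced by consecutive spaces.
    if (pos > start) {
      ranges.push({ start, end: pos, text: text.slice(start, pos) });
    }
    start = pos + 1;
    pos = text.indexOf(" ", start);
  }
  // Whatever remains after the last space is the final chunk.
  if (start < text.length) {
    ranges.push({ start, end: text.length, text: text.slice(start) });
  }
  return ranges;
}

const chunks = splitBySpace("a well-known non-breaking example");
console.log(chunks.map((c) => c.text));
// ["a", "well-known", "non-breaking", "example"]
```

If a column ended after "well-", the smallest unit available here would still be the whole of "well-known".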
Ok, I think I have a function up and running --- If I wanted to extend the DefaultViewManager class to include this new function in my component, do you know what the import path would be for class extension? This is a rough implementation:
It's also worth noting while digging through the manager that I saw isVisible() / visible() functions --- I'm wondering if the center could also be calculated by using these functions to divide the visible page length by 2. You've been very helpful, thanks! |
Don't know. Extending it could be problematic. I would just do it in the application code.
Not sure what you mean, but those only check the visibility of "views", which are essentially iframes, each of which would be a chapter. Looking at the code, it seems to me the "visible page length" you want is the |
Could you clarify what you mean by "do it in the application code?" Just trying to see how I would be able to add that function. Right now, I'm importing epub js via npm install node module and interacting with the epub object via the import line. and if I wanted to call If I wanted to change this statement to |
I mean doing it outside Epub.js. If all the properties are public and accessible from the Indeed you don't have to make use of the manager at all. You can get the start and end CFIs from If you really need to extend the Manager, well, currently only the following classes are exported: |
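As a rough idea of what "doing it outside Epub.js" could look like, here is a sketch that only uses the public `rendition` and `book` objects. It assumes `rendition.currentLocation()` returns an object with `start.cfi` and `end.cfi`, that `book.getRange(cfi)` resolves to a DOM `Range`, and that both CFIs fall in the same section. `getVisibleText` is a hypothetical helper name, and this is not necessarily the approach meant in the comment above.

```javascript
// Sketch under the assumptions stated above; `book` and `rendition` are
// expected to be Epub.js Book and Rendition objects.
async function getVisibleText(book, rendition) {
  // `await` also copes with managers that return a promise here.
  const location = await rendition.currentLocation();
  if (!location || !location.start || !location.end) return "";

  const startRange = await book.getRange(location.start.cfi);
  const endRange = await book.getRange(location.end.cfi);
  if (!startRange || !endRange) return "";

  // Build one range from the first visible position to the last one.
  // This only works when both ranges belong to the same document.
  const visible = startRange.cloneRange();
  visible.setEnd(endRange.endContainer, endRange.endOffset);
  return visible.toString();
}
```

It could be called from a `relocated` handler, for example `rendition.on("relocated", () => getVisibleText(book, rendition).then(console.log))`.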
I just noticed that there's actually |
From the previous edit, if that function does what I think it does, I have edited the function to be able to access the properties. That just leaves some of the layout properties that are inaccessible from the rendition property. Would I be able to use something similar with the manager to be able to access these properties? Something like Instead of |
No, here
class MyManager extends rendition.requireManager('default') {
  getCenteredLocation() {
  }
}
To actually use this, you need to pass this class as the
let rendition = book.renderTo('viewer', { manager: MyManager })
Or you might be able to use |
Thank you for that suggestion. Here is a new version, not creating a CustomRender but creating a CustomManager:
Here is my implementation:
When running, it seems the application is returning the following error code: I wonder if this is because we are required to use this in an instance instead of a static context. That would explain why we couldn't call requireManager("default") on the Rendition class itself. One possible solution would be to try to create an instance of Rendition within CustomManager() and call requireManager() on that instance. The challenge would be having access to the same functions as the default manager in this case. |
Yeah, it's not a static method, so you have to create an instance first (then create your custom class, and then create the actual instance that you're going to use). Which makes no sense (see for instance #966), but that's the way it is now. |
Would you be able to provide me with an example of what that looks like from an initialization standpoint? I don't mind the workaround, as long as it works. Also, side note: do you know if this library is still being actively developed / updated? |
Something like:
import { Rendition, Book } from "epubjs";
const DefaultViewManager = new Rendition(new Book()).requireManager("default");
class CustomManager extends DefaultViewManager {
}
I believe it's still developed. There were some new commits in |
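One way to keep the custom class independent of how the base class is obtained is a small class factory. This is only a sketch: `withCenteredLocation` is a hypothetical helper, and the body of `getCenteredLocation()` is a placeholder that delegates to the inherited `currentLocation()`, assuming the base manager provides that method.

```javascript
// Sketch: wraps any manager base class and adds one extra method.
function withCenteredLocation(BaseManager) {
  return class CustomManager extends BaseManager {
    // Method name taken from the thread; placeholder body only.
    getCenteredLocation() {
      return this.currentLocation();
    }
  };
}

// Wiring it up with Epub.js, following the workaround described above:
//   import { Rendition, Book } from "epubjs";
//   const DefaultViewManager = new Rendition(new Book()).requireManager("default");
//   const CustomManager = withCenteredLocation(DefaultViewManager);
//   const rendition = book.renderTo("viewer", { manager: CustomManager });
```

Because the factory takes the base class as an argument, it can be exercised against a stub class without loading a book.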
Thank you for that clarification and all of the help you have provided so far. With that new implementation, would I use this line as if it was a static class? or these lines to setup an instance:
|
Yes, I think that's the way.
Lines 229 to 234 in f09089c
|
Thanks for that heads up, yeah that's a bit misleading. For testing my application, I'm going to focus on just getting the render to accept the custom manager at the moment w/ all inherited properties from default manager, and not worry so much about the new function for now. Once I can get the render up and running with the new manager, then I should be good to put focus back on the custom function. |
That did the trick! Big step forward in getting the render to utilize the custom manager instead of the default manager. Now, I can just focus on the implementation of the getCenteredLocation() function to extract the text per section |
Moving on to the implementation of getCenteredLocation(), I think some of the code will need to be changed to reflect being an extension of DefaultViewManager. For instance, anything that says "this.rendition.manager" should be replaced with "this", and anything with "this.rendition" should become "this".
|
Another update: More good progress. Got it to start the application correctly and call the getCenteredLocation() function using the custom manager. Added some extra debug code like counting the number of sections, console logging of text, etc. I am getting one bug: my application only ever returns 1 section after running the new function. I am basically rendering the returned sections array size to the main application, and have it call that code every time a location change occurs. I remember you mentioning something earlier about correcting for the page width, but I couldn't find that exact comment upon going back through the thread. I ended up using getPaginatedLocation() as a template for this function, so there might be some other things that need to be changed here, just like I had to change some of the "this.xxx" statements. Once I have it returning the correct number of displayed sections (which should be 2 in this example), I can move on to the text extraction code. |
If you mean that the length of |
Ok, I might just need to grab the right property of the section then to get the number I'm looking for. To obtain that value, it was just: Was there perhaps a certain property of sections that would contain that number? I'm trying to figure out how I would use sections to find the "splitting point" at which to divide the extracted text. For instance, if a section contains two separate "blocks", which I highlighted in red in the first post, I would want to access the number of "blocks" being rendered and perform functions against them for text extraction. I was under the impression these "blocks" I am referring to are sections. I was put off by the return "sections" at the bottom of the function, which made me think there was going to be more than 1.
|
A "section" in Epub.js is an XHTML file in the EPUB. So for fixed layout EPUBs, yes, each "page" in a spread would normally consist of one section. But for reflowable EPUBs, no. Although there appear to be two text blocks in a spread, they would belong to the same section, rendered in the same iframe. The blocks are just CSS columns. That's why you have to check the rects of ranges in the section to see which columns they are being rendered to. |
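The rect check can be sketched as a plain function. This is a minimal illustration, assuming each chunk of text comes with the `left` coordinate of its rect (for example from `range.getBoundingClientRect().left`) and that columns are laid out at regular horizontal intervals of `pageWidth`. `groupTextByColumn` and all of its parameters are hypothetical names, not part of Epub.js.

```javascript
// Sketch: assigns text chunks to visible CSS columns by horizontal position.
// chunks:         [{ text, left }] where `left` is the rect's left edge
// pageWidth:      assumed distance between the left edges of adjacent columns
// scrollLeft:     how far the content is scrolled horizontally
// visibleColumns: how many columns are shown at once
function groupTextByColumn(chunks, pageWidth, scrollLeft = 0, visibleColumns = 2) {
  const columns = Array.from({ length: visibleColumns }, () => []);
  for (const { text, left } of chunks) {
    const index = Math.floor((left - scrollLeft) / pageWidth);
    // Chunks outside the visible columns are ignored.
    if (index >= 0 && index < visibleColumns) {
      columns[index].push(text);
    }
  }
  return columns.map((words) => words.join(" "));
}

const [leftText, rightText] = groupTextByColumn(
  [
    { text: "Left", left: 10 },
    { text: "words", left: 300 },
    { text: "Right", left: 450 },
    { text: "hidden", left: 900 },
  ],
  400
);
console.log(leftText, "|", rightText);
```

When only one column has content, the second entry is an empty string, which matches the "second variable would not contain any text" behaviour asked for in the original post.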
Got it, I think I will need to include a columns array within getCenteredLocation() and add it within the return. |
it looks like this is what I've been looking for:
function definition from mapping.js:
I seem to always return a value of 0 for columns when calling in the function below:
|
Update: Was able to get accurate column counts using the pages property of sections and counting total. I noticed each page has start and end properties, so attempted to gain Cfi values to create the range function and seem to get undefined values:
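For reference, the counting step described above could look like the sketch below. It assumes each entry of the returned location array carries a `pages` array, as the comment describes. `countVisiblePages` is a hypothetical helper, and nothing here relies on the shape of the individual page entries.

```javascript
// Sketch: totals the number of visible pages (columns) across all
// section entries, assuming each entry has a `pages` array.
function countVisiblePages(sections) {
  return sections.reduce(
    (total, section) =>
      total + (Array.isArray(section.pages) ? section.pages.length : 0),
    0
  );
}

console.log(countVisiblePages([{ index: 3, pages: [4, 5] }]));
// 2
```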
|
Hello fellow epub js programmers! My current goal is to be able to extract the text that is being displayed from the render. Using the default render method, there are typically two different "blocks" being rendered --- one on the left and one on the right. This changes depending on the device/scaling. Is there a way I can separate the extracted text into two separate variables, one for each rendered block of text? If the device screen/scaling is small enough that only one rendered "block" is displayed, then the second variable would not contain any text. Any ideas on how this could be implemented? Thanks!
I have attached an image with 2 highlighted boxes in red, showing my desired functionality. Thanks!