Allow page breaks in floats, absolute blocks, table-cells #36

Closed
Smylers opened this issue Feb 14, 2013 · 36 comments
Labels
bug Existing features not working as expected
Milestone

Comments

@Smylers
Contributor

Smylers commented Feb 14, 2013

Floated elements that don't fit on the current page simply fall off the bottom, rather than being placed on the next page.

Here's a handy long list of floated elements to demonstrate the problem: http://www.stripey.com/demo/weasyprint/float_off_bottom.html

Look at it in Firefox and do ‘Print Preview’. You should see that there's a page break, with the list being continued on page 2. Similarly if you print from Chromium.

But WeasyPrint generates this file, where the elements simply run off the bottom of the first page: http://www.stripey.com/demo/weasyprint/float_off_bottom.pdf

@SimonSapin
Member

Yes, this is a known limitation: no page breaks are supported inside floats, absolute positioning, or table cells. Unfortunately right now I don’t have a better answer than “avoid using floats that way”.

I’d be happy to help anyone who wants to fix this, but this is a non-trivial change in the layout code. Otherwise this is something to be fixed eventually, but I don’t know when I’ll get to it.

@Smylers
Contributor Author

Smylers commented Feb 14, 2013

Thanks. From your description I'm not sure whether this is the known limitation or not.

In this case I'm not trying to have page breaks inside a floated element, but between floated elements. Each li is floated separately. My apologies for not making that clearer in the initial report.

@liZe
Member

liZe commented Oct 28, 2016

As reported in #375, we have the same problem with consecutive absolute/relative blocks.

@hughsw
Contributor

hughsw commented Apr 27, 2017

I have just started using WeasyPrint, and I'm already a big fan. However, I have also quickly run into the float/break issue -- my users want Bootstrap and floated columns, and don't like what happens in the PDF document!

Can @SimonSapin or anyone else comment on the refactoring that would be necessary to fix this wartish problem? I haven't perused your codebase yet, but I know Python very well; so, I'm looking for a high-level overview of the current layout model/algorithm, why it gets tripped up trying to put breaks in floats, and what would have to be changed.

Thanks,
-Hugh

@SimonSapin
Member

301 @liZe

@liZe
Member

liZe commented Apr 27, 2017

Let's go! I'll skip some details and lie a little bit to avoid useless complexity.

Web pages have mainly been created to be displayed on rectangles whose width is fixed and whose height is automatically calculated according to the content. That's what a "normal" browser does. But the problem is a bit different when you want to print these web pages: the height is fixed too and you'll need to cut the content between different pages.

CSS defines how the layout must be done, how blocks and texts are displayed. "Normal" blocks are put one below the other and "normal" texts are broken between multiple lines put one below the other. The way the "normal" content is displayed is called the normal flow.

CSS gives the possibility to remove blocks from the normal flow of the page and make them behave in a different way. These blocks sometimes create their own flow, creating nested or parallel flows in the page. That's where it's becoming a bit hard.

When CSS 2 was written, floats and absolute/relative blocks (and somehow tables) were (almost) the only blocks creating parallel flows, and no-one really defined how these parallel flows had to be broken between pages. That's why WeasyPrint's layout has only one flow that can be correctly broken, and the blocks that are outside this flow are seen as atomic blocks going below the bottom of the page if needed.

But now, many CSS specifications have added many ways to create strange flows, such as columns, regions, flexbox and grid. It was time to define how parallel and nested flows had to be broken between pages. It's now done in the fragmentation module. It's not clearly defined but it's much better than what we had in CSS 2.

Bad news: it was not written when we started WeasyPrint.

Really bad news: it's really different from what we have in WeasyPrint.

It's probably not that difficult to implement the parts of the fragmentation module that are needed to fix this issue (well, for really simple cases). But it will need to slightly change many functions and modules in a single atomic commit that will be huge. We can imagine that the work needed is something like #291: long, tiring and painful. But not impossible.

@hughsw
Contributor

hughsw commented Apr 28, 2017

OK. Where should I be looking in the code to learn about the following (beyond the peephole insight of #291):

  • WeasyPrint's layout model and algorithm
  • Where to make fixes for this issue
  • What would have to change architecturally to address the fragmentation module spec

Thanks!

@liZe
Member

liZe commented Apr 28, 2017

WeasyPrint's layout model and algorithm

You'll find all the code you need in the layout folder. The layout.pages module has got a make_all_pages function, calling the make_page function, calling the block_level_layout function, etc.
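To make that call chain easier to picture, here is a deliberately tiny sketch of a pagination loop. It only borrows the function names mentioned above; the bodies, the flat integer pointer, and the `max_lines` parameter are assumptions made for illustration and are far simpler than WeasyPrint's real code.

```python
# Toy model only: names echo the functions mentioned above, but the bodies
# are hypothetical and much simpler than the real layout code.

def block_level_layout(lines, resume_at, max_lines):
    """Place as many lines as fit; return (placed_lines, next_resume_at)."""
    if max_lines < 1:
        raise ValueError("a page must hold at least one line")
    start = resume_at or 0
    placed = lines[start:start + max_lines]
    end = start + len(placed)
    # None means "everything has been laid out".
    next_resume_at = end if end < len(lines) else None
    return placed, next_resume_at


def make_page(lines, resume_at, max_lines):
    """Build one page, starting where the previous page stopped."""
    return block_level_layout(lines, resume_at, max_lines)


def make_all_pages(lines, max_lines):
    """Keep making pages until the layout reports nothing left to resume."""
    pages = []
    resume_at = None
    while True:
        page, resume_at = make_page(lines, resume_at, max_lines)
        pages.append(page)
        if resume_at is None:
            return pages


pages = make_all_pages(["a", "b", "c", "d", "e"], max_lines=2)
print(pages)
```

The point of the sketch is the loop shape: each page's layout returns a pointer, and the next page starts from it.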

Where to make fixes for this issue
What would have to change architecturally to address the fragmentation module spec

Nested flows (as defined by the fragmentation CSS module) are pretty well supported for block-level and inline-level boxes, using a variable called resume_at that keeps a kind of pointer to where the rendering is (the "current" position). resume_at contains nested tuples representing the nested boxes, you'll find how it works for example in the block_container_layout function (in layout.blocks).
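A small sketch of how such a nested-tuple pointer can be followed through a box tree. The tree shape, the `boxes_on_path` helper, and the zero-based `(index, sub_pointer)` reading of the tuples are assumptions for illustration, not the actual WeasyPrint data structures.

```python
# A toy box tree: each box is (name, children). Hypothetical structure.
tree = ("body", [
    ("div#a", [("p#a1", []), ("p#a2", [])]),
    ("div#b", [("p#b1", []), ("p#b2", []), ("p#b3", [])]),
])


def boxes_on_path(box, resume_at):
    """Return the names of the boxes a nested (index, sub_pointer) tuple visits."""
    path = [box[0]]
    while resume_at is not None:
        # Each level says which child to descend into, then hands over
        # the pointer for that child (None means "start of this box").
        index, resume_at = resume_at
        box = box[1][index]
        path.append(box[0])
    return path


path = boxes_on_path(tree, (1, (2, None)))
print(path)
```

Each tuple level corresponds to one level of nesting, which is why a single pointer is enough as long as there is only one flow to resume.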

We need to add the support of parallel flows. Instead of one pointer pointing to one position in the flow, we need multiple pointers pointing to the "current" positions in the parallel flows. I imagine that resume_at can be changed into resume_at_list, containing one or more resume_at pointers.

To fix this issue, we basically need:

  • to change resume_at into resume_at_list almost everywhere, as rendering a box in the flow can return parallel positions where the parallel flows have reached the end of the page (one flow for itself and for each child creating parallel flows such as floats, table cells, etc., the list is in the fragmentation module),
  • to make floats, table cells, etc. take care of the bottom of the page and return their resume_at_list, instead of assuming that they have no limit for their vertical position.

That's all 😄! I think that not everything is correctly defined in the spec, so we'll have to make some stupid choices for stupid cases (how do you render floats whose top border is taller than the page?), but the "normal" use cases should be quite well described and easy (and long, and painful) to implement.
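The two steps above can be sketched roughly as follows. This is only a toy under stated assumptions: `resume_at_list` is modelled as a list of `(flow_name, pointer)` pairs, flows are plain lists of lines, and `layout_flow` / `layout_page` are hypothetical names. The real change would have to thread this through every layout function.

```python
def layout_flow(lines, resume_at, max_lines):
    """Lay out one flow on the current page; return (placed, next_pointer)."""
    start = resume_at or 0
    end = min(start + max_lines, len(lines))
    return lines[start:end], (end if end < len(lines) else None)


def layout_page(flows, resume_at_list, max_lines):
    """Lay out every unfinished flow; return the page and the next pointer list."""
    page = {}
    next_resume_at_list = []
    for name, resume_at in resume_at_list:
        placed, next_resume_at = layout_flow(flows[name], resume_at, max_lines)
        page[name] = placed
        # A flow that reached the bottom of the page contributes one pointer;
        # a finished flow simply drops out of the list.
        if next_resume_at is not None:
            next_resume_at_list.append((name, next_resume_at))
    return page, next_resume_at_list


flows = {"main": ["m1", "m2", "m3"], "float": ["f1", "f2", "f3", "f4", "f5"]}
resume_at_list = [(name, 0) for name in flows]
pages = []
while resume_at_list:
    page, resume_at_list = layout_page(flows, resume_at_list, max_lines=2)
    pages.append(page)
print(pages)
```

Here the float keeps going onto a third page after the main flow has finished, which is the behaviour a single pointer cannot express.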

If you need anything, I'll be really happy to help!

@hughsw
Contributor

hughsw commented Apr 28, 2017

Thanks. That's just the kind of overview I was looking for.

One last thing: test-driven development (okay, two last things):

  • What's the quickest way to run tests during development work?
  • Do you have instances of HTML/CSS tests for parallel flows, that, when passing, will indicate that the work is finished? I of course have the instance that got me here, but do you know of a reference test set?

@hughsw
Contributor

hughsw commented Apr 28, 2017

FYI, my habit is to do minor refactoring while I'm working to understand existing logic. So you can expect some PRs along those lines.

Also, I'm completely new to CSS implementation work ! ;-) However, my career has largely been developing scientific software, so I'm at home with specs and deep algorithms.

The code appears to have a good number of pointers to key CSS specs. However, if there are some spec documents that are so basic that you wouldn't mention them in code comments, they might actually be useful for me! So, I would appreciate pointers to key algorithmic starting points for CSS.

Thanks.

@liZe
Member

liZe commented Apr 28, 2017

What's the quickest way to run tests during development work?

./setup.py test (launch tests and check coding style).

Do you have instances of HTML/CSS tests for parallel flows.

<style>
  @page {
    font-family: monospace;
    height: 2.5em;
    line-height: 1em;
    margin: 0;
    width: 10em;
  }
  body {
    margin: 0;
  }
  div {
    background: red;
    float: left; 
    width: 50%; 
  }
</style>

<body>
  <div>
    float float float float float
  </div>
  flow flow flow flow flow
</body>

You need to get something like:

Page 1
+-------------------------+
| float float | flow flow |
| float float | flow flow |
+-------------------------+
Page 2
+-------------------------+
| float       | flow      |
|-------------+           |
+-------------------------+
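For what it's worth, the expected result can be mimicked with a few lines of Python. This is not a layout engine: the column widths (11 and 9 characters), the two-lines-per-page limit, and the `paginate` helper are assumptions chosen so that the output matches the diagram, and the text of each flow is wrapped independently.

```python
import textwrap
from itertools import zip_longest


def chunk(lines, size):
    """Split a list of lines into page-sized groups."""
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def paginate(float_text, flow_text, float_width=11, flow_width=9,
             lines_per_page=2):
    """Wrap both flows separately, then break them at the same page height."""
    float_pages = chunk(textwrap.wrap(float_text, float_width), lines_per_page)
    flow_pages = chunk(textwrap.wrap(flow_text, flow_width), lines_per_page)
    pages = []
    for float_lines, flow_lines in zip_longest(
            float_pages, flow_pages, fillvalue=[]):
        rows = [
            f"| {left:<{float_width}} | {right:<{flow_width}} |"
            for left, right in zip_longest(
                float_lines, flow_lines, fillvalue="")
        ]
        pages.append(rows)
    return pages


pages = paginate("float float float float float", "flow flow flow flow flow")
for number, rows in enumerate(pages, start=1):
    print(f"Page {number}")
    print("\n".join(rows))
```

Both flows are cut at the same page boundary and both continue on page 2, which is the behaviour the test above is meant to check.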

However, my career has largely been developing scientific software, so I'm at home with specs and deep algorithms.

You'll need these skills!

So, I would appreciate pointers to key algorithmic starting points for CSS.

There's a very useful chapter in the documentation. In the CSS spec, the best starting point is probably the presentation of the normal flow and the implementation of 9.4.1 and 9.4.2 in layout.blocks and layout.inlines.

Good luck!

@hughsw
Contributor

hughsw commented May 3, 2017

OK, you threw me in the deep end of CSS spec, and I'm floundering, but progressing, through prose like this:

Except for table boxes, which are described in a later chapter, and replaced elements, a block-level box is also a block container box. A block container box either contains only block-level boxes or establishes an inline formatting context and thus contains only inline-level boxes. Not all block container boxes are block-level boxes: non-replaced inline blocks and non-replaced table cells are block containers but not block-level boxes. Block-level boxes that are also block containers are called block boxes.

I don't yet have a solid mental-model of what it takes to do all the layout given multiple flows and page breaks, but I'm working on it, and the WeasyPrint code-base is very approachable and the focus on resume_at helps. To keep getting my hands dirty I intend to add a terse detail string to each assert, so at least I'll know e.g. box sizes when I'm breaking things...

@polonat

polonat commented Oct 5, 2017

We solved the table split problem by placing <div style="clear:both;"></div> before the table.

@wd

wd commented May 29, 2018

This problem is really annoying, but I found a workaround.

The key point is to split one <tr> into several <tr> elements, e.g.:

<tr>
  <td>col1</td>
  <td>long lines1
long lines2
long lines3
long lines4
  </td>
</tr>

will be changed to

<tr>
  <td class="top_border"></td>
  <td class="top_border">long lines1</td>
</tr>
<tr>
  <td class="no_border">col1</td>
  <td class="no_border">long lines2</td>
</tr>
<tr>
  <td class="no_border"></td>
  <td class="no_border">long lines3</td>
</tr>
<tr>
  <td class="no_border"></td>
  <td class="no_border">long lines4</td>
</tr>

css

        table tr .no_border {
            border-left: 1px solid #000000;
            border-right: 1px solid #000000;
            border-top: 0;
            border-bottom: 0;
        }

        table tr .top_border {
            border-left: 1px solid #000000;
            border-right: 1px solid #000000;
            border-top: 1px solid #000000;
            border-bottom: 0;
        }

This is just some sample code to explain the main idea; you will need to adapt it to your situation. I hope this helps someone.
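If the rows are generated by a script, the splitting could be automated along these lines. This is a minimal sketch: `split_row` is a hypothetical helper, it reuses the `top_border` / `no_border` class names from the CSS above, and, unlike the hand-written example, it puts the label in the first generated row to keep the logic simple.

```python
from html import escape


def split_row(label, long_text):
    """Turn one (label, multi-line text) row into one <tr> per text line."""
    lines = long_text.splitlines() or [""]
    rows = []
    for i, line in enumerate(lines):
        # Only the first generated row draws the top border.
        css_class = "top_border" if i == 0 else "no_border"
        first_cell = escape(label) if i == 0 else ""
        rows.append(
            f'<tr><td class="{css_class}">{first_cell}</td>'
            f'<td class="{css_class}">{escape(line)}</td></tr>'
        )
    return "\n".join(rows)


html_rows = split_row("col1", "long lines1\nlong lines2\nlong lines3")
print(html_rows)
```

Because every generated row is short, a page break can fall between rows instead of having to fall inside a cell.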

@Hideman85

Any news on this thread? What about support for tables?

I'm currently using wkHtmlToPdf and also have the issue with tables: the current behavior is to cut everywhere (which is fine for me), but it also allows cutting in the middle of a line of text, which makes the lib unusable for me.

Do we have a patch for this lib for my desired behavior?

@liZe
Member

liZe commented Sep 29, 2020

Do we have a patch for this lib for my desired behavior?

No, there’s currently no patch. As said earlier, there’s no easy fix, and closing this issue requires a lot of work.

This was referenced Nov 23, 2020
liZe added a commit that referenced this issue Aug 22, 2021
Many operations, including page breaks, require a pointer to a specific
position of the box tree. For example, we used to have this structure to point
to the beginning of the first child of the second child:

(1, (2, None))

We now use:

{1: {2: None}}

This change is the first step to handle parallel flows (see #36). It doesn’t
change anything to the layout for now, but it allows us to store multiple
pointers in the same structure.

The next step is to handle multiple pointers in skip_stack during boxes layout.
It means that most of the *_layout() functions need an extra for-loop to manage
multiple skip stacks.

We’ll then need to split new types of boxes: table cells, floats, absolutes…
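
A rough illustration of the structure change described in this commit message. The helper names `tuple_to_dict` and `iter_paths` are hypothetical, and the two-key dictionary is only an assumption about what "multiple pointers in the same structure" could look like.

```python
def tuple_to_dict(pointer):
    """Convert a nested (index, sub_pointer) tuple into the nested dict form."""
    if pointer is None:
        return None
    index, sub_pointer = pointer
    return {index: tuple_to_dict(sub_pointer)}


def iter_paths(skip_stack, prefix=()):
    """Yield every position stored in a nested-dict skip stack as an index path."""
    if skip_stack is None:
        yield prefix
        return
    # This is the kind of extra for-loop mentioned above: a dict level
    # may hold several children, hence several pointers.
    for index, sub_skip_stack in skip_stack.items():
        yield from iter_paths(sub_skip_stack, prefix + (index,))


single = tuple_to_dict((1, (2, None)))
print(single)

# Assumed shape for two parallel positions under the same parent.
parallel = {1: {2: None}, 3: None}
print(list(iter_paths(parallel)))
```

A tuple can only describe one path through the tree, while a dict can branch at any level.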
liZe added a commit that referenced this issue Aug 23, 2021
In CSS Display Module Level 3, the "display" property gets a long
representation allowing:

- a clear separation between inner and outer display type,
- new supported types (contents, run-in, flow-root…),
- inline list items.

This commit allows the (retrocompatible) new long syntax for "display". It also
supports the "flow-root" value. It doesn’t support values related to ruby, and
it doesn’t support the new "contents" and "run-in" display types.

This work gives the possibility to simplify the code in the block_*_layout
functions, and to improve the overall layout.

Related to #36.
@liZe liZe added this to the 54.0 milestone Aug 31, 2021
@liZe
Member

liZe commented Sep 20, 2021

We won’t break inline-block boxes, because according to the spec:

Since line boxes contain no possible break points, inline-block and inline-table boxes (and other inline-level display types that establish an independent formatting context) may also be considered monolithic: that is, in the cases where a single line box is too large to fit within its fragmentainer even by itself and the UA chooses to split the line box, it may fragment such boxes or it may treat them as monolithic.

@liZe liZe changed the title from "Allow page breaks in floats, absolute blocks, inline-blocks, table-cells" to "Allow page breaks in floats, absolute blocks, table-cells" Sep 20, 2021
@grewn0uille grewn0uille pinned this issue Oct 20, 2021
@liZe liZe closed this as completed in f5d6d54 Dec 12, 2021
@liZe
Member

liZe commented Dec 12, 2021

We now handle parallel flows for floats, absolutes, relatives, and table-cells.

This bug is now closed. It required 9 years of hard work 🚀.

We’ll release a beta soon, tests and feedback are welcome!

@RafaelLinux

Please let us know here when it is available for testing.

Thank you

@grewn0uille
Member

Hello,

A beta has been released.

Don’t hesitate to try it 😉

@pzdkn

pzdkn commented Nov 25, 2022

So where can I find this new feature? Is this integrated in the newest release?

@liZe
Member

liZe commented Nov 25, 2022

So where can I find this new feature? Is this integrated in the newest release?

Hi!

As you can see in the metadata of these issues, it’s available since version 54.

@pzdkn

pzdkn commented Nov 25, 2022

Thanks liZe, sorry for this stupid question:
I am using version 57.1, but still can't break a <tr> of long text over to the next page. Is there any parameter I need to pass for this to work?

@liZe
Member

liZe commented Nov 26, 2022

I am using version 57.1, but still can't break a <tr> of long text over to the next page. Is there any parameter I need to pass for this to work?

It should work out of the box.

If your table row is not split, then there may be another CSS rule avoiding breaks somewhere (td { break-inside: avoid } for example). Or the content of the table cell may be using a layout that WeasyPrint is not able to split yet, like a flex box.
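
To hunt for such a rule in a large stylesheet, a naive scan like the one below may help. It is only a debugging aid under stated assumptions: `selectors_avoiding_breaks` is a hypothetical helper, it uses a regular expression rather than a real CSS parser, and it also looks for the legacy `page-break-inside` property.

```python
import re

# Matches "selector { ... break-inside: avoid ... }" in flat CSS text.
AVOID_BREAK = re.compile(
    r"([^{}]+)\{[^{}]*\b(?:page-)?break-inside\s*:\s*avoid[^{}]*\}",
    re.IGNORECASE,
)


def selectors_avoiding_breaks(css):
    """Return the selectors of rules that declare (page-)break-inside: avoid."""
    return [match.group(1).strip() for match in AVOID_BREAK.finditer(css)]


css = "td { break-inside: avoid } p { color: red } tr{page-break-inside:avoid;}"
found = selectors_avoiding_breaks(css)
print(found)
```

Any selector it reports that matches the table, its rows, or its cells is a candidate for the rule preventing the split.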
