Fix: individual /Resources directory are now properly produced for each document page #1133

Lucas-C · 2024-03-06T11:22:37Z

This change causes a slight increase of PDF documents, but makes their structure more valid regarding the PDF spec.

Checklist:

The GitHub pipeline is OK (green), meaning that both pylint (static code analyzer) and black (code formatter) are happy with the changes of this PR.
A unit test is covering the code added / modified by this PR
This PR is ready to be merged
In case of a new feature, docstrings have been added, with also some documentation in the docs/ folder
A mention of the change is present in CHANGELOG.md

By submitting this pull request, I confirm that my contribution is made under the terms of the GNU LGPL 3.0 license.

Lucas-C · 2024-03-06T11:31:25Z

Could you please review this @andersonhc or @gmischler?

Despite the 100+ reference PDF files modified, this PR onlys affects 2 source Python files and is relatively short.

Lucas-C · 2024-03-07T08:23:59Z

There are some alternative approaches to the one currently used in this PR:

instead of 3 extra dicts (.fonts_used_per_page_number / .images_used_per_page_number / .graphics_style_names_per_page_number), we could store those informations as properties in PDFPage instances (in FPDF.pages)
instead of storing this information at all, we could generalize the regex-based approach used in FPDF.drawing_context() and perform those regex-matches in OutputProducer based on the final .contents of pages

…ch document page

andersonhc · 2024-03-07T20:45:40Z

I will try to make some time for a complete review by tomorrow.
For now I am running some tests and it's interesting your PR makes it clear we are letting unused fonts end up in the output document.

An example:

from fpdf import FPDF

pdf = FPDF()
pdf.add_font("NotoSans", "B", "NotoSans-Bold.ttf")
pdf.add_font("NotoSans", "BI", "NotoSans-BoldItalic.ttf")
pdf.add_font("NotoSans", "I", "NotoSans-Italic.ttf")
pdf.add_font("NotoSans", "", "NotoSans-Regular.ttf")

pdf.set_font("NotoSans", "", 12)
pdf.single_resources_object = False

pdf.add_page()
pdf.multi_cell(w=pdf.epw, text="**Text in bold**", markdown=True)

pdf.add_page()
pdf.multi_cell(w=pdf.epw, text="__Text in italic__", markdown=True)

pdf.output("test1.pdf")

The page 1 will have "F1" and "F4" in the resources dictionary, and page 2 will have "F3" and "F4".
"F2" is added on the final document and not referenced at all. "F4" is added to the resource list because of set_font() although not used.

I have documents with many fallback fonts and there is a considerable amount of unused font data added to the documents.

Might be something to tackle in a future PR.

andersonhc · 2024-03-10T11:49:16Z

CHANGELOG.md

@@ -25,6 +25,7 @@ This can also be enabled programmatically with `warnings.simplefilter('default',
 * [`FPDF.write_html()`](https://py-pdf.github.io/fpdf2/fpdf/fpdf.html#fpdf.fpdf.FPDF.write_html) now accepts a `tag_indents` parameter to control, for example, the indent of `<blockquote>` elements
 * allow to define custom `cell_fill_mode` logic for tables: [_Set cells background_ - documentation section](https://py-pdf.github.io/fpdf2/Tables.html#set-cells-background). Also added 2 new values: `TableCellFillMode.EVEN_ROWS` & `TableCellFillMode.EVEN_COLUMNS`: [documentation](https://py-pdf.github.io/fpdf2/fpdf/enums.html#fpdf.enums.TableCellFillMode)
 ### Fixed
+* individual `/Resources` directory are now properly produced for **each** document page. This causes a slight increase of PDF documents, but makes their structure more valid regarding the PDF spec. You can still use the old behaviour by setting `FPDF().single_resources_object = True`.


Suggestion to make it clearer the increase is the size of the documents:

individual /Resources directories are now properly created for each document page. This change ensures better compliance with the PDF specification but results in a slight increase in the size of PDF documents. You can still use the old behavior by setting FPDF().single_resources_object = True.

andersonhc

I don't have anything else to add. The changes are simple and clear.
I didn't have time to visually inspect all the changed reference files but I did a few and everything looks OK.

andersonhc · 2024-03-10T11:57:14Z

fpdf/output.py

+                    page_img_objs_per_index,
+                    page_gfxstate_objs_per_name,
+                )
+


Suggestions for future improvement:

If multiple pages use the exact same resources, they can make reference to the same resource dictionary object.

Lucas-C requested a review from gmischler as a code owner March 6, 2024 11:22

Lucas-C mentioned this pull request Mar 6, 2024

Fix test_cat_subset_ensure_reduced_size() py-pdf/pdfly#45

Open

Lucas-C force-pushed the individual-page-resources branch from f655bbb to d11887a Compare March 6, 2024 11:30

Lucas-C force-pushed the individual-page-resources branch 14 times, most recently from b9fdf5e to af159f9 Compare March 7, 2024 07:51

Fix: individual /Resources directory are now properly produced for ea…

c123b0e

…ch document page

Lucas-C force-pushed the individual-page-resources branch from af159f9 to c123b0e Compare March 7, 2024 08:26

andersonhc self-requested a review March 10, 2024 11:30

andersonhc reviewed Mar 10, 2024

View reviewed changes

andersonhc approved these changes Mar 10, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix: individual /Resources directory are now properly produced for each document page #1133

Fix: individual /Resources directory are now properly produced for each document page #1133

Lucas-C commented Mar 6, 2024 •

edited

Lucas-C commented Mar 6, 2024 •

edited

Lucas-C commented Mar 7, 2024 •

edited

andersonhc commented Mar 7, 2024

andersonhc Mar 10, 2024

andersonhc left a comment

andersonhc Mar 10, 2024

Fix: individual /Resources directory are now properly produced for each document page #1133

Are you sure you want to change the base?

Fix: individual /Resources directory are now properly produced for each document page #1133

Conversation

Lucas-C commented Mar 6, 2024 • edited

Lucas-C commented Mar 6, 2024 • edited

Lucas-C commented Mar 7, 2024 • edited

andersonhc commented Mar 7, 2024

andersonhc Mar 10, 2024

Choose a reason for hiding this comment

andersonhc left a comment

Choose a reason for hiding this comment

andersonhc Mar 10, 2024

Choose a reason for hiding this comment

Lucas-C commented Mar 6, 2024 •

edited

Lucas-C commented Mar 6, 2024 •

edited

Lucas-C commented Mar 7, 2024 •

edited