Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: individual /Resources directory are now properly produced for each document page #1133

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Lucas-C
Copy link
Member

@Lucas-C Lucas-C commented Mar 6, 2024

This change causes a slight increase of PDF documents, but makes their structure more valid regarding the PDF spec.

Checklist:

  • The GitHub pipeline is OK (green), meaning that both pylint (static code analyzer) and black (code formatter) are happy with the changes of this PR.

  • A unit test is covering the code added / modified by this PR

  • This PR is ready to be merged

  • In case of a new feature, docstrings have been added, with also some documentation in the docs/ folder

  • A mention of the change is present in CHANGELOG.md

By submitting this pull request, I confirm that my contribution is made under the terms of the GNU LGPL 3.0 license.

@Lucas-C
Copy link
Member Author

Lucas-C commented Mar 6, 2024

Could you please review this @andersonhc or @gmischler?

Despite the 100+ reference PDF files modified, this PR onlys affects 2 source Python files and is relatively short.

@Lucas-C Lucas-C force-pushed the individual-page-resources branch 14 times, most recently from b9fdf5e to af159f9 Compare March 7, 2024 07:51
@Lucas-C
Copy link
Member Author

Lucas-C commented Mar 7, 2024

There are some alternative approaches to the one currently used in this PR:

  • instead of 3 extra dicts (.fonts_used_per_page_number / .images_used_per_page_number / .graphics_style_names_per_page_number), we could store those informations as properties in PDFPage instances (in FPDF.pages)
  • instead of storing this information at all, we could generalize the regex-based approach used in FPDF.drawing_context() and perform those regex-matches in OutputProducer based on the final .contents of pages

@andersonhc
Copy link
Collaborator

I will try to make some time for a complete review by tomorrow.
For now I am running some tests and it's interesting your PR makes it clear we are letting unused fonts end up in the output document.

An example:

from fpdf import FPDF

pdf = FPDF()
pdf.add_font("NotoSans", "B", "NotoSans-Bold.ttf")
pdf.add_font("NotoSans", "BI", "NotoSans-BoldItalic.ttf")
pdf.add_font("NotoSans", "I", "NotoSans-Italic.ttf")
pdf.add_font("NotoSans", "", "NotoSans-Regular.ttf")

pdf.set_font("NotoSans", "", 12)
pdf.single_resources_object = False

pdf.add_page()
pdf.multi_cell(w=pdf.epw, text="**Text in bold**", markdown=True)

pdf.add_page()
pdf.multi_cell(w=pdf.epw, text="__Text in italic__", markdown=True)

pdf.output("test1.pdf")

The page 1 will have "F1" and "F4" in the resources dictionary, and page 2 will have "F3" and "F4".
"F2" is added on the final document and not referenced at all. "F4" is added to the resource list because of set_font() although not used.

I have documents with many fallback fonts and there is a considerable amount of unused font data added to the documents.

Might be something to tackle in a future PR.

@andersonhc andersonhc self-requested a review March 10, 2024 11:30
@@ -25,6 +25,7 @@ This can also be enabled programmatically with `warnings.simplefilter('default',
* [`FPDF.write_html()`](https://py-pdf.github.io/fpdf2/fpdf/fpdf.html#fpdf.fpdf.FPDF.write_html) now accepts a `tag_indents` parameter to control, for example, the indent of `<blockquote>` elements
* allow to define custom `cell_fill_mode` logic for tables: [_Set cells background_ - documentation section](https://py-pdf.github.io/fpdf2/Tables.html#set-cells-background). Also added 2 new values: `TableCellFillMode.EVEN_ROWS` & `TableCellFillMode.EVEN_COLUMNS`: [documentation](https://py-pdf.github.io/fpdf2/fpdf/enums.html#fpdf.enums.TableCellFillMode)
### Fixed
* individual `/Resources` directory are now properly produced for **each** document page. This causes a slight increase of PDF documents, but makes their structure more valid regarding the PDF spec. You can still use the old behaviour by setting `FPDF().single_resources_object = True`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion to make it clearer the increase is the size of the documents:

  • individual /Resources directories are now properly created for each document page. This change ensures better compliance with the PDF specification but results in a slight increase in the size of PDF documents. You can still use the old behavior by setting FPDF().single_resources_object = True.

Copy link
Collaborator

@andersonhc andersonhc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have anything else to add. The changes are simple and clear.
I didn't have time to visually inspect all the changed reference files but I did a few and everything looks OK.

page_img_objs_per_index,
page_gfxstate_objs_per_name,
)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestions for future improvement:

If multiple pages use the exact same resources, they can make reference to the same resource dictionary object.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants