DOC: Slightly improve wording and three examples (#2634)
* DOC: Slightly improve wording and three examples

* DOC: Slightly improve wording and three examples

Also change get_pages_using_field to get_pages_showing_field.

* DOC: Slightly improve wording and three examples

Revert wording back to "id est".
j-t-1 committed May 10, 2024
1 parent a435eaa commit 32f826b
Showing 4 changed files with 8 additions and 9 deletions.
2 changes: 1 addition & 1 deletion docs/user/extract-text.md
@@ -85,7 +85,7 @@ parts = []

 def visitor_body(text, cm, tm, font_dict, font_size):
     y = cm[5]
-    if y > 50 and y < 720:
+    if 50 < y < 720:
         parts.append(text)
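
The change in this hunk swaps a two-sided `and` test for a Python chained comparison; both select the same values. As a minimal sketch of how such a visitor filters text, the block below calls `visitor_body` by hand instead of reading a PDF. The sample fragments and vertical offsets are made-up assumptions; with pypdf the function would instead be passed as `page.extract_text(visitor_text=visitor_body)`.

```python
# Collected text fragments that fall inside the vertical band.
parts = []


def visitor_body(text, cm, tm, font_dict, font_size):
    # cm is a 6-element transformation matrix; index 5 is the vertical offset.
    y = cm[5]
    # Chained comparison: same result as `y > 50 and y < 720`.
    if 50 < y < 720:
        parts.append(text)


# Hypothetical fragments as (text, vertical offset); not taken from a real PDF.
samples = [("footer", 30), ("body line", 400), ("header", 750), ("edge", 50)]
for text, y in samples:
    cm = [1, 0, 0, 1, 0, y]  # identity matrix shifted vertically by y
    visitor_body(text, cm, cm, None, None)

print(parts)  # ['body line']
```

The bounds are exclusive, so a fragment sitting exactly at 50 or 720 is dropped.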


9 changes: 4 additions & 5 deletions docs/user/forms.md
@@ -32,7 +32,6 @@ writer.update_page_form_field_values(
     auto_regenerate=False,
 )

-# write "output" to pypdf-output.pdf
 with open("filled-out.pdf", "wb") as output_stream:
     writer.write(output_stream)
```
@@ -94,14 +93,14 @@ for page in reader.pages:

However, while similar, there are some very important differences between the two above blocks of code. Most importantly, the first block will return a list of Field objects, whereas the second will return more generic dictionary-like objects. The objects lists will *mostly* reference the same object in the underlying PDF, meaning you'll find that `obj_taken_fom_first_list.indirect_reference == obj_taken_from _second_list.indirect_reference`. Field objects are generally more ergonomic, as the exposed data can be accessed via clearly named properties. However, the more generic dictionary-like objects will contain data that the Field object does not expose, such as the Rect (the widget's position on the page). Therefore the correct approach depends on your use case.

-However, it's also important to note that the two lists do not *always* refer to the same underlying PDF object. For example, if the form contains radio buttons, you will find that `reader.get_fields()` will get the parent object (the group of radio buttons) whereas `page.annotations` will return all the child objects (the individual radio buttons).
+However, it is also important to note that the two lists do not *always* refer to the same underlying PDF object. For example, if the form contains radio buttons, you will find that `reader.get_fields()` will get the parent object (the group of radio buttons) whereas `page.annotations` will return all the child objects (the individual radio buttons).

-__Caution: Remember that fields are not stored in pages: If you use `add_page()` the field structure is not copied. It is recommended to use `.append()` with the proper parameters instead.__
+__Caution: Remember that fields are not stored in pages; if you use `add_page()` the field structure is not copied. It is recommended to use `.append()` with the proper parameters instead.__

-In case of missing _field_ objects in `/Fields`, `writer.reattach_fields()` will parse page(s) annotations and will reattach them. This fix can not guess intermediate fields and will not report fields using the same _name_.
+In case of missing _field_ objects in `/Fields`, `writer.reattach_fields()` will parse page(s) annotations and will reattach them. This fix cannot guess intermediate fields and will not report fields using the same _name_.

## Identify pages where fields are used

-On order to ease locating page fields you can use `page.get_pages_using_field`. This methods accepts a field object, id est a *PdfObject* that represents a field (as are extracted from `_root_object["/AcroForm"]["/Fields"]`. The method returns a list of pages, because a field can have multiple widgets as mentioned previously (e.g. radio buttons or text displayed on multiple pages).
+In order to ease locating page fields you can use `get_pages_showing_field` of PdfReader or PdfWriter. This method accepts a field object, a *PdfObject* that represents a field (as extracted from `_root_object["/AcroForm"]["/Fields"]`). The method returns a list of pages, because a field can have multiple widgets as mentioned previously (e.g. radio buttons or text displayed on multiple pages).

The page numbers can then be retrieved as usual by using `page.page_number`.
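
As a rough illustration of why the result is a list, the sketch below models one field with several widgets using plain dictionaries. It does not use pypdf objects, and the names `field`, `widgets`, and `pages_showing` are hypothetical; it only mirrors the idea that each widget sits on one page while the field owns all of them.

```python
# Hypothetical, simplified model: one field whose widgets are spread over
# several pages. These are plain dicts, not pypdf objects.
field = {
    "name": "choice",
    "widgets": [
        {"value": "/Yes", "page_number": 0},
        {"value": "/No", "page_number": 0},
        {"value": "/Maybe", "page_number": 2},
    ],
}


def pages_showing(field):
    """Return the sorted page numbers on which any widget of the field appears."""
    return sorted({widget["page_number"] for widget in field["widgets"]})


print(pages_showing(field))  # [0, 2]
```

Two widgets share page 0, so that page is reported once.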
2 changes: 1 addition & 1 deletion docs/user/merging-pdfs.md
Expand Up @@ -67,9 +67,9 @@ If you want to insert pages in the middle of the destination, use `merge` (which
You can insert the same page multiple times, if necessary even using a list-based syntax:

```python
+# Insert pages 2 and 3, with page 1 before, between, and after
 writer.append(reader, [0, 1, 0, 2, 0])
 ```
-will insert the pages 1 and 2 with page 0 before, in the middle and after.
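
To make the index list concrete, here is a plain-Python model of the selection. It does not call pypdf, and the three-element `source_pages` list is a made-up stand-in for the reader's pages. The indices are zero-based, so `0` refers to the first page, which is why the new comment speaks of pages 1, 2 and 3.

```python
# Stand-in for the pages of a three-page source document (hypothetical labels).
source_pages = ["page 1", "page 2", "page 3"]

# Same index list as in the writer.append(reader, [0, 1, 0, 2, 0]) call above.
selection = [0, 1, 0, 2, 0]

# Each index picks one source page; repeating an index repeats that page.
appended = [source_pages[i] for i in selection]

print(appended)
# ['page 1', 'page 2', 'page 1', 'page 3', 'page 1']
```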

## add_page / insert_page

4 changes: 2 additions & 2 deletions docs/user/viewer-preferences.md
@@ -4,7 +4,7 @@ It is possible to set viewer preferences of a PDF file.
These properties are described in Section 12.2 of the [PDF 1.7 specification](https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf).

Note that the `/ViewerPreferences` dictionary does not exist by default.
-If it's not already present, it must be created by calling the `create_viewer_preferences` method
+If it is not already present, it must be created by calling the `create_viewer_preferences` method
of the `PdfWriter` object.

If viewer preferences exist in a PDF file being read with `PdfReader`,
@@ -79,5 +79,5 @@ with open("output.pdf", "wb") as output_stream:
```

The names beginning with a slash character are part of the PDF file format. They are
-included here to aid to anyone searching pypdf documentation
+included here to ease searching the pypdf documentation
for these names from the PDF specification.
