Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent usage of warnings #2354

Closed
stefan6419846 opened this issue Dec 20, 2023 · 5 comments · Fixed by #2377
Closed

Inconsistent usage of warnings #2354

stefan6419846 opened this issue Dec 20, 2023 · 5 comments · Fixed by #2377
Labels
is-maintenance Anything that is just internal: Simplifying code, syntax changes, updating docs, speed improvements
Milestone

Comments

@stefan6419846
Copy link
Collaborator

warnings.warn is currently not being used in a consistent manner:

  • There is pypdf._utils.deprecate and pypdf._utils.deprecation, but some places use plain warnings.warn with the DeprecationWarning category.
    • pypdf/pypdf/_writer.py

      Lines 274 to 278 in 908797f

      warnings.warn(
      "The parameter 'ido' is depreciated and will be removed in "
      "pypdf 4.0.0.",
      DeprecationWarning,
      )
    • pypdf/pypdf/_writer.py

      Lines 1286 to 1290 in 908797f

      warnings.warn(
      "Please use 'user_password' instead of 'user_pwd'. "
      "The 'user_pwd' argument is deprecated and "
      "will be removed in pypdf 4.0.0."
      )
    • pypdf/pypdf/_writer.py

      Lines 1305 to 1311 in 908797f

      warnings.warn(
      message=(
      f"{old_term} is deprecated as an argument and will be "
      f"removed in pypdf 4.0.0. Use {new_term} instead"
      ),
      category=DeprecationWarning,
      )
    • pypdf/pypdf/_writer.py

      Lines 1745 to 1751 in 908797f

      warnings.warn(
      message=(
      f"{old_term} is deprecated as an argument and will be "
      f"removed in pypdf 4.0.0. Use {new_term} instead"
      ),
      category=DeprecationWarning,
      )
    • pypdf/pypdf/_writer.py

      Lines 2038 to 2044 in 908797f

      warnings.warn(
      message=(
      f"{old_term} is deprecated as an argument and will be "
      f"removed in pypdf 4.0.0. Use {new_term} instead"
      ),
      category=DeprecationWarning,
      )
    • pypdf/pypdf/_writer.py

      Lines 2083 to 2089 in 908797f

      warnings.warn(
      message=(
      f"{old_term} is deprecated as an argument and will be "
      f"removed in pypdf 4.0.0. Use {new_term} instead"
      ),
      category=DeprecationWarning,
      )
    • pypdf/pypdf/_writer.py

      Lines 2321 to 2325 in 908797f

      warnings.warn(
      "The 'ignore_byte_string_object' argument of remove_images is "
      "deprecated and will be removed in pypdf 4.0.0.",
      category=DeprecationWarning,
      )
    • pypdf/pypdf/_writer.py

      Lines 2363 to 2367 in 908797f

      warnings.warn(
      "The 'ignore_byte_string_object' argument of remove_images is "
      "deprecated and will be removed in pypdf 4.0.0.",
      category=DeprecationWarning,
      )
    • pypdf/pypdf/_writer.py

      Lines 2405 to 2409 in 908797f

      warnings.warn(
      "The 'pagenum' argument of add_uri is deprecated and will be "
      "removed in pypdf 4.0.0. Use 'page_number' instead.",
      category=DeprecationWarning,
      )
    • pypdf/pypdf/_page.py

      Lines 361 to 367 in 908797f

      warnings.warn(
      (
      "indirect_ref is deprecated and will be removed in "
      "pypdf 4.0.0. Use indirect_reference instead of indirect_ref."
      ),
      DeprecationWarning,
      )
    • pypdf/pypdf/_page.py

      Lines 375 to 381 in 908797f

      warnings.warn(
      (
      "indirect_ref is deprecated and will be removed in pypdf 4.0.0"
      "Use indirect_reference instead of indirect_ref."
      ),
      DeprecationWarning,
      )
    • pypdf/pypdf/_page.py

      Lines 2281 to 2284 in 908797f

      warnings.warn(
      "parameters Tj_Sep, TJ_sep depreciated, and will be removed in pypdf 4.0.0.",
      DeprecationWarning,
      )
    • pypdf/pypdf/_merger.py

      Lines 163 to 169 in 908797f

      warnings.warn(
      (
      f"{old_term} is deprecated as an argument and will be "
      f"removed in pypdf=4.0.0. Use {new_term} instead"
      ),
      DeprecationWarning,
      )
    • pypdf/pypdf/_merger.py

      Lines 702 to 708 in 908797f

      warnings.warn(
      (
      f"{old_term} is deprecated as an argument and will be "
      f"removed in pypdf==4.0.0. Use {new_term} instead"
      ),
      DeprecationWarning,
      )
    • pypdf/pypdf/_merger.py

      Lines 809 to 815 in 908797f

      warnings.warn(
      (
      f"{old_term} is deprecated as an argument and will be "
      f"removed in pypdf==4.0.0. Use {new_term} instead"
      ),
      DeprecationWarning,
      )
  • There are some plain warnings.warn with a different category. I am not sure whether they are eligble for converting them to logger_warning instead as there is not much which can be done on the user side anyway (besides submitting a PR to implement it ;)):

I would like to see this harmonized and I am open to submit a corresponding PR for this.

@MartinThoma
Copy link
Member

My suggestion would be to wait two more weeks (until next year). Then we will remove quite a bit of deprecated code.

This will make it easier to find inconsistencies - which we should fix before the release!
I'll mark this issue so that we tackle this in the pypdf==4.0.0 release :-)

@MartinThoma MartinThoma added this to the pypdf==4.0.0 milestone Dec 20, 2023
@MartinThoma MartinThoma added the is-maintenance Anything that is just internal: Simplifying code, syntax changes, updating docs, speed improvements label Dec 20, 2023
@MartinThoma
Copy link
Member

MartinThoma commented Dec 25, 2023

I've just merged #2367

The remaining warnings are:

Mainly the deprecate / deprecate_with_replacement, but also Advanced encoding {enc} not implemented yet

@MartinThoma
Copy link
Member

The Advanced encoding {enc} not implemented yet is not avoidable for the user, so warnings.warn is wrong (see docs).

@stefan6419846 In #2356 you mention that warning. Do you think an Exception would have been better? I guess the effect was that a part of the text was not extracted, right?

@stefan6419846
Copy link
Collaborator Author

I personally do not have a strong opinion about whether this should be an exception or not. For my use case (basically identifying empty PDF files), warnings.warn or logger_warning are indeed more suitable than an error, as I usually have Latin texts only and the other cases tend to be invalid (spam, wrong OCR) anyway, but yes, the extracted text probably is incomplete.

The use-case or criticality of the error most likely differs for other locales and use-cases, although I would argue that warnings.warn or logger_warning are sufficient to at least extract the extractable text, while indicating that something might be wrong/incomplete indeed.

@MartinThoma
Copy link
Member

Maybe that would even be a good case for logging.Error:

  • Avoiding an exception, as we might have extracted a part of the important stuff
  • Indicating (maybe with a different message) that there might be something missing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
is-maintenance Anything that is just internal: Simplifying code, syntax changes, updating docs, speed improvements
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants