MAINT: Simplify file identifiers generation #2003
base: main
Codecov Report
Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2003      +/-   ##
==========================================
- Coverage   94.54%   94.52%   -0.02%
==========================================
  Files          43       43
  Lines        7549     7549
  Branches     1490     1491       +1
==========================================
- Hits         7137     7136       -1
  Misses        253      253
- Partials      159      160       +1

View full report in Codecov by Sentry.
        return ByteStringObject(_rolling_checksum(stream).encode("utf8"))
    def _compute_document_identifier(self) -> ByteStringObject:
        md5 = hashlib.md5()
        md5.update(str(time.time()).encode("utf-8"))
This makes document-generation non-deterministic, right?
What impact do the file identifiers have? Who/what makes use of them?
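To make the determinism question concrete, here is a minimal sketch, not the PR's code. `time_based_identifier` mirrors the two `md5` lines in the diff above; returning the hex digest is an assumption, since the diff does not show the return statement. `content_based_identifier` is a hypothetical stand-in for any approach that hashes the written bytes.

```python
import hashlib
import time


def time_based_identifier() -> bytes:
    # Mirrors the diff: the only input is the wall clock, so the result
    # changes from run to run. Returning the hex digest is an assumption.
    md5 = hashlib.md5()
    md5.update(str(time.time()).encode("utf-8"))
    return md5.hexdigest().encode("utf8")


def content_based_identifier(data: bytes) -> bytes:
    # Hypothetical alternative: the only input is the document bytes,
    # so identical input always yields an identical identifier.
    return hashlib.md5(data).hexdigest().encode("utf8")


if __name__ == "__main__":
    pdf_bytes = b"%PDF-1.7 example payload"
    # Same bytes in, same identifier out.
    print(content_based_identifier(pdf_bytes) == content_based_identifier(pdf_bytes))
    # Depends on when it is called.
    print(time_based_identifier())
```

With the time-based variant, writing the same document twice would generally produce two files that differ in their identifiers, which is the concern raised here.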
The PDF standard says:
The identifiers are also used for encryption. @MartinThoma, so I think it's OK to make it simple.
Having a deterministic way to generate PDFs is valuable to several developers. Does the current deterministic identifier generation cause any issues?
First of all, it costs too much for big PDF files.
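The cost argument can be illustrated with a minimal sketch of a content checksum. `chunked_md5` is a hypothetical helper written for this example, not necessarily how `_rolling_checksum` is implemented. Reading in blocks keeps memory bounded, but every byte of the file still passes through the hash, so the work grows linearly with the file size.

```python
import hashlib
import io
from typing import BinaryIO


def chunked_md5(stream: BinaryIO, blocksize: int = 65536) -> str:
    # Hypothetical helper: read fixed-size blocks so memory stays bounded.
    # Every byte is still hashed, so run time grows with the file size.
    digest = hashlib.md5()
    for block in iter(lambda: stream.read(blocksize), b""):
        digest.update(block)
    return digest.hexdigest()


if __name__ == "__main__":
    payload = io.BytesIO(b"x" * 1_000_000)
    print(chunked_md5(payload))
```

Hashing only the current time avoids that pass over the data, at the price of the determinism discussed above.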
9092a14 to 5fd1e91
See #2003 Co-authored-by: exiledkingcc <exiledkingcc@gmail.com>
@@ -1246,7 +1244,7 @@ def generate_file_identifiers(self) -> None:
            id2 = self._compute_document_identifier()
        else:
            id1 = self._compute_document_identifier()
            id2 = id1
            id2 = ByteStringObject(id1.original_bytes)
id1 is a ByteStringObject already. So .original_bytes just returns id1. Then wrapping it in ByteStringObject doesn't do anything, right?
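A minimal sketch of what the wrapping does at the Python level. `StandInByteString` is a hypothetical stand-in, not pypdf's `ByteStringObject`; it only assumes a `bytes` subclass whose `original_bytes` property returns the object itself, as described in this comment.

```python
class StandInByteString(bytes):
    # Hypothetical stand-in, NOT pypdf's ByteStringObject. It assumes, as the
    # comment above does, that original_bytes hands back the object itself.
    @property
    def original_bytes(self) -> bytes:
        return self


id1 = StandInByteString(b"0123456789abcdef")
id2 = StandInByteString(id1.original_bytes)

print(id1.original_bytes is id1)  # does the property return the same object?
print(id2 == id1)                 # do the two identifiers hold equal bytes?
print(id2 is id1)                 # are they one and the same object?
```

Under that assumption the wrapper changes object identity but not value: `id2` compares equal to `id1` while being a separate instance. Whether that difference matters depends on how the writer handles the two array entries, which this snippet does not show.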
        return ByteStringObject(_rolling_checksum(stream).encode("utf8"))
        md5 = hashlib.md5()
        md5.update(str(time.time()).encode("utf-8"))
        md5.update(str(self.fileobj).encode("utf-8"))
Is self.fileobj equivalent to self._write_pdf_structure(stream)?
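One way to look at this question, as a minimal sketch: it assumes `self.fileobj` may be a binary stream, which the diff does not confirm (if it is a path string, `str()` simply returns the path). In either case `str()` does not read the document's contents.

```python
import io

first = io.BytesIO(b"first document")

# str() on a stream yields its repr (type name and memory address in
# CPython), not the bytes it holds.
text = str(first)
print(text)
print("first document" in text)
```

So hashing `str(self.fileobj)` mixes in something that identifies the output target rather than the serialized PDF, which is a different input from hashing the written stream.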
No description provided.