DOC: How to read PDFs from S3 (#1509)
MartinThoma committed Dec 20, 2022
1 parent e5e26ad commit 3fb9b69
Changed file: `docs/user/streaming-data.md` (21 additions, 0 deletions)
@@ -53,3 +53,24 @@ with BytesIO() as bytes_stream:
Body=bytes_stream, RequestRoute=request_route, RequestToken=request_token
)
```

## Reading PDFs directly from cloud services

One option is to first download the file and then pass the local file path to `PdfReader`.
Another option is to read the object into memory as a byte stream and pass that stream to `PdfReader`.

For AWS S3, the byte-stream approach works like this:

```python
from io import BytesIO

import boto3
from PyPDF2 import PdfReader


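s3 = boto3.client("s3")
# get_object() only needs the bucket and key; the PDF bytes are in obj["Body"]
obj = s3.get_object(Bucket="my-bucket", Key="my/doc.pdf")
# Read the streaming body into memory and wrap it in a seekable BytesIO,
# which PdfReader can navigate
reader = PdfReader(BytesIO(obj["Body"].read()))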
```
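For the first option, downloading to a local file, here is a minimal sketch. It assumes a boto3 S3 client (`boto3.client("s3")`) and uses its `download_file(Bucket, Key, Filename)` method. The helper name `download_to_temp_pdf` is hypothetical, and the client is passed in as a parameter so the function does not create one itself.

```python
import os
import tempfile


def download_to_temp_pdf(s3_client, bucket: str, key: str) -> str:
    """Download an S3 object to a temporary local file and return the file path.

    Hypothetical helper. `s3_client` is assumed to be a boto3 S3 client,
    i.e. the result of boto3.client("s3").
    """
    # mkstemp creates the file and returns an open OS-level handle; close it
    # so download_file can open the path itself
    fd, path = tempfile.mkstemp(suffix=".pdf")
    os.close(fd)
    try:
        s3_client.download_file(bucket, key, path)
    except Exception:
        # Do not leave an empty temp file behind if the download fails
        os.remove(path)
        raise
    return path
```

Usage, with the same placeholder bucket and key as the S3 example: `path = download_to_temp_pdf(boto3.client("s3"), "my-bucket", "my/doc.pdf")`, then `reader = PdfReader(path)`. The temporary file is not deleted automatically, so remove it with `os.remove(path)` once you no longer need it.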

The byte-stream approach works similarly for Google Cloud Storage ([example](https://stackoverflow.com/a/68403628/562769)).
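A comparable sketch for Google Cloud Storage, assuming the `google-cloud-storage` package: `Client.bucket()` and `Bucket.blob()` build references to the object, and `Blob.download_as_bytes()` fetches its contents. The helper name `gcs_pdf_stream` is hypothetical. As in the S3 example, the bytes are wrapped in a `BytesIO` so that `PdfReader` receives a seekable stream.

```python
from io import BytesIO


def gcs_pdf_stream(storage_client, bucket_name: str, blob_name: str) -> BytesIO:
    """Fetch a Google Cloud Storage object into memory as a seekable stream.

    Hypothetical helper. `storage_client` is assumed to be a
    google.cloud.storage.Client instance.
    """
    # bucket() and blob() only build references; no data is fetched yet
    blob = storage_client.bucket(bucket_name).blob(blob_name)
    # download_as_bytes() reads the whole object into memory
    return BytesIO(blob.download_as_bytes())
```

For example, after `from google.cloud import storage`, call `reader = PdfReader(gcs_pdf_stream(storage.Client(), "my-bucket", "my/doc.pdf"))`, where the bucket and object names are placeholders.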
