From 6e7ec0cbd5577ab7effe647db652c2e60dce7b68 Mon Sep 17 00:00:00 2001
From: Martin Thoma <info@martin-thoma.de>
Date: Tue, 20 Dec 2022 23:39:59 +0100
Subject: [PATCH] DOC: How to read PDFs from S3

---
 docs/user/streaming-data.md | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/docs/user/streaming-data.md b/docs/user/streaming-data.md
index 3cfa5c315..78f960039 100644
--- a/docs/user/streaming-data.md
+++ b/docs/user/streaming-data.md
@@ -53,3 +53,24 @@ with BytesIO() as bytes_stream:
         Body=bytes_stream, RequestRoute=request_route, RequestToken=request_token
     )
 ```
+
+## Reading PDFs directly from cloud services
+
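+One option is to first download the file and then pass the local file path to `PdfReader`.
+Another option is to read the file into an in-memory byte stream and pass that to `PdfReader`, which avoids writing a temporary file to disk.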
+
+For AWS S3, the byte-stream approach looks like this:
+
+```python
+from io import BytesIO
+
+import boto3
+from PyPDF2 import PdfReader
+
+
+s3 = boto3.client("s3")
+obj = s3.get_object(Bucket="my-bucket", Key="my/doc.pdf")
+reader = PdfReader(BytesIO(obj["Body"].read()))
+```
+
+It works similarly for Google Cloud Storage (see this [example](https://stackoverflow.com/a/68403628/562769)).