
over 5min simple_file #26

Open
kazuhitogo opened this issue Mar 2, 2021 · 3 comments
Labels
question Further information is requested

Comments

@kazuhitogo

I can't figure out how to keep a session open for more than 5 minutes using HTTP/2. The documentation describes how to specify an expiration time for WebSocket, but not for HTTP/2. Is there any way to keep a session connected for more than 5 minutes with this SDK?

@nateprewitt
Contributor

Hi @kazuhitogo,

Could you clarify what you mean by "maintaining a session"? If the signatures are becoming invalid after 5 minutes I believe you're hitting the issue noted in the quickstart section:

 # NOTE: For pre-recorded files longer than 5 minutes, the sent audio
 # chunks should be rate limited to match the realtime bitrate of the
 # audio stream to avoid signing issues.

The Transcribe Streaming API is meant for realtime audio and processes it at that rate. If you're streaming prerecorded audio that's longer than 5 minutes, it needs to be rate limited so that it is sent at close to realtime speed. Otherwise, the payloads are signed in the client and then can't be processed by the service until after their signatures have expired.
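
To make that failure mode concrete, here is a small, simplified timing model. It is an illustration under stated assumptions, not the service's actual validation logic: each audio event is signed at the moment the client sends it, the service reaches a chunk at roughly that chunk's offset in the audio, and a signature counts as stale once it is older than 5 minutes. The names `SIGNATURE_WINDOW_S`, `signature_age_when_consumed`, and `first_stale_offset` are hypothetical.

```python
# Simplified timing model, for illustration only (assumptions, not the
# service's actual validation logic):
#   * each audio event is signed at the moment the client sends it;
#   * the service reaches a chunk at roughly that chunk's offset in the audio;
#   * a signature is stale once it is older than a 5-minute window.
from typing import Callable, Optional

SIGNATURE_WINDOW_S = 5 * 60  # assumed window, matching the 5-minute limit discussed here


def signature_age_when_consumed(audio_offset_s: float, send_time_s: float) -> float:
    """Age of a chunk's signature when the service reaches it, in seconds."""
    return max(0.0, audio_offset_s - send_time_s)


def first_stale_offset(
    duration_s: float,
    chunk_s: float,
    send_time: Callable[[float], float],
) -> Optional[float]:
    """Audio offset of the first chunk whose signature would be stale, else None.

    `send_time` maps a chunk's audio offset to the wall-clock time (seconds
    since the stream started) at which the client signs and sends it.
    """
    for i in range(int(duration_s / chunk_s)):
        offset = i * chunk_s
        if signature_age_when_consumed(offset, send_time(offset)) > SIGNATURE_WINDOW_S:
            return offset
    return None


# Unthrottled: every chunk is signed almost immediately (send time ~ 0).
print(first_stale_offset(600, 0.25, lambda offset: 0.0))     # 300.25
# Rate limited to realtime: each chunk is sent when its audio is due.
print(first_stale_offset(600, 0.25, lambda offset: offset))  # None
```

In this model, sending a 10-minute file as fast as possible leaves everything past the 5-minute mark with an expired signature, while pacing the sends keeps every signature fresh regardless of file length.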

@nateprewitt nateprewitt added the question Further information is requested label Apr 16, 2021
@bjnord

bjnord commented Apr 28, 2021

There's some code in one of the unit tests that shows how to rate-limit. I got it working as shown below (it reads the whole WAV into memory, which is only practical for relatively short files). The only trouble is that the transcript results seem to get "stuck" every 15 seconds or so: they stop coming, then resume after a bit, but a string of words is missing in the meantime.

--- simple_file.py.orig 2021-04-28 09:45:32.000000000 -0500
+++ simple_file.py      2021-04-28 10:03:11.000000000 -0500
@@ -35,14 +35,18 @@
     )
 
     async def write_chunks():
-        # An example file can be found at tests/integration/assets/test.wav
-        # NOTE: For pre-recorded files longer than 5 minutes, the sent audio
-        # chunks should be rate limited to match the realtime bitrate of the
-        # audio stream to avoid signing issues.
-        async with aiofile.AIOFile('tests/integration/assets/test.wav', 'rb') as afp:
-            reader = aiofile.Reader(afp, chunk_size=1024 * 16)
-            async for chunk in reader:
-                await stream.input_stream.send_audio_event(audio_chunk=chunk)
+        with open('tests/integration/assets/5min-test.wav', 'rb') as f:
+            raw_bytes = f.read()
+        # This simulates reading bytes from some asynchronous source
+        # This could be coming from an async file, microphone, etc
+        async def byte_generator():
+            # 4000 bytes = 1/4 second of 8kHz, mono, 16-bit PCM
+            chunk_size = 4000
+            for i in range(0, len(raw_bytes), chunk_size):
+                yield raw_bytes[i : i + chunk_size]
+                await asyncio.sleep(0.25)
+        async for chunk in byte_generator():
+            await stream.input_stream.send_audio_event(audio_chunk=chunk)
         await stream.input_stream.end_stream()
 
     # Instantiate our handler and start processing events
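
The same pacing can be sketched without holding the whole WAV in memory. The helper `paced_wav_chunks` below is hypothetical and assumes an uncompressed PCM WAV file readable by Python's standard `wave` module. It derives the chunk size from the file's own sample rate instead of hard-coding 4000 bytes, and it schedules each chunk against a fixed start time, so it is never sent faster than realtime and consumer latency does not accumulate as extra delay. It does not address the gaps in the transcript results described above.

```python
import asyncio
import wave
from typing import AsyncIterator


async def paced_wav_chunks(path: str, chunk_seconds: float = 0.25) -> AsyncIterator[bytes]:
    """Yield raw PCM frames from a WAV file, no faster than realtime.

    The file is read one chunk at a time rather than loaded into memory.
    Each chunk is scheduled against a fixed start time, so time spent by
    the consumer between chunks does not add up to extra delay.
    Note: wave performs blocking file I/O; that is acceptable for a sketch
    but could stall the event loop on slow storage.
    """
    loop = asyncio.get_running_loop()
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        frames_per_chunk = max(1, int(rate * chunk_seconds))
        chunk_duration = frames_per_chunk / rate  # seconds of audio per chunk
        start = loop.time()
        sent = 0
        while True:
            data = wav.readframes(frames_per_chunk)
            if not data:
                break
            yield data
            sent += 1
            # Wait until the audio already sent would have played out in realtime.
            delay = start + sent * chunk_duration - loop.time()
            if delay > 0:
                await asyncio.sleep(delay)
```

For 8 kHz, mono, 16-bit audio and the default `chunk_seconds=0.25`, each chunk is 2000 frames, i.e. the same 4000 bytes used in the diff. Inside `write_chunks()`, the loop would become `async for chunk in paced_wav_chunks('tests/integration/assets/5min-test.wav'):` with the same `send_audio_event(audio_chunk=chunk)` call in its body, followed by `end_stream()`.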

@kazuhitogo
Author

@nateprewitt

Thanks for the response, and sorry for the slow reply.
Yes, I had seen that note and was wondering how to handle it. According to the documentation, for WebSocket the signature expiration can be set with the X-Amz-Expires parameter, and I wanted to know how to do the same with this Python SDK.
https://docs.aws.amazon.com/transcribe/latest/dg/websocket.html

Am I correct in assuming that this is not possible?

@bjnord

Thank you.
Is this code meant to keep the signature valid by sending smaller chunks and sleeping after each one?
