Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[storage-blob] processing downloadBlockBlobResponse.readableStreamBody stucks/hangs #16987

Closed
2 of 6 tasks
AyushG25 opened this issue Aug 19, 2021 · 9 comments
Closed
2 of 6 tasks
Assignees
Labels
Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. needs-team-attention This issue needs attention from Azure service team or SDK team question The issue doesn't require a change to the product in order to be resolved. Most issues start as that Storage Storage Service (Queues, Blobs, Files)

Comments

@AyushG25
Copy link

AyushG25 commented Aug 19, 2021

  • Package Name: @azure/storage-blob
  • Package Version: 12.6.0
  • Operating system: Linux
  • nodejs
    • version: 14.0.0
  • browser
    • name/version:
  • typescript
    • version: 4.2.4
  • Is the bug related to documentation in

Describe the bug
Trying to process a CSV file stored in azure blob storage of size 38MB (1.5-2 million rows) using download method and downloadBlockBlobResponse.readableStreamBody. My code pipes the readableStreamBody to a csvParser and then using for await loop I try to process the stream row by row. In processing, I do some validations and DB operations. The processing of the file stops in between. Checked azure insights found error wrt GetBlobProperties, attaching screenshot for the same. Not sure why GetBlobProperties was called so many times.
Am I putting unnecessary load on BLOB storage with my below code

public async downloadFileAsReadableStream(fullFileName: string): Promise<NodeJS.ReadableStream | undefined> {
            const blobServiceClient = BlobServiceClient.fromConnectionString(connectionString);
            this.containerClient = blobServiceClient.getContainerClient(containerName);
            logger.info(`Start reading file: ${fullFileName} from container: ${this.containerName}`);
            const downloadBlockBlobResponse = await this.containerClient
                .getBlockBlobClient(fullFileName)
                .download();
            return downloadBlockBlobResponse.readableStreamBody;
    }

const processStream = async (readableStream: NodeJS.ReadableStream) => {
    try {
        console.time("File processing completed in");
        const parser = readableStream.pipe(csvParser());
        for await (const row of parser) {
                const dbRecord = await dbOperation(row);
        }
        await onStreamEnd();
    } catch (err) {
        logger.error(err);
    }
}

To Reproduce

Expected behavior
The processing of BLOB as readable stream should reach file end

Screenshots
image

Additional context
Add any other context about the problem here.

@ghost ghost added needs-triage This is a new issue that needs to be triaged to the appropriate team. customer-reported Issues that are reported by GitHub users external to the Azure organization. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels Aug 19, 2021
@ramya-rao-a ramya-rao-a added Client This issue points to a problem in the data-plane of the library. Storage Storage Service (Queues, Blobs, Files) labels Aug 19, 2021
@ghost ghost removed the needs-triage This is a new issue that needs to be triaged to the appropriate team. label Aug 19, 2021
@EmmaZhu
Copy link
Contributor

EmmaZhu commented Aug 22, 2021

The download doesn't send "GetBlobProperties" requests. Maybe storage service can find more details to help to identify how sent these requests?

Thanks
Emma

@AyushG25
Copy link
Author

@EmmaZhu ok will check GetBlobProperties issue. Can you see why the code gets stuck while processing the blob as a stream using the code shared above ?

@EmmaZhu
Copy link
Contributor

EmmaZhu commented Aug 23, 2021

Hi @AyushG25

I cannot see why it stuck with the code. Are you able to get a stack to see where it stucks?

Thanks
Emma

@francescorivola
Copy link

francescorivola commented Sep 6, 2021

Hi @AyushG25 and @EmmaZhu

We have also experienced the same exact issue in production right after migrate to Node.js 14.17.3 from Node.js 12.13.0.

@azure/storage-blob: 12.3.0

I have been investigated a bit and I believe the issue comes from a bug with the internal lib used to perform the download: node-fetch and changes in Node.js 14 streams behavior.

For some reason the close and error event is not called and the application just exit. If your application have got more timers the stream will stuck/hang as you described.

A similar issue has been fixed in node-fetch v3 node-fetch/node-fetch#1219. Still not ported in v2.

If I test the same download process with just node-fetch I have got the same result as if I test with azure-storage blob download:

Test with node-fetch 2.6.1, no event close emitted at all (process just exit)

STREAM 26462: pause Download progress: 41.62% - chunks: 249 TIMER 26462: no 2500 list was found in insert, creating a new one STREAM 26462: emitReadable_ false 0 false STREAM 26462: flow false TIMER 26462: process timer lists 623771 TIMER 26462: timeout callback 2500 TIMER 26462: 2500 list empty STREAM 26462: resume STREAM 26462: resume true STREAM 26462: flow true STREAM 26462: read undefined STREAM 26462: need readable true STREAM 26462: length less than watermark true STREAM 26462: reading or ended false Process beforeExit event with code: 0 About to exit with code: 0

Test with node-fetch 3.0.0-beta10, close and error events are emitted properly

STREAM 56461: reading or ended false NET 56461: emit close HTTP 56461: CLIENT socket onClose HTTP 56461: removeSocket xxxxxx.blob.core.windows.net:443::::::::::::::::::::: writable: false HTTP 56461: HTTP socket close STREAM 56461: read undefined STREAM 56461: need readable true STREAM 56461: length less than watermark true STREAM 56461: reading or ended false STREAM 56461: onerror Error [ERR_STREAM_PREMATURE_CLOSE]: Premature close at IncomingMessage.onclose (internal/streams/end-of-stream.js:117:38) at IncomingMessage.emit (events.js:375:28) at TLSSocket.socketCloseListener (_http_client.js:432:11) at TLSSocket.emit (events.js:387:35) at net.js:675:12 at TCP.done (_tls_wrap.js:563:7) { code: 'ERR_STREAM_PREMATURE_CLOSE' } STREAM 56461: unpipe STREAM 56461: call pause flowing=true STREAM 56461: pause STREAM 56461: onunpipe STREAM 56461: cleanup Error [ERR_STREAM_PREMATURE_CLOSE]: Premature close at IncomingMessage.onclose (internal/streams/end-of-stream.js:117:38) at IncomingMessage.emit (events.js:375:28) at TLSSocket.socketCloseListener (_http_client.js:432:11) at TLSSocket.emit (events.js:387:35) at net.js:675:12 at TCP.done (_tls_wrap.js:563:7) { code: 'ERR_STREAM_PREMATURE_CLOSE' } close TIMER 56461: process timer lists 713980 TIMER 56461: timeout callback 2500 TIMER 56461: 2500 list empty STREAM 56461: resume STREAM 56461: resume false STREAM 56461: read 0 STREAM 56461: read: emitReadable 16384 false STREAM 56461: emitReadable true false STREAM 56461: emitReadable true STREAM 56461: flow true STREAM 56461: read undefined STREAM 56461: need readable false STREAM 56461: length less than watermark true STREAM 56461: reading or ended false STREAM 56461: call pause flowing=true STREAM 56461: pause Download progress: 47.65% - chunks: 286 TIMER 56461: no 2500 list was found in insert, creating a new one STREAM 56461: emitReadable_ true 0 false STREAM 56461: flow false TIMER 56461: process timer lists 716481 TIMER 56461: timeout callback 2500 TIMER 56461: 2500 list empty STREAM 56461: resume STREAM 56461: resume false STREAM 56461: read 0 STREAM 56461: need readable true STREAM 56461: length less than watermark true STREAM 56461: reading or ended false STREAM 56461: flow true STREAM 56461: read undefined STREAM 56461: need readable true STREAM 56461: length less than watermark true STREAM 56461: reading or ended false STREAM 56461: read 0 STREAM 56461: need readable true STREAM 56461: length less than watermark true STREAM 56461: reading or ended false Process beforeExit event with code: 0 About to exit with code: 0

The issue cannot be reproduced in Node.js 12 with the same version of the lib. Sometimes the download fails but at least get notified and the stream does not stuck.

Hope this info can help.

@ramya-rao-a ramya-rao-a added the needs-team-attention This issue needs attention from Azure service team or SDK team label Sep 22, 2021
@guanzo
Copy link

guanzo commented Sep 28, 2021

I'm seeing the exact same issue as described by AyushG25. My app just starts hanging after a while because the stream never completes or errors. It's just waiting indefinitely.

Possibly related to #9929 because I'm downloading 10 blobs concurrently.

@EmmaZhu
Copy link
Contributor

EmmaZhu commented Jan 29, 2022

I tried to repro the issue with downloading a 14MB blob with following code:

    result.readableStreamBody!.pipe(new Stream.Writable({
      write(chunk, encoding, callback) {
        chunk;
        encoding;
        console.log('got data evt')
        callback();
      }
    }));

Tried several times, but didn't encounter a hang.

@AyushG25 , did you try with upgrade node-fetch? Can this mitigate your issue?

Thanks
Emma

@EmmaZhu EmmaZhu added the needs-author-feedback More information is needed from author to address the issue. label Jan 29, 2022
@ghost ghost removed the needs-team-attention This issue needs attention from Azure service team or SDK team label Jan 29, 2022
@AyushG25
Copy link
Author

We changed our approach instead of streaming data directly from blob storage, now we are downloading the file and then processing it. We could not figure out the RCA.

@ghost ghost added needs-team-attention This issue needs attention from Azure service team or SDK team and removed needs-author-feedback More information is needed from author to address the issue. labels Jan 30, 2022
@EmmaZhu
Copy link
Contributor

EmmaZhu commented Feb 7, 2022

I'll try to repro the issue again. Did the issue happen consistently or randomly?

@EmmaZhu
Copy link
Contributor

EmmaZhu commented Feb 10, 2022

I tried to repro the hang with Node 14.17.3 and typescript 4.2.4, but still cannot get a hang.

@AyushG25 , I'll close this issue for now. If you'd need further assistant, please feel free to reopen it.

@EmmaZhu EmmaZhu closed this as completed Feb 10, 2022
@github-actions github-actions bot locked and limited conversation to collaborators Apr 11, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. needs-team-attention This issue needs attention from Azure service team or SDK team question The issue doesn't require a change to the product in order to be resolved. Most issues start as that Storage Storage Service (Queues, Blobs, Files)
Projects
None yet
Development

No branches or pull requests

5 participants