fix: address data corruption when cache is slow
Something weird is going on here.  This SHOULD be fine as a simple
pipe(), but for some reason, backpressure from the cache stream can
cause the pipeline to drop the first chunk of data, resulting in invalid
JSON.  Until that is fixed, just write into the cache without any
backpressure.
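
For illustration, a minimal sketch of the two approaches with bare minipass streams (the variable names here are stand-ins rather than the actual cache.js wiring, and this assumes the minipass v3 export style in use at the time):

    const Minipass = require('minipass')

    const tee = new Minipass()
    const cacheStream = new Minipass()

    // Backpressure-respecting form: if cacheStream's write() returns false,
    // the pipe machinery pauses tee until cacheStream emits 'drain'.
    //   tee.pipe(cacheStream)

    // Workaround form: relay every chunk and discard write()'s boolean
    // return value, so a slow cache can only buffer in memory and never
    // pauses tee.
    tee.on('data', d => cacheStream.write(d))
    tee.on('end', () => cacheStream.end())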

The only hazard is that, if the fs is truly very slow, and the rest of
the consumption pipeline is very fast, then we'll back up into memory
and use more than we ought to, rather than pushing back on the incoming
stream.  However, this isn't likely to ever be a problem due to how npm
does HTTP.  Either it's fetching a JSON response, or a tarball (which is
also either unpacking to disk, or streaming directly to a tarball file
on disk).  So, if the disk is slow, and it's a tarball request, we're
likely to get backpressure from the main pipeline anyway.  It can only
become a problem if the JSON response is large enough to span multiple
chunks, and also the fs is loaded enough to start slowing down.  In the
JSON response case, we're going to load the whole thing in memory
anyway, so nothing is made particularly *worse* by this lack of
backpressure.
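
To sketch why the JSON case is no worse off (hypothetical consumer code, not the actual npm-registry-fetch implementation): the JSON path concatenates the entire body before parsing, so memory use is bounded by the full response size with or without backpressure on the cache side.

    const Minipass = require('minipass')

    const jsonBody = new Minipass()

    // The JSON consumer collects the whole body into one buffer before
    // JSON.parse(), so the full response is held in memory either way.
    jsonBody.concat().then(buf => {
      const parsed = JSON.parse(buf.toString())
      console.log(Object.keys(parsed))
    })

    // Response chunks arrive over multiple writes.
    jsonBody.write('{"name":"example",')
    jsonBody.end('"versions":{}}')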

It is possible that the root cause of this bug exists either in cacache,
minipass-pipeline, or minipass itself.  But since we don't do a
multi-pipe tee stream anywhere else in npm's stack, this is the only
spot where it can make itself known.
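
For reference, the tee shape in question looks roughly like this (illustrative names only, a sketch of the topology rather than a verified reproduction of the dropped-chunk bug):

    const Minipass = require('minipass')

    const tee = new Minipass()            // fed by the HTTP response body
    const cacheStream = new Minipass()    // stands in for cacache's write stream
    const responseBody = new Minipass()   // stands in for what the caller consumes

    // One source piped into two destinations.  A false return from either
    // destination's write() pauses tee, so neither destination sees more
    // data until that destination emits 'drain'.
    tee.pipe(cacheStream)
    tee.pipe(responseBody)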

Re: npm/npm-registry-fetch#23
isaacs committed May 4, 2020
1 parent 276f796 commit 7f896be
Showing 2 changed files with 38 additions and 4 deletions.
cache.js: 34 changes (33 additions, 1 deletion)
@@ -169,7 +169,39 @@ module.exports = class Cache {
ckey,
cacheOpts
)
tee.pipe(cacheStream)

// See: https://github.com/npm/npm-registry-fetch/issues/23#issuecomment-623558888
//
// XXX why does this fix the glitch??
//
// Something weird is going on here. This SHOULD be fine as a simple
// pipe(), but for some reason, backpressure from the cache stream
// can cause the pipeline to drop the first chunk of data, resulting
// in invalid JSON. Until that is fixed, just write into the cache
// without any backpressure.
//
// The only hazard is that, if the fs is truly very slow, and the rest
// of the consumption pipeline is very fast, then we'll back up into
// memory and use more than we ought to, rather than pushing back on
// the incoming stream. However, this isn't likely to ever be a problem
// due to how npm does HTTP. Either it's fetching a JSON response,
// or a tarball (which is also either unpacking to disk, or streaming
// directly to a tarball file on disk). So, if the disk is slow, and
// it's a tarball request, we're likely to get backpressure from the
// main pipeline anyway. It can only become a problem if the JSON
// response is large enough to span multiple chunks, and also the fs
// is loaded enough to start slowing down. In the JSON response case,
// we're going to load the whole thing in memory anyway, so nothing is
// made particularly *worse* by this lack of backpressure.
//
// It is possible that the root cause of this bug exists either in
// cacache, minipass-pipeline, or minipass itself. But since we don't
// do a multi-pipe tee stream anywhere else in npm's stack, this is
// the only spot where it can make itself known.
tee.on('data', d => cacheStream.write(d))
tee.on('end', () => cacheStream.end())
// tee.pipe(cacheStream)

cacheStream.promise().then(cacheWriteResolve, cacheWriteReject)
newBody.unshift(tee)
}
test/cache.js: 8 changes (5 additions, 3 deletions)
@@ -167,20 +167,22 @@ test('put method', (t) => {
memoize: false
}, 'should have correct cache options')
const cacheStream = new Minipass()
cacheStream.concat().then((data) => {
// cache stream actually takes a sec to get started
setTimeout(() => cacheStream.concat().then((data) => {
t.equal(
data.toString(),
CONTENT.toString(),
'cached body is from response'
)
})
}))
return cacheStream
}
const Cache = mockRequire({ cacache: new MockCacache() })
const cache = new Cache(dir, {})
const req = new Request(`${HOST}/put-test`)
const body = new Minipass()
body.end(CONTENT)
body.write(CONTENT.slice(0, 10))
body.end(CONTENT.slice(10))
const resOpts = { url: req.url, status: 200, headers: {} }
const initialResponse = new Response(body, resOpts)
return cache.put(req, initialResponse)
