validator assumes unixfs encoded blocks #97

Closed
5 tasks
Tracked by #68
olizilla opened this issue Feb 9, 2023 · 7 comments

Comments

@olizilla
Contributor

olizilla commented Feb 9, 2023

All we can reliably validate from a CAR is that each block's bytes match its CID's multihash, for the set of hash functions we support, and that the CAR contains a block for every link we encounter, where the blocks are encoded with a codec we support.
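
A minimal sketch of the first check (block bytes match the multihash), using only Node's built-in `crypto`. The `SUPPORTED_HASHES` table and the `verifyBlock` and `sha256Multihash` helpers are hypothetical names for illustration, and the choice of supported functions is an assumption. A real validator would take the multihash from a decoded CID rather than building it by hand.

```javascript
const { createHash } = require('node:crypto')

// Multihash codes (from the multicodec table) mapped to Node hash names.
// Which functions count as "supported" is an assumption for this sketch.
const SUPPORTED_HASHES = new Map([
  [0x12, 'sha256'], // sha2-256
  [0x13, 'sha512'] // sha2-512
])

/**
 * Check block bytes against a multihash laid out as <code><length><digest>.
 * Only handles codes and lengths below 0x80 (single-byte varints).
 */
function verifyBlock (multihash, bytes) {
  const [code, length] = multihash
  const algorithm = SUPPORTED_HASHES.get(code)
  if (!algorithm) return { ok: false, reason: 'unsupported hash function' }
  const expected = Buffer.from(multihash.subarray(2))
  if (expected.length !== length) return { ok: false, reason: 'malformed multihash' }
  const actual = createHash(algorithm).update(bytes).digest()
  return actual.equals(expected) ? { ok: true } : { ok: false, reason: 'digest mismatch' }
}

// Helper for the example only: build a sha2-256 multihash for some bytes.
function sha256Multihash (bytes) {
  const digest = createHash('sha256').update(bytes).digest()
  return Buffer.concat([Buffer.from([0x12, digest.length]), digest])
}
```

Returning a reason rather than throwing lets the caller decide whether an unsupported hash function should fail the pin or just be reported.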

TODO

  • Add new "validation" Bucket. Update the pickup worker to write CARs to the validation bucket.
  • Update the validator logic to use a CarBlockIterator and pass each block to linkdex.
  • If linkdex says the CAR structure is Complete then consider the CAR valid.
    • Move the CAR from the validation bucket (as an S3 command) to the requested destination bucket (double check that S3 guarantees integrity on move).
    • Set the pin request status to pinned.
@olizilla
Contributor Author

olizilla commented Feb 9, 2023

if the problem here is that we can't detect an error if kubo truncates a CAR response, then we should dig some more there and see if we can find a way to detect that.

@olizilla
Contributor Author

olizilla commented Feb 9, 2023

here where we use the old /dag/export http api... this could be replaced with a call to the newer /ipfs/<cid>?format=car api

const url = new URL(`/api/v0/dag/export?arg=${cid}`, ipfsApiUrl)

that would be worth trying to see if it allows us to detect an error.

see: https://docs.ipfs.tech/reference/http/gateway/#trusted-vs-trustless

note that this is the public gateway api, and is served from the gateway port, not the api port, so some changes would be needed to the config.
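
A rough sketch of the two URLs side by side. The hosts and ports are assumptions (Kubo's usual local defaults), `ipfsGatewayUrl` is a hypothetical new config value, and the function names are made up for illustration.

```javascript
// Assumed defaults for a local Kubo node: RPC API on 5001, gateway on 8080.
const ipfsApiUrl = 'http://127.0.0.1:5001'
const ipfsGatewayUrl = 'http://127.0.0.1:8080'

// Current approach: the RPC API (Kubo expects POST for RPC calls).
function dagExportUrl (cid) {
  return new URL(`/api/v0/dag/export?arg=${cid}`, ipfsApiUrl)
}

// Proposed approach: ask the gateway for a CAR response (a plain GET).
function gatewayCarUrl (cid) {
  const url = new URL(`/ipfs/${cid}`, ipfsGatewayUrl)
  url.searchParams.set('format', 'car')
  return url
}
```

Whether the gateway response makes truncation any easier to detect is exactly the thing that would need testing.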

@olizilla
Contributor Author

olizilla commented Feb 9, 2023

We could pin the DAG to the local Kubo node, wait until we definitely have the entire DAG locally and then do the car export from the blockstore in a way that errors usefully if anything goes wrong. We'd at least be able to know the DAG size before we start the export.
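
If the DAG size is known up front, one possible use is to total the block sizes during the export and compare at the end. This is only a sketch: `countBlockBytes` is a hypothetical helper, blocks are assumed to look like `{ bytes: Uint8Array }`, and it assumes the reported DAG size is the sum of the block sizes that the export will send.

```javascript
/**
 * Pass blocks through unchanged while totalling their byte lengths.
 * Throws after the last block if the total differs from expectedSize.
 */
async function * countBlockBytes (blocks, expectedSize) {
  let total = 0
  for await (const block of blocks) {
    total += block.bytes.byteLength
    yield block
  }
  if (total !== expectedSize) {
    throw new Error(`expected ${expectedSize} bytes of blocks, saw ${total}`)
  }
}
```

The error only surfaces once the source is exhausted, so whatever consumes the blocks has to be able to abort or discard what it has written so far.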

@olizilla
Contributor Author

When a CAR is written to the S3 bucket, it is added to the E-IPFS indexer queue.

We need to avoid the need for additional validation (upload fails if error) ...or, if we have to do additional validation, then we must write CARs we haven't validated to a different bucket and move them once validated, being careful to verify integrity when moving.

@olizilla
Contributor Author

some options

  1. Have a through stream / async iterator that decodes the blocks of the CAR as they are sent from kubo, before they are written to S3, and tracks the links. A complete DAG export from kubo would be root first, and complete, so if there are any dangling links then we know it was truncated.

  1.b We can pin the DAG to the kubo node and find the complete DAG size, and count the block sizes in the through stream. If the total of block sizes sent does not match the total DAG size then it has been truncated.

  2. Have a separate validator process. Write the CAR to a temporary bucket. Validate it in a similar way to how we do in the Cloudflare-based API when a user uploads a CAR: hash the blocks with the set of hashes we support and decode them to check for dangling links.

  2.b Have a separate kubo node as a validator. Import the CAR into kubo to check it is valid.
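
A minimal sketch of the link tracking in option 1. `trackLinks` is a hypothetical name, and blocks are assumed to arrive already decoded as `{ cid: string, links: string[] }`. A real version would decode the links using each block's codec.

```javascript
/**
 * Pass decoded blocks through while tracking which links have been satisfied.
 * Throws after the last block if any linked CID never showed up.
 */
async function * trackLinks (blocks) {
  const seen = new Set() // CIDs of blocks that have passed through
  const wanted = new Set() // linked CIDs we have not seen a block for yet
  for await (const block of blocks) {
    seen.add(block.cid)
    wanted.delete(block.cid)
    for (const link of block.links) {
      if (!seen.has(link)) wanted.add(link)
    }
    yield block
  }
  if (wanted.size > 0) {
    throw new Error(`truncated CAR: ${wanted.size} dangling link(s)`)
  }
}
```

Because the check can only fail at the end of the stream, the S3 write would need to be abortable, or go to a temporary location first.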

@olizilla
Contributor Author

olizilla commented Feb 10, 2023

Notes from lidel on detecting a truncated CAR export

Two options:

  • (does not require changing anything) always check the hash of the last block. If it does not match the expected CID, the CAR was truncated (there is a chance the connection got interrupted exactly at a block boundary, but in practice this will rarely happen)
  • (future, or if you control the server) start sending a well-known tombstone at the end (could be a zero-length block in CARv1, or we could add a new compact index type for CARv2)
    There is ongoing discussion around standardising this, so that middleware no longer needs to hash the last block, which saves some CPU. See: Create IPIP with Gateway spec for partial CAR exports ipfs/specs#348 (comment)
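
The first option could look roughly like this. It is a sketch under assumptions: `lastBlockMatches` is a hypothetical helper, blocks are assumed to look like `{ digest, bytes }` where `digest` is the sha2-256 digest taken from the block's CID, and other hash functions are ignored.

```javascript
const { createHash } = require('node:crypto')

/**
 * Consume blocks, remember only the last one, and hash it once at the end.
 * Resolves to false if no block arrived or the last block's bytes do not
 * match its expected digest.
 */
async function lastBlockMatches (blocks) {
  let last
  for await (const block of blocks) last = block
  if (!last) return false // nothing arrived at all
  const actual = createHash('sha256').update(last.bytes).digest()
  return actual.equals(Buffer.from(last.digest))
}
```

Only one hash is computed per CAR, at the cost of missing a cut that lands exactly on a block boundary.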

codeflyer added a commit that referenced this issue Feb 14, 2023
This pr refers to: #97

Add a new bucket for the validation.
Pickup stores the file in a validation bucket. The CAR is validated
and then, if it is valid, moved to the final bucket.
@olizilla
Contributor Author

@codeflyer please simplify the validation logic. It should only

  • Create a CarBlockIterator and iterate over the blocks, passing each to linkdex.
  • Catch any errors during iteration, log them, and mark the pin as failed.
  • after all blocks have been indexed, verify that the dagStructure is Complete and if so, move the CAR to the target bucket, and mark the pin as pinned (as you have it now)

All the other checks can be deleted.
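
Roughly what I have in mind, as a sketch rather than the actual code. Everything passed in is a hypothetical stand-in: `blocks` would come from a CarBlockIterator, `indexer` wraps linkdex behind made-up `index()` / `structure()` methods, and `moveToTarget` / `setStatus` hide the S3 and pin-status calls. Marking the pin as failed when the structure is not Complete is my assumption.

```javascript
/**
 * Validate a CAR and settle the pin request.
 * All dependencies are injected stand-ins, not real library APIs.
 */
async function validateCar ({ blocks, indexer, moveToTarget, setStatus, log = console }) {
  try {
    for await (const block of blocks) {
      indexer.index(block)
    }
  } catch (err) {
    // Any decode or iteration error means the CAR cannot be trusted.
    log.error('CAR validation failed', err)
    await setStatus('failed')
    return false
  }
  if (indexer.structure() !== 'Complete') {
    log.error('CAR is not a complete DAG')
    await setStatus('failed') // assumption: incomplete means failed
    return false
  }
  await moveToTarget()
  await setStatus('pinned')
  return true
}
```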
