Add a debugging form for car files. #341

willscott · 2022-11-07T12:57:02Z

This change adds two new sub-commands to the car CLI:

car debug file.car

creates a patch-file-compatible representation of the content of the car file. Blocks will be represented in dag-json pretty-printed form.

car compile file.patch

will do the inverse process of building a car file from a debug patch file. CIDs will be re-compiled based on the contents of blocks, with links in parent blocks updated to point to the compiled values.

an example debug patch of the car used in the testscript test fixture would be:

car compile --v2 small.car
root bafybeidx5vxxny6ca3mgs5d6wy5ubwcibpirctmktpkvfk4io34i2ww2hy
for raw: bytes are 68656c6c6f20776f726c640a
--- bafkreifjjcie6lypi6ny7amxnfftagclbuxndqonfipmb64f2km2devei4
+++ raw bafkreifjjcie6lypi6ny7amxnfftagclbuxndqonfipmb64f2km2devei4
@@ -0,1 +0,1 @@
hello world

--- bafybeidx5vxxny6ca3mgs5d6wy5ubwcibpirctmktpkvfk4io34i2ww2hy
+++ json (no-end-cr) bafybeidx5vxxny6ca3mgs5d6wy5ubwcibpirctmktpkvfk4io34i2ww2hy
@@ -0,15 +0,15 @@
{
  "Data": {
    "/": {
      "bytes": "CAE"
    }
  },
  "Links": [
    {
      "Hash": {
        "/": "bafkreifjjcie6lypi6ny7amxnfftagclbuxndqonfipmb64f2km2devei4"
      },
      "Name": "foo.txt",
      "Tsize": 12
    }
  ]
}

b5 · 2022-11-07T13:33:23Z

Fully support having this!

rvagg · 2022-11-14T03:29:10Z

windows failures, probably crlf related?

also this is weird on macos https://github.com/ipld/go-car/actions/runs/3410523283/jobs/5673582480, maybe just a flake since it's passing on the other 3 macos runners but it seems cmd related

willscott · 2022-11-14T12:10:52Z

There's some sort of flakiness with testscript in general.

@rvagg are you okay with design / the proposed debug patch format modulo the test?

rvagg · 2022-11-15T03:52:32Z

cmd/car/compile.go

+
+	outStream.WriteString("car compile ")
+	if rd.Version == 2 {
+		outStream.WriteString("--v2 ")


It looks like this is the only hint that it once was a v2 and there's not space to say anything else about it? I guess mostly we care about stability of v1 forms and the v2 is just a convenience wrapper, but if we went ahead with additional features, such as the messaging capability in #322, then where would we put these things? Would it be hard to extend this format to include those things, and could we do it in a non-breaking way perhaps?

this is mirroring git patch, where the first line mirrors the command used to generate the patch.

Here we use v2 to indicate if the original car was a v2 or not, and can use that as a default to re-build the same car format if it is not specified explicitly on the command line when re-compiling
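
A rough sketch of how that first line could drive the default output version. The helpers `parseHeader` and `chooseVersion` are hypothetical names, and skipping unknown flags is an assumption about how the format could stay non-breaking, not something confirmed by the PR code.

```go
package main

import (
	"fmt"
	"strings"
)

// parseHeader inspects the first line of a debug patch, e.g.
// "car compile --v2 small.car", and reports whether the original car was v2
// plus the remaining non-flag arguments. Hypothetical helper for illustration.
func parseHeader(line string) (v2 bool, args []string, err error) {
	fields := strings.Fields(line)
	if len(fields) < 2 || fields[0] != "car" || fields[1] != "compile" {
		return false, nil, fmt.Errorf("not a car compile header: %q", line)
	}
	for _, f := range fields[2:] {
		switch {
		case f == "--v2":
			v2 = true
		case strings.HasPrefix(f, "--"):
			// Assumption: unknown flags are skipped so later additions
			// do not break older readers.
		default:
			args = append(args, f)
		}
	}
	return v2, args, nil
}

// chooseVersion picks the output car version: an explicit command-line
// choice (non-zero) wins, otherwise the header hint is the default.
func chooseVersion(explicit int, headerV2 bool) int {
	if explicit != 0 {
		return explicit
	}
	if headerV2 {
		return 2
	}
	return 1
}

func main() {
	v2, args, err := parseHeader("car compile --v2 small.car")
	if err != nil {
		panic(err)
	}
	fmt.Println(v2, args, chooseVersion(0, v2))
}
```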

rvagg · 2022-11-15T04:13:34Z

cmd/car/compile.go

+		if err != nil {
+			return err
+		}
+		if strings.HasPrefix(string(rootLine), "root ") {


ok, so this is not exhaustive: if it doesn't match root then it continues to loop and drops rootLine. So is this where we get potential backward compatibility of additional v2 features that we can insert before ---, and also the --foobar arguments in the header?

(continuing my thought below about v2 features, where I started my comments on debug first before moving up here to compile)

yes, other meta info would go as lines in this section.

rvagg · 2022-11-15T04:41:32Z

cmd/car/compile.go

+
+	//fmt.Printf("structuring as tree...\n")
+	// structure as a tree
+	childMap := make(map[cid.Cid][]cid.Cid)


whoa, so I think this whole next section (x2) exists because we're not confident that the resulting CIDs will match the input CIDs, so we're doing a search, reconstruct, replace operation on them all, is that right?

why do we not have confidence they're going to reconstruct byte-perfect, shouldn't that be reasonable? There might be a CIDv0 CIDv1 difference but we could easily do that check -- if expected CID is v0 then downcast actual CID and compare.

why don't we just error if the resulting CID of the reconstructed block doesn't match the expected?

Thought about this after writing and I realise that it's probably because we anticipate some input to come in badly encoded forms so round-tripping is going to result in mismatched CIDs. So I guess that's why this is here.

This does seem like a lot of effort to go to though; and it's also a little error-prone just replacing the CIDs as strings. That won't necessarily always find the actual links, they could just be included as text and it assumes that they want them to be changed. A find and replace should probably be at least looking for "/": "...."\n, but even that's not necessarily accurate either. A more complete approach would be to walk the instantiated data model form and change the links out, but I guess that's even more code complication! Then there's the CIDv0 vs CIDv1 thing, what if the original wants to be in CIDv0 but we up-convert them to CIDv1?

Lots of effort to do it the right way, but it makes me question whether doing it at all is worth it.

Your call I guess, but maybe add some comments in here about what it's doing so the first person to encounter a bug with this knows why and can choose whether to fix it. It wasn't clear to me what it was doing until I walked through the whole two blocks.

this is for the 'i want to mutate the graph and re-build a valid tree' case.
if i change the json in the patch, then that block will have a new cid when it's hashed.
that hash change needs to propagate back up the DAG.

data model still won't get all edge cases - what if it's a different codec for the same MH? what if the link is encoded in a block that's raw or that we can't parse?

this is the first pass that works reasonably well for the json/cbor cases that i've attempted to manually edit.

i am unconvinced it's worth the time to do much more complex work and still not do something perfect vs where it is currently, which supports a pretty valuable use case already.

oh right, tinker with the content and regenerate the graph; fair enough, I can see that being useful - it does seem like something that needs clear caveats in comments though!
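
A toy version of the "edit a block, propagate the new hash up the DAG" idea discussed above, using only the standard library. Assumptions: a hex sha2-256 string stands in for a CID, blocks are plain text, the graph is acyclic, and links are found by plain string replacement (so it shares the caveat that an ID appearing as ordinary text is rewritten too). `id` and `rebuild` are hypothetical names, not the functions in compile.go.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// id is a stand-in for a CID: the hex sha2-256 of the block text.
func id(body string) string {
	sum := sha256.Sum256([]byte(body))
	return hex.EncodeToString(sum[:])
}

// rebuild takes blocks keyed by the ID recorded in the patch (possibly stale
// after hand edits). It returns recorded ID -> recomputed ID, and the
// rewritten bodies keyed by recomputed ID. Children are resolved first so a
// changed hash propagates to every parent. Assumes the graph has no cycles.
func rebuild(blocks map[string]string) (map[string]string, map[string]string) {
	renamed := make(map[string]string)
	out := make(map[string]string)
	var visit func(old string) string
	visit = func(old string) string {
		if n, ok := renamed[old]; ok {
			return n
		}
		body := blocks[old]
		for child := range blocks {
			if child != old && strings.Contains(body, child) {
				// Plain string replacement: also rewrites IDs that merely
				// appear as text, the caveat raised in the review.
				body = strings.ReplaceAll(body, child, visit(child))
			}
		}
		n := id(body)
		renamed[old] = n
		out[n] = body
		return n
	}
	for old := range blocks {
		visit(old)
	}
	return renamed, out
}

func main() {
	leafID := id("hello world\n")
	root := `{"Links":["` + leafID + `"]}`
	rootID := id(root)
	// Simulate a hand edit of the leaf: its recorded ID is now stale.
	blocks := map[string]string{leafID: "hello there\n", rootID: root}
	renamed, out := rebuild(blocks)
	fmt.Println("leaf changed:", renamed[leafID] != leafID)
	fmt.Println("root changed:", renamed[rootID] != rootID)
	fmt.Println(out[renamed[rootID]])
}
```

Walking the decoded data model instead of replacing strings would avoid the false-positive rewrites, at the cost of the extra code discussed above.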

rvagg

This is probably OK, I'd like to know what itch was being scratched here? it's certainly a nice utility to peek inside a CAR and perhaps even a nice way to peek inside IPLD blocks in general without having to code something up (you could even ipfs export .. | car debug - to get pretty-printed output that you can't get from ipfs dag get --output-codec=dagjson, but that's quite a hack!).

Aside from comments inline, the main concern I have is the ability to corrupt the format if you include a raw block that is diff-like. You could even end up doing that with this tool: make a .patch from a CAR, bundle that up as unixfs, include it in a CAR, make a .patch from that and 💥 it will break because of the embedded --- lines. But I think you could easily address that by changing isPrintable() to also check for ^--- and return false if there's one in there.
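
A sketch of what such a check could look like. This is a hypothetical re-implementation for illustration; the actual isPrintable() in the PR may use different rules for what counts as printable.

```go
package main

import (
	"bytes"
	"fmt"
	"unicode"
	"unicode/utf8"
)

// isPrintable reports whether a raw block can be written into the patch as
// text. Hypothetical re-implementation: the real function may differ.
func isPrintable(b []byte) bool {
	if !utf8.Valid(b) {
		return false
	}
	// A line starting with "---" would be read as the next block header,
	// so treat such content as not safely printable.
	if bytes.HasPrefix(b, []byte("---")) || bytes.Contains(b, []byte("\n---")) {
		return false
	}
	for _, r := range string(b) {
		if !unicode.IsPrint(r) && r != '\n' && r != '\t' {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(isPrintable([]byte("hello world\n")))
	fmt.Println(isPrintable([]byte("--- some-cid\n+++ raw some-cid\n")))
	fmt.Println(isPrintable([]byte{0xff, 0x00}))
}
```

Blocks rejected here would then fall back to whatever non-text representation the format uses for binary data.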

willscott · 2022-11-15T10:33:01Z

This is probably OK, I'd like to know what itch was being scratched here? it's certainly a nice utility to peek inside a CAR and perhaps even a nice way to peek inside IPLD blocks in general without having to code something up (you could even ipfs export .. | car debug - to get pretty-printed output that you can't get from ipfs dag get --output-codec=dagjson, but that's quite a hack!).

for example:
https://twitter.com/pfrazee/status/1589747431071428609
https://gitlab.com/bnewbold/adenosine/-/blob/main/notes/ipld_car_explore.md

willscott · 2022-11-16T12:34:49Z

made the raw blocks a bit more cautious per the edge case you pointed out

rvagg · 2022-11-17T03:40:17Z

cmd/car/compile.go

+		}, outStream); err != nil {
+			return err
+		}
+		for c, blk := range outBlocks {


the failing tests are because of this, and the same block below for v1 -- the blocks are ending up shuffled thanks to iterating over the go map[] .. we're going to need to keep a slice of CIDs to iterate over and then rewrite those when you do the reconstruction.

willscott · 2022-11-17T09:44:23Z

good catch, @rvagg - made the output match the order of initial blocks.
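
For reference, the general pattern is to keep a slice of keys next to the map, since ranging over a Go map yields an unspecified order. A minimal sketch with string keys standing in for CIDs; the type and method names are hypothetical and not taken from the PR:

```go
package main

import "fmt"

// orderedBlocks pairs a map with a slice recording insertion order.
type orderedBlocks struct {
	order []string          // keys in first-seen order
	data  map[string][]byte // key -> block bytes
}

func newOrderedBlocks() *orderedBlocks {
	return &orderedBlocks{data: make(map[string][]byte)}
}

// put stores a block, remembering its position the first time the key is seen.
func (o *orderedBlocks) put(key string, blk []byte) {
	if _, ok := o.data[key]; !ok {
		o.order = append(o.order, key)
	}
	o.data[key] = blk
}

// rename swaps a key (e.g. after a CID is recomputed) while keeping its slot.
// It assumes newKey is not already present.
func (o *orderedBlocks) rename(oldKey, newKey string) {
	blk, ok := o.data[oldKey]
	if !ok || oldKey == newKey {
		return
	}
	delete(o.data, oldKey)
	o.data[newKey] = blk
	for i, k := range o.order {
		if k == oldKey {
			o.order[i] = newKey
		}
	}
}

func main() {
	ob := newOrderedBlocks()
	ob.put("cid-a", []byte("first"))
	ob.put("cid-b", []byte("second"))
	ob.rename("cid-a", "cid-a2")
	// Iterate via the slice, never via the map, to get a stable order.
	for _, k := range ob.order {
		fmt.Printf("%s %s\n", k, ob.data[k])
	}
}
```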

rvagg

nice green ticks

I still think it needs notes in the DAG rewrite bit, so I've added my suggestions for what that might look like.

willscott added 2 commits November 14, 2022 12:19

clean newline behavior a bit

2bc73b2

continue to fiddle with line endings

0380aab

rvagg reviewed Nov 15, 2022

rvagg requested changes Nov 15, 2022

code review updates

3b133b3

* add check for bytes not containing end-of-patch sequence

mod tidy

fb69479

rvagg requested changes Nov 17, 2022

willscott added 2 commits November 17, 2022 10:36

stable map iteration

fdb9581

tidy

a2c9f85

willscott requested a review from rvagg November 17, 2022 09:44

rvagg reviewed Nov 18, 2022

rvagg approved these changes Nov 18, 2022

Apply suggestions from code review

af90d85

Co-authored-by: Rod Vagg <rod@vagg.org>

willscott merged commit dab0fd5 into master Nov 18, 2022

willscott deleted the feat/debug-compile branch November 18, 2022 11:41
