Could the "car ls" command output the relative file path and filename as well? #335

Open
kernelogic opened this issue Sep 23, 2022 · 11 comments
Labels
help wanted Seeking public contribution on this issue P3 Low: Not priority right now

Comments

@kernelogic

kernelogic commented Sep 23, 2022

Hello, right now the car ls command only lists CIDs. Is it possible to have it print the relative path and filename as well?

For example
car ls baga6ea4seaqab7qam2an2mkzssn7vioorrcxhaxxszz6k4t6mscwnhvfjj4hoaq.car

bafybeidfuaqmwqx3osrfo2krqyneomro3kdfqsn6743gmgmcwusbq6zafe  ./folder1/file1.txt
bafybeibvzgbc5pa3xhs2eh46y6c6as3qeddz7drlz34ceu5id6rridofuu  ./folder2/file2.txt
bafybeifjcv55w5mb55sc5dpi4r2bw4ggidy5gxoc4geomnk7xi2ctmszci  ./folder2/folder3/file3.txt
bafybeibml7y4skggzitsb4srmt6hohkvs6asq3ccesuyd3kwbk77ksgmai  ./folder2/folder3/folder4/file4.txt

@willscott
Member

I believe this is partially implemented as car ls --unixfs baga6ea4seaqab7qam2an2mkzssn7vioorrcxhaxxszz6k4t6mscwnhvfjj4hoaq.car

@kernelogic
Author

@willscott oh yes, it works! I guess I need to run it twice, once for the CIDs and once for the filenames.

car ls --unixfs /mnt/pool243/cars/worldbank/baga6ea4seaqbm326hi63viv5pmmc3qn4ekt7veu7qdgcdpkcxtu7isao2wllkdi.car 
globalnightlight
globalnightlight/201204
globalnightlight/201204/SVDNB_npp_d20120418_t0206526_e0212330_b02450_c20120418081234074938_noaa_ops.rade9.co.co.tif
globalnightlight/201204/SVDNB_npp_d20120418_t0206526_e0212330_b02450_c20120418081234074938_noaa_ops.rade9.co.json
globalnightlight/201204/SVDNB_npp_d20120418_t0206526_e0212330_b02450_c20120418081234074938_noaa_ops.rade9.co.tif
globalnightlight/201204/SVDNB_npp_d20120418_t0212343_e0218146_b02450_c20120418081816258190_noaa_ops.rade9.co.co.tif
globalnightlight/201204/SVDNB_npp_d20120418_t0212343_e0218146_b02450_c20120418081816258190_noaa_ops.rade9.co.json
globalnightlight/201204/SVDNB_npp_d20120418_t0212343_e0218146_b02450_c20120418081816258190_noaa_ops.rade9.co.tif
globalnightlight/201204/SVDNB_npp_d20120418_t0218159_e0223563_b02450_c20120418082357478905_noaa_ops.rade9.co.co.tif
globalnightlight/201204/SVDNB_npp_d20120418_t0218159_e0223563_b02450_c20120418082357478905_noaa_ops.rade9.co.json
globalnightlight/201204/SVDNB_npp_d20120418_t0218159_e0223563_b02450_c20120418082357478905_noaa_ops.rade9.co.tif
globalnightlight/201204/SVDNB_npp_d20120418_t0315103_e0320507_b02451_c20120418092051137435_noaa_ops.rade9.co.co.tif
globalnightlight/201204/SVDNB_npp_d20120418_t0315103_e0320507_b02451_c20120418092051137435_noaa_ops.rade9.co.json
globalnightlight/201204/SVDNB_npp_d20120418_t0315103_e0320507_b02451_c20120418092051137435_noaa_ops.rade9.co.tif
globalnightlight/201204/SVDNB_npp_d20120418_t0320519_e0326323_b02451_c20120418092632275819_noaa_ops.rade9.co.co.tif
globalnightlight/201204/SVDNB_npp_d20120418_t0320519_e0326323_b02451_c20120418092632275819_noaa_ops.rade9.co.json
globalnightlight/201204/SVDNB_npp_d20120418_t0320519_e0326323_b02451_c20120418092632275819_noaa_ops.rade9.co.tif
globalnightlight/201204/SVDNB_npp_d20120418_t0326335_e0332139_b02451_c20120418093214412769_noaa_ops.rade9.co.co.tif
globalnightlight/201204/SVDNB_npp_d20120418_t0326335_e0332139_b02451_c20120418093214412769_noaa_ops.rade9.co.json
globalnightlight/201204/SVDNB_npp_d20120418_t0326335_e0332139_b02451_c20120418093214412769_noaa_ops.rade9.co.tif
globalnightlight/201204/SVDNB_npp_d20120418_t0332152_e0337555_b02451_c20120418093755509353_noaa_ops.rade9.co.co.tif
globalnightlight/201204/SVDNB_npp_d20120418_t0332152_e0337555_b02451_c20120418093755509353_noaa_ops.rade9.co.json
globalnightlight/201204/SVDNB_npp_d20120418_t0332152_e0337555_b02451_c20120418093755509353_noaa_ops.rade9.co.tif
globalnightlight/201204/SVDNB_npp_d20120418_t0337568_e0343354_b02451_c20120418094336640323_noaa_ops.rade9.co.co.tif
globalnightlight/201204/SVDNB_npp_d20120418_t0337568_e0343354_b02451_c20120418094336640323_noaa_ops.rade9.co.json
globalnightlight/201204/SVDNB_npp_d20120418_t0337568_e0343354_b02451_c20120418094336640323_noaa_ops.rade9.co.tif
globalnightlight/201204/SVDNB_npp_d20120418_t0343366_e0349170_b02451_c20120418094918746170_noaa_ops.rade9.co.co.tif

@kernelogic
Author

However, the two outputs have very different line counts: there are far more CIDs than files.

car ls --unixfs /mnt/pool243/cars/worldbank/baga6ea4seaqbm326hi63viv5pmmc3qn4ekt7veu7qdgcdpkcxtu7isao2wllkdi.car | wc -l
155
car ls /mnt/pool243/cars/worldbank/baga6ea4seaqbm326hi63viv5pmmc3qn4ekt7veu7qdgcdpkcxtu7isao2wllkdi.car | wc -l
10393

@kernelogic kernelogic reopened this Sep 24, 2022
@willscott
Member

Is this wrong?
A file, especially a big one, will often be made up of many chunks, each with its own CID.
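
As a rough illustration of why the block count can dwarf the file count, here is a minimal sketch that estimates how many blocks a single file turns into. The chunk size (1 MiB) and the maximum number of links per node (1024) are illustrative assumptions, not go-car defaults, and `estimate_block_count` is a hypothetical helper; the real numbers depend on whichever tool packed the CAR.

```python
import math


def estimate_block_count(file_size, chunk_size=1 << 20, max_links=1024):
    """Estimate leaf + intermediate blocks for one file in a balanced DAG.

    chunk_size and max_links are assumed values for illustration only.
    """
    # Every chunk of file data becomes one leaf block.
    leaves = max(1, math.ceil(file_size / chunk_size))
    total = leaves
    level = leaves
    # Each layer of parent nodes groups up to max_links children,
    # until a single root remains.
    while level > 1:
        level = math.ceil(level / max_links)
        total += level
    return total


if __name__ == "__main__":
    for size_mib in (1, 15, 503):
        blocks = estimate_block_count(size_mib * (1 << 20))
        print(f"{size_mib} MiB file -> about {blocks} blocks")
```

Under these assumptions a 15 MiB file comes out at 16 blocks (15 leaves plus one root), so a CAR holding a few hundred large files can plausibly contain thousands of CIDs.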

@kernelogic
Author

Sorry, no, it's not wrong. I'm just trying to work out how to get the CID for each file, and I'm not sure how to map them.

@willscott
Member

You can also look at the verbose flag to print out the types of the different CIDs. I believe that will print out the items in a unixfs directory as well.

@BigLep BigLep added P3 Low: Not priority right now help wanted Seeking public contribution on this issue labels Oct 4, 2022
@SethDocherty

SethDocherty commented Dec 15, 2023

@willscott, I'm trying to get the same details as the OP: the filename/directory name alongside the CID. While the --verbose flag does provide that context, it doesn't print all the links. Below is an example of running the car list command:

dag-pb: bafybeifurw2z3xmkrflcqy2xfpr34ys7dneodveyeniov7voiss5azkdhu
	10 links. 2 bytes
		GEDI04_B_MW019MW223_02_002_02_R01000M_MI.tif[14 MB] bafybeiakure7paspbvxqfj4b64svy3vtxqjc72kww6doykphaq63w23ctu
		GEDI04_B_MW019MW223_02_002_02_R01000M_MU.tif[503 MB] bafybeiekmgjua5qvlm3zrnkqj54mnd2y764hk42wdthrk4gfwgjgxvqepi
		GEDI04_B_MW019MW223_02_002_02_R01000M_NC.tif[88 MB] bafybeiduzczefsght5ep52tjkrgh6ylpyp2h5cveyh5pgbnxfubyrdvlxy
		(10 total)
	Unixfs Directory

Unfortunately, the other 7 links are not printed out. I noticed here that the max lines printed is set to 3.
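
For anyone who wants to script against the verbose output in the meantime, here is a minimal sketch of a parser. The regular expressions are inferred only from the sample above, so the real format may differ between versions; `parse_verbose` and `is_truncated` are hypothetical helper names. It can only recover the links that are actually printed, so it also flags nodes whose listing was cut short.

```python
import re

# Patterns inferred from the sample output in this thread (an assumption).
HEADER = re.compile(r"^(?P<codec>[\w-]+):\s+(?P<cid>\S+)$")
LINK = re.compile(r"^\s+(?P<name>.*)\[(?P<size>[^\]]+)\]\s+(?P<cid>\S+)$")
TOTAL = re.compile(r"^\s+\((?P<total>\d+) total\)$")


def parse_verbose(text):
    """Group the printed links under the block header they belong to."""
    nodes = []
    current = None
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            current = {"codec": header["codec"], "cid": header["cid"],
                       "links": [], "total": None}
            nodes.append(current)
            continue
        if current is None:
            continue
        link = LINK.match(line)
        if link:
            current["links"].append((link["name"], link["size"], link["cid"]))
            continue
        total = TOTAL.match(line)
        if total:
            current["total"] = int(total["total"])
    return nodes


def is_truncated(node):
    """True when the '(N total)' line reports more links than were printed."""
    return node["total"] is not None and node["total"] > len(node["links"])


SAMPLE = (
    "dag-pb: bafybeifurw2z3xmkrflcqy2xfpr34ys7dneodveyeniov7voiss5azkdhu\n"
    "\t10 links. 2 bytes\n"
    "\t\tGEDI04_B_MW019MW223_02_002_02_R01000M_MI.tif[14 MB] "
    "bafybeiakure7paspbvxqfj4b64svy3vtxqjc72kww6doykphaq63w23ctu\n"
    "\t\t(10 total)\n"
    "\tUnixfs Directory\n"
)

if __name__ == "__main__":
    for node in parse_verbose(SAMPLE):
        for name, size, cid in node["links"]:
            print(cid, name, size)
        if is_truncated(node):
            print("listing truncated for", node["cid"])
```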

Here's another example, but in this case all of the raw links are printed above the dag-pb node that links to them.

raw: bafkreigex65obvg2cuxpd4dp5vvxdfdvon3vj4ctwubn3al7dweyc76bqy
raw: bafkreigdhkqsdcyxbj4wbnfu7bcijpc4ahrqvexycxrwrsnhfcjvuiysuq
raw: bafkreiarcvjdnhlvxx2iwdadubwmfbvqjd34h2gpyrpcftqbxnhvm6bpvu
raw: bafkreiedi4c6rebigtkz7pco4vbas3ikmezrec4cojyyh5f3733mtx5zxu
raw: bafkreigeol6iuim7a6emge22otwmhnshyvm6skjmcjhachle5mqtu47uum
raw: bafkreihn5hdzqcx4c7j5f3a34izptpkkfesvxctixxkcszgm6e3gm4q7y4
raw: bafkreihlkkvxchsdjyis2uvh6lv3tgvbhj6fomh5y2x4ro7dkl6zr2xr5m
raw: bafkreifeif6sv3yfdf4yxypmq5z67vnrvrfld22hro6shzan42e73fawri
raw: bafkreihwe4ismovddja343akjs4uk5ilmhdjktnbhc3oryspub6ffz7g5a
raw: bafkreia6boirg2imm2pdypyb45kznzw47dxhav25c2kbf5m5uwkavwllee
raw: bafkreidmyxmzlr3pljpuekbrsmz2i7orwwgjztwea63mukgn5cc3tinpwa
raw: bafkreia4ge2wl2mfez7h3cyhkcp7qmqdwtnspdbyjikugvugscly4hf2je
raw: bafkreifqm7bwo6fdl47fj6dnys4refmr7sd5354ac3wq7atxktrqgpivym
raw: bafkreif6oifgjtaiyvzbb2lpok6lejz2lu2i4tytcvnz2ttfrjszwa4xt4
raw: bafkreiaslirngvag4oa7afcako625rbzjae2yayrg33w6r2qme6wgwh6hy
dag-pb: bafybeigz7sxvjzp6463til5bysvortvyiwkxzopaslrbbxzebaaytajh4a
	15 links. 67 bytes
		[1.0 MB] bafkreigex65obvg2cuxpd4dp5vvxdfdvon3vj4ctwubn3al7dweyc76bqy
		[1.0 MB] bafkreigdhkqsdcyxbj4wbnfu7bcijpc4ahrqvexycxrwrsnhfcjvuiysuq
		[1.0 MB] bafkreiarcvjdnhlvxx2iwdadubwmfbvqjd34h2gpyrpcftqbxnhvm6bpvu
		(15 total)
	Unixfs File

Ideally, it would be much more readable if the output looked like this and conformed to the format used when listing file/directory names:

dag-pb: bafybeigz7sxvjzp6463til5bysvortvyiwkxzopaslrbbxzebaaytajh4a
    15 links. 67 bytes
        raw [1.0 MB] bafkreigex65obvg2cuxpd4dp5vvxdfdvon3vj4ctwubn3al7dweyc76bqy
        raw [1.0 MB] bafkreigdhkqsdcyxbj4wbnfu7bcijpc4ahrqvexycxrwrsnhfcjvuiysuq
        raw [1.0 MB] bafkreiarcvjdnhlvxx2iwdadubwmfbvqjd34h2gpyrpcftqbxnhvm6bpvu
        raw [1.0 MB] bafkreiedi4c6rebigtkz7pco4vbas3ikmezrec4cojyyh5f3733mtx5zxu
        raw [1.0 MB] bafkreigeol6iuim7a6emge22otwmhnshyvm6skjmcjhachle5mqtu47uum
        raw [1.0 MB] bafkreihn5hdzqcx4c7j5f3a34izptpkkfesvxctixxkcszgm6e3gm4q7y4
        raw [1.0 MB] bafkreihlkkvxchsdjyis2uvh6lv3tgvbhj6fomh5y2x4ro7dkl6zr2xr5m
        raw [1.0 MB] bafkreifeif6sv3yfdf4yxypmq5z67vnrvrfld22hro6shzan42e73fawri
        raw [1.0 MB] bafkreihwe4ismovddja343akjs4uk5ilmhdjktnbhc3oryspub6ffz7g5a
        raw [1.0 MB] bafkreia6boirg2imm2pdypyb45kznzw47dxhav25c2kbf5m5uwkavwllee
        raw [1.0 MB] bafkreidmyxmzlr3pljpuekbrsmz2i7orwwgjztwea63mukgn5cc3tinpwa
        raw [1.0 MB] bafkreia4ge2wl2mfez7h3cyhkcp7qmqdwtnspdbyjikugvugscly4hf2je
        raw [1.0 MB] bafkreifqm7bwo6fdl47fj6dnys4refmr7sd5354ac3wq7atxktrqgpivym
        raw [1.0 MB] bafkreif6oifgjtaiyvzbb2lpok6lejz2lu2i4tytcvnz2ttfrjszwa4xt4
        raw [1.0 MB] bafkreiaslirngvag4oa7afcako625rbzjae2yayrg33w6r2qme6wgwh6hy
    Unixfs File
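
The `raw` prefix in the layout above can be derived from the CID string alone, without a second pass over the CAR. The following is a minimal sketch, assuming base32 CIDv1 strings (leading `b`) like the ones in this thread; `cid_codec` is a hypothetical helper, and the codec table lists only a few entries from the multicodec table (0x55 raw, 0x70 dag-pb, 0x71 dag-cbor).

```python
import base64

# A small subset of the multicodec table; unknown codes are shown as hex.
CODEC_NAMES = {0x55: "raw", 0x70: "dag-pb", 0x71: "dag-cbor"}


def read_varint(data, pos):
    """Decode one unsigned LEB128 varint starting at pos."""
    value = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            return value, pos
        shift += 7


def cid_codec(cid):
    """Return the content codec of a base32 CIDv1 string."""
    if not cid.startswith("b"):
        raise ValueError("this sketch only handles base32 (multibase 'b') CIDs")
    # 16 base32 characters decode to exactly 10 bytes, which is enough
    # to cover the version and codec varints.
    head = base64.b32decode(cid[1:17].upper())
    version, pos = read_varint(head, 0)
    if version != 1:
        raise ValueError("expected a CIDv1")
    codec, _ = read_varint(head, pos)
    return CODEC_NAMES.get(codec, hex(codec))


if __name__ == "__main__":
    for cid in (
        "bafkreigex65obvg2cuxpd4dp5vvxdfdvon3vj4ctwubn3al7dweyc76bqy",
        "bafybeigz7sxvjzp6463til5bysvortvyiwkxzopaslrbbxzebaaytajh4a",
    ):
        print(cid_codec(cid), cid)
```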

@willscott
Member

@SethDocherty - is what you want a machine-parseable listing of all of the unixfs 'named' items in a CAR?
Do you want the raw block CIDs within the unixfs files as well?

@SethDocherty

> @SethDocherty - is what you want a machine parse-able listing of all of the unixfs 'named' items in a car? do you want the raw block cids within the unixfs files as well?

Apologies for the delay in responding. Hope you had the chance to enjoy the holidays!

In the hopes of answering your question, let me provide some auxiliary details on what I'm trying to do with go-car.

I'm building out a workflow using Singularity v2 to pack up content into CARs as well as perform partial extractions. Singularity lacks the capability to retrieve details about the content found inside CARs, which is why I'm using go-car. I can get the CIDs for all the unixfs 'named' items (e.g. directories and files) with the car list command, cherry-pick the CIDs of the content I want to extract, and then pass them to the Extract Car command. So for my workflow, I'm really only interested in the CIDs of the unixfs 'named' items. Later down the road, though, I could see a case where I'd be interested in the raw block CIDs within the unixfs file.

Side note, I could use go-car to extract content but Singularity (which is the tool I'm focusing on for this workflow) creates a 'manifest' CAR file containing all the CIDs of all the unixfs 'named' items. My understanding of 'go-car' is that the extract command will only extract content if all the CID references are in a single CAR file.

@willscott
Member

@SethDocherty see if #514 does what you need

@SethDocherty

@willscott, thanks! I just tested it on the CAR generated through Singularity, and it renders two columns with the CID and unixfs path for each item.

While I do prefer the additional details that come with the verbose flag such as the filesize and counts, this gets me what I need.
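
For the cherry-picking step, a two-column listing is easy to turn into a lookup table. Below is a minimal sketch, assuming each line holds a CID, then whitespace, then the path; the exact output of #514 is not shown in this thread, so treat that layout as an assumption. `parse_listing` and `select_cids` are hypothetical helpers, and the sample rows reuse the illustrative CIDs and paths from the opening post.

```python
from fnmatch import fnmatchcase


def parse_listing(text):
    """Map each path to its CID, assuming '<cid><whitespace><path>' lines."""
    entries = {}
    for line in text.splitlines():
        # Split once so that paths containing spaces stay intact.
        parts = line.strip().split(None, 1)
        if len(parts) != 2:
            continue
        cid, path = parts
        entries[path] = cid
    return entries


def select_cids(entries, pattern):
    """Return the CIDs whose path matches a shell-style pattern."""
    return [cid for path, cid in entries.items() if fnmatchcase(path, pattern)]


SAMPLE = """\
bafybeidfuaqmwqx3osrfo2krqyneomro3kdfqsn6743gmgmcwusbq6zafe  ./folder1/file1.txt
bafybeibvzgbc5pa3xhs2eh46y6c6as3qeddz7drlz34ceu5id6rridofuu  ./folder2/file2.txt
bafybeifjcv55w5mb55sc5dpi4r2bw4ggidy5gxoc4geomnk7xi2ctmszci  ./folder2/folder3/file3.txt
"""

if __name__ == "__main__":
    entries = parse_listing(SAMPLE)
    for cid in select_cids(entries, "./folder2/*"):
        print(cid)
```

Note that `*` in `fnmatchcase` also matches `/`, so `./folder2/*` selects everything beneath that directory, not just its direct children.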
