support gzip on tarball.ImageFromPath #1858

Open: zhangguanzhang wants to merge 1 commit into main

@zhangguanzhang zhangguanzhang commented Jan 3, 2024

Support gzip-compressed tarballs in tarball.ImageFromPath, such as those produced by docker save alpine | gzip > alpine.tar.gz. For example:
$ docker save alpine -o test.tar
$ docker save alpine | gzip -> test.tar.gz
$ ll -h
total 22M
-rw------- 1 root root  15M Apr 16 09:30 test.tar
-rw-r--r-- 1 root root 6.3M Apr 16 09:31 test.tar.gz

@zhangguanzhang
Author

zhangguanzhang commented Jan 4, 2024

@imjasonh @jonjohnsonjr PTAL

@zhangguanzhang
Author

@imjasonh PTAL

@zhangguanzhang
Author

@jonjohnsonjr PTAL

magicHeader := make([]byte, len(ggzip.MagicHeader))
n, err := f.Read(magicHeader)
if n == 0 && err == io.EOF {
return nil, errors.New("invalid tar header")

Maybe the error should be reported as "invalid tar or tar.gz header" to match the context, since both formats are supported.

Author

done
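
For readers following the review, the format detection being discussed can be sketched with the standard library alone. This is a minimal illustration, not the code in this PR: gzipMagic is a local stand-in for the ggzip.MagicHeader value referenced in the diff, and maybeGunzip and sniffAndRead are hypothetical helper names. It peeks at the first two bytes without consuming them and only wraps the stream in a gzip reader when the signature matches.

```go
package main

import (
	"bufio"
	"bytes"
	"compress/gzip"
	"errors"
	"fmt"
	"io"
)

// gzipMagic is the two-byte signature that starts every gzip stream (RFC 1952).
// It is a local stand-in for the MagicHeader value used in the diff.
var gzipMagic = []byte{0x1f, 0x8b}

// isGzip reports whether header begins with the gzip signature.
func isGzip(header []byte) bool {
	return bytes.HasPrefix(header, gzipMagic)
}

// maybeGunzip peeks at the start of r without consuming it. It returns a
// decompressing reader for gzip input and the buffered reader otherwise.
func maybeGunzip(r io.Reader) (io.Reader, error) {
	br := bufio.NewReader(r)
	header, err := br.Peek(len(gzipMagic))
	if err != nil {
		if errors.Is(err, io.EOF) {
			// Input is too short to be either format.
			return nil, errors.New("invalid tar or tar.gz header")
		}
		return nil, err
	}
	if !isGzip(header) {
		return br, nil
	}
	zr, err := gzip.NewReader(br)
	if err != nil {
		return nil, err
	}
	return zr, nil
}

// sniffAndRead feeds payload (gzip-compressed first when compress is true)
// through maybeGunzip and returns the decoded bytes as a string.
func sniffAndRead(payload []byte, compress bool) (string, error) {
	input := payload
	if compress {
		var buf bytes.Buffer
		zw := gzip.NewWriter(&buf)
		if _, err := zw.Write(payload); err != nil {
			return "", err
		}
		if err := zw.Close(); err != nil {
			return "", err
		}
		input = buf.Bytes()
	}
	r, err := maybeGunzip(bytes.NewReader(input))
	if err != nil {
		return "", err
	}
	out, err := io.ReadAll(r)
	return string(out), err
}

func main() {
	for _, compress := range []bool{false, true} {
		got, err := sniffAndRead([]byte("hello"), compress)
		fmt.Println(compress, got, err)
	}
}
```

Peeking through a bufio.Reader avoids the need to seek back after reading the header, which matters when the input is a stream rather than a file.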

Signed-off-by: zhangguanzhang <zhangguanzhang@qq.com>
@peter-wangxu

peter-wangxu commented Apr 13, 2024

Do you have a chance to benchmark this against the following code?

image, err := tarball.Image(func() (io.ReadCloser, error) {
	file, err := os.Open(imagePath)
	if err != nil {
		return nil, err
	}
	// Wrap the file so that tarball sees the decompressed tar stream.
	// Note: closing the gzip reader does not close the underlying file.
	return gzip.NewReader(file)
}, nil)

The code above also supports the tar.gz format without changing the tarball code, but it is very slow.

I am wondering whether the two approaches differ in performance.

@zhangguanzhang
Author

I have been busy with work recently. You can compile the two binaries on your own machine and benchmark them against each other.

@peter-wangxu

I ran a test, but a simple push of a gzipped image failed:

PS D:\code\go-containerregistry> .\crane-gzip.exe  push  C:\Users\Peter\image.tar.gz 127.0.0.1:10000/hello/crane-test:0.2
2024/04/13 18:42:34 existing blob: sha256:fe5ca62666f04366c8e7f605aa82997d71320183e99962fa76b3209fdfbb8b58
2024/04/13 18:42:34 existing blob: sha256:07a64a71e01156f8f99039bc246149925c6d1480d3957de78510bbec6ec68f7a
2024/04/13 18:42:34 existing blob: sha256:b02a7525f878e61fc1ef8a7405a2cc17f866e8de222c1c98fd6681aff6e509db
2024/04/13 18:42:34 existing blob: sha256:4aa0ea1413d37a58615488592a0b827ea4b2e48fa5a77cf707d0e35f025e613f
2024/04/13 18:42:34 existing blob: sha256:1e3d9b7d145208fa8fa3ee1c9612d0adaac7255f1bbc9ddea7e461e0b317805c
2024/04/13 18:42:34 existing blob: sha256:e8c73c638ae9ec5ad70c49df7e484040d889cca6b4a9af056579c3d058ea93f0
2024/04/13 18:42:34 existing blob: sha256:5627a970d25e752d971a501ec7e35d0d6fdcd4a3ce9e958715a686853024794a
2024/04/13 18:42:34 existing blob: sha256:7c881f9ab25e0d86562a123b5fb56aebf8aa0ddd7d48ef602faf8d1e7cf43d8c
2024/04/13 18:42:34 existing blob: sha256:804bcaae78ee5b94f93193de7a0c3c3f4c55e6f84a76f299c6c0628e35933d9d
2024/04/13 18:42:34 existing blob: sha256:a0eed15eed4498c145ef2f1883fcd300d7adbb759df73c901abd5383dda668e7
2024/04/13 18:42:35 existing blob: sha256:d481ac5b71a6be7b6f1b0715b57a0ee724143b98aa5da9b54a22f9ba2d6cfc98
2024/04/13 18:42:35 existing blob: sha256:fcb6f6d2c9986d9cd6a2ea3cc2936e5fc613e09f1af9042329011e43057f3265
2024/04/13 18:42:36 existing blob: sha256:86b2214f3e425fcd38954cad4285959f086bd6e9f11b07307e3eb82860224441
Error: PUT http://127.0.0.1:10000/v2/hello/crane-test/manifests/0.2: multiple errors returned: MANIFEST_BLOB_UNKNOWN: blob unknown to registry; sha256:54ad2ec71039b74f7e82f020a92a8c2ca45f16a51930d539b56973a18b8ffe8d; MANIFEST_BLOB_UNKNOWN: blob unknown to registry; sha256:6fbdf253bbc2490dcfede5bdb58ca0db63ee8aff565f6ea9f918f3bce9e2d5aa; MANIFEST_BLOB_UNKNOWN: blob unknown to registry; sha256:7bea6b893187b14fc0a759fe5f8972d1292a9c0554c87cbf485f0947c26b8a05; MANIFEST_BLOB_UNKNOWN: blob unknown to registry; sha256:ff5700ec54186528cbae40f54c24b1a34fb7c01527beaa1232868c16e2353f52; MANIFEST_BLOB_UNKNOWN: blob unknown to registry; sha256:d52f02c6501c9c4410568f0bf6ff30d30d8290f57794c308fe36ea78393afac2; MANIFEST_BLOB_UNKNOWN: blob unknown to registry; sha256:e624a5370eca2b8266e74d179326e2a8767d361db14d13edd9fb57e408731784; MANIFEST_BLOB_UNKNOWN: blob unknown to registry; sha256:1a73b54f556b477f0a8b939d13c504a3b4f4db71f7a09c63afbc10acb3de5849; MANIFEST_BLOB_UNKNOWN: blob unknown to registry; sha256:d2d7ec0f6756eb51cf1602c6f8ac4dd811d3d052661142e0110357bf0b581457; MANIFEST_BLOB_UNKNOWN: blob unknown to registry; sha256:4cb10dd2545bd173858450b80853b850e49608260f1a0789e0d0b39edf12f500; MANIFEST_BLOB_UNKNOWN: blob unknown to registry; sha256:5185a177ceb5de87c52c72d5704ad35976a413b25e14a96149f917e8ed29aedc; MANIFEST_BLOB_UNKNOWN: blob unknown to registry; sha256:f9e13b6fcbbe89bc58ca4479888d3a427b4fa468ea750fac2c9a47db7d923d13; MANIFEST_BLOB_UNKNOWN: blob unknown to registry; sha256:8247c33dd8dcfc2cedd8a00591c6809676cafb8e0da44b6dbc9aed8525b560e8

I am sure the file is a valid gzipped Docker image:

$ file image.tar.gz
image.tar.gz: gzip compressed data, was "image.tar", last modified: Sat Apr 13 10:23:29 2024, from Unix, original size modulo 2^32 148959232

Peter@ASUS-TUF-GAMING MINGW64 ~
$ docker load -i image.tar.gz
Loaded image: registry.k8s.io/etcd:3.5.10-0

@zhangguanzhang
Author

What about running the case from my unit tests on Linux?

@peter-wangxu

peter-wangxu commented Apr 14, 2024

I rebased (onto main or your branch) and everything seems to work fine now :)

@jonjohnsonjr
Collaborator

Why do you want this change? It doesn't make a lot of sense to me.

The output of docker save is a tarball that contains mostly gzipped tarballs already. There are a few JSON files, but there's really no point in gzipping it.

$ docker save alpine > alpine.tar
$ docker save alpine | gzip > alpine.tar.gz

$ ls -l alpine*
-rw-r--r--@ 1 jonjohnson  staff  3359744 Apr 15 10:43 alpine.tar
-rw-r--r--@ 1 jonjohnson  staff  3331750 Apr 15 10:44 alpine.tar.gz

Since this code currently re-opens the stream with every file access, this is going to be really really slow. It would be better for callers to decompress it themselves first.
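
The cost described above can be illustrated with a small standard-library sketch. It is only a model under stated assumptions, not the tarball package's actual code: buildTarGz, readFile, and countOpens are hypothetical names, and the opener callback merely mimics the shape of a function that returns a fresh stream. Each lookup opens a new gzip stream and scans the archive from the beginning, so reading N files means N opens and N decompression passes, because a gzip stream cannot be seeked.

```go
package main

import (
	"archive/tar"
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// buildTarGz returns an in-memory gzip-compressed tar holding the given files.
func buildTarGz(files map[string]string) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	tw := tar.NewWriter(zw)
	for name, body := range files {
		hdr := &tar.Header{Name: name, Typeflag: tar.TypeReg, Mode: 0o600, Size: int64(len(body))}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := tw.Write([]byte(body)); err != nil {
			return nil, err
		}
	}
	if err := tw.Close(); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// readFile opens a fresh stream and scans the archive from the start until it
// finds name. An io.EOF error here means the entry was not found.
func readFile(open func() (io.ReadCloser, error), name string) ([]byte, error) {
	rc, err := open()
	if err != nil {
		return nil, err
	}
	defer rc.Close()
	tr := tar.NewReader(rc)
	for {
		hdr, err := tr.Next()
		if err != nil {
			return nil, err
		}
		if hdr.Name == name {
			return io.ReadAll(tr)
		}
	}
}

// countOpens reads every file through readFile and reports how many times the
// compressed stream had to be opened (and therefore decompressed from byte 0).
func countOpens(files map[string]string) (int, error) {
	data, err := buildTarGz(files)
	if err != nil {
		return 0, err
	}
	opens := 0
	open := func() (io.ReadCloser, error) {
		opens++
		zr, err := gzip.NewReader(bytes.NewReader(data))
		if err != nil {
			return nil, err
		}
		return zr, nil
	}
	for name, want := range files {
		got, err := readFile(open, name)
		if err != nil {
			return opens, err
		}
		if string(got) != want {
			return opens, fmt.Errorf("content mismatch for %s", name)
		}
	}
	return opens, nil
}

func main() {
	opens, err := countOpens(map[string]string{"manifest.json": "[]", "a/layer.tar": "aaa", "b/layer.tar": "bbb"})
	fmt.Println("opens:", opens, "err:", err)
}
```

With an uncompressed tar on disk, a reader can at least skip over entry bodies cheaply; with gzip in front, every skipped byte still has to be inflated.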

I have been experimenting with some code that would make this a little bit less slow, but I'd rather avoid complex stuff like that if we can.

Can you describe why you need this?
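
As a sketch of the "decompress it first" suggestion, a caller could inflate the archive once into a temporary file and work with the plain tarball from then on. This uses only the standard library; gunzipToTemp and roundTripViaTemp are hypothetical helper names, and the temp-file naming pattern is an arbitrary choice.

```go
package main

import (
	"compress/gzip"
	"fmt"
	"io"
	"os"
)

// gunzipToTemp decompresses the gzip file at src into a new temporary file
// and returns that file's path. The caller is responsible for removing it.
func gunzipToTemp(src string) (string, error) {
	in, err := os.Open(src)
	if err != nil {
		return "", err
	}
	defer in.Close()

	zr, err := gzip.NewReader(in)
	if err != nil {
		return "", err
	}
	defer zr.Close()

	out, err := os.CreateTemp("", "image-*.tar")
	if err != nil {
		return "", err
	}
	if _, err := io.Copy(out, zr); err != nil {
		out.Close()
		os.Remove(out.Name())
		return "", err
	}
	if err := out.Close(); err != nil {
		os.Remove(out.Name())
		return "", err
	}
	return out.Name(), nil
}

// roundTripViaTemp writes payload to a temporary gzip file, decompresses it
// with gunzipToTemp, and returns the recovered contents. It exists only to
// exercise gunzipToTemp without needing a real image archive.
func roundTripViaTemp(payload string) (string, error) {
	gz, err := os.CreateTemp("", "image-*.tar.gz")
	if err != nil {
		return "", err
	}
	defer os.Remove(gz.Name())

	zw := gzip.NewWriter(gz)
	if _, err := zw.Write([]byte(payload)); err != nil {
		gz.Close()
		return "", err
	}
	if err := zw.Close(); err != nil {
		gz.Close()
		return "", err
	}
	if err := gz.Close(); err != nil {
		return "", err
	}

	plain, err := gunzipToTemp(gz.Name())
	if err != nil {
		return "", err
	}
	defer os.Remove(plain)

	data, err := os.ReadFile(plain)
	return string(data), err
}

func main() {
	got, err := roundTripViaTemp("not a real image, just bytes")
	fmt.Println(got, err)
}
```

The resulting path could then be handed to tarball.ImageFromPath as an ordinary uncompressed tarball, so the decompression cost is paid once rather than on every file access. The trade-off is the extra disk space for the temporary file.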

@zhangguanzhang
Author

zhangguanzhang commented Apr 16, 2024

Thanks for the reply.
This is a typical trade-off between disk space and CPU time. Often the final file is expected to be as small as possible, as in the linked issue. Here is a demo:

$ docker save alpine -o test.tar
$ docker save alpine | gzip -> test.tar.gz
$ ll -h
total 22M
-rw------- 1 root root  15M Apr 16 09:30 test.tar
-rw-r--r-- 1 root root 6.3M Apr 16 09:31 test.tar.gz
$ file test.tar*
test.tar:    POSIX tar archive
test.tar.gz: gzip compressed data, from Unix, original size modulo 2^32 15634944

The size difference grows with larger images. My implementation may have a performance problem, but this is indeed a very common scenario.
