Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

gix fails to decode a tree object that git has no trouble with #1096

Open
bradlarsen opened this issue Nov 8, 2023 · 13 comments
Open

gix fails to decode a tree object that git has no trouble with #1096

bradlarsen opened this issue Nov 8, 2023 · 13 comments
Assignees
Labels
acknowledged an issue is accepted as shortcoming to be fixed feedback requested

Comments

@bradlarsen
Copy link

Current behavior 馃槸

There are certain Git repositories that contain tree objects that gix fails to decode, but which git itself has no trouble with.

For example, from a test program included in this issue:

Finding tree 7dde39c37eea1cc6f81a04ba50e2ef55333c6966 from repo at /disk1/clones/jsbin/jsbin
Err(Decode(Error { inner: VerboseError { errors: [("040000 css\0\xA7E/\xD7菭\x91S\x96p\xF2\xE8S甩S\x98@-\xDF100644 favicon.ico\0p\x06\x9B\xADB\x8A\x93\xD4V\xCD\x15\x93\x9D\xB2\xF1\xD66v4\x90100644 help.html\0\xBDp\xC6M$\xC7\u{1e}D\xC8@OZ>\\\xA5TU\xAB\xAE\xE5040000 help\0\xE7eo7rg g\n\x18_:\x12\xE3<\xD2!\xFF\xF9\x82040000 images\0d\x97\xB2\x82C\xAF%n\t)\xDB\x05\xD0
q\xE2\xC3\x0c\xBD\xE1C100644 index.php\0\xFF\r\x08\xC4\x0b\xCF \xFB\xEE:]\x7fQ訅b\xEC1\xD6\u{1c}40000 js\0^\"\xE0P\x94\xA0bi4\xCB/s\x19\xE9-\xA2\xD3C\xE7&040000 lib\0\xAB;\xBF\xBC\x88)Y\xE8p\n=\r\xF3\xAF\xCFH\x13\xB6\x0e\x8C100644 phpMake\0<\xB0\xFF\x96Uvq\xB4\xBB\xC6,\xC1!\x17\xC78\x99<j\xE5100644 sprocketize.php\0\x82L\xD6\u{1f}\xD8\x13\xC8\x0bq\x16\x0f\xF9\x86
`5no\x9B\x10b", Nom(Eof))] } }))

Additionally, it appears that the verbose-object-parsing-errors feature of gix-object is broken when using gix newer than 0.51.0 and gix-object newer than 0.33. Rather than getting a detailed error message when an object fails to parse, the error message is None, or rendered as an empty string. This seems to coincide with switching away from nom for object parsing.

Expected behavior 馃

gix should correctly parse the tree objects.

Git behavior

git seems to have no trouble!

Steps to reproduce 馃暪

For example, see https://github.com/jsbin/jsbin, which has a few problematic trees:

  • 5e22e05094a0626934cb2f7319e92da2d343e726
  • 7dde39c37eea1cc6f81a04ba50e2ef55333c6966
  • ad20e1886a324465093f656a0910b9fe00429977
  • e0d917771d3c3ffb7782158d9f33b2b7e9c6c524

Locally, I cloned the repo to /disk1/clones/jsbin/jsbin using git clone --mirror.

We can inspect the second problematic tree (7dde39c37eea1cc6f81a04ba50e2ef55333c6966) with Git. It is the root tree of this commit: jsbin/jsbin@884c72e

$ (cd /disk1/clones/jsbin/jsbin && git cat-file -p 7dde39c37eea1cc6f81a04ba50e2ef55333c6966)
100644 blob 996ff49cd001cdbae32514ac195edb2eebacdcd0    .gitignore
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391    .gitmodules
100644 blob e5feb64e6839795c0d47ba0476b34cbb44d406dd    .htaccess
100644 blob e76d29c74827f2766941518789b74f19afb6b6d5    MIT-LICENSE.TXT
100644 blob 47c2818071f4e8c03994dea6090148ef8aca228c    README.markdown
100644 blob f5a2e73a21171b24343e4dfc44da3a4a328fbf0e    about.html
100644 blob 4ff6443cf419b2a255015ec2237f135330cf2fdd    app.php
100644 blob 21ea174a8fc08bdceb0dca1b4bb54ee159371995    config.php
040000 tree a7452fd7c7a091539670f2e853cba65398402ddf    css
100644 blob 70069bad428a93d456cd15939db2f1d636763490    favicon.ico
100644 blob bd70c64d24c71e44c8404f5a3e5ca55455abaee5    help.html
040000 tree e7656f37726720670a185f3a12e33cd221fff982    help
040000 tree 6497b28243af256e0929db05d071e2c30cbde143    images
100644 blob ff0d08c40bcf20fbee3a5d7f51d38962ec31d61c    index.php
040000 tree 5e22e05094a0626934cb2f7319e92da2d343e726    js
040000 tree ab3bbfbc882959e8700a3d0df3afcf4813b60e8c    lib
100644 blob 3cb0ff96557671b4bbc62cc12117c738993c6ae5    phpMake
100644 blob 824cd61fd813c80b71160ff98660356e6f9b1062    sprocketize.php

Git says its payload is 659 bytes long:

$ (cd /disk1/clones/jsbin/jsbin && git cat-file -s 7dde39c37eea1cc6f81a04ba50e2ef55333c6966)
659

I wrote a small gix test program that demonstrates the problem.

Cargo.toml:

[package]
name = "gix-tree-bug"
version = "0.1.0"
edition = "2021"

[dependencies]

# Note that the `verbose-object-parsing-errors` feature is broken in gix 0.52 + gix-object 0.35 and later: error messages come out as `None`, rendered as empty strings.
# The feature is not broken in gix 0.51 + gix-object 0.33.
gix = { version = "0.51" }
gix-object = { version = "0.33", features = ["verbose-object-parsing-errors"] }

src/main.rs:

// This example program shows a failure to parse a tree object using gix.
//
// The repo at `https://github.com/jsbin/jsbin` has several trees that fail to parse:
//
//   - 5e22e05094a0626934cb2f7319e92da2d343e726
//   - 7dde39c37eea1cc6f81a04ba50e2ef55333c6966
//   - ad20e1886a324465093f656a0910b9fe00429977
//   - e0d917771d3c3ffb7782158d9f33b2b7e9c6c524
//
// Note that Git itself does not have issues working with these tree objects.

use gix::prelude::*;

fn main() {
    let args: Vec<String> = std::env::args().collect();

    if args.len() != 3 {
        eprintln!("Usage: {} REPO_PATH TREE_OID", args[0]);
        return ();
    }

    let repo_path = &args[1];
    let tree_oid = &args[2];
    let tree_oid = gix::ObjectId::from_hex(tree_oid.as_bytes()).expect("Should be able to parse tree oid");

    println!("Finding tree {tree_oid} from repo at {repo_path}");

    let opts = gix::open::Options::isolated().open_path_as_is(true);
    let repo = gix::open_opts(repo_path, opts).expect("Should be able to open repository");

    // First, try getting the tree via the odb APIs
    {
        let odb = &repo.objects;
        let mut scratch: Vec<u8> = Vec::new();
        let res_odb = odb.find_tree(tree_oid, &mut scratch);
        println!("{res_odb:?}");

    }

    // Next, try getting the tree via the repo APIs
    {
        let res_repo = repo.find_object(tree_oid);
        println!("{res_repo:?}");

        let obj = res_repo.unwrap().try_into_tree().unwrap();
        println!("{obj:?}");

        println!("{}", obj.data.len());
        println!("{:?}", obj.data);

        let decoded = obj.decode();
        println!("{decoded:?}");
    }
}

Running this for the second bad tree object noted above, we get this:

$ cargo run -- /disk1/clones/jsbin/jsbin 7dde39c37eea1cc6f81a04ba50e2ef55333c6966
    Finished dev [unoptimized + debuginfo] target(s) in 0.08s
     Running `target/debug/gix-tree-bug /disk1/clones/jsbin/jsbin 7dde39c37eea1cc6f81a04ba50e2ef55333c6966`
Finding tree 7dde39c37eea1cc6f81a04ba50e2ef55333c6966 from repo at /disk1/clones/jsbin/jsbin
Err(Decode(Error { inner: VerboseError { errors: [("040000 css\0\xA7E/\xD7菭\x91S\x96p\xF2\xE8S甩S\x98@-\xDF100644 favicon.ico\0p\x06\x9B\xADB\x8A\x93\xD4V\xCD\x15\x93\x9D\xB2\xF1\xD66v4\x90100644 help.html\0\xBDp\xC6M$\xC7\u{1e}D\xC8@OZ>\\\xA5TU\xAB\xAE\xE5040000 help\0\xE7eo7rg g\n\x18_:\x12\xE3<\xD2!\xFF\xF9\x82040000 images\0d\x97\xB2\x82C\xAF%n\t)\xDB\x05\xD0q\xE2\xC3\x0c\xBD\xE1C100644 index.php\0\xFF\r\x08\xC4\x0b\xCF \xFB\xEE:]\x7fQ訅b\xEC1\xD6\u{1c}40000 js\0^\"\xE0P\x94\xA0bi4\xCB/s\x19\xE9-\xA2\xD3C\xE7&040000 lib\0\xAB;\xBF\xBC\x88)Y\xE8p\n=\r\xF3\xAF\xCFH\x13\xB6\x0e\x8C100644 phpMake\0<\xB0\xFF\x96Uvq\xB4\xBB\xC6,\xC1!\x17\xC78\x99<j\xE5100644 sprocketize.php\0\x82L\xD6\u{1f}\xD8\x13\xC8\x0bq\x16\x0f\xF9\x86`5no\x9B\x10b", Nom(Eof))] } }))
Ok(Tree(7dde39c37eea1cc6f81a04ba50e2ef55333c6966))
Tree(7dde39c37eea1cc6f81a04ba50e2ef55333c6966)
659
[49, 48, 48, 54, 52, 52, 32, 46, 103, 105, 116, 105, 103, 110, 111, 114, 101, 0, 153, 111, 244, 156, 208, 1, 205, 186, 227, 37, 20, 172, 25, 94, 219, 46, 235, 172, 220, 208, 49, 48, 48, 54, 52, 52, 32, 46, 103, 105, 116, 109, 111, 100, 117, 108, 101, 115, 0, 230, 157, 226, 155, 178, 209, 214, 67, 75, 139, 41, 174, 119, 90, 216, 194, 228, 140, 83, 145, 49, 48, 48, 54, 52, 52, 32, 46, 104, 116, 97, 99, 99, 101, 115, 115, 0, 229, 254, 182, 78, 104, 57, 121, 92, 13, 71, 186, 4, 118, 179, 76, 187, 68, 212, 6, 221, 49, 48, 48, 54, 52, 52, 32, 77, 73, 84, 45, 76, 73, 67, 69, 78, 83, 69, 46, 84, 88, 84, 0, 231, 109, 41, 199, 72, 39, 242, 118, 105, 65, 81, 135, 137, 183, 79, 25, 175, 182, 182, 213, 49, 48, 48, 54, 52, 52, 32, 82, 69, 65, 68, 77, 69, 46, 109, 97, 114, 107, 100, 111, 119, 110, 0, 71, 194, 129, 128, 113, 244, 232, 192, 57, 148, 222, 166, 9, 1, 72, 239, 138, 202, 34, 140, 49, 48, 48, 54, 52, 52, 32, 97, 98, 111, 117, 116, 46, 104, 116, 109, 108, 0, 245, 162, 231, 58, 33, 23, 27, 36, 52, 62, 77, 252, 68, 218, 58, 74, 50, 143, 191, 14, 49, 48, 48, 54, 52, 52, 32, 97, 112, 112, 46, 112, 104, 112, 0, 79, 246, 68, 60, 244, 25, 178, 162, 85, 1, 94, 194, 35, 127, 19, 83, 48, 207, 47, 221, 49, 48, 48, 54, 52, 52, 32, 99, 111, 110, 102, 105, 103, 46, 112, 104, 112, 0, 33, 234, 23, 74, 143, 192, 139, 220, 235, 13, 202, 27, 75, 181, 78, 225, 89, 55, 25, 149, 48, 52, 48, 48, 48, 48, 32, 99, 115, 115, 0, 167, 69, 47, 215, 199, 160, 145, 83, 150, 112, 242, 232, 83, 203, 166, 83, 152, 64, 45, 223, 49, 48, 48, 54, 52, 52, 32, 102, 97, 118, 105, 99, 111, 110, 46, 105, 99, 111, 0, 112, 6, 155, 173, 66, 138, 147, 212, 86, 205, 21, 147, 157, 178, 241, 214, 54, 118, 52, 144, 49, 48, 48, 54, 52, 52, 32, 104, 101, 108, 112, 46, 104, 116, 109, 108, 0, 189, 112, 198, 77, 36, 199, 30, 68, 200, 64, 79, 90, 62, 92, 165, 84, 85, 171, 174, 229, 48, 52, 48, 48, 48, 48, 32, 104, 101, 108, 112, 0, 231, 101, 111, 55, 114, 103, 32, 103, 10, 24, 95, 58, 18, 227, 60, 210, 33, 255, 249, 130, 48, 52, 48, 48, 48, 48, 32, 105, 109, 97, 103, 101, 115, 0, 100, 151, 178, 130, 67, 175, 37, 110, 9, 41, 219, 5, 208, 113, 226, 195, 12, 189, 225, 67, 49, 48, 48, 54, 52, 52, 32, 105, 110, 100, 101, 120, 46, 112, 104, 112, 0, 255, 13, 8, 196, 11, 207, 32, 251, 238, 58, 93, 127, 81, 211, 137, 98, 236, 49, 214, 28, 52, 48, 48, 48, 48, 32, 106, 115, 0, 94, 34, 224, 80, 148, 160, 98, 105, 52, 203, 47, 115, 25, 233, 45, 162, 211, 67, 231, 38, 48, 52, 48, 48, 48, 48, 32, 108, 105, 98, 0, 171, 59, 191, 188, 136, 41, 89, 232, 112, 10, 61, 13, 243, 175, 207, 72, 19, 182, 14, 140, 49, 48, 48, 54, 52, 52, 32, 112, 104, 112, 77, 97, 107, 101, 0, 60, 176, 255, 150, 85, 118, 113, 180, 187, 198, 44, 193, 33, 23, 199, 56, 153, 60, 106, 229, 49, 48, 48, 54, 52, 52, 32, 115, 112, 114, 111, 99, 107, 101, 116, 105, 122, 101, 46, 112, 104, 112, 0, 130, 76, 214, 31, 216, 19, 200, 11, 113, 22, 15, 249, 134, 96, 53, 110, 111, 155, 16, 98]
Err(Error { inner: VerboseError { errors: [("040000 css\0\xA7E/\xD7菭\x91S\x96p\xF2\xE8S甩S\x98@-\xDF100644 favicon.ico\0p\x06\x9B\xADB\x8A\x93\xD4V\xCD\x15\x93\x9D\xB2\xF1\xD66v4\x90100644 help.html\0\xBDp\xC6M$\xC7\u{1e}D\xC8@OZ>\\\xA5TU\xAB\xAE\xE5040000 help\0\xE7eo7rg g\n\x18_:\x12\xE3<\xD2!\xFF\xF9\x82040000 images\0d\x97\xB2\x82C\xAF%n\t)\xDB\x05\xD0q\xE2\xC3\x0c\xBD\xE1C100644 index.php\0\xFF\r\x08\xC4\x0b\xCF \xFB\xEE:]\x7fQ訅b\xEC1\xD6\u{1c}40000 js\0^\"\xE0P\x94\xA0bi4\xCB/s\x19\xE9-\xA2\xD3C\xE7&040000 lib\0\xAB;\xBF\xBC\x88)Y\xE8p\n=\r\xF3\xAF\xCFH\x13\xB6\x0e\x8C100644 phpMake\0<\xB0\xFF\x96Uvq\xB4\xBB\xC6,\xC1!\x17\xC78\x99<j\xE5100644 sprocketize.php\0\x82L\xD6\u{1f}\xD8\x13\xC8\x0bq\x16\x0f\xF9\x86`5no\x9B\x10b", Nom(Eof))] } })
@bradlarsen
Copy link
Author

Note, this feels similar to the problem I reported in #950.

@bradlarsen
Copy link
Author

I have seen this sort of tree parsing error arise with gix several hundred times when examining the content of ~7500 Git repositories using Nosey Parker. They probably are all failing to decode in the same way.

@Byron
Copy link
Owner

Byron commented Nov 9, 2023

Thanks a lot for this fantastic report! I see how much time you must have spent not only to compile it, but also to debug the issue at hand down to this point.

I also see how there is two issues here, one regarding verbose-object-parsing-errors not working in gix-object, and the other about trees not being able to be parsed.

For the first of the two, a test was always missing, presumably, so it was missed during the winnow conversion, and I am already curious what surprise could lead to trees not being parsed correctly. After all, they truly have the simplest format.

I will definitely keep you posted about the progress on this matter.

@Byron Byron added the acknowledged an issue is accepted as shortcoming to be fixed label Nov 9, 2023
@Byron Byron self-assigned this Nov 9, 2023
Byron added a commit that referenced this issue Nov 9, 2023
Fix the slow and the fast-path tree-parsers to be able to cope
with a greater variety of trees.
@Byron
Copy link
Owner

Byron commented Nov 9, 2023

Could you also elaborate where find_tree is called, or where in NoseyParker it runs into this issue?

I am asking because doing so allocates a vector, and maybe the Iter version of this will suffice as well to avoid the allocation - it could be 2.5x faster then.

Byron added a commit that referenced this issue Nov 9, 2023
Fix the slow and the fast-path tree-parsers to be able to cope
with a greater variety of trees.
Byron added a commit that referenced this issue Nov 9, 2023
Fix the slow and the fast-path tree-parsers to be able to cope
with a greater variety of trees.
@Byron
Copy link
Owner

Byron commented Nov 9, 2023

The fix is now available in the linked PR, and it should land in main soon as well.

It would be great if you could validate it's working for all repositories now, and if not I am happy for more samples that won't parse.

Thank you.

@bradlarsen
Copy link
Author

Could you also elaborate where find_tree is called, or where in NoseyParker it runs into this issue?

I am asking because doing so allocates a vector, and maybe the Iter version of this will suffice as well to avoid the allocation - it could be 2.5x faster then.

The specific call is in here. The code iterates over the object IDs in the ODB, reads each header, and switches on object type. In the tree-handling case, odb.find_tree is called using a scratch buffer that is allocated outside the loop.

What's this Iter version you mention?

Some background:

Nosey Parker detects secrets in Git repos by scanning each blob.

Since its v0.14.0 release in August, it now additionally investigates commit and tree objects within Git repositories in order to calculate detailed metadata for each blob it scans. This metadata for each blob includes the commit that first introduced it and the pathname it appeared with. It uses a novel graph algorithm (?) to efficiently determine this metadata for all blobs, many orders of magnitude faster than you can with regular git commands.

The big idea in the algorithm is to build an in-memory representation of the inverted Git commit graph (allowing quick lookup of child commits), then traverse that in topological order, determining for each commit the set of blobs that were first introduced there, with respect to all other commits in the repository.

The implementation uses an adaptation of Kahn鈥檚 algorithm for topological traversal, combined with a priority queue ordered by commit node out-degree. The priority queue approach heuristically minimizes memory use by keeping the work list of nodes that still need to be visited small. Additionally, the implementation uses data representations that provide good CPU cache behavior and constant factors, such as compressed bitmaps to keep track of which blobs have been seen at each commit.

@bradlarsen
Copy link
Author

@Byron thank you for the very fast response and patch!

What was the root of the problem? Was that tree object badly formed somehow?

I've tested Nosey Parker with the main branch of gix, and that reduces the number of warnings about trees that failed to parse, but I still see a few.

From https://github.com/xilense/Home-AssistantConfig:

04b414cb43033d8a97ef92100f4af19bb419f88f
06d9a45dcfcf0d938240f43fefc9990a25efee80
0839676ca0bc4c5bbe6d8ce772cece8e316b9e8f
0a5965092a8cf976fc984a04b608daa7a1fad05d
0a5fe8ec7728b76fc69c408d2de657bc01f8ec9f
0bcdafad3d373b39f3cfa3e1278ae2d05e353ca8
0ceecead3b46d9f8c2d753cb4bee4957955f3abf
160b4512a93583b8bcc3bb1064d2b1eb51759c7a
1847f1b0b4a6424cb15296f5505acd2720d8eeba
1e3e1343e99fd8c4448357d5fb1a82427e374790
1f2ce61f8db31c6284572509fb8ea11932aa4779
27b6a959ab9c3ca50fe9474bb6280ab339a7ea6f
280d1cd88aab93e3d80098340e2b9dc653a9087a
34018453297e23d63b0ea6ee5faa8a2ff117cea6
35cb2329ffbd1a02f18423026ef1c2f6aeb9607e
3ebb555922bffdcabb8b5e3695ab4dd379d1786f
4dc8ad60994fdaf720ad11b6e6fc9edb9b8f2e6f
4f71b80898f86e87f909aa3910560430c1f87734
4fc02d36b9ada1a540ae7c7a391fd0d7b6f1aff1
523751ac99f5c0569665a7023c9a10ee920126b3
56cd766a03d309033d3e8f7461413ca94f3cfa9d
604d88d7dcbd09830b76ea28408d950cdd139eec
6319b401421a883e741aa0a560dd19f1f71201ab
6aaa446307c99419b58ed044f7a2e10291bcd3f2
6c398b39f707fac3abdedcc8035238cf9d87215f
6e34f8682997ec0374698b9c3e5b6bac45d1c9b0
6fa897cc55b09d2d6bb7db8a97b1359d1aee4339
7416819d785bd3a49a9ad0ad7b3739c4e3a28210
775a527b4797a47b417a90ac4e34d90082f6a296
8d8d38da677ac2f5ab89d9a95a1431d69a40fa93
90a1ba6add24929060e98b218f925f045bf3b349
9add112007d208f22dffaeef93a9e613f0706063
a63b18121c403b16d4a1e1657f564ef2566004b7
a890291c9b89ef432b3819271c0dab028f12128c
aad6c296c49388f83214315017118d54fe461e45
ad10501c1759f0f2be86dc3dae4d8f260b32af35
b080b4150d566f32995446e93b186cb1fc1a6791
b18418a478e08aa1dafbcd8bf377adb539575c8f
b25522729c3edf7a672f4112457b3736e221671f
bfbf77661b6ed14cbdd931e078565fb1b5881aa1
c65a59a3174e365170bd8ac6978d07db0adf7412
cd85c0cd9b01a99ad95368d5e0ebaca500fe1363
cf6b28bb633b430ee7271eaaff503df4c9e221d9
da1eebe232c6712fe6c05c5e5c1785951699121e
da80d3e09a73967abb833f9992361dcfe9177081
da87ef555b3493e7e6c423816ed5384d751cdf68
de2f41917e7909561c44d1570fb31da1b9dcdb2b
e068f5ec97fc2d5b0fce7148bab83129ac3324e2
e41c01d52df382aea7ad1991965bb6391932f6db
e8fa1098f8df5575999eb33a4433d7cfc4572fe1
ee35bba3ae50d1f5ac89138d45969a1d070a4264
f196ff373d8c07be2529d81c038f7115ebd58c2f
f8b8fe751b6d038edc381571f00fc65eb902a0b0
f938fa761e46b85a8a3cd847b05ee6400e43eb7c
feb85c0e834ef430434d0331ee0fd9448bd6c4c9

From https://github.com/akram/openclassifieds2:

4fc35139d3bb25c2108e45442ba8064dbefbe22a
656aafda80e497a3dd833d4b9d7854b03829aa97
bb7e32fd8be7e0ed0c9e5440789c050622c715fa
f62f6c1510e608a99a34cfb4323039522fe38df4

@Byron
Copy link
Owner

Byron commented Nov 10, 2023

The specific call is in here. The code iterates over the object IDs in the ODB, reads each header, and switches on object type. In the tree-handling case, odb.find_tree is called using a scratch buffer that is allocated outside the loop.

What's this Iter version you mention?

In this line it shows that ultimately the code wants to iterate entries, so there is no need to allocate a Vec of these.

Instead, TreeRefIter can be used to parse each entry on the fly, which also has the advantage that you'd get at least some of the entries until the first parse failure.

Thanks so much for sharing the algorithmic approach that's implemented in NoseyParker, I am loving it! Even though it's way above my head, it gives me hope that some of these algorithms can also be used to speed up gitoxide in some places where they are applicable.

What was the root of the problem? Was that tree object badly formed somehow?

I don't know actually. There were two parser implementations, one with winnow that is based on the nom version before it, and a fast-path which is just a manual implementation. The latter worked out of the box, so I just removed the former entirely. Trees are very simple objects and probably best parsed 'by hand'.

Thus I am curious what these other failing trees look like - I will let you know once I know more.

@Byron
Copy link
Owner

Byron commented Nov 10, 2023

I did the fix properly this time and fixed another shortcoming that would have bitten some other time, I am sure. With that debt removed, I'd hope these kinds of failures are nothing you will see ever again.
Let's see what you find out though :D.

@bradlarsen
Copy link
Author

I have updated a local development copy of Nosey Parker to use the latest gix from main, with your additional fix in 8d05cae. I had to make a small change there to get the code compiling again for the small API changes (EntryKind vs EntryMode).

Anyway, I ran over my 7500 input repos again, and this time saw no tree parsing errors. Thank you @Byron! I'll get those changes on the next gix release.

@Byron
Copy link
Owner

Byron commented Nov 15, 2023

I am glad to hear that!

I had to make a small change there to get the code compiling again for the small API changes (EntryKind vs EntryMode).

Yes, those really had to happen to remove the hopefully last 'wart' from one of the earliest crates in the family. These sometimes have terrible shortcuts in them as back then, my own quality standards for what was a pet-project were much lower.

@bradlarsen
Copy link
Author

I've also switched to using odb.find_tree_ref_iter instead of odb.find_tree. Every performance-conscious detail like this matters!

@Byron
Copy link
Owner

Byron commented Nov 15, 2023

Absolutely! If you ever run into a profile that indicates something in gix can be improved, please don't hesitate to let me know or contribute the improvement right away - wasting resources is something gix definitely doesn't fancy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
acknowledged an issue is accepted as shortcoming to be fixed feedback requested
Projects
None yet
Development

No branches or pull requests

2 participants