Summary 💡

I have working code. However, I would like to add tracking through symlinks. Git models a symlink as a blob containing the target path, with a mode bit marking the entry as a symlink. This means that if commit 1 creates symlink A pointing to file B, and commit 2 then modifies B, the diff only shows a modification of B, not of A. But I want to treat that as a modification of A for the purpose of deciding whether the user's regex matches the commit.

Further complicating things, branches point to the tip of history, so the natural way to iterate is backwards, but to incrementally build up symlink state from scratch we would need to start at the beginning of the repo's history. So my plan instead is:
The vast majority of commits do not modify symlinks, so the number of distinct hash-table states should be small enough to fit in memory. At the end, every commit would have one of these tables associated with it, so that when I compute diffs I can look up whether any of the changed files have a regex-matching symlink pointing at them.

Questions:
Motivation 🔦

The code base I'm running this on switched, in the middle of its history, from files that are edited in place to a crazy rat's nest of symlinks :(
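For illustration, the table-interning part of the plan above can be sketched in plain Python. All names here are hypothetical and no git library is involved; real code would feed `symlink_changes` from the actual commit walk and drive `logical_changes` from the diff machinery:

```python
def intern(table, seen):
    """Return a canonical shared instance of this symlink table,
    so commits with identical symlink state share one object."""
    key = frozenset(table.items())
    if key not in seen:
        seen[key] = dict(table)
    return seen[key]

def assign_tables(commits):
    """commits: oldest-first list of dicts; each may carry
    'symlink_changes': {symlink_path: target_or_None} (None = deleted).
    Returns {commit_id: shared symlink table}."""
    seen, current, out = {}, {}, {}
    for c in commits:
        changes = c.get("symlink_changes", {})
        if changes:
            current = dict(current)  # copy only when symlinks actually change
            for path, target in changes.items():
                if target is None:
                    current.pop(path, None)
                else:
                    current[path] = target
        out[c["id"]] = intern(current, seen)
    return out

def logical_changes(changed_paths, table):
    """Expand a diff's changed paths with symlinks pointing at them."""
    reverse = {}
    for link, target in table.items():
        reverse.setdefault(target, set()).add(link)
    result = set(changed_paths)
    for p in changed_paths:
        result |= reverse.get(p, set())
    return result
```

With this, the A/B scenario above behaves as desired: after commit 1 creates symlink A -> B, a later commit touching only B expands to a logical change of both A and B, so the regex can be matched against A as well.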
(converted issue to Q&A as I don't think it's actionable as an issue)
Gaining a 10x speedup per core seems to be an indication that the general strategy isn't the worst! That said, an object cache might be useful to avoid having to decode the same object multiple times if you are not using one already.
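To illustrate the effect of an object cache (this is a stdlib sketch of the idea, not gix's cache API): wrap the decode step in an LRU cache keyed by object id, so a hot object is decoded at most once while it stays in the cache.

```python
from functools import lru_cache

def make_cached_decoder(decode, maxsize=4096):
    """Wrap `decode(oid) -> decoded object` in an LRU cache keyed by oid,
    so repeated lookups of the same object skip the decode step."""
    return lru_cache(maxsize=maxsize)(decode)
```

The real win shows up when diffing many commits that share most of their trees, since the same tree objects are looked up over and over.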
As the iterator inherently flattens the commit graph, there doesn't seem to be an easy way to get the control you need without at least dropping down to a lower level. Maybe in doing so you would discover some sort of pattern that could be provided by the base implementation as well, allowing an actual enhancement to the library.
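One way such a lower-level walk could recover the forward, oldest-first order the incremental scheme needs is to collect the commits reachable from the tip and topologically sort them so parents precede children. A toy sketch with the stdlib's `graphlib`, not gix's API:

```python
from graphlib import TopologicalSorter

def forward_order(tip, parents_of):
    """parents_of: {commit: [parent, ...]}. Returns an oldest-first order
    of everything reachable from `tip` (parents before children)."""
    # Walk backwards from the tip, collecting the reachable subgraph.
    seen, stack, graph = set(), [tip], {}
    while stack:
        c = stack.pop()
        if c in seen:
            continue
        seen.add(c)
        graph[c] = parents_of.get(c, [])
        stack.extend(graph[c])
    # TopologicalSorter emits each node after its predecessors (parents here).
    return list(TopologicalSorter(graph).static_order())
```

Memory use is proportional to the number of reachable commits, which is the price of turning a tip-first walk into a forward pass.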