Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reorganize into smaller packages #18

Open
josharian opened this issue Dec 29, 2019 · 3 comments
Open

reorganize into smaller packages #18

josharian opened this issue Dec 29, 2019 · 3 comments

Comments

@josharian
Copy link
Contributor

josharian commented Dec 29, 2019

I have been thinking about making a push to get this to v1.0. The main problem is the API. The current API is unwieldy: it is a blend of high level helpers and low level data structures, and doing anything useful with it requires a half dozen lines.

I propose to break this up into multiple packages.

The top level package (github.com/pkg/diff) will contain at most a few functions that do the most commonly requested operations, such as read two files and emit a unified diff. It may also contain a bunch of options that can be passed to those functions. The implementation will be mostly glue.

The subpackages will contain lower level data structures, algorithms, and helpers. These can be used by people who want fine-grained control over everything or who want to do something unusual (like me).

Here's a very rough, incomplete sketch of packages and exported funcs/types/vars, to give a flavor:

diff

Files
Readers

diff/edit

Script
Range
Op

diff/myers

Diff
Pair

diff/write

Options
Color
Names
WriterToA etc

diff/unified

Write

diff/sidebyside (new, WIP)

Write

diff/header (new, WIP, names are just placeholder mnemonics for me)

Regexp
LSP
GNU

Opinions on this general direction, @mvdan?

josharian added a commit that referenced this issue Dec 31, 2019
Part of #18.

There are some unfortunate API changes as part of this,
such as the new EditScriptWithContextSize,
since edit.Script is now part of a different package.

Those will get ironed out as the refactoring continues.
josharian added a commit that referenced this issue Dec 31, 2019
josharian added a commit that referenced this issue Dec 31, 2019
josharian added a commit that referenced this issue Dec 31, 2019
Part of #18.

I am very torn about having a single write package with a routine
per format:

write.Pair
write.Option
write.Names
write.Unified
write.SideBySide

vs having a package per format:

write.Option
write.Names
unified.Pair
unified.Write
sidebyside.Pair
sidebyside.Write

One motivation for separate packages is that the ideal interface
for different formats is different. For unified, WriteTo is high
performance, and we don't need more. For side by side, we need to be able
to inspect and even modify the line before writing, so it makes more sense
to have an interface specified in terms of []byte.
However, we can use a bytes.Buffer internally to bridge from WriteTo to []byte,
and most write options are shared between the formats.

So for the moment, try out a single write package.
Maybe Daniel (or someone else) will have a compelling reason
to go one way or another.

The adaptor API is getting ugly. It'll get fixed (mostly eliminated) soon.
josharian added a commit that referenced this issue Dec 31, 2019
package diff now has a simple, single-call API.
for more control, people can use the subpackages.

Part of #18.
@josharian
Copy link
Contributor Author

I've taken a stab at this in the reorg branch. It doesn't really make sense to make a pull request, since the diff is so large as to be inscrutable. Anyone who is interested/willing, please:

  • pull down the branch, and poke around a bit with godoc or go doc or the like
  • you can see the readme rendered at https://github.com/pkg/diff/tree/reorg
  • check whether your use of the module would be significantly nicer or worse as a result of this reorganization
  • read the commit message of a78b239, in which I agonize over an API design decision

Many thanks!

@mvdan
Copy link
Contributor

mvdan commented Jan 7, 2020

I definitely agree with the idea here. I think the vast majority of users won't care about diff algorithms or writing the glue code themselves. And I agree that mixing high-level and low-level APIs in a single godoc will be messy, at least as long as we don't have a good way to define sections of documentation.

As for a single write package versus many:

One motivation for separate packages is that the ideal interface for different formats is different.

I'm not sure I follow. I'd only split a single package into many if it was very large in API or source code, or if the qualified names got significantly better, or if the APIs didn't belong together at all.

It doesn't seem like we clearly fall under any of those. I don't think it's clear that we should use a single package either, but when in doubt, I think one package is fine. If v1 comes and goes and we end up with a much larger write package, perhaps it's clearer then that we could use multiple packages.

@mvdan
Copy link
Contributor

mvdan commented Jan 7, 2020

I also think that separate packages for diff algorithms makes sense, even though separate packages for "writing" algorithms doesn't necessarily. A diff algorithm can get really complex, to the point that we might need diagrams or step-by-step examples to explain what the code is doing. I don't think we'd ever get to that point with an algorithm to render a computed diff for humans.

josharian added a commit that referenced this issue Sep 14, 2020
i believe this may explain the significant performance gap
between this module and git's implementation.

good thing i'm doing all this digging while the api still has some flexibility.
eek. :)

updates #18, insofar as it means that i want to do some more implementing
before declaring this part of the api stable.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants