Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The scripts folder in plugin-packages is prone to causing trouble #94

Open
ckreibich opened this issue Mar 2, 2021 · 5 comments
Open

Comments

@ckreibich
Copy link
Member

The scripts folder in Zeek packages that have plugins has subtle properties that can trip up package developers in unintuitive ways. We've touched this in the past in various contexts but not really discussed solutions, hence this ticket. It could also live in https://github.com/zeek/zeek-aux/ given that's where init-plugin lives atm — I wasn't sure.

Context: Zeek packages with plugins include two sets of Zeek scripts:

  • one set that resides with the plugin, to provide initialization that the plugin code requires (e.g. a script-level type that the native plugin code requires). Zeek's plugin manager explicitly loads these when hooking up a plugin. Zeek's cmake code for managing plugins assumes such scripts reside in the scripts folder.

  • The "usual" package-level scripts that any Zeek package provides. zkg installs these via its package management, and loads them via the packages.zeek file it maintains for capturing package load state. By convention, many packages also use scripts as the location for such scripts, via script_dir = scripts in zkg.meta.

If $PREFIX is the Zeek installation folder, the former set ends up in $PREFIX/lib/zeek/plugins/packages/<package>/scripts (which Zeek also adds to its script search path), and the latter in $PREFIX/share/zeek/site/packages/<package>.

In the package layout established via init-plugin, a nested scripts hierarchy addresses this, as follows: there's a nested folder structure scripts/<namespace>/<package>, where plugin-related scripts should go into scripts/, and the package-level ones into the package subdirectory. The whole tree gets installed into the plugin location, and since the template specifies script_dir = scripts/<namespace>/<package>, only the innermost folder gets installed as part of zkg's own script management.

The script_dir line is crucial. If you're not careful about specifying the right folder, two problems can arise:

Double-loading of Zeek scripts

If you say script_dir = scripts in zkg.meta, the set of scripts in that folder gets installed twice, once with the plugin, and once as managed by zkg. When you don't rely on zkg's package mangement, you don't notice this. In the following, I haven't touched local.zeek, so I'm not using zkg's packages.zeek. An example of a package that has this problem is amzn/zeek-plugin-bacnet:

$ zkg list
/home/christian/devel/zeek/packages/amzn/zeek-plugin-bacnet (installed: master) - Plugin that enables parsing of the BACnet standard building controls protocol
$ zeek -NN
...
Zeek::BACnet - BACnet protocol analyzer (dynamic, no version information)
    [Analyzer] BACnet (ANALYZER_BACNET, enabled)
    [Event] bacnet
$ zeek -r some.pcap
$

But when you now add zkg's package-loading, things break:

$ zeek -r some.pcap packages
warning in /home/christian/inst/opt/zeek/lib/zeek/plugins/packages/zeek-plugin-bacnet/scripts/./consts.zeek, line 8 and /home/christian/inst/opt/zeek/share/zeek/site/packages/./zeek-plugin-bacnet/./consts.zeek, line 9: redefinition requires "redef" (BACnet::bvlc_functions)
warning in /home/christian/inst/opt/zeek/share/zeek/site/packages/./zeek-plugin-bacnet/./consts.zeek, line 23: Duplicate identifier documentation: BACnet::bvlc_functions
warning in /home/christian/inst/opt/zeek/lib/zeek/plugins/packages/zeek-plugin-bacnet/scripts/./consts.zeek, line 26 and /home/christian/inst/opt/zeek/share/zeek/site/packages/./zeek-plugin-bacnet/./consts.zeek, line 27: redefinition requires "redef" (BACnet::ipv6_bvlc_functions)
warning in /home/christian/inst/opt/zeek/share/zeek/site/packages/./zeek-plugin-bacnet/./consts.zeek, line 41: Duplicate identifier documentation: BACnet::ipv6_bvlc_functions
warning in /home/christian/inst/opt/zeek/lib/zeek/plugins/packages/zeek-plugin-bacnet/scripts/./consts.zeek, line 44 and /home/christian/inst/opt/zeek/share/zeek/site/packages/./zeek-plugin-bacnet/./consts.zeek, line 45: redefinition requires "redef" (BACnet::apdu_types)
warning in /home/christian/inst/opt/zeek/share/zeek/site/packages/./zeek-plugin-bacnet/./consts.zeek, line 53: Duplicate identifier documentation: BACnet::apdu_types
warning in /home/christian/inst/opt/zeek/lib/zeek/plugins/packages/zeek-plugin-bacnet/scripts/./consts.zeek, line 56 and /home/christian/inst/opt/zeek/share/zeek/site/packages/./zeek-plugin-bacnet/./consts.zeek, line 57: redefinition requires "redef" (BACnet::confirmed_services)
...
error in /home/christian/inst/opt/zeek/lib/zeek/plugins/packages/zeek-plugin-bacnet/scripts/./main.zeek, line 36 and /home/christian/inst/opt/zeek/share/zeek/site/packages/./zeek-plugin-bacnet/./main.zeek, line 37: already defined (BACnet::ports)
error in /home/christian/inst/opt/zeek/lib/zeek/plugins/packages/zeek-plugin-bacnet/scripts/./main.zeek, line 46: already defined (BACnet::bytes_to_count)
fatal error in error: Type::AsFuncType (error/func) (error)
Aborted (core dumped)

This happens because the files live in separate paths, and Zeek has no way of knowing that the loaded files are actually identical.

Missing logs

If you still use script_dir = scripts in zkg.meta despite the nested-folder structure mentioned above, then in many situations things seem fine, since the package-level scripts bootstrap via __load__.zeek, which for many plugins does nothing (it might not exist at all for the plugin). But odds are the plugin also defines additional functionality (often logging) via the __load__.zeek in the nested package directory, which now never gets loaded. mitrecnd/bro-http2 is an example that has this problem. Unless you explicitly load the nested folder, no http2.log appears.

Weird other problems

Since the entire script tree gets copied recursively into the plugin tree and Zeek adds that folder to the search path, you can end up accidentally sourcing scripts from the plugin space even though you're thinking of package-level scripts only. The init-plugin template has this concept of namespacing (you need to provide a namespace and a package name to instantiate), but at the script layer that namespacing doesn't really apply as you'd expect. zkg installs the scripts under the git-repo-derived name, not <namespace>/<package>. For example, with Community ID, the scripts ends up in zeek-community-id, not Corelight/CommunityID. But, confusingly, you can actually load them via @load Corelight/CommunityID, because that structure happens to exist in the plugin tree. All this can be hard to track down if you're not deeply familiar with this stuff.

I went through our standard package source and currently a bit under half of the Zeek packages with plugins in the standard package source have these problems. (For the Amazon ones I've just submitted PRs to get these fixed, but it's not a sustainable approach...) It affects our own too -- zeek/spicy-tftp has this problem, for example, so it's clearly something that's easily overlooked.

So the question is what we should do about this. Several approaches come to mind:

  • Alter the script layout in the package template so the collision is less likely. For example, you can move all of the plugin functionality into its own directory, so you end up with this kind of tree that came up over in Templating support for zkg #93:
$ tree
.
├── plugin
│   ├── scripts
│   │   ├── __load__.zeek
│   │   ├── __preload__.zeek
│   │   └── types.zeek
│   └── src
│       ├── config.h.in
│       ├── foobar.bif
│       ├── Plugin.cc
│       └── Plugin.h
├── scripts
│   ├── __load__.zeek
│   └── main.zeek
  • Adapt the cmake logic that manages plugin content, so it no longer automatically uses scripts, or perhaps warns if the scripts folders seem to overlap/collide.

  • Change plugin-level script loading more broadly. I'm not sure plugins really need to source __load__.zeek or add the folder to the generic search path? The plugin manager simply first loads scripts/__preload__.zeek, then a bif subdirectory, then scripts/__load__.zeek. Perhaps we could simplify?

Let me stop here. Do you guys have thoughts on how best to improve this? (Sorry for the length here, argh...)

@jsiwek
Copy link
Contributor

jsiwek commented Mar 3, 2021

I went through our standard package source and currently a bit under half of the Zeek packages with plugins in the standard package source have these problems. (For the Amazon ones I've just submitted PRs to get these fixed, but it's not a sustainable approach...) It affects our own too -- zeek/spicy-tftp has this problem, for example, so it's clearly something that's easily overlooked.

That's interesting to find that many have potential issues, thanks for looking. Any guesses/commonalities about how they came to be that way?

I recall the initial init-plugin for some time did not output any zkg.meta (bro-pkg.meta), that may have been around 2.6, and the first plugin-based packages I created even suffered this pitfall and I learned the hard way. An interesting thing to know could be how many of the existing packages with this issue were created before init-plugin offered a guiding zkg.meta ? If it's a high number, it at least helps reduce worry that it's a rampant mistake people make and more of a historical relic that might phase-out/minimize over time even if we did nothing further.

  • Alter the script layout in the package template so the collision is less likely. For example, you can move all of the plugin functionality into its own directory, so you end up with this kind of tree that came up over in Templating support for zkg #93:

This seems like a "simplest thing that works" for dealing with original confusion that likely led to picking wrong script_dir paths: just put the right scripts/ dir that people need to use at the top-level, making it the obvious choise. Maybe you'll still end up installing (and taring) things into a similar layout as before though? i.e. since people can use these plugins standalone and technically a load like @load Corelight/CommunityID works that way, then that should still work after any changes we do?

Also means there could still be the double-loading problem when installing via zkg and trying to use both @load Corelight/CommunityID and @load packages/@load zeek-community-id since there's two separate copies of those scripts. For that, I think zkg may be able to detect when script_dir is nested within plugin_dir and, if it is, then instead of creating a copy when installing script_dir, it makes a symlink that points into the location where plugin_dir gets installed.

Doing both those seems like relatively straightforward approach that should address the problem.

@ckreibich
Copy link
Member Author

Any guesses/commonalities about how they came to be that way?

It's hard to generalize, but from digging through git commit histories it usually looks like the packages didn't start out from the init-plugin template. Some started out as "pure" plugins and had the zkg packaging added later.

This seems like a "simplest thing that works" for dealing with original confusion

Cool.

Maybe you'll still end up installing (and taring) things into a similar layout as before though?

I couldn't quite follow you here — do you mean that despite the adjusted directory structure, people might still (accidentally) put a tree in place that could cause loads like @load Corelight/CommunityID to work even though they shouldn't?

I think zkg may be able to detect when script_dir is nested [...]

That's nice. It'd restore Zeek's existing ability to detect that a script has already been sourced, which is great.

Two related thoughts here, while we're on it:

I do wonder whether the plugin-level script need to be as open-ended and flexible as we currently make them. For example, do there really need to be scripts in there that the plugin manager's entry points don't cover? Do any of those scripts/folders need to become part of the script search path? I've not seen examples yet of packages that actually require this. Not critical — I'm just wondering whether we can simplify this a bit.

Do you think the existing namespaced load-paths (like Corelight/CommunityID) are worth supporting? I couldn't tell from the code whether they were more or less a side-effect of the nested folder structure, or a design goal. Personally, I really like that people can load packages via their package name (e.g. @load zeek-community-id), i.e. via zkg's folder management as opposed to the plugin's.

@jsiwek
Copy link
Contributor

jsiwek commented Mar 9, 2021

Maybe you'll still end up installing (and taring) things into a similar layout as before though?

I couldn't quite follow you here — do you mean that despite the adjusted directory structure, people might still (accidentally) put a tree in place that could cause loads like @load Corelight/CommunityID to work even though they shouldn't?

It was mostly unfinished thinking about expectations around @load Corelight/CommunityID like you mentioned:

Do you think the existing namespaced load-paths (like Corelight/CommunityID) are worth supporting? I couldn't tell from the code whether they were more or less a side-effect of the nested folder structure, or a design goal. Personally, I really like that people can load packages via their package name (e.g. @load zeek-community-id), i.e. via zkg's folder management as opposed to the plugin's.

Here's some questions/reasoning I go through:

  • Is there backwards compatibility concern (any stuff that suddenly breaks) if it's no longer supported?

Don't think so since whatever change is made doesn't impact how existing plugins work and easy enough to explain how new plugins created from latest init-plugin start to work.

  • Is @load Corelight/CommunityID an explicit requirement/design ?

I don't remember anything specific, maybe Robin has something. That scheme could help reduce chance of naming conflict/collision, but ultimately seems arbitrary and not something that has to be standardized -- hypothetically, if a user had ability to better customize installation-path of the scripts, it's maybe more useful if they were free to choose for themselves how to refer to a plugin's scripts in @loads rather than restrict it.

  • To where does a default, standalone plugin installation (without zkg) end up installing the scripts ?

I didn't understand this yet: the standalone plugin installation will have to install the pure-scripts somewhere and zkg will still rely on that for installation of plugin-specific parts, but also additionally still do its own separate install of the pure-script parts even though the standalone install logic does that, too?

So when using zkg, if there's two distinct copies of the pure-scripts that get installed, is the "double-loading" problem still a concern or what's done to minimize it?

I do wonder whether the plugin-level script need to be as open-ended and flexible as we currently make them. For example, do there really need to be scripts in there that the plugin manager's entry points don't cover? Do any of those scripts/folders need to become part of the script search path? I've not seen examples yet of packages that actually require this. Not critical — I'm just wondering whether we can simplify this a bit.

I'm not sure, but if changes there are not directly related/needed to address the particular problems in the original discussion, it may be easier to leave that for separate treatment.

@rsmmr
Copy link
Member

rsmmr commented Mar 9, 2021

No strong opinions here (other than that it'll be good to address this!), just a couple of notes:

  • I'd much prefer a solution that doesn't change Zeek's semantics of plugin loading; not because the current scheme is the best way to do it (probably not), but because I'd like to avoid ending up with different semantics for different Zeek versions. That line of thinking also suggests something similar for zkg: if we can address thist hrough better templating/guidance/warnings, rather than changing how zkg operates, that would be preferable.

  • Re/ the namespaced loading (@load Corelight/CommunityID) I believe the original motivation was indeed to avoid conflicts. I'd say retaining some kind of name spacing is valuable here. For example, what if I wanted to fork the community ID package? If I put that into rsmmr/zeek-community-id, but keep loading it through zeek-community-id, that would be confusing. Maybe a way in between would be to keep the namespace but combine it with the local part of the package name: @load corelight/zeek-community-id. Could also be something to discuss in the broader context of overhauling Zeek 'sscript/package namespacing.

@ckreibich
Copy link
Member Author

Great feedback folks, thank you. Sorry for taking a bit to follow up here.

To where does a default, standalone plugin installation (without zkg) end up installing the scripts ?

Yeah, my thinking here was as you say Jon — nothing needs to change for pure plugins because they don't have a problem. I think this is purely about finding something acceptable that avoids the dual-loads for zkg packages. This then also pretty naturally leads to not messing with plugin semantics, since they don't need to change. Agreed.

So it sounds to me like the path forward can be as you proposed, @jsiwek: combine plugin-tree relocation in the package source with nesting detection to enable symlinks.

I think this leaves your point about installing multiple versions, @rsmmr. This gets a bit hairy for me. A few thoughts:

  • Thinking of Python, there's no equivalent: a package just has a name (like zkg) — you need to figure things out yourself if you want multiple installations (via PYTHONPATH, virtualenvs, etc)
  • zkg currently has a notion of namespacing via Package.name_with_source_directory(), which gives you e.g. corelight/zeek-community-id when dealing with a package source, because the sources impose that structure. This is nice because it's automatic — you don't have to configure it in zkg.meta or some such. But there's no immediate equivalent when installing from local folders or an explicit git URL. For the latter relying on the directory name would normally work (that'd lead to rsmmr/zeek-community-id for you), and one could use something like local/zeek-community-id for the former. Maybe... not sure about this.
  • This would bring up a new issue of conflicting versions: something would need to ensure that you can't enable both the fork and the other version, since they would likely conflict in Zeek again.
  • As an aside, installing such forks currently doesn't work, zkg complains about the package name already existing. (I'm not saying that's a bug, I think it's just not something supported atm.)

Looking at the above, I'm inclined to leave this as-is and require explicitly separate configurations, Python-style, for those who need it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants