Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible buggy handling of class / subclass modifier values #791

Open
JAelwynTyler opened this issue Aug 28, 2023 · 9 comments
Open

Possible buggy handling of class / subclass modifier values #791

JAelwynTyler opened this issue Aug 28, 2023 · 9 comments

Comments

@JAelwynTyler
Copy link

The current version (4.8.162) appears to have two (probably-related) issues:

  1. The ACC_SUPER flag in a classfile is treated as a normal modifier value and stored on the ClassInfo object but not handled in any other way (that I can see, at least). This might or might not be correct depending on the intent, but is certainly confusing (and potentially means the JavaDoc on ClassInfo.getModifiers is misleading, as it implies that values on java.lang.reflect.Modifier correspond).
  2. Subclass modifiers appear to be "contaminated" by the modifier flags for their containing superclass in at least some cases. Specifically, "protected" subclasses of public classes have the 0x0020 (ACC_SUPER / synchronized), 0x0004 (protected), and 0x0001 (public) flags set, but private subclasses appear to have only 0x0020 and 0x0002 (private) set.

Code to replicate:
`
package mypkg;

import io.github.classgraph.*;

public class DummyClass {
public static class DummyPublicSubclass { }
protected static class DummyProtectedSubclass { }
private static class DummyPrivateSubclass { }

public static void main(String[] args) {
		ClassGraph cg = new ClassGraph()
				.acceptPackages("mypkg")
				.enableAllInfo()
				.verbose()
				;
		ScanResult scan = cg.scan();
		ClassInfoList allClassInfos = scan.getAllClasses();
		System.out.printf("Class count: %d%n", allClassInfos.size());
		
		for (ClassInfo classInfo : allClassInfos) {
			System.out.printf("Class: %s, Modifiers=0x%04X%n", classInfo, classInfo.getModifiers());
		}
	}

}
`

Verbose log attached: classgraph-innermods-mre-log.txt

Expected results:

  • DummyClass should be marked public and depending on intended behavior should either not show 0x0020 in its modifiers or the mods-to-string conversion should render it as ACC_SUPER (and the JavaDoc for getModifiers should be updated to indicate that it reflects access_flags rather than strictly 'modifiers' for non-inner classes).
  • DummyPublicSubclass should have a modifier of 0x0009 ( static | public ).
  • DummyProtectedSubclass should have a modifier of 0x000C ( static | protected ).
  • DummyPrivateSubclass should have a modifier of 0x000A ( static | private ).

I'll see what I can do to track down the specific point causing this to happen, and offer a pull request if I can manage to pin it down and the changes to fix it aren't excessively invasive, but I'm not sure what the intent is with regard to ACC_SUPER handling.

@JAelwynTyler
Copy link
Author

JAelwynTyler commented Aug 28, 2023

Interestingly, the "bad" behavior appears to be invoked twice, in close proximity: Classfile.java lines 486 (initial creation of the ClassInfo, which appears to pass the modifiers of the file) and line 489 (which sets them directly on the resulting ClassInfo object). The correct values are then provided later through ClassInfo.addClassContainment, invoked at line 508.

Possibly noteworthy: it seems like the Classfile object should have only the main class name available to it where the call is claiming to come from, but my debugger was showing a subclass name (but a different one each time, so it clearly isn't stable). Going to try to dig into this further and see if it is some sort of threading weirdness…

Looks like it is simply using the values directly from the .class file for the subclass, rather than honoring the override. This might potentially even work fine if "setModifiers" did what "setX" is usually expected to do in a Java API (to wit, unconditionally set the value, as opposed to what it currently does, which is OR the provided value with the current value).

@lukehutch
Copy link
Member

Hi, thanks for the report, and you're right to be looking in Classfile.java. Line 1251 is where the class modifiers are read.

A PR would be greatly appreciated, since I'm not totally following what the issue is!

@JAelwynTyler
Copy link
Author

JAelwynTyler commented Aug 29, 2023

Hi, thanks for the report, and you're right to be looking in Classfile.java. Line 1251 is where the class modifiers are read.

A PR would be greatly appreciated, since I'm not totally following what the issue is!

The big one is that it is using the access_flags value (0x0021) from DummyClass$DummyProtectedSubclass.class when creating the ClassInfo object. The logic around line 508 which handles class containment entries later calls setModifiers with 0x000C, which is actually the correct value — but this is ORed with the initial 0x0021 to get a net result of 0x002D, rather than overwriting the initial (bad) value.

So the obvious possible fixes would be to:

  1. Make "setModifiers" do what it claims on the tin (set them, not add to them), which could have all sorts of unexpected consequences elsewhere if that isn't the intended behavior,
  2. Ignore the initial access_flags values when creating a ClassInfo that represents an inner class, or
  3. Zero them just before the logic that is intended to deal with inner class permissions.

Unfortunately, this particular bit of code is complex enough (by necessity, from what I can see) that it isn't entirely clear to me which of the above is actually likely to be the cleanest. Option #1 (presumably in combination with adding an "addModifiers" method) certainly appeals to the purist in me, but that also seems like a pretty massive and probably gratuitous API break.

Basically, you can blame Sun for not doing a particular good job of making the classfile format definition particularly consistent when dealing with inner classes… if compiled into their own classfile, they get an access_flags value (because they have to), but that value is irrelevant because an inner class's access controls are read from an attribute and have a completely different set of valid flag values.

@JAelwynTyler
Copy link
Author

Another point of interest: the initial access_flags value is loaded in readBasicClassInfo and then used to make decisions about whether to pay attention to the class based on visibility. Based on the behavior of private subclass flags, I think the intended idea of protected subclasses being marked with the public flag at the file level is that they have the potential to be visible to something outside the same class (specifically, if the containing class is exposed and not final, something in another package can extend the containing class and thus see it), and so they can't simply be treated as "ignored outside the package where they appear" (which is basically what the public/non-public flag at the base of a classfile indicates).

But that also means that the logic about whether they are considered visible without "ignoreClassVisibility" does not actually act quite according to what the JavaDoc says — although one can argue that the net result might actually be more useful to someone trying to search for "public" classes in terms of "what do I expose as an API?", which is actually the task that I'm working on at the end of the day.

My current thinking is that the best answer right now, to avoid breaking a long-established (if poorly named) API method, is to:

  • Leave the behavior of "setModifiers" alone (though I strongly suggest reviewing it for 5.x and considering splitting it into set/add/remove),
  • Add a "removeModifiers" call to ClassInfo that can unset a combination of modifiers that may currently be set (basically AND-ing the current value with the inverse of the provided bitmask), and
  • Invoking removeModifiers to "clear out" the stray modifiers on inner classes just before setting their modifiers to the correct values.

That seems like it should fix the biggest problem with subclass modifiers without much risk of blowing up lots of other stuff or introducing "nasty hacks" that would cause problems for future maintenance. Expanding the removeModifiers call to other member types would be simple to do if there ever turns out to be any call for it, and would be consistent with any likely future change to the API (unless that change is so profound that it makes the whole question moot).

@lukehutch
Copy link
Member

I'm not sure what you're getting at here... setModifiers is package-private, it is not exposed publicly. It's just used internally to make sure that the right modifier bits are set. If there is an error in the logic that sets these bits, then please submit a PR to fix that. But a user of the library can't modify these bits.

@JAelwynTyler
Copy link
Author

I'm not sure what you're getting at here... setModifiers is package-private, it is not exposed publicly. It's just used internally to make sure that the right modifier bits are set. If there is an error in the logic that sets these bits, then please submit a PR to fix that. But a user of the library can't modify these bits.

Whoops, didn't look closely enough to notice that. I'll do a bit of digging and check how widespread the impacts of such a change might be, but won't have time until (at least) tomorrow. Thank you for being responsive, however.

@lukehutch
Copy link
Member

The general picture is that ClassInfo and the other ...Info classes are designed to be read-only. There would be no point in making them mutable.

It is very possible that something is wrong with how I'm parsing or setting the modifier bits... I had to make my best guess with a lot of things when I wrote the classfile parser. Especially for JVM languages other than Java.

The reason for OR-ing the modifier bits together is that in Kotlin, and I think Scala too, you can have two classes generated from one class, but they technically represent the same class. So in this case, the modifiers are OR'd together, and all the other ClassInfo fields are merged.

@JAelwynTyler
Copy link
Author

The general picture is that ClassInfo and the other ...Info classes are designed to be read-only. There would be no point in making them mutable.

It is very possible that something is wrong with how I'm parsing or setting the modifier bits... I had to make my best guess with a lot of things when I wrote the classfile parser. Especially for JVM languages other than Java.

The reason for OR-ing the modifier bits together is that in Kotlin, and I think Scala too, you can have two classes generated from one class, but they technically represent the same class. So in this case, the modifiers are OR'd together, and all the other ClassInfo fields are merged.

Ah. That explains the OR-ing, then, and definitely suggests that the right answer is to treat inner classes as a special case. Specifically, the permissions for an inner class are not defined by the access_flags in the classfile, but by the inner_class_access_flags within the inner class array in the InnerClasses attribute and are not additive. From what I can tell, this is calculated handled by the ClassContainment instance and propagated into the ClassInfo object at at ClassInfo.java:537), and the values being set at that point are correct — they're just "contaminated" by being OR-ed with the access_flags values.

One big hint that tipped me off here is that table 4.7.6-A (the possible flags for an inner class) does not include any flag with a value of 0x0020, and the spec requires that all bits not otherwise specified are reserved and should be zeroed / ignored. From what I can tell, the cleanest way to map all of this seems to be roughly:

  • Continue to store the access_flags value in Classfile (this makes sense, as Classfile "maps" the concept of an actual class file, which is the scope an ACC_SUPER bit is relevant at).
  • Don't propagate the ACC_SUPER bit to any inner class. Arguably not propagating it to any class is reasonable, as it isn't really a "class" concept, except by historical accident of abusing an otherwise-unused bit in a creative way early in the specification's history.
  • Continue to use the ACC_PUBLIC bit from access_flags to determine whether something should be considered "visible" (as currently happens around Classfile.java:1272). For a top-level class this accurately maps the only two possible states ("public" or "not public", i.e. package-protected); based on the generated class files for the example code, the ACC_PUBLIC bit is set for public or protected inner classes on a non-final class but not on package-protected or private inner classes, or on inner classes that are protected but on a final class. That means it is being interpreted by the compiler / loader as "is this inner class potentially visible outside the package?" and used as a 'quick filter', but relying on later access checks to take care of making sure that protected inner classes are only accessed by things that can actually see them.
  • Don't propagate the ACC_PUBLIC bit in access_flags to inner classes. This is the key change.

The logic above is also backed up by another subtle hint in the specification: while table 4.7.6-A (inner class permissions) defines ACC_PUBLIC as "Marked or implicitly public in the source", table 4.1-B (class permissions) defines ACC_PUBLIC as "Declared public; may be accessed from outside its package." For top level classes, those two things are synonymous, but for inner classes they are not… this appears to be a "bug" in the spec where it was just never updated to account for the possibility.

@lukehutch
Copy link
Member

Are you able to submit a PR that implements this, please? The time I can invest into ClassGraph is very limited at this point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants