Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MNG-5862] Support XML entities and XInclude #1205

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

gnodet
Copy link
Contributor

@gnodet gnodet commented Jul 18, 2023

This PR is built on top of #1245 which can be integrated separately. This feature could be opt-in or even extracted as a separate extension (hence the PR to give the needed support).

Note that the tests do ensure that any POM installed/deployed is a standalone one, i.e. the xinclude / entities are inlined during the consumer pom transformation step.

Copy link
Contributor

@elharo elharo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this PR only supports external entities, not XInclude?

There might be some security implications here. There certainly will be people who claim there are.

This likely needs doc changes as well.

@gnodet
Copy link
Contributor Author

gnodet commented Jul 18, 2023

I think this PR only supports external entities, not XInclude?

Both, but the code for XInclude support is in third-party library for now: https://github.com/gnodet/stax-xinclude
Also, the IT uses xinclude at https://github.com/apache/maven-integration-testing/pull/279/files#diff-f29a4cbe7d6f8b4fb5f7534e23bfe9622d437605cf1df65821d94389d2efe5e6R29

There might be some security implications here. There certainly will be people who claim there are.

Those new options are conditioned by reading a local file with strict mode (which means not when loading a POM from a dependency, i.e. only for projects being built). If needed, we could use a custom resolver and restrict to loading entities / xinclude from the file system, or even the project tree, but I'd like to avoid any restriction if they are not needed. As the files are loaded from locally built POM files, the user should be in control of those files.

This likely needs doc changes as well.

Yes, I'll try to find where...

@gnodet gnodet force-pushed the dtd-xinclude branch 3 times, most recently from 2950c9f to 6d99605 Compare July 19, 2023 09:49
@gnodet gnodet requested a review from cstamas July 20, 2023 06:09
@gnodet
Copy link
Contributor Author

gnodet commented Jul 20, 2023

There might be some security implications here. There certainly will be people who claim there are.

Those new options are conditioned by reading a local file with strict mode (which means not when loading a POM from a dependency, i.e. only for projects being built). If needed, we could use a custom resolver and restrict to loading entities / xinclude from the file system, or even the project tree, but I'd like to avoid any restriction if they are not needed. As the files are loaded from locally built POM files, the user should be in control of those files.

I've modified the xinclude support so that it reuses the XMLResolver. This allows using a single implementation to load external stuff. I've thus modified it to reject any non relative URI, which means the code can only access files under the user's control, so that should be fine from a security perspective.

@norrisjeremy
Copy link

Is this wise?
XML parsers are littered with a long history of security vulnerabilities around loading crap remotely.
Introducing that into a build system seems scary to me.

@gnodet
Copy link
Contributor Author

gnodet commented Aug 22, 2023

Is this wise?
XML parsers are littered with a long history of security vulnerabilities around loading crap remotely.
Introducing that into a build system seems scary to me.

I tried to mitigate the risks by only loading from rejecting any absolute URI. Also, no file with such entities / xinclude import should end up in maven central, those are translated when installed / uploaded. So this should only happen at build time, for the pom.xml that are parts of the build. The security problems are then kinda in the hand of the developper I would think.

That's said, I'd like to find a consensus, and if it's seen as too risky, we could go with only mixins. That's the whole point of the discussion I started on dev.

@norrisjeremy
Copy link

I tried to mitigate the risks by only loading from rejecting any absolute URI. Also, no file with such entities / xinclude import should end up in maven central, those are translated when installed / uploaded. So this should only happen at build time, for the pom.xml that are parts of the build. The security problems are then kinda in the hand of the developper I would think.

That's said, I'd like to find a consensus, and if it's seen as too risky, we could go with only mixins. That's the whole point of the discussion I started on dev.

Would you consider adding a CLI option that a user can specify to Maven to explicitly tell it not to support remote entities / xinclude stuff (and possibly even default the option to having remote entities / xinclude off)?
That could help allay concerns, in that users would have to explicitly opt-in (or have an ability to explicitly opt-out) to enabling this new potentially risky behavior.

@gnodet gnodet force-pushed the dtd-xinclude branch 5 times, most recently from 2b731fe to e33a326 Compare August 29, 2023 07:12
elharo
elharo previously requested changes Aug 29, 2023
Copy link
Contributor

@elharo elharo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the library you wrote uses the org.apache.maven package, it needs to move into this repo (package protected?) or into another package or something before maven core can depend on it.

@gnodet
Copy link
Contributor Author

gnodet commented Sep 5, 2023

I tried to mitigate the risks by only loading from rejecting any absolute URI. Also, no file with such entities / xinclude import should end up in maven central, those are translated when installed / uploaded. So this should only happen at build time, for the pom.xml that are parts of the build. The security problems are then kinda in the hand of the developper I would think.
That's said, I'd like to find a consensus, and if it's seen as too risky, we could go with only mixins. That's the whole point of the discussion I started on dev.

Would you consider adding a CLI option that a user can specify to Maven to explicitly tell it not to support remote entities / xinclude stuff (and possibly even default the option to having remote entities / xinclude off)? That could help allay concerns, in that users would have to explicitly opt-in (or have an ability to explicitly opt-out) to enabling this new potentially risky behavior.

I've added a way to opt-out using -Dmaven.experimental.xinclude=false ...

@gnodet gnodet requested a review from elharo September 5, 2023 12:17
@norrisjeremy
Copy link

Would you consider adding a CLI option that a user can specify to Maven to explicitly tell it not to support remote entities / xinclude stuff (and possibly even default the option to having remote entities / xinclude off)? That could help allay concerns, in that users would have to explicitly opt-in (or have an ability to explicitly opt-out) to enabling this new potentially risky behavior.

I've added a way to opt-out using -Dmaven.experimental.xinclude=false ...

Thank you!

@gnodet
Copy link
Contributor Author

gnodet commented Sep 5, 2023

@elharo can I go ahead with this PR ?

Copy link
Contributor

@elharo elharo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not convinced this is a good idea, but if it's going in, there are still some ciode issues to be addressed.

@gnodet
Copy link
Contributor Author

gnodet commented Sep 6, 2023

I'm still not convinced this is a good idea, but if it's going in, there are still some ciode issues to be addressed.

Well, I tried to kick a discussion on dev@ but not much interest there... The only concerns were the same that have been raised here around security, and while I agree it could have been a problem, I think those concerns have been addressed in the PR.
So I assume lazy consensus...

The idea of xinclude/xml entities is slightly redundant with pom mixins, so I would have expected that this would have been raised... This discussion was started mid-august, definitely not the best time to have people involved.

@norrisjeremy
Copy link

FYI, just from a simple user's perspective, it seems peculiar that "experimental" (maven.experimental.xxx) features are being enabled by default.

Typically I would expect a feature/option that a software developer considers "experimental" to be something that a user has to opt-in to enable, not opt-out to disable.

I'm not sure if that means these new features should be disabled by default, or if the options should be renamed to better indicate they aren't considered "experimental"?

@gnodet gnodet force-pushed the dtd-xinclude branch 5 times, most recently from a27c56a to fa0a99b Compare September 13, 2023 11:55
@gnodet gnodet added this to the 4.0.0-alpha-8 milestone Sep 14, 2023
@gnodet gnodet modified the milestones: 4.0.0-alpha-8, 4.0.0 Sep 26, 2023
@pdeneve
Copy link

pdeneve commented Jan 9, 2024

@gnodet Thanks for you work. Any plans on resolving the requested change?

@elharo elharo dismissed their stale review January 9, 2024 12:49

re-reviewing current version

@gnodet
Copy link
Contributor Author

gnodet commented Jan 9, 2024

One possibility would be to create an extension to support this feature. This should be possible as the entities/xinclude are only processed at build time and the consumer pom is flattened.

Copy link
Contributor

@elharo elharo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly nits.

More generally, rereading this I'm struck at how much work this is. You've provided a nearly complete XInclude implementation. The maintenance burden on this code is likely to be high. It feels like this belongs in the parser, or perhaps a separate library on top of the parser, and all Maven should do when reading a pom is flip the flag to turn on XInclude support.
Maven's XML parsing is already quite non-standard and avoids the normal parsers like Xerces or the JDK's because of decisions made 20 years ago when the easier path was not so obvious. This is unfortunately baked into a lot of the code and even the public APIs, so I'm not sure how much of that can be repaired now.

Nonetheless, I do wonder if this code belongs here instead of in the parser.

@@ -246,8 +246,7 @@ void testReadInvalidPom() throws Exception {
assertThat(pex.getResults().get(0).getProblems().size(), greaterThan(0));
assertThat(
pex.getResults(),
contains(projectBuildingResultWithProblemMessage(
"Received non-all-whitespace CHARACTERS or CDATA event in nextTag()")));
contains(projectBuildingResultWithProblemMessage("expected START_TAG or END_TAG, not CHARACTERS")));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't go further than asserting the message is non-null. Messages are informative, not part of the stable API

}

XMLStreamReader parser;
// We only support xml entities and xinclude when reading a file in strict mode
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: external general entities and XInclude

@@ -45,6 +45,12 @@ public interface ModelReader {
*/
String INPUT_SOURCE = "org.apache.maven.model.io.inputSource";

/**
* Name of the property used to store a boolean {@code true} if XInclude supports
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

supports --> support

/**
* Evaluates the XPointer on the root Element and returns the resulting Element or null.
*
* @return an Element from the resultant evaluation of the root Element or null if evaluation fails
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete resultant

int endChar;

// Find an NCName if it exists?
startChar = schemeData.indexOf("/");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

declare here

package org.apache.maven.stax.xinclude;

/**
* This class represents Exceptions that can happen during parsing an XPointer Expression.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

during --> while

* This class represents the data type NCName use for XML non-colonized names.
*/
@SuppressWarnings("checkstyle:MagicNumber")
class NCName {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a lot to reinvent and maintain for one function. Maybe we need this to avoid extra dependencies, but I'm sure this already exists in Xerces, XML Commons, or any of several other projects.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reused the woodstox methods.

@@ -62,6 +64,18 @@ public static boolean buildConsumer(@Nullable Session session) {
return buildConsumer(session != null ? session.getUserProperties() : null);
}

public static boolean xinclude(@Nullable Properties userProperties) {
return doGet(userProperties, XINCLUDE, true);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would still advocate for this being off by default (opt-in) instead of on by default (opt-out) since XML parsers have a long history of security vulnerabilities surrounding this stuff.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@gnodet
Copy link
Contributor Author

gnodet commented Jan 11, 2024

Mostly nits.

They should be all fixed now.

More generally, rereading this I'm struck at how much work this is. You've provided a nearly complete XInclude implementation. The maintenance burden on this code is likely to be high. It feels like this belongs in the parser, or perhaps a separate library on top of the parser, and all Maven should do when reading a pom is flip the flag to turn on XInclude support. Maven's XML parsing is already quite non-standard and avoids the normal parsers like Xerces or the JDK's because of decisions made 20 years ago when the easier path was not so obvious. This is unfortunately baked into a lot of the code and even the public APIs, so I'm not sure how much of that can be repaired now.

Nonetheless, I do wonder if this code belongs here instead of in the parser.

We switched to a standard StaX parser in 4.x. The xml parser is now Woodstox, which is a fully conformant parser. Unfortunately, I don't know any StaX parser that provides support for XInclude, so I ended up writing one, borrowing the XPointer implementation from Apache Woden. I'd rather keep it separated from the parser so that we can eventually switch to Aalto which is slightly faster than Woodstox, but there were a few problems when I tried initially.

As for the location, I originally put it in an external repository (https://github.com/gnodet/stax-xinclude), but you asked to move it back in this repository. I agree, this library is independent of Maven and would be better outside. I can maintain it as a personal project (and rename the package) if you think it's better, I don't really mind.

@elharo
Copy link
Contributor

elharo commented Jan 11, 2024

A personal project won't really work, but is there any way this can go into Woodstox or the Apache XML project or something like that?

@gnodet
Copy link
Contributor Author

gnodet commented Jan 11, 2024

A personal project won't really work, but is there any way this can go into Woodstox or the Apache XML project or something like that?

@cowtowncoder ? @scantor ?

@scantor
Copy link

scantor commented Jan 11, 2024

A personal project won't really work, but is there any way this can go into Woodstox or the Apache XML project or something like that?

@cotowncoder ? @scantor ?

Umm. Maybe I was referenced by mistake, but I'm not sure of my relevance to this? (If this is the context of the Apache Xerces project, while I'm on the PMC, I'm a committer on the C++ side, I stopped using the Java version a long time ago, and use the JDK for that exclusively now.)

As a maintainer of software that uses Maven itself, I certainly have "opinions" but I think you all addressed the main one, which is to turn this off by default.

All I can say is, please don't ever enable this by default. Ever. I say that as somebody that's worked with XML for 25 years and has been fixing security bugs in the use of it for about 20. Maybe that's why you asked me?

@delanym
Copy link

delanym commented Jan 11, 2024

The stigma surrounding xinclude is really quite infuriating. Is there a feature more maligned than this? The disparity between its notoriety and its simplicity is something almost poetic.

The fact is pom files already compose in at least 3 other ways - so if there's some "security principle" at play its already broken. It could be argued the whole purpose of pom files is to compose.
Whether processing xinclude introduces risks has nothing to do with XML per se (a standard designed to be as open and dynamic as possible so no wonder there's so many horror stories) and everything to do with this implementation. You'd have to be half-mad not to use this one anyway.

I'm also not sure that xincludes will really work in practice, but I've yet to see why not.

@scantor
Copy link

scantor commented Jan 11, 2024

The stigma surrounding xinclude is really quite infuriating. Is there a feature more maligned than this?

Yes, entities, deservedly so. That was actually more my concern. I have no background on what this issue is about or why I was asked about it, but apologies for not clarifying which of the two I was expressing the opinion about. I know very little about XInclude as it's very poorly supported in general, it came too late, so I've generally ignored it.

My opinion is that it's not sensible for Maven to even consider including a one-off implementation of any XML standard addendum. But that's a decision for its maintainers.

@cowtowncoder
Copy link

I remember thinking about XInclude at some point wrt Woodstox, but it seemed rather complicated to support for a streaming parser. So likely it'd instead need to be some processor on top of generic Stax (or SAX interface), in which case Woodstox as well as any other compliant implementation would work.

I don't think I personally have time to work on such thing, although I can see how it'd be useful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
8 participants