Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[core] Support forcing a specific language from the command-line #3417

Merged
merged 7 commits into from Jul 31, 2021

Conversation

aidan-harding
Copy link
Contributor

@aidan-harding aidan-harding commented Jul 23, 2021

Describe the PR

Store the language version provided by a new command-line argument -force-language and use that as the default language if determining by file extension doesn't work (e.g. no language module is responsible for the file extension currently analyzed)

The -language parameter from the CLI is currently being ignored. It it meant to be used together with -version (see #3426). The language used to analyse a file is based on the file extension. Each language modules knows about its own extensions it is responsible for. If no language module is found, then PMD falls back to a default language, which is currently "java".

This makes it hard to analyse XML files which use the "wrong" file extension (not ".xml"). With -force-language one can change the default language of PMD so that it doesn't fall back to "java" but to the specified language.

Note, that this doesn't affect the way, how PMD searches for files to be analyzed: As the extensions are not changed, specifying a directory with -d together with -force-language won't automatically search for new file extensions. You need to call PMD for each file individually or use -filelist.

The fix here records the language as specified on the command line (if any), and then uses that as the default language for analysis. If no language is supplied, the original behaviour is maintained - falling back to java.

Note: I was only able to add a fairly simple unit test for the first part of this. Running the whole end-to-end CLI is tricky. However, I built with Maven and the original tests work. It may be good to add more tests to net.sourceforge.pmd.it.AllRulesIT once we have other standard rulesets which leverage this functionality.

NB this is my first PR on PMD, so apologies if anything is missing/wrong!

This is highlighted in #1540 and #2133

Related issues

Ready?

  • [ somewhat ] Added unit tests for fixed bug/feature
  • [ yes ] Passing all unit tests
  • [ yes ] Complete build ./mvnw clean verify passes (checked automatically by github actions)
  • [ yes ] Added (in-code) documentation (if needed)

@pmd-test
Copy link

pmd-test commented Jul 23, 2021

1 Message
📖 This changeset changes 0 violations,
introduces 0 new violations, 0 new errors and 0 new configuration errors,
removes 0 violations, 0 errors and 0 configuration errors.
Full report
This changeset changes 0 violations,
introduces 0 new violations, 0 new errors and 0 new configuration errors,
removes 0 violations, 0 errors and 0 configuration errors.
Full report
This changeset changes 0 violations,
introduces 1724 new violations, 0 new errors and 0 new configuration errors,
removes 2147 violations, 0 errors and 0 configuration errors.
Full report
This changeset changes 0 violations,
introduces 0 new violations, 0 new errors and 0 new configuration errors,
removes 0 violations, 0 errors and 0 configuration errors.
Full report

Generated by 🚫 Danger

Copy link
Member

@rsoesemann rsoesemann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat small fix that enables a ton of new possibilities. Love it. Thank you!

Would approve it but I need the main-tainers @adangel and @oowekyala to answer wether they are ok with the modified standard behaviour (see comment) and if more tests are needed.

@adangel
Copy link
Member

adangel commented Jul 24, 2021

Thanks for the PR!

Just to give a bit of background info, why and for what reasons this cli parameter was introduced in the first place: If a language supports more than one version (Java and Scala being the only languages with multiple versions) you could tell PMD, that - when it has discovered the language Java by extension - it should use a specific version instead of the default (latest) version (some rules are version dependent and would suggest constructs that are only possible in later versions of the language). It's also needed if you want to use java preview features.

I have to think a bit more about this as the other idea that was introduced with PMD 5 was, that it should be possible to have one ruleset with multiple languages and PMD should automagically do "the right thing". I guess, providing the language in that way limits the PMD run to exactly this language.

I'm not sure how important this feature is really (one PMD execution, single ruleset, multiple languages) and how often this is actually used.

Different languages also need different extra parameters to work correctly (besides the specific language version): for Java, it's auxclasspath, for Visualforce it's via environment parameters PMD_VF_APEXDIRECTORIES

@adangel adangel changed the title Respect the language version from the command-line [core] Respect the language version from the command-line Jul 24, 2021
@rsoesemann
Copy link
Member

rsoesemann commented Jul 24, 2021

@adangel thanks for providing this background. What if we instead introduce an extensions parameter per ruleset like this

<?xml version="1.0" encoding="UTF-8"?>
<ruleset xmlns="http://pmd.sourceforge.net/ruleset/2.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" name="Default ruleset used by the CodeClimate Engine for Salesforce.com Apex" xsi:schemaLocation="http://pmd.sourceforge.net/ruleset/2.0.0 https://pmd.sourceforge.io/ruleset_2_0_0.xsd">
   <description>Default ruleset</description>

    <rule name="CustomFieldNamingConvention" extensions=".object, .field-meta.xml" language="xml" message="Incorrectly named field" class="net.sourceforge.pmd.lang.rule.XPathRule">
        <properties>
            <property name="version" value="2.0"/>
            <property name="xpath">
                <value><![CDATA[(//CustomField | //fields)]]></value>
            </property>
        </properties>
    </rule>
</ruleset>

@aidan-harding I guess this would also perfectly cover your flow use case, right?

Copy link
Member

@oowekyala oowekyala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

To me this looks like a nice workaround until a fix for #461 is introduced. But I don't think we should aim for keeping this in in the long-term... If we consider all input files to be of the given language, then we can't safely ignore things like .gitignore or resource files that may be in the file tree. They will probably fail parsing, which is a minor inconvenience I guess, since the user can then correct it by excluding the files anyway. I guess, few users will be concerned.

I don't know either how much users actually rely on pmd's language discovery mechanism, but I personally think it's a nice feature. Its only problem is extensibility, like in many places in pmd... We can work on that as we go.

What if we instead introduce an extensions parameter per ruleset like this

Side note, I made a detailed proposal for this in #461 (comment), though its scope is broader. The idea is to allow each rule to match the set of files it wants instead of configuring necessarily per language/ruleset. Language and ruleset would then just provide defaults.

@rsoesemann rsoesemann self-requested a review July 26, 2021 07:26
@rsoesemann
Copy link
Member

@oowekyala thanks for mentioning #461. That looks like a solution to all issues @aidan-harding and I try to cover.

What is the status of this issue? Can it be worked on? based on your proposal? Any blockers?

Copy link
Member

@rsoesemann rsoesemann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accidentally clicked "Request for Re-Review"

@aidan-harding
Copy link
Contributor Author

Thanks for the feedback and context, everyone.

I agree that the ideal behaviour for a user would be able to point PMD at a set of source files in mixed languages and have it discover the right thing to do. So, a full solution to #461 would be amazing.

Hopefully, my changes will help in the interim, though. I've updated the PR as per @oowekyala's comments

@rsoesemann rsoesemann added this to the 6.37.0 milestone Jul 27, 2021
@aidan-harding aidan-harding force-pushed the language-detection branch 2 times, most recently from 1de6c61 to 1c2298d Compare July 27, 2021 15:37
@aidan-harding
Copy link
Contributor Author

@rsoesemann could you please run the validation workflow for this? I think I'm at the final version of the PR, so it would be good to have the workflow tests complete before @oowekyala looks at the code

Using a local build, I tried this in the scenario you cared about and I was able to check an object file from an MDAPI Salesforce project (Account.object) using an XML XPath rule.

@oowekyala
Copy link
Member

oowekyala commented Jul 29, 2021

@aidan-harding So @adangel and I chatted on Tuesday and agreed that a CLI argument to force use of a language is useful and we should merge it. However we'd like it to be more explicit, for instance, -force-language xml. Adding a new option also doesn't affect the existing -language.

Currently using -force-language behaves differently whether you mention a directory or a single file to -d. If you use a directory, file discovery is still using extensions, so -force-language is ignored. OTOH when you write -d <file> we can honor force-language, and ignore the extension altogether. I think it's an ok tradeoff to avoid complexifying -d, and you still can use a filelist instead of -d. I've been able to check that everything behaves like this locally, but this all needs to be documented.


So @aidan-harding I think this PR needs a slight rework before we merge it. Could you add a new option -force-language and relevant stuff on PMDConfiguration, so as not to use getInputLanguageVersion?

We maintainers can take care of adding some integration tests and updating the documentation.

  • Introduce -force-language and revert changes to -language behavior

For us:

  • Update documentation: https://pmd.github.io/latest/pmd_userdocs_cli_reference.html with the above explanation of interaction of -d and force-language
    • Bonus: add example for filelist option: find "$(realpath src)" -name "*.flow" > filelist
  • Extended unit tests: 2 files of different languages to test reset behavior. If the reset behavior is broken then we’ll probably have a parse error

@adangel adangel changed the title [core] Respect the language version from the command-line [core] Support forcing a specific language from the command-line Jul 30, 2021
@aidan-harding
Copy link
Contributor Author

That sounds like a decent way forward. But I wonder if we can make -force-language attempt to parse any file in the given directory when used with -d directory. We treat everything in the directory as potentially being the forced language. If anything fails for parsing reasons, we just silently skip it. Do you think that's a bad idea?

I do understand that, in general, you want PMD to be as smart as possible. So that I can run PMD on a directory containing a mix of files from different languages and some that aren't supported languages at all, and then it would use the right languages and skip other files. That's definitely the ideal.

Separating out -force-language to its own flag seems like an opportunity to define the behaviour of that flag on its own terms, so it would be nice to make it work with directories.

Store the language version provided by a -force-language command-line argument and use that as the default language before falling back to the filename
@aidan-harding
Copy link
Contributor Author

@oowekyala I've updated my changes to use a -force-language flag. In a separate commit (in case you don't like the idea), I've implemented my suggestion that -force-language should try to interpret everything in a directory as the given language and just fail softly on anything that it can't parse.

Copy link
Member

@adangel adangel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updating the PR!

I'm ok with your solution for scanning all files if a directory is given.

I'll let it run now. If the regression tester is back at "This changeset changes 0 violations,
introduces 0 new violations, 0 new errors and 0 new configuration errors,
removes 0 violations, 0 errors and 0 configuration errors.", then we know, that this change at least doesn't introduce any problems.

@adangel adangel merged commit f31cc46 into pmd:master Jul 31, 2021
@aidan-harding
Copy link
Contributor Author

Thanks, @adangel, I started to respond to your specific comments, then it turned out that I needed to refresh my browser to see that you'd already fixed up and merged the PR. And I see it's already in a release 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[xml] Allow to check Salesforce XML Metadata using XPath rules
5 participants