Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[core] Remember language when using FileCollector.addFile #3971

Merged
merged 5 commits into from Aug 30, 2022

Conversation

adangel
Copy link
Member

@adangel adangel commented May 13, 2022

Describe the PR

  • remembers the language for each file, which is added via FileCollector.addFile(Path, Language)
  • uses a new internal LanguageAwareDataSource interface to have a DataSource with a language
  • I guess, this can be removed in PMD 7, as there we work directly on TextFile and not DataSource. Only the test might be useful.

Related issues

Ready?

  • Added unit tests for fixed bug/feature
  • Passing all unit tests
  • Complete build ./mvnw clean verify passes (checked automatically by github actions)
  • Added (in-code) documentation (if needed)

@adangel adangel added this to the 6.46.0 milestone May 13, 2022
@pmd-test
Copy link

pmd-test commented May 13, 2022

1 Message
📖 No regression tested rules have been changed.
Compared to master:
This changeset changes 0 violations,
introduces 0 new violations, 0 new errors and 0 new configuration errors,
removes 0 violations, 0 errors and 0 configuration errors.
Full report
Compared to master:
This changeset changes 0 violations,
introduces 0 new violations, 0 new errors and 0 new configuration errors,
removes 0 violations, 0 errors and 0 configuration errors.
Full report
Compared to master:
This changeset changes 0 violations,
introduces 0 new violations, 0 new errors and 0 new configuration errors,
removes 0 violations, 0 errors and 0 configuration errors.
Full report

Generated by 🚫 Danger

@oowekyala
Copy link
Member

Looking back at the original use case:

Note that I've found PMDConfiguration#setForceLanguageVersion, but if I explicitly set that to HTML then other relevant rules for that file type aren't executed, specifically for hybrid files like Visualforce which should be evaluated as both Visualforce and HTML.

The problem here is we want rules from several languages to apply to the same file. Whether you set the language to HTML via --force-language or via addFile(Path,Language), only HTML rules will be applied. So here we're duplicating the --force-language option with new code paths but without actually helping with the use case...

I think a better fix would be to fix TextFile#toDataSourceCompat so that the returned DataSource implementation actually also contains the LanguageVersion of the TextFile. Then we would typecheck and downcast to use this language version here:
https://github.com/pmd/pmd/blob/master/pmd-core/src/main/java/net/sourceforge/pmd/SourceCodeProcessor.java#L225

This makes the change more local to compatibility APIs, so it's easier to tear down in PMD 7. It also fixes addFile(TextFile), which is also broken.

This still doesn't help with the original use case. I think solving this requires a concept of "language dialects". VF would be a dialect of HTML, which means all HTML rules also apply to VF files. The change would apply here:

return rule.getLanguage().equals(languageVersion.getLanguage())

Instead of checking for equality we would check languageVersion.getLanguage().isKindOf(rule.getLanguage()), where VF.isKindOf(VF) and VF.isKindOf(HTML).

@adangel
Copy link
Member Author

adangel commented May 20, 2022

Ok, there are more use cases - I didn't notice the "the same file should be analyzed as two distinct languages". To be honest - this sounds very weird. I can understand, that one file can contain two languages (embedded), but the whole thing is only one (primary?) language at all times. Do I miss something?

This PR is only really trying to fix #3970 and nothing else.

I think a better fix would be to fix TextFile#toDataSourceCompat so that the returned DataSource implementation actually also contains the LanguageVersion of the TextFile. Then we would typecheck and downcast to use this language version here:
https://github.com/pmd/pmd/blob/master/pmd-core/src/main/java/net/sourceforge/pmd/SourceCodeProcessor.java#L225

Maybe - I tried to be minimally invasive and just make the existing API work without adding more APIs (e.g. changing DataSource).

I might have a look at this, but can't promise when - probably not for 6.46.0, time is not on my side.

addFile(TextFile), which is also broken.

True, this suffers from the same problem, that the language is ignored when converted into DataSource.

Just to reiterate, why forced-language is something different than I implemented here:
When you set --force-language=html (or PMDConfiguration::setForceLanguageVersion), then the language determination by extension is completely bypassed. This has the effect - if you have rules for java and rules for html in the same ruleset - that we try to parse java files as html. So, you can't analyze multiple languages in once pass when using force-language. And if you don't use it and add a file with a language, the language is ignored.

Anyway: What we IMHO should try to concentrate here on is these two API calls, which don't work as promised:

  • FileCollector::addFile(Path, Language)
  • FileCollector::addFile(TextFile)

@adangel adangel modified the milestones: 6.46.0, 6.47.0 May 20, 2022
@adangel adangel marked this pull request as draft May 20, 2022 06:44
@oowekyala
Copy link
Member

To be honest - this sounds very weird. I can understand, that one file can contain two languages (embedded), but the whole thing is only one (primary?) language at all times. Do I miss something?

The language might be a kind of another language (a dialect). Eg a pom.xml file might well have the pom language, and only that language. But obviously it's also an XML file, so rules for both pom and xml should apply. This is intuitive because the pom "language" is in fact XML. It is also easy to implement if the parsers are the same (which is the case for all our XML modules). With this we could implement rules that don't depend on a specific tag set (eg replacing <a></a> with <a/>), and hence apply to all XML languages. Without, these kinds of rules cannot be written without duplicating them.

This could also be useful for Visualforce as it's a dialect of HTML, but the parsers are different, so this is not an easy fix for #3952 (as I initially suggested here).

Maybe - I tried to be minimally invasive and just make the existing API work without adding more APIs (e.g. changing DataSource).
I might have a look at this, but can't promise when - probably not for 6.46.0, time is not on my side.

Fair enough. Let me know if I can help, but I think postponing it isn't a problem

Just to reiterate, why forced-language is something different than I implemented here:
When you set --force-language=html (or PMDConfiguration::setForceLanguageVersion), then the language determination by extension is completely bypassed. This has the effect - if you have rules for java and rules for html in the same ruleset - that we try to parse java files as html. So, you can't analyze multiple languages in once pass when using force-language. And if you don't use it and add a file with a language, the language is ignored.

Thanks for the explanation, it makes sense now

@adangel adangel modified the milestones: 6.47.0, 6.48.0 Jun 21, 2022
@adangel adangel changed the title [core] Remember forced language per file [core] Remember language when using FileCollector.addFile Jul 29, 2022
@adangel
Copy link
Member Author

adangel commented Jul 29, 2022

I've reworked this now to use DataSource to transport the languageVersion of the files added via FileCollector.addFile. The new interfaces classes are all internal API, so it should be transparent to any public API.

Note: on PMD 6, there is currently no public way to create a TextFile - other than implementing the interface. Since LanguageAwareDataSource is also internal API, it's probably not possible to use FileCollector.addFile(TextFile) at all correctly in PMD 6...

@adangel adangel marked this pull request as ready for review July 29, 2022 10:30
Co-authored-by: Clément Fournier <clement.fournier76@gmail.com>
@adangel adangel modified the milestones: 6.48.0, 6.49.0 Jul 30, 2022
@oowekyala oowekyala self-requested a review August 4, 2022 16:45
Copy link
Member

@oowekyala oowekyala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks!

@adangel adangel merged commit bdde4b7 into pmd:master Aug 30, 2022
@adangel adangel deleted the filecollector-language branch October 7, 2022 08:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[core] FileCollector.addFile ignores language parameter
3 participants