Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new Naming/InclusiveLanguage cop #9842

Merged
merged 3 commits into from Jun 23, 2021

Conversation

tjwp
Copy link
Contributor

@tjwp tjwp commented May 31, 2021

This pull request adds a new cop (Naming/InclusiveLanguage) that recommends the use of inclusive language instead of problematic terms.

Following this comment, the cop is generic and can target different terms based on configuration. Suggested replacements can be configured in addition to exceptions for allowed use of a term.

The implementation is inspired by some of the ideas found in https://github.com/flexport/inclusive-code/tree/main/ruby.

Auto-correct is not implemented because too much context is required to choose an appropriate replacement suggestion and many renames would be unsafe.

@tjwp tjwp force-pushed the tjwp-naming-inclusive-language branch 2 times, most recently from a3d3a6b to 1aeba9f Compare May 31, 2021 23:15
@bbatsov
Copy link
Collaborator

bbatsov commented Jun 1, 2021

Thanks for tackling this! Two points from me:

  1. Shouldn't this target only identifiers? I think that extending the check to filenames is a bit excessive.
  2. I think this cop can be used to suggest alternatives all sorts of poor identifier names, not just those related to inclusive language (e.g. confusing shorthands, commonly misspelled words and so on).

@tjwp
Copy link
Contributor Author

tjwp commented Jun 1, 2021

@bbatsov Thanks for taking a look!

  1. I was intentionally targeting entire codebases, comments included. My intent was to identify language that may be unwelcoming wherever it appears in a project. I'm okay with dropping the file path part. Terms in the file or directory name would like also be reflected in file contents.
  2. I'm hesitant to combine the existing implementation with other examples of poor identifier names because the reason that offenses are being flagged is very different. For example a commonly misspelled word may be for execution correctness, whereas the reason for this cop is inclusiveness. The configuration would likely need to be broken out by use case, or supply a reason for the offense with each term. Also there is the implementation difference of targeting just identifiers vs all content.

What do you think about a GoodIdentifier cop in addition to this cop? It could target identifiers specifically for confusing shorthands and common misspellings but some of the implementation and configuration approach could possibly be shared with the InclusiveLanguage cop?

@bbatsov
Copy link
Collaborator

bbatsov commented Jun 1, 2021

I get your point, but I'm concerned that some words like master have plenty of valid uses (e.g. Master of None, Master Boot Record, etc) and I don't want to generate lots of false positives for the users. I agree that whitelist and blacklist are universally bad, but for master I think that the context matters a lot. Just my two cents. @rubocop/rubocop-core What are your thoughts on this?

@koic
Copy link
Member

koic commented Jun 1, 2021

In the past, I worked on #7469. What I remember as a task at that time was searching for the target term in the text files with grep. So I didn't parse AST with a static analysis tool.
I'm wondering whether a static analysis tool (e.g. RuboCop) should have the feature, as it also needs to be detected in the documents that users see. Users will see more of documentation than code.
IMHO, for this solution, text search tool is better and faster than static analysis tool. The right tool for the right job :-)

@tjwp
Copy link
Contributor Author

tjwp commented Jun 1, 2021

@bbatsov To handle master without a lot of false positives, I think there a couple of different options:

  • Don't include that term in the default configuration. Users can create a custom configuration if they want to check for it.
  • Or ship with a default list of AllowedRegex values that includes examples like Master of None, Master Boot Record (what others?).
    Personally, I'm more invested in having the generic capability than shipping a specific configuration.

@koic I see your point that this cop does not rely on parsing the code. The benefit that I see to including it in RuboCop is everything else that Rubcop already provides. I.e. the tooling is already setup in build pipelines, possibly across hundreds or thousands of repos for large companies. Building something on top of grep or another static analysis tool would also require defining all of the configuration conventions, writing formatters for reporting offenses, inventing mechanisms for disabling offenses and managing TODO lists. Though this doesn't use the AST it benefits from everything else RuboCop provides.

@tjwp
Copy link
Contributor Author

tjwp commented Jun 4, 2021

I've removed the file path check, as a separate commit for now.

@tas50
Copy link
Contributor

tas50 commented Jun 8, 2021

This will be super handy in our project. Thanks for creating it @tjwp

@bbatsov
Copy link
Collaborator

bbatsov commented Jun 10, 2021

Overall I'm happy with the current code of the cop, but I want to make it more configurable - e.g. to have CheckIdentifiers, CheckStrings, CheckComments flags that make its operation a bit more granular. If we go this route we can also bring back the filename check as well.

Sorry about the slow feedback here - I had a couple of busy weeks on my end.

@tjwp
Copy link
Contributor Author

tjwp commented Jun 10, 2021

@bbatsov No problem! Thanks for the review. I'll start on making the requested changes.

tjwp added 2 commits June 15, 2021 20:35
Recommends the use of inclusive language instead of problematic terms.
@tjwp tjwp force-pushed the tjwp-naming-inclusive-language branch from 64b3b44 to 50990e4 Compare June 16, 2021 00:36
@tjwp tjwp force-pushed the tjwp-naming-inclusive-language branch from 50990e4 to c73b83a Compare June 19, 2021 11:41
@Drenmi
Copy link
Collaborator

Drenmi commented Jun 19, 2021

Just my two cents. @rubocop/rubocop-core What are your thoughts on this?

The implementation is certainly a bit unorthodox, as Koichi already pointed out. I don't mind it conceptually, but matching regular expressions on every token does sound like it could significantly affect the execution time. Some benchmark around this would be nice.

I certainly think that this cop should either be 1) disabled by default, or 2) be shipped with a minimal configuration. As the author pointed out, the general capability is more important than a specific configuration. RuboCop is used by many people, and like with the other cops, we leave it to them to define their own policies.

@tjwp
Copy link
Contributor Author

tjwp commented Jun 19, 2021

@Drenmi Thanks for taking a look!

The implementation is certainly a bit unorthodox, as Koichi already pointed out. I don't mind it conceptually, but matching regular expressions on every token does sound like it could significantly affect the execution time. Some benchmark around this would be nice.

A small update on this. The implementation has been updated based on feedback to look at specific types of tokens, and all of those are now individually configurable.

I certainly think that this cop should either be 1) disabled by default, or 2) be shipped with a minimal configuration. As the author pointed out, the general capability is more important than a specific configuration. RuboCop is used by many people, and like with the other cops, we leave it to them to define their own policies.

👍🏻

@tjwp
Copy link
Contributor Author

tjwp commented Jun 21, 2021

I ran a very simple benchmark around this using the rubocop code base. The following 3 scenarios were tested:

  • InclusiveLanguage cop disabled
  • InclusiveLanguage cop enabled with default config
  • InclusiveLanguage cop enable with repo config

Each scenario was run 3 times with the cache disabled:

Scenario Avg Max Min % increase avg
Disabled 4:13.87 4:14:56 4:12.57
Enabled w/ default 4:14.86 4:16.70 4:12.79 0.4%
Enabled w/ repo 4:17.77 4:18.37 4:15.59 1.5%

My interpretation of those results is that the impact on the execution time is not large. With the current default configuration from the change, the increase was small, with the fastest execution falling within the range of the baseline executions. With the config included for the rubocop repo, the effect of the cop was larger, but still a small percentage increase for the overall execution time.

@bbatsov bbatsov merged commit 609cd7f into rubocop:master Jun 23, 2021
@bbatsov
Copy link
Collaborator

bbatsov commented Jun 23, 2021

Great work! Thanks for working on this!

koic added a commit to rubocop/rubocop-minitest that referenced this pull request Jun 23, 2021
koic added a commit to rubocop/rubocop-performance that referenced this pull request Jun 23, 2021
koic added a commit to rubocop/rubocop-rails that referenced this pull request Jun 23, 2021
@tjwp
Copy link
Contributor Author

tjwp commented Jun 23, 2021

Great work! Thanks for working on this!

Thank you! It's been several years since my last contribution to rubocop, but a pleasure to be back here and able to contribute!

schmijos added a commit to renuo/applications-setup-guide that referenced this pull request Jun 29, 2021
I don't agree with forcing human language into cultural limits and therefore propose to disable the cop introduced by rubocop/rubocop#9842.

My code normally offends people in a lot of different non-cultural ways already 😉 .
It's fine if Rubocop complains about those. But please talk to me in person if my code offends you by cultural issues.
coorasse pushed a commit to renuo/applications-setup-guide that referenced this pull request Jun 29, 2021
I don't agree with forcing human language into cultural limits and therefore propose to disable the cop introduced by rubocop/rubocop#9842.

My code normally offends people in a lot of different non-cultural ways already 😉 .
It's fine if Rubocop complains about those. But please talk to me in person if my code offends you by cultural issues.
renawatson68 added a commit to renawatson68/performance-develop-rubyonrails that referenced this pull request Sep 23, 2022
richardstewart0213 added a commit to richardstewart0213/performance-build-Performance-optimization-analysis- that referenced this pull request Nov 4, 2022
MarttiCheng added a commit to MarttiCheng/Rubocop-Performance that referenced this pull request Sep 28, 2023
SerhiiMisiura added a commit to SerhiiMisiura/Rubocop-Performance that referenced this pull request Oct 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants