Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-39881][PYTHON] Fix erroneous check for black and reenable black validation. #37305

Closed
wants to merge 5 commits into from

Conversation

grundprinzip
Copy link
Contributor

@grundprinzip grundprinzip commented Jul 26, 2022

What changes were proposed in this pull request?

The previously committed check for running black did not actually work and caused code to be committed that does not follow the linter rules. This patch fixes the way we check if black is locally installed and update the dev/reformat-python script. In addition, we run the script to fix existing style issues. Similar to the original PR #32779 this patch only applies the black checks on the pandas code.

The black version is updated in this PR because on an empty virtualenv the selected version of click ends up in a conflict due to a underspecified version of click. See psf/black#2964.

Why are the changes needed?

We have linter rules, we should actually address them.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Manual testing.

@github-actions github-actions bot added the INFRA label Jul 26, 2022
@grundprinzip grundprinzip changed the title [WIP] [SPARK-39881] Fix erroneous check for black and renable black validation. [WIP] [SPARK-39881] Fix erroneous check for black and reenable black validation. Jul 26, 2022
@@ -968,7 +968,7 @@ class _CountVectorizerParams(JavaParams, HasInputCol, HasOutputCol):

def __init__(self, *args: Any):
super(_CountVectorizerParams, self).__init__(*args)
self._setDefault(minTF=1.0, minDF=1.0, maxDF=2 ** 63 - 1, vocabSize=1 << 18, binary=False)
self._setDefault(minTF=1.0, minDF=1.0, maxDF=2**63 - 1, vocabSize=1 << 18, binary=False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I think this is false positive from Black presumably (per https://peps.python.org/pep-0008/#other-recommendations). I believe we use one space around these operators.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this fall into this rule?

If operators with different priorities are used, consider adding whitespace around the operators with the lowest priority(ies). Use your own judgment; however, never use more than one space, and always have the same amount of whitespace on both sides of a binary operator:

2**63 - 1

The - has a lower priority compared to the power operator?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this style change because we are upgrading black?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently, yes. I just tested. In the old version it would not touch this change, the new version does. Playing with it actually shows that the behaviour is specifc to the power operator. I'll check if I find some more details in the black releases.

Copy link
Member

@ueshin ueshin Jul 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found the doc that mentions about this:

Almost all operators will be surrounded by single spaces, the only exceptions are unary
operators (+, -, and ~), and power operators when both operands are simple. For
powers, an operand is considered simple if it's only a NAME, numeric CONSTANT, or
attribute access (chained attribute access is allowed), with or without a preceding
unary operator.

And the commit.

The style change is from 22.1.0.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ueshin Please see - psf/black#538

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like ** is not covered by PEP8, and the main reason of balck change is consider about readable, so I personaly think black choice is right.

@grundprinzip grundprinzip changed the title [WIP] [SPARK-39881] Fix erroneous check for black and reenable black validation. [SPARK-39881] Fix erroneous check for black and reenable black validation. Jul 27, 2022
@grundprinzip
Copy link
Contributor Author

@itholic @ueshin it would be great if you could have a look at the PR since you authored / reviewed the original PR that introduced the bug - #32779

Copy link
Contributor

@itholic itholic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise looks fine to me.

dev/lint-python Outdated
$BLACK_BUILD 2> /dev/null
if [ $? -ne 0 ]; then
$PYTHON_EXECUTABLE -c 'import black' &> /dev/null
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain why this should be changed ??

The existing code "$PYTHON_EXECUTABLE -m black" isn't enough to check for Black installation ??

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason is very simple:

> python -m black
 Usage: python -m black [OPTIONS] SRC ...

 One of 'SRC' or 'code' is required.
> echo $?
1
> python -m black
/System/Library/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python: No module named black
> echo $?
1
> python -c 'import black'
> echo $?
0
> python -c 'import this_does_not_exist
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'this_does_not_exist'
> echo $?
1

dev/reformat-python Show resolved Hide resolved
Comment on lines 233 to 234
self.assertRaises(TypeError, lambda: datetime.date(1994, 1, 1) ** b_psser)
self.assertRaises(TypeError, lambda: datetime.datetime(1994, 1, 1) ** b_psser)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why Black doesn't reformat this, too...

Shouldn't it be reformatted like:

-         self.assertRaises(TypeError, lambda: datetime.date(1994, 1, 1) ** b_psser)
-         self.assertRaises(TypeError, lambda: datetime.datetime(1994, 1, 1) ** b_psser)
+         self.assertRaises(TypeError, lambda: datetime.date(1994, 1, 1)**b_psser)
+         self.assertRaises(TypeError, lambda: datetime.datetime(1994, 1, 1)**b_psser)

???

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm no black expert, but from the discussion I had with @HyukjinKwon above, it looks like it's a operator precedence behavior thing.

If operators with different priorities are used, consider adding whitespace around the operators with the lowest priority(ies). Use your own judgment; however, never use more than one space, and always have the same amount of whitespace on both sides of a binary operator:

in

self.assertRaises(TypeError, lambda: datetime.date(1994, 1, 1) ** b_psser)

The binary operator ** is not paired with a lower priority other operator.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I understand about the priority but just wonder why Black did:

-         self.assertRaises(TypeError, lambda: True ** b_psser)
+         self.assertRaises(TypeError, lambda: True**b_psser)

but didn't

-         self.assertRaises(TypeError, lambda: datetime.date(1994, 1, 1) ** b_psser)
-         self.assertRaises(TypeError, lambda: datetime.datetime(1994, 1, 1) ** b_psser)
+         self.assertRaises(TypeError, lambda: datetime.date(1994, 1, 1)**b_psser)
+         self.assertRaises(TypeError, lambda: datetime.datetime(1994, 1, 1)**b_psser)

It seems like just inconsistence behavior from Black, so maybe we can just keep is as is for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason is explained here #37305 (comment) - in short the operators are not simple ( True vs datetime.datetime(1994, 1, 1))

@itholic
Copy link
Contributor

itholic commented Jul 27, 2022

Anyway, +1 for upgrading the version to the latest.

@itholic
Copy link
Contributor

itholic commented Jul 27, 2022

Can we also add [PS] to the PR title ?

@grundprinzip grundprinzip changed the title [SPARK-39881] Fix erroneous check for black and reenable black validation. [PS] [SPARK-39881] Fix erroneous check for black and reenable black validation. Jul 27, 2022
@grundprinzip grundprinzip changed the title [PS] [SPARK-39881] Fix erroneous check for black and reenable black validation. [SPARK-39881][PS] Fix erroneous check for black and reenable black validation. Jul 27, 2022
@AmplabJenkins
Copy link

Can one of the admins verify this patch?

Copy link
Member

@ueshin ueshin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@itholic

Can we also add [PS] to the PR title ?

This should not be categorized as [PS] but PySpark in general.

@grundprinzip Could you use [PYTHON] instead?

dev/lint-python Outdated Show resolved Hide resolved
@grundprinzip grundprinzip changed the title [SPARK-39881][PS] Fix erroneous check for black and reenable black validation. [SPARK-39881][PYTHON] Fix erroneous check for black and reenable black validation. Jul 27, 2022
Copy link
Contributor Author

@grundprinzip grundprinzip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverting unnecessary change.

@grundprinzip
Copy link
Contributor Author

Updated the PR description to indicate why the black version was updated.

Copy link
Member

@Yikun Yikun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, some nits but it's also OK for me, so just go ahead if you think it's ok!

dev/lint-python Outdated Show resolved Hide resolved
dev/pyproject.toml Show resolved Hide resolved
@Yikun
Copy link
Member

Yikun commented Jul 29, 2022

and CI failed due to Run / Scala 2.13 build with SBT git clone networking issue, I think we can pass it by re-triggering.

@grundprinzip
Copy link
Contributor Author

I updated the remaining comments. Can we merge the PR?

@HyukjinKwon
Copy link
Member

Merged to master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants