
Status of testing Providers that were prepared on November 15, 2022 #27674

Closed
potiuk opened this issue Nov 15, 2022 · 43 comments
Labels
kind:meta High-level information important to the community testing status Status of testing releases

Comments

@potiuk
Member

potiuk commented Nov 15, 2022

Body

I have a kind request for all the contributors to the latest provider packages release.
Could you please help us to test the RC versions of the providers?

Let us know in a comment whether the issue is addressed.

These are the providers that require testing, as some substantial changes were introduced:

Provider alibaba: 2.2.0rc1

Provider amazon: 6.1.0rc1

Provider apache.beam: 4.1.0rc1

Provider apache.drill: 2.3.0rc1

Provider apache.druid: 3.3.0rc1

Provider apache.hive: 4.1.0rc1

Provider apache.livy: 3.2.0rc1

Provider apache.pig: 4.0.0rc1

Provider apache.pinot: 4.0.0rc1

Provider apache.spark: 4.0.0rc1

Provider arangodb: 2.1.0rc1

Provider asana: 3.0.0rc1

Provider cncf.kubernetes: 5.0.0rc3

Provider common.sql: 1.3.0rc1

Provider databricks: 3.4.0rc1

Provider docker: 3.3.0rc1

Provider exasol: 4.1.0rc1

Provider google: 8.5.0rc1

Provider grpc: 3.1.0rc1

Provider hashicorp: 3.2.0rc1

Provider jdbc: 3.3.0rc1

Provider microsoft.azure: 5.0.0rc1

Provider microsoft.mssql: 3.3.0rc1

Provider microsoft.winrm: 3.1.0rc1

Provider mongo: 3.1.0rc1

Provider mysql: 3.3.0rc1

Provider opsgenie: 5.0.0rc1

Provider oracle: 3.5.0rc1

Provider postgres: 5.3.0rc1

Provider salesforce: 5.2.0rc1

Provider sendgrid: 3.1.0rc1

Provider sftp: 4.2.0rc1

Provider slack: 7.0.0rc1

Provider snowflake: 4.0.0rc1

Provider sqlite: 3.3.0rc1

Provider ssh: 3.3.0rc1

Provider tableau: 4.0.0rc1

Provider trino: 4.2.0rc1

Provider vertica: 3.3.0rc1

Provider yandex: 3.2.0rc1

Provider zendesk: 4.1.0rc1

The guidelines on how to test providers can be found in

Verify providers by contributors

Committer

  • I acknowledge that I am a maintainer/committer of the Apache Airflow project.
@potiuk potiuk added the kind:meta High-level information important to the community label Nov 15, 2022
@potiuk
Member Author

potiuk commented Nov 15, 2022

Note that cncf.kubernetes is rc3 because I accidentally published cncf.kubernetes 5.0.0rc1 and rc2 some time in the past, and it is impossible to delete an already released package on PyPI, so I had to use rc3 for the cncf.kubernetes provider.

@r-richmond
Contributor

r-richmond commented Nov 15, 2022

Attempted to test google: 8.5.0rc1 by installing it into a dev instance and was hit with the following error, which I suppose is expected considering apache-airflow-providers-common-sql 1.3.0 is listed as rc1 above, but as a result I don't have an easy way to install 8.5.0rc1 in my dev environment.

One workaround would be to make google 8.5.0rcX depend on apache-airflow-providers-common-sql>=1.3.0rcX and remove the rc portion once testing is finished.

Because apache-airflow-providers-google (8.5.0rc1) depends on apache-airflow-providers-common-sql (>=1.3.0) which doesn't match any versions, apache-airflow-providers-google is forbidden.
So, because airflow-dev depends on apache-airflow-providers-google (==8.5.0rc1), version solving failed.

@potiuk
Member Author

potiuk commented Nov 15, 2022

Actually, the RC versions should also support dependencies on other rc packages, but I will check it later (on the run now). As a workaround you can install both packages with the --no-deps switch if you have problems.

@potiuk
Member Author

potiuk commented Nov 16, 2022

@r-richmond I cannot reproduce it. Where did you install the rc1 package from?

I just double-checked, and our RC packages (on PyPI) have a mechanism implemented to cope with the issue you mentioned: they properly depend on a .* version in case of such a cross-dependency:

Requires-Dist: apache-airflow-providers-common-sql (>=1.3.0.*)

The .* is added automatically when we release an RC package that depends on another apache-airflow package - specifically to handle such a case.

When I try to install apache-airflow-providers-google==8.5.0rc1, it properly installs and pulls common-sql==1.3.0rc1

...
Installing collected packages: chardet, asgiref, apache-airflow-providers-common-sql, aiofiles, gcloud-aio-auth, gcloud-aio-storage, gcloud-aio-bigquery, google-cloud-dataform, google-cloud-compute, apache-airflow-providers-google
  Attempting uninstall: apache-airflow-providers-common-sql
    Found existing installation: apache-airflow-providers-common-sql 1.1.0
    Uninstalling apache-airflow-providers-common-sql-1.1.0:
      Successfully uninstalled apache-airflow-providers-common-sql-1.1.0
  Attempting uninstall: apache-airflow-providers-google
    Found existing installation: apache-airflow-providers-google 8.2.0
    Uninstalling apache-airflow-providers-google-8.2.0:
      Successfully uninstalled apache-airflow-providers-google-8.2.0
Successfully installed aiofiles-0.8.0 apache-airflow-providers-common-sql-1.3.0rc1
apache-airflow-providers-google-8.5.0rc1 asgiref-3.5.2 chardet-4.0.0 
gcloud-aio-auth-4.0.1 gcloud-aio-bigquery-6.1.2 gcloud-aio-storage-7.0.1 
google-cloud-compute-0.7.0 google-cloud-dataform-0.2.0

Note the apache-airflow-providers-common-sql-1.3.0rc1 and apache-airflow-providers-google-8.5.0rc1 above.

So I wonder where your errors came from?

@potiuk
Member Author

potiuk commented Nov 16, 2022

OK. I think I know. From the error message, it seems you are using Poetry, @r-richmond. Poetry is not supported for installing Airflow, and it apparently has a bug where it cannot properly handle the valid >= .* requirement. You MUST use pip to install Airflow, I am afraid.
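
For what it's worth, the behaviour both tools have to handle here is PEP 440 pre-release matching: a plain >=1.3.0 specifier does not match 1.3.0rc1 unless pre-releases are explicitly allowed, which is why the RC packages carry the .* suffix and why an explicit ==...rc1 pin (or pip's --pre) otherwise helps. A minimal sketch with the packaging library (an illustration of the version-matching rule only, not Airflow or pip code):

    from packaging.specifiers import SpecifierSet
    from packaging.version import Version

    spec = SpecifierSet(">=1.3.0")
    rc = Version("1.3.0rc1")

    # Pre-releases are excluded from ordinary specifiers by default ...
    print(rc in spec)                            # False
    # ... but they match once pre-releases are explicitly allowed
    # (which is what allowing pre-releases, e.g. via pip's --pre, enables).
    print(spec.contains(rc, prereleases=True))   # True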

@bdsoha
Contributor

bdsoha commented Nov 16, 2022

@potiuk The following works great:

Provider ssh: 3.3.0rc1

@rajaths010494
Contributor

@potiuk The following works great:
Provider microsoft.azure: 5.0.0rc1

@pankajkoti
Member

We are getting the below error when trying to install the Kubernetes RC via setup.cfg:

ERROR: Could not find a version that satisfies the requirement apache-airflow-providers-cncf-kubernetes==5.0.0rc3; extra == "all" (from astronomer-providers[all]) (from versions: 2.0.2, 3.0.0, 4.0.0, 4.1.0, 4.3.0, 4.4.0)
ERROR: No matching distribution found for apache-airflow-providers-cncf-kubernetes==5.0.0rc3; extra == "all"

Any hint that could help us solve this problem? The same install via pip works, but somehow it is failing with setup.cfg only for this RC; all other RCs work fine via setup.cfg, including apache-airflow-providers-amazon==6.1.0rc1, apache-airflow-providers-apache-hive==4.1.0rc1, apache-airflow-providers-apache-livy==3.2.0rc1, apache-airflow-providers-common-sql==1.3.0rc1, apache-airflow-providers-databricks==3.4.0rc1, apache-airflow-providers-google==8.5.0rc1, apache-airflow-providers-microsoft-azure==5.0.0rc1, apache-airflow-providers-snowflake==4.0.0rc1, apache-airflow-providers-trino==4.2.0rc1.

@Taragolis
Contributor

#26784 - Tested against both actual AWS SSM Parameter Store and AWS Secrets Manager; the circular error is gone (I hope forever).

#26687 - Tested non-doc parts

  • Test connection shows the expected error in the UI (non-testable connection)
  • Shows a warning if the user provides a non-existent emr_conn_id

#26946 - Assume role with credentials is fixed and works as originally expected.

#27134 - Most of the changes relate to documentation, but I also tested obtaining a connection when it is stored in AWS SSM as JSON.

#26953 - Works as expected

@potiuk
Member Author

potiuk commented Nov 16, 2022

We are getting the below error when trying to install the Kubernetes RC via setup.cfg:

My guess is that you might have the other RCs already installed locally and explicitly - that's why they are found by pip. But if you pull them in through an extra and they are not installed manually, pip does not find pre-release candidates automatically - you need to tell it to also search among pre-release candidates with the --pre switch of pip install.

  --pre                       Include pre-release and development versions. By default, pip only finds stable versions.

@pankajkoti
Member

pankajkoti commented Nov 16, 2022

We are getting the below error when trying to install the Kubernetes RC via setup.cfg:

My guess is that you might have the other RCs already installed locally and explicitly - that's why they are found by pip. But if you pull them in through an extra and they are not installed manually, pip does not find pre-release candidates automatically - you need to tell it to also search among pre-release candidates with the --pre switch of pip install.

  --pre                       Include pre-release and development versions. By default, pip only finds stable versions.

Thanks Jarek. I am sorry for the bother here; it looks like it is an internal Astronomer issue caused by a constraint we have on the Kubernetes provider releases. @kaxil helped us figure this out, so thanks to Kaxil too :)

@alexandermalyga
Contributor

@potiuk
Provider trino: 4.2.0rc1 #27168

This particular issue is fixed in 4.2.0rc1.
But when I tried 4.1.0 to check that the issue was still present in the old version, it wasn't anymore. I guess this is because the dependency was specified as trino>=0.301.0 in 4.1.0, and it downloaded 0.319.0 when I installed (which fixes the underlying issue).
This seems a little weird to me because the fix for the issue was to bump the dependency to trino>=0.318.0 in the first place, so is there any point in PRs that just bump some dependency version, if these versions are getting downloaded into old releases anyway?

@potiuk
Member Author

potiuk commented Nov 16, 2022

This seems a little weird to me because the fix for the issue was to bump the dependency to trino>=0.318.0 in the first place, so is there any point in PRs that just bump some dependency version, if these versions are getting downloaded into old releases anyway?

Of course - it's the installation of the new provider that bumped the trino upgrade. If you already had an earlier version of the trino library installed, the only way to upgrade it would be to manually upgrade the trino library. If we did not have >=0.318, it would not have happened.
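
To illustrate the point with PEP 440 version matching (a small sketch using the packaging library; the installed version below is made up):

    from packaging.specifiers import SpecifierSet
    from packaging.version import Version

    installed = Version("0.305.0")  # hypothetical trino client already present in the environment

    # The old floor is satisfied, so upgrading the provider alone keeps the old trino ...
    print(installed in SpecifierSet(">=0.301.0"))  # True
    # ... while the bumped floor forces pip to upgrade trino, which is what fixes the issue.
    print(installed in SpecifierSet(">=0.318.0"))  # False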

@alexandermalyga
Contributor

Make GSheetsHook return an empty list when there are no values (#27261) works as expected.

@alexandermalyga
Contributor

PostgresHook: Added ON CONFLICT DO NOTHING statement when all target fields are primary keys #26661 works as expected.
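
For context, a rough sketch of the statement-building logic that change describes (hypothetical function and names, not the PostgresHook API): when every target column is part of the conflict key there is nothing left to update, so the statement falls back to ON CONFLICT ... DO NOTHING.

    # Hypothetical illustration of the described behaviour, not the provider's actual code.
    def build_upsert_sql(table, target_fields, replace_index):
        columns = ", ".join(target_fields)
        placeholders = ", ".join(["%s"] * len(target_fields))
        sql = f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"
        conflict_cols = ", ".join(replace_index)
        updates = [f for f in target_fields if f not in replace_index]
        if updates:
            set_clause = ", ".join(f"{f} = excluded.{f}" for f in updates)
            return sql + f" ON CONFLICT ({conflict_cols}) DO UPDATE SET {set_clause}"
        # All target fields are primary keys -> nothing to update, so do nothing on conflict.
        return sql + f" ON CONFLICT ({conflict_cols}) DO NOTHING"

    print(build_upsert_sql("users", ["id", "name"], ["id"]))
    print(build_upsert_sql("user_roles", ["user_id", "role_id"], ["user_id", "role_id"]))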

@dstandish
Contributor

dstandish commented Nov 16, 2022

@potiuk the yandex provider changelog has a note about a 4.0 release, since I thought that we would be bumping (as was the practice at the time), but the actual release is 3.2.0, so the release notes need to be fixed.
See here: https://github.com/apache/airflow/pull/27040/files#diff-78feedeed4b63a7326f7ade03a383d397cceb5f48c2eda0f90ddcce25494033aR27

Confirmed it's in the release package. I am going through all of them now.

I'll compile fixes here: #27726

All fixes added to that PR:

  • yandex release notes update: release is 3.2 not 4.0
  • asana can be minor
  • k8s prefix removal is feature not bug

@potiuk
Member Author

potiuk commented Nov 16, 2022

yandex release notes update: release is 3.2 not 4.0

Ok. Thanks. Missed it. I will remove yandex and make a follow-up release.

asana can be minor

I will leave it as it is. It was originally marked as breaking, and even if it turns out it is not, it's still OK from a SemVer point of view (we are OK to bump the major version any time we want, regardless of whether there are breaking changes). This is at most a small annoyance.

k8s prefix removal is feature not bug

Will leave it as is. This can be corrected in the docs (in the next version); it will stay in the PyPI/package README though. But again, it's a minor inconvenience.

@dstandish
Contributor

OK, I'll keep compiling things in that PR, and yeah, some things can be deferred for sure.

@dstandish
Contributor

OK, I completed the review and all suggestions are in that PR.

@dstandish
Contributor

OK @potiuk, in addition to yandex, jdbc references a non-existent release in the changelog.

But the other suggestions, I think, can be docs-only.

@punx120
Contributor

punx120 commented Nov 17, 2022

#27184 Tested - looks good

@pankajastro
Member

Tested #26652, #27330, #27533, #27236, #27543 - they work as expected.

@bdsoha
Contributor

bdsoha commented Nov 17, 2022

Tested #27404, #27321

@potiuk
Member Author

potiuk commented Nov 18, 2022

Thanks everyone. Closing this one and releasing most providers (see the announcement on the devlist for details).

@potiuk potiuk closed this as completed Nov 18, 2022
@alexott
Contributor

alexott commented Nov 18, 2022

Unfortunately I'll have time to test the Databricks provider only on the weekend.

@sshivprasad
Contributor

Tested #26951 - Works as expected

@emincanoguz11
Contributor

Hi @potiuk. Sorry for the late response, I am very busy nowadays.

I tested #27174 - it works.

@potiuk
Member Author

potiuk commented Nov 18, 2022

Unfortunately I'll have time to test the Databricks provider only on the weekend.

We can always fix it in an upcoming release if we find problems, @alexott :D

@alexott
Contributor

alexott commented Nov 19, 2022

@potiuk unfortunately DatabricksSqlOperator is broken - it fails with this error:

AttributeError: 'schema' is configured as a template field but DatabricksSqlOperator does not have this attribute.

Most probably it's caused by #25717.

#27446 works as designed
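
For context, a simplified sketch (illustrative only, not Airflow's real rendering code) of why declaring a template field without a matching instance attribute fails at render time with exactly this kind of error:

    # Simplified illustration of the failure mode; not Airflow's actual base operator.
    class BaseOperatorSketch:
        template_fields = ()

        def render_template_fields(self, context):
            for field in self.template_fields:
                if not hasattr(self, field):
                    raise AttributeError(
                        f"{field!r} is configured as a template field but "
                        f"{type(self).__name__} does not have this attribute."
                    )
                # a real implementation would Jinja-render getattr(self, field) here

    class DatabricksSqlOperatorSketch(BaseOperatorSketch):
        template_fields = ("sql", "schema")  # 'schema' is declared as templated ...

        def __init__(self, sql):
            self.sql = sql                   # ... but the refactor never sets self.schema

    try:
        DatabricksSqlOperatorSketch(sql="SELECT 1").render_template_fields({})
    except AttributeError as err:
        print(err)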

@alexott
Contributor

alexott commented Nov 19, 2022

Can we revert this release of the databricks provider until this issue is investigated and fixed? It's really bad that the refactoring was done without testing the affected operators.

@eladkal
Contributor

eladkal commented Nov 19, 2022

It's really bad that the refactoring was done without testing the affected operators.

We always welcome the community to test. This is why we have this tracker before a release.
Note that if a provider is not well covered with unit tests, then we have no way of knowing whether new functionality works. Other providers have coverage for testing templated fields.

I don't think there is a need to yank the release because one operator stopped working. Affected users can use an older version of the provider while a fix is in progress until the next release.
Fixing the template field issue should be simple enough. cc @kazanzhy

@alexott
Contributor

alexott commented Nov 19, 2022

The problem here was that voting was open only during the work week, not over the weekend as usual. And I have time only over the weekend. I agree about having test coverage, but it is a work in progress.

P.S. It looks like #23971 broke saving results to a file.

@eladkal
Contributor

eladkal commented Nov 19, 2022

We are always looking for ways to improve :)
If you have a specific idea in mind about what we can do better (procedure-wise), I suggest starting a thread on the mailing list.

@potiuk
Member Author

potiuk commented Nov 19, 2022

Hello @alexott - I have always valued your work. And please don't take this personally. This comment contains no personal attack on you in any way - it's directed more towards your organization - but your comment came to me as a bit of a shock. And I presume you are speaking in the name of Databricks as an organisation here. If not, I am even a bit more shocked.

Are you seriously saying that the work of volunteers who prepare and release the ASF software, mostly in their free time (or time sponsored by some responsible players), should be impacted or delayed because a commercial company - Databricks - interested in their (Databricks) integration could not afford any of their employees to spend any time on it during their day job?

Do I interpret your words correctly?

Are you seriously saying that Databricks expects their integration with Airflow to work and be tested, but they are not able to allocate time for the employees working on it to verify it during their day job, and instead they expect that the Airflow PMC process should be changed to impact (and delay) the general release process of 70+ providers because of that?

If that's the case, I am literally shocked - even stunned. It shows a great misunderstanding of the role and contributions that a company can make to a project they deeply care about. (Again - I have nothing against you personally; it's a comment to your organization.)

The problem here was that voting was open only during the work week, not over the weekend as usual. And I have time only over the weekend. I agree about having test coverage, but it is a work in progress.

There is no "usual" thing here. I checked, and in the past voting sometimes started on a Saturday, sometimes on a Monday, and sometimes on a Thursday. The 72 hours that the ASF requires is there for a reason. It should not only give a lot of time for reaction, but should also account for geographical locations:

https://www.apache.org/foundation/voting.html

Voting periods should generally run for at least 72 hours to provide an opportunity for all concerned persons to participate, regardless of their geographic location.

Yes, there is "at least", but there is nothing about "make sure it goes through a weekend". Surely companies like Databricks have an on-call rotation with "hours" SLAs for their critical problems, and giving 72 hours' notice is far more relaxed and gives quite a lot of time to react for the company/team that takes care of it.

Do I understand the situation correctly?

Just another comment here - if your organisation really wants to have control over the release process and testing, it's perfectly possible. We could actually let Databricks release their own provider (not as a community one). Databricks is free to do that - we can move "apache-airflow-providers-databricks" out and let Databricks release their own "databricks-airflow-provider" (the same as Great Expectations did). Then you will have full control over releases, testing and changes. You (Databricks) can do it very easily.

Being part of the community and of an ASF-governed project means that you take both the benefits and the limitations that come with that. We do all the release process, and we have a number of people (like @kazanzhy) who contribute and improve the stuff there. This all comes as a benefit for Databricks, as they don't necessarily have to put their own effort into improving and maintaining the libraries. And it comes with a cost - the cost is that the ASF release process has to be followed. The cost is that you should make sure the code is sufficiently covered by unit tests. Finally, the cost is that when we put it up for release, Databricks has 72 hours to see if they are OK with the release - with all the prior notifications, warnings, and even a detailed list of changes so that Databricks can focus on just this.

And even more - with AIP-47, you (as Databricks) have an opportunity to create, maintain, develop and RUN system tests for your provider. We have a very well-defined way to build and add those tests, and we EXPECT Databricks (similarly to Amazon and Google, who paved the way) to eventually run and report the status of their end-to-end integration of the provider. That would have caught the problem way earlier. And you are absolutely free to invest your time and effort to make those examples/system tests fully automatically runnable, to have an account to run them, and to actually run and report the results of those at whatever frequency you think is appropriate. This is something Databricks could do today. It just needs to invest. And this is the way the "testability" of the Databricks provider can be improved - not by shifting voting to weekends.

I'd really love for Databricks to make that investment to improve the test quality there. If they care about the quality of their integration - AIP-47 is precisely about that, and the best way of doing it.

@potiuk
Member Author

potiuk commented Nov 19, 2022

And yes - it is the weekend, I am out in the woods, and I just yanked the release (it's reversible as well):

[Screenshot: 2022-11-20 at 00:54:08]

I am planning the follow-up release right after I am back (Monday afternoon), so if we find and investigate/fix the issue before then, I am happy to include the 3.4.1 databricks provider in this wave.

@hankehly
Contributor

The following work as expected (better late than never)

@potiuk
Member Author

potiuk commented Nov 20, 2022

I don't think there is a need to yank the release because one operator stopped working. Affected users can use an older version of the provider while a fix is in progress until the next release.

@eladkal - I think yanking is fine for that -> this is really a "soft" way of making releases not "searchable" when doing upgrades. People can still choose to install them manually, no problem. And since we have not included the new providers in "2.4.3", it has no impact on the default bugfix installations/image/constraints versions.

I personally believe we should use yanking more frequently (and I have no objections to it if there is a good reason) - for releases we do not want to let users install as "new", because we know they are somewhat seriously flawed.

@eladkal
Contributor

eladkal commented Nov 20, 2022

@eladkal - I think yanking is fine for that -> this is really a "soft" way of making releases not "searchable" when doing upgrades. People can still choose to install them manually, no problem. And since we have not included the new providers in "2.4.3", it has no impact on the default bugfix installations/image/constraints versions.

I personally believe we should use yanking more frequently (and I have no objections to it if there is a good reason) - for releases we do not want to let users install as "new", because we know they are somewhat seriously flawed.

I'm not sure I agree.
Yanking is harsh. It means "under no circumstances use this version". That is not the case here: one piece of functionality out of many has a regression. If someone uses this package without this specific operator, why should they be affected? Moreover, if that person was waiting for another feature, you are now preventing them from using this new feature.
To me this sounds like this path means always yanking everything, because almost any new release fixes regressions from previous ones.

@eladkal
Contributor

eladkal commented Nov 20, 2022

(but this is not something I feel strongly about. Just my own view)

@alexott
Contributor

alexott commented Nov 21, 2022

@potiuk sorry for adding more work on your shoulders; I appreciate your work and that of the other maintainers. We still don't have an on-call for Airflow operators. Work on improving test coverage and setting up continuous testing is prioritized.

@potiuk
Member Author

potiuk commented Nov 22, 2022

@alexott - also, to explain more (and to apologise if it's been a little - or even more than a little - harsh).

But I think this is really just a question of balance between the risk that we will break something and someone's (whose?) time and effort to do enough testing to be confident enough. There is never 100%. We cannot extend the testing period for a very long time, because it will hamper our "operations". And we do not know who is or is not available for testing. We cannot really have a point where we say "we wait for a response".

The most we can do is make all reasonable effort to flag and notify everyone involved and give them SOME time to react. Is 72 hours enough? Should it be 96, maybe? Should we always have a weekend included?

To be honest, I was quite surprised to see your statement - usually we made voting last one more day if the weekend was included, to account for the fact that people will NOT have time to do testing then, because they usually spend weekends with their families etc. I often (my personal choice) do some CI stuff during weekends (because then it's faster and safer to iterate on it), and quite often I shift my working/non-working days freely because I have a very flexible situation (no day job, no kids, a wife who also has a super flexible schedule), but I would not think this is "needed" by others :)

Do you really think having a weekend as part of the voting time for providers is a good idea and a way to improve the quality of testing in general? I think we COULD make sure we always start voting on Wednesday and finish on Tuesday; I just have a feeling that would be a rather artificial setting for one specific case only. I often did adjust the release schedule to various events (for example, I would never start voting this Wednesday, knowing that Thanksgiving is coming).

Maybe this is also a question for others in this thread. We have quite a representative group here - do you think it would be a good idea to add some regularity to the day-of-week process?

@potiuk
Member Author

potiuk commented Nov 22, 2022

@eladkal I looked at the change again, and I think I want to keep my approach - yanking - especially if we can fix it quickly. @alexott @kazanzhy, can we have a fix for the problem today/tomorrow? I would then release a fixed version.

Why do I think we should yank it?

The SQL interface is an extremely important part of any modern data platform. Even if this is just SQLExecute, people will rely on it heavily, and by upgrading to the new Databricks provider a lot of workflows might break. Databricks is a popular and important part of our ecosystem. With this bug we will not only impact Databricks' "view" but also Airflow's, and we might have to deal with many issues raised by our users. Yanking is the way to prevent those issues from happening in most cases. But it also does not prevent those who want to use the new features released in the new Databricks provider. They can STILL install the provider if they specify ==3.4.0. No problem with that. This is precisely what yanking provides - it means that you won't accidentally install it when you upgrade by running pip install --upgrade or when installing Airflow from scratch without constraints. But you can still install this version if you need the features and do not care about the SQL error. You will see a message explaining why when installing it (including the reason for yanking, which currently explains that there is a bug in the SQL operator). I think this is the best approach for now.

Especially if we are able to release a fix in a few days, this will be only a minor inconvenience: users won't be able to use the new features of the Databricks provider "automatically". Compare that with the potential "blast radius" of a broken SQL operator for Databricks. That is a problem a few orders of magnitude more serious that our users might have to deal with if we do not yank it now.

So @kazanzhy @alexott - is it possible to fix this problem quickly? If so, I am happy to hold the RC2 candidate release I am going to do for the providers that we skipped in the last wave until this one is tested and merged.

@Voldurk
Contributor

Voldurk commented Nov 22, 2022

@potiuk Please excuse the delay; #27033 tested OK on Composer + Airflow 2.3.4, but I had to resolve dependency issues.

@potiuk potiuk added the testing status Status of testing releases label Dec 28, 2023