Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Drop older Python support #487

Closed
albertz opened this issue Apr 16, 2021 · 32 comments · Fixed by #1255
Closed

Drop older Python support #487

albertz opened this issue Apr 16, 2021 · 32 comments · Fixed by #1255

Comments

@albertz
Copy link
Member

albertz commented Apr 16, 2021

This is just to start some discussion, specifically what we want to drop. Surely <=Python2, but probably also <Python3.x for some x (e.g. 7). We could also say it might make more sense maybe in 6 months or so. This issue implies then to clean up the code accordingly.

The purpose is to clean up some code where Python 2 vs 3 needs extra separate logic, and also where we could make use of newer syntax. I want to collect some of these arguments (and their needed Python version) here.
Sometimes some things can also be fixed by from __future__ import ..., which is ok (but also not for all features).

Then there might be practical issues on some systems we use, in some conditions, which just doesn't provide newer Python versions, or would make it very inconvenient. E.g. for some time, RASR was linked to Python 2 by default, and also did not support Python 3. I think that is fixed now? Then Ubuntu 16 has some older Python, but maybe we can just agree to use anyway a newer Python version?


Now some arguments, or relevant features:

  • Type annotations. Like def func(a: str) -> Data or whatever. I think since Python 3.5.
    However, there is the issue with forward declaration. E.g.:

    class Data:
       def bla(self) -> Data:
         ...
    

    This requires >=Python 3.10 I think.
    Or it requires from __future__ import annotations since Python 3.7 (doc).
    The earlier workaround was to use a string as annotation instead, i.e. def bla(...) -> "Data" (referred to as forward declaration). But I think this support was always somewhat limited (e.g. not sure if PyCharm will correctly pick this up), and will also likely go away at some point, or being deprecated, so I would argue to avoid this (because otherwise we will have a similar discussion soon again). But this means we need at least Python 3.7.
    (There are ongoing discussions around this. E.g. see pydantic #2678 (HN), PEP 649 (python-dev discussion).)
    I would also argue, in any case, type inference must work in PyCharm. If that doesn't work, we should not use it.
    In any case, I think these type annotations are actually one of the major reasons for us to drop older support.

  • dict.items() or dict.keys() returns a non-deterministic order in earlier Python versions.
    Because you usually want to have this deterministic, you would use sorted(dict.items()).
    Since Python 3.6, this is now deterministic (using the insertion order) (see here).
    So, from Python 3.6 on, you can drop the sorted, or explicit usage of OrderedDict.
    We should still be a bit careful here though. Some issues:

    • When the user now runs such code with Python 3.5, there will not be any error. But it might produce strange very hard to debug issues. We always should avoid hard-to-debug issues at all cost. So If we adopt this, we should also have some assert py_version >= (3,6) somewhere.
    • I can't imagine such a situation now, but maybe there could be cases where we must match the old behavior using sorted, i.e. sorted by alphabet, or maybe to stay compatible with some old data?
    • I can't imagine such a situation now, but maybe there could be other cases where this would be bad? E.g. you remove some key, then add some key. Then list(dict.keys()) would be different (whereas sorted would always be the same). Not sure if this could cause any issues.
  • f-strings, since Python 3.6.

  • Ubuntu 16.04 (so far still the standard on many of our systems) comes with Python 3.5.
    Ubuntu 20.04 comes with Python 3.8.
    Ubuntu 22.04 comes with Python 3.10.

  • typing.Protocol requires Python >=3.8.

  • Slash (/) in function argument list to mark positional-only arguments (doc) requires Python >=3.8.

@curufinwe
Copy link
Collaborator

We use RETURNN also on some Centos 7 / Amazon EC2 Linux machines with python 3.6. It would be nice if support for python 3.6 remained let's say until the end of the year.

Regarding the sorted issue: this is not performance critical, so I think it's best to just leave the sorted call there if a deterministic order is required.

@albertz
Copy link
Member Author

albertz commented Apr 16, 2021

Yea, this whole issue is mostly not about performance at all, but really about cleaning up code. And for that, using sorted might be more annoying. But sure, it's just a small thing.

Python 3.6 might be problematic for the annotation arguments above (specifically forward declarations).

But so maybe we just wait with this code cleaning up effort and dropping earlier Python support until end of year then. So you expect that the Amazon EC2 machines will come with newer Python then?

@curufinwe
Copy link
Collaborator

Actually the one EC2-Linux machine I checked had python3.7 but that one was ARM based. I'm more concerned about the whole Centos situation. Redhat stopped support for Centos stable releases and now we'll have to decide what distribution we want for our prod environment. For now we stick to Centos 7. Maybe we can start by dropping support for python 2.7? But maybe you prefer to do all changes in one go?

@JackTemaki
Copy link
Collaborator

Does the current style of using rST docstring annotations and type hinting as in https://www.python.org/dev/peps/pep-0484 any restrictions on where a type hinting is not possible? I think it will be quite the effort to change all docstrings and functions, so what is the main reason to do this? Or do you want to do a gradual change, so that only new code uses the new annotation style?

@albertz
Copy link
Member Author

albertz commented Apr 16, 2021

Maybe we can start by dropping support for python 2.7? But maybe you prefer to do all changes in one go?

I think it's easier to do in one go. For only dropping Python 2, there is actually also not too much to be done, this is really minor then. That is why I also hesitate to do just that, because it's really only minor, and having Python 2 support might outweigh the 5 lines of code we safe or so.

I think it will be quite the effort to change all docstrings and functions, so what is the main reason to do this?

Sure, this will be some effort. But I think it's required at some point in time to keep the code style modern. The Python world is slowly moving towards having more and more static typing.

Or do you want to do a gradual change, so that only new code uses the new annotation style?

Sure, we have to be pragmatic about this. We would have a rule that all new code must use annotations.

And we would slowly change the old code, but these changes would not be a single commit.

I think with some automation, we can also get quite far, and then need to check where it is incorrect or broken, which can also be automated (e.g. with help of PyCharm and/or MyPy).

@albertz

This comment has been minimized.

@michelwi
Copy link
Collaborator

Bevor dropping the support for python2 we should make sure that in the instances where returnn has to talk to rasr (sequence training, Baum-Welch training, feature-extraction) this still works.

I.e. finally make rasr compatible with python3 or handle possible incompatibilities of the used pickle protocols.

@albertz
Copy link
Member Author

albertz commented Apr 16, 2021

I.e. finally make rasr compatible with python3

I thought it already is compatible with Python 3? What is the state here?

handle possible incompatibilities of the used pickle protocols.

I think there were just problems when we mixed Python 2 and Python 3, although we even resolved these problems with some workarounds.

But when we drop Python 2, we can clean this up. But then RASR must support Python 3 as well.

Or maybe we could also say that the modules which are imported by RASR (e.g. SprintInterface) must still be importable and compatible with Python 2?

@curufinwe
Copy link
Collaborator

There is a version of RASR that uses python3 instead of python2 and I can port it to the public version if there is demand for that.

@albertz
Copy link
Member Author

albertz commented Apr 16, 2021

There is a version of RASR that uses python3 instead of python2 and I can port it to the public version if there is demand for that.

I think this is a must such that we can safely drop Python 2.

@albertz

This comment has been minimized.

@albertz

This comment has been minimized.

@albertz
Copy link
Member Author

albertz commented Oct 27, 2021

From the points raised in the original post, and also a similar discussion in rwth-i6/returnn_common#43, it looks like Python 3.7 is a good minimum version candidate. This is e.g. required for proper nice type annotations.

Note that the current Python stable version is 3.10.

Just dropping Python 2 and support earlier Python 3 versions like 3.5 would not really make much sense, and also not really allow us for much cleanup.

@braddockcg
Copy link

bumping this issue. I find Python Typing essential for a large Python code base. 3.7 would be great.

@albertz
Copy link
Member Author

albertz commented Oct 20, 2022

About RASR interface, specifically full-sum training, seq training or other exotic things, what is the state, is this now all well tested with Python 3? @Marvin84 @michelwi @SimBe195?

About Python 3, which version do we need? @curufinwe?

@JackTemaki
Copy link
Collaborator

I have seen setups ranging from Python 3.5 to Python 3.8, we had recent fixes (recent as in "within the last year") for >3.7 as there were deprecated functions before. Switching to 3.7 will definitely exclude many existing setups that are still in 3.5/3.6, but I think we should not wait too long. Directly going to 3.7 will be a problem for many though I think, but lets wait for @curufinwe reply.

@albertz
Copy link
Member Author

albertz commented Oct 21, 2022

However, as it was argued before, just dropping Python 2 but keeping Python 3.5 or 3.6 compatibility does not really gain us much. We still cannot use type annotations in a reasonable way. I think this would create just double the effort without much benefit. I think if we drop this, we should directly switch to at least Python 3.7 (or later, maybe better 3.8), and if this is not ready yet, then we still should wait.

@albertz
Copy link
Member Author

albertz commented Oct 21, 2022

The question is, those existing setups with Python 3.5/3.6, is there any good reason they use such old Python? Or could those setups just be switched to some newer Python?

@michelwi
Copy link
Collaborator

is there any good reason they use such old Python?

in my estimation the main reason is that the OS we use ships with this old version of python people would just not bother installing a custom version. I.e. no good reason.

we should directly switch to at least Python 3.7 (or later, maybe better 3.8), and if this is not ready yet, then we still should wait.

I think currently new versions of OS are being tested that would ship with python3.8. Once these get deployed we would anyway check compatibility and put in effort to fix problems. So maybe start the process now and then we can also drop support for python<=3.7

About RASR interface, specifically full-sum training, seq training or other exotic things, what is the state, is this now all well tested with Python 3? @Marvin84 @michelwi @SimBe195?

I just received confirmation that both sequence training (@christophmluscher) and fullsum recipe (@Marvin84) have been tested with python3.8. Maybe also check with @ZhouW321 if his setups are compatible?

@albertz
Copy link
Member Author

albertz commented Oct 21, 2022

maybe start the process now

I'm not really sure what you mean by that. Dropping Python will not be done in a branch or PR. This will be a continuous process, directly in master. It will not be in one go. It will probably take a lot of time. It will be somewhat similar as the transition in i6_core. And we should only start at that point when it really can be dropped.

@michelwi
Copy link
Collaborator

maybe start the process now

I'm not really sure what you mean by that.

I mean to start the transition in i6_core now and not only when the new OS is deployed.

@Marvin84
Copy link

I would like just to share one use case with python 3.8 and full-sum training, our HiWi Daniel had some additional scientific packages installed, and even though returnn run script was not using his local python this interfered with the training, leading to very high WERs. This was solved once he created a virtual environment where those packages were installed, i.e., by making sure they are not accidentally used. I was very confused about this but we did not spend further time on debugging.

@albertz
Copy link
Member Author

albertz commented Oct 21, 2022

I mean to start the transition in i6_core now and not only when the new OS is deployed.

So you suggest the same here for RETURNN? Not wait until the new OS is deployed, but drop it right now?

@albertz
Copy link
Member Author

albertz commented Oct 21, 2022

I would like just to share one use case with python 3.8 and full-sum training, our HiWi Daniel had some additional scientific packages installed, and even though returnn run script was not using his local python this interfered with the training, leading to very high WERs. This was solved once he created a virtual environment where those packages were installed, i.e., by making sure they are not accidentally used. I was very confused about this but we did not spend further time on debugging.

I don't really understand this comment. In what way is this specific to Python 3.8? From your description, I don't think this is in any way specific to the Python version, so not really relevant here. Further, I also don't really understand how some unrelated unused additionally installed packages can in any way influence RETURNN. But this is also not relevant here in this issue but rather should be discussed in a separate issue, if you want to investigate this further.

@michelwi
Copy link
Collaborator

I mean to start the transition in i6_core now and not only when the new OS is deployed.

So you suggest the same here for RETURNN? Not wait until the new OS is deployed, but drop it right now?

no, as I said in the statement that you literally just quoted I suggest this for i6_core (well ok, also for other recipes).
I think verifying that all setups work with python3.8 is a prerequisite for

And we should only start at that point when it really can be dropped.

A statement to which I agree.

@albertz
Copy link
Member Author

albertz commented Oct 21, 2022

no, as I said in the statement that you literally just quoted I suggest this for i6_core (well ok, also for other recipes).

Ah ok, I was confused, because this issue here is about RETURNN. And also, I thought for i6_core we already do that? At least I remember that we started to use type annotations (which means mostly that Python >=3.7 is required). I also see f-strings, which are Python >=3.6. So it means, i6_core does anyway not work with the default Ubuntu 16 Python.

@Marvin84
Copy link

In what way is this specific to Python 3.8

Since no further investigation was done, I cannot express any further thoughts on this. However, your high confidence in underlying the irrelevance of python3.8 is not comforting.

Further, I also don't really understand how some unrelated unused additionally installed packages can in any way influence RETURNN

I also wondered about this. Again, I only described the experience 'as-is'. I am not sure where and how you deducted the 'unsued' case. Our intuitive assumption here was that Returnn did rely on local packages as an unexpected behavior.

well tested with Python 3? @Marvin84 @michelwi @SimBe195?

My intention was to answer your questions here, and the scenario I described, IMO, enters into the category of 'testing python version', since he used the same Returnn code with only a different python version and got from 7% WER to 90%. Maybe not directly because of the python version? I am not able to answer this question.

@albertz
Copy link
Member Author

albertz commented Oct 21, 2022

@Marvin84 Yes I'm very confident that this is not about the Python version, but rather about some mess-up in the system or environment. Esp as you point out that it works after you worked with a virtual environment.

@Marvin84
Copy link

some mess-up in the system or environment

In which context do you think we should follow-up with more investigation on this? I think this is an important issue. Implementation meeting or is it still rather a Returnn issue?

@albertz
Copy link
Member Author

albertz commented Oct 21, 2022

some mess-up in the system or environment

In which context do you think we should follow-up with more investigation on this? I think this is an important issue. Implementation meeting or is it still rather a Returnn issue?

First step, open a separate issue on this, and let's continue the discussion there. In that issue, then also add some more details if possible.

@albertz
Copy link
Member Author

albertz commented Oct 21, 2022

After some discussion with @curufinwe, we agreed that we can drop Python <=3.6 support already right now, i.e. require Python >=3.7.

But as I mentioned, this will be a continual effort. Although for all new code, we should directly write it in such a way. Specifically:

  • Use type annotations. In the docstring, the args should still all be documented, but the type is not needed there. In the beginning of a file, we need from __future__ import annotations.
  • Use f-strings instead of other string formatting.
  • We don't necessarily need sorted for dict items or keys anymore (if the only reason was determinism).

I might see if I can do some of the type annotations automatically, by converting our doc strings. I already do similar logic on RETURNN common side where I automatically parse RETURNN code and generate wrappers.

@albertz
Copy link
Member Author

albertz commented Oct 25, 2022

I think before we start doing this, we should give some notification to everyone and leave them a bit of time (in terms of days, not too long) to test their setups with a new Python version.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants