Freeze Python 2.7 and modernize codebase #945

Closed
mkleehammer opened this issue Aug 22, 2021 · 11 comments

@mkleehammer
Owner

mkleehammer commented Aug 22, 2021

I'd like to look into splitting the codebase into Python 2 and Python 3 versions and modernizing the Python 3 version. There used to be a huge number of Python 2 downloads every day, but Python 3 is now ~95% of the PyPI download traffic.

https://pypistats.org/packages/pyodbc

Does anyone know the best way to do this? My first inclination is to make a new v5 that is only the modern version and maintain the 4.x branch for security fixes. Any other suggestions?

Some things I'm thinking about for a v5:

  • Update the C API usage and consider using the stable ABI. If I understand correctly, it would allow new versions of Python to use the same binaries.
  • Would it be helpful to have a Python layer that loads the C implementation as _pyodbc? Does this give the project more flexibility to introduce items in whichever layer is easiest? (See the rough sketch after this list.)
  • More thoroughly integrate the parameter binding of per-value and fast_executemany so there is only one function for encoding each.
  • Provide a connection factory to replace the hidden CnxnInfo. It would be easier to configure pre-connect items on the factory and then ask for the connection. (We might need an optional password on the connect for those that don't want it stored globally.)
  • Switch to pytest
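
As a rough illustration of the _pyodbc layer idea (purely hypothetical; none of this exists in the repo today, and the names are illustrative):

    # pyodbc/__init__.py -- thin pure-Python layer over a C extension built as _pyodbc
    import _pyodbc

    # Re-export the C-level objects so the public API stays the same.
    connect = _pyodbc.connect
    Connection = _pyodbc.Connection
    Cursor = _pyodbc.Cursor
    Error = _pyodbc.Error

    def connect_with_defaults(connstring, **kwargs):
        # Pure-Python conveniences (argument validation, logging, defaults)
        # could live in this layer without touching the C code.
        kwargs.setdefault("autocommit", False)
        return _pyodbc.connect(connstring, **kwargs)
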
@hugovk
Contributor

hugovk commented Sep 14, 2021

Many projects make/tag a last release for Python 2.7, then bump the major version and move on.

2.7 has been EOL for 20 months now so people using it aren't getting security releases, and most of the top-downloaded PyPI packages have already dropped it (pip, setuptools, certifi, numpy, cryptography, pytest...).

Make sure you have python_requires=">=3.6" in the new version's setup.py; pip will then refuse to install the new version for folks on Python 2.7.
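
As a hedged illustration (not pyodbc's actual setup.py, which carries many more options), the relevant bit would look roughly like:

    # Illustrative snippet only; pyodbc's real setup.py has much more in it.
    from setuptools import setup

    setup(
        name="pyodbc",
        version="5.0.0",
        python_requires=">=3.6",  # a reasonably recent pip on 2.7 will then skip this release
    )
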

@keitherskine
Collaborator

It's been a while since @mkleehammer raised this issue, but I think everything he mentioned in his original post is a great idea. Dropping Python 2.7 support makes perfect sense and will make the codebase much easier to maintain. As I understand it, we more or less have to move to the newer Unicode C APIs soon anyway, because Python 3.12 won't work with pyodbc in its current form (there are already a bunch of deprecation warnings during builds). I also like the idea of putting the C code into a _pyodbc module and creating a wrapper Python pyodbc module that calls it. That would make the pyodbc API easier to understand, and code changes easier too.

I've been looking at the options for making these changes. Although we could potentially update the code to use the stable ABI first, i.e. before dropping 2.7, it seems to me that Python 2.7 just makes everything more complex. Dropping 2.7 will require a lot more work than just deleting some old code, but I still think it's worth doing that first. Happy to hear thoughts on that though.

So here are my thoughts on dropping Python 2.7 from pyodbc version 5.x:

  1. put an update in the README.md file stating this is the intention, preferably with a date
  2. create a new 4.x release from master; there have been a bunch of changes since the last release in Aug 2021, so we should put out a new release before making any serious code changes
  3. once the new release is out there and considered stable, create a new branch from master called something like "py27" which will be used long-term for any Python 2.7 security/bug fixes
  4. create a feature branch from master for the Python 3 work and start removing 2.7 code from it
  5. when that work is complete (!), release it as pyodbc version 5.0.0

After Python 2.7 has been dropped we can then move on to use the stable ABI and do the other changes. All thoughts welcome.

@mkleehammer @gordthompson

v-makouz pushed a commit to v-makouz/pyodbc that referenced this issue Sep 9, 2022
@mkleehammer
Owner Author

My original plans for v5 were more ambitious than the time I've been able to put in, so I think Keith's suggested path is best. I'll save major architecture changes for a later version.

  • Drop Python 2 support
  • Continue v4 security fixes for Python 2 (and maybe 3)?
  • Convert tests to pytest
  • Switch to stable ABI

The switch to the stable ABI would ensure pyodbc binaries are ready for new Python versions before those versions are even officially released. There is a potential performance hit, though, so I'd want to do that work on a separate "abi" branch and create at least one good performance test I could run against PostgreSQL and MyODBC locally.
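
Something as simple as this throwaway script would probably do as a baseline for comparing builds (the DSN and table name are placeholders, not anything in the repo):

    # Rough benchmark sketch; "DSN=pg_test" and perf_t are placeholders.
    import time
    import pyodbc

    def bench(rows=100_000):
        cnxn = pyodbc.connect("DSN=pg_test")
        cursor = cnxn.cursor()
        cursor.execute("DROP TABLE IF EXISTS perf_t")
        cursor.execute("CREATE TABLE perf_t (a INT, b VARCHAR(40))")
        params = [(i, "row %d" % i) for i in range(rows)]

        start = time.perf_counter()
        cursor.executemany("INSERT INTO perf_t (a, b) VALUES (?, ?)", params)
        cnxn.commit()
        insert_secs = time.perf_counter() - start

        start = time.perf_counter()
        cursor.execute("SELECT a, b FROM perf_t").fetchall()
        fetch_secs = time.perf_counter() - start

        cnxn.close()
        return insert_secs, fetch_secs

    print("insert %.2fs  fetch %.2fs" % bench())
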

I'm going to delete the current v5 branch and create a new one.

Thoughts?

@keitherskine
Collaborator

keitherskine commented Nov 17, 2022

Here are a few of my thoughts about things we could do to improve pyodbc going forward. Much of this does not involve changing the core C++ code, but there are still plenty of things we can do to modernize the project.

  • Drop Python 2.7

    • Seems kinda fundamental for us to move forward. Not doing so makes everything more complicated.
    • The "master" branch will continue to carry the evolving latest-and-greatest Python 3 code.
    • Optionally, create a new branch "py27" to manage the legacy Python 2.7 code.
    • Strip out Python 2.7 code from the entire codebase - C++ extensions, tests, CICD, etc. (probably best to do this in pieces over time).
  • Build and deploy

    • Figure out a versioning strategy for both deployment and development work. In version 5, I'm hoping we will aim for incremental minor releases (5.0.0, 5.1.0, 5.2.0, etc.) rather than incremental patch releases (5.0.1, 5.0.2, 5.0.3) as in version 4. We may also want to consider "release candidate" labels between deployments (e.g. 5.3rc). Finally, I'm not sure I totally understand the package versions generated by setup.py based on commits, e.g. "4.0.35b14". Are they truly needed?
    • Drop the use of distutils in setup.py, which will be removed in 3.12 (use setuptools instead).
    • PEP517 - update pyproject.toml and setup.py as necessary. We can probably move much of the static info into pyproject.toml from setup.py.
  • Unit Tests

    • Organize the tests into folders for each database, to make test discovery simpler.
    • Expand the use of tox for running unit tests. tox creates virtual environments (in a top-level ".tox" folder) which keeps the testing isolated and running them easy.
    • Use pytest to run the unit tests. This will help discovery further by including all test files simply by their filenames, instead of lumping all our tests into one script (for each database) as we do currently. (See the fixture sketch at the end of this comment.)
    • Reconsider the mechanism of adding .pyd files to PYTHONPATH? It would be good to just use tox if possible. Besides, with PEP517, generating .pyd files might not be so easy in the future. Calling setup.py directly is discouraged now and much of setup.py might be moved into pyproject.toml anyway.
    • Tidy up the unit tests. They have evolved incrementally over many years and perhaps there's case for giving them a spring clean.
  • Codebase

    • Upgrade the C++ code to not use the legacy Unicode APIs (PyUnicode_AsUnicode, etc.) which have been removed from Python 3.12.
    • Create a Python-based top-level module that imports the C extensions? This would make the pyodbc API less opaque.

I'm no C++ developer, so I'll leave the codebase to others, but I'm happy to tackle many of the non-C++ ideas here.
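
To make the pytest/tox idea a bit more concrete, here's the kind of per-database conftest.py I have in mind (the folder, environment variable, and fixture names are all made up for this sketch):

    # Hypothetical tests/postgresql/conftest.py -- one way to share a connection
    # per database folder under pytest.
    import os
    import pytest
    import pyodbc

    @pytest.fixture
    def cursor():
        cnxn = pyodbc.connect(os.environ["PYODBC_POSTGRES_CONNSTRING"])
        cur = cnxn.cursor()
        yield cur
        cnxn.rollback()
        cnxn.close()

Each test file in that folder would then just take cursor as an argument, and pytest's discovery would handle the rest.
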

@keitherskine
Collaborator

Not sure where this comment should go, so I'll just add it here. I stumbled across the PyPI downloads database recently. It's kinda interesting to see which systems are downloading which pyodbc files from PyPI. Here's some sample SQL you might want to take a look at:

    SELECT file.project, file.version, file.filename,
           details.distro.name AS distro_name, details.system.name AS system_name,
           details.system.release AS system_release, count(*) AS cnt
    FROM `bigquery-public-data.pypi.file_downloads`
    WHERE DATE(timestamp) = "2022-11-18"
      AND file.project = 'pyodbc'
      AND file.version = '4.0.35'
    GROUP BY file.project, file.version, file.filename,
             details.distro.name, details.system.name, details.system.release
    ORDER BY count(*) DESC, file.project DESC, file.version DESC, file.filename,
             details.distro.name, details.system.name, details.system.release;

Looks like Azure is our biggest customer by far, on Ubuntu. Also, Linux downloads dominate, followed by Windows and then Macs, not surprisingly. It's interesting to see which of the many wheels we publish are actually used.

@mkleehammer
Owner Author

That is interesting. I'm surprised at how many source downloads there are for the same OS version that has wheels.

I was looking at something similar here: https://pypistats.org/packages/pyodbc

In particular, I was surprised that there are still almost 6K Python 2.7 downloads per day.

@mkleehammer
Owner Author

I discarded my old v5 work and restarted a couple of days ago to slim it down. I've pushed it to the py3 branch. I think it's probably best not to open an MR until it gets to a baseline. In particular, I want to get rid of the Unicode APIs that are going to be removed in 3.12.

@keitherskine
Collaborator

keitherskine commented Nov 25, 2022

Just a thought, but rather than dropping Python 2.7 largely in one go, we could break this transition up into incremental parcels of work, each step of which would keep the pyodbc codebase working (whether deployed or not).

The first step could be to make a minimalistic PR that just shows pyodbc no longer supports Python 2.7 (or 3.6). That PR would just:

  • in setup.py, change python_requires to ">=3.7" and VERSION to "5.0.0"
  • remove Python 2.7 and Python 3.6 from .github\workflows\ubuntu_build.yml
  • remove Python 2.7 and Python 3.6 from appveyor.yml

For example, see PR #1137. That's all that would be needed to signify a break from Python 2, although if we wanted to do more to setup.py as described below, that would be fine too.

After that, there are various reasonably self-contained parcels of work, which could be done in roughly this order:

  1. Modernize setup.py by removing references to distutils (use setuptools instead) and removing Python 2 references. Also, potentially: remove the git-based versioning(?), remove the cmdclass references(?). Lastly, we could explore moving the static info like "name" and "description" into pyproject.toml, but that's not at all necessary right now. Something for the future perhaps.

  2. Update the CI pipelines to use pytest when calling the unit test scripts. This work has been proposed in PR #1129 (use pytest in run_tests()) and PR #1130 (database-specific unit test folders, updated test scripts). It includes moving database-specific tests into their own subdirectories within /tests3, e.g. /tests3/mysql/.

  3. Drop Python 2.7 from the AppVeyor pipeline code. This probably involves just deleting appveyor\compile.cmd for the most part, although there are other Python 2.7 references to be removed as well (included in #1130).

  4. Remove Python 2.7 from the C++ codebase. This involves stripping out all the PY_MAJOR_VERSION macros and discarding other Python 2-specific constructs. A lot of work.

  5. Update the C++ codebase to stop using C-extension functions that are going to be removed in Python 3.12, as described in issue #917 (Deprecated PyUnicode functions that must be updated).

  6. Update the C++ codebase to use only the stable ABI.

  7. Delete the /tests2 directory and rename the /tests3 directory to the more standard "/tests". There are various references to "tests3" in the repo that will need to be updated accordingly too.

  8. Update the /tests3 (or /tests) unit test scripts to use pytest-style tests rather than unittest-style tests. This might be as simple as removing the unittest.TestCase class wrappers, but I suspect it will be more complicated than that. Also, the test cases themselves could probably do with a thorough review. (See the sketch at the end of this comment.)

I believe each of the above steps can be done separately, rather than taking a "big-bang" approach. After this Python 3 transition, there is still more work to be done of course, like perhaps having a Python pyodbc front-end and a "_pyodbc" C++ backend, etc. But this transition to Python 3-only would be a major step forward.
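
For item 8, here's a hedged before/after sketch of a single test (the "before" is only an approximation of the current tests3 style, and the cursor fixture is assumed to come from a conftest.py):

    # Roughly how a test looks today (unittest style, simplified):
    #
    #     class PGTestCase(unittest.TestCase):
    #         def test_negative_int(self):
    #             value = -1
    #             self.cursor.execute("create table t1(n int)")
    #             self.cursor.execute("insert into t1 values (?)", value)
    #             result = self.cursor.execute("select n from t1").fetchone()[0]
    #             self.assertEqual(result, value)

    # pytest style, with `cursor` provided by a conftest.py fixture:
    def test_negative_int(cursor):
        value = -1
        cursor.execute("create table t1(n int)")
        cursor.execute("insert into t1 values (?)", value)
        result = cursor.execute("select n from t1").fetchone()[0]
        assert result == value
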

@mkleehammer mkleehammer added this to the 5.0 Python 3 Only milestone Apr 11, 2023
@mkleehammer
Owner Author

I just built the py3 branch with a debug version of Python 3.12 and the PostgreSQL tests pass.

I did comment out the fast executemany code temporarily until I can port it also. I need to study it first and I'd really like to see if the two code paths could share the core binding code.

I also wonder if the feature should be renamed to "row wise binding" or "array binding" or something less generic than fast-executemany, mainly to give users an idea of what ODBC feature it is. Otherwise, people probably wonder why we don't just use "fast" and how they can determine if their DB supports it.
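
For context, this is the knob as users see it today; nothing in the name hints at the underlying ODBC parameter-array binding (the connection string is a placeholder):

    import pyodbc

    cnxn = pyodbc.connect("DSN=pg_test")   # placeholder DSN
    cursor = cnxn.cursor()
    cursor.fast_executemany = True         # opt in to the parameter-array code path
    cursor.executemany("insert into t1(a, b) values (?, ?)", [(1, "x"), (2, "y")])
    cnxn.commit()
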

@ndmlny-qs
Contributor

@mkleehammer & @keitherskine I'd like to start tackling some of the items discussed above as I have time to devote to helping the community that uses pyodbc. The first thing I'd like to help with is transitioning the tests to pytest. I can also help with documentation, example usage, and possibly even adding a GitHub Pages site for the project that builds API documentation.

Let me know what y'all think would be best for me to start with. You can also tag me on issues you want done.

@mkleehammer
Owner Author

I've merged 5.0 into the master branch and released 5.0.0 alpha 2. Please see the discussion here
