Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfix/sql param data memory leak #703

Merged
merged 12 commits into from May 13, 2022

Conversation

Mizaro
Copy link
Contributor

@Mizaro Mizaro commented Feb 8, 2020

Fixing "Memory leak after getting [42000][ODBC Driver 17 for SQL Server] (1105) (SQLParamData) Error #702"

@mkleehammer
Copy link
Owner

I've posted a question to the issue.

@Mizaro
Copy link
Contributor Author

Mizaro commented Feb 10, 2020

I have answered and changed the code, can you please take another look?

@Mizaro
Copy link
Contributor Author

Mizaro commented Mar 14, 2020

@mkleehammer

@Mizaro
Copy link
Contributor Author

Mizaro commented Apr 10, 2020

Waiting for your response.

@Mizaro
Copy link
Contributor Author

Mizaro commented Jun 20, 2020

@mkleehammer

Copy link
Contributor Author

@Mizaro Mizaro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed my changes. Seems OK to me.

@Mizaro Mizaro force-pushed the bugfix/SQLParamData_MemoryLeak branch from aa9c8b8 to 1290726 Compare January 22, 2021 15:58
@gordthompson
Copy link
Collaborator

@mkleehammer – I've just had another look at this and using test code

from time import time
import psutil
import pyodbc

print(f"pyodbc {pyodbc.version}")
cnxn = pyodbc.connect("DSN=mssqlLocal", autocommit=True)
process = psutil.Process()


def print_status(msg, t0=None):
    s = f"{msg}: "
    mb = process.memory_info().vms / 1048576
    s += f"vms {mb:0.1f} MiB"
    if t0:
        s += f", {(time() - t0):0.1f} sec."
    print(s)


print_status("startup")
num_chars = 10_000_000
data = (1, "x" * num_chars, "2012-02-31")
print_status("data loaded")

table_name = "pd_test"
col_names = ["id", "txt_col", "dtm_col"]
ins_sql = f"INSERT INTO {table_name} ({','.join(col_names)}) VALUES ({','.join('?' * len(col_names))})"
for iteration in range(5):
    t0 = time()
    crsr = cnxn.cursor()
    crsr.execute(f"DROP TABLE IF EXISTS {table_name}")
    crsr.execute(
        f"CREATE TABLE {table_name} (id int, txt_col varchar(max), dtm_col datetime2)"
    )
    try:
        crsr.execute(ins_sql, data)
    except pyodbc.DataError:
        # (Feb 31 is an invalid date)
        pass
    crsr.close()
    print_status(f"iteration {iteration}", t0)

with python -m pip install git+https://github.com/mkleehammer/pyodbc I get

pyodbc 4.0.31b56
startup: vms 6.4 MiB
data loaded: vms 15.9 MiB
iteration 0: vms 35.1 MiB, 0.9 sec.
iteration 1: vms 54.2 MiB, 1.0 sec.
iteration 2: vms 73.3 MiB, 1.0 sec.
iteration 3: vms 92.4 MiB, 1.0 sec.
iteration 4: vms 111.5 MiB, 1.0 sec.

and with python -m pip install git+https://github.com/Mizaro/pyodbc@bugfix/SQLParamData_MemoryLeak I get

pyodbc 4.0.29b72+bugfix/SQLParamData_MemoryLeak
startup: vms 6.4 MiB
data loaded: vms 15.9 MiB
iteration 0: vms 16.0 MiB, 1.0 sec.
iteration 1: vms 16.0 MiB, 1.0 sec.
iteration 2: vms 16.0 MiB, 1.0 sec.
iteration 3: vms 16.0 MiB, 1.0 sec.
iteration 4: vms 16.0 MiB, 1.0 sec.

Can we merge this one now?

@Mizaro
Copy link
Contributor Author

Mizaro commented Jun 25, 2021

@gordthompson Sorry for disappearing for a while.

Is there something I can do to promote this PR?

@mkleehammer
Copy link
Owner

Was it decided that this is no longer needed because #832 fixes the leak? I'm not sure swapping the two free calls would make a difference as they are not coupled.

@gordthompson
Copy link
Collaborator

gordthompson commented Mar 27, 2022

@mkleehammer - #832 was merged by 716572a. When I run my test code above against the current master branch the leak is still there:

pyodbc 4.0.33b8
startup: vms 37.4 MiB
data loaded: vms 47.0 MiB
iteration 0: vms 66.0 MiB, 28.9 sec.
iteration 1: vms 85.1 MiB, 18.6 sec.
iteration 2: vms 104.2 MiB, 16.7 sec.
iteration 3: vms 123.3 MiB, 18.0 sec.
iteration 4: vms 142.3 MiB, 13.1 sec.

so this fixes a different leak. (This relates to .execute() while #832 was a fast_executemany issue.)

@mkleehammer
Copy link
Owner

I ran the test above with slight modifications for Linux & Postgres, from commit 4a7c583, and I'm not seeing any leaks. I'll have to test on my Windows laptop.

I still don't see how swapping those two lines could affect anything. I'm concerned we don't actually know the cause.

Note that it could be a bug in the driver, so we do need to track those versions too.

@gordthompson
Copy link
Collaborator

@mkleehammer - FWIW, my test earlier today was also on Linux.

Just tested again with PostgreSQL Unicode driver. Xubuntu 20.04, Python 3.8.10, unixODBC 2.3.7.

from time import perf_counter
import psutil
import pyodbc

print(f"{pyodbc.version=}")
connection_string = (
    "Driver=PostgreSQL Unicode;"
    "Server=192.168.0.199;"
    "Database=test;"
    "UID=scott;PWD=tiger;"
)
cnxn = pyodbc.connect(connection_string, autocommit=True)
print(f"{cnxn.getinfo(pyodbc.SQL_DRIVER_NAME)=}")
print(f"{cnxn.getinfo(pyodbc.SQL_DRIVER_VER)=}")
process = psutil.Process()


def print_status(msg, t0=None):
    s = f"{msg}: "
    mb = process.memory_info().vms / 1048576
    s += f"vms {mb:0.1f} MiB"
    if t0:
        s += f", {(perf_counter() - t0):0.1f} sec."
    print(s)


print_status("startup")
num_chars = 10_000_000
data = (1, "x" * num_chars, "2012-02-31")
print_status("data loaded")

table_name = "pd_test"
col_names = ["id", "txt_col", "dtm_col"]
ins_sql = f"INSERT INTO {table_name} ({','.join(col_names)}) VALUES ({','.join('?' * len(col_names))})"
for iteration in range(5):
    t0 = perf_counter()
    crsr = cnxn.cursor()
    crsr.execute(f"DROP TABLE IF EXISTS {table_name}")
    crsr.execute(
        f"CREATE TABLE {table_name} (id int, txt_col text, dtm_col timestamp)"
    )
    try:
        crsr.execute(ins_sql, data)
    except pyodbc.DataError:
        # (Feb 31 is an invalid date)
        pass
    crsr.close()
    print_status(f"iteration {iteration}")

With current master (e67ad89):

pyodbc.version='4.0.33b26'
cnxn.getinfo(pyodbc.SQL_DRIVER_NAME)='psqlodbcw.so'
cnxn.getinfo(pyodbc.SQL_DRIVER_VER)='12.01.0000'
startup: vms 43.9 MiB
data loaded: vms 53.4 MiB
iteration 0: vms 88.5 MiB
iteration 1: vms 133.1 MiB
iteration 2: vms 152.1 MiB
iteration 3: vms 171.2 MiB
iteration 4: vms 190.3 MiB

With Mizaro's patch (850a8dd):

pyodbc.version='4.0.31b63+bugfix/SQLParamData_MemoryLeak'
cnxn.getinfo(pyodbc.SQL_DRIVER_NAME)='psqlodbcw.so'
cnxn.getinfo(pyodbc.SQL_DRIVER_VER)='12.01.0000'
startup: vms 43.9 MiB
data loaded: vms 53.4 MiB
iteration 0: vms 69.4 MiB
iteration 1: vms 69.4 MiB
iteration 2: vms 69.4 MiB
iteration 3: vms 69.4 MiB
iteration 4: vms 69.4 MiB

@Mizaro
Copy link
Contributor Author

Mizaro commented Mar 28, 2022

Was it decided that this is no longer needed because #832 fixes the leak? I'm not sure swapping the two free calls would make a difference as they are not coupled.

@mkleehammer This is a different issue, that maybe only looks alike.

@gordthompson Thanks for retesting it! I appreciate it.

I ran the test above with slight modifications for Linux & Postgres, from commit 4a7c583, and I'm not seeing any leaks. I'll have to test on my Windows laptop.

I still don't see how swapping those two lines could affect anything. I'm concerned we don't actually know the cause.

Note that it could be a bug in the driver, so we do need to track those versions too.

About how swapping those two lines can help.

FreeParameterInfo sets cur->paramcount to zero, while FreeParameterData calls FreeInfos with cur->paramcount.
As far as I remember, when swapping the lines, cur->paramInfos are being freed without a memory leak.

We can also see my explanation here #702 (comment),

Variable cur->paramInfos that is passed to ParamInfo* a is not null and holds an object in memory.

@Mizaro
Copy link
Contributor Author

Mizaro commented Apr 8, 2022

@mkleehammer, @gordthompson Is there a plan to merge this?

@gordthompson gordthompson self-requested a review April 10, 2022 18:16
Copy link
Collaborator

@gordthompson gordthompson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@gordthompson gordthompson merged commit a4b0b75 into mkleehammer:master May 13, 2022
v-makouz pushed a commit to v-makouz/pyodbc that referenced this pull request Sep 9, 2022
* Add support for Python 3.10, drop EOL 3.5 (mkleehammer#952)

* Remove duplicate entry in pyi stub (mkleehammer#979)

* Replace deprecated SafeConfigParser with ConfigParser (mkleehammer#953)

* Designate connection string as optional (mkleehammer#987)

* Fix spelling typos (mkleehammer#985)

Co-authored-by: Gord Thompson <gord@gordthompson.com>

* Fix for DSN Names with non-ASCII chars (mkleehammer#951)

* Fix for DSN Names with non-ASCII chars

Fixes: mkleehammer#948

Co-authored-by: bamboo <bamboo@galaxy.ad>
Co-authored-by: Gord Thompson <gordon.d.thompson@gmail.com>

* Added InterfaceError to pyodbc.pyi. (mkleehammer#1013)

Co-authored-by: Benjamin Holder <bholder@rpagency.com>

* Upgrade deprecated unicode encoding calls (mkleehammer#792)

* Do not include .pyc artifacts in source tarball mkleehammer#742

* Build wheels with cibuildwheels on GitHub Actions

Fixes mkleehammer#175
Ref mkleehammer#688
Closes mkleehammer#668
Closes mkleehammer#685
Fixes mkleehammer#441 and pretty much most issues that mention
` sql.h: No such file or directory`

This also need to setup some PyPI keys for automated uploads.

* Install unixodbc-dev for Linux wheels

* Enable GitHub Actions for pull requests

* Use Debian based `manylinux_2_24` image

* `apt-get` update before installing in wheel build

* Use PEP 440 version name required for wheels

* Skip building 32-bit wheels

* 4.0.dev0 for default version, because test_version() wants 3 parts here

Checked this won't shadow released minor version (credit goes to @hugovk)

    >>> from packaging.version import Version
    >>> Version("4.0.dev0") > Version("4.0.24")
    False

* Had to use Debian image for PyPy too

* Disable PyPy wheels

https://cibuildwheel.readthedocs.io/en/stable/options/#build-selection

PyPy is missing some C functions that `pyodbc` needs.

* Update README.md

* Avoid error when testing with DSN= connection

Fixes: mkleehammer#1000

* Disable setencoding/setdecoding in tests3/pgtests.py

Fixes: mkleehammer#1004

* Adjust test_columns() in tests3/pgtests.py for newer driver versions

Fixes: mkleehammer#1003

* Move driver version check out of function

* Add comment to _get_column_size()

* Fix memory leak with decimal parameters

Fixes: mkleehammer#1026

* Create codeql-analysis.yml

* Bugfix/sql param data memory leak (mkleehammer#703)

* Updated .gitignore

* * Created a test file for the specific scenario

* * Updated doc of test file for the specific SQLParamData scenario

* * Fixed the test file for the specific SQLParamData scenario by Py_XDECREF the PyObject with 1 reference.

* * Improved the test to close the cursor and set it to None, then forcing the gc

* * Changed the fix of the memory leak and updated the test.

* * Removed redundant empty line

* * Converted tabs to spaces

* * Moved variable out of conn's scope

* Update gitignore, remove duplicated

* Replace deprecated PyUnicode_FromUnicode(NULL, size) calls (mkleehammer#998)

Current versions of Python write a deprecation warning message to
stderr, which breaks CGI scripts running under web servers which
fold stderr into stdout. Likely breaks other software. This change
replaces the deprecated calls with PyUnicode_New(size, maxchar).
The accompanying code to populate the new objects has also been
rewritten to use the new PyUnicode APIs.

* Making pyodbc compatible with PostgreSQL infinity dates, returning MINYEAR and MAXYEAR to python, instead of values out of python's limits

* Removing autoformat from code

* Removing autoformat from code

* Add odbc_config support on mac and m1 homebrew dir

* Note EOL of 2.7 support in README (mkleehammer#945)

* Fix version of CI generated wheels

The CI system is checking out exact tags like "git checkout 4.0.33", which results in a
detached HEAD.  The version calculation was adding the commit hash.

* Fix for mkleehammer#1082 libraries in Linux wheels (mkleehammer#1084)

* use argparse instead of optparse (mkleehammer#1089)

Co-authored-by: Hugo van Kemenade <hugovk@users.noreply.github.com>
Co-authored-by: Alex Nelson <alexander.nelson@nist.gov>
Co-authored-by: Kian Meng, Ang <kianmeng.ang@gmail.com>
Co-authored-by: Gord Thompson <gord@gordthompson.com>
Co-authored-by: bamboo <bamboo@galaxy.ad>
Co-authored-by: Gord Thompson <gordon.d.thompson@gmail.com>
Co-authored-by: bdholder <benjamin.holder@rocketmail.com>
Co-authored-by: Benjamin Holder <bholder@rpagency.com>
Co-authored-by: Inada Naoki <songofacandy@gmail.com>
Co-authored-by: Michael Fladischer <FladischerMichael@fladi.at>
Co-authored-by: Anatoli Babenia <anatoli@rainforce.org>
Co-authored-by: Francisco Morales <51379487+jose598@users.noreply.github.com>
Co-authored-by: Gord Thompson <gordthompson@users.noreply.github.com>
Co-authored-by: Michael Kleehammer <michael@kleehammer.com>
Co-authored-by: Gilad Leifman <leifmangilad@gmail.com>
Co-authored-by: Bob Kline <bkline@rksystems.com>
Co-authored-by: Leandro Scott <lsrzj@yahoo.com>
Co-authored-by: Jordan Mendelson <jordan@zenzen.org>
Co-authored-by: Keith Erskine <toastie604@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Memory leak after getting [42000][ODBC Driver 17 for SQL Server] (1105) (SQLParamData) Error
3 participants