Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot handle None in custom data dumper #377

Open
sakishrist opened this issue Sep 16, 2022 · 7 comments · May be fixed by #642
Open

Cannot handle None in custom data dumper #377

sakishrist opened this issue Sep 16, 2022 · 7 comments · May be fixed by #642
Milestone

Comments

@sakishrist
Copy link

sakishrist commented Sep 16, 2022

Python: 3.10.6
Psycopg: 3.0.16

I was handling dates with a custom Loader and Dumper with psycopg3:

class ExceptionDateDumper(DateDumper):
    def dump(self, obj):

        year = obj.year
        month = obj.month
        day = obj.day

        if year is None and month is None and day is None:
            return super().dump(None)

        if year < 0:
            bc = " BC"
            year*=-1
        else:
            bc = ""

        day = 1 if day is None else day
        month = 1 if month is None else month

        return bytes(f'{year:04}-{month:02}-{day:02}{bc}', encoding='utf-8')

I know the code is kinda sloppy, but the problem is that I cannot give None to super().dump()

Also if I do return b'\N' instead which should be the Postgres Null string, it gets converted to \\N when given to the DB (the rest of the null values that I do not try to mess with are still just \N so are handled fine by Postgres). So I assume it tries to escape my value that I pass while I though writing a Dumper like this would not try to do any further processing.

I also tried with \0 for good measure, but of course that didn't work.

In the end I ended up doing this:

#SQLDate is my custom type that I registered with `register_dumper`
date = None if year is None and month is None and day is None else SQLDate(year, month, day)

# Three dots just means more arguments
copy.write_row(   (..., date, ...)   )

I don't like this approach though as I handle None outside the DateDumper.

If I have missed something, hopefully with your help we can update the documentation to make this more clear how it should be handled.

Thank you so much for making this project! :)


EDIT:

I also have a __nonzero__ method in the custom type, but it does not change the handling.

class SQLDate:
    def __init__(self, year, month, day):
        self.year = year
        self.month = month
        self.day = day
    
    def __str__(self):
        return f'SQLDate({self.year:04}-{self.month:02}-{self.day:02})'

    def __nonzero__(self):
        if self.year is None and self.month is None and self.day is None:
            return False
        else:
            return True
@dvarrazzo
Copy link
Member

Hello,

By design, dumpers only dump bytes, and None is the only thing that could be supposedly passed as NULL.

Said that... I think that there is no problem actually dumping NULL from a non-None object. I don't know about the COPY path but in the query path it should mostly work.

The types annotations are definitely wrong throughout the codebase, but we can make some experiments and enlarge what dumpers can return. The way I envision it is that a dumper would just return None.

If you return None in your dumper, does it work? If ton, what death does it die of?

@sakishrist
Copy link
Author

I don't know about the COPY path but in the query path it should mostly work.

I am not sure what those paths are actually 😓

As for your suggestion, if I got it right, it seems it doesn't like the return None at the moment. Is this how you meant it?

class ExceptionDateDumper(DateDumper): # From Python to DB
    def dump(self, obj):

        year = obj.year
        month = obj.month
        day = obj.day

        if year is None and month is None and day is None:
            return None

        if year < 0:
            bc = " BC"
            year*=-1
        else:
            bc = ""

        day = 1 if day is None else day
        month = 1 if month is None else month

        return bytes(f'{year:04}-{month:02}-{day:02}{bc}', encoding='utf-8')
Traceback (most recent call last):
  File ".venv/lib/python3.10/site-packages/psycopg/cursor.py", line 891, in copy
    yield copy
  File "artists/ensembles.py", line 59, in migrate
    copy.write_row(
  File ".venv/lib/python3.10/site-packages/psycopg/copy.py", line 307, in write_row
    data = self.formatter.write_row(row)
  File ".venv/lib/python3.10/site-packages/psycopg/copy.py", line 696, in write_row
    format_row_text(row, self.transformer, self._write_buffer)
  File ".venv/lib/python3.10/site-packages/psycopg/copy.py", line 805, in _format_row_text
    out += _dump_re.sub(_dump_sub, b)
TypeError: expected string or bytes-like object

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 49, in <module>
    run()
...
  File "/home/user/.pyenv/versions/3.10.6/lib/python3.10/contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File ".venv/lib/python3.10/site-packages/psycopg/cursor.py", line 893, in copy
    raise ex.with_traceback(None)
psycopg.errors.QueryCanceled: COPY from stdin failed: error from Python: TypeError - expected string or bytes-like object
CONTEXT:  COPY ensembles, line 3

I had to skip some parts from the trace, so if it's not useful like that, I will try to isolate the case in a completely standalone project where it will be easier to clone and test from you too.

Also tried to override the quote method from DateDumper so that it returns the obj it receives without doing anything to it, but it seems it's not there that the additional \ comes from as it never got called when I overrode it.

And tried inheriting directly Dumper in case DateDumper does indeed something fancy, but same result.

@sakishrist
Copy link
Author

Maybe this is good enough for demonstration and testing?

https://github.com/sakishrist/psycopg-dumper-return-none

@dvarrazzo
Copy link
Member

dvarrazzo commented Sep 18, 2022

As for your suggestion, if I got it right, it seems it doesn't like the return None at the moment. Is this how you meant it?

Yes, this is the way I think a dumper should be written, but yes, I understand it doesn't work :) Managing None as a return value for dump() is something to be tested.

Indications where things may break can be obtained by changing the dump() return value:

def dump(self, obj: Any) -> Buffer:

to be Optional[buffer] and see where mypy complains. Then tests for queries and copy operations should be added.

@dvarrazzo
Copy link
Member

Maybe this is good enough for demonstration and testing?

Even something simpler, but to be added to the test suite. For instance testing with a string dumper subclass dumping the empty string to NULL, or a date dumper subclass dumping all 2020 dates to NULL.

@dvarrazzo dvarrazzo added this to the 3.2 milestone Sep 28, 2022
@canhuynh1998
Copy link

Hi @dvarrazzo, can I try working on this feature?

@dvarrazzo dvarrazzo linked a pull request Sep 17, 2023 that will close this issue
@dvarrazzo
Copy link
Member

Hi @canhuynh1998

I started working on this feature in the none-returning-dumper branch. This was done a few months ago so the branch is likely stale. I've now opened MR #642.

If you want to complete this work, please rebase this branch on master and add the missing parts (for instance, this branch touches all the dumpers, but some dumpers were added afterwards, e.g. numpy dumpers). Other problems I see immediately in the CI run triggered by the MR (black failing, C extension not compiling) are probably solved by changes currently in master so they might be fixed by rebasing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants