Float gets parsed as string #8

Closed
RuABraun opened this issue Oct 18, 2021 · 8 comments · Fixed by #17

Comments

@RuABraun

RuABraun commented Oct 18, 2021

Hi, here is an example:

```python
import os
from hyperpyyaml import load_hyperpyyaml

def test(val):
    # Start from a fresh YAML file containing a decimal float
    if os.path.exists('bla'):
        os.remove('bla')
    with open('bla', 'a+') as fh:
        fh.write('bla: 0.0001\n')
        fh.seek(0)
        # Override the file's value with a Python float
        dct = {'bla': val}
        params = load_hyperpyyaml(fh, overrides=dct)
    print(type(params['bla']))

def main():
    test(1e-4)
    test(1e-6)

main()
```

This prints:

```
<class 'float'>
<class 'str'>
```

I can't figure out why this is happening.

RuABraun changed the title from "Float gets parsed as integer" to "Float gets parsed as string" on Oct 18, 2021
@Gastron
Contributor

Gastron commented Nov 1, 2021

I think this must come from the underlying YAML parsers.

Is it just floats in E notation, though? Unfortunately, I think the immediate workaround is to switch to decimal notation.

@RuABraun
Author

RuABraun commented Nov 5, 2021

Yeah, it's just the E notation. Oh well, okay.

RuABraun closed this as completed on Nov 5, 2021
@RuABraun
Author

RuABraun commented Nov 6, 2021

Oh wait, @Gastron, it also happens with 0.00001 and 0.000001, but not with 0.0001.

RuABraun reopened this on Nov 6, 2021
@Gastron
Contributor

Gastron commented Nov 8, 2021

Whoa. I was also able to reproduce it with 0.00001, and confirmed that it works fine with 0.0001.

I did some digging, and the cause is a little convoluted. The YAML gets loaded twice: first with ruamel.yaml, after which the overrides get added and the whole thing is dumped back out as YAML; then it is loaded again, with "bells and whistles", by PyYAML. When the floats are dumped out as YAML, they probably get their basic Python string representation, which switches to scientific (E) notation at 0.00001.
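
If the dumper writes floats with Python's `repr()`, as suspected above, the switch to E notation happens exactly at this threshold. A minimal check in plain Python (no YAML library involved; the `bla:` key just mirrors the reproduction):

```python
# Python's float repr switches to scientific notation for magnitudes below 1e-4,
# so an override of 1e-5 would be written out as "1e-05" rather than "0.00001".
for value in [1e-4, 1e-5, 1e-6]:
    print(f"bla: {value!r}")
```

This prints `bla: 0.0001`, `bla: 1e-05` and `bla: 1e-06`, matching the float/string split reported above.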

It is apparently a known issue that PyYAML does not parse this scientific notation correctly; see yaml/pyyaml#173.
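
A quick way to see the PyYAML side of this, assuming PyYAML is installed: its default (YAML 1.1) resolver only treats a scalar as a float if it contains a decimal point and, when there is an exponent, a sign after the `e`. So `1e-05` comes back as a string while `1.0e-05` is a float. The helper name `loaded_type` is hypothetical, just for illustration.

```python
import yaml

def loaded_type(text: str) -> type:
    """Return the Python type PyYAML's SafeLoader assigns to a plain scalar."""
    return type(yaml.safe_load(f"x: {text}")["x"])

# Spellings of small floats as they might appear in an intermediate YAML dump
for text in ["0.00001", "1e-05", "1.0e-05", "1.0e+5"]:
    print(f"{text:>8} -> {loaded_type(text).__name__}")
```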

There's probably something we could do to dump the floats in the right format, or we could patch PyYAML's float matching; see this nice StackOverflow answer.

@larsrgit

larsrgit commented Nov 16, 2021

This should be possible to fix in hyperpyyaml by adding a custom representer for floats. The default representer already has the needed lines of code:

```python
if getattr(self.serializer, 'use_version', None) == (1, 1):
    if '.' not in value and 'e' in value:
        # Note that in some cases `repr(data)` represents a float number
        # without the decimal parts.  For instance:
        #   >>> repr(1e17)
        #   '1e17'
        # Unfortunately, this is not a valid float representation according
        # to the definition of the `!!float` tag in YAML 1.1.  We fix
        # this by adding '.0' before the 'e' symbol.
        value = value.replace('e', '.0e', 1)
```

but this branch is not executed, because `getattr(self.serializer, 'use_version', None) == (1, 1)` does not evaluate to True.
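
The version-gated branch quoted above boils down to a small formatting rule. As a standalone sketch (a hypothetical helper, not part of hyperpyyaml or ruamel.yaml; the `.inf`/`.nan` spellings follow the YAML 1.1 float conventions):

```python
import math

def yaml11_float_repr(value: float) -> str:
    """Format a float so a YAML 1.1 resolver (such as PyYAML's) reads it back as a float."""
    if math.isnan(value):
        return ".nan"
    if math.isinf(value):
        return ".inf" if value > 0 else "-.inf"
    text = repr(value).lower()
    if "." not in text and "e" in text:
        # e.g. repr(1e-05) == '1e-05', which YAML 1.1 does not accept as a float;
        # insert '.0' before the exponent, mirroring the representer code above
        text = text.replace("e", ".0e", 1)
    return text
```

Note that in Python 3, `repr(1e17)` is `'1e+17'` rather than the `'1e17'` shown in the quoted comment, so this rule yields `'1.0e+17'`; the exponent keeps its sign, which PyYAML's float pattern also requires.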

It can therefore also be fixed by setting the version of the `ruamel_yaml` object that, in `resolve_references()`, dumps the YAML after references are resolved and overrides are applied: add `ruamel_yaml.version = (1, 1)` right after the `ruamel_yaml` object is initialised. I tested this on my code and it works, but I'm not sure whether changing the version has other negative effects.

@Gastron
Contributor

Gastron commented Nov 16, 2021

Thanks @larsrgit, that's an interesting way to fix it! I wonder if this leaves out some cases where overrides get injected; they're not parsed by ruamel, and I am not sure whether they get represented by this code or not. I'd imagine that patching the parsing on the PyYAML side might be more robust.

@larsrgit

larsrgit commented Nov 16, 2021

As I understand the hyperpyyaml code, this should handle every case where E notation is introduced by the intermediate dump.
So it would cover everything except values the user writes (in the YAML or in the overrides) in E notation without a dot. This way the user would at least have full control.

But you are right: patching the PyYAML parsing at load time would be better, because it would also work for users who don't know that PyYAML requires E notation to include a dot.

@pplantinga
Collaborator

This StackOverflow answer should provide a template for solving this, if anyone is interested: https://stackoverflow.com/a/30462009
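
A minimal sketch of that approach, assuming PyYAML is installed: register an extra implicit float resolver on a loader subclass so exponent-only numbers such as `1e-05` resolve to floats. `SciFloatLoader` is a hypothetical name, and hooking this into hyperpyyaml's own loader is not shown here; it is an assumption that the same call would work there.

```python
import re
import yaml

# Subclass so yaml.SafeLoader itself is left untouched: add_implicit_resolver
# copies the inherited resolver table the first time it is called on a subclass.
class SciFloatLoader(yaml.SafeLoader):
    pass

# Float pattern adapted from the StackOverflow answer: adds an alternative for
# exponent-only numbers (1e-05, 1E5) and makes the exponent sign optional.
SciFloatLoader.add_implicit_resolver(
    "tag:yaml.org,2002:float",
    re.compile(r'''^(?:
         [-+]?(?:[0-9][0-9_]*)\.[0-9_]*(?:[eE][-+]?[0-9]+)?
        |[-+]?(?:[0-9][0-9_]*)(?:[eE][-+]?[0-9]+)
        |\.[0-9_]+(?:[eE][-+][0-9]+)?
        |[-+]?[0-9][0-9_]*(?::[0-5]?[0-9])+\.[0-9_]*
        |[-+]?\.(?:inf|Inf|INF)
        |\.(?:nan|NaN|NAN))$''', re.X),
    list("-+0123456789."),
)

doc = "lr: 1e-05\nsmall: 0.0001\nsteps: 100\n"
print(yaml.load(doc, Loader=yaml.SafeLoader))  # 'lr' stays the string '1e-05'
print(yaml.load(doc, Loader=SciFloatLoader))   # 'lr' becomes the float 1e-05
```

Subclassing rather than patching `yaml.SafeLoader` globally keeps the change local to code that opts in.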

4 participants