Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster fact creation #89

Merged
merged 21 commits into from
Mar 1, 2022
Merged

Faster fact creation #89

merged 21 commits into from
Mar 1, 2022

Conversation

daveraja
Copy link
Collaborator

Defines the Predicate.init() by custom generating a string version for each predicate type and then exec() it to get the bytecode. Some other performance related tweaks.

Set version to 1.4.0 because of small API tweaks.

Removed old usage of raw in the initialisation. Now raw should only be used for internal API purposes.
Cleaning up of a few other old things (like the field.unifies() method which served no useful purpose).

@daveraja daveraja mentioned this pull request Feb 26, 2022
@daveraja
Copy link
Collaborator Author

Added the class factory function Predicate._unify() to also be dynamically generated which speeds up unification.

Copy link
Collaborator

@florianfischer91 florianfischer91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quite nice implementation and good performance improvements :-) (of course it adds a bit more complexity because of building the template)

Sry for the many comments but most of them are just some questions or small suggested changes^^
I have also two general questions:

  • Do you think it would be helpfull to create a separate attribute default_factory for BaseField so it will make handling callables a bit easier and is more similar to dataclass.py?
  • What about moving the boilerplate code for creating the methods into a separate file? (this is just a personal preference but core.py gets always longer and longer...) Or is it even not possible because of circular imports?

clorm/orm/core.py Outdated Show resolved Hide resolved
clorm/orm/core.py Outdated Show resolved Hide resolved
AnySymbol = Union[clingo.Symbol,noclingo.Symbol]

def _get_symbols():
return symbols
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why you created a function here and not calling symbols directly? (Ah perhaps, because symbols can change when you change the symbol mode?)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly.

But it is ugly and I'm going to change this. The way I've integrated the noclingo stuff is not clean so I'm going to try to make it a bit more self-contained within the noclingo module. The idea is that noclingo.Function(), noclingo.String(), etc will internally decide what type of symbol to create based on the mode so all the indirection is contained within these functions and we don't need hackiness like _get_symbols(). Which will also make it easier if/when noclingo can be removed.

Copy link
Collaborator Author

@daveraja daveraja Feb 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed most of the hackiness in core.py of dealing with noclingo (such as the _get_symbols() function). However, I think there is still one lingering issue.

Now, the overhead introduced by the flexibility of noclingo is confined to noclingo.py and it is simply to add an indirection when calling the various symbol creating functions. This overhead is pretty small and you can only see the performance difference when you look at averages over multiple runs of the test code. But it is still annoying because there are many/most occasions when you don't need this functionality so why have the overhead?.

I've added some if statements in the module that could enable/disable noclingo when the module is loaded. I think the default should be to have it disabled. But I'm not sure what's the most pythonic way to trigger this. You would need to set some flag before the module is loaded to tell it to enable noclingo. Or potentially do it using a environment variable, in a similar way to PYTHONHASHSEED.

Any thoughts?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about using something like a context-manager if you want to use the noclingo-mode?
I'm not sure if it is possible but you could switch to the noclingo-mode when entering the ctxt-manager and switch back to clingo-mode when leaving the ctxt-manager?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a nice idea, and might be a good way to switch to noclingo-mode. But it's not what I had in mind here, where I want to completely disable it, so there is no way to switch modes. We could get a small performance improvement because we can remove the indirection of the various symbol creation functions.

Anyway, for the moment I won't disable it and will just merge the branch. Can revisit this later.

clorm/orm/core.py Outdated Show resolved Hide resolved
clorm/orm/core.py Outdated Show resolved Hide resolved
clorm/orm/core.py Outdated Show resolved Hide resolved
clorm/orm/core.py Outdated Show resolved Hide resolved
clorm/orm/core.py Outdated Show resolved Hide resolved
clorm/orm/core.py Outdated Show resolved Hide resolved
clorm/orm/core.py Show resolved Hide resolved
@daveraja
Copy link
Collaborator Author

Quite nice implementation and good performance improvements :-) (of course it adds a bit more complexity because of building the template)

Sry for the many comments but most of them are just some questions or small suggested changes^^ I have also two general questions:

* Do you think it would be helpfull to create a separate attribute `default_factory` for BaseField so it will make handling callables a bit easier and is more similar to dataclass.py?

* What about moving the boilerplate code for creating the methods into a separate file? (this is just a personal preference but core.py gets always longer and longer...) Or is it even not possible because of circular imports?

I'm not familiar with dataclass default_factory and callables. I'll look into it.

I do agree that core.py is getting too big. I've moved some of the template stuff into a separate file templating.py, but kept the main construction function in core as it relies on things that are declared in core.py and would mean passing a lot parameters into the function.

I'm very tempted to look at Jinja2 to see if it could be used to make a really clean template and simpify that generator function. But I really don't want to add the dependency. One thing I don't understand is that Python does have a standard Template class (https://docs.python.org/3/library/string.html#string.Template) but it seems so limited I don't see the point of it.

Longer term I think a lot of the define_XXX helper functions for defining new fields can be moved out of core into something like extra_fields.py

Predicate._unify(X) is available for internal use and users should use the API
unify() functions.
For each Predicate sub-class a custom init function is generated as a string
and then turned into Python bytecode by calling exec(). This creates a much
leaner and faster initialisation, since as much work as possible is done when
defining the init function and not have to be built into the logic of the init
function itself.
For clorm tuple we need to coerce any normal tuple (or some other clorm tuple)
to the correct type. If the value is not a clorm tuple then it must be of the
correct type.
Fixes based on @florianfischer91 comments. Especially good to remove the raw
paramater from Predicate __init__.
This works for any field that has a non-callable default
I think this is the simplest message but maybe a more descriptive message could
be better.
NOCLINGO adds some small overhead when creating symbols. We can remove this
overhead and disable NOCLINGO, by adding some conditional statements when
loading the module. But I'm not sure the best way to use this; should it be
monkey patching or some global variable?
I think this was not possible for clingo 5.4 because it didn't properly return
NotImplemented.
Also added some parts to enable noclingo. But it is still not working as
expected so noclingo is still enabled by default.
This improves performance of creating facts
@daveraja daveraja marked this pull request as ready for review March 1, 2022 00:52
@daveraja daveraja merged commit 45e0dd5 into master Mar 1, 2022
@daveraja daveraja deleted the faster_fact_creation branch March 2, 2022 10:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants