
Base expression node types on dataclasses #125

Open · wants to merge 1 commit into main
Conversation

@inducer inducer commented Jan 13, 2023

No description provided.

inducer commented Jan 13, 2023

Surprisingly, this seems to somewhat work (current CI notwithstanding). @kaushikcfd @alexfikl @isuruf What do you think?

inducer commented Jan 13, 2023

For context, this happened because I was working on inducer/pytato#393 and needed the persistent hash key builder to just "understand" expression values. Loopy has some absurd hack job to make this possible, but I wasn't itching to replicate it in pytato. This seemed nicer.
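For illustration, here is a minimal sketch of what basing nodes on dataclasses buys a persistent hash builder: it can walk the declared fields of any node generically, with no per-class special-casing. The `Sum`/`Variable` node types and the `persistent_hash` helper below are hypothetical stand-ins, not pymbolic's actual API.

```python
import dataclasses
import hashlib


@dataclasses.dataclass(frozen=True)
class Variable:
    name: str


@dataclasses.dataclass(frozen=True)
class Sum:
    children: tuple


def persistent_hash(obj) -> str:
    """Hash an expression tree by recursing over declared dataclass fields."""
    h = hashlib.sha256()
    if dataclasses.is_dataclass(obj):
        # The node type plus each field value determines the hash; no
        # per-class update_persistent_hash-style methods are needed.
        h.update(type(obj).__name__.encode())
        for f in dataclasses.fields(obj):
            h.update(persistent_hash(getattr(obj, f.name)).encode())
    elif isinstance(obj, (tuple, list)):
        for item in obj:
            h.update(persistent_hash(item).encode())
    else:
        h.update(repr(obj).encode())
    return h.hexdigest()
```

Structurally equal expressions then hash identically across processes, which is the property a persistent (on-disk) cache key needs.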

(Resolved review threads on pymbolic/primitives.py and setup.py.)
@inducer changed the title from "Base expressions on attrs classes" to "Base expression node types on attrs classes" Jan 13, 2023
@inducer changed the title from "Base expression node types on attrs classes" to "Base expression node types on attrs" Jan 13, 2023
@alexfikl (Collaborator)

Generally seems like a good idea to me!

One thing that doesn't quite spark joy is having to remember which classes are dataclasses and which are attrs in more places :( But it's hard to argue with the hash caching. I've seen the hash show up in quite a few profiles, so making that fast would have priority for me too.
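The hash caching being weighed here can be sketched roughly as follows: a frozen node class with an explicitly defined `__hash__` that computes the structural hash once and memoizes it on the instance. `Product` and `_hash_value` are illustrative names under assumed semantics, not pymbolic's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    children: tuple

    def __hash__(self):
        # Compute the (potentially expensive, recursive) hash only once per
        # instance. object.__setattr__ is needed to stash the cached value
        # because the dataclass is frozen. An explicitly defined __hash__ is
        # kept by @dataclass; it does not get replaced by the generated one.
        try:
            return self._hash_value
        except AttributeError:
            h = hash((type(self), self.children))
            object.__setattr__(self, "_hash_value", h)
            return h
```

For deep expression trees this turns repeated hashing (e.g. during dict-heavy traversals) from O(tree size) into O(1) after the first call.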

inducer commented Jan 13, 2023

I'm guessing I may revert this to dataclasses, for exactly this reason. Not sure.

@inducer changed the title from "Base expression node types on attrs" to "Base expression node types on dataclasses" Jun 2, 2023
inducer commented Jun 2, 2023

I've changed this to dataclasses, and the tests in pymbolic itself seem to be passing. We'll see how the downstreams do.

inducer commented Jun 2, 2023

Almost everything seems to be working, except pytential has a puzzling failure.

@alexfikl (Collaborator) left a comment
Looked a bit through the code to see if anything jumps out that could cause the pytential errors (in the Beltrami stuff again 😢) and left some nitpicks!

The only thing (?) that the Beltrami operators do a bit differently is that they have nested IntG expressions, but I couldn't find any reason this would change how those are handled.

EDIT: Found a simpler case to run (uses includes from test_beltrami.py)

def test_nested_potential(actx_factory):
    logging.basicConfig(level=logging.INFO)
    actx = actx_factory()

    case = eid.CircleTestCase(
            target_order=5,
            qbx_order=5,
            source_ovsmp=4,
            resolutions=[32, 64, 96, 128],
            # FIXME: FMM should not be slower!
            fmm_order=False, fmm_backend=None,
            radius=1.0
            )

    from pytential import GeometryCollection
    qbx = case.get_layer_potential(actx, 32, case.target_order)
    places = GeometryCollection(qbx, auto_where=case.name)

    from sumpy.kernel import LaplaceKernel
    kernel = LaplaceKernel(2)

    sym_sigma = sym.var("sigma")
    sym_op = sym.D(kernel,
                   sym.D(kernel, sym_sigma, qbx_forced_limit="avg"),
                   qbx_forced_limit="avg")

    density_discr = places.get_discretization(case.name)
    sigma = actx.thaw(density_discr.nodes()[0])
    r = bind(places, sym_op)(actx, sigma=sigma)

Tried S(Spp + Dp) and S(kappa * S) and S(W(S)) (the other terms in the Laplace-Beltrami operator) and the D(D) one was the only one that failed.

(Resolved review threads on pymbolic/primitives.py.)
inducer commented Jun 4, 2023

Thanks for finding the smaller reproducer! That helped quite a bit. The issue was the missing rewriting of None to cse_scope.EVALUATION in CommonSubexpression. It also helped expose inducer/pytential#209; as usual, the one issue you thought you had is actually two. 🙂
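A rough sketch of the kind of normalization described, assuming a dataclass-based `CommonSubexpression`: rewrite a `None` scope to the default in `__post_init__`, so that structurally identical CSEs never differ in equality or hash merely because one spelled the scope explicitly. The `cse_scope` class here is a stand-in for pymbolic's, and the field layout is illustrative.

```python
from dataclasses import dataclass
from typing import Any, Optional


class cse_scope:
    # Stand-in for pymbolic.primitives.cse_scope.
    EVALUATION = "pymbolic_eval"


@dataclass(frozen=True)
class CommonSubexpression:
    child: Any
    prefix: Optional[str] = None
    scope: Optional[str] = None

    def __post_init__(self):
        # Normalize scope=None to the default at construction time, so the
        # generated __eq__/__hash__ never see the two spellings as distinct.
        if self.scope is None:
            object.__setattr__(self, "scope", cse_scope.EVALUATION)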

@inducer inducer enabled auto-merge (rebase) June 4, 2023 01:13
inducer commented Jun 4, 2023

Super weird. I wonder why the pytential run is getting canceled. Out of memory for some reason? If so, why would this PR make a difference?

inducer commented Jun 5, 2023

Tried to investigate more. Here's the time profile of the pytential tests, first with this patch:

============ slowest 10 durations ============
1357.39s call     test/test_linalg_skeletonization.py::test_skeletonize_by_proxy_convergence[<PyOpenCLArrayContext for <pyopencl.Device 'cpu-Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz' on 'Portable Computing Language'>>-case3]
1355.53s call     test/test_beltrami.py::test_beltrami_convergence[<PyOpenCLArrayContext for <pyopencl.Device 'cpu-Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz' on 'Portable Computing Language'>>-operator7-solution7]
914.25s call     test/test_stokes.py::test_exterior_stokes[<PyOpenCLArrayContext for <pyopencl.Device 'cpu-Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz' on 'Portable Computing Language'>>-3]
746.09s call     test/test_beltrami.py::test_beltrami_convergence[<PyOpenCLArrayContext for <pyopencl.Device 'cpu-Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz' on 'Portable Computing Language'>>-operator6-solution6]
724.49s call     test/test_scalar_int_eq.py::test_integral_equation[<PyOpenCLArrayContext for <pyopencl.Device 'cpu-Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz' on 'Portable Computing Language'>>-case9]
685.49s call     test/test_layer_pot_identity.py::test_identity_convergence_slow[<PyOpenCLArrayContext for <pyopencl.Device 'cpu-Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz' on 'Portable Computing Language'>>-case0]
640.87s call     test/test_maxwell.py::test_pec_mfie_extinction[<PyOpenCLArrayContext for <pyopencl.Device 'cpu-Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz' on 'Portable Computing Language'>>-case0]
615.13s call     test/test_linalg_skeletonization.py::test_skeletonize_by_proxy_convergence[<PyOpenCLArrayContext for <pyopencl.Device 'cpu-Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz' on 'Portable Computing Language'>>-case0]
499.75s call     test/test_layer_pot_identity.py::test_identity_convergence[<PyOpenCLArrayContext for <pyopencl.Device 'cpu-Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz' on 'Portable Computing Language'>>-case1]
264.22s call     test/test_layer_pot_eigenvalues.py::test_ellipse_eigenvalues[<PyOpenCLArrayContext for <pyopencl.Device 'cpu-Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz' on 'Portable Computing Language'>>-1-5-3-False]
=========== short test summary info ===========
SKIPPED [1] test_linalg_proxy.py:200: 3d partitioning requires a tree
==== 237 passed, 1 skipped, 16198 warnings in 4015.96s (1:06:55) ====

and then without:

============ slowest 10 durations ============
2185.13s call     test/test_stokes.py::test_exterior_stokes[<PyOpenCLArrayContext for <pyopencl.Device 'cpu-Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz' on 'Portable Computing Language'>>-3]
2146.56s call     test/test_linalg_skeletonization.py::test_skeletonize_by_proxy_convergence[<PyOpenCLArrayContext for <pyopencl.Device 'cpu-Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz' on 'Portable Computing Language'>>-case3]
1560.74s call     test/test_beltrami.py::test_beltrami_convergence[<PyOpenCLArrayContext for <pyopencl.Device 'cpu-Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz' on 'Portable Computing Language'>>-operator7-solution7]
912.97s call     test/test_beltrami.py::test_beltrami_convergence[<PyOpenCLArrayContext for <pyopencl.Device 'cpu-Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz' on 'Portable Computing Language'>>-operator6-solution6]
786.50s call     test/test_linalg_skeletonization.py::test_skeletonize_by_proxy_convergence[<PyOpenCLArrayContext for <pyopencl.Device 'cpu-Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz' on 'Portable Computing Language'>>-case0]
631.47s call     test/test_maxwell.py::test_pec_mfie_extinction[<PyOpenCLArrayContext for <pyopencl.Device 'cpu-Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz' on 'Portable Computing Language'>>-case0]
421.05s call     test/test_scalar_int_eq.py::test_integral_equation[<PyOpenCLArrayContext for <pyopencl.Device 'cpu-Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz' on 'Portable Computing Language'>>-case9]
265.36s call     test/test_beltrami.py::test_beltrami_convergence[<PyOpenCLArrayContext for <pyopencl.Device 'cpu-Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz' on 'Portable Computing Language'>>-operator5-solution5]
231.81s call     test/test_linalg_skeletonization.py::test_skeletonize_by_proxy_convergence[<PyOpenCLArrayContext for <pyopencl.Device 'cpu-Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz' on 'Portable Computing Language'>>-case1]
188.92s call     test/test_scalar_int_eq.py::test_integral_equation[<PyOpenCLArrayContext for <pyopencl.Device 'cpu-Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz' on 'Portable Computing Language'>>-case10]
=========== short test summary info ===========
SKIPPED [1] test_linalg_proxy.py:200: 3d partitioning requires a tree
==== 237 passed, 1 skipped, 1916 warnings in 3817.94s (1:03:37) ====

(run on tripel, back to back with inducer/pytential@488b958 and inducer/sumpy@8960662)

The differences are... interesting. I'm not sure I understand them.

alexfikl commented Jun 8, 2023

Tried to investigate more.

Ran this on koelsch as well and didn't see as much of a toss-up between the timings of the various tests. The total runtime was pretty similar to your run, though. I also ran (in pytential)

mprof run python -m pytest -v -s -k 'not slowtest' --durations=25

to get the memory, and it seems pretty similar as well (blue is on main and black is on the attrs branch).

(memory profile plot)

Sooo… really not sure why the GitHub CI is crashing.

EDIT: Not sure how accurate this is, but GitHub reports having 7 GB on their machines (and above we're using 12 GB):
https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners#supported-runners-and-hardware-resources

inducer commented Jul 19, 2023

In a way I think it's not a surprise that the runners get canceled. They have about 7 GB of memory available, plus 4 GB of swap, so the 12 GB peak allocation is definitely flirting with disaster, and which run gets canceled is probably just down to coincidence. Perhaps the more useful question here is why the curve is monotonically increasing: all tests should be independent, so there shouldn't be an increase.

I propose we use inducer/sumpy#178 to have this discussion, because that looks eerily related.

inducer commented Jul 19, 2023

OK, so inducer/sumpy#178 seems to be a different issue.

inducer commented Jul 19, 2023

Made inducer/pytential#213 to discuss the unbounded memory growth.

@alexfikl (Collaborator)

Just wanted to write this down before I forget it again! After a run on koelsch the pytential test

test/test_scalar_int_eq.py::test_integral_equation[<PyOpenCLArrayContext for <pyopencl.Device 'cpu-Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz' on 'Portable Computing Language'>>-case9]

uses

  • about 3.5GB on pymbolic@main
  • about 5GB on pymbolic@attrs

so that may be a good candidate to look at to see what's ballooning the memory.

@alexfikl (Collaborator)

Also slightly related: I tried hacking __slots__ into all the Expressions in pymbolic, since that should reduce memory a bit, but the CI still didn't pass, so I didn't look into it further.
