
suspected bug in Projection Plots for SPH datasets #4788

Open
nastasha-w opened this issue Jan 25, 2024 · 2 comments

@nastasha-w
Bug report

Bug summary

To my understanding, the pixel values in projection plots should represent the integral of some field over the line of sight through a pixel center. That means that pixels in different grids, but with the same center, should produce the same integrated values. However, this is not the case for the FixedResolutionBuffer projections of SPH datasets. I suspect this is due to the handling of small SPH particles (relative to the grid spacing) in the backend code: their contribution to the line-of-sight integrals seems to be systematically overestimated.

I've proposed a fix, but I'd like someone else to confirm whether this problem is real and seems correctly diagnosed before I open a pull request.

Bug explanation

Here, I will explain the issue as I understand it from the backend code. I will demonstrate the problem in the next section.

The backend code at issue here is pixelize_sph_kernel_projection in pixelization_routines.pyx, starting on line 1121. This code loops over the input particles, determines the extent of the region each particle might contribute to, and adds the (kernel-integrated path length) * (selected particle field) to each pixel's total. In #4783, I discuss some background on the reasoning for this method; the pixel values should equal $\sum_{j} dl_j \cdot A_j$ along a Ray, where the sum runs over SPH particles $j$, $dl_j$ is the path length (see e.g., #4783), and $A_j$ is the particle field to be projected.
For the column density or surface density $N$ along one of those pencil beams, we then get

$$N \approx \sum_{j=1}^{N'} A_j \cdot (m_j / (\rho_j \cdot h_j^2)) \cdot \left[ \int dr \, U \left( \sqrt{(b_j/h_j)^2 + r^2}\right) \right],$$

and

$$dl_j = (m_j / (\rho_j \cdot h_j^2)) \cdot \left[ \int dr \, U \left( \sqrt{(b_j/h_j)^2 + r^2} \right) \right].$$

For a particle $j$, $h_j$ is its smoothing length, $m_j$ is its mass, $\rho_j$ is its density, and $b_j$ is its impact parameter: its minimum distance to the line of sight.
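To make the formula concrete, here is a minimal numerical sketch of $dl_j$. This is my own illustrative code, not yt's implementation: I assume the standard cubic spline kernel, and the names cubic_spline, kernel_integral, and dl_j are mine.

import numpy as np
from scipy.integrate import quad

def cubic_spline(q: float) -> float:
    # dimensionless 3D cubic spline kernel U(q) with support [0, 1],
    # normalized so W(r, h) = U(r/h) / h**3 integrates to 1 over 3D space
    if q < 0.5:
        return (8.0 / np.pi) * (1.0 - 6.0 * q**2 + 6.0 * q**3)
    elif q < 1.0:
        return (8.0 / np.pi) * 2.0 * (1.0 - q)**3
    return 0.0

def kernel_integral(b_over_h: float) -> float:
    # the dimensionless line-of-sight integral; depends only on b / h
    if b_over_h >= 1.0:
        return 0.0
    rmax = np.sqrt(1.0 - b_over_h**2)
    res, _ = quad(lambda r: cubic_spline(np.sqrt(b_over_h**2 + r**2)),
                  -rmax, rmax)
    return res

def dl_j(b: float, hsml: float, mass: float, dens: float) -> float:
    # path length contribution dl_j of one particle to a pencil beam
    # at impact parameter b
    return mass / (dens * hsml**2) * kernel_integral(b / hsml)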

The main loop in pixelize_sph_kernel_projection seems to implement this. However, for small SPH particles, the smoothing lengths are tweaked in a way that seems to be inconsistent with the 'pencil-beam integral' approach to determining surface densities, and systematically overestimates their contributions to the surface density. This is because the calculation of $dl_j$ in the code is split into two parts: a factor $A_j \cdot m_j / (\rho_j \cdot h_j^2)$, which is straightforwardly calculated from the dataset fields, and the integral $\int dr \, U \left( \sqrt{(b_j/h_j)^2 + r^2} \right)$, which only depends on $b_j/h_j$, and is therefore calculated for a small number of values, then interpolated for the rest with the SPHKernelInterpolationTable class.
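As a rough stand-in for what that table does (again illustrative, reusing kernel_integral from the sketch above; I have not copied yt's actual table construction):

q2_grid = np.linspace(0.0, 1.0, 256)
table = np.array([kernel_integral(np.sqrt(q2)) for q2 in q2_grid])

def interpolate(q2: float) -> float:
    # linear interpolation in q^2, mirroring itab.interpolate(q_ij2)
    return float(np.interp(q2, q2_grid, table))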

The adjustment that I think is causing trouble is made in lines 1221-1223:

# we set the smoothing length squared with lower limit of the pixel
h_j2 = fmax(hsml[j]*hsml[j], dx*dy)
ih_j2 = 1.0/h_j2

where h_j2 is the squared smoothing length used in the kernel function integral, and ih_j2 is its inverse. hsml is the smoothing length recorded in the dataset, and dx and dy are the pixel dimensions.

However, the first factor in each particle's pixel contributions is calculated as

prefactor_j = pmass[j] / pdens[j] / hsml[j]**2 * quantity_to_smooth[j]

using the dataset smoothing length value hsml[j]. On the other hand, the integral is evaluated as

                            q_ij2 = (posx_diff + posy_diff) * ih_j2
                            if q_ij2 >= 1:
                                continue

                            # see equation 32 of the SPLASH paper
                            # now we just use the kernel projection
                            local_buff[xi + yi*xsize] +=  prefactor_j * itab.interpolate(q_ij2)

where posx_diff is the squared difference between the particle and pixel-center x-coordinates, and posy_diff is the analogous value for the y-coordinates. The integral evaluation (interpolation) happens with itab.interpolate(q_ij2). (The input should indeed be a squared value if I'm not mistaken; see #4783.)

For sufficiently large SPH particles (compared to the pixel grid spacing), this issue does not arise: fmax(hsml[j]*hsml[j], dx*dy) will equal the squared dataset smoothing length, so the different parts of the particle surface density calculation use consistent smoothing lengths. For smaller particles, however, the integral factor is evaluated at a smaller normalized impact parameter (b / hsml), and will therefore be larger than it should be.
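A quick numerical illustration of the mismatch, reusing dl_j and kernel_integral from the first sketch (the numbers here are hypothetical, not taken from yt):

hsml, dx, dy, b = 0.1, 1.0, 1.0, 0.05   # small particle, coarse pixels
mass, dens = 1.0, 1.0

# what the pencil-beam formula gives:
dl_true = dl_j(b, hsml, mass, dens)

# what the code effectively computes: the prefactor uses hsml, but the
# integral uses the clamped smoothing length sqrt(fmax(hsml**2, dx*dy))
h_clamped = np.sqrt(max(hsml * hsml, dx * dy))
dl_code = mass / (dens * hsml**2) * kernel_integral(b / h_clamped)

print(dl_code / dl_true)   # >> 1: the contribution is overestimated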

For the inclusion of particles in a pixel calculation (i.e., the assignment of a non-zero $dl_j$ to particle $j$ for a given pixel center), the situation is complicated by the use of a mix of (h_j2, ih_j2) and hsml[j] in different parts of the calculation. To evaluate whether particle $j$ even lies in the domain of the projection, line 1232

if (px + hsml[j] < x_min) or (px - hsml[j] > x_max): continue

(and the analogous line 1236 for the y position) uses the true smoothing length hsml[j]. The other variables are the lower x-bound of the grid x_min and the upper bound x_max; px is the particle position x-coordinate (possibly with the box period added or subtracted to wrap around periodic edges). Starting on line 1238, the true smoothing length is also used to determine the square region of pixels particle $j$ might contribute to:

                    # here we find the pixels which this particle contributes to
                    x0 = <np.int64_t> ((px - hsml[j] - x_min)*idx)
                    x1 = <np.int64_t> ((px + hsml[j] - x_min)*idx)
                    x0 = iclip(x0-1, 0, xsize)
                    x1 = iclip(x1+1, 0, xsize)

                    y0 = <np.int64_t> ((py - hsml[j] - y_min)*idy)
                    y1 = <np.int64_t> ((py + hsml[j] - y_min)*idy)
                    y0 = iclip(y0-1, 0, ysize)
                    y1 = iclip(y1+1, 0, ysize)

The iclip function returns its first argument if it lies between the bounds given as its second and third arguments, and the closest bound otherwise. (I'm assuming second arg < third arg here.) idx and idy are the inverse pixel lengths in the x- and y-directions, respectively.
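For reference, a pure-Python restatement of this logic (my own sketch; iclip mimics the Cython helper, and pixel_range is a hypothetical condensation of lines 1238-1246 for one axis):

def iclip(v: int, a: int, b: int) -> int:
    # pure-Python equivalent of the Cython iclip helper (assuming a <= b)
    return max(a, min(v, b))

def pixel_range(p: float, hsml: float, p_min: float,
                inv_dp: float, size: int) -> tuple[int, int]:
    i0 = int((p - hsml - p_min) * inv_dp)
    i1 = int((p + hsml - p_min) * inv_dp)
    return iclip(i0 - 1, 0, size), iclip(i1 + 1, 0, size)

# even a particle with hsml = 0 is looped over two pixels per axis:
x0, x1 = pixel_range(p=0.5, hsml=0.0, p_min=0.0, inv_dp=10.0, size=10)
assert (x0, x1) == (4, 6)   # range(4, 6) covers pixels 4 and 5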

Two nested loops then iterate: for xi in range(x0, x1) and for yi in range(y0, y1). Even for a smoothing length of zero, these cover at least 2 pixels (xi and yi values) in each coordinate direction, as the sketch above demonstrates. These nested loops start on line 1249:

                    # found pixels we deposit on, loop through those pixels
                    for xi in range(x0, x1):
                        # we use the centre of the pixel to calculate contribution
                        x = (xi + 0.5) * dx + x_min

                        posx_diff = px - x
                        posx_diff = posx_diff * posx_diff

                        if posx_diff > h_j2: continue

                        for yi in range(y0, y1):
                            y = (yi + 0.5) * dy + y_min

                            posy_diff = py - y
                            posy_diff = posy_diff * posy_diff
                            if posy_diff > h_j2: continue

                            q_ij2 = (posx_diff + posy_diff) * ih_j2
                            if q_ij2 >= 1:
                                continue

                            # see equation 32 of the SPLASH paper
                            # now we just use the kernel projection
                            local_buff[xi + yi*xsize] +=  prefactor_j * itab.interpolate(q_ij2)
                            mask[xi, yi] = 1

Within these loops, the squared distance between the position of particle $j$ and the pixel center is consistently compared to h_j2 (via ih_j2): the adjusted smoothing length. This means that all SPH particles in the grid domain will contribute a non-zero density to some line of sight within the grid, even if they don't actually intersect any of the sightlines in this grid. Small SPH particles that do intersect the grid will have a larger $dl_j$ value (due to the integral part of the calculation) than their true smoothing lengths would give.

This means that the grid projections will systematically overestimate the contribution of small SPH particles (compared to the pixel grid spacing) to the line-of-sight integrals through pixel centers.

Note that this issue will also affect off-axis SPH projections, since the backend for those (off_axis_projection_SPH, starting on line 1915 of pixelization_routines.pyx) calls pixelize_sph_kernel_projection after rotating the particle coordinates.

Proposed solution

If I am correct in diagnosing this issue, the fix would be simple: don't set a minimum effective smoothing length, and just set

h_j2 = hsml[j]*hsml[j]
ih_j2 = 1.0/h_j2

instead.

This does mean that some small SPH particles will be entirely 'missed' by the projection calculation. I think this is ok for a grid of sightlines: this method will only conserve mass in the projection in the limit of a very high grid density, but it does sample the range of surface/column density values without 'smoothing out' the extreme values.

An alternative approach to projecting SPH data does conserve mass explicitly, at the cost of 'smoothing out' the extremes of the surface/column density range. Instead of projecting densities onto lines of sight through pixel centers, the mass of each SPH particle is divided up over rectangular prisms (the pixels of the grid, extending along the projection axis). Weights can be calculated by evaluating the kernel at pixel centers, then explicitly normalizing the sum of these weights to 1; see the sketch below. In this scheme, setting a minimum smoothing length is a straightforward way to make sure each particle contributes to at least one pixel, so its mass is not lost. However, I do not believe this 'loss avoidance' strategy is appropriate for the 'sightline through the center' approach to projections used in yt.
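A minimal sketch of that mass-conserving alternative (purely illustrative; deposit_conserving is my own name, and I reuse cubic_spline from the first sketch):

def deposit_conserving(px, py, hsml, quantity, xcen, ycen):
    # distribute one particle's quantity over a pixel grid using kernel
    # weights at the pixel centers, explicitly normalized to sum to 1
    q = np.sqrt((xcen[:, None] - px)**2 + (ycen[None, :] - py)**2) / hsml
    weights = np.vectorize(cubic_spline)(q)
    out = np.zeros((xcen.size, ycen.size))
    total = weights.sum()
    if total == 0.0:
        # the particle missed every pixel center; dump it into the
        # nearest pixel so its mass is not lost (this is the role a
        # minimum smoothing length plays in this kind of scheme)
        out[np.argmin(np.abs(xcen - px)), np.argmin(np.abs(ycen - py))] = quantity
    else:
        out = quantity * weights / total
    return out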

Code for reproduction

Unfortunately, setting up the test case means this isn't quite a minimal amount of code, but it does straightforwardly suggest a test for this issue. Basically, in the first part, I just set up a grid of SPH particles with constant spacing along the x, y, and z coordinate directions, and a smoothing length equal to half the spacing, so the particles just 'touch' each other.

I then set up FixedResolutionBuffer grids of increasing pixel resolution, keeping the centers of a few pixels fixed. The centers coincide precisely with SPH particle centers; I set it up that way for a different test. Using these increasingly dense grids, I then retrieve the gas surface density at 9 pixel centers that are at the same coordinate locations in the different grids.

import numpy as np
from yt.loaders import load_particles

############ this part just sets up a dataset with SPH particles on a grid ###############
def constantmass(i: int, j: int, k: int) -> float:
    return 1.

## TODO: move to testing.py if this ends up in the test suite
def fake_sph_refineable_grid_ds(
        hsml_factor: float = 1.0,
        nperside: int = 3,
        periodic: bool = True, 
        e1hat: np.ndarray[float] = np.array([1, 0, 0]),
        e2hat: np.ndarray[float] = np.array([0, 1, 0]),
        e3hat: np.ndarray[float] = np.array([0, 0, 1]),
        offsets: np.ndarray[float] = 0.5 * np.ones((3,), dtype=np.float64),
        massgenerator = constantmass,
        unitrho: float = 1.,
        bbox: np.ndarray | None = None, 
        ):
    """Returns an in-memory SPH dataset useful for testing

    This dataset should have `nperside`**3 particles with the particles
    arranged uniformly on a 3D grid. 
    The basis vectors `e1hat`, `e2hat`, and `e3hat` define this grid.
    (If these are not normalized to 1 or not orthogonal, the spacing or
    overlap between SPH particles will be affected, but this is 
    allowed.) 
    Along these basis vectors, `offsets` give the zero point.
    All particles will have smoothing regions with a radius of
    `hsml_factor` * 0.5, masses of 1, densities of `unitrho`, and
    zero velocity.
    `unitrho` defines the density for a particle with mass 1 ('g'),
    given the standard (uniform) grid hsml_factor.
    """

    npart = nperside**3

    pos = np.empty((npart, 3), dtype=np.float64)
    mass = np.empty((npart,), dtype=np.float64)
    for i in range(0, nperside):
        for j in range(0, nperside):
            for k in range(0, nperside):
                _pos = (offsets[0] + i) * e1hat \
                        + (offsets[1] + j) * e2hat \
                        + (offsets[2] + k) * e3hat
                ind = nperside**2 * i + nperside * j + k
                pos[ind, :] = _pos
                mass[ind] = massgenerator(i, j, k)
    rho = unitrho * mass

    data = {
        "particle_position_x": (np.copy(pos[:, 0]), "cm"),
        "particle_position_y": (np.copy(pos[:, 1]), "cm"),
        "particle_position_z": (np.copy(pos[:, 2]), "cm"),
        "particle_mass": (mass, "g"),
        "particle_velocity_x": (np.zeros(npart), "cm/s"),
        "particle_velocity_y": (np.zeros(npart), "cm/s"),
        "particle_velocity_z": (np.zeros(npart), "cm/s"),
        "smoothing_length": (np.ones(npart) * 0.5 * hsml_factor, "cm"),
        "density": (rho, "g/cm**3"),
    }
    
    if bbox is None:
        eps = 1e-3
        margin = (1. + eps) * hsml_factor
        # use the full (npart, 3) position array here, not the
        # loop-local _pos, which is a single particle's 1-D position
        bbox = np.array([[np.min(pos[:, 0]) - margin,
                          np.max(pos[:, 0]) + margin],
                         [np.min(pos[:, 1]) - margin,
                          np.max(pos[:, 1]) + margin],
                         [np.min(pos[:, 2]) - margin,
                          np.max(pos[:, 2]) + margin],
                         ])

    ds = load_particles(data=data, length_unit=1.0, 
                        bbox=bbox, periodicity=(periodic,) * 3)
    ds.kernel_name = 'cubic'
    return ds

def _get_dataset(reflevel: int = 1):
    '''
    constant density particle grid, 
    with increasing particle sampling
    '''
    lenfact = (1./3.)**(reflevel - 1)
    massfact = lenfact**3
    nperside = 3**reflevel

    e1hat = np.array([1, 0, 0]) * lenfact
    e2hat = np.array([0, 1, 0]) * lenfact
    e3hat = np.array([0, 0, 1]) * lenfact
    hsml_factor = lenfact
    bbox = np.array([[0., 3.]] * 3)
    offsets = np.ones(3, dtype=np.float64) * 0.5 * lenfact
    
    def refmass(i: int, j: int, k: int) -> float:
        return massfact
    unitrho = 1. / massfact # want density 1 for decreasing mass

    ds = fake_sph_refineable_grid_ds(hsml_factor=hsml_factor, 
                                     nperside=nperside,
                                     periodic=True, 
                                     e1hat=e1hat,
                                     e2hat=e2hat, 
                                     e3hat=e3hat,
                                     offsets=offsets,
                                     massgenerator=refmass,
                                     unitrho=unitrho,
                                     bbox=bbox, 
                                    )
    return ds 
################################################################

# actual test running
def test_gridproj3():
    # refine the pixel grid instead of the particle grid
    ds = _get_dataset(reflevel=2)
    proj = ds.proj(("gas", "density"), 2) 
    imgs = {}
    for rl in range(1, 6):
        npix = 1 + 2**(rl + 1)
        margin = 0.5 - 0.5**(rl + 1)
        frb = proj.to_frb(width=(3. - 2. * margin, 'cm'), 
                          resolution=(npix, npix),
                          height=(3. - 2. * margin, 'cm'),
                          center=np.array([1.5, 1.5, 1.5]), 
                          periodic=False)
        out = frb.get_image(('gas', 'density'))
        imgs[rl] = out
    return imgs

def run_main():
    imgs = test_gridproj3()
    print('Using a constant dataset with pixel grids of increasing size, '
          'then selecting pixels with the same centers:')
    for rl in imgs:
        img = imgs[rl]
        pixspace = 2**(rl)
        print(f'Grid refinement level {rl}:') 
        print(img[::pixspace, ::pixspace])

if __name__ == '__main__':
    run_main() 

Actual outcome

$ python sph_pixelization_tests.py
Intel MKL WARNING: Support of Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) enabled only processors has been deprecated. Intel oneAPI Math Kernel Library 2025.0 will require Intel(R) Advanced Vector Extensions (Intel(R) AVX) instructions.
Intel MKL WARNING: Support of Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) enabled only processors has been deprecated. Intel oneAPI Math Kernel Library 2025.0 will require Intel(R) Advanced Vector Extensions (Intel(R) AVX) instructions.
yt : [INFO     ] 2024-01-25 01:24:28,451 Parameters: current_time              = 0.0
yt : [INFO     ] 2024-01-25 01:24:28,451 Parameters: domain_dimensions         = [1 1 1]
yt : [INFO     ] 2024-01-25 01:24:28,452 Parameters: domain_left_edge          = [0. 0. 0.]
yt : [INFO     ] 2024-01-25 01:24:28,452 Parameters: domain_right_edge         = [3. 3. 3.]
yt : [INFO     ] 2024-01-25 01:24:28,452 Parameters: cosmological_simulation   = 0
yt : [INFO     ] 2024-01-25 01:24:28,453 Allocating for 729 particles
yt : [INFO     ] 2024-01-25 01:24:28,765 Making a fixed resolution buffer of (('gas', 'density')) 5 by 5
yt : [INFO     ] 2024-01-25 01:24:28,774 Making a fixed resolution buffer of (('gas', 'density')) 9 by 9
yt : [INFO     ] 2024-01-25 01:24:28,775 Making a fixed resolution buffer of (('gas', 'density')) 17 by 17
yt : [INFO     ] 2024-01-25 01:24:28,777 Making a fixed resolution buffer of (('gas', 'density')) 33 by 33
yt : [INFO     ] 2024-01-25 01:24:28,778 Making a fixed resolution buffer of (('gas', 'density')) 65 by 65
Using a constant dataset with pixel grids of increasing size, then selecting pixels with the same centers:
Grid refinement level 1:
[[26.17550883 26.17550883 26.17550883]
 [26.17550883 26.17550883 26.17550883]
 [26.17550883 26.17550883 26.17550883]] g/cm**2
Grid refinement level 2:
[[2.05048465 2.05048465 2.05048465]
 [2.05048465 2.05048465 2.05048465]
 [2.05048465 2.05048465 2.05048465]] g/cm**2
Grid refinement level 3:
[[0.00345061 0.00345061 0.00345061]
 [0.00345061 0.00345061 0.00345061]
 [0.00345061 0.00345061 0.00345061]] g/cm**2
Grid refinement level 4:
[[0.00345061 0.00345061 0.00345061]
 [0.00345061 0.00345061 0.00345061]
 [0.00345061 0.00345061 0.00345061]] g/cm**2
Grid refinement level 5:
[[0.00345061 0.00345061 0.00345061]
 [0.00345061 0.00345061 0.00345061]
 [0.00345061 0.00345061 0.00345061]] g/cm**2

Expected outcome

My expectation was that a pixel value represents the integral of a quantity (e.g., gas density) over a line of sight (pencil beam) through the pixel center. This means that the positions of the other pixels should not affect the value attributed to a given pixel center, e.g., when that center shows up in different pixel grids as in my test.

However, those values are only constant when the grid spacing is smaller than the SPH particle sizes. At lower pixel grid resolution, the surface densities come out too high.

Version Information

  • Operating System: Mac OS Sonoma 14.2.1 (23C71)
  • Python Version: 3.11.7
  • yt version: 4.4.dev0
  • numpy version: 1.26.3

I installed most dependencies from conda (standard channel), but installed yt from source, using my conda environment's pip.

@nastasha-w
Author

I think basically the same issue also comes up in slices through SPH datasets, in pixelize_sph_kernel_slice (pixelization_routines.pyx, line 1465 onwards). The general approach here seems to be to evaluate the SPH field $A$ at the center $\vec{x}$ of each slice pixel:

$$A(\vec{x}) \approx \sum_{j=1}^{N} A_j \cdot (m_j / \rho_j) \cdot W\left(|\vec{x} - \vec{x_j}|, h_j\right),$$

where $A_j$ is the value of that field for particle $j$, $m_j$ is its mass, $\rho_j$ is its density, $\vec{x_j}$ is its position, and $h_j$ is its smoothing length. For small particles with smoothing lengths $\lesssim$ the pixel size, the smoothing lengths are artificially stretched, so that the normalized kernel and prefactor parts of the contribution are calculated inconsistently, and small SPH particles add to the total even when they do not intersect the pixel center.
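For concreteness, a sketch of that interpolant as written, using the full 3D distance (my own illustrative code, reusing cubic_spline from the sketch in the issue body, with $W(r, h) = U(r/h) / h^3$ in that dimensionless convention; as noted below, the current code effectively sets the z-offset to zero instead):

def sph_field_at_point(x, pos, A, mass, dens, hsml):
    # evaluate the smoothed field A at the 3D point x by summing
    # kernel-weighted contributions from all particles
    r = np.linalg.norm(pos - x, axis=1)
    kernel = np.array([cubic_spline(q) for q in r / hsml]) / hsml**3
    return np.sum(A * (mass / dens) * kernel)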

Finally, I would suggest that we really need the z-coordinates for this as well. Currently, this method calculates the kernel contributions for all particles as if the particle center lies exactly in the plane of the slice; really, the kernel should probably be evaluated at the 3D distance to the pixel center, to be consistent with how SPH interpolation is done in a simulation.

@nastasha-w
Author

The 3D version of this grid deposition, pixelize_sph_kernel_arbitrary_grid (pixelization_routines.pyx, line 1607 onwards), seems to handle the 3D part just fine, but again sets minimum smoothing lengths (for the kernel evaluation, but not the prefactor). This is inconsistent with an evaluation of the SPH field at a voxel center.
