MemoryError: Integer overflow in ysize reading ndpi image #31

Closed
rcasero opened this issue Apr 9, 2018 · 20 comments

@rcasero

rcasero commented Apr 9, 2018

Context

Issue type: bug report
Operating system: Ubuntu 17.10 (Artful Aardvark)
Platform: 64-bit x86
OpenSlide Python version (openslide.__version__): 1.1.1
OpenSlide version (openslide.__library_version__): 3.4.1
Slide format (e.g. SVS, NDPI, MRXS): NDPI

Details

When trying to read this NDPI image, size (51200, 38144), about 1.95 × 10^9 pixels (1.82 × 2^30),

http://openslide.cs.cmu.edu/download/openslide-testdata/Hamamatsu/CMU-1.ndpi

with OpenSlide, the code

import openslide
slide = openslide.OpenSlide("CMU-1.ndpi")
foo = slide.read_region(location=(0, 0), level=0, size=slide.dimensions)

gives the error

Traceback (most recent call last):
  File "<input>", line 4, in <module>
  File "/home/rcasero/.conda/envs/elastixity/lib/python3.6/site-packages/openslide/__init__.py", line 223, in read_region
    level, size[0], size[1])
  File "/home/rcasero/.conda/envs/elastixity/lib/python3.6/site-packages/openslide/lowlevel.py", line 260, in read_region
    return _load_image(buf, (w, h))
  File "/home/rcasero/.conda/envs/elastixity/lib/python3.6/site-packages/openslide/lowlevel.py", line 65, in _load_image
    return PIL.Image.frombuffer('RGBA', size, buf, 'raw', 'RGBA', 0, 1)
  File "/home/rcasero/.conda/envs/elastixity/lib/python3.6/site-packages/PIL/Image.py", line 2398, in frombuffer
    core.map_buffer(data, size, decoder_name, None, 0, args)
MemoryError: Integer overflow in ysize

Package versions:

python                    3.6.4                hc3d631a_1
openslide-python          1.1.1                     <pip>
Pillow                    5.1.0                     <pip>
@andanis

andanis commented Apr 11, 2018

Same issue :/ @rcasero were you able to resolve it?

@thomasaiman

I've had the same problem. It appears to be independent of slide level and region location.

Regions with less than 2^29 pixels work fine. Regions with 2^29 or more pixels give the integer overflow error.
im = slidePtr.read_region((150,150),0,((2**15)-1,2**14)) is fine.
im = slidePtr.read_region((150,150),0,((2**15),2**14)) fails.

This runs fine:

import numpy as np
from PIL import Image

a = np.zeros([2**15-1, 2**14, 4], dtype='uint8')
im = Image.fromarray(a)

a = np.zeros([2**15+1000, 2**14, 3], dtype='uint8')
im = Image.fromarray(a)

a = np.zeros([2**31, 1], dtype='uint8')
im = Image.fromarray(a)

But this gives the same error:

a = np.zeros([2**15, 2**14, 4], dtype='uint8')
im = Image.fromarray(a)
Traceback (most recent call last):

  File "<ipython-input-257-91fae34ad103>", line 4, in <module>
    im = Image.fromarray(a)

  File "C:\tools\miniconda3\lib\site-packages\PIL\Image.py", line 2217, in fromarray
    return frombuffer(mode, size, obj, "raw", rawmode, 0, 1)

  File "C:\tools\miniconda3\lib\site-packages\PIL\Image.py", line 2162, in frombuffer
    core.map_buffer(data, size, decoder_name, None, 0, args)

MemoryError: Integer overflow in ysize

So this is still a Pillow issue.

But this is pretty similar to #17, which was supposedly fixed by reading from the slide in smaller chunks. Maybe that patch isn't working?
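
The 2^29 threshold matches a signed 32-bit byte count if each RGBA pixel takes 4 bytes: 2^29 pixels is exactly 2^31 bytes, one more than the largest signed 32-bit value. The sketch below only checks that arithmetic for the two RGBA cases above. It is an assumption about where the limit comes from, not a reading of Pillow's source, and `fits_in_int32` is a made-up helper.

```python
INT32_MAX = 2**31 - 1
BYTES_PER_PIXEL = 4  # assumption: RGBA, one byte per channel


def fits_in_int32(width, height, bytes_per_pixel=BYTES_PER_PIXEL):
    """Return True if the raw buffer size fits in a signed 32-bit integer."""
    return width * height * bytes_per_pixel <= INT32_MAX


print(fits_in_int32(2**15 - 1, 2**14))  # the region that worked -> True
print(fits_in_int32(2**15, 2**14))      # the region that failed -> False
```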

@willgdjones

willgdjones commented Jul 20, 2018

I am encountering this issue as well.

@markemus

markemus commented Nov 9, 2018

There are two definitions for lowlevel._load_image(). The solution to #17 was only applied to the fallback function, which is there in case the openslide._convert import fails. If the import succeeds, the old code executes and the overflow occurs.

I was able to test this:

...
wsi.read_region(location=(0,0), level=0, size=(30000,30000))    # MemoryError: Integer overflow in ysize
from openslide.lowlevel import *
def _load_image...                   # use the SECOND _load_image definition on line 67
openslide.lowlevel._load_image = _load_image
wsi.read_region(location=(0,0), level=0, size=(30000,30000))   # returns PIL.Image object
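
A minimal sketch of why the chunking never runs when the import succeeds: the loader is chosen once, when the module is loaded, not per call. `make_load_image` and the returned strings are hypothetical stand-ins for the try/except in lowlevel.py, not the library's code.

```python
def make_load_image(convert_available):
    """Mimic the try/except in lowlevel.py: one loader is picked, once."""
    if convert_available:
        # Corresponds to the first definition: no size check at all.
        def load_image(buf, size):
            return "single frombuffer call"
    else:
        # Corresponds to the fallback: chunk when the region is large.
        def load_image(buf, size):
            w, h = size
            return "chunked" if w * h >= 2**29 else "single frombuffer call"
    return load_image


load_image = make_load_image(convert_available=True)
# A 30000 x 30000 region has more than 2**29 pixels, yet it is not chunked.
print(load_image(None, (30000, 30000)))
```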

@markemus

After further testing I can confirm that this method works, but it is pretty slow: it takes ~15 minutes to load a large WSI into memory. This might be because of the aBGR -> RGBa conversion, or it might be because of the pasting; maybe we could multithread it and speed things up. But it does work. When I get a chance I'll submit a PR; would that be alright? @bgilbert

@Borda

Borda commented Nov 15, 2018

What about reading it as a mosaic and then composing it back?
https://github.com/Borda/BIRL/blob/scripts/convert_tiff2png.py

for i in range(2 ** level_shift):
    img_tiles_d1 = []
    for j in range(2 ** level_shift):
        loc = (i * tile_size[0] * 2 ** level,
               j * tile_size[1] * 2 ** level)
        img = slide_img.read_region(loc, level, size=tile_size)
        img_tiles_d1.append(img)
        tqdm_bar.update()
    img_tiles_d0.append(np.vstack(img_tiles_d1))
image = np.hstack(img_tiles_d0)
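
The tile grid behind the mosaic idea can be sketched without OpenSlide. `iter_tiles` is a hypothetical helper, and the 8192-pixel tile size is an arbitrary choice that keeps each tile well under 2**29 pixels; the region size is that of CMU-1.ndpi from the original report.

```python
def iter_tiles(width, height, tile_w, tile_h):
    """Yield (x, y, w, h) tiles covering a width x height region.

    Tiles on the right and bottom edges are clipped to the region.
    """
    for y in range(0, height, tile_h):
        for x in range(0, width, tile_w):
            yield x, y, min(tile_w, width - x), min(tile_h, height - y)


tiles = list(iter_tiles(51200, 38144, 8192, 8192))
print(len(tiles))                                   # 7 columns x 5 rows = 35
print(max(w * h for _, _, w, h in tiles) < 2**29)   # True
```

At level 0, each (x, y) could be passed as the location and (w, h) as the size of a read_region call. At other levels the location still has to be given in level-0 coordinates, which is what the `* 2 ** level` factor in the snippet above is doing.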

@markemus

@Borda it does read them as tiles, but it only uses one core at the moment.

@sbelharbi

sbelharbi commented Feb 8, 2019

Same issue here. Trying to read a patch of size (w, h) = (23782, 32451), I get the error MemoryError: Integer overflow in ysize.
I use Pillow: region = slide.read_region((upper_left[0], upper_left[1]), 0, (w_rec, h_rec)).convert('RGB')

According to what I've read, it is a Pillow issue.

Any pointers? Thanks!

openslide-python: 1.1.1
openslide: 3.4.1
Pillow: 5.4.1
Python: 3.7.1

EDIT:
I just read the changes in openslide-python 1.1.1:

Version 1.1.1, 2016-06-11

  • Change default Deep Zoom tile size to 254 pixels
  • Fix image reading with Pillow 3.x when installed --without-performance
  • Fix reading >= 2 ** 29 pixels per call --without-performance
  • Fix some "unclosed file" ResourceWarnings on Python 3
  • Improve object reprs
  • Add test suite
  • examples: Drop support for Internet Explorer < 9

I am not sure if I did something wrong during the installation, since they have already fixed this issue!

I installed openslide-python using: pip install --no-deps openslide-python.

See here about a fix.

@bgilbert @markemus is this the fix mentioned in link for the issue "Fix reading >= 2 ** 29 pixels per call --without-performance"? It is located in openslide.lowlevel:

try:
    from . import _convert
    def _load_image(buf, size):
        '''buf must be a mutable buffer.'''
        _convert.argb2rgba(buf)
        return PIL.Image.frombuffer('RGBA', size, buf, 'raw', 'RGBA', 0, 1)
except ImportError:
    def _load_image(buf, size):
        '''buf must be a buffer.'''

        # Load entire buffer at once if possible
        MAX_PIXELS_PER_LOAD = (1 << 29) - 1
        # Otherwise, use chunks smaller than the maximum to reduce memory
        # requirements
        PIXELS_PER_LOAD = 1 << 26

        def do_load(buf, size):
            '''buf can be a string, but should be a ctypes buffer to avoid an
            extra copy in the caller.'''
            # First reorder the bytes in a pixel from native-endian aRGB to
            # big-endian RGBa to work around limitations in RGBa loader
            rawmode = (sys.byteorder == 'little') and 'BGRA' or 'ARGB'
            buf = PIL.Image.frombuffer('RGBA', size, buf, 'raw', rawmode, 0, 1)
            # Image.tobytes() is named tostring() in Pillow 1.x and PIL
            buf = (getattr(buf, 'tobytes', None) or buf.tostring)()
            # Now load the image as RGBA, undoing premultiplication
            return PIL.Image.frombuffer('RGBA', size, buf, 'raw', 'RGBa', 0, 1)

        # Fast path for small buffers
        w, h = size
        if w * h <= MAX_PIXELS_PER_LOAD:
            return do_load(buf, size)

        # Load in chunks to avoid OverflowError in PIL.Image.frombuffer()
        # https://github.com/python-pillow/Pillow/issues/1475
        if w > PIXELS_PER_LOAD:
            # We could support this, but it seems like overkill
            raise ValueError('Width %d is too large (maximum %d)' %
                    (w, PIXELS_PER_LOAD))
        rows_per_load = PIXELS_PER_LOAD // w
        img = PIL.Image.new('RGBA', (w, h))
        for y in range(0, h, rows_per_load):
            rows = min(h - y, rows_per_load)
            if sys.version[0] == '2':
                chunk = buffer(buf, 4 * y * w, 4 * rows * w)
            else:
                # PIL.Image.frombuffer() won't take a memoryview or
                # bytearray, so we can't avoid copying
                chunk = memoryview(buf)[y * w:(y + rows) * w].tobytes()
            img.paste(do_load(chunk, (w, rows)), (0, y))
        return img

This seems strange to me: _load_image() is defined once, the first time openslide/lowlevel is imported, independently of the size of the patch. Unless importing _convert raises an ImportError, the second definition is never used. I am not really an expert, but if you have any insight on how to use the mentioned fix in openslide-python 1.1.1 to read large patches from a WSI, please let me know. I am not sure if I am doing something wrong. Thanks!
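
For reference, the row-chunking arithmetic of the fallback can be tried on a tiny buffer. This is a sketch under assumptions: `iter_row_chunks` is a hypothetical helper that only slices the buffer (it does not build images), and the limit of 6 pixels per load is chosen so that a 3 x 5 buffer needs three chunks.

```python
import ctypes


def iter_row_chunks(buf, w, h, pixels_per_load):
    """Yield (y, rows, chunk_bytes) slices of a w*h buffer of 32-bit pixels."""
    if w > pixels_per_load:
        raise ValueError('Width %d is too large (maximum %d)' %
                         (w, pixels_per_load))
    rows_per_load = pixels_per_load // w
    # The buffer holds c_uint32 items, so slice indices count pixels, not bytes.
    view = memoryview(buf)
    for y in range(0, h, rows_per_load):
        rows = min(h - y, rows_per_load)
        yield y, rows, view[y * w:(y + rows) * w].tobytes()


w, h = 3, 5
buf = (ctypes.c_uint32 * (w * h))(*range(w * h))
chunks = list(iter_row_chunks(buf, w, h, pixels_per_load=6))
print([(y, rows, len(data)) for y, rows, data in chunks])
# [(0, 2, 24), (2, 2, 24), (4, 1, 12)]
```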

My current fix, which seems to work (following link):

  1. Create a separate Python file openslide_python_fix.py.
  2. Copy-paste the two definitions of _load_image() located in here and here (see below).
  3. Within my main code that reads the patches, I choose the right function depending on the size of the patch. Something like this:
...
import openslide
from openslide_python_fix import _load_image_lessthan_2_29, _load_image_morethan_2_29
...

# The function that reads the patches from a WSI.
def func_read_patch():
    ...
    # Check which _load_image() function to use depending on the size of the region.
    if (h_rec * w_rec) >= 2**29:
        openslide.lowlevel._load_image = _load_image_morethan_2_29
    else:
        openslide.lowlevel._load_image = _load_image_lessthan_2_29

    region = slide.read_region((upper_left[0], upper_left[1]), 0, (w_rec, h_rec)).convert('RGB')

Unless I am mistaken, the issue of reading patches >=2**29 is still here. The provided solution seems a little bit off.
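
A possible variation, sketched under assumptions: instead of reassigning openslide.lowlevel._load_image before every read, install one dispatcher that picks the loader per call. `make_dispatcher` is a hypothetical helper; the two loaders it would wrap are the functions from openslide_python_fix.py.

```python
THRESHOLD = 2**29  # pixel count at which the overflow was reported


def make_dispatcher(small_loader, large_loader, threshold=THRESHOLD):
    """Return a loader that chooses between two loaders by region size."""
    def _load_image(buf, size):
        w, h = size
        loader = large_loader if w * h >= threshold else small_loader
        return loader(buf, size)
    return _load_image


# Assumed usage, once at start-up (not executed here):
# openslide.lowlevel._load_image = make_dispatcher(
#     _load_image_lessthan_2_29, _load_image_morethan_2_29)
```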

Content of openslide_python_fix.py:

from openslide.lowlevel import *
from openslide.lowlevel import _convert


def _load_image_lessthan_2_29(buf, size):
    '''buf must be a mutable buffer.'''
    _convert.argb2rgba(buf)
    return PIL.Image.frombuffer('RGBA', size, buf, 'raw', 'RGBA', 0, 1)


def _load_image_morethan_2_29(buf, size):
    '''buf must be a buffer.'''

    # Load entire buffer at once if possible
    MAX_PIXELS_PER_LOAD = (1 << 29) - 1
    # Otherwise, use chunks smaller than the maximum to reduce memory
    # requirements
    PIXELS_PER_LOAD = 1 << 26

    def do_load(buf, size):
        '''buf can be a string, but should be a ctypes buffer to avoid an
        extra copy in the caller.'''
        # First reorder the bytes in a pixel from native-endian aRGB to
        # big-endian RGBa to work around limitations in RGBa loader
        rawmode = (sys.byteorder == 'little') and 'BGRA' or 'ARGB'
        buf = PIL.Image.frombuffer('RGBA', size, buf, 'raw', rawmode, 0, 1)
        # Image.tobytes() is named tostring() in Pillow 1.x and PIL
        buf = (getattr(buf, 'tobytes', None) or buf.tostring)()
        # Now load the image as RGBA, undoing premultiplication
        return PIL.Image.frombuffer('RGBA', size, buf, 'raw', 'RGBa', 0, 1)

    # Fast path for small buffers
    w, h = size
    if w * h <= MAX_PIXELS_PER_LOAD:
        return do_load(buf, size)

    # Load in chunks to avoid OverflowError in PIL.Image.frombuffer()
    # https://github.com/python-pillow/Pillow/issues/1475
    if w > PIXELS_PER_LOAD:
        # We could support this, but it seems like overkill
        raise ValueError('Width %d is too large (maximum %d)' %
                         (w, PIXELS_PER_LOAD))
    rows_per_load = PIXELS_PER_LOAD // w
    img = PIL.Image.new('RGBA', (w, h))
    for y in range(0, h, rows_per_load):
        rows = min(h - y, rows_per_load)
        if sys.version[0] == '2':
            chunk = buffer(buf, 4 * y * w, 4 * rows * w)
        else:
            # PIL.Image.frombuffer() won't take a memoryview or
            # bytearray, so we can't avoid copying
            chunk = memoryview(buf)[y * w:(y + rows) * w].tobytes()
        img.paste(do_load(chunk, (w, rows)), (0, y))
    return img

Here is the running time of the function read_region() on different patch sizes using the above fix (hh:mm:ss with 2**29 = 536,870,912):

  1. read_region (h=5800 x w=5478 = 31,772,400) took 0:00:01.027638. hxw >= 2**29: False
  2. read_region (h=19689 x w=26537 = 522,486,993) took 0:00:37.363076. hxw >= 2**29: False
  3. read_region (h=2425 x w=2047 = 4,963,975) took 0:00:00.261494. hxw >= 2**29: False
  4. read_region (h=3862 x w=4022 = 15,532,964) took 0:00:01.041433. hxw >= 2**29: False
  5. read_region (h=32451 x w=23782 = 771,749,682) took 0:00:57.984822. hxw >= 2**29: True

@lunasdejavu

lunasdejavu commented May 16, 2019

@sbelharbi
I modified your code a little:
for openslide_python_fix.py:

from openslide.lowlevel import *
from openslide.lowlevel import _convert



def _load_image_lessthan_2_29(buf, size):
    '''buf must be a mutable buffer.'''
    _convert.argb2rgba(buf)
    return PIL.Image.frombuffer('RGBA', size, buf, 'raw', 'RGBA', 0, 1)


def _load_image_morethan_2_29(buf, size):
    '''buf must be a buffer.'''

    # Load entire buffer at once if possible
    MAX_PIXELS_PER_LOAD = (1 << 29) - 1
    # Otherwise, use chunks smaller than the maximum to reduce memory
    # requirements
    PIXELS_PER_LOAD = 1 << 26

    def do_load(buf, size):
        '''buf can be a string, but should be a ctypes buffer to avoid an
        extra copy in the caller.'''
        # First reorder the bytes in a pixel from native-endian aRGB to
        # big-endian RGBa to work around limitations in RGBa loader
        rawmode = (sys.byteorder == 'little') and 'BGRA' or 'ARGB'
        buf = PIL.Image.frombuffer('RGBA', size, buf, 'raw', rawmode, 0, 1)
        # Image.tobytes() is named tostring() in Pillow 1.x and PIL
        buf = (getattr(buf, 'tobytes', None) or buf.tostring)()
        # Now load the image as RGBA, undoing premultiplication
        return PIL.Image.frombuffer('RGBA', size, buf, 'raw', 'RGBa', 0, 1)

    # Fast path for small buffers
    w, h = size
    if w * h <= MAX_PIXELS_PER_LOAD:
        return do_load(buf, size)

    # Load in chunks to avoid OverflowError in PIL.Image.frombuffer()
    # https://github.com/python-pillow/Pillow/issues/1475
    if w > PIXELS_PER_LOAD:
        # We could support this, but it seems like overkill
        raise ValueError('Width %d is too large (maximum %d)' %
                         (w, PIXELS_PER_LOAD))
    rows_per_load = PIXELS_PER_LOAD // w
    img = PIL.Image.new('RGBA', (w, h))
    for y in range(0, h, rows_per_load):
        rows = min(h - y, rows_per_load)
        if sys.version[0] == '2':
            chunk = buffer(buf, 4 * y * w, 4 * rows * w)
        else:
            # PIL.Image.frombuffer() won't take a memoryview or
            # bytearray, so we can't avoid copying
            chunk = memoryview(buf)[y * w:(y + rows) * w].tobytes()
        img.paste(do_load(chunk, (w, rows)), (0, y))
    return img

my code:

import openslide
import matplotlib.pyplot as plt
import numpy as np
from openslide_python_fix import _load_image_lessthan_2_29, _load_image_morethan_2_29
def func_read_patch(slide, h, w):
    # Check which _load_image() function to use depending on the size of the region.
    if (h * w) >= 2**29:
        openslide.lowlevel._load_image = _load_image_morethan_2_29
    else:
        openslide.lowlevel._load_image = _load_image_lessthan_2_29


    region = slide.read_region((0,0), 2, (w, h)).convert('RGB')

def main():
    slide = openslide.OpenSlide('D:/breast_cancer_dataset/HER2 contest/testing/05_HER2.ndpi')  # read in the image

    downsamples=slide.level_downsamples 
    [w, h] = slide.level_dimensions[0] 
    size1 = int(w*(downsamples[0]/downsamples[2]))
    size2 = int(h*(downsamples[0]/downsamples[2]))
    func_read_patch(slide, h, w)
if __name__ == '__main__':
    main()

but it still showed

Traceback (most recent call last):
  File "C:/Users/willy_sung/Documents/openslide_test.py", line 33, in <module>
    main()
  File "C:/Users/willy_sung/Documents/openslide_test.py", line 23, in main
    func_read_patch(slide, h, w)
  File "C:/Users/willy_sung/Documents/openslide_test.py", line 13, in func_read_patch
    region = slide.read_region((0,0), 2, (w, h)).convert('RGB')
  File "C:\Users\willy_sung\AppData\Local\Continuum\anaconda3\lib\site-packages\openslide\__init__.py", line 223, in read_region
    level, size[0], size[1])
  File "C:\Users\willy_sung\AppData\Local\Continuum\anaconda3\lib\site-packages\openslide\lowlevel.py", line 258, in read_region
    buf = (w * h * c_uint32)()
MemoryError

It seems that it didn't call the function from openslide_python_fix.py.
What mistake did I make?

@sbelharbi

@lunasdejavu

  1. Can you highlight the difference between your code and mine? If I understand correctly, you changed func_read_patch(slide, h, w) to accept slide, h, and w, and you read at level 2 instead of 0.
  2. If I understand correctly, [w, h] = slide.level_dimensions[0] returns the width and height of level 0. However, you are calling read_region at level 2 in region = slide.read_region((0,0), 2, (w, h)).convert('RGB'). These have to be consistent: grab the dimensions of the requested level.
  3. Not sure what size1 and size2 are doing. Maybe you forgot to pass them to func_read_patch(slide, h, w) as the downsized dimensions instead of h, w. You can use slide.level_dimensions[2] directly to get the dimensions of level 2.
  4. The fix above is supposed to work as long as the region fits in memory. If it does not fit, the fix can do nothing about it. Usually, we read patches. In my code, w_rec, h_rec are the size of a rectangle within the WSI. In your code, you are trying to load an entire WSI at level 0 into memory. That's extreme, unless you have a very large amount of memory. If you want to see the entire WSI, you can use a higher level (6, for instance). If you want high-resolution regions, use level 0, but then you can only load small regions. The size of the regions depends on your available memory. If you process the regions sequentially, remember to free the memory after processing each region.
  5. Check which function is called in func_read_patch by putting a print() in each branch of the if/else statement, for instance. You can add a return region to the function as well. I think it called the right function. However, you have probably asked too much of your machine (loading an entire WSI into memory, and it did not like it).
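
Points 2 and 3 can be sketched as follows. `read_whole_level` is a hypothetical helper; it relies only on the `level_dimensions` attribute and the `read_region(location, level, size)` method of an OpenSlide object. `FakeSlide` is a stand-in with made-up level sizes so that the sketch runs without a slide file.

```python
def read_whole_level(slide, level):
    """Read all of `level`, taking the size from that same level."""
    size = slide.level_dimensions[level]
    # The location is given in level-0 coordinates; (0, 0) is the
    # top-left corner at every level.
    return slide.read_region((0, 0), level, size)


class FakeSlide:
    """Stand-in for openslide.OpenSlide; the level sizes are made up."""
    level_dimensions = ((51200, 38144), (12800, 9536), (3200, 2384))

    def read_region(self, location, level, size):
        # A real slide would return a PIL image; this just echoes the call.
        return {"location": location, "level": level, "size": size}


print(read_whole_level(FakeSlide(), 2))
```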

Let me know how it goes. Thanks!

@lunasdejavu

@sbelharbi I fixed the problem by modifying lowlevel.py as in the change linked here, which you commented on before.

@sbelharbi

@lunasdejavu which parts have I commented on? If I remember correctly, I copy-pasted the content of lowlevel.py. Make sure that you are comparing the same versions. Can you elaborate on what modifications you made? Thanks!

@lunasdejavu

I just changed the code from line 60 to 110, which you commented on in another thread before.

@sbelharbi

The merge you linked is not mine.
Not sure what made you think that I have edited lowlevel.py.

@markemus

@lunasdejavu It looks like you got a memory error trying to initialize the buffer for the image. Keep in mind that WSI images in memory decompress to be much larger than they are on disk (they're very sparse so they compress well). You can use my PR if you don't mind messing with your openslide install, or @sbelharbi's code if you prefer not to. Either approach should work. Regardless, I'm glad you were able to resolve the problem.
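
A rough size estimate, assuming 4 bytes per pixel as in the `buf = (w * h * c_uint32)()` line of the traceback above. `rgba_buffer_bytes` is a hypothetical helper, and the dimensions are those of CMU-1.ndpi from the original report, not of the slide that failed here.

```python
def rgba_buffer_bytes(width, height):
    """Bytes needed for one 32-bit value per pixel."""
    return width * height * 4


size = rgba_buffer_bytes(51200, 38144)
print(size)                    # 7811891200 bytes
print(round(size / 2**30, 2))  # 7.28 GiB, before any copies made while converting
```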

@Samyssmile

Samyssmile commented May 18, 2019

We also have the exact same problem. Are there any workarounds for this?

@sbelharbi

Whether you can load an entire WSI (especially at high resolution, i.e., level 0) into memory depends on how much memory you have. I don't think that OpenSlide can do anything about it. Your options are either to load the entire WSI at low resolution, or to split the WSI into large patches and process them sequentially. As @markemus mentioned, when decompressed into memory, a WSI takes a lot of space (far more than its size on disk).

@JiancongWang

Updating Pillow to 6.2.0 dev solved this for me.

@ShanQiong

Updating Pillow to 6.2.0 dev solved this for me.

How? With pip3 install Pillow, the newest version is 6.1.0.

bgilbert added a commit that referenced this issue Sep 13, 2020
Older versions of Pillow raise "MemoryError: Integer overflow in ysize"
when Image.frombuffer() is called on large buffers.

The test is unconditionally disabled because of its RAM usage, so this is
mostly documentation.

See #31.  Closes #33.
@bgilbert
Member

Thanks for the report. This is fixed by python-pillow/Pillow#3964, which is in Pillow ≥ 6.2.0. I'll close.
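
A small sketch for checking whether an installed Pillow is new enough, assuming the 6.2.0 threshold stated above. `pillow_has_fix` is a hypothetical helper that compares only the first three numeric components, so a string such as "6.2.0.dev0" counts as 6.2.0.

```python
import re


def pillow_has_fix(version):
    """True if a Pillow version string is at least 6.2.0."""
    parts = [int(p) for p in re.findall(r"\d+", version)[:3]]
    parts += [0] * (3 - len(parts))  # pad short strings such as "7.0"
    return tuple(parts) >= (6, 2, 0)


print(pillow_has_fix("5.1.0"))  # version from the original report -> False
print(pillow_has_fix("6.2.0"))  # -> True
```

With Pillow installed, the string to check would come from `PIL.__version__`.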
