`MemoryError: Integer overflow in ysize` reading ndpi image #31

rcasero · 2018-04-09T10:49:27Z

Context

Issue type: bug report
Operating system: Ubuntu 17.10 (Artful Aardvark)
Platform: 64-bit x86
OpenSlide Python version (openslide.__version__): 1.1.1
OpenSlide version (openslide.__library_version__): 3.4.1
Slide format (e.g. SVS, NDPI, MRXS): NDPI

Details

When trying to read this NDPI image, size (51200, 38144) ~ 1.82 Gpixels,

http://openslide.cs.cmu.edu/download/openslide-testdata/Hamamatsu/CMU-1.ndpi

with OpenSlide, the code

import openslide
slide = openslide.OpenSlide("CMU-1.ndpi")
foo = slide.read_region(location=(0, 0), level=0, size=slide.dimensions)

gives the error

Traceback (most recent call last):
  File "<input>", line 4, in <module>
  File "/home/rcasero/.conda/envs/elastixity/lib/python3.6/site-packages/openslide/__init__.py", line 223, in read_region
    level, size[0], size[1])
  File "/home/rcasero/.conda/envs/elastixity/lib/python3.6/site-packages/openslide/lowlevel.py", line 260, in read_region
    return _load_image(buf, (w, h))
  File "/home/rcasero/.conda/envs/elastixity/lib/python3.6/site-packages/openslide/lowlevel.py", line 65, in _load_image
    return PIL.Image.frombuffer('RGBA', size, buf, 'raw', 'RGBA', 0, 1)
  File "/home/rcasero/.conda/envs/elastixity/lib/python3.6/site-packages/PIL/Image.py", line 2398, in frombuffer
    core.map_buffer(data, size, decoder_name, None, 0, args)
MemoryError: Integer overflow in ysize

Package versions:

python                    3.6.4                hc3d631a_1
openslide-python          1.1.1                     <pip>
Pillow                    5.1.0                     <pip>

andanis · 2018-04-11T07:08:47Z

Same issue :/ @rcasero were you able to resolve it?

thomasaiman · 2018-06-11T07:36:14Z

I've had the same problem. It appears to be independent of slide level and region location.

Regions with less than 2^29 pixels work fine. Regions with 2^29 or more pixels give the integer overflow error.
im = slidePtr.read_region((150,150),0,((2**15)-1,2**14)) is fine.
im = slidePtr.read_region((150,150),0,((2**15),2**14)) fails.

This runs fine:

import numpy as np
from PIL import Image

a = np.zeros([2**15-1, 2**14, 4], dtype='uint8')
im = Image.fromarray(a)

a = np.zeros([2**15+1000, 2**14, 3], dtype='uint8')
im = Image.fromarray(a)

a = np.zeros([2**31, 1], dtype='uint8')
im = Image.fromarray(a)

But this gives the same error:

a = np.zeros([2**15, 2**14, 4], dtype='uint8')
im = Image.fromarray(a)

Traceback (most recent call last):

  File "<ipython-input-257-91fae34ad103>", line 4, in <module>
    im = Image.fromarray(a)

  File "C:\tools\miniconda3\lib\site-packages\PIL\Image.py", line 2217, in fromarray
    return frombuffer(mode, size, obj, "raw", rawmode, 0, 1)

  File "C:\tools\miniconda3\lib\site-packages\PIL\Image.py", line 2162, in frombuffer
    core.map_buffer(data, size, decoder_name, None, 0, args)

MemoryError: Integer overflow in ysize

So this is still a Pillow issue.

But this is pretty similar to #17, which was supposedly fixed by reading from the slide in smaller chunks. Maybe that patch isn't working?
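
The 2^29-pixel threshold reported above lines up with simple arithmetic for the RGBA cases, if one assumes (this thread does not confirm it) that the failing check compares the buffer size in bytes against a signed 32-bit limit. An RGBA region uses 4 bytes per pixel, so 2^29 pixels is exactly 2^31 bytes, one more than the largest signed 32-bit integer. A minimal sketch of that arithmetic; the helper names are made up for illustration and are not part of Pillow or OpenSlide:

```python
INT32_MAX = 2**31 - 1      # largest signed 32-bit integer
BYTES_PER_PIXEL = 4        # RGBA, the mode read_region() returns


def rgba_bytes(width, height):
    """Size in bytes of an uncompressed RGBA buffer."""
    return width * height * BYTES_PER_PIXEL


def exceeds_int32(width, height):
    """True if the RGBA buffer size does not fit in a signed 32-bit int."""
    return rgba_bytes(width, height) > INT32_MAX


# The two region sizes tested above, plus the full CMU-1.ndpi slide
# from the original report.
for size in [(2**15 - 1, 2**14), (2**15, 2**14), (51200, 38144)]:
    print(size, rgba_bytes(*size), exceeds_int32(*size))
```

Under that assumption the first size stays just below the limit and the other two exceed it, which matches which calls succeeded and failed.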

willgdjones · 2018-07-20T10:57:57Z

I am encountering this issue as well.

markemus · 2018-11-09T19:55:33Z

There are two definitions for lowlevel._load_image(). The solution to #17 was only applied to the fallback function, which is there in case the openslide._convert import fails. If the import succeeds, the old code executes and the overflow occurs.

I was able to test this:

...
wsi.read_region(location=(0, 0), level=0, size=(30000, 30000))  # MemoryError: Integer overflow in ysize
from openslide.lowlevel import *
def _load_image...  # use the SECOND _load_image definition on line 67
openslide.lowlevel._load_image = _load_image
wsi.read_region(location=(0, 0), level=0, size=(30000, 30000))  # returns PIL.Image object
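
The behaviour described above, where the chunked code is only reachable when an import fails, can be reproduced with a small stand-alone sketch. It does not import openslide; `choose_loader` and the module names are hypothetical stand-ins for the try/except ImportError block in lowlevel.py:

```python
import importlib


def choose_loader(module_name):
    """Mimic a try/except ImportError choice that is made only once."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        # Only reached when the import fails.
        def loader(size):
            return "chunked fallback"
    else:
        # Reached whenever the import succeeds, whatever the size is.
        def loader(size):
            return "single frombuffer call"
    return loader


# "math" stands in for an extension module that imports successfully.
load_with_extension = choose_loader("math")
# A module name that does not exist stands in for a failed import.
load_without_extension = choose_loader("no_such_module_for_this_sketch")

print(load_with_extension((30000, 30000)))
print(load_without_extension((30000, 30000)))
```

The region size plays no part in the choice, which is why a large read still goes through the unchunked path when the import succeeds.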

markemus · 2018-11-14T15:11:09Z

After further testing I can confirm that this method works, but it is pretty slow: it takes ~15 minutes to load a large WSI into memory. This might be because of the aBGR -> RGBa conversion, or because of the pasting; maybe we could multithread it to speed things up. But it does work. When I get a chance I'll submit a PR, would that be alright? @bgilbert

Borda · 2018-11-15T08:48:06Z

What about reading it as a mosaic of tiles and then composing it back together?
https://github.com/Borda/BIRL/blob/scripts/convert_tiff2png.py

for i in range(2 ** level_shift):
    img_tiles_d1 = []
    for j in range(2 ** level_shift):
        loc = (i * tile_size[0] * 2 ** level,
               j * tile_size[1] * 2 ** level)
        img = slide_img.read_region(loc, level, size=tile_size)
        img_tiles_d1.append(img)
        tqdm_bar.update()
    img_tiles_d0.append(np.vstack(img_tiles_d1))
image = np.hstack(img_tiles_d0)
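
As a complement to the snippet above, here is a minimal sketch of how tile boxes covering a whole region could be generated so that every read_region() call stays far below 2**29 pixels. It assumes level 0 (where pixel offsets and level-0 coordinates coincide); `tile_boxes` and the 8192-pixel tile size are illustrative choices, not taken from the linked script:

```python
def tile_boxes(width, height, tile_w, tile_h):
    """Yield (x, y, w, h) boxes that cover a width x height region.

    Tiles on the right and bottom edges are clipped to the region.
    """
    for y in range(0, height, tile_h):
        for x in range(0, width, tile_w):
            yield x, y, min(tile_w, width - x), min(tile_h, height - y)


# Dimensions of CMU-1.ndpi at level 0, from the original report.
boxes = list(tile_boxes(51200, 38144, 8192, 8192))
largest = max(w * h for _, _, w, h in boxes)
print(len(boxes), largest, largest < 2**29)
```

Each box could then be passed as read_region((x, y), 0, (w, h)) and pasted or stacked at offset (x, y).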

markemus · 2018-11-15T15:08:01Z

@Borda it does read them as tiles, but it only uses one core at the moment.

sbelharbi · 2019-02-08T13:01:12Z

Same issue here. I am trying to read a patch of size (w, h) = (23782, 32451) and get the error MemoryError: Integer overflow in ysize.
I use Pillow: region = slide.read_region((upper_left[0], upper_left[1]), 0, (w_rec, h_rec)).convert('RGB')

According to what I've read, it is a Pillow issue.

Any pointers? Thanks!

openslide-python: 1.1.1
openslide: 3.4.1
Pillow: 5.4.1
Python: 3.7.1

EDIT:
I just read the changes in openslide-python 1.1.1:

Version 1.1.1, 2016-06-11

Change default Deep Zoom tile size to 254 pixels
Fix image reading with Pillow 3.x when installed --without-performance
Fix reading >= 2 ** 29 pixels per call --without-performance
Fix some "unclosed file" ResourceWarnings on Python 3
Improve object reprs
Add test suite
examples: Drop support for Internet Explorer < 9

I am not sure if I did something wrong during the installation, since they have already fixed this issue!

I installed openslide-python using: pip install --no-deps openslide-python.

See here about a fix.

@bgilbert @markemus is this the fix that the changelog entry Fix reading >= 2 ** 29 pixels per call --without-performance refers to? It is located in openslide.lowlevel:

try:
    from . import _convert
    def _load_image(buf, size):
        '''buf must be a mutable buffer.'''
        _convert.argb2rgba(buf)
        return PIL.Image.frombuffer('RGBA', size, buf, 'raw', 'RGBA', 0, 1)
except ImportError:
    def _load_image(buf, size):
        '''buf must be a buffer.'''

        # Load entire buffer at once if possible
        MAX_PIXELS_PER_LOAD = (1 << 29) - 1
        # Otherwise, use chunks smaller than the maximum to reduce memory
        # requirements
        PIXELS_PER_LOAD = 1 << 26

        def do_load(buf, size):
            '''buf can be a string, but should be a ctypes buffer to avoid an
            extra copy in the caller.'''
            # First reorder the bytes in a pixel from native-endian aRGB to
            # big-endian RGBa to work around limitations in RGBa loader
            rawmode = (sys.byteorder == 'little') and 'BGRA' or 'ARGB'
            buf = PIL.Image.frombuffer('RGBA', size, buf, 'raw', rawmode, 0, 1)
            # Image.tobytes() is named tostring() in Pillow 1.x and PIL
            buf = (getattr(buf, 'tobytes', None) or buf.tostring)()
            # Now load the image as RGBA, undoing premultiplication
            return PIL.Image.frombuffer('RGBA', size, buf, 'raw', 'RGBa', 0, 1)

        # Fast path for small buffers
        w, h = size
        if w * h <= MAX_PIXELS_PER_LOAD:
            return do_load(buf, size)

        # Load in chunks to avoid OverflowError in PIL.Image.frombuffer()
        # https://github.com/python-pillow/Pillow/issues/1475
        if w > PIXELS_PER_LOAD:
            # We could support this, but it seems like overkill
            raise ValueError('Width %d is too large (maximum %d)' %
                    (w, PIXELS_PER_LOAD))
        rows_per_load = PIXELS_PER_LOAD // w
        img = PIL.Image.new('RGBA', (w, h))
        for y in range(0, h, rows_per_load):
            rows = min(h - y, rows_per_load)
            if sys.version[0] == '2':
                chunk = buffer(buf, 4 * y * w, 4 * rows * w)
            else:
                # PIL.Image.frombuffer() won't take a memoryview or
                # bytearray, so we can't avoid copying
                chunk = memoryview(buf)[y * w:(y + rows) * w].tobytes()
            img.paste(do_load(chunk, (w, rows)), (0, y))
        return img

This seems strange to me: _load_image() is defined once, when openslide.lowlevel is first imported, regardless of the size of the patch. Unless importing _convert fails, the second definition is never used. I am not really an expert, but if you have any insight on how to use the fix mentioned in openslide-python 1.1.1 to read large patches from a WSI, please let me know. I am not sure if I am doing something wrong. Thanks!
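
To make the chunking in the second definition concrete, here is a small sketch that only reproduces its row arithmetic, without Pillow or OpenSlide. `chunk_plan` is a hypothetical helper; the constant and the loop bounds are copied from the code quoted above, and the size is the (w, h) = (23782, 32451) patch from this comment:

```python
PIXELS_PER_LOAD = 1 << 26  # same constant as in the code quoted above


def chunk_plan(w, h, pixels_per_load=PIXELS_PER_LOAD):
    """Return (first_row, row_count) pairs in the order the loop visits them."""
    if w > pixels_per_load:
        raise ValueError('Width %d is too large (maximum %d)' %
                         (w, pixels_per_load))
    rows_per_load = pixels_per_load // w
    return [(y, min(h - y, rows_per_load))
            for y in range(0, h, rows_per_load)]


plan = chunk_plan(23782, 32451)
print(len(plan), plan[0], plan[-1])
# prints: 12 (0, 2821) (31031, 1420)
```

So a patch of that size would be loaded as 12 horizontal strips of at most 2821 rows each, every strip well under the 2**29-pixel limit.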

My current fix,

that seems to work, following link:

1. Create a separate Python file, openslide_python_fix.py.
2. Copy-paste the two definitions of _load_image() located in here and here (see below).
3. Within my main code that reads the patches, choose the right function depending on the size of the patch. Something like this:

...
import openslide
from openslide_python_fix import _load_image_lessthan_2_29, _load_image_morethan_2_29
...

# The function that reads the patches from a WSI.
def func_read_patch():
    ...
    # Check which _load_image() function to use depending on the size of the region.
    if (h_rec * w_rec) >= 2**29:
        openslide.lowlevel._load_image = _load_image_morethan_2_29
    else:
        openslide.lowlevel._load_image = _load_image_lessthan_2_29

    region = slide.read_region((upper_left[0], upper_left[1]), 0, (w_rec, h_rec)).convert('RGB')

Unless I am mistaken, the issue of reading patches >=2**29 is still here. The provided solution seems a little bit off.
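
One variation on the switch above, offered only as a sketch: the replacement can be scoped with a context manager so that the original function is restored afterwards. `swapped_attr` is a hypothetical helper, and a SimpleNamespace stands in for openslide.lowlevel so the example runs without OpenSlide installed:

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def swapped_attr(owner, name, replacement):
    """Temporarily replace owner.<name>, restoring it on exit."""
    original = getattr(owner, name)
    setattr(owner, name, replacement)
    try:
        yield
    finally:
        setattr(owner, name, original)


# Stand-in for openslide.lowlevel with its default _load_image.
fake_lowlevel = SimpleNamespace(_load_image=lambda buf, size: "fast path")


def load_chunked(buf, size):
    """Stand-in for _load_image_morethan_2_29."""
    return "chunked path"


with swapped_attr(fake_lowlevel, "_load_image", load_chunked):
    inside = fake_lowlevel._load_image(None, (32451, 23782))
after = fake_lowlevel._load_image(None, (100, 100))
print(inside, "/", after)
```

With the real module, the with block would wrap the slide.read_region() call for large regions only.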

Content of openslide_python_fix.py:

from openslide.lowlevel import *
from openslide.lowlevel import _convert


def _load_image_lessthan_2_29(buf, size):
    '''buf must be a mutable buffer.'''
    _convert.argb2rgba(buf)
    return PIL.Image.frombuffer('RGBA', size, buf, 'raw', 'RGBA', 0, 1)


def _load_image_morethan_2_29(buf, size):
    '''buf must be a buffer.'''

    # Load entire buffer at once if possible
    MAX_PIXELS_PER_LOAD = (1 << 29) - 1
    # Otherwise, use chunks smaller than the maximum to reduce memory
    # requirements
    PIXELS_PER_LOAD = 1 << 26

    def do_load(buf, size):
        '''buf can be a string, but should be a ctypes buffer to avoid an
        extra copy in the caller.'''
        # First reorder the bytes in a pixel from native-endian aRGB to
        # big-endian RGBa to work around limitations in RGBa loader
        rawmode = (sys.byteorder == 'little') and 'BGRA' or 'ARGB'
        buf = PIL.Image.frombuffer('RGBA', size, buf, 'raw', rawmode, 0, 1)
        # Image.tobytes() is named tostring() in Pillow 1.x and PIL
        buf = (getattr(buf, 'tobytes', None) or buf.tostring)()
        # Now load the image as RGBA, undoing premultiplication
        return PIL.Image.frombuffer('RGBA', size, buf, 'raw', 'RGBa', 0, 1)

    # Fast path for small buffers
    w, h = size
    if w * h <= MAX_PIXELS_PER_LOAD:
        return do_load(buf, size)

    # Load in chunks to avoid OverflowError in PIL.Image.frombuffer()
    # https://github.com/python-pillow/Pillow/issues/1475
    if w > PIXELS_PER_LOAD:
        # We could support this, but it seems like overkill
        raise ValueError('Width %d is too large (maximum %d)' %
                         (w, PIXELS_PER_LOAD))
    rows_per_load = PIXELS_PER_LOAD // w
    img = PIL.Image.new('RGBA', (w, h))
    for y in range(0, h, rows_per_load):
        rows = min(h - y, rows_per_load)
        if sys.version[0] == '2':
            chunk = buffer(buf, 4 * y * w, 4 * rows * w)
        else:
            # PIL.Image.frombuffer() won't take a memoryview or
            # bytearray, so we can't avoid copying
            chunk = memoryview(buf)[y * w:(y + rows) * w].tobytes()
        img.paste(do_load(chunk, (w, rows)), (0, y))
    return img

Here is the running time of read_region() for different patch sizes using the above fix (times in h:mm:ss; 2**29 = 536,870,912):

read_region (h=5800 x w=5478 = 31,772,400) took 0:00:01.027638. h x w >= 2**29: False
read_region (h=19689 x w=26537 = 522,486,993) took 0:00:37.363076. h x w >= 2**29: False
read_region (h=2425 x w=2047 = 4,963,975) took 0:00:00.261494. h x w >= 2**29: False
read_region (h=3862 x w=4022 = 15,532,964) took 0:00:01.041433. h x w >= 2**29: False
read_region (h=32451 x w=23782 = 771,749,682) took 0:00:57.984822. h x w >= 2**29: True

lunasdejavu · 2019-05-16T07:54:14Z

@sbelharbi
I modified your code a little:
for openslide_python_fix.py:

from openslide.lowlevel import *
from openslide.lowlevel import _convert



def _load_image_lessthan_2_29(buf, size):
    '''buf must be a mutable buffer.'''
    _convert.argb2rgba(buf)
    return PIL.Image.frombuffer('RGBA', size, buf, 'raw', 'RGBA', 0, 1)


def _load_image_morethan_2_29(buf, size):
    '''buf must be a buffer.'''

    # Load entire buffer at once if possible
    MAX_PIXELS_PER_LOAD = (1 << 29) - 1
    # Otherwise, use chunks smaller than the maximum to reduce memory
    # requirements
    PIXELS_PER_LOAD = 1 << 26

    def do_load(buf, size):
        '''buf can be a string, but should be a ctypes buffer to avoid an
        extra copy in the caller.'''
        # First reorder the bytes in a pixel from native-endian aRGB to
        # big-endian RGBa to work around limitations in RGBa loader
        rawmode = (sys.byteorder == 'little') and 'BGRA' or 'ARGB'
        buf = PIL.Image.frombuffer('RGBA', size, buf, 'raw', rawmode, 0, 1)
        # Image.tobytes() is named tostring() in Pillow 1.x and PIL
        buf = (getattr(buf, 'tobytes', None) or buf.tostring)()
        # Now load the image as RGBA, undoing premultiplication
        return PIL.Image.frombuffer('RGBA', size, buf, 'raw', 'RGBa', 0, 1)

    # Fast path for small buffers
    w, h = size
    if w * h <= MAX_PIXELS_PER_LOAD:
        return do_load(buf, size)

    # Load in chunks to avoid OverflowError in PIL.Image.frombuffer()
    # https://github.com/python-pillow/Pillow/issues/1475
    if w > PIXELS_PER_LOAD:
        # We could support this, but it seems like overkill
        raise ValueError('Width %d is too large (maximum %d)' %
                         (w, PIXELS_PER_LOAD))
    rows_per_load = PIXELS_PER_LOAD // w
    img = PIL.Image.new('RGBA', (w, h))
    for y in range(0, h, rows_per_load):
        rows = min(h - y, rows_per_load)
        if sys.version[0] == '2':
            chunk = buffer(buf, 4 * y * w, 4 * rows * w)
        else:
            # PIL.Image.frombuffer() won't take a memoryview or
            # bytearray, so we can't avoid copying
            chunk = memoryview(buf)[y * w:(y + rows) * w].tobytes()
        img.paste(do_load(chunk, (w, rows)), (0, y))
    return img

my code:

import openslide
import matplotlib.pyplot as plt
import numpy as np
from openslide_python_fix import _load_image_lessthan_2_29, _load_image_morethan_2_29
def func_read_patch(slide, h, w):
    # Check which _load_image() function to use depending on the size of the region.
    if (h * w) >= 2**29:
        openslide.lowlevel._load_image = _load_image_morethan_2_29
    else:
        openslide.lowlevel._load_image = _load_image_lessthan_2_29


    region = slide.read_region((0,0), 2, (w, h)).convert('RGB')

def main():
    slide = openslide.OpenSlide('D:/breast_cancer_dataset/HER2 contest/testing/05_HER2.ndpi')  # read the image

    downsamples=slide.level_downsamples 
    [w, h] = slide.level_dimensions[0] 
    size1 = int(w*(downsamples[0]/downsamples[2]))
    size2 = int(h*(downsamples[0]/downsamples[2]))
    func_read_patch(slide, h, w)
if __name__ == '__main__':
    main()

but it still showed

Traceback (most recent call last):
  File "C:/Users/willy_sung/Documents/openslide_test.py", line 33, in <module>
    main()
  File "C:/Users/willy_sung/Documents/openslide_test.py", line 23, in main
    func_read_patch(slide, h, w)
  File "C:/Users/willy_sung/Documents/openslide_test.py", line 13, in func_read_patch
    region = slide.read_region((0,0), 2, (w, h)).convert('RGB')
  File "C:\Users\willy_sung\AppData\Local\Continuum\anaconda3\lib\site-packages\openslide\__init__.py", line 223, in read_region
    level, size[0], size[1])
  File "C:\Users\willy_sung\AppData\Local\Continuum\anaconda3\lib\site-packages\openslide\lowlevel.py", line 258, in read_region
    buf = (w * h * c_uint32)()
MemoryError

It seems that it didn't call the function from openslide_python_fix.py.
What mistake did I make?

sbelharbi · 2019-05-17T02:57:11Z

@lunasdejavu

Can you highlight the difference between your code and mine? If I understand correctly, you changed func_read_patch(slide, h, w) to accept slide, h, and w, and you read at level 2 instead of 0.
If I understand correctly, [w, h] = slide.level_dimensions[0] returns the width and height of level 0. However, you are calling read_region at level 2 in region = slide.read_region((0,0), 2, (w, h)).convert('RGB'). These have to be consistent: grab the dimensions of the level you request.
I am not sure what size1 and size2 are doing. Maybe you forgot to pass them, as the downsized dimensions, to func_read_patch(slide, h, w) instead of h and w. You can use slide.level_dimensions[2] directly to get the dimensions of level 2.
The fix above is supposed to work as long as the region fits in memory. If it does not fit, the fix can do nothing about it. Usually, we read patches. In my code, w_rec and h_rec are the size of a rectangle within the WSI. In your code, you are trying to load an entire WSI at level 0 into memory! That's extreme, unless you have a very large memory. If you want to see the entire WSI, you can use a higher level (6, for instance). If you want high-resolution regions, use level 0, but then you can only load small regions. The size of the regions depends on your available memory. If you process the regions sequentially, remember to free the memory after processing each region.
Check which function is called in func_read_patch by putting a print() in each branch of the if/else statement, for instance. You can add a return region to the function as well. I think it called the right function. However, you probably asked too much of your machine (loading an entire WSI into memory, and it did not like it).

Let me know how it goes. Thanks!

lunasdejavu · 2019-05-17T03:16:12Z

@sbelharbi I fixed the problem by modifying lowlevel.py from here
which you have commented before.

sbelharbi · 2019-05-17T03:23:43Z

@lunasdejavu which parts did I comment on? If I remember correctly, I copy-pasted the content of lowlevel.py. Make sure that you are comparing the same versions. Can you elaborate on what modifications you made? Thanks!

lunasdejavu · 2019-05-17T03:29:49Z

I just changed the code from line 60 to 110, which you commented on in another thread before.

sbelharbi · 2019-05-17T03:35:25Z

The merge you linked is not mine.
Not sure what made you think that I have edited lowlevel.py.

markemus · 2019-05-17T15:39:58Z

@lunasdejavu It looks like you got a memory error trying to initialize the buffer for the image. Keep in mind that WSI images in memory decompress to be much larger than they are on disk (they're very sparse so they compress well). You can use my PR if you don't mind messing with your openslide install, or @sbelharbi 's code if you prefer not to. Either approach should work. Regardless, I'm glad you were able to resolve the problem.
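
For a sense of scale, the buffer that failed to allocate in the traceback above holds one c_uint32 (4 bytes) per pixel, so its size follows directly from the requested region. A minimal sketch of that estimate; `region_buffer_bytes` is a hypothetical helper, and since the dimensions of the slide in that traceback were not posted, the level-0 size of CMU-1.ndpi from the original report is used instead:

```python
BYTES_PER_PIXEL = 4  # one c_uint32 per pixel, as in buf = (w * h * c_uint32)()


def region_buffer_bytes(w, h):
    """Bytes needed for the raw pixel buffer of a w x h region."""
    return w * h * BYTES_PER_PIXEL


def to_gib(num_bytes):
    """Convert a byte count to GiB."""
    return num_bytes / 2**30


needed = region_buffer_bytes(51200, 38144)
print(needed, "bytes, about", round(to_gib(needed), 2), "GiB")
```

That is only the raw buffer; the PIL image built from it and any later .convert('RGB') copy would need additional memory on top.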

Samyssmile · 2019-05-18T22:22:11Z

We have the exact same problem. Are there any workarounds for this?

sbelharbi · 2019-05-18T22:36:14Z

Whether you can load an entire WSI into memory (especially at high resolution, i.e., level 0) depends on how much memory you have. I don't think that OpenSlide can do anything about it. Your options are either to load the entire WSI at low resolution, or to split the WSI into large patches and process them sequentially. As @markemus mentioned, when decompressed into memory, a WSI takes a lot of space (far more than its size on disk).

JiancongWang · 2019-08-02T19:15:27Z

Updating Pillow to 6.2.0 dev solved this for me.

ShanQiong · 2019-08-20T00:21:54Z

Updating Pillow to 6.2.0 dev solved this for me.

How? With pip3 install Pillow, the newest version is 6.1.0.

Older versions of Pillow raise "MemoryError: Integer overflow in ysize" when Image.frombuffer() is called on large buffers. The test is unconditionally disabled because of its RAM usage, so this is mostly documentation. See #31. Closes #33.

bgilbert · 2020-09-13T18:41:08Z

Thanks for the report. This is fixed by python-pillow/Pillow#3964, which is in Pillow ≥ 6.2.0. I'll close.
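
For anyone checking whether their environment already has the fix, here is a minimal sketch of a version comparison against 6.2.0. It works on a plain version string so that it runs without Pillow installed; in practice the string would come from PIL.__version__. `parse_version` and `has_large_buffer_fix` are hypothetical helper names:

```python
import re

FIXED_IN = (6, 2, 0)  # Pillow release named in the comment above


def parse_version(text):
    """Turn '6.2.0' or '6.2.0.dev0' into a tuple of leading integers."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break  # stop at non-numeric parts such as 'dev0'
        parts.append(int(match.group()))
    # Pad so that '6.2' compares equal to '6.2.0'.
    while len(parts) < len(FIXED_IN):
        parts.append(0)
    return tuple(parts)


def has_large_buffer_fix(version_text):
    """True if the version is at least the release that contains the fix."""
    return parse_version(version_text) >= FIXED_IN


for v in ["5.1.0", "5.4.1", "6.1.0", "6.2.0", "7.0.0"]:
    print(v, has_large_buffer_fix(v))
```

Development builds such as 6.2.0.dev0 are treated as 6.2.0 here, which is an assumption; whether a given dev build contains the fix depends on when it was built.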

rcasero mentioned this issue Apr 9, 2018

OverflowError in PIL.Image.fromarray python-pillow/Pillow#1475

Closed

markemus mentioned this issue Dec 6, 2018

Fix for MemoryError when reading large WSI into memory #45

Closed

bgilbert mentioned this issue Sep 13, 2020

Use more standard __version__ rather than PILLOW_VERSION #33

Closed

bgilbert closed this as completed Sep 13, 2020

bgilbert added the defect label Sep 13, 2020
