Writing your own image plugin documentation missing information #4818

altimore · 2020-07-27T15:47:44Z

Hello,

First, thank you guys for the work you put in.
I hope this is the good place to ask.

Today i got to work on files in an exotic format generated by an old RIP software.

I checked the doc and it seems there is exactly what i need in the handbook chapter called "Writing your own image plugin".

However, the documentation is not explaining everything so I'm left with questions and i hope i could get some pointers here :

What should be the recommended way of loading the new plugin ? Should we clone the pillow repo and copy it in the PIL folder ? or should we call it in some way to add it in the register before calling Image.open() ?
the decoders documentation specify that we should set the data to the PIL's internal pixel layout when writing our own, I tried to check in the source code and it seems that the function set_as_raw from the PyDecoder class is the way to go but it doesn't describe the layout itself nor the way to fill it.

What did you do?

Search the documentation, and browse the source code to try to find the information.

What did you expect to happen?

It would be nice if the page/example could provide a bit more details.

What actually happened?

What are your OS, Python and Pillow versions?

OS: Arch Linux
Python: 3.8.3
Pillow: 7.2.0

The text was updated successfully, but these errors were encountered:

hugovk · 2020-07-27T16:25:54Z

I can quickly answer the first question:

What should be the recommended way of loading the new plugin ? Should we clone the pillow repo and copy it in the PIL folder ? or should we call it in some way to add it in the register before calling Image.open() ?

The second way is better. For example, with SpamImagePlugin.py in my directory, I can import it in my code, and it'll register the plugin and extension:

1.py:

from PIL import Image
import SpamImagePlugin

with Image.open("1.spam") as im:
    print(im)
    print(im.format)
    print(im.format_description)

1.spam:

SPAM 1 1 1 1

$ python 1.py
<SpamImagePlugin.SpamImageFile image mode=1 size=1x1 at 0x1023D5E50>
SPAM
Spam raster image

radarhere · 2020-08-10T11:56:09Z

Are you writing a decoder in C, or in Python?

You've referred to https://pillow.readthedocs.io/en/stable/handbook/writing-your-own-file-decoder.html#the-raw-decoder

The raw mode field is used to determine how the data should be unpacked to match PIL’s internal pixel layout.

I don't think that's the same as saying that you bear the responsibility of mapping your data to PIL's internal pixel layout.

The 'raw mode' is how the image data is stored in the file. Pillow converts this into a given 'mode', standardising things a bit. If you look at the list of unpackers, you can see that for 32 bits, "LA;16B", "RGBA", "RGBa", "BGRa", "RGBA;I", "RGBA;L", "BGRA", "ARGB", "ABGR", "YCCA;P" are all converted into "RGBA". Much easier to deal with in Pillow's conversation operations.

However, it is worth considering that the documentation also states

Note that for the most common cases, the raw mode is simply the same as the mode.

Do you think that your mode isn't covered by anything already in Pillow?

altimore · 2020-08-10T12:48:11Z

Hello,

Thanks for answering,
@hugovk : your method works i managed to load the decoder i made by copying the ones in the PIL folder and adapting a bit.

@radarhere : I'm trying in Python, I'm not familiar enough with C.

From what i gathered

it seems that the raw decoder loop through the sequences in the source file to put it in the memory 4 bytes by 4 bytes in the CMYK mode.
I understood that the file store it in a format and pillow have another internal representation in memory.
Pillow in memory storage is a sequence of data pixel by pixel which we can display with the Image.tobytes() function.
The in memory representation of CMYK is a 4 bytes sequence of these four colors repeating for the length of the file.

The format I'm working on does not correspond to the ones listed in the pillow mode list, it is represented as a line by line sequence with colors separated in a different order as follow :

Lets take an image of 100 pixel width and 50 pixels height (or 50 lines)
Each sequence corresponding to a line of the picture is a

100 bytes containing the yellow of the line
100 bytes containing the magenta,
100 bytes containing the cyan
100 byte containing the black

So i think i need to loop by 400 bytes and group it by pixel (CMYK) to put it in pillow in memory image object or to feed it as a file to the raw decoder.

I did the mapping and i can print it to the console but i didn't yet figure how to feed it to pillow.

When i use the Image.new() followed by Image.putdata() instructions in the decoder the Image.tobytes() always returns a sequence of zeros as if i didn't use the putdata() one.
I started to dig in the Image.tobytes() code which seems to call an encoding function from the raw encoder but didn't finish yet.

I'll let you know if i find a way.

radarhere · 2020-08-14T22:33:56Z

That actually sounds like "CMYK;L", except with a different order - YMCK;L. Maybe this is too simple for your situation because I don't have the specifics, but I think you could just write a plugin loading CMYK;L and then swap the channels.
Not sure what the way forward to resolving this issue is. Did you want to share the format and a sample image (ideally with a standard format version of the same picture for comparison), or do you have enough for now?

Piolie · 2020-11-10T23:58:08Z

Hi, I have a similar situation so I'm posting here, guessing opening another issue would be redundant.

I'm trying to write my own Python plugin to work with a somewhat exotic image format and also found the documentation lacking. The format details are not relevant.

Following the SpamImagePlugin.py example and also taking a look at PpmImagePlugin.py I managed to write a working _open function, register it, etc. Nonetheless, there is not much in the docs as to how to implement the decoder. (A mock decoder for SpamImagePlugin.py would be a nice addition to the documentation.)

For that, I found a related PR that decodes ASCII/Plain PPM and so was able to get a basic decoder for my format working. However, there are some things I don't understand:

_pulls_fd is set to True. Apparently, this EXPERIMENTAL feature overrides Pillow's behavior of using a bytes parameter (buffer). What is the purpose of this variable?
It uses Image.putdata to load the decoded information, but I'm not sure this is how it's supposed to be done. Shouldn't either frombytes method or function be preferred?

Suppose inside decoder.decode I process buffer and save the results in data. data is a bytes (or bytearray) variable with one byte per pixel (either b"\x00" for black or b"\xFF" for white in bitonal mode) or three bytes per pixel (for RGB mode). How do I save this decoded info to the image object? I tried:
- self.im.frombytes(mode, size, data): fails with exception AttributeError: 'ImagingCore' object has no attribute 'frombytes'.
- self.im = Image.frombytes(mode, size, data): does nothing.
- self.im.putdata(data): works, but should I use it?

Documentation on writing your own encoder would also be nice. PpmImagePlugin.py simply delegates to ImageFile._save() so it's not much help. The PixelAccess class states in a note that "If you are looping over all of the pixels in an image, there is likely a faster way using other parts of the Pillow API". Sounds like an encoder might want to do just that. A hint as to which other parts should be used would help.

I have trouble understanding the intended use of the tile attribute. Documenting it in the concepts doc would be great.

I may well be that I am underqualified to tackle this. Either way, any help would be appreciated.

nulano · 2020-11-11T02:16:46Z

I believe docs/example/DdsImagePlugin.py should answer most of your questions. This file is currently only accessible through this repo and should probably be added to the docs.

Piolie · 2020-11-11T04:23:31Z

Thank you for your swift reply. I agree that example should totally be mentioned in the docs.

I'm having some issues setting the rawmode for the raw decoder. The docs state several different modes, none of which conform to my data, which is 1 byte per pixel. I tried with "L" but got ValueError: unknown raw mode.

Diving a bit deeper into the Unpack.c sources I found many other complementary modes. In the bitonal case, using self.set_as_raw(bytes(data), rawmode="1;8") gets the correct results. However, several other modes declared there produce the prior exception. What is happening here? I'm running Pillow 8.0.1 with Python 3.9.0 on Windows 10.

nulano · 2020-11-11T05:08:45Z

The raw decoder takes two parameters: the image mode and the raw mode (specified in the first and second columns of the Unpack.c table) as seen below:

Pillow/src/PIL/ImageFile.py

Lines 688 to 690 in 2d6e51e

    
           if not rawmode: 
        
               rawmode = self.mode 
        
           d = Image._getdecoder(self.mode, "raw", (rawmode))

none of which conform to my data, which is 1 byte per pixel

Do you mean 1 bit per pixel or 1 byte per pixel? Since rawmode="1;8" worked for you, it sounds like the image has mode="1" which is 1 bit per pixel; rawmode="L" corresponds to 1 byte per pixel, or image mode="L".

The message could be a bit clearer, perhaps ValueError: unknown raw mode for the given image mode.

Piolie · 2020-11-11T15:08:26Z

I decoded my (bitonal) image to one byte per pixel. It's just a bytearray of b"\x00" and b"\xFF". The docs sate that for image mode 1 it's 1-bit pixels, black and white, stored with one pixel per byte. So I should stick to "1", right?

I seem to have mixed up the image and raw modes a bit. Maybe that distinction should be made more explicit in the image mode and rawmode docs.

Upon further experimentation, I found that either

self.mode = "1" and rawmode = "1;8", or
self.mode = "L" and rawmode = "L"

yield more or less the same result. The first option considers b"\x00" as black and anything else as white. The second is grayscale, so any value between 0 and 255 works.

However, mixing inconsistent modes (ie: self.mode = "1" and rawmode = "L") raises the exception. A message such as the one you suggest would make it easier to spot the inconsistency.

Thank you very much for your help. I think I've got it now. Should I open a PR with my suggested modifications to the documentation?

Edit: I misunderstood a part of your comment.

nulano · 2020-11-11T15:22:42Z

Yes, it sounds like your image is 1 bit per pixel stored in individual bytes, which is rawmode="1;8".

The one pixel per byte comment for image mode="1" refers to the loaded representation in the internal Pillow object. There are multiple corresponding rawmodes for this mode, with the default rawmode="1" having 8 pixels per byte, differing from the internal representation.

radarhere · 2021-04-30T09:46:07Z

Let me try and round up the requests for better documentation

How to import a new plugin? I've documented this in Improved example plugin documentation #4847
How to use PyDecoder? See Writing your own image plugin documentation missing information #4818 (comment)
Why is docs/example/DdsImagePlugin not linked to? Linked in Improved codec documentation #6094
What the purpose of _pulls_fd? Improved codec documentation #6094 has removed the "Experimental" label and improved documentation for this.
How to use Image.putdata? Why not frombytes? Discussed in Writing your own image plugin documentation missing information #4818 (comment)
Documentation on writing your own encoder in C (in Python can be part of PyEncoder doesn't exist #4059) Done in Improved codec documentation #6094
What are the faster ways in "If you are looping over all of the pixels in an image, there is likely a faster way using other parts of the Pillow API" This has been documented in Document alternatives to accessing individual pixels #6099

I have trouble understanding the intended use of the tile attribute. Documenting it in the concepts doc would be great.

It seems to be documented well enough at in Writing Your Own Image Plugin - which is exactly what you were doing when you needed this information?

There is also a request for an error message to be clearer. I've created #5457 for that.

Piolie · 2021-05-03T06:29:56Z

How to use PyDecoder?
Why is docs/example/DdsImagePlugin not linked to?

Perhaps the SpamImagePlugin example could be extended to decode a simple format. But the reference to DdsImagePlugin set me in the right direction, so maybe it would be enough.

What the purpose of _pulls_fd?

Yes. Playing around I managed to get a decoder working with _pulls_fd = False. However, I found it a bit harder to implement (having to keep the state between calls) and don't see what the need/gains of doing it this way would be. Besides, the docs state it's experimental but, as far as I can see, it's being used by MspImagePlugin.py, SgiImagePlugin and BlpImagePlugin. Aren't these in production?

How to use Image.putdata? Why not frombytes?

Yes.

Documentation on writing your own encoder in C (in Python can be part of #4059)

This would be a very nice perk. Although by the looks of it it's gonna need lots of writing to be useful.

What are the faster ways in "If you are looping over all of the pixels in an image, there is likely a faster way using other parts of the Pillow API"?

Also yes.

It seems to be documented well enough at in Writing Your Own Image Plugin - which is exactly what you were doing when you needed this information?

At the moment I wrote that I was asking myself, why a list of descriptors? when would I need more than one? I was obviously not seeing the big picture, but the docs could at least make a case for the necessity of the list and refer to a format implementation that makes use of multiple tiles.

I would add:

What is the purpose of the cleanup methods in the Python encoder/decoder? (edit from @radarhere: done in Improved codec documentation #6094)

Thanks for following up on this issue.

radarhere · 2022-02-27T04:42:20Z

When i use the Image.new() followed by Image.putdata() instructions in the decoder the Image.tobytes() always returns a sequence of zeros as if i didn't use the putdata() one.

This is hard to debug without a specific example. This following code does not return zeros.

from PIL import Image
im = Image.new("L", (256, 256))
im.putdata(list(range(256)) * 256)
print(im.tobytes())

It uses Image.putdata to load the decoded information, but I'm not sure this is how it's supposed to be done. Shouldn't either frombytes method or function be preferred?

Suppose inside decoder.decode I process buffer and save the results in data. data is a bytes (or bytearray) variable with one byte per pixel (either b"\x00" for black or b"\xFF" for white in bitonal mode) or three bytes per pixel (for RGB mode). How do I save this decoded info to the image object? I tried: - self.im.frombytes(mode, size, data): fails with exception AttributeError: 'ImagingCore' object has no attribute 'frombytes'. - self.im = Image.frombytes(mode, size, data): does nothing. - self.im.putdata(data): works, but should I use it?

self.im.frombytes is trying to call frombytes on the image object in the C layer, which as the error says, doesn't exist.
self.im = Image.frombytes(mode, size, data) is trying to assign a Python object (the result of Image.frombytes() to a C object (self.im)

I don't quite see why there is confusion between putdata and frombytes. putdata takes a sequence, and frombytes takes bytes to be decoded, so their purpose is different. Yes, when you are using the "raw" decoder, those two things are similar.

If I'm following correctly, then you settled on using set_as_raw in your PR. As used in the example DdsImagePlugin, I think that is the correct choice, for this reason - it is using the Python API. BLP, DDS and MSP all use this. The only PyDecoder example we have that does not is SGI.
The C API is not intended to be stable. self.im.putdata or trying to assign to self.im are both making use of the C API.

I've created PR #6094 to document this.

radarhere · 2022-02-28T01:24:21Z

_pulls_fd is set to True. Apparently, this EXPERIMENTAL feature overrides Pillow's behavior of using a bytes parameter (buffer). What is the purpose of this variable?

I've added a comment in #6094 about this providing more freedom, but also possibly leading to more memory usage, depending on how your codec is written.

Besides, the docs state it's experimental but, as far as I can see, it's being used by MspImagePlugin.py, SgiImagePlugin and BlpImagePlugin. Aren't these in production?

I've pushed a commit to #6094 to remove the experimental label.

I've also pushed commits to #6094 adding a note using cleanup(), linking to DdsImagePlugin.py and documenting writing your own encoder in C.

It seems to be documented well enough at in Writing Your Own Image Plugin - which is exactly what you were doing when you needed this information?

At the moment I wrote that I was asking myself, why a list of descriptors? when would I need more than one? I was obviously not seeing the big picture, but the docs could at least make a case for the necessity of the list and refer to a format implementation that makes use of multiple tiles.

I've pushed a commit to #6094 linking to PsdImagePlugin, which uses multiple tiles. I think the best answer to your question might be that multiple tiles are helpful when working within the framework of our unpackers. Multiple tiles would have appeared particularly unhelpful to you because you were also thinking about writing your own decoder, not just a plugin.

I've created #6099 to talk about alternatives to accessing individual pixels.

radarhere · 2022-03-12T07:31:00Z

How to use PyDecoder?
Why is docs/example/DdsImagePlugin not linked to?

Perhaps the SpamImagePlugin example could be extended to decode a simple format. But the reference to DdsImagePlugin set me in the right direction, so maybe it would be enough.

SpamImagePlugin uses the "raw" decoder, and I think there is value in having a simple plugin example. The user may not need to write a custom decoder.

DdsImagePlugin is now linked to, and so is BlpImagePlugin. There is also potentially another example in the Pillow codebase with #6102.

The documentation is improved now. If there are any suggestions about how to improve it further, let us know.

radarhere added the Documentation label Jul 28, 2020

radarhere changed the title ~~Writing your own image plugin documentation missing informations~~ Writing your own image plugin documentation missing information Aug 6, 2020

radarhere mentioned this issue Aug 10, 2020

Improved example plugin documentation #4847

Merged

radarhere mentioned this issue Apr 30, 2021

Clarified error message #5457

Merged

radarhere added Needs Documentation and removed Documentation labels Feb 27, 2022

radarhere mentioned this issue Feb 27, 2022

Improved codec documentation #6094

Merged

radarhere mentioned this issue Mar 1, 2022

Document alternatives to accessing individual pixels #6099

Merged

radarhere closed this as completed Mar 12, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Writing your own image plugin documentation missing information #4818

Writing your own image plugin documentation missing information #4818

altimore commented Jul 27, 2020 •

edited by hugovk

hugovk commented Jul 27, 2020

radarhere commented Aug 10, 2020

altimore commented Aug 10, 2020

radarhere commented Aug 14, 2020

Piolie commented Nov 10, 2020 •

edited by radarhere

nulano commented Nov 11, 2020

Piolie commented Nov 11, 2020

nulano commented Nov 11, 2020

Piolie commented Nov 11, 2020 •

edited

nulano commented Nov 11, 2020

radarhere commented Apr 30, 2021 •

edited

Piolie commented May 3, 2021 •

edited by radarhere

radarhere commented Feb 27, 2022

radarhere commented Feb 28, 2022 •

edited

radarhere commented Mar 12, 2022

Writing your own image plugin documentation missing information #4818

Writing your own image plugin documentation missing information #4818

Comments

altimore commented Jul 27, 2020 • edited by hugovk

What did you do?

What did you expect to happen?

What actually happened?

What are your OS, Python and Pillow versions?

hugovk commented Jul 27, 2020

radarhere commented Aug 10, 2020

altimore commented Aug 10, 2020

radarhere commented Aug 14, 2020

Piolie commented Nov 10, 2020 • edited by radarhere

nulano commented Nov 11, 2020

Piolie commented Nov 11, 2020

nulano commented Nov 11, 2020

Piolie commented Nov 11, 2020 • edited

nulano commented Nov 11, 2020

radarhere commented Apr 30, 2021 • edited

Piolie commented May 3, 2021 • edited by radarhere

radarhere commented Feb 27, 2022

radarhere commented Feb 28, 2022 • edited

radarhere commented Mar 12, 2022

altimore commented Jul 27, 2020 •

edited by hugovk

Piolie commented Nov 10, 2020 •

edited by radarhere

Piolie commented Nov 11, 2020 •

edited

radarhere commented Apr 30, 2021 •

edited

Piolie commented May 3, 2021 •

edited by radarhere

radarhere commented Feb 28, 2022 •

edited