Fail to read CEL file #4639

Open
pennyliangliping opened this issue Feb 28, 2024 · 5 comments

@pennyliangliping

pennyliangliping commented Feb 28, 2024

Setup

I am reporting a problem with Biopython 1.83, Python 3.11, and macOS.
Please see the attached test file.

from Bio.Affy import CelFile

with open("GSM1389608_SS_1.CEL","r") as handle:
    c = CelFile.read(handle)

print(c.ncols, c.nrows)
print(c.intensities)

This raises: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc7 in position 9: invalid continuation byte

from Bio.Affy import CelFile

with open("GSM1389608_SS_1.CEL","rb") as handle:
    c = CelFile.read(handle)

print(c.ncols, c.nrows)
print(c.intensities)

This raises: ValueError: Incorrect magic number in Affy Version 4 CEL file

Expected behaviour

The file is read successfully.

Actual behaviour

Both attempts fail with the error messages shown above.

Steps to reproduce

GSM1389608_SS_1.CEL.gz

@peterjc
Member

peterjc commented Feb 28, 2024

Do you know what kind (version) of CEL file this is? This could be a duplicate of #3720.

@peterjc
Member

peterjc commented Feb 28, 2024

Our example CEL v4 file starts:

$ hexdump -C affy_v4_example.CEL | head
00000000  40 00 00 00 04 00 00 00  05 00 00 00 05 00 00 00  |@...............|
00000010  19 00 00 00 f0 02 00 00  47 72 69 64 43 6f 72 6e  |........GridCorn|
00000020  65 72 4c 4c 3d 35 31 38  20 31 38 36 36 38 0a 54  |erLL=518 18668.T|
00000030  6f 74 61 6c 58 3d 32 35  36 30 0a 54 6f 74 61 6c  |otalX=2560.Total|
00000040  59 3d 32 35 36 30 0a 41  6c 67 6f 72 69 74 68 6d  |Y=2560.Algorithm|
00000050  3d 50 65 72 63 65 6e 74  69 6c 65 0a 47 72 69 64  |=Percentile.Grid|
00000060  43 6f 72 6e 65 72 55 4c  3d 36 35 39 20 34 36 39  |CornerUL=659 469|
00000070  0a 4f 66 66 73 65 74 59  3d 30 0a 4f 66 66 73 65  |.OffsetY=0.Offse|
00000080  74 58 3d 30 0a 43 6f 6c  73 3d 35 0a 47 72 69 64  |tX=0.Cols=5.Grid|
00000090  43 6f 72 6e 65 72 4c 52  3d 31 38 38 30 30 20 31  |CornerLR=18800 1|

Here hex 0x40 = 64 in decimal.
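
As a rough illustration of that check, the first 24 bytes of the dump above can be unpacked as six little-endian 32-bit integers. This is only a sketch: the field names are an assumption based on the usual description of the version 4 layout, not names taken from Biopython.

```python
import struct

# First 24 bytes of affy_v4_example.CEL, copied from the hex dump above.
header = bytes.fromhex(
    "40000000 04000000 05000000 05000000"
    "19000000 f0020000"
)

# Six little-endian 32-bit integers. The field names are assumptions
# (magic number, version, columns, rows, cell count, header text length).
magic, version, ncols, nrows, ncells, header_len = struct.unpack("<6i", header)

print(magic, version, ncols, nrows, ncells, header_len)
# prints: 64 4 5 5 25 752
```

The column count of 5 agrees with the `Cols=5` entry visible in the header text of the dump.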

Your file starts differently:

$ hexdump -C GSM1389608_SS_1.CEL  | head
00000000  3b 01 00 00 00 01 00 00  3d c7 00 00 00 1b 61 66  |;.......=.....af|
00000010  66 79 6d 65 74 72 69 78  2d 63 61 6c 76 69 6e 2d  |fymetrix-calvin-|
00000020  69 6e 74 65 6e 73 69 74  79 00 00 00 36 30 30 30  |intensity...6000|
00000030  30 30 33 30 38 31 37 2d  31 33 34 30 33 33 37 37  |0030817-13403377|
00000040  30 36 2d 30 30 30 30 30  30 36 33 33 34 2d 30 30  |06-0000006334-00|
00000050  30 30 30 31 38 34 36 37  2d 30 30 30 30 30 30 30  |00018467-0000000|
00000060  30 34 31 00 00 00 00 00  00 00 05 00 65 00 6e 00  |041.........e.n.|
00000070  2d 00 55 00 53 00 00 00  2b 00 00 00 19 00 61 00  |-.U.S...+.....a.|
00000080  66 00 66 00 79 00 6d 00  65 00 74 00 72 00 69 00  |f.f.y.m.e.t.r.i.|
00000090  78 00 2d 00 61 00 6c 00  67 00 6f 00 72 00 69 00  |x.-.a.l.g.o.r.i.|
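
The first 16 bytes of this dump appear to account for both errors in the original report. The sketch below uses only those bytes, copied from the dump, and plain Python; it does not call Biopython, and the statement that the version 4 reader wants the value 64 is inferred from the error message and the example file above.

```python
# First 16 bytes of GSM1389608_SS_1.CEL, copied from the hex dump above.
start = bytes.fromhex("3b01000000010000 3dc70000001b6166")

# Opening the file in text mode decodes the bytes as UTF-8, which fails
# on binary data like this.
try:
    start.decode("utf-8")
    error = None
except UnicodeDecodeError as err:
    error = err

if error is not None:
    print(error.start, hex(start[error.start]), error.reason)
    # prints: 9 0xc7 invalid continuation byte

# In binary mode, the first four bytes read as a little-endian integer
# are not 64, the value seen in the version 4 example file.
first_word = int.from_bytes(start[:4], "little")
print(first_word)
# prints: 315
```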

@pennyliangliping
Author

Thank you for your quick reply.
The file was downloaded from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE40568
I am not sure about the version, but I think it is V4, since I can't read the header in a text editor.

So you mean the file format is not valid? Is there anything I can try for this format?

@peterjc
Member

peterjc commented Feb 28, 2024

It doesn't look like what we expect from a V4 CEL file, so I think it is another variant. I've not worked with these; you might be able to interpret the Affymetrix high-level documentation more easily than me, or know some relevant search terms to try.
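
For anyone who wants to explore this variant, here is a minimal sketch that decodes the leading fields of the reported file from the bytes in the hex dump earlier in this thread. It assumes the file follows the Affymetrix Command Console ("Calvin") generic data file layout: a magic byte of 59, a version byte, then big-endian integers and length-prefixed strings. The function name, the dictionary keys, and the field order are assumptions from one reading of that documentation, and nothing here is part of Biopython.

```python
import struct

# First 128 bytes of GSM1389608_SS_1.CEL, copied from the hex dump.
data = bytes.fromhex(
    "3b01000000010000 3dc70000001b6166"
    "66796d6574726978 2d63616c76696e2d"
    "696e74656e736974 7900000036303030"
    "303033303831372d 3133343033333737"
    "30362d3030303030 30363333342d3030"
    "3030303138343637 2d30303030303030"
    "3034310000000000 0000050065006e00"
    "2d00550053000000 2b00000019006100"
)


def read_calvin_prefix(data: bytes) -> dict:
    """Decode the leading header fields (hypothetical helper, assumed layout)."""
    # Assumed layout: magic (1 byte), version (1 byte), number of data
    # groups (int32), offset of the first data group (uint32), big endian.
    magic, version, n_groups, first_group = struct.unpack_from(">BBiI", data, 0)
    if magic != 59:
        raise ValueError(f"unexpected magic byte {magic}, expected 59")
    pos = 10

    def read_string(encoding: str, width: int) -> str:
        # A string is assumed to be an int32 character count followed by
        # that many characters of `width` bytes each.
        nonlocal pos
        (length,) = struct.unpack_from(">i", data, pos)
        pos += 4
        raw = data[pos:pos + length * width]
        pos += length * width
        return raw.decode(encoding)

    fields = {
        "magic": magic,
        "version": version,
        "n_groups": n_groups,
        "first_group": first_group,
        "data_type": read_string("ascii", 1),
        "file_id": read_string("ascii", 1),
        "created": read_string("utf-16-be", 2),
        "locale": read_string("utf-16-be", 2),
    }
    (fields["n_params"],) = struct.unpack_from(">i", data, pos)
    return fields


fields = read_calvin_prefix(data)
for name, value in fields.items():
    print(f"{name}: {value!r}")
```

Under these assumptions the bytes decode to a data type of `affymetrix-calvin-intensity` and a locale of `en-US`, which matches the readable text in the dump, so "Command Console" and "Calvin" may be useful search terms.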

@pennyliangliping
Author

Thank you.
