Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IllegalCharacterError raised when exporting xlsx #370

Open
MohiniLimbodia opened this issue Jul 8, 2019 · 2 comments
Open

IllegalCharacterError raised when exporting xlsx #370

MohiniLimbodia opened this issue Jul 8, 2019 · 2 comments

Comments

@MohiniLimbodia
Copy link

MohiniLimbodia commented Jul 8, 2019

Using tablib version 0.13.0
The issue is with Control Characters and Surrogates

Steps to reproduce:

from tablib import Dataset
data=Dataset((u'\x1f',),)
data.export('xlsx')

Stack Trace

IllegalCharacterErrorTraceback (most recent call last)
<ipython-input-1-9b9694f739ff> in <module>()
      1 from tablib import Dataset
      2 data=Dataset((u'\x1f',),)
----> 3 data.export('xlsx')

/srv/jupyter/local/lib/python2.7/site-packages/tablib/core.pyc in export(self, format, **kwargs)
    466             raise UnsupportedFormat('Format {0} cannot be exported.'.format(format))
    467 
--> 468         return export_set(self, **kwargs)
    469 
    470     # -------

/srv/jupyter/local/lib/python2.7/site-packages/tablib/formats/_xlsx.pyc in export_set(dataset, freeze_panes)
     41     ws.title = dataset.title if dataset.title else 'Tablib Dataset'
     42 
---> 43     dset_sheet(dataset, ws, freeze_panes=freeze_panes)
     44 
     45     stream = BytesIO()

/srv/jupyter/local/lib/python2.7/site-packages/tablib/formats/_xlsx.pyc in dset_sheet(dataset, ws, freeze_panes)
    145                         cell.value = unicode('%s' % col, errors='ignore')
    146                 except TypeError:
--> 147                     cell.value = unicode(col)

/srv/jupyter/local/lib/python2.7/site-packages/openpyxl/cell/cell.pyc in value(self, value)
    292     def value(self, value):
    293         """Set the value and infer type and display options."""
--> 294         self._bind_value(value)
    295 
    296     @property

/srv/jupyter/local/lib/python2.7/site-packages/openpyxl/cell/cell.pyc in _bind_value(self, value)
    195 
    196         elif isinstance(value, STRING_TYPES):
--> 197             value = self.check_string(value)
    198             self.data_type = self.TYPE_STRING
    199             if len(value) > 1 and value.startswith("="):

/srv/jupyter/local/lib/python2.7/site-packages/openpyxl/cell/cell.pyc in check_string(self, value)
    158         value = value[:32767]
    159         if next(ILLEGAL_CHARACTERS_RE.finditer(value), None):
--> 160             raise IllegalCharacterError
    161         return value
    162 

IllegalCharacterError:
@leonardoarroyo
Copy link

leonardoarroyo commented Jul 15, 2019

Also getting this. Openpyxl detects illegal characters with the following regex:

ILLEGAL_CHARACTERS_RE = re.compile(r'[\000-\010]|[\013-\014]|[\016-\037]')

I'm using django-import-export, which in turn uses tablib, which uses openpyxl. Still trying to figure out who should handle the data cleaning.

@leonardoarroyo
Copy link

leonardoarroyo commented Jul 15, 2019

I fixed the issue by cleaning the data before it gets sent to tablib.

If anyone else is having this issue with django-import-export, you can clean your fields by overriding export_field on your resource.

from import_export import resources
from openpyxl.cell.cell import ILLEGAL_CHARACTERS_RE

class CleanModelResource(resources.ModelResource):
  def export_field(self, field, obj):
    v = super(CleanModelResource, self).export_field(field, obj)
    if type(v) == str:
      v = ILLEGAL_CHARACTERS_RE.sub('', v)
    return v

willtho89 added a commit to willtho89/drf-renderer-xlsx that referenced this issue Jun 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants