Support InvalidCharHandler for reading #145

Open
jhaber opened this issue Apr 7, 2022 · 3 comments
Labels
pr-welcome Issue for which progress most likely if someone submits a Pull Request

Comments

@jhaber

jhaber commented Apr 7, 2022

It looks like InvalidCharHandler can only be set when writing, not when reading. Do you think it makes sense to support this for reading as well? If a user is dealing with a Reader, they could do this sort of transformation pretty easily before passing the stream to Woodstox. However, that requires their code to know which characters are invalid (rather than having Woodstox be the source of truth for that). And if the user is dealing with an InputStream, they may not have an easy way to do character-based filtering/replacement.
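
For illustration, here is a minimal sketch of the Reader-side workaround described above. The class name `XmlCharFilterReader` and the replacement policy are hypothetical and not part of Woodstox. The validity check assumes the XML 1.0 `Char` production and, as a simplification, passes surrogate code units through without checking that they are correctly paired.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper, not part of Woodstox: replaces characters that are
// not allowed in XML 1.0 content before a parser ever sees them.
class XmlCharFilterReader extends FilterReader {
    private final char replacement;

    public XmlCharFilterReader(Reader in, char replacement) {
        super(in);
        this.replacement = replacement;
    }

    // XML 1.0 Char production applied per UTF-16 code unit.
    // Assumption: surrogates are passed through unchecked (pairing is not validated).
    static boolean isAllowed(char c) {
        return c == 0x9 || c == 0xA || c == 0xD
            || (c >= 0x20 && c <= 0xD7FF)
            || Character.isSurrogate(c)
            || (c >= 0xE000 && c <= 0xFFFD);
    }

    @Override
    public int read() throws IOException {
        int c = super.read();
        // -1 signals end of stream and must be returned unchanged
        return (c < 0 || isAllowed((char) c)) ? c : replacement;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        // when n is -1 the loop body never runs
        for (int i = off; i < off + n; i++) {
            if (!isAllowed(buf[i])) {
                buf[i] = replacement;
            }
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        Reader filtered = new XmlCharFilterReader(new StringReader("a\u0000b"), ' ');
        StringBuilder sb = new StringBuilder();
        for (int c; (c = filtered.read()) != -1; ) {
            sb.append((char) c);
        }
        System.out.println(sb); // prints "a b"
    }
}
```

The wrapped Reader can then be passed to the standard StAX `XMLInputFactory.createXMLStreamReader(Reader)`. Note that this duplicates the validity rules in user code, which is exactly the drawback raised above.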

@cowtowncoder
Member

I am a bit hesitant about trying to support a fully configurable approach, given the complexity of XML character validity rules. But maybe something that fully disables validity checks for, say, textual content would be OK: the user could then provide a custom InputStream (or, more likely, Reader) to implement whatever validation they want, and Woodstox would just take whatever it gets.
To me it seems that validation at the Reader level is probably much easier to layer on than trying to make the decoder make validation calls.

I probably won't have time to work on this on my own, either way.
But if anyone wants to create a PR that does not add measurable overhead for the default case, I'd of course be happy to help sanity-check it and help get it merged if and when it makes sense.

@jhaber
Author

jhaber commented Apr 7, 2022

Sounds good, thanks for the quick reply.

@cowtowncoder
Member

No problem.

Also, now that I think about this: aside from the question of performance, I don't think I am against InvalidCharHandler on a per-character basis, if someone has time to implement it (I don't, but I always do my best to find time to review contributions).
It should be possible to hide the complexity behind the error reporting functionality; I assume the return value could be the character to use, and so on.
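
To make the per-character idea concrete, here is a rough sketch of what such a callback could look like. Everything in it is an assumption for illustration: `ReadInvalidCharHandler`, `ReadHandlers`, and `check` are hypothetical names, not existing Woodstox API, and the validity test is simplified (surrogates are accepted without pair checking). The handler is only consulted on the error path, so the default case pays for nothing beyond the validity check itself.

```java
import java.io.IOException;

// Hypothetical read-side callback; not part of Woodstox.
interface ReadInvalidCharHandler {
    // Returns the character to use instead of the invalid one,
    // or throws to reject the input.
    char convertInvalidChar(int invalidChar) throws IOException;
}

final class ReadHandlers {
    private ReadHandlers() { }

    // Rejects any invalid character by throwing.
    static final ReadInvalidCharHandler FAILING = c -> {
        throw new IOException(String.format("Invalid XML character 0x%X", c));
    };

    // Substitutes a fixed replacement for every invalid character.
    static ReadInvalidCharHandler replacing(char replacement) {
        return c -> replacement;
    }

    // Sketch of a call site: the handler runs only when the check fails.
    // Assumption: simplified XML 1.0 check that lets surrogates through.
    static char check(char c, ReadInvalidCharHandler handler) throws IOException {
        boolean valid = c == 0x9 || c == 0xA || c == 0xD
            || (c >= 0x20 && c <= 0xFFFD);
        return valid ? c : handler.convertInvalidChar(c);
    }
}
```

With this shape, `ReadHandlers.check(c, ReadHandlers.replacing('?'))` would return `c` unchanged for valid input and `'?'` otherwise, while `FAILING` would keep strict behavior.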

@cowtowncoder cowtowncoder added the pr-welcome Issue for which progress most likely if someone submits a Pull Request label Jul 31, 2022