adopt save-and-restore pattern for libxml2 error handlers everywhere #2172

Open
flavorjones opened this issue Jan 13, 2021 · 0 comments

See #2168 and #2169 for details, but the short version is that we should be more rigorous about saving-and-restoring error handlers and error handler metadata around libxml2 calls, in case any are being made recursively within Nokogiri.

For example, these lines in Nokogiri::HTML::Document::EncodingReader are calling HTML::SAX::PushParser to parse a chunk from the IO read callback of a regular document parse:

handler = SAXHandler.new
parser = Nokogiri::HTML::SAX::PushParser.new(handler)
parser << chunk rescue Nokogiri::SyntaxError

To allow users to do similarly complex things, we should always save-and-restore the error callbacks (which are the only global state I can think of that we regularly manipulate).

We're doing this in the HTML::SAX::PushParser class to cover ourselves in the aforementioned case:

Nokogiri_structured_error_func_save_and_set(&handler_state, NULL, NULL);
status = htmlParseChunk(ctx, chunk, size, Qtrue == _last_chunk ? 1 : 0);
Nokogiri_structured_error_func_restore(&handler_state);

typedef struct _libxmlStructuredErrorHandlerState {
  void *user_data;
  xmlStructuredErrorFunc handler;
} libxmlStructuredErrorHandlerState ;

void init_xml_syntax_error();

void Nokogiri_structured_error_func_save(libxmlStructuredErrorHandlerState *handler_state);
void Nokogiri_structured_error_func_save_and_set(libxmlStructuredErrorHandlerState *handler_state,
                                                 void *user_data,
                                                 xmlStructuredErrorFunc handler);
void Nokogiri_structured_error_func_restore(libxmlStructuredErrorHandlerState *handler_state);

void
Nokogiri_structured_error_func_save(libxmlStructuredErrorHandlerState *handler_state)
{
  /* this method is tightly coupled to the implementation of xmlSetStructuredErrorFunc */
  handler_state->user_data = xmlStructuredErrorContext;
  handler_state->handler = xmlStructuredError;
}

void
Nokogiri_structured_error_func_save_and_set(libxmlStructuredErrorHandlerState *handler_state,
                                            void *user_data,
                                            xmlStructuredErrorFunc handler)
{
  Nokogiri_structured_error_func_save(handler_state);
  xmlSetStructuredErrorFunc(user_data, handler);
}

void
Nokogiri_structured_error_func_restore(libxmlStructuredErrorHandlerState *handler_state)
{
  xmlSetStructuredErrorFunc(handler_state->user_data, handler_state->handler);
}
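
To illustrate why the restore step matters when calls nest, here is a minimal, self-contained sketch. It does not use libxml2: `current_handler`, `current_user_data`, `set_error_func`, and the other names are hypothetical stand-ins for the library's global handler state and for the helpers above. It shows only the shape of the pattern, not Nokogiri's actual code paths.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for libxml2's global error-handler state. */
typedef void (*error_func)(void *user_data, const char *message);

static error_func current_handler = NULL;
static void *current_user_data = NULL;

/* Hypothetical stand-in for xmlSetStructuredErrorFunc. */
static void set_error_func(void *user_data, error_func handler)
{
    current_user_data = user_data;
    current_handler = handler;
}

typedef struct {
    void *user_data;
    error_func handler;
} handler_state;

/* Remember whatever is installed now, then install the new handler. */
static void save_and_set(handler_state *state, void *user_data, error_func handler)
{
    assert(state != NULL);
    state->user_data = current_user_data;
    state->handler = current_handler;
    set_error_func(user_data, handler);
}

/* Put back exactly what save_and_set() found. */
static void restore(const handler_state *state)
{
    assert(state != NULL);
    set_error_func(state->user_data, state->handler);
}

/* Counts errors into the int that user_data points at. */
static void counting_handler(void *user_data, const char *message)
{
    (void)message;
    ++*(int *)user_data;
}

/* Report an error through whichever handler is installed right now. */
static void report_error(const char *message)
{
    if (current_handler != NULL) {
        current_handler(current_user_data, message);
    }
}

/* Inner call: runs with errors silenced, then restores the caller's handler. */
static void inner_parse(void)
{
    handler_state saved;
    save_and_set(&saved, NULL, NULL);
    report_error("inner error");   /* dropped: no handler installed */
    restore(&saved);
}

/* Outer call: installs its own handler, then re-enters via inner_parse(). */
static int outer_parse(void)
{
    int error_count = 0;
    handler_state saved;
    save_and_set(&saved, &error_count, counting_handler);
    inner_parse();
    report_error("outer error");   /* still reaches counting_handler */
    restore(&saved);
    return error_count;
}
```

If `inner_parse` cleared the handler without restoring it, the outer call's later error would be silently dropped; with the save-and-restore pair, the outer handler is back in place once the inner call returns.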

This issue is opened to make sure we remember to do this everywhere.

It's somewhat related to the wrapping we need to do around any libxml2 callbacks that re-enter the Ruby interpreter, and to how we handle the exceptions raised there, all of which is detailed at #1610.
