Skip to content
This repository has been archived by the owner on Nov 6, 2022. It is now read-only.

Introduce raw data callbacks #285

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

Conversation

RandoMan70
Copy link

These callbacks can be used when developing transparent HTTP traffic analysis and filtering.
The main goal of it is to handle transmitted data as-is, but having ability to distinguish data between headers, bodies and different requests.

@indutny
Copy link
Member

indutny commented Feb 20, 2016

Thank you for submitting this! May I ask you to write a short example in C to demonstrate how this API could be used?

@RandoMan70
Copy link
Author

I just figured out that we need to change position of callbacks in http_parser_settings structure to keep it backward-compatible. I'll update PR and then will do short example.

...also, i guess it's better leave it as is. It's better to get compile-time error, then having mysterious crashes of uninitialized callbacks.

@RandoMan70
Copy link
Author

Short example on how it can be used:

std::string hdr;

static int headers_raw(http_parser*parser, const char *at, size_t length)
{
    hdr.append(at, length);
}

static int headers_complete(http_parser*parser)
{
    do_anything_when_headers_received();
    send(fd, hdr.c_str(), hdr.size(), 0);
    hdr.clear();
}

static int body_raw(http_parser*parser, const char *at, size_t length)
{
    send(fd, at, length, 0);
}

But when developing complex processing, we need keep in mind, that on_header_raw() callback can be received before on_message_begin(). It's easy to reproduce - just send alone newline symbol before sending request. It happens because of non-caching nature of HTTP parser.

@bnoordhuis
Copy link
Member

The numerous style errors alone make this pull request unsuitable for inclusion in its current shape. As to the functionality, I have no strong opinion on whether it's a good addition.

Something to keep in mind is that http_parser is more strict in what it accepts than most browsers. If you're going to use it for MitM-style analysis or filtering, you may have a hard time.

@RandoMan70
Copy link
Author

@bnoordhuis, I don't insist, but these additions extend the limits of applicability of the parser.
Writing transparent proxy with support of keep-alive and pipelining very difficult (if possible at all) without raw data callbacks. Using raw callbacks it becomes quite easy.

@bnoordhuis
Copy link
Member

I'll let @indutny make the call on whether or not to accept this pull request.

Writing transparent proxy with support of keep-alive and pipelining very difficult (if possible at all) without raw data callbacks.

Is that really true? It's my experience that transparent proxies still frequently need to rewrite parts of the request and the response.

@RandoMan70
Copy link
Author

@bnoordhuis, there are different tasks for proxying. Main goal of raw callbacks is to keep control of request/response sequence without need of reassembling stream before forwarding.

(settings->on_body_raw) && \
(p >= raw_mark) && \
(raw_mark < data + len)) \
{ \

This comment was marked as off-topic.

@indutny
Copy link
Member

indutny commented Feb 27, 2016

@RandoMan70 I don't really object to this change, but on other hand try to avoid touching that code. It is very sensitive code path, which may ruin the performance of the method because of high register pressure.

What do you think about doing this under some build-time define? Clearly this feature is not something that the most of the users will use, so it will make more sense to do it this way.

@socketpair
Copy link

Just store that here: OISF/libhtp#123 exactly the same, but in libhtp!

@socketpair
Copy link

@RandoMan70 can you write me a message ? socketpair@gmail.com

@RandoMan70
Copy link
Author

@indutny it's possible, sure. But don't sure that it can significantly affect performance - checks executed mosly once per call and per request/response switch.

@HAT-DK
Copy link

HAT-DK commented May 3, 2018

@bnoordhuis Any chance that this pull-request ever will be merged into master?
This is exactly what i'm looking for as well.

@bnoordhuis
Copy link
Member

Not in v2.x since it changes the ABI. Maybe in 3.0 if and when that is released.

(It could be put behind a build-time flag but that's mighty inconvenient for some downstream users. Distros already have a hard enough time with the strict and non-strict builds.)

@HAT-DK
Copy link

HAT-DK commented Jan 21, 2019

Any update on adding this pull request, in a release?

@bnoordhuis
Copy link
Member

There are no concrete plans for a v3.x release so far.

@indutny
Copy link
Member

indutny commented Jan 21, 2019

@HAT-DK It isn't mentioned in the readme, but nodejs is moving away from http-parser to llhttp. If you're interested, I could guide you through porting this PR to it.

@socketpair
Copy link

socketpair commented Jan 21, 2019

@asvetlov

Seems, aiohttp should move to this parser too :)

I have create an issue for that:
aio-libs/aiohttp#3561

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants