Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wikibase: proxy manifest requests #6136

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Abbe98
Copy link
Member

@Abbe98 Abbe98 commented Nov 4, 2023

This makes the backend proxy requests for Wikibase manifests to allow them to be hosted in MediaWiki itself(removing the CORS requirement).

MediaWiki hosted manifest example(Wikidata): https://www.wikidata.org/wiki/User:Abbe98/openrefine-manifest.json?action=raw

Setting this to draft for now but I would like to hear what others think about this approach before working out the details.

@wetneb
Copy link
Sponsor Member

wetneb commented Nov 5, 2023

I think it makes sense to fetch the manifest from the backend instead of in the frontend.
The implementation you have looks like an open proxy to let the frontend fetch any JSON through the backend. My intuition would be to try to make this more specific to Wikibase manifests, for instance by validating the JSON structure of the manifest before returning it, and perhaps even registering it in the command in the list of known Wikibases.

GitHub identifies a security issue in the fact that we are fetching a user-supplied URL. I guess it makes sense but I am not clear on what the security implications really are. Perhaps the server could serve a maliciously crafted JSON payload which would exploit a vulnerability of our JSON parser, say (like there were similar attacks with exponential expansion of XML documents). But obviously we already do plenty of backend-side fetching of user-supplied URLs so any security issues with that would also apply there.

@Abbe98
Copy link
Member Author

Abbe98 commented Nov 5, 2023

I think it makes sense to fetch the manifest from the backend instead of in the frontend.
The implementation you have looks like an open proxy to let the frontend fetch any JSON through the backend. My intuition would be to try to make this more specific to Wikibase manifests, for instance by validating the JSON structure of the manifest before returning it, and perhaps even registering it in the command in the list of known Wikibases.

I think that #3109 is a natural follow up, however, my intent for now is only to enable the manifest hosting use-case.

GitHub identifies a security issue in the fact that we are fetching a user-supplied URL. I guess it makes sense but I am not clear on what the security implications really are. Perhaps the server could serve a maliciously crafted JSON payload which would exploit a vulnerability of our JSON parser, say (like there were similar attacks with exponential expansion of XML documents). But obviously we already do plenty of backend-side fetching of user-supplied URLs so any security issues with that would also apply there.

I share your view, an exploit here would already effect other parts of the codebase. Just proxying it is probably the safest thing we can do?

@wetneb
Copy link
Sponsor Member

wetneb commented Nov 6, 2023

Yes, we should not be forwarding any more information by making this request from the backend (no added GET/POST parameters, no cookies because you're using a fresh OkHTTP instance), so I think that's fine.

@Abbe98
Copy link
Member Author

Abbe98 commented Nov 6, 2023

Maybe a user-agent string?

@tfmorris
Copy link
Member

tfmorris commented Nov 6, 2023

I agree with @wetneb that some type of restriction/registration of the target domain or payload would be useful. The fact that the current security posture is poor doesn't mean that we shouldn't try to minimize, or at least not increase, the attack surface. Letting the user pick from a set of configured URLs would like resolve the security warning, but if that's not possible, validating the content (a la #3109) would be next best.

What issue # is this associated with? The PR template should have included a placeholder for the issue #.

@Abbe98
Copy link
Member Author

Abbe98 commented Nov 25, 2023

I agree with @wetneb that some type of restriction/registration of the target domain or payload would be useful. The fact that the current security posture is poor doesn't mean that we shouldn't try to minimize, or at least not increase, the attack surface.

In what way would this increase the attack surface?

Copy link
Sponsor Member

@wetneb wetneb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After having a fresh look at this, I think this looks safe enough as it stands. @tfmorris do you have more thoughts about in which way this would be unsafe?

response.getWriter().write(res.body().string());
response.setContentType("application/json");
response.setStatus(200);
response.setCharacterEncoding("UTF-8");
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be worth cleaning up this part to just go through our generic utilities to return JSON from a command, which would probably involve parsing the JSON before (which should be fine?)
I think writing the response body before setting the encoding can cause encoding issues (if I remember correctly).

@tfmorris
Copy link
Member

I've lost the thread here. I'm happy to let you two agree on the best solution.

@wetneb
Copy link
Sponsor Member

wetneb commented Mar 2, 2024

@Abbe98 I'm assuming you won't have time to work on this, so I'd just make a few cleaning tweaks and merge it unless you have concerns?

@wetneb
Copy link
Sponsor Member

wetneb commented Apr 6, 2024

I am actually tempted to make this more generic, rather than more specific, turning it into a endpoint to fetch JSON from arbitrary third-party sources and return it to the frontend. That would be a way to eliminate all uses of JSONP in the frontend, which would be a clear security win. (I think @Abbe98 has suggested something similar in the past.)

We could still have some safeguards to limit the size of the data downloaded (to make sure we don't choke up the backend) and ensure that the JSON parsing is suitable for adversarial payloads (which should be the default with Jackson, but it's always worth checking).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants