Wikibase: proxy manifest requests #6136

Abbe98 · 2023-11-04T18:41:22Z

This makes the backend proxy requests for Wikibase manifests, which allows them to be hosted in MediaWiki itself (removing the CORS requirement).

MediaWiki-hosted manifest example (Wikidata): https://www.wikidata.org/wiki/User:Abbe98/openrefine-manifest.json?action=raw

I'm setting this to draft for now, as I would like to hear what others think about this approach before working out the details.

extensions/wikibase/src/org/openrefine/wikibase/commands/FetchManifestCommand.java

wetneb · 2023-11-05T18:31:35Z

I think it makes sense to fetch the manifest from the backend instead of in the frontend.
The implementation you have looks like an open proxy to let the frontend fetch any JSON through the backend. My intuition would be to try to make this more specific to Wikibase manifests, for instance by validating the JSON structure of the manifest before returning it, and perhaps even registering it in the command in the list of known Wikibases.

GitHub identifies a security issue in the fact that we are fetching a user-supplied URL. I guess it makes sense but I am not clear on what the security implications really are. Perhaps the server could serve a maliciously crafted JSON payload which would exploit a vulnerability of our JSON parser, say (like there were similar attacks with exponential expansion of XML documents). But obviously we already do plenty of backend-side fetching of user-supplied URLs so any security issues with that would also apply there.

Abbe98 · 2023-11-05T19:23:38Z

> I think it makes sense to fetch the manifest from the backend instead of in the frontend.
> The implementation you have looks like an open proxy to let the frontend fetch any JSON through the backend. My intuition would be to try to make this more specific to Wikibase manifests, for instance by validating the JSON structure of the manifest before returning it, and perhaps even registering it in the command in the list of known Wikibases.

I think that #3109 is a natural follow up, however, my intent for now is only to enable the manifest hosting use-case.

> GitHub identifies a security issue in the fact that we are fetching a user-supplied URL. I guess it makes sense but I am not clear on what the security implications really are. Perhaps the server could serve a maliciously crafted JSON payload which would exploit a vulnerability of our JSON parser, say (like there were similar attacks with exponential expansion of XML documents). But obviously we already do plenty of backend-side fetching of user-supplied URLs so any security issues with that would also apply there.

I share your view: an exploit here would already affect other parts of the codebase. Just proxying it is probably the safest thing we can do?

wetneb · 2023-11-06T11:13:31Z

Yes, we should not be forwarding any more information by making this request from the backend (no added GET/POST parameters, no cookies because you're using a fresh OkHttp instance), so I think that's fine.

Abbe98 · 2023-11-06T11:26:21Z

Maybe a user-agent string?
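
The exchange above (no extra parameters, no cookies, possibly a user-agent string) can be illustrated with a small sketch. The PR itself uses OkHttp; this sketch swaps in the JDK's built-in `java.net.http` client so that it runs without dependencies. The class name, the `buildRequest` and `newClient` helpers, and the User-Agent value are assumptions made for illustration and are not taken from the PR.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;

public class ManifestRequestSketch {
    // Hypothetical User-Agent value, chosen for illustration only.
    static final String USER_AGENT = "OpenRefine-manifest-fetcher (illustrative)";

    /** Builds a plain GET request that carries a User-Agent header and nothing else. */
    static HttpRequest buildRequest(String manifestUrl) {
        return HttpRequest.newBuilder(URI.create(manifestUrl))
                .header("User-Agent", USER_AGENT)
                .GET()
                .build();
    }

    /** A client created with defaults has no CookieHandler, so it neither stores nor sends cookies. */
    static HttpClient newClient() {
        return HttpClient.newHttpClient();
    }

    public static void main(String[] args) {
        HttpRequest request = buildRequest("https://example.org/manifest.json");
        System.out.println(request.method() + " " + request.uri());
        System.out.println("explicit headers: " + request.headers().map());
        System.out.println("cookie handler present: " + newClient().cookieHandler().isPresent());
    }
}
```

The only header set explicitly on the request is the User-Agent, and because a new client is created per call there is no shared cookie state to leak to the manifest host.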

tfmorris · 2023-11-06T17:55:46Z

I agree with @wetneb that some type of restriction/registration of the target domain or payload would be useful. The fact that the current security posture is poor doesn't mean that we shouldn't try to minimize, or at least not increase, the attack surface. Letting the user pick from a set of configured URLs would likely resolve the security warning, but if that's not possible, validating the content (a la #3109) would be next best.
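
As a rough illustration of the "set of configured URLs" idea, the sketch below checks a requested URL against an allow-list of hosts before any fetch would happen. It is not code from the PR: the class name, the `isAllowed` helper, and the host list are all hypothetical, and a real implementation would presumably load the list from configuration.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;

public class ManifestUrlAllowList {
    // Hypothetical set of configured hosts, hard-coded here for illustration only.
    static final Set<String> ALLOWED_HOSTS = Set.of("www.wikidata.org", "commons.wikimedia.org");

    /** Returns true only for well-formed https URLs whose host is in the configured set. */
    static boolean isAllowed(String url) {
        URI uri;
        try {
            uri = URI.create(url);
        } catch (IllegalArgumentException e) {
            return false; // not a syntactically valid URI
        }
        String host = uri.getHost();
        return "https".equalsIgnoreCase(uri.getScheme())
                && host != null
                && ALLOWED_HOSTS.contains(host.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(isAllowed(
                "https://www.wikidata.org/wiki/User:Abbe98/openrefine-manifest.json?action=raw"));
        System.out.println(isAllowed("https://www.wikidata.org@evil.example/manifest.json"));
    }
}
```

Comparing the parsed host, rather than matching on the raw string, avoids being fooled by URLs that merely contain an allowed host name in the user-info or path part.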

What issue # is this associated with? The PR template should have included a placeholder for the issue #.

Abbe98 · 2023-11-25T16:03:23Z

> I agree with @wetneb that some type of restriction/registration of the target domain or payload would be useful. The fact that the current security posture is poor doesn't mean that we shouldn't try to minimize, or at least not increase, the attack surface.

In what way would this increase the attack surface?

wetneb

After having a fresh look at this, I think this looks safe enough as it stands. @tfmorris do you have more thoughts on how this would be unsafe?

wetneb · 2024-02-28T11:26:55Z

extensions/wikibase/src/org/openrefine/wikibase/commands/FetchManifestCommand.java

+            response.getWriter().write(res.body().string());
+            response.setContentType("application/json");
+            response.setStatus(200);
+            response.setCharacterEncoding("UTF-8");


It would be worth cleaning up this part to just go through our generic utilities to return JSON from a command, which would probably involve parsing the JSON first (which should be fine?)
I think writing the response body before setting the encoding can cause encoding issues (if I remember correctly).

tfmorris · 2024-02-28T17:31:49Z

I've lost the thread here. I'm happy to let you two agree on the best solution.

wetneb · 2024-03-02T09:43:41Z

@Abbe98 I'm assuming you won't have time to work on this, so I'd just make a few cleaning tweaks and merge it unless you have concerns?

wetneb · 2024-04-06T10:22:15Z

I am actually tempted to make this more generic, rather than more specific, turning it into an endpoint to fetch JSON from arbitrary third-party sources and return it to the frontend. That would be a way to eliminate all uses of JSONP in the frontend, which would be a clear security win. (I think @Abbe98 has suggested something similar in the past.)

We could still have some safeguards to limit the size of the data downloaded (to make sure we don't choke up the backend) and ensure that the JSON parsing is suitable for adversarial payloads (which should be the default with Jackson, but it's always worth checking).
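
One possible shape for the size safeguard mentioned above is to read the upstream response through a hard byte cap before handing it to the JSON parser. The sketch below is only an illustration under that assumption: `BoundedBodyReader`, `readAtMost`, and the 1 MiB limit are hypothetical names and values, not part of the PR.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class BoundedBodyReader {
    // Hypothetical cap, chosen for illustration only.
    static final int MAX_BYTES = 1024 * 1024;

    /** Reads the whole stream, failing as soon as more than maxBytes have been read. */
    static byte[] readAtMost(InputStream in, int maxBytes) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            total += n;
            if (total > maxBytes) {
                throw new IOException("response body exceeds " + maxBytes + " bytes");
            }
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = "{\"version\":\"2.0\"}".getBytes(StandardCharsets.UTF_8);
        byte[] body = readAtMost(new ByteArrayInputStream(payload), MAX_BYTES);
        System.out.println(new String(body, StandardCharsets.UTF_8));
    }
}
```

Counting bytes while streaming, instead of trusting a declared `Content-Length`, means the limit still holds when the remote server omits or misreports that header.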

Wikibase: proxy manifest requests

3e00f95

github-advanced-security bot found potential problems Nov 4, 2023

extensions/wikibase/src/org/openrefine/wikibase/commands/FetchManifestCommand.java (dismissed)

wetneb reviewed Feb 28, 2024
