Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bundle local images in rustdoc output #3397

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

GuillaumeGomez
Copy link
Member

@GuillaumeGomez GuillaumeGomez commented Feb 27, 2023

Tracking issue

cc @rust-lang/rustdoc

Rendered

@ehuss ehuss added the T-rustdoc Relevant to rustdoc team, which will review and decide on the RFC. label Feb 27, 2023

Since the local images are all put in the same folder, if the same is imported from different crates, the content won't be duplicated since they have the same name and the same hash.

If the path isn't referring to a file, a warning will be emitted and rustdoc will left the path unchanged in the generated documentation.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this result in relative links to rustdoc pages that aren't using intra-doc links warning? Warning for these is fine, but should ideally be able to say "this looks like an intra-doc link, use one instead."

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I don't understand what you mean. Intra-doc links is a feature used on links whereas this feature is only applied on image links. For example:

/// ![image](../relative-link-to-image)
///
/// [intra_doc_link_type]

@kpreid
Copy link

kpreid commented Mar 4, 2023

Additional prior art: the library embed-doc-image accomplishes the same goal, by means of embedding the images in data: URLs in the doc markdown, so that they need not be picked up by rustdoc itself.

Unrelated to the above, it would be neat if there was a way to embed images that were generated by the crate's own code. For crates where such generation is at all within their scope, this would both allow for “guaranteed fresh” sample output and charts (much like doc-tests guarantee that examples work), and remove the need for documentation images to be included in the .crate package file (increasing its download size whether or not anyone ever looks at the documentation) or in the source repository (increasing its history, especially if the images need frequent changes).

I imagine that this is implausible for now because it would require Cargo and rustdoc to cooperate on a multi-step process that makes cargo doc into something more like cargo test, but I thought I'd mention it in case it prompts any further ideas.

@GuillaumeGomez
Copy link
Member Author

@kpreid: Do you mean images that are generated by the build.rs script? If so, aren't OUT_DIR and CARGO_TARGET_DIR env variables enough to handle it? But in any case, from what I can see, you can do it like follows:

  • generate images in build.rs when running in doc mode
  • reference to generated images from the documentation using relative paths

If rustdoc sees a relative path, it'll copy the image into the target directory in any case so that should cover your use case as well. Unless I missed something?

If it's not possible for any reason (please explain why), are you suggesting to add an option to rustdoc which would allow to copy local files to the rustdoc output folder?

@kpreid
Copy link

kpreid commented Mar 4, 2023

Do you mean images that are generated by the build.rs script?

No, I mean images generated by code that is dependent on the code in the library or binary crate being documented (in the same way that integration tests are dependent on said code) — so it would be well after the build script stage. I doubt this feature I am describing should be part of this RFC; I am just mentioning it as a future possibility that would be powerful enough that, if there is any consideration here that might make that more or less feasible, perhaps it should be considered now.

@GuillaumeGomez
Copy link
Member Author

Sure. I'm still not sure to fully understand what you explained but I don't mind adding this as part of a future extension. How would you word it?

@kpreid
Copy link

kpreid commented Mar 4, 2023

I don't have any specific changes in mind — I just wanted to raise the idea to get people thinking about it for the future. On re-reading, the only specific thing I would suggest in this RFC is to make it clear that given ![resource title](path), the path is not always a relative path to a local file, but that it might also be

  • an absolute URL (as is currently supported and is presumably unchanged by this RFC)
  • a Rust item path (in the manner of existing intra-doc links, with the semantics of that not yet defined and currently an error treated like any broken link)

Also, I think the first part of my original comment may have been missed; I intended to suggest that the prior art section should discuss the library embed-doc-image which currently accomplishes the same goal, by means of embedding the images in data: URLs in the doc markdown, so that they need not be picked up by rustdoc itself.

The disadvantages of that library which this RFC would improve on are that all image data has to pass through rustc and a proc macro (inefficiency), and data URLs (limited maximum size, not viewable outside a browser, a burden on anything manipulating the HTML).

@GuillaumeGomez
Copy link
Member Author

a Rust item path (in the manner of existing intra-doc links, with the semantics of that not yet defined and currently an error treated like any broken link)

I don't get this part: it doesn't understand the concept of rust item, only URLs and local paths (either relative or absolute). If the path provided isn't found, it will warn and keep it as is.

an absolute URL (as is currently supported and is presumably unchanged by this RFC)

It is written explicitly in the RFC:

The path could be any relative or absolute file path. For example, to include...

Also, I think the first part of my original comment may have been missed; I intended to suggest that the prior art section should discuss the library embed-doc-image which currently accomplishes the same goal, by means of embedding the images in data: URLs in the doc markdown, so that they need not be picked up by rustdoc itself.

I'll add it then.

@GuillaumeGomez GuillaumeGomez changed the title Create RFC for bundling local images in rustdoc output Bundle local images in rustdoc output Mar 16, 2023
@ijackson
Copy link

The Motivation section discusses things other than images: specifically, scripts. But then the actual proposed new feature only works for images and has a lot of image-specific special casing.

Can the RFC text please discuss how to handle scripts, or other resources that aren't images? There's some discussion of this in "Possible Extensions" but I think that approach is unlikely to happen soon.

IMO ideally there would be a way to tell rustdoc to "just copy this file into a known place in the output, preserving its leafname".

@GuillaumeGomez
Copy link
Member Author

It's mentioned in the motivations section because it's something that we could want into the future. However, the approach for handling non-image files would be very different since there is no standard markdown syntax for it, which would force us to first define this syntax. So instead of going all at once in here, I think it's much better to first land this so we can have the code ready in case we want to add support for other files than images.

But I prefer to warn you beforehand: including scripts comes with security concerns and that will be very complicated to go forward with it. I think it'll very likely never be approved. Anyway, discussion for another day. :)

@GuillaumeGomez
Copy link
Member Author

There weren't many comments so I guess we can start the last part of the process.

@rfcbot merge

@rfcbot
Copy link
Collaborator

rfcbot commented Mar 27, 2023

Team member @GuillaumeGomez has proposed to merge this. The next step is review by the rest of the tagged team members:

Concerns:

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. disposition-merge This RFC is in PFCP or FCP with a disposition to merge it. labels Mar 27, 2023
@jsha
Copy link

jsha commented May 18, 2023

Thanks for working on this. I really support the goals - consistent reliable images and scripts across local, self-hosted, and docs.rs.

But I'm dissatisfied with the implementation plan for docs.rs: embedding images in the .crate file uploaded to crates.io. I think it unnecessarily bloats .crate files, which wastes bandwidth for crates.io, for end-users, and for CI platforms. Docs.rs downloads each crate once; the various other users of crates download them millions of times. We shouldn't include bytes that are only useful for documentation. Can we find a different solution?

When we first started talking about this issue I was not worried about crate size because I figured this would be mostly used for project icons, and they would be ~5kB of SVG. But thinking about it more I realize some crates will want larger images to illustrate complex concepts. Most doc authors are not experts in minimizing image files, and rustdoc won't have any way to automatically minimize them. And I'd really like to have a rigorous way to use scripts, too; and useful scripts like KaTeX are in the ~200kB range.

IIRC crates.io's limit on crate size is 50MB. In some sense that's good since it means there's plenty of room for doc resources. But on the other hand it's bad since it would allow making very big crate files filled up with doc resources. Of course it is currently allowed to make very big crates anyhow, but this feature would create more incentive to do so.

@rfcbot concern Crate size

The main constraints that are challenging:

  • For docs.rs, we would like the doc build process to not have network access. It should receive a .crate file as input on the filesytem, and output documentation on the filesystem.
  • Local / self-hosted doc builds sometimes include docs for dependencies. When building dependencies, rustdoc only "knows" about the .crate file for that dependency.

For the first problem, docs.rs would have to introduce another input. We could make that input predictable, and pre-fetch it to the filesystem before running the build rather than having the doc build process fetch it. I don't love having another input in docs.rs but I think it would be better than making packages bigger for all non-doc uses. We could reduce the operational problems by only fetching that alternate input from one specific host - like crates.io, or another host created for the purpose.

For the second problem, there would have to be a well-known place to fetch additional resources for a given .crate file.

For user experience, we would want to avoid doc authors needing to create separate credentials for uploading doc resources. That argues in favor of having a way to upload additional files to crates.io, specifically linked to a crate version and using the same credentials as the crate itself. I know this makes for a much more complex project. I'd love to find a better way but am having trouble thinking of one!

@GuillaumeGomez
Copy link
Member Author

Very interesting take! I think we'll need to have opinions from @rust-lang/crates-io and eventually @rust-lang/cargo to move forward on this. I think having two different kind of resources for crates.io is actually a good idea where the non-code files could use a more aggressive compression? But I don't think it's a good idea to require two downloads (one for code and one for doc resources).


A new rustdoc pass will be added which would go through all documentation to gather local resources into a map.

Then in HTML documentation generation, the local resources pathes will be replaced by their equivalent linking to the output directory instead.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Then in HTML documentation generation, the local resources pathes will be replaced by their equivalent linking to the output directory instead.
Then in HTML documentation generation, the local resources paths will be replaced by their equivalent linking to the output directory instead.

* reduces the number of duplicated images that `docs.rs` has to store
* doesn't require the source code for the source crate when inlining
* only requires storing the hash of the file in the `.rmeta`, not the whole image
* requires rustc to look at the doc comments and hash the image(s)
Copy link
Contributor

@notriddle notriddle May 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jsha

While doing this, can it also produce a lint warning if the image is bigger than 50KiB or so?

Of course, any value chosen here will be arbitrary and contentious, but so does adding operational complexity to crates.io, docs.rs, and third parties that want to run rustdoc builds in sandboxes (how would hosting images separately fit into Bazel?)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps; or perhaps the lint should be something like "the total of all your non-code resources is >10% of the crate size." It's still not really satisfying though, since it doesn't fully solve the problem. If you want to include lots of resources, or large resources, the only solution is going to be to refer to them using absolute URLs.

Out of curiosity I spot-checked a couple of crates, regex and rayon. They're 248 kB and 169 kB respectively, so 10% would be 24.8 kB and 16.9 kB.

Also it's worth mentioning one counterpoint to my concern: source files that are bundled into .crate files are not minified and comments aren't removed. That's important specifically so docs can be generated from them (including the source file view within rustdoc). So in some sense published crates already do contain bytes that are only useful for documentation.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So in some sense published crates already do contain bytes that are only useful for documentation.

Examples and tests are also arguably superfluous files — they benefit crater, and those who read the source code of the downloaded crate, but not the typical downloader.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing it with percentages seems unnecessarily complicated for something that’s just an arbitrary metric pulled out of a hat anyway.

  • as long as you’re using ordinary markdown, you won’t be able to use responsive images, so mobile viewers on metered data plans reading docs.rs will be downloading full-sized images
  • docs.rs isn’t a very good image CDN, since it doesn’t reencode to WebP or anything
  • anyone building docs locally will be downloading the image even if they don’t look at it

In other words, it’s not just about cargo build. Coping with large images is just a chore.

What I’m trying to do is gesture in the direction of “try to use small images.” Not just solve the problem for plain cargo builds.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Random musings on cargo's behavior

To protect against zip bombs, cargo had a 512 MB limit when extracting crates and people ran into this (rust-lang/cargo#11151) and we updated it to take compression size into account (rust-lang/cargo#11337) so we are less likely to hit the 512 MB limit unintentionally.

We also now report crate size on package/publish (rust-lang/cargo#11270). One future possibility mentioned in that PR is to warn when binary files are included (rust-lang/cargo#9058). In general, cargo has been hesitant about adding warnings that because we haven't had a way for people to disable them but with [lints] (rust-lang/cargo#12115), we have the possibility to add [lints.cargo] support.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be cognizant of users. By including files that aren't as necessary

  • Slower download of .crate file
  • Extra disk space taken up (currently will be twice, once for the .crate and once for the extracted directory)
  • Extra time to extract the .crate

Some things that would help (and are currently being discussed)

  • Garbage collection of .crate files and extracted .crate files
  • Building directly from .crate files (this is a big more out there)

Dependent CI jobs will never use this content.

I wonder how many users are like me that almost exclusively use docs.rs rather than cargo doc and any missing images wouldn't be a big deal. Or in other words, we should consider if the benefit to the community outweighs the cost to the community.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just answering:

I wonder how many users are like me that almost exclusively use docs.rs rather than cargo doc and any missing images wouldn't be a big deal. Or in other words, we should consider if the benefit to the community outweighs the cost to the community.

It'd be needed for docs.rs too unfortunately.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be needed for docs.rs too unfortunately.

How so? I'm not seeing anything listed in the Motivation that benefits us docs.rs-exclusive users (speaking as someone who also has crates that include images).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless you host the images on a website, you can't actually use them in the documentation generated by docs.rs. There are some ways to go around this limitation, for example: generating the base64 equivalent of the image and including it directly into your crate doc comment. But it's not great. Hence this RFC. But the problem is as you mentioned that it would impact the community by downloading more content that wouldn't actually be used.

A potential solution you mentioned (Building directly from .crate files (this is a big more out there)) would very likely lessen the negative impact. But better take things slow and be sure we don't miss anything. :)

@Manishearth
Copy link
Member

Yeah I agree with @jsha's concern

Might be worth postponing the RFC, and the interested parties can go talk to the other teams until they have a new proposal (because I don't see this proposal being able to fit that concern without effectively being changed into something rather different)


This would be done by allowing users to specify the path of a local resource file in doc comments. The resource file would be stored in the `doc.files` folder. The `doc.files` folder will be at the "top level" of the rustdoc output level (at the same level as the `static.files` or the `src` folders).

The only local resources considered will be the ones in the markdown image syntax: `![resource title](path)`, where `<path>` is the path of the resource file relative to the source file.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how does this solve the <script> usecase?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't cover this case at all.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm... then I guess it might be better not to mention it in the "Motivation" section, or explicitly mention that this RFC does not cover this case :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On this line it mentions The only local resources considered will be the ones in the markdown image syntax. But it's true that a bit above we have They would also like to include images (like logos and diagrams), and scripts, which is confusing. I'll remove this part. But otherwise, where is it not clear enough so I can make it more obvious?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe something like: "While using custom <script> tags is a valid usecase, this RFC only focusses on the usecase of dealing with images" at the end of the motivation section

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍


The local resources files are not affected by the `--resource-suffix`.

The impact on `docs.rs` would also be very minimal as the size of a published crate resources is limited to a few megabytes. The only thing needed would be to handle the new `doc.files` folder.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for images this might be true, but I'm a bit worried about JS files. while docs.rs currently allows running arbitrary JS from 3rd-party servers, the situation might become even worse if docs.rs itself would serve those potentially malicious files.

or am I missing something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

![] syntax generates <img> HTML tag and nothing else. So the idea behind this RFC is to bundle the resource (if local) into the generated rustdoc output.

If you put a .js file in it, it'll be bundled inside a <img> (so not displayed) and I suppose you'll then be able to call it with <script> though. But the same can be done currently with any file extension and then load it as JS into a <script>. So in this regard, there are no changes I suppose?

@Turbo87
Copy link
Member

Turbo87 commented Jun 2, 2023

I generally support the goals of this RFC, but I agree with the previous commenters that we would need some way to discourage including huge resources in the crate tarballs. If the images end up being significantly larger than the source files then it's questionable whether it is worth including them.

Regarding the "having two different kind of resources" idea: I would be strongly against that. It seems like a lot of complexity for a comparatively small benefit.

@GuillaumeGomez
Copy link
Member Author

I talked with @epage and he mentioned that in the future, cargo could potentially store all the content of a crate directly into an archive-like file, which would be used as is directly. If so, it would very likely completely remove this concern. I think the best track here is to actually wait for such a possibility to become reality (if it ever become one) before moving forward with this RFC.

github-merge-queue bot pushed a commit to bevyengine/bevy that referenced this pull request Jul 31, 2023
# Objective

This PR continues #8885

It aims to improve the `Mesh` documentation in the following ways:
- Put everything at the "top level" instead of the "impl".
- Explain better what is a Mesh, how it can be created, and that it can
be edited.
- Explain it can be used with a `Material`, and mention
`StandardMaterial`, `PbrBundle`, `ColorMaterial`, and
`ColorMesh2dBundle` since those cover most cases
- Mention the glTF/Bevy vocabulary discrepancy for "Mesh"
- Add an image for the example
- Various nitpicky modifications

## Note

- The image I added is 90.3ko which I think is small enough?
- Since rustdoc doesn't allow cross-reference not in dependencies of a
subcrate [yet](rust-lang/rust#74481), I have a
lot of backtick references that are not links :(
- Since rustdoc doesn't allow linking to code in the crate (?) I put
link to github directly.
- Since rustdoc doesn't allow embed images in doc
[yet](rust-lang/rust#32104), maybe
[soon](rust-lang/rfcs#3397), I had to put only a
link to the image. I don't think it's worth adding
[embed_doc_image](https://docs.rs/embed-doc-image/latest/embed_doc_image/)
as a dependency for this.
cart pushed a commit to bevyengine/bevy that referenced this pull request Aug 10, 2023
# Objective

This PR continues #8885

It aims to improve the `Mesh` documentation in the following ways:
- Put everything at the "top level" instead of the "impl".
- Explain better what is a Mesh, how it can be created, and that it can
be edited.
- Explain it can be used with a `Material`, and mention
`StandardMaterial`, `PbrBundle`, `ColorMaterial`, and
`ColorMesh2dBundle` since those cover most cases
- Mention the glTF/Bevy vocabulary discrepancy for "Mesh"
- Add an image for the example
- Various nitpicky modifications

## Note

- The image I added is 90.3ko which I think is small enough?
- Since rustdoc doesn't allow cross-reference not in dependencies of a
subcrate [yet](rust-lang/rust#74481), I have a
lot of backtick references that are not links :(
- Since rustdoc doesn't allow linking to code in the crate (?) I put
link to github directly.
- Since rustdoc doesn't allow embed images in doc
[yet](rust-lang/rust#32104), maybe
[soon](rust-lang/rfcs#3397), I had to put only a
link to the image. I don't think it's worth adding
[embed_doc_image](https://docs.rs/embed-doc-image/latest/embed_doc_image/)
as a dependency for this.
@alice-i-cecile
Copy link

For context, Bevy would really love to have this for our docs to demonstrate various rendering and mathematical concepts. That however means shipping dozens to hundreds of shiny full resolution images (scaling them down will mean misleading artifacts, and SVGs simply don't apply). We would hit the 50 MB limit rather quickly if we were using this feature as aggressively as I would like from a purely docs/learning perspective. I definitely think we should avoid shipping all of the images to every user of the crate somehow.

@CAD97
Copy link

CAD97 commented May 14, 2024

An "interesting idea" that crates could apply could be to use a CDN front around GitHub raw file access (e.g. Statically, no association) and release version tags to get the correct revision of the asset, e.g. #[doc = concat!("![img](https://cdn.statically.io/gh/bevyengine/bevy/v", env!(CARGO_PKG_VERSION), "/assets/docs/Mesh.png")], probably with a helper macro so it's easier to spell, e.g. #[doc = doc_img!(alt="img" src="Mesh.png"). It perhaps is far from ideal during development (since you either get the previous version's assets or broken links) but it should theoretically work.

@dev-ardi
Copy link

Another idea would be to make image downloads opt in: cargo add bevy --doc-images. When opening the documentation without having the images downloaded it could just display the alt text and some warning to actually download them.

@Nemo157
Copy link
Member

Nemo157 commented May 14, 2024

It should be possible to ship images as a separate optional dependency with this RFC, assuming rustdoc tracks spans for relative path references

# bevy/Cargo.toml

[features]
example-images = ["dep:bevy-example-images"]

[dependencies]
bevy-example-images = { optional = true, ... }
// bevy/lib.rs

/// Some feature
#[cfg_attr(feature = "example-images", doc = bevy_example_images::some_feature!())]
pub fn some_feature() {}
// bevy-example-images/lib.rs

#[macro_export]
macro_rules! some_feature {
    () => ("![showcase of some feature](./some_feature.png)")
}

(EDIT: Though, this relies on dependency source-code access which the RFC tries to avoid for re-export inlining.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
disposition-merge This RFC is in PFCP or FCP with a disposition to merge it. proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. T-rustdoc Relevant to rustdoc team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet