-
-
Notifications
You must be signed in to change notification settings - Fork 246
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Snippets : warn against file or part changes through checksum #1824
Comments
I understand the problem associated with using line numbers, it's one of the reasons I was against it for so long, but, people want what they want. Now, reducing this problem with hashes does increase the maintenance burden even more.
The better solution would simply be to provide some arbitrary way to target a partial from a file. That would be something like what is found here: https://rust-lang.github.io/mdBook/format/mdbook.html#including-portions-of-a-file. The link shows a feature where a user can specify a named section in addition to specifying lines. The problem is how do you may it agnostic to the file in question. What if I want to extract a portion of a JavaScript file, but not render the JavaScript file broken? Well, I have to use comments, but how do I detect comments in a variety of file types. Well, I probably can't look for comments specifically, so maybe we just look for a special syntax anywhere, even inline in some cases, but maybe ignore everything on that line except the start and end markers (i.e. we don't ever include any characters from the line with the start and end markers). Maybe: // ---8<--- [start: name]
var test = 3
console.log(3 + 3)
// ---8<--- [end: name] Then we can just specify in our Markdown:
I don't know, I'm just thinking out loud here, so this isn't an official proposal, but maybe it could be 🤷🏻. Currently, I'm taking baby steps to allow people to utilize Snippets in an easier way. With that said, I'm not particularly sold on the hash idea as of now. |
I've created a prototype at #1825. You are welcome to try it out. |
Docsify does it via comments as well: https://docsify.js.org/#/embed-files?id=embedded-code-fragments |
Yeah, allowing them to be inline to be placed in comments is probably the most sane approach. |
Thanks for quick reply !
Oh, right, forgot about this ! Anyway I would say that is a Git problem, not a snippets problem, even if it is indeed better if snippets can convert, to avoid portability issues, especially if you want snippet be easy to use !
Why the need to encode data ? Can't the raw file be hashed instead ? Of course for lines that's quite a problem
And that tool could be called snippets extension :D As I said (sorry if I was unclear), in error log, there would simply be the "expected" (in comment) hash, and the "actual" (currently computed) hash. When user runs build, he/she can copy-paste the actual hash and use it in the inclusion line.
I agree with this, it would be really cool ! However mdbook approach is unsuited for simplicity I think, as it introduces comments that come polluting the code. In a course, this is especially problematic, because I tend to show both parts of code and full files, at different moments. For example, part of the code step by step, and a full file example recap at the end. It also doesn't solve detection of changes in general, for a full file inclusion.
There, I think comment aren't needed actually, or that we can come to a more generic solution : simply specifying part of a line to be looked for. For example, if we want to include following function from the file cool_file.py : def cool_function(a, b):
#do stuff here
return "whao" Syntax could be :
This include lines from the first (included) containing This would allow to both use delimiters comments
Thanks, that was fast ! trying out when I a bit more free time, likely in few days. |
It's a problem that would keep coming my way regardless.
If you want to do something like an MD5 hash, they are applied to the raw bytes of a file. You don't apply it to Unicode. None of this matters except that there is a bit more overhead, and you'd have to provide a way the user can figure out what the hash is
How do you get your initial hash? Put in a bogus hash and run Python Markdown to get a dump of all the hashes?
The problem is, I'm not sure this is a problem I am necessarily wanting to solve. If a partial from a snippet is enclosed in special fences, as I was suggesting, you do not have to be notified that the file is changed. You are no longer stuck with specific lines that could change, you just have to place what you want to be included between the fence, and what you don't want outside the fence.
I think this becomes a highly fragile solution potentially having numerous false positives. It also greatly complicates the syntax as well. Keep in mind, I'm attempting to solve a number of general complaints. Another request was to include only content under a particular Markdown header, which frankly doesn't have much chance of happening as we do not have the appropriate context to do this without running the content through the parser which we are trying to preprocess before running through the parser. Personally, I do not think it unreasonable to suggest that a user that intends to heavily use a file, extracting partials from it, use section fences to gain greater control of what is included. |
I was thinking about an empty hash. For example, if we put hash as [defb945096...], it would be
I think the problem is indeed secondary, but there are still cases where it is nice to detect changes. For example if the documentation explains code referring to identifiers or other constructs in the code, and the code is refactored, out-of-code explanations can become confusing. However this is a really marginal problem, as such explanations can eventually be done through comments in the code in most cases.
It is more fragile than section comments for sure, but I allows inclusion of nearly-arbitrary content. User then is responsible for providing his/her own unique delimiters, even in files that have no comment syntax. From an implementation perspective, maybe a simpler way to parse such expression would be to specify delimiters first in the expression so they are not comprised in the delimited parts. Possible syntax :
However maybe it is a bit too complex for some users ? |
This keeps referencing the following syntax.
I know the idea is that it is separate from the filename enclosed in quotes, but that is just for the single-line notation. We also allow block notation that does not use quotes for file names. Adding this additional requirement in syntax to filenames is not really desired.
We can also allow URLs, and URLs already have more uncommon formats, which makes such a complex syntax even harder to use without running into issues. Snippets was always meant to be a more simple file inclusion plugin. Recently I've caved and URL support and line numbers to appease people despite my apprehensions. I'm open to explicit sections as well to ease this a little more, but there are probably various, more complex enhancements that I'm likely not willing to add. I've had requests to add more advanced templating features, and I'm frankly not likely to add all of that complexity to provide such behaviors. I feel that if we start going down this road, it would make more sense for someone to just implement a Jinja2 preprocessor plugin. As far as this request is concerned, I'm not sure how I feel about adding arbitrary search specifiers. I'm sure that this syntax may work well for your specific needs, but I'm almost positive that it could easily have false positives. What about functions with multiple exit points that look similar? In languages that end functions with braces, it would be quite easy to get a false positive ending if reliant on brace endings. In this case, I feel explicit, known fences are going to be better, and while I understand how that may be frustrating to someone who'd like more freedom to pull snippets from any arbitrary source code without polluting the file with special comments, this is probably as far as I am willing to extend things at this time. Unfortunately, the hash idea is probably something I'm not interested in adding at this point either. The overhead and complexity to add it is currently more than I am currently willing to maintain at this time. I'm not sure if I'll change my mind in the future or not, but currently, this is not a direction in which I wish to go. |
I think I'm willing to pursue the explicit sections, but not the arbitrary search tokens. I'm also not planning on implementing the checksums at this time either. With that said, I will probably leave this issue open (specifically the request for checksums) for now as a possibility for the future, assuming I change my mind on checksums. |
Sorry for the (very) late answer, and thank you a lot for your time !
I definitely understand, I'll adapt myself to the tool then, which shouldn't be too hard. I'll eventually write my own extension when I have time and/or if things becomes too critical. In case I do so I'll keep you informed if you want to include it totally or partially in pymdownx.
False positives are definitely a bad thing in this case, especially for you since people would get frustrated and post issues and so on. After thinking about it, while it would be a convenient feature, you are right it can't be reasonably implemented as a general-purpose solution, as it would too much of a bother most users to deal with.
One way to use special comments without polluting most files would be to have a way to tell snippets to include a file whole while stripping the lines where the comments live. Not sure about syntax, but maybe something like :
In my case it would solve most problems, and I think most documentation cases as well, since it is common to use single file standalone examples for in docs. Stripping could also be done manually with an external script, but it is less convenient unfortunately. |
I've spent some time thinking about this, and I've decided I'm not going to go this way. If I change my mind in the future, I'll open a new issue, but its time to clean out old issues that I'm probably not going to tackle. |
Description
Issue #1462, closed not long ago, added a feature to allow including only specified line ranges from external file with snippets.
One concern regarding this feature was inconsistency it could create if file changed (e.g if lines were added/removed, creating an offset).
I think a checksum mechanism could be used to warn against changes, and it isn't limited to range inclusion, but could also benefit general snippets.
Context
I use pymdownx through Material for Mkdocs to build a computer science course. The course is for absolute beginners, and quite young. For this reason, there are heavy constraints :
Range inclusion solves problem 1, but inconsistency is still a danger here, as even in file fully included, an inconsistency in lines numbers can confuse the students.
I think a lot of tutorials and docs have this problems to a certain extent, and it would be beneficial to have a change warning mechanism.
Benefits
Protects against possible inconsistency when a file included through snippets in the docs changes, makes easier to structure large docs with multiple references to the same file, or with references to out-of-docs files, as test cases.
Solution Idea
Here is a draw at a possible solution :
Through the inclusion syntax, snippets could allow user to provide, at the end of file name chain, a checksum between bracket. So for example :
somefile::4[EA8A25E440]
Then snippet extension would check this checksum against either :
In case a checksum isn't matching, a warning message would be issued, mentioning both checksum :
This way, it is simple for user to update checksum after checking file content. Warning would likely be failing the build, or maybe an option in config could allow to choose whether warn or fail.
An empty checksum
[]
would be considered as a checksum as well, always failing, simply to allow user to use snippets to compute checksum.The text was updated successfully, but these errors were encountered: