feat(markdown codec): return position-to-node mapping for `from_str` #2175

nokome · 2024-03-27T03:39:34Z

Currently the signature for the to_string method of the Codec traits returns a mapping of the each node to the character position in the returned string:

async fn to_string( &self, node: &Node, options: Option<EncodeOptions>) -> Result<(String, Losses, Mapping)>

The Mapping allows us to associate cursor positions in the generated format (e.g. Markdown) with each node in the document. This in turn enables UX such as showing information related to a node or Ctrl+Enter to execute the node that the cursor is currently on.

The problem with this is that the generated Markdown is not necessarily exactly the same as the source Markdown (e.g. multiple blank lines between paragraphs are ignored). This requires us to update the editor with the generated Markdown (and the mapping associated with it), which in turn causes "jumpiness".

A better alternative would be to generate the mapping from the source by changing the signature of from_str to:

async fn from_str(&self, str: &str, options: Option<DecodeOptions>) -> Result<(Node, Losses, Mapping)>

Having the mapping generated from the source Markdown would solve the jumpiness issue in the SourceView editor while still having the benefits of a position-to-node mapping. Perhaps more importantly, it would allow us to have a mapping for Markdown in non-Stencila editors, such as VSCode.

This has not been done yet because the MarkdownCodec is based on pulldown-cmark which does not retain source location of parsed nodes and looks like it is unlikely to in the near future.

Given that we need to make changes to the Markdown decoding anyway, I suggest switching to the markdown crate which has a schema similar to Stencila (in fact, several years ago, the mdast schema was part inspiration for many of the node types in Stencila Schema) and retains a Position for each parsed node, e.g. Paragraph.

Moving from pulldown-cmark to markdown will need a rewrite of the codec_markdown::decode module but the rewritten code is likely to be far simpler given the similarity of mdast schema to Stencila schema. nom-based parsers for extensions should be able to be reused.

The text was updated successfully, but these errors were encountered:

nokome added the module: rust Related to Rust label Mar 27, 2024

nokome self-assigned this Mar 27, 2024

This was referenced Mar 27, 2024

feat(lsp): add a Language Server Protocol server #2176

Closed

wip(markdown): return position-to-node mapping when decoding from string #2179

Merged

nokome closed this as completed in #2179 Apr 1, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(markdown codec): return position-to-node mapping for `from_str` #2175

feat(markdown codec): return position-to-node mapping for `from_str` #2175

nokome commented Mar 27, 2024

feat(markdown codec): return position-to-node mapping for from_str #2175

feat(markdown codec): return position-to-node mapping for from_str #2175

Comments

nokome commented Mar 27, 2024

feat(markdown codec): return position-to-node mapping for `from_str` #2175

feat(markdown codec): return position-to-node mapping for `from_str` #2175