Refactor html output functions #529

digitalmoksha · 2025-02-05T01:35:57Z

Refactored the html formatter to handle each node via a separate functions.

digitalmoksha · 2025-02-05T01:36:56Z

Went a little nuts today 😬

github-actions · 2025-02-05T01:40:03Z

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`./bench.sh ./comrak-8c939a6`	315.5 ± 1.5	313.1	318.5	2.93 ± 0.02
`./bench.sh ./comrak-main`	316.7 ± 6.3	312.1	346.0	2.95 ± 0.06
`./bench.sh ./pulldown-cmark`	107.5 ± 0.5	106.2	108.7	1.00
`./bench.sh ./cmark-gfm`	114.1 ± 1.1	112.4	116.6	1.06 ± 0.01
`./bench.sh ./markdown-it`	482.1 ± 4.8	476.2	497.2	4.48 ± 0.05

Run on Wed Feb 5 01:40:02 UTC 2025

digitalmoksha · 2025-02-05T03:16:50Z

let ... else is only stable in 1.65, and I forgot I was running 1.73 locally.

I was trying to keep the function signature the same (always the same args) for all the render_ functions. But maybe there is no real benefit to that, and I'm making this more difficult than it needs to be. It made sense in my head yesterday 🤷

charlottia · 2025-02-05T04:51:11Z

let ... else is only stable in 1.65, and I forgot I was running 1.73 locally.

D'oh! Maybe we should think about an MSRV bumping policy? Right now we're sitting at 1.62.1 maybe forever, whereas some people out there are using N-2 (!).

I was trying to keep the function signature the same (always the same args) for all the render_ functions.

The advantage that sticks out in my head is they become more amenable to generated (macro) code; we can't inspect the target function signatures at runtime, but if they all share a signature it'd be possible to do some kind of programmatically generated dispatch!

digitalmoksha · 2025-02-05T18:34:25Z

The advantage that sticks out in my head is they become more amenable to generated (macro) code; we can't inspect the target function signatures at runtime, but if they all share a signature it'd be possible to do some kind of programmatically generated dispatch!

That's kind of what I was thinking, maybe setting us up for that possibility.

Maybe we should think about an MSRV bumping policy? Right now we're sitting at 1.62.1 maybe forever, whereas some people out there are using N-2 (!).

I think you're right. I'm not sure what exactly the policy should be. One part it's nice not to require a version that's too bleeding edge, so that larger organizations don't need to upgrade their rust version as quickly. For example GitLab is still on 1.73.

But we also don't want to completely limit ourselves on the ability to use certain features. Maybe it makes sense, for now, to evaluate when we run into something like the let...else and consider it, if it doesn't push us too far forward. In this case we would need to step up only to 1.65. But we could also consider going to 1.70 or 1.73 🤔

digitalmoksha · 2025-02-08T19:37:16Z

I've gone ahead and added a commit to bump the MSRV to 1.65. It's a pretty minimal bump.

I could rewrite the code to not use the let...else syntax, but it really kind of bloats it.

wdyt? 🤔

digitalmoksha · 2025-02-13T20:48:46Z

Hey @kivikakk. So I've made the HtmlFormatter have a public interface. The purpose of this is to provide a way to override certain node formatters while being able to use the default versions.

https://gitlab.com/gitlab-org/ruby/gems/gitlab-glfm-markdown/-/merge_requests/76/diffs?commit_id=eb78621bfe8f47b6a41652c68ed2e95d5cfc0883 is a hacked together way in which it might be used.

I made all the render_xxx... functions public. But just now thinking about it, it might not be necessary. The normal format_node handles default behavior. Even if I wanted to override a particular node, say NodeValue::BlockQuote to tweak a value in that node, and then want the default behavior, I could still call format_node rather than render_block_quote directly.

So maybe I wouldn't need to make all those public.

This relates to the discussion we had on this topic.

I admit I posed the question to an AI and it lead me to this

is there a way to override a function in a rust crate?

In Rust, there isn't a direct way to "override" a function in a crate as you might in object-oriented languages. However, there are several approaches you can use to achieve similar results:

Trait Implementation: If the function is part of a trait, you can implement that trait for your own type and provide your own implementation of the function.

Wrapper Types: Create a new type that wraps the original type from the crate. Implement methods on your wrapper type that either add new functionality or call the wrapped type's methods.

Extension Traits: Define a new trait with the method you want to "override", and implement it for the type from the crate.

Macros: In some cases, you might be able to use macros to redefine or extend functionality, though this approach can be complex.

Fork the Crate: As a last resort, you could fork the crate and modify it directly. This is generally not recommended unless absolutely necessary.

Here's a simple example using an extension trait:

// Original crate
struct OriginalType;

impl OriginalType {
    fn method(&self) {
        println!("Original method");
    }
}

// Your code
trait ExtendedBehavior {
    fn method(&self);
}

impl ExtendedBehavior for OriginalType {
    fn method(&self) {
        println!("Extended method");
    }
}

fn main() {
    let obj = OriginalType;
    obj.method(); // Prints: "Original method"
    ExtendedBehavior::method(&obj); // Prints: "Extended method"
}

So I basically used a wrapper type in my hacked attempt.

kivikakk · 2025-02-15T07:11:19Z

I'm happy with making one or all the renderer functions public, tbh! But maybe better to just do the main one (so fewer changes are "breaking").

One part it's nice not to require a version that's too bleeding edge, so that larger organizations don't need to upgrade their rust version as quickly.

Yes, indeed; same goes for distro package maintainers.

I've gone ahead and added a commit to bump the MSRV to 1.65. It's a pretty minimal bump.

Works for me! :)

digitalmoksha · 2025-02-17T21:08:35Z

Ok, I know better than this - always keep PRs as small as possible. So this just has the refactoring of the render functions.

In another PR I will attempt to make HtmlFormatter#format_node overridable. In my current implementation, I don't like the fact that I basically have to copy the entire format function into the main program. Maybe we can leverage the adapter plugin mechanism we already have, by having an HTMLFormatNodeAdapter or something.

This allows us to use the `let...else` style syntax

kivikakk · 2025-02-18T00:46:54Z

src/html.rs

@@ -400,7 +400,7 @@ where
                        plain
                    } else {
                        stack.push((node, false, Phase::Post));
-                        self.format_node(node, true)?
+                        !self.format_node(node, true)?


This is an interesting change; can you elaborate on it?

(Also, to be clear, if it's a mistaken change, then the real mistake is that no test failure resulted from it! And if it isn't, we should add a regression test! Either way it's interesting.)

Ah sorry, I meant to comment on that change.

All the new render_ functions return a io::Result<bool>, and return an Ok(true) when the function is successful. The only render function that can return an Ok(false) is render_image. And that false needs to trigger some additional work, just like it did before.

Originally format_node was returning Ok(false) by default, which would set new_plain in format.

But with the render functions returning a true for success, then it made more sense to set a processing_complete boolean with value of the render functions, and return that. So now we had to negate the return value in the line above.

The tests do fail pretty spectacularly if the ! is removed.

Ah, got it; I misread the diff and thought the trues and falses hadn't in fact switched positions. I might change those to a binary enum that actually make clear what work they're doing.

kivikakk · 2025-02-18T00:48:38Z

In my current implementation, I don't like the fact that I basically have to copy the entire format function into the main program. Maybe we can leverage the adapter plugin mechanism we already have, by having an HTMLFormatNodeAdapter or something.

You may again wish to consider the approach I outlined in the discussion: #454 (reply in thread); with an exposed format_node, that could be the default case, and with the macro piecing together a HtmlFormatter (and indeed, could be used to create the actual default HtmlFormatter too). Lmk if you want me to try putting something like that together for you to look over.

digitalmoksha · 2025-02-18T03:04:43Z

Lmk if you want me to try putting something like that together for you to look over.

Yeah that would be great. I haven't done anything with macros in Rust, and didn't quite follow what you were suggesting at the time.

kivikakk · 2025-02-18T05:10:34Z

Will do in a bit, then.

kivikakk · 2025-02-18T11:31:44Z

I have pushed an example of what I mean here: https://github.com/kivikakk/comrak/compare/custom-html-formatter?w=1

See the added test for a sample use.

I'm extremely low on cycles and I basically hacked this together in the fastest way possible to demonstrate the idea. There's a lot of cleanup that can be done; I'm exposing a whole bunch of things here which it'd be nicer not to expose.

I am 100% sure this can be done cleaner, and I wouldn't want to merge this as-is — it's just to give you an idea of how a macro could help produce the outcome you want. Note how the baseline html::HtmlFormatter is created in the very last line of html.rs.

digitalmoksha · 2025-02-18T22:38:22Z

Oh awesome, thanks! I'll look it over in the next few days

digitalmoksha · 2025-02-21T22:13:05Z

Ok, I see that the basic component of it is generating the format_node function with the code supplied.

I guess one of the benefits of this is that it's inline - there's not another level of function as there would be with the adapter interface.

As you said, it would need to be scoped down significantly. Making the entire formatter into a macro somewhat feels like adding a lot of complexity to it. I tend to fall on the side of less meta-programming unless there is a huge win.

I did try to integrate it into my crate to test it out, and I couldn't get it to compile, it always had a problem expanding the macro. I was able to very minimal macro working, so I know that using a macro works in general.

kivikakk · 2025-02-21T22:21:06Z

Well, the formatter isn't a macro: it's generated by a macro. Imagine the above code was actually merged as-is into Comrak itself; you could then create your formatter at compile time with create_formatter!, and then call it the same way HtmlFormatter is called.

This is the kind of thing I'm proposing: it would allow any Comrak user to use the supplied formatter, the same way they currently do, and the code they're actually using would be identical (the macro generates ~identical code as what we have currently in main), but they could also use the macro to create their own formatter type specialised for certain node types.

It doesn't need to be "scoped down", I don't think, just cleaned up; I mocked it up as quickly as possible and it's not quite merge-quality, but the general idea is sound.

digitalmoksha · 2025-02-21T22:32:28Z

But it basically inlines the entire formatter into your own code, right? If that's true, then it seems like whatever dependencies get added to the original formatter, now those need to get included where the custom formatter is defined.

As opposed to maybe still using a macro, but making it a small interface into the main formatter. I guess what I'm concerned about is if we make some type of change, such as struct like WriteWithLast, then that has to be exposed in the file that uses the custom formatter, rather than being able to keep it isolated.

I may be overthinking it though.

digitalmoksha · 2025-02-21T22:36:07Z

For example I feel like my custom formatter should only ever need to use crate::nodes::NodeMath if I'm only overriding that node. But if the full formatter is inlined, then I have to pull them all in. I think...

kivikakk · 2025-02-21T23:34:37Z

We can use $crate or whatever — this is the kind of thing I mean by cleaning it up. It should be pretty plug-and-play in the end. Same with WriteWithLast. The macro system is pretty expressive.

digitalmoksha · 2025-02-21T23:51:11Z

Alright, I’m sold, let’s do it! 🚀

kivikakk · 2025-02-21T23:56:05Z

I'll clean it up then :)

kivikakk · 2025-02-22T00:16:24Z

(It's already looking like 10000x better; moving all the default functions out into html::render_… so they don't get regenerated with every invocation. Will open a PR and cc you when done!)

digitalmoksha force-pushed the bw-refactor-html-output branch from 4930706 to 7336534 Compare February 17, 2025 20:36

digitalmoksha added 2 commits February 17, 2025 16:22

Refactor html output functions

bfcb52b

Bump MSRV to 1.65

545419e

This allows us to use the `let...else` style syntax

digitalmoksha force-pushed the bw-refactor-html-output branch from 7336534 to 545419e Compare February 17, 2025 22:22

digitalmoksha merged commit 532d6e7 into main Feb 17, 2025
38 checks passed

digitalmoksha deleted the bw-refactor-html-output branch February 17, 2025 22:27

kivikakk reviewed Feb 18, 2025

View reviewed changes

BrewTestBot mentioned this pull request Feb 28, 2025

comrak 0.36.0 Homebrew/homebrew-core#209228

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor html output functions #529

Refactor html output functions #529

digitalmoksha commented Feb 5, 2025

digitalmoksha commented Feb 5, 2025

github-actions bot commented Feb 5, 2025

digitalmoksha commented Feb 5, 2025

charlottia commented Feb 5, 2025

digitalmoksha commented Feb 5, 2025

digitalmoksha commented Feb 8, 2025

digitalmoksha commented Feb 13, 2025 •

edited

Loading

kivikakk commented Feb 15, 2025

digitalmoksha commented Feb 17, 2025

kivikakk Feb 18, 2025

kivikakk Feb 18, 2025

digitalmoksha Feb 18, 2025

digitalmoksha Feb 18, 2025

kivikakk Feb 18, 2025

kivikakk commented Feb 18, 2025

digitalmoksha commented Feb 18, 2025

kivikakk commented Feb 18, 2025

kivikakk commented Feb 18, 2025

digitalmoksha commented Feb 18, 2025

digitalmoksha commented Feb 21, 2025

kivikakk commented Feb 21, 2025

digitalmoksha commented Feb 21, 2025 •

edited

Loading

digitalmoksha commented Feb 21, 2025 •

edited

Loading

kivikakk commented Feb 21, 2025

digitalmoksha commented Feb 21, 2025

kivikakk commented Feb 21, 2025

kivikakk commented Feb 22, 2025

Refactor html output functions #529

Refactor html output functions #529

Conversation

digitalmoksha commented Feb 5, 2025

digitalmoksha commented Feb 5, 2025

github-actions bot commented Feb 5, 2025

digitalmoksha commented Feb 5, 2025

charlottia commented Feb 5, 2025

digitalmoksha commented Feb 5, 2025

digitalmoksha commented Feb 8, 2025

digitalmoksha commented Feb 13, 2025 • edited Loading

kivikakk commented Feb 15, 2025

digitalmoksha commented Feb 17, 2025

kivikakk Feb 18, 2025

Choose a reason for hiding this comment

kivikakk Feb 18, 2025

Choose a reason for hiding this comment

digitalmoksha Feb 18, 2025

Choose a reason for hiding this comment

digitalmoksha Feb 18, 2025

Choose a reason for hiding this comment

kivikakk Feb 18, 2025

Choose a reason for hiding this comment

kivikakk commented Feb 18, 2025

digitalmoksha commented Feb 18, 2025

kivikakk commented Feb 18, 2025

kivikakk commented Feb 18, 2025

digitalmoksha commented Feb 18, 2025

digitalmoksha commented Feb 21, 2025

kivikakk commented Feb 21, 2025

digitalmoksha commented Feb 21, 2025 • edited Loading

digitalmoksha commented Feb 21, 2025 • edited Loading

kivikakk commented Feb 21, 2025

digitalmoksha commented Feb 21, 2025

kivikakk commented Feb 21, 2025

kivikakk commented Feb 22, 2025

digitalmoksha commented Feb 13, 2025 •

edited

Loading

digitalmoksha commented Feb 21, 2025 •

edited

Loading

digitalmoksha commented Feb 21, 2025 •

edited

Loading