Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor html output functions #529

Merged
merged 2 commits into from
Feb 17, 2025
Merged

Refactor html output functions #529

merged 2 commits into from
Feb 17, 2025

Conversation

digitalmoksha
Copy link
Collaborator

Refactored the html formatter to handle each node via a separate functions.

@digitalmoksha
Copy link
Collaborator Author

Went a little nuts today 😬

Copy link
Contributor

github-actions bot commented Feb 5, 2025

Command Mean [ms] Min [ms] Max [ms] Relative
./bench.sh ./comrak-8c939a6 315.5 ± 1.5 313.1 318.5 2.93 ± 0.02
./bench.sh ./comrak-main 316.7 ± 6.3 312.1 346.0 2.95 ± 0.06
./bench.sh ./pulldown-cmark 107.5 ± 0.5 106.2 108.7 1.00
./bench.sh ./cmark-gfm 114.1 ± 1.1 112.4 116.6 1.06 ± 0.01
./bench.sh ./markdown-it 482.1 ± 4.8 476.2 497.2 4.48 ± 0.05

Run on Wed Feb 5 01:40:02 UTC 2025

@digitalmoksha
Copy link
Collaborator Author

let ... else is only stable in 1.65, and I forgot I was running 1.73 locally.

I was trying to keep the function signature the same (always the same args) for all the render_ functions. But maybe there is no real benefit to that, and I'm making this more difficult than it needs to be. It made sense in my head yesterday 🤷

@charlottia
Copy link
Collaborator

let ... else is only stable in 1.65, and I forgot I was running 1.73 locally.

D'oh! Maybe we should think about an MSRV bumping policy? Right now we're sitting at 1.62.1 maybe forever, whereas some people out there are using N-2 (!).

I was trying to keep the function signature the same (always the same args) for all the render_ functions.

The advantage that sticks out in my head is they become more amenable to generated (macro) code; we can't inspect the target function signatures at runtime, but if they all share a signature it'd be possible to do some kind of programmatically generated dispatch!

@digitalmoksha
Copy link
Collaborator Author

The advantage that sticks out in my head is they become more amenable to generated (macro) code; we can't inspect the target function signatures at runtime, but if they all share a signature it'd be possible to do some kind of programmatically generated dispatch!

That's kind of what I was thinking, maybe setting us up for that possibility.

Maybe we should think about an MSRV bumping policy? Right now we're sitting at 1.62.1 maybe forever, whereas some people out there are using N-2 (!).

I think you're right. I'm not sure what exactly the policy should be. One part it's nice not to require a version that's too bleeding edge, so that larger organizations don't need to upgrade their rust version as quickly. For example GitLab is still on 1.73.

But we also don't want to completely limit ourselves on the ability to use certain features. Maybe it makes sense, for now, to evaluate when we run into something like the let...else and consider it, if it doesn't push us too far forward. In this case we would need to step up only to 1.65. But we could also consider going to 1.70 or 1.73 🤔

@digitalmoksha
Copy link
Collaborator Author

I've gone ahead and added a commit to bump the MSRV to 1.65. It's a pretty minimal bump.

I could rewrite the code to not use the let...else syntax, but it really kind of bloats it.

wdyt? 🤔

@digitalmoksha
Copy link
Collaborator Author

digitalmoksha commented Feb 13, 2025

Hey @kivikakk. So I've made the HtmlFormatter have a public interface. The purpose of this is to provide a way to override certain node formatters while being able to use the default versions.

https://gitlab.com/gitlab-org/ruby/gems/gitlab-glfm-markdown/-/merge_requests/76/diffs?commit_id=eb78621bfe8f47b6a41652c68ed2e95d5cfc0883 is a hacked together way in which it might be used.

I made all the render_xxx... functions public. But just now thinking about it, it might not be necessary. The normal format_node handles default behavior. Even if I wanted to override a particular node, say NodeValue::BlockQuote to tweak a value in that node, and then want the default behavior, I could still call format_node rather than render_block_quote directly.

So maybe I wouldn't need to make all those public.

This relates to the discussion we had on this topic.

I admit I posed the question to an AI and it lead me to this

is there a way to override a function in a rust crate?

In Rust, there isn't a direct way to "override" a function in a crate as you might in object-oriented languages. However, there are several approaches you can use to achieve similar results:

  1. Trait Implementation: If the function is part of a trait, you can implement that trait for your own type and provide your own implementation of the function.

  2. Wrapper Types: Create a new type that wraps the original type from the crate. Implement methods on your wrapper type that either add new functionality or call the wrapped type's methods.

  3. Extension Traits: Define a new trait with the method you want to "override", and implement it for the type from the crate.

  4. Macros: In some cases, you might be able to use macros to redefine or extend functionality, though this approach can be complex.

  5. Fork the Crate: As a last resort, you could fork the crate and modify it directly. This is generally not recommended unless absolutely necessary.

Here's a simple example using an extension trait:

// Original crate
struct OriginalType;

impl OriginalType {
    fn method(&self) {
        println!("Original method");
    }
}

// Your code
trait ExtendedBehavior {
    fn method(&self);
}

impl ExtendedBehavior for OriginalType {
    fn method(&self) {
        println!("Extended method");
    }
}

fn main() {
    let obj = OriginalType;
    obj.method(); // Prints: "Original method"
    ExtendedBehavior::method(&obj); // Prints: "Extended method"
}

So I basically used a wrapper type in my hacked attempt.

@kivikakk
Copy link
Owner

I'm happy with making one or all the renderer functions public, tbh! But maybe better to just do the main one (so fewer changes are "breaking").

One part it's nice not to require a version that's too bleeding edge, so that larger organizations don't need to upgrade their rust version as quickly.

Yes, indeed; same goes for distro package maintainers.

I've gone ahead and added a commit to bump the MSRV to 1.65. It's a pretty minimal bump.

Works for me! :)

@digitalmoksha digitalmoksha force-pushed the bw-refactor-html-output branch from 4930706 to 7336534 Compare February 17, 2025 20:36
@digitalmoksha
Copy link
Collaborator Author

Ok, I know better than this - always keep PRs as small as possible. So this just has the refactoring of the render functions.

In another PR I will attempt to make HtmlFormatter#format_node overridable. In my current implementation, I don't like the fact that I basically have to copy the entire format function into the main program. Maybe we can leverage the adapter plugin mechanism we already have, by having an HTMLFormatNodeAdapter or something.

This allows us to use the `let...else` style syntax
@digitalmoksha digitalmoksha force-pushed the bw-refactor-html-output branch from 7336534 to 545419e Compare February 17, 2025 22:22
@digitalmoksha digitalmoksha merged commit 532d6e7 into main Feb 17, 2025
38 checks passed
@digitalmoksha digitalmoksha deleted the bw-refactor-html-output branch February 17, 2025 22:27
@@ -400,7 +400,7 @@ where
plain
} else {
stack.push((node, false, Phase::Post));
self.format_node(node, true)?
!self.format_node(node, true)?
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an interesting change; can you elaborate on it?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Also, to be clear, if it's a mistaken change, then the real mistake is that no test failure resulted from it! And if it isn't, we should add a regression test! Either way it's interesting.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry, I meant to comment on that change.

All the new render_ functions return a io::Result<bool>, and return an Ok(true) when the function is successful. The only render function that can return an Ok(false) is render_image. And that false needs to trigger some additional work, just like it did before.

Originally format_node was returning Ok(false) by default, which would set new_plain in format.

But with the render functions returning a true for success, then it made more sense to set a processing_complete boolean with value of the render functions, and return that. So now we had to negate the return value in the line above.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests do fail pretty spectacularly if the ! is removed.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, got it; I misread the diff and thought the trues and falses hadn't in fact switched positions. I might change those to a binary enum that actually make clear what work they're doing.

@kivikakk
Copy link
Owner

In my current implementation, I don't like the fact that I basically have to copy the entire format function into the main program. Maybe we can leverage the adapter plugin mechanism we already have, by having an HTMLFormatNodeAdapter or something.

You may again wish to consider the approach I outlined in the discussion: #454 (reply in thread); with an exposed format_node, that could be the default case, and with the macro piecing together a HtmlFormatter (and indeed, could be used to create the actual default HtmlFormatter too). Lmk if you want me to try putting something like that together for you to look over.

@digitalmoksha
Copy link
Collaborator Author

Lmk if you want me to try putting something like that together for you to look over.

Yeah that would be great. I haven't done anything with macros in Rust, and didn't quite follow what you were suggesting at the time.

@kivikakk
Copy link
Owner

Will do in a bit, then.

@kivikakk
Copy link
Owner

I have pushed an example of what I mean here: https://github.com/kivikakk/comrak/compare/custom-html-formatter?w=1

See the added test for a sample use.

I'm extremely low on cycles and I basically hacked this together in the fastest way possible to demonstrate the idea. There's a lot of cleanup that can be done; I'm exposing a whole bunch of things here which it'd be nicer not to expose.

I am 100% sure this can be done cleaner, and I wouldn't want to merge this as-is — it's just to give you an idea of how a macro could help produce the outcome you want. Note how the baseline html::HtmlFormatter is created in the very last line of html.rs.

@digitalmoksha
Copy link
Collaborator Author

Oh awesome, thanks! I'll look it over in the next few days

@digitalmoksha
Copy link
Collaborator Author

Ok, I see that the basic component of it is generating the format_node function with the code supplied.

I guess one of the benefits of this is that it's inline - there's not another level of function as there would be with the adapter interface.

As you said, it would need to be scoped down significantly. Making the entire formatter into a macro somewhat feels like adding a lot of complexity to it. I tend to fall on the side of less meta-programming unless there is a huge win.

I did try to integrate it into my crate to test it out, and I couldn't get it to compile, it always had a problem expanding the macro. I was able to very minimal macro working, so I know that using a macro works in general.

@kivikakk
Copy link
Owner

Well, the formatter isn't a macro: it's generated by a macro. Imagine the above code was actually merged as-is into Comrak itself; you could then create your formatter at compile time with create_formatter!, and then call it the same way HtmlFormatter is called.

This is the kind of thing I'm proposing: it would allow any Comrak user to use the supplied formatter, the same way they currently do, and the code they're actually using would be identical (the macro generates ~identical code as what we have currently in main), but they could also use the macro to create their own formatter type specialised for certain node types.

It doesn't need to be "scoped down", I don't think, just cleaned up; I mocked it up as quickly as possible and it's not quite merge-quality, but the general idea is sound.

@digitalmoksha
Copy link
Collaborator Author

digitalmoksha commented Feb 21, 2025

But it basically inlines the entire formatter into your own code, right? If that's true, then it seems like whatever dependencies get added to the original formatter, now those need to get included where the custom formatter is defined.

As opposed to maybe still using a macro, but making it a small interface into the main formatter. I guess what I'm concerned about is if we make some type of change, such as struct like WriteWithLast, then that has to be exposed in the file that uses the custom formatter, rather than being able to keep it isolated.

I may be overthinking it though.

@digitalmoksha
Copy link
Collaborator Author

digitalmoksha commented Feb 21, 2025

For example I feel like my custom formatter should only ever need to use crate::nodes::NodeMath if I'm only overriding that node. But if the full formatter is inlined, then I have to pull them all in. I think...

@kivikakk
Copy link
Owner

We can use $crate or whatever — this is the kind of thing I mean by cleaning it up. It should be pretty plug-and-play in the end. Same with WriteWithLast. The macro system is pretty expressive.

@digitalmoksha
Copy link
Collaborator Author

Alright, I’m sold, let’s do it! 🚀

@kivikakk
Copy link
Owner

I'll clean it up then :)

@kivikakk
Copy link
Owner

(It's already looking like 10000x better; moving all the default functions out into html::render_… so they don't get regenerated with every invocation. Will open a PR and cc you when done!)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants