Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use footnote name for reference id #300

Merged
merged 6 commits into from
May 19, 2023

Conversation

digitalmoksha
Copy link
Collaborator

For example

one[^zz]
[^zz]: two

should generate

<p>one<sup class="footnote-ref"><a href="#fn-zz" id="fnref-zz" data-footnote-ref>1</a></sup></p>
<section class="footnotes" data-footnotes>
<ol>
<li id="fn-zz">
<p>two <a href="#fnref-zz" class="footnote-backref" data-footnote-backref data-footnote-backref-idx="1" aria-label="Back to reference 1">↩</a></p>
</li>
</ol>
</section>

@kivikakk kivikakk force-pushed the bw-fix-footnote-numbering branch from 46a89b7 to 358de5d Compare May 4, 2023 02:23
@kivikakk
Copy link
Owner

kivikakk commented May 4, 2023

Thanks so much for this! I've adjusted the AST format to use a struct rather than using a tuple. That's also fixed the example build failure in 46a89b7. There's still a number of CI tests failing -- would you be able to adjust them?

@digitalmoksha
Copy link
Collaborator Author

Oh cool, great idea on the struct. Yes I can update the tests.

Still a couple more things we need to solve. Percent encoding of footnote names, and detection of footnote names that begin with _ (and probably other chars).

So this

first[^1] and second[^😄second] and twenty[^_twenty]

[^1]: one
[^😄second]: two
[^_twenty]: twenty

should yield this

<p>first<sup class="footnote-ref"><a href="#fn-1" id="fnref-1" data-footnote-ref>1</a></sup> and second<sup class="footnote-ref"><a href="#fn-%F0%9F%98%84second" id="fnref-%F0%9F%98%84second" data-footnote-ref>2</a></sup> and twenty<sup class="footnote-ref"><a href="#fn-_twenty" id="fnref-_twenty" data-footnote-ref>3</a></sup></p>
<section class="footnotes" data-footnotes>
<ol>
<li id="fn-1">
<p>one <a href="#fnref-1" class="footnote-backref" data-footnote-backref data-footnote-backref-idx="1" aria-label="Back to reference 1">↩</a></p>
</li>
<li id="fn-%F0%9F%98%84second">
<p>two <a href="#fnref-%F0%9F%98%84second" class="footnote-backref" data-footnote-backref data-footnote-backref-idx="2" aria-label="Back to reference 2">↩</a></p>
</li>
<li id="fn-_twenty">
<p>twenty <a href="#fnref-_twenty" class="footnote-backref" data-footnote-backref data-footnote-backref-idx="3" aria-label="Back to reference 3">↩</a></p>
</li>
</ol>
</section>

according to cmark-gfm.

Not sure if I'll get a chance tomorrow, but hopefully over the next couple of days.

@digitalmoksha
Copy link
Collaborator Author

@kivikakk wondering if I could ask a question, I'm still a newbie with Rust.

I notice in cm.rs that in methods such as fn format_text(&mut self, literal: &[u8] or fn format_code(&mut self, literal: &[u8], you're passing in a &[u8] for literal.

Why not use a &String (or String - I'm still figuring out when to reference and not). Strings are UTF8 encoded, and it doesn't seem necessary to be operating at the byte level. It also makes debugging a little more challenging, as inspecting the variables shows a list of bytes, rather than a string of characters.

I need to change format_footnote_definition to pass in the name, and I'm wondering why fn format_footnote_definition(&mut self, name: &String would not be the way to do it. 🤔

@digitalmoksha
Copy link
Collaborator Author

I see that in outc you are checking for specific characters, such as [, but I'm not sure why you're checking at the byte level rather than the character level.

@digitalmoksha
Copy link
Collaborator Author

vendor/cmark-gfm/test/extensions.txt needs to get updated in order for the footnote tests to start passing. I still need to add support for handling multiple references to the same footnote, and encoding.

Would it be worth updating to the latest cmark-gfm in a separate PR, which can focus on any general fixes that might be needed? I'll continue to work on getting tests to pass with the local copy of extensions.txt.

@digitalmoksha
Copy link
Collaborator Author

@kivikakk I like to keep PRs as small and focused as possible. I'll create another PR to handle encoding the footnote name, and also a capitalization issue.

So I think this is ready for a review if you have time. It passes the latest footnote spec from upstream, except for encoding the name.

@kivikakk
Copy link
Owner

Why not use a &String (or String - I'm still figuring out when to reference and not). Strings are UTF8 encoded, and it doesn't seem necessary to be operating at the byte level. It also makes debugging a little more challenging, as inspecting the variables shows a list of bytes, rather than a string of characters.

It's usually one of two reasons in Comrak: performance, or to match upstream behaviour where it does checks at a byte level instead of character. (Fwiw, you'll either want String if you're taking ownership of the copy, or &str — very rarely is &String wanted.) It varies from place to place, but in ~95% of cases the choice is intentional.

In the case of cm.rs, we never need to consider any character that would be multibyte in UTF-8, and we need to index buffers in a few inner loops. Getting the nth character (including e.g. the 'last' character) is O(n), and it would be for no gain here.

@kivikakk
Copy link
Owner

So I think this is ready for a review if you have time.

To be honest, I don't have time for as thorough a review as I'd like, and that seems unlikely to change in the short- to medium-term future. But I've looked through, and it seems sound. I'll run fuzzing on it for a while after to be sure, but I'm happy with these changes — thank you!

@kivikakk kivikakk enabled auto-merge May 19, 2023 01:58
@kivikakk kivikakk disabled auto-merge May 19, 2023 02:05
@kivikakk kivikakk merged commit 127da5c into kivikakk:main May 19, 2023
@digitalmoksha
Copy link
Collaborator Author

To be honest, I don't have time for as thorough a review as I'd like, and that seems unlikely to change in the short- to medium-term future

Completely understand! Thanks for getting this merged 🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants