Error "incompatible character encodings: UTF-8 and ASCII-8BIT" when combined with a rails app #9
Comments
Amazing write up, thank you. I'll take a look at this within the day. There might need to be a forced UTF-8 encoding. |
I have bad news and good news. The bad news is, I cannot get the exception to throw. I started a new rails project, jumped into console, and tried to see what would happen if I passed the same data:
The "good" news is that there's definitely something weird going on with those escape codes. I would expect I wonder if this is specific to
But that seems wrong/unfair/not the responsibility of the consumer. |
To finish my thought: probably this library should do the |
You nailed it! It works. Thanks. |
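For readers following along, the forced-encoding workaround discussed above can be sketched in plain Ruby, without commonmarker. The strings here are made up for illustration: `utf8` stands in for template text, and `binary` stands in for a string that comes back tagged as ASCII-8BIT (an assumption about where the mismatch originates, not something confirmed in this thread).

```ruby
# A UTF-8 string containing a non-ASCII character, like text in a Rails view.
utf8 = "caf\u00e9: "

# The same kind of content, but tagged ASCII-8BIT (binary). String#b returns
# a binary-tagged copy; this stands in for output with the wrong encoding tag.
binary = "https://world.com\u200b".b

# Joining two non-ASCII strings with incompatible encodings raises.
error = begin
  utf8 + binary
  nil
rescue Encoding::CompatibilityError => e
  e
end

# force_encoding only relabels the bytes as UTF-8; it does not transcode them,
# so it is worth checking that the result is actually valid UTF-8.
fixed = binary.dup.force_encoding(Encoding::UTF_8)
combined = utf8 + fixed

puts error.class
puts combined.encoding
puts fixed.valid_encoding?
```

Note that the error only appears when both strings contain non-ASCII bytes, which may be why it is hard to trigger with simple test input.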
BTW... I'm using slim. Maybe it's related? |
@gjtorikian we're seeing this same issue. Is there a way to traverse all nodes and convert each to utf8? Any pointers you can provide would be greatly appreciated! |
Which version of commonmarker are you using? |
@gjtorikian We're on version 0.23.4 |
So you can absolutely walk the AST tree: https://github.com/gjtorikian/commonmarker#example-walking-the-ast But that's very slow/time-consuming, and ideally shouldn't be necessary. Are you able to share your markdown doc or create a small (failing) test to show the error? |
@gjtorikian thank you for your response. I'm trying to paste a minimal example but it appears Github's editor is stripping out the problematic character from the following:
You may need to insert the missing 0x200b character locally so as to achieve: We solved this problem with:
but it would be great if commonmarker gave us an option to treat the whole document's tree as utf-8, so we don't need to force all encodings. Would that be feasible? |
Yes, it should be. I agree that forcing the encoding is not ideal! |
@duhaime Hm. One thing that's different here is that when I run your code, with the encoded character placed, my tree doesn't recognize any link nodes at all:
Could you change your sample code to
And list the walked nodes as I've done here? |
Hmm, the plot thickens! I get:
And:
Why would these results look so different? I'm using the Rails console instead of irb above--is that relevant? |
Ah, I think I misread your example. The string is literally |
Oh no, sorry, it should be exactly as it appears in the image above (the latter in your comment above). |
Strange! What version of Ruby do you have running? |
2.6.8 via rbenv:
|
I simply can't reproduce this. Neither can CI, running Ruby 2.6.6 on Windows/Ubuntu/macOS. I booted a Rails 7 app to test the logic in the console, and it worked fine, too. Just to be extra explicit, this is the code I'm using to test:
A couple of things to note:
I'm afraid without more information I'm not sure what I can do to solve this. |
Ah I think your example just needs to be updated. Your snippet has:
In this case, your string literally contains the characters in the Unicode character that's causing the issue. I think we just need to update the string you're using. As it turns out, the string I posted initially ( You can see the codepoints of the string if you use Does this help you reproduce the situation? |
Got it. In ruby the convention is to use
irb(main):006:0> str = "hello: <https://world.com\u200b>"
=> "hello: <https://world.com >"
I can now reproduce the problem; now we're getting somewhere. |
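The mix-up above, between a string holding the literal escape text and one holding the actual zero-width space, can be checked in plain Ruby by looking at codepoints. This is a sketch only: the single-quoted form is one assumed way to end up with the literal characters, and may not be exactly what the earlier snippet contained.

```ruby
# Single quotes: Ruby keeps the six literal characters \ u 2 0 0 b.
escaped = 'hello: <https://world.com\u200b>'

# Double quotes: Ruby interprets the escape as one U+200B codepoint.
actual = "hello: <https://world.com\u200b>"

puts escaped.include?("\u200b")      # false: no zero-width space present
puts actual.include?("\u200b")       # true
puts escaped.length - actual.length  # 5: six literal chars vs one codepoint

# Print the last two codepoints to make the invisible character visible.
p actual.codepoints.last(2).map { |cp| format("U+%04X", cp) }
# => ["U+200B", "U+003E"]
```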
Oh, and what's the code snippet for how you're rendering the string? |
Yes, or |
@duhaime Can you try pointing the gem to the |
Hmm, the change looks good but I'm still getting the same error. This must be user error. Here's what I'm doing:
gem uninstall commonmarker
git clone https://github.com/gjtorikian/commonmarker
cd commonmarker && gem build commonmarker.gemspec
gem install commonmarker-0.23.4.gem
irb
Then in the irb console:
require 'commonmarker'
s = "hello: <https://world.com>"
doc = CommonMarker.render_doc(s, :DEFAULT)
parsed = ""
doc.walk do |node|
  if node.type == :link
    text_node = node
    text_node = text_node.first_child until [:text, :code].include? text_node.type
    if node.url.include?(text_node.string_content)
      puts(node.url)
    end
  end
end
Which throws:
Should I be doing something differently to test? |
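Until the library handles this itself, a consumer-side guard for the `include?` comparison in the snippet above might look like the sketch below. It assumes the failure is the encoding-compatibility error from the issue title and that one of the two strings arrives tagged as ASCII-8BIT; which side that is has not been established here. `utf8_include?` is a hypothetical helper, not part of commonmarker.

```ruby
# Hypothetical helper: relabel both strings as UTF-8 before comparing.
# dup avoids mutating the caller's strings.
def utf8_include?(haystack, needle)
  to_utf8 = ->(s) { s.dup.force_encoding(Encoding::UTF_8) }
  to_utf8.call(haystack).include?(to_utf8.call(needle))
end

# Stand-ins for node.url and text_node.string_content: identical bytes,
# but the second is tagged ASCII-8BIT (an assumption for illustration).
url  = "https://world.com\u200b"
text = "https://world.com\u200b".b

# The direct comparison raises because both strings hold non-ASCII bytes
# under incompatible encodings.
raised = begin
  url.include?(text)
  false
rescue Encoding::CompatibilityError
  true
end

puts raised
puts utf8_include?(url, text)
```

This only relabels bytes, so it is safe only when the binary string really does hold UTF-8 data.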
With the repo cloned, try:
|
Interesting, I ran those steps on a fresh rbenv env, and I still get the same result. Do you get a different result with the code block I posted above? |
Oh shoot, I do. Ok. I'll make time for this today. |
Due to #186, walking over nodes has been removed in v1.0.0. Users can use https://github.com/gjtorikian/html-pipeline if they wish to iterate over HTML after the fact. |
I think this might not be a `commonmarker` problem, BUT the error is not raised when using `pandoc-ruby` nor `redcarpet`, so it has something to do with `commonmarker`.
Here you can see a test run from the command line with both `cmark` and `commonmarker` and there's no problem:
That said, I'm testing different markdown parsers/renderers for our rails 4.1.12 (ruby 2.2.2) based app and I'm getting the following error:
I have these helpers:
Changing the call to `commonmarker_markdown` to either `pandoc_markdown` or `redcarpet_markdown` renders the expected result with no errors.
It's not a DB (postgresql) encoding problem either as hardcoding the test phrase in place of the `text` variable (no DB involved) causes the same problem.
Any ideas about what could be happening?