-
-
Notifications
You must be signed in to change notification settings - Fork 421
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parsing an expression containing a huge list of token triggers a stack overflow #162
Comments
Thanks @evomassiny I’m not sure how the refactor would work right now but it’s something to think about. I’m open to ideas. Stack machine sounds about right. |
Hi, I found a way to build a parser without fonction recursion, by using an intermediate representation of the Abstract Syntax Tree in Polish Notation, and build a AST from it, in a second pass. The benefit of the polish notation is its "flatness", we can represent the whole AST into a vector of expression, which make it easier to work with, specialy using a stack machine. the main parsing algorithm would be something like that;
The hard part is the implementation of the function that match expressions patterns (basically regex but for tokens). To convince myself that it could be done, I implemented a parser for a subset of the javascript langage in this repo: https://github.com/evomassiny/toylang, I hope this helps, but to be honest I don't have much knowledge of AST's parsers, there might be easier solutions. |
Also look into https://en.wikipedia.org/wiki/Shunting-yard_algorithm |
thanks @simonbuchan, that is what my current WIP implementation is based upon |
@simonbuchan thats a great algorithm! |
Wrong @simon :P |
Adding benchmark to keep track of expression parsing jasonwilliams#226 |
What algorithm does the parser currently use? I've not had the stack overflow problem with Pratt parsing. I even tried plugging the text in from #162 into my Pratt expression parser, and it parsed very quickly (and this is written in TypeScript: imagine the gains if it was in Rust). The nice thing about Pratt parsing is that it integrates very naturally with recursive decent parsing. |
@jrop Pratt parsers are very similar to the shunting yard algorithm: both are based on attaching precedence and associativity to infix operators. The main difference seems to be that Pratt parsers store nested precedence levels on the call stack, while shunting yard handles that with an explicit stack. If you have a sensible number of precedence levels, Pratt parsing won't hit the stack limit, but it's theoretically a bit slower due to the theoretically higher number of calls: in practice you won't notice a difference. Really the difference is that there's much better documentation on how to implement Shunting Yard, Pratt parsers tend to use weird terms like "null denominator". |
This bug still occurs - here are a few examples. 1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1 [[[[[[[[[[[[[[[[[[[]]]]]]]]]]]]]]]]]]] Both of those panic in debug mode/the website WASM demo. You may need to increase the size of the expressions to cause a panic in release mode. |
I will re-open this issue, since I think we are back again doing recursive parsing since we did our new parser. Maybe @jasonwilliams can confirm or give more insights. |
The original description doesn't overflow anymore as there's been some optimzations but this example does still cause the issue: |
i think all thats left here is to add a test on the parser to make sure this doesn't happen again |
discussion is happening at #4089 (comment) |
Hello,
I've stumbled across a weird bug in the javascript parser:
when a single expression uses a lot of token (>120 in debug mode, and >3400 in release mode),
the Parser::parse() triggers a stack overflow.
Here is a minimal way to reproduce the bug:
My guess is that the function recursively calls itself to process each right hand of an addition,
which slowly increase the size of the stack until it eventually overflows.
The only solution that I can think of, is to re-implement the parser into a stack machine, which would basically turns the recursive calls into a while loop (but this might represents a tons of work :S ).
Thank you for your time,
evomassiny
The text was updated successfully, but these errors were encountered: