Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split single-line and multi-line matching #53

Merged
merged 2 commits into from
Nov 6, 2022

Conversation

bnjbvr
Copy link
Owner

@bnjbvr bnjbvr commented Nov 3, 2022

This adds two test cases that were failing before the accompanying change.

Basically: regular expressions are hard.

A single multi-line regular expression was used, but some of its subsets would have weird behaviors, especially if the preceding line was a comment. The sink would receive a multi-line match that started with a comment, and thus incorrectly classified the whole match as a comment, while the second line was, in fact, not a comment, so that would result in a false positive.

Splitting the regular expression in one that operates on single-line and another one that operates on multiple-lines has a few benefits:

  • the grep crates have a few optimizations when the searcher and matcher are line-based
  • it's likely that we rarely have to use the multiline matcher, so most of the time the line-based search should suffice
  • code maintenance is easier, as I find it easier to reason about the line matchers separately from the multi-line monstrosity

@bnjbvr bnjbvr force-pushed the fix-multiline-starting-with-comment branch from 442f53b to 34bf4eb Compare November 6, 2022 22:50
@bnjbvr bnjbvr merged commit ac5a923 into main Nov 6, 2022
@bnjbvr bnjbvr deleted the fix-multiline-starting-with-comment branch November 6, 2022 22:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant