Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Node class in C, fix memory management #8

Merged
merged 5 commits into from
Jun 26, 2015
Merged

Implement Node class in C, fix memory management #8

merged 5 commits into from
Jun 26, 2015

Conversation

nwellnhof
Copy link
Contributor

See the commits for comments.

This prepares for the next commit.
Most methods of Node are implemented directly in C avoiding the
overhead of a Ruby wrapper. The wrappers mostly added error handling
which is now done in the C functions.
This commit makes sure that cmark nodes are never freed if they're still
referenced from Ruby. Take the following code example:

    doc = Node.parse_string("Hi *there*")
    text = doc.first_child.last_child.first_child
    doc = nil
    GC.start

It must be assured that `doc` isn't freed after a GC run because the
document is still referenced by `text`.

This is solved by implementing a "mark" function that is run during Ruby's
GC and marks the parent node of every node that is still alive. Marking
the parent node will propagate up to the root of the tree.

In order to get the Ruby VALUE of the parent node, a 1:1 mapping between
cmark_nodes and Ruby objects is enforced. Upon creation, Ruby objects are
stored in the user data field of a cmark_node. If the cmark_node is
passed to Ruby again, the cached value is used. This also improves
performance if a node is accessed from Ruby multiple times.

If Ruby objects are cached, it must be made sure that they're not freed
by Ruby's GC as long as the cmark_node is still alive. For this reason,
all children of a node are also marked during GC. This essentially means
that all Ruby objects referencing any node of a tree are kept alive
until the whole tree is destroyed.

Last but not least, the finalizer must only be run on root nodes that can
be freed safely. This is achieved by updating the finalizer whenever a
node's parent is changed. The previous check for document nodes in
`rb_free_c_struct` was unsafe because it might access a child node's data
that was already freed (and it failed to free root nodes that aren't
documents).
Node.new now accepts a symbol.
@gjtorikian
Copy link
Owner

Wow. Holy cow. This branch name is the understatement of the year.

My one complaint is that this is lacking Check_Type for some methods. But I can absolutely do that on my own.

Thank you so much for this contribution!

gjtorikian added a commit that referenced this pull request Jun 26, 2015
Implement Node class in C, fix memory management
@gjtorikian gjtorikian merged commit 3b2dba7 into gjtorikian:master Jun 26, 2015
@gjtorikian gjtorikian mentioned this pull request Jun 26, 2015
@nwellnhof
Copy link
Contributor Author

Calling StringValueCStr on a non-string will stringify the value automatically. So Check_Type shouldn't be needed unless you really want to force users to pass a string.

@gjtorikian
Copy link
Owner

When it comes down to jumping into C, I don't trust anyine passing in anything. 😺

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants