
Safepoints #18

Closed
ghost opened this issue Oct 15, 2017 · 10 comments

Comments

@ghost

ghost commented Oct 15, 2017

The problem is that in some special situations pinning might be prohibitively slow. For example, suppose you have the following code:

for _ in 0..ITERS {
    do_something_that_pins();
}

If do_something_that_pins performs a quick operation and pins the current thread every time, then the cost of pinning might take a large percentage of the total running time. A way to alleviate the problem would be to enclose the whole loop with epoch::pin:

epoch::pin(|scope| {
    for _ in 0..ITERS {
        do_something_that_pins();
    }
})

But the new problem is that the GC will not be able to make any progress during the whole loop, which might take a long time! Let's try fixing that as well:

for _ in 0..ITERS / 50 { // Assuming `ITERS` is divisible by 50.
    epoch::pin(|scope| {
        for _ in 0..50 {
            do_something_that_pins();
        }
    })
}

Okay, that's a bit better, but ugly, error-prone and cumbersome to write.

I'm proposing to introduce safepoints. The idea is to add a new method Scope::safepoint and change epoch::pin so that it takes a closure that accepts a &mut Scope instead of &Scope:

impl Handle {
    pub fn pin<F, R>(&self, f: F) -> R
    where
        F: FnOnce(&mut Scope) -> R,
    {
        // ...
    }
}

impl Scope {
    pub fn safepoint(&mut self) {
        // ...
    }
}

Now we can change the original code like this:

epoch::pin(|scope| {
    for _ in 0..ITERS {
        scope.safepoint();
        do_something_that_pins();
    }
})

Method safepoint increments an internal counter and every, say, 50th step simply unpins and re-pins the current thread. But, of course, only if this scope is the outermost one (the one that actually pinned the thread). This way we have essentially implemented something very much like QSBR (quiescent-state based reclamation).

Note that the safepoint method is pretty fast. We can implement it like this:

fn safepoint(&mut self) {
    if self.is_outermost() && self.increment() {
        let local: &Local = ...;
        let global: &Global = ...;
        let e = global.epoch.load(Relaxed);
        if local.epoch.swap(e, Relaxed) != e {
            fence(SeqCst);
        }
    }
}

Maybe we don't even need to count steps to occasionally re-pin (the && self.increment() part). Perhaps it's just fine to re-pin on every call.

The reason why the method is called safepoint is because it reminds me of safepoints in garbage-collected languages (e.g. see this and this).

cc @hniksic

@schets
Member

schets commented Oct 15, 2017

Nice! I tried something like this a while back and it worked well, but I didn't think of the &mut API to prevent epoch-advancement issues. This seems like a pretty simple branch to get right with the new scope guard; unless others here are really worried about misuse, I think we should put it in for the upcoming release I saw discussion of.

Are we sure that the relaxed store is ok? My prior says no since safepoint is basically a clean wrapper over re-pinning. If we take some worst-case scenario where safepoint advancements to cur_epoch+2 are ordered before loads from cur_epoch, wouldn't that expose the code to a situation where the collector could free memory while the thread read it? I suspect a release store (not fence!) would work and remain cheaper (free on x86) than a seq_cst fence.

Finally, a simple addition that would be really nice is to have a safepoint_weak api which advances the counter but doesn't safepoint. This would be ideal for something reacting to input quickly, like:

loop {
    if poll_more_data() {
        process_data();
        // This is a trite use of `if`, but one could imagine more complex
        // conditions leading to some boolean.
        safepoint_weak(false);
    } else {
        safepoint();
    }
}

or maybe more generally, a safepoint_at(num_call) which safepoints if the counter is greater than num_call.

@ghost
Author

ghost commented Oct 15, 2017

Are we sure that the relaxed store is ok?

Nope, you're absolutely right. I've edited the comment - can you take a look again? It should be enough to simply execute a fence if the epoch changed.

or maybe more generally, a safepoint_at(num_call) which safepoints if the counter is greater than num_call.

Easy and configurable. I like it. :)

@schets
Member

schets commented Oct 15, 2017

I think a release store may be good enough, since it ensures all memory accesses (not just stores) have completed before the store completes. That would limit the leakage to loads from epoch N+1 going into epoch N, which I believe is safe. This keeps safepoint essentially free on x86 and fairly fast on architectures with optimized release fences.

Something else that might be really good is having safepoint prefetch the global epoch, so the chance of throwing an extra cache miss into the pipeline decreases.

@jeehoonkang
Contributor

jeehoonkang commented Oct 16, 2017

Correctness

I think the store should be Release, but it can be just a plain store rather than an update (a swap):

let g = GLOBAL_EPOCH.load(Relaxed);
let l = self.local_epoch.load(Relaxed);
if l != g {
    self.local_epoch.store(g, Release);
    fence(SeqCst);
}

As @schets said, the Release store prevents memory accesses from being reordered after safepoint(). In the proof, we can reuse the reasoning that the store to the local epoch in unpin() and the load from it in try_advance() are rel-acq synchronized.

Note that we don't need to issue fence(SeqCst), because the initial pin()'s fence(SeqCst) is enough for the reasoning in the existing proof.

A note on the signature: I think it should be unsafe fn safepoint(), because the user should guarantee that the knowledge accumulated before the safepoint() is not used after it.

API

Background: these days I'm implementing wait-free objects in Crossbeam. Wait-free objects typically require "helping": when a thread is slow, other fast threads should be able to "help" the slow thread. In order to do that, we need to maintain a list of all the threads that work on the object. The list should be safely managed by some memory reclamation scheme, too. Naturally, I want to manage the helper thread list with a Collector.

Problem: a (fast) thread should iterate over all the threads, looking for a thread to help. The entire iteration should be inside a single critical section (think: the code region inside pin()). But the iteration doesn't fit into the current pin() API, pin(|scope| { ... }), because the iteration is interspersed across multiple invocations of operations.

Proposal: so I began to think of a more low-level API to "pin" the thread: a pair of Handle::set_pinned() and Handle::set_unpinned() marks the beginning and the end of a critical section. (We can also think of Handle::reset_pinned(), which corresponds to safepoint().) If we want to leave the current Handle API as-is, maybe we can introduce a QSBRHandle or similar for this more low-level API.

@ghost
Author

ghost commented Oct 16, 2017

I think the store should be Release, but it can be just a plain store rather than an update (a swap):

I see now. Agreed.

I think it should be unsafe fn safepoint(), because the user should guarantee that the knowledge accumulated before the safepoint() is not used after it.

But safepoint takes a &mut self so you can't use it incorrectly. If you disagree, can you show an example where you think its usage would be unsafe?

But the iteration isn't fit into the current pin() API: pin(|scope| { ... }), because iteration is interspersed across multiple invocations of operations.

Are you saying that an operation iterates over a few threads, stores the pointer where it left off, then exits the pinned scope, and continues later from the same pointer?

Proposal: So I began to think of a more low-level API to "pin" the thread: a pair of Handle::set_pinned() and Handle::set_unpinned() marks the beginning and the end of a critical section. (We can also think of Handle::reset_pinned() that corresponds to safepoint().) .

Would these functions have the if self.is_outermost() check like I wrote in the original post?

@schets
Member

schets commented Oct 16, 2017

Release should work with increments of one, but I'm not sure about increments of two. Wouldn't it be possible to see loads of objects unlinked in epoch N+2 happen before the epoch update to N+2 if we move local_epoch by two or more?

If it's a movement of only one, I believe there are non-seq-cst fence alternatives that may still work. As long as we can avoid the seq-cst fence, we'll get code that is between really and decently fast on many architectures. Correctness is most important here, of course.

@jeehoonkang
Contributor

On the safety of fn safepoint()

I think the user should guarantee that the Ptrs gathered before safepoint() should not be used to access the shared memory after safepoint(). Otherwise, the program may invoke undefined behavior. For example, consider the following code:

pin(|scope| {
    // suppose the local epoch = 10.
    let ptr1 = ref.load(Relaxed, scope);
    // suppose the object `ptr1` references is unlinked w/ epoch 10 by another thread.
    scope.safepoint();
    // suppose the new local epoch = 11.
    // now it's possible that the global epoch becomes 12,
    // and the object `ptr1` references can be deallocated.
    let ptr2 = ptr1.as_ref().unwrap().next.load(Relaxed, scope); // possibly use-after-free!
})

Here, the old ptr1 we got before safepoint() is used to access the shared memory after safepoint(), which may invoke undefined behavior due to use-after-free. The example is valid even if safepoint() takes &mut self.

Helping, in more details

Are you saying that an operation iterates over a few threads, stores the pointer where it left off, then exits the pinned scope, and continues later from the same pointer?

As you said, it stores the pointer where it left off and continues later from the same pointer. However, in my design, the scope should not be left; it should be reused over multiple method calls. When the iteration is done, the scope is repinned (on_quiescent_state()), and the iteration begins again from the head of the thread list.

This may induce a very long GC latency. Worse, when a thread doesn't invoke any method, the GC may be delayed indefinitely. That's the reason I want to use a designated Collector instead of the default one. In the worst case, the entire thread list can be leaked, but arguably that won't amount to much.

@ghost
Author

ghost commented Oct 16, 2017

Otherwise, the program may invoke undefined behavior. For example, consider the following code:

No, this code will not compile because safepoint takes a &mut self.

The code is attempting to borrow the scope (let ptr1 = ref.load(Relaxed, scope)), then mutably borrow it (scope.safepoint()), and use the first borrow again (ptr1.as_ref()).

However, in my design, the scope should not be left; it should be reused over multiple method calls. When the iteration is done, the scope is repinned (on_quiescent_state()), and the iteration begins again from the head of the thread list.

Can you just wrap the whole code within an epoch::pin(|scope| ...) and this way reuse the same scope across multiple method calls? If the answer is no, why not?

@jeehoonkang
Contributor

jeehoonkang commented Oct 16, 2017

Ah.. now I understand why fn safepoint() takes &mut self. Thanks!

On helping

Suppose there's a wait-free queue Q, and a thread A has a handle handle to Q as the evidence that A can invoke methods on the queue. The API would be:

let handle = queue.handle();
...
handle.push(3);
...
let v = handle.try_pop();

In order to implement helping, all handles should be connected in e.g. a linked list. Also, a handle should point to another handle in the list. The pointer represents the next thread to help. We need to store the pointer because we need to be fair in helping other threads in order to achieve wait-freedom. Also, for safety, the pointer should be valid all the time. So the existence of a handle should imply that a GC scope is somehow pinned. However, I failed to provide such a guarantee with pin(|scope| { ... }). Please let me know if you have an idea on this!

I think the easiest way to guarantee that a GC scope is pinned whenever we have a queue handle is to use a QSBR-flavored handle, crossbeam_epoch::QSBRHandle: it pins a scope when created, unpins the scope when dropped, and repins the scope on pub unsafe fn on_quiescent_state(). (It's unsafe because the user should guarantee that a Ptr obtained before it is not used after it. I think here we cannot apply the &mut self idea used in safepoint().) And queue::Handle has a QSBRHandle inside it.

For separation of concerns, I'm writing a reusable library for helping: https://github.com/jeehoonkang/crossbeam-queue/blob/master/src/helpers.rs Tracker<T> is basically a list of T values, one from each participating thread, and each thread has a Handle as evidence that it is in a tracker. With a Handle, you can access your own data of type T (get()), access another thread's data of type T for helping (peer()), or move to the next thread (next()). Using these helpers, I'm implementing a wait-free queue.

@Vtec234
Member

Vtec234 commented Nov 30, 2017

Closing since safepoint has been implemented. Please reopen if there is outstanding discussion.

@Vtec234 Vtec234 closed this as completed Nov 30, 2017
@ghost ghost mentioned this issue Nov 29, 2018