ReshapeLayer and FlattenLayer should not allow in-place computation #3393
Conversation
So far only neuron layers should be applied in-place, and I am not sure whether anyone actually relies on in-place reshape.
Perhaps this should be generalized to throw an error for all layers that aren't neuron layers?
Yes, I believe this should be the correct guideline for in-place computation.
For me, unlike other common layers, these reshaping layers seemed to work in-place since they do not change the input values, and I had been using them that way for months. I only recently found that this does not work, while debugging. I am not sure if this is a common case, but IMO it is better to be prevented, or at least documented.
Generally I think in-place should be disallowed except in explicit layers where the coder for that layer has indicated that it is safe. I think this needs to be part of an explicit […] There are also considerations in terms of the previous layer. For example, if a layer needs the […]
Thanks for the PR @kkhoot. I think these checks should be moved to the […] I started trying to clean up the in-place behavior a while ago but never really completed it. I was trying to have […]
Thanks for the comments. I also agree with @seanbell's idea. I hope @jeffdonahue's nice work will be merged someday.
I went ahead and merged this in 940b299 (with a couple of spacing fixes) as it's useful to have even without fancier/default checks. These checks should be removed if/when such checks are implemented. (And I realized @kkhoot put the check in the […])
This PR prevents incorrect gradients caused by in-place reshaping of Blobs in ReshapeLayer and FlattenLayer.
If the bottom and top blobs are the same in either of these layers, and the layer sits on top of a layer that uses the shape of its top blob for backward (such as SliceLayer or PoolingLayer), then that layer's gradients can be computed incorrectly. It can also allow bottom and top blobs with different counts, since currently the count checks are performed after reshaping the blob.
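For illustration, here is a minimal standalone C++ sketch using Caffe's Blob API (the function name is hypothetical) of how an in-place reshape overwrites the shape that the previous layer would read back during its Backward pass:

```cpp
#include <cassert>
#include <vector>
#include "caffe/blob.hpp"

// Hypothetical helper, for illustration only.
void InPlaceReshapeSketch() {
  caffe::Blob<float> blob(2, 3, 4, 4);  // shape produced by the previous layer
  caffe::Blob<float>* bottom = &blob;
  caffe::Blob<float>* top = &blob;      // in-place: top aliases bottom

  // FlattenLayer-style reshape applied to the shared blob.
  std::vector<int> flat_shape(2);
  flat_shape[0] = 2;          // keep the batch dimension
  flat_shape[1] = 3 * 4 * 4;  // collapse the remaining axes
  top->Reshape(flat_shape);

  // The previous layer's Backward now sees a flattened 2 x 48 bottom shape
  // instead of the 2 x 3 x 4 x 4 it produced in Forward, so any
  // shape-dependent gradient computation (e.g. in SliceLayer or
  // PoolingLayer) operates on the wrong geometry.
  assert(bottom->num_axes() == 2);  // was 4 before the in-place reshape
}
```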
A simple solution is to not allow in-place operation for these layers. Another solution would be to save the bottom shape and restore it during backward.
This PR implements the former, simpler solution.
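As a rough sketch of the kind of guard this adds, shown here for ReshapeLayer (the FlattenLayer check is analogous); the exact placement and error message are illustrative:

```cpp
// Inside ReshapeLayer<Dtype>::LayerSetUp -- placement is illustrative.
template <typename Dtype>
void ReshapeLayer<Dtype>::LayerSetUp(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  // Refuse in-place computation: reshaping the shared blob would change the
  // bottom shape that the previous layer relies on during Backward.
  CHECK_NE(top[0], bottom[0]) << this->type() << " Layer does not "
      "allow in-place computation.";
  // ... existing handling of reshape_param would follow here ...
}
```

The alternative of saving the bottom shape and restoring it during backward would keep in-place use possible, but at the cost of extra bookkeeping; simply disallowing in-place keeps the layers easy to reason about.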