MKL/non-MKL Reconciliation #165

shelhamer · 2014-02-26T05:57:30Z

After no short journey, the port of Caffe away from MKL has come full circle and is ready for integration.

Chronicle:

boost/eigen3 port of Caffe started by @rodrigob, carried forward by @kloudkl, then fixed up by Alejandro Dubrovsky and @jeffdonahue.
A boost + openblas alternative was tried and our options weighed.
Rowland Depp emerges from the shadows with a brilliant commit 56b8ca0 to settle the issue.
cpp/cu layer implementations were split by @erictzeng
@shelhamer fixed the OSX break according to @satol's suggestion of a façade to hide boost rng from cuda code

There could be dragons here.

I made this PR from BVLC:boost-eigen so that we can join together for this glorious merge.

shelhamer · 2014-02-26T06:07:52Z

Water, water, every where,
And all the diffs did shrink.

Incorporated the linting.

shelhamer · 2014-02-27T05:33:48Z

Builds on Ubuntu 12.04 in MKL and non-MKL modes. Does not build on OSX.

@erictzeng the only conflicts were in the dropout layer since it 1. was not split in this branch and 2. relied on random number generation before Rowland Depp's MKL/non-MKL integration.

erictzeng · 2014-02-27T05:43:25Z

Seems alright to me!

shelhamer · 2014-02-27T06:02:57Z

Ok, I fixed a little makefile logic. Sadly, this still breaks in OSX, but less so.

/usr/local/cuda/bin/nvcc -ccbin=/usr/bin/clang++ -Xcompiler -fPIC -DNDEBUG -O2  -DUSE_MKL -I/Users/shelhamer/anaconda/include -I/Users/shelhamer/anaconda/include/python2.7 -I/Users/shelhamer/anaconda/lib/python2.7/site-packages/numpy/core/include -I/usr/local/include -I./src -I./include -I/usr/local/cuda/include -I/opt/intel/mkl/include -gencode arch=compute_30,code=sm_30 -c src/caffe/layers/bnll_layer.cu -o build/src/caffe/layers/bnll_layer.cuo
/usr/local/include/boost/type_traits/is_abstract.hpp(72): error: identifier "__is_abstract" is undefined

/usr/local/include/boost/type_traits/is_abstract.hpp(72): error: function call is not allowed in a constant expression

/usr/local/include/boost/type_traits/is_abstract.hpp(72): error: type name is not allowed

/usr/local/include/boost/type_traits/is_enum.hpp(181): error: identifier "__is_enum" is undefined

/usr/local/include/boost/type_traits/is_enum.hpp(181): error: function call is not allowed in a constant expression

/usr/local/include/boost/type_traits/is_enum.hpp(181): error: type name is not allowed

Any ideas? Perhaps building with g++ could work (see this Gadgetron build thread), but I don't exactly want to try it out.

Recall that this is a CUDA/boost conflict, so a CPU only compilation would partially sidestep this.

shelhamer · 2014-02-27T07:37:12Z

This seems to be isolated to boost RNG in <boost/random/mersenne_twister.hpp>. Splitting random number generation into two flavors, MKL and boost, should resolve the OSX issues with the condition that OSX still requires MKL.
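To make the "two flavors" idea concrete, here is a minimal sketch of a compile-time switch between an MKL and a non-MKL generator. It is only an illustration, not the code in this branch: `fill_uniform` is a hypothetical helper, the per-call stream creation is a simplification, and the non-MKL path uses the standard library's `std::mt19937` as a stand-in for boost's Mersenne Twister so the snippet has no external dependency.

```cpp
#include <cassert>
#include <random>
#ifdef USE_MKL
#include <mkl_vsl.h>  // only pulled in when building the MKL flavor
#endif

// Hypothetical helper: fill out[0..n) with uniform samples between a and b.
void fill_uniform(int n, float a, float b, float* out, unsigned int seed) {
  assert(n >= 0 && a < b);
#ifdef USE_MKL
  // MKL flavor: a VSL stream backed by the Mersenne Twister generator.
  VSLStreamStatePtr stream;
  vslNewStream(&stream, VSL_BRNG_MT19937, seed);
  vsRngUniform(VSL_RNG_METHOD_UNIFORM_STD, stream, n, out, a, b);
  vslDeleteStream(&stream);
#else
  // Non-MKL flavor: std::mt19937 stands in for boost::mt19937 here.
  std::mt19937 engine(seed);
  std::uniform_real_distribution<float> dist(a, b);
  for (int i = 0; i < n; ++i) {
    out[i] = dist(engine);
  }
#endif
}
```

The catch with this approach is the one noted above: any translation unit compiled by nvcc still sees the non-MKL headers unless `USE_MKL` is defined, so OSX would remain tied to MKL.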

satol · 2014-02-27T07:50:51Z

@shelhamer Hi! May I suggest more radical splitting of cu files? Right now the split files still reference boost. Do you really need this? See the commit on my fork that fixes bnll_layer build on 10.9. Seems similar fixes should be done for other issues (dropout_layer, etc)?
satol@1bbe45f

shelhamer · 2014-02-27T08:03:47Z

@satol Thank you! I like your suggestion and proof-of-concept commit more than my original plan. Splitting off common_cu.hpp and layer_cu.hpp should provide the separation we need. It should avoid introducing a per-layer header too. If you could further develop this line of refactoring and submit a PR I would welcome the contribution! (Please PR to BVLC:boost-eigen.)

Thoughts? @Yangqing @jeffdonahue @erictzeng

satol · 2014-03-03T04:12:56Z

@shelhamer To make it build on mac, instead of changing many cu files, I refactored the random generator to hide the <boost/random/mersenne_twister.hpp> from CUDA. Everything compiled ok.
https://github.com/satol/caffe/tree/boost-eigen
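For readers unfamiliar with the pattern, the idea of hiding the generator from CUDA can be sketched roughly as follows. This is a simplified illustration, not the code from the fork: the class name `Rng`, the method `NextUniform`, and the use of `std::mt19937` in place of `boost::mt19937` are all assumptions made to keep the example self-contained. The point is that the header part only forward-declares the generator, so files compiled by nvcc never see the RNG headers.

```cpp
#include <cassert>
#include <memory>
#include <random>  // in a real split, this include would live only in the .cpp

// ---- Part that would sit in a header included by .cu files ----
class Rng {
 public:
  explicit Rng(unsigned int seed);
  ~Rng();
  float NextUniform();  // one sample from the unit interval

 private:
  class Generator;  // forward declaration only; no RNG header needed here
  std::unique_ptr<Generator> generator_;
};

// ---- Part that would sit in a .cpp file, never compiled by nvcc ----
class Rng::Generator {
 public:
  explicit Generator(unsigned int seed) : engine_(seed) {}
  std::mt19937 engine_;  // stand-in for boost::mt19937
};

Rng::Rng(unsigned int seed) : generator_(new Generator(seed)) {}

// Defined here, where Generator is a complete type, so unique_ptr can delete it.
Rng::~Rng() = default;

float Rng::NextUniform() {
  assert(generator_ != nullptr);
  std::uniform_real_distribution<float> dist(0.0f, 1.0f);
  return dist(generator_->engine_);
}
```

Compared with splitting every layer into separate cpp and cu headers, this keeps the change local to the random number code.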

jamt9000 · 2014-03-19T10:16:27Z

Can the changes be merged into dev soon? I do not have MKL, so contributing changes against the dev branch is currently quite hard since I cannot test them!

shelhamer · 2014-03-19T17:07:47Z

@jamt9000 we'd certainly like to merge too! However, we're held up on some boost/CUDA integration work for OSX. Now that v0.99 is out we'll take another stab at it; this branch has gone on long enough.

Rebased: fast-forward, but still breaks OSX.

- examples, test and pycaffe compile without problem (matcaffe not tested)
- tests show some errors (on cpu gradient tests), to be investigated
- random generators need to be double checked
- mkl commented code needs to be removed

shelhamer · 2014-03-22T07:55:09Z

This beast is slain. This builds and tests pass on linux and osx.

Although the facade to hide boost rng from cuda makes it look unnecessarily complex, it doesn't intrude into the rest of the code and the caffe_vRng* methods are none the worse for it.

@Yangqing, please let me know if you have any reservations about this approach. @jeffdonahue, any comments on style are welcome.

Let's merge this soon.

p.s. I tried to totally split common into cpp / cu code but it's not so simple.

kloudkl · 2014-03-22T08:09:34Z

Since the solution was proposed by @satol and authored by @shelhamer, the Copyright comment of the HDF5DataLayer which was also authored by two contributors would be a nice example to follow.

// Copyright 2014 BVLC.
/*
Contributors:
- Sergey Karayev, 2014.
- Tobias Domhan, 2014.

shelhamer · 2014-03-22T08:35:48Z

I've set the copyright to blanket BVLC and contributors for now, since code contributions are tracked per-author by versioning and @satol is thanked in the commit message of 19bcf2b. Let's all discuss this at #249.

jeffdonahue · 2014-03-22T17:04:32Z

Looks great Evan! Thanks for getting this long-standing issue resolved.

@satol

Split boost random number generation from the common Caffe singleton and add a helper function for rng. This resolves a build conflict in OSX between boost rng and nvcc compilation of cuda code. Refer to #165 for a full discussion. Thanks to @satol for suggesting a random number generation facade rather than a total split of cpp and cu code, which is far more involved.

The exact details of the contributions are recorded by versioning.

MKL/non-MKL Reconciliation

Caffe no longer requires MKL. By default it builds without it, relying on atlas and cblas instead. Set the `USE_MKL` var in your Makefile.config accordingly.

shelhamer · 2014-03-23T05:54:03Z

Join hands! We are one.

shelhamer mentioned this pull request Feb 26, 2014

C++ linter #163

Merged

shelhamer mentioned this pull request Feb 27, 2014

Splitting source files between CUDA and CPU code. #172

Merged

shelhamer mentioned this pull request Feb 27, 2014

boost-eigen branch doesn't build on OSX 10.7, 10.9 (10.8 untested) #122

Closed

shelhamer added enhancement and removed enhancement labels Feb 27, 2014

sergeyk added this to the 1.0 milestone Mar 13, 2014

sergeyk assigned erictzeng Mar 13, 2014

rodrigob and others added 10 commits March 21, 2014 13:52

Fixed uniform distribution upper bound to be inclusive

04ca88a

Fixed FlattenLayer Backward_cpu/gpu have no return value

d666bdc

Fix test stochastic pooling stepsize/threshold to be same as max pooling

38457e1

Fix math funcs, add tests, change Eigen Map to unaligned for lrn_layer

788f070

[shelhamer: removed math function tests, since they were merged via other branches]

relax precision of MultinomialLogisticLossLayer test

d37a995

nextafter templates off one type

2ae2683

mean_bound and sample_mean need referencing with this

b925739

make uniform distribution usage compatible with boost 1.46

93c9f15

use boost variate_generator to pass tests w/ boost 1.46 (Gaussian filler previously filled in all NaNs for me, making many tests fail)

4b1fba7

shelhamer and others added 7 commits March 21, 2014 13:52

rewrite MKL flag note, polish makefile

c028d09

add MKL dirs conditioned on USE_MKL; include libraries before making LD_FLAGS

make MKL switch surprise-proof

f6cbe2c

comment out stray mkl includes

ff27988

Fixed order of cblas and atlas linker flags

40aa12a

They were the wrong way round, causing linking to fail in some cases

Added extern C wrapper to cblas.h include

a9e772f

This ensures that it works with ATLAS's header file, which doesn't include such a guard itself (whereas the reference version from Ubuntu's libblas-dev does)

clean up residual mkl comments and code

453fcf9

The FIXMEs about RNG were addressed by caffe_nextafter for uniform distributions and the normal distribution concern is surely a typo in the boost documentation, since the normal pdf is correctly stated elsewhere in the documentation.
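As a rough illustration of the nextafter trick for uniform distributions: a uniform real distribution constructed with bounds (a, b) samples from the half-open interval [a, b), so passing the next representable value above b as the upper bound makes b itself reachable. The sketch below uses `std::nextafter` and `<random>` rather than boost, and the names `next_above` and `sample_uniform_closed` are hypothetical; the thread does not show how `caffe_nextafter` itself is written.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <random>
#include <vector>

// Hypothetical helper: the smallest representable value strictly above b
// (returns b unchanged if b is already the largest finite value).
template <typename T>
T next_above(T b) {
  return std::nextafter(b, std::numeric_limits<T>::max());
}

// Draw n samples intended to cover the closed interval [a, b].
template <typename T>
std::vector<T> sample_uniform_closed(std::size_t n, T a, T b,
                                     unsigned int seed) {
  assert(a <= b);
  std::mt19937 engine(seed);
  // The distribution covers [a, next_above(b)), which includes b.
  std::uniform_real_distribution<T> dist(a, next_above(b));
  std::vector<T> out(n);
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = dist(engine);
  }
  return out;
}
```

A call such as `sample_uniform_closed<double>(1000, 0.0, 1.0, 1701u)` then draws from a range whose upper end is one representable step above 1.0.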

lint

aaa2646

shelhamer mentioned this pull request Mar 22, 2014

Split CUDA code (*.cu) from CPU code (*.cpp). #152

Closed

5 tasks

shelhamer mentioned this pull request Mar 22, 2014

Add more convenience math functions and all tests pass #201

Merged

shelhamer added 2 commits March 22, 2014 12:08

Set copyright to BVLC and contributors.

bece205

The exact details of the contributions are recorded by versioning.

shelhamer merged commit 699b557 into dev Mar 23, 2014

shelhamer removed the work in progress label Mar 23, 2014

shelhamer deleted the boost-eigen branch May 2, 2014 15:04

shelhamer mentioned this pull request May 20, 2014

Next: 0.999 #429

Merged

This was referenced Jul 3, 2014

CMake build system - cleaned #573

Closed

Device Abstraction #610

Closed

shelhamer mentioned this pull request Aug 30, 2014

MacOS: NVCC builds crashed after updating boost #1009

Closed

slayton58 pushed a commit to slayton58/caffe that referenced this pull request Jun 9, 2016

Merge pull request BVLC#165 from lukeyeager/nvidia/fix-nccl-cmake

d750a7a

Fix FindNCCL.cmake
