Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add OpenCL support #32

Merged
merged 33 commits into from
Aug 5, 2019
Merged

Add OpenCL support #32

merged 33 commits into from
Aug 5, 2019

Conversation

soreau
Copy link
Collaborator

@soreau soreau commented May 22, 2019

This attempts to use OpenCL to convert rgb data to yuv instead of sws_scale() when wf-recorder is built with OpenCL support and the --to-yuv or -t option is passed. This helps offload work to the gpu that would otherwise be done on the cpu.

Copy link
Owner

@ammen99 ammen99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea

@ammen99
Copy link
Owner

ammen99 commented May 27, 2019

I think that the options now are a bit confusing TBH.

For example, we have a --no-opencl which works only with --convert-rgb, and we'd need --convert-rgb to enable OpenCL with libx264 (so GPU converting and CPU encoding), which is of course strange. What would you say about an option --opencl (which enables OpenCL if it can be used) and another, --force-yuv for the radeon vaapi usecase?

@soreau
Copy link
Collaborator Author

soreau commented May 27, 2019

I think that the options now are a bit confusing TBH.

For example, we have a --no-opencl which works only with --convert-rgb, and we'd need
--convert-rgb to enable OpenCL with libx264 (so GPU converting and CPU encoding), which is of >course strange. What would you say about an option --opencl (which enables OpenCL if it can be >used) and another, --force-yuv for the radeon vaapi usecase?

I guess this is fine but in the case where opencl isn't built in, should we ifdef the option out? In the case where you'd use --force-yuv, I'm guessing you'd probably want to use opencl if it is built in, so you'd need to pass both options. You'd only need to disable opencl to test performance gains or if there's a problem.

I would like to have opencl enabled by default if it was built in and the other option could be called whatever but I would like --to-yuv.

@soreau
Copy link
Collaborator Author

soreau commented May 28, 2019

I made it like you suggest to see how it works out. I meant -t as --to-yuv and -e as --enable-opencl but have --force-yuv and --opencl for now. I also committed your patch to use a function to avoid code duplication and literal string for the opencl program.

}
#else
/* Silence compiler warning when opencl is disabled */
(void)(y_invert);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can move this inside the #ifdef

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point of this is to not leave y_invert variable declared but not used in the case where opencl is disabled. If we put it inside the ifdef, the warning still happens.

src/main.cpp Outdated
{ 0, 0, NULL, 0 }
};

int c, i;
std::string param;
size_t pos;
while((c = getopt_long(argc, argv, "o:f:g:c:p:d:la::t::", opts, &i)) != -1)
while((c = getopt_long(argc, argv, "o:f:g:c:p:d:la::t::e::", opts, &i)) != -1)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does e have optional parameter when it doesn't take any value?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@ammen99
Copy link
Owner

ammen99 commented May 29, 2019

I'll test this later today and hopefully merge, if there is nothing else that you have planned to do.

@soreau
Copy link
Collaborator Author

soreau commented May 29, 2019

I don't have anything else planned at this time.

@soreau
Copy link
Collaborator Author

soreau commented Jun 3, 2019

@ammen99 Can you test this patch (on top of opencl branch)? Adjust the value of SIZE_MULTIPLIER and see if there is a relation to how much cpu is being used. I've tried between 1-32, with and without vaapi using libx264 and h264_vaapi. Here the results are that the higher the number (meaning the more memory allocated and copied), the less cpu that is being consumed (via htop). It seems counterintuitive that copying more data would result in less cpu usage. Is it the case for you too? Any idea what might be going on here?

@soreau
Copy link
Collaborator Author

soreau commented Jun 11, 2019

As discussed in irc, it has been realized that the cpu was probably doing less because the gpu was doing more copying, essentially the same as calling usleep after copying. Now that I've set the buffers to the correct size, I've tested on a box with libx264 and h264_vaapi, with and without opencl, while idle and while moving mouse cursor, recording the top two (highest cpu usage) threads and the results are always better in the opencl case except libx264+sws_scale when idle. It also should now clean up properly without crashing.

                    |   sws_scale  |    OpenCL    |
                    |______________|______________|
                    |   T1   T2    |   T1   T2    |
____________________|______________|______________|
           | idle   |   95%  50%   |   97%  54%   |
libx264    |        |              |              |
           | motion |   57%  32%   |   51%  26%   |
___________|________|______________|______________|
           |        |              |              |
           | idle   |   47%  46%   |   38%  37%   |
h264_vaapi |        |              |              |
           | motion |   38%  38%   |   32%  32%   |

@ammen99
Copy link
Owner

ammen99 commented Jul 12, 2019

So this should be ready for merging, right? Could you please rebase on top of master?

soreau added 18 commits July 12, 2019 12:58
OpenCL will be used for converting from rgb to yuv data when
built with OpenCL support and using vaapi with to-yuv option.
Namely this avoids reallocating the output buffer each frame.
This makes it so OpenCL is used for rgb to yuv conversion instead of
sws_scale(), when --to-yuv option is used.
And other stuff.
…ption

The new yuv420p shader does pixel averaging to produce uv values from
2x2 subsample. It also supports odd size resolutions in the vaapi case.
The --no-opencl option (or -n) disables opencl if it was built in.
@soreau
Copy link
Collaborator Author

soreau commented Jul 12, 2019

So this should be ready for merging, right? Could you please rebase on top of master?

Yes, should be ready to go now.

@ammen99
Copy link
Owner

ammen99 commented Aug 5, 2019

I tested this final version with both OpenCL on and off, and both VAAPI on and off, seems to work fine.

Thanks for the work on this, and sorry for the delay.

@ammen99 ammen99 merged commit 1388f45 into ammen99:master Aug 5, 2019
@soreau
Copy link
Collaborator Author

soreau commented Aug 5, 2019

I tested this final version with both OpenCL on and off, and both VAAPI on and off, seems to work fine.

Thanks for the work on this, and sorry for the delay.

No problem, thanks for testing. I'd like to see this 'fixed' in the gallium vaapi driver and test the internal converter for comparison, but maybe that will come in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants