* tests: Add im2col perf tests
* vulkan: optimize im2col, more elements per thread
* vulkan: increase small tile size for NV_coopmat2
* vulkan: change im2col to 512 elements per workgroup
Vulkan doesn't mandate a specific rounding mode, but the shader_float_controls
feature allows rounding mode to be requested if the implementation supports it.