Pages

31 January, 2024

How to properly use prefetch instructions?

I am trying to vectorize a loop, computing dot product of a large float vectors. I am computing it in parallel, utilizing the fact that CPU has large amount of XMM registers, like this:

__m128* A, B;
__m128 dot0, dot1, dot2, dot3 = _mm_set_ps1(0);
for(size_t i=0; i

No comments:

Post a Comment

Thanks