One thing that comes to mind is that modeling the FLOPS is a good measure of performance only if the computation has high arithmetic intensity, that is, if the ratio of floating-point operations to memory accesses is high. This is true of Gaussian elimination. However, in an operation such as a sparse matrix-vector multiply, the arithmetic intensity is low (one fused multiply-add per memory access) and the operation becomes bandwidth limited.
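To make the contrast concrete, here is a minimal back-of-the-envelope sketch of arithmetic intensity (flops per byte of memory traffic) for the two cases. The flop counts are the standard ones ((2/3)n³ for dense LU, 2 per stored nonzero for SpMV); the memory-traffic figures are rough assumptions (good cache reuse for the dense case, one value plus one index plus one vector access per nonzero for CSR SpMV), not measurements.

```python
def dense_lu_intensity(n, bytes_per_word=8):
    """Dense LU / Gaussian elimination: ~(2/3)n^3 flops over an n-by-n
    matrix that, with blocking and cache reuse, is moved on the order
    of n^2 words. Intensity grows linearly with n: compute-bound."""
    flops = (2.0 / 3.0) * n ** 3
    traffic = n * n * bytes_per_word
    return flops / traffic

def csr_spmv_intensity(nnz, bytes_per_word=8, bytes_per_index=4):
    """CSR sparse matrix-vector multiply: 2 flops (one fused
    multiply-add) per stored nonzero, but each nonzero costs a value
    read, a column-index read, and an (often uncached) read of the
    input vector. Intensity is a small constant: bandwidth-bound."""
    flops = 2.0 * nnz
    traffic = nnz * (bytes_per_word + bytes_per_index + bytes_per_word)
    return flops / traffic

print(dense_lu_intensity(4000))   # hundreds of flops per byte
print(csr_spmv_intensity(10**6))  # ~0.1 flops per byte
```

Under these assumptions, dense elimination sits far to the right of the machine's ridge point on a roofline plot (so counting flops predicts runtime), while SpMV sits far to the left (so bandwidth, not flops, predicts runtime).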


"…other direct method. We know of no reason not to use it in this application if you are sure that the matrix inverse is what you really want." — Press, Teukolsky, et al., Numerical Recipes: The Art of Scientific Computing, 2007.
