This lecture investigates the performance of the Schoenauer vector triad benchmark across the full memory hierarchy of a single core of an Intel Haswell processor. Analysing the data transfers through the memory hierarchy yields a performance model that qualitatively describes the performance levels for data sets residing in the different memory hierarchy levels. Further, dense matrix-vector multiplication is investigated to identify performance improvements obtained by increasing the temporal reuse of vector data. As a first optimization strategy, outer-loop unroll-and-jam is identified and successfully tested.
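
For orientation, the two kernels named above can be sketched in C as follows. This is a minimal illustration, not the code used in the lecture: the function names (`vector_triad`, `dmvm_plain`, `dmvm_unroll_jam2`), the row-major matrix storage, the accumulating form `y += A * x`, and the unroll factor of 2 are all assumptions chosen for the example.

```c
#include <stddef.h>

/* Schoenauer vector triad: a[i] = b[i] + c[i] * d[i].
 * Each iteration performs three loads, one store and two flops. */
void vector_triad(double *a, const double *b, const double *c,
                  const double *d, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        a[i] = b[i] + c[i] * d[i];
}

/* Baseline dense matrix-vector multiply, y += A * x.
 * Assumption: A is stored row-major with dimensions rows x cols. */
void dmvm_plain(double *y, const double *A, const double *x,
                size_t rows, size_t cols)
{
    for (size_t r = 0; r < rows; ++r) {
        double sum = 0.0;
        for (size_t c = 0; c < cols; ++c)
            sum += A[r * cols + c] * x[c];
        y[r] += sum;
    }
}

/* Same operation with the outer (row) loop unrolled by 2 and the two
 * resulting inner loops fused ("jammed") into one. Each x[c] is loaded
 * once and reused for two rows. */
void dmvm_unroll_jam2(double *y, const double *A, const double *x,
                      size_t rows, size_t cols)
{
    size_t r = 0;
    for (; r + 1 < rows; r += 2) {
        double sum0 = 0.0, sum1 = 0.0;
        for (size_t c = 0; c < cols; ++c) {
            const double xc = x[c];          /* reused for both rows */
            sum0 += A[r * cols + c] * xc;
            sum1 += A[(r + 1) * cols + c] * xc;
        }
        y[r]     += sum0;
        y[r + 1] += sum1;
    }
    /* Remainder loop: handles the last row when rows is odd. */
    for (; r < rows; ++r) {
        double sum = 0.0;
        for (size_t c = 0; c < cols; ++c)
            sum += A[r * cols + c] * x[c];
        y[r] += sum;
    }
}
```

In the unrolled variant every element of `x` fetched in the inner loop serves two rows of `A`, so the traffic on the vector per floating-point operation is roughly halved. This is the sense in which unroll-and-jam increases temporal reuse of vector data. Whether this translates into a measurable speedup depends on which memory hierarchy level holds the data.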