We study the behavior of two numerical algorithms (matrix multiplication and finite difference methods) on the RP3, a multiprocessor with a three-level memory hierarchy. Using versions of these algorithms that differ in data placement (global, local, global and cacheable, local and cacheable) and in data access (blocked or non-blocked), we study the impact of these parameters on program performance. The performance analysis relies on a very accurate monitoring system (VPMC), which records instructions, memory requests, cache requests and cache misses. We also carry out a theoretical performance analysis of these programs using a model of computation and communication. Good agreement is found between theoretical and experimental results. In conclusion, we discuss the use of local memory on such a machine and show that it is not worthwhile given the RP3 ratio of communication between local and global memories. We also discuss optimal use of the cache, show that the optimum can only be reached under certain cache properties (a private store-in cache with user control of write-back), and show that optimal blocked algorithms must be used to reach it.
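To make the distinction between blocked and non-blocked data access concrete, the following is a minimal illustrative sketch of the two access patterns for matrix multiplication. It is not the code measured on the RP3: the function names `matmul_naive` and `matmul_blocked` and the `block` parameter are hypothetical, and the sketch says nothing about data placement or cacheability, which on the RP3 are controlled separately. Both versions compute the same product; the blocked one simply reuses a small tile of each operand before moving on, which is the property that lets a cache hold the working set.

```python
def matmul_naive(a, b):
    """Non-blocked product: each output element streams a full row and column."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            s = 0.0
            for k in range(m):
                s += a[i][k] * b[k][j]
            c[i][j] = s
    return c


def matmul_blocked(a, b, block=2):
    """Blocked product: work proceeds tile by tile (tile edge = `block`).

    `block` is an assumed tuning parameter; in practice it would be chosen
    so that the tiles in use fit in the cache.
    """
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for ii in range(0, n, block):
        for jj in range(0, p, block):
            for kk in range(0, m, block):
                # Accumulate the contribution of one tile of a and one tile of b.
                for i in range(ii, min(ii + block, n)):
                    for j in range(jj, min(jj + block, p)):
                        s = c[i][j]
                        for k in range(kk, min(kk + block, m)):
                            s += a[i][k] * b[k][j]
                        c[i][j] = s
    return c
```

The block size does not change the result, only the order in which operands are touched; matrices whose dimensions are not a multiple of `block` are handled by the `min(...)` bounds.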