Jithul's suggestion is helpful, but only if you can arrange your accesses as sequential reads. Dave's earlier comments suggest random access. If that is the case, you either need to live with slow reads or figure out a way to rewrite your functions to avoid random reads. We have done this in some applications by using DMA to move blocks of data into internal memory ahead of the read operation, while the processor is busy doing something else.
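To make the block-prefetch idea concrete, here is a rough sketch of a ping-pong (double buffer) scheme. It is not our production code. fetch_block(), process_block(), sum_squares_pingpong() and BLOCK_LEN are all made-up names for illustration, and memcpy stands in for the DMA so the sketch runs anywhere. On a real part you would start the DMA with whatever driver or register interface your processor provides and wait for its completion flag before swapping buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_LEN 64  /* hypothetical block size, in samples */

/* Stand-in for a DMA transfer from slow external memory into fast
   internal memory. Real hardware would start the transfer and return
   immediately; memcpy is used here only so the sketch is portable. */
static void fetch_block(float *dst, const float *src, size_t count)
{
    memcpy(dst, src, count * sizeof *dst);
}

/* Example per-block work on internal memory: sum of squares. */
static float process_block(const float *buf, size_t count)
{
    float acc = 0.0f;
    for (size_t i = 0; i < count; i++)
        acc += buf[i] * buf[i];
    return acc;
}

/* Walk a large external array block by block using two internal buffers. */
float sum_squares_pingpong(const float *ext, size_t total)
{
    static float internal[2][BLOCK_LEN];
    float acc = 0.0f;
    size_t offset = 0;
    int cur = 0;

    if (total == 0)
        return 0.0f;
    assert(ext != NULL);

    /* Prime the first buffer before the loop starts. */
    size_t cur_len = total < BLOCK_LEN ? total : BLOCK_LEN;
    fetch_block(internal[cur], ext, cur_len);

    while (cur_len > 0) {
        size_t next_off = offset + cur_len;
        size_t remaining = total - next_off;
        size_t next_len = remaining < BLOCK_LEN ? remaining : BLOCK_LEN;

        /* Kick off the fetch of the next block into the other buffer... */
        if (next_len > 0)
            fetch_block(internal[cur ^ 1], ext + next_off, next_len);

        /* ...while the core works on the block already in internal memory. */
        acc += process_block(internal[cur], cur_len);

        /* Real hardware: wait here for the DMA-done flag before swapping. */
        offset = next_off;
        cur_len = next_len;
        cur ^= 1;
    }
    return acc;
}
```

The core only ever touches internal memory, so the external access pattern becomes a few large sequential transfers instead of many scattered reads. Whether this pays off depends on how well your algorithm can be reorganized to work a block at a time.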
If you are running out of MIPS, examine interrupt overhead and SIMD options, or perhaps consider some assembly routines. The profiler may help you find where the cycles are going. Some C functions are not that efficient. If you can keep the core busy, the accelerator might be able to process something in parallel. Large FIRs can be implemented with fast convolution (FFT-based).
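As a small illustration of fast convolution, the sketch below convolves one short block with a short impulse response using an 8-point radix-2 FFT and a precomputed twiddle table. The names (fft8, fast_convolve, FFT_LEN) and the tiny size are my own choices to keep it readable. A real filter would use a much larger FFT, most likely the optimized FFT routines from your vendor's library, and would process a continuous stream with overlap-add or overlap-save rather than a single block.

```c
#include <assert.h>
#include <stddef.h>

#define FFT_LEN 8  /* hypothetical FFT size; the twiddle table matches it */

/* Twiddle factors exp(-j*2*pi*k/8) for k = 0..3. */
static const float TW_RE[FFT_LEN / 2] = { 1.0f,  0.70710678f,  0.0f, -0.70710678f };
static const float TW_IM[FFT_LEN / 2] = { 0.0f, -0.70710678f, -1.0f, -0.70710678f };

/* In-place radix-2 decimation-in-time FFT. The inverse is unscaled. */
static void fft8(float *re, float *im, int inverse)
{
    /* Bit-reversal permutation. */
    for (size_t i = 1, j = 0; i < FFT_LEN; i++) {
        size_t bit = FFT_LEN >> 1;
        for (; j & bit; bit >>= 1)
            j ^= bit;
        j ^= bit;
        if (i < j) {
            float t = re[i]; re[i] = re[j]; re[j] = t;
            t = im[i]; im[i] = im[j]; im[j] = t;
        }
    }
    /* Butterfly stages of length 2, 4, 8. */
    for (size_t len = 2; len <= FFT_LEN; len <<= 1) {
        size_t stride = FFT_LEN / len;
        for (size_t start = 0; start < FFT_LEN; start += len) {
            for (size_t k = 0; k < len / 2; k++) {
                float wr = TW_RE[k * stride];
                float wi = inverse ? -TW_IM[k * stride] : TW_IM[k * stride];
                size_t a = start + k, b = a + len / 2;
                float tr = re[b] * wr - im[b] * wi;
                float ti = re[b] * wi + im[b] * wr;
                re[b] = re[a] - tr;  im[b] = im[a] - ti;
                re[a] += tr;         im[a] += ti;
            }
        }
    }
}

/* Linear convolution y = x * h via the FFT.
   Requires nx + nh - 1 <= FFT_LEN; y receives nx + nh - 1 samples. */
void fast_convolve(const float *x, size_t nx,
                   const float *h, size_t nh, float *y)
{
    float xr[FFT_LEN] = {0}, xi[FFT_LEN] = {0};
    float hr[FFT_LEN] = {0}, hi[FFT_LEN] = {0};
    assert(nx > 0 && nh > 0 && nx + nh - 1 <= FFT_LEN);

    /* Zero-padded copies so circular convolution equals linear convolution. */
    for (size_t i = 0; i < nx; i++) xr[i] = x[i];
    for (size_t i = 0; i < nh; i++) hr[i] = h[i];

    fft8(xr, xi, 0);
    fft8(hr, hi, 0);

    /* Pointwise complex multiply in the frequency domain. */
    for (size_t k = 0; k < FFT_LEN; k++) {
        float pr = xr[k] * hr[k] - xi[k] * hi[k];
        float pi = xr[k] * hi[k] + xi[k] * hr[k];
        xr[k] = pr;
        xi[k] = pi;
    }

    fft8(xr, xi, 1);
    for (size_t i = 0; i < nx + nh - 1; i++)
        y[i] = xr[i] / (float)FFT_LEN;  /* apply the 1/N inverse scaling */
}
```

For a fixed filter, the transform of h only needs to be computed once, so the per-block cost is one forward FFT, one multiply pass and one inverse FFT. That is where the savings over a direct FIR come from when the filter is long.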
DSPs are very deterministic. You should be able to sort out what is possible for a given sample rate and set of algorithms.
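A back-of-the-envelope version of that budgeting, as a sketch: divide the core clock by the sample rate to get the cycles available per sample, then compare that with the cycle counts you measure for your algorithms. The function names and the numbers in the note below are made up for illustration. Use your own clock rate, sample rate and measured cycle counts.

```c
#include <assert.h>
#include <stdint.h>

/* Core cycles available for each incoming sample. */
uint32_t cycles_per_sample(uint32_t core_hz, uint32_t fs_hz)
{
    assert(fs_hz > 0);
    return core_hz / fs_hz;
}

/* Returns 1 if the measured per-sample cost fits in the budget, else 0.
   cycles_needed should include interrupt and other overhead. */
int fits_budget(uint32_t cycles_needed, uint32_t core_hz, uint32_t fs_hz)
{
    return cycles_needed <= cycles_per_sample(core_hz, fs_hz);
}
```

For example, a hypothetical 400 MHz core at a 48 kHz sample rate has 8333 cycles per sample, so a processing chain measured at 6000 cycles per sample fits and one measured at 9000 does not.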
Good luck
Al Clark