In this paper, we study the problem of exploiting parallelism in a hard real-time streaming application that is modeled as an acyclic synchronous data flow (SDF) graph and scheduled on a heterogeneous multiprocessor system-on-chip platform. The goal is to alleviate the capacity fragmentation caused by partitioned scheduling algorithms and thereby reduce the number of processors required to satisfy a throughput requirement. As the main contribution of this paper, we propose a method to determine a replication factor for each task in an acyclic SDF graph. In the resulting transformed graph, the workload is distributed among more parallel tasks with lower utilization, so the remaining capacity on the processors can be exploited efficiently, which reduces the number of required processors. Experimental results on a set of real-life streaming applications demonstrate that our approach can reduce the minimum number of processors required to schedule an application and considerably reduce memory requirements and application latency compared to related approaches, while meeting the same throughput constraint.
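
To make the notion of capacity fragmentation concrete, the following minimal sketch shows how replicating tasks can reduce the processor count under a partitioned allocation. It is an illustration under simplifying assumptions, not the method proposed in this paper: it assumes identical processors with unit capacity, a first-fit decreasing allocation heuristic, and that replicating a task by a factor k splits its utilization evenly into k copies with no overhead. The function names `first_fit_decreasing` and `replicate`, and the example utilizations, are hypothetical.

```python
from fractions import Fraction


def first_fit_decreasing(utilizations, capacity=Fraction(1)):
    """Assign tasks to processors with first-fit decreasing.

    Assumption: identical processors, each with the given utilization
    capacity. Returns the list of per-processor loads.
    """
    loads = []
    for u in sorted(utilizations, reverse=True):
        for i, load in enumerate(loads):
            if load + u <= capacity:
                loads[i] = load + u
                break
        else:
            # No existing processor has enough remaining capacity.
            loads.append(u)
    return loads


def replicate(utilizations, factors):
    """Split each task into k replicas of utilization u / k.

    Assumption: replication divides the workload evenly and adds no overhead.
    """
    replicas = []
    for u, k in zip(utilizations, factors):
        replicas.extend([u / k] * k)
    return replicas


# Hypothetical example: three tasks, each with utilization 3/5.
original = [Fraction(3, 5)] * 3
transformed = replicate(original, [2, 2, 2])

processors_before = len(first_fit_decreasing(original))
processors_after = len(first_fit_decreasing(transformed))
print(processors_before, processors_after)
```

In this example no two of the original tasks fit on one processor, so each processor is left with 2/5 of its capacity unused. After replication, three replicas of utilization 3/10 fit on each processor, so the same total workload needs fewer processors.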