In large-scale cloud computing systems, a task is often divided into multiple subtasks that can be executed in parallel on different machines. As a result, the task completion time is constrained by the completion time of the slowest subtask. To reduce the task completion time, replication of straggling subtasks has been employed in cloud computing frameworks such as MapReduce and Hadoop. Mathematically analyzing the performance of such replication strategies has recently received considerable attention. However, most analytical work focuses on the case where the completion times of the subtasks are identically distributed. This assumption may not hold in practice: the modularization and encapsulation of a task's computation can impose different service requirements on different subtasks. In this paper, we consider the case where the completion times of a task's subtasks are drawn from heterogeneous/empirical distributions. Furthermore, we consider jobs consisting of hierarchical tasks that must be executed in a specific order described by a task precedence graph. We propose a novel framework for determining how to allocate replication resources among the subtasks so that the overall task completion time is minimized. Specifically, we devise a Lagrange multiplier-based method and a water-filling-like algorithm for the resulting integer programs. We show, via analysis and simulations, the optimality and efficiency of the proposed algorithms, and explore the cost-latency tradeoff introduced by replication in a task graph.
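To make the water-filling-like idea concrete, the following Python sketch illustrates one plausible reading of it: greedily assign a budget of extra replicas, one at a time, to the subtask whose replication most reduces the Monte Carlo estimate of the expected task completion time under empirical completion-time distributions. This is an illustrative simplification, not the paper's actual algorithm; the names `task_time`, `water_fill`, and the exponential example distributions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def task_time(samples, alloc, n_mc=10000):
    """Monte Carlo estimate of the expected task completion time.

    samples: list of 1-D arrays of empirical completion times, one per subtask.
    alloc:   replicas (>= 1) per subtask; a replicated subtask finishes when
             its fastest replica does (minimum over replicas).
    """
    times = np.empty((n_mc, len(samples)))
    for i, (s, k) in enumerate(zip(samples, alloc)):
        draws = rng.choice(s, size=(n_mc, k))  # resample from empirical dist.
        times[:, i] = draws.min(axis=1)
    return times.max(axis=1).mean()  # task finishes with its slowest subtask

def water_fill(samples, budget):
    """Greedy, water-filling-like allocation (a sketch, not the paper's
    method): repeatedly grant one extra replica to the subtask whose
    replication most lowers the estimated expected completion time."""
    alloc = [1] * len(samples)
    for _ in range(budget):
        best_i, best_t = None, task_time(samples, alloc)
        for i in range(len(samples)):
            alloc[i] += 1
            t = task_time(samples, alloc)
            alloc[i] -= 1
            if t < best_t:
                best_i, best_t = i, t
        if best_i is None:  # no replica yields an estimated improvement
            break
        alloc[best_i] += 1
    return alloc

# Example: three subtasks with heterogeneous (here, exponential) distributions.
samples = [rng.exponential(1.0, 5000),
           rng.exponential(3.0, 5000),   # straggler-prone subtask
           rng.exponential(1.5, 5000)]
print(water_fill(samples, budget=4))    # extra replicas favor the straggler
```

Under these assumed distributions, the greedy allocation concentrates replicas on the straggler-prone subtask, which is the intuition behind water-filling: resources flow to where the marginal reduction in the expected maximum is largest.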