With the rapidly growing complexity of 3D applications, the memory subsystem has become the primary bandwidth bottleneck in a Graphics Processing Unit (GPU). To produce realistic images, tens to hundreds of thousands of primitives are used. Furthermore, each primitive generates thousands of pixels, and these pixels are computed by shaders that apply special effects, often blending multiple texture pixels fetched from external memory to obtain the final color. To hide the long latency of texture operations, shaders are usually highly multithreaded to increase their throughput. However, conventional memory scheduling mechanisms are unaware of the producer-consumer relationship between primitives and pixels: they either assume that all initiators are independent or use a fixed priority scheme. This paper proposes Demand Look-Ahead (DLA) memory access scheduling, which dynamically generates priorities for the memory request scheduler based on the status of each unit in the GPU. By considering the producer-consumer relationship, the proposed mechanism reschedules the most urgent requests to be serviced first. Experimental results show that the proposed DLA improves FPS and IPC by 1.47 % and 1.44 %, respectively, over First-Ready First-Come-First-Serve (FR-FCFS). By integrating DLA with Bank-level Parallelism Awareness (BPA), DLA-BPA improves FPS and IPC by 7.28 % and 6.55 %, respectively. Furthermore, DLA-BPA improves shader thread performance by 22.06 % and increases the attainable bandwidth by 5.91 %.
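To make the contrast between the baseline and a demand-aware policy concrete, the following is a minimal sketch and not the scheduler evaluated in this paper. `fr_fcfs_pick` follows the usual definition of FR-FCFS (row-buffer hits first, then the oldest request). `urgency_pick`, the `Request` fields, the unit names, and the `urgency` map are all hypothetical; the map only stands in for whatever per-unit priority a status-driven scheduler might derive.

```python
from dataclasses import dataclass


@dataclass
class Request:
    arrival: int  # arrival order; a lower value means an older request
    bank: int     # DRAM bank targeted by the request
    row: int      # DRAM row targeted by the request
    source: str   # hypothetical name of the GPU unit that issued the request


def fr_fcfs_pick(queue, open_rows):
    """Baseline FR-FCFS: prefer row-buffer hits, then the oldest request."""
    # A row miss sorts after a row hit because True > False.
    return min(queue, key=lambda r: (open_rows.get(r.bank) != r.row, r.arrival))


def urgency_pick(queue, open_rows, urgency):
    """Hypothetical demand-aware variant: most urgent source first,
    falling back to FR-FCFS order among equally urgent requests."""
    return min(
        queue,
        key=lambda r: (
            -urgency.get(r.source, 0),         # assumed per-unit priority
            open_rows.get(r.bank) != r.row,    # row hit before row miss
            r.arrival,                         # oldest first
        ),
    )


# Illustrative data only: three pending requests and the row open in each bank.
queue = [
    Request(arrival=0, bank=0, row=5, source="texture"),
    Request(arrival=1, bank=0, row=7, source="vertex"),
    Request(arrival=2, bank=1, row=3, source="texture"),
]
open_rows = {0: 7, 1: 9}

baseline = fr_fcfs_pick(queue, open_rows)
demand_aware = urgency_pick(queue, open_rows, {"texture": 2, "vertex": 1})
```

With these made-up values, the baseline selects the only row hit (the `vertex` request), while the demand-aware variant selects the oldest request from the source marked most urgent. Which unit is actually most urgent at a given moment is exactly what a status-based scheme would have to decide at run time; the fixed map here is only a placeholder.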