The gap transfer illusion is an auditory illusion in which a temporal gap in a long glide is perceived as if it had transferred to a physically continuous shorter glide. The illusion typically occurs when the long and the shorter glides cross each other at their temporal midpoints, where the long glide is physically divided by the gap. We investigated the occurrence of the gap transfer illusion in stimuli in which the duration and the slope of the long glide were 5000 ms and ∼0.8 oct/s, respectively. The shorter glide was given different frequency ranges and different temporal ranges, so that its time-frequency slope also varied. The overlap configuration of the crossing glides was varied as well. As control stimuli, we used stimuli in which a continuous long glide crossed a shorter glide with a gap, i.e., the reverse of the gap-transfer configuration, as well as stimuli in which both crossing glides were continuous. The perception of two crossing tones tended to be facilitated when the glides differed in duration and/or slope. When the glides were relatively similar in duration and slope, however, bouncing percepts appeared more often. Similarity between the crossing tones thus promoted auditory bouncing, whereas dissimilarity between them facilitated the crossing percept. When the crossing percept dominated in gap-transfer stimuli, the gap transfer illusion took place in its typical form, but the illusory transfer of the gap could occur even when the crossing percept was not dominant. When the shorter glide was as short as 500 ms, the crossing percept and the gap transfer illusion were robust. The mechanism of the illusion was examined in terms of factors that can influence the perceptual integration of auditory stimulus edges, i.e., onsets and offsets, of physically different sounds. 
We suggest that, much as in the perceptual construction of speech units, the auditory system uses a rough time window of several hundred milliseconds to construct an initial skeleton percept of auditory events. The present data indicated the importance of temporal proximity, rather than frequency proximity, between sound edges in the construction of the illusory tone.
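
The geometry of a gap-transfer stimulus can be made concrete with a minimal sketch. The long-glide duration (5000 ms), its slope (0.8 oct/s), and the 500 ms shorter glide are taken from the text. Everything else is an assumption made only for illustration: the crossing frequency `CROSS_FREQ`, the gap duration `GAP`, the slope of the shorter glide `SHORT_SLOPE`, the choice of an ascending long glide, and the helper names `glide_frequency` and `glide_is_sounding`. It is not the stimulus-generation procedure used in the study.

```python
import math

def glide_frequency(t, t_mid, f_mid, slope_oct_per_s):
    """Instantaneous frequency (Hz) of a glide that is linear in log frequency.

    The glide passes through f_mid at time t_mid (seconds).
    """
    return f_mid * 2.0 ** (slope_oct_per_s * (t - t_mid))

def glide_is_sounding(t, t_mid, duration, gap=0.0):
    """Return True if the glide is physically present at time t (seconds).

    The glide is centred on t_mid; an optional silent gap is also centred there.
    """
    inside = abs(t - t_mid) <= duration / 2.0
    in_gap = gap > 0.0 and abs(t - t_mid) < gap / 2.0
    return inside and not in_gap

# Values stated in the text.
LONG_DURATION = 5.0    # s
LONG_SLOPE = 0.8       # oct/s (ascending direction is an assumption)
SHORT_DURATION = 0.5   # s, the shortest glide mentioned

# Hypothetical values, chosen only for illustration.
CROSS_FREQ = 1000.0    # Hz, frequency at the crossing point
GAP = 0.1              # s, silent gap in the long glide
SHORT_SLOPE = -0.8     # oct/s, shorter glide moving in the opposite direction

# Both glides share the same temporal midpoint, where they cross.
T_MID = LONG_DURATION / 2.0

def stimulus_state(t):
    """Describe which glides are physically present at time t, with their frequencies."""
    state = {}
    if glide_is_sounding(t, T_MID, LONG_DURATION, gap=GAP):
        state["long"] = glide_frequency(t, T_MID, CROSS_FREQ, LONG_SLOPE)
    if glide_is_sounding(t, T_MID, SHORT_DURATION):
        state["short"] = glide_frequency(t, T_MID, CROSS_FREQ, SHORT_SLOPE)
    return state

if __name__ == "__main__":
    for t in (0.0, 2.3, T_MID, 2.7, 5.0):
        print(f"t = {t:.2f} s: {stimulus_state(t)}")
```

At 0.8 oct/s over 5 s the long glide spans 4 octaves, whatever crossing frequency is assumed. At the temporal midpoint only the shorter glide is physically present, which is the configuration in which the gap is reported to be heard in the shorter glide instead.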