As the number of blind people grows, many assistive systems that convert images into sound or vibration have been designed. However, speech prompts and vibration convey only limited information. Therefore, this paper presents a system called Image-Sound Eye-substitution Equipment (ISEE), which consists of a binocular camera, a Raspberry Pi, and stereo headsets. Based on sensory substitution technology, it converts an image into sound by mapping each pixel to an independent sound source; owing to this high-density information expression, it enables blind users to browse pictures, appreciate scenery, and avoid obstacles. The core of its data processing is the Sensory Substitution Technology-based Image-to-Sound Transform (SST-IST) algorithm, which consists of three main steps: image reshaping, image-to-sound conversion matrix generation, and image-to-sound transmission. First, a single-frequency tone is generated for each pixel, followed by wave interception and localization to obtain an independent sound clip that expresses the pixel's information. Second, the sound clips are stereo-processed based on Head-Related Transfer Functions (HRTF). Third, the series of single-frequency sound clips is accumulated with weights to generate a complete sound. Extensive experiments show that SST-IST restores image information well, achieving a sound restoration rate of 73.0%. It is also broadly adaptive and robust to changes in ambient light and noise within a certain range. Auditory discrimination experiments show that, after 4-5 hours of training, users can recognize basic graphics with the ISEE system.
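The pixel-to-sound mapping and weighted accumulation described above can be sketched as follows. This is a minimal illustrative example, not the paper's exact SST-IST implementation: the frequency range, sample rate, clip length, and the row-to-frequency assignment are all assumptions, and the HRTF-based stereo localization step is omitted.

```python
import numpy as np

SAMPLE_RATE = 8000   # assumed audio sample rate (Hz)
CLIP_LEN = 0.05      # assumed duration of the output sound clip (s)

def image_to_sound(image, f_min=200.0, f_max=2000.0):
    """Map each pixel of a grayscale image to a single-frequency tone.

    In this sketch the row index selects the pitch (one frequency per
    row) and pixel brightness sets the amplitude weight; all per-pixel
    tones are accumulated with these weights into one sound clip.
    """
    h, w = image.shape
    t = np.arange(int(SAMPLE_RATE * CLIP_LEN)) / SAMPLE_RATE
    sound = np.zeros_like(t)
    freqs = np.linspace(f_min, f_max, h)     # assumed row-to-pitch mapping
    for r in range(h):
        for c in range(w):
            weight = image[r, c] / 255.0     # brightness -> amplitude weight
            sound += weight * np.sin(2 * np.pi * freqs[r] * t)
    peak = np.max(np.abs(sound))
    return sound / peak if peak > 0 else sound  # normalize to avoid clipping
```

A full system would additionally convolve each pixel's clip with an HRTF filter pair so that the pixel's column position is perceived as a direction in the stereo field.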