Over the past decade, there has been extensive work in developing integrated silicon photonics (SiPh) gratings for the optical addressing of trapped ion qubits among the ion trap quantum computing community. However, when viewing beam profiles from gratings using infrared (IR) cameras, it is often difficult to determine the corresponding heights where the beam profiles are located. In this work, we developed transformer models to recognize the corresponding height categories of beam profiles in light from SiPh gratings. The models are trained using two techniques: (1) input patches and (2) input sequence. For the model trained with input patches, the model achieved a recognition accuracy of 0.924. Meanwhile, the model trained with input sequence shows a lower accuracy of 0.892. However, when repeating the model training for 150 cycles, a model trained with input patches shows inconsistent accuracy ranges between 0.289 to 0.959, while the model trained with input sequence shows accuracy values between 0.75 to 0.947. The obtained outcomes can be expanded to various applications, including auto-focusing of light beams and auto-adjustment of the z-axis stage to acquire desired beam profiles.