Midsagittal ultrasound imaging of the tongue is a portable and inexpensive way to obtain articulatory information. However, ultrasound images show only a portion of the tongue surface; other vocal tract structures (e.g., the palate) are typically not visible. This missing information may be useful for speech therapy and other applications, for example by characterizing vocal tract constrictions and informing how morphological variation affects speech patterns. Predicting the vocal tract shape from information available during ultrasound imaging (e.g., tongue contours and audio recordings) is thus potentially valuable. Recent advances in articulatory prediction from audio recordings (i.e., acoustic inversion) and in speech recognition from combined articulatory and acoustic data have relied on neural network models. Inspired by these models, this study investigates how well fusion of articulatory and acoustic features in speaker-independent models can predict expanded articulatory information. Specifically, recurrent neural network models will be trained to predict the vocal tract shape from partial tongue contours and acoustic features during production of vowels and central approximants. Features will be extracted from simultaneously recorded audio and 2D MRI (USC 75-Speaker Database). Different acoustic features and network architectures will be compared, with the goal of refining future models to predict vocal tract shapes during ultrasound imaging.
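As a hypothetical illustration of the fusion approach described above (not the authors' implementation), a recurrent model might concatenate per-frame tongue-contour coordinates with acoustic features and regress the full vocal tract shape frame by frame. All feature dimensions, layer sizes, and the choice of a bidirectional GRU below are placeholder assumptions for the sketch:

```python
import torch
import torch.nn as nn

class FusionRNN(nn.Module):
    """Sketch of an articulatory-acoustic fusion model (hypothetical sizes)."""

    def __init__(self, n_contour=30, n_acoustic=13, n_hidden=128, n_shape=60):
        super().__init__()
        # Bidirectional GRU over the concatenated per-frame feature vector
        self.rnn = nn.GRU(n_contour + n_acoustic, n_hidden,
                          batch_first=True, bidirectional=True)
        # Linear head mapping hidden states to vocal tract shape coordinates
        self.head = nn.Linear(2 * n_hidden, n_shape)

    def forward(self, contours, acoustics):
        # contours:  (batch, time, n_contour)  partial tongue contour points
        # acoustics: (batch, time, n_acoustic) e.g., MFCC-like frame features
        x = torch.cat([contours, acoustics], dim=-1)
        h, _ = self.rnn(x)
        return self.head(h)  # (batch, time, n_shape)

# Example: a batch of 4 utterances, 100 aligned frames each
model = FusionRNN()
out = model(torch.randn(4, 100, 30), torch.randn(4, 100, 13))
print(out.shape)  # torch.Size([4, 100, 60])
```

Early feature concatenation is only one fusion strategy; separate encoders per modality with later fusion, or attention over modalities, would be natural alternatives to compare.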