It is not yet clear how best to formulate blended physics-machine learning models for ship motions. This work compares two approaches to this problem: (1) a black-box deep learning approach based on a new neural network architecture that better handles varying wave conditions, and (2) a clear-box model that updates linear response amplitude operators via Gaussian process regression. Both models are trained and evaluated on a dataset of more than 15,000 30-minute motion observation windows from two research vessels at sea in the Atlantic and Pacific oceans. Three hindcast weather services are used: two models from the EU’s Copernicus system and NOAA’s WAVEWATCH III. The evaluation shows a tradeoff between the formulations, with the black-box formulation offering higher accuracy at the cost of less transparency. The choice of weather hindcast has only a small impact on the results, and both models generalize between near-sister ships, which is encouraging for the practical application of these techniques.
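To make the clear-box idea concrete, the following is a minimal sketch of how a linear response amplitude operator (RAO) maps a wave spectrum to a motion response spectrum, S_r(ω) = |H(ω)|² S_ζ(ω), with a per-frequency correction applied to the RAO amplitude. The multiplicative form of the correction, the function names, and the sample values are illustrative assumptions and are not taken from this work; in a clear-box model the correction values would be supplied by a trained regression such as a Gaussian process.

```python
import math


def response_spectrum(rao_amplitude, wave_spectrum, correction=None):
    """Linear response spectrum: S_r = (|H| * (1 + c))^2 * S_wave.

    `correction` is a hypothetical per-frequency relative update to the
    RAO amplitude (assumed multiplicative here); None means no update.
    """
    if correction is None:
        correction = [0.0] * len(rao_amplitude)
    return [
        (h * (1.0 + c)) ** 2 * s
        for h, c, s in zip(rao_amplitude, correction, wave_spectrum)
    ]


def zeroth_moment(omega, spectrum):
    """Area under the spectrum (m0) by the trapezoidal rule."""
    return sum(
        0.5 * (spectrum[i] + spectrum[i + 1]) * (omega[i + 1] - omega[i])
        for i in range(len(omega) - 1)
    )


def significant_amplitude(omega, spectrum):
    """Significant single amplitude of the response, 2 * sqrt(m0)."""
    return 2.0 * math.sqrt(zeroth_moment(omega, spectrum))


# Illustrative values only: a flat wave spectrum and a flat RAO.
omega = [0.0, 0.5, 1.0]   # rad/s
wave = [1.0, 1.0, 1.0]    # wave spectral density
rao = [2.0, 2.0, 2.0]     # linear RAO amplitude

uncorrected = response_spectrum(rao, wave)
corrected = response_spectrum(rao, wave, correction=[0.5, 0.5, 0.5])
```

Because the correction acts on the RAO rather than on the predicted motion directly, it can be inspected frequency by frequency, which is the kind of transparency that a black-box network gives up.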