Abstract

Learning knowledge embedding representations is an increasingly important technology. However, the choice of hyperparameters is seldom justified and usually relies on exhaustive search. Understanding how hyperparameter combinations affect embedding quality is crucial to avoiding this inefficient process and to making embedding representations more practical for subsequent machine learning applications. This work focuses on translational embedding models for multi-relational categorized data in the clinical domain. We trained and evaluated models with different combinations of hyperparameters on two clinical datasets. We contrasted the results by comparing metric distributions and fitting a random forest regression model. Classifiers were trained to assess the quality of the embedding representations. Finally, clustering was tested as a validation protocol. We observed consistent patterns of hyperparameter preference and identified the hyperparameters that achieved better results in each case. However, the results show different patterns for link prediction, which we take as strong evidence that the traditional evaluation protocol used for open-domain data does not necessarily lead to the best embedding representation for categorized data.
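
As a rough illustration of what a translational embedding model computes, the sketch below scores a triple in the style of TransE, a well-known translational model. The abstract does not name the specific models used, so the choice of TransE, the function name `transe_score`, the entity and relation names, and the 2-dimensional vectors are all assumptions made for illustration only.

```python
import math


def transe_score(head, relation, tail):
    """Return the negative L2 distance between (head + relation) and tail.

    In a TransE-style model a true triple should satisfy head + relation ≈ tail,
    so a score closer to zero means the triple is considered more plausible.
    """
    return -math.sqrt(
        sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail))
    )


# Hypothetical 2-dimensional embeddings, invented for this example.
patient = [0.0, 1.0]
has_diagnosis = [1.0, 0.0]
diabetes = [1.0, 1.0]
asthma = [4.0, 5.0]

# Score two candidate tails for the same (head, relation) pair.
good = transe_score(patient, has_diagnosis, diabetes)
bad = transe_score(patient, has_diagnosis, asthma)
```

Link prediction, the traditional evaluation protocol mentioned above, ranks candidate entities by such a score; here `diabetes` would rank above `asthma` for the toy vectors chosen.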
