Vector databases have emerged as crucial tools for managing and retrieving representation embeddings of unstructured data. Given the explosive growth of data, vector data is often distributed and stored across multiple organizations. However, privacy concerns and regulations like GDPR present new challenges in collaborative and secure queries, also known as federated queries, over those vector data distributed across various data owners. Although existing research has attempted to enable such query services for low-dimensional data, such as relational and spatial data, these solutions can be inefficient in answering vector similarity queries involving high-dimensional data. Therefore, we are motivated to develop a new prototype system called FedSQ that (1) ensures privacy protection across data owners and (2) balances query efficiency and result accuracy when processing federated vector similarity queries. To achieve these goals, FedSQ utilizes advanced secure multi-party computation techniques to prevent information leakage during query processing and incorporates indexing and sampling based optimizations to strike a proper performance balance.
Read full abstract