Data Collaboration Analysis Over Matrix Manifolds

The effectiveness of machine learning (ML) algorithms is deeply intertwined with the quality and diversity of their training datasets. Higher-quality, more diverse datasets improve predictive accuracy and broaden the applicability of models across varied scenarios. Researchers often integrate data from multiple sources to mitigate the biases and limitations of single-source datasets. However, this extensive data amalgamation raises significant ethical concerns, particularly regarding user privacy and the risk of unauthorized data disclosure. Various global legislative frameworks have been established to address these privacy issues. While crucial for safeguarding privacy, these regulations can complicate the practical deployment of ML technologies. Privacy-Preserving Machine Learning (PPML) addresses this challenge by safeguarding sensitive information, from health records to geolocation data, while enabling the secure use of such data in developing robust ML models. Within this realm, the Non-Readily Identifiable Data Collaboration (NRI-DC) framework emerges as an innovative approach, potentially resolving the ‘data island’ issue among institutions through non-iterative communication and robust privacy protections. In its current state, however, the NRI-DC framework suffers from unstable model performance because the construction of its collaboration functions lacks a firm theoretical basis. This study establishes a rigorous theoretical foundation for these collaboration functions and introduces new formulations as optimization problems on matrix manifolds, together with efficient solution methods. Empirical analyses demonstrate that the proposed approach, particularly the formulation over orthogonal matrix manifolds, significantly enhances performance and maintains consistency, without compromising communication efficiency or privacy protections.
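To give a concrete sense of the kind of manifold-constrained problem the abstract refers to, the sketch below fits a collaboration map with orthonormal columns by aligning one party's intermediate anchor representation to a common target representation, i.e. an orthogonal Procrustes problem solved in closed form via the SVD. All names, shapes, and the specific alignment objective here are illustrative assumptions, not the paper's actual formulation, which may use a different objective or solver on the manifold.

```python
import numpy as np

def orthogonal_collaboration_map(X_anchor_i, Z_target):
    """Hypothetical sketch: minimize ||X_anchor_i @ F - Z_target||_F over
    matrices F with orthonormal columns (F^T F = I), an orthogonal
    Procrustes problem with a closed-form SVD solution. Assumes
    X_anchor_i has at least as many columns as Z_target."""
    M = X_anchor_i.T @ Z_target                 # cross-product of party i's anchor features and the target
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ Vt                               # a point on the orthogonal (Stiefel) manifold

# Illustrative usage with random stand-ins for the intermediate
# anchor representation and the shared target representation.
rng = np.random.default_rng(0)
X_anchor_i = rng.standard_normal((50, 12))      # party i's transformed anchor data (50 anchors, 12 features)
Z_target = rng.standard_normal((50, 8))         # common target representation (8 collaborative features)
F_i = orthogonal_collaboration_map(X_anchor_i, Z_target)
print(np.allclose(F_i.T @ F_i, np.eye(8)))      # True: F_i has orthonormal columns
```

The closed-form Procrustes solution is one reason an orthogonality constraint can be attractive in this setting: it yields a well-posed, efficiently computable map without iterative optimization.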
