Weimin Wu

Ph.D. Candidate, Computer Science, Northwestern University

Instance-aware Model Ensemble With Distillation For Unsupervised Domain Adaptation


arXiv


Weimin Wu, Jiayuan Fan, Tao Chen, Hancheng Ye, Bo Zhang, Baopu Li
2022

PDF: https://arxiv.org/abs/2211.08106
Cite

APA
Wu, W., Fan, J., Chen, T., Ye, H., Zhang, B., & Li, B. (2022). Instance-aware Model Ensemble With Distillation For Unsupervised Domain Adaptation.


Chicago/Turabian
Wu, Weimin, Jiayuan Fan, Tao Chen, Hancheng Ye, Bo Zhang, and Baopu Li. “Instance-Aware Model Ensemble With Distillation For Unsupervised Domain Adaptation,” 2022.


MLA
Wu, Weimin, et al. Instance-Aware Model Ensemble With Distillation For Unsupervised Domain Adaptation. 2022.


BibTeX

@unpublished{weimin2022a,
  title = {Instance-aware Model Ensemble With Distillation For Unsupervised Domain Adaptation},
  year = {2022},
  author = {Wu, Weimin and Fan, Jiayuan and Chen, Tao and Ye, Hancheng and Zhang, Bo and Li, Baopu}
}

Abstract:

Linear ensemble strategies (i.e., averaging ensembles) have been proposed to improve performance on the unsupervised domain adaptation (UDA) task. However, a typical UDA task is usually challenged by dynamically changing factors, such as variable weather, views, and backgrounds in the unlabeled target domain. Most previous ensemble strategies ignore these dynamic and uncontrollable challenges, leading to limited feature representations and performance bottlenecks. To enhance model adaptability across domains and reduce the computational cost of deploying the ensemble, we propose a novel framework, Instance-aware Model Ensemble With Distillation (IMED), which adaptively fuses multiple UDA component models on a per-instance basis and distills these components into a small model. The core idea of IMED is a dynamic instance-aware ensemble strategy: for each instance, a learned non-linear fusion sub-network combines the extracted features and predicted labels of the component models. This non-linear fusion helps the ensemble model handle dynamically changing factors. After learning a large-capacity ensemble model with good adaptability to different changing factors, we use the ensemble as a teacher to guide the learning of a compact student model via knowledge distillation. Furthermore, we provide a theoretical analysis of the validity of IMED for UDA. Extensive experiments on various UDA benchmarks (e.g., Office-31, Office-Home, and VisDA-2017) show that the IMED-based model outperforms state-of-the-art methods at comparable computational cost.
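
To make the two-stage idea concrete, below is a minimal PyTorch sketch of (1) an instance-aware non-linear fusion head over K component models and (2) a standard knowledge-distillation loss for training the compact student. Everything here is an illustrative assumption: the names InstanceAwareFusion and distill_loss, the gating architecture, and the temperature value are hypothetical and do not reflect the paper's actual implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class InstanceAwareFusion(nn.Module):
    # Hypothetical sketch: for each input instance, a small gating
    # network produces weights over K component models; fused features
    # and logits are weighted combinations, with a non-linear
    # projection added on top of the fused features.
    def __init__(self, feat_dim, num_classes, num_components):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(feat_dim * num_components, num_components),
            nn.Softmax(dim=-1),
        )
        self.proj = nn.Sequential(  # non-linear fusion of features
            nn.Linear(feat_dim, feat_dim),
            nn.ReLU(),
            nn.Linear(feat_dim, num_classes),
        )

    def forward(self, feats, logits):
        # feats:  (B, K, D) features from K component models
        # logits: (B, K, C) predictions from K component models
        B, K, D = feats.shape
        w = self.gate(feats.reshape(B, K * D))           # (B, K) per-instance weights
        fused_feat = (w.unsqueeze(-1) * feats).sum(1)    # (B, D)
        fused_logit = (w.unsqueeze(-1) * logits).sum(1)  # (B, C)
        return fused_logit + self.proj(fused_feat)       # teacher ensemble output

def distill_loss(student_logits, teacher_logits, T=4.0):
    # Standard KD: KL divergence between temperature-softened distributions.
    p_t = F.softmax(teacher_logits / T, dim=-1)
    log_p_s = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * T * T

In this sketch, the instance dependence comes entirely from the gate: different inputs produce different fusion weights, so the ensemble's behavior changes per instance rather than using a fixed average.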