UAFS: Uncertainty Aware Feature Selection for Multiple Imputation Problems
Conference Year
January 2019
Abstract
Imputation methods are crucial for handling missing data in statistical analysis and supervised learning. Excessive missingness, high dimensionality, and complex data sets challenge most methods. Here we introduce Uncertainty Aware Feature Selection (UAFS), a method that handles extreme levels of missingness in imputation problems. UAFS selects the subsets of variables that appear most informative about, or most significantly related to, an outcome variable we wish to predict, while accounting for the missing data in those variables. Once this subset is selected, standard multiple imputation methods can be applied. We apply this method to real data sets with synthetic missingness and demonstrate more accurate imputation results across a variety of data and missingness types. UAFS is general, works with a variety of imputation methods, and is beneficial in problems with any amount of missingness.
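The abstract does not say how UAFS scores variables or quantifies the uncertainty introduced by missing values. The sketch below shows one possible reading under our own assumptions and is not the authors' implementation. Each variable is ranked by the edge of a Fisher z confidence interval on its correlation with the outcome that lies closest to zero, using only rows where both values are observed. A heavily missing variable therefore has a small sample, a wide interval, and a low score. The function names, the 1.96 critical value, and the random hot-deck imputer (a simple stand-in for a standard multiple-imputation method) are all hypothetical choices.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences of floats."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no correlation signal
    return sxy / math.sqrt(sxx * syy)


def uncertainty_score(feature, outcome, z_crit=1.96):
    """Hypothetical score: end of the ~95% CI on r closest to zero (0 if the CI spans zero).

    Only rows where both the feature and the outcome are observed are used,
    so a heavily missing feature has a small n and hence a wide interval.
    """
    pairs = [(x, y) for x, y in zip(feature, outcome)
             if x is not None and y is not None]
    n = len(pairs)
    if n < 4:
        return 0.0  # need n > 3 for the Fisher z standard error
    xs, ys = zip(*pairs)
    r = max(min(pearson_r(xs, ys), 0.999999), -0.999999)  # keep atanh finite
    z, se = math.atanh(r), 1.0 / math.sqrt(n - 3)
    lo, hi = math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)
    if lo <= 0.0 <= hi:
        return 0.0  # association not distinguishable from zero
    return min(abs(lo), abs(hi))


def select_features(columns, outcome, k):
    """Hypothetical UAFS-style step: keep the k features with the highest scores."""
    scores = {name: uncertainty_score(col, outcome) for name, col in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


def hot_deck_multiple_impute(column, m, seed=0):
    """Stand-in imputer: m completed copies, each missing entry drawn at random
    from the observed values of the same column (random hot-deck)."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values to draw from")
    rng = random.Random(seed)
    return [[v if v is not None else rng.choice(observed) for v in column]
            for _ in range(m)]


if __name__ == "__main__":
    y = [float(i) for i in range(20)]
    columns = {
        # fully observed, moderately correlated with y
        "a": [i + (6.0 if i % 2 else -6.0) for i in range(20)],
        # strongly correlated on its 5 observed rows, missing (None) elsewhere
        "b": [{0: 0.0, 4: 1.0, 8: 2.0, 12: 4.0, 16: 3.0}.get(i) for i in range(20)],
        # fully observed noise
        "c": [float((-1) ** i) for i in range(20)],
    }
    chosen, scores = select_features(columns, y, k=2)
    print(chosen, {name: round(s, 3) for name, s in scores.items()})
    # impute only the selected subset, producing m = 5 completed versions
    completed = {name: hot_deck_multiple_impute(columns[name], m=5) for name in chosen}
```

In this toy data, `b` has the higher complete-case correlation (0.9 on five rows versus about 0.72 for `a`), but `a` still ranks first. Scoring by the interval edge rather than the point estimate penalizes the association that rests on only a handful of observed values.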
Primary Faculty Mentor Name
James Bagrow
Status
Graduate
Student College
College of Engineering and Mathematical Sciences
Program/Major
Data Science
Primary Research Category
Engineering & Physical Sciences