UAFS: Uncertainty Aware Feature Selection for Multiple Imputation Problems

Conference Year

January 2019

Abstract

Imputation methods are crucial for dealing with missing data in statistical analysis and supervised learning. Excessive missingness, high dimensionality, and complex data sets challenge most methods. Here we introduce Uncertainty Aware Feature Selection (UAFS), a method that handles extreme levels of missingness in imputation problems. UAFS selects the subset of variables that appear most informative about, or most significantly related to, an outcome variable we wish to predict, while accounting for the missing data in those variables. Once this subset is selected, standard multiple imputation methods can be applied. We apply UAFS to real data sets with synthetic missingness and show that it yields more accurate imputation results across a variety of types of data and missingness. UAFS is general, works with a variety of imputation methods, and is beneficial in problems with any amount of missingness.
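The abstract does not specify how UAFS scores variables, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes one plausible scoring rule: a lower confidence bound on the absolute Pearson correlation with the outcome, computed via the Fisher z-transform on rows where both values are observed, so that sparsely observed variables are penalized. The function names (`pearson`, `uncertainty_aware_score`, `select_features`), the `None` encoding of missing values, and the synthetic data are all hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def uncertainty_aware_score(feature, outcome, z=1.96):
    """Assumed scoring rule: lower confidence bound on |r|.

    Only rows where both feature and outcome are observed (not None) are used.
    The Fisher z-transform has standard error 1 / sqrt(n - 3), so a variable
    with few observed pairs gets a wider interval and a lower score.
    """
    pairs = [(x, y) for x, y in zip(feature, outcome)
             if x is not None and y is not None]
    n = len(pairs)
    if n < 4:  # too few observed pairs to estimate anything
        return 0.0
    r = abs(pearson([p[0] for p in pairs], [p[1] for p in pairs]))
    r = min(r, 0.999999)  # keep atanh finite
    lower = math.atanh(r) - z / math.sqrt(n - 3)
    return max(0.0, math.tanh(lower))


def select_features(columns, outcome, k):
    """Return the k highest-scoring column names and all scores."""
    scores = {name: uncertainty_aware_score(col, outcome)
              for name, col in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Synthetic demonstration data (hypothetical, for illustration only).
rng = random.Random(0)
outcome = [rng.gauss(0, 1) for _ in range(200)]
columns = {
    # Related to the outcome, roughly 30% missing.
    "informative": [y + 0.5 * rng.gauss(0, 1) if rng.random() > 0.3 else None
                    for y in outcome],
    # Fully observed but unrelated to the outcome.
    "noise": [rng.gauss(0, 1) for _ in outcome],
    # Perfectly related, but observed in only three rows.
    "sparse": [y if i < 3 else None for i, y in enumerate(outcome)],
}
selected, scores = select_features(columns, outcome, k=1)
```

In this sketch the selected columns, together with the outcome, would then be handed to whichever standard multiple imputation routine is in use. The lower-bound rule is one design choice among many; any score that combines strength of association with the amount of observed data would fit the description in the abstract.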

Primary Faculty Mentor Name

James Bagrow

Status

Graduate

Student College

College of Engineering and Mathematical Sciences

Program/Major

Data Science

Primary Research Category

Engineering & Physical Sciences

Abstract only.

