Learning with Low Samples in High-Dimensions: Estimators, Geometry, and Applications
ABSTRACT: Many machine learning problems, especially scientific problems arising in areas such as climate science, ecology, and brain sciences, operate in the so-called "low sample, high dimension" regime. Such problems typically involve many candidate predictors or features for a prediction task, but the number of training examples is small, often much smaller than the number of features. In this talk, we will discuss recent advances in designing and analyzing statistical machine learning formulations for such problems, which considerably generalize prior work such as the Lasso and the Dantzig selector. We will discuss the geometry underlying such problems and how this geometry helps characterize the properties of such formulations. We will also discuss applications of these results to multivariate time series modeling and structure learning in graphical models, as well as real-world applications in ecology and climate science.
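To make the regime concrete, here is a minimal sketch (assuming NumPy and scikit-learn; the specific dimensions, signal strengths, and regularization value are illustrative choices, not from the talk) of the Lasso, one of the baseline estimators the abstract mentions, recovering a sparse model when the number of training examples is much smaller than the number of features:

```python
# Illustrative "low sample, high dimension" setting: n = 50 examples,
# p = 200 features, only k = 5 truly relevant predictors.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, k = 50, 200, 5                     # samples << features
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:k] = 3.0                           # sparse ground-truth coefficients
y = X @ beta + 0.1 * rng.standard_normal(n)

# The L1 penalty drives most coefficients exactly to zero,
# making estimation feasible despite n < p.
lasso = Lasso(alpha=0.1).fit(X, y)
selected = np.flatnonzero(lasso.coef_)
print(f"nonzero coefficients: {len(selected)} of {p}")
```

Ordinary least squares is hopeless here (the system is underdetermined), which is why structured estimators of this family, and the generalizations discussed in the talk, are needed.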
This is joint work with Sheng Chen, Farideh Fazayeli, Andre Goncalves, Jens Kattge, Igor Melnyk, Franziska Schrodt, Hanhuai Shan, Vidyashankar Sivakumar, and Peter Reich.
BIO: Arindam Banerjee is an Associate Professor in the Department of Computer Science & Engineering and a Resident Fellow at the Institute on the Environment at the University of Minnesota, Twin Cities. His research interests are in statistical machine learning and data mining, with applications to complex real-world problems including climate sciences, ecology, recommendation systems, text analysis, brain sciences, finance, and aviation safety. He has won several awards, including the Adobe Research Award (2016), the IBM Faculty Award (2013), the NSF CAREER Award (2010), and six Best Paper awards at top-tier conferences.