Research by Topic

Graphical Models: Methods and Theory with Missing Data

High-dimensional graphical models are a powerful tool for learning connections or interaction patterns among a large number of variables, with wide applications such as learning stock networks and social networks. While most prior work focuses on the case where all variables are measured simultaneously, a typical challenge in real data sets is that only certain subsets of variables can be measured together, or measured sufficiently many times. Estimating the graph (the conditional independence relationships) or certain characteristics of the graph accurately in this setting requires novel statistical methods and theory. I am actively working in this direction and am happy to collaborate on related topics!
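To make the measurement pattern concrete, here is a minimal sketch that counts how many samples jointly observe each pair of variables. The function name `pairwise_counts` and the toy data are hypothetical illustrations, not taken from any of the papers below.

```python
from itertools import combinations


def pairwise_counts(observed):
    """Count, for each pair of variables, the samples in which both were measured.

    `observed` is a list of sets; each set holds the indices of the
    variables measured in one sample (a hypothetical input format).
    """
    counts = {}
    for measured in observed:
        for i, j in combinations(sorted(measured), 2):
            counts[(i, j)] = counts.get((i, j), 0) + 1
    return counts


# Hypothetical toy data: 4 variables, 5 samples with uneven coverage.
observed = [{0, 1, 2}, {0, 1}, {1, 2, 3}, {0, 1}, {2, 3}]
counts = pairwise_counts(observed)

# Pairs that never appear together have no entry, i.e. a joint sample size of 0.
for pair in combinations(range(4), 2):
    print(pair, counts.get(pair, 0))
```

In this toy example the pair (0, 1) is observed together three times while (0, 3) is never observed together, so any estimate involving the latter pair would be far less certain.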

Papers:

  • Graphical Model Inference with Erosely Measured Data
    Lili Zheng, Genevera I. Allen
    Accepted to Journal of the American Statistical Association, Theory and Methods
    In this work, we are primarily concerned with graphical model inference from uneven and irregular measurements, which we term "erose measurements". This is motivated by neuroscience and genetic data applications where the missingness can be highly uneven, with drastically different sample sizes across variables. In these scenarios, uncertainty quantification can be extremely important, since some parts of the graph can be estimated with much higher confidence than others. We propose GI-JOE (Graph Inference when Joint Observations are Erose) to perform edge-wise testing in this setting, where the uncertainty level of each edge depends on the sample size of the associated neighbors. Below is an illustrative example of how GI-JOE (tested graph on the right) improves graph selection by considering uneven uncertainties across the graph.
    [Figure: illustrative example of graph selection with GI-JOE; tested graph on the right]

Interpretable Machine Learning

With machine learning models being deployed everywhere in modern life, making them interpretable and trustworthy is a crucial task for researchers from different domains. As a statistician, I am passionate about contributing to the challenging problems in interpretable machine learning through a statistical lens, e.g., statistical theory and inference methods.

Papers:

High-dimensional Networks Estimation in Time Series Models

High-dimensional autoregressive models can capture how the past events/status associated with a huge collection of nodes influence their future events/status, where the influence patterns can reveal underlying network structures. For example, the past firing of neurons may trigger or inhibit the future firing of their neighbors; past posts of a Twitter user may also influence the likelihood that their followers send new tweets. The influence network among these nodes can then be encoded by the high-dimensional autoregressive parameter. The estimation and testing problems for the underlying network structure pose both methodological and theoretical challenges.
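As a rough illustration of how an autoregressive parameter can encode a network, the sketch below uses a linear first-order model with Gaussian noise. This model choice, the matrix `A`, and the helper names `influence_edges` and `simulate` are assumptions made for illustration only; the papers listed below study other model classes, such as point processes.

```python
import random

# Hypothetical 3-node autoregressive matrix (an assumption for illustration):
# A[i][j] != 0 means the past of node j influences the future of node i.
# A positive entry excites, a negative entry inhibits.
A = [
    [0.5, 0.0, 0.0],
    [0.4, 0.3, 0.0],
    [0.0, -0.6, 0.2],
]


def influence_edges(A, tol=1e-12):
    """Return directed edges (source j, target i, weight) from off-diagonal nonzeros."""
    p = len(A)
    return [
        (j, i, A[i][j])
        for i in range(p)
        for j in range(p)
        if i != j and abs(A[i][j]) > tol
    ]


def simulate(A, steps, noise_sd=0.1, seed=0):
    """Simulate x_{t+1} = A x_t + noise, starting from the zero vector."""
    rng = random.Random(seed)
    p = len(A)
    x = [0.0] * p
    path = [x]
    for _ in range(steps):
        x = [
            sum(A[i][j] * x[j] for j in range(p)) + rng.gauss(0.0, noise_sd)
            for i in range(p)
        ]
        path.append(x)
    return path


edges = influence_edges(A)
path = simulate(A, steps=50)
print(edges)
```

Here node 0 excites node 1 and node 1 inhibits node 2. Estimating and testing such a network amounts to recovering which entries of `A` are nonzero from an observed path.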

Papers:

  • Context-dependent self-exciting point processes: models, methods, and risk bounds in high dimensions
    Lili Zheng, Garvesh Raskutti, Rebecca Willett, Benjamin Mark
    Journal of Machine Learning Research. 2021 [Slides][Code]
    In this work, we propose two autoregressive models with corresponding methods and theory for learning context-dependent networks that reflect how features associated with an event (such as the content of a social media post) modulate the strength of influences among nodes. The multinomial approach we propose is suited to categorical marks, while the logistic-normal approach is suited to marks with mixed membership in different categories; a mixture approach is also proposed to combine the merits of both methods. The following figure provides a comparison among the three approaches.
    [Figure: comparison among the multinomial, logistic-normal, and mixture approaches]

  • Testing for high-dimensional network parameters in auto-regressive models
    Lili Zheng, Garvesh Raskutti
    Electronic Journal of Statistics. 2019 [Code]
    Below is an example of the hypothesis testing results of our method on Chicago crime data, where the goal is to test which communities' past crimes have a significant influence on other communities' future crimes. All communities involved in significant edges are colored, showing geographical proximity.

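    [Figure: hypothesis testing results on Chicago crime data; communities involved in significant edges are colored]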

Tensor Data Analysis

Tensor data has attracted wide interest in recent years since it contains valuable high-order information, while its high dimensionality poses numerous statistical and computational challenges. One of my research interests is to develop efficient and statistically accurate algorithms for solving real-world tensor problems.

Papers:

Non-convex Optimization

Non-convex optimization problems arise frequently from both modern machine learning algorithms (e.g., deep neural networks and Gaussian processes) and complex data structures (e.g., missing data). Although challenging from a pure optimization perspective, these problems can often benefit from appropriate statistical modeling. I am interested in the intersection between statistics and optimization, especially when efficient non-convex algorithms can still exhibit strong statistical performance.

Papers: