
Machine Learning, Deep Learning and Reinforcement Learning
Our research spans a wide range of topics in machine learning and deep learning, including machine learning theory, reinforcement learning, graph neural networks, adversarial learning, interpretable machine learning, federated learning and natural language processing. Our primary goal is to advance the state of the art in machine learning theory and applications, addressing real-world challenges across industrial and scientific domains. Our application fields include FinTech, blockchain, transportation, urban computing, bioinformatics, neuroscience, social network analysis and cyber-physical system design. Many machine learning faculty members are also core faculty of the Center for AI Research (CAIR). Our research has received broad support from agencies including NSF, DOE, AFOSR, DOT, NIH, DOD and industry partners. The results are consistently published in high-impact journals and top-tier machine learning and artificial intelligence conferences, such as Nature Machine Intelligence, Nature Communications, IEEE Transactions on Neural Networks and Learning Systems (TNNLS), NeurIPS, ICML, KDD, AAAI, IJCAI and ICDM.
Jing Li
Research Areas: Real-time systems, parallel computing, cyber-physical systems and reinforcement learning for system design and optimization

Reinforcement Learning-Based System Design
The design space of modern complex systems is increasingly large. Finding good designs often involves solving mixed-integer optimization problems that are highly intractable. Our research develops reinforcement learning-based frameworks that use graph neural networks and active learning techniques to intelligently and efficiently find good designs in this huge design space. We have applied our frameworks to various system design tasks, including resource allocation in cyber-physical systems and circuit design.
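To make the combination concrete, below is a minimal sketch, not the group's actual framework, of a graph-neural-network policy that scores discrete design choices for a system represented as a graph. The `DesignPolicy` class, the dimensions and the mean-aggregation message passing are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DesignPolicy(nn.Module):
    """Scores discrete design choices for a system represented as a graph:
    one round of mean-aggregation message passing, then a policy head."""
    def __init__(self, node_dim, hidden_dim, num_actions):
        super().__init__()
        self.msg = nn.Linear(node_dim, hidden_dim)
        self.upd = nn.Linear(node_dim + hidden_dim, hidden_dim)
        self.policy = nn.Linear(hidden_dim, num_actions)

    def forward(self, x, adj):
        # x: (num_nodes, node_dim) features; adj: (num_nodes, num_nodes) adjacency
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        neighbor_mean = (adj @ self.msg(x)) / deg       # aggregate neighbor messages
        h = torch.relu(self.upd(torch.cat([x, neighbor_mean], dim=1)))
        graph_repr = h.mean(dim=0)                      # pool to a graph embedding
        return torch.log_softmax(self.policy(graph_repr), dim=-1)

# Toy usage: sample one design action for a 5-component system with 8 choices.
policy = DesignPolicy(node_dim=4, hidden_dim=16, num_actions=8)
x, adj = torch.randn(5, 4), (torch.rand(5, 5) > 0.5).float()
action = torch.distributions.Categorical(logits=policy(x, adj)).sample()
```

In a full framework such a policy would be trained with a reinforcement learning objective, with the optimized design metric serving as the reward.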
Akshay Rangamani
Research Areas: Deep learning, signal processing, computational neuroscience, neural collapse

Sparse and Low-Rank Structures in Deep Networks
Deep learning has demonstrated great success across tasks in diverse domains involving image, audio and text understanding. However, a mathematical description of how and why deep learning works, and of how to reliably produce models that generalize well, is still incomplete. We aim to describe how deep networks learn sparse and low-rank features in their layers and how these structures can be used to provide generalization guarantees for deep networks. Neural Collapse is one such low-rank structure that can provide this description. This project aims to characterize the conditions under which Neural Collapse emerges and how it can yield generalization guarantees.

Fine-Grained Control of Multimodal Models Through Neural Collapse
Neural Collapse (NC) is an emergent phenomenon of deep network training that describes low-rank structures arising in deep network layers. We will use the tools of NC analysis to improve the supervised fine-tuning step of training multimodal models. NC analysis can pinpoint the layers to be adapted, saving memory and time, and can accelerate supervised fine-tuning (SFT) by guiding training toward the desired classifier geometry. We will also explore how Neural Collapse geometry can be used in two contexts: 1) continual learning of new concepts and 2) unlearning of concepts that need to be deleted. Identifying simple structures in these models through NC geometry can help accelerate these tasks.
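For readers unfamiliar with Neural Collapse diagnostics, the sketch below computes a simplified scalar proxy for the standard NC1 indicator: the ratio of within-class to between-class variability of last-layer features, which approaches zero as features collapse to their class means. The function name and the simulated features are illustrative, not the project's code.

```python
import torch

def within_between_ratio(features, labels):
    """A simplified scalar proxy for the NC1 metric: total within-class
    variation divided by between-class variation of the class means.
    Values near zero indicate collapsed, low-rank last-layer features."""
    global_mean = features.mean(dim=0)
    within, between = 0.0, 0.0
    for c in labels.unique():
        feats_c = features[labels == c]
        mu_c = feats_c.mean(dim=0)
        within += ((feats_c - mu_c) ** 2).sum()
        between += len(feats_c) * ((mu_c - global_mean) ** 2).sum()
    return (within / between).item()

# Simulated penultimate-layer features: 600 samples, 128 dims, 10 classes.
feats = torch.randn(600, 128)
labels = torch.randint(0, 10, (600,))
print(within_between_ratio(feats, labels))
```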
Usman Roshan
Research Areas: Machine learning, medical informatics

Adversarially Robust Machine Learning With 0-1 Loss
Machine learning models today are highly accurate but not very robust: they can be fooled into misclassifying data with minor perturbations known as adversarial attacks. Adversarial examples crafted against one convex model are known to transfer to other convex models. We find this transferability to be much weaker between 0-1 loss and convex losses such as hinge and logistic, both of which approximate the 0-1 loss and are known to be affected by outliers. Consequently, it is harder to attack 0-1 loss models with black-box attacks that use convex substitute models, and when the black-box attacker itself uses 0-1 loss, the attack is highly ineffective against all models. Based on these observations, we are researching novel algorithms and implementations for scalable and faster 0-1 loss models.
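The contrast between 0-1 loss and its convex surrogates can be shown in a few lines. In this hypothetical sketch, a single far-away outlier adds exactly 1 to the 0-1 loss of a fixed linear classifier but inflates the hinge loss without bound:

```python
import numpy as np

def zero_one_loss(w, b, X, y):
    """Exact 0-1 loss of a linear classifier: count of misclassified points."""
    return np.sum(np.sign(X @ w + b) != y)

def hinge_loss(w, b, X, y):
    """Convex surrogate: grows without bound for badly misclassified points,
    so a single outlier can dominate, unlike the bounded 0-1 loss."""
    return np.sum(np.maximum(0.0, 1.0 - y * (X @ w + b)))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2)); y = np.sign(X[:, 0])
X_out = np.vstack([X, [[-50.0, 0.0]]]); y_out = np.append(y, 1.0)  # one outlier
w, b = np.array([1.0, 0.0]), 0.0
print(zero_one_loss(w, b, X, y), zero_one_loss(w, b, X_out, y_out))  # 0 vs 1
print(hinge_loss(w, b, X, y), hinge_loss(w, b, X_out, y_out))        # jumps by 51
```

This bounded sensitivity is one reason attacks crafted against convex substitute models transfer poorly to 0-1 loss models.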
Hai Phan
Research Areas: Social network analysis, machine learning, spatio-temporal data mining

Ontology-Based Interpretable Deep Learning
Machine learning models are trained with large amounts of data and achieve a certain level of competency in interpreting and classifying new input data. However, even when they work well, it can be difficult to explain why, and lingering doubt persists that in some situations the classification output might be wrong. In applications such as self-driving cars, this could have spectacularly negative consequences. We tie the model's predictions to a set of keywords taken from a predefined vocabulary of relevant terms. The number of words hard-coded into the model that can influence the outcome produced for a new input is reduced, and those words are taken from a limited, relevant ontology. This makes the output of the model easier to interpret, as it becomes independent of terms that are irrelevant to the application domain.
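A toy sketch of the idea, with a hypothetical ontology and hypothetical attribution scores: explanations are restricted to terms from the predefined vocabulary, so irrelevant words such as "shadow" or "cloud" cannot appear in the rationale.

```python
# Hypothetical ontology of terms relevant to a driving domain.
ONTOLOGY = {"pedestrian", "stop_sign", "lane", "vehicle", "traffic_light"}

def ontology_explanation(word_scores, k=3):
    """word_scores: token -> attribution score for one prediction.
    Keep only ontology terms, then return the top-k as the explanation."""
    relevant = {w: s for w, s in word_scores.items() if w in ONTOLOGY}
    return sorted(relevant, key=relevant.get, reverse=True)[:k]

scores = {"pedestrian": 0.9, "shadow": 0.7, "stop_sign": 0.6, "cloud": 0.4}
print(ontology_explanation(scores))  # ['pedestrian', 'stop_sign']
```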
Guiling “Grace” Wang
Research Areas: Applied deep learning, AI in finance, AI in transportation, LLMs

Deep Reinforcement Learning for Intersection Control
Reinforcement learning-based traffic signal controllers can adaptively adjust signals based on real-time demand. To learn a good policy at a single signalized intersection, we have proposed combining a dueling network, a target network and double Q-learning with prioritized experience replay. This combination has proven successful at stabilizing the learning process and mitigating the learning agent's over-estimation of action values. To further guarantee vehicle safety, we have incorporated domain safety standards into this RL-based traffic signal controller. The small proportion of collision data makes this problem extremely challenging, and a learning agent might not obey safety rules in practice. Instead of letting the RL model learn safety on its own, we therefore incorporate domain safety standards into different parts of the RL model (i.e., action, state, loss function and reward function) as a safety shield. This safety-enhanced approach guides and corrects the RL agent toward learning much safer actions and dramatically lowers the collision rate. Beyond a single intersection, we also treat multiple intersections as a network to create a smoother driving experience. By enabling cooperation among intersections, each RL agent learns to communicate with the others to spread out traffic pressure quickly. Extensive experiments show the effectiveness of our systems.
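Two of the stabilization ingredients mentioned above can be sketched compactly. This is a generic dueling network and double Q-learning target in PyTorch, with illustrative dimensions and names rather than the group's traffic controller: the online network selects the next action and a separate target network evaluates it, which curbs over-estimation.

```python
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    """Dueling architecture: separate state-value and advantage streams,
    recombined as Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    def __init__(self, state_dim, num_actions):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU())
        self.value = nn.Linear(64, 1)
        self.advantage = nn.Linear(64, num_actions)

    def forward(self, s):
        h = self.body(s)
        a = self.advantage(h)
        return self.value(h) + a - a.mean(dim=1, keepdim=True)

def double_q_target(online, target, rewards, next_states, dones, gamma=0.99):
    """Double Q-learning target: the online net picks the greedy next action,
    the target net evaluates it."""
    with torch.no_grad():
        a_star = online(next_states).argmax(dim=1, keepdim=True)
        q_next = target(next_states).gather(1, a_star).squeeze(1)
        return rewards + gamma * q_next * (1.0 - dones)

# Toy usage: a batch of 32 transitions, an 8-dim state, 4 signal phases.
online, target = DuelingQNet(8, 4), DuelingQNet(8, 4)
y = double_q_target(online, target, torch.zeros(32), torch.randn(32, 8), torch.zeros(32))
```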
Mengjia Xu
Research Areas: Machine learning, LLMs for dynamical systems (e.g., climate modeling), causal inference, AI for neuroscience

Fully Hyperbolic Graph Neural Networks for Brain Aging Trajectory Detection
Characterizing age-related alterations in brain networks is crucial for understanding aging trajectories and identifying deviations indicative of neurodegenerative disorders such as Alzheimer's disease. In this study, we developed a Fully Hyperbolic Neural Network (FHNN) to embed functional brain connectivity graphs derived from magnetoencephalography (MEG) data into low dimensions on a Lorentz model of hyperbolic space. Using this model, we computed hyperbolic embeddings of the MEG brain networks of 587 individuals from the Cambridge Centre for Ageing and Neuroscience (Cam-CAN) dataset. Notably, we leveraged a unique metric, the radius of the node embeddings, which effectively captures the hierarchical organization of the brain, to characterize subtle hierarchical organizational changes in various brain subnetworks attributable to aging. Our findings revealed that a considerable number of subnetworks exhibited a reduction in hierarchy during aging, with some showing gradual changes and others undergoing rapid transformations in the elderly. Moreover, we demonstrated that hyperbolic features outperform traditional graph-theoretic measures in capturing age-related information in brain networks. Overall, our study represents the first evaluation of hyperbolic embeddings of MEG brain networks for studying aging trajectories, shedding light on critical regions undergoing significant age-related alterations in the large Cam-CAN cohort.

Towards Efficient Edge-Aware Dynamic Graph Embedding with Mamba
Dynamic graph embedding is crucial for modeling time-evolving networks across various domains. While transformer-based models capture long-range dependencies in temporal graphs, they struggle with scalability due to quadratic complexity. Our study compares transformers with the Mamba architecture, a state space model with linear complexity. We developed three hybrid models: TransformerG2G with graph convolutional networks, MambaG2G, and MambaG2G enhanced with graph isomorphism network edge convolutions. Experiments show that Mamba-based models offer comparable or superior performance to transformers in link prediction tasks while being more computationally efficient, especially on longer sequences. MambaG2G consistently outperforms transformers on variable datasets such as UCI, Slashdot and Bitcoin, and remains competitive on stable graphs such as SBM. Additionally, Mamba-based models provide interpretable insights through analysis of attention weights and state matrices, advancing efficient temporal graph representation learning for applications such as climate modeling, finance and biological systems.
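To make the radius metric from the first project concrete, here is a minimal sketch under standard Lorentz-model conventions: lift Euclidean vectors onto the hyperboloid, then read off each node's hyperbolic distance from the origin. The random data stands in for MEG-derived embeddings.

```python
import torch

def lorentz_lift(v):
    """Lift Euclidean vectors v onto the Lorentz hyperboloid <x, x>_L = -1
    by setting the time-like coordinate x0 = sqrt(1 + ||v||^2)."""
    x0 = torch.sqrt(1.0 + (v ** 2).sum(dim=-1, keepdim=True))
    return torch.cat([x0, v], dim=-1)

def lorentz_radius(x):
    """Hyperbolic distance from the origin (1, 0, ..., 0): d(x, o) = arccosh(x0).
    Smaller radii correspond to nodes nearer the top of the hierarchy."""
    return torch.acosh(x[..., 0].clamp(min=1.0))

# Random stand-ins for per-region node embeddings (3 hyperbolic dimensions).
emb = lorentz_lift(torch.randn(90, 3))
print(lorentz_radius(emb)[:5])
```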
Lingxiao Wang
Research Areas: Privacy and security in machine learning, collaborative learning (distributed and federated learning), optimization for machine learning, deep generative models (e.g., diffusion models)

Towards the Next Generation of Federated Learning Systems for Foundation Models
Foundation models (FMs), such as ChatGPT and diffusion models, have driven significant breakthroughs across various fields, particularly in text and image generation. The success of these FMs depends on extensive training and large amounts of data. However, modern data generation, from individuals' personal devices, in smart homes and cities, and within hospitals or financial institutions, fundamentally changes machine learning pipelines from the classical scenario in which data is viewed as a sample from a single large underlying population. These new data modes result in heterogeneous, siloed data residing in the devices or organizations that generated it. This shift presents new challenges in the development and deployment of FMs: improving the accuracy and efficiency of training across siloed data; mitigating risk and protecting data privacy and ownership; and incorporating social and economic principles that incentivize data sharing and provide trustworthy cooperative learning schemes. Our research aims to develop the next generation of federated learning systems that enable coordinated, efficient, secure and trustworthy training of FMs across multiple parties and diverse data sources.
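As a baseline illustration of coordinated training without raw-data sharing, the sketch below implements classic federated averaging (FedAvg), weighting each client's model by its local dataset size. The toy linear clients are illustrative; the project targets far richer aggregation, privacy and incentive mechanisms.

```python
import torch.nn as nn

def fedavg(client_states, client_sizes):
    """Federated averaging: combine client model weights, weighted by local
    dataset size, so raw data never leaves the device or institution."""
    total = sum(client_sizes)
    return {
        key: sum(state[key] * (n / total) for state, n in zip(client_states, client_sizes))
        for key in client_states[0]
    }

# Toy usage: two clients sharing one linear layer, with 100 and 300 samples.
clients = [nn.Linear(4, 2) for _ in range(2)]
global_state = fedavg([c.state_dict() for c in clients], client_sizes=[100, 300])
server = nn.Linear(4, 2)
server.load_state_dict(global_state)  # broadcast the aggregated model
```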
Jason Wang
Research Areas: Data mining, machine learning, deep learning, explainable AI, generative AI, trustworthy AI, data science

Cyberinfrastructure-Enabled Interpretable Machine Learning
Cyberinfrastructure (CI)-enabled machine learning refers to new computing paradigms such as machine-learning-as-a-service (MLaaS), operational near-real-time forecasting systems and predictive intelligence with Binder-enabled, Zenodo-archived open-source machine learning tools, among others. These paradigms take advantage of advances in CI technologies, incorporating machine learning techniques into new CI platforms. In this project we focus on interpretable machine learning, where we attempt to explain how machine learning works, why it is powerful, what features are effective for a model and which part of a test object is crucial for a model's prediction. The methods, techniques and algorithms developed in this project will contribute to advancements in CI-enabled predictive analytics and explainable artificial intelligence in general.
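One model-agnostic way to ask "which features are effective" is permutation importance: shuffle one feature at a time and measure the drop in accuracy. A minimal sketch with scikit-learn on synthetic data, where the dataset and model choices are illustrative:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data with only 3 truly informative features out of 8.
X, y = make_classification(n_samples=500, n_features=8, n_informative=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# Shuffle each feature 10 times and average the accuracy drop.
result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
for i in np.argsort(result.importances_mean)[::-1][:3]:
    print(f"feature {i}: importance {result.importances_mean[i]:.3f}")
```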
Lijing Wang
Research Areas: Machine learning, deep learning, epidemic modeling/forecasting, clinical NLP, ML in social science

Exploring Spatial, Temporal and Semantic Patterns for Graph Neural Networks
The project focuses on the exploration and integration of spatial, temporal and semantic patterns within the framework of graph neural networks (GNNs). As GNNs have become a powerful tool for processing and learning from graph-structured data, there is growing interest in extending their capabilities to handle complex data that exhibits not only topological relationships but also dynamic temporal behaviors and rich semantic information. This project aims to advance the state of the art in GNNs by investigating methods that effectively capture and utilize these three dimensions, spatial, temporal and semantic, to enhance performance across applications such as social network analysis, traffic prediction and recommendation systems.

Scalable and Robust AI Models for Information Extraction
Extracting information from electronic health records (EHRs) has been an active area of research in recent years, driven by advances in natural language processing (NLP). Large language models (LLMs), such as Bidirectional Encoder Representations from Transformers (BERT), have achieved state-of-the-art performance on a variety of NLP tasks, either by pretraining a domain-specific model from scratch or by fine-tuning a general-domain pretrained model on a domain-specific dataset. Our research investigates advanced, scalable and robust LLM architectures for improved generalization, adaptability and efficiency in information extraction from EHRs and general text.
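A minimal sketch of BERT-based information extraction framed as token classification, using the Hugging Face transformers API. The clinical label set is hypothetical, and the classification head here is randomly initialized, so in practice it would be fine-tuned on annotated EHR text before its predictions mean anything.

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Hypothetical clinical entity tags; a real label set comes from the corpus.
labels = ["O", "B-MEDICATION", "I-MEDICATION", "B-DOSAGE", "I-DOSAGE"]
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(labels))

inputs = tok("Patient takes 20 mg of lisinopril daily.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits            # shape: (1, seq_len, num_labels)
preds = logits.argmax(dim=-1)[0]
print([labels[int(p)] for p in preds])         # per-token tag predictions
```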
Shuai Zhang
Research Areas: Parameter-efficient transfer learning, LLMs, weight analysis, LoRA, task vectors, machine unlearning, multi-task learning

Parameter-Efficient Transfer Learning
Large pretrained models have become foundational modules in modern machine learning systems. In the pretraining and fine-tuning paradigm, traditional full-parameter fine-tuning delivers superior performance but is computationally inefficient. Our goal is to systematically analyze the attributes of the weights in pretrained models, with theoretical guarantees, and to design a parameter-efficient transfer learning approach that adapts models to specific requirements. To meet the diverse dimensional requirements of pretrained models, we also aim to design a modular architecture whose components can be integrated orthogonally, enabling rapid adaptation without extensive fine-tuning.
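LoRA, one of the parameter-efficient techniques named above, can be sketched in a few lines: freeze the pretrained weight and learn only a low-rank update. The class below is a minimal illustrative implementation, not the project's method.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Low-Rank Adaptation sketch: freeze the pretrained weight W and learn
    only a rank-r update, W + (alpha / r) * B @ A, so trainable parameters
    number r * (d_in + d_out) instead of d_in * d_out."""
    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False              # keep pretrained weights frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # start at zero update
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(768, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 12288 trainable vs. 590592 frozen
```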
Zhi Wei
Research Areas: Machine learning, statistical modeling, bioinformatics

Explainable AI for Unsupervised Learning
In recent years, explanation techniques have emerged and received a lot of attention. In supervised learning settings, they emphasize the ability to correctly interpret a prediction model's output, and most existing explanation work, such as LIME and SHAP, has focused on that setting. Very little has been done for unsupervised learning. Our goal is to explain why and how a sample is assigned to a specific cluster in unsupervised learning. Specifically, we would like to know which features are evidence for a sample's cluster assignment and which are evidence against it. With this information, a practitioner can make an informed decision about whether to trust the model's cluster assignment. There is also a so-called "double use of data" problem when trying to find discriminative features that distinguish the resultant clusters. We will apply the new methods to analyze finance and accounting data.
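A simple baseline for the kind of cluster explanation described above: score each feature by how far the cluster's mean sits from the overall mean, in units of the feature's spread, so high-magnitude scores read as evidence for or against the assignment. A sketch on synthetic data, with illustrative names throughout:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, n_features=5, centers=3, random_state=0)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

def explain_cluster(X, labels, cluster, k=2):
    """Rank features by the standardized gap between the cluster mean and
    the overall mean; the sign indicates evidence for or against."""
    mask = labels == cluster
    z = (X[mask].mean(axis=0) - X.mean(axis=0)) / X.std(axis=0)
    top = np.argsort(np.abs(z))[::-1][:k]
    return [(int(i), float(z[i])) for i in top]

print(explain_cluster(X, km.labels_, cluster=0))  # top features with z-scores
```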