Data Mining | Ying Wu College of Computing

Dimitri Theodoratos
Associate Professor
dimitri.theodoratos@njit.edu

Research Areas: Data mining, pattern extraction

Mining and Summarizing Patterns from Large Trees and Graphs

Extracting frequent patterns hidden in trees and graphs is critical for analyzing data and a first step for downstream data mining. Most pattern-mining algorithms do not scale to big data applications. We have designed algorithms to extract patterns from large trees and graphs, leveraging results using compressed bitmap views.
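As a rough illustration of why bitmaps help with support counting, the sketch below stores, for each labeled edge, an integer bitmap of the trees that contain it, and counts co-occurrence with a bitwise AND. This is a simplified stand-in and not the algorithm described above: the names `build_bitmaps` and `support` are hypothetical, the bitmaps are uncompressed Python integers, and co-occurrence of edges in the same tree is a weaker condition than containing a connected pattern.

```python
from collections import defaultdict


def build_bitmaps(trees):
    """Map each (parent_label, child_label) edge to a bitmap of the tree ids containing it."""
    bitmaps = defaultdict(int)
    for tree_id, edges in enumerate(trees):
        for edge in edges:
            bitmaps[edge] |= 1 << tree_id  # set the bit for this tree
    return dict(bitmaps)


def support(bitmaps, edges, n_trees):
    """Count the trees that contain every edge in `edges` (bitwise AND, then popcount)."""
    result = (1 << n_trees) - 1  # start with all trees as candidates
    for edge in edges:
        result &= bitmaps.get(edge, 0)
    return bin(result).count("1")


# Toy data (made up for illustration): each tree is a set of labeled edges.
trees = [
    {("a", "b"), ("a", "c")},
    {("a", "b"), ("b", "d")},
    {("a", "b"), ("a", "c"), ("c", "d")},
]
bitmaps = build_bitmaps(trees)
pair_support = support(bitmaps, [("a", "b"), ("a", "c")], len(trees))
```

Here `pair_support` counts the trees in which both edges appear. A real miner would additionally verify that the edges form the intended connected pattern and would typically use a compressed bitmap representation.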

Searching Structured and Semi-Structured Data with Keyword Queries

Disambiguating a user’s intention in posing a keyword query and efficiently retrieving relevant results are major challenges for keyword search over big data. We have devised an approach that exploits a structural summary of the data to extract pattern graphs for keyword queries. This empowers non-expert users to extract information from data sources and databases without mastering a query language and without any knowledge of how those sources are organized or structured.

Jason Wang
Professor
jason.t.wang@njit.edu

Research Areas: Data mining, machine learning, deep learning, data science

Mining Big Data Through Deep Learning

We are designing and implementing new deep learning algorithms and architectures for mining big data. We have developed a 3D atrous convolutional neural network, used it as a deep visual feature extractor, and stacked convolutional long short-term memory networks on top of it. This allows us to capture not only deep spatial information but also long-term temporal information in the data. In addition, we use stacked denoising autoencoders to learn latent representations of the data, from which we construct feature vectors suitable for classification. We also develop new recurrent neural networks to mine time-series data for stock market forecasting and space weather prediction. Currently, we are building a deep learning framework with generative adversarial networks. This framework will be used for stochastic video prediction, image synthesis and image-to-image translation. The framework can handle model uncertainty as well as data uncertainty and sparsity. Our deep learning models are suited to big data applications whose training data are scarce, incomplete, imperfect, missing, noisy or uncertain.

Hua Wei
Professor
hua.wei@njit.edu

Research Areas: Data mining, reinforcement learning, urban computing

Learning Realistic Simulations from Real-World Data

This project aims to build a realistic traffic simulator by investigating data mining algorithms and providing solutions for mimicking real-world behavior using real-world data. It focuses on learning to simulate the movements of humans, including human travelers and vehicles with human drivers, which leads to a more realistic traffic simulator. A simulator of human movement is a starting point for building a city simulator. City simulators can be valuable tools for quantifying and optimizing city policies, such as traffic signal control strategies. A city simulator can use multi-source urban data and advanced learning techniques to find better policies that make the city more sustainable.

Brook Wu
Associate Professor
yi-fang.wu@njit.edu

Research Areas: Text mining, information extraction, information retrieval

Early Detection of Fake News on Social Media

A major challenge in effective, early detection of fake news is making full use of the limited data observed at the early stage of news propagation. We propose a novel deep neural network that detects fake news early by combining user-based and post-based features into status-sensitive crowd responses. Experimental results show that our proposed model can detect fake news with greater than 90% accuracy within five minutes after it starts to spread and before it is retweeted 50 times. Most importantly, our approach requires only 10% labeled fake news samples to achieve this effectiveness under positive-unlabeled (PU) learning settings. We plan to extend this work by incorporating additional social context data extracted from user interactions to further enhance user representations and prediction accuracy.
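To make the PU learning setting concrete, the following minimal sketch builds a positive-unlabeled training set from fully labeled toy data: a fraction of the fake (positive) items keep their label, and everything else, fake or true, is treated as unlabeled. The function name `make_pu_labels`, the reading of "10%" as a fraction of the fake items, and the toy data are assumptions made for illustration; this shows only the data setup, not the detection model described above.

```python
import random


def make_pu_labels(true_labels, labeled_fraction=0.10, seed=0):
    """Return PU labels: 1 = labeled positive, 0 = unlabeled (positive or negative)."""
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(true_labels) if y == 1]
    if not positives:
        return [0] * len(true_labels)
    # Keep the label for only a fraction of the positives (at least one).
    n_labeled = max(1, round(labeled_fraction * len(positives)))
    labeled = set(rng.sample(positives, n_labeled))
    return [1 if i in labeled else 0 for i in range(len(true_labels))]


# Toy data (made up): 20 fake items (1) followed by 80 true items (0).
true_labels = [1] * 20 + [0] * 80
pu_labels = make_pu_labels(true_labels)
n_labeled_positives = sum(pu_labels)
```

A PU learner is then trained on `pu_labels` alone, so the unlabeled pool mixes the remaining fake items with all of the true ones.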

Neural Fake News Detection

Motivated by the inevitability of ‘neural’ fake news, we are working on building a framework to generate indistinguishable neural fake news stories. Our ultimate goal is to use them to augment fake news training data to accurately detect neural fake news stories. In our design, the framework will have three components:

1. Synthetic News Generation, using a short claim as an input to neural language models. We are trying to resolve apparent contradictions and inconsistencies in the synthetic news generated by other approaches.
2. Deceptive Fake News Generation, using fact tampering attacks on the generated news or on the claim.
3. Neural Fake News Detection. Finally, we plan to use the generated fake news as training data, along with other publicly available true news, to train a neural fake news detection model.