
Student Research Opportunities
You're never too young to do research!
The professors of Ying Wu College of Computing are always looking for students to help them with their research.
Don't worry about "not being ready" for research. We have a 9th grade high school student taking part in Social Media research. Don't even worry if you are not sure what "doing research" means. You will learn it -- by doing it.
What you need is working knowledge of one common programming language (for example, Java, C, C++, Python, JavaScript or R). Some students even learn programming on their own, without taking a class.
You need one more thing: a good work attitude. Be on time for meetings, reply to your professor's emails, tell your professor if you run into problems (including personal problems) or if you don't understand your assignments, do what your professor assigns you, and make a little progress every week.
We professors do research because it is fun. Hopefully, you will also feel the fun and excitement of doing something new that nobody has ever done before.
Even more important, you can put your research on your resume. You can talk about it at job interviews and impress your interviewer, and it will prepare you for jobs where you are expected to do research. That widens the range of possible jobs you can take.
Projects can be done for payment (if funding is available) or for academic credit.
Contact the supervisor for more details and the timeline.
Academic credit can be obtained by enrolling in a research course for one of the following projects:
Computing the Global Minimum of a Continuous Function by Domain Subdivision
Building High Performance Cloud Infrastructures
Applications of AI in Digital Games and Creative Computing
Improving Smartphone Security and Reliability
Visualizing Complex Medical Terminologies
Dimensionality and Scalability Issues in High Dimensional Spaces
Analysis, Exploration and Visualization of Big Data for Traffic Congestion and Traveling Behavior Prediction
DeepPrivate - Differential Privacy Preservation in Deep Learning under Model Attacks
High-Assurance Open-Source Software for Cybersecurity and Cryptography
Blockchain Applications
NetExplorer
Deep Learning for DNA Sequence Pattern Recognition
Quantitative Stock Selection and Trading
Content Moderation and Online Harassment
Understanding Livestreaming and eSports
Write-and-Learn: Promoting Meaningful Learning Through Concept Map-Based Formative Feedback On Writing Assignments
Simulating Spatial Meme Diffusion
Building a Framework of Software and Hardware for Real-time Predictive Analytics on Social Networks
Interactive Cross-Reality (XR) Platform
Science and Engineering in Program Repair
Mining Big Data through Deep Learning
AI Deep Learning in Medical Image Analysis and Other Applications
Leveraging the power of algorithm design to combat discrimination in ridesharing
Scalable Graph Learning Algorithms
Comparing Information Sharing Behaviors Across Platforms
Identifying Crisis-Related Information Needs and Priority in Social Media
Mapping Political Ideology Across Online Spaces
Data Collection and Search Across Alternative Online Social Spaces
Improving Software Quality and Reliability using AI, Program Analysis, and Big Data
High Performance Algorithms for Interactive Data Science at Scale
Supervisors: James Calvin and Craig Gotsman
Email: james.m.calvin@njit.edu
Description: Many applications in computer science and engineering require the solution of an optimization problem, namely the minimization of some cost function, so that the solution to the problem is the best possible. Examples include:
In constructing artificial neural networks, it is desirable to choose a set of weights to minimize training error.
In image registration, the goal is to align images (for example medical images taken of the same body area at different times) while minimizing a misalignment cost.
In data clustering, the goal is to partition data points into groups that are similar.
The goal is to find the global minimum of the cost function. Unfortunately, most cost functions have multiple local minima, and standard optimization algorithms are capable of finding only a local minimum, which may be considerably worse than the global minimum. This project addresses the important problem of finding a global minimum of a continuous cost function by a suitable software algorithm.
The approach we propose to compute a global minimum is based on "searching" for the global minimum in the function domain by adaptively subdividing the domain, narrowing down the region of the domain where the global minimum is to be found. Thus, the subdivision becomes finer where the cost function is smallest. One scheme subdivides the domain into triangles; a different scheme could be based on rectangles and recursive partitioning.
The cost of refining the subdivisions can grow rapidly with the dimension of the do- main. The purpose of this project is to develop efficient data structures and algorithms for subdivision refinement, and explore their use in optimization algorithms. The project will also investigate the application of the algorithms to different computing problems, possibly the ones listed above. The project will involve sophisticated software development in an object-oriented language.
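To give a concrete flavor of the idea, here is a minimal Python sketch of global minimization by adaptive rectangle subdivision. It illustrates the general mechanism only and is not the project's algorithm: boxes sit in a priority queue ranked simply by the cost sampled at their center, and the most promising box is repeatedly split along its longest axis.

    import heapq
    import math

    def subdivide_minimize(f, lo, hi, max_evals=2000):
        """Approximate the global minimum of f on the box [lo, hi]."""
        dim = len(lo)
        center = [(a + b) / 2 for a, b in zip(lo, hi)]
        best_x, best_val = center, f(center)
        heap = [(best_val, lo, hi)]          # (center value, box corners)
        evals = 1
        while heap and evals < max_evals:
            _, lo, hi = heapq.heappop(heap)  # most promising box so far
            axis = max(range(dim), key=lambda i: hi[i] - lo[i])
            mid = (lo[axis] + hi[axis]) / 2
            for a, b in ((lo[axis], mid), (mid, hi[axis])):  # two halves
                nlo, nhi = list(lo), list(hi)
                nlo[axis], nhi[axis] = a, b
                c = [(x + y) / 2 for x, y in zip(nlo, nhi)]
                v = f(c)
                evals += 1
                if v < best_val:
                    best_x, best_val = c, v
                heapq.heappush(heap, (v, nlo, nhi))
        return best_x, best_val

    # A multimodal test function: many local minima, one global minimum.
    g = lambda x: math.sin(3 * x[0]) + (x[0] - 1) ** 2 + (x[1] - 1) ** 2
    print(subdivide_minimize(g, [-5.0, -5.0], [5.0, 5.0]))

A real implementation must also confront the issue raised above: the number of boxes explodes with the dimension, which is exactly where careful data structures come in.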
Prerequisites: Experience in programming, basic knowledge in data structures and algorithms at the level of CS435.
Supervisor: Xiaoning Ding
Email: xiaoning.ding@njit.edu
Homepage: http://web.njit.edu/~dingxn/
Description: This project improves virtualization to allow programs in virtual machines to make efficient use of hardware resources, such as multicore processors and memory devices. The main approach is to create expressive interfaces to hardware resources, with which programs in virtual machines can obtain better knowledge of, and gain more control over, the hardware resources to optimize performance. The three main tasks in the project are 1) creating expressive interfaces, 2) developing programs that can optimize performance utilizing the expressive interfaces, and 3) testing the performance of the programs.
Prerequisites: For students interested in task 1, skills in modifying and building the Linux kernel are required. Students interested in task 2 must be able to write multi-threaded C/C++ programs on Linux systems. Students interested in task 3 are required to write scripts on Linux systems.
Supervisor: Amy Hoover
Email: amy.k.hoover@njit.edu
Homepage: http://amykhoover.com/
Description: I work on AI systems that, together with humans, collaboratively make games, music, sound, or art. My research acknowledges that humans and computers excel in different areas of the creative process, and it draws on these unique talents to build systems that harness the power of each. My work develops methods for designing games and facilitating human-computer collaborations that focus on solving problems in digital and creative domains. I am looking for motivated students interested in these domains. Programming competence in C#, Java, or Python is important.
Supervisor: Iulian Neamtiu
Email: iulian.neamtiu@njit.edu
Homepage: https://web.njit.edu/~ineamtiu/
Description: Users are increasingly relying on smartphones, so concerns such as mobile app security, privacy, and correctness have become increasingly pressing. Prof. Neamtiu's group is working on filling this gap through tools that permit a wide range of software analyses for the Android smartphone platform, e.g., static analysis, dynamic analysis, record-and-replay, or network traffic profiling. Our tools aim to analyze substantial, widely popular apps (e.g., Yelp, Facebook) running directly on smartphones, without requiring access to the app's source code. Our results include finding bugs in popular apps, high-fidelity record-and-replay, exposing risky URLs, self-healing apps, etc.
Prerequisites: Experience with Android or iOS development AND strong programming skills.
Supervisors: Yehoshua Perl, James Geller and Christopher Ochs
Email: james.geller@njit.edu
Homepages: http://cs.njit.edu/faculty/perl; http://web.njit.edu/~geller/
Description: Biomedical ontologies are large and complex knowledge representation systems. We have developed a software system called the Ontology Abstraction Framework (OAF) to create, visualize, and explore summaries of ontologies called abstraction networks. In our research we use abstraction networks to support comprehension of ontology structure, ontology quality assurance, and ontology change analysis. The OAF is composed of several modules, each of which enables the summarization of a different aspect of an ontology's structure. In a current project we are extending the OAF to support "Live Abstraction Networks," which summarize an ontology as a user is editing it.
Prerequisites: Students interested in working on the OAF project should have experience designing and developing software projects in Java. Experience with Swing, threads, JSON APIs, and Lambda are recommended. A strong background in CS theory, with a thorough understanding of trees and graphs, is required. Experience with Java IDEs, debuggers, profilers, and Git are a plus.
Supervisor: Vincent Oria
Email: vincent.oria@njit.edu
Homepage: https://web.njit.edu/~oria/index.htm
Description: For many fundamental operations in the areas of search and retrieval, data mining, machine learning, multimedia, recommendation systems, and bioinformatics, the efficiency and effectiveness of implementations depends crucially on the interplay between measures of data similarity and the features by which data objects are represented. When the number of features (the data dimensionality) is high, the discriminative ability of similarity measures diminishes to the point where methods that depend on them lose their effectiveness. We are investigating effective feature selection techniques, and their application to search and clustering.
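The following small experiment (numpy assumed; purely illustrative, not the project's methods) shows the phenomenon described above: as the dimensionality d grows, the nearest and farthest neighbors of a query point become nearly indistinguishable, so distance-based similarity loses its discriminative ability.

    import numpy as np

    rng = np.random.default_rng(0)
    for d in (2, 10, 100, 1000):
        points = rng.random((10000, d))     # uniform random data
        query = rng.random(d)
        dists = np.linalg.norm(points - query, axis=1)
        contrast = (dists.max() - dists.min()) / dists.min()
        print(f"d={d:5d}  relative contrast={contrast:.3f}")
    # The relative contrast shrinks toward 0 as d grows.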
Supervisor: Hai Phan
Email: hai.phan@njit.edu
Description: The long-term effects of traffic congestion may cost the U.S. government and American taxpayers hundreds of billions of dollars annually. Emissions of gases from billions of gallons of fuel lost in gridlock cause global warming and environmental degradation. Long commutes are associated with lower fitness levels, higher weight, and higher blood pressure, all of which are strong predictors of heart disease, diabetes, and different types of cancer. To slow or even reverse the trend of growing gridlock, accurately predicting traffic congestion and traveling behavior is desirable. It leads to more effective investment decisions for transportation improvements, which affect safety, environmental quality, economic development, quality of life, and lower health risks. This project aims at developing innovative solutions using cutting-edge technologies such as the Internet of Things (IoT) and deep learning to analyze, explore, and visualize big data for traffic congestion and traveling behavior prediction. This project takes an integrated approach to (1) modeling and representing dynamic traffic flow over time; (2) contextually predicting traffic congestion; (3) modeling and forecasting traffic influence networks (TINs), in which traffic conditions in one location may affect traffic conditions in other locations; (4) predicting traveling behavior, including changes in routes and departing time, in both the long and short terms; and (5) querying and visualizing large-scale urban transportation data.
Prerequisites: Strong Python skills and a background in data mining and machine learning. Students will have opportunities to work with cutting-edge technologies such as big data and deep learning in urban data science. Students are expected to implement and deploy practical tools to predict and visualize traffic congestion and human traveling behavior.
Supervisor: Hai Phan
Email: hai.phan@njit.edu
Description: Today, the remarkable development of deep learning in the medicine and healthcare domains presents obvious privacy issues when deep neural networks are built based on patients' personal and highly sensitive data, e.g., clinical records, user profiles, biomedical images, etc. To convince individuals to allow their data to be included in deep learning projects, principled and rigorous privacy guarantees must be provided. However, no deep learning techniques have yet been developed that incorporate privacy protection against model attacks, in which adversaries use released deep learning models to infer sensitive information from the data. In clinical trials, such lack of protection may put patient data at high risk and expose health care providers to legal action under HIPAA/HITECH law. This project will develop a mechanism, called "DeepPrivate," for privacy preservation in deep learning under model attacks. The mechanism will offer strong privacy protections for data used in deep learning. To put the DeepPrivate framework to work, fundamental challenges in differential privacy preservation in deep learning under model attacks need to be synergistically overcome. Consequently, this project will advance the state of the art on key questions: (1) the framework design to preserve differential privacy in various types of deep neural networks; (2) the utility maximization of the models; (3) the potential model attacks on deep neural networks under differential privacy; (4) the information disclosure prevention approaches; and (5) the multiparty computation protocols in deep learning under model attacks.
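DeepPrivate itself is the research goal; as background, a standard way to obtain differential privacy during training is per-example gradient clipping plus Gaussian noise (the DP-SGD recipe). The numpy sketch below shows one such private update step; the clipping norm and noise multiplier are illustrative values, and this is not the DeepPrivate mechanism.

    import numpy as np

    def dp_gradient_step(per_example_grads, clip_norm=1.0, sigma=1.1, lr=0.1,
                         rng=np.random.default_rng(0)):
        # 1. Clip each per-example gradient to L2 norm clip_norm.
        clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
                   for g in per_example_grads]
        # 2. Average, then add Gaussian noise calibrated to the clip norm.
        mean = np.mean(clipped, axis=0)
        noise = rng.normal(0.0, sigma * clip_norm / len(clipped), size=mean.shape)
        return -lr * (mean + noise)          # the (private) parameter update

    batch = [np.random.default_rng(i).normal(size=10) for i in range(32)]
    print(dp_gradient_step(batch)[:3])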
Prerequisites: Strong Python skills and a background in data mining and machine learning. Students will have opportunities to work with cutting-edge technologies in deep learning and in security and privacy for data science. Students are expected to implement and deploy practical tools for security and privacy in deep learning.
Supervisor: Kurt Rohloff
Email: kurt.rohloff@njit.edu
Homepage: https://web.njit.edu/~rohloff/
Description: Researchers in the NJIT cybersecurity research center have written and published one of the most advanced open-source libraries for post-quantum cryptography and homomorphic encryption. This library, called PALISADE, is increasingly being used by researchers and designers outside of NJIT. We have been taking pains to design the library using software engineering industry best practices. As the library and its user base grow, we seek to improve software quality through software testing using the Google C++ unit testing libraries. We are seeking motivated students with C++ experience who can help us implement unit tests using the Google C++ unit testing library for our open-source cryptography library.
Prerequisites: Experience with C++, such as from CS280.
Supervisor: Qiang Tang
Email: qiang.tang@njit.edu
Homepage: https://web.njit.edu/~qiang/
Description: Blockchain is currently one of the most potentially disruptive technologies: it is re-shaping the Internet infrastructure and re-building trust in open networks. We would like to re-consider many traditional settings and rebuild them on top of decentralized infrastructures.
1a. Blockchain-based anonymous e-commerce platform: a decentralized marketplace enabling digital commerce without relying on a central party like Amazon or eBay. Transactions would be anonymous, yet participants can be held accountable if a dispute arises.
1b. Blockchain-based financial applications: exploring decentralized and autonomous financial services, such as prediction markets or new payment systems.
1c. Blockchain-based IoT systems: studying how to tailor and re-design blockchain protocols for lightweight and/or fast-response IoT applications.
Prerequisites: Programming proficiency, and ability to learn new technology quickly. Some knowledge about cryptography and bitcoin / blockchain technology is a plus.
Supervisor: Jason Wang
Email: jason.t.wang@njit.edu
Homepage: http://web.njit.edu/~wangj
Description: The goal of NetExplorer is to develop a suite of algorithms, tools, and web servers for inferring biological, social, and transport networks using graph mining algorithms. Specifically, we design, develop, and implement new software for (1) reconstructing networks using a data cleaning approach; (2) inferring networks using deep learning; (3) predicting missing links in integrated, heterogeneous networks; and (4) reverse engineering networks using Big Data technologies such as Apache Spark and Hadoop in the cloud.
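As a taste of task (3), the snippet below scores candidate missing links by counting common neighbors, a classic link-prediction baseline. It assumes the networkx package and uses a built-in toy graph as a stand-in; real heterogeneous networks need richer scoring and the deep learning models mentioned above.

    import networkx as nx

    G = nx.karate_club_graph()               # stand-in for a real network
    scores = [(u, v, len(list(nx.common_neighbors(G, u, v))))
              for u, v in nx.non_edges(G)]   # score every absent edge
    scores.sort(key=lambda t: t[2], reverse=True)
    print("top predicted missing links:", scores[:5])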
Prerequisites: The student is expected to collect and clean data. Depending on the student's background and expertise, he or she will either implement new algorithms or use existing tools to mine the data for network inference. Knowledge of data science languages and frameworks such as Python, Java, R, Matlab, Hadoop, or Spark is recommended, but not required.
Supervisor: Zhi Wei
Email: zhi.wei@njit.edu
Homepage: http://web.njit.edu/~zhiwei
Description: In this project, we will apply deep neural networks to recognize interesting regulatory patterns in DNA sequences. For sequence data, classical machine learning methods cannot operate on the sequence directly, and thus need pre-defined features for model training. Features can be extracted from the sequence based on prior knowledge. As a result, the success of these conventional methods relies heavily on human-engineered features. Deep neural networks circumvent the manual extraction of features by learning them from data. In addition, deep neural networks can capture nonlinear dependencies and interaction effects in the sequence, and they span a wider sequence context at multiple genomic scales. In the past few years, attesting to their utility, deep neural networks have been successfully applied to quite a few applications of DNA sequence mining. In this project, we will investigate how deep neural networks work using simulated and real data. We will design several test cases to find out how key deep learning parameters affect prediction performance.
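To make the idea concrete, here is a minimal sketch (TensorFlow/Keras assumed; the 200-base window, layer sizes, and binary label are placeholders, not the project's architecture): DNA is one-hot encoded, and a 1-D convolutional network learns motif-like features directly from the sequence, with no hand-engineered features.

    import numpy as np
    import tensorflow as tf

    BASES = {"A": 0, "C": 1, "G": 2, "T": 3}

    def one_hot(seq):
        x = np.zeros((len(seq), 4), dtype=np.float32)
        for i, base in enumerate(seq):
            x[i, BASES[base]] = 1.0
        return x

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(200, 4)),                    # 200-base window
        tf.keras.layers.Conv1D(32, 8, activation="relu"),  # motif detectors
        tf.keras.layers.GlobalMaxPooling1D(),
        tf.keras.layers.Dense(1, activation="sigmoid"),    # pattern present?
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")
    x = one_hot("ACGT" * 50)[None, ...]                    # batch of one sequence
    print(model.predict(x).shape)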
Prerequisites: Python or Java programming; knowledge of machine learning.
Supervisor: Zhi Wei
Email: zhi.wei@njit.edu
Homepage: http://web.njit.edu/~zhiwei
Description: In this project, we will develop and use quantitative approaches to select and trade stocks. We will identify a candidate list of potential factors that may assist in predicting stock returns (valuation, growth, sentiment information from ER, Twitter, etc.). The techniques we may use include machine learning, data mining, natural language processing, and time series data analytics. We will collect and clean historical price data, financial report data, and social media data. We will develop, implement, and back-test quantitative trading strategies over historical data.
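A toy version of the back-testing step is sketched below (pandas assumed; prices is a hypothetical DataFrame of monthly closes with dates as rows and tickers as columns). It ranks stocks by three-month momentum, holds the top half, and reports the cumulative growth of one dollar; a real strategy would add transaction costs, risk controls, and many more factors.

    import pandas as pd

    def backtest_momentum(prices: pd.DataFrame) -> pd.Series:
        rets = prices.pct_change()                   # monthly returns
        momentum = prices.pct_change(3).shift(1)     # signal known before trading
        top = momentum.rank(axis=1, pct=True) > 0.5  # hold the top half
        strat = (rets * top).sum(axis=1) / top.sum(axis=1)
        return (1 + strat.fillna(0)).cumprod()       # growth of $1 invested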
Prerequisites: Python or R programming; some quantitative coursework (algorithms, statistics, etc.). Finance knowledge is a plus but not required.
Supervisor: Yvette Wohn
Email: wohn@njit.edu
Homepage: socialinteractionlab.com
Description: This project involves using qualitative and quantitative methods to understand the work of volunteer content moderators and designing interventions to deal with online harassment.
Prerequisites: Survey and/or statistical data analysis experience, conducting interviews, analyzing qualitative data.
Supervisor: Yvette Wohn
Email: wohn@njit.edu
Homepage: socialinteractionlab.com
Description: Livestreaming and esports are relatively new cultural trends: recreational activities that depend on high-end technology. Current projects aim at understanding more about people's behaviors in these environments, with the goal of developing better systems. Topics include 1) understanding virtual currencies and economies, 2) relationships between streamers and viewers, and 3) cultural practices in esports. Based on your expertise and/or interest, you will conduct qualitative or quantitative research in a collaborative environment with researchers from other universities.
Prerequisites: Strong interest in topic matter, voracious reader, good at interacting with people.
Supervisor: Brook (Yi-Fang) Wu
Email: yi-fang.wu@njit.edu
Homepage: http://web.njit.edu/~wu
Description: The primary goal of meaningful learning is to deliver course content in innovative ways that allow students to learn and then apply. As a pedagogical strategy, Writing-to-Learn (WTL) activities use writing to improve students' understanding of course content. We are developing an enhanced "Write-and-Learn" framework that generates automated formative feedback by comparing the concept maps constructed from instructors' lecture notes with those constructed from individual students' writing assignments, with the aim of improving students' meaningful learning of conceptual knowledge in WTL activities. Our work aims to (1) evaluate how effective the automated formative feedback is on the acquisition and development of conceptual knowledge, and (2) explore how such formative feedback can be utilized to scaffold and promote meaningful learning. We are looking for students to participate in the design, development, maintenance, and evaluation of the research prototype, as well as the design and execution of the research studies in all facets of the project.
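A highly simplified sketch of the feedback idea follows (the keyword matching is a placeholder for the project's NLP pipeline, and the concept list is invented): represent the lecture notes and a student's essay as sets of co-occurring concept pairs, then suggest connections the essay has not yet made.

    from itertools import combinations

    CONCEPTS = {"photosynthesis", "chlorophyll", "glucose", "sunlight"}

    def concept_map(text):
        edges = set()
        for sentence in text.lower().split("."):
            found = sorted(c for c in CONCEPTS if c in sentence)
            edges.update(combinations(found, 2))     # co-occurrence = edge
        return edges

    def formative_feedback(lecture_notes, student_essay):
        missing = concept_map(lecture_notes) - concept_map(student_essay)
        return [f"Consider relating '{a}' to '{b}'." for a, b in missing]

    notes = "Sunlight drives photosynthesis. Photosynthesis produces glucose."
    essay = "Photosynthesis produces glucose."
    print(formative_feedback(notes, essay))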
Prerequisites: Preferences will be given to students who (1) have solid skills and proven experience in building front-end web applications using HTML, CSS, and JavaScript; (2) have knowledge of working with server-side web application languages and frameworks (preference given to experience with Python and Django); (3) have some experience with relational database engines (e.g., MySQL, PostgreSQL, SQL Server); and (4) have a good understanding of general system administration and web security best practices.
Supervisor: Xinyue Ye and Shaohua Wang
Email: xinyue.ye@njit.edu, shaohua.wang@njit.edu
Description: The proliferation of online data and mapping technologies has greatly increased access to and utility of spatial decision support (SDS) systems in a wide range of application domains. Nevertheless, researchers working within academic, government, industry, and not-for-profit organizations recognize a number of challenges to improving their utility, including methods to share and synthesize digital resource objects (data, models, and workflows) and techniques to facilitate broader participation. This project will advance research critical to the development of open knowledge networks (OKN) through the combination and testing of participatory and automated ontology development processes. Three domain-specific case studies (wildland fire, water quality, and biodiversity conservation) will build on participatory Geographic Information System (GIS) and ontology development work through engagement of problem-focused stakeholder networks. At the same time, the utility of automated tools for resource discovery, ontology development, and social network analysis will be tested in these real-world problem environments. Through integration and comparison of these techniques, the project team will deliver insights into efficient and effective methods for OKN development.
Supervisor: Andrew Sohn
Email: andrew.sohn@njit.edu
Description: Social networks are graphs that change constantly to reflect the current state of the mindsets and behaviors of users, organizations, institutions, and even countries. Capturing major changes is critical to the success of social networks, as analytics would allow them to predict what may come next and therefore be prepared with possible actions. Capturing major changes in real time, however, is a challenge. This project builds a framework comprising both hardware and software to address the problem of capturing critical changes in social networks. In particular, an initial cluster of 16 machines is currently being built to compute changes in real time, while spectral graph partitioning software is being designed and implemented for analytics on the initial cluster. The outcome of the project is a prototype framework demonstrating that dynamic spectral graph partitioning on a cluster of machines can enable real-time predictive analytics on large-scale social networks. The principal investigator's qualifications for this project include contributions to the NASA Ames Research Center and Lawrence Berkeley National Laboratory on large-scale graph partitioning for computational science under a NASA University Joint Venture faculty fellowship. For these efforts, the PI was recognized as a NASA Education Research Pioneer.
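For readers new to the technique: spectral partitioning splits a graph by the sign of the Fiedler vector, the eigenvector of the graph Laplacian belonging to the second-smallest eigenvalue. The dense numpy sketch below shows the kernel on a toy graph; the project's challenge is doing this sparsely, in parallel, and as the graph changes.

    import numpy as np

    def spectral_bisect(adj):
        """adj: dense symmetric 0/1 adjacency matrix of a connected graph."""
        deg = adj.sum(axis=1)
        lap = np.diag(deg) - adj                 # graph Laplacian L = D - A
        vals, vecs = np.linalg.eigh(lap)         # eigenpairs, ascending order
        fiedler = vecs[:, 1]                     # second-smallest eigenvalue
        return fiedler >= 0                      # two partition labels

    A = np.array([[0, 1, 1, 0, 0, 0],
                  [1, 0, 1, 0, 0, 0],
                  [1, 1, 0, 1, 0, 0],
                  [0, 0, 1, 0, 1, 1],
                  [0, 0, 0, 1, 0, 1],
                  [0, 0, 0, 1, 1, 0]], dtype=float)
    print(spectral_bisect(A))                    # two triangles, split apart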
Supervisor: Margarita Vinnikov
Email: margarita.vinnikov@njit.edu
Description: Students will have an opportunity to be part of developing an interactive Cross-Reality (XR) platform for serious games, industrial partners, and experimentation. Students will develop virtual- and augmented-reality and cross-modal multi-sensory user interfaces. They will devise new techniques for improving navigation in XR (walking, driving, and/or flying) and new approaches to head and body tracking for XR interaction. They will help develop new gaze-contingent displays for the new generation of head-mounted displays. Finally, they will assist in data collection and data analysis for the XR systems that we have already developed in our lab.
Prerequisites: Students interested in this work are encouraged to develop Unity 3D skills. Any other software that they would like to learn and experiment with in order to create VR/AR/XR applications would also be supported. Alternatively, students who are interested in data collection and analysis will be required to develop skills in MATLAB, RStudio, SPSS, or Excel macros.
Supervisor: Ali Mili
Email: mili@njit.edu
Description: For the past ten years, researchers in software engineering have been working on developing automated tools for program repair. We are interested in evolving theoretical foundations for this discipline, and in analyzing their impact on the state of the art and the state of the practice in program repair.
Supervisor: Jason Wang
Email: jason.t.wang@njit.edu
Description: We are designing and implementing new deep learning algorithms for mining big data. We have developed a 3D-atrous convolutional neural network, used it as a deep visual feature extractor, and stacked convolutional long short-term memory networks on top of the feature extractor. This allows us to capture not only deep spatial information but also long-term temporal information in the data. In addition, we use stacked denoising autoencoders to learn latent representations of the data, to construct feature vectors suitable for classification. Currently, we are building a semi-supervised deep learning framework with generative adversarial networks (GANs) for event prediction. Such a framework is suited for big data that have few, incomplete, imperfect, missing, noisy or uncertain training data.
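For orientation, here is a minimal Keras sketch of the kind of architecture described above: a dilated (atrous) 3-D convolution as a spatial feature extractor, followed by a convolutional LSTM for temporal structure. TensorFlow is assumed, and the input shape and layer sizes are placeholders, not the group's actual network.

    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(8, 32, 32, 1)),    # (time, height, width, channels)
        tf.keras.layers.Conv3D(16, 3, dilation_rate=2, padding="same",
                               activation="relu"),          # atrous spatial features
        tf.keras.layers.ConvLSTM2D(16, 3, padding="same"),  # temporal memory
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(1, activation="sigmoid"),     # event / no event
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")
    model.summary()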
Prerequisites: Experience with Python, TensorFlow, Keras or PyTorch.
Supervisor: Frank Shih
Email: shih@njit.edu
Homepage: http://web.njit.edu/~shih
Description: Artificial intelligence (AI) applied to medical image analysis for the early detection, diagnosis, and treatment of diseases has become extremely important in human healthcare. Conventionally, meaningful features were primarily designed by human experts based on their knowledge of target domains. Deep learning has removed that obstacle by absorbing the feature engineering step into the learning step: instead of extracting features in a hand-designed manner, it requires only a set of data with minor preprocessing and then discovers the informative representations in a self-taught manner. My research group utilizes traditional AI techniques as well as recently developed deep learning approaches for medical image analysis, ecological data classification, and other applications.
Prerequisites: Students are expected to have an AI and image processing background. They can investigate existing algorithms or develop their own architectures to run on big data. Knowledge of data science languages such as Python, Java, C, or Matlab is recommended, but not required.
Supervisor: Pan Xu
Email: pxu@njit.edu
Description: Ridesharing companies like Uber and Lyft have grown rapidly over the last decade. In ridesharing, riders arrive in an online manner; once a rider arrives, the system assigns the rider to a nearby driver instantly, and the driver then decides whether or not to accept the assignment. Additionally, both drivers and riders can cancel the assignment after the driver's confirmation. Though drivers are made (technically) oblivious to riders' sensitive information such as destination, gender, race, name, and photo before confirmation, they have devised strategies to get around this. This can lead to a wide range of gender- and race-based discrimination from drivers toward riders in ridesharing. In particular, Ge et al. from the University of Washington reported that "In the Boston experiment, black Uber riders were much more likely than white riders to have a driver cancel on them after confirming, and the effect is especially pronounced for black men, whose cancellation rate was three times as high as white males." The race-based discrimination becomes even worse when it comes to taxi services. How do we design algorithms to combat discrimination in ridesharing? How much profit does the system potentially lose if required to maintain a high level of fairness? This project aims to solve these issues.
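The online model is easy to simulate. The toy sketch below (all numbers synthetic) matches each arriving rider greedily to the nearest free driver and tracks acceptance rates for two rider groups under a biased acceptance probability; the research question is how to change the matching rule so the group rates stay close without giving up too much profit.

    import random

    random.seed(7)
    drivers = [random.random() for _ in range(50)]   # driver positions on [0, 1]
    free = set(range(len(drivers)))
    arrived = {"A": 0, "B": 0}
    accepted = {"A": 0, "B": 0}

    for _ in range(40):                              # riders arrive online
        group = random.choice("AB")
        pos = random.random()
        arrived[group] += 1
        d = min(free, key=lambda i: abs(drivers[i] - pos))  # nearest free driver
        # Synthetic driver bias: group B assignments are cancelled more often.
        if random.random() < (0.9 if group == "A" else 0.6):
            accepted[group] += 1
            free.remove(d)

    for g in "AB":
        print(g, "acceptance rate:", round(accepted[g] / arrived[g], 2))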
Prerequisites: Strong Python, C++, or Java skills and a background in algorithm design and analysis. Students are expected to process large, openly available datasets and implement state-of-the-art algorithms deployed in current ridesharing platforms.
Supervisor: David Bader
Email: bader@njit.edu
Homepage: https://cs.njit.edu/faculty/bader
Description: Deep learning has boosted the machine learning field at large and created significant increases in the performance of tasks including speech recognition, image classification, object detection, and recommendation. It has opened the door to complex tasks, such as self-driving and super-human image recognition. However, the important techniques used in deep learning, e.g., convolutional neural networks, are designed for Euclidean data types and do not directly apply to graphs. This problem can be addressed by embedding graphs into a lower-dimensional Euclidean space, generating a regular structure. There is also prior work on applying convolutions directly on graphs and using sampling to choose neighbor elements. Systems that use this technique are called graph convolutional networks (GCNs). GCNs have proven to be successful at graph learning tasks like link prediction and graph classification. Recent work has pushed the scale of GCNs to billions of edges, but significant work remains to extend learned graph systems beyond recommendation systems with specific structure and to support big data models such as streaming graphs. This project will focus on developing scalable graph learning algorithms and implementations that open the door for learned graph models on massive graphs. We plan to approach this problem in two ways. First, we will develop a scalable, high performance graph learning system based on existing GCN algorithms, like GraphSage, by improving the workflow on shared-memory NUMA machines, balancing computation between threads, optimizing data movement, and improving memory locality. Second, we will investigate graph learning algorithm-specific decompositions and develop new strategies for graph learning that can inherently scale well while maintaining high accuracy. This includes traditional partitioning; more generally, we consider breaking the problem into smaller pieces which, when solved, will result in a solution to the bigger problem. We will explore decomposition results from graph theory, for example, forbidden graphs and the Embedding Lemma, and determine how to apply such results to the field of graph learning. We will investigate whether these decompositions could assist in a dynamic graph setting.
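As background, one GraphSAGE-style layer is simple to state: each node's new embedding combines its own features with the mean of its neighbors' features. The numpy sketch below shows a single forward pass with random weights; neighbor sampling, training, and scaling this to billions of edges are the project's actual subject.

    import numpy as np

    def sage_layer(features, neighbors, W_self, W_neigh):
        """features: (n, d) array; neighbors: list of neighbor-index lists."""
        out = []
        for v, nbrs in enumerate(neighbors):
            agg = (features[nbrs].mean(axis=0) if nbrs
                   else np.zeros_like(features[v]))      # mean aggregation
            out.append(features[v] @ W_self + agg @ W_neigh)
        h = np.maximum(0.0, np.array(out))               # ReLU
        return h / np.linalg.norm(h, axis=1, keepdims=True)  # L2 normalize

    rng = np.random.default_rng(1)
    X = rng.random((4, 8))                               # 4 nodes, 8 features
    nbrs = [[1, 2], [0], [0, 3], [2]]                    # adjacency lists
    print(sage_layer(X, nbrs, rng.random((8, 16)), rng.random((8, 16))).shape)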
Prerequisites: Strong interest in solving real-world data science problems, and programming in C or C++
Supervisor: Cody Buntain
Email: cbuntain@njit.edu
Homepage: http://inf.eco
Description: A lot of attention has been paid to coordinated, manipulative information campaigns across online social platforms, with examples like the Russian IRA and its use of Twitter and Facebook, or Iranian and Venezuelan accounts pushing an artificial agenda on Twitter. Building tools to detect these manipulation efforts is possible, but what happens when domestic actors use these same tools? Are such campaigns illegal, or are they standard marketing tools? While such foreign influence campaigns are obviously different in intent, we do not know whether they are significantly different technologically or behaviorally from domestic political or marketing campaigns. In this project, based on your expertise and/or interest, you will develop data analytics tools to compare the behaviors of coordinated action and foreign influence campaigns on online social platforms. Example research includes quantifying similarities in text or images posted by politicians', marketers', and foreign influencers' accounts across platforms and media types.
Prerequisites: Experience in machine learning and/or data science tools
Supervisor: Cody Buntain
Email: cbuntain@njit.edu
Homepage: http://inf.eco
Description: Social media, blogs, and other online information sources contain large volumes of data, especially during and in the aftermath of crises. During these times of stress, individuals often turn to social media for social support, but those directly affected by these events also rely on these platforms to share critical information and request assistance. Correctly identifying such critical information, across media types/platforms, and doing so rapidly represent unsolved problems in the crisis informatics field. In this project, based on your expertise and/or interest, you will develop machine learning models for classifying social media data according to a pre-specified set of information types and priorities. These models may process text, images, links, or some subset of these media types during classification/prioritization. The system you develop may then participate in the annual Text Retrieval Conference's Incident Streams track.
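As a starting point, this classification task can be prototyped in a few lines with scikit-learn; the two posts and the label names below are invented placeholders, not the track's real label set.

    from sklearn.pipeline import make_pipeline
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    posts = ["Road to the hospital is flooded, we need a boat",
             "Thoughts and prayers for everyone affected"]
    labels = ["Request-Rescue", "Sentiment-Support"]     # invented label set

    clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
    clf.fit(posts, labels)                               # toy training set
    print(clf.predict(["rescue boats needed near the hospital"]))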
Prerequisites: Experience in machine learning and/or data science tools
Supervisor: Cody Buntain
Email: cbuntain@njit.edu
Homepage: http://inf.eco
Description: People increasingly rely on online social spaces (e.g., Facebook, Twitter, Reddit, or YouTube) for political information. Simultaneously, some claim these spaces are being used to radicalize individuals, spread misinformation, and increase polarization. While existing research has explored methods to quantify ideology and extremism on Twitter, users' ideologies on platforms like YouTube and Reddit are understudied. Given the role such platforms (especially YouTube) play in the online information ecosystem, critical questions are left open about the political lean of these platforms and how they have evolved over time. Based on your expertise and/or interest, you will explore methods for transferring models of political ideology across platform boundaries and media types, and for quantifying temporal changes in ideological distribution, in a collaborative environment with researchers from other universities.
Prerequisites: A background in programming, data science, and (preferably) statistical analysis
Supervisor: Cody Buntain
Email: cbuntain@njit.edu
Homepage: http://inf.eco
Description: As part of a larger effort to study how information moves across platforms and in the larger information ecosystem, we are developing a cross-platform search infrastructure to support researchers. While we already have data collection pipelines for Twitter and datasets for Reddit, we are looking to develop a pipeline to collect and search data from political forums on 4chan and 8chan and other alternative online spaces as well. While several scraping tools exist to harvest images posted to boards on 4chan, we are specifically interested in the textual content posted to politically relevant boards such as 4chan’s /pol/ and creating an archive of that content for comparison against Twitter, Reddit, and others. Based on your expertise and/or interest, you will develop collection pipelines and/or search infrastructure to support interactively querying this data. You will work in a collaborative environment with researchers from other universities and disciplines to understand their needs.
NB: Data posted in alternative online platforms often can be extremely offensive and upsetting. Every effort will be made to support the emotional and mental well-being of students involved in this project. If you are uncomfortable with this sort of content but are still interested in the project, we can work together to minimize exposure.
Prerequisites: Familiarity with web programming, JSON, REST APIs, the Linux command line, and potentially ElasticSearch.
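A first step of such a pipeline might look like the sketch below, which pulls a board catalog as JSON and keeps only the textual fields. The endpoint URL reflects 4chan's public read-only API as we understand it, but treat it as an assumption to verify against the current documentation (and mind the rate limits).

    import requests

    CATALOG_URL = "https://a.4cdn.org/pol/catalog.json"  # assumed endpoint

    def fetch_thread_text():
        pages = requests.get(CATALOG_URL, timeout=30).json()
        for page in pages:                      # catalog is a list of pages
            for thread in page.get("threads", []):
                # Keep only textual fields for the comparison archive.
                yield {"no": thread.get("no"),
                       "sub": thread.get("sub"),
                       "com": thread.get("com")}

    for record in list(fetch_thread_text())[:3]:
        print(record)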
Supervisor: Shaohua Wang
Email: shaohua.wang@njit.edu
Description: Improving software quality and reliability is a never-ending demand. Several approaches have been introduced to help developers detect and fix software defects, ranging from static approaches (e.g., program analysis, bug detection, bug prediction, model checking, validation and verification, software mining, etc.) to dynamic approaches (e.g., testing, debugging, fault localization, etc.). Our current research combines cutting-edge AI-based models and program analysis techniques to improve the detection of source code defects and vulnerabilities and to fix them automatically. Our recent studies have shown that our AI-based detection and auto-fix approaches can significantly outperform the state-of-the-art techniques on the market. In this project, students will work on topics including (but not limited to): (1) program analysis (e.g., code modeling and transformation); (2) complex networks for modeling software systems; (3) AI (especially deep neural networks); and (4) software engineering/analytics (e.g., data mining and big data in SE, bug tracking systems).
Prerequisites: Knowing Python is a must, and other programming skills are highly appreciated. Software development and AI experience is a plus.
Supervisor: David Bader
Email: bader@njit.edu
Description: A real-world challenge in data science is to develop interactive methods for quickly analyzing new and novel data sets that are potentially of massive scale. We seek students who wish to design and implement fundamental algorithms for high performance computing (HPC) solutions that enable interactive large-scale data analysis of massive data sets. Based on the widely used data types and structures of strings, sets, matrices, and graphs, our methodology will produce efficient and scalable software that will drastically improve the performance of a wide range of real-world queries or directly realize frequent queries. These innovations will allow us to move massive-scale data exploration from time-consuming batch processing to interactive analyses that give a data analyst the ability to comprehensively, deeply, and efficiently explore the insights and science in real-world data sets. Scalability to high performance computing systems and optimal algorithm selection based on multiple criteria, such as time and space complexity, will be incorporated into our methodology to address the requirements of exploring large-scale data sets. Specifically, we are focusing on three important data structures for data analytics: 1) suffix array construction, 2) treap construction, and 3) distributed-memory join algorithms, useful for analyzing large-scale strings, implementing random search in large string data sets, and generating new relations, respectively. These fundamental algorithms serve as the cornerstone supporting interactive data science at scale. To evaluate and show the effectiveness of the proposed algorithms, they will be implemented in and contributed to Arkouda, an open-source NumPy-like software framework that aims to provide productive data discovery tools on massive, dozens-of-terabytes data sets by bringing together the productivity of Python with world-class high performance computing. Together, this work will allow Python-trained programmers to readily use HPC resources, lowering the barrier and making data scientists productive at massive scales.
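Of the three building blocks, the suffix array is the easiest to state: it is the lexicographically sorted order of all suffixes of a string. The naive Python version below is only for understanding the object; the project is about constructing it in parallel at HPC scale (and, ultimately, inside Arkouda).

    def suffix_array(s: str):
        # Naive O(n^2 log n) construction: sort suffix start positions
        # by the suffixes themselves. Fine for intuition, not for scale.
        return sorted(range(len(s)), key=lambda i: s[i:])

    sa = suffix_array("banana")
    print(sa)                                   # [5, 3, 1, 0, 4, 2]
    for i in sa:
        print("banana"[i:])                     # suffixes in sorted order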
Prerequisites: Strong interest in solving real-world data science problems, and programming in C or C++.