Cybersecurity, Data Mining, Machine Learning, and Health Intelligence.
Current Research Projects
- Combat COVID-19: AI-driven Techniques for Community-level Risk Assessment and Community Resilience Improvement
- Intelligent Malware Detection and Adversarial Machine Learning
- Securing Cyberspace: Gaining Deep Insights into the Online Underground Ecosystem
- Mining Large Scale and Dynamic Heterogeneous Networks to Combat Opioid Crisis and Reduce Opioid Overdose Risks
It is believed that the novel virus that causes COVID-19 emerged from an animal source, but it is now spreading rapidly from person to person through various forms of contact. According to the Centers for Disease Control and Prevention (CDC), the coronavirus seems to be spreading easily and sustainably in the community (i.e., community transmission), which means that people have been infected with the virus in an area, including some who are not sure how or where they became infected. Before a vaccine or drug becomes widely available, community mitigation, a set of actions that persons and communities can take to help slow the spread of respiratory virus infections, is the most readily available intervention to help slow transmission of the virus in communities. A growing number of areas reporting local sub-national community transmission would represent a significant turn for the worse in the battle against the novel coronavirus, which points to an urgent need for expanded surveillance so that we can better understand the spread of COVID-19 and respond with actionable strategies for community mitigation. Even when practicing community mitigation, people still need groceries, medical supplies, etc., which requires travel and visits to local establishments. In doing so, we all have the opportunity to choose where we go and which establishments we visit to meet our daily needs. To assist with making informed decisions, as an initial offering, we have proposed and developed a system (named alpha-Satellite) to provide users with up-to-date community-level risk assessment in the United States. The system advances capabilities of artificial intelligence (AI) to estimate risk indices associated with a given area by leveraging large-scale, real-time data obtained from multiple sources, including disease-related data from official public health organizations and digital media, demographic and mobility data, and social media data.
The system automatically analyzes and combines the available data to give users actionable information, by local area, for assessing the potential risk of traveling to a specific area. After we launched the system to the public for beta testing on April 20, it attracted 42,546 users in the first week. This large number of users indicates high public demand for effective computational tools that assist people with actionable strategies. The system has received a lot of good feedback from the media and users on its ease of use as well as the utility of the relative risk estimation. The developed system, paper, and generated benchmark datasets have been made publicly accessible through our website. We are continuing our efforts to expand the data collection and enhance the system to help combat the fast-evolving COVID-19 pandemic.
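As a rough illustration of how signals from several sources could be combined into a single area-level risk index, the sketch below uses a weighted average. It is not the alpha-Satellite model: the `AreaSignals` fields, the `WEIGHTS` values, and the function names are all hypothetical, and each input is assumed to be scaled to [0, 1] beforehand.

```python
from dataclasses import dataclass


@dataclass
class AreaSignals:
    """Hypothetical per-area inputs, each assumed to be pre-scaled to [0, 1]."""
    case_rate: float              # disease-related data
    mobility: float               # mobility data
    population_density: float     # demographic data
    social_media_concern: float   # social media data


# Illustrative weights only; they are assumptions, not values from the system.
WEIGHTS = {
    "case_rate": 0.4,
    "mobility": 0.2,
    "population_density": 0.2,
    "social_media_concern": 0.2,
}


def risk_index(signals):
    """Return the weighted average of the signals, clamped to [0, 1]."""
    total = sum(weight * getattr(signals, name) for name, weight in WEIGHTS.items())
    return min(1.0, max(0.0, total))


def rank_areas(areas):
    """Sort a {area_name: AreaSignals} mapping from highest to lowest risk."""
    scored = [(name, risk_index(signals)) for name, signals in areas.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

A relative ranking like the one `rank_areas` returns is what lets a user compare nearby areas; the absolute values depend entirely on the assumed weights.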
The COVID-19 pandemic has also exposed a critical set of vulnerabilities that have impacted community resilience in responding to escalating societal, economic, and behavioral issues. Unfortunately, there are no established solutions or proven models that we can depend on to tackle these complex challenges, which involve significant uncertainties and unknowns. To help address the devastating effects caused by COVID-19, we will advance AI innovations and extend our efforts toward the development of an AI-driven paradigm for collective and collaborative community resilience in response to a variety of crises and exposed vulnerabilities in the COVID-19 era and beyond.
NSF IIS/IIS-RAPID Awards (PI)
Google Maps Platform
Real-Time Risk Assessment Tool Could Aid Reopening Measures (MetroLab Innovation of the Month, May 2020)
Malware (short for malicious software) is a generic term for all kinds of unwanted software (e.g., viruses, trojans, worms, bots, ransomware, and cryptojacking). Cybercriminals have used it as a major weapon to launch a wide range of attacks that cause serious damage and significant financial losses to many Internet users. To protect legitimate users from these attacks, the most significant line of defense against malware is anti-malware software products, which traditionally used signature-based methods to recognize threats. However, driven by considerable economic benefits, malware attackers are using automated malware development toolkits to quickly write and modify malicious code that can evade detection by anti-malware products. To remain effective, the anti-malware industry calls for much more powerful methods that can protect users against new threats and are more difficult to evade. To combat evolving malware attacks, systems applying machine learning techniques have been successfully deployed and offer unparalleled flexibility in automatic malware detection. In these systems, various kinds of classifiers are constructed on top of different feature representations to detect malware. Unfortunately, as classifiers become more widely deployed, the incentive for defeating them increases. Through long-term and strong collaboration with our industrial partners, this project will design and develop intelligent and resilient solutions against malware attacks at both the feature and model levels. Furthermore, the proposed techniques will be designed to be arms-race capable and can be used in other cybersecurity domains, such as anti-spam and fraud detection.
The importance of cybersecurity can hardly be overstated, especially during the global pandemic we are facing. As many social activities have moved online, society's overwhelming reliance on complex cyberspace makes its security more important than ever. Unfortunately, exploiting both fear and financial incentives, cyber threat actors across the spectrum of sophistication are using COVID-19 or the coronavirus as a lure to spread malware and profit from the pandemic. To better protect users in cyberspace, we continue our efforts to develop innovative links between AI and security, designing and developing an intelligent framework for COVID-19-themed malware detection to help mitigate its negative effects on public health, society, and the economy.
NSF SaTC-RAPID/TWC/SaTC/CICI Awards (PI)
Cybercrime has become more and more dependent on the online underground ecosystem, which has evolved into a complex and increasingly decentralized system that has an incentive to prevent infiltration. This forces cybersecurity researchers and industry practitioners to reconsider fire-fighting behavior. Building on our prior work and strong collaborations with industry partners, we aim to design and develop an integrated framework (algorithms, scalable techniques) for in-depth investigation of the online underground ecosystem, and thus to help secure cyberspace by producing data-driven interventions against cybercrime. We have developed our own web crawlers to collect data from underground markets emerging in various online mediums (e.g., underground forums, the dark web). By July 2018, we had crawled data from four underground forums (e.g., Blackhat, Hack Forums, and Nulled), including 508,876 threads with 8,232,550 posts corresponding to 725,449 users; we had also successfully collected data from the dark web (e.g., Dream Market), including crimeware and crimeware-as-a-service (CaaS) products. We have also manually annotated 62,512 threads posted by 5,312 users in Hack Forums as the ground truth for automatic detection of cybercrime-suspected threads.
NSF CAREER Award (PI)
As opioid overdose deaths have continued to increase over the past decade across the country, it is critical to understand the drugs involved in those deaths and the potential role of polypharmacy (i.e., the concurrent use of multiple medications) in opioid overdose deaths. However, due to the formidable complexity of drug-drug interactions (DDIs) arising from polypharmacy, it is challenging, if not impossible, to enumerate them all manually. Therefore, there is an urgent need to develop novel computational methodologies and models for early detection of risky DDI patterns when opioids are combined with other drugs (e.g., sedatives, muscle relaxants, anti-anxiety medications). Since relying on a single data source for biomedical knowledge discovery often results in unsatisfactory performance, the goal of this project is to design and develop a novel and integrated framework (algorithms, models, and techniques) that constructs a heterogeneous network from multiple data sources and extracts useful information from the constructed network to reduce the risk of opioid overdoses resulting from polypharmacy. In addition, based on the large-scale data generated from social media and the darknet, we aim to advance capabilities of artificial intelligence (AI) to further combat the opioid epidemic and online trafficking. This work is supported by the National Science Foundation and the National Institute of Justice.
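The idea of building one heterogeneous network from several data sources and then querying it for opioid co-use patterns can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's framework: the node types (`"report"`, `"drug"`), the `"mentions"` relation, the `OPIOIDS` set, and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical set of drug names treated as opioids in this sketch.
OPIOIDS = {"oxycodone", "fentanyl"}


class HeteroNetwork:
    """Tiny typed graph: nodes are (node_type, name) pairs, edges carry a relation."""

    def __init__(self):
        self.edges = defaultdict(set)  # node -> {(relation, neighbor_node)}

    def add_edge(self, src, relation, dst):
        # Store the edge in both directions so it can be traversed either way.
        self.edges[src].add((relation, dst))
        self.edges[dst].add((relation, src))

    def neighbors(self, node, relation):
        return {dst for rel, dst in self.edges.get(node, ()) if rel == relation}


def load_source(net, records):
    """Merge one data source, given as (report_id, drug_name) pairs, into the network."""
    for report_id, drug in records:
        net.add_edge(("report", report_id), "mentions", ("drug", drug))


def opioid_co_use_counts(net):
    """Count how many reports mention each (opioid, other drug) pair together."""
    counts = defaultdict(int)
    reports = [node for node in net.edges if node[0] == "report"]
    for report in reports:
        drugs = {name for _, name in net.neighbors(report, "mentions")}
        for opioid in drugs & OPIOIDS:
            for other in drugs - OPIOIDS:
                counts[(opioid, other)] += 1
    return dict(counts)
```

Calling `load_source` once per data source merges them into the same graph, so a pair that is rare in any single source can still accumulate evidence across sources.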
DoJ/NIJ Award (PI)
NSF IIS Award (PI)
Former Research Projects
- Phishing Fraud Detection
- Smart Devices for Children's Safety
Phishing is a form of online fraud in which perpetrators use social engineering schemes, sending emails, instant messages, or online advertising to lure users to phishing websites that impersonate trustworthy websites and trick individuals into revealing sensitive information (e.g., financial accounts, passwords, and personal identification numbers), which can then be used for profit. To defend against phishing websites, security software products generally use blacklisting to filter out known phishing websites. However, there is always a delay between website reporting and blacklist updating. Indeed, as the lifetimes of phishing websites have shrunk from days to hours, this method may be ineffective. In our study, drawing on webpage content and its related information, we propose a principled cluster ensemble framework to integrate different clustering solutions for phishing fraud detection.
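One common way to integrate several clustering solutions is a co-association (evidence accumulation) consensus: two pages end up in the same final cluster if enough of the base clusterings agree on them. The sketch below illustrates that general technique only; it is not necessarily the framework proposed in the study, and the function names and the `threshold` parameter are assumptions.

```python
def co_association(labelings):
    """Count, for every pair of items, how many base clusterings group them together."""
    n = len(labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return counts


def consensus_clusters(labelings, threshold=0.5):
    """Merge items whose agreement fraction exceeds `threshold` (single-link via union-find)."""
    counts = co_association(labelings)
    n = len(counts)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if counts[i][j] / len(labelings) > threshold:
                parent[find(i)] = find(j)

    # Relabel the union-find roots as consecutive cluster ids in order of appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]
```

For example, with three base clusterings of four pages, `consensus_clusters([[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 0, 1]])` groups the first two pages together and the last two together, because each of those pairs agrees in at least two of the three solutions.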
In recent years, crimes against children and cases of missing children have increased at a high rate. Therefore, there is an urgent need for safety support systems that help prevent crimes against children and keep them from getting lost, especially when parents are not with their children, such as when children are on their way to and from school. In this project, in collaboration with our industrial partner, we use the location histories reported by the smart devices the children wear to explore the children's life patterns, which capture their general lifestyles and regularities, and apply big data analytic techniques to learn the children's safe regions and safe routes. When the children are in potential danger, their parents or guardians receive automatic notifications. We also explore an energy-efficient positioning scheme for the smart devices that maintains location-tracking accuracy while keeping energy overhead low.
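A very simple way to picture learning safe regions from location histories is to divide the map into grid cells and treat frequently visited cells as safe. The sketch below is only an illustration under that assumption and is not the project's method; `CELL_SIZE_DEG`, `min_visits`, and the function names are hypothetical.

```python
import math
from collections import Counter

# Assumed grid resolution in degrees (roughly 100 m of latitude); illustrative only.
CELL_SIZE_DEG = 0.001


def to_cell(lat, lon, cell=CELL_SIZE_DEG):
    """Map a coordinate to the integer index of the grid cell that contains it."""
    return (math.floor(lat / cell), math.floor(lon / cell))


def learn_safe_cells(history, min_visits=3):
    """Treat every cell visited at least `min_visits` times in the history as safe."""
    visits = Counter(to_cell(lat, lon) for lat, lon in history)
    return {cell for cell, n in visits.items() if n >= min_visits}


def should_notify(location, safe_cells):
    """Return True when the current location falls outside all learned safe cells."""
    return to_cell(*location) not in safe_cells
```

A real system would also need to account for time of day, routes between regions, and positioning error, none of which this sketch models.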