SciTech

Cybersecurity tool keeps users safe

Credit: Matthew Guo

There are some websites that you probably know you should not click on. They might have a weird domain name, ads that tell you about hot singles in your area, or other seedy telltale signs of internet malfeasance. The website's real purpose, however, is to infect your computer with some sort of nasty malware, or to obtain your personal information.

Unfortunately, users often visit these websites anyway. To combat exposure to risky domains, one common strategy is to blacklist sites that have been flagged as known malware or phishing URLs. Services such as Google Safe Browsing maintain a list of dangerous sites and warn users when they try to access them.

A study titled "Predicting Impending Exposure to Malicious Content from User Behavior," from Carnegie Mellon's CyLab and Japanese telecommunications company KDDI, improves on this model by building a system that attempts to predict when users are about to access risky sites, allowing more time for possible intervention.

Ph.D. candidate and co-author Mahmood Sharif told The Tartan that "traditional defenses like anti-viruses and blacklists" are often reactive rather than proactive, which can put the user at risk because "by the time they react, users are often about to execute malicious programs or visit risky websites." Furthermore, current systems operate as the "last line of defense," with very few protections after a user chooses to bypass a warning.

The researchers, including Sharif, Carnegie Mellon Professor Nicolas Christin, and Jumpei Urakawa, Ayumu Kubota, and Akira Yamada from KDDI Research, used browsing data from over 20,000 KDDI mobile phone users who opted to help build and train their model.

They found that 11 percent of users were, in fact, exposed to sites in the Google Safe Browsing database at some point, and that exposed users differed meaningfully from users who were not, both in their behavior during browsing sessions and in their answers to behavioral questions on a survey.

Users who were exposed to malicious sites tended to browse longer and later, requesting more web pages than unexposed users. Users who frequented advertising or adult websites were more likely to be exposed.

Somewhat counterintuitively, the study found that the strongest effect in the survey responses was whether users reported having anti-virus software installed, perhaps because it makes users more confident about proceeding to view pages flagged as risky. Users who stated that they tended to proceed past browser warnings were also more likely to be exposed, which suggests that other measures may be needed to deter risky behavior.

The system the researchers built, according to Sharif, "leverages cues from users' browsing behavior that are indicative of future exposure," like the categories of websites they are visiting and the rate of content downloading and uploading.
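The study's actual model is a neural network trained on KDDI's browsing logs, which has not been published in detail here. As a rough illustration of the idea, the sketch below combines a few hypothetical session features (names and weights invented, loosely mirroring the cues the article mentions) into a single risk score:

```python
import math

def session_risk_score(features):
    """Combine behavioral cues into a risk score in (0, 1).

    Illustrative sketch only: the real system is a trained neural
    network. These feature names and weights are hypothetical.
    """
    weights = {
        "pages_requested": 0.02,      # longer sessions correlate with exposure
        "late_night": 0.5,            # browsing later at night
        "adult_or_ad_visits": 0.8,    # visits to adult or advertising sites
        "download_rate_kbps": 0.001,  # rate of content downloading
    }
    raw = sum(weights[k] * features.get(k, 0.0) for k in weights)
    # Squash the weighted sum with a logistic function; the offset of 2.0
    # is an arbitrary threshold-shifting constant for this sketch.
    return 1.0 / (1.0 + math.exp(-(raw - 2.0)))

def predict_exposure(features, threshold=0.5):
    """Flag a session as likely to lead to exposure."""
    return session_risk_score(features) >= threshold
```

A heavy late-night session with many page requests would score high, while a short daytime session would score low; a real model would learn such weights from labeled browsing data rather than hard-coding them.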

The study uses this system to try to predict whether a browsing session will result in exposure, which it did with around a 90 percent true positive rate and a one percent false positive rate. This low false positive rate was only achieved once the researchers expanded their definition of true positives to include websites marked as risky within the next year. "Too many false positives can disrupt users' normal (and safe) browsing, and may lead the system to be unusable," said Sharif.
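For readers unfamiliar with these metrics, the true positive rate is the share of genuinely risky sessions the system catches, while the false positive rate is the share of safe sessions it wrongly flags. A minimal sketch, using made-up counts chosen to match the rough figures reported:

```python
def rates(true_pos, false_neg, false_pos, true_neg):
    """Compute (true positive rate, false positive rate) from counts."""
    tpr = true_pos / (true_pos + false_neg)   # risky sessions caught
    fpr = false_pos / (false_pos + true_neg)  # safe sessions wrongly flagged
    return tpr, fpr

# Hypothetical counts: 100 risky sessions, 1000 safe sessions.
tpr, fpr = rates(true_pos=90, false_neg=10, false_pos=10, true_neg=990)
# tpr == 0.9 (90 percent), fpr == 0.01 (1 percent)
```

The trade-off Sharif describes is visible here: lowering the detection threshold to catch the 10 missed risky sessions would typically also raise the count of wrongly flagged safe ones.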

The code used to train the neural networks has been released, but the training data has not, due to privacy concerns.

The authors of the paper hope that systems like these will give users more time to think about the risk of accessing dangerous websites or allow networks where a compromised computer could be catastrophic to terminate a browsing session before exposure. This study pokes at the squishy vulnerability that all our personal devices suffer from — risky behavior from the human operating it.