Classifying Actionable Cyber Threat Intelligence

By: Lori Cole, Threat Intelligence Consultant at Recorded Future

TLDR: A decision tree bagging ensemble model was the most accurate in classifying actionable cyber threat intelligence. Why? My guess: bagging builds a forest of decision tree models each using a different slice of the training set; the individual results are averaged to produce a consensus result. And, I will now have model minions to triage my threat intel.

Introduction

Cyber threat intelligence focuses on providing actionable information on adversaries, exploits, and vulnerabilities. This information is becoming crucial to cyber defense, as malware and threat actors evolve, evade detection, and obfuscate analysis. The need for threat intelligence has resulted in investment in, and creation of, many new sources of information; however, this has created challenges of its own, as analysts are overwhelmed with information and burdened with triaging threat intelligence in order to make time-sensitive defensive decisions. Assessing threat intelligence noise for credibility and accuracy compounds the problem. Cyber threat intelligence analysts curate sources of information from their respective professional sector, technical communities, and threat exchange platforms, but are still faced with scoring the information for relevance, importance, and potential impact to inform decision-makers on defensive strategy (The Cyber Threat).

Implementing a threat intelligence analysis process is an essential component of active, informed security. Actionable intelligence is key to understanding threats for defensive mitigation (CSO). These intelligence needs can be satisfied by leveraging classification mechanisms which inform binary decisions regarding aggregated threat intelligence (e.g., ‘actionable’ or ‘irrelevant’). Some organizations try to incorporate threat data feeds into their Security Information and Event Management System (SIEM), but do not classify, prioritize, or categorize the data, adding to the burden of analysts who may not have the analytic tools (or time) to decide what to action and what to discard (Recorded Future). Cyber threat intelligence models can address this issue. A viable solution would use machine learning to prioritize intelligence data to ensure resultant threat intelligence is actionable (timely, relevant, credible). This research explores the feasibility of decision models which will inform cybersecurity analysts of relevant threats affecting their environment. By considering a multivariate applicability score via an ensemble of models, security analysts can enhance the quality of intelligence ingested into the SIEM and prune threat intelligence feeds to mitigate noise and false positives. A classified threat intelligence feed enables security analysts to leverage massive community-enriched data for manageable integration into a SIEM and, more importantly, to inform defensive decisions to adequately protect an organization against discovered threats. These models will improve upon threat feeds to systemically optimize threat alerting via a SIEM. Various models will be built, assessed, and compared for optimal selection of viable models for trial implementation.

The dataset identified for this analysis is robust and appropriate for developing classification models that satisfy this research need. It contains malicious hash signatures, including MD5, SHA1, and SHA256 from a threat intelligence aggregator (OPSWAT) which have been identified on the networks of community users within the last 24 hours of feed export. The dataset is also composed of threat intelligence vendor enrichment (CREDIBILITY, DATE_PUBLISHED), threat detectability (number of anti-virus alerts), file type (CATEGORY, EXTENSION), and threat name/type/attribution. Using this concatenated dataset, a multitude of common and ensemble models were built and assessed.

Common, ensemble, and hybrid models were developed to sort threat intelligence gists into distinct sets (action or discard) for SIEM ingestion, threat alert reporting, vulnerability assessment, or incident response. They were also compared against a single ensemble model previously created to perform the same function. Ensemble models combine common models for strength, but are not always the most accurate (Maldonado). A mix of ensemble models (averaging or taking the maximum of the resultant probabilities) and common models will be compared to determine the best model for this analytic need. The classification dataset has a binary target variable (1, 0), 2,001 cases, and 13 feature variables with considerable skewness (68%).
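
To make the two ensemble rules concrete, here is a minimal Python sketch of combining posterior probabilities by average and by maximum. It is not the SAS Enterprise Miner implementation; the function names, the 0.5 cutoff, and the scores are all hypothetical and only illustrate the idea.

```python
from statistics import mean

def combine_posteriors(model_probs, method="average"):
    """Combine per-model P(actionable) scores for each gist.

    model_probs holds one list of probabilities per model, all the
    same length (one entry per gist).
    """
    if method not in ("average", "maximum"):
        raise ValueError("method must be 'average' or 'maximum'")
    combiner = mean if method == "average" else max
    # zip(*...) regroups the scores so each tuple holds one gist's scores
    return [combiner(scores) for scores in zip(*model_probs)]

def classify(probs, threshold=0.5):
    """Map combined probabilities to the binary target (1 = actionable)."""
    return [1 if p >= threshold else 0 for p in probs]

# Hypothetical scores from three base models for four gists
regression = [0.20, 0.70, 0.45, 0.90]
neural_net = [0.30, 0.60, 0.55, 0.80]
tree = [0.10, 0.80, 0.35, 1.00]

avg_probs = combine_posteriors([regression, neural_net, tree], "average")
max_probs = combine_posteriors([regression, neural_net, tree], "maximum")

print(classify(avg_probs))  # [0, 1, 0, 1]
print(classify(max_probs))  # [0, 1, 1, 1]
```

In this toy data the two rules disagree only on the third gist, where a single model scores above the cutoff: the maximum rule lets one confident model carry the decision, while the average rule needs the models to agree on balance.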

Data Examination, Cleansing, and Preparation

During the data selection phase of this project, this data type seemed fitting as it contained several classifier data points (scoring variables for determining actionable threshold) which could be used to bisect threat intelligence gists. Exploratory analysis of the dataset revealed anticipated relationships between threat intelligence gist details and premium vendor sources, for example a gist provided by a vendor would contain more details concerning the malware type, affected sector, campaign details, and indicators of compromise as compared to an open-source threat feed which often lacks granular detail. Additionally, popular malware files (highly active, often observed) were expectedly correlated with high AV alerting scores, as most anti-virus solutions are aware of these threats and detect against them accurately. Unanticipated data relationships included a Trojan-type malware with a file type of ‘.gif’. I see you. Not today Satan.

Using visual exploration, unanticipated trends observed in the threat intelligence dataset included a large number of ransomware and worm files, with high AV detection rates, which were sourced from community-driven feeds and not represented by commercial vendor feeds. Also, a malicious shell code file was reported in a community-driven threat gist which had a low AV detection rate (9 out of 53). This may indicate that community-driven feeds could be more timely and sensitive to malware trends and should be given increased credibility. +1 for the free stuff! ^__^

The threat intelligence dataset was imported into SAS Enterprise Miner. Upon ingest, key data roles were assigned: target variable (ACTIONABLE) and classifiers (INTEL_SOURCE_CRED, TOTAL_AVS). A simple variable replacement function was performed to ensure the target variable was binary (1,0), as decision models require this. The dataset was partitioned into two subsets: train and validate (70% and 30% respectively). Model performance was evaluated on the validation set.
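
For readers without SAS Enterprise Miner, the recode-and-partition step might look roughly like this in plain Python. This is a sketch under assumptions: the raw label spellings, the record values, and the seed are invented, and only the column names (ACTIONABLE, INTEL_SOURCE_CRED, TOTAL_AVS) and the 70/30 split come from the text above. It performs a simple random split and does not attempt stratification.

```python
import random

def recode_target(value):
    """Map assorted label spellings onto the binary target (1 or 0)."""
    positive = {"1", "yes", "true", "actionable"}
    return 1 if str(value).strip().lower() in positive else 0

def partition(rows, train_fraction=0.7, seed=12345):
    """Shuffle rows reproducibly and split them into train and validate sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical records: (source credibility, AV detections, raw label)
raw = [
    (5, 41, "yes"), (2, 9, "no"), (4, 37, "1"), (1, 3, "0"), (3, 22, "Yes"),
    (5, 50, "TRUE"), (2, 12, "no"), (4, 30, "yes"), (1, 1, "0"), (3, 18, "no"),
]
rows = [
    {"INTEL_SOURCE_CRED": cred, "TOTAL_AVS": avs, "ACTIONABLE": recode_target(label)}
    for cred, avs, label in raw
]

train, validate = partition(rows)
print(len(train), len(validate))  # 7 3
```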

Predictive Models Developed

A total of five models were developed to classify cyber threat intelligence gists, as shown in Figure 1 and detailed in Figure 2. 

[Figure 1: the models developed (image not available)]
[Figure 2: details of the models developed (image not available)]

Results

When comparing ensemble models (average and maximum probabilities), the Fit Statistics window declared the champion model based on the lowest Average Squared Error and highest Cumulative Lift: Ensemble_MAX. The ROC and Cumulative Lift charts for the ensemble models illustrate their competitiveness and sensitivity when compared to a baseline (Ensemble_AVG was more sensitive). The Event Classification Table and Chart revealed a proneness (22%) to false negatives in the validation set for both ensemble models (AVG and MAX).
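
The fit statistics named above can be sketched in a few lines of Python. These are common textbook definitions, not a reproduction of the SAS Fit Statistics window. The 20% lift depth, the 0.5 cutoff, and the sample scores are assumptions, and because the text does not say how the 22% false negative figure was normalized, the function below (share of truly actionable gists that were missed) is only one possible reading.

```python
def average_squared_error(y_true, p_hat):
    """Mean squared gap between the 0/1 target and the predicted probability."""
    return sum((y - p) ** 2 for y, p in zip(y_true, p_hat)) / len(y_true)

def misclassification_rate(y_true, p_hat, threshold=0.5):
    """Share of cases whose thresholded prediction differs from the target."""
    wrong = sum(y != (1 if p >= threshold else 0) for y, p in zip(y_true, p_hat))
    return wrong / len(y_true)

def false_negative_rate(y_true, p_hat, threshold=0.5):
    """Share of truly actionable gists scored below the threshold."""
    positive_scores = [p for y, p in zip(y_true, p_hat) if y == 1]
    missed = sum(p < threshold for p in positive_scores)
    return missed / len(positive_scores)

def cumulative_lift(y_true, p_hat, depth=0.2):
    """Event rate in the top `depth` share of scored cases over the overall rate."""
    ranked = sorted(zip(p_hat, y_true), reverse=True)
    n_top = max(1, round(len(ranked) * depth))
    top_rate = sum(y for _, y in ranked[:n_top]) / n_top
    return top_rate / (sum(y_true) / len(y_true))

# Hypothetical validation targets and model scores
y_true = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
p_hat = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.3, 0.2, 0.1, 0.1]

print(round(average_squared_error(y_true, p_hat), 2))  # 0.17
print(misclassification_rate(y_true, p_hat))           # 0.3
print(false_negative_rate(y_true, p_hat))              # 0.4
print(cumulative_lift(y_true, p_hat))                  # 2.0
```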

When comparing common models (regression, neural network, decision tree), the Fit Statistics window declared the champion model based on the misclassification rate: Neural Network. As seen in the ROC chart, the regression model suffered high sensitivity in the training set, but corrected to match the neural network and decision tree models in subsequent sets.

When comparing hybrid models (common and ensemble), the Fit Statistics window declared the champion model based on the misclassification rate: HP Neural Network. As seen in the ROC chart, the HP Neural Network model outperformed the HP Regression model (the next most accurate) in the validation set.

When comparing all of the developed models (to include a previously developed ensemble model), the champion model was selected based on the misclassification rate: Decision Tree Bagging method. As seen in the Classification charts, several models displayed misclassification in the validation set: Regression, Decision Tree2, Neural Network, Ensemble_AVG.
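
As a rough illustration of how bagging works, the sketch below trains many small trees on bootstrap resamples and averages their votes. It is a simplification and not the SAS Enterprise Miner node used in this project: one-split decision stumps stand in for full decision trees, and the feature values, forest size, and seed are hypothetical.

```python
import random

def fit_stump(rows, labels):
    """Find the single feature/threshold split with the fewest training errors."""
    best = None
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            for above in (0, 1):  # class predicted when value >= threshold
                errors = sum(
                    (above if row[feature] >= threshold else 1 - above) != label
                    for row, label in zip(rows, labels)
                )
                if best is None or errors < best[0]:
                    best = (errors, feature, threshold, above)
    return best[1:]

def predict_stump(stump, row):
    """Apply one stump: `above` at or over the threshold, the other class below."""
    feature, threshold, above = stump
    return above if row[feature] >= threshold else 1 - above

def fit_bagged_stumps(rows, labels, n_trees=25, seed=7):
    """Train each stump on its own bootstrap sample of the training set."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return forest

def predict_bagged(forest, row):
    """Average the stumps' votes; the mean serves as an estimate of P(actionable)."""
    return sum(predict_stump(stump, row) for stump in forest) / len(forest)

# Hypothetical training data: (source credibility, number of AV detections)
rows = [(5, 41), (4, 37), (5, 50), (4, 30), (3, 22),
        (2, 9), (1, 3), (2, 12), (1, 1), (3, 18)]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

forest = fit_bagged_stumps(rows, labels)
print(predict_bagged(forest, (5, 45)))  # high-credibility, widely detected gist
print(predict_bagged(forest, (1, 2)))   # low-credibility, rarely detected gist
```

Because each stump sees a different resample, individual stumps may pick different features or thresholds; averaging their votes smooths out those differences, which is the consensus effect described in the TLDR.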

Conclusions and Takeaways

This predictive analytics project provides initial evidence for a proposed classification mechanism, informed by high-confidence data models, to characterize threat intelligence gists provided by community and vendor sources. Cybersecurity analysts should continue to develop and trial processes to leverage bulk data feeds to best understand pertinent threats and action them appropriately. Nominating actionable threat intelligence through algorithmic classification can triage data feeds and give security analysts additional time to enumerate and extrapolate threats and designate proper defensive posturing. The Decision Tree Bagging ensemble model was the most accurate in predicting actionable classifications, and thus could be implemented on a trial basis. The models developed could be improved by adjusting parameters (stratification sample sets, randomization of seed sets, decision weights, forest size) in addition to recompiling the models and running additional training cycles. Focus should be placed on the HP Regression model, as it demonstrated the next-highest predictive strength. Penalized regression techniques (tuning fit criteria to determine minimum prediction error) should be trialed in a development diagram.

References

CSO Online (2019). The Critical Need for Threat Intelligence. Retrieved March 30, 2019 from: https://www.csoonline.com/article/3269066/the-critical-need-for-threat-intelligence.html

Lutins, E. (2017). Ensemble Methods in Machine Learning: What are They and Why Use Them. Towards Data Science. Retrieved from: https://towardsdatascience.com/ensemble-methods-in-machine-learning-what-are-they-and-why-use-them-68ec3f9fef5f

Maldonado, M., Dean, J., Czika, W., & Haller, S. (2014). Leveraging ensemble models in SAS Enterprise Miner. Retrieved from: https://support.sas.com/resources/papers/proceedings14/SAS133-2014.pdf

OPSWAT (2019). Threat Intelligence Feed. Retrieved March 30, 2019 from: https://www.opswat.com/developers/threat-intelligence-feed

Gunes, F. Penalized Regression Methods for Linear Models in SAS/STAT. SAS Institute Inc. Retrieved April 9, 2019 from: https://support.sas.com/rnd/app/stat/papers/2015/PenalizedRegression_LinearModels.pdf

Recorded Future (2019). What Is Threat Intelligence? Retrieved March 30, 2019 from: https://www.recordedfuture.com/threat-intelligence/

SAS Enterprise Miner: Reference Help. Introduction to SEMMA. SAS Institute Inc. Retrieved April 9, 2019 from: https://documentation.sas.com/?docsetId=emref&docsetTarget=n061bzurmej4j3n1jnj8bbjjm1a2.htm&docsetVersion=14.3&locale=en

The Cyber Threat (2019). Cyber Threat Intelligence Feeds. Retrieved March 30, 2019 from: http://thecyberthreat.com/cyber-threat-intelligence-feeds

Lori Cole: Threat Intelligence Consultant at Recorded Future

https://www.linkedin.com/in/lori-cole-1b708a92/