A Trial Using Support Vector Machines

By: Lori Cole, Threat Intelligence Consultant at Recorded Future.

Event Classification in Firewall Logs:

TLDR – I attempt to create evidence of classification mechanisms informed by high confidence data models to characterize events captured in firewall logs. I fail a lot, but that’s the key to learning!

Introduction

Firewalls are a fundamental component of prevention-focused security architecture. They provide defense in depth by blocking attacks against hosts and services and control traffic between zones of trust. Network-based firewalls provide network traversal visibility, granular access controls, and can identify traffic and enforce policy decisions (e.g., allow, deny, inspect) to assist in traffic-shaping and categorization of unidentified threats for forensic referral or quarantine. A web application firewall (WAF) is deployed to protect a specific web application or set of web applications, offering rule-based protection against application-level attacks such as cross-site scripting and SQL injection (PaloAlto).

Web applications of all kinds, especially e-commerce and payment gateway services, have become a trending target in cyber attacks (Tech Talks). Attackers are using methods which are specifically aimed at exploiting potential weaknesses in the web application software itself, evading detection by traditional IT security systems such as network firewalls or IDS/IPS systems. Additional protection against these attacks is offered at the WAF level, leveraging activity and event classification mechanisms which inform binary traffic routing decisions such as ‘permit’ or ‘BYE Felicia’.

The main benefit of a web application firewall is the protection of production web applications without having to change the application itself. WAFs offer several security mechanisms which prevent the tampering, manipulation, injection, and compromise of critical data parameters, including cookie tokenization, HTTP session management, URL encryption and link validation, mandatory SSL connections, data input validation, and malicious file scanning and execution blocking (OWASP). However, there are resource tradeoffs when using this type of firewall. WAFs increase the complexity of IT infrastructure, are more expensive to operate and maintain, can produce false positives during training and testing, can require complex troubleshooting, and may force termination of a web application if severe errors occur.

Despite the obvious security benefit of WAF deployment, there is room for improvement. According to a BlackHat Conference presentation, protocol-level tactics to evade web application firewalls continue to evolve and succeed (Qualys). Protocol-level manipulation can enable WAF bypass by modifying parameter values in server-side scripting languages (e.g., PHP) or communication protocols (e.g., HTTP) using simple character escaping or additions. This research addresses the need for continued improvement of WAF event classification to inform traffic restrictions, in the form of a predictive classification model using support vector machines (SVM). By considering a multi-variate disposition score, firewall security engineers can enhance discretionary classification rulesets for optimized web application protection. (Also I think it would be badace if machineLearning could write rules at the byte-level after observing a training dataset.)

The dataset identified for this analysis is robust and appropriate for developing classification models that satisfy this research need. Not the byte-level one; the more attainable one (for me): classifying events for basic rule composition.

The selected dataset is composed of next-generation firewall events, port and protocol addressing, disposition and attribution characteristics (RELEVANCE, SEVERITY, MAGNITUDE, CREDIBILITY), and other pertinent data points (e.g., DATE_TIME, IP_GEO). Using this dataset, support vector machine models (of various kernel types) will be developed to characterize these events into distinct sets for firewall ingress and egress optimization rules (permit or deny). The classification dataset has a binary target variable (1, 0) and 1,500 cases, boasting 14 feature variables with a considerable skewness (89%). This dataset has been sanitized to prevent attribution and is available for public download here: https://www.kaggle.com/lorisaysrawr/ngfw-logs

Data Examination, Cleansing, and Preparation

During the data selection phase of this project, this data type seemed fitting, as it contained several event classifier data points (enumerative and estimative) which could be used to bisect web application traffic to optimize network defense. Exploratory analysis of the dataset revealed anticipated relationships between web application event disposition scoring and policy allowance. For example, an event with a magnitude score of 8 is categorized as a “firewall deny” despite its use of a permitted protocol (SMTP). Unanticipated data relationships included denied UDP traffic on port 53. Denying TCP traffic on port 53 is an accepted security practice to mitigate DNS zone transfers (Network World), but denial of UDP traffic without logged variables to support a severity rating prompts inquiry. This anomaly might be explained by packet sequence disruption, as packets would not be interpreted as expected and would thus be denied. Additionally, data records containing no protocol or port variables were permitted according to firewall log evidence, which warrants additional scrutiny of the standing policy.

Using the SEMMA process (SAS Handbook), the dataset was further explored, revealing anomalies and unexpected relationships. Visual exploration of the firewall log dataset surfaced unanticipated trends, including a large number of unknown IP geolocation resolutions. Geolocation cannot be performed if IP addresses are not present in the logs; this observation may indicate IP fabrication (spoofing), anonymization, or other evasion tactics. An example of an unexpected protocol and port relationship was UDP across port 123 (NTP), which may provide information about system internals, allow modification or cancellation of timed processes, and offer an avenue of attack for intruders (SpeedGuide).

The firewall log dataset was imported into SAS Enterprise Miner to continue the data preparation process. The data was prepared by cleaning and transforming variables and partitioning the dataset. Upon ingest, key data roles were assigned: target variable and classifiers. A simple variable replacement function was performed to ensure the target variable was binary (1, 0), as the SVM models require a binary target. No erroneous entries or incorrect values were identified when sorting variable inputs within their respective columns, so none were removed or imputed. A missing value threshold was defined when creating the data sources in SAS Enterprise Miner. Variables with missing values greater than the threshold (1) were scheduled to be dropped. There were no instances of missing values in the selected dataset. The Variable Selection node was used to reject irrelevant variables in the dataset (DEVICE_ID, EVENT_CATEGORY).
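The preparation steps above were performed in SAS Enterprise Miner. As a rough illustration of the same logic outside that tool, here is a minimal plain-Python sketch. The sample rows, the TARGET column name, and the "deny"/"permit" labels are assumptions made for the example and are not taken from the dataset. Reading the threshold as a fraction of rows is also an assumption.

```python
# Hypothetical sample rows; the real column set and label values may differ.
rows = [
    {"DEVICE_ID": "fw01", "EVENT_CATEGORY": "net", "MAGNITUDE": 8, "SEVERITY": 7, "TARGET": "deny"},
    {"DEVICE_ID": "fw01", "EVENT_CATEGORY": "net", "MAGNITUDE": 2, "SEVERITY": 1, "TARGET": "permit"},
    {"DEVICE_ID": "fw02", "EVENT_CATEGORY": "net", "MAGNITUDE": 5, "SEVERITY": None, "TARGET": "deny"},
]

REJECTED = {"DEVICE_ID", "EVENT_CATEGORY"}  # variables rejected in the article
LABEL_MAP = {"deny": 1, "permit": 0}         # assumed raw labels, recoded to 1/0


def missing_fraction(rows, column):
    """Share of rows where `column` is absent or None."""
    missing = sum(1 for r in rows if r.get(column) is None)
    return missing / len(rows)


def prepare(rows, target="TARGET", max_missing=1.0):
    """Recode the target to 1/0, drop rejected columns, and drop any
    feature whose missing fraction exceeds `max_missing`."""
    columns = [c for c in rows[0] if c not in REJECTED]
    kept = [c for c in columns
            if c == target or missing_fraction(rows, c) <= max_missing]
    prepared = []
    for r in rows:
        out = {c: r.get(c) for c in kept}
        out[target] = LABEL_MAP[r[target]]
        prepared.append(out)
    return prepared


clean = prepare(rows, max_missing=0.5)
print(clean[0])
```

Lowering `max_missing` (for example to 0.2) would drop SEVERITY in this toy sample, because one of its three values is missing.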

The dataset was partitioned into three subsets: train, test, and validate (40%, 30%, and 30%, respectively). Model fit was determined using the training set, and model performance was evaluated on the validation and test sets. No new features were engineered for this dataset. In future efforts, the individual disposition scores could be combined in SAS Enterprise Miner to represent a composite score and therefore a single discriminatory classifier.
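A 40/30/30 partition like the one described can be sketched in a few lines of plain Python. This is a simple random split under assumed settings (the `partition` helper and the seed are hypothetical), not a reproduction of how SAS Enterprise Miner's partitioning behaves.

```python
import random


def partition(records, fractions=(0.4, 0.3, 0.3), seed=42):
    """Shuffle a copy of `records` and split it into train/validate/test."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for repeatability
    n = len(shuffled)
    n_train = round(n * fractions[0])
    n_valid = round(n * fractions[1])
    train = shuffled[:n_train]
    validate = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]    # remainder goes to test
    return train, validate, test


# 1,500 cases, as in the article; integers stand in for log records.
train, validate, test = partition(range(1500))
print(len(train), len(validate), len(test))
```

Because the target is heavily skewed, a stratified split (partitioning each class separately and then combining) may be worth considering so that every subset keeps a similar class ratio.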

Predictive Models Developed (aka do my bidding machines!)

Eight support vector machine (SVM) models were developed to classify firewall log events, as shown in Figure 2.

Each SVM model employs varied optimization methods, kernel functions, and penalty values as shown.

The first SVM model uses default configurations. Subsequent models experiment with tuning parameters to improve model efficacy using feature expansion (polynomial degrees) and by mapping the data into higher-dimensional spaces where it can become linearly separable (polynomial and radial basis kernel tricks). Different kernels exist for transforming data spaces into better-separated groups, and kernel selection affects how well the model classifies the data. The penalty parameter is a critical tuning variable when constructing a model that generalizes well. Scaling the penalty value allowed examination of how strongly the model penalizes points that fall on the wrong side of the hyperplane (Ben-Hur).
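To make the kernel and penalty ideas concrete, the following plain-Python sketch shows the three kernel types mentioned, checks that a degree-2 polynomial kernel equals a dot product in an explicitly expanded feature space, and evaluates the standard soft-margin objective for two penalty values. The function names, sample points, and parameter values (gamma, coef0, the weight vector) are illustrative assumptions, not settings from the SAS models.

```python
import math


def dot(x, z):
    return sum(a * b for a, b in zip(x, z))


def linear_kernel(x, z):
    return dot(x, z)


def polynomial_kernel(x, z, degree=2, coef0=1.0):
    return (dot(x, z) + coef0) ** degree


def rbf_kernel(x, z, gamma=0.5):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def expand_degree2(x):
    """Explicit feature map for the degree-2 polynomial kernel (coef0=1)
    on 2-D inputs, so that phi(x) . phi(z) == (x . z + 1) ** 2."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return (1.0, r2 * x1, r2 * x2, x1 * x1, x2 * x2, r2 * x1 * x2)


def soft_margin_objective(w, b, X, y, C):
    """0.5 * ||w||^2 + C * (sum of hinge losses); labels y must be +1 / -1
    (the article's 1/0 target would be mapped to +1/-1 first)."""
    hinge = sum(max(0.0, 1.0 - yi * (dot(w, xi) + b)) for xi, yi in zip(X, y))
    return 0.5 * dot(w, w) + C * hinge


x, z = (1.0, 2.0), (3.0, -1.0)
kernel_value = polynomial_kernel(x, z)                     # computed implicitly
expanded_value = dot(expand_degree2(x), expand_degree2(z))  # computed explicitly

# Toy data: the last two points violate the margin for this w, b.
X = [(2.0, 0.0), (-2.0, 0.0), (0.5, 0.0), (-0.5, 0.0)]
y = [1, -1, 1, 1]
w, b = (1.0, 0.0), 0.0
loss_c1 = soft_margin_objective(w, b, X, y, C=1.0)
loss_c025 = soft_margin_objective(w, b, X, y, C=0.25)
print(kernel_value, expanded_value, loss_c1, loss_c025)
```

The same margin violations cost less when C is smaller, which is why a low penalty yields a softer, more tolerant margin.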

Results

The objective of SVMs is to construct a hyperplane that maximizes the margin between two classes (Ben-Hur). When comparing linear SVMs, the Fit Statistics window declared the champion model based on the highest selection criterion score: SVM 1 (default parameters). The Score Rankings Overlay window shows the ROC chart for the SVM models. SVM model 5 has the most Sensitivity, meaning it would have the weakest binary predictions. The Score Rankings Overlay window also shows the Cumulative Lift for the SVM models. SVM model 5 deviates the most from the actual lift curves (Train, Validate, Test), meaning it would be the least accurate in predicting event classification. When examining SVM models of varied kernel functions, linear models proved to be more accurate. Linear SVM models with varied penalty values were then compared, revealing the linear champion model: SVM 8 (penalty value of 0.25). This was expected; as the penalty value decreases, the margin becomes softer, tolerating more training points inside the margin or on the wrong side of the hyperplane in exchange for a wider margin.
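For readers who want to see what sits behind an ROC chart and a cumulative lift curve, here is a minimal plain-Python sketch. The labels and scores are made-up toy values, and the helper names are hypothetical; SAS Enterprise Miner may bin and compute these statistics differently.

```python
def confusion_counts(y_true, scores, threshold):
    """Count TP, FP, TN, FN when scores >= threshold are predicted as 1."""
    tp = fp = tn = fn = 0
    for actual, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def roc_point(y_true, scores, threshold):
    """One ROC point: (false positive rate, sensitivity) at a threshold."""
    tp, fp, tn, fn = confusion_counts(y_true, scores, threshold)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return fpr, sensitivity


def cumulative_lift(y_true, scores, depth):
    """Positive rate in the top `depth` fraction of events (ranked by score)
    divided by the overall positive rate."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    k = max(1, round(len(ranked) * depth))
    top_rate = sum(label for _, label in ranked[:k]) / k
    base_rate = sum(y_true) / len(y_true)
    return top_rate / base_rate


# Toy example: 1 = deny, 0 = permit (assumed encoding), with model scores.
y_true = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.35, 0.3, 0.2, 0.1]

fpr, sensitivity = roc_point(y_true, scores, threshold=0.5)
lift_top20 = cumulative_lift(y_true, scores, depth=0.2)
print(fpr, sensitivity, lift_top20)
```

Sweeping the threshold from high to low traces out the full ROC curve, and sweeping `depth` from 0 to 1 traces the cumulative lift curve, which always ends at 1.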

Conclusions and Takeaways

This predictive analytics project provides initial evidence of proposed classification mechanisms informed by high confidence data models to characterize events captured in firewall logs. Discretionary firewall rules should be continuously and intelligently improved and revised to combat cyber threat actor attacks and evasion techniques. Nominating traffic restrictions based on algorithmic classification of firewall events can automate human analysis of security logs, and offer additional time for threat research and firewall optimization strategy. SVM models 1 and 8 were the most accurate in predicting event classifications, and thus should be further trialed.

Next steps include additional sampling of firewall logs for model classification to highlight the error rate (misclassification of events) of standing firewall rules to justify model-based rule tuning on a trial basis. Time series nodes would also inspire additional studies using this dataset, for example: correlating real world geopolitical or military events with increased firewall deny classifications (potential indications of cyber warfare). Improvements for future model iterations include a more intuitive visualization of the SVM model results. I was unable to generate scatter or density plots to reflect my hyperplane, margins, and support vectors properly. 
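As a starting point for that visualization work, the quantities a hyperplane plot needs can be computed directly from a fitted linear model. The sketch below assumes a two-feature linear SVM whose weight vector `w` and intercept `b` are already known; the values and helper names here are invented for illustration and do not come from the SAS models.

```python
import math


def decision_value(w, b, x):
    """Signed score w . x + b; its sign gives the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def margin_width(w):
    """Distance between the margin lines w . x + b = +1 and w . x + b = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


def support_vector_indices(w, b, X, y, tol=1e-9):
    """Indices of points on or inside the margin (y * f(x) <= 1).
    Labels must be +1 / -1."""
    return [i for i, (xi, yi) in enumerate(zip(X, y))
            if yi * decision_value(w, b, xi) <= 1.0 + tol]


def boundary_x2(w, b, x1, offset=0.0):
    """For 2-D data, the x2 coordinate where w . x + b == offset.
    offset 0 is the hyperplane; +1 and -1 are the margins (needs w[1] != 0)."""
    return (offset - b - w[0] * x1) / w[1]


# Hypothetical fitted parameters and toy points.
w, b = (1.0, 1.0), -3.0
X = [(1.0, 1.0), (2.0, 2.0), (0.0, 0.0), (4.0, 3.0)]
y = [-1, 1, -1, 1]

width = margin_width(w)
sv = support_vector_indices(w, b, X, y)
print(width, sv, boundary_x2(w, b, 0.0), boundary_x2(w, b, 0.0, offset=1.0))
```

Evaluating `boundary_x2` at two x1 values for offsets -1, 0, and +1 gives the endpoints of the three lines to draw, and the indices in `sv` mark the points to highlight as support vectors.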

References

Ben-Hur A, Weston J. (2010). A User’s Guide to Support Vector Machines. Retrieved March 16, 2019 from: http://pyml.sourceforge.net/doc/howto.pdf

Network World. (2010). Core Networking. Allow Both TCP and UDP Port 53 to Your DNS Servers. Retrieved March 16, 2019 from: https://www.networkworld.com/article/2231682/cisco-subnet-allow-both-tcp-and-udp-port-53-to-your-dns-servers.html

OWASP. (2019). Web Application Firewall. Retrieved March 16, 2019 from: https://www.owasp.org/index.php/Web_Application_Firewall

OWASP. (2019). OWASP Best Practices Series. Retrieved March 16, 2019 from: https://www.owasp.org/index.php/Category:OWASP_Best_Practices:_Use_of_Web_Application_Firewalls

PaloAlto. (2019). Firewalls and Appliances. Retrieved March 16, 2019 from: https://docs.paloaltonetworks.com/hardware

Qualys. (2012). Protocol-level Evasion of Web Application Firewalls. Retrieved March 16, 2019 from: https://media.blackhat.com/bh-us-12/Briefings/Ristic/BH_US_12_Ristic_Protocol_Level_Slides.pdf

SpeedGuide. (2019). Port Details. Retrieved March 16, 2019 from: https://www.speedguide.net/port.php?port=123

Stanford University. Stats 202. Data Mining and Analysis. Retrieved March 16, 2019 from: https://web.stanford.edu/class/stats202/content/lec23-cond.pdf

Tech Talks. (2016). Why are Web Applications Attractive Targets for Hackers. Retrieved March 16, 2019 from: https://bdtechtalks.com/2016/02/29/why-are-web-applications-attractive-targets-for-hackers/


Lori Cole, Threat Intelligence Consultant at Recorded Future.
https://www.linkedin.com/in/lori-cole-1b708a92/