Data on 123M US households exposed in latest misconfigured AWS cloud storage case

Discussing the possibility that the data may have been accessed, JASK’s Director of Security Research, Rod Soto, told SiliconANGLE that “there’s a good chance data is in the wrong hands” as “malicious actors are using many different tools to discover such buckets, or they are finding information in other sources such as github.com, or by performing other attacks that may get hints or direct clues of the use of AWS buckets.”

Read Full Article Here.


Cueing Threat Hunters with Change Detection


Artificial Intelligence (AI) and its component tools, such as machine learning (ML), are not intended to fully automate threat mitigation and response, at least not in the current generation of technologies. Instead, AI and ML are beginning to provide a much greater degree of organization and prioritization for existing workflows. For example, the threat hunter ideally keys off a highly targeted alert, perhaps one indicative of a specific threat actor or the use of a particular tool. However, such targeted alerts are notoriously brittle, and this model fails for novel threat vectors.

In lieu of specially crafted signatures for known or previously seen attacks, or alerts that might fail to fire reliably on a particular network, how can we cue an investigation into potentially malicious activity in which an attacker has already gained access, before it’s too late?

Nearly all “generic” or non-signature detections involve detecting a change in an observable field, or a change in the level of activity on some observable field. This is called “change point detection,” an area of probabilistic and statistical time series analysis. Some methods are more reliable than others; some are “fooled” by regular changes in periodic activity, that is, they flag changes due to the normal ebb and flow of the work day or week.

In our last blog, we discussed methods on the highly sophisticated end of the spectrum (neural networks) for learning nominal activity patterns in data. These and similar models require a lot of training data and can be challenging to deploy in production (we will have a follow-up on this subject). Here we discuss how to apply much simpler statistical models that are relatively straightforward to deploy in production. However, as with most data science models, the key is framing the problem and handling what might be called data logistics.

The following chart shows hourly samples of SMB file access from a real network. If you squint you can see a pattern of daily activity compressed near the x-axis. However, the spike one morning really stands out: it was multiple orders of magnitude greater than the usual level of this activity type on this network.

[Figure: hourly SMB file access counts, with one extreme morning spike]

What goes through a threat hunter’s mind when they see a graph like this? It could represent something benign like a file backup, a policy change, or some other routine high-volume file server activity. Or it could indicate a hostile scenario such as reconnaissance, ransomware, or a mass file deletion. This triage process can be hectic and stressful until the cause of the unusual activity is run to ground. To determine threat or not-threat, the hunter will need to dig further, but first the unusual activity must be flagged.

To reduce the burden on hunters and the security workload in general, JASK’s Trident product combines multiple behaviors, such as change points, signals, and threat intelligence, into a greatly reduced number of Smart Alerts that merit greater attention. But any analyst or hunter can benefit from a deeper understanding of how to detect changes on the network.

One good initial step in a statistical analysis of data is to examine its distribution. It can be tempting to start off by calculating high-level or coarse statistical measures like the mean (average) and variance (the square of the standard deviation). But these measures are only meaningful if the data approximately follows a normal distribution, the oft-cited “bell curve.” A quick look at the above time series shows that this data does not meet that assumption. In fact, research has shown that network traffic does not generally follow a normal distribution, but tends to be heavy-tailed, like the data in this example.

[Figure: histogram of the hourly counts, zoomed in on the bulk of the distribution]

This graph is zoomed in on the area around the bulk of the data, so we can see the shape of the distribution more clearly. There are a small number of outliers, in particular the dramatic activity peak on Wednesday morning.

Returning to the descriptive statistics associated with these values, we see that the histogram shows what is known as a skewed (or heavy-tailed) distribution. Network counts, for instance, are skewed, and in this case the mean is a poor indicator of data centrality. Similarly, the variance is a poor estimate of how much spread is typical in a skewed data set. A commonly used indicator of change in simple anomaly detection applications is a threshold rule such as “three sigma”: three times the standard deviation above and below the average trend of values in the time series.

In practice, when we use these types of threshold rules on skewed data, we sometimes run into noisy situations where the model raises too many outliers or none at all. In the output below, we have overlaid some useful descriptive statistics. Here the lower three-sigma bound (the mean minus three standard deviations) falls below zero, which is hard to interpret for counts. Furthermore, looking at the distribution of values in relation to the outlier, we see that the mean and standard deviation are pulled toward the skewed tail of the distribution.

An alternate approach is to use the median instead of the mean. As shown in the same figure, the median is a much more reliable measure of data centrality, and it is far less prone to being pulled by a small number of outliers or other extreme data. A commonly used change point detection metric based on the median is the median absolute deviation, or MAD, defined as:

MAD = median(|x_i - median(x)|)
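As a minimal sketch of how the two rules compare, the following Python snippet computes both a three-sigma threshold and a MAD-based threshold on synthetic, illustrative counts (not the actual network data from the chart):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic heavy-tailed hourly counts with one injected extreme spike
# (illustrative only; stands in for the SMB access counts in the chart)
counts = rng.lognormal(mean=3.0, sigma=1.0, size=24 * 7)
counts[100] = 50_000  # the "one unusual morning" outlier

# Three-sigma rule: both the mean and standard deviation are dragged by the tail
mu, sigma = counts.mean(), counts.std()
sigma_flags = np.flatnonzero(counts > mu + 3 * sigma)

# MAD-based rule: the median and MAD are robust to the extreme value
med = np.median(counts)
mad = np.median(np.abs(counts - med))
mad_flags = np.flatnonzero(counts > med + 4 * mad)

print("three-sigma flags:", sigma_flags)
print("MAD flags:        ", mad_flags)
```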

In the example above, we see that if we use MAD for flagging change points or outliers, we get a much less noisy threshold. As can be seen from the chart, this would only trigger on the one unusual hour in the observed data. This might seem to be exactly what we want; however, what if the data is collected or processed in 10-minute or 1-minute bins instead of hourly bins? How does this impact the distribution?

The standard deviation is often a poor measure of data “dispersion,” or spread, on the skewed data sets commonly used in cyber security modeling. The same chart shows the alternate measure of dispersion based on the median, MAD. Again, this appears to fit our visual intuition. A threshold of 3 or 4 times MAD would ignore most of the data but flag a small number of high-end data points in addition to the extreme outlier, which is way off the right end of this zoomed-in chart.

From a scalability perspective, we have to answer one key question: should we be concerned about performing these calculations on the vast amount of data collected on real large-scale networks? The answer is “no,” because streaming implementations of the median exist. One approach maintains the two halves of the data in a pair of heaps so the midpoint can be identified after each new value arrives; this scales as O(n log n) overall, or O(log n) per update. The P² algorithm is an even more memory-efficient alternative that approximates the median without storing the data at all.
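For illustration, here is a minimal sketch of the two-heap running median in Python (not a production implementation):

```python
import heapq

class StreamingMedian:
    """Running median via two heaps: a max-heap for the lower half (stored
    negated) and a min-heap for the upper half. Each insert is O(log n)."""

    def __init__(self):
        self.lo = []  # max-heap of the lower half (values negated)
        self.hi = []  # min-heap of the upper half

    def add(self, x):
        if self.lo and x > -self.lo[0]:
            heapq.heappush(self.hi, x)
        else:
            heapq.heappush(self.lo, -x)
        # Rebalance so the halves differ by at most one element.
        if len(self.lo) > len(self.hi) + 1:
            heapq.heappush(self.hi, -heapq.heappop(self.lo))
        elif len(self.hi) > len(self.lo) + 1:
            heapq.heappush(self.lo, -heapq.heappop(self.hi))

    def median(self):
        if len(self.lo) > len(self.hi):
            return -self.lo[0]
        if len(self.hi) > len(self.lo):
            return self.hi[0]
        return (-self.lo[0] + self.hi[0]) / 2

sm = StreamingMedian()
for value in [5, 15, 1, 3, 8]:
    sm.add(value)
print(sm.median())  # 5
```

For very high event rates, an approximate method such as P² trades a small amount of accuracy for constant memory.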

As mentioned already, barring a clear signature of a known attack tool or use of malicious code, which for a wide variety of reasons is becoming more rare, the threat hunter requires some starting point for investigations. Changes in the network are one of those key behaviors, and using a metric that suits the heavy-tailed distribution of network activity is essential to keep false alerts in check while still being sensitive to moderate changes. Correlating these and other types of changes in context can be performed manually with a SIEM, and, more and more, through additional machine learning automation. This next level of contextual automation is where JASK is focusing, and we will discuss some of the ways this can be reliably done in subsequent posts in this series.


What is a botnet? And why they aren’t going away anytime soon

“In addition to creating a common, worldwide cybercrime enforcement system, there also needs to be standard regulations for manufacturers, requiring a certain level of minimal security in IoT devices. "Any regulation must also apply to all manufacturers, as many markets tend to be flooded with very cheap devices produced in regions where internet laws are very lax or non-existent," says Rod Soto, director of security research at Jask, an AI cybersecurity startup.”

Read Full Article Here.


Flaw in macOS High Sierra allows easy access

“The MacOS High Sierra vulnerability is alarming because it makes it seamless for someone to log into a system as root. While there are other methods that can provide bad actors with access and password reset capabilities via physical access, these require some technical knowledge and time,” said JASK Director of Cybersecurity Rod Soto, who has tested and verified the flaw. “The severity of this is how simple and quick anyone can execute the method and log in to reset and access user information even if their passwords are complicated.”

Read Full Article Here.


Single Sign On: Feature or Threat?

A conflict between usability and security lies at the core of single sign-on capabilities. From the perspective of usability, single sign-on (SSO) is a must-have: it is required to maintain efficiency within a workplace. Modern enterprise users are constantly using multiple applications; accessing, sharing, and storing data across multiple file shares; sending and downloading emails; and authenticating through VPNs, mobile devices, and more. Without single sign-on, each step would inhibit productivity. It would be impossible, from the functional view of user interactions and tasks, to require users to authenticate every time they access a resource or read, write, or modify a file. It is very clear that SSO is a fundamental need for enterprises.

However, SSO represents a single point of failure and a driving factor for credential reuse/extraction attacks. This means attackers can gain access to a variety of resources by simply obtaining and reusing credentials. If an organization’s defensive posture is weak, the risk ranges from someone snooping over a shoulder or reading a sticky note, all the way to sophisticated targeted phishing, malware execution, social engineering, or post-exploitation attacks, where attackers obtain user credentials and then proceed to gain access and move laterally across an organization.

A significant number of breaches and known compromises started with simply obtaining credentials from users, and even administrators, as malicious actors tend to pretext and target them. Weak passwords and policies clearly amplify the damage that an attack of this type can cause. In some cases the reuse of passwords has exposed not only targeted organizations, but partners and even defense service providers.

Credential reuse/extraction attacks, used in post-exploitation scenarios, provide powerful tools for moving around the enterprise by leveraging SSO technologies. Very popular tools such as Mimikatz are designed specifically to exploit SSO features, allowing attackers to perform Pass-the-Hash, Pass-the-Ticket, and other related credential extraction/reuse attacks.

These types of attacks and tools constantly evolve as new ways of abusing or exploiting SSO features are discovered. Recently, security researcher Juan Diego found a method to extract NTLM hashes that can then be reused (or cracked) to obtain credentials in a post-exploitation setting and move laterally. In spite of all the attacks already available and upcoming, single sign-on cannot be abandoned.

Single sign-on can be fortified with strong password policies and complementary monitoring and detection technologies such as JASK Trident. JASK Trident uses multiple sources of information and contextual indicators to detect abnormal activity and credential reuse attacks; these multi-contextual indicators are based on the experience of security operations center operators along with machine learning models.

The following figures show multi-contextual indicators used by JASK Trident that can indicate credential extraction/reuse.

Fig. 1: Lateral movement activity alert (SMB scanning)

Fig. 2: First-seen access alert (SMB share)

The JASK Research team has produced a threat advisory outlining a proof of concept of this new attack and specific steps for mitigation. Access the Threat Advisory by clicking here.


Death of the Tier 1 SOC Analyst

"Greg Martin, founder of startup JASK, which offers an artificial intelligence-based SOC platform, says Tier 1 analysts are basically the data entry-level job of cybersecurity. "We created it out of necessity because we had no other way to do it," he says. But he envisions them ultimately taking on more specialized tasks such as assisting in investigations using intel they gather from an incident."

Read Full Article Here.


Cool Companies in Cognitive Computing

"JASK, an enterprise artificial intelligence (AI) cybersecurity company, that recently launched with the announcement of $12 million in Series A funding, offers a cloud platform that uses machine learning and AI to deliver end-to-end network monitoring—identifying and triaging the most relevant attacks, and allowing security analysts to focus their resources on only the most dangerous threats."

Read Full Article Here.


What is Bad Rabbit? Petya-Style Ransomware Attack Hits Russia, Ukraine

"Security researchers reported Tuesday a new wave of potentially destructive ransomware known as Bad Rabbit. The malicious attack spread quickly across computer systems in Eastern Europe, including targets in Russia and Ukraine, and has been detected in the United States.

The outbreak of Bad Rabbit, which reportedly bears some similarity to the damaging Petya/NotPetya wiper attack that spread earlier this year, resulted in service outages at news agencies, train stations and airports among other organizations."

Read Full Article Here.


Time Series Anomaly Detection in Network Traffic: A Use Case for Deep Neural Networks

Introduction

As the waves of the big data revolution cascade across industries, more and more forms of sensor data become valuable inputs to predictive analytics. This sensor data has an intrinsic temporal component to it, and this temporality lets us use a family of techniques for predictive analytics called time series models [1]. In this blog post we explore the underlying nature of time series modeling in the context of enterprise IT analytics, particularly for cyber security use cases.

Time series exist in many different industries and problem spaces, but in essence a time series is simply a data set whose values are indexed by time. In the research literature, a univariate time series is a data set with a single value associated with each timestamp. Examples of univariate time series include the number of packets sent over time by a single host in a network, or the amount of voltage used by a smart meter for a single home over the year. Multivariate time series extend the concept to the case where each timestamp has a vector or array of values associated with it. Examples of multivariate time series are the (P/E, price, volume) tuple for each time tick of a single stock, or the tuple of information for each netflow in a single session (e.g. source and destination IP and port, packets and bytes sent and received, etc.).
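As a small illustration, here is what the two shapes might look like in pandas (the values are made up, and the use of pandas is our convenience here, not a requirement of the models discussed):

```python
import pandas as pd

# Univariate: one value per timestamp (e.g., packets sent by a single host)
univariate = pd.Series(
    [120, 98, 143],
    index=pd.date_range("2017-11-01", periods=3, freq="h"),
    name="packets",
)

# Multivariate: a vector of values per timestamp (e.g., per-flow statistics)
multivariate = pd.DataFrame(
    {
        "bytes_sent": [10_240, 512, 2_048],
        "bytes_recv": [96_000, 1_024, 4_096],
        "dst_port": [443, 53, 80],
    },
    index=pd.date_range("2017-11-01", periods=3, freq="h"),
)

print(univariate.head())
print(multivariate.head())
```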


Time Series Models For Network Security

Time series data is particularly prevalent in any modeling scenario that depends on input from modern IT infrastructure. Almost every component of the hardware and software used in enterprise networks has some subsystem that generates time series data. For cyber security models, univariate and multivariate time series form one of the cornerstone data structures, particularly for studying evolving patterns of behavior.

There is a multitude of use cases relevant to modeling problems in cyber security. To illustrate some of the common phenomena associated with this class of problems, we enumerate a few of the most common scenarios below.


Use Case 1: Detecting DDoS Attacks

With the growing prevalence of pay-for-play attack infrastructure, Distributed Denial of Service (DDoS) attack volume has hit all-time records, including last year’s attack on Krebs using the Mirai botnet [2,3]. Denial of service attacks come in a couple of different varieties, including “Layer-4” attacks and “Layer-7” attacks, referencing the OSI 7-layer network model. Typically the detection of application layer (Layer-7) attacks is more difficult than that of the lower layer attacks, because it involves exploiting some property of an API. In either case, though, we can use the data on overall flow, size/volume, and application layer traffic statistics generated by our routers and perimeter infrastructure over time to build time series models for Layer-4 and Layer-7 inbound traffic patterns. A standard time series model is then overlaid on this data to detect change points in the normal traffic baseline of the key choke points and DMZ assets exposed to inbound network traffic. The goal of this model is to identify spikes in traffic patterns that are extreme deviations from the observed baseline, as in the figure below.
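As one concrete (and deliberately simplified) sketch of such a baseline model, the following Python function tracks an exponentially weighted baseline and flags intervals that deviate strongly from it; the smoothing factors and threshold are illustrative assumptions, not the production model:

```python
def ewma_changepoints(counts, alpha=0.1, k=5.0):
    """Flag intervals that deviate strongly from an exponentially weighted baseline.

    counts: sequence of per-interval inbound traffic volumes.
    alpha:  smoothing factor for the baseline and deviation estimates.
    k:      how many smoothed deviations count as a change point.
    """
    baseline = counts[0]
    deviation = 0.0
    flags = []
    for i, x in enumerate(counts[1:], start=1):
        residual = abs(x - baseline)
        # Test before updating, so a spike does not inflate its own baseline.
        if deviation > 0 and residual > k * deviation:
            flags.append(i)
        baseline = alpha * x + (1 - alpha) * baseline
        deviation = alpha * residual + (1 - alpha) * deviation if deviation > 0 else residual
    return flags

inbound = [100, 110, 95, 105, 98, 102, 5000, 101, 99]  # synthetic volumes
print(ewma_changepoints(inbound))  # [6]: the burst stands out from the baseline
```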

Use Case 2: Detecting Failed Login Spikes

Another common attack pattern, usually following a large leak of usernames or PII data onto the dark web, is called Account Takeover (ATO). For instance, after the leak of a large number of usernames for a financial institution, attacks can follow that target the login infrastructure for the banking applications. Typically attackers will script an automated test of usernames/passwords against the list of stolen data; the result is a login pattern on the application in which the number of attempted logins per username changes rapidly. There is potential for major financial gain even in the case of a single successful login, so attackers are incentivized to target weak infrastructure in combination with the dark web’s economy of stolen PII. This type of attack manifests as a time series problem, particularly in the application logs of the web service being targeted. A change point in the total number of failed logins related to a particular external subnet or other grouping is one primary indicator that an ATO attack is taking place. Typical patterns we look for in this case appear as intermittent spikes of activity spread out over time (see the figure below).

[Figure: intermittent spikes of failed-login activity over time]
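As a hedged sketch of this indicator, assuming a hypothetical log schema with one row per failed login (the field names are illustrative, not a real product format), we can bin failed logins per source subnet and flag bins that exceed a robust threshold:

```python
import pandas as pd

# Hypothetical failed-login events (schema and values are illustrative)
events = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2017-11-01 09:00", "2017-11-01 09:07", "2017-11-01 09:12",
        "2017-11-01 09:20", "2017-11-01 09:20", "2017-11-01 09:21",
        "2017-11-01 09:21", "2017-11-01 09:22", "2017-11-01 09:23",
    ]),
    "src_subnet": ["203.0.113.0/24"] * 9,
})

# Count failed logins per external subnet per 5-minute bin
counts = (events
          .set_index("timestamp")
          .groupby("src_subnet")
          .resample("5min")
          .size())

# Flag bins whose count exceeds a robust per-subnet MAD-style threshold
med = counts.groupby(level="src_subnet").transform("median")
mad = (counts - med).abs().groupby(level="src_subnet").transform("median")
suspicious = counts[counts > med + 4 * mad.clip(lower=1)]
print(suspicious)  # the 09:20 bin, where attempts burst well above baseline
```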

Use Case 3: Data Exfiltration

The final common use case for time series models is data exfiltration. There are many sub-problems and behaviors to take into consideration here, depending on the particular security scenario. For instance, an enterprise may be dealing with a disgruntled insider who is actively dumping data from repos onto a physical USB disk or sending it out as attachments through Google Drive. Different paths of exfiltration require careful analysis of the protocols and methods involved. One rich area to model with multivariate time series is behavior involving DNS data.* In the example below, we see that if we build an appropriate multivariate vector from each individual endpoint’s DNS requests, we can detect multiple attack patterns with a single model. *See the JASK blog post here for more details on some of the insights into searching for key patterns related to DNS exfiltration [10].

[Figure: per-endpoint multivariate DNS features over time]
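To make this concrete, here is a hedged sketch of building such a per-endpoint feature vector with pandas; the record fields and the particular features (name length, entropy, TXT ratio) are illustrative assumptions, chosen because they are classic hints of DNS tunneling:

```python
import math
import pandas as pd

# Hypothetical per-endpoint DNS query records (fields are illustrative)
dns = pd.DataFrame({
    "src_ip": ["10.0.0.5", "10.0.0.5", "10.0.0.5", "10.0.0.9"],
    "qname": ["a.example.com", "dGhpcy1sb25nLXN0cmluZw.bad.example",
              "b25lLW1vcmUtY2h1bms.bad.example", "mail.example.com"],
    "qtype": ["A", "TXT", "TXT", "A"],
})

def entropy(s: str) -> float:
    """Shannon entropy of a string: a rough randomness score for query names."""
    probs = [s.count(c) / len(s) for c in set(s)]
    return -sum(p * math.log2(p) for p in probs)

# One multivariate feature vector per endpoint for the observation window;
# high volume, long or high-entropy names, and a large TXT ratio are all
# hints worth feeding a single multivariate model.
features = dns.groupby("src_ip").agg(
    query_count=("qname", "size"),
    unique_names=("qname", "nunique"),
    avg_name_len=("qname", lambda q: q.str.len().mean()),
    txt_ratio=("qtype", lambda t: (t == "TXT").mean()),
    avg_entropy=("qname", lambda q: q.map(entropy).mean()),
)
print(features)
```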

Time Series Prediction Using Neural Nets

Neural networks have a long and interesting history as pattern recognition engines in machine learning [4]. Over the last decade, the advent of next-generation hardware for specific learning tasks (e.g. tensor processing units), along with breakthroughs in neural net training, has led us to the era of Deep Learning [6,7]. State-of-the-art libraries like TensorFlow and PyTorch provide high-level abstractions that make some of the most important techniques from Deep Learning available to solve business problems.

One of the most important aspects of leveraging time series output in security operations is building detections tuned to the highest-priority outcomes. With most of the toolsets and solutions designed for security operations center (SOC) workflows, the operator has to specify a manual threshold in order to detect time series outliers. Neural networks provide a nice solution, from an engineering standpoint, for cyber security models with temporal data, because they provide a more dynamic learning capability that moves data-driven detections past static thresholds.

In 1997, Hochreiter and Schmidhuber wrote the original paper introducing the long short-term memory (LSTM) cell for neural net architectures [5]. Since then, LSTMs have become one of the most flexible and best-in-breed solutions for a variety of classification problems in deep learning.

Traditional statistical and mathematical approaches to time series analysis run over a specified window of time. The length of this window must be pre-determined, and the results of these approaches are heavily influenced by it. Traditional machine learning algorithms also require extensive feature engineering to train a classifier, and with any change in the input data the dynamics of the features change as well, forcing a redesign of the feature vectors to maintain performance. During the feature extraction phase, if the features are not appropriately chosen, there is a high chance of losing important information from the time series. LSTM, on the other hand, can learn long-term sequential patterns without the need for feature engineering; part of the magic here is the set of three memory gates specific to this architecture. Plain recurrent neural networks suffer from the vanishing gradient problem, which prevents the model from converging properly because error signals decay as they propagate back through time; LSTM was designed to overcome this. On account of these advantages, we turn to LSTM for modeling our time series.


TensorFlow LSTM Model Layer-By-Layer

Using TensorFlow [13] we can build a template for processing arbitrary types of time series data. For a good introductory overview of TensorFlow and LSTM, check out some of the great books and blogs that have been published recently on the topic [9,11,12].

In our prototype example we build a simple architecture description of a neural network, specifying the number of layers and some of their related properties. We define our LSTM model to contain a visible layer with 3 neurons, followed by a hidden “dense” (densely connected) layer with two-dimensional output, and finally an activation layer. Mean squared error is the regression objective that the model tries to optimize. The final output is a single prediction.
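A minimal sketch of this architecture using TensorFlow’s Keras API might look like the following; the window length and the synthetic training series are placeholder assumptions, not the data or hyperparameters from the original experiment:

```python
import numpy as np
import tensorflow as tf

WINDOW = 10  # look-back length per sample (an assumption; not stated in the post)

def make_windows(series, window=WINDOW):
    """Slice a 1-D series into LSTM input of shape (samples, timesteps, features)."""
    X = np.array([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X[..., np.newaxis], y

# Visible LSTM layer with 3 neurons, a dense layer, then an activation layer;
# compiled against a mean squared error regression objective, emitting a
# single next-step prediction.
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(3, input_shape=(WINDOW, 1)),
    tf.keras.layers.Dense(1),
    tf.keras.layers.Activation("linear"),
])
model.compile(optimizer="adam", loss="mean_squared_error")

# Train on a synthetic noisy wave; real inputs would be binned network counts.
series = np.sin(np.linspace(0, 20, 500)) + np.random.normal(0, 0.1, 500)
X, y = make_windows(series)
model.fit(X, y, epochs=5, batch_size=32, verbose=0)
next_value = model.predict(X[-1:], verbose=0)  # single-step forecast
print(next_value[0, 0])
```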

The input to the LSTM is higher-dimensional than traditional machine learning inputs: samples are arranged as (samples, timesteps, features) rather than flat feature vectors. A diagrammatic representation of our data is shown below:

[Figure: diagrammatic representation of the LSTM input data]

Algorithmic Scalability Notes

For univariate time series data, LSTM training scales linearly for a single time series (O(N), with N the number of time steps). Training time is one of the drawbacks of LSTM networks, but because per-series models are embarrassingly parallel, these problems are well suited to running on large GPU/TPU clusters.

To test whether our model overfit, we plotted training set size versus RMSE and saw that the error decreased as the training data increased (RMSE is a quick and easy metric, but proper overfit analysis requires a more detailed testing paradigm). This is the expected trend, since the model should predict better with more training data. The tests below were run on synthetic time series data using regular CPU cores.

Conclusion

Part of the appeal of neural network methods for time series problems is that they let us move past traditional threshold-based detections and automate some key use cases. There is a lot of depth to this topic and the related engineering design. We have found Python and TensorFlow to be great tools for prototyping ideas and building operationalized solutions with low initial complexity. In the realm of cyber security, we can move many of the generic queries that end up being driven by fixed thresholds to a more dynamic learning paradigm driven by deep learning models. The benefit we see in choosing LSTM for these cases is that we get better data-driven detections while moving away from simple rule-based time series alerts.


References

  1. Jan G. De Gooijer, Rob J. Hyndman, 25 years of time series forecasting, In International Journal of Forecasting, Volume 22, Issue 3, 2006, Pages 443-473, ISSN 0169-2070, https://doi.org/10.1016/j.ijforecast.2006.01.001
  2. https://www.theguardian.com/technology/2016/oct/26/ddos-attack-dyn-mirai-botnet
  3. https://www.abusix.com/blog/5-biggest-ddos-attacks-of-the-past-decade
  4. Christopher M. Bishop. 1995. Neural Networks for Pattern Recognition. Oxford University Press, Inc., New York, NY, USA.
  5. Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long Short-Term Memory. Neural Comput. 9, 8 (November 1997), 1735-1780. DOI=http://dx.doi.org/10.1162/neco.1997.9.8.1735
  6. Geoffrey E. Hinton, Simon Osindero, and Yee-Whye Teh. 2006. A fast learning algorithm for deep belief nets. Neural Comput. 18, 7 (July 2006), 1527-1554. DOI=http://dx.doi.org/10.1162/neco.2006.18.7.1527
  7. http://www.andreykurenkov.com/writing/a-brief-history-of-neural-nets-and-deep-learning-part-4/
  8. Greff K, Srivastava R, Koutnik J, Steunebrink B, Schmidhuber J, LSTM: A Search Space Odyssey, IEEE Transactions on Neural Networks and Learning Systems (2016) Published by Institute of Electrical and Electronics Engineers Inc.
  9. Hands-On Machine Learning with Scikit-Learn and TensorFlow Concepts, Tools, and Techniques to Build Intelligent Systems By Aurélien Géron
  10. https://jask.ai/cyber-security/threat-hunting-part-3-going-hunting-with-machine-learning/
  11. http://papers.nips.cc/paper/822-bounds-on-the-complexity-of-recurrent-neural-network-implementations-of-finite-state-machines.pdf
  12. https://machinelearningmastery.com/time-series-prediction-lstm-recurrent-neural-networks-python-keras/
  13. https://www.tensorflow.org/



Executive Summary and SOE