On the Hunt - Threat Hunting with Base64 Decoder

Every now and again you hit a day where you just feel like scrolling: one of those lazy, rainy days just before the holidays. Today is one of those days, and that's where my less efficient threat hunting ideas come from. Today I'm playing with extracting Base64 strings from HTTP URIs, HTTP cookies, and just about anywhere else I can find Base64 strings in a network feed. Let’s get to it!

The first thing we need is a Base64 extraction function. I need some coffee this morning and one massive brain push for this trick. The goal is to search for any strings that look like they could be Base64. Accept this regex as elementary and not the "best of the best" for Base64 string detection; it's our quick start to prove a hypothesis that something is hiding in our network via Base64.


Breaking our Regex down


The first piece is a lookbehind for an equal sign, which marks the start of a Base64 string. It's a lookbehind because the decodeBase64 function needs a string whose length is a multiple of 4 characters, so the leading = sign must not be extracted along with the rest of the string.

The second piece matches any sequence of letters and numbers occurring any number of times. This is likely where the most improvement can be made in my regex. Maybe on a sunnier day.

The third piece matches two equal signs to show the end of a Base64 string.

Now that we have a regex, let’s test it and find lots of great matches for Base64 encoded strings.
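The original regex screenshot did not survive, so here is a minimal sketch of what the three pieces described above could look like when assembled. It uses Python's standard `re` module rather than the notebook's own language, and the names `BASE64_PATTERN` and `looks_like_base64`, as well as the sample URIs, are made up for illustration.

```python
import re

# Assumed reconstruction of the three pieces described in the text.
LOOKBEHIND = r"(?<==)"   # an '=' must come right before the match, but is not captured
BODY = r"[A-Za-z0-9]+"   # letters and numbers, one or more times
PADDING = r"=="          # two '=' signs mark the end of the Base64 string

BASE64_PATTERN = re.compile(LOOKBEHIND + BODY + PADDING)


def looks_like_base64(uri: str) -> bool:
    """Return True if the URI contains something shaped like a Base64 string."""
    return BASE64_PATTERN.search(uri) is not None


# Hypothetical sample URIs, not taken from real traffic.
sample_uris = [
    "/track?id=dXNlckBleGFtcGxlLmNvbQ==&v=2",
    "/index.html?page=2",
]

for uri in sample_uris:
    print(looks_like_base64(uri), uri)
```

Like the original, this only catches strings that end in two equal signs and contain no `+`, `/`, `-` or `_` characters, so it will miss plenty of valid Base64. That is fine for a quick hypothesis test.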

Part One of our task is completed. We've built a Base64 string detector. Apply this function to a network data stream, and we are now matching and displaying HTTP URIs with Base64 inside of them.

Part Two is extracting these Base64 strings. It's one thing to simply find them; the real trick for me was extracting only the Base64 string within the URI. The flexibility of JASK is perfect for this task, and we can utilize Spark to write an extraction function. Let's get to it! We define the variable pattern as our previous regex and build a function to extract the Base64 string that matches this pattern.
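As a rough sketch of that extraction step, assuming the regex is a lookbehind for `=`, a run of letters and numbers, and a closing `==`: the function below is plain Python, and `get_base64` simply mirrors the getBase64 name used in this post. It is not the notebook's actual code.

```python
import re
from typing import Optional

# Assumed pattern: '=' lookbehind, alphanumeric body, '==' terminator.
pattern = re.compile(r"(?<==)[A-Za-z0-9]+==")


def get_base64(uri: Optional[str]) -> Optional[str]:
    """Return only the Base64-looking substring of a URI, or None if there is none."""
    if uri is None:
        return None  # network records can have empty URI fields
    match = pattern.search(uri)
    return match.group(0) if match else None


# Hypothetical URI for illustration.
print(get_base64("/track?id=dXNlckBleGFtcGxlLmNvbQ==&v=2"))
```

Returning None for rows without a match keeps the function safe to apply to every record in a feed. In a PySpark notebook, a function like this could be wrapped with `pyspark.sql.functions.udf` to get the "use anywhere" behavior described here.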

Now we are cooking with bacon! We have our getBase64 function for extracting Base64 strings registered as a UDF to use anywhere in our notebooks. Now we need a Base64 decoding function. I'm lazy today and it's raining, so let me see if there's already a function for this. Got it! https://github.com/scalaj/scalaj-http. I'm going to import scalaj.http.Base64 and call it my lucky day. Remember, we are being lazy hunters today. Time to register this as a UDF.

Job done! Now I can call my getBase64 extraction function first and feed the results to our decodeBase64 function and it will return the Base64 decoded string. That's it! Now let's do this at MASSIVE SCALE!
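Here is a small sketch of that extract-then-decode chain. It swaps in Python's standard `base64` module for scalaj.http.Base64, and the pattern, function names and sample URIs are illustrative assumptions rather than the notebook's real contents.

```python
import base64
import binascii
import re
from typing import Optional

# Assumed pattern: '=' lookbehind, alphanumeric body, '==' terminator.
pattern = re.compile(r"(?<==)[A-Za-z0-9]+==")


def get_base64(uri: Optional[str]) -> Optional[str]:
    """Extract the first Base64-looking substring, or None."""
    if uri is None:
        return None
    match = pattern.search(uri)
    return match.group(0) if match else None


def decode_base64(encoded: Optional[str]) -> Optional[str]:
    """Decode a candidate string, returning None when it is not valid Base64."""
    if encoded is None or len(encoded) % 4 != 0:
        return None  # padded Base64 always has a length that is a multiple of 4
    try:
        raw = base64.b64decode(encoded, validate=True)
    except (binascii.Error, ValueError):
        return None
    # Decoded bytes are not always text, so replace anything undecodable.
    return raw.decode("utf-8", errors="replace")


# Hypothetical URIs for illustration.
uris = [
    "/track?id=dXNlckBleGFtcGxlLmNvbQ==&v=2",
    "/index.html?page=2",
]

for uri in uris:
    print(uri, "->", decode_base64(get_base64(uri)))
```

The regex produces false positives, so the decoder returns None instead of raising an error. That matters once this runs over millions of records.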

The results are fun. We've found process tracking, device fingerprinting, and plenty of ads pulling the email addresses of logged-in users: an interesting (disgusting) way of tracking user IDs. I also applied our function to the HTTP cookie data and found a different set of fun findings, more interesting than you would expect, but I'm going to keep that between JASK and our affected customer.

Here’s a quick screenshot of raw results from the last day:

Rainy day threat hunting: testing a hypothesis and having fun. Lots of scrolling through results, which is exactly what I felt like doing on this lazy rainy day in San Francisco. I've committed this Base64 decoding notebook to our JASK clusters for customers to take advantage of, so please come join us! I also converted some of what we've found to signals for our AI to learn from. The holidays are almost over and I’m ready to go back to work!


2016: A Big Data Year in Review

With another year almost in the books, and with 2017 looming just over the horizon, now is a good time to take stock of what happened in the big data analytics space over the previous twelve months, to assess where we’ve come from and what direction we may go.

It’s been an eventful year for big data, to say the least. Nobody knows what 2017 will bring, although that won’t stop us from sharing predictions from some of the brightest minds in big data (stay tuned). Here are some of the top news-making items, events, and trends that helped to shape 2016 and turn it into the big data year that was.

Source: Datanami

Read full article here

The Dangerous Rise of Ransomware

Ransomware is a relatively new type of cybersecurity threat. It amounts to an attacker taking and encrypting your valuable data, and then charging you to decrypt it. The idea came about 20 years ago, as a theoretical concept called “cryptovirology”. Although the idea is not new, it has only become a real threat in recent years. The economics of ransomware differ from those of the threats that came before it, and those new economics give hope to cyber-defenders looking to combat it successfully.

First, there is money in trafficking ransomware. The criminal usually demands to be paid in bitcoin to decrypt the data. Bitcoin fits this need perfectly; it is hard to trace and easy to launder. The amounts demanded started in the low hundreds of US dollars but are steadily climbing; some estimate that $1 billion USD will be paid in 2016. Compare that to spam botnets, where criminals make pennies per bot and the actual income from spam email click-throughs has plunged to almost nothing. If there’s money to be made, criminals will focus on the most effective method with the highest payout. Today that happens to be ransomware.

Second, the business of ransomware is scalable. When a new tool becomes available in the hacker market, criminal organizations mount campaigns, just like sales and marketing departments all around the world advertising their products. Much like a successful commercial, each of these campaigns continues as long as it makes money. If a threat is widespread and therefore scalable, then the defense against it becomes scalable, too. There are enough artifacts to study the campaigns effectively and to build defenses that are based on the behavior of a campaign, not the specific signatures used. This behavioral defense is more sustainable and can limit the life of ransomware campaigns.

Third, ransomware, surprisingly, relies on open source. Ransomware has started to appear in GitHub repositories, where it is modified by other hackers to create new variants. While this may sound scary, compare this to another threat from past years: zero-day exploits that were secretly developed and may be possessed by only a few actors around the world. If hackers have access to open source, then security product developers have access to it as well. For those who are active members of the open source community, this puts the cyber-defender on a more even footing.

Ransomware represents a new combination of economic factors in a cybersecurity threat. The revenue stream is more direct: from the consumer to the criminal, with no middlemen. It operates on a larger scale, and it does not rely as much on inputs in limited supply. This attracts a lot of attention and innovation from the malware community, but it also gives security products a chance for strong innovation.

As Chief Data Scientist at JASK, I study the network behavior and tools of ransomware to better defend companies against a dominant threat in cybersecurity today.