Hadoop: The New Core of the SOC

Security teams are increasingly frustrated with legacy solutions that were not designed to address the data volumes they face today. Threat hunting and incident investigations are hindered by searches that take too long to run or simply time out. If the searches do finally run, we quickly discover that critical data is missing because it was deliberately excluded, due to either the cost of indexing it or the cost of long-term storage, ultimately yielding an incomplete picture.

To begin solving the problem, Hadoop becomes the first logical choice. It’s free, it’s scalable, and it’s fast. Once Hadoop is in place, an interesting shift starts to occur inside the SOC – suddenly there is new demand to get more data in and to extend access to more users, and of course the data has to be kept safe and secure. These are all good problems to have. You just need the right approach to ensure that Hadoop never becomes just another isolated data silo with limited data.

At JASK, we believe Hadoop is the new core of the SOC for the same fundamental reasons. We leverage Cloudera’s Hadoop distribution to enable security teams to capture, store, process, and drive value from security data – at massive scale – while providing everything your organization needs to keep data safe and secure and still meet compliance requirements.


Why are we using logs to do the network’s job?!


Why cook eggs directly on a glass stovetop instead of using the non-stick pans in the cupboard? Sure, it’ll cook the eggs, but it is not the proper tool for the job. So why is the SOC using endpoint logs to gain the visibility the network provides? Clearly someone forgot about what’s in the kitchen. Why has the SOC spent the last decade forcing the SIEM to do the job of network tools? To get technical, why are my Linux gurus using auditd to monitor sockets? (Talk about not using the right tools!)

The formal Kill Chain model as described by Lockheed Martin consists of seven stages. Different vendors butcher each of the stages to their benefit, but let’s start with the Kill Chain as Lockheed described it (and ultimately holds the copyright for): Reconnaissance, Weaponization, Delivery, Exploitation, Installation, Command & Control, and finally Actions on Objectives. Analyzing the seven stages, we find that only TWO (2!) of them, Exploitation and Installation, are best watched from the endpoint. Weaponization happens out of everyone’s sight, and the remaining four cross the network.

Don’t believe me? Well here is your proof:

Stage 1 – Active Reconnaissance. This is when a remote attacker MUST cross the internet. It’s as simple as that: this isn’t a log event, this is network communication. If this were monitored the way many organizations attempt to leverage their SIEM and log environment using endpoints, that knowledge would be replicated by every host that was involved in the reconnaissance event. Why deploy agents and monitor logs on an entire /21 to capture a port scan when a network-based sensor could monitor at one point in the network and see the entire Reconnaissance phase?

Stage 2 – Weaponization. This is the weakness of every solution, endpoint-based or network-based. It’s also the stage that most vendors cut out of their solution messaging, because it’s what happens in the attacker’s basement. It’s the stage where the cyber-criminal builds an exploit based on evidence found in the Recon phase before sending it to your devices. Moving quickly on.

Stage 3 – Delivery. Guess what? In order to deliver a package to grandma, the UPS truck has to drive on the highway. Similarly, the cyber-criminal’s exploit must travel the information superhighway, also known as the network. So why in the world is the SOC monitoring endpoint logs to gain second-hand information that is the network’s first-hand knowledge? I’m befuddled by the complexity the SIEM vendors have bestowed upon our poor SOCs, aren’t you?

Stage 4 – Exploitation. Show me some endpoint love! Finally, we find a proper location for endpoint monitoring. When the magic package lands on the endpoint and is executed, there is no better place to monitor the outcome than the endpoint itself. Thank the syslog-ng lord for endpoint forensics and logs. Are you ready for another perfect task for endpoint monitoring?

Stage 5 – Installation. When it’s time to install malware, it’s time to touch the endpoint again. That’s two stages out of five so far that logs are actually the correct tool for the job.

Stage 6 – Command and Control. Getting back to the network: when an attacker in Guangdong, China wants to control his botnet in San Francisco, California, there’s a solid guarantee it’s going to be over the internet. That is, unless the attacker plans on taking a cargo ship to the Port of Oakland, has a BART pass to get to my office, walks up the stairs to my computer, and left-clicks my mouse. Monitoring for Command and Control with endpoint logs? Are you kidding? Are you really going to get on that cargo ship? It’s like eating spaghetti with a spoon. Sure, it gets noodles into your mouth, but most slip back into the data lake of logs.

Stage 7 – Actions on Objectives. Finally, let the network come back to light! Guess what? Unless your cyber-criminal once again plans on taking PTO to board that cargo ship and visit you to steal your documents, data exfiltration will almost certainly cross the network. For what bloody reason are we monitoring sockets with auditd for this? My brain hurts watching 90% of SOCs around the world leverage logs in the SIEM to detect everything and then wonder why everything is failing!

We get the point: more logs aren’t going to cover the gaps that network sensors were built from birth to cover. Now please, stop using logs to do the network’s job. [End Soapbox.]
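For anyone who prefers a lookup table to a soapbox, the stage-by-stage walkthrough can be summarized in a few lines of Python. This is only a sketch of the argument made above: the vantage labels ("network", "endpoint", "none") are illustrative shorthand and are not part of Lockheed Martin's model.

```python
# Kill Chain stages as walked through above, tagged with the vantage point
# the text argues for. The labels are illustrative shorthand (an assumption),
# not an official taxonomy.
KILL_CHAIN = [
    (1, "Reconnaissance", "network"),
    (2, "Weaponization", "none"),  # happens on the attacker's side
    (3, "Delivery", "network"),
    (4, "Exploitation", "endpoint"),
    (5, "Installation", "endpoint"),
    (6, "Command and Control", "network"),
    (7, "Actions on Objectives", "network"),
]


def stages_for(vantage):
    """Return the stage names best observed from the given vantage point."""
    return [name for _, name, best in KILL_CHAIN if best == vantage]


print(stages_for("endpoint"))
print(stages_for("network"))
```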

The Modern SOC Runs on Slack



I was first introduced to the concept of using modern web collaboration apps like Slack or HipChat for security operations by another great security startup: Area1 Security. They were piping security events into Slack automatically and using Slack bots to help with analysis. At JASK we have also embraced this model. Our product now fully supports Slack, and we actively encourage its usage as a primary communications channel for security teams working remotely or even within the same SOC ops room. The ability to easily add custom integrations, collaborate, and automate enrichment with other tools via simple plugins and Slack bots truly makes this an ideal platform for collaboration around cyber security incidents! The team at JASK thinks that Slack is the future of security operations and is a great candidate to be the new “single pane of glass”. Are you currently using a collaboration tool like Slack to power your SOC? We want to hear about the tips and tricks you have developed for your team. Get in touch with us on Twitter @jasklabs or with the author directly @gregcmartin

Greg is the Co-Founder and CEO of JASK and has a long and storied history fighting bad guys on the internet.  JASK (based in Silicon Valley) is the leader in AI for Cyber Security.


Tribal Knowledge: Did your security expert leave with all your knowledge?


Threat hunting isn’t only about finding compromised assets; it also performs the predictive function of finding the holes a malicious attacker might take advantage of. As I mentioned last week, your customers are your best hunters, accessing your website in a million different ways, with a thousand different web browsers and hundreds of different types of devices. That doesn’t even include the automated mass vulnerability scanners, such as Shodan, or research projects like masscan that are scrubbing your applications as well. Today I’ll share some of my queries, and I hope you will share some of your most recent hunting exercises and queries with me.

At JASK we utilize Hadoop and Zeppelin notebooks. This allows us to write functions in Spark and query our data using Spark SQL syntax. It also allows us to export notebooks as JSON to share with the security community, and to work with our customers and the threat hunting community to build even more powerful notebooks and applied research. Now on to the data.

Searching for DNS non-authoritative answers for customer domains:

The results showed a large number of hosts querying the internal DNS server for customer.com.customer.com (for example, jask.com.jask.com). The internal DNS server did not have a record for this, so the query would then be forwarded to an external DNS server. This looked strange, and we realized this misconfiguration would point all users to the customer’s CMS licensing manager page, since this particular domain was not registered under their license. I would categorize this as information disclosure: it disclosed the CMS server version and dropped everyone (both internal and external users) on the admin login page of the CMS. From this information disclosure, it turned out they were running a vulnerable CMS version as well. Had they been exploited yet? We had been in this POC for a few weeks and could query our data to determine whether anyone had accessed the CMS admin page while we had been in place. We were also able to close the loop and write a rule to produce a signal for logins to the admin page. Oftentimes the business will decide this is not a risk, and we simply keep it in our hunting notebook.

The Zeppelin paragraph:

SELECT src_ip.address
FROM dns
WHERE authoritative != true AND query LIKE "%.jask.com"
GROUP BY src_ip.address
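Once the non-authoritative answers are in hand, the doubled-domain pattern described above (jask.com.jask.com) is easy to pick out programmatically. The following is a minimal Python sketch, not the notebook code we ran; the function name `has_doubled_suffix` and the sample list are made up for illustration.

```python
def has_doubled_suffix(query, domain):
    """True when `query` ends with `domain` repeated, e.g. jask.com.jask.com."""
    q = query.lower().rstrip(".")  # ignore case and a trailing root dot
    d = domain.lower().strip(".")
    doubled = d + "." + d
    return q == doubled or q.endswith("." + doubled)


# Hypothetical sample of query names pulled from DNS data.
queries = ["jask.com.jask.com", "www.jask.com", "mail.jask.com.jask.com."]
print([q for q in queries if has_doubled_suffix(q, "jask.com")])
```

Matching on the exact doubled suffix, rather than on any name ending in the domain, keeps ordinary lookups such as www.jask.com out of the results.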

Building on the CMS information disclosure story we mentioned earlier, here’s the query we used to perform a historical check and determine whether anyone had accessed the vulnerable CMS.

SELECT src_ip.address
FROM http
WHERE request.uri LIKE "%CMSSiteManager%"
  AND src_ip.address NOT LIKE "192.168.%"

Non-Standard software - User-Agents:

Most of the customers I’ve worked with function like the wild west, with BYOD and no managed software or hard-and-fast policies. Every now and again you get an easy one where the customer maintains an approved software list and possibly even an approved web browser. This makes for easy anomaly hunting, or “Never have I seen X” type hunting. If we see anything that does not match the customer’s “approved” user-agent, we have a finding worth chasing. Below is a sample of a basic Zeppelin paragraph. Usually you’ll add more to the query, such as an internal subnet to hunt or a regex of acceptable user-agents, but I will leave the rest to your own imagination and your specific hunting exercise. Here we are looking for every request whose User-Agent is not the approved IE 11 string. It is fairly simple for this post, but it should get your mind thinking.

SELECT src_ip.address, dst_ip.address
FROM http
WHERE request.headers['USER-AGENT'] != "Mozilla/5.0 (compatible; IE 11.0; Win32; Trident/7.0)"
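The "regex of acceptable user-agents" idea can be sketched outside the notebook as well. The Python below is an illustration under assumptions: the `APPROVED_UA` list holds a single hypothetical pattern for the IE 11 string used in this post, and records are assumed to arrive as (source address, User-Agent) pairs.

```python
import re

# Hypothetical allowlist: one pattern per approved browser build.
APPROVED_UA = [
    re.compile(r"^Mozilla/5\.0 \(compatible; IE 11\.0; Win32; Trident/7\.0\)$"),
]


def is_approved(user_agent):
    """True if the User-Agent matches any approved pattern."""
    ua = user_agent.strip()
    return any(p.match(ua) for p in APPROVED_UA)


def unapproved(records):
    """Yield (src_ip, user_agent) pairs that fall outside the allowlist."""
    for src_ip, ua in records:
        if not is_approved(ua):
            yield src_ip, ua


sample = [
    ("10.0.0.5", "Mozilla/5.0 (compatible; IE 11.0; Win32; Trident/7.0)"),
    ("10.0.0.9", "curl/7.50.1"),
]
print(list(unapproved(sample)))
```

Keeping the patterns in a list makes it cheap to add the next approved build without touching the hunting logic.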

Maybe you just want to see what your top 10 most popular User-Agents are?

SELECT request.headers['USER-AGENT'], COUNT(*) AS hits
FROM http
GROUP BY request.headers['USER-AGENT']
ORDER BY hits DESC LIMIT 10

Maybe you just want the distinct User-Agents in your network? This query has found me anti-virus agents fetching update lists and validating the license key through a base64 encoded User-Agent string. Lame…
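Once you have the list of distinct User-Agents, spotting the base64-encoded ones does not have to be done by eye. The following is a rough Python sketch of one way to do it, assuming the whole User-Agent string is the encoded value; the function name, the minimum length, and the printable-ASCII test are all assumptions chosen for illustration, and the heuristic will miss partially encoded strings.

```python
import base64
import binascii
import re

B64_RE = re.compile(r"^[A-Za-z0-9+/]+={0,2}$")


def decode_if_base64(user_agent, min_len=16):
    """Return the decoded text if the whole User-Agent is base64 that decodes
    to printable ASCII, otherwise None."""
    ua = user_agent.strip()
    if len(ua) < min_len or len(ua) % 4 != 0 or not B64_RE.match(ua):
        return None
    try:
        raw = base64.b64decode(ua, validate=True)
    except (binascii.Error, ValueError):
        return None
    try:
        text = raw.decode("ascii")
    except UnicodeDecodeError:
        return None
    return text if text.isprintable() else None


# Made-up example value standing in for a license check.
encoded = base64.b64encode(b"license=ABC123;product=av").decode()
print(decode_if_base64(encoded))
print(decode_if_base64("Mozilla/5.0 (compatible; IE 11.0; Win32; Trident/7.0)"))
```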


None of the above queries are all that efficient, and the more tightly controlled the network is, the more clarity these queries can provide. Nesting queries can help clean the results and mean the difference between having a threat hunter analyze 100 results or thousands.

Wasting your time searching for ad-trackers?

I’m not aware of what can be done here short of our government stepping in to protect our privacy, and this hasn’t borne me much fruit in a hunt. It has, however, found me people accessing inappropriate content in the workplace, even though the organization had invested in a web proxy and endpoint software to prevent adult content in the workplace. We could use this to validate the effectiveness of those automated content-blocking tools and web proxies. Ad-trackers give up a lot of information about the quality of the website you are accessing, and you just might find this query bearing fruit by surfacing users visiting websites in “poor” taste for the workplace. I find that the more deceptive the ad-tracker, the dirtier the website usually is. Here’s one of the most common ad-trackers I’ve seen recently.

SELECT *
FROM http
WHERE request.headers['GET'] LIKE "%beacon.krxd.net%"

Searching for plain text passwords floating around.

This one can be a bit noisy, so make sure to tighten it up after you scrub your first round of results with a few “not like” statements. We’ve found poor business applications with hardcoded passwords crossing the network boundary and floating around internally.

select src_ip.address
from http
where request.uri like "%password%"
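One way to tighten up the noisy results is to check whether "password" actually appears as a query-string parameter rather than anywhere in the URI. The Python below is a small sketch of that idea using the standard library; the `SUSPECT_KEYS` set is an illustrative guess at common parameter names, not a list taken from our notebooks.

```python
from urllib.parse import parse_qsl, urlsplit

# Illustrative parameter names; adjust to what your applications use.
SUSPECT_KEYS = {"password", "passwd", "pwd", "pass"}


def password_params(uri):
    """Return query-string keys in `uri` that look like password fields."""
    query = urlsplit(uri).query
    return [
        key
        for key, _value in parse_qsl(query, keep_blank_values=True)
        if key.lower() in SUSPECT_KEYS
    ]


print(password_params("/login?user=bob&password=hunter2"))
print(password_params("/docs/password-policy.html"))
```

The second URI would match a plain `like "%password%"` search but carries no credential, which is exactly the kind of hit worth filtering out.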

Searching for plain-text Protocols:

We all promise plaintext protocols are not allowed on the network, but we always find them. How about we take a look at the types of FTP activity happening and the exact commands that were run? Here is one more argument against relying on logs for hunting: if you don’t control the FTP server, do you think the FTP server is going to send you the logs? This is the type of hunting that MUST be done with network data. Log data is a ho-hum source for hunting. Maybe you have it, maybe you don’t. You just don’t know whether you are getting the true results with logs, because you never know which servers are logging. Sometimes the servers running are not yours, but a service a user throws up to get their job done quickly. That was the case in one of our most recent hunting exercises, which found a quickly stood-up FTP server on the internal network.

SELECT src_ip.address,

Maybe you are searching for anyone using those pesky Dell eDellRoot or Lenovo Superfish root certificates? This is just a dabble into the power of hunting based on TLS certificates, the cipher being used, and more. I’ve yet to find anything in a customer network related to weak ciphers or export encryption, and that’s a good sign. TLS parameters are easy to hunt for, and you should do it. It’s not always about what your certificates look like, but about the certificates of the sites your users are interacting with. This might be the case with encrypted malware and TLS-encrypted botnets using self-signed or misconfigured certificates. Hackers make mistakes, and it’s your job to catch their mistakes. They are doing a good job at catching ours.

select *
from tls
where subject like "%edell%"
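The self-signed certificates mentioned above can be approximated with a simple comparison of subject and issuer. This is a heuristic sketch in Python, and it rests on an assumption: that the TLS records carry an `issuer` string alongside `subject` (only `subject` appears in the query in this post). Legitimate root CAs are self-issued too, and distinguished names with escaped commas would need a real parser.

```python
def normalize_dn(dn):
    """Lower-case a distinguished name and sort its comma-separated parts."""
    return sorted(part.strip().lower() for part in dn.split(",") if part.strip())


def looks_self_signed(subject, issuer):
    """Heuristic: flag certificates whose issuer equals their subject."""
    return normalize_dn(subject) == normalize_dn(issuer)


print(looks_self_signed("CN=eDellRoot", "cn=edellroot"))
print(looks_self_signed("CN=www.example.com, O=Example", "CN=Some CA, O=Example"))
```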

The story goes on forever. Are you focused on the perimeter and want to see any connections that were established from external to internal? We remove RFC 1918 space from the source addresses in this query. As we graduate our knowledge of Spark we begin to define variables and utilize functions, but for this article no variables are used and we simply code the customer’s RFC 1918 private addresses into the query.

SELECT src_ip.address, dst_ip.address, dst_port, conn_state
FROM flows
WHERE conn_state = "S1"
  AND dst_ip.address LIKE "172.%"
  AND src_ip.address NOT LIKE "172.%"
  AND src_ip.address NOT LIKE "192.168.%"
  AND month = month(current_timestamp())
GROUP BY src_ip.address, dst_ip.address, dst_port, conn_state
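String patterns such as "172.%" are convenient but loose: they also match public addresses like 172.32.0.1 that sit outside the 172.16.0.0/12 private block. When you move from hardcoded patterns to functions, a check along the following lines is more exact. This is a Python sketch using the standard `ipaddress` module; the function names and the (source, destination) pair layout are assumptions for illustration.

```python
import ipaddress

# The three private blocks defined by RFC 1918.
RFC1918 = [
    ipaddress.ip_network(net)
    for net in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_rfc1918(addr):
    """True if `addr` falls inside one of the RFC 1918 private blocks."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in RFC1918)


def inbound(flows):
    """Keep (src, dst) pairs where an external source reached an internal host."""
    return [(s, d) for s, d in flows if not is_rfc1918(s) and is_rfc1918(d)]


flows = [
    ("8.8.8.8", "172.16.0.10"),
    ("192.168.1.5", "172.16.0.10"),
    ("172.32.0.1", "172.16.0.10"),  # public address that "172.%" would hide
]
print(inbound(flows))
```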

Still loving DNS and want to see your top 10 DNS queries? Your domain will likely be the top hit, so go ahead and set it as a “not like” and keep adding “not like” statements until the results fit your environment. Remember, this is a write-once, run-many-times hunt. Investing your time to write good queries the first time will result in a quicker and more efficient hunting exercise in the future.

SELECT query, COUNT(query)
FROM dns
WHERE query != '' AND query NOT LIKE '%jask.com'
GROUP BY query ORDER BY COUNT(query) DESC LIMIT 10
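The same top-N idea, with your own domains pared out, can be sketched in a few lines of Python if you are working from an exported list of query names. The function name and the `own_suffixes` parameter are illustrative assumptions.

```python
from collections import Counter


def top_queries(queries, own_suffixes=("jask.com",), n=10):
    """Return the n most common query names, skipping empty names and
    anything ending in one of our own domain suffixes."""
    suffixes = tuple(own_suffixes)
    counts = Counter(
        q.lower() for q in queries if q and not q.lower().endswith(suffixes)
    )
    return counts.most_common(n)


sample = ["a.com", "a.com", "b.com", "www.jask.com", ""]
print(top_queries(sample, n=2))
```

Each new suffix you add to `own_suffixes` plays the role of one more "not like" statement.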

Have any ugly buggers trying to perform DNS exfiltration? Try searching for DNS queries of long length. This is a pretty weak one, and almost every hit ends up being one of Spotify’s long DNS queries for playlists.

SELECT query
FROM dns
WHERE LENGTH(query) >= 100 AND query NOT LIKE "%.er.spotify.com"
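Length alone is a weak signal, as noted above. One common way to strengthen it, which is an addition here and not something taken from our notebooks, is to also require that the name looks random by measuring its character entropy. The Python sketch below combines the length test and the Spotify exclusion with a Shannon entropy check; the 3.5-bit cutoff is an arbitrary starting point, not a tuned value.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Bits per character of `text` (0.0 for an empty string)."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def is_suspicious(query, min_len=100, min_entropy=3.5, allow=(".er.spotify.com",)):
    """Flag long, random-looking DNS names that are not on the allowlist."""
    q = query.lower().rstrip(".")
    if q.endswith(tuple(allow)) or len(q) < min_len:
        return False
    return shannon_entropy(q) >= min_entropy


# Made-up names: hex-encoded payload vs. a long but repetitive label.
print(is_suspicious("0123456789abcdef" * 7 + ".evil.example"))
print(is_suspicious("a" * 120 + ".example.com"))
```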

Weak Kerberos Ciphers?

RC4-HMAC and DES are seen on Windows XP and on servers up to Windows 2003. Most environments should be moving away from them, because both are weak ciphers. This query is great for validating that strong ciphers are used throughout an environment and for calculating the risk associated with where these weak ciphers occur in your network.

SELECT *
FROM kerberos
WHERE cipher LIKE "%rc4%"
   OR cipher LIKE "%des%"
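To turn the raw hits into something you can rank by risk, it helps to count weak-cipher tickets per source. The following Python sketch does that over (source address, cipher name) pairs; the record layout and function names are assumptions, and the substring test mirrors the rc4 and des patterns used in this post, so it will also flag triple-DES names such as des3-cbc-sha1.

```python
from collections import Counter

# Same substrings as the rc4/des patterns used in this post.
WEAK_MARKERS = ("rc4", "des")


def is_weak_cipher(cipher):
    """True if the cipher name contains one of the weak markers."""
    name = cipher.lower()
    return any(marker in name for marker in WEAK_MARKERS)


def weak_cipher_counts(records):
    """Count weak-cipher records per source; records are (src_ip, cipher) pairs."""
    return Counter(src for src, cipher in records if is_weak_cipher(cipher))


records = [
    ("10.0.0.1", "rc4-hmac"),
    ("10.0.0.1", "des-cbc-md5"),
    ("10.0.0.2", "aes256-cts-hmac-sha1-96"),
]
print(weak_cipher_counts(records).most_common())
```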

Finally, let us not forget the world of executables. Think of the hundreds of thousands of dollars spent on full packet capture devices for the sole business purpose of extracting executables. Save yourself:

select src_ip.address, dst_ip.address, hash.sha256, mime_type
from file
group by src_ip.address, dst_ip.address, hash.sha256, mime_type

That’s a small sample of the hundreds of queries, paragraphs, and notebooks we’ve built at JASK so our customers can jump right into hunting in Big Data. We prefer to organize these queries into focused notebooks, such as DNS Security, HTTP, and TLS notebooks, and run them at the notebook level rather than the paragraph level, which adds tremendous value and efficiency to a threat analytics program.

What to do with the results and wrapping up the Hunting Exercise.

Results are nothing if you can’t wrap them into the business process. When the hunting exercise is complete, take your query and turn it into signal intelligence to drive Artificial Intelligence. In JASK we have a rule engine for exactly this purpose. Teach JASK a new skill and the AI becomes smarter. No security detection technology will catch everything, but when humans, customers, data science, and the security community are able to continually improve detection through hunting exercises and close the loop, we are one step closer to defending the business and turning hunting exercises into a repeatable process.

Happy Hunting!



The Rise of the Security Data Scientist

In the future of cybersecurity, there is a new role that will be critical to the security of an organization: the Security Data Scientist. The security data scientist will bring new skills to the job, but that doesn’t mean they will be brought in from outside the InfoSec domain.

The new age of security has arrived. Populating the security operations center (SOC) with skilled firefighters is no longer enough. An organization now needs smoke detection: experts sitting in the same room, looking for the beginnings of a security incident. In an ideal world, these people could even spot the areas with newspapers and gasoline stored together, heading off the danger. This can be accomplished, but only with a new mindset and new tools.

I have spent the past several years trying to produce such a change: creating the tools for the job, training and developing the people for the job, evangelizing the need, even doing the job itself. I have learned many things from pursuing the data scientist role. I came to the same understanding everyone agrees on: big data platforms and machine learning have a large role to play in the new age. What I have seen in the wild surprised me, and it just might surprise you too:

It is easier to teach data science to a security person than it is to teach security to a data scientist.

While this has been my experience so far, the reason behind it has taken a while to grasp. It comes down to the expectations of the job and the type of people hired based on these criteria. For the most part, today’s data scientist is expected to have a broad knowledge of what tools are available, without too much depth into the computational details of each. This makes sense, as they are expected to spend a short time on each problem and then move on to the next. Security personnel are tasked with comprehending a complex dynamical network that includes machines and their human counterparts; in the science world, they have more in common with biologists who spend years studying one species. The passion and the interest in security data may lie more with the security person trying to do better at their job, but the data scientist can certainly contribute to the new SOC.

In general, data scientists most need knowledge about which tool to use next and fewer details of the domain, while security personnel need only the most relevant tools, and they need to be able to use them well. The best security products will find a way to accommodate both needs.

At JASK, as an experienced scientist and tool maker, my goal is to create products that are wholly relevant to protecting your network and elevating the security skill of your SOC, while providing a place for data scientists to contribute their expertise and models.