Every now and again you hit a day where you just feel like scrolling. One of those lazy, rainy days just before the holidays. Today is one of those days, and days like this are where my less efficient threat hunting ideas come from. Today I’m playing with extracting Base64 strings from HTTP URIs, HTTP cookies, and just about anywhere else I can find Base64 strings in a network feed. Let’s get to it!
The first thing we need is a Base64 detection function. I need some coffee this morning and one massive brain push for this trick. The goal is to search for any strings that look like they could be Base64. Accept this regex as elementary and not the “best of the best” for Base64 string detection; it’s our quick start to prove a hypothesis that something is hiding in our network via Base64.
Breaking our Regex down
The first piece is a lookbehind for an equal sign, which marks the start of a Base64 string in a URI parameter. It’s a lookbehind because the decodeBase64 function expects its input in 4-character groups, so the = sign in front of the value has to stay out of the extracted string.
The second piece matches any sequence of letters and numbers occurring any number of times. This is likely where the most improvement can be made in my regex, since the Base64 alphabet also includes + and /. Maybe on a sunnier day.
The last piece matches two equal signs, the padding that marks the end of a Base64 string. That means only strings padded with == are matched.
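The three pieces described above can be put together in a small Scala sketch. The regex here is a reconstruction from that description, not necessarily the exact pattern used in the notebook, and the names Base64Detect and looksLikeBase64 are made up for illustration.

```scala
import scala.util.matching.Regex

object Base64Detect {
  // Assumed reconstruction of the three pieces described in the post:
  //   (?<==)        lookbehind: the match must be preceded by '=', which is not captured
  //   [A-Za-z0-9]+  one or more letters or digits
  //   ==            the two-character padding that ends the string
  val pattern: Regex = "(?<==)[A-Za-z0-9]+==".r

  // True when the input contains at least one substring matching the pattern.
  def looksLikeBase64(s: String): Boolean =
    pattern.findFirstIn(s).isDefined
}
```

With this pattern, a URI such as /pixel?uid=dGVzdA==&r=1 is flagged, while /index.html?page=2 is not. Base64 values with one or no padding character are missed entirely, which is part of why the regex is only a quick start.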
Now that we have a regex, let’s test it and find lots of great matches for Base64 encoded strings.
Part One of our task is completed. We’ve built a Base64 string detector. Apply this function to a network data stream and we are now matching and displaying HTTP URIs with Base64 inside of them.
Part Two is extracting these Base64 strings. It’s one thing to simply find them; the real trick for me was extracting only the Base64 string within the URI. The flexibility of JASK is perfect for this task, and we can use Spark to write an extraction function. We define the variable pattern as our previous regex and build a function to extract the Base64 string that matches this pattern.
Now we are cooking with bacon! We have our getBase64 function for extracting Base64 strings registered as a UDF to use anywhere in our notebooks. Now we need a Base64 decoding function. I’m lazy today and it’s raining, so let me see if there is already a function for this. Got it! https://github.com/scalaj/scalaj-http. I’m going to import scalaj.http.Base64 and call it my lucky day. Remember, we are being lazy hunters today, so it’s time to register this as a UDF too.
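A minimal sketch of such an extraction function in plain Scala might look like the following. The pattern is the same assumed reconstruction of the regex as before, and returning an empty string for "no match" is a choice made for this example rather than something the post specifies.

```scala
import scala.util.matching.Regex

object Base64Extract {
  // Assumed pattern: '=' lookbehind, letters and digits, then "==" padding.
  val pattern: Regex = "(?<==)[A-Za-z0-9]+==".r

  // Returns only the first substring that matches the pattern.
  // Null input or no match yields an empty string (an assumption for this sketch).
  def getBase64(uri: String): String =
    Option(uri).flatMap(u => pattern.findFirstIn(u)).getOrElse("")
}
```

In a Spark notebook, and assuming a SparkSession named spark is in scope, a function like this could be registered with something along the lines of spark.udf.register("getBase64", (s: String) => Base64Extract.getBase64(s)) so that it can be called from SQL.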
Job done! Now I can call my getBase64 extraction function first and feed the results to our decodeBase64 function and it will return the Base64 decoded string. That’s it! Now let’s do this at MASSIVE SCALE!
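To illustrate the extract-then-decode chain without any extra dependencies, the sketch below swaps in the JDK’s java.util.Base64 for scalaj.http.Base64. The regex is still an assumed reconstruction, and the empty-string fallbacks are choices made for this example.

```scala
import java.nio.charset.StandardCharsets
import java.util.Base64
import scala.util.Try
import scala.util.matching.Regex

object Base64Hunt {
  // Assumed pattern: '=' lookbehind, letters and digits, then "==" padding.
  val pattern: Regex = "(?<==)[A-Za-z0-9]+==".r

  // Extract the first Base64-looking substring, or "" when there is none.
  def getBase64(uri: String): String =
    Option(uri).flatMap(u => pattern.findFirstIn(u)).getOrElse("")

  // Decode with the JDK decoder (stand-in for scalaj.http.Base64).
  // Input that is not valid Base64 yields "" instead of an exception.
  def decodeBase64(encoded: String): String =
    Try(new String(Base64.getDecoder.decode(encoded), StandardCharsets.UTF_8))
      .getOrElse("")

  // Chain the two steps: extract first, then decode.
  def decodeFromUri(uri: String): String =
    decodeBase64(getBase64(uri))
}
```

For example, Base64Hunt.decodeFromUri("/pixel?uid=dGVzdA==&r=1") extracts dGVzdA== and decodes it to test. Wrapping the decode in Try matters at scale, because the loose regex will also match strings that are not valid Base64.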
The results are fun. We’ve found process tracking, device fingerprinting, and plenty of ads pulling the email addresses of logged-in users: an interesting (disgusting) way of user-ID tracking. I also applied our function to the HTTP cookie data and found a different set of fun findings, more interesting than you would expect, but I’m going to keep that between JASK and our affected customer.
Here’s a quick screenshot of raw results from the last day:
Rainy day threat hunting: testing a hypothesis and having fun. Lots of scrolling through results, which is exactly what I felt like doing on this lazy rainy day in San Francisco. I’ve committed this Base64 Decoding notebook to our JASK clusters for customers to take advantage of, so please come join us! I also converted some of what we’ve found to signals for our AI to learn from. The holidays are almost over and I’m ready to go back to work!