From Big Data to Beautiful Data: Bridging the gap from Threat Hunter to C-Suite graphs with Zeppelin notebooks and D3

In my previous posts we worked through a number of Threat Hunting queries and data mining ideas, and we left off with the question of how to demonstrate and translate value to the C-Suite. This has led me into the realm of presenting data in beautiful ways. At JASK, customers access big data with Zeppelin notebooks, but Zeppelin ships with only a small number of built-in graph types and leaves plenty of room for better implementations of beautiful data. A pie chart and a bar chart are not going to cut the mustard when demonstrating value up the chain. Cue D3 (https://d3js.org/) and its near-infinite flexibility in displaying beautiful data.

Working on the cluster from one of our research sensors at a very large Tech University, we’ve written a function to parse Top Level Domains (the .com, .org, or .net portion of a URL). Using the java.net.URL class, we extract the TLD from each URL and search for suspicious TLDs in HTTP request headers. Here is the code where we apply our TLD Spark UDF (user-defined function) to the dataset.

 

[Screenshot: Zeppelin paragraph applying the TLD UDF to the dataset]
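Since the original code survives only as a screenshot, here is a rough sketch of what the core of such a TLD function might look like. It is not the JASK implementation: the class name `TldParser`, the method name `tld`, and the choice to return an empty string for anything without a named TLD are all assumptions made for illustration. The Spark UDF registration is left out so the parsing logic stands on its own, and the sketch assumes each input is a full URL including its scheme.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Locale;

// Hypothetical helper, not the original UDF: extracts the TLD from a full URL string.
final class TldParser {
    private TldParser() {}

    /**
     * Returns the last dot-separated label of the URL's host in lower case,
     * or "" when the input is not a parseable URL or has no named TLD.
     */
    static String tld(String rawUrl) {
        if (rawUrl == null) {
            return "";
        }
        final String host;
        try {
            // java.net.URL does the parsing; a missing or unknown scheme throws.
            host = new URL(rawUrl.trim()).getHost();
        } catch (MalformedURLException e) {
            return "";
        }
        // IPv6 literals come back wrapped in brackets and carry no TLD.
        if (host.isEmpty() || host.startsWith("[")) {
            return "";
        }
        int lastDot = host.lastIndexOf('.');
        // No dot (e.g. "localhost") or a trailing dot: nothing usable after it.
        if (lastDot < 0 || lastDot == host.length() - 1) {
            return "";
        }
        String label = host.substring(lastDot + 1).toLowerCase(Locale.ROOT);
        // An all-digit final label means the host is an IPv4 address.
        boolean allDigits = label.chars().allMatch(Character::isDigit);
        return allDigits ? "" : label;
    }
}
```

Note that this only takes the final label, so a host like `www.example.co.uk` yields `uk`. Whether that is the right granularity for hunting suspicious TLDs is a design choice the original post does not spell out.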

This query returns your standard big data row/table type of result (something an analyst might consume).

 

[Screenshot: query results displayed as a table of rows]
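The exact columns of that table are not recoverable from the screenshot. Assuming each row pairs a TLD with the number of requests that hit it, the aggregation behind such a table could be sketched without Spark as follows. The names `TldCounts`, `count`, and `ranked` are hypothetical, and on a real cluster this step would more likely be a group-by in the query itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the query's aggregation step.
final class TldCounts {
    private TldCounts() {}

    /** Counts occurrences of each non-empty TLD; the TreeMap keeps keys sorted. */
    static Map<String, Long> count(List<String> tlds) {
        Map<String, Long> counts = new TreeMap<>();
        for (String tld : tlds) {
            if (tld != null && !tld.isEmpty()) {
                counts.merge(tld, 1L, Long::sum);
            }
        }
        return counts;
    }

    /** Returns rows ordered by descending count, ties broken alphabetically. */
    static List<Map.Entry<String, Long>> ranked(Map<String, Long> counts) {
        List<Map.Entry<String, Long>> rows = new ArrayList<>(counts.entrySet());
        rows.sort((a, b) -> {
            int byCount = Long.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return rows;
    }
}
```

With the rows ranked this way, the first entry is the most frequently seen TLD and the last is the rarest, which is often where the suspicious ones turn up.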

Now it’s time to start the Beautiful Data transformation! (Something the C-Suite can consume)

Here we are printing HTML and JavaScript within a Zeppelin notebook against JSON data output. Instead of staring at rows and columns of big data, beautiful data translates up the management stack and helps tell a clearer story of the threat hunter’s findings.
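To make the idea concrete, here is a minimal sketch of how a notebook paragraph might turn TLD counts into JSON and wrap them in HTML plus a small D3 script. Zeppelin renders paragraph output that begins with `%html` as HTML, so the returned string is meant to be printed as-is. Everything else is an assumption for illustration: the `TldChart` class, the `tld-chart` element id, the `tld` and `count` field names, and above all that a global `d3` object is already available on the notebook page. The script sticks to basic selection calls and `d3.max`, and draws a plain horizontal bar per TLD rather than the graph from the original post.

```java
import java.util.Map;

// Hypothetical helper: builds a Zeppelin %html payload with an inline D3 script.
final class TldChart {
    private TldChart() {}

    /** Quotes a string for JSON; '<' is escaped so data cannot close the script tag. */
    private static String jsonString(String s) {
        StringBuilder out = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                out.append('\\').append(c);
            } else if (c < 0x20 || c == '<') {
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.append('"').toString();
    }

    /** Serializes the counts as a JSON array of {"tld": ..., "count": ...} objects. */
    static String toJson(Map<String, Long> counts) {
        StringBuilder json = new StringBuilder("[");
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            if (json.length() > 1) {
                json.append(',');
            }
            json.append("{\"tld\":").append(jsonString(e.getKey()))
                .append(",\"count\":").append(e.getValue()).append('}');
        }
        return json.append(']').toString();
    }

    /** Returns output for a notebook paragraph; assumes a global d3 is present. */
    static String toZeppelinHtml(Map<String, Long> counts) {
        return "%html\n"
            + "<div id=\"tld-chart\"></div>\n"
            + "<script>\n"
            + "(function () {\n"
            + "  var data = " + toJson(counts) + ";\n"
            + "  var max = d3.max(data, function (d) { return d.count; });\n"
            + "  d3.select('#tld-chart').selectAll('div').data(data).enter().append('div')\n"
            + "    .style('background', 'steelblue').style('color', 'white')\n"
            + "    .style('margin', '2px').style('padding', '3px')\n"
            + "    .style('width', function (d) { return (100 * d.count / max) + '%'; })\n"
            + "    .text(function (d) { return '.' + d.tld + ' (' + d.count + ')'; });\n"
            + "})();\n"
            + "</script>";
    }
}
```

Because the data is embedded as JSON and the drawing code never mentions TLDs beyond the label, swapping in a different hunting dataset mostly means changing the field names.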

The write-once, use-forever concept works wonderfully with Zeppelin + D3. In this example we graphed TLDs, but we could easily represent a different Threat Hunting dataset with the same graphing method. Graphing makes it easy for everyone to see the most frequently visited TLD and the least frequently visited TLD, and that’s the job of beautiful data. We’ve since applied the same TLD notebook to all of our customers’ clusters so they can experience their own Beautiful Data.
