Skip to main content

Hi everyone,

I’m working on building native dashboards in Google SecOps and need to create a specific visualization. I have an SQL query that provides the required data, but I’m unsure how to achieve the same result using a YARA-L query in Google SecOps.

Pseudo SQL Query:

WITH user_table AS (
SELECT ARRAY_TO_STRING(ARRAY_AGG(DISTINCT user)) AS users
FROM events
GROUP BY id, destination_ip
)

SELECT AVG(STRING_LENGTH(users)) FROM user_table;

For example, my dataset looks like this:

iddestination_ipuser
1dest_ip_1user_1
2dest_ip_2user_2
1dest_ip_1user_3
2dest_ip_2user_4

After executing with clause SQL query, I get the following transformed table:

iddestination_ipusers
1dest_ip_1user_1, user_3
2dest_ip_2user_2, user_4

Next, the original query calculates the average length of the users field.

Calculation:

result = (length(user_1,user_3) + length(user_2,user_4)) / 2
= (13+13) / 2
= 13

Is there a way to achieve this in a Google SecOps native dashboard using a YARA-L query?

Any suggestions or reference documentation would be greatly appreciated.

Thanks,
Prashant Nakum

Hello @prashant_nakum , could you elaborate what is the final goal behind this ? I don't understand the part where you calculate the average length of the whole string concatenating all usernames joined by a comma interacting with a single IP addr. Maybe there is a simpler way to implement your usecase.

Anyway, from your example, I assume that the id column is the id of a given dest_ip.

Here is what you can try, or at least start from in NAtive Dashboards :

 

 

events:
// Try to refine the event filter by focusing on some well-defined log_types, otherwise the query will be quite resource intensive...
//$e.metadata.log_type = /^(?:LOGTYPE1|LOGTYPE2|LOGTYPE3)$/
$e.metadata.log_type = "ZSCALER_WEBPROXY"

// Define placeholder vars
$user = group($e.principal.user.userid, $e.target.user.userid)
$dest_ip = $e.target.ip

$user != ""
$dest_ip != ""

// This will agg events for your whole time frame by destination IP
match:
$dest_ip

// You can then compute various calculations in the outcome section
outcome:
// The distinct count of users by dest_ip
$distinct_users_cnt = count_distinct($user)
// The list of each distinct user ("users" column of the intermediate table in your example)
$distinct_users = array_distinct($user)
// The total length of all concatenated distinct usernames joined by a comma as in your example
$len_of_all_usernames = strings.length(arrays.join_string($distinct_users, ","))

order:
$distinct_users_cnt asc

 

 

From my understanding, you cannot implement your exact use case in Native Dashboards since there is 2 levels of aggregations :

  1. The distinct list of users by dest_ip
  2. The calculation of the average length

Yara-L supports only 1 layer of aggregation (please correct me if I'm wrong i'm interested in this).
Also, keep in mind that Native Dashboards tables return only 1000 results so GL performing this kind of stats over large timeframes.


@chrisd2 thank you for your post and response


Hello @prashant_nakum , could you elaborate what is the final goal behind this ? I don't understand the part where you calculate the average length of the whole string concatenating all usernames joined by a comma interacting with a single IP addr. Maybe there is a simpler way to implement your usecase.

Anyway, from your example, I assume that the id column is the id of a given dest_ip.

Here is what you can try, or at least start from in NAtive Dashboards :

 

 

events:
// Try to refine the event filter by focusing on some well-defined log_types, otherwise the query will be quite resource intensive...
//$e.metadata.log_type = /^(?:LOGTYPE1|LOGTYPE2|LOGTYPE3)$/
$e.metadata.log_type = "ZSCALER_WEBPROXY"

// Define placeholder vars
$user = group($e.principal.user.userid, $e.target.user.userid)
$dest_ip = $e.target.ip

$user != ""
$dest_ip != ""

// This will agg events for your whole time frame by destination IP
match:
$dest_ip

// You can then compute various calculations in the outcome section
outcome:
// The distinct count of users by dest_ip
$distinct_users_cnt = count_distinct($user)
// The list of each distinct user ("users" column of the intermediate table in your example)
$distinct_users = array_distinct($user)
// The total length of all concatenated distinct usernames joined by a comma as in your example
$len_of_all_usernames = strings.length(arrays.join_string($distinct_users, ","))

order:
$distinct_users_cnt asc

 

 

From my understanding, you cannot implement your exact use case in Native Dashboards since there is 2 levels of aggregations :

  1. The distinct list of users by dest_ip
  2. The calculation of the average length

Yara-L supports only 1 layer of aggregation (please correct me if I'm wrong i'm interested in this).
Also, keep in mind that Native Dashboards tables return only 1000 results so GL performing this kind of stats over large timeframes.


+1 to @chrisd2 response. I started working on this as well, and came up with something similar. Shared below.



 

The part that is not straightforward at the moment is the final result calculation. I'll bounce that around internally to see if we can find a solution. Hopefully these ideas get you moving in the right direction in the meantime.

 

$user = principal.user.userid
$user != ""
$destination_ip = target.ip
$destination_ip != ""

match:
$destination_ip

outcome:
$users = array_distinct($user)
$num_users = count_distinct($user)
$len_users = strings.length(arrays.join_string($users, ""))

order:
$num_users desc

 




Hello @prashant_nakum , could you elaborate what is the final goal behind this ? I don't understand the part where you calculate the average length of the whole string concatenating all usernames joined by a comma interacting with a single IP addr. Maybe there is a simpler way to implement your usecase.

Anyway, from your example, I assume that the id column is the id of a given dest_ip.

Here is what you can try, or at least start from in NAtive Dashboards :

 

 

events:
// Try to refine the event filter by focusing on some well-defined log_types, otherwise the query will be quite resource intensive...
//$e.metadata.log_type = /^(?:LOGTYPE1|LOGTYPE2|LOGTYPE3)$/
$e.metadata.log_type = "ZSCALER_WEBPROXY"

// Define placeholder vars
$user = group($e.principal.user.userid, $e.target.user.userid)
$dest_ip = $e.target.ip

$user != ""
$dest_ip != ""

// This will agg events for your whole time frame by destination IP
match:
$dest_ip

// You can then compute various calculations in the outcome section
outcome:
// The distinct count of users by dest_ip
$distinct_users_cnt = count_distinct($user)
// The list of each distinct user ("users" column of the intermediate table in your example)
$distinct_users = array_distinct($user)
// The total length of all concatenated distinct usernames joined by a comma as in your example
$len_of_all_usernames = strings.length(arrays.join_string($distinct_users, ","))

order:
$distinct_users_cnt asc

 

 

From my understanding, you cannot implement your exact use case in Native Dashboards since there is 2 levels of aggregations :

  1. The distinct list of users by dest_ip
  2. The calculation of the average length

Yara-L supports only 1 layer of aggregation (please correct me if I'm wrong i'm interested in this).
Also, keep in mind that Native Dashboards tables return only 1000 results so GL performing this kind of stats over large timeframes.


@chrisd2 , @herrald thank you for the quick response!

My end goal is to create a single-value chart that displays the average user length.

As I mentioned earlier, the average user length is calculated by first grouping users by id and destination_ip, then calculating the average length of those grouped user strings.

This requires two levels of aggregation. Here’s an another example:

Sample Dataset:

iddestination_ipusers
1dest_ip_1user_1
2dest_ip_2user_2
1dest_ip_1user_1
2dest_ip_2user_4

After applying the SQL WITH clause (CTE), the intermediate result would be:

iddestination_ipusers
1dest_ip_1user_1
2dest_ip_2user_2, user_4

Then, we calculate the average length of the users field.

result = (length(user_1) + length(user_2,user_4)) / 2
= (6+13) / 2
= 9.5

Since this involves grouping first and then calculating the average, we clearly need two-level aggregation.

Is there any alternative approach to achieve this in YARA-L, or any suggestions would be greatly appreciated.

Regards,
Prashant Nakum


Reply