Hello @prashant_nakum , could you elaborate what is the final goal behind this ? I don't understand the part where you calculate the average length of the whole string concatenating all usernames joined by a comma interacting with a single IP addr. Maybe there is a simpler way to implement your usecase.
Anyway, from your example, I assume that the id column is the id of a given dest_ip.
Here is what you can try, or at least start from in NAtive Dashboards :
events:
// Try to refine the event filter by focusing on some well-defined log_types, otherwise the query will be quite resource intensive...
//$e.metadata.log_type = /^(?:LOGTYPE1|LOGTYPE2|LOGTYPE3)$/
$e.metadata.log_type = "ZSCALER_WEBPROXY"
// Define placeholder vars
$user = group($e.principal.user.userid, $e.target.user.userid)
$dest_ip = $e.target.ip
$user != ""
$dest_ip != ""
// This will agg events for your whole time frame by destination IP
match:
$dest_ip
// You can then compute various calculations in the outcome section
outcome:
// The distinct count of users by dest_ip
$distinct_users_cnt = count_distinct($user)
// The list of each distinct user ("users" column of the intermediate table in your example)
$distinct_users = array_distinct($user)
// The total length of all concatenated distinct usernames joined by a comma as in your example
$len_of_all_usernames = strings.length(arrays.join_string($distinct_users, ","))
order:
$distinct_users_cnt asc
From my understanding, you cannot implement your exact use case in Native Dashboards since there is 2 levels of aggregations :
- The distinct list of users by dest_ip
- The calculation of the average length
Yara-L supports only 1 layer of aggregation (please correct me if I'm wrong i'm interested in this).
Also, keep in mind that Native Dashboards tables return only 1000 results so GL performing this kind of stats over large timeframes.
@chrisd2 thank you for your post and response
Hello @prashant_nakum , could you elaborate what is the final goal behind this ? I don't understand the part where you calculate the average length of the whole string concatenating all usernames joined by a comma interacting with a single IP addr. Maybe there is a simpler way to implement your usecase.
Anyway, from your example, I assume that the id column is the id of a given dest_ip.
Here is what you can try, or at least start from in NAtive Dashboards :
events:
// Try to refine the event filter by focusing on some well-defined log_types, otherwise the query will be quite resource intensive...
//$e.metadata.log_type = /^(?:LOGTYPE1|LOGTYPE2|LOGTYPE3)$/
$e.metadata.log_type = "ZSCALER_WEBPROXY"
// Define placeholder vars
$user = group($e.principal.user.userid, $e.target.user.userid)
$dest_ip = $e.target.ip
$user != ""
$dest_ip != ""
// This will agg events for your whole time frame by destination IP
match:
$dest_ip
// You can then compute various calculations in the outcome section
outcome:
// The distinct count of users by dest_ip
$distinct_users_cnt = count_distinct($user)
// The list of each distinct user ("users" column of the intermediate table in your example)
$distinct_users = array_distinct($user)
// The total length of all concatenated distinct usernames joined by a comma as in your example
$len_of_all_usernames = strings.length(arrays.join_string($distinct_users, ","))
order:
$distinct_users_cnt asc
From my understanding, you cannot implement your exact use case in Native Dashboards since there is 2 levels of aggregations :
- The distinct list of users by dest_ip
- The calculation of the average length
Yara-L supports only 1 layer of aggregation (please correct me if I'm wrong i'm interested in this).
Also, keep in mind that Native Dashboards tables return only 1000 results so GL performing this kind of stats over large timeframes.
+1 to @chrisd2 response. I started working on this as well, and came up with something similar. Shared below.
The part that is not straightforward at the moment is the final result calculation. I'll bounce that around internally to see if we can find a solution. Hopefully these ideas get you moving in the right direction in the meantime.
$user = principal.user.userid
$user != ""
$destination_ip = target.ip
$destination_ip != ""
match:
$destination_ip
outcome:
$users = array_distinct($user)
$num_users = count_distinct($user)
$len_users = strings.length(arrays.join_string($users, ""))
order:
$num_users desc
Hello @prashant_nakum , could you elaborate what is the final goal behind this ? I don't understand the part where you calculate the average length of the whole string concatenating all usernames joined by a comma interacting with a single IP addr. Maybe there is a simpler way to implement your usecase.
Anyway, from your example, I assume that the id column is the id of a given dest_ip.
Here is what you can try, or at least start from in NAtive Dashboards :
events:
// Try to refine the event filter by focusing on some well-defined log_types, otherwise the query will be quite resource intensive...
//$e.metadata.log_type = /^(?:LOGTYPE1|LOGTYPE2|LOGTYPE3)$/
$e.metadata.log_type = "ZSCALER_WEBPROXY"
// Define placeholder vars
$user = group($e.principal.user.userid, $e.target.user.userid)
$dest_ip = $e.target.ip
$user != ""
$dest_ip != ""
// This will agg events for your whole time frame by destination IP
match:
$dest_ip
// You can then compute various calculations in the outcome section
outcome:
// The distinct count of users by dest_ip
$distinct_users_cnt = count_distinct($user)
// The list of each distinct user ("users" column of the intermediate table in your example)
$distinct_users = array_distinct($user)
// The total length of all concatenated distinct usernames joined by a comma as in your example
$len_of_all_usernames = strings.length(arrays.join_string($distinct_users, ","))
order:
$distinct_users_cnt asc
From my understanding, you cannot implement your exact use case in Native Dashboards since there is 2 levels of aggregations :
- The distinct list of users by dest_ip
- The calculation of the average length
Yara-L supports only 1 layer of aggregation (please correct me if I'm wrong i'm interested in this).
Also, keep in mind that Native Dashboards tables return only 1000 results so GL performing this kind of stats over large timeframes.
@chrisd2 , @herrald thank you for the quick response!
My end goal is to create a single-value chart that displays the average user length.
As I mentioned earlier, the average user length is calculated by first grouping users by id and destination_ip, then calculating the average length of those grouped user strings.
This requires two levels of aggregation. Here’s an another example:
Sample Dataset:
id | destination_ip | users |
1 | dest_ip_1 | user_1 |
2 | dest_ip_2 | user_2 |
1 | dest_ip_1 | user_1 |
2 | dest_ip_2 | user_4 |
After applying the SQL WITH clause (CTE), the intermediate result would be:
id | destination_ip | users |
1 | dest_ip_1 | user_1 |
2 | dest_ip_2 | user_2, user_4 |
Then, we calculate the average length of the users field.
result = (length(user_1) + length(user_2,user_4)) / 2
= (6+13) / 2
= 9.5
Since this involves grouping first and then calculating the average, we clearly need two-level aggregation.
Is there any alternative approach to achieve this in YARA-L, or any suggestions would be greatly appreciated.
Regards,
Prashant Nakum