
Welcome back to exploring metrics within Google SecOps. For those just tuning in, we’ve introduced the metric concept and used aggregations to build a detection in our previous blogs. Last time we focused on the metric of value_sum, which is used to create a sum of bytes in events across the metric time period and window. This is great when we are using metrics and working with bytes, but if we want to explore event counts or first and last seen, we need some additional metrics. That’s what we are going to explore today!


Let's start by looking at our metric functions, which are found in the outcome section of our YARA-L rule. All metrics follow the same general layout, sketched below. To unlock these additional metrics, we are going to modify the value next to metric: in the function, and depending on the function, we may change the type of aggregation (the value next to agg:), but we will get to that in due time.
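Here is a rough skeleton of that layout, based on the examples in this series; everything in angle brackets is a placeholder rather than a real value:

$outcome_variable = max(metrics.<metric_function>(
period:1d, window:30d, // metric time period and lookback window
metric:<metric_name>, // e.g., value_sum, event_count_sum, first_seen, last_seen
agg:<aggregation>, // e.g., max, min, sum, avg, stddev, num_metric_periods
<dimension>: $placeholder // group-by dimension, e.g., principal.asset.hostname
))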


We are going to change our example and, rather than focus on network metrics, look at failed authentication attempts. Our example rule gathers user login events where the action is not allowed, which aligns with the failed-authentication metrics we'll be working with. Much like our network examples, we are only focusing on a single hostname, and we are aggregating our UDM events over the course of a day.


rule metric_examples_failed_authentication {

meta:
author = "Google Cloud Security"

events:
$login.metadata.event_type = "USER_LOGIN"
NOT $login.security_result.action = "ALLOW"
$login.principal.hostname = $hostname
$login.principal.hostname = "win-adfs.lunarstiiiness.com"

match:
$hostname over 1d

outcome:
$max_event_count_window = max(metrics.auth_attempts_fail(
period:1d, window:30d,
metric:event_count_sum,
agg:max,
principal.asset.hostname: $hostname
))
$min_event_count_window = max(metrics.auth_attempts_fail(
period:1d, window:30d,
metric:event_count_sum,
agg:min,
principal.asset.hostname: $hostname
))
$total_event_count = max(metrics.auth_attempts_fail(
period:1d, window:30d,
metric:event_count_sum,
agg:sum,
principal.asset.hostname: $hostname
))
$event_count_days_seen = max(metrics.auth_attempts_fail(
period:1d, window:30d,
metric:event_count_sum,
agg:num_metric_periods,
principal.asset.hostname: $hostname
))
$avg_event_count_window = max(metrics.auth_attempts_fail(
period:1d, window:30d,
metric:event_count_sum,
agg:avg,
principal.asset.hostname: $hostname
))
$stddev_event_count_window = max(metrics.auth_attempts_fail(
period:1d, window:30d,
metric:event_count_sum,
agg:stddev,
principal.asset.hostname: $hostname
))

condition:
$login
}

Our outcome section contains the same six aggregations we saw in the network metric example in our last blog, but this time our metric function is metrics.auth_attempts_fail and our metric is set to event_count_sum, which counts the events in each period that meet the criteria of the metric. We then calculate a max, min, sum (total for the window), average, standard deviation, and the number of days this activity is seen. Notice how the group-by in the metric function is now principal.asset.hostname, based on the placeholder value of $hostname.
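To make those aggregations concrete with some hypothetical numbers: if a host recorded failed-login counts of 5, 10, and 15 across three daily periods in the window, then max = 15, min = 5, sum = 30, avg = 30 / 3 = 10, and num_metric_periods = 3 (the number of non-zero periods). Keep in mind that avg is computed per observed period, not per day in the window; that distinction comes up again in the comments below.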



When we test our rule, we get a series of metrics, much like our byte count example, but this time our results are associated with failed authentication events over the past 30 days and their statistical measures. Notice how our max and min changed over the window.


To put these metrics to work, let's modify our rule a bit. We'll start by broadening our rule to identify more than one host by using a regular expression that restricts principal.hostname to a specific domain. Otherwise, the events and match sections remain the same.


rule metric_examples_failed_authentication {

meta:
author = "Google Cloud Security"

events:
$login.metadata.event_type = "USER_LOGIN"
NOT $login.security_result.action = "ALLOW"
$login.principal.hostname = $hostname
$login.principal.hostname = /\.lunarstiiiness\.com$/

match:
$hostname over 1d

outcome:
$max_event_count_window = max(metrics.auth_attempts_fail(
period:1d, window:30d,
metric:event_count_sum,
agg:max,
principal.asset.hostname: $hostname
))
$daily_failed_logins = count($login.metadata.event_type)

condition:
$login and $daily_failed_logins > $max_event_count_window
}

Our outcome section just contains our max metric function and an outcome variable that calculates a count of failed logins for the match window. It’s worth noting that there is additional tuning that can be performed on the events section to get higher fidelity in the detections, but as we familiarize ourselves with metrics, we will keep the tuning to a minimum.


Finally, our condition section contains an additional statement that will only trigger our detection when the current day's failed login event count exceeds the maximum of the 30-day window. Essentially, we want to know when today exceeds the most we've seen in the past 30 days. When we test our rule, we can see that two systems appear to have exceeded the prior 30-day maximum. In fact, both have done so at least twice in the past two weeks, so we may want to dig into those two systems to understand why we are seeing such excessive numbers of login failures.



The event_count_sum metric is pretty straightforward, particularly if you are already comfortable with value_sum which is what we used for bytes.


Now let's look at a few additional metrics: first_seen and last_seen. These are fairly self-explanatory based on their names, but they are bound to the time window, and their output will be in epoch format. That also means that if we have not seen an event that aligns to a metric within the 30-day window, the value would be 0. One additional note: when using the metric first_seen, the aggregation should be min, and for last_seen, the aggregation should be max.


$first_seen_login_window = max(metrics.auth_attempts_success(
period:1d, window:30d,
metric:first_seen,
agg:min,
target.user.userid: $userid
))
$last_seen_login_window = max(metrics.auth_attempts_success(
period:1d, window:30d,
metric:last_seen,
agg:max,
target.user.userid: $userid
))

Notice when we test our rule, we get our detections with our epoch times in our outcome variables.



Let's apply these concepts to a detection like first seen logon events. We will use the metric associated with successful authentications and generate a detection if this is the first time we are seeing that user log in. Since the window is 30 days, we may previously have had a metric for first seen, but if it falls outside of that 30-day window, we won't have one, and therefore our result would be zero. It's worth noting that in this example we are just looking at a metric based on the user login, not the user login on a specific system or in a specific application; that'll come later (how's that for foreshadowing!).


The events section of our rule has criteria for successful login events and we added some additional criteria to remove system accounts and to handle user logins with mixed case versus lower case. This is the kind of tuning that you may find yourself doing with your rules, so hopefully these tips and tricks are helpful!


rule metric_examples_success_authentication {

meta:
author = "Google Cloud Security"

events:
$login.metadata.event_type = "USER_LOGIN"
$login.security_result.action = "ALLOW"
$login.target.user.userid != /\$$/
strings.to_lower($login.target.user.userid) = $userid
$login.principal.hostname = /\.lunarstiiiness\.com$/
$login.target.user.company_name = "LunarS"

match:
$userid over 1d

outcome:
$first_seen_login_window = max(metrics.auth_attempts_success(
period:1d, window:30d,
metric:first_seen,
agg:min,
target.user.userid: $userid
))
$systems_accessed = array_distinct($login.principal.hostname)

condition:
$login and $first_seen_login_window = 0
}

Our metric function in the outcome section calculates the first_seen metric for the target.user.userid. Additionally, we added another outcome variable to capture the systems the logins are associated with. Finally, in the condition section, we specified that our $first_seen_login_window value is 0; that is, we have not seen a successful login from this user during our window before.



When we test our rule, we can see that both heather.glenn and heather.glenn_admin successfully logged into win-adfs and wrk-pacman for the first time within our 30 day window. If we haven’t seen a specific user logging into specific systems before, perhaps that suspicious behavior is something we need to look into.


This blog introduced three additional metrics to our function: event_count_sum, first_seen, and last_seen. We were able to use these metrics to identify suspicious activity within authentication events, but remember there are more metrics these concepts can be applied to. Our next blog will get into the final section of the metric function, which is the grouping and filtering components.

Hello Jstoner,

Could you please help me create a YARA-L rule to check for a spike in blocked traffic from the Zscaler web proxy? Using metrics, I'd like to calculate the average blocked traffic over the last 30 days, and if the daily block count exceeds that average, I need an alert.

Thanks in advance


 


The metrics that are available are the ones listed in this doc link:


https://cloud.google.com/chronicle/docs/detection/metrics-functions#functions


As I look at that link, the alert events metric claims to be based on Google Workspace. This is NOT accurate, and I opened a ticket to fix this and accurately reflect what it is. It's really for EDR events like S1, CS, CB, and Microsoft, so it really isn't a good choice for HTTP-type events.


I think the best metric available for what you are looking for is the metrics.http_queries family. There are three (total, success, and fail), and they are based on the response codes: failure is 400 and above, success is under 400.


I also don't have Zscaler data in my tenant, so I've mocked this rule up using total and all network HTTP events, and commented out some fields/values that you may want to include to tighten your rule to Zscaler, but hopefully this provides something to get you going.


 



rule metric_example_http {

meta:
author = "Google Cloud Security"

events:
$net.metadata.event_type = "NETWORK_HTTP"
//$net.metadata.product_name = "NSS"
//$net.metadata.vendor_name = "Zscaler"
//$net.security_result.action = "BLOCK"
//$net.http.response_code >= 400
$net.principal.ip = $ip
$net.network.http.user_agent = $user_agent

match:
$ip, $user_agent over 1d

outcome:
$avg_count_window = max(metrics.http_queries_total(
period:1d, window:30d,
metric:event_count_sum,
agg:avg,
principal.asset.ip: $ip, network.http.user_agent: $user_agent
))
/*$avg_count_window = max(metrics.http_queries_fail(
period:1d, window:30d,
metric:event_count_sum,
agg:avg,
principal.asset.ip: $ip, network.http.user_agent: $user_agent
)) */
$avg_count_window_x2 = $avg_count_window * 2
$daily_count = count($ip)

condition:
$net and $daily_count > $avg_count_window_x2
}


While I used the metric for total, I added a commented-out fail metric that you can easily swap in. Finally, I calculated an outcome variable that is 2x the average, in addition to a daily count. Because you mentioned anything over the average, I am a little concerned that anything above average may get a bit noisy, so I'm using 2x the average as a way to start getting closer to values that are further from the mean. This is also something you will find you need to tune, but hopefully this is a good start.


I covered some of these concepts in this blog:


https://www.googlecloudcommunity.com/gc/Community-Blog/New-to-Google-SecOps-Using-Metrics-in-YARA-L-Rules-Part-2/ba-p/726336


as part of the broader series on metrics, so hopefully this helps.


 


thanks a lot


Will the metrics comply with all the conditions specified in the events section?

Is the avg_count_window specific to each IP, and will it adhere to all the conditions we have specified in the events section? I need to whitelist some users, apps, etc., within the events section.

For example, if my rule is to get alerted when a user's ChatGPT access exceeds his average hourly usage based on the metric below, I need conditions such as whitelisting certain users inside the rule. Will the average calculated via metrics meet all my conditions in the events section?

I believe the following represents the hourly average

 

 

$max_event_count_window = max(metrics.auth_attempts_fail(
period:1h, window:today,
metric:event_count_sum,
agg:max,
principal.asset.hostname: $hostname
))

 

 

I put the condition as: count of events > the hourly metric above.

 

 


The metrics are independent of the events section, so there will be tuning you need to do there. When it comes to the metric, there are additional filters that can be deployed in the metric on top of the dimensions; I talk about it here: https://www.googlecloudcommunity.com/gc/Community-Blog/New-to-Google-SecOps-Using-Metrics-in-YARA-L-Rules-Part-4/ba-p/726359


Keep in mind the dimensions in the metric like 


  principal.asset.hostname: $hostname

basically says that we are using the dimension of hostname, so you do have a few controls to work with.
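As a minimal sketch of what that could look like, here is a metric with two dimensions. The second dimension pinned to a literal value is a hypothetical illustration of filtering; see the Part 4 blog linked above for the exact syntax.

$max_fail_for_host = max(metrics.auth_attempts_fail(
period:1d, window:30d,
metric:event_count_sum,
agg:max,
principal.asset.hostname: $hostname, // dimension bound to the rule's placeholder
target.user.userid: "svc-example" // hypothetical literal filter on a second dimension
))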


Hello jstoner, could you please check the use case below?

In this use case I am getting too many unknown values for the metrics. Could you please explain the reason?

events:
($traffic.metadata.event_type = "NETWORK_CONNECTION" or $traffic.metadata.event_type = "NETWORK_HTTP")
//Search for Zscaler logs
$traffic.metadata.vendor_name = "Zscaler"
//Search for the specific blocked actions within Zscaler
$traffic.security_result.action = "BLOCK"
$traffic.network.http.response_code >= 400
//Search for the specific type of actions within Zscaler
($traffic.additional.fields["urlclass"] = /advanced security.*/ nocase or $traffic.security_result.category_details = /.*misp.*/)
(not $traffic.security_result.category_details = /adware/ nocase)

//Filter out noisy users/applications/urls
not $traffic.principal.user.email_addresses = /usera/ and not $traffic.principal.user.email_addresses = /userb/
not $traffic.principal.application = "appx" nocase
not $traffic.target.url = /google/
not $traffic.target.url = /microsoft/

//Create variables for the match
$traffic.principal.user.userid = $user_name
$traffic.principal.ip = $ip
$traffic.principal.ip[0] = $ip
$traffic.network.http.user_agent = $user_agent

match:
$ip, $user_agent over 1d

outcome:
$count_url = count_distinct($traffic.target.url)
$avg_total_count_windows = max(metrics.http_queries_total(period:1d, window:30d, metric:event_count_sum, agg:avg, principal.asset.ip:$ip, network.http.user_agent:$user_agent))
$avg_fail_count = max(metrics.http_queries_fail(period:1d, window:30d, metric:event_count_sum, agg:avg, principal.asset.ip:$ip, network.http.user_agent:$user_agent))
$avg_fail_countx2 = $avg_fail_count * 2
$avg_total_count_windowsx2 = $avg_total_count_windows * 2
$daily_fail_count = count($ip)

condition:
#traffic > 5 and $daily_fail_count > $avg_total_count_windowsx2
}


I don't have a firm final answer on this, but a couple of things I've noticed as I tested what you have above include the following:



  • If the metric is calling for a dimension like principal.asset.ip, for example, I'd strongly suggest that be part of the criteria in the events section. If you use asset.ip, for instance, in the events section, you are potentially linking to a metric that doesn't include that same value; therefore no join exists, and an unknown would be returned.

  • This also goes for extra event types that may not be populated with http response codes and http methods.


Trying the metric with a single dimension is always a good troubleshooting step as well.


I have also found it useful to troubleshoot using the metric of event_count_sum with the aggregation of num_metric_periods, which tells me how many non-zero periods I have in my window.
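For example, a troubleshooting outcome variable along these lines (reusing the http_queries_total metric and $ip placeholder from the rule above) returns how many non-zero daily periods the metric has for a given IP; a result of 0 suggests the dimension values in the rule never line up with the metric:

$days_with_data = max(metrics.http_queries_total(
period:1d, window:30d,
metric:event_count_sum,
agg:num_metric_periods,
principal.asset.ip: $ip
))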


These metrics are processed daily, and there may have been other things happening in that processing that caused something not to trigger. Support may be able to shed more light in that regard.


I realize that doesn't fully answer your question but hopefully provides some steps that you can take.


Hello Jstoner,

Thank you for your reply. I have resolved the issue; however, I would like to highlight a small glitch in the tool: it currently cannot calculate a division whose numerator is zero.

When calculating the daily average, I first computed the total sum for the last 30 days and then divided it by 30, since the average metric provides the average per observed period, not the daily average. In cases where no conditions were met in the last 30 days, the sum is zero and the average should be 0/30, resulting in a zero average. Instead, I am getting an unknown result.
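A minimal sketch of that sum-then-divide approach, reusing the metrics.http_queries_total function from the earlier examples (variable names are illustrative):

$total_30d = max(metrics.http_queries_total(
period:1d, window:30d,
metric:event_count_sum,
agg:sum,
principal.asset.ip: $ip
))
// per the glitch described above, this yields unknown rather than 0 when $total_30d is 0
$daily_avg = $total_30d / 30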


Hello, I'm having some problems with the syntax of the function. When I try to write the "period" and "window" arguments, I get an error saying the ":" after those arguments is not correct. I've tried other combinations and nothing works.
Thank you


Currently there are two options for period and window: 1d with 30d, as shown below, and 1h with today. The latter pair can only be used if the longer metric window is also included in the rule. I don't have a good idea of why you are getting an error without seeing it, but for reference here is my example, followed by a sketch of the 1h/today form.


If you have a more detailed syntax error message and the metric, that would be helpful.


$avg_total_count_windows = max(metrics.http_queries_total(
period:1d, window:30d,
metric:event_count_sum,
agg:avg,
principal.asset.ip:$ip, network.http.user_agent:$user_agent)
)
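And a sketch of the second combination, which would sit alongside the 1d/30d metric above in the same outcome section, since the shorter window is only valid when the longer one is also present (the variable name is illustrative):

$today_hourly_max = max(metrics.http_queries_total(
period:1h, window:today,
metric:event_count_sum,
agg:max,
principal.asset.ip:$ip, network.http.user_agent:$user_agent
))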

 


Thank you for your help. I've used your example to show the problem:

In all the documentation I see the same syntax, like in your posts.


I tried to recreate this on my instance and I can't get the error to show up in the same spot, despite removing different pieces of the rule. I don't know what entitlements the tenant you are working on has, so my first step would be to check with the support or account team that the UEBA entitlement is present for the instance. If that isn't the cause, I would open a support ticket to investigate further, as I was able to mirror everything you have in the image above, but with success.


Dear community,

I'm trying to adapt the published example to principal.user.userid instead of hostname.

Goal: Get the average of successful authentications per user per day for the last 30 days.

Result: Following the provided example, the table gives zero for all the fields, and there is no timestamp on the table either.

Note: The provided example is also not working on my deployment because I do not have hostname.

Code with just the average based on the provided example:

 

 

events:
$login.metadata.event_type = "USER_LOGIN"
$login.principal.user.userid = $user


match:
$user over 1d

outcome:
$avg_event_count_window = max(metrics.auth_attempts_success(
period:1d, window:30d,
metric:event_count_sum,
agg:avg,
principal.user.userid: $user
))

condition:
$login

 

 
Thanks in advance for the support.
 
Regards,
TS

 


 

Hello tiagosantos,

With the same query you should ideally get something like this: the daily success average per user for the past 30 days.

Zero means there are no successful attempts from the user for the past 30 days. Note: today is not included.


Hi @NASEEF,

I'm doing a query and not a rule. Does it make any difference?

I can build up tables with the amount of authentications per day for the last 30 days, but as soon as I try to calculate the average per day, I get no results. The query should be really simple but is not working. The query below gives results.

 

 

 

events:
metadata.event_type = "USER_LOGIN"
security_result.action = "ALLOW"
$date_day = timestamp.get_date(metadata.event_timestamp.seconds)
$userid = principal.user.userid
match:
$userid, $date_day
outcome:
$auth_success_count = count(metadata.id)
$avg_event_count_window = max(metrics.auth_attempts_success(
period:1d, window:30d,
metric:event_count_sum,
agg:avg,
principal.user.userid: $userid
))

 

 

Can you post the query that you used? 

Thanks,

Tiago


I tried this on a stats UDM search as well, similar to how you did, but I'm getting the daily average instead.

 

 

I'm not sure I completely understand your question, but if it's the zero that's concerning you, it might simply mean that there haven’t been any successful authentications from that user in your SecOps data over the past 30 days.

Is this a new instance where you've just started ingesting logs?


It is a new instance and there are successful authentications.

I can get a total of successful authentications in the last 30 days per user per day. It's the first column.

The average calculation per hour over the last 30 days I'm not able to make work. [Nor the average per day during the last 30d.]

 


Let me throw a few things out there, and hopefully they will help get you going in the right direction. The first is that the metrics examples in the blog all assume rules. While we continue to converge capabilities in the rules and search engines, there are some differences, so this will impact what you can do with each.



Second, make sure principal.user.userid is the field you want. In a Windows domain environment, I've found that target.user.userid may provide the domain login authentications so that may be something to look at.


Third, while the metric works in the search, you will get a singular value. We are continuing to extend the flexibility to do more statistical functions but with search you won't get a rolling average that changes on a daily basis like you would with rules.


 



Hi @jstoner. When using metrics in search (first seen and last seen, for example), is there a way to convert the epoch time that is displayed in the output table to a human-readable format?


metadata.event_type = "USER_LOGIN"
target.user.userid != /\$$/
target.user.userid = $user
match:
$user
outcome:
$first_seen = timestamp.get_timestamp(cast.as_int(max(metrics.auth_attempts_success(
period:1d, window:30d,
metric:first_seen,
agg:avg,
target.user.userid: $user
))))
condition:
$first_seen != "1970-01-01 00:00:00"

I used target instead of principal, but either would work; there's probably some more tuning to be done, but the key is that you need cast.as_int to convert the time, which is a float, to an integer so timestamp.get_timestamp can put it in a nice neat format. timestamp.get_timestamp can also be played around with to do more elaborate date/time formatting if you want. Then I added a condition at the bottom to filter out values that were 0 for whatever reason. Depending on how you mess with your time formats, this might have to change as well; just get the output first and then add the condition last of all to filter any noise.

 

 

