Hi,

I tried to write a simple rule using a match section.

This is the rule - 

rule storage_bucket_creation_gcp {

meta:
description = "Identify the creation of new storage bucket in GCP"
severity = "Low"

events:
$create.metadata.event_type = "RESOURCE_CREATION"
$create.metadata.product_event_type = "storage.buckets.create"
$create.metadata.product_name = "Google Cloud Storage"
$create.metadata.vendor_name = "Google Cloud Platform"
$create.principal.user.userid = $test

// match:
// $test over 5m

outcome:
$bucket_name = $create.target.resource.name
$gcp_project_name = $create.target.cloud.project.name
$bucket_location = $create.target.location.name
$requesting_user = $create.principal.user.userid
$user_agent = $create.network.http.user_agent

condition:
$create
}

 

I get an error when I uncomment the match section.

The rule works fine when I use just the match section without the outcome section.

The issue is when I use both.

What can I do to fix this?

Thank you !

The issue arises because the outcome section is trying to access the field $create.target.resource.name without proper aggregation when a match section is present.


Here's a breakdown of why:




  • Single-Event vs. Multi-Event Context: When a match section is used, the rule transitions from a single-event to a multi-event context. This fundamentally changes how variables are interpreted. In a single-event context, $create would represent a single event. However, with a match section, $create now represents a group of events within the specified time window.




  • Repeated Field Unnesting: The field $create.target.resource.name is not inherently a repeated field. However, in a multi-event context, it becomes effectively repeated because $create represents multiple events, potentially each with a different target.resource.name. This is due to a process called "repeated field unnesting," where Chronicle expands repeated fields to create individual rows for each element within those repeated fields.




  • Outcome Aggregation Requirement: In the outcome section, each variable assignment must result in a single value. However, with the match section and repeated field unnesting, $create.target.resource.name could represent multiple distinct values. Therefore, it must be aggregated to produce a single output for the $bucket_name outcome.




The Solution:


To fix this issue, you need to apply an aggregation function to $create.target.resource.name within the outcome section. Here are a few options:




  1. array_distinct: This will collect all unique bucket names within the matching events:


    outcome:
    $bucket_name = array_distinct($create.target.resource.name)



  2. count_distinct: This will count the number of distinct bucket names created within the time window:


    outcome:
    $bucket_name_count = count_distinct($create.target.resource.name)



  3. Conditional Aggregation: You could use a conditional to select a specific bucket name based on a criterion:


    outcome:
    $primary_bucket_name = max(if($create.target.resource.labels["primary"] = "true", $create.target.resource.name, ""))



Updated Rule:


Here's the updated rule with the array_distinct aggregation applied:


rule storage_bucket_creation_gcp {
meta:
description = "Identify the creation of new storage bucket in GCP"
severity = "Low"

events:
$create.metadata.event_type = "RESOURCE_CREATION"
$create.metadata.product_event_type = "storage.buckets.create"
$create.metadata.product_name = "Google Cloud Storage"
$create.metadata.vendor_name = "Google Cloud Platform"
$create.principal.user.userid = $test

match:
$test over 5m

outcome:
$bucket_name = array_distinct($create.target.resource.name)
$gcp_project_name = $create.target.cloud.project.name
$bucket_location = $create.target.location.name
$requesting_user = $create.principal.user.userid
$user_agent = $create.network.http.user_agent

condition:
$create
}

By applying aggregation in the outcome section, you address the ambiguity caused by repeated field unnesting and ensure each outcome variable has a single, defined value.


Thank you for the detailed response!

However, I'm still facing the error with the new rule - 

validating intermediate representation: event values in a match window must be aggregated

 Thank you


I modified the rule to this, and it is now functioning properly - 

rule storage_bucket_creation_gcp {
meta:
description = "Identify the creation of new storage bucket in GCP"
severity = "Low"

events:
$create.metadata.event_type = "RESOURCE_CREATION"
$create.metadata.product_event_type = "storage.buckets.create"
$create.metadata.product_name = "Google Cloud Storage"
$create.metadata.vendor_name = "Google Cloud Platform"
$create.principal.user.userid = $test

match:
$test over 1h

outcome:
$bucket_name = array_distinct($create.target.resource.name)
$gcp_project_name = array_distinct($create.target.cloud.project.name)
$bucket_location = array_distinct($create.target.location.name)
$requesting_user = array_distinct($create.principal.user.userid)
$user_agent = array_distinct($create.network.http.user_agent)

condition:
$create
}

Is it okay to do it this way?

Thank you!!!

 

 


Hi @Roni11,

The above looks great, nice work! Since you're using a match statement, you're essentially saying "group events based on the variable $test (which stores the value of principal.user.userid) over a 1 hour period". Within the outcome section, you will need to aggregate any output.

 

For example, let's say you have 2 events that occur in the match period (1 hour) and they carry two different user agents (in network.http.user_agent). Using the function array_distinct tells the rule to output the unique user agents across the grouped events. Note that the run frequency will also be a factor in event aggregation.
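
As a rough illustration of that aggregation, here is a minimal Python sketch (not YARA-L, and not how the rule engine is implemented). The event data and the two helper functions are hypothetical stand-ins named after the YARA-L functions; the first-seen ordering of the result is an assumption of this sketch only.

```python
# Hypothetical events that all fall in the same 1 hour match window for one user.
window = [
    {"userid": "alice", "user_agent": "gcloud/450.0"},
    {"userid": "alice", "user_agent": "terraform/1.6"},
    {"userid": "alice", "user_agent": "gcloud/450.0"},
]

def array_distinct(values):
    """Stand-in for YARA-L array_distinct: unique values, first-seen order (assumed)."""
    return list(dict.fromkeys(values))

def count_distinct(values):
    """Stand-in for YARA-L count_distinct: number of unique values."""
    return len(set(values))

user_agents = [event["user_agent"] for event in window]

# Each aggregate collapses the per-event values into one outcome value for the group.
distinct_agents = array_distinct(user_agents)
agent_count = count_distinct(user_agents)

print(distinct_agents)  # ['gcloud/450.0', 'terraform/1.6']
print(agent_count)      # 2
```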

Kind Regards,

Ayman


Sorry. My previous response missed an important aspect of the outcome section: when a match section is present, all event fields used in the outcome section, even non-repeated ones, must be aggregated. This is because the introduction of a match section changes the context from single-event to multi-event, causing all event fields to become effectively repeated.


Here's a more detailed explanation:


Multi-Event Context:



  • When a match section is used, the rule engine groups events based on the specified match variable and time window. Within each group (or "match window"), there can be multiple events.

  • Due to this grouping, any reference to an event variable in the outcome section now represents a set of values, one from each event within the group.

  • Even fields that are not explicitly marked as "repeated" in the UDM schema become effectively repeated in a multi-event context. For example, $create.target.cloud.project.name might seem like a single value, but because $create now represents a group of events, it's possible for different events within that group to have different project names.
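
The grouping described in these points can be pictured with a small Python sketch. It is only a conceptual model under stated assumptions: the events are made up, the field names are shortened, and every event is assumed to fall inside a single match window (real match windows are time-based).

```python
from collections import defaultdict

# Hypothetical bucket-creation events, reduced to the fields the rule uses.
events = [
    {"userid": "alice", "bucket": "logs-a", "project": "proj-1"},
    {"userid": "alice", "bucket": "logs-b", "project": "proj-2"},
    {"userid": "bob", "bucket": "backup", "project": "proj-1"},
]

def group_by_match_variable(events, key):
    """Group events by the match variable, as `$test over 1h` does conceptually.

    Simplification: all events are treated as belonging to one time window.
    """
    groups = defaultdict(list)
    for event in events:
        groups[event[key]].append(event)
    return dict(groups)

groups = group_by_match_variable(events, "userid")

# There is one group per user, but a bare field reference still yields
# one value per event in the group, so it cannot be a single outcome value.
alice_projects = [event["project"] for event in groups["alice"]]
print(alice_projects)  # ['proj-1', 'proj-2']
```

In this model the project name is a single value on each event, yet the group for "alice" carries two of them, which is why the outcome needs an aggregate to reduce them to one result.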


Repeated Field Unnesting:



  • As the previous response mentioned, Chronicle's rule engine performs "repeated field unnesting." In a multi-event context, this unnesting is applied to all event fields, not just those explicitly marked as repeated.

  • This unnesting process essentially expands the events within the match window, creating separate rows for each possible combination of values from all the fields. This leads to every field being treated as if it were a repeated field.


Why my original example didn't work:



  • The original fix only addressed the $create.target.resource.name field, which was explicitly recognized as potentially having multiple values due to its nature as a potentially dynamic field within a bucket creation event.

  • However, I missed the fact that all event fields referenced in the outcome section needed aggregation because of the multi-event context and the unnesting process.


Why Your Rule Works:



  • You correctly applied the array_distinct function to all event fields used in the outcome section: $bucket_name, $gcp_project_name, $bucket_location, $requesting_user, and $user_agent.

  • By aggregating each field, you ensure that the outcome section only assigns a single value to each output variable, even if the underlying event data contains multiple values within the 1-hour match window.

  • This satisfies the rule engine's requirement that "event values in a match window must be aggregated" and allows the rule to function correctly.
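
To make the "aggregate every outcome" requirement concrete, here is a hypothetical Python helper that scans rule text and lists outcome variables whose expression is not wrapped in an aggregate. It is a naive line-based sketch, not a YARA-L parser and not part of any Chronicle tooling; the list of aggregate names is an assumption and may be incomplete, and the helper does not check whether a match section is present.

```python
# Aggregate function names assumed for this sketch; consult the YARA-L
# reference for the authoritative list.
AGGREGATES = ("array_distinct", "array", "count_distinct", "count", "max", "min", "sum")

def unaggregated_outcomes(rule_text):
    """Return outcome variable names whose right-hand side does not start with an aggregate call."""
    flagged = []
    in_outcome = False
    for raw in rule_text.splitlines():
        line = raw.strip()
        # Section headers such as "events:" or "outcome:" end with a colon.
        if line.endswith(":") and not line.startswith("$"):
            in_outcome = line == "outcome:"
            continue
        if not in_outcome or "=" not in line:
            continue
        name, expr = (part.strip() for part in line.split("=", 1))
        if not expr.startswith(tuple(func + "(" for func in AGGREGATES)):
            flagged.append(name)
    return flagged

# Shortened version of the first suggested fix, where only one field was aggregated.
first_attempt = """
match:
$test over 5m

outcome:
$bucket_name = array_distinct($create.target.resource.name)
$gcp_project_name = $create.target.cloud.project.name

condition:
$create
"""

print(unaggregated_outcomes(first_attempt))  # ['$gcp_project_name']
```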


When a match section is present in a YARA-L 2.0 rule, remember that all event fields used in the outcome section, even non-repeated fields, must be aggregated. This is due to the multi-event context and repeated field unnesting, which effectively treat all fields as repeated within a match window. Your updated rule correctly addresses this by applying aggregation to all relevant fields. Sorry for the miss!

