The issue arises because the outcome section is trying to access the field $create.target.resource.name without proper aggregation when a match section is present.
Here's a breakdown of why:
Single-Event vs. Multi-Event Context: When a match section is used, the rule transitions from a single-event to a multi-event context. This fundamentally changes how variables are interpreted. In a single-event context, $create would represent a single event. However, with a match section, $create now represents a group of events within the specified time window.
Repeated Field Unnesting: The field $create.target.resource.name is not inherently a repeated field. However, in a multi-event context, it becomes effectively repeated because $create represents multiple events, potentially each with a different target.resource.name. This is due to a process called "repeated field unnesting," where Chronicle expands repeated fields to create individual rows for each element within those repeated fields.
Outcome Aggregation Requirement: In the outcome section, each variable assignment must result in a single value. However, with the match section and repeated field unnesting, $create.target.resource.name could represent multiple distinct values. Therefore, it must be aggregated to produce a single output for the $bucket_name outcome.
The Solution:
To fix this issue, you need to apply an aggregation function to $create.target.resource.name within the outcome section. Here are a few options:
array_distinct: This will collect all unique bucket names within the matching events:
outcome:
$bucket_name = array_distinct($create.target.resource.name)
count_distinct: This will count the number of distinct bucket names created within the time window:
outcome:
$bucket_name_count = count_distinct($create.target.resource.name)
Conditional Aggregation: You could use a conditional to select a specific bucket name based on a criterion:
outcome:
$primary_bucket_name = max(if($create.target.resource.labels["primary"] = "true", $create.target.resource.name, ""))
Updated Rule:
Here's the updated rule with the array_distinct aggregation applied:
rule storage_bucket_creation_gcp {
  meta:
    description = "Identify the creation of new storage bucket in GCP"
    severity = "Low"
  events:
    $create.metadata.event_type = "RESOURCE_CREATION"
    $create.metadata.product_event_type = "storage.buckets.create"
    $create.metadata.product_name = "Google Cloud Storage"
    $create.metadata.vendor_name = "Google Cloud Platform"
    $create.principal.user.userid = $test
  match:
    $test over 5m
  outcome:
    $bucket_name = array_distinct($create.target.resource.name)
    $gcp_project_name = $create.target.cloud.project.name
    $bucket_location = $create.target.location.name
    $requesting_user = $create.principal.user.userid
    $user_agent = $create.network.http.user_agent
  condition:
    $create
}
By applying aggregation in the outcome section, you address the ambiguity caused by repeated field unnesting and ensure each outcome variable has a single, defined value.
Thank you for the detailed response!
However, I'm still facing the error with the new rule -
validating intermediate representation: event values in a match window must be aggregated
Thank you
I modified the rule to this, and it is now functioning properly -
rule storage_bucket_creation_gcp {
  meta:
    description = "Identify the creation of new storage bucket in GCP"
    severity = "Low"
  events:
    $create.metadata.event_type = "RESOURCE_CREATION"
    $create.metadata.product_event_type = "storage.buckets.create"
    $create.metadata.product_name = "Google Cloud Storage"
    $create.metadata.vendor_name = "Google Cloud Platform"
    $create.principal.user.userid = $test
  match:
    $test over 1h
  outcome:
    $bucket_name = array_distinct($create.target.resource.name)
    $gcp_project_name = array_distinct($create.target.cloud.project.name)
    $bucket_location = array_distinct($create.target.location.name)
    $requesting_user = array_distinct($create.principal.user.userid)
    $user_agent = array_distinct($create.network.http.user_agent)
  condition:
    $create
}
Is it okay to do it this way?
Thank you!!!
Hi @Roni11,
The above looks great - nice work! Since you're using a match statement, you're essentially saying "group events based on the variable test" (which stores the value in principal.user.userid) over a 1-hour period. Within the outcome section, you will need to aggregate any output.
For example, let's say you have 2 events that occur in the match period (1 hour), and they have two different user agents (in network.http.user_agent). Using the function array_distinct tells the engine to output the unique user agents across the grouped events. Note that the run frequency will also be a factor in event aggregation.
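To make that concrete, here is a small sketch that extends the example to three hypothetical events for the same user inside one match window. The user-agent strings and the extra outcome variable names are invented for illustration, and the comments describe the values I'd expect from each function rather than captured output. array and count_distinct are shown only for contrast with array_distinct.

```
outcome:
  // Hypothetical events grouped under the same $test value:
  //   event 1: network.http.user_agent = "agent-A"
  //   event 2: network.http.user_agent = "agent-B"
  //   event 3: network.http.user_agent = "agent-A"

  // Unique values only: "agent-A" and "agent-B"
  $user_agent = array_distinct($create.network.http.user_agent)
  // One entry per event, duplicates kept
  $user_agent_all = array($create.network.http.user_agent)
  // Number of unique values: 2
  $user_agent_count = count_distinct($create.network.http.user_agent)
```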
Kind Regards,
Ayman
Sorry. My previous response missed a key requirement of the outcome section when a match section is used: all event fields used in the outcome section, even non-repeated ones, must be aggregated when a match section is present. This is because the introduction of a match section changes the context from single-event to multi-event, causing all event fields to become effectively repeated.
Here's a more detailed explanation:
Multi-Event Context:
- When a match section is used, the rule engine groups events based on the specified match variable and time window. Within each group (or "match window"), there can be multiple events.
- Due to this grouping, any reference to an event variable in the outcome section now represents a set of values, one value from each event within the group.
- Even fields that are not explicitly marked as "repeated" in the UDM schema become effectively repeated in a multi-event context. For example, $create.target.cloud.project.name might seem like a single value, but because $create now represents a group of events, it's possible for different events within that group to have different project names.
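For contrast, here is a minimal sketch of the single-event form, assuming you don't need to group by user at all. It drops the match section and the $test placeholder, so each detection is based on one event and, as far as I know, the outcome section can then reference event fields directly. The rule name is hypothetical and I have not run this against your data.

```
rule storage_bucket_creation_gcp_single_event {
  meta:
    description = "Identify the creation of new storage bucket in GCP"
    severity = "Low"
  events:
    $create.metadata.event_type = "RESOURCE_CREATION"
    $create.metadata.product_event_type = "storage.buckets.create"
    $create.metadata.product_name = "Google Cloud Storage"
    $create.metadata.vendor_name = "Google Cloud Platform"
  outcome:
    // No match section, so each detection covers a single event
    // and these values are not aggregated.
    $bucket_name = $create.target.resource.name
    $requesting_user = $create.principal.user.userid
  condition:
    $create
}
```

The trade-off is one detection per bucket creation rather than one per user per time window.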
Repeated Field Unnesting:
- As the previous response mentioned, Chronicle's rule engine performs "repeated field unnesting." In a multi-event context, this unnesting is applied to all event fields, not just those explicitly marked as repeated.
- This unnesting process essentially expands the events within the match window, creating separate rows for each possible combination of values from all the fields. This leads to every field being treated as if it were a repeated field.
Why my original example didn't work:
- The original fix only addressed the $create.target.resource.name field, which was explicitly recognized as potentially having multiple values due to its nature as a potentially dynamic field within a bucket creation event.
- However, I missed the fact that all event fields referenced in the outcome section needed aggregation because of the multi-event context and the unnesting process.
Why Your Rule Works:
- You correctly applied the array_distinct function to all event fields used in the outcome section: $bucket_name, $gcp_project_name, $bucket_location, $requesting_user, and $user_agent.
- By aggregating each field, you ensure that the outcome section only assigns a single value to each output variable, even if the underlying event data contains multiple values within the 1-hour match window.
- This satisfies the rule engine's requirement that "event values in a match window must be aggregated" and allows the rule to function correctly.
When a match section is present in a YARA-L 2.0 rule, remember that all event fields used in the outcome section, even non-repeated fields, must be aggregated. This is due to the multi-event context and repeated field unnesting, which effectively treat all fields as repeated within a match window. Your updated rule correctly addresses this by applying aggregation to all relevant fields. Sorry for the miss!
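If a list isn't what you want for every field, other aggregation functions satisfy the same requirement. Here is a sketch, assuming the same events and match sections as your rule. The outcome variable names other than $bucket_name are ones I made up for illustration, and I have not validated this against your tenant.

```
outcome:
  // Unique bucket names created by this user in the window
  $bucket_name = array_distinct($create.target.resource.name)
  // How many distinct buckets were created
  $bucket_count = count_distinct($create.target.resource.name)
  // Number of matching events in the window
  $event_count = count_distinct($create.metadata.id)
  // Earliest and latest event time, in epoch seconds
  $first_seen = min($create.metadata.event_timestamp.seconds)
  $last_seen = max($create.metadata.event_timestamp.seconds)
```

Counts and min/max give you numeric outcome values, which can be easier to work with than arrays when you only need a summary of the window.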