
The following doc uses the term "predicate":

https://cloud.google.com/chronicle/docs/detection/yara-l-2-0-syntax

For example:

  • In either case, the predicate is true if the string contains a substring that matches the regular expression provided. It is unnecessary to add .* to the beginning or at the end of the regular expression.
  • In the events section, list the predicates to specify the following...
  • Some function expressions return boolean value, which can be used as an individual predicate in the events section.
  • When any is used, the predicate is evaluated as true if any value in the repeated field satisfies the condition.

What is a "predicate"? Can someone please break down this term for me in the context of YARA-L, or point me to a resource that explains this concept?

Thank you!

Thanks for the question!

I've always interpreted "predicates" in the documentation you're referring to as lines of criteria. That may not be the best way to describe it, but it's the best I've come up with when talking about building rules.

Take the sample rule below. The documentation states:

In the events section, list the predicates to specify the following:

  • What each match or placeholder variable represents
  • Simple binary expressions as conditions
  • Function expressions as conditions
  • Reference list expressions as conditions
  • Logical operators

Each line of criteria in the events section is what the docs refer to as a predicate. Most of these lines use a comparison operator (= or >) to compare a field with a value. The last line of the events section assigns a field to a placeholder variable, $hostname, which is also used as a match variable in the match section. This example doesn't use regular expressions, functions, or reference lists, but the same concept applies: each of these lines of criteria is a predicate.
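For comparison, here is a hypothetical rule (not from the docs) whose events section uses several of the predicate kinds from the list above. The rule name, the "web" hostname substring, the IP address, and the %hypothetical_watchlist reference list are all made up for illustration; treat it as an untested sketch of the syntax rather than a production rule.

```
rule hypothetical_predicate_examples {
  meta:
    description = "Illustrates several kinds of event predicates"

  events:
    // Simple binary expression: compares a field with a value
    $e.metadata.event_type = "NETWORK_CONNECTION"
    // Function expression that returns a boolean: true if the hostname
    // contains a substring matching the regex (no leading/trailing .* needed)
    re.regex($e.principal.hostname, `web`)
    // Reference list expression: %hypothetical_watchlist is a made-up list name
    $e.target.hostname in %hypothetical_watchlist
    // any: true if any value in the repeated principal.ip field matches
    any $e.principal.ip = "192.0.2.1"
    // Placeholder assignment: $hostname is reused in the match section
    $e.principal.hostname = $hostname

  match:
    $hostname over 30m

  condition:
    $e
}
```

Every line in that events section evaluates to true or false for a given event, which is what makes each one a predicate.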


In the events section, all predicates are regarded as anded together by default.


Notice that no AND or OR separates these lines in the events section; that follows from the statement above. We can group predicates with parentheses, and inside those parentheses the AND/OR operators are required.
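A short hypothetical fragment showing both behaviors (the event type, action, and user IDs are illustrative, not from the docs): the first two lines are implicitly ANDed, while the parenthesized group spells out its OR.

```
  events:
    // No operator between these two lines: they are implicitly ANDed
    $e.metadata.event_type = "USER_LOGIN"
    $e.security_result.action = "BLOCK"
    // Inside the parentheses the operator is written explicitly
    ($e.target.user.userid = "alice" or $e.target.user.userid = "bob")
```

Read as a whole, this selects blocked login events for either alice or bob.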


The final example I'll call out is in the condition section, where the docs state:


List condition predicates for outcome variables here, joined with the keyword and or or, or preceded by the keyword not.


The condition section includes the event variable $conn because the rule needs to take into account the event predicates specified in the events section. It also uses the outcome variables $total_bytes_received and $largest_bytes_received, which are calculated in the outcome section and act as condition predicates. Again, I think of these as criteria for the rule to trigger: both outcome variables must meet a threshold before the rule fires.



rule zeek_network_connections_bytes_received {
  meta:
    author = "Google Cloud Security"

  events:
    $conn.metadata.event_type = "NETWORK_CONNECTION"
    $conn.metadata.product_name = "Bro"
    $conn.metadata.vendor_name = "Zeek"
    $conn.metadata.product_event_type = "conn"
    $conn.metadata.description = "SF - Normal establish & termination"
    $conn.network.received_bytes > 0
    $conn.principal.hostname = $hostname

  match:
    $hostname over 30m

  outcome:
    $largest_bytes_received = max($conn.network.received_bytes)
    $smallest_bytes_received = min($conn.network.received_bytes)
    $total_bytes_received = sum($conn.network.received_bytes)

  condition:
    $conn and $total_bytes_received > 1000000 and $largest_bytes_received < 8000
}
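As a hypothetical variation (not part of the original rule), the outcome predicates in the condition section can also be joined with or instead of and. In this sketch the rule would fire for a hostname whose matched connections either total more than 1,000,000 bytes or include a single connection of 8,000 bytes or more; the thresholds are simply reused from the sample rule.

```
  condition:
    // $conn must still be satisfied, i.e. the event predicates from the events section
    $conn and ($total_bytes_received > 1000000 or $largest_bytes_received >= 8000)
```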

Hope this helps!

Thanks, John, for breaking it down.

I'm wondering why the docs don't just use the word "criteria" instead of "predicate" to describe this concept.


One way to look at the events section in YARA-L is as a big Boolean expression that selects a base set (or sometimes sets) of events for the rule. Seen through that lens, each element in the events section is indeed a "predicate" in the sense that it can be interpreted as a Boolean expression that evaluates to true or false. When the rule is processed, all these predicates are combined using Boolean logic (even when you see no Boolean operator between predicates, there is an implicit AND). So characterizing the elements as predicates is technically correct, but I certainly agree the wording could be clarified and the terminology simplified to serve a wider audience.
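To make that implicit AND visible, here is a sketch of the events section from the zeek_network_connections_bytes_received sample rule earlier in the thread, with the operators written out. This assumes explicit and is accepted between these predicates, consistent with the docs' "anded together by default" wording; the placeholder assignment is left on its own line, so it is still joined to the rest by the implicit AND.

```
  events:
    // One Boolean expression: the ANDs that were implicit are now explicit
    $conn.metadata.event_type = "NETWORK_CONNECTION" and
    $conn.metadata.product_name = "Bro" and
    $conn.metadata.vendor_name = "Zeek" and
    $conn.metadata.product_event_type = "conn" and
    $conn.metadata.description = "SF - Normal establish & termination" and
    $conn.network.received_bytes > 0
    // No operator here, so this predicate is implicitly ANDed with the expression above
    $conn.principal.hostname = $hostname
```

An event is selected only if the whole expression evaluates to true, which is the "big Boolean expression" view described above.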


Ah, yes! Thanks so much, @herrald.

That makes a lot of sense. Even in JS, there is a concept of predicate functions that return either true or false.

