Question

YARA-L rule for AWS provisioning activity from a previously unseen geographic region

  • May 18, 2026
  • 4 replies
  • 67 views

NASEEF

Hello Everyone,
Greetings..!!

I am trying to build a Google SecOps detection similar to a Splunk behavioral rule for “AWS provisioning activity from a previously unseen geographic region.”

Current logic:

  • Detect CloudTrail provisioning actions such as Run* and Create*

  • Baseline the source geolocation per activity, e.g. country, region, IP, city

  • Alert when provisioning comes from a new/unseen location

The issue is with the baseline scale.

The current baseline stores combinations of:

  • source IP

  • city

  • country

  • region/state

Because of this, the baseline has grown to around 600,000 rows.

In Splunk, this was implemented using a CSV lookup with historical tracking, but in SecOps we are limited by the Data Table size limit (~1000 rows), so the approach is not scalable as-is.

I am looking for recommendations on how to create a rule that alerts on AWS CloudTrail Run* or Create* operations from a new source IP, city, country, or region.

The expected behavior is:

When a new IP/location combination is seen for the first time, the rule should trigger an alert. After that, the same IP, city, country, and region combination should be treated as already seen and should not trigger again.
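
The expected behaviour amounts to a first-seen check against a persistent baseline. A minimal Python sketch of that logic, independent of SecOps, might look like this. The names `Origin`, `is_new_combination`, and `baseline` are hypothetical, and an in-memory set stands in for the data table:

```python
from typing import NamedTuple, Set


class Origin(NamedTuple):
    """One baseline row: the combination that must be new to raise an alert."""
    ip: str
    city: str
    country: str
    region: str


def is_new_combination(baseline: Set[Origin], origin: Origin) -> bool:
    """Return True (alert) the first time a combination is seen, then record it."""
    if origin in baseline:
        return False
    baseline.add(origin)
    return True


baseline: Set[Origin] = set()
first = is_new_combination(baseline, Origin("203.0.113.7", "Pune", "India", "Maharashtra"))
repeat = is_new_combination(baseline, Origin("203.0.113.7", "Pune", "India", "Maharashtra"))
other = is_new_combination(baseline, Origin("203.0.113.7", "Mumbai", "India", "Maharashtra"))
print(first, repeat, other)  # True False True
```

The whole tuple is the key, so the same IP seen with a different city counts as a new combination.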

 

Any guidance or architectural recommendations would be appreciated.
 

4 replies

hliu
  • Bronze 1
  • May 18, 2026

Per the documentation:
Maximum rows per data table: 10 million, well above the 600,000 rows you’ve described.
Maximum data volume in a data table: 10 GB.


Also, combinations of those four fields, especially source IP, can grow quickly, so remember to set a row TTL on the table or implement some cleanup logic.
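
As a rough illustration of the cleanup idea, the Python sketch below drops baseline rows whose last-seen time is older than a TTL. It is not the SecOps TTL feature itself; `prune_stale`, the key layout, and the 90-day value are assumptions chosen for the example:

```python
from typing import Dict, Tuple

Row = Tuple[str, str, str]  # (ip, city, country_or_region), a hypothetical key layout


def prune_stale(last_seen: Dict[Row, int], now: int, ttl_seconds: int) -> Dict[Row, int]:
    """Keep only rows whose last-seen epoch is within ttl_seconds of now."""
    return {row: ts for row, ts in last_seen.items() if now - ts <= ttl_seconds}


DAY = 86_400
table = {
    ("203.0.113.7", "Pune", "India"): 1_000 * DAY,   # seen 10 days before "now"
    ("198.51.100.9", "", "Germany"): 900 * DAY,      # seen 110 days before "now"
}
kept = prune_stale(table, now=1_010 * DAY, ttl_seconds=90 * DAY)
print(sorted(kept))
```

One consequence to keep in mind: once a row expires, the same combination will alert again the next time it appears.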


NASEEF
  • Author
  • Silver 2
  • May 18, 2026

Hello Team,

I have created the data table and faced a few challenges while developing the rule above.

The field principal.ip_geo_artifact.location.city was not being populated for any principal.ip in my instance, so I had to use $activity.principal.location.city instead. However, not all logs contain this field, so I used a coalesce logic to handle missing values. Because of this, I also had to modify the condition logic separately for cases where the city exists and where it does not.

Additionally, in SecOps, we do not have a separate field for region; instead, country and region are combined into a single field (country_or_region).

Additionally, I faced an issue while updating the lastSeen value. If the same event repeats, the rule correctly does not generate another alert, but I still need the lastSeen field in the data table to be updated with the latest occurrence time.

The requirement is to generate an alert whenever a new combination of IP, country, city, and region is observed performing a Run or Create event.

Could anyone please help me correct the logic?

events:
    $activity.metadata.log_type = "AWS_CLOUDTRAIL"
    // CloudTrail event names are capitalized (RunInstances, CreateBucket, ...)
    $activity.metadata.product_event_type = /^(Run|Create)/
    $src_ip = $activity.principal.ip
    $country = $activity.principal.ip_geo_artifact.location.country_or_region
    $city = strings.coalesce($activity.principal.location.city,"")
    $Region = $activity.principal.ip_geo_artifact.location.country_or_region

match:
    $src_ip,$country,$Region,$city over 10m

outcome:
 

    $IP_in_datatable = max(if($src_ip in %soc_datatable_previously_seen_provisioning_activity_src.sourceIPAddress,1,0))
    $country_in_datatable = max(if($country = %soc_datatable_previously_seen_provisioning_activity_src.Country,1,0))
    $Region_in_datatable = max(if($Region = %soc_datatable_previously_seen_provisioning_activity_src.Region,1,0))
    $city_in_datatable = max(if($city = %soc_datatable_previously_seen_provisioning_activity_src.City,1,0))
    $city_exist = max(if($city != "",1,0))

    // Standard fields
    $event_time = array_distinct(timestamp.get_timestamp($activity.metadata.event_timestamp.seconds,"%Y-%m-%d %H:%M:%S %Z","GMT"))
    $first_seen = timestamp.get_timestamp(min($activity.metadata.event_timestamp.seconds), "%Y-%m-%d %H:%M:%S %Z", "GMT")
    $last_seen = timestamp.get_timestamp(max($activity.metadata.event_timestamp.seconds), "%Y-%m-%d %H:%M:%S %Z", "GMT")
    $firstseen_in_epoch = cast.as_string(min($activity.metadata.event_timestamp.seconds))
    $lastseen_in_epoch = cast.as_string(max($activity.metadata.event_timestamp.seconds))

    // Additional fields
    $source_ip = array_distinct($activity.principal.ip)

condition:
    $activity and ((not( $IP_in_datatable =1 and ($country_in_datatable =1 or $Region_in_datatable =1)) and $city_exist =0)
    or
    ($city_exist =1 and not( $IP_in_datatable =1 and $city_in_datatable =1 and ($country_in_datatable =1 or $Region_in_datatable =1) ) ))

export:
    %soc_datatable_previously_seen_provisioning_activity_src.write_row(
        Country: $country,
        Region: $Region,
        City:$city,
        sourceIPAddress:$src_ip,
        firstTime:$firstseen_in_epoch,
        lastTime: $lastseen_in_epoch
    )
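
One thing worth checking in the condition above is whether each `*_in_datatable` flag is evaluated independently per column or against the same row. This is an assumption about how the rule behaves, not something confirmed in this thread. The Python sketch below, with hypothetical helper names and a two-column table, shows why the difference matters: an IP and a country that each exist in the table, but never together, pass the per-column check and would not alert.

```python
# Hypothetical baseline with two rows: (sourceIPAddress, Country)
rows = [
    ("203.0.113.7", "India"),
    ("198.51.100.9", "Germany"),
]


def seen_per_column(ip: str, country: str) -> bool:
    """Each value is looked up in its own column, ignoring which row it came from."""
    ips = {r[0] for r in rows}
    countries = {r[1] for r in rows}
    return ip in ips and country in countries


def seen_per_row(ip: str, country: str) -> bool:
    """The combination must match a single existing row."""
    return (ip, country) in set(rows)


candidate = ("203.0.113.7", "Germany")  # known IP, known country, new pairing
print(seen_per_column(*candidate), seen_per_row(*candidate))  # True False
```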


hliu
  • Bronze 1
  • May 19, 2026

Those Splunk ES correlations rely on the |iplocation command, which uses a proprietary third-party IP geolocation database file (by default, the free copy of MaxMind GeoLite2-City).
Because IP addresses carry no geographic information by design, Google must be doing something similar with its own proprietary IP geolocation database.
This also makes IP geolocation inherently imprecise, with accuracy ranging from a few kilometers to 1000 kilometers (at least in the case of MaxMind).

The referenced detection is based on a data set of cloud resource provisioning activities, listing the source/principal IP and its geolocation data.

I also don't see “city” populated in my CloudTrail logs (although region_latitude and longitude are available). Not sure if the missing “city” enrichment is a bug or a feature from Google.

In any case, even without knowing the accuracy of Google's database, city-level accuracy usually drops significantly, since many IPs are registered to a provider's central hub rather than the user's actual location.

The rule's false positive rate would be directly linked to the accuracy of the IP geolocation database. Using the field “city” would definitely increase the FP rate.

So for the rule, which needs at least two fields to function (ip and $place), I'd suggest building $place from a coalesce or concat of (state, country_or_region), or simply choosing one of the two fields.
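
To make the $place suggestion concrete, here is a small Python sketch of the concat-with-fallback idea. The function name `place_key`, the `|` separator, and the `"unknown"` placeholder are assumptions for illustration only:

```python
from typing import Optional


def place_key(state: Optional[str], country_or_region: Optional[str]) -> str:
    """Join whichever of state / country_or_region are populated into one key."""
    parts = [p.strip() for p in (state, country_or_region) if p and p.strip()]
    return "|".join(parts) if parts else "unknown"


both = place_key("Maharashtra", "India")
country_only = place_key(None, "India")
neither = place_key("", None)
print(both, country_only, neither)  # Maharashtra|India India unknown
```

Using one derived key keeps the baseline at two columns (ip, place), so a missing state no longer needs a separate branch in the condition.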



Regarding the conflict between updating lastSeen for every match and alerting only on new combinations:
I understand that in YARA-L "writing data to the data table must be the final action of the query." So it doesn't offer the flexibility of SPL's |outputlookup, which can be placed anywhere in the query.

I don't know whether there's a more elegant way to do it, but if not, try using two rules: a supporting rule that keeps the data table updated, and a main alerting rule that uses the data table for exclusions in the events section.
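
The two-rule split can be sketched in Python as two passes over the same batch of events. This only illustrates the intended division of work, not how SecOps schedules rules. `alerting_pass`, `maintenance_pass`, and the (ip, place) key are hypothetical, and the maintenance pass rewrites the full row (first time, last time) rather than a single field:

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]          # (ip, place)
Event = Tuple[str, str, int]   # (ip, place, epoch seconds)


def alerting_pass(table: Dict[Key, Tuple[int, int]], events: List[Event]) -> List[Key]:
    """Return combinations absent from the table; this pass never writes."""
    new: List[Key] = []
    for ip, place, _ in events:
        key = (ip, place)
        if key not in table and key not in new:
            new.append(key)
    return new


def maintenance_pass(table: Dict[Key, Tuple[int, int]], events: List[Event]) -> None:
    """Upsert every combination: keep the earliest first time, refresh the last time."""
    for ip, place, ts in events:
        key = (ip, place)
        first, last = table.get(key, (ts, ts))
        table[key] = (min(first, ts), max(last, ts))


table = {("203.0.113.7", "India"): (100, 100)}
batch = [("203.0.113.7", "India", 250), ("198.51.100.9", "Germany", 260)]
alerts = alerting_pass(table, batch)   # must see the table before it is updated
maintenance_pass(table, batch)
print(alerts, table)
```

The ordering matters in this sketch: if the maintenance pass handles a window before the alerting pass does, the new combination is already in the table and no alert is raised.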

Multi-stage queries or composite rules could also be options.

Just ideas, I haven't personally tested them using export datatable. Let us know if any of those worked!


NASEEF
  • Author
  • Silver 2
  • May 19, 2026

Hello Hliu,

As per my understanding, even if we create a new rule to update the “last time” field, it may not be possible to update only that single field (see the attached error).

From what I understand, SecOps requires the entire data table record to be updated rather than allowing modification of just one field. I’m also not sure whether the operation would update the existing row or create a completely new row instead.

Could you please confirm whether partial field updates are supported for data tables, or if full row replacement is required?



If possible, could you please write a rule that generates an alert whenever a new combination of IP, country, city, and region is observed performing a Run or Create event?
Assume this is a sample of the data table against which we need to check whether the IP has already been seen.