
Time Travel in the SOC: Troubleshooting Late-Triggering YARA-L Rules

May 6, 2026

Author: David Nehoda, Technical Solutions Consultant

 

The 11:47 Ticket

At 11:47 on a Tuesday, a detection engineer opens a P1. A YARA-L rule for AWS root console logins fired 14 hours after the event. The root account had logged in at 21:32 the previous night, performed IAM changes, and logged out. The rule was correct. The logic was tight. The tuning was sound. But the alert landed the next morning, long after the window to contain the session had closed.

The team spends two days on the wrong argument. The SIEM vendor gets blamed. The rule gets rewritten three times. An engineer drafts a migration plan to a different detection platform. None of that is the problem.

The problem is two timestamps.

metadata.event_timestamp on that UDM event is 2025-10-13T21:32:04Z. metadata.ingested_timestamp is 2025-10-14T11:29:51Z. Delta: 13 hours, 57 minutes. That rules out the YARA-L engine entirely. The event arrived 14 hours late. What followed was a 60-second check of the AWS CloudTrail feed polling interval. Someone had set it to 12 hours during a rate-limit test six months earlier and never changed it back. Two clicks to restore the 5-minute interval. Rule fires within SLA the next time. Two days of engineering argument wasted.

This article gives you the deterministic way to rule out the wrong layer fast.
 

Executive Summary

 

| Dimension | Undiagnosed Delay | Systematic Timestamp Audit |
| --- | --- | --- |
| Detection Latency | 6 to 24 hours (silent) | Under 5 minutes (near real-time) |
| Root Cause Isolation | Days to weeks of finger-pointing | Under 60 seconds via timestamp comparison |
| Engineering Waste | Weeks rewriting correct rules | Zero; fix targets the actual bottleneck |
| Attacker Dwell | Extends linearly with delay | Bounded by detection plus response time |
| Breach Exposure | $4.5M+ average with extended dwell | Contained by rapid detection |

 

Who this is for: Detection engineers, SOC analysts, and security architects who see YARA-L rules fire late in Google SecOps and need to isolate whether the problem is the log source, the ingestion pipeline, or the rule logic itself.
 

The Three Places Delay Lives

A late-triggering rule is a symptom, not a disease. The delay lives in exactly one of three layers:

  1. Origin Delay. The vendor or log source did not generate or deliver the logs in time.

  2. Ingestion Delay. The forwarder, feed, or parser pipeline held the log before it reached UDM.

  3. Evaluation Delay. The YARA-L rule engine itself is slow due to misconfigured match windows, expensive regex, or state explosion.

The diagnostic process is deterministic. Compare two timestamps, and the layer reveals itself.
 

Vendor Delivery Latency: The Table You Should Memorize

Before you debug anything, internalize what "on time" actually means. No SIEM configuration can make a log arrive faster than the source will send it.
 

| Vendor / Source | Typical Delivery Latency | Worst Case |
| --- | --- | --- |
| AWS CloudTrail (S3 polling) | 5 to 15 minutes | 30+ minutes during AWS service events |
| AWS CloudTrail (EventBridge) | Under 1 minute | 2 to 3 minutes |
| O365 Management Activity API | 5 to 30 minutes | Hours during Microsoft outages |
| Azure AD Sign-in Logs | 2 to 15 minutes | 30+ minutes |
| Azure Event Hub streaming | 2 to 5 minutes | 10 minutes |
| Okta System Log | 1 to 5 minutes | 15 minutes |
| GCP Cloud Audit Logs (Pub/Sub) | 1 to 3 minutes | 10 minutes |
| CrowdStrike Streaming API | Under 1 minute | 5 minutes |
| Duo Admin API | 2 to 10 minutes | 30 minutes |
| Salesforce EventLogFile | Up to 24 hours (hourly tier) | 24 hours (daily tier) |

 

Read this carefully. If your YARA-L rule fires 15 minutes after an AWS CloudTrail event, that is inside AWS's own SLA. The SIEM is working. AWS took 15 minutes to write the log into the S3 bucket the feed polls. Rewriting the rule will not shrink that gap. Switching to EventBridge will.
 

Step 1: The Timestamp Interrogation

Every UDM event carries two timestamps. Comparing them isolates the delay in under a minute.
 

| Timestamp | Set By | Meaning |
| --- | --- | --- |
| metadata.event_timestamp | The log source | When the event physically occurred on the endpoint, server, or cloud API |
| metadata.ingested_timestamp | SecOps | When SecOps received, parsed, and indexed the log into UDM |

 

Running the Comparison

  1. Open the SOAR case for the late-firing alert.

  2. Navigate to the underlying UDM event that triggered the detection.

  3. Inspect the raw JSON and extract both timestamps.

  4. Calculate ingested_timestamp - event_timestamp.
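
Worked on the opening incident's timestamps, the arithmetic is trivial; here is a minimal sketch in Python (any calculator works just as well):

from datetime import datetime

# Both UDM timestamps are RFC 3339 strings in UTC
def parse_ts(ts: str) -> datetime:
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

event = parse_ts("2025-10-13T21:32:04Z")     # metadata.event_timestamp
ingested = parse_ts("2025-10-14T11:29:51Z")  # metadata.ingested_timestamp

print(ingested - event)  # 13:57:47 -> the log arrived ~14 hours late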

Interpreting the Delta

 

| Delta | Diagnosis | Meaning |
| --- | --- | --- |
| Minutes to hours | Origin or Ingestion Delay | The log arrived late. The YARA-L engine evaluated it promptly. The rule is innocent; the problem is upstream. |
| Seconds or near-zero | Evaluation Delay | The log arrived on time, but the YARA-L rule took hours to fire. The engine is the bottleneck: match window, regex, or queue pressure. |
| Negative (event > ingested) | Clock Skew | The log source's system clock is ahead of UTC. Fix NTP on the source; temporal correlation cannot be trusted until you do. |

 

Most of the time, the delta is large. Most of the time, the rule is innocent. Believe the timestamps.

 

Step 2: Fixing Upstream Delays

If the interrogation proves the log arrived late, debug the pipeline, not the rule.

2A. Feed Polling Interval

For cloud-to-cloud API feeds (Okta, Azure AD, O365, AWS via S3), SecOps polls on a configured interval.

Diagnostic: Navigate to Settings > Feeds and inspect the polling interval for the affected log type.
 

| Interval | Effect | When to Use |
| --- | --- | --- |
| 1 minute | Near real-time for API feeds | Critical identity telemetry (Okta, Azure AD) |
| 5 minutes | Standard for most cloud feeds | Default for non-critical sources |
| 15 minutes | Cost-optimized for high-volume, low-priority sources | Network flow logs, verbose audit logs |
| 12 hours | Almost certainly a misconfiguration | Never appropriate for security telemetry |

 

The most common mistake in the field: an engineer sets the interval to 12 hours during initial testing to avoid API rate limits, forgets to change it back, and the SOC operates with a 12-hour blind spot for months. The opening incident of this article is exactly that mistake.

2B. Switch from Polling to Streaming

For critical detections, stop polling. Subscribe.

 

| Cloud | Polling Path | Streaming Path | Latency Improvement |
| --- | --- | --- | --- |
| AWS | CloudTrail > S3 > SecOps poll | CloudTrail > EventBridge > Lambda > Chronicle Ingestion API | 15 min to under 1 min |
| Azure | Activity Log > Storage account poll | Event Hub > Chronicle Forwarder | 15 min to 2-5 min |
| GCP | Cloud Audit > Storage poll | Cloud Audit > Pub/Sub > Chronicle | Already sub-minute; skip polling entirely |
| O365 | Management Activity API poll | Event Hub via Microsoft Graph connector | 30 min to 2-5 min |

 

The economics usually favor streaming. The engineering cost of one EventBridge rule and a Lambda is recovered the first time a real incident is detected fast enough to contain.
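
For the AWS row, a minimal sketch of the Lambda half of that path is below. The Ingestion API endpoint, payload shape, and token handling are assumptions to verify against your tenant's ingestion documentation; production code would mint OAuth credentials from the ingestion service account rather than read a static token from the environment.

import json
import os
import urllib.request

# Assumed endpoint and payload shape for the Chronicle Ingestion API
INGESTION_URL = "https://malachiteingestion-pa.googleapis.com/v2/unstructuredlogentries:batchCreate"

def handler(event, context):
    """EventBridge delivers the CloudTrail record in event["detail"]."""
    body = json.dumps({
        "customer_id": os.environ["CHRONICLE_CUSTOMER_ID"],
        "log_type": "AWS_CLOUDTRAIL",
        "entries": [{"log_text": json.dumps(event["detail"])}],
    }).encode()
    req = urllib.request.Request(
        INGESTION_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['CHRONICLE_TOKEN']}",  # assumed auth wiring
        },
    )
    with urllib.request.urlopen(req) as resp:
        return {"status": resp.status}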
 

2C. Overloaded Forwarder

If you run an on-prem Chronicle Forwarder, the host's resources directly impact ingestion latency.

Diagnostic signs:

  • Sustained CPU over 80%

  • Memory pressure causing swap usage

  • Disk I/O saturation on the syslog receiving buffer

  • Network saturation between forwarder and ingestion endpoint

Fix: Allocate more resources, or split the workload across multiple forwarders by log type. High-volume sources (firewall netflow, DNS queries) should not share a forwarder with low-volume, high-priority sources (identity logs, EDR alerts). One noisy neighbor can starve a critical feed.
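
As an illustration of the split, a forwarder instance dedicated to identity telemetry might carry a config like the sketch below. The YAML follows the general shape of the documented forwarder config (an output block plus a collectors list), but the identity values, port, and log type are placeholders; treat it as a sketch, not a drop-in file.

output:
  url: malachiteingestion-pa.googleapis.com:443
  identity:
    collector_id: <collector-id>      # placeholder
    customer_id: <customer-id>        # placeholder
    secret_key: |
      <service-account-json>          # placeholder

collectors:
  # Identity-only forwarder: low volume, high priority. Run firewall netflow
  # and DNS on a separate forwarder host so they cannot starve this feed.
  - syslog:
      common:
        enabled: true
        data_type: WINEVTLOG          # illustrative identity log type
        batch_n_seconds: 10
        batch_n_bytes: 1048576
      tcp_address: 0.0.0.0:10514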
 

Step 3: Fixing YARA-L Evaluation Delays

If logs arrived on time but the alert fired late, the rule engine is the bottleneck. Three common causes.

3A. The Sliding Window Trap

A rule with match: $target_ip over 24h forces the engine to maintain running state for every unique $target_ip across 24 hours. With millions of unique IPs, state management consumes massive memory and CPU, delaying evaluation for every rule on the tenant, not just the offending one.
 

Shrink match windows to the minimum viable timeframe for the actual attack pattern:
 

| Attack Pattern | Appropriate Window | Why |
| --- | --- | --- |
| Brute force to success | over 1h | The attack completes in minutes, not hours |
| Impossible travel | over 4h | Generous for international travel + VPN lag |
| Low-and-slow data exfiltration | over 24h | Justified; the attack is deliberately slow |
| Malware drop to execution | over 10m | Execution follows drop within seconds |
| Kerberoasting spray | over 1h | Spraying completes in minutes |

 

Rule of thumb: If the attack completes in minutes, the match window should be minutes. A 24-hour window on a brute force rule wastes engine resources for zero detection benefit.
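
In the rule itself, the fix is usually one line in the match section. A sketch for the brute-force row, with $user standing in for whatever match variable the rule declares:

// Before: 24 hours of state per unique user, for an attack that completes in minutes
match:
    $user over 24h

// After: window sized to the actual attack tempo
match:
    $user over 1h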

3B. Unanchored Regex (ReDoS)

Complex, unanchored regex applied to the full UDM dataset consumes exponential compute. The engine can throttle or disable the rule entirely.
 

// BAD: regex scans every single event in UDM
re.regex($e.target.process.command_line, `(?i).*invoke-mimikatz.*`)

// GOOD: filter narrows the dataset, regex evaluates a tiny subset
$e.metadata.event_type = "PROCESS_LAUNCH"
re.regex($e.target.process.command_line, `(?i).*invoke-mimikatz.*`)


Without the event_type filter, the regex runs against every UDM event including network connections, DNS queries, email transactions, and file creations, none of which will ever contain "invoke-mimikatz" but all of which must still be evaluated. The filter drops the evaluation surface by 95 percent or more.
 

Additional regex discipline:

  • Anchor patterns when possible. (?i)\\powershell\.exe$ is faster than (?i).*powershell.*.

  • Avoid nested quantifiers. (a+)+ causes catastrophic backtracking.

  • Use exact string operations when an exact match is sufficient. $e.target.process.file.full_path = "/usr/bin/curl" beats any regex.

Catch this before it ships. The YaraL Validator flags unanchored regex and missing event-type filters at commit time, not after a rule goes live and tanks tenant performance. Put it in CI.
 

3C. Tumbling Windows

YARA-L 2.0 Tumbling Windows segment data into fixed, non-overlapping intervals for deduplication. Unlike sliding windows (continuously evaluated), tumbling windows evaluate once at the end of each interval.
 

The trap: with a 1-hour tumbling window, an event arriving at 10:01 does not trigger an alert until 11:00 when the window closes.
 

| Use Tumbling For | Use Sliding For |
| --- | --- |
| "Alert once per hour if >100 failed logins occur" | "Alert the moment the 10th failed login within 10 minutes arrives" |
| Aggregate statistics, deduplication | Real-time attack chain detection |
| Volumetric alerts with defined cadence | Any detection where time-to-fire matters |

 

Default to sliding. Reach for tumbling only when batched aggregation is the actual requirement.
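
A minimal sketch of the failed-login case from the table. The event type and field values are illustrative, and whether your tenant's engine wants the explicit before/after sliding-window form in the match section should be checked against the YARA-L reference:

rule repeated_login_failure {
  meta:
    author = "detection-engineering"
    severity = "HIGH"

  events:
    $e.metadata.event_type = "USER_LOGIN"
    $e.security_result.action = "BLOCK"
    $e.target.user.userid = $user

  match:
    // Short window sized to the attack tempo, so the rule can fire
    // when the 10th failure lands, not at the top of the hour
    $user over 10m

  condition:
    #e >= 10
}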
 

Proactive Monitoring: Stop Waiting for Analysts to Complain

Do not wait for a detection engineer to open a ticket. Build automated health monitoring.

Ingestion Latency Dashboard

Schedule a UDM query that calculates the average delta between event_timestamp and ingested_timestamp per log type. Alert the Detection Engineering channel if any log type's average exceeds your threshold (typical: 15 minutes).
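
A sketch of the calculation, assuming you can export matching events (search API, BigQuery export) as dicts carrying the two metadata fields; the field paths mirror UDM, everything else is illustrative:

from collections import defaultdict
from datetime import datetime

THRESHOLD_SECONDS = 15 * 60  # alert when a log type's average exceeds 15 minutes

def parse_ts(ts: str) -> datetime:
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

def latency_breaches(events):
    """events: iterable of dicts with log_type, event_timestamp, ingested_timestamp."""
    sums, counts = defaultdict(float), defaultdict(int)
    for e in events:
        delta = parse_ts(e["ingested_timestamp"]) - parse_ts(e["event_timestamp"])
        sums[e["log_type"]] += delta.total_seconds()
        counts[e["log_type"]] += 1
    return {lt: sums[lt] / counts[lt] for lt in sums
            if sums[lt] / counts[lt] > THRESHOLD_SECONDS}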

Feed Health Check

Use the SecOps v1alpha/feeds API to programmatically verify each feed last polled within its expected interval. If a feed has not polled within 2x its configured interval, it is almost certainly broken. Page someone.
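
A hedged sketch of that check; the endpoint path and the two feed fields are assumptions based on the API family named above, so verify the exact schema against the feeds API reference before paging anyone:

import json
import os
import urllib.request
from datetime import datetime, timezone

BASE = "https://chronicle.googleapis.com/v1alpha"  # regional endpoints vary
PARENT = os.environ["SECOPS_INSTANCE"]  # projects/.../locations/.../instances/...

def list_feeds():
    req = urllib.request.Request(
        f"{BASE}/{PARENT}/feeds",
        headers={"Authorization": f"Bearer {os.environ['SECOPS_TOKEN']}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp).get("feeds", [])

def stale_feeds(feeds):
    """Flag feeds whose last poll is older than 2x the configured interval."""
    now = datetime.now(timezone.utc)
    stale = []
    for f in feeds:
        interval = int(f.get("pollIntervalSeconds", 300))                        # assumed field
        last = datetime.fromisoformat(f["lastPollTime"].replace("Z", "+00:00"))  # assumed field
        if (now - last).total_seconds() > 2 * interval:
            stale.append(f["name"])
    return stale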

Rule Evaluation Monitor

Track the delta between ingested_timestamp and detection_timestamp. If a specific rule consistently exceeds 5 minutes for this delta, that rule needs performance optimization. Open a ticket automatically.
 

Production Deploy Checklist

Before calling a late-trigger incident closed:

  1. Timestamp interrogation run. Delta calculated. Root cause layer identified.

  2. If Origin/Ingestion: polling interval verified under 5 minutes for all identity and critical-auth feeds.

  3. If AWS is involved: EventBridge path considered for critical detections.

  4. If Evaluation: match window audited against actual attack tempo. Shrunk where justified.

  5. If Evaluation: every regex in the offending rule has an event_type or equivalent pre-filter.

  6. Ingestion latency dashboard live, alerting on 15-minute threshold breach per log type.

  7. Feed health API check scheduled, paging when a feed misses 2x its interval.

  8. Rule evaluation monitor live for all CRITICAL-severity detections.

  9. Runbook updated: "Every late alert ticket starts with the timestamp interrogation. Not with a rule rewrite."

  10. Post-incident: if a feed misconfiguration was the cause, infrastructure-as-code the feed definition so a human cannot set 12-hour polling by hand next time.
     

The Diagnostic Truth

Late-triggering rules reduce to a single delta with three outcomes.

  • Large delta: upstream delay. Fix the pipeline.

  • Zero delta: evaluation delay. Fix the rule.

  • Negative delta: clock skew. Fix NTP.

Every time a rule fires late, run the interrogation first. Most of the time, the engine is innocent and the delay lives in a vendor's API, a misconfigured feed, or an under-resourced forwarder. The rewrites and vendor-blame sessions you avoid are the real ROI.