Author: Ivan Ninichuck
The Definition of UEBA
User and Entity Behavior Analytics (UEBA) represents a fundamental shift from traditional, signature-based security monitoring. At its core, UEBA is an analytical approach designed to detect security threats by identifying anomalous or malicious behaviors, as opposed to relying solely on known threat indicators (like malware signatures or malicious IP addresses). It operates on the principle of "knowing normal to find abnormal." The system ingests vast amounts of data from diverse sources—such as logs from endpoints, networks, servers, and cloud applications—to build a comprehensive, dynamic baseline of normal behavior for every single user (e.g., employees, contractors) and entity (e.g., servers, workstations, cloud workloads, service accounts) within the environment. Detections are triggered when a user or entity significantly deviates from this established pattern. This makes UEBA exceptionally powerful for identifying threats that evade traditional rules, such as insider threats (whether malicious or accidental), compromised accounts, and the subtle, low-and-slow activities of advanced persistent threats (APTs).
Methods of UEBA
To accomplish this, UEBA employs a sophisticated, multi-layered analytical engine that doesn't rely on a single technique. These methods work in concert to build a high-fidelity risk profile for each user and entity, moving from broad anomaly detection to highly specific threat identification. Within the Google SecOps platform, these methods can be broadly understood as two complementary pillars. The first is the large-scale statistical analysis required to define "normal" and flag outliers across billions of events. The second is the application of curated, intelligence-driven rules that look for specific behaviors corresponding to known adversary Tactics, Techniques, and Procedures (TTPs). By blending these two approaches, the system can surface both completely novel anomalies and known attack patterns, providing analysts with context-rich alerts that are far more actionable than simple statistical deviations alone.
Statistical Analysis of Data
The foundation of any UEBA system is its ability to perform statistical analysis at scale. This method involves building a dynamic, multi-dimensional profile for every user and entity, which is continuously updated as new data is ingested. This profile, or "baseline," models many different variables: What are the typical hours this user works? What geographic locations do they normally log in from? What assets and servers do they typically access? What is the normal volume of data they upload or download? What processes are common on their workstation? When new activity occurs, the UEBA engine compares it against this learned baseline. An alert is triggered when an event or a combination of events is determined to be a significant statistical outlier (e.g., a "3-sigma event"). For example, a user who has never used PowerShell, suddenly logging in at 3:00 AM from a new country, and running obfuscated commands on a high-value server would be a massive statistical anomaly, instantly flagged for review even if no single part of that action matched a predefined "bad" signature.
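In statistical terms, the "3-sigma event" shorthand above is a simple outlier test against the learned baseline. As a general formulation (not a product-specific formula), an observation is flagged when

$x_{\text{new}} > \mu_{\text{baseline}} + 3\,\sigma_{\text{baseline}}$

where $\mu_{\text{baseline}}$ and $\sigma_{\text{baseline}}$ are the mean and standard deviation of that behavior over the learning window. The sections that follow show how these baseline values are actually computed and queried.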
How Baselines are Built: Metrics Functions
The statistical baselines described above are built and queried using a powerful set of Metrics Functions. These are the technical tools within detection rules that allow an analyst to access and aggregate historical data for any user or entity. They are the mechanism that lets a rule compare a new, incoming event against "what's normal" for that specific entity.
These functions work by defining two key time-based parameters:
- period: This defines the "slice" of time for each data point (e.g., 1h for hourly, 1d for daily).
- window: This defines the total historical look-back window (e.g., 30d for 30 days of history).
When writing a rule, you query a pre-calculated metric for an entity. The five key metric types available are:
- event_count_sum: The total count of matching events (e.g., number of successful logins).
- first_seen: The timestamp when this activity was first observed. This is ideal for "first time" detections (e.g., "first time user has run this process").
- last_seen: The timestamp when this activity was last observed.
- value_sum: A sum of a numerical value, typically used for measuring data volumes (e.g., total bytes uploaded).
- num_unique_filter_values: A count of unique items (e.g., number of unique applications a user accessed).
Finally, you apply an Aggregation Method to this data to get a single, comparable value from the historical window. This allows you to ask precise questions:
- avg: What is the average number of failed logins per day?
- max / min: What is the maximum or minimum data a user has ever uploaded in a single hour?
- sum: What is the total number of logins over the last 30 days?
- stddev: What is the standard deviation of this activity? This is critical for finding true statistical outliers (e.g., "alert if today's activity is 3 standard deviations above the average").
- num_metric_periods: In how many of the historical periods (e.g., days, when the period is 1d) was this activity observed? This helps distinguish between a truly new event (value is 0) and an infrequent one.
By combining these functions, a detection engineer can translate a broad concept like "unusual login" into a specific, high-fidelity rule like: "Alert if event_count_sum of successful logins in a 1h period is greater than the avg + (3 * stddev) calculated over a 30d window."
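As a rough illustration of that sentence, and not a curated rule, the same logic could be sketched in YARA-L as shown below. The call shape mirrors the metrics.network_bytes_inbound calls in the curated rule later in this guide; the metric name metrics.auth_attempts_success and the principal.user.userid dimension are assumptions made for the sake of the example, so check the product documentation for the metrics actually available in your tenant.

rule example_anomalous_successful_login_volume {

  meta:
    rule_name = "Example: Anomalous Successful Login Volume"
    description = "Illustrative sketch only: flags a user whose successful logins in the last hour exceed their 30-day hourly baseline by more than 3 standard deviations."
    severity = "Low"

  events:
    $e.metadata.event_type = "USER_LOGIN"
    $e.security_result.action = "ALLOW"
    $user = $e.principal.user.userid
    $user != ""

  match:
    $user over 1h

  outcome:
    // Successful logins observed for this user in the current 1-hour window
    $logins_past_hour = count_distinct($e.metadata.id)
    // Dynamic threshold: 30-day hourly average plus 3 standard deviations
    $historical_threshold = max(
        metrics.auth_attempts_success(
            period:1h, window:30d,
            metric:event_count_sum, agg:avg,
            principal.user.userid:$user) + 3 * metrics.auth_attempts_success(
            period:1h, window:30d,
            metric:event_count_sum, agg:stddev,
            principal.user.userid:$user))

  condition:
    $e and $logins_past_hour > $historical_threshold
}

Note how the threshold is precomputed as a single outcome variable and only compared in the condition section; the curated rule in the next section follows the same pattern.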
Putting It All Together: A Real-World UEBA Rule Example
Let's examine a real, curated UEBA detection rule to see how these concepts work in practice.
Rule Name: Anomalous Network Bytes Inbound By Hostname
rule ueba_anomalous_inbound_bytes_by_device_hostname {

  meta:
    rule_name = "Anomalous Network Bytes Inbound By Hostname"
    description = "Detects anomalous (2 standard deviations away from the historical average, if the entity has a coefficient of variation < 0.1 and was observed for at least 9 of the last 30 days) network bytes inbound by device hostname. This may be indicative of potential Exfiltration behavior."
    severity = "Low"
    tactic = "TA0010"

  events:
    $e.metadata.event_type = "NETWORK_CONNECTION"
    $e.network.received_bytes > 0
    $hostname = $e.principal.asset.hostname
    $hostname != ""
    // Sample roughly 1 in 25 matching events (the outcome scales byte counts back up by 25)
    optimization.sample_rate($e.metadata.id, 1, 25)
    $principal_ns = $e.principal.namespace

  match:
    $hostname, $principal_ns over 24h

  outcome:
    $risk_score = 35
    $event_count = count_distinct($e.metadata.id)
    // Multiply by 25 to compensate for the 1-in-25 sampling above
    $usage_past_24h = sum(25.0 * $e.network.received_bytes)
    $num_stddevs_away = max(2)
    $historical_stddev = max(metrics.network_bytes_inbound(
        period:1d, window:30d,
        metric:value_sum, agg:stddev,
        principal.asset.hostname:$hostname,
        principal.namespace:$principal_ns))
    $historical_avg = max(metrics.network_bytes_inbound(
        period:1d, window:30d,
        metric:value_sum, agg:avg,
        principal.asset.hostname:$hostname,
        principal.namespace:$principal_ns))
    $historical_max = max(metrics.network_bytes_inbound(
        period:1d, window:30d,
        metric:value_sum, agg:max,
        principal.asset.hostname:$hostname,
        principal.namespace:$principal_ns))
    $historical_observations = max(metrics.network_bytes_inbound(
        period:1d, window:30d,
        metric:value_sum, agg:num_metric_periods,
        principal.asset.hostname:$hostname,
        principal.namespace:$principal_ns))
    // Coefficient of variation = 30-day stddev divided by 30-day average
    $historical_coefficient_of_variation = max(
        metrics.network_bytes_inbound(
            period:1d, window:30d,
            metric:value_sum, agg:stddev,
            principal.asset.hostname:$hostname,
            principal.namespace:$principal_ns) /
        metrics.network_bytes_inbound(
            period:1d, window:30d,
            metric:value_sum, agg:avg,
            principal.asset.hostname:$hostname,
            principal.namespace:$principal_ns))
    $observation_threshold = max(9)
    $coefficient_of_variation_threshold = max(1/10)
    // Dynamic alerting threshold: 30-day average plus 2 standard deviations
    $historical_threshold = max(
        metrics.network_bytes_inbound(
            period:1d, window:30d,
            metric:value_sum, agg:avg,
            principal.asset.hostname:$hostname,
            principal.namespace:$principal_ns) + 2 * metrics.network_bytes_inbound(
            period:1d, window:30d,
            metric:value_sum, agg:stddev,
            principal.asset.hostname:$hostname,
            principal.namespace:$principal_ns))
    $vendor_name = array_distinct($e.metadata.vendor_name)
    $product_name = array_distinct($e.metadata.product_name)
    $result_time = min($e.metadata.event_timestamp.seconds)
    $tmp1 = max(
        if($e.security_result.action != "BLOCK" and $e.security_result.action != "UNKNOWN_ACTION", 2)
    )
    $tmp2 = max(
        if($e.security_result.action = "BLOCK", 1)
    )
    $result = arrays.index_to_str(strings.split("attempted,failed,succeeded,succeeded"), $tmp1 + $tmp2)

  condition:
    $e
    and ($usage_past_24h > $historical_threshold)
    and ($historical_coefficient_of_variation < $coefficient_of_variation_threshold)
    and ($historical_observations >= $observation_threshold)
    and $event_count >= 100

  options:
    allow_zero_values = true
}
Intent: This rule aims to detect a potential data staging attack. Before an attacker exfiltrates large amounts of data, they often first collect and aggregate it on a single compromised machine. This "staging" activity would appear as a sudden, massive influx of inbound data to a device that normally doesn't receive that much.
Logic: A detection is generated if a device's total inbound data (using the value_sum of bytes) in a 24-hour period is more than 2 standard deviations (stddev) above its 30-day historical average (avg).
This logic directly applies the statistical methods we've discussed. Instead of a static threshold like "alert if > 10GB," which would be noisy, this rule builds a dynamic threshold customized for every single device. A 500MB influx might be an anomaly for a domain controller, but perfectly normal for a developer's workstation. The stddev calculation finds this distinction automatically.
Example: How the Standard Deviation Threshold Works
Let's say a server has been observed for 9 days (meeting the data sufficiency filter) with the following daily inbound traffic:
[100, 105, 95, 100, 110, 90, 100, 105, 95] (all in MB).
- Calculate the Mean (Average):
- Sum: 100 + 105 + 95 + 100 + 110 + 90 + 100 + 105 + 95 = 900 MB
- Mean: 900 MB / 9 days = 100 MB (This is the historical_avg)
- Calculate the Variance:
- First, find the sum of the squared differences from the mean:
- (100-100)² + (105-100)² + (95-100)² + (100-100)² + (110-100)² + (90-100)² + (100-100)² + (105-100)² + (95-100)²
- 0 + 25 + 25 + 0 + 100 + 100 + 0 + 25 + 25 = 300
- Variance: 300 / 9 = 33.33
- Calculate the Standard Deviation:
- Standard Deviation: sqrt(33.33) = 5.77 MB (This is the stddev)
- Set the Alerting Threshold:
- The rule triggers at Mean + (2 * StdDev).
- Threshold: 100 MB + (2 * 5.77 MB) = 100 MB + 11.54 MB = 111.54 MB
Result: The rule will now only fire an alert if this specific server receives more than 111.54 MB of inbound data in a 24-hour period. If on the 10th day it receives 115 MB, an alert is generated.
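For reference, the arithmetic above uses the population form of the standard deviation (dividing by n, the number of observed days, rather than n - 1):

$\sigma = \sqrt{\tfrac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2} = \sqrt{300/9} \approx 5.77 \text{ MB}, \qquad \text{threshold} = \bar{x} + 2\sigma \approx 111.5 \text{ MB}$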
The Importance of Data Quality: Reducing False Positives
This rule's real power comes from two data quality filters that must be passed before the main logic is even applied. This is a critical best practice for reducing false positives.
- The Stability Filter (Coefficient of Variation): The rule only monitors devices with a Coefficient of Variation (CV) less than 0.1. The CV is a ratio of the standard deviation to the mean, which measures how "spiky" or "erratic" a device's traffic history is.
- Low CV (< 0.1): A "stable" device, like a server, whose traffic is consistent day-to-day. This rule monitors this device, because an anomaly here is highly suspicious.
- High CV (> 0.1): A "noisy" device, like a user's laptop, with erratic traffic (e.g., 50MB one day, 5GB the next). This rule ignores this device, as its "normal" is too chaotic to find meaningful statistical anomalies, preventing a flood of false-positive alerts.
Example: How the Coefficient of Variation (CV) Filter Works
Using the same server data from the previous example:
- Mean: 100 MB
- Standard Deviation: 5.77 MB
- Calculate the Coefficient of Variation:
- CV = Standard Deviation / Mean
- CV: 5.77 MB / 100 MB = 0.0577
Result: The calculated CV of 0.0577 is less than 0.1. This proves the device's traffic is "stable" and predictable. Therefore, the device passes the filter and will be monitored by the rule.
- The Data Sufficiency Filter: The rule also requires that the device has been observed on at least 9 of the last 30 days. This ensures the system has enough historical data to build a reliable baseline. Trying to calculate an "average" from only one or two data points is statistically meaningless and would lead to bad detections.
In summary, the Coefficient of Variation acts as a gatekeeper to decide which devices to monitor, while the Standard Deviation acts as the trigger to decide when to alert on those devices.
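Putting the trigger and both gates together, the rule's alerting condition for a given device over its 24-hour match window can be summarized as follows (the notation is mine, not taken from the rule):

$B_{24h} > \mu_{30d} + 2\,\sigma_{30d} \quad\text{and}\quad \frac{\sigma_{30d}}{\mu_{30d}} < 0.1 \quad\text{and}\quad N_{\text{observed days}} \ge 9$

where $B_{24h}$ is the sampling-adjusted inbound byte count in the current window and $\mu_{30d}$, $\sigma_{30d}$ are the 30-day daily average and standard deviation. The rule's condition section adds one more guard, requiring at least 100 distinct matching events in the window.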
TTP Level Detections
While statistical analysis is excellent at finding "the unknown," it can sometimes be noisy. TTP-level detections provide the crucial layer of intent and context. This method moves beyond simply asking, "Is this behavior unusual?" and instead asks, "Is this behavior unusual in a way that maps to a known adversary technique?" By leveraging security frameworks like the MITRE ATT&CK knowledge base, Google SecOps maintains a rich library of detection rules that look for specific, discrete behaviors associated with known attack tactics. For example, a statistical anomaly might just flag "unusual process execution." A TTP-level detection, however, would identify that a user ran a command associated with credential dumping (e.g., Mimikatz, T1003) or that a process is attempting to discover network shares (e.g., net view, T1135). These detections provide immediate, actionable context to the analyst. They explain why an anomaly is suspicious by linking it directly to a specific stage of an attack, such as initial access, defense evasion, or lateral movement. This fusion of statistical anomaly detection with intelligence-driven TTP rules allows Google SecOps to surface high-priority, contextualized threats from the noise.
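To make the contrast concrete, a TTP-level rule is typically a short, targeted pattern rather than a statistical comparison. The sketch below, which is illustrative and not a curated rule, flags net view executions associated with Network Share Discovery; the exact UDM fields and regular expression would need tuning for your log sources.

rule example_ttp_network_share_discovery {

  meta:
    rule_name = "Example: Network Share Discovery via net view"
    description = "Illustrative sketch only: flags execution of the net view command, associated with MITRE ATT&CK T1135 (Network Share Discovery)."
    severity = "Low"
    tactic = "TA0007"
    technique = "T1135"

  events:
    $e.metadata.event_type = "PROCESS_LAUNCH"
    // Match "net view" (or "net1 view") in the launched command line
    $e.target.process.command_line = /net1?\s+view/ nocase

  condition:
    $e
}

Unlike the statistical rule above, this fires on a single event and carries its ATT&CK context directly in its metadata, which is what gives the analyst immediate insight into intent.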
Use Case: Detecting the "Low and Slow" Insider Threat
The concepts of statistical UEBA and TTP-based rules come together powerfully when applied to one of the most difficult security challenges: the insider threat. This can be a malicious employee, or an external attacker who has successfully stolen an employee's credentials.
The Insider Threat Challenge
The core problem with detecting an insider threat is that the attacker isn't "breaking in." They are already inside the network, using legitimate credentials and access. Their activity often looks "normal" at a glance. A rule that "blocks a malicious IP" is useless when the "attacker" is logging in from a trusted employee's laptop.
This is precisely where UEBA is essential. By building a deep baseline of what is "normal" for that specific employee, we can spot the subtle deviations that signal their account is being used for malicious purposes.
The Attack Scenario
Consider this "low and slow" data theft scenario:
- Initial Access: A disgruntled employee (User_A) uses social engineering to acquire the credentials of a colleague (User_B) who has legitimate access to a sensitive Google Cloud Storage bucket.
- Discovery & Staging: User_A (posing as User_B) logs in. To avoid detection, they operate during User_B's normal working hours. They access the sensitive bucket, compress the data, and rename the files to look benign (e.g., archive_logs.zip).
- Exfiltration: The attacker wants to avoid a single, large data transfer that would trigger an alert (like the Anomalous Inbound Bytes rule, but for outbound traffic). Instead, they split the compressed file into hundreds of small chunks.
- Defense Evasion: Over the next 30 days, a script exfiltrates these small chunks, one by one, using a VPN and tunneling the traffic over port 443 to make it look like legitimate HTTPS web traffic.
- Cleanup: Once the transfer is complete, the attacker "timestomps" (modifies the timestamps) the access logs for the storage bucket to hide the activity.
MITRE ATT&CK Mapping
This attack chain can be mapped directly to adversary TTPs:
- Initial Access: T1078 (Valid Accounts)
- Collection: T1560 (Archive Collected Data)
- Exfiltration: T1041 (Exfiltration Over C2 Channel) & T1029 (Scheduled Transfer)
- Defense Evasion: T1070.006 (Indicator Removal: Timestomp)
Detection Strategy: Composite Rules & UEBA
No single one of these actions is a high-fidelity "smoking gun." A user zipping a file is normal. A user accessing a bucket is normal. But when combined, they tell a story. This is the power of Composite Rules.
A robust detection strategy would chain these "weak signals" from both UEBA and TTP rules into a single, high-priority alert:
- Signal 1 (UEBA): Anomalous Login. A rule detects that User_B logged in from a new or "infrequent" device or geographic location (using the first_seen metric).
- Signal 2 (UEBA): Anomalous API Access. A rule detects that User_B is accessing the GCS bucket in an unusual pattern (e.g., num_unique_filter_values for files accessed is abnormally high, or last_seen shows activity on a file that hasn't been touched in a year).
- Signal 3 (TTP): Data Staging. A rule detects the execution of zip, 7z, or tar on a sensitive server (T1560).
- Signal 4 (TTP): Log Manipulation. A rule detects TTP T1070.006 when log file timestamps are modified.
A Composite Rule can be built to say: "Generate a high-priority alert if Signal 1 OR Signal 2 occurs for a user, AND is followed by Signal 3 AND/OR Signal 4 by the same user within 48 hours."
This composite approach provides a crucial complementary detection layer. A careful insider might evade statistical detection by performing their attack slowly, within their normal working hours. However, they cannot avoid the procedures of the attack itself (accessing the data, staging it, and removing indicators). A composite rule that looks for the TTP chain itself, even in the absence of statistical anomalies, can catch even the most careful malicious actors.
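As a simplified illustration of this chaining, the sketch below correlates two of the weak signals, a login from a never-before-seen location and archive-tool execution, for the same user inside a single multi-event YARA-L rule with a 48-hour match window. In practice, a Composite Rule would chain the alerts produced by the individual rules above rather than raw events, but the correlation idea is the same. The field names, the regular expression, and especially the first_seen metric call and its location dimension are illustrative assumptions, not verified product syntax.

rule example_composite_login_then_staging {

  meta:
    rule_name = "Example: First-Seen Login Location Followed by Data Staging"
    description = "Illustrative sketch only: correlates a login from a never-before-seen location with archive-tool execution by the same user within 48 hours."
    severity = "High"

  events:
    // Signal 1: successful login, with the source country captured for a first_seen check
    $login.metadata.event_type = "USER_LOGIN"
    $login.security_result.action = "ALLOW"
    $user = $login.principal.user.userid
    $country = $login.principal.location.country_or_region

    // Signal 3: archive utility launched by the same user (T1560)
    $stage.metadata.event_type = "PROCESS_LAUNCH"
    $stage.target.process.file.full_path = /(^|\\|\/)(zip|7z|tar)(\.exe)?$/ nocase
    $user = $stage.principal.user.userid

  match:
    $user over 48h

  outcome:
    $risk_score = 80
    // first_seen for this user/country pair; assumed to return 0 when there is no prior history
    $country_first_seen = max(metrics.auth_attempts_success(
        period:1d, window:30d,
        metric:first_seen, agg:min,
        principal.user.userid:$user,
        principal.location.country_or_region:$country))

  condition:
    $login and $stage and $country_first_seen = 0
}

This is the "Signal 1 AND Signal 3" branch of the chain described above; the remaining signals could be added as further event variables, or the individual rules' alerts could be chained at the detection level instead.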
Conclusion: Your Path Forward with UEBA
This guide has walked through the core components of User and Entity Behavior Analytics, from its foundational definition to its real-world application in detecting complex insider threats. The key takeaway is that effective UEBA is not a single tool, but a layered strategy. It relies on the powerful combination of:
- Broad Statistical Analysis: To find the "unknown unknowns" by building dynamic baselines of normal behavior and flagging significant deviations.
- Specific TTP Detections: To find the "known bads" by identifying behaviors that map directly to attacker techniques.
- Composite Logic: To chain these statistical anomalies (weak signals) and TTP alerts (strong signals) into a single, high-fidelity detection that tells a clear story, as seen in the insider threat example.
By blending these methods, you move beyond simple, noisy alerting to a sophisticated detection posture capable of surfacing threats that would otherwise go unnoticed.
Recommended Next Steps
As you begin your Google SecOps UEBA adoption, here is a practical path forward:
- Start with Curated Content: Your first step should be to enable the curated UEBA detection rule packs provided by Google SecOps. Allow them to run and begin observing the alerts they generate. This will give you an immediate baseline of anomalous activity in your environment.
- Identify Your "Crown Jewels": Map out your most critical assets, data stores, and privileged users. These should be your initial focus for monitoring. Ask questions like, "What would a statistical anomaly on my domain controller look like?"
- Study the Metrics Functions: Use this guide and the product documentation to understand how the curated rules work. Look at their logic. This will build your team's confidence in the alerts and demystify the "magic" of the statistical engine.
- Think in Chains: As you investigate alerts, start thinking like the insider threat example. Don't just look at alerts in isolation. Begin identifying the weak signals and TTP alerts that, if combined, would point to a specific threat actor in your environment.
- Plan for Custom Composite Rules: Your long-term goal should be to build your own Composite Rules that chain these signals together. A good starting point is to combine a UEBA anomaly (like "Anomalous Login") with a TTP alert (like "Data Staging") for your high-value asset groups.