Handling Dynamic and Indexed CEF Keys in Google SecOps Parsers

Question

Hello Community,

I am currently building a parser for a custom data source sending CEF logs via Syslog. I’ve encountered a structural issue with the keys in the CEF extension that I’m struggling to map to UDM.

The Problem: The logs contain keys that are dynamic rather than static, making it impossible to create individual UDM field mappings for them. Specifically:

* IP-embedded Keys: I have fields like affected_assets..hosts and affected_assets..mac. Since the IP address is part of the key name itself, the potential number of unique keys is effectively unbounded.
* Indexed Keys: I also see fields like priority_events.1, priority_events.2, and so on, where the count varies with every log entry.

The Challenge: In Google SecOps, it is not feasible to create thousands of unique UDM fields or write static regex mappings to cover every possible IP address or index number that might appear in a key name.

My Questions:

1. Does Google SecOps support a dynamic field type or a specific mechanism to "catch" all keys matching a certain prefix (e.g., anything starting with affected_assets.)?
2. Can the parser logic loop through the CEF extension to identify and extract these variable keys automatically?
3. What is the recommended UDM strategy for storing this type of dynamic data so that it remains searchable without violating a fixed schema?

I’d appreciate any insights or examples of how you have handled these types of dynamic keys in your own parsers.

Thanks in advance!

hzmndt · Accepted Answer

You're facing a common challenge when parsing semi-structured logs like CEF, where key names themselves contain variable data. Google SecOps parsers have mechanisms to handle this, although it might require a combination of techniques.

Here's a breakdown of how to approach this, addressing your specific questions:

1. **Dynamic Field/Prefix Catching:**
* **No Explicit "Dynamic Field Type":** The Unified Data Model (UDM) has a fixed schema. You cannot dynamically create new UDM fields based on the key names in the raw log.
* **KV Filter:** The most promising approach for the CEF extension part (which is key-value like) is using the `kv {}` filter in your parser. This filter can parse strings containing key-value pairs.

2. **Parser Logic to Loop/Extract Variable Keys:**
* **Direct Looping on Keys (Limited):** Parsers have a `for` loop construct, but it's typically used for iterating over arrays *extracted* from JSON or split fields. Directly iterating over the *keys* of an arbitrary key-value string within the parser is less straightforward. The `kv {}` filter doesn't inherently loop and create UDM fields dynamically based on key names.

3. **Recommended UDM Strategy for Dynamic Data:**

* **`additional.fields`:** This is the primary place in UDM to store arbitrary key-value pairs that don't fit into the standard UDM schema. It's designed for this exact purpose. You can map your dynamic keys here. This field is a map type.

* **`about.labels` / `principal.labels` / `target.labels` etc.:** These are repeated fields of key-value pairs on various UDM nouns. Similar to `additional.fields`, they are suitable for vendor-specific or dynamic attributes.

* **Populating Labels/Additional Fields:** The most straightforward method within the parser is to map all the key-value pairs extracted from the CEF extension into one of these fields (`additional.fields` or a `*.labels` field).

* **Example Concept:**
```gcl
filter {
  # Extract the CEF extension part into a field named 'extension'
  grok {
    match => { "message" => ".*CEF:0\\|.*?\\|.*?\\|.*?\\|.*?\\|.*?\\|.*?\\|(?P<extension>.*)" }
  }

  if [extension] != "" {
    # Parse the key-value pairs from the extension field
    kv {
      source => "extension"
      field_split => " " # Keys are separated by spaces
      value_split => "=" # Key and value are separated by '='
      target => "dynamic_attributes" # Store the result in a temporary field
    }

    # Merge all extracted key-value pairs into additional.fields
    mutate {
      merge => { "event.idm.read_only_udm.additional.fields" => "dynamic_attributes" }
    }
  }
}
```
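This configuration is intended to take all the key-value pairs found in the CEF extension string and put them into `additional.fields` in the UDM event. Treat it as a concept rather than a drop-in parser: for example, it omits the final merge of `event` into `@output` that a complete parser needs.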

* **Searchability:** Data stored in `additional.fields` and the various `*.labels` fields is fully searchable using UDM Search in Google SecOps. You can query based on the dynamic keys and their values. For example, you could search for `additional.fields["affected_assets.192.168.1.1.hosts"] = "some_host"`.
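One caveat for the `kv` concept above: CEF extension values may themselves contain spaces, so splitting fields on a single space can cut a value short. The Python sketch below is not SecOps parser code; it only illustrates the splitting rule a CEF-aware extractor would follow, where a value runs until the next ` key=` token. The function name `parse_cef_extension`, the allowed key characters, and the sample log line are assumptions made for illustration.

```python
import re

# A key is a run of word characters, dots, or hyphens directly followed by '='.
# Assumption: keys in this data source never contain other characters.
_KEY = re.compile(r"(?:^|\s)([\w.\-]+)=")


def parse_cef_extension(extension: str) -> dict[str, str]:
    """Split a CEF extension string into a flat {key: value} dict.

    Each value extends up to the start of the next ' key=' token,
    so values that contain spaces are kept whole.
    """
    matches = list(_KEY.finditer(extension))
    pairs: dict[str, str] = {}
    for i, match in enumerate(matches):
        value_start = match.end()
        # The value ends where the next key token begins (or at end of string).
        value_end = matches[i + 1].start() if i + 1 < len(matches) else len(extension)
        pairs[match.group(1)] = extension[value_start:value_end].strip()
    return pairs


# Hypothetical extension string modeled on the keys described in the question.
sample = (
    "src=10.0.0.5 msg=Scan finished with warnings "
    "affected_assets.192.168.1.1.hosts=web01 priority_events.1=login failure"
)
parsed = parse_cef_extension(sample)
```

Here `parsed["msg"]` keeps the full phrase rather than only its first word, and the dynamic keys arrive unchanged as ordinary dictionary keys, which is the shape that a catch-all mapping into `additional.fields` relies on.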

**Recommendations:**

1. **Use `additional.fields`:** The most direct method within the parser is to use the `kv` filter to extract all the dynamic keys from the CEF extension and merge them into the `event.idm.read_only_udm.additional.fields` map. This ensures all data is ingested and available for searching.

2. **External Pre-processing:** If you have control over the data pipeline before it reaches Google SecOps, you could use an intermediary tool to pre-process the logs. This tool could parse the dynamic keys and restructure them into a JSON format that's more easily mapped to specific UDM fields or a more structured representation within `additional.fields`.
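As a rough illustration of the pre-processing option, the Python sketch below regroups already-extracted key-value pairs into a structured JSON document before forwarding. It is a minimal sketch under stated assumptions: the key shapes `affected_assets.<IPv4>.<attribute>` and `priority_events.<index>` are inferred from the question, and the function name `restructure`, the output layout, and the sample values are hypothetical.

```python
import json
import re

# Assumed key shapes, inferred from the question:
#   affected_assets.<IPv4>.<attribute>   and   priority_events.<index>
ASSET_KEY = re.compile(r"^affected_assets\.(\d{1,3}(?:\.\d{1,3}){3})\.(\w+)$")
EVENT_KEY = re.compile(r"^priority_events\.(\d+)$")


def restructure(flat: dict[str, str]) -> dict:
    """Group dynamic keys into lists so the key names become static."""
    assets: dict[str, dict[str, str]] = {}
    events: dict[int, str] = {}
    other: dict[str, str] = {}
    for key, value in flat.items():
        if m := ASSET_KEY.match(key):
            ip, attribute = m.groups()
            # One record per IP; the IP moves from the key name into a value.
            assets.setdefault(ip, {"ip": ip})[attribute] = value
        elif m := EVENT_KEY.match(key):
            events[int(m.group(1))] = value
        else:
            other[key] = value
    return {
        **other,
        "affected_assets": list(assets.values()),
        # Order events by their numeric index, not by arrival order.
        "priority_events": [events[i] for i in sorted(events)],
    }


# Hypothetical flat key-value pairs, as a CEF extension parser might emit them.
flat = {
    "src": "10.0.0.5",
    "affected_assets.192.168.1.1.hosts": "web01",
    "affected_assets.192.168.1.1.mac": "00:11:22:33:44:55",
    "affected_assets.192.168.1.2.hosts": "db01",
    "priority_events.2": "privilege escalation",
    "priority_events.1": "login failure",
}
result = restructure(flat)
print(json.dumps(result, indent=2))
```

After this step every key name is fixed (`affected_assets`, `ip`, `hosts`, `mac`, `priority_events`), so a JSON-based parser can iterate over the two arrays instead of matching unbounded key names.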

The recommended approach is to use the `kv` filter and merge the results into `additional.fields`. This makes all the dynamic data accessible within Google SecOps.
