You're facing a common challenge when parsing semi-structured logs like CEF, where key names themselves contain variable data. Google SecOps parsers have mechanisms to handle this, although it might require a combination of techniques.
Here's a breakdown of how to approach this, addressing your specific questions:
1. **Dynamic Field/Prefix Catching:**
* **No Explicit "Dynamic Field Type":** The Unified Data Model (UDM) has a fixed schema. You cannot dynamically create new UDM fields based on the key names in the raw log.
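* **KV Filter:** The most promising approach for the CEF extension (the trailing `key=value` section of the message) is the `kv {}` filter, which parses a string of key-value pairs into fields. Keep in mind that CEF values may themselves contain spaces, so a plain space delimiter can split such values incorrectly.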
2. **Parser Logic to Loop/Extract Variable Keys:**
* **Direct Looping on Keys (Limited):** Parsers have a `for` loop construct, most often used to iterate over arrays extracted from JSON or split fields. Iterating over the *keys* of an arbitrary key-value string is less direct: the `kv {}` filter only parses the pairs into fields and does not itself loop or create UDM fields from key names. The string has to be parsed into a map first, and a loop that walks a map's keys and values (the `for key, value in field map` form, assuming it is available to your parser) can then process each pair.
3. **Recommended UDM Strategy for Dynamic Data:**
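* **`additional.fields`:** This is the primary place in UDM to store arbitrary key-value pairs that don't fit into the standard UDM schema, which is exactly what it was designed for. You can map your dynamic keys here. It behaves like a map from key to typed value; in parser code, each entry is built as a label with a `key` and a typed value such as `value.string_value`.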
* **`about.labels` / `principal.labels` / `target.labels` etc.:** These are repeated fields of key-value pairs on various UDM nouns. Similar to `additional.fields`, they are suitable for vendor-specific or dynamic attributes.
* **Populating Labels/Additional Fields:** The most straightforward method within the parser is to map all the key-value pairs extracted from the CEF extension into one of these map fields.
* **Example Concept:**
```gcl
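filter {
  # Initialize the field so the conditional below never references a missing field
  mutate {
    replace => { "extension" => "" }
  }
  # Extract the CEF extension (everything after the seventh pipe) into 'extension'
  grok {
    match => { "message" => ".*CEF:0\\|.*?\\|.*?\\|.*?\\|.*?\\|.*?\\|.*?\\|(?P<extension>.*)" }
    overwrite => ["extension"]
    on_error => "not_cef"
  }
  if [extension] != "" {
    # Parse the key-value pairs from the extension into a temporary map.
    # Note: splitting on spaces breaks values that themselves contain spaces.
    kv {
      source => "extension"
      field_split => " "   # Pairs are separated by spaces
      value_split => "="   # Key and value are separated by '='
      target => "dynamic_attributes"
    }
    # additional.fields takes one label object per entry, so build a label for
    # each pair. This assumes the `map` form of the parser `for` loop is available.
    for attr_key, attr_value in dynamic_attributes map {
      mutate {
        replace => {
          "attr_label.key" => "%{attr_key}"
          "attr_label.value.string_value" => "%{attr_value}"
        }
      }
      mutate {
        merge => { "event.idm.read_only_udm.additional.fields" => "attr_label" }
      }
    }
  }
}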
```
This configuration will take all the key-value pairs found in the CEF extension string and put them into the `additional.fields` map in the UDM event.
* **Searchability:** Data stored in `additional.fields` and the various `*.labels` fields is fully searchable using UDM Search in Google SecOps. You can query based on the dynamic keys and their values. For example, you could search for `additional.fields["affected_assets.192.168.1.1.hosts"] = "some_host"`.
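
To make the `kv` step concrete, here is a minimal Python sketch of the same splitting logic applied outside the parser. It is an illustration only, not SecOps parser code: `parse_cef_extension` and the sample keys are hypothetical. It treats a token as a key only when it is followed by an unescaped `=`, so values containing spaces stay intact.

```python
import re

# A key starts at the beginning of the string or after whitespace, contains no
# whitespace, '=' or backslash, and is immediately followed by '='.
_KEY = re.compile(r'(?:^|\s)([^\s=\\]+)=')


def parse_cef_extension(ext: str) -> dict[str, str]:
    """Split a CEF extension string into a flat {key: value} dict."""
    matches = list(_KEY.finditer(ext))
    result: dict[str, str] = {}
    for i, m in enumerate(matches):
        # The value runs from just after '=' to the start of the next key.
        start = m.end()
        end = matches[i + 1].start() if i + 1 < len(matches) else len(ext)
        value = ext[start:end].strip()
        # Undo the CEF escapes for '=' and backslash inside values.
        result[m.group(1)] = re.sub(r'\\([=\\])', r'\1', value)
    return result


if __name__ == "__main__":
    sample = r"src=10.0.0.1 msg=Login failed for admin affected_assets.192.168.1.1.hosts=web01"
    for key, value in parse_cef_extension(sample).items():
        print(f"{key!r}: {value!r}")
```

A plain split on spaces would cut `msg` off after `Login`. That is the failure mode to watch for when `field_split` is a single space.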
**Recommendations:**
1. **Use `additional.fields`:** The most direct method within the parser is to use the `kv` filter to extract all the dynamic keys from the CEF extension and merge them into the `event.idm.read_only_udm.additional.fields` map. This ensures all data is ingested and available for searching.
2. **External Pre-processing:** If you have control over the data pipeline before it reaches Google SecOps, you could use an intermediary tool to pre-process the logs. This tool could parse the dynamic keys and restructure them into a JSON format that's more easily mapped to specific UDM fields or a more structured representation within `additional.fields`.
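
As a rough sketch of that pre-processing idea, the Python below regroups flat dynamic keys into one record per asset before the log is forwarded. It assumes keys shaped like `<prefix>.<IPv4>.<attribute>`, as in `affected_assets.192.168.1.1.hosts`. The `restructure` helper, the `severity` attribute, and the output layout are hypothetical choices, not something SecOps requires.

```python
import json
import re

# Assumed key shape: <prefix>.<IPv4 address>.<attribute>
DYNAMIC_KEY = re.compile(
    r'^(?P<prefix>[A-Za-z_]+)\.(?P<ip>\d{1,3}(?:\.\d{1,3}){3})\.(?P<attr>\w+)$'
)


def restructure(flat: dict[str, str]) -> dict:
    """Group dynamic keys into a list of per-IP records; pass other keys through."""
    static: dict[str, str] = {}
    grouped: dict[str, dict[str, dict[str, str]]] = {}  # prefix -> ip -> record
    for key, value in flat.items():
        m = DYNAMIC_KEY.match(key)
        if m is None:
            static[key] = value
            continue
        by_ip = grouped.setdefault(m["prefix"], {})
        record = by_ip.setdefault(m["ip"], {"ip": m["ip"]})
        record[m["attr"]] = value
    result: dict = dict(static)
    for prefix, by_ip in grouped.items():
        result[prefix] = list(by_ip.values())
    return result


if __name__ == "__main__":
    flat = {
        "src": "10.0.0.1",
        "affected_assets.192.168.1.1.hosts": "web01",
        "affected_assets.192.168.1.1.severity": "high",
    }
    print(json.dumps(restructure(flat), indent=2))
```

With fixed key names such as `ip` and `hosts`, a JSON-based parser can iterate over the `affected_assets` array instead of matching variable key names.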
The recommended approach is to use the `kv` filter and merge the results into `additional.fields`. This makes all the dynamic data accessible within Google SecOps.