Miniature Examples
These examples avoid complex loops so that the focus stays on the mapping algorithm.
We use the UDM field list https://cloud.google.com/chronicle/docs/reference/udm-field-list as the reference for field types.
Recipe1: Placeholder without Subfields + Merge + Clear
Used to quickly map several JSON objects into a hierarchy with multiple repeated fields.
Map: $.system.ip_address to UDM
UDM Schema: A candidate UDM field is $.event.idm.read_only_udm.observer.ip
UDM Schema Analysis: event (Repeated, composite) → idm (composite) → read_only_udm (composite) → observer (composite, non-repeated) → ip (Repeated, atomic string in the form of an IP address).
Recipe:
mutate {replace => {"temp" => "%{system.ip_address}"}}
The placeholder variable represents the first repeated field reached when walking the path from right to left, i.e. temp ⇔ ip (Repeated).
mutate {merge => {"event.idm.read_only_udm.observer.ip" => "temp"}}
mutate {replace => {"temp" => ""}}
Check UDM Output:
"@output": [ { "idm": { "read_only_udm": { "metadata": { "event_type": "GENERIC_EVENT" }, "observer": { "ip": [ "192.168.1.100" ] } } } } ],
Recipe2: Placeholder with Subfield + Merge + Clear
Map: $.system.ip_address to UDM
UDM Schema: A candidate UDM field is $.event.idm.read_only_udm.observer.security_result.rule_set
UDM Schema Analysis: event (Repeated) → idm (composite) → read_only_udm (composite) → observer (composite, non-repeated) → security_result (Repeated, composite) → rule_set (non-repeated atomic string)
Recipe:
mutate {replace => {"temp.rule_set" => "%{system.ip_address}"}}
So here temp ⇔ security_result and temp.rule_set ⇔ security_result.rule_set
mutate {merge => {"event.idm.read_only_udm.observer.security_result" => "temp"}}
mutate {replace => {"temp" => ""}}
Check UDM Output:
"@output": [ { "idm": { "read_only_udm": { "metadata": { "event_type": "GENERIC_EVENT" }, "observer": { "security_result": [ { "rule_set": "192.168.1.100" } ] } } } } ],
Extension: Adding More Sub-fields
Map: In addition to $.system.ip_address, also map $.system.hostname to UDM
UDM Schema: Suggested mapping
event.idm.read_only_udm.observer.security_result.rule_set ← $.system.ip_address
event.idm.read_only_udm.observer.security_result.summary ← $.system.hostname
UDM Schema Analysis: event (Repeated) → idm (composite) → read_only_udm (composite) → observer (composite, non-repeated) → security_result (Repeated, composite) → rule_set, summary (both are non-repeated atomic strings)
Recipe:
So here temp ⇔ security_result, temp.rule_set ⇔ security_result.rule_set, temp.summary ⇔ security_result.summary
Check UDM Output:
"@output": [ { "idm": { "read_only_udm": { "metadata": { "event_type": "GENERIC_EVENT" }, "observer": { "security_result": [ { "rule_set": "192.168.1.100", "summary": "server-001" } ] } } } } ],
Recipe3: Temp Placeholder with Multiple Subfields + Multiple Merge + Clear
Map: $.system.ip_address and $.system.hostname to UDM
UDM Schema: Candidate UDM fields are $.event.idm.read_only_udm.intermediary.security_result.rule_set and $.event.idm.read_only_udm.intermediary.security_result.summary
UDM Schema Analysis: event (Repeated) → idm (composite) → read_only_udm (composite) → intermediary (Repeated, composite) → security_result (Repeated, composite) → rule_set, summary (both are non-repeated atomic strings)
The difference between this example and Recipe2 is that here the UDM schema contains another repeated field, “intermediary”, unlike “observer” used earlier, which is not repeated.
Recipe:
Fill in the mandatory “metadata.event_type” field.
First block, with temp ⇔ security_result:
mutate {replace => {"temp.rule_set" => "%{system.ip_address}"}}
mutate {replace => {"temp.summary" => "%{system.hostname}"}}
mutate {merge => {"event.idm.read_only_udm.intermediary.security_result" => "temp"}}
mutate {replace => {"temp" => ""}}
Second block, with temp2 ⇔ intermediary. Rename security_result under the temp2 placeholder, then merge temp2 into the UDM path up to event, the same way as temp:
mutate {rename => {"event.idm.read_only_udm.intermediary.security_result" => "temp2.security_result"}}
mutate {merge => {"event.idm.read_only_udm.intermediary" => "temp2"}}
mutate {replace => {"temp2" => ""}}
Check UDM Output:
"@output": [ { "idm": { "read_only_udm": { "intermediary": [ {}, { "security_result": [ { "rule_set": "192.168.1.100", "summary": "server-001" } ] } ], "metadata": { "event_type": "GENERIC_EVENT" } } } } ],
Recipe4: Temp Placeholder with Multiple Subfields + Multiple Merge + Clear + Intersecting Branches
Map: $.system.ip_address and $.system.hostname to UDM
UDM Schema: Candidate UDM fields are:
$.event.idm.read_only_udm.intermediary.security_result.summary
$.event.idm.read_only_udm.intermediary.security_result.rule_set
$.event.idm.read_only_udm.intermediary.ip
UDM Schema Analysis: Unlike Recipe3, here the UDM schema contains another repeated field, “ip”, in addition to “intermediary” and “security_result”.
Recipe:
mutate {replace => {"temp_security_result.rule_set" => "%{system.ip_address}"}}
mutate {replace => {"temp_security_result.summary" => "%{system.hostname}"}}
mutate {merge => {"security_result" => "temp_security_result"}}
mutate {replace => {"temp_security_result" => ""}}
mutate {replace => {"temp_ip" => "%{system.ip_address}"}}
mutate {merge => {"ip" => "temp_ip"}}
mutate {replace => {"temp_ip" => ""}}
So far we have mapped 3 fields: security_result.summary, security_result.rule_set, and ip.
Linking the variables is done using rename to assign a common parent field to ip and security_result. Since this parent field is repeated, use a placeholder variable, temp_intermediary:
mutate {rename => {"ip" => "temp_intermediary.ip"}}
mutate {rename => {"security_result" => "temp_intermediary.security_result"}}
mutate {merge => {"event.idm.read_only_udm.intermediary" => "temp_intermediary"}}
mutate {replace => {"temp_intermediary" => ""}}
In short: follow the same algorithm for handling repeated fields, but connect the variables at their common field using rename. Here the common field between both paths is event{}.idm{}.read_only_udm{}.intermediary{}[], so the “temp_intermediary” placeholder for “intermediary{}[]” links both branches.
Check UDM Output:
"@output": [ { "idm": { "read_only_udm": { "intermediary": [ { "ip": [ "192.168.1.100" ], "security_result": [ { "rule_set": "192.168.1.100", "summary": "server-001" } ] } ], "metadata": { "event_type": "GENERIC_EVENT" } } } } ],
Capstone UDM Mapping Example
Constructing a List of Complex Composite JSON Objects Across Multiple Repeated Fields in UDM Schema
This example explains in detail a more complex case in which a single log message is parsed into multiple events.
In this example, we will:
- Use the “GENERIC_EVENT” event type.
- Map the fields of the “actions” objects (session id, action, query, targetIP) into a suitable sub-field of the “security_result” field.
- Generate one event for each user session; based on the log we have, the parser should generate 2 events, one per session.
We start our analysis to build the parser ;
-
The topmost level of the UDM schema is “whatever.idm.read_only_udm”, so every field in the target schema has this field as a parent.
It is common to use “event.idm.read_only_udm”, but in general you can replace “event” with any placeholder name.
Each $.event instance is one event generated by the parser. So we either expect “event1” and “event2” parent nodes, or we loop through the sessions and generate a separate event for each session:
for i0, event0 in $.user{}.sessions{}[]:
  $.event…something ← some fields
  Write $.event to the parser output
  Clear $.event
-
We start with the mandatory field “metadata.event_type” https://cloud.google.com/chronicle/docs/unified-data-model/udm-usage#metadataevent_type , so our first field will be ;
$.event.idm.read_only_udm.metadata.event_type
More information about this field is available at https://cloud.google.com/chronicle/docs/reference/udm-field-list#metadata
The type of this field is a custom type called “Metadata.EventType”.
Following the link on the field type shows more information about the field, including a list of “Enum Value” entries.
Enumerated fields like metadata.event_type can only take certain values.
We will choose the value “GENERIC_EVENT” for this field in the parser.
-
Include the optional vendor/product fields in our parser ;
-
event.idm.read_only_udm.metadata.vendor_name https://cloud.google.com/chronicle/docs/unified-data-model/udm-usage#metadatavendor_name
Encoding: Case-sensitive, alphanumeric string, punctuation allowed , so we can set any arbitrary vendor value like “myVendor”
-
event.idm.read_only_udm.metadata.product_name
https://cloud.google.com/chronicle/docs/unified-data-model/udm-usage#metadataproduct_name
Similarly we set its value to “myProduct”
-
We start tracking the “security_result” field https://cloud.google.com/chronicle/docs/unified-data-model/udm-usage#result-metadata and https://cloud.google.com/chronicle/docs/reference/udm-field-list#udm_event_data_model
This field was chosen because it is “Repeated”
The “security_result” is the parent field name, its schema/data type/object type is “SecurityResult”
We pick the subfield “category_details” listed under “security_result” by looking up the “SecurityResult” custom object https://cloud.google.com/chronicle/docs/reference/udm-field-list#securityresult
So the full field name will be ;
-
event.idm.read_only_udm.security_result.category_details
Applying the algorithm ;
-
Target UDM Schema ;
-
event (Composite).idm (Composite).read_only_udm (Composite).security_result (Repeated Composite).category_details (Repeated String)
Modeled as ;
event{}.idm{}.read_only_udm{}.security_result{}[].category_details[]
-
event (composite).idm (composite).read_only_udm (composite).metadata( Composite).event_type(string)
Modeled as ;
event{}.idm{}.read_only_udm{}.metadata{}.event_type
-
Input Log fields Schema Mappings ;
-
user{}.sessions{}[] → event{} : Repeated field to composite, so it requires a loop with event{} inside the loop.
for i0, v0 in user.sessions :
  event{} ← v0
-
user{}.sessions{}[].actions{}[].action_type → event{}.idm{}.read_only_udm{}.security_result{}[].category_details[]
-
Start off with the mandatory “metadata.event_type” and the optional “metadata.vendor_name”/“metadata.product_name” ⇒ Mapping the “event_type” field early facilitates troubleshooting/debugging.
Since we need to map these fields per event AND we have 1 event per user session ⇒ these mappings go inside the outer $.user.sessions{}[] loop ;
for i0_, v0_ in user.sessions {
  mutate {replace => {"event.idm.read_only_udm.metadata.event_type" => "GENERIC_EVENT"}}
  mutate {replace => {"event.idm.read_only_udm.metadata.vendor_name" => "myVendor"}}
  mutate {replace => {"event.idm.read_only_udm.metadata.product_name" => "myProduct"}}
}
-
Looking at the Input Schema, use Top-Down approach to start from the root of the input schema to reach the required data type;
-
“action_type” is a primitive data type (string).
-
To reach “action_type” in the input schema, there are 2 repeated fields in the path, “user{}.sessions{}[]” and “user{}.sessions{}[].actions{}[]” ⇒ We need 2 loops.
for i0_, v0_ in user.sessions {
for i1_, v1_ in v0_.actions {
-
Switching to the target schema ;
-
We need to map “action_type” into a repeated, non-composite field: “category_details[]”
I.e. ;
..some parent fields…category_details[] ← v1_.action_type
⇒ Need to use the recipe of temp placeholder + “merge” inside the second loop.
-
How far can we extend the hierarchy of “category_details”? ⇒ Move upwards until the first repeated field.
Moving up from “category_details”, we stop at the first repeated field, which is “security_result” ⇒ The hierarchy of the “merge” variable cannot be extended any further.
-
Construct the placeholder variable with no extended hierarchy ;
“temp.category_details”
-
Use the recipe of the temp placeholder + “merge” + clear placeholder ;
for i0_, v0_ in user.sessions {
  for i1_, v1_ in v0_.actions {
    mutate {replace => {"temp.category_details" => "%{v1_.action_type}"}}
    mutate {merge => {"category_details" => "temp.category_details"}}
    mutate {replace => {"temp" => ""}}
    statedump {label => "inside_v0.actions_loop"}
  }
}
-
Verify using the last “statedump” inside the loop ;
We can see: {.."category_details": ["login"],..} on the first run, {.."category_details": ["login","search"],..} on the second, and {.."category_details": ["login","search","logout"],..} on the third run.
-
We have now finished the mapping of ;
category_details[] ← v1_.action_type
What is still missing is the mapping of the parent of category_details[], which is event{}.idm{}.read_only_udm{}.security_result{}[].
-
To move one level up in the target UDM schema, we exit the inner loop of i1_,v1_ .
-
Now we start mapping the parent event{}.idm{}.read_only_udm{}.security_result{}[] :
-
Move upwards from security_result{}[] until the first repeated field ⇒ There are no further repeated fields upwards. Great! So we can extend the placeholder all the way up to the topmost field event{} ⇒ Use the “rename” recipe with “merge” to introduce a temporary hierarchy with a dummy placeholder.
-
-
Use “rename” to introduce a dummy placeholder level above “category_details” ;
mutate {rename => {"category_details" => "temp.category_details"}}
Now we have constructed ;
temp{}.category_details[]
-
Append this placeholder to the repeated field “security_result{}[]”, and since “security_result{}[]” does not have any further repeated parent fields ⇒ We can extend the repeated field name upwards to the desired level.
mutate {merge => {"event.idm.read_only_udm.security_result" => "temp"}}
This eliminates (technically sandwiches) the “temp” dummy level.
-
Do not forget to clear the dummy placeholder ;
mutate {replace => { "temp" => "" }}
Now we have constructed the full path ;
event{}.idm{}.read_only_udm{}.security_result{}[].category_details[]
"event": {
  "idm": {
    "read_only_udm": {
      "metadata": {
        "event_type": "GENERIC_EVENT",
        "product_name": "myProduct",
        "vendor_name": "myVendor"
      },
      "security_result": [
        {
          "category_details": [
            "login",
            "search",
            "logout"
          ]
        },
        {
          "category_details": [
            "login"
          ]
        }
      ]
    }
  }
},
-
Write the constructed UDM event to the output through its root node “$.event” using the special “merge” into “@output”. Make sure this “merge” is at the end of the outer loop, since we need a single event per session :
for i0_, v0_ in user.sessions {
  ………………
  mutate {merge => { "@output" => "event" }}
}
-
After writing the $.event JSON to the output, do not forget to clear the $.event object at the end of the loop, so that the next session event starts from an empty schema :
for i0_, v0_ in user.sessions {
  ………………
  mutate {merge => { "@output" => "event" }}
  mutate {replace => { "event" => "" }}
}
-
Add “statedump” print statements at the end of each loop and the end of the parser.
filter {
  ………………
  for i0_, v0_ in user.sessions {
    ………………
    statedump {label => "end.user.sessions__loop"}
  }
  statedump {label => "end"}
}
-
If the events are written successfully, the “UDM Output” tab on the left-hand side of the UI is enabled and shows what will become the UDM events ;
-
The statedump should show the special “@output” schema that will be serialized by the parser to the UDM stream ;
Capstone Example Code
First Event ; Internal State (label=end.user.sessions__loop):
"@output": [ { "idm": { "read_only_udm": { "metadata": { "event_type": "GENERIC_EVENT", "product_name": "myProduct", "vendor_name": "myVendor" }, "security_result": [ { "category_details": [ "login", "search", "logout" ] } ] } } } ],
Second Event ; Internal State (label=end.user.sessions__loop):
"@output": [ { "idm": { "read_only_udm": { "metadata": { "event_type": "GENERIC_EVENT", "product_name": "myProduct", "vendor_name": "myVendor" }, "security_result": [ { "category_details": [ "login" ] } ] } } } ],
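The capstone parser code itself does not appear in this copy of the guide. Assembling the snippets from the walkthrough above gives the following sketch. The json block and the "not_json" flag name are assumptions, since the guide does not show how the raw log is first parsed; everything else is taken from the steps above.

```
filter {
  # Assumed first step: parse the raw JSON log into user.sessions[].actions[].
  json {
    source => "message"
    on_error => "not_json"
  }

  # Outer loop: one UDM event per user session.
  for i0_, v0_ in user.sessions {
    mutate { replace => { "event.idm.read_only_udm.metadata.event_type" => "GENERIC_EVENT" } }
    mutate { replace => { "event.idm.read_only_udm.metadata.vendor_name" => "myVendor" } }
    mutate { replace => { "event.idm.read_only_udm.metadata.product_name" => "myProduct" } }

    # Inner loop: collect every action_type into the repeated category_details.
    for i1_, v1_ in v0_.actions {
      mutate { replace => { "temp.category_details" => "%{v1_.action_type}" } }
      mutate { merge => { "category_details" => "temp.category_details" } }
      mutate { replace => { "temp" => "" } }
    }

    # Wrap category_details in a placeholder and append it to security_result.
    mutate { rename => { "category_details" => "temp.category_details" } }
    mutate { merge => { "event.idm.read_only_udm.security_result" => "temp" } }
    mutate { replace => { "temp" => "" } }

    # Write the event, then clear it for the next session.
    mutate { merge => { "@output" => "event" } }
    mutate { replace => { "event" => "" } }
    statedump { label => "end.user.sessions__loop" }
  }
  statedump { label => "end" }
}
```

The statedump labels match the ones in the internal state listings above, so each listing can be traced back to the end of one pass through the outer loop.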
Parser Extensions
Parser extensions can be viewed as add-ons to the main parser that map extra fields or overwrite fields parsed by the main parser.
Example use cases:
- VMWare error logs are parsed, but we need to extract the error trace ID into a different field.
- Events are parsed as metadata.event_type = "GENERIC_EVENT", but we need to categorize them properly.
The main criteria for a parser extension:
- The parser extension is attached to a main parser.
- The main parser must be able to parse the log message under investigation; the extension only adds to or overwrites the fields generated by the main parser.
- Log samples are needed as input to the SIEM to validate the parser extension.
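As an illustration of the first use case, a parser extension snippet could look like the sketch below. The log format is hypothetical: it assumes the raw message contains a token such as trace_id=abc123. The target field metadata.product_log_id, the variable name trace_id, and the "no_trace_id" flag name are choices made for this example, not requirements of any real VMWare parser.

```
filter {
  # Initialize the variable so the condition below can always be evaluated.
  mutate { replace => { "trace_id" => "" } }

  # Hypothetical pattern: capture the value that follows "trace_id=".
  grok {
    match => { "message" => "trace_id=%{NOTSPACE:trace_id}" }
    overwrite => ["trace_id"]
    on_error => "no_trace_id"
  }

  # Only emit the extra field when a trace ID was actually found.
  if [trace_id] != "" {
    mutate { replace => { "event.idm.read_only_udm.metadata.product_log_id" => "%{trace_id}" } }
    mutate { merge => { "@output" => "event" } }
  }
}
```

The extension only contributes the one additional field; all other fields still come from the main parser.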
Conclusion
So far we have covered repeated fields and how to map fields systematically to the UDM format. Future versions will cover more advanced topics such as bulk log analysis and automation.
Thank you for taking the time to review this guide!