Adoption Guide: Deep Dive into UDM Parsing - Part 2.2

August 4, 2025

Digital-Customer-Excellence
Staff

Part 2.1 covered conditionals and advanced string-token manipulation for advanced tokenization. Part 2.2 covers advanced loops, manipulating the schema hierarchy and sub-fields, and handling repeated fields.

Advanced Loops

Advanced Loops : Nested and Multi-Level Loops

Task: Concatenate the action_type string sub-field of every entry in $.user.sessions[*].actions[*]

filter {
    json { source => "message"   array_function => "split_columns"}
    mutate { replace => { "myConcatField" => "" }}

    for index1_, field1_ in user.sessions {
        for index2_, field2_ in field1_.actions {
            if [myConcatField] == "" {
                mutate { replace => { "myConcatField" => "%{field2_.action_type}" }}
            }
            else {
                mutate { replace => { "myConcatField" => "%{myConcatField}|%{field2_.action_type}" }}
            }
            statedump {}
        }
    }
}

Snippet from statedump output: At Each Loop Run

… :

"myConcatField": "login",

….

"myConcatField": "login|search",

….

"myConcatField": "login|search|logout",

…

"myConcatField": "login|search|logout|login",

…

1. The field path and types:

Root → user (composite) → sessions (Repeated+composite) → actions (Repeated+composite) → action_type (String).

2. To concatenate all 'action_type' values from the nested 'actions' fields, we use nested loops.

• JSON Path: $.user.sessions[*].actions[*].action_type

• GoStash goal: $.myConcatField ← concat($.user.sessions[*].actions[*].action_type)

GoStash requires a two-step looping approach:

1. Looping through 'sessions': %{user.sessions}. Each 'sessions' object contains other data, including 'actions'.

2. Looping through 'actions' within each 'session': This lets us access $.user.sessions[*].actions[*].

Within the inner loop, we select the 'action_type' from each 'action' and add it to myConcatField. The 'action_type' field is present in every entry. We use myConcatField to accumulate the concatenated string.

3. Equivalent pseudocode:

$ ← flatten($.message{})

$.myConcatField ← ""

# Select the sessions fields (repeated + composite) under the composite user field, i.e. $.user{}.sessions{}[]

For index1_, field1_ in $.user{}.sessions{}[]:

    # Select actions (repeated + composite) from the current session, now aliased as field1_{} = $.user{}.sessions{}[]

    For index2_, field2_ in field1_{}.actions{}[]:

        # Select action_type (string) from the current action, now aliased as field2_

        If $.myConcatField is empty:

            $.myConcatField ← field2_{}.action_type

        Else:

            $.myConcatField ← concat($.myConcatField, "|", field2_{}.action_type)
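
For readers who want to trace the nested-loop logic outside the parser, here is a minimal Python sketch of the same control flow. It is an illustration only, not GoStash syntax. The sample event is an assumption chosen to reproduce the statedump values shown above, and the arrays are kept as Python lists rather than flattened objects.

```python
# Hypothetical sample event (assumed; not taken from the original log).
event = {
    "user": {
        "sessions": [
            {"actions": [{"action_type": "login"},
                         {"action_type": "search"},
                         {"action_type": "logout"}]},
            {"actions": [{"action_type": "login"}]},
        ]
    }
}


def concat_action_types(event, sep="|"):
    """Mirror the two nested loops: sessions first, then actions."""
    my_concat_field = ""
    for field1 in event["user"]["sessions"]:      # outer loop over sessions
        for field2 in field1["actions"]:          # inner loop over actions
            if my_concat_field == "":
                # First value: no separator needed.
                my_concat_field = field2["action_type"]
            else:
                my_concat_field = my_concat_field + sep + field2["action_type"]
    return my_concat_field


print(concat_action_types(event))  # login|search|logout|login
```

The empty-string check plays the same role as the if/else in the parser: it avoids a leading separator on the first value.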

Advanced Loops : Nested and Multi-Level Loops with Recurring Fields and Field Probes

Task: For every entry in $.user.sessions[*].actions[*] that contains a query field, capture and concatenate all of that entry's fields

filter {
    json { source => "message"   array_function => "split_columns"}
    mutate { replace => { "myConcatField" => "" }}

    for index1_, field1_ in user.sessions {
        for index2_, field2_ in field1_.actions {
            mutate { replace => { "optionalQuery" => "%{field2_.query}"} on_error => "queryMissing"}
            if ![queryMissing] {
                for index3_, field3_ in field2_ map {
                    if [myConcatField] == "" {
                        mutate { replace => { "myConcatField" => "%{field3_}" }}
                    }
                    else {
                        mutate { replace => { "myConcatField" => "%{myConcatField}|%{field3_}" }}
                    }
                }
                statedump {}
            }
        }
    }
}

Snippet from statedump output: On the loop run where "query" exists in the current "actions" entry:

… :

"myConcatField": "search|weather|10.0.0.10|2025-01-11T11:35:00Z",

"optionalQuery": "weather",

"queryMissing": false,

…

1. Path: Root → user (composite) → sessions (Repeated+composite) → actions (Repeated+composite) → query(String/Optional).

2. Equivalent pseudocode:

$ ← flatten($.message{})

$.myConcatField ← ""   # initialize the placeholder

For index1_, field1_ in $.user{}.sessions[*]:

    For index2_, field2_ in field1_{}.actions[*]:

        # Probe for the "query" field in the current "actions" entry. If it exists, capture it; otherwise raise the "queryMissing" flag.

        $.optionalQuery ← field2_.query ifExists, Else raiseFlag: $.queryMissing ← true

        # If the flag is false, "query" exists, so loop over every sub-field of the composite "actions" entry (aliased as field2_) and concatenate it.

        If $.queryMissing is false:

            For index3_, field3_ in field2_{}:

                If $.myConcatField is empty:

                    $.myConcatField ← field3_

                Else:

                    $.myConcatField ← concat($.myConcatField, "|", field3_)

3. By using field probing and conditional logic, you can gracefully handle unexpected or recurring fields, which leads to more reliable parsing and simpler troubleshooting.

The use of flags (like queryMissing) provides valuable debugging information. If a flag is set, you know that a particular field was not found, which can help you quickly identify issues in your parsing logic or data.
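
The probe-then-loop idea can be sketched in Python as follows. This is a simulation of the logic, not GoStash code. The sample event, including the "ip" and "timestamp" sub-field names, is an assumption picked so that the result matches the statedump value shown above.

```python
# Hypothetical sample event (assumed). Only the second action has "query".
event = {
    "user": {
        "sessions": [
            {"actions": [
                {"action_type": "login", "ip": "10.0.0.10",
                 "timestamp": "2025-01-11T11:30:00Z"},
                {"action_type": "search", "query": "weather",
                 "ip": "10.0.0.10", "timestamp": "2025-01-11T11:35:00Z"},
            ]}
        ]
    }
}


def concat_actions_with_query(event, sep="|"):
    """Concatenate every sub-field of each action that has a 'query' key."""
    parts = []
    for session in event["user"]["sessions"]:
        for action in session["actions"]:
            # Probe: this flag plays the role of on_error => "queryMissing".
            query_missing = "query" not in action
            if not query_missing:
                # Third loop: all sub-fields of the matching action, in order.
                for value in action.values():
                    parts.append(str(value))
    return sep.join(parts)


print(concat_actions_with_query(event))
# search|weather|10.0.0.10|2025-01-11T11:35:00Z
```

The first action is skipped entirely because the probe fails, which is the same effect as the `if ![queryMissing]` guard.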

Manipulating the Schema Hierarchy and Sub-fields

This section covers how to structure your output data in GoStash, including nesting fields, creating composite elements, and merging fields into lists. Unlike the previous section, which focused on capturing and basic manipulation of data, this section focuses on building complex hierarchies.
These techniques are required for UDM mapping because UDM fields are either nested inside composite fields (e.g., principal.process.parent_process) or repeated within the hierarchy (e.g., principal.ip[*]).

The techniques described in this section are used in the final parser "recipes" covered in the "UDM Schema Mapping" section.

Constructing Nested String Elements

Task: Capture the value of the field "event_type" into the hierarchy $.grandparent.parent.eventType, and "timestamp" into the hierarchy $.othergrandparent.otherparent.date

Schema Mapping:
$.event_type ⇒ $.grandparent{}.parent{}.eventType

$.timestamp ⇒ $.othergrandparent{}.otherparent{}.date

filter {
json { source => "message"   array_function => "split_columns"}
mutate { replace => { "grandparent.parent.eventType" => "%{event_type}" }}


mutate { replace => { "myTimestamp" => "%{timestamp}" }}
mutate { rename => { "myTimestamp" => "othergrandparent.otherparent.date" }}


statedump {}
}

Snippet from statedump output:

… :

"grandparent": {

"parent": {

"eventType": "user_activity"

}

},

"othergrandparent": {

"otherparent": {

"date": "2025-01-11T12:00:00Z"

}

},

…

1. GoStash allows you to build hierarchical structures in your output. You can do this with:

a. rename: Preferable, as it is more flexible.

b. replace: Limited to string fields.

Both operators were used here to form nested fields:

$.grandparent.parent.eventType ← $.event_type

$.othergrandparent.otherparent.date ← $.timestamp

2. Equivalent pseudocode:

$ ← flatten($.message{})

$.grandparent{}.parent{}.eventType ← $.event_type

$.myTimestamp ← $.timestamp

Rename $.myTimestamp ⇒ $.othergrandparent.otherparent.date
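
To make the effect of a dotted target name concrete, here is a small Python sketch that builds the same nested structure. The helper names set_path and rename_path are hypothetical and exist only for this illustration; they are not GoStash functions.

```python
def set_path(state, dotted_path, value):
    """Create intermediate objects as needed and set the leaf value."""
    *parents, leaf = dotted_path.split(".")
    node = state
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value
    return state


def rename_path(state, source, dotted_target):
    """Move a top-level field to a (possibly nested) target path."""
    return set_path(state, dotted_target, state.pop(source))


# Values taken from the statedump snippet above.
state = {"event_type": "user_activity", "timestamp": "2025-01-11T12:00:00Z"}

# replace-style assignment straight into a hierarchy
set_path(state, "grandparent.parent.eventType", state["event_type"])

# copy into a temporary field, then rename it into a hierarchy
state["myTimestamp"] = state["timestamp"]
rename_path(state, "myTimestamp", "othergrandparent.otherparent.date")

print(state["grandparent"])       # {'parent': {'eventType': 'user_activity'}}
print(state["othergrandparent"])  # {'otherparent': {'date': '2025-01-11T12:00:00Z'}}
```

Note that the rename removes the temporary field, while the replace-style assignment leaves the source field in place.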


Constructing Composite Fields with Mixed Types

Task: Capture the fields $.user.id, $.user.username, $.user.profile.VIP, and $.timestamp, and place them in the hierarchy $.grandparent.parent.*

Schema Mapping:
$.user{}.id ⇒ $.grandparent{}.parent{}.userId

$.user{}.username ⇒ $.grandparent{}.parent{}.userName

$.user{}.profile{}.VIP ⇒ $.grandparent{}.parent{}.VIP

$.timestamp ⇒ $.grandparent{}.parent{}.eventTime

filter {
json { source => "message"   array_function => "split_columns"}


mutate {convert => {"user.id" => "string"}}
mutate { replace => { "grandparent.parent.userId" => "%{user.id}" }}
mutate {convert => {"grandparent.parent.userId" => "integer"}}

mutate { replace => { "grandparent.parent.userName" => "%{user.username}" }}

mutate {convert => {"user.profile.VIP" => "string"}}
mutate {replace => { "grandparent.parent.VIP" => "%{user.profile.VIP}" }}
mutate {convert => {"grandparent.parent.VIP" => "boolean"}}

mutate {copy => {"timestampCopy" => "timestamp"}}
date {match => ["timestampCopy", "yyyy-MM-dd HH:mm:ss", "UNIX", "ISO8601", "UNIX_MS"]   timezone => "America/New_York" on_error => "dateConversionError"}
mutate { rename => { "timestampCopy" => "grandparent.parent.eventTime" }}

statedump {}
}

Snippet from statedump output:

… :

"dateConversionError": false,

"event_type": "user_activity",

"grandparent": {

"parent": {

"VIP": true,

"eventTime": "2025-01-11T12:00:00Z",

"userId": 12345,

"userName": "johndoe"

}

…

1. This example demonstrates different strategies for constructing a composite field in GoStash. We'll focus on building a grandparent.parent field that will contain:

• VIP

• userId

• userName

• eventTime

Here's a breakdown of the techniques involved:

• Handling Non-String Fields: Because replace is designed for string fields, convert first turns user.id and user.profile.VIP into strings. replace then copies them into the hierarchy, and a second convert restores the integer and boolean types.

• Handling the Date Field: date is used to convert the string copy of $.timestamp into a date-type field in the America/New_York timezone; the field is then renamed so that it appears as eventTime under grandparent.parent.

2. Equivalent pseudocode:

$← flatten($.message{})

$.grandparent{}.parent{}.userName ← $.user{}.username

$.user{}.id ← string($.user{}.id)

$.grandparent{}.parent{}.userId ← $.user{}.id

$.grandparent{}.parent{}.userId ← integer($.grandparent{}.parent{}.userId)

$.user{}.profile{}.VIP ← string($.user{}.profile{}.VIP)

$.grandparent{}.parent{}.VIP ← $.user{}.profile{}.VIP

$.grandparent{}.parent{}.VIP ← boolean($.grandparent{}.parent{}.VIP)

Copy $.timestamp → $.timestampCopy

$.timestampCopy ← DateConvert ($.timestampCopy, Zone=NY, Format = [yyyy-MM-dd HH:mm:ss, UNIX, ISO8601, UNIX_MS])

Rename $.timestampCopy → $.grandparent.parent.eventTime
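
The string round-trip used for the non-string fields can be sketched in Python. This is only an analogy for the convert, replace, convert sequence. The input values come from the statedump above, while the to_boolean helper and the ISO 8601 handling are assumptions made for the illustration.

```python
from datetime import datetime, timezone

# Input values as they appear in the statedump snippet above.
raw = {
    "user": {"id": 12345, "username": "johndoe", "profile": {"VIP": True}},
    "timestamp": "2025-01-11T12:00:00Z",
}


def to_boolean(text):
    """Hypothetical helper: interpret the string 'true' as True."""
    return text.strip().lower() == "true"


# Step 1: convert non-string values to strings so they can be interpolated.
user_id_text = str(raw["user"]["id"])                  # "12345"
vip_text = str(raw["user"]["profile"]["VIP"]).lower()  # "true"

# Step 2: parse the timestamp string (ISO 8601 with a trailing Z).
parsed = datetime.fromisoformat(raw["timestamp"].replace("Z", "+00:00"))
event_time = parsed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

# Step 3: place the values in the hierarchy and restore the original types.
state = {
    "grandparent": {
        "parent": {
            "userId": int(user_id_text),
            "userName": raw["user"]["username"],
            "VIP": to_boolean(vip_text),
            "eventTime": event_time,
        }
    }
}

print(state["grandparent"]["parent"])
```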


Handling Repeated Fields

This section is crucial for ensuring UDM compliance. We'll discuss how to capture repeated fields from raw logs and, more importantly, how to structure them to match the UDM schema. Examples of repeated UDM fields are:

• security_result{}[] (an array of composite objects)
• security_result{}[].action[] (a repeated field nested inside a repeated composite field)
• principal{}.ip[] (an array of strings inside a composite field)

It's vital to remember that even if your log data contains a single value for a field, if the UDM schema defines that field as repeated (e.g., principal.ip), your parser must construct a repeated field structure.

Constructing a repeated string field (List of strings)

Task: Capture the values of the repeated string field $.Tags into a list field $.listObject, and merge the fields $.user.id, $.user.username, and the flattened $.Tags field into a list field $.listObject2.

Schema Mapping:
$.Tags[] ⇒ $.listObject[]

$.user{}.id ⇒ $.listObject2[]

$.user{}.username ⇒ $.listObject2[]

flatten($.Tags[]) ⇒ $.listObject2[]

filter {
json { source => "message"   array_function => "split_columns"}
for index_ , fieldValue_ in Tags {
    mutate { merge => { "listObject" => "fieldValue_" }}
    statedump {}
    }

mutate { merge => { "listObject2" => "user.id" }}
mutate { merge => { "listObject2" => "user.username" }}
mutate { merge => { "listObject2" => "Tags" }}
statedump {}
}

Snippet from statedump output:

… First Loop Run:

"listObject": [

"login_logs"

],

… Second and Final Loop Run:

"listObject": [

"login_logs",

"dev"

],

…

"listObject2": [

12345,

"johndoe",

{

"0": "login_logs",

"1": "dev"

}

],

..}

1. GoStash's merge clause is used to construct repeated fields in UDM, and it works with fields of any data type. This example uses it to build two list objects:

• listObject: This list is populated by looping through the values of the flattened 'Tags' field, which is a repeated (but not composite) field.

• listObject2: This list contains a mix of data types: an integer (user.id), a string (user.username), and the flattened 'Tags' field itself, appended as a single composite element.

• Notice the difference in how Tags is handled in the two cases:

• In listObject, the individual tag values are captured.

• In listObject2, the flattened Tags field (including the indices "0" and "1") is captured.

2. This example is for demonstration only. UDM requires repeated fields to be structured (e.g., [{"Tag": "login_logs"}, {"Tag": "dev"}]) rather than simple lists like ["login_logs", "dev"], so this form is not very useful.

3. Equivalent pseudocode:

$ ← flatten($.message{})

For index_, fieldValue_ in $.Tags{}[]:

    listObject[] ← append(fieldValue_)

listObject2[] ← append($.user{}.id)

listObject2[] ← append($.user{}.username)

listObject2[] ← append($.Tags{}[])
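
The difference between merging inside a loop and merging the flattened field directly can be reproduced with a short Python sketch. The flatten_array and merge helpers are hypothetical stand-ins that imitate what split_columns and merge appear to do in the statedump above; they are not GoStash APIs.

```python
def flatten_array(values):
    """Imitate split_columns: ["a", "b"] becomes {"0": "a", "1": "b"}."""
    return {str(i): v for i, v in enumerate(values)}


def merge(state, target, value):
    """Imitate merge: append a value to a list, creating the list if needed."""
    state.setdefault(target, []).append(value)


state = {
    "user": {"id": 12345, "username": "johndoe"},
    "Tags": flatten_array(["login_logs", "dev"]),
}

# Case 1: loop over the flattened field and merge each value.
for field_value in state["Tags"].values():
    merge(state, "listObject", field_value)

# Case 2: merge whole fields, including the flattened Tags object.
merge(state, "listObject2", state["user"]["id"])
merge(state, "listObject2", state["user"]["username"])
merge(state, "listObject2", state["Tags"])

print(state["listObject"])   # ['login_logs', 'dev']
print(state["listObject2"])  # [12345, 'johndoe', {'0': 'login_logs', '1': 'dev'}]
```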

Constructing Lists of Strings with a Hierarchy

Task: Capture and Merge the string fields $.Tags into a hierarchy $.parent1.tagsObject

filter {
json { source => "message"   array_function => "split_columns"}
mutate { merge => { "tagsObject" => "Tags" }}
mutate { rename => { "tagsObject" => "parent1.tagsObject" }}
statedump {}
}

Snippet from statedump output:

…

"parent1": {

"tagsObject": [

{

"0": "login_logs",

"1": "dev"

}

]

},

..}

1. This example shows that you can use merge on the repeated 'Tags' field without looping through it; the flattened field is appended to the list as a single element.

2. You can also use rename, or hierarchical names, with merge and merged lists.

This example shows one of the core patterns used in UDM Mapping to handle repeated nested fields.
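
As a rough Python analogy of the merge-then-rename sequence (not GoStash code; flatten_array is a hypothetical helper that imitates the flattened form shown in the statedump):

```python
def flatten_array(values):
    """Imitate split_columns: ["a", "b"] becomes {"0": "a", "1": "b"}."""
    return {str(i): v for i, v in enumerate(values)}


state = {"Tags": flatten_array(["login_logs", "dev"])}

# merge: append the whole flattened Tags object to a new list field.
state.setdefault("tagsObject", []).append(state["Tags"])

# rename: move the list under the parent1 object.
state.setdefault("parent1", {})["tagsObject"] = state.pop("tagsObject")

print(state["parent1"])  # {'tagsObject': [{'0': 'login_logs', '1': 'dev'}]}
```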

Constructing a List of Simple Composite JSON Fields

Task: Capture the repeated Tags field into a repeated composite field TagsList, as $.TagsList[]{}.TagKey

Input Log Schema

[Figure: input log schema]

Required Target Schema

[Figure: required target schema]

filter {
json { source => "message"   array_function => "split_columns"}
for index_ , fieldValue_ in Tags {
    mutate { replace => { "temp.TagKey" => "%{fieldValue_}"}}
    mutate { merge => { "TagsList" => "temp" }}
    statedump { label=> "insideTagsLoop"}
    mutate { replace => { "temp" => "" }}
    }
statedump { label=> "end"}
}

Snippet from statedump output:

… First Loop Run: Internal State (label=insideTagsLoop):

"TagsList": [

{

"TagKey": "login_logs"

}

],

"fieldValue_": "login_logs",

"index_": 0,

"temp": {

"TagKey": "login_logs"

},

… Second and Final Loop Run: Internal State (label=insideTagsLoop):

"TagsList": [

{

"TagKey": "login_logs"

},

{

"TagKey": "dev"

}

],

"fieldValue_": "dev",

"index_": 1,

"temp": {

"TagKey": "dev"

},..}

At the end: Internal State (label=end):

"TagsList": [

{

"TagKey": "login_logs"

},

{

"TagKey": "dev"

}

],

Here's a detailed breakdown of how the 'Tags' field is processed:

1. Flattening the Input: The initial step involves flattening the input JSON log. This transformation converts the repeated 'Tags' field (which is initially an array) into a composite field. The composite field then contains sub-fields named '0' and '1', where '0' holds the first tag value ('login_logs') and '1' holds the second tag value ('dev').

2. Capturing Tag Values: A loop is used to iterate through the flattened 'Tags' field. During each iteration, the loop captures the tag values associated with the sub-fields '0' and '1'. These captured values are stored within a temporary nested field called 'temp.TagKey'.

3. Merging into 'TagsList': Finally, the temporary 'temp' field, which now contains a single tag value, is appended to the 'TagsList' using the merge clause. This process is repeated for each tag in the original 'Tags' field, effectively reconstructing the tag information within the 'TagsList' field.


1. This pattern is essential for formatting data to match UDM's requirements for repeated fields. To create a list of objects like:

"TagsList": [ {"TagKey": "..." }, {"TagKey": "..."} ]

You need to:

a. Iterate over the flattened Tags field: For index_, fieldValue_ in $.Tags{}[]

b. Construct a temporary object (temp) with a 'TagKey' property (temp.TagKey).

c. Assign the current tag value to temp.TagKey: $.temp{}.TagKey ← fieldValue_

d. Append the entire temp object to the target list ($.TagsList[]): $.TagsList[] ← append($.temp{})

e. Reset temp for the next iteration: $.temp{} ← ""

• Input data is extracted from the JSON path $.Tags[*].

• The output schema is:
{"TagsList": [

{

"TagKey": "login_logs"

},

{

"TagKey": "dev"

}

]

}


2. The merge clause appends values to a repeated field (a list).

3. Equivalent pseudocode:

$ ← flatten($.message{})

For index_, fieldValue_ in $.Tags{}[]:

    $.temp{}.TagKey ← fieldValue_

    TagsList[]{} ← append($.temp{})

    Clear $.temp{}
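
The build, append, and reset steps can be sketched in Python as shown here. This is an analogy, not GoStash code: Python lists store references, which happens to reproduce the duplication symptom when the temporary object is not reset. Whether GoStash behaves this way internally is an assumption. The flatten_array helper and the reset_temp parameter are hypothetical names introduced for the sketch.

```python
def flatten_array(values):
    """Imitate split_columns: ["a", "b"] becomes {"0": "a", "1": "b"}."""
    return {str(i): v for i, v in enumerate(values)}


def build_tags_list(flattened_tags, reset_temp=True):
    """Build a list of {"TagKey": value} objects from a flattened Tags field."""
    tags_list = []
    temp = {}
    for field_value in flattened_tags.values():
        temp["TagKey"] = field_value   # assign the current tag value
        tags_list.append(temp)         # append the whole temp object
        if reset_temp:
            temp = {}                  # start from a fresh object next time
    return tags_list


tags = flatten_array(["login_logs", "dev"])

print(build_tags_list(tags))
# [{'TagKey': 'login_logs'}, {'TagKey': 'dev'}]

print(build_tags_list(tags, reset_temp=False))
# [{'TagKey': 'dev'}, {'TagKey': 'dev'}]
```

With reset_temp=False, every list element is the same object, so the last assignment overwrites all earlier entries.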

4. Important: Failing to clear temp completely after appending it, or clearing only temp.TagKey, will corrupt the 'TagsList' list by duplicating entries. For example, either of these will lead to errors:

• mutate { replace => { "temp.TagKey" => "" }} (clearing only the sub-field)

• #mutate { replace => { "temp" => "" }} (commenting out the clear)

Both will corrupt the target "TagsList" field, which will then look like:

"TagsList": [ { "TagKey": "dev" }, { "TagKey": "dev" } ]

Part 2.3 will cover a Capstone Example and UDM Schema Mapping.
