Capturing Basic JSON fields
The main body of a GoStash parser is enclosed within a filter {...} statement, like this:
filter {
}
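In practice, the filter block holds the parsing and transformation clauses covered in this section. As a rough illustrative skeleton (not a complete parser), combining the json, mutate, and statedump clauses introduced below:

filter {
  # parse the raw JSON log held in the 'message' field
  json { source => "message" array_function => "split_columns" }
  # create or modify tokens
  mutate { replace => { "constantToken" => "Log Sample" } }
  # print the current parser state for debugging
  statedump {}
}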
In the following section, we will examine common JSON transformation operations available in GoStash. For each operation, we will provide a detailed explanation of its syntax, demonstrate its effect on JSON data using 'statedump' debug output snippets, and include notes to highlight important aspects and potential considerations.
JSON Transformation Pseudocode
In the SecOps environment, incoming log messages are initially stored within a field named 'message'. To standardize and analyze this data, we need to convert it into the Unified Data Model (UDM) schema. To effectively describe the transformation process, we will employ a pseudocode that draws inspiration from the JSONPath syntax you've learned earlier. It's crucial to remember that this pseudocode serves as a descriptive tool for illustrating the transformations and is not a formal syntax that needs to be strictly adhered to.
- To illustrate JSON transformations, we'll employ a pseudocode with clear assignment operations. The arrow symbol (←) will denote assigning a value to a variable or field. For instance, x ← 5 signifies assigning the integer 5 to the variable 'x'. In the context of JSON, $.fixedToken ← "constantString" represents creating a new field named 'fixedToken' directly under the root node ('$') and assigning it the string value 'constantString'.
- Our pseudocode can also express copying values between fields. When copying by reference, we use M[...] to indicate accessing the value of a variable or field. So x ← M[y] means "take the value currently stored in 'y' and assign it to 'x'".
This concept extends to JSON structures. For instance, $.myToken ← $.message.username signifies creating a new field 'myToken' at the root level and assigning it the value found within the 'username' field, which is itself nested inside the 'message' field. This copies only the value, not the entire structure.
For example:
Input Log (JSON):
{ "event_type": "user_activity" }
Transformation:
$.myToken ← $.message.event_type
Output (UDM Schema):
{ ..., "myToken": "user_activity", ... }
- In our pseudocode, composite fields, which are essentially objects containing other fields, are denoted using curly braces {}. This visually represents the nested structure. For example, a field named 'system' with subfields such as 'hostname' and 'ip_address' would be represented as $.system{}. For brevity, we can also simply write $.system, keeping in mind that 'system' has a nested structure.
This distinction between composite and simple fields is particularly important in GoStash. To make it clear in our pseudocode, we'll add {} after composite field names (like $.system{}). This helps you visualize the structure and apply GoStash transformations correctly; you can drop this notation once you become more fluent.
- To access a field nested within a composite field, you can use the dot notation, which is a common convention in JSONPath. For instance, to reference the 'hostname' field within the 'system' field, you would use $.system.hostname. Alternatively, you can explicitly denote the composite field using {}, like this: $.system{}.hostname. Both notations achieve the same result; the latter simply provides a visual cue that system is a composite field containing other fields.
- To represent a repeated field (a field containing a list of values), we'll use square brackets []. For example, the field 'Tags' with multiple values would be shown as $.Tags[] or $.Tags[*].
- For fields containing a list of composite values (objects), we combine {} and []. For example, a 'sessions' field containing a list of session objects, each with 'session_id', 'start_time', and 'actions', would be shown as $.user.sessions[*] or, more explicitly, $.user{}.sessions{}[*]. (A sample log using this notation is sketched just after this list.)
JSON Flattening in Go-Stash
- Flattening is a common operation in JSON parsing that converts a repeated field, which is essentially an array, into a composite field with numbered keys. This transformation makes it easier to access individual elements within the repeated field. For example, if you have a repeated field $.Tags[*] with the values ["login_logs", "dev"], flattening it would result in $.Tags{} with the structure {"0": "login_logs", "1": "dev"}. After flattening, you can reference the first tag using $.Tags.0, the second tag using $.Tags.1, and so on. This provides a convenient way to work with individual elements within a previously repeated field.
- As we describe transformations, our pseudocode will use a flexible approach to referencing fields. Sometimes, it will directly reference fields within the original input log structure, which is always nested under the 'message' field (e.g., $.message.somefield). Other times, the pseudocode might reference fields that have already been created or modified within the target schema during earlier transformation steps (e.g., $.some_new_field). Both approaches are valid and serve to illustrate the flow of data transformation.
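For example, the following fragments describe the same capture; which one applies depends on whether the json parse clause has already promoted the log out of 'message' (field names are the ones used throughout this section):

Referencing the original input structure (before the json clause):
$.myEventClass ← $.message{}.event_type

Referencing the parsed schema (after the json clause):
$ ← flatten($.message{})
$.myEventClass ← $.event_type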
Print Tokens Captured

filter { statedump {} }

Snippet from statedump output:
{ "@createTimestamp": { "nanos": 0, "seconds": 1736558676 }, "@enableCbnForLoop": true, "@onErrorCount": 0, "@output": [], "@timezone": "", "message": "{\n \"timestamp\": \"2025-01-11T12:00:00Z\", \n \".....
Parse the JSON Schema and Flatten Repeated Fields

filter { json { source => "message" array_function => "split_columns"} statedump {} }

Snippet from statedump output:
{ "@createTimestamp": { "nanos": 0, "seconds": 1736558735 }, "@enableCbnForLoop": true, "@onErrorCount": 0, "@output": [], "@timezone": "", "event_type": "user_activity", "message": "{\n ….. "system": { "hostname": "server-001", "ip_address": "192.168.1.100" }, "timestamp": "2025-01-11T12:00:00Z", "user": {.....

Without the json parse clause, the data stays nested under the 'message' field:
{ "message": { ...your log data... } }
so "event_type" is a subfield of the "message" field. After the json clause runs, "event_type" and the other top-level fields appear as tokens at the root of the parser state, as the statedump snippet above shows.

The effect of flattening on the repeated sessions field:
Before:
"user": { "sessions": [ { "session_id": "abc-123",... }, { "session_id": "def-456",... } ] }
After:
"user": { "sessions": { "0": { "session_id": "abc-123",... }, "1": { "session_id": "def-456",... } } }
This makes it much easier to access individual session objects. For example, you can now directly reference the first session as $.user.sessions.0.

The effect of flattening on the repeated Tags field:
Before:
"Tags": ["login_logs", "dev"]
After:
"Tags": { "0": "login_logs", "1": "dev" }
Now you can easily access specific tags. For example, $.Tags.0 refers to "login_logs" and $.Tags.1 refers to "dev". This conversion allows using dot notation to access repeated fields the same way as the subfields of composite fields.
Tips
- Always use "array_function" in the json parse clause, whether you need to parse repeated fields or not:
filter {
  json { source => "message" array_function => "split_columns" }
  for index, _tag in Tags map {
    statedump {}
  }
}
Assign a String Constant

Task: Assign the string "Log Sample" to a new token (variable) "constantToken".

filter { mutate { replace => { "constantToken" => "Log Sample" }} statedump {} }

Snippet from statedump output:
…. "constantToken": "Log Sample", ….
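Equivalent to (in the pseudocode introduced above) a simple constant assignment:
$.constantToken ← "Log Sample"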
Capture a Simple String Token

Task: Capture the value of $.event_type into a new token "myEventClass".

filter { json { source => "message" array_function => "split_columns"} mutate { replace => { "myEventClass" => "%{event_type}" }} statedump {} }

Snippet from statedump output:
… "myEventClass": "user_activity", ….

Equivalent to:
$ ← flatten($.message{})
$.myEventClass ← $.event_type

It's important to be aware of which schema you're referencing. If you're still working with the original input structure, the equivalent of the second line would be $.myEventClass ← $.message{}.event_type, because the event_type field would still be nested under message. This effectively creates a copy of the 'event_type' field's value under a new name ('myEventClass').
Capture a Subfield String Token

Task: Capture the value of $.user{}.username into a new token "myUser".

filter { json { source => "message" array_function => "split_columns"} mutate { replace => { "myUser" => "%{user.username}" }} statedump {} }

Snippet from statedump output:
… "myUser": "johndoe" ….

Equivalent to:
$ ← flatten($.message{})
$.myUser ← $.user{}.username
This assigns the value from the nested 'username' field (within 'user') to a new token called 'myUser'.
Capture a Repeated String Field with Specific Order

Task: Capture the value of the first session ID $.user{}.sessions{}[0].session_id and the second tag $.Tags[1].

filter { json { source => "message" array_function => "split_columns"} mutate { replace => { "firstSession" => "%{user.sessions.0.session_id}" }} mutate { replace => { "secondTag" => "%{Tags.1}" }} statedump {} }

Snippet from statedump output:
… "firstSession": "abc-123", ….

Equivalent to:
$ ← flatten($.message{})
$.firstSession ← $.user{}.sessions{}.0.session_id
$.secondTag ← $.Tags{}.1

The array_function in GoStash introduces a key difference in how you work with repeated fields compared to JSONPath. This flattening behavior allows GoStash to treat repeated fields similarly to composite fields, providing a consistent way to access data using dot notation. I.e., %{Tags.1} is not supported without the "array_function" clause.
Task: Capture the first action type under the first action in the first session, $.user{}.sessions{}[0].actions{}[0].action_type, into "firstActionType".

filter { json { source => "message" array_function => "split_columns"} mutate { replace => { "firstActionType" => "%{user.sessions.0.actions.0.action_type}" }} statedump {} }

Snippet from statedump output:
… "event_type": "user_activity", ….

Equivalent to:
$ ← flatten($.message{})
$.firstActionType ← $.user.sessions.0.actions.0.action_type
Initialize an Empty Field

Task: Declare and clear an empty token "emptyPlaceholder".

filter { json { source => "message" array_function => "split_columns"} mutate { replace => { "emptyPlaceholder" => "" }} statedump {} }

Snippet from statedump output:
… "emptyPlaceholder": "", ….

Equivalent to:
$ ← flatten($.message{})
$.emptyPlaceholder ← ""   (declare an empty string variable)

Some operations expect the destination token to already exist before it is referenced. In these cases, initializing the token ensures that it exists and is ready to be used in the intended operation. We'll explore these scenarios in more detail in the following sections.
Capture a Field (If It Exists) with Exception Handling

Task: Capture the third Tag if it exists, without raising any parsing errors if it does not.

filter { json { source => "message" array_function => "split_columns"} mutate { replace => { "my3rdTag" => "%{Tags.2}"} on_error => "3rdTag_isAbsent"} statedump {} }

Snippet from statedump output:
… "3rdTag_isAbsent": true, …

Equivalent to:
$ ← flatten($.message{})
$.my3rdTag ← $.Tags{}.2, ifError raiseFlag: $.3rdTag_isAbsent ← true

In this example there is no third tag, so using the "mutate { replace => { "my3rdTag" => "%{Tags.2}"}}" statement without "on_error" would generate a compile error and halt the parser execution. With on_error, GoStash instead raises a flag named '3rdTag_isAbsent' and sets it to true, allowing the parser to continue executing. This demonstrates how on_error prevents complete parser failure and provides more robust error handling.
Capture a Non-String Subfield

Task: Capture the $.user{}.id integer field.

filter { json { source => "message" array_function => "split_columns"} mutate {convert => {"user.id" => "string"} on_error => "userId_conversionError"} mutate { replace => { "myUserId" => "%{user.id}" }} mutate {convert => {"myUserId" => "integer"}} statedump {} }

Snippet from statedump output:
… "myUserId": 12345, "userId_conversionError": false ….

Equivalent to:
$ ← flatten($.message{})
$.user{}.id ← string($.user{}.id), ifError raiseFlag: $.userId_conversionError ← true
$.myUserId ← $.user{}.id
$.myUserId ← integer($.myUserId)

The replace function substitutes string values via %{...}, while convert changes a token's data type; that is why the integer field is first converted to a string, captured, and then converted back. Keep this distinction in mind to avoid syntax errors when using these operators.
Sample Token Types

Assume an input message processed by the parser below:
{ "booleanField": true, "booleanField2": 1, "floatField": 14.01, "integerField": -3, "uintegerField": 5, "stringField": "any single-line string is here"}

filter { json { source => "message"} statedump {} }

Snippet from statedump output:
{ "booleanField": true, "booleanField2": 1, "floatField": 14.01, "integerField": -3, "stringField": "any single-line string is here", "uintegerField": 5 }

Available token data types (e.g., as targets of the convert function): boolean, float, hash, integer, ipaddress, macaddress, string, uinteger, hextodec, hextoascii.
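As a hedged illustration of how these types interact with the capture pattern shown earlier, the following sketch (the token names "myFloat" and "float_conversionError" are illustrative, not from the lab) captures the non-string floatField by converting it to a string, copying it, and converting the copy back to a float:

filter {
  json { source => "message" }
  # floatField arrives as a float; convert it to a string so %{...} substitution can read it
  mutate { convert => { "floatField" => "string" } on_error => "float_conversionError" }
  # copy the (now string) value into a new token
  mutate { replace => { "myFloat" => "%{floatField}" } }
  # restore the numeric type on the new token
  mutate { convert => { "myFloat" => "float" } }
  statedump {}
}

This mirrors the convert-capture-convert pattern used for $.user{}.id above.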
Constructing Nested String Elements

Task: Capture $.event_type and $.timestamp into the hierarchical tokens "grandparent.parent.eventType" and "othergrandparent.otherparent.date".

filter {
  json { source => "message" array_function => "split_columns"}
  mutate { replace => { "grandparent.parent.eventType" => "%{event_type}" }}
  mutate { replace => { "myTimestamp" => "%{timestamp}" }}
  mutate { rename => { "myTimestamp" => "othergrandparent.otherparent.date" }}
  statedump {}
}

Snippet from statedump output:
… "grandparent": { "parent": { "eventType": "user_activity" } }, "othergrandparent": { "otherparent": { "date": "2025-01-11T12:00:00Z" } }, …

Equivalent to:
$ ← flatten($.message{})
$.grandparent{}.parent{}.eventType ← $.event_type
$.myTimestamp ← $.timestamp
Rename $.myTimestamp ⇒ $.othergrandparent.otherparent.date
Next Step: Security Operations: Deep Dive into UDM Parsing - Part 1.3