
Author: Hafez Mohamed

 

Deep Dive into UDM Parsing – How I learned to stop worrying and love the "log"

 

This guide explores UDM (Unified Data Model) parsing, focusing on transforming JSON logs into the Google UDM schema. We'll cover log schema analysis, custom parser creation, and leveraging Jinja templates for efficient development.

 

Problem Formulation

Log parsing involves transforming various log formats (free-text, XML, CSV, JSON) into UDM's structured JSON format for Events and Entities. UDM requires specific field types and nested structures, beyond simple text placeholders. In short, UDM is a schema.


 

Prologue

Logs arrive at the SIEM in different formats:

a. Free-text

b. XML

c. CSV

d. JSON

These formats are eventually transformed into one of two JSON formats, the Unified Data Model (UDM) https://cloud.google.com/chronicle/docs/reference/udm-field-list for Events or for Entities, through a series of Gostash https://cloud.google.com/chronicle/docs/reference/parser-syntax statements (Google's version of Logstash) within the log parsing process.

UDM Events are what you see as raw attributes in the SIEM UDM search describing the event logs, while Entities are what you see as enriched attributes in the UDM search and as entity objects in the entities explorer.

We are going to cover these use cases.

 

Parser Design Workflow

The following is a high-level workflow for the parser design process.

 

  1. Identify Log Type and Key Fields: Determine the log source and its essential fields (e.g., timestamp, severity, user).
  2. Analyze Schema and Format: Examine the log structure (JSON, free-text, etc.), identify data types, and note recurring/permanent fields.
  3. Define Pre-Tokenization Conditions: Specify conditions for parsing (e.g., only logs with WARN severity or higher).
  4. Tokenize Required Fields: Extract the identified fields using Gostash (Google's Logstash).
  5. Apply Post-Tokenization Conditions: Add attributes based on extracted data (e.g., "internal" for corporate users).
  6. Map to UDM: Assign extracted tokens to the corresponding UDM schema fields.
  7. Validate the Parser: Test the parser using the SIEM's validation feature (1k logs preferred) or the ingestion API https://cloud.google.com/chronicle/docs/reference/ingestion-api  or CBN tool https://github.com/chronicle/cbn-tool for smaller samples.
  8. Document and Backup: Maintain a version-controlled backup of your parsers for tracking changes and rollback capabilities.
  9. Monitor Performance: Regularly monitor parser performance, especially after vendor updates that might alter log formats.


 

UI Navigation

1. Creating a new parser: Go to Settings > SIEM Settings > Parser Settings > Add Parser, select "UDM" or the ingestion label of your data source (if it is not present, raise a case with Support to add it), then click Create.

 

Approach

This series uses a practical, use-case-driven approach, focusing on common JSON log parsing scenarios and regular expressions. CSV parsing is straightforward, while key-value pairs require different techniques. The process involves tokenization (capturing data) and mapping (assigning to UDM).

 

Useful Tools

  • LLMs: Generate sample logs for testing and practice.
  • Jinja Templates: Simplify parser development by using Jinja's text generation capabilities. (Note: This series won't be a full Jinja tutorial.)
  • JSONPath: Understanding JSONPath can aid in clarifying UDM syntax (optional).
  • JSON Tree Viewers: Visualize JSON log structures for schema understanding.

 

Tokenization

Tokenization is the process of capturing fields from the log message and assigning them to new field names (target tokens).

We will use GenAI to generate a sample log message. This is the sample log message used throughout this guide:

 

{
  "timestamp": "2025-01-11T12:00:00Z",
  "event_type": "user_activity",
  "user": {
    "id": 12345,
    "username": "johndoe",
    "profile": {
      "email": "john.doe@example.com",
      "location": "New York",
      "VIP": true
    },
    "sessions": [
      {
        "session_id": "abc-123",
        "start_time": "2025-01-11T11:30:00Z",
        "actions": [
          {"action_type": "login", "timestamp": "2025-01-11T11:30:00Z", "targetIP": "10.0.0.10"},
          {"action_type": "search", "query": "weather", "timestamp": "2025-01-11T11:35:00Z", "targetIP": "10.0.0.10"},
          {"action_type": "logout", "timestamp": "2025-01-11T11:45:00Z", "targetIP": "10.0.0.11"}
        ]
      },
      {
        "session_id": "def-456",
        "start_time": "2025-01-11T12:00:00Z",
        "actions": [
          {"action_type": "login", "timestamp": "2025-01-11T12:00:00Z", "targetIP": "192.168.1.10"}
        ]
      }
    ]
  },
  "system": {
    "hostname": "server-001",
    "ip_address": "192.168.1.100"
  },
  "Tags": ["login_logs", "dev"]
}

It is very useful to view JSON logs as a tree structure. Multiple tools are available, including VSCode; in this guide I will be using the online viewer at https://jsonviewer.stack.hu/. Make sure you use sanitized logs, or a JSON viewer tool approved for internal use.
The JSON tree for the log sample above is:

[Image: JSON tree view of the sample log]

Schema Identification

To optimize log parsing, define field types correctly. Schema inference tools streamline this, especially for large datasets.

 

For the example above, we identify the types of several data fields. A parsed JSON log is generally represented in Python as a dictionary in which the field names are the keys.



 

Each field is described below by its data type, multiplicity, and hierarchy, followed by notes.

Field Name: event_type
Data Type: Primitive - String
Multiplicity: Single-valued
Hierarchy: Flat

Field Name: timestamp
Data Type: Primitive - Date
Multiplicity: Single-valued
Hierarchy: Flat
Notes: The field is flat (simple) because it is on the topmost level of the JSON log, i.e. it is not nested under other fields.

All date fields are strings that follow a particular date format. In this example, the date field follows the ISO 8601 format:

2025-01-11 → Date in YYYY-MM-DD format
T → Separator between date and time
12:00:00 → Time in HH:MM:SS (24-hour format)
Z → Zulu time (UTC, Coordinated Universal Time)

Field Name: system{}
Data Type: Composite (Object or dictionary)
Multiplicity: Single-valued
Hierarchy: Nested
Notes: Composite because its value is a dictionary that contains 2 other fields.

These fields form a hierarchy, making the JSON a tree structure.

Notice the curly brackets next to the field name in the tree view; they indicate that the field is composite.

To access the child nodes, we use JSON dot notation, i.e. "system.hostname" and "system.ip_address" are how we access the subfield nodes of "system".

Field Name: user{}
Data Type: Composite (Object or dictionary)
Multiplicity: Single-valued
Hierarchy: Nested
Notes: Composite because, like "system", "user" is composed of different fields underneath.

Nested field values contain other JSON fields ({id:..., username:...,...}) in a hierarchy.

In Python this is analogous to a key whose value is a dictionary.

Field Name: user.profile.VIP
Data Type: Primitive - Boolean
Multiplicity: Single-valued
Hierarchy: Flat
Notes: Boolean fields are either true or false, strictly lowercase and without any quotes.

Boolean fields are not strings, as they have no enclosing double quotes "".

This field is a subfield of "user.profile". It has no subfield nodes of its own, so it is flat in terms of hierarchy even though it is accessed through its parent nodes.

Nested hierarchies in JSON are accessed using dot notation, which expresses the parent-subfield relationship; here "user.profile.VIP" is the path to "VIP" through its parent nodes "user" and "profile".

Field Name: user.id
Data Type: Primitive - Integer
Multiplicity: Single-valued
Hierarchy: Flat
Notes: Integer fields are digits without double quotes.

Integer fields (e.g. 12345) are handled differently from string fields surrounded by double quotes, i.e. "12345" != 12345: the left is a string of digits while the right is an integer.

The field "user.id" is flat because it has no subfield nodes, even though it has the parent node "user".

Field Name: Tags
Data Type: Primitive - String - Repeated
Multiplicity: Multi-valued (Repeated, List)
Hierarchy: Flat
Notes: Flat because it has no nested JSON fields. Notice that the value of "Tags" is a list, not a dictionary (unlike "system").

Each element in the list is a string, so the data type is string but repeated; we can consider this field a Repeated-Atomic field.

Notice the brackets "[ ]" next to the field name in the tree view; they indicate that this field is not composite but repeated.

Repeated fields are accessed slightly differently from composite fields. For example, in JSON Path the second tag is accessed as "Tags[1]", not "Tags.1".

Field Name: user.sessions
Data Type: Dictionary (Composite) - Repeated
Multiplicity: Multi-valued (Repeated, List)
Hierarchy: Nested
Notes: This is the most complex field type; it combines all the sub-types discussed so far.

It is repeated because it is composed of 2 session objects (0 and 1, both branching from the bracket next to the "sessions" field name).

Each element in the list is a dictionary (object), so its data type is dictionary-repeated, or Repeated-Composite.

The field is nested because each element in the list has nested fields, such as "user.sessions[0].actions[1]".

Field Name: user.sessions[].actions[].action_type
Data Type: String
Multiplicity: Single-valued (Mandatory)
Hierarchy: Flat
Notes: Mandatory fields appear in ALL instances of their parent, i.e. every "actions" object has an "action_type".

Field Name: user.sessions[].actions[].query
Data Type: String
Multiplicity: Single-valued (Optional)
Hierarchy: Flat
Notes: Optional fields appear in SOME instances of their parent, i.e. not every "actions" object has a "query"; only the "search" actions have it.
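
The classification above can be approximated programmatically. The following is a minimal Python sketch, assuming the log has already been loaded with the standard json module. The classify helper and its label strings are hypothetical names introduced here for illustration, and a repeated field is described by its first element only, which assumes the list is homogeneous.

```python
import json

# Trimmed copy of the sample log used in this guide.
SAMPLE_LOG = json.loads("""
{
  "event_type": "user_activity",
  "user": {
    "id": 12345,
    "profile": {"VIP": true},
    "sessions": [
      {"session_id": "abc-123", "actions": [{"action_type": "login"}]}
    ]
  },
  "system": {"hostname": "server-001", "ip_address": "192.168.1.100"},
  "Tags": ["login_logs", "dev"]
}
""")


def classify(value):
    """Return (data_type, multiplicity, hierarchy) for one parsed JSON value."""
    if isinstance(value, list):
        # Repeated field: describe it by its first element (assumes a homogeneous list).
        if not value:
            return ("unknown-repeated", "multi-valued", "flat")
        inner_type, _, hierarchy = classify(value[0])
        return (f"{inner_type}-repeated", "multi-valued", hierarchy)
    if isinstance(value, dict):
        return ("composite", "single-valued", "nested")
    # bool must be checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return ("boolean", "single-valued", "flat")
    if isinstance(value, int):
        return ("integer", "single-valued", "flat")
    if isinstance(value, str):
        return ("string", "single-valued", "flat")
    return (type(value).__name__, "single-valued", "flat")


for path, value in [
    ("event_type", SAMPLE_LOG["event_type"]),
    ("user.id", SAMPLE_LOG["user"]["id"]),
    ("user.profile.VIP", SAMPLE_LOG["user"]["profile"]["VIP"]),
    ("system", SAMPLE_LOG["system"]),
    ("Tags", SAMPLE_LOG["Tags"]),
    ("user.sessions", SAMPLE_LOG["user"]["sessions"]),
]:
    print(path, classify(value))
```

Dates are reported as strings by this sketch, since JSON has no date type; recognizing them would need an additional format check. Mandatory versus optional fields cannot be decided from a single value either, as that requires comparing all instances of the parent object.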


     

Exporting Logs from Python Scripts

Tip: When exporting logs from Python scripts, always make sure to:

1. Switch single quotes ' to double quotes ".
2. Keep true/false lowercase (Python prints them as True/False).
3. Replace None or Null values with strings such as "None" or "Null".

Pasting logs with formatting errors (like invalid JSON or unbalanced brackets) into the parser UI will result in an error message.

If the logs are in the correct JSON format, they will appear like this in the UI:

[Image: correctly formatted JSON log displayed in the parser UI]
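
One way to satisfy the first two points of the tip automatically is to serialize with Python's standard json module instead of printing the dictionary directly. The sketch below is illustrative only: the record contents and the none_to_string helper are hypothetical, and the helper implements the tip's suggestion of turning None into the string "None".

```python
import json

# Hypothetical record as it might exist inside a Python script.
record = {"user": "johndoe", "VIP": True, "manager": None, "Tags": ["dev"]}


def none_to_string(value):
    """Recursively replace None with the string "None", as suggested in the tip."""
    if value is None:
        return "None"
    if isinstance(value, dict):
        return {key: none_to_string(item) for key, item in value.items()}
    if isinstance(value, list):
        return [none_to_string(item) for item in value]
    return value


# str(record) would keep single quotes and True, which is not valid JSON.
python_repr = str(record)

# json.dumps emits double quotes and lowercase true/false.
log_line = json.dumps(none_to_string(record))

print(python_repr)
print(log_line)
```

Without the helper, json.dumps would write None as the JSON literal null, which is valid JSON; whether to keep null or replace it with a string is a choice for the parser author.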

     

JSON Path Basics

This section covers the fundamentals of JSON Path, a query language for navigating and extracting data from JSON documents. Using a JSON Path evaluator, paste sample log messages and construct queries to select specific JSON fields. This will help you visualize JSON schemas and understand the distinctions between JSON Path and the Google GoStash parser syntax.

In general in GoStash, the starting points for any JSON parser are:

1. Use the JSON parse clause with array-field serialization (more on that later):

filter {
  json { source => "message" array_function => "split_columns" }
  statedump {}
}

2. GoStash uses distinct syntaxes for field referencing, determined by the operator (e.g., assignment, conditional, loop, merge). This section will delve into the 'replace' operator, loops and if-conditionals, given their prevalence. The 'merge' operator will be addressed in a dedicated section.

There are many JSON Path evaluators; in this guide we will be using https://jsonpath.com/, developed by "Hamasaki Kazuki".

     

Select the Whole JSON

$

1. The $ operator selects the root node of the JSON document in JSON Path.
2. It returns a list with one element, which is the JSON log message.
3. In GoStash, input log messages fall under a default root field named "message", so the "message" field is the equivalent of the JSON Path $.

Note: JSON Path returns a list by default, i.e. querying a log message with $ returns [ …<log message content>... ]. For simplicity, we will assume it returns just the log message (without the enclosing list) when comparing Gostash and JSON Path.

For example, when selecting a whole JSON log such as {"user": "Alex"} using "$", the actual return is [{"user": "Alex"}], not {"user": "Alex"}. We will ignore this in the following sections.
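
To make the "always returns a list" behaviour concrete, here is a minimal Python sketch of a JSON Path-like selector. It is not a JSON Path implementation: the select function is a hypothetical helper that only understands $, .key, [index] and [*], and it silently ignores anything else in the path.

```python
import re

# One token per path step: .key, [index] or [*]
TOKEN = re.compile(r"\.([A-Za-z_]\w*)|\[(\d+)\]|\[(\*)\]")


def select(document, path):
    """Evaluate a tiny JSON Path subset and always return a list of matches."""
    if not path.startswith("$"):
        raise ValueError("path must start with $")
    matches = [document]  # "$" alone selects the root, wrapped in a list
    for key, index, wildcard in TOKEN.findall(path[1:]):
        next_matches = []
        for node in matches:
            if key and isinstance(node, dict) and key in node:
                next_matches.append(node[key])
            elif index and isinstance(node, list) and int(index) < len(node):
                next_matches.append(node[int(index)])
            elif wildcard and isinstance(node, list):
                next_matches.extend(node)  # [*] on a list: every element
            elif wildcard and isinstance(node, dict):
                next_matches.extend(node.values())  # [*] on an object: every value
        matches = next_matches
    return matches


sample = {
    "event_type": "user_activity",
    "user": {"id": 12345},
    "system": {"hostname": "server-001", "ip_address": "192.168.1.100"},
    "Tags": ["login_logs", "dev"],
}

print(select({"user": "Alex"}, "$"))  # [{'user': 'Alex'}]
print(select(sample, "$.event_type"))  # ['user_activity']
print(select(sample, "$.user.id"))     # [12345]
```

The same helper also accepts paths with an index or a wildcard, such as $.Tags[1] or $.system[*], which follow the same list-returning convention.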


     

     

     

Select a Simple Field

$.event_type

The first operator $ selects the root JSON log, the dot operator "." moves the selection one level down to the subfield nodes, and "event_type" picks the subfield node "event_type". In effect, this selects the value of the event_type field, which is "user_activity".

In Gostash, to reference the same field:

1. Assignment operator: use "%{event_type}".

filter {
  json { source => "message" array_function => "split_columns" }
  mutate { replace => { "myVariable" => "%{event_type}" } }
}

2. IF conditional: use [event_type].

filter {
  json { source => "message" array_function => "split_columns" }
  if [event_type] == "abc" {
  }
}

3. Loop: not supported for simple fields.
     


     

     

     

Select a Subfield

$.user.id

As above, $ selects the root JSON log, "." moves one level down to the subfield nodes, and "user" picks the subfield node "user"; we then move one step further and reference the second-level field "id".

In Gostash, to reference the same field:

1. Assignment operator:

I. For non-string fields: not supported. For example, user.id is an integer, not a string (as highlighted earlier), so the following will give an error:

filter {
  json { source => "message" array_function => "split_columns" }
  mutate { replace => { "myVariable" => "%{user.id}" } }
}

II. For string fields: supported. For example, "user.username" is a string field, referenced by %{user.username}:

filter {
  json { source => "message" array_function => "split_columns" }
  mutate { replace => { "myVariable" => "%{user.username}" } }
}

2. IF conditional: use [user][id] or [user].

filter {
  json { source => "message" array_function => "split_columns" }
  if [user][id] == 13 {
  }
}

IF conditionals in Gostash use bracket notation instead of dot notation to reference nested fields.

IF conditionals support both string and non-string data types.

3. Loop: not supported for simple fields.
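
Because the assignment operator uses dot notation and IF conditionals use bracket notation for the same field, a small helper can keep the two in sync when generating parser text (for example from a template). This is a minimal Python sketch; to_bracket_notation is a hypothetical name, not part of Gostash or any library.

```python
def to_bracket_notation(dotted_path):
    """Convert a dotted field path such as "user.id" into "[user][id]"."""
    return "".join(f"[{part}]" for part in dotted_path.split("."))


field = "user.username"
assignment_reference = "%{" + field + "}"          # form used inside mutate/replace
conditional_reference = to_bracket_notation(field)  # form used inside if-conditionals

print(assignment_reference)   # %{user.username}
print(conditional_reference)  # [user][username]
```

The helper does no validation; it only rewrites the path text.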

     

     

Select a Repeated Field

$.Tags[*]

The syntax is similar to the above cases, but requires appending "[*]" to indicate querying ALL the repeated field values.

If we need to access the 2nd tag, the JSON Path is $.Tags[1].

In Gostash, to reference the 2nd Tag field:

1. Assignment operator:

a. With "array_function":

i. Accessing all values of a repeated field: not supported.

In GoStash, you cannot use wildcard syntax (like %{Tags.*}) to access all items in a repeated field. You must reference specific elements by their index, such as the first or second tag. The following does not work:

filter {
  json { source => "message" array_function => "split_columns" }
  mutate { replace => { "myVariable" => "%{Tags.*}" } }
  statedump {}
}

ii. Accessing a specific value of a repeated field: supported.

"%{Tags.1}" is supported:

filter {
  json { source => "message" array_function => "split_columns" }
  mutate { replace => { "myVariable" => "%{Tags.1}" } }
  statedump {}
}

b. Without "array_function":

Not supported. Repeated fields won't be accessible, i.e. %{Tags.1} is not possible if the JSON is parsed without the "array_function":

filter {
  json { source => "message" }
  mutate { replace => { "myVariable" => "%{Tags.1}" } }
  statedump {}
}

"array_function" should always be used to allow access to repeated fields.

     
2. IF conditional: supported with "array_function", but uses bracket notation.

filter {
  json { source => "message" array_function => "split_columns" }
  if [Tags][1] == "dev" {
    statedump {}
  }
}

Without array_function: not supported, and will give an error:

filter {
  json { source => "message" }
  if [Tags][1] == "dev" {
    statedump {}
  }
}

     
3. Loop:

I. With array_function: supported.

filter {
  json { source => "message" array_function => "split_columns" }
  for index, _tag in Tags {
    statedump {}
  }
}

The loop will execute two times, corresponding to the two values in the 'Tags' field. Pay attention to the use of 'Tags' without the '%{}' syntax. We'll cover the reason for this in a later section.

GoStash's syntax might allow you to write loops referencing specific indexes of repeated fields (e.g., Tags.0), but this will result in a logical error, and the loop's content will be skipped.

filter {
  json { source => "message" array_function => "split_columns" }
  for index, _tag in Tags.0 {
    statedump {}
  }
}

II. Without array_function: the loop won't produce a syntax error, but it will fail to execute. This is a logical error rather than a syntax error: the parser is accepted, but the loop body never runs.

filter {
  json { source => "message" } # array_function => "split_columns"}
  for index, _tag in Tags {
    statedump {}
  }
}
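
One way to picture why "%{Tags.1}" only resolves when array_function is set is to imagine each list being rewritten as an object keyed by element position, so that a dotted path can step into it. The Python sketch below is only a mental model under that assumption; it is not how Gostash is implemented, and index_arrays and lookup are hypothetical helper names.

```python
def index_arrays(value):
    """Rewrite every list as a dictionary keyed by element position ("0", "1", ...)."""
    if isinstance(value, list):
        return {str(position): index_arrays(item) for position, item in enumerate(value)}
    if isinstance(value, dict):
        return {key: index_arrays(item) for key, item in value.items()}
    return value


def lookup(tree, dotted_path):
    """Follow a dotted path through nested dictionaries; return None if a step is missing."""
    node = tree
    for part in dotted_path.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


log = {"Tags": ["login_logs", "dev"]}

print(lookup(log, "Tags.1"))                # None: a plain list has no key "1"
print(lookup(index_arrays(log), "Tags.1"))  # dev
```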


     

Select a Composite Field

$.system or $.system[*]

This example highlights a particular distinction between JSON Path and Gostash. You can select the composite field in JSON Path the same way as a simple field, using $.system, but it will return a list containing the object:

[
  {
    "hostname": "server-001",
    "ip_address": "192.168.1.100"
  }
]

Appending [*] to the end, as in $.system[*], flattens the JSON object inside the list into its values:

[
  "server-001",
  "192.168.1.100"
]

In Gostash, there is no direct equivalent to $.system[*]. Referencing a sub-field of a composite field is done the same way as for subfields, but explicitly, i.e. %{user.username}; there is no syntax for something like %{user.*}.

     

Looping is supported for composite fields, so you can iterate over each sub-field. The three operators behave as follows:

1. Assignment operator: supported for an explicit sub-field (as indicated earlier).

filter {
  json { source => "message" }
  mutate { replace => { "myVariable" => "%{system.hostname}" } }
  statedump {}
}

2. IF conditional: supported using bracket notation (as indicated earlier).

filter {
  json { source => "message" }
  if [system][hostname] == "server-001" {
    statedump {}
  }
}

3. Loop: looping through a composite field is supported using the "map" keyword.

filter {
  json { source => "message" }
  for index, systemDetail in system map {
    statedump {}
  }
}

This differs from looping through repeated fields, which does not require the "map" keyword. This will be discussed in more detail later.
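
As a rough Python analogy for the two loop forms, iterating a repeated field resembles enumerate over a list, while iterating a composite field with "map" resembles items() over a dictionary. The sketch assumes that the first loop variable in the Gostash map loop carries the sub-field name, which the text above does not state explicitly.

```python
log = {
    "Tags": ["login_logs", "dev"],
    "system": {"hostname": "server-001", "ip_address": "192.168.1.100"},
}

# Repeated field (no "map" keyword in Gostash): position and value of each element.
tag_pairs = [(index, tag) for index, tag in enumerate(log["Tags"])]

# Composite field ("map" keyword in Gostash): name and value of each sub-field.
system_pairs = [(name, value) for name, value in log["system"].items()]

print(tag_pairs)
print(system_pairs)
```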
