
Capstone Example

Capstone Example: Constructing a List of Complex Composite JSON Objects

In this example we take on a more complex scenario: construct a list of JSON objects, each holding the "action_type" and "targetIP" of one action from the first session only, together with the user's integer ID.

     JSON Path of Source Fields:

          $.message.user{}.sessions[0].actions

  • .action_type

          $.message.user{}.sessions[0].actions

  • .targetIP

          $.message.user{}.sessions[0].actions

  • .query, if it exists

          $.message.user{}.id

     

         Required Target Schema:

              listFirstSession (Repeated, Composite), each element – action (Composite) – type (string), dst (string) and userId (integer).

              In short: listFirstSession[] – action – type, dst, userId

     

              The expected constructed field should look like:

     

              {"listFirstSession" : [

              {"action": {"type": "login", "dst" : "10.0.0.10", "userId": 12345}},

              {"action": {"type": "search",  "dst" : "10.0.0.10", "userId": 12345, "query":"weather"}},

              {"action": {"type": "logout", "dst" : "10.0.0.11", "userId": 12345}} 

              ]}

  • Input Schema before Flattening [figure]

  • Input Schema after Flattening [figure]

  • Required Target Schema [figures]

     

     

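    filter {
      # Parse the JSON payload in "message" and flatten its arrays.
      json { source => "message" array_function => "split_columns" }

      # Cast user.id to a string so it can be interpolated with %{...}.
      mutate { convert => { "user.id" => "string" } on_error => "convertError" }
      mutate { replace => { "userId" => "%{user.id}" } }
      # userId stays a string; on_error sets a flag instead of failing the parser.
      mutate { convert => { "userId" => "string" } on_error => "missingUserId" }

      # Loop1: iterate over the sessions.
      for i1, v1 in user.sessions {
        # Only process the first session (index 0).
        if [i1] == 0 {

          # Loop2: iterate over the actions of the current session.
          for i2, v2 in v1.actions {
            mutate { replace => { "temp.action.userId" => "%{userId}" } }
            mutate { replace => { "temp.action.dst" => "%{v2.targetIP}" } }
            mutate { replace => { "temp.action.type" => "%{v2.action_type}" } }
            # query is optional: on_error flags its absence.
            mutate { replace => { "temp.action.query" => "%{v2.query}" } on_error => "missingQueryField" }
            # Append temp to the repeated field, then clear it for the next action.
            mutate { merge => { "listFirstSession" => "temp" } on_error => "mergeError" }
            mutate { replace => { "temp" => "" } on_error => "errorClearingTempPlaceHolder" }
          }
        }
      }
      statedump {}
    }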

     

    Snippet from statedump output:

    {…
      "listFirstSession": [
        {
          "action": {
            "dst": "10.0.0.10",
            "type": "login",
            "userId": "12345"
          }
        },
        {
          "action": {
            "dst": "10.0.0.10",
            "query": "weather",
            "type": "search",
            "userId": "12345"
          }
        },
        {
          "action": {
            "dst": "10.0.0.11",
            "type": "logout",
            "userId": "12345"
          }
        }
      ],
    ..}

    Schematic for the transform [figure]

    Summary: The core logic for mapping data involves transforming the input schema into a target schema with the following structure: listFirstSession (repeated).action (composite).type, dst, userId, query (atomic).

    We'll use a temporary variable, temp, to hold the structured data for each action within the first session. This variable will have a hierarchy mirroring the target: temp.action.type, temp.action.dst, temp.action.userId, and temp.action.query.

    The input data has two levels of repeated fields: user.sessions and user.sessions.actions. To process all the actions, we'll implement two nested loops. The outer loop iterates through each session, and the inner loop iterates through the actions within each session.

    Finally, because the query field in the input is optional and can appear multiple times, we'll include error handling to gracefully manage cases where it's absent.
     

    1. We begin by extracting $.user.id into the field userId, as these fields are directly accessible and outside the scope of the session loops. The parser converts the value to a string so that it can be interpolated with %{userId}. The target schema calls for an integer, whereas the parser as written keeps userId as a string, as the statedump output shows.

    2. To access the actions, which are nested within $.message.user.sessions[].actions[], we use two nested loops. The outer loop iterates through each session, and the inner loop iterates through the actions within the current session. The variable v2 represents each individual action object in the inner loop.

    3. To target only the first session, we introduce a conditional statement in the outer loop: if [i1] == 0. This ensures that the inner loop is only executed for the session at index 0.

    4. Within the inner loop, we extract the mandatory fields 'action_type' and 'targetIP', along with the common field 'userId', into a temporary variable $.temp with the hierarchy $.temp.action.type, dst, userId. For the optional 'query' field, we use on_error to emulate a check similar to "if exists v2.query".

    5. For each action processed in the loops, we populate the temporary placeholder $.temp.action with the following:

       • $.temp.action.type ← v2.action_type

       • $.temp.action.dst ← v2.targetIP

       • $.temp.action.userId ← $.userId (the user ID captured earlier, common to all actions in the first session)

       • $.temp.action.query ← v2.query if it exists; on_error: $.missingQueryField ← true (if 'query' exists, we map it; otherwise, we set a flag)

    6. Once the $.temp.action structure is populated for a given action, we append the top-level $.temp object to the $.listFirstSession repeated field using merge, which implements $.listFirstSession[] ← append($.temp).

    7. After each merge operation, we clear the $.temp placeholder to ensure that subsequent action mappings are independent.

    Using the algorithm described above, we captured and mapped all the fields of the target schema.
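    The same steps can be modelled outside the parser to check the logic. The following is only a rough Python sketch, not parser code: the input event is hypothetical (its values are chosen to match the expected output, and the second session is invented to show that it is skipped), and the function name build_list_first_session is an assumption made for illustration.

```python
# Hypothetical input, shaped like the parsed "message" payload described above.
event = {
    "user": {
        "id": 12345,
        "sessions": [
            {"actions": [
                {"action_type": "login", "targetIP": "10.0.0.10"},
                {"action_type": "search", "targetIP": "10.0.0.10", "query": "weather"},
                {"action_type": "logout", "targetIP": "10.0.0.11"},
            ]},
            # Invented second session: it must not appear in the result.
            {"actions": [
                {"action_type": "login", "targetIP": "10.0.0.12"},
            ]},
        ],
    }
}


def build_list_first_session(event):
    """Model the parser: nested loops, a first-session guard, a temp placeholder."""
    state = {"listFirstSession": []}
    # The parser keeps userId as a string so it can be interpolated.
    user_id = str(event["user"]["id"])

    for i1, v1 in enumerate(event["user"]["sessions"]):  # Loop1: sessions
        if i1 != 0:
            continue  # only the session at index 0 is processed
        for v2 in v1["actions"]:  # Loop2: actions of that session
            temp = {"action": {
                "userId": user_id,
                "dst": v2["targetIP"],
                "type": v2["action_type"],
            }}
            if "query" in v2:  # optional field
                temp["action"]["query"] = v2["query"]
            else:
                # Plays the role of on_error => "missingQueryField".
                state["missingQueryField"] = True
            # Plays the role of merge into the repeated field.
            state["listFirstSession"].append(temp)
            # temp is rebuilt on the next iteration, which stands in for clearing it.
    return state


result = build_list_first_session(event)
```

    In this model, result["listFirstSession"] holds three entries, one per action of the first session, and only the "search" entry carries a query key.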


     

    UDM Schema Mapping

    In this final example, we use what has been discussed so far to tokenize and map target fields to the UDM event format, in addition to a few more.

     

    Interpreting UDM Schema

    The UDM usage guide (https://cloud.google.com/chronicle/docs/unified-data-model/udm-usage) and the UDM field list (https://cloud.google.com/chronicle/docs/reference/udm-field-list) detail the data model for UDM entities and events. This guide focuses on the UDM event schema.

    We highlight the following properties:

     

    1. Schema Adherence: All mapped fields must strictly conform to the UDM data model structure and the defined data type for that field (e.g., Integer fields require integer tokens, Repeated fields require List tokens).

    2. Some UDM events have mandatory fields, for example:

      1. Mandatory 'Metadata.event_type': Every UDM event requires a value for the 'Metadata.event_type' field. The UDM usage guide lists the possible values: https://cloud.google.com/chronicle/docs/unified-data-model/udm-usage#metadataevent_type. 'GENERIC_EVENT' serves as a versatile, catch-all type.

      2. Conditional Field Requirements: The UDM documentation specifies that whether other fields are optional or mandatory depends on the value of 'Metadata.event_type'.

    3. Avoid Deprecated Fields: Do not use any fields marked as deprecated in the UDM field list, to ensure long-term compatibility. Deprecations are listed at https://cloud.google.com/chronicle/docs/deprecations
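    The first two properties can be pictured as a simple pre-flight check on a mapped event. The Python sketch below is illustrative only and is not part of any Google SecOps tooling: check_udm_event is a hypothetical helper, the event is represented as a flat dict of field path to value, and the lists of integer and repeated fields are supplied by the caller rather than taken from the full UDM schema.

```python
def check_udm_event(event, integer_fields=(), repeated_fields=()):
    """Return a list of problems found in a flat dict of UDM field path -> value."""
    problems = []
    # Every UDM event needs a value for metadata.event_type.
    if not event.get("metadata.event_type"):
        problems.append("metadata.event_type is missing")
    # Integer fields require integer values (bool is excluded on purpose).
    for path in integer_fields:
        if path in event:
            value = event[path]
            if isinstance(value, bool) or not isinstance(value, int):
                problems.append(f"{path} should be an integer")
    # Repeated fields require list values.
    for path in repeated_fields:
        if path in event and not isinstance(event[path], list):
            problems.append(f"{path} should be a list")
    return problems


good = {
    "metadata.event_type": "GENERIC_EVENT",
    "principal.ip": ["10.0.0.10"],
    "principal.port": 443,
}
bad = {"principal.ip": "10.0.0.10", "principal.port": "443"}

good_problems = check_udm_event(good, ["principal.port"], ["principal.ip"])
bad_problems = check_udm_event(bad, ["principal.port"], ["principal.ip"])
```

    Here good_problems is empty, while bad_problems reports the missing event type, the string port, and the non-list IP. A real parser is validated by the platform itself; the sketch only makes the rules concrete.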

    Part 2.4 will cover miniature examples of how to map source data to the UDM (Unified Data Model) format, focusing on the mapping algorithm and avoiding complex loops.
