Parsing CEF or other fields whose values contain the separator character

Question

Hello everyone,

I am having a hard time parsing Malwarebytes logs in CEF + KV format: the key-value pairs are separated by a single space, and several values contain spaces as well. Here is a (reconstructed) example:

<13>Apr  8 14:59:06 cercer CEF: 0|Malwarebytes|Malwarebytes Endpoint Protection|Endpoint Protection 1.2.0.1193|Detection|PUP found|2|deviceExternalId=239dw57h9861fe48342534f dvchost=cercer deviceDnsDomain=fake.local dvcmac=458234F23E33 dvc=10.10.10.10 rt=Apr 08 2024 14:59:06 Z fileType=file cat=PUP act=found msg=PUP found\nFile: C:\\Users\\Gengis\\AppData\\Local\\Google\\Chrome\\User Data\\Default\\Sync Data\\Rott\\in.ldb\nMD5: HEHEDEE3DE24DE4343FHT3TT3HTW\nSHA256:R3OU4HTIF39U4TND3487H387S64HDE9CU309JV4UT9F0V4KUY5GTJ894YHV3JTY9 filePath=C:\\Users\\Gengis\\AppData\\Local\\Google\\Chrome\\User Data\\Default\\Sync Data\\Rott\\1NCu.ldb cs1Label=Detection name cs1=PUP.Optional.PushNotifications.Generic cs3Label=Detection ID cs3=ijbf4398-7ryn-3944-38fy-n3g48ygr3uyj

I tried several approaches, but could not make any of them work. The main problem is that regex capture groups do not seem to work, so patterns such as

gsub => ["inner_message", "(\\w=)", ",\\1"]

which are meant to change the separator character, are useless.

Is there another function or trick that I am missing? I see there are several prebuilt parsers that handle CEF formats, so there must be a way around this.

Many thanks,
A

mikewilusz · Accepted Answer

You're on the right track! We need to use gsub to replace the space in front of each key with a different character, which then serves as our field_split. I quickly wrote the example below, which replaces those spaces with "^" and then parses the result using kv extraction. Here's the parser snippet and the output it yielded.

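mutate {
   # Replace the space before each "key=" with "^"; $1 re-inserts the captured key name
   gsub => ["message", " ([a-zA-Z0-9]+)=", "^$1="]
}
kv {
   # Split pairs on "^" and split each key from its value on "="
   source => "message"
   field_split => "^"
   value_split => "="
}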

My statedump yields the following output (you'll still have to parse the CEF header, but this solves the KV problem):

Internal State (label=):

{
  "\\u003c13\\u003eApr  8 14:59:06 cercer CEF: 0|Malwarebytes|Malwarebytes Endpoint Protection|Endpoint Protection 1": {
    "2": {
      "0": {
        "1193|Detection|PUP found|2|deviceExternalId": "239dw57h9861fe48342534f"
      }
    }
  },
  "@createTimestamp": {
    "nanos": 0,
    "seconds": 1712754380
  },
  "@enableCbnForLoop": true,
  "@onErrorCount": 0,
  "@output": [],
  "@timezone": "",
  "act": "found",
  "cat": "PUP",
  "cs1": "PUP.Optional.PushNotifications.Generic",
  "cs1Label": "Detection name",
  "cs3": "ijbf4398-7ryn-3944-38fy-n3g48ygr3uyj",
  "cs3Label": "Detection ID",
  "deviceDnsDomain": "fake.local",
  "dvc": "10.10.10.10",
  "dvchost": "cercer",
  "dvcmac": "458234F23E33",
  "filePath": "C:\\\\\\\\Users\\\\\\\\Gengis\\\\\\\\AppData\\\\\\\\Local\\\\\\\\Google\\\\\\\\Chrome\\\\\\\\User Data\\\\\\\\Default\\\\\\\\Sync Data\\\\\\\\Rott\\\\\\\\1NCu.ldb",
  "fileType": "file",
  "message": "\\u003c13\\u003eApr  8 14:59:06 cercer CEF: 0|Malwarebytes|Malwarebytes Endpoint Protection|Endpoint Protection 1.2.0.1193|Detection|PUP found|2|deviceExternalId=239dw57h9861fe48342534f^dvchost=cercer^deviceDnsDomain=fake.local^dvcmac=458234F23E33^dvc=10.10.10.10^rt=Apr 08 2024 14:59:06 Z^fileType=file^cat=PUP^act=found^msg=PUP found\\\\nFile: C:\\\\\\\\Users\\\\\\\\Gengis\\\\\\\\AppData\\\\\\\\Local\\\\\\\\Google\\\\\\\\Chrome\\\\\\\\User Data\\\\\\\\Default\\\\\\\\Sync Data\\\\\\\\Rott\\\\\\\\in.ldb\\\\nMD5: HEHEDEE3DE24DE4343FHT3TT3HTW\\\\nSHA256:R3OU4HTIF39U4TND3487H387S64HDE9CU309JV4UT9F0V4KUY5GTJ894YHV3JTY9^filePath=C:\\\\\\\\Users\\\\\\\\Gengis\\\\\\\\AppData\\\\\\\\Local\\\\\\\\Google\\\\\\\\Chrome\\\\\\\\User Data\\\\\\\\Default\\\\\\\\Sync Data\\\\\\\\Rott\\\\\\\\1NCu.ldb^cs1Label=Detection name^cs1=PUP.Optional.PushNotifications.Generic^cs3Label=Detection ID^cs3=ijbf4398-7ryn-3944-38fy-n3g48ygr3uyj \\n",
  "msg": "PUP found\\\\nFile: C:\\\\\\\\Users\\\\\\\\Gengis\\\\\\\\AppData\\\\\\\\Local\\\\\\\\Google\\\\\\\\Chrome\\\\\\\\User Data\\\\\\\\Default\\\\\\\\Sync Data\\\\\\\\Rott\\\\\\\\in.ldb\\\\nMD5: HEHEDEE3DE24DE4343FHT3TT3HTW\\\\nSHA256:R3OU4HTIF39U4TND3487H387S64HDE9CU309JV4UT9F0V4KUY5GTJ894YHV3JTY9",
  "node": "",
  "rt": "Apr 08 2024 14:59:06 Z"
}
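
The first key in the statedump above shows the CEF header ending up inside the first key-value pair, apparently because the split runs over the whole line. As a rough illustration of the remaining header step, here is a minimal Python sketch (not parser syntax) that separates the seven pipe-delimited CEF header fields from the extension before any key-value splitting. The function name `split_cef_header` and the header field names are made up for this example, and the sketch assumes the header contains no escaped pipes (`\|`).

```python
# Illustrative only: these names are hypothetical, not part of any parser API.
CEF_HEADER_FIELDS = [
    "version", "device_vendor", "device_product", "device_version",
    "signature_id", "name", "severity",
]


def split_cef_header(line):
    """Return (header_dict, extension_string) for one CEF line."""
    # Discard the syslog prefix: keep only what follows the "CEF:" marker.
    _, marker, rest = line.partition("CEF:")
    if not marker:
        raise ValueError("not a CEF line")
    # Seven header fields, then everything after the 7th pipe is the extension.
    parts = rest.strip().split("|", 7)
    if len(parts) < 7:
        raise ValueError("incomplete CEF header")
    header = dict(zip(CEF_HEADER_FIELDS, parts[:7]))
    extension = parts[7] if len(parts) == 8 else ""
    return header, extension


sample = (
    "<13>Apr  8 14:59:06 cercer CEF: 0|Malwarebytes|Malwarebytes Endpoint Protection|"
    "Endpoint Protection 1.2.0.1193|Detection|PUP found|2|"
    "deviceExternalId=239dw57h9861fe48342534f dvchost=cercer cat=PUP act=found"
)
header, extension = split_cef_header(sample)
print(header)
print(extension)
```

The extension string returned here is the part that the gsub and kv steps are meant to handle, so running them on it alone would keep the header out of the first key.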

That's great, @mikewilusz! Thanks so much for the fast solution.

So it turns out that capture groups are available after all, can you confirm? I must have misread something about it!

Many thanks again!

A

mikewilusz · Answer

Correct, capture groups are supported. Note the "$1" in the replacement string: it references the capture group I used to grab the field name.

-mike
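
To make the role of the capture group concrete, the following Python sketch applies the same pattern to a shortened extension string. It is an illustration outside the parser: Python's `re` module writes the group reference as `\1` where the parser snippet uses `$1`, and the helper name `split_extension` is hypothetical. Like the parser snippet, it assumes that `^` never occurs in the data.

```python
import re

# Same pattern as in the gsub call: a space, then a key (captured), then "=".
KEY_BOUNDARY = re.compile(r" ([a-zA-Z0-9]+)=")


def split_extension(extension):
    """Split a space-separated key=value string whose values may contain spaces."""
    # Replace only the space in front of each key; \1 puts the captured key back.
    marked = KEY_BOUNDARY.sub(r"^\1=", extension)
    # Each chunk is now "key=value"; split on the first "=" only.
    return dict(
        chunk.split("=", 1) for chunk in marked.split("^") if "=" in chunk
    )


fields = split_extension("cat=PUP act=found msg=PUP found cs1Label=Detection name")
print(fields["msg"])       # PUP found
print(fields["cs1Label"])  # Detection name
```

Spaces inside values survive because the pattern only matches a space that is directly followed by a key and an equals sign.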
