
Hello everyone,

I'm having quite a hard time parsing Malwarebytes logs in CEF + KV format, since the KV pairs are separated by a single space and several values contain spaces as well. Here is a (reconstructed) example:

<13>Apr 8 14:59:06 cercer CEF: 0|Malwarebytes|Malwarebytes Endpoint Protection|Endpoint Protection 1.2.0.1193|Detection|PUP found|2|deviceExternalId=239dw57h9861fe48342534f dvchost=cercer deviceDnsDomain=fake.local dvcmac=458234F23E33 dvc=10.10.10.10 rt=Apr 08 2024 14:59:06 Z fileType=file cat=PUP act=found msg=PUP found\nFile: C:\\Users\\Gengis\\AppData\\Local\\Google\\Chrome\\User Data\\Default\\Sync Data\\Rott\\in.ldb\nMD5: HEHEDEE3DE24DE4343FHT3TT3HTW\nSHA256:R3OU4HTIF39U4TND3487H387S64HDE9CU309JV4UT9F0V4KUY5GTJ894YHV3JTY9 filePath=C:\\Users\\Gengis\\AppData\\Local\\Google\\Chrome\\User Data\\Default\\Sync Data\\Rott\\1NCu.ldb cs1Label=Detection name cs1=PUP.Optional.PushNotifications.Generic cs3Label=Detection ID cs3=ijbf4398-7ryn-3944-38fy-n3g48ygr3uyj
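To make the problem concrete, here is roughly what a plain split on spaces does to part of that line. This is a small Python sketch for illustration only; the shortened sample and the `naive_kv` helper are hypothetical, not part of any parser. Every value with an embedded space (here `rt` and `cs1Label`) gets cut off at its first space, and the leftover words are lost:

```python
# Hypothetical, shortened copy of the sample extension (CEF header removed).
EXTENSION = (
    "dvchost=cercer rt=Apr 08 2024 14:59:06 Z fileType=file "
    "cs1Label=Detection name cs1=PUP.Optional.PushNotifications.Generic"
)


def naive_kv(text: str) -> dict[str, str]:
    """Split on single spaces, then on the first '=' (the approach that fails)."""
    pairs = {}
    for token in text.split(" "):
        key, sep, value = token.partition("=")
        if sep:  # tokens without '=' are fragments of a value that had spaces
            pairs[key] = value
    return pairs


if __name__ == "__main__":
    print(naive_kv(EXTENSION))
```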

I tried several approaches but could not make any of them work. The big problem is that regex capture groups don't seem to work, so trying regex patterns like

gsub => ["inner_message", "(\\w=)", ",\\1"]

to modify the separator character is useless.

Is there some other function or trick that I'm missing? There are several prebuilt parsers that handle CEF, so there must be a way around this...

Many thanks

A

You're on the right track! We need to use gsub to replace the space in front of each key with a character we can then use as our field_split. I quickly wrote the example below, which replaces those spaces with "^" and then parses the result with kv extraction. Here's the parser snippet and the output it produced.

mutate {
  gsub => ["message", " ([a-zA-Z0-9]+)=", "^$1="]
}
kv {
  source => "message"
  field_split => "^"
  value_split => "="
}
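For anyone who wants to check the regex outside the parser, here is a rough Python approximation of those two steps. It is a sketch, not what the mutate/kv filters do internally: it assumes each pair is split on the first "=" only, and the `kv_with_caret` helper and shortened sample are hypothetical. Python writes the backreference as `\1` where the snippet uses `$1`.

```python
import re

# Same pattern as the gsub above: a space, an alphanumeric key, then '='.
KEY_BOUNDARY = re.compile(r" ([a-zA-Z0-9]+)=")


def kv_with_caret(text: str) -> dict[str, str]:
    """Replace each ' key=' boundary with '^key=', then split on '^' and '='."""
    marked = KEY_BOUNDARY.sub(r"^\1=", text)
    pairs = {}
    for field in marked.split("^"):
        key, sep, value = field.partition("=")  # split on the first '=' only
        if sep:
            pairs[key] = value
    return pairs


# Hypothetical, shortened copy of the sample extension.
EXTENSION = (
    "dvchost=cercer rt=Apr 08 2024 14:59:06 Z fileType=file "
    "cs1Label=Detection name cs1=PUP.Optional.PushNotifications.Generic"
)

if __name__ == "__main__":
    for key, value in kv_with_caret(EXTENSION).items():
        print(f"{key} = {value!r}")
```

One caveat that applies to the parser version as well: a value that itself contains "^", or a space followed by something that looks like `key=`, will be split in the wrong place, so it's worth picking a separator that can't occur in these logs.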

My statedump yields the following output (you'll still have to parse the CEF header, but this solves the KV problem):

Internal State (label=): { "\u003c13\u003eApr 8 14:59:06 cercer CEF: 0|Malwarebytes|Malwarebytes Endpoint Protection|Endpoint Protection 1": { "2": { "0": { "1193|Detection|PUP found|2|deviceExternalId": "239dw57h9861fe48342534f" } } }, "@createTimestamp": { "nanos": 0, "seconds": 1712754380 }, "@enableCbnForLoop": true, "@onErrorCount": 0, "@output": [], "@timezone": "", "act": "found", "cat": "PUP", "cs1": "PUP.Optional.PushNotifications.Generic", "cs1Label": "Detection name", "cs3": "ijbf4398-7ryn-3944-38fy-n3g48ygr3uyj", "cs3Label": "Detection ID", "deviceDnsDomain": "fake.local", "dvc": "10.10.10.10", "dvchost": "cercer", "dvcmac": "458234F23E33", "filePath": "C:\\\\Users\\\\Gengis\\\\AppData\\\\Local\\\\Google\\\\Chrome\\\\User Data\\\\Default\\\\Sync Data\\\\Rott\\\\1NCu.ldb", "fileType": "file", "message": "\u003c13\u003eApr 8 14:59:06 cercer CEF: 0|Malwarebytes|Malwarebytes Endpoint Protection|Endpoint Protection 1.2.0.1193|Detection|PUP found|2|deviceExternalId=239dw57h9861fe48342534f^dvchost=cercer^deviceDnsDomain=fake.local^dvcmac=458234F23E33^dvc=10.10.10.10^rt=Apr 08 2024 14:59:06 Z^fileType=file^cat=PUP^act=found^msg=PUP found\\nFile: C:\\\\Users\\\\Gengis\\\\AppData\\\\Local\\\\Google\\\\Chrome\\\\User Data\\\\Default\\\\Sync Data\\\\Rott\\\\in.ldb\\nMD5: HEHEDEE3DE24DE4343FHT3TT3HTW\\nSHA256:R3OU4HTIF39U4TND3487H387S64HDE9CU309JV4UT9F0V4KUY5GTJ894YHV3JTY9^filePath=C:\\\\Users\\\\Gengis\\\\AppData\\\\Local\\\\Google\\\\Chrome\\\\User Data\\\\Default\\\\Sync Data\\\\Rott\\\\1NCu.ldb^cs1Label=Detection name^cs1=PUP.Optional.PushNotifications.Generic^cs3Label=Detection ID^cs3=ijbf4398-7ryn-3944-38fy-n3g48ygr3uyj \n", "msg": "PUP found\\nFile: C:\\\\Users\\\\Gengis\\\\AppData\\\\Local\\\\Google\\\\Chrome\\\\User Data\\\\Default\\\\Sync Data\\\\Rott\\\\in.ldb\\nMD5: HEHEDEE3DE24DE4343FHT3TT3HTW\\nSHA256:R3OU4HTIF39U4TND3487H387S64HDE9CU309JV4UT9F0V4KUY5GTJ894YHV3JTY9", "node": "", "rt": "Apr 08 2024 14:59:06 Z" }
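In the statedump, everything up to `deviceExternalId` ended up as one odd key: the first key follows a "|" rather than a space, so the gsub never marks it, and the dotted version string appears to have been treated as a nested field path. One way around that is to split off the header first and run the "^" trick on the extension only. Below is a minimal Python sketch of that idea, under stated assumptions: the header keys are my own hypothetical labels following the CEF header order (Version, Device Vendor, Device Product, Device Version, Device Event Class ID, Name, Severity), not UDM field names, and escaped pipes (`\|`) inside header fields are not handled. In the parser itself the same split would presumably be done with a grok or similar step before the kv filter.

```python
import re

# Same key-boundary pattern as the gsub in the snippet above.
KEY_BOUNDARY = re.compile(r" ([a-zA-Z0-9]+)=")

# Hypothetical labels for the seven CEF header fields, in spec order.
HEADER_NAMES = ("version", "device_vendor", "device_product", "device_version",
                "signature_id", "name", "severity")


def parse_cef(line: str) -> dict:
    """Split a syslog-wrapped CEF line into prefix, header and extension."""
    prefix, marker, rest = line.partition("CEF:")
    if not marker:
        raise ValueError("no 'CEF:' marker found")
    # Assumption: no escaped pipes (\|) inside the header fields.
    parts = rest.split("|", 7)
    if len(parts) != 8:
        raise ValueError("expected 7 header fields followed by an extension")
    header = {name: value.strip() for name, value in zip(HEADER_NAMES, parts[:7])}

    # Apply the '^' trick to the extension only, so the header can't leak into a key.
    marked = KEY_BOUNDARY.sub(r"^\1=", parts[7].strip())
    extension = {}
    for field in marked.split("^"):
        key, sep, value = field.partition("=")
        if sep:
            extension[key] = value
    return {"syslog_prefix": prefix.strip(), "header": header, "extension": extension}


# Shortened copy of the sample line from the question.
SAMPLE = ("<13>Apr 8 14:59:06 cercer CEF: 0|Malwarebytes|Malwarebytes Endpoint Protection|"
          "Endpoint Protection 1.2.0.1193|Detection|PUP found|2|"
          "deviceExternalId=239dw57h9861fe48342534f dvchost=cercer "
          "rt=Apr 08 2024 14:59:06 Z cs1Label=Detection name")

if __name__ == "__main__":
    result = parse_cef(SAMPLE)
    print(result["header"])
    print(result["extension"])
```

Using `split("|", 7)` keeps any "|" inside the extension intact, since only the first seven pipes are treated as header separators.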

That's great @mikewilusz! Thanks so much for the quick solution.

So it turns out that capture groups are available after all. Can you confirm? I must have misread something!

Many thanks again!

A

Correct, capture groups are supported. Note the use of "$1" in the replacement string to reference the capture group that holds the field name.
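It may also help to see why the pattern from the question wouldn't have solved the problem even with working capture groups. Assuming the config string reaches the regex engine as `(\w=)`, it captures just one word character plus the "=", so the comma lands inside each key and the separating spaces stay where they were. The answer's pattern consumes the leading space and captures the whole key. A small Python comparison (the function names are hypothetical; Python writes the backreference as `\1` rather than `$1`):

```python
import re

SAMPLE = "dvchost=cercer rt=Apr 08 2024 14:59:06 Z fileType=file"


def mark_single_char(text: str) -> str:
    # Pattern from the question: captures ONE word character plus '='.
    return re.sub(r"(\w=)", r",\1", text)


def mark_whole_key(text: str) -> str:
    # Pattern from the answer: consumes the leading space, captures the whole key.
    return re.sub(r" ([a-zA-Z0-9]+)=", r"^\1=", text)


if __name__ == "__main__":
    print(mark_single_char(SAMPLE))
    print(mark_whole_key(SAMPLE))
```

With the whole-key pattern, the spaces between pairs are replaced by "^" while the spaces inside values are left alone, which is what makes `field_split => "^"` work.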

-mike

