Hello! I am struggling with how to handle nested arrays in my parsers. I have been reviewing the following documentation, but I am still unable to fully wrap my head around how to make it all work. I have the following JSON log (it's a lot longer, but I just want to see how to start it):
Following the documentation, I conjured up the following, but I keep running into generic errors. I haven't included the “host.name” or the “os.type” yet because I wasn't able to get the “source_type” out of the log:
Any assistance would be wonderful. These logs are extremely important, and I would like to extract as much as possible from them into UDM fields.
Thanks!
Hi GSCoNist,
Are you sure about the structure and format of the raw log you shared? I had to terminate the raw log properly for it to be treated as JSON; otherwise you would need to use the string functions with regex to replace the end of the log message with proper brackets.
Mapping additional fields is covered in Part 3 of the UDM deep dive adoption guide, which should be released in a few days.
Also, please let me know whether the sample raw log you shared has the same structure as what you are receiving. I can fix the format in the parser, but that would mean the logs you are receiving are not properly terminated, which should be fixed at the source.
@AbdElHafez Thank you for the response. I should have mentioned in the initial question that I removed a lot of lines from the raw log; I cut the large majority of it for brevity because it was roughly 300+ lines. The logs are properly formatted in the SIEM.
As for the syntax, your response helps me so much.
One thing I am kind of worried about: I have multiple logRecords in this log. Would I handle them with the same for loop? I'll post the whole log below. The main thing I need is the stringValues; for example, this log has 5 of those records. Would I just label them 1 through 5? Some of these logs have 30+ stringValues, so I'm not quite sure how that would work.
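A loop does not care how many records there are, so numbering the values by hand isn't necessary. The Python sketch below (an illustration of the logic, not parser syntax) walks the resourceLogs / scopeLogs / logRecords nesting used in the parser loop later in this thread, collects every non-empty body.stringValue, and derives a label key from each value's position. The `stringValue_<n>` key naming and the trimmed sample log are hypothetical.

```python
from typing import Any


def collect_string_values(log: dict[str, Any]) -> list[str]:
    """Walk resourceLogs -> scopeLogs -> logRecords and return every
    non-empty body.stringValue, however many records each level holds."""
    values: list[str] = []
    for resource_log in log.get("resourceLogs", []):
        for scope_log in resource_log.get("scopeLogs", []):
            for record in scope_log.get("logRecords", []):
                value = record.get("body", {}).get("stringValue", "")
                if value:  # skip records with an empty or missing body
                    values.append(value)
    return values


def to_indexed_labels(values: list[str]) -> list[dict[str, str]]:
    """One key/value label per value. The key is derived from the position,
    so 5 or 30+ values need no hand-written numbering.
    The "stringValue_<n>" key format is a hypothetical naming choice."""
    return [
        {"key": f"stringValue_{i}", "value": v}
        for i, v in enumerate(values, start=1)
    ]


# Hypothetical, heavily trimmed log in the same nested shape.
sample = {
    "resourceLogs": [
        {
            "scopeLogs": [
                {"logRecords": [
                    {"body": {"stringValue": "first"}},
                    {"body": {"stringValue": ""}},
                    {"body": {"stringValue": "second"}},
                ]},
                {"logRecords": [{"body": {"stringValue": "third"}}]},
            ]
        }
    ]
}

for label in to_indexed_labels(collect_string_values(sample)):
    print(label)
```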
@AbdElHafez So I was able to get some of the logRecords values, but when I use replace, it overwrites every label with the last value in the log. I'm not quite sure how to handle that. Here is what I currently have for the parser.
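The parser itself isn't shown here, so the cause can only be guessed. One common way to end up with "every label holds the last value" is reusing one label object across iterations instead of starting a fresh one each time. The Python analogy below (plain dicts, not parser semantics) reproduces that symptom and the fix.

```python
def build_labels_shared(values: list[str]) -> list[dict[str, str]]:
    # Buggy: one dict is reused, so every list entry is the same object
    # and ends up holding the last value assigned to it.
    label: dict[str, str] = {}
    labels = []
    for v in values:
        label["value"] = v
        labels.append(label)
    return labels


def build_labels_fresh(values: list[str]) -> list[dict[str, str]]:
    # Fixed: a new dict per iteration, analogous to clearing the temporary
    # label (e.g. with remove_field) after each merge in a parser loop.
    labels = []
    for v in values:
        labels.append({"value": v})
    return labels


vals = ["a", "b", "c"]
print(build_labels_shared(vals))  # [{'value': 'c'}, {'value': 'c'}, {'value': 'c'}]
print(build_labels_fresh(vals))   # [{'value': 'a'}, {'value': 'b'}, {'value': 'c'}]
```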
Repeated additional fields with the same key will be problematic in searches, so I would suggest mapping the values into a different repeated field like security_result key-value pairs instead.
Alternatively, if you want, we could concatenate the tokens into a single list: value2|value3|value4|value5.
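A minimal Python sketch of the concatenation option, assuming the values have already been collected into a list. The guard against values that already contain `|` is an added precaution, not something from the thread: such a value would make the joined string impossible to split back unambiguously.

```python
def concat_values(values: list[str], sep: str = "|") -> str:
    """Join all non-empty values into one string, e.g. 'value2|value3'."""
    kept = [v for v in values if v]  # drop empties to avoid doubled separators
    # Refuse values that already contain the separator (added precaution).
    if any(sep in v for v in kept):
        raise ValueError(f"a value already contains the separator {sep!r}")
    return sep.join(kept)


print(concat_values(["value2", "value3", "", "value4", "value5"]))
# value2|value3|value4|value5
```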
That works perfectly!
I also got it to work with the following. I'm not sure whether it's considered best practice, but the result was the same:
for l1, _resourceLogs in resourceLogs map {
  for l2, _scopeRecords in _resourceLogs.scopeLogs map {
    for l3, _logRecords in _scopeRecords.logRecords map {
      # Only map records that actually carry a body string
      if [_logRecords][body][stringValue] != "" {
        # Put this record's body into a temporary label...
        mutate {
          replace => {
            "_label.value" => "%{_logRecords.body.stringValue}"
          }
        }
        # ...append it to the repeated labels field...
        mutate {
          merge => {
            "event.idm.read_only_udm.principal.resource.attribute.labels" => "_label"
          }
        }
        # ...and clear it so the next iteration starts fresh
        mutate {
          remove_field => ["_label"]
        }
      }
    }
  }
}
That got me the following output:
metadata.event_timestamp "2025-09-24T19:16:41Z"
metadata.event_type "GENERIC_EVENT"
metadata.log_type "ADOBE_EXPERIENCE_MANAGER"
additional.fields["com.splunk.sourcetype"] "cq-access"
principal.hostname "girlscouts-prod65-1-dispatcher1useast1-28596417"
principal.platform "LINUX"
principal.resource.attribute.labels[0].value " [removed by moderator] "www.gskentucky.org" - [22/Sep/2025:09:16:38 -0400] "GET /etc.clientlibs/clientlibs/granite/jquery/granite.min.js HTTP/1.1" 200 1639 "https://www.gskentucky.org/en/sf-events-repository/2025/troop-activity-camp----camp-judy-layne.html?fbclid=IwZXh0bgNhZW0CMTEAAR6OqbIhdAepmlZSEVsu6pgL8YSF0vG33jhiNdVy_sYi8dZ_rrk46HJPePUiTw_aem_CsbkI68aIQZhm1Xskw3W1A" "Mozilla/5.0 (iPhone; CPU iPhone OS 18_6_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/22G100 [FBAN/FBIOS;FBAV/530.0.0.59.75;FBBV/790686474;FBDV/iPhone17,1;FBMD/iPhone;FBSN/iOS;FBSV/18.6.2;FBSS/3;FBID/phone;FBLC/en_US;FBOP/5;FBRV/795242686;IABMV/1]""
principal.resource.attribute.labels[1].value " [removed by moderator] "www.gskentucky.org" - [22/Sep/2025:09:16:38 -0400] "GET /etc.clientlibs/clientlibs/granite/jquery.min.js HTTP/1.1" 200 36216 "https://www.gskentucky.org/en/sf-events-repository/2025/troop-activity-camp----camp-judy-layne.html?fbclid=IwZXh0bgNhZW0CMTEAAR6OqbIhdAepmlZSEVsu6pgL8YSF0vG33jhiNdVy_sYi8dZ_rrk46HJPePUiTw_aem_CsbkI68aIQZhm1Xskw3W1A" "Mozilla/5.0 (iPhone; CPU iPhone OS 18_6_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/22G100 [FBAN/FBIOS;FBAV/530.0.0.59.75;FBBV/790686474;FBDV/iPhone17,1;FBMD/iPhone;FBSN/iOS;FBSV/18.6.2;FBSS/3;FBID/phone;FBLC/en_US;FBOP/5;FBRV/795242686;IABMV/1]""
principal.resource.attribute.labels[2].value " [removed by moderator] "www.girlscouts-swtx.org" - [22/Sep/2025:09:16:38 -0400] "GET /libs/granite/csrf/token.json HTTP/1.1" 200 2 "https://www.girlscouts-swtx.org/en/members/for-girl-scouts-and-families/fall-product-program.html" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/140.0.0.0 Safari/537.36""
principal.resource.attribute.labels[3].value " [removed by moderator] "www.gskentucky.org" - [22/Sep/2025:09:16:38 -0400] "GET /etc.clientlibs/gsusafoundation/clientlibs/clientlib-gsusa-site.min.js HTTP/1.1" 200 308613 "https://www.gskentucky.org/en/sf-events-repository/2025/troop-activity-camp----camp-judy-layne.html?fbclid=IwZXh0bgNhZW0CMTEAAR6OqbIhdAepmlZSEVsu6pgL8YSF0vG33jhiNdVy_sYi8dZ_rrk46HJPePUiTw_aem_CsbkI68aIQZhm1Xskw3W1A" "Mozilla/5.0 (iPhone; CPU iPhone OS 18_6_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/22G100 [FBAN/FBIOS;FBAV/530.0.0.59.75;FBBV/790686474;FBDV/iPhone17,1;FBMD/iPhone;FBSN/iOS;FBSV/18.6.2;FBSS/3;FBID/phone;FBLC/en_US;FBOP/5;FBRV/795242686;IABMV/1]""
principal.resource.attribute.labels[4].value " [removed by moderator] "www.girlscouts.org" - [22/Sep/2025:09:16:38 -0400] "GET /libs/granite/csrf/token.json HTTP/1.1" 200 2 "-" "Amazon CloudFront""
principal.asset.hostname "girlscouts-prod65-1-dispatcher1useast1-28596417"
Thank you so much for your help [removed by moderator]; you’ve made it way easier to understand how to handle nested arrays in these logs. If I could send you some drinks or food, I 100% would!
@GSCoNist You could use dynamic labeling like I did, use another repeated field where repeating the same key won't be a problem (security_result.ruleLabels key/value), or concatenate the values into a single value (like value1|value2|value3|value4|...), after testing and verifying that the resulting long value won't cause any overflow or truncation in the UDM field.
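The length check mentioned above can be sketched as a simple fallback rule: concatenate when the joined string fits, otherwise keep separate labels. The limit below is a placeholder, since no actual UDM field limit is stated in this thread; replace it with the documented or tested value.

```python
from typing import Union

# PLACEHOLDER: the real maximum length of the target UDM field is not given
# in this thread. Replace this number with the documented/tested limit.
MAX_FIELD_BYTES = 1024


def choose_representation(
    values: list[str], sep: str = "|", limit: int = MAX_FIELD_BYTES
) -> tuple[str, Union[str, list[dict[str, str]]]]:
    """Concatenate the values only if the result fits under `limit` bytes;
    otherwise keep them as separate repeated labels so nothing is truncated."""
    kept = [v for v in values if v]
    joined = sep.join(kept)
    # Measure UTF-8 bytes rather than characters: the safer bound if the
    # limit turns out to be byte-based.
    if len(joined.encode("utf-8")) <= limit:
        return "concatenated", joined
    return "labels", [{"value": v} for v in kept]


print(choose_representation(["value1", "value2", "value3"]))
# ('concatenated', 'value1|value2|value3')
```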
@AbdElHafez I’ll check that out for sure. I’m just happy to have multiple different avenues - my brain was melting a little bit :)
Again, thanks for all your help; it's finally starting to make a little more sense.
I am just glad I was able to help😁 your kind words are more than enough and really made my day.
I would say you'd be better off using GROK patterns to split labels.value into separate fields (request method, URL, return code, user agent, etc.). You could reuse the patterns defined in the Apache web server parser, since your values look like the standard Apache log format. If you like, I could try to modify your custom parser to do the same, but maybe tomorrow or by Friday :)
I would suggest looking at the adoption guides here. For parsers, I covered repeated fields in detail, and additional fields will be covered in Part 3, to be released this month.
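GROK patterns are, in essence, named regular expressions. As a rough Python illustration (not the Apache parser's actual patterns), the sketch below pulls the vhost, timestamp, method, path, status, bytes, referer and user agent out of one of the label values shown earlier. The layout is inferred from those samples, and 203.0.113.10 is a documentation-range placeholder standing in for the leading token the moderator removed, which is assumed to be a single client IP.

```python
import re
from typing import Optional

# Assumed layout, inferred from the label values above:
# client "vhost" user [timestamp] "METHOD path protocol" status bytes "referer" "user-agent"
ACCESS_RE = re.compile(
    r'^(?P<client_ip>\S+) '
    r'"(?P<vhost>[^"]*)" '
    r'(?P<user>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) '
    r'(?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" '
    r'"(?P<user_agent>[^"]*)"$'
)


def parse_access_line(line: str) -> Optional[dict[str, str]]:
    """Return the named captures, or None if the line has another layout."""
    m = ACCESS_RE.match(line.strip())
    return m.groupdict() if m else None


line = ('203.0.113.10 "www.girlscouts.org" - [22/Sep/2025:09:16:38 -0400] '
        '"GET /libs/granite/csrf/token.json HTTP/1.1" 200 2 "-" "Amazon CloudFront"')
print(parse_access_line(line))
```

Each capture could then be mapped to its own UDM field instead of one long label value.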
@AbdElHafez I'll look into that for sure. I ran into an interesting issue when I went to validate the extension: it seems to fail randomly on large logs that I am getting from AEM. I got the following error:
generic::unknown: pipeline.ParseLogEntry failed: LOG_PARSING_CBN_ERROR: "generic::invalid_argument: failed to convert raw output to events: failed to convert raw message 0: field \"idm\": index 0: recursive rawDataToProto failed: field \"read_only_udm\": index 0: recursive rawDataToProto failed: field \"additional\": index 0: recursive rawDataToProto failed: field \"fields\": index 4: recursive rawDataToProto failed: field \"value\": index 0: recursive rawDataToProto failed: field \"string_value\": containing one-of \"kind\" already set by field \"string_value\""
What is even odder: when I use the failed log and click Preview, it fails once, and if I click it again it works with no issues. I was wondering whether I am maxing out a buffer, so it initially fails but reinitializes when I try again. One of my logs has roughly 500+ entries in that body section:
principal.resource.attribute.labels[593].value"24.09.2025 16:13:53.945 *DEBUG* [10.43.0.62 [ [removed by moderator] ] POST /content/girlscouts-vtk/service/react/action/change-meeting-positions.html HTTP/1.1] org.girlscouts.vtk.osgi.service.impl.GirlScoutsOCMRepositoryImpl Reading node at: /content/girlscouts-vtk/meetings/library/daisy/D18B10 from cache."
Is there a way to skip over logs that contain certain phrases, or is this just a backend size limit that I can't exceed?
@GSCoNist I can think of two workarounds:
1. Generating multiple events from the same log. If that is feasible, please send me a sample showing how you want to formulate/segregate the different events from a single log.
2. Using GROK patterns and regex to capture only the required information, or to split it across suitable sub-fields instead of a single field. I can also give an example of this if you send a sample log and the fields you need. I noticed most of your internal nested data looks like proxy logs, so there is an existing list of GROK patterns for them.
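If "skip" means leaving certain records out of the mapped labels (rather than rejecting the whole log), the filtering logic is simple. The Python sketch below drops any record containing one of a list of phrases, for example the *DEBUG* marker in the entry above, which would also keep the resulting event smaller. In the parser this would become an extra condition inside the loop; the exact condition syntax should be checked against the parser documentation. The sample strings are shortened, hypothetical versions of lines from this thread.

```python
from typing import Iterable


def drop_matching(values: Iterable[str], phrases: Iterable[str]) -> list[str]:
    """Return only the values that contain none of the given phrases.
    Plain substring matching; swap in a regex if the phrases vary."""
    phrase_list = list(phrases)
    return [v for v in values if not any(p in v for p in phrase_list)]


# Shortened, hypothetical records modeled on lines from this thread.
records = [
    "24.09.2025 16:13:53.945 *DEBUG* ... Reading node at: ... from cache.",
    '"www.girlscouts.org" - "GET /libs/granite/csrf/token.json HTTP/1.1" 200 2',
]
# Example policy (hypothetical): skip verbose *DEBUG* lines before mapping.
print(drop_matching(records, ["*DEBUG*"]))
```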
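The segregation logic for this first workaround might look like the Python sketch below: one event per non-empty logRecord, each carrying the shared host attributes such as host.name and os.type. The OTLP-style `resource.attributes` layout, the `message` output key, and the sample values are assumptions, and how SecOps actually emits several events from one raw log is not shown here.

```python
from typing import Any


def resource_attrs(resource_log: dict[str, Any]) -> dict[str, str]:
    """Flatten assumed OTLP-style attributes:
    [{"key": ..., "value": {"stringValue": ...}}] -> {key: value}."""
    attrs: dict[str, str] = {}
    for attr in resource_log.get("resource", {}).get("attributes", []):
        value = attr.get("value", {}).get("stringValue")
        if value is not None:
            attrs[attr["key"]] = value
    return attrs


def split_into_events(log: dict[str, Any]) -> list[dict[str, Any]]:
    """Emit one small event per non-empty logRecord instead of one event
    holding hundreds of labels. Each event repeats the shared resource
    attributes so it can be mapped on its own."""
    events = []
    for resource_log in log.get("resourceLogs", []):
        shared = resource_attrs(resource_log)
        for scope_log in resource_log.get("scopeLogs", []):
            for record in scope_log.get("logRecords", []):
                body = record.get("body", {}).get("stringValue", "")
                if body:
                    events.append({**shared, "message": body})
    return events


# Hypothetical sample in the assumed shape.
log = {
    "resourceLogs": [{
        "resource": {"attributes": [
            {"key": "host.name", "value": {"stringValue": "host-a"}},
            {"key": "os.type", "value": {"stringValue": "linux"}},
        ]},
        "scopeLogs": [{"logRecords": [
            {"body": {"stringValue": "line one"}},
            {"body": {"stringValue": "line two"}},
        ]}],
    }]
}
events = split_into_events(log)
print(len(events))  # 2
```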
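For this second workaround, the internal AEM line shown earlier (labels[593]) has a different layout from the access-log lines. A Python sketch of capturing only a few fields from it (timestamp, level, logger, message, and the request method and path embedded in the thread name) is below. The pattern is inferred from that single sample, so other AEM lines may need adjustments.

```python
import re
from typing import Optional

# Layout inferred from the single *DEBUG* sample in this thread:
# timestamp *LEVEL* [thread] logger message
AEM_RE = re.compile(
    r'^(?P<timestamp>\d{2}\.\d{2}\.\d{4} \d{2}:\d{2}:\d{2}\.\d{3}) '
    r'\*(?P<level>[A-Z]+)\* '
    r'\[(?P<thread>.*)\] '   # greedy: the thread name itself contains brackets
    r'(?P<logger>\S+) '
    r'(?P<message>.*)$'
)
# The thread name embeds the request line, e.g. "POST /content/... HTTP/1.1".
REQUEST_RE = re.compile(r'(?P<method>[A-Z]+) (?P<path>/\S*) HTTP/\d\.\d')


def parse_aem_line(line: str) -> Optional[dict[str, str]]:
    m = AEM_RE.match(line)
    if not m:
        return None
    fields = m.groupdict()
    req = REQUEST_RE.search(fields["thread"])
    if req:
        fields.update(req.groupdict())
    return fields


line = ("24.09.2025 16:13:53.945 *DEBUG* [10.43.0.62 [ [removed by moderator] ] "
        "POST /content/girlscouts-vtk/service/react/action/change-meeting-positions.html HTTP/1.1] "
        "org.girlscouts.vtk.osgi.service.impl.GirlScoutsOCMRepositoryImpl "
        "Reading node at: /content/girlscouts-vtk/meetings/library/daisy/D18B10 from cache.")
print(parse_aem_line(line))
```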
I could try to ask internally about the buffer space issue, but I do not think it can or should be exceeded.