Question

Log parsing extension issue on VMware

  • October 22, 2025
  • 8 replies
  • 91 views

Log Parsing Issue in Restricted Pipeline (Chronicle/CBN/Logstash)

 

We are encountering persistent syntax and parsing errors in a highly restricted log processing pipeline (likely based on Logstash/Grok, possibly within Google Chronicle or CBN). We need help structuring the code to handle the strict syntax and mixed log types without using advanced conditional logic.

 

1. Environment and Constraints

 

  • Environment: Highly restricted log parsing pipeline (cannot use standard Logstash features).

  • Confirmed Constraints (Key Failures):

    • NO if/else conditionals (if [field] fails, if "tag" in [tags] fails).

    • NO semicolons (;) allowed anywhere in the config.

    • Field names must use snake_case (e.g., host_name).

 

2. The Core Problem: Mixed Log Types

 

Our pipeline receives two distinct log formats that crash the system when we try to parse the second one, or when we encounter simple syntax errors.

Sample logs (raw):

Envoy Access Log (Primary Goal):
<166>2025-10-22T14:00:00.095Z S01PSSESXUCS055.mgmt.ad.usfs.llc envoy-access[2099405]: POST /hgw/host-14948/vpxa 200 via_upstream - 15887 243 gzip 8 8 0 10.9.10.94:49874 HTTP/1.1 TLSv1.2 10.9.36.55:443 127.0.0.1:40729 HTTP/1.1 - 127.0.0.1:8089 "3addb69f" "QueryBatchPerformanceStatisticsVpxa"

VMware ESX Log (Crashing Log):
<166>2025-10-21T22:25:49.901Z S01PDSESXUCS054. healthd[2100030]: [Originator@6876 sub=PluginLauncher] Launching binary: /usr/lib/vmware/healthd/plugins/bin/ssd_storage ++group=healthd-plugins,mem=40 -u http://!vmwLocalSocketHealthd
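
For reference, here is our positional reading of the Envoy line (our own interpretation, inferred from the samples; the field names are the ones our Grok pattern tries to capture):

# <166>                                  syslog priority
# 2025-10-22T14:00:00.095Z               timestamp (ISO8601)
# S01PSSESXUCS055.mgmt.ad.usfs.llc       host_name
# envoy-access[2099405]:                 process_detail
# POST /hgw/host-14948/vpxa              http_method, http_path
# 200 via_upstream -                     http_status_code, route_status, flags
# 15887 243 gzip                         http_response_bytes, http_request_bytes, compression
# 8 8 0                                  duration_total, duration_upstream, duration_request
# 10.9.10.94:49874                       source_address (client IP:port)
# HTTP/1.1 TLSv1.2                       client_protocol, TLS version
# 10.9.36.55:443                         destination_address
# 127.0.0.1:40729 HTTP/1.1               upstream_address, upstream_protocol
# - 127.0.0.1:8089                       (flag), internal_address
# "3addb69f"                             request_id
# "QueryBatchPerformanceStatisticsVpxa"  custom_operation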

 

3. The Final Attempted Code (for Envoy)

 

We are currently trying to deploy this final, single-Grok-based solution for the primary Envoy logs, which keeps failing due to hidden syntax issues:


 

filter {

    # 1. GROK: Parse the entire log line in one resilient step.
    grok {
        tag_on_failure => ["_grok_failure_envoy"]
        match => {
            "message" => "<%{POSINT}>%{SPACE}?%{TIMESTAMP_ISO8601:timestamp} %{DATA:host_name} %{DATA:process_detail}: %{WORD:http_method} %{URIPATH:http_path} %{NUMBER:http_status_code} %{DATA:route_status} - %{NUMBER:http_response_bytes} %{NUMBER:http_request_bytes} %{DATA:compression} %{NUMBER:duration_total} %{NUMBER:duration_upstream} %{NUMBER:duration_request} %{DATA:source_address} %{DATA:client_protocol} %{DATA:destination_address} %{DATA:upstream_address} %{DATA:upstream_protocol} - %{DATA:internal_address} %{DATA:request_id} %{GREEDYDATA:custom_operation}"
        }
    }

    # 2. SEGMENTED GROK: Split IP:Port fields.
    grok { source => "source_address"; match => { "source_address" => "%{IP:source_ip}:%{NUMBER:source_port}" } }
    grok { source => "destination_address"; match => { "destination_address" => "%{IP:destination_ip}:%{NUMBER:destination_port}" } }
    grok { source => "upstream_address"; match => { "upstream_address" => "%{IP:upstream_ip}:%{NUMBER:upstream_port}" } }
    grok { source => "internal_address"; match => { "internal_address" => "%{IP:internal_ip}:%{NUMBER:internal_port}" } }

    # 3. FINAL CLEANUP AND TYPE CONVERSION
    mutate {
        gsub => [
            "custom_operation", "\"", ""
        ]
        convert => {
            "http_status_code" => "integer"
            "http_response_bytes" => "integer"
            "http_request_bytes" => "integer"
            "duration_total" => "integer"
            "duration_upstream" => "integer"
            "duration_request" => "integer"
            "source_port" => "integer"
            "destination_port" => "integer"
            "upstream_port" => "integer"
            "internal_port" => "integer"
        }
        remove_field => [
            "message", "POSINT", "source_address", "destination_address", "upstream_address", "internal_address",
            "process_detail", "route_status", "compression", "client_protocol", "upstream_protocol", "request_id"
        ]
    }
}

 

4. The Exact Error

 

The deployment is failing with a persistent syntax error, despite rigorous manual checks:

generic::invalid_argument: failed to create augmentor pipeline: failed to parse config: Parse error line 13, column 38 : illegal token ';'

 

5. Request for Help

 

  1. Semicolon Hunter: Can anyone spot a hidden semicolon or an encoding issue in the long Grok pattern string that would cause the parser to see it at line 13, column 38?

  2. Best Practice for No-Conditional Parsing: Given the constraint of NO conditionals, what is the best practice for deploying multiple Grok patterns to handle the two drastically different log types (Envoy vs. ESX) without the pipeline crashing on the first failure? Should we rely solely on tag_on_failure and tag_on_success?
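
To make question 2 concrete, the layout we have in mind is two independent Grok filters, each with its own failure tag, so a non-matching format tags the event instead of aborting it. This is a sketch only (pattern bodies elided); the open question is how to gate the per-type processing afterwards without conditionals:

filter {
    # Attempt 1: Envoy access log. A miss should only add a tag, not abort.
    grok {
        match => { "message" => ["<envoy pattern here>"] }
        tag_on_failure => ["_not_envoy"]
    }
    # Attempt 2: ESX healthd log. Likewise non-fatal on a miss.
    grok {
        match => { "message" => ["<esx pattern here>"] }
        tag_on_failure => ["_not_esx"]
    }
    # Open question: without if/else, how do we run the Envoy-only
    # mutate blocks only when the first grok matched?
}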

8 replies

James_E
Staff
  • Staff
  • October 24, 2025

The semicolon is on line 13. 

 


  • Author
  • October 27, 2025

Looks like I’m facing a different error now.

 

filter {
    # 1. GROK FILTER: Parse the entire log line and extract all fields
    grok {
        match => {
            "message" => [
                # Pattern for the Syslog header wrapped Envoy access log
                "<%{POSINT}>%{TIMESTAMP_ISO8601:syslog_timestamp} %{DATA:syslog_hostname} %{DATA:syslog_process}: %{WORD:http_method} %{URIPATH:http_path} %{NUMBER:http_status_code:int} %{DATA:route_status} - %{NUMBER:http_response_bytes:int} %{NUMBER:http_request_bytes:int} %{DATA:compression} %{NUMBER:duration_total:int} %{NUMBER:duration_upstream:int} %{NUMBER:duration_request:int} %{DATA:source_address} %{DATA:client_protocol} %{DATA:destination_address} %{DATA:upstream_address} %{DATA:upstream_protocol} - %{DATA:internal_address} %{DATA:request_id} %{GREEDYDATA:custom_operation}"
            ]
        }
        # If the grok pattern fails, tag it for review/dropping
        tag_on_failure => ["_grokparsefailure_envoy"]
    }

    # Conditional processing only if the Grok was successful
   ## if ![ "_grokparsefailure_envoy" ]
    {
        # 2. MUTATE: Split combined IP:Port fields and rename/convert fields

        # Split source_address (e.g., 10.9.10.94:49724) into IP and Port
        # Note: We use a temporary field _source_ip_port for the split operation
        mutate {
            split => { "source_address" => ":" }
            add_field => {
                "source_ip" => "%{[source_address][0]}"
                "source_port" => "%{[source_address][1]}"
            }
        }
        # Split destination_address (e.g., 10.9.36.43:443) into IP and Port
        mutate {
            split => { "destination_address" => ":" }
            add_field => {
                "destination_ip" => "%{[destination_address][0]}"
                "destination_port" => "%{[destination_address][1]}"
            }
        }

        # Convert split ports from strings to integers
        mutate {
            convert => {
                "source_port" => "integer"
                "destination_port" => "integer"
            }
        }

        # 3. UDM MAPPING: Map cleaned fields to the final UDM destination (required for Chronicle)
        mutate {
            replace => {
                # Time
                "metadata.event_timestamp" => "%{timestamp}"

                # Principal (Source)
                "principal.ip" => "%{source_ip}"
                "principal.port" => "%{source_port}"
                "principal.hostname" => "%{syslog_hostname}"

                # Target (Destination/Server)
                "target.ip" => "%{destination_ip}"
                "target.port" => "%{destination_port}"

                # Network/HTTP Details
                "network.http.method" => "%{http_method}"
                "network.http.response_code" => "%{http_status_code}"
                "network.protocol" => "HTTP" # Assuming HTTP traffic
                "network.ip_protocol" => "TCP"
                "network.sent_bytes" => "%{http_request_bytes}"
                "network.received_bytes" => "%{http_response_bytes}"

                # Event Metadata
                "metadata.event_type" => "NETWORK_HTTP_CONNECTION"
                "metadata.product_event_type" => "envoy_access_log"
                "about.url" => "%{http_path}"
            }
        }

        # 4. CLEANUP: Remove all temporary fields and the original message
        mutate {
            remove_field => [
                "message",
                "timestamp",
                "syslog_timestamp",
                "syslog_hostname",
                "syslog_process",
                "source_address",
                "destination_address",
                # Remove all intermediary fields
                "http_method", "http_path", "http_status_code", "route_status",
                "http_response_bytes", "http_request_bytes", "compression",
                "duration_total", "duration_upstream", "duration_request",
                "client_protocol", "upstream_address", "upstream_protocol",
                "internal_address", "request_id", "custom_operation",
                "source_ip", "source_port", "destination_ip", "destination_port"
            ]
        }
    }
}




Error: generic::invalid_argument: failed to create augmentor pipeline: failed to parse config: Parse error line 16, column 5 : unexpected token '{', expected one of 'comment|string|}|if|else|else if|for' 

 


  • Author
  • October 27, 2025

After polishing my code, here is the final version. Now I’m looking to fix something near the beginning, around the field manipulation; I’m not sure what I’m missing.
Looking for help!! @James_E @nickharbour

filter {
    # 0. TAGS INITIALIZATION: This is the critical fix. It guarantees the [tags] field exists
    # for all events, preventing the "tags not found in state data" error during conditional checks.
    mutate {
        # Add a tag that will be removed later, ensuring the [tags] array is initialized.
        add_tag => ["_tag_initializer_grok_check"]
    }

    # 1. GROK FILTER: Parse the log line (Your exact original pattern)
    grok {
        match => {
            "message" => [
                "<%{POSINT}>%{TIMESTAMP_ISO8601:timestamp} %{DATA:syslog_hostname} %{DATA:syslog_process}: %{WORD:http_method} %{URIPATH:http_path} %{NUMBER:http_status_code:int} %{DATA:route_status} - %{NUMBER:http_response_bytes:int} %{NUMBER:http_request_bytes:int} %{DATA:compression} %{NUMBER:duration_total:int} %{NUMBER:duration_upstream:int} %{NUMBER:duration_request:int} %{DATA:source_ip}:%{NUMBER:source_port} %{DATA:client_protocol} %{DATA} %{DATA:destination_ip}:%{NUMBER:PORTs} %{DATA:upstream_address}:%{INT:port_3} %{GREEDYDATA:upstream_protocol} %{IPV4:internal_address}:%{NUMBER:PORT_2}"
            ]
        }
        tag_on_failure => ["_grokparsefailure_envoy"]
    }

    # Conditional processing only if the Grok was successful. This check is now safe.
    if "_grokparsefailure_envoy" not in [tags] {

        # 2. RENAME: Rename the Grok-captured fields to the target names
        # expected by your original 'convert' and 'cleanup' blocks.
        mutate {
            rename => {
                "PORTs" => "destination_port"
                "port_3" => "upstream_port"
                "PORT_2" => "internal_port"
            }
        }

        # 3. MUTATE: Convert string fields to integers
        mutate {
            convert => {
                "http_status_code" => "integer"
                "http_response_bytes" => "integer"
                "http_request_bytes" => "integer"
                "source_port" => "integer"
                "destination_port" => "integer"
                "upstream_port" => "integer"
                "internal_port" => "integer"
                "duration_total" => "integer"
                "duration_upstream" => "integer"
                "duration_request" => "integer"
            }
        }

        # 4. UDM MAPPING: Map cleaned fields
        mutate {
            replace => {
                "metadata.event_timestamp" => "%{timestamp}"
                "principal.ip" => "%{source_ip}"
                "principal.port" => "%{source_port}"
                "principal.hostname" => "%{syslog_hostname}"
                "target.ip" => "%{destination_ip}"
                "target.port" => "%{destination_port}"
                "network.http.method" => "%{http_method}"
                "network.protocol" => "HTTP"
                "network.ip_protocol" => "TCP"
                "network.sent_bytes" => "%{http_request_bytes}"
                "network.received_bytes" => "%{http_response_bytes}"
                "metadata.event_type" => "NETWORK_HTTP_CONNECTION"
                "metadata.product_event_type" => "envoy_access_log"
                "about.url" => "%{http_path}"
                "about.metadata.operation" => "%{custom_operation}"
            }
        }

        # 5. CLEANUP: Remove all temporary fields, including the tag initializer
        mutate {
            remove_field => [
                # The initializer tag must be removed
                "_tag_initializer_grok_check",
                "message",
                "timestamp", "syslog_hostname", "syslog_process",
                # Source/Destination fields
                "source_ip", "source_port", "destination_ip", "destination_port",
                # HTTP fields
                "http_method", "http_path", "http_status_code", "route_status",
                "http_response_bytes", "http_request_bytes", "compression",
                # Duration fields
                "duration_total", "duration_upstream", "duration_request",
                # Connection/Upstream fields
                "client_protocol",
                "upstream_address", "upstream_port", "upstream_protocol",
                "internal_address", "internal_port",
                # Grok captured port names
                "PORTs", "port_3", "PORT_2",
                # Other temporary fields
                "trace_id", "custom_operation"
            ]
        }
    }
}

Here is the Error: generic::unknown: pipeline.ParseLogEntry failed: LOG_PARSING_CBN_ERROR: "generic::invalid_argument: pipeline failed: filter conditional (2) failed: failed to evaluate expression: generic::invalid_argument: \"tags\" not found in state data"
 


  • Author
  • November 5, 2025

Any suggestions would be great!


James_E
Staff
  • Staff
  • November 5, 2025

where do you define “tags” before you use it on this line:

    if "_grokparsefailure_envoy" not in [tags] {


  • Author
  • November 6, 2025

@James_E I’m wondering how to define the tags as well. This is my first time trying to write a parser extension.


James_E
Staff
  • Staff
  • November 6, 2025

I’m gonna be honest, there’s a lot missing in your parser. I highly recommend looking at the docs below and at the pre-built parsers already in the platform. Posting a redacted copy of an example log may help here as well. You mentioned possibly having two events under the same log type, so an example of each would also be helpful.

https://docs.cloud.google.com/chronicle/docs/reference/parser-syntax

https://docs.cloud.google.com/chronicle/docs/event-processing/parsing-overview
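
At a high level, the skeleton the docs describe looks something like this (simplified; source_ip is a placeholder for whatever your grok extracts):

filter {
    grok {
        match => { "message" => ["<your pattern>"] }
        on_error => "grok_failed"
    }
    # Map into the UDM event object, not bare names like "principal.ip"
    mutate {
        replace => {
            "event.idm.read_only_udm.metadata.event_type" => "NETWORK_HTTP"
        }
    }
    # principal.ip is a repeated UDM field, so merge a parsed value into it
    mutate {
        merge => { "event.idm.read_only_udm.principal.ip" => "source_ip" }
    }
    # Emit the assembled UDM event to the output
    mutate {
        merge => { "@output" => "event" }
    }
}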


  • Author
  • December 11, 2025

Raw Log: 
<166>2025-10-21T19:00:05.002Z S0PSSESXUCS071.mgmt.ad.usfs.llc envoy-access[2098923]: POST /hgw/host-15062/vpxa 200 via_upstream - 748 428 gzip 13 12 0 10.9.10.94:51422 HTTP/1.1 TLSv1.2 10.9.36.71:443 127.0.0.1:34032 HTTP/1.1 - 127.0.0.1:8089 - "QuerySummaryStatisticsVpxa"


filter {
    mutate {
        replace => {
            "event.idm.read_only_udm.metadata.product_name" => "Envoy"
            "event.idm.read_only_udm.metadata.vendor_name" => "VMware ESXi"
            "event.idm.read_only_udm.metadata.event_type" => "NETWORK_HTTP"
        }
    }

    # ############################################################
    # 1) GROK — parse broadly
    # ############################################################
    grok {
        match => {
            "message" => [
                "<%{POSINT:syslog_pri}>%{TIMESTAMP_ISO8601:timestamp} %{DATA:syslog_hostname} %{DATA:syslog_process}: %{WORD:http_method} %{URIPATH:http_path} %{NUMBER:http_status_code:int} %{DATA:route_status} - %{NUMBER:http_response_bytes:int} %{NUMBER:http_request_bytes:int} %{DATA:compression} %{NUMBER:duration_total:int} %{NUMBER:duration_upstream:int} %{NUMBER:duration_request:int} %{DATA:source_ip}:%{NUMBER:source_port} %{DATA:client_protocol} %{DATA:tls_version} %{DATA:destination_ip}:%{NUMBER:PORTs} %{DATA:upstream_address}:%{INT:port_3} %{GREEDYDATA:upstream_protocol} %{IPV4:internal_address}:%{NUMBER:PORT_2}"
            ]
        }
        overwrite => [
            "timestamp","syslog_hostname","syslog_process","http_method","http_path",
            "http_status_code","route_status","http_response_bytes","http_request_bytes",
            "compression","duration_total","duration_upstream","duration_request",
            "source_ip","source_port","client_protocol","tls_version",
            "destination_ip","PORTs","upstream_address","port_3",
            "upstream_protocol","internal_address","PORT_2"
        ]
        on_error => "grok_failure"
    }

    # ############################################################
    # 2) Timestamp — minimal
    # ############################################################
    if [timestamp] != "" {
        date { match => ["timestamp", "ISO8601"] }
    }

    # ############################################################
    # 3) Type conversions on raw fields (Numeric)
    # ############################################################
    mutate {
        convert => {
            "http_status_code"    => "integer"
            "source_port"         => "integer"
            "PORTs"               => "integer"
            "http_request_bytes"  => "integer"
            "http_response_bytes" => "integer"
        }
    }

    # ############################################################
    # 3.5) String Conversions (Fix for UDM Mapping Error)
    # Convert integers back to strings for safe use with 'replace' in UDM mapping.
    # ############################################################
    mutate {
        convert => {
            "source_port"         => "string"
            "PORTs"               => "string"
            "http_status_code"    => "string"
            "http_request_bytes"  => "string"
            "http_response_bytes" => "string"
        }
    }

    # ############################################################
    # 4) STANDARD UDM FIELDS ONLY (MAPPING)
    # ############################################################

    # Observer (context)
    mutate {
        replace => {
            "observer.hostname"     => "%{syslog_hostname}"
            "observer.resource.uid" => "%{syslog_process}"
        }
    }

    # Principal (client)
    if [source_ip] != "" {
        # These fields now receive string values, preventing the error.
        mutate { replace  => { "principal.ip_address"   => "%{source_ip}"   } }
        mutate { replace  => { "principal.port"         => "%{source_port}" } }
    }

    # Target (server)
    if [destination_ip] != "" {
        mutate { replace  => { "target.ip_address"   => "%{destination_ip}" } }
        mutate { replace  => { "target.port"         => "%{PORTs}"          } }
    }

    # HTTP essentials (standard)
    mutate {
        replace => {
            "network.http.request.method" => "%{http_method}"
            "network.http.protocol"       => "%{client_protocol}"
            "network.http.tls_version"    => "%{tls_version}"
            "network.http.request.url"    => "%{http_path}"
            "network.http.response.code"  => "%{http_status_code}"
        }
    }

    # Bytes
    mutate { replace => { "network.sent_bytes"     => "%{http_response_bytes}" } }
    mutate { replace => { "network.received_bytes" => "%{http_request_bytes}"  } }
}




This seems to be working, but the parsed values come through as a description rather than as the UDM fields, which is not what I expected.


Can someone suggest or give me the correct Grok pattern, please?
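
For reference, the closest I have gotten is the single pattern below. It is meant to also cover the trailing request id and quoted operation that my deployed pattern drops (the last two field names are my own guess at those columns), but I have not verified it in the pipeline:

"<%{POSINT:syslog_pri}>%{TIMESTAMP_ISO8601:timestamp} %{DATA:syslog_hostname} %{DATA:syslog_process}: %{WORD:http_method} %{URIPATH:http_path} %{NUMBER:http_status_code} %{DATA:route_status} - %{NUMBER:http_response_bytes} %{NUMBER:http_request_bytes} %{DATA:compression} %{NUMBER:duration_total} %{NUMBER:duration_upstream} %{NUMBER:duration_request} %{IP:source_ip}:%{NUMBER:source_port} %{DATA:client_protocol} %{DATA:tls_version} %{IP:destination_ip}:%{NUMBER:destination_port} %{IP:upstream_ip}:%{NUMBER:upstream_port} %{DATA:upstream_protocol} - %{IP:internal_ip}:%{NUMBER:internal_port} %{DATA:request_id} \"%{DATA:custom_operation}\""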