Question

Log parsing extension issue on VMware

  • October 22, 2025
  • 8 replies
  • 91 views

Log Parsing Issue in Restricted Pipeline (Chronicle/CBN/Logstash)

 

We are encountering persistent syntax and parsing errors in a highly restricted log processing pipeline (likely based on Logstash/Grok, possibly within Google Chronicle or CBN). We need help structuring the code to handle the strict syntax and mixed log types without using advanced conditional logic.

 

1. Environment and Constraints

 

  • Environment: Highly restricted log parsing pipeline (cannot use standard Logstash features).

  • Confirmed Constraints (Key Failures):

    • NO if/else conditionals (if [field] fails, if "tag" in [tags] fails).

    • NO semicolons (;) allowed anywhere in the config.

    • Field names must use snake_case (e.g., host_name).

 

2. The Core Problem: Mixed Log Types

 

Our pipeline receives two distinct log formats that crash the system when we try to parse the second one, or when we encounter simple syntax errors.

Sample logs (raw):

Envoy Access Log (Primary Goal):
<166>2025-10-22T14:00:00.095Z S01PSSESXUCS055.mgmt.ad.usfs.llc envoy-access[2099405]: POST /hgw/host-14948/vpxa 200 via_upstream - 15887 243 gzip 8 8 0 10.9.10.94:49874 HTTP/1.1 TLSv1.2 10.9.36.55:443 127.0.0.1:40729 HTTP/1.1 - 127.0.0.1:8089 "3addb69f" "QueryBatchPerformanceStatisticsVpxa"

VMware ESX Log (Crashing Log):
<166>2025-10-21T22:25:49.901Z S01PDSESXUCS054. healthd[2100030]: [Originator@6876 sub=PluginLauncher] Launching binary: /usr/lib/vmware/healthd/plugins/bin/ssd_storage ++group=healthd-plugins,mem=40 -u http://!vmwLocalSocketHealthd
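
For reference, here is our positional reading of the Envoy line (our own interpretation, inferred from the samples; the field names are the ones our Grok pattern tries to capture):

# <166>                                  syslog priority
# 2025-10-22T14:00:00.095Z               timestamp (ISO8601)
# S01PSSESXUCS055.mgmt.ad.usfs.llc       host_name
# envoy-access[2099405]:                 process_detail
# POST /hgw/host-14948/vpxa              http_method, http_path
# 200 via_upstream -                     http_status_code, route_status, flags
# 15887 243 gzip                         http_response_bytes, http_request_bytes, compression
# 8 8 0                                  duration_total, duration_upstream, duration_request
# 10.9.10.94:49874                       source_address (client IP:port)
# HTTP/1.1 TLSv1.2                       client_protocol, TLS version
# 10.9.36.55:443                         destination_address
# 127.0.0.1:40729 HTTP/1.1               upstream_address, upstream_protocol
# - 127.0.0.1:8089                       (flag), internal_address
# "3addb69f"                             request_id
# "QueryBatchPerformanceStatisticsVpxa"  custom_operation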

 

3. The Final Attempted Code (for Envoy)

 

We are currently trying to deploy this final, single-Grok-based solution for the primary Envoy logs, which keeps failing due to hidden syntax issues:


 

filter {

    # 1. GROK: Parse the entire log line in one resilient step.
    grok {
        tag_on_failure => ["_grok_failure_envoy"]
        match => {
            "message" => "<%{POSINT}>%{SPACE}?%{TIMESTAMP_ISO8601:timestamp} %{DATA:host_name} %{DATA:process_detail}: %{WORD:http_method} %{URIPATH:http_path} %{NUMBER:http_status_code} %{DATA:route_status} - %{NUMBER:http_response_bytes} %{NUMBER:http_request_bytes} %{DATA:compression} %{NUMBER:duration_total} %{NUMBER:duration_upstream} %{NUMBER:duration_request} %{DATA:source_address} %{DATA:client_protocol} %{DATA:destination_address} %{DATA:upstream_address} %{DATA:upstream_protocol} - %{DATA:internal_address} %{DATA:request_id} %{GREEDYDATA:custom_operation}"
        }
    }

    # 2. SEGMENTED GROK: Split IP:Port fields.
    grok { source => "source_address"; match => { "source_address" => "%{IP:source_ip}:%{NUMBER:source_port}" } }
    grok { source => "destination_address"; match => { "destination_address" => "%{IP:destination_ip}:%{NUMBER:destination_port}" } }
    grok { source => "upstream_address"; match => { "upstream_address" => "%{IP:upstream_ip}:%{NUMBER:upstream_port}" } }
    grok { source => "internal_address"; match => { "internal_address" => "%{IP:internal_ip}:%{NUMBER:internal_port}" } }

    # 3. FINAL CLEANUP AND TYPE CONVERSION
    mutate {
        gsub => [
            "custom_operation", "\"", ""
        ]
        convert => {
            "http_status_code" => "integer"
            "http_response_bytes" => "integer"
            "http_request_bytes" => "integer"
            "duration_total" => "integer"
            "duration_upstream" => "integer"
            "duration_request" => "integer"
            "source_port" => "integer"
            "destination_port" => "integer"
            "upstream_port" => "integer"
            "internal_port" => "integer"
        }
        remove_field => [
            "message", "POSINT", "source_address", "destination_address", "upstream_address", "internal_address",
            "process_detail", "route_status", "compression", "client_protocol", "upstream_protocol", "request_id"
        ]
    }
}

 

4. The Exact Error

 

The deployment is failing with a persistent syntax error, despite rigorous manual checks:

generic::invalid_argument: failed to create augmentor pipeline: failed to parse config: Parse error line 13, column 38 : illegal token ';'

 

5. Request for Help

 

  1. Semicolon Hunter: Can anyone spot a hidden semicolon or an encoding issue in the long Grok pattern string that would cause the parser to see it at line 13, column 38?

  2. Best Practice for No-Conditional Parsing: Given the constraint of NO conditionals, what is the best practice for deploying multiple Grok patterns to handle the two drastically different log types (Envoy vs. ESX) without the pipeline crashing on the first failure? Should we rely solely on tag_on_failure and tag_on_success?
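
To make question 2 concrete, the layout we have in mind is two independent Grok filters, each with its own failure tag, so a non-matching format tags the event instead of aborting it. This is a sketch only (pattern bodies elided); the open question is how to gate the per-type processing afterwards without conditionals:

filter {
    # Attempt 1: Envoy access log. A miss should only add a tag, not abort.
    grok {
        match => { "message" => ["<envoy pattern here>"] }
        tag_on_failure => ["_not_envoy"]
    }
    # Attempt 2: ESX healthd log. Likewise non-fatal on a miss.
    grok {
        match => { "message" => ["<esx pattern here>"] }
        tag_on_failure => ["_not_esx"]
    }
    # Open question: without if/else, how do we run the Envoy-only
    # mutate blocks only when the first grok matched?
}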

8 replies

James_E
Staff
  • Staff
  • October 24, 2025

The semicolon is on line 13. 

 


  • Author
  • October 27, 2025

Looks like I’m facing a different error now.

 

filter {
    # 1. GROK FILTER: Parse the entire log line and extract all fields
    grok {
        match => {
            "message" => [
                # Pattern for the Syslog header wrapped Envoy access log
                "<%{POSINT}>%{TIMESTAMP_ISO8601:syslog_timestamp} %{DATA:syslog_hostname} %{DATA:syslog_process}: %{WORD:http_method} %{URIPATH:http_path} %{NUMBER:http_status_code:int} %{DATA:route_status} - %{NUMBER:http_response_bytes:int} %{NUMBER:http_request_bytes:int} %{DATA:compression} %{NUMBER:duration_total:int} %{NUMBER:duration_upstream:int} %{NUMBER:duration_request:int} %{DATA:source_address} %{DATA:client_protocol} %{DATA:destination_address} %{DATA:upstream_address} %{DATA:upstream_protocol} - %{DATA:internal_address} %{DATA:request_id} %{GREEDYDATA:custom_operation}"
            ]
        }
        # If the grok pattern fails, tag it for review/dropping
        tag_on_failure => ["_grokparsefailure_envoy"]
    }

    # Conditional processing only if the Grok was successful
   ## if ![ "_grokparsefailure_envoy" ]
    {
        # 2. MUTATE: Split combined IP:Port fields and rename/convert fields

        # Split source_address (e.g., 10.9.10.94:49724) into IP and Port
        # Note: We use a temporary field _source_ip_port for the split operation
        mutate {
            split => { "source_address" => ":" }
            add_field => {
                "source_ip" => "%{[source_address][0]}"
                "source_port" => "%{[source_address][1]}"
            }
        }
        # Split destination_address (e.g., 10.9.36.43:443) into IP and Port
        mutate {
            split => { "destination_address" => ":" }
            add_field => {
                "destination_ip" => "%{[destination_address][0]}"
                "destination_port" => "%{[destination_address][1]}"
            }
        }

        # Convert split ports from strings to integers
        mutate {
            convert => {
                "source_port" => "integer"
                "destination_port" => "integer"
            }
        }

        # 3. UDM MAPPING: Map cleaned fields to the final UDM destination (required for Chronicle)
        mutate {
            replace => {
                # Time
                "metadata.event_timestamp" => "%{timestamp}"

                # Principal (Source)
                "principal.ip" => "%{source_ip}"
                "principal.port" => "%{source_port}"
                "principal.hostname" => "%{syslog_hostname}"

                # Target (Destination/Server)
                "target.ip" => "%{destination_ip}"
                "target.port" => "%{destination_port}"

                # Network/HTTP Details
                "network.http.method" => "%{http_method}"
                "network.http.response_code" => "%{http_status_code}"
                "network.protocol" => "HTTP" # Assuming HTTP traffic
                "network.ip_protocol" => "TCP"
                "network.sent_bytes" => "%{http_request_bytes}"
                "network.received_bytes" => "%{http_response_bytes}"

                # Event Metadata
                "metadata.event_type" => "NETWORK_HTTP_CONNECTION"
                "metadata.product_event_type" => "envoy_access_log"
                "about.url" => "%{http_path}"
            }
        }

        # 4. CLEANUP: Remove all temporary fields and the original message
        mutate {
            remove_field => [
                "message",
                "timestamp",
                "syslog_timestamp",
                "syslog_hostname",
                "syslog_process",
                "source_address",
                "destination_address",
                # Remove all intermediary fields
                "http_method", "http_path", "http_status_code", "route_status",
                "http_response_bytes", "http_request_bytes", "compression",
                "duration_total", "duration_upstream", "duration_request",
                "client_protocol", "upstream_address", "upstream_protocol",
                "internal_address", "request_id", "custom_operation",
                "source_ip", "source_port", "destination_ip", "destination_port"
            ]
        }
    }
}




Error: generic::invalid_argument: failed to create augmentor pipeline: failed to parse config: Parse error line 16, column 5 : unexpected token '{', expected one of 'comment|string|}|if|else|else if|for' 

 


  • Author
  • October 27, 2025

After polishing my code, here is the final version. Now I’m looking to fix something near the beginning, around the field manipulation; I’m not sure what I’m missing.
Looking for help!! @James_E @nickharbour

filter {
    # 0. TAGS INITIALIZATION: This is the critical fix. It guarantees the [tags] field exists
    # for all events, preventing the "tags not found in state data" error during conditional checks.
    mutate {
        # Add a tag that will be removed later, ensuring the [tags] array is initialized.
        add_tag => ["_tag_initializer_grok_check"]
    }

    # 1. GROK FILTER: Parse the log line (Your exact original pattern)
    grok {
        match => {
            "message" => [
                "<%{POSINT}>%{TIMESTAMP_ISO8601:timestamp} %{DATA:syslog_hostname} %{DATA:syslog_process}: %{WORD:http_method} %{URIPATH:http_path} %{NUMBER:http_status_code:int} %{DATA:route_status} - %{NUMBER:http_response_bytes:int} %{NUMBER:http_request_bytes:int} %{DATA:compression} %{NUMBER:duration_total:int} %{NUMBER:duration_upstream:int} %{NUMBER:duration_request:int} %{DATA:source_ip}:%{NUMBER:source_port} %{DATA:client_protocol} %{DATA} %{DATA:destination_ip}:%{NUMBER:PORTs} %{DATA:upstream_address}:%{INT:port_3} %{GREEDYDATA:upstream_protocol} %{IPV4:internal_address}:%{NUMBER:PORT_2}"
            ]
        }
        tag_on_failure => ["_grokparsefailure_envoy"]
    }

    # Conditional processing only if the Grok was successful. This check is now safe.
    if "_grokparsefailure_envoy" not in [tags] {

        # 2. RENAME: Rename the Grok-captured fields to the target names
        # expected by your original 'convert' and 'cleanup' blocks.
        mutate {
            rename => {
                "PORTs" => "destination_port"
                "port_3" => "upstream_port"
                "PORT_2" => "internal_port"
            }
        }

        # 3. MUTATE: Convert string fields to integers
        mutate {
            convert => {
                "http_status_code" => "integer"
                "http_response_bytes" => "integer"
                "http_request_bytes" => "integer"
                "source_port" => "integer"
                "destination_port" => "integer"
                "upstream_port" => "integer"
                "internal_port" => "integer"
                "duration_total" => "integer"
                "duration_upstream" => "integer"
                "duration_request" => "integer"
            }
        }

        # 4. UDM MAPPING: Map cleaned fields
        mutate {
            replace => {
                "metadata.event_timestamp" => "%{timestamp}"
                "principal.ip" => "%{source_ip}"
                "principal.port" => "%{source_port}"
                "principal.hostname" => "%{syslog_hostname}"
                "target.ip" => "%{destination_ip}"
                "target.port" => "%{destination_port}"
                "network.http.method" => "%{http_method}"
                "network.protocol" => "HTTP"
                "network.ip_protocol" => "TCP"
                "network.sent_bytes" => "%{http_request_bytes}"
                "network.received_bytes" => "%{http_response_bytes}"
                "metadata.event_type" => "NETWORK_HTTP_CONNECTION"
                "metadata.product_event_type" => "envoy_access_log"
                "about.url" => "%{http_path}"
                "about.metadata.operation" => "%{custom_operation}"
            }
        }

        # 5. CLEANUP: Remove all temporary fields, including the tag initializer
        mutate {
            remove_field => [
                # The initializer tag must be removed
                "_tag_initializer_grok_check",
                "message",
                "timestamp", "syslog_hostname", "syslog_process",
                # Source/Destination fields
                "source_ip", "source_port", "destination_ip", "destination_port",
                # HTTP fields
                "http_method", "http_path", "http_status_code", "route_status",
                "http_response_bytes", "http_request_bytes", "compression",
                # Duration fields
                "duration_total", "duration_upstream", "duration_request",
                # Connection/Upstream fields
                "client_protocol",
                "upstream_address", "upstream_port", "upstream_protocol",
                "internal_address", "internal_port",
                # Grok captured port names
                "PORTs", "port_3", "PORT_2",
                # Other temporary fields
                "trace_id", "custom_operation"
            ]
        }
    }
}

Here is the Error: generic::unknown: pipeline.ParseLogEntry failed: LOG_PARSING_CBN_ERROR: "generic::invalid_argument: pipeline failed: filter conditional (2) failed: failed to evaluate expression: generic::invalid_argument: \"tags\" not found in state data"
 


  • Author
  • November 5, 2025

Any suggestions would be great!


James_E
Staff
  • Staff
  • November 5, 2025

where do you define “tags” before you use it on this line:

    if "_grokparsefailure_envoy" not in [tags] {


  • Author
  • November 6, 2025

@James_E I’m wondering how to define the tags as well. This is my first time trying to write a parser extension.


James_E
Staff
  • Staff
  • November 6, 2025

I’m gonna be honest, there’s a lot missing in your parser. I highly recommend looking at the docs below and at the pre-built parsers already in the platform. Posting a redacted copy of an example log may help here as well. You mentioned possibly having two events under the same log type, so an example of each would also be helpful.

https://docs.cloud.google.com/chronicle/docs/reference/parser-syntax

https://docs.cloud.google.com/chronicle/docs/event-processing/parsing-overview
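
At a high level, the skeleton the docs describe looks something like this (simplified; source_ip is a placeholder for whatever your grok extracts):

filter {
    grok {
        match => { "message" => ["<your pattern>"] }
        on_error => "grok_failed"
    }
    # Map into the UDM event object, not bare names like "principal.ip"
    mutate {
        replace => {
            "event.idm.read_only_udm.metadata.event_type" => "NETWORK_HTTP"
        }
    }
    # principal.ip is a repeated UDM field, so merge a parsed value into it
    mutate {
        merge => { "event.idm.read_only_udm.principal.ip" => "source_ip" }
    }
    # Emit the assembled UDM event to the output
    mutate {
        merge => { "@output" => "event" }
    }
}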


  • Author
  • December 11, 2025

Raw Log: 
<166>2025-10-21T19:00:05.002Z S0PSSESXUCS071.mgmt.ad.usfs.llc envoy-access[2098923]: POST /hgw/host-15062/vpxa 200 via_upstream - 748 428 gzip 13 12 0 10.9.10.94:51422 HTTP/1.1 TLSv1.2 10.9.36.71:443 127.0.0.1:34032 HTTP/1.1 - 127.0.0.1:8089 - "QuerySummaryStatisticsVpxa"


filter {
    mutate {
        replace => {
            "event.idm.read_only_udm.metadata.product_name" => "Envoy"
            "event.idm.read_only_udm.metadata.vendor_name" => "VMware ESXi"
            "event.idm.read_only_udm.metadata.event_type" => "NETWORK_HTTP"
        }
    }

    # ############################################################
    # 1) GROK — parse broadly
    # ############################################################
    grok {
        match => {
            "message" => [
                "<%{POSINT:syslog_pri}>%{TIMESTAMP_ISO8601:timestamp} %{DATA:syslog_hostname} %{DATA:syslog_process}: %{WORD:http_method} %{URIPATH:http_path} %{NUMBER:http_status_code:int} %{DATA:route_status} - %{NUMBER:http_response_bytes:int} %{NUMBER:http_request_bytes:int} %{DATA:compression} %{NUMBER:duration_total:int} %{NUMBER:duration_upstream:int} %{NUMBER:duration_request:int} %{DATA:source_ip}:%{NUMBER:source_port} %{DATA:client_protocol} %{DATA:tls_version} %{DATA:destination_ip}:%{NUMBER:PORTs} %{DATA:upstream_address}:%{INT:port_3} %{GREEDYDATA:upstream_protocol} %{IPV4:internal_address}:%{NUMBER:PORT_2}"
            ]
        }
        overwrite => [
            "timestamp","syslog_hostname","syslog_process","http_method","http_path",
            "http_status_code","route_status","http_response_bytes","http_request_bytes",
            "compression","duration_total","duration_upstream","duration_request",
            "source_ip","source_port","client_protocol","tls_version",
            "destination_ip","PORTs","upstream_address","port_3",
            "upstream_protocol","internal_address","PORT_2"
        ]
        on_error => "grok_failure"
    }

    # ############################################################
    # 2) Timestamp — minimal
    # ############################################################
    if [timestamp] != "" {
        date { match => ["timestamp", "ISO8601"] }
    }

    # ############################################################
    # 3) Type conversions on raw fields (Numeric)
    # ############################################################
    mutate {
        convert => {
            "http_status_code"    => "integer"
            "source_port"         => "integer"
            "PORTs"               => "integer"
            "http_request_bytes"  => "integer"
            "http_response_bytes" => "integer"
        }
    }

    # ############################################################
    # 3.5) String Conversions (Fix for UDM Mapping Error)
    # Convert integers back to strings for safe use with 'replace' in UDM mapping.
    # ############################################################
    mutate {
        convert => {
            "source_port"         => "string"
            "PORTs"               => "string"
            "http_status_code"    => "string"
            "http_request_bytes"  => "string"
            "http_response_bytes" => "string"
        }
    }

    # ############################################################
    # 4) STANDARD UDM FIELDS ONLY (MAPPING)
    # ############################################################

    # Observer (context)
    mutate {
        replace => {
            "observer.hostname"     => "%{syslog_hostname}"
            "observer.resource.uid" => "%{syslog_process}"
        }
    }

    # Principal (client)
    if [source_ip] != "" {
        # These fields now receive string values, preventing the error.
        mutate { replace  => { "principal.ip_address"   => "%{source_ip}"   } }
        mutate { replace  => { "principal.port"         => "%{source_port}" } }
    }

    # Target (server)
    if [destination_ip] != "" {
        mutate { replace  => { "target.ip_address"   => "%{destination_ip}" } }
        mutate { replace  => { "target.port"         => "%{PORTs}"          } }
    }

    # HTTP essentials (standard)
    mutate {
        replace => {
            "network.http.request.method" => "%{http_method}"
            "network.http.protocol"       => "%{client_protocol}"
            "network.http.tls_version"    => "%{tls_version}"
            "network.http.request.url"    => "%{http_path}"
            "network.http.response.code"  => "%{http_status_code}"
        }
    }

    # Bytes
    mutate { replace => { "network.sent_bytes"     => "%{http_response_bytes}" } }
    mutate { replace => { "network.received_bytes" => "%{http_request_bytes}"  } }
}




This seems to be working, but the parsed values come through as a description rather than as the UDM fields, which is not what I expected.


Can someone suggest or give me the correct Grok pattern, please?
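
For reference, the closest I have gotten is the single pattern below. It is meant to also cover the trailing request id and quoted operation that my deployed pattern drops (the last two field names are my own guess at those columns), but I have not verified it in the pipeline:

"<%{POSINT:syslog_pri}>%{TIMESTAMP_ISO8601:timestamp} %{DATA:syslog_hostname} %{DATA:syslog_process}: %{WORD:http_method} %{URIPATH:http_path} %{NUMBER:http_status_code} %{DATA:route_status} - %{NUMBER:http_response_bytes} %{NUMBER:http_request_bytes} %{DATA:compression} %{NUMBER:duration_total} %{NUMBER:duration_upstream} %{NUMBER:duration_request} %{IP:source_ip}:%{NUMBER:source_port} %{DATA:client_protocol} %{DATA:tls_version} %{IP:destination_ip}:%{NUMBER:destination_port} %{IP:upstream_ip}:%{NUMBER:upstream_port} %{DATA:upstream_protocol} - %{IP:internal_ip}:%{NUMBER:internal_port} %{DATA:request_id} \"%{DATA:custom_operation}\""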