Question

Is Row-based join and enrichment from a Data Table where all identifying columns are REGEX type possible?

  • June 26, 2026

blappy_fird

Hi community,

I'm building a YARA-L detection rule for registry-based persistence mechanisms in Google SecOps. I have a threat intel data table structured like this: 

Columns, each shown with its value from one example row:

  • Registry_Key (REGEX): SOFTWARE[\\]+Microsoft[\\]+Windows[\\]+CurrentVersion[\\]+Explorer[\\]+(Run|RunOnce)
  • Registry_Value_Name (REGEX): Persistence_Mechanism
  • Registry_Value_Data (REGEX): wscript(?:\.exe$)?\b.*\.(vbs|js)$
  • Threat_Actor_Association (STRING): Threat_Actor
  • Malware_Association (STRING): Malware
  • Description (STRING): This persistence mechanism is employed by <malware>, used by <Threat_Actor>.

The REGEX columns are intentional and necessary: registry key paths vary significantly across log sources (backslash escaping is inconsistent, e.g. \\ vs \\\\), and registry value names and data can match multiple variants. I want to keep the REGEX columns as-is.
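To illustrate the escaping point, here is a minimal Python sketch (not YARA-L, and not a statement about how SecOps evaluates the column). It uses the Registry_Key pattern from the example row; the two sample key strings are hypothetical stand-ins for what different log sources might emit.

```python
import re

# Pattern copied from the Registry_Key column of the example row.
# [\\]+ matches one or more literal backslashes.
KEY_PATTERN = re.compile(
    r"SOFTWARE[\\]+Microsoft[\\]+Windows[\\]+CurrentVersion[\\]+Explorer[\\]+(Run|RunOnce)",
    re.IGNORECASE,
)

# Hypothetical key strings: one source logs single backslashes,
# another logs them doubled.
single = r"HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\Run"
double = r"HKLM\\SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Explorer\\Run"


def key_matches(registry_key: str) -> bool:
    """Return True if the key contains the persistence path, however it is escaped."""
    return KEY_PATTERN.search(registry_key) is not None


print(key_matches(single))
print(key_matches(double))
```

Both variants match the one pattern, whereas a plain STRING equality would need a separate literal for each escaping style.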

My goal is to:

  1. Filter events using all three REGEX columns (in regex %table.column nocase)
  2. Pull the enrichment columns (Threat_Actor_Association, Malware_Association, Description) from the matched row into the rule outcome
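To make the intended behavior concrete, here is a rough Python sketch of the semantics I'm after. It is only an illustration, not YARA-L and not how the platform works: ROWS is a hypothetical in-memory copy of the data table, and the event is a hypothetical flat dict keyed by the table's column names rather than real UDM fields.

```python
import re
from typing import Optional

# Hypothetical in-memory stand-in for the data table (one example row).
ROWS = [
    {
        "Registry_Key": r"SOFTWARE[\\]+Microsoft[\\]+Windows[\\]+CurrentVersion[\\]+Explorer[\\]+(Run|RunOnce)",
        "Registry_Value_Name": r"Persistence_Mechanism",
        "Registry_Value_Data": r"wscript(?:\.exe$)?\b.*\.(vbs|js)$",
        "Threat_Actor_Association": "Threat_Actor",
        "Malware_Association": "Malware",
        "Description": "This persistence mechanism is employed by <malware>, used by <Threat_Actor>.",
    },
]

REGEX_COLUMNS = ("Registry_Key", "Registry_Value_Name", "Registry_Value_Data")
ENRICHMENT_COLUMNS = ("Threat_Actor_Association", "Malware_Association", "Description")


def enrich(event: dict) -> Optional[dict]:
    """Return enrichment from the first row whose three patterns ALL match the event."""
    for row in ROWS:
        if all(re.search(row[col], event[col], re.IGNORECASE) for col in REGEX_COLUMNS):
            return {col: row[col] for col in ENRICHMENT_COLUMNS}
    return None


# Hypothetical event, flattened to the same keys as the table columns.
event = {
    "Registry_Key": r"HKCU\SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\Run",
    "Registry_Value_Name": "Persistence_Mechanism",
    "Registry_Value_Data": r"wscript.exe C:\Users\Public\update.vbs",
}
print(enrich(event))
```

The key point is that all three patterns are taken from the same row, and the enrichment columns come from that row as well.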

The problem I'm running into is that the platform requires an equality (=) join on a STRING column to establish the row context before enrichment column assignments are allowed. However:

  • All three identifying columns are REGEX type, and the in regex operator is column-based only; per the documentation, it does not constitute a row join
  • Registry_Value_Name values are themselves regex patterns (e.g. (\bVLC\b)|(\bWingetUI\b), ([A-Za-z0-9]{8}\.exe$)), not plain literals, so a STRING duplicate would not match the live UDM event via =
  • Registry_Key paths share the same key across multiple rows (e.g. many malware families persist under CurrentVersion\\Run), so they are not unique per row and cannot serve as a reliable join anchor
  • Registry_Value_Data has the same issues
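The difference between column-based and row-based matching can be sketched in Python as follows. This is only my mental model of the two behaviors, not the platform's implementation. The two rows are hypothetical: the value-name patterns are the ones quoted above, while the simplified key pattern, the second data pattern, and the family labels are made up for illustration.

```python
import re

# Hypothetical rows: (key pattern, value-name pattern, value-data pattern, label).
ROWS = [
    (r"CurrentVersion[\\]+Run", r"(\bVLC\b)|(\bWingetUI\b)",
     r"wscript(?:\.exe$)?\b.*\.(vbs|js)$", "Family_A"),
    (r"CurrentVersion[\\]+Run", r"([A-Za-z0-9]{8}\.exe$)",
     r"powershell.*-enc", "Family_B"),
]


def hit(pattern: str, value: str) -> bool:
    return re.search(pattern, value, re.IGNORECASE) is not None


def column_based_match(event: tuple) -> bool:
    """Each field only has to match SOME pattern in its column; rows are ignored."""
    return all(
        any(hit(row[i], event[i]) for row in ROWS)
        for i in range(3)
    )


def row_based_match(event: tuple) -> list:
    """All three fields must match the patterns of the SAME row."""
    return [
        row[3] for row in ROWS
        if all(hit(row[i], event[i]) for i in range(3))
    ]


# Value name matches Family_A's pattern, value data matches Family_B's pattern.
mixed_event = (r"SOFTWARE\Microsoft\Windows\CurrentVersion\Run", "VLC", "powershell -enc AAAA")
# All three fields match Family_A's row.
event_a = (r"SOFTWARE\Microsoft\Windows\CurrentVersion\Run", "VLC", r"wscript.exe C:\Temp\a.vbs")

print(column_based_match(mixed_event), row_based_match(mixed_event))
print(column_based_match(event_a), row_based_match(event_a))
```

With column-based matching, the mixed event passes even though no single row describes it, so there is no row from which to take the enrichment columns. That is the gap I'm trying to close.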

Is there a supported YARA-L syntax or data table design pattern that allows enrichment from a row-based join when all identifying columns are REGEX type? Specifically:

  • Is there any way to use in regex as a row-based join, or is column-based the only behavior?
  • Does re.regex($e.field, %table.column) work as a row-based join? (We tried this and got the compile-time error "expected string for arg 1 of regex".)
  • Is there a recommended pattern for this kind of threat intel enrichment use case where the identifiers are inherently regex patterns?

This is a feasibility project on my end. I understand the same output can be achieved with multiple rules that carry the malware or threat actor attribution in the rule name or description. I would still like to know whether the data table approach is feasible, because it would be a lot easier (and cleaner) to build this kind of rule with data tables.

Any guidance is greatly appreciated. Thank you in advance!