Hi community,
I'm building a YARA-L detection rule for registry-based persistence mechanisms in Google SecOps. I have a threat intel data table structured like this:
| Registry_Key (REGEX) | Registry_Value_Name (REGEX) | Registry_Value_Data (REGEX) | Threat_Actor_Association (STRING) | Malware_Association (STRING) | Description (STRING) |
|---|---|---|---|---|---|
| SOFTWARE[\\]+Microsoft[\\]+Windows[\\]+CurrentVersion[\\]+Explorer[\\]+(Run|RunOnce) | Persistence_Mechanism | wscript(?:\.exe$)?\b.*\.(vbs|js)$ | Threat_Actor | Malware | This persistence mechanism is employed by <malware>, used by <Threat_Actor>. |
The REGEX columns are intentional and necessary — registry key paths vary significantly across log sources (to address backslash escaping inconsistencies like \\ vs \\\\), and registry value names and data can match multiple variants. I want to keep the REGEX columns as-is.
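To illustrate why the key column is a regex, here is a minimal Python sketch (not YARA-L) that checks the sample `Registry_Key` pattern against the same path written with single and doubled backslashes. The helper name `key_matches` and the sample paths are made up for illustration, and Python's `re` module is only a stand-in for whatever regex engine the platform uses.

```python
import re

# Pattern copied from the Registry_Key column of the sample row.
# [\\]+ matches one or more consecutive backslashes.
KEY_PATTERN = (
    r"SOFTWARE[\\]+Microsoft[\\]+Windows[\\]+CurrentVersion"
    r"[\\]+Explorer[\\]+(Run|RunOnce)"
)


def key_matches(registry_key: str) -> bool:
    """Case-insensitive search, mirroring the intent of `nocase`."""
    return re.search(KEY_PATTERN, registry_key, re.IGNORECASE) is not None


# Hypothetical paths as two log sources might render them.
single = r"HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\Run"
doubled = r"HKLM\\SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Explorer\\Run"

print(key_matches(single), key_matches(doubled))
```

A plain STRING column would need one row per escaping variant to cover the same paths.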
My goal is to:
- Filter events using all three REGEX columns (`in regex %table.column nocase`)
- Pull the enrichment columns (`Threat_Actor_Association`, `Malware_Association`, `Description`) from the matched row into the rule outcome
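To make the intended behavior concrete, here is a minimal Python sketch of the semantics I'm after. It is only a model, not YARA-L and not a claim about how SecOps evaluates data tables. The names `IntelRow` and `enrich` and the sample event values are hypothetical; the row itself is the sample row from the table above.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class IntelRow:
    registry_key: str         # regex
    registry_value_name: str  # regex
    registry_value_data: str  # regex
    threat_actor: str
    malware: str
    description: str


def _hit(pattern: str, value: str) -> bool:
    """Case-insensitive regex search, standing in for `nocase`."""
    return re.search(pattern, value, re.IGNORECASE) is not None


def enrich(rows: Iterable[IntelRow], key: str, name: str, data: str) -> Optional[IntelRow]:
    """Return the first row whose three patterns all match the same event."""
    for row in rows:
        if (_hit(row.registry_key, key)
                and _hit(row.registry_value_name, name)
                and _hit(row.registry_value_data, data)):
            return row
    return None


ROWS = [
    IntelRow(
        r"SOFTWARE[\\]+Microsoft[\\]+Windows[\\]+CurrentVersion[\\]+Explorer[\\]+(Run|RunOnce)",
        r"Persistence_Mechanism",
        r"wscript(?:\.exe$)?\b.*\.(vbs|js)$",
        "Threat_Actor",
        "Malware",
        "This persistence mechanism is employed by <malware>, used by <Threat_Actor>.",
    ),
]

# Hypothetical event values.
match = enrich(
    ROWS,
    r"HKCU\SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\Run",
    "Persistence_Mechanism",
    r"wscript.exe C:\Users\Public\a.vbs",
)
```

The point is that the enrichment fields come from the same row whose three patterns matched, which is the row context I cannot establish in the rule today.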
The problem I'm running into is that the platform requires an equality (`=`) join on a STRING column to establish the row context before enrichment column assignments are allowed. However:
- All three identifying columns are REGEX type; `in regex` is column-based only and does not constitute a row join per the documentation
- `Registry_Value_Name` values are themselves regex patterns (e.g. `(\bVLC\b)|(\bWingetUI\b)`, `([A-Za-z0-9]{8}\.exe$)`), not plain literals, so a STRING duplicate would not match the live UDM event via `=`
- `Registry_Key` paths share the same key across multiple rows (e.g. many malware families persist under `CurrentVersion\\Run`), so they are not unique per row and cannot serve as a reliable join anchor
- `Registry_Value_Data` has the same issues
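The difference between column-based and row-based matching can be modeled with a short Python sketch. This reflects my reading of the behavior described above, not confirmed platform internals. The two-row table, the `Malware_A`/`Malware_B` labels, and the event values are hypothetical; the value-name patterns are the ones quoted in the list.

```python
import re

# Hypothetical two-row table: (key_regex, value_name_regex, malware_label).
ROWS = [
    (r"CurrentVersion[\\]+Run$", r"(\bVLC\b)|(\bWingetUI\b)", "Malware_A"),
    (r"CurrentVersion[\\]+Explorer[\\]+Run$", r"([A-Za-z0-9]{8}\.exe$)", "Malware_B"),
]


def hit(pattern: str, value: str) -> bool:
    return re.search(pattern, value, re.IGNORECASE) is not None


def column_based(key: str, name: str) -> bool:
    """Each field may match a pattern from any row of its column."""
    return (any(hit(r[0], key) for r in ROWS)
            and any(hit(r[1], name) for r in ROWS))


def row_based(key: str, name: str) -> list:
    """Both fields must match patterns from the same row; returns labels."""
    return [r[2] for r in ROWS if hit(r[0], key) and hit(r[1], name)]


# Hypothetical event: key matches only row 1, value name matches only row 2.
KEY = r"SOFTWARE\Microsoft\Windows\CurrentVersion\Run"
NAME = "abcd1234.exe"

print(column_based(KEY, NAME))  # passes the filter across two different rows
print(row_based(KEY, NAME))     # no single row matches, so no attribution
print(NAME == ROWS[1][1])       # a STRING copy of the pattern never equals the value
```

In this model the column-based filter lets the event through even though no single row describes it, which is why I need a real row join before trusting the enrichment columns.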
Is there a supported YARA-L syntax or data table design pattern that allows enrichment from a row-based join when all identifying columns are REGEX type? Specifically:
- Is there any way to use `in regex` as a row-based join, or is column-based the only behavior?
- Does `re.regex($e.field, %table.column)` work as a row-based join? (We tried this and got `expected string for arg 1 of regex` at compile time)
- Is there a recommended pattern for this kind of threat intel enrichment use case where the identifiers are inherently regex patterns?
This is mostly a feasibility exercise on my end. I understand the same output can be achieved by creating multiple rules with the malware or threat actor attribution written into each rule's name or description. Still, I would like to know whether this is feasible, because building this kind of rule with data tables would be a lot easier (and cleaner).
Any guidance is greatly appreciated. Thank you in advance!
