
I am writing a YARA-L rule and I am stuck on a problem.


I want to write a regex to capture the domain part of a URL.

For example:

For https://www.example.com/path, it captures example.com
For http://example.org, it captures example.org
For subdomain.example.co.uk/path, it captures subdomain.example.co.uk


My regex for this is:

$domain = re.capture($e.any_variable, " (?:https?:\\/\\/)?(?:www\\.)?([^\\/\\s]+) ")


The error I am getting is:
tokenizing: unable to tokenize: invalid char escape

Can somebody help me with this?


@AymanC  @jstoner 

Could you try:


(?:https?://)?(?:www\\\\.)?([^/\\\\s]+)

You do not need to escape "/", and to use a special regex sequence inside a quoted string in YARA-L you need to double the backslashes. In your case you would use "\\\\s", not "\\s".
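
As a quick sanity check of the pattern itself, separate from the YARA-L escaping issue, here is a minimal Python sketch that runs it against the three example URLs from the question. Python's `re` module is not the engine that YARA-L uses, so this only shows that the pattern logic is sound; the helper name `extract_domain` is made up for illustration, and the raw string is used so the backslashes do not need doubling.

```python
import re

# Optional scheme, optional "www.", then capture everything up to the
# next "/" or whitespace. Raw string: backslashes are written once.
DOMAIN_RE = re.compile(r"(?:https?://)?(?:www\.)?([^/\s]+)")


def extract_domain(url):
    """Hypothetical helper: return the captured domain, or None if no match."""
    match = DOMAIN_RE.match(url)
    return match.group(1) if match else None


if __name__ == "__main__":
    for url in (
        "https://www.example.com/path",
        "http://example.org",
        "subdomain.example.co.uk/path",
    ):
        print(url, "->", extract_domain(url))
```

Note that the capture group keeps anything before the first "/", so a port or user-info part (for example `example.com:8080`) would be included in the result.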


I won't pretend that my regex chops are going to get a perfect extraction, but I would point out that when using the regex functions you would generally put backticks ` rather than quotes around the regular expression. A backtick string is interpreted literally, so the backslashes do not need to be doubled.


My preference would be to use something like strings.extract_hostname or strings.extract_domain instead. Examples for these functions are in this blog: https://www.googlecloudcommunity.com/gc/Community-Blog/New-to-Google-SecOps-Domain-and-Hostname-Extraction-is-NOT-Like/ba-p/819666


 

