I’ve been working with Google Chronicle’s Entity Graph and querying the associated data that has been exported to BigQuery. However, I’ve encountered a few challenges and would appreciate some guidance:

Null Data: While querying the entity graph table in BigQuery, I noticed that a significant portion of the fields are returning NULL. Is this expected behavior? Are there certain configurations or ingestion processes I need to verify to ensure the data is complete?

IOC IP Addresses: I’ve been trying to query for specific IP addresses (Indicators of Compromise) within the entity graph data, but I’m unable to locate them effectively. Specifically, I used the following query:

 
Unfortunately, this query isn’t returning the expected results. Does the entity graph in BigQuery contain IPs or other IOC data directly?

Relationship Data: One of the key strengths of the Entity Graph in Google SecOps (Chronicle) is its ability to map relationships between entities like users, IPs, devices, and domains. Does the data exported to BigQuery retain this relationship information? If so, how is it structured, and is there a recommended way to query this relationship data?

I’m trying to reconcile the data in BigQuery with what is displayed in the Chronicle SecOps UI, and I would greatly appreciate any advice on understanding or improving this workflow.

For some reason, the forum isn't allowing me to post the SQL query that I am using.


Yes, most fields will be NULL. If you think about it, it makes sense: there are different entity types, and each has its own attributes. So if you are querying for entity type IP_ADDRESS (3), then all the fields belonging to domain, user, etc. will be NULL. You should always list only the fields you want in the SELECT statement.
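A quick way to see which entity types your export actually contains (and therefore which column families you can expect to be populated) is to group on the enum mapping table that also appears in the USER example further down. A sketch, assuming the same datalake dataset and partition filter used elsewhere in this thread:

```sql
-- Count rows per entity type to see which column families to expect
SELECT
  m.enum_name,
  COUNT(*) AS row_count
FROM `datalake.entity_graph` e
JOIN `datalake.entity_enum_value_to_name_mapping` m
  ON e.metadata.entity_type = m.enum_value
  AND m.field_path = "backstory.EntityMetadata.EntityType"
WHERE DATE(_PARTITIONTIME) >= "2025-01-20"
GROUP BY m.enum_name
ORDER BY row_count DESC;
```

If an entity type shows zero rows here, the corresponding fields will naturally be NULL everywhere, which usually points at ingestion/parser coverage rather than an export problem.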


We do store the relationships in the relations column of the entity_graph table. Here is an example of how you can write a query for the IP_ADDRESS entity type:


 



SELECT eip, relation.*
FROM `datalake.entity_graph`
LEFT JOIN UNNEST(entity.ip) AS eip
LEFT JOIN UNNEST(relations) AS relation
WHERE TIMESTAMP_TRUNC(_PARTITIONTIME, DAY) > TIMESTAMP("2025-01-20")
  AND metadata.entity_type = 3
LIMIT 1000;
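To hunt for a specific IOC IP rather than listing all IP_ADDRESS entities, you can add a filter on the unnested value. A sketch, using a hypothetical documentation-range address in place of your real indicator:

```sql
SELECT eip, relation.*
FROM `datalake.entity_graph`
LEFT JOIN UNNEST(entity.ip) AS eip
LEFT JOIN UNNEST(relations) AS relation
WHERE TIMESTAMP_TRUNC(_PARTITIONTIME, DAY) > TIMESTAMP("2025-01-20")
  AND metadata.entity_type = 3
  AND eip = "203.0.113.10"  -- hypothetical IOC address, replace with yours
LIMIT 1000;
```

Note that entity.ip is a repeated field, so the UNNEST is what lets you compare against a single address; filtering on entity.ip directly would fail because it is an array.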

 

Hope this helps. Here is another example, for the USER entity type:



DECLARE __USER_ID__ STRING DEFAULT "(?i)admin"; -- (?i) makes the regex case-insensitive
DECLARE __USER_EMAIL__ STRING DEFAULT "(?i)admin";
DECLARE __PARTITION_START_DATE__ STRING DEFAULT "2025-01-18"; -- query is equal to or greater than
---------------
SELECT
  m.enum_name AS context_entity_type,
  e.metadata.vendor_name AS context_vendor,
  e.metadata.product_name AS context_product,
  TIMESTAMP_SECONDS(e.metadata.collected_timestamp.seconds) AS collected_timestamp,
  TIMESTAMP_SECONDS(e.metadata.interval.start_time.seconds) AS interval_start_timestamp,
  TIMESTAMP_SECONDS(e.metadata.interval.end_time.seconds) AS interval_end_timestamp,
  DATETIME_DIFF(
    DATETIME(TIMESTAMP_SECONDS(e.metadata.interval.end_time.seconds)),
    DATETIME(TIMESTAMP_SECONDS(e.metadata.interval.start_time.seconds)),
    HOUR) AS interval_duration,
  e.entity.user.first_name AS user_first_name,
  e.entity.user.last_name AS user_last_name,
  e.entity.user.userid AS user_id,
  email AS user_email,
  e.metadata.product_entity_id,
  relation.entity.asset.hostname AS user_asset_hostname,
  relation.entity.asset.asset_id AS user_asset_id,
  relation.entity.asset.ip AS user_asset_ip
FROM
  `datalake.entity_graph` e
LEFT OUTER JOIN -- not always populated
  UNNEST(e.relations) relation
CROSS JOIN
  UNNEST(e.entity.user.email_addresses) email
JOIN
  `datalake.entity_enum_value_to_name_mapping` m
ON
  e.metadata.entity_type = m.enum_value
WHERE
  DATE(_PARTITIONTIME) >= CAST(__PARTITION_START_DATE__ AS DATE)
  AND ( REGEXP_CONTAINS(email, __USER_EMAIL__)
    OR REGEXP_CONTAINS(e.entity.user.userid, __USER_ID__) )
  AND m.field_path = "backstory.EntityMetadata.EntityType"
  AND m.enum_name = "USER";
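One caveat on that query: asset.ip is a repeated field in the entity schema, so user_asset_ip comes back as an array rather than a single value. If you want one flat string per row (easier to export or eyeball), you can join the array elements. A sketch, assuming the same dataset, with the entity-type filter simplified to a NULL check on the user ID:

```sql
-- Flatten the repeated asset.ip field into a comma-separated string
SELECT
  e.entity.user.userid AS user_id,
  relation.entity.asset.hostname AS asset_hostname,
  ARRAY_TO_STRING(relation.entity.asset.ip, ", ") AS asset_ips
FROM `datalake.entity_graph` e
LEFT JOIN UNNEST(e.relations) relation
WHERE DATE(_PARTITIONTIME) >= "2025-01-18"
  AND e.entity.user.userid IS NOT NULL
LIMIT 100;
```

Alternatively, UNNEST the array if you want one row per user/IP pair instead of a joined string.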



