I’ve been working with Google Chronicle’s Entity Graph and querying the associated data that has been exported to BigQuery. However, I’ve encountered a few challenges and would appreciate some guidance:
Null Data: While querying the entity graph table in BigQuery, I noticed that a significant portion of the fields are returning NULL. Is this expected behavior? Are there certain configurations or ingestion processes I need to verify to ensure the data is complete?
IOC IP Addresses: I’ve been trying to query for specific IP addresses (Indicators of Compromise) within the entity graph data, but I’m unable to locate them effectively. Specifically, I used the following query:
Unfortunately, this query isn’t returning the expected results. Does the entity graph in BigQuery contain IPs or other IOC data directly?
Relationship Data: One of the key strengths of the Entity Graph in Google SecOps (Chronicle) is its ability to map relationships between entities like users, IPs, devices, and domains. Does the data exported to BigQuery retain this relationship information? If so, how is it structured, and is there a recommended way to query this relationship data?
I’m trying to reconcile the data in BigQuery with what is displayed in the Chronicle SecOps UI, and I would greatly appreciate any advice on understanding or improving this workflow.
Page 1 / 1
For some reason, it's not allowing me to post the SQL query that I am utilizing.
Yes , most fields will be NULL. If you think about it it makes sense - there are different event types and each of which has it's own attributes. So if you are querying for entity type IP_ADDRESS (3) then all the fields belonging to domain/user etc will be null. So you should always choose the fields you want in the SELECT statement.
We do store the relationships in the relation column of the entity_graph table. Here is an example on how you can write a query for IP_ADDRESS entity type:
SELECT eip, relation.*
FROM `datalake.entity_graph`
LEFT JOIN unnest(entity.ip) as eip
LEFT JOIN unnest(relations) as relation
WHERE TIMESTAMP_TRUNC(_PARTITIONTIME, DAY) > TIMESTAMP("2025-01-20")
and metadata.entity_type = 3
LIMIT 1000;
Hope this helps. Here is another example for USER entity_type:
DECLARE
__USER_ID__ STRING DEFAULT "(?i)admin"; --(?i) makes the regex case insensitive
DECLARE
__USER_EMAIL__ STRING DEFAULT "(?i)admin";
DECLARE
__PARTITION_START_DATE__ STRING DEFAULT "2025-01-18"; --query is equal to or greater than
---------------
SELECT
m.enum_name AS context_entity_type,
e.metadata.vendor_name AS context_vendor,
e.metadata.product_name AS context_product,
TIMESTAMP_SECONDS(e.metadata.collected_timestamp.seconds) AS collected_timestamp,
TIMESTAMP_SECONDS(e.metadata.interval.start_time.seconds) AS interval_start_timestamp,
TIMESTAMP_SECONDS(e.metadata.interval.end_time.seconds) AS interval_end_timestamp,
DATETIME_DIFF(DATETIME(TIMESTAMP_SECONDS(e.metadata.INTERVAL.end_time.seconds)), DATETIME(TIMESTAMP_SECONDS(e.metadata.INTERVAL.start_time.seconds)), HOUR) AS interval_duration,
e.entity.user.first_name AS user_first_name,
e.entity.user.last_name AS user_last_name,
e.entity.user.userid AS user_id,
email AS user_email,
e.metadata.product_entity_id,
relation.entity.asset.hostname AS user_asset_hostname,
relation.entity.asset.asset_id AS user_asset_id,
relation.entity.asset.ip AS user_asset_ip,
FROM
`datalake.entity_graph` e
LEFT OUTER JOIN --not always populated
UNNEST(e.relations) relation
CROSS JOIN
UNNEST(e.entity.user.email_addresses) email
JOIN
`datalake.entity_enum_value_to_name_mapping` m
ON
e.metadata.entity_type = m.enum_value
WHERE
DATE(_PARTITIONTIME) >= CAST(__PARTITION_START_DATE__ AS DATE)
AND ( REGEXP_CONTAINS(email, __USER_EMAIL__)
OR REGEXP_CONTAINS(e.entity.user.userid, __USER_ID__) )
AND m.field_path = "backstory.EntityMetadata.EntityType"
AND m.enum_name = "USER";
Reply
Enter your E-mail address. We'll send you an e-mail with instructions to reset your password.