Having thoroughly analyzed the initial attack vectors and the layered infrastructure of this massive Booking.com phishing campaign in Part 1, we now transition from discovery to deeper investigation. In Part 2, we will consolidate the intelligence gathered from both infrastructure tiers, further unveil the threat actors' operational tactics, and demonstrate how this actionable intelligence can be directly applied to fortify your defenses and proactively hunt for similar threats in your own environment.
Analyzing the whole campaign
The different queries we've made can help us create YARA rules to monitor for new activity from this campaign; in fact, there is a dedicated YARA rules section at the end of this blog.
But to better understand the campaign as a whole, it's also useful to gather other related information: when this activity started, how many security vendors in Google Threat Intelligence flag the URLs as malicious, which keywords are worth monitoring, and so on. We will go step by step to get this information.
Obtaining Tier 1 and Tier 2 URLs
First, we need to find all possible URLs related to this campaign. The following search helps us do that.
entity:url fs:2022-01-01+
(title:"One moment" or title:"AD not found (captcha2)" or title:"Booking.com | Official")
and (meta:"Booking -" or
meta:"https://ltdfoto.ru/images/2025/06/04/photo_2025-06-02_11-23-22.md.jpg" or
meta:"https://cf.bstatic.com/xdata/images/hotel/")
and not (hostname:"booking.com" or redirects_to:booking.com or redirects_to:placetobe.homes)
and not tag:trackers
In this search, the and not tag:trackers clause is important: it filters out false positives from legitimate ad campaigns related to Booking. Another important filter excludes URLs that redirect to the legitimate Booking website. To avoid outdated results, we also incorporated a time filter (fs:2022-01-01+).
# Before running the code, make sure you have enough quota to do so. This query can consume a lot of quota.
import getpass
import json
import vt
cli = vt.Client(getpass.getpass('Enter your VirusTotal API key: '))
query = "entity:url fs:2022-01-01+ (title:\"One moment\" or title:\"AD not found (captcha2)\" or title:\"Booking.com | Official\") and (meta:\"Booking -\" or meta:\"https://ltdfoto.ru/images/2025/06/04/photo_2025-06-02_11-23-22.md.jpg\" or meta:\"https://cf.bstatic.com/xdata/images/hotel/\") and not (hostname:\"booking.com\" or redirects_to:booking.com or redirects_to:placetobe.homes) and not tag:trackers" # @param {type: "string"}
# Look for the samples
query_results = []
async for itemobj in cli.iterator('/intelligence/search', params={'query': query}, limit=0): # Set the limit you want
    query_results.append(itemobj.to_dict())
all_results = list(json.loads(json.dumps(query_results, default=lambda o: getattr(o, '__dict__', str(o)))))
The previous code snippet stores all the URLs in the all_results variable. Let's now get the details we want.
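As a quick sanity check before the analysis steps below, you can confirm how many URLs were collected; a minimal sketch (the exact count will depend on when you run the query):

# Quick sanity check on the collected data
print(f"Collected {len(all_results)} URLs related to the campaign")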
Timeline
Understanding how big the campaign is also helps us estimate roughly when the actors might have started. We can't know the exact date, but the first-submission times of the URLs returned by the initial query of this section give us a very good idea. There were periods when many URLs were uploaded at once, which suggests the actors were especially busy then. The next code snippet produces a daily submission timeline.
from datetime import datetime
import pandas as pd
import plotly.express as px
# Data processing for daily submissions
daily_submissions = []
for item in all_results:
    attributes = item.get('attributes', {})
    timestamp = attributes.get('first_submission_date')
    if timestamp:
        dt_object = datetime.fromtimestamp(timestamp)
        # Extract only the date
        daily_submissions.append(dt_object.date())
# Count submissions per day
daily_counts = pd.Series(daily_submissions).value_counts().reset_index()
daily_counts.columns = ['date', 'submission_count']
# Sort by date
daily_counts = daily_counts.sort_values(by='date').reset_index(drop=True)
# Data visualization of daily submissions
fig = px.area(daily_counts, x='date', y='submission_count',
              labels={'date': 'Date', 'submission_count': 'Number of Submissions'},
              title='Daily Submission Timeline')
# Update layout for better date formatting on x-axis
fig.update_layout(xaxis_title="Date", yaxis_title="Number of Submissions")
# Display the plot
fig.show()
Figure 8: Daily submission timeline suggests heavy activity since January 2025
Looking at all the URLs from both tiers, the patterns we found have been around for a few years, but in January 2025 larger spikes of activity began. Specifically, May and June were the busiest months, as the monthly view in Figure 9 shows.
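The monthly view can be rebuilt from the same data; a minimal sketch, assuming the daily_submissions list from the previous snippet (the 2025 filter is our assumption to match the figure):

# Aggregate the same submission dates by month, keeping only 2025 to match Figure 9
monthly = pd.Series(daily_submissions)
monthly = monthly[monthly.map(lambda d: d.year == 2025)]
monthly_counts = monthly.map(lambda d: d.strftime('%Y-%m')).value_counts().sort_index().reset_index()
monthly_counts.columns = ['month', 'submission_count']
fig = px.bar(monthly_counts, x='month', y='submission_count',
             title='Monthly Submissions During 2025')
fig.show()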
Figure 9: Monthly submissions during 2025
Redirections
We also wanted to know how many URLs redirected to a different URL. Our earlier results already suggested that most URLs redirect (since there was more Tier 1 infrastructure than Tier 2), but we wanted an exact percentage to understand it better.
import plotly.express as px
import pandas as pd
# Count URLs with and without redirection
redirected_urls = 0
not_redirected_urls = 0
# Iterate through each item in the results
for item in all_results:
    attributes = item.get('attributes', {})
    original_url = attributes.get('url')
    last_final_url = attributes.get('last_final_url')
    # Check if both original_url and last_final_url exist and are different
    if original_url and last_final_url and original_url != last_final_url:
        redirected_urls += 1
    else:
        not_redirected_urls += 1
# Calculate the total number of URLs
total_urls = redirected_urls + not_redirected_urls
# Calculate percentages, handling the case where total_urls is 0 to avoid division by zero
percentage_redirected = (redirected_urls / total_urls) * 100 if total_urls > 0 else 0
percentage_not_redirected = (not_redirected_urls / total_urls) * 100 if total_urls > 0 else 0
# Print the results
print(f"URLs with redirection: {redirected_urls} ({percentage_redirected:.2f}%)")
print(f"URLs without redirection: {not_redirected_urls} ({percentage_not_redirected:.2f}%)")
# Create a DataFrame for the pie chart
data = {'Category': ['Redirected URLs', 'Not Redirected URLs'],
        'Count': [redirected_urls, not_redirected_urls]}
df_redirection_status = pd.DataFrame(data)
# Create the pie chart
fig = px.pie(df_redirection_status, values='Count', names='Category',
             title='Percentage of URLs with and without Redirection')
# Update traces to show text inside the pie chart
fig.update_traces(textinfo='percent+label', insidetextorientation='radial')
# Display the plot
fig.show()
Figure 10: Percentage of URLs with and without Redirection
52.6% of the URLs redirect, which is interesting: the remaining 47.4% appear to be the phishing pages themselves, where victims most likely have to enter their credit card details.
When we look at which domains receive the most redirects, it's clear that domains matching the initial pattern booking.confirmation-id[5_numbers].com are seen the most. However, the Top 10 also contains other domains that don't match this pattern. These are also useful for possible YARA rules to help us watch this activity.
from urllib.parse import urlparse
last_final_urls = []
for item in all_results:
    attributes = item.get('attributes', {})
    original_url = attributes.get('url')
    last_final_url = attributes.get('last_final_url')
    # Only consider last_final_url if it exists and is different from the original_url
    if last_final_url and last_final_url != original_url:
        last_final_urls.append(last_final_url)
# Extract the domain from each last_final_url
domains = []
for url in last_final_urls:
    try:
        parsed_url = urlparse(url)
        domains.append(parsed_url.netloc)
    except Exception as e:
        print(f"Error parsing URL {url}: {e}")
        domains.append("Error parsing URL")
# Create a pandas Series from the list of domains and get the value counts
domain_counts = pd.Series(domains).value_counts().reset_index()
domain_counts.columns = ['Domain', 'Count']
# Display the domain counts
display(domain_counts)
Top 10 domains with the most redirects
Domain | Count |
booking.id5225211246[.]world | 62 |
booking.confirmation-id9918[.]com | 25 |
booking.confirmation-id901823[.]com | 25 |
booking.confirmation-id542[.]com | 15 |
booking.confirmation-id089172[.]com | 15 |
booking.id455512201[.]world | 15 |
booking.confirmation-id190238[.]com | 14 |
booking.confirmation-id987933[.]com | 14 |
booking.confirmation-id4321[.]com | 14 |
booking.confirmation-id89712[.]com | 13 |
These are the final Top 10 domains receiving the most redirects from Tier 1 to Tier 2. The first domain, booking.id5225211246[.]world, appears because many URLs on that same domain redirect, but to different parts of the site (different paths), as you can see in the next example.
URL | Final URL |
https://booking.id5225211246[.]world/EC669QWO2 | https://booking.id5225211246[.]world/YZJMYDLNV |
https://booking.id5225211246[.]world/EC669QWO2 | https://booking.id5225211246[.]world/UT4DOPJPB |
https://booking.id5225211246[.]world/EC669QWO2 | https://booking.id5225211246[.]world/ZPOCL8FBK |
https://booking.id5225211246[.]world/EC669QWO2 | https://booking.id5225211246[.]world/6WJCSCMOX |
https://booking.id5225211246[.]world/EC669QWO2 | https://booking.id5225211246[.]world/Y72WFFHD7 |
https://booking.id5225211246[.]world/EC669QWO2 | https://booking.id5225211246[.]world/58H3YTAOO |
https://booking.id5225211246[.]world/EC669QWO2 | https://booking.id5225211246[.]world/QHKXP8VB2 |
For other domains, redirects from different Tier 1 URLs led to different paths on the same Tier 2 domains, indicating that a single domain hosted multiple phishing attempts (a sketch for rebuilding these URL-to-final-URL pairs follows the table).
URL | Final URL |
https://rsvnmwww.stayiceland[.]com/ | https://booking.confirmation-id9918[.]com/4106029014 |
http://rsvnenom.stayiceland[.]com/ | https://booking.confirmation-id9918[.]com/4831933247 |
http://rsvnuitr.icestayland[.]com/ | https://booking.confirmation-id9918[.]com/4718128210 |
http://rsvnokwc.icestayland[.]com/ | https://booking.confirmation-id9918[.]com/4009187168 |
http://rsvnfbsz.icestayland[.]com/ | https://booking.confirmation-id9918[.]com/4747003708 |
https://rsvnxgnz.icestayland[.]com/ | https://booking.confirmation-id9918[.]com/4548282193 |
https://rsvnjzjp.icestayland[.]com/ | https://booking.confirmation-id9918[.]com/4555634971 |
https://rsvndobm.stayiceland[.]com/ | https://booking.confirmation-id9918[.]com/4562368486 |
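These mappings can be rebuilt directly from the url and last_final_url attributes; a minimal sketch, assuming the all_results list from earlier (the confirmation-id filter is just an illustrative example):

# Build a table of Tier 1 URL -> Tier 2 final URL pairs
redirect_pairs = []
for item in all_results:
    attributes = item.get('attributes', {})
    original_url = attributes.get('url')
    last_final_url = attributes.get('last_final_url')
    if original_url and last_final_url and original_url != last_final_url:
        redirect_pairs.append({'URL': original_url, 'Final URL': last_final_url})
df_pairs = pd.DataFrame(redirect_pairs)
# Inspect the pairs landing on Tier 2 domains matching a pattern of interest
display(df_pairs[df_pairs['Final URL'].str.contains('confirmation-id', na=False)])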
Interesting keywords
We used Gemini to look at all the domain names from the URLs we had access to. Our goal was to find keywords that repeat in these domain names. This helps us find patterns to make new detections.
The main keywords identified by Gemini were the following:
keywords = t"booking", "reservation", "reserv", "id", "guest", "hotel",
"confirm", "confrim", "confirmation"]
The fun part is that Gemini offered different variants for some words. For example, the keyword reservation was widely used, but Gemini also suggested reserv as an alternative, because some domains included the misspelling reservetion. Similarly, some domains contained misspellings of confirmed, which explains the inclusion of confirm, confirmation, and confrim.
from urllib.parse import urlparse
import pandas as pd
import plotly.express as px
keywords = ["booking", "reservation", "reserv", "id", "guest", "hotel", "confirm", "confrim", "confirmation"]
keyword_counts_original_url = {keyword: 0 for keyword in keywords}
for item in all_results:
    attributes = item.get('attributes', {})
    url = attributes.get('url')
    if url:
        try:
            parsed_url = urlparse(url)
            netloc = parsed_url.netloc.lower()
            for keyword in keywords:
                if keyword in netloc:
                    keyword_counts_original_url[keyword] += 1
                    #break # Count each URL only once even if it contains multiple keywords
        except Exception as e:
            print(f"Error parsing URL {url}: {e}")
# Create a DataFrame for plotting
df_keyword_counts_original_url = pd.DataFrame(list(keyword_counts_original_url.items()), columns=['Keyword', 'Count'])
# Create a bar chart using Plotly Express
fig = px.bar(df_keyword_counts_original_url, x='Keyword', y='Count',
             title='Count of Original URLs Containing Specific Keywords in Domain Names')
# Display the plot
fig.show()
Figure 11: Keywords identified in the initial URLs (image above) and in the final URLs, i.e. redirections (image below).
For the final URL redirections (figure below), there are four main keywords: booking, id, confirmation, and confirm (whose count overlaps with confirmation, as it is a substring). These words are used intentionally, as they appear on the final domains where victims will enter their information.
On the other hand, the keywords in the initial URL (redirector) domains are more varied. For example, hotel, guest, and reserv are also widely used, along with booking.
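The bottom chart in Figure 11 can be reproduced with the same counting loop applied to the final URLs instead of the original ones; a minimal sketch, reusing the keywords and last_final_urls lists collected earlier:

# Count keyword occurrences in the domains of the final (redirected-to) URLs
keyword_counts_final_url = {keyword: 0 for keyword in keywords}
for url in last_final_urls:
    try:
        netloc = urlparse(url).netloc.lower()
        for keyword in keywords:
            if keyword in netloc:
                keyword_counts_final_url[keyword] += 1
    except Exception as e:
        print(f"Error parsing URL {url}: {e}")
df_keyword_counts_final_url = pd.DataFrame(list(keyword_counts_final_url.items()), columns=['Keyword', 'Count'])
fig = px.bar(df_keyword_counts_final_url, x='Keyword', y='Count',
             title='Count of Final URLs Containing Specific Keywords in Domain Names')
fig.show()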
Detections
When analyzing the overall campaign, it's also important to consider how the identified URLs are rated by security vendors. Across all the URLs gathered from both Tier 1 and Tier 2 infrastructure, a significant portion has been flagged by only zero or one vendor. This underscores a potential gap in current URL detections and highlights the need to improve them.
import pandas as pd
import plotly.express as px
from collections import Counter
malicious_counts = []
for item in all_results:
    attributes = item.get('attributes', {})
    last_analysis_stats = attributes.get('last_analysis_stats', {})
    malicious = last_analysis_stats.get('malicious', 0) # Default to 0 if not present
    malicious_counts.append(malicious)
# Count the occurrences of each malicious count
malicious_count_distribution = Counter(malicious_counts)
# Convert to a DataFrame for plotting
df_malicious_counts = pd.DataFrame(list(malicious_count_distribution.items()), columns=['Malicious Count', 'Number of URLs'])
# Sort by the malicious count for better visualization
df_malicious_counts = df_malicious_counts.sort_values(by='Malicious Count')
# Create a bar chart
fig = px.bar(df_malicious_counts, x='Malicious Count', y='Number of URLs',
             title='Distribution of Malicious Detections per URL')
# Display the plot
fig.show()
Figure 12: Detections per URL identified.
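To quantify the gap described above, a short follow-up can compute the share of URLs flagged by at most one vendor; a minimal sketch, reusing malicious_counts from the previous snippet:

# Share of URLs flagged by zero or one security vendor
low_detection = sum(1 for c in malicious_counts if c <= 1)
if malicious_counts:
    print(f"URLs with 0-1 detections: {low_detection} "
          f"({low_detection / len(malicious_counts) * 100:.2f}%)")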
We've mapped the campaign's vast infrastructure and uncovered its hidden patterns, but what if we could peer directly into the threat actors' operations? Part 3 takes you beyond the infrastructure, revealing a rare glimpse into the files and communications that power this sophisticated phishing scheme.