Author: David Nehoda, Technical Solutions Consultant
What to Collect, Why, and How
---
Introduction: Why This Guide Exists
Microsoft publishes a lot of event documentation. Almost none of it tells you:
- Which events are worth paying to ingest
- Where each field lands after a SIEM normalizes it
- Which event IDs Microsoft has renamed or changed between Windows versions, Defender schemas, and old versus current Entra ID audit logs
- How to detect the TTPs Microsoft's own EDR often misses (DCSync, BYOVD, consent phishing, AS-REP roasting)
This guide fills all four gaps for Google SecOps customers running Microsoft-heavy stacks. Every event ID, every UDM field, every YARA-L rule has been written so that a rule authored against it keeps firing after Microsoft ships its next schema change.
This is the document your Microsoft rep will not hand you.
Executive Summary
| Dimension | Default Microsoft Ingest | Strict UDM Mapping | Impact |
| --- | --- | --- | --- |
| **Cost** | High-volume WFP, DCOM, and DFS-R noise. Defender raw telemetry duplicated across Event Hub and the MDE connector. | 40-60% ingest reduction by filtering at the collector, before data enters SecOps billing. | Save $100k-500k/year in cloud logging |
| **Coverage** | Vendor-specific fields: on-prem AD says `TargetUserName`, Entra ID says `userPrincipalName`, Defender says `AccountName`, O365 says `UserId`. Four fields for one concept. | One UDM path, `principal.user.userid`, for every source. A single rule correlates across all four. | One rule detects across all Microsoft sources |
| **Maintenance** | Microsoft renames fields silently. Sentinel KQL breaks on the next Defender schema revision. | Parser layer absorbs upstream changes. Rules keep firing. | Detection rules survive Microsoft updates |
| **Detection** | Bolt-on per-product detections that do not share entity context. | Cross-source rules that track an adversary from an O365 phishing click, through a Sysmon PowerShell execution, to a Kerberos ticket forgery, in one query. | End-to-end attack chain detection |
Section 1: The Ingest Math You Are Actually Paying For
Before mapping fields, understand where the volume comes from. These are observed averages across commercial environments.
Typical Daily Volume per Host
| Source | Volume | Dominant Event Types | Signal Density |
| --- | --- | --- | --- |
| Domain Controller Security log | 500 MB to 2 GB | 4624, 4634, 4769, 5156, 5157 | Low (WFP events dominate) |
| Workstation Security log | 20 MB to 100 MB | 4624, 4648, 4672, 4688 | Medium-High (if command-line auditing, CLA, is enabled) |
| Sysmon (unfiltered) | 500 MB to 2 GB per host | Event 3 (network), Event 22 (DNS) | Very High (raw volume crushes budget) |
| Sysmon (tuned) | 50 MB to 200 MB per host | Event 1, 3, 7, 10, 11, 22 | Very High |
| Defender for Endpoint raw | 100 MB to 500 MB per host | DeviceProcessEvents, DeviceNetworkEvents | High (80% overlap with Sysmon) |
| Defender alerts only | <1 MB per host | AlertEvidence, AlertInfo | Highest signal density |
| O365 Management Activity API | 1-10 MB per user | UserLoggedIn, FileAccessed, Send | High for admins, noisy for FileAccessed |
| Entra ID sign-in logs | 500 KB to 5 MB per user | Interactive, non-interactive, service principal | High for risky, medium for routine |
Key Insight
On a 500-endpoint domain with 5 DCs:
- **Default ingestion**: 2.5 to 10 GB per day of DC Security log (5 DCs at 500 MB to 2 GB each), dominated by Windows Filtering Platform events (5156, 5157)
- **Those events**: Duplicate the firewall logs you already collect
- **Impact**: Cutting just those two event IDs at the collector removes **40-70% of Security log volume** with zero detection impact
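The arithmetic behind these figures can be reproduced in a few lines. This is a back-of-envelope sketch, assuming the 2.5 to 10 GB range is the combined Security log volume of the five DCs (5 x 500 MB to 2 GB from the table above) and that the 40-70% WFP share applies to that total. The function names are illustrative, and 2 GB is rounded to 2000 MB.

```python
# Back-of-envelope ingest math using only the ranges quoted in this section.
DC_COUNT = 5
DC_SECURITY_LOG_MB = (500, 2000)  # per DC per day: low, high (2 GB taken as 2000 MB)
WFP_SHARE = (0.40, 0.70)          # assumed share of Security log volume from 5156/5157


def daily_dc_volume_gb(dc_count, per_dc_mb):
    """Return (low, high) combined DC Security log volume in GB per day."""
    low_mb, high_mb = per_dc_mb
    return (dc_count * low_mb / 1000, dc_count * high_mb / 1000)


def wfp_savings_gb(total_gb, share):
    """Return (low, high) GB per day removed by dropping 5156 and 5157."""
    return (total_gb[0] * share[0], total_gb[1] * share[1])


total = daily_dc_volume_gb(DC_COUNT, DC_SECURITY_LOG_MB)
saved = wfp_savings_gb(total, WFP_SHARE)
print(f"DC Security log: {total[0]:.1f} to {total[1]:.1f} GB/day")
print(f"Removed by dropping WFP events: {saved[0]:.1f} to {saved[1]:.1f} GB/day")
```

Swap in your own DC count and measured per-DC volume before using the result in a budget conversation.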
---
Section 2: How Microsoft Data Gets Into SecOps
Pick the wrong ingestion path and you either pay twice or lose fields.
| Path | Best For | Watch Out For |
| --- | --- | --- |
| **Bindplane Agent** (recommended) | Windows, Linux, Sysmon, Defender API, O365 API | Requires outbound HTTPS to SecOps ingest endpoint. Replaces legacy Chronicle Forwarder. |
| **Chronicle Forwarder** (legacy) | Existing syslog and file-based collection | In maintenance mode. New deployments should use Bindplane. |
| **Defender for Endpoint API direct** | Defender alerts and raw telemetry | Rate-limited. Use Bindplane's MDE connector, not a custom script. |
| **Azure Event Hub bridge** | Entra ID sign-in, Azure Activity log, M365 DLP | Microsoft charges egress. Budget 10-20% extra beyond Event Hub cost. |
| **Office 365 Management Activity API** | Exchange, SharePoint, Teams, audit trail | Subscription model. Content is fetched by polling content URIs within a limited replay window. Misconfigured polling loses events. |
| **Winlogbeat or NXLog over syslog** | Environments with no agent deployment budget | Loses Sysmon XML structure. Parser has to reconstruct fields. Not recommended for greenfield. |
Rule of Thumb
**Bindplane for endpoints and servers, native SecOps feed for O365 and Entra ID, skip Event Hub entirely unless you need Azure Activity logs.**
Collector Placement Matters
Filtering in Bindplane is **free**. Filtering inside SecOps costs ingest dollars because you have already paid to ship the data.
**Every filter in this guide belongs in the collector config, not in a SecOps rule exclusion.**
Section 3: Noise Reduction - What NOT to Ingest
These event IDs are either near-zero signal or duplicate another source. **Drop them at the collector.**
| Event ID | Source | Why It's Noise | Volume Saved |
| --- | --- | --- | --- |
| 4688 (without CLA) | Security | Process creation without command line. Sysmon Event 1 replaces it entirely. | 10-30 MB/workstation/day |
| 5156 / 5157 | Security (WFP) | Windows Filtering Platform allow/block. Duplicates firewall and Sysmon 3 at 10x volume. | 40-70% of DC log volume |
| 5145 | Security | Detailed file share access check. Fires on every SMB open, close, permission check. | 200-800 MB/file server/day |
| 4670 | Security | "Permissions changed." Fires on routine registry reads, file ACLs, service config checks. | 20-100 MB/host/day |
| 10016 | System | DCOM permission errors from known Windows bugs. Cosmetic, not security. | 5-50 MB/host/day |
| 4663 (without targeted SACL) | Security | Object access. Without SACLs on sensitive objects, fires on everything. | Unbounded |
| 6005 / 6006 | System | Event log service started/stopped. Heartbeat data, not security. | 1 MB/host/day |
| 4674 (routine) | Security | Privileged object operation. Admin tools generate thousands/day. | 50-200 MB/host/day |
| 1102 (on DCs) | Security | Audit log cleared. Critical on workstations, routine on DCs during log rotation. | 1 MB/DC |
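
The drop logic in the table can be sketched as a simple predicate. This is a Python illustration of the decision only, not Bindplane configuration syntax. The event shape (a dict with `channel`, `event_id`, and an optional `command_line`) is a hypothetical stand-in for whatever your collector exposes. Rows that depend on host role or SACL coverage (5145, 4663, 4674, 1102) are deliberately left out because they need per-role handling.

```python
# Event IDs the table marks as noise regardless of host role.
ALWAYS_DROP = {
    ("Security", 5156),  # WFP allowed connection
    ("Security", 5157),  # WFP blocked connection
    ("Security", 4670),  # permissions changed
    ("System", 10016),   # DCOM permission errors
    ("System", 6005),    # event log service started
    ("System", 6006),    # event log service stopped
}


def should_drop(event):
    """Return True if the collector should discard this event before shipping it."""
    key = (event["channel"], event["event_id"])
    if key in ALWAYS_DROP:
        return True
    # 4688 is only noise when it carries no command line (CLA disabled).
    if key == ("Security", 4688) and not event.get("command_line"):
        return True
    return False
```

For example, `should_drop({"channel": "Security", "event_id": 5156})` is true, while a 4688 event that carries a command line is kept.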
What You DO Want
**Deploy Sysmon** with a tuned config (SwiftOnSecurity baseline or Olaf Hartong modular). Replaces most Security log needs with richer parent-child context.
**Collect Security events selectively**: 4624, 4625, 4648, 4672, 4697, 4698, 4702, 4720, 4722, 4724, 4726, 4728, 4732, 4740, 4756, 4768, 4769, 4771, 4776, 4662 (with proper SACL), 4719, 1102, plus 7045 (service installed) from the System log.
**Collect Sysmon events**: 1, 3, 6, 7, 8, 10, 11, 12, 13, 15, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26.
**Collect all Defender alerts**. Pre-filtered by Microsoft's detection engine, highest signal-to-noise ratio.
**Collect O365 operations**: UserLoggedIn, MailboxLogin (non-owner), New-InboxRule, Set-InboxRule, Add-MailboxPermission, FileDownloaded, FileShared, AnonymousLinkCreated, Add-RoleGroupMember, Set-AdminAuditLogConfig, Consent to application, Add OAuth2PermissionGrant, eDiscovery search created.
**Collect Entra ID sign-in and audit logs in full**. Volume is low, signal is high.
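The Windows and Sysmon collect lists above can be expressed as a per-channel allow-list. The sketch below is Python, not collector configuration. The `should_collect` helper is hypothetical, and 7045 is placed under the System channel because the Service Control Manager writes it there.

```python
# Allow-list keyed by Windows event channel, taken from the collect lists above.
COLLECT = {
    "Security": {
        4624, 4625, 4648, 4672, 4697, 4698, 4702, 4720, 4722, 4724, 4726,
        4728, 4732, 4740, 4756, 4768, 4769, 4771, 4776, 4662, 4719, 1102,
    },
    "System": {7045},
    "Microsoft-Windows-Sysmon/Operational": {
        1, 3, 6, 7, 8, 10, 11, 12, 13, 15, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26,
    },
}


def should_collect(channel, event_id):
    """Return True if the event ID is on the allow-list for its channel."""
    return event_id in COLLECT.get(channel, set())
```

An allow-list fails closed: an event ID that Microsoft adds later is not ingested until someone reviews it and adds it here. Note that 4662 still needs a targeted SACL to be useful, which this check cannot verify.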
---
Unified Data Model (UDM) Mapping Cheat Sheet
Core Concepts
Use this table when authoring rules. Print it and keep it next to your screen.
| Security Concept | UDM Field Path | Common Sources |
| --- | --- | --- |
| Who authenticated | `principal.user.email_addresses`, `principal.user.userid` | AD 4624, O365 UserLoggedIn, Entra ID |
| What account was targeted | `target.user.userid`, `target.user.email_addresses` | AD 4720, 4728, O365 Add-MailboxPermission |
| Source IP | `principal.ip` | All authentication events |
| Destination IP | `target.ip` | Sysmon 3, firewall, proxy |
| Source hostname | `principal.hostname` | Process launch, auth events |
| Destination hostname | `target.hostname` | Defender alerts, network connections |
| Process executed | `target.process.file.full_path` | Sysmon 1, Security 4688 |
| Parent process | `principal.process.file.full_path` | Sysmon 1 |
| Command line | `target.process.command_line` | Sysmon 1, Security 4688 (with CLA) |
| Parent command line | `principal.process.command_line` | Sysmon 1 |
| Process SHA256 | `target.process.file.sha256` | Sysmon 1 |
| File SHA256 | `target.file.sha256` | Sysmon 11, Defender |
| DNS query | `network.dns.questions.name` | Sysmon 22, DNS servers |
| Registry key | `target.registry.registry_key` | Sysmon 12, 13 |
| Registry value | `target.registry.registry_value_data` | Sysmon 13 |
| Kerberos encryption type | `security_result.detection_fields["TicketEncryptionType"]` | AD 4768, 4769 |
| Access mask | `security_result.detection_fields["Properties"]` | AD 4662 |
| Alert severity | `security_result.severity` | Defender, any alerting source |
| Alert summary | `security_result.summary` | Defender, SCC |
| Log source type | `metadata.log_type` | WINEVTLOG, O365, AZURE_AD, etc. |
| Specific event ID | `metadata.product_event_type` | Maps to Windows Event ID or O365 Operation |
| Event timestamp | `metadata.event_timestamp.seconds` | Used for correlation |
| User agent | `network.http.user_agent` | Entra ID sign-in, proxy logs |
| Geolocation country | `principal.location.country_or_region` | Entra ID sign-in |
| Conditional Access result | `security_result.action` (ALLOW, BLOCK, CHALLENGE) | Entra ID sign-in |
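
To make the "one field path per concept" idea concrete, here is a minimal Python sketch of what normalization buys you. It is not how the SecOps parsers are implemented. The source labels, the `normalize_user` and `acted_as` helpers, and the sample values are all hypothetical. The vendor field names come from the Coverage row of the executive summary.

```python
# Vendor field that carries the acting user, per source.
USER_FIELD_BY_SOURCE = {
    "ad": "TargetUserName",
    "entra_id": "userPrincipalName",
    "defender": "AccountName",
    "o365": "UserId",
}


def normalize_user(source, raw_event):
    """Copy the vendor-specific user field into the single UDM-style path."""
    vendor_field = USER_FIELD_BY_SOURCE[source]
    return {"principal": {"user": {"userid": raw_event[vendor_field]}}}


def acted_as(udm_event, userid):
    """One check that works for every source once events are normalized."""
    return udm_event["principal"]["user"]["userid"] == userid


# Simplification: every source reports the same identifier string here.
raw_events = [
    ("ad", {"TargetUserName": "jdoe"}),
    ("entra_id", {"userPrincipalName": "jdoe"}),
    ("defender", {"AccountName": "jdoe"}),
    ("o365", {"UserId": "jdoe"}),
]
normalized = [normalize_user(source, raw) for source, raw in raw_events]
hits = [event for event in normalized if acted_as(event, "jdoe")]
```

Without normalization, `acted_as` would need four branches, and each branch would break independently whenever a vendor renamed its field.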
Common Pitfalls and How to Avoid Them
1. Command Line Auditing Not Enabled
By default, Windows logs Event 4688 without command-line arguments. Without command-line auditing (CLA), 4688 says "cmd.exe ran" and nothing else. Sysmon Event 1 always includes the command line.
**Fix**: Enable CLA via GPO: Computer Configuration > Administrative Templates > System > Audit Process Creation > Include command line in process creation events.
2. Sysmon Deployed With Default Config
The default `sysmonconfig-sample.xml` is not production-ready. It logs everything, which on a typical workstation comes to roughly 1.5 GB per day.
**Fix**: Use SwiftOnSecurity's tuned config or Olaf Hartong's modular config. Review quarterly.
3. Parser Updates Silently Break Your Rules
Chronicle parsers update. When a parser changes how a field is mapped, your rule does not throw an error. It just stops firing.
**Example**: The WINEVTLOG parser historically placed `TicketEncryptionType` under `additional.fields`, then moved it to `security_result.detection_fields`. Rules written against the old path went silent.
**Fix**: Validate every rule before and after parser updates. (A new parser impact tool is coming soon.)
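One way to catch this kind of move is to record where a field lands in a sample event before a parser update and compare after. The sketch below assumes a simplified, hypothetical dict representation of a UDM event (`security_result` as a list of objects with `detection_fields` key/value pairs, and `additional.fields` as a plain mapping). Adjust it to the shape of the events you actually export.

```python
def locate_field(event, key):
    """Return (path, value) for key, or (None, None) if it is in neither location."""
    # Current location described above: security_result.detection_fields
    for result in event.get("security_result", []):
        for field in result.get("detection_fields", []):
            if field.get("key") == key:
                return ("security_result.detection_fields", field.get("value"))
    # Older location described above: additional.fields
    legacy = event.get("additional", {}).get("fields", {})
    if key in legacy:
        return ("additional.fields", legacy[key])
    return (None, None)


def mapping_moved(before, after, key):
    """True if the same key is found under a different path after a parser update."""
    return locate_field(before, key)[0] != locate_field(after, key)[0]


# Hypothetical sample events parsed before and after a parser change.
old_event = {"additional": {"fields": {"TicketEncryptionType": "0x17"}}}
new_event = {
    "security_result": [
        {"detection_fields": [{"key": "TicketEncryptionType", "value": "0x17"}]}
    ]
}
moved = mapping_moved(old_event, new_event, "TicketEncryptionType")
```

If `moved` is true for any field a rule depends on, that rule needs to be updated before it goes quiet.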
4. Collecting Wrong Events on DCs vs Workstations
Event 1102 (audit log cleared) is critical on workstations (strong anti-forensics signal) and routine on DCs (fires during log rotation). Event 5145 (detailed file share access) is noise on workstations and valuable on file servers.
**Fix**: Build different collector configs for different host roles.
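A role-aware filter might look like the sketch below. It is illustrative Python rather than collector configuration. The role names and the `keep_event` helper are hypothetical, and the per-role drop sets encode only the two examples given above.

```python
# Event IDs to drop per host role, based on the 1102 and 5145 examples above.
ROLE_DROPS = {
    "domain_controller": {1102},  # described above as routine on DCs
    "workstation": {5145},        # noise on workstations
    "file_server": set(),         # 5145 is valuable here, so keep it
}


def keep_event(role, event_id):
    """Return True if a host with this role should forward the event."""
    # Unknown roles keep everything, so a mislabeled host never loses data silently.
    return event_id not in ROLE_DROPS.get(role, set())
```

Keeping the per-role differences in one small table makes the quarterly review a matter of reading a dictionary rather than comparing whole collector configs.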
5. Not Using Reference Lists
A rule that flags PowerShell execution from Temp folders fires on every admin who runs `powershell -File C:\temp\install.ps1`. The same rule, with approved admin workstations excluded, fires only on activity that warrants investigation.
**Fix**: Keep approved hosts, accounts, and paths in reference lists and exclude them in the rule. Reference lists are the difference between an alert storm and precision detection.
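The exclusion logic can be sketched in Python (rather than YARA-L) purely to show the shape of the decision. The host names in `APPROVED_ADMIN_HOSTS` are hypothetical, and the event is a simplified dict that mirrors the UDM paths from the cheat sheet.

```python
import re

# Hypothetical reference list; in SecOps this would be a managed list, not a constant.
APPROVED_ADMIN_HOSTS = {"admin-ws-01", "admin-ws-02"}

# Matches a "\temp\" path segment anywhere in the command line, case-insensitively.
TEMP_SEGMENT = re.compile(r"\\temp\\", re.IGNORECASE)


def powershell_from_temp(event):
    """True if the launched process is PowerShell and its command line references \\temp\\."""
    process = event["target"]["process"]
    is_powershell = process["file"]["full_path"].lower().endswith("powershell.exe")
    return is_powershell and bool(TEMP_SEGMENT.search(process["command_line"]))


def should_alert(event, approved_hosts=APPROVED_ADMIN_HOSTS):
    """Alert on PowerShell-from-Temp unless the host is on the approved list."""
    if not powershell_from_temp(event):
        return False
    return event["principal"]["hostname"].lower() not in approved_hosts
```

The detection condition stays broad while the list carries the environment-specific knowledge, so onboarding a new admin workstation is a list update, not a rule change.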
6. Deploying Rules Without Validating
Microsoft's schema changes and your field mapping assumptions both cause silent rule failure. Before shipping a rule to production, ingest synthetic events that satisfy trigger conditions and confirm the rule fires.
**Fix**: Validate against both positive and negative test cases before deployment.
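A minimal test harness for this idea is sketched below, again in Python rather than YARA-L. The sample rule (`audit_log_cleared`) and the synthetic events are hypothetical. The point is the pairing of positive and negative cases, including one where the field is missing entirely.

```python
def audit_log_cleared(event):
    """Sample rule: fire when the normalized event type is Windows Event 1102."""
    return event.get("metadata", {}).get("product_event_type") == "1102"


def validate(rule, cases):
    """Return the names of cases where the rule's verdict differs from the expectation."""
    return [name for name, event, expected in cases if rule(event) != expected]


# Each case: (name, synthetic event, should the rule fire?)
CASES = [
    ("positive: 1102", {"metadata": {"product_event_type": "1102"}}, True),
    ("negative: 4624", {"metadata": {"product_event_type": "4624"}}, False),
    ("negative: field missing", {"metadata": {}}, False),
]

failures = validate(audit_log_cleared, CASES)
if failures:
    raise SystemExit(f"Rule failed validation: {failures}")
```

The negative cases matter as much as the positive one: a rule that fires on everything passes every positive test.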
Summary: What You've Learned
✅ Understand the ingest math: 40-60% ingest reduction by filtering at the collector (WFP events alone are 40-70% of Security log volume)
✅ Know the ingestion paths: Bindplane for endpoints, native feeds for O365/Entra ID
✅ Identify noise: Drop WFP, DCOM, routine DFS-R events at collector
✅ Master UDM mapping: One field path per concept across all Microsoft sources
✅ Avoid common pitfalls: Enable CLA, tune Sysmon, validate rules, use reference lists
**Next**: Part 2 covers on-premises detection (Active Directory, Kerberos, Sysmon, Windows Security events) and production-ready YARA-L rules for each.
