Lower reCAPTCHA v3 Scores on UAT (Multi-Server Setup) Compared to QA (Single Server) Using Same Keys

Question

We are raising this concern to the Google Cloud Security / reCAPTCHA community to understand whether the behavior we are observing is expected and to seek guidance on best practices.

After deploying reCAPTCHA v3 to our UAT environment, we are consistently receiving lower scores compared to our QA environment, even though the same reCAPTCHA site key and secret key are used in both environments. The QA setup runs on a single server, whereas the UAT environment is deployed across multiple servers with different IP addresses.

From an application perspective, the reCAPTCHA integration is functioning correctly in both environments. However, the score variation is impacting our ability to apply a consistent validation threshold.

We would like to understand:

Is it expected behavior for reCAPTCHA v3 scores to differ between a single-server setup and a multi-server setup with multiple IP addresses, even when using the same keys?
Does reCAPTCHA v3’s risk analysis consider server IP distribution or environment topology when calculating scores?
Are there recommended best practices for defining and tuning thresholds in multi-server UAT environments?

Additionally, for production:

In a production setup with multiple nodes and multiple servers, is it recommended to use the same reCAPTCHA site key across all production nodes?
Are there any risks or limitations associated with using a single key across multiple production servers, or should separate keys be considered?

Any guidance or recommendations from the community would be greatly appreciated.

JustAnother22 · Answer

1. Is it expected behavior for scores to differ between environments (QA vs. UAT)?

Yes, this is expected behavior.

reCAPTCHA v3 uses adaptive risk analysis that "learns" by observing real traffic on your site. Scores in non-production environments (like UAT, Staging, or QA) often differ from Production because they lack the volume and diversity of real human user behavior.
The "multi-server" aspect is less critical than the nature of the traffic. How is the test traffic being generated? UAT environments often feature:
- Repetitive behavior (testers doing the same actions).
- Internal IP addresses that may not have a "good" history established with Google.
- Here is a relevant article from the team.

2. Does risk analysis consider server IP distribution or topology?

The answer is a bit nuanced. The reCAPTCHA score is generated primarily from the client's interaction and the client's IP address, not from your server's backend topology.

However, if your "multi-server" UAT setup involves internal testers accessing the site via a specific set of corporate IPs (VPN, NAT), Google's risk model sees all traffic coming from those few IPs. If those IPs are associated with automated traffic or lack "human" history, the scores will be lower.
Crucial Check: Ensure your multi-server setup correctly forwards the end user's IP to the reCAPTCHA API (if you are creating assessments server-side). If your servers pass their own internal IPs instead of the user's IP, Google will see all traffic as coming from a bot/server farm.
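A minimal sketch of that check, assuming the application sits behind one or more load balancers that each append the address they received the request from to `X-Forwarded-For`, and that the backend verifies tokens against the classic `siteverify` endpoint (which accepts an optional `remoteip` parameter; in reCAPTCHA Enterprise the equivalent assessment field is `userIpAddress`). The helper names and the `trusted_proxy_count` parameter are hypothetical and need to match how your proxies actually populate the header.

```python
import ipaddress
from typing import Dict, Optional

SITEVERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def client_ip_from_forwarded(xff_header: Optional[str], peer_ip: str,
                             trusted_proxy_count: int = 1) -> str:
    """Return the end user's IP rather than the load balancer's.

    Assumption: every trusted proxy appends the address it received the
    request from, so the client is the entry `trusted_proxy_count`
    positions from the right. Anything further left may be spoofed.
    """
    if not xff_header or trusted_proxy_count < 1:
        return peer_ip
    hops = [h.strip() for h in xff_header.split(",") if h.strip()]
    if len(hops) < trusted_proxy_count:
        return peer_ip
    candidate = hops[-trusted_proxy_count]
    try:
        ipaddress.ip_address(candidate)  # reject malformed entries
    except ValueError:
        return peer_ip
    return candidate


def build_siteverify_payload(secret: str, token: str,
                             client_ip: str) -> Dict[str, str]:
    """Form fields to POST to SITEVERIFY_URL; `remoteip` is optional."""
    return {"secret": secret, "response": token, "remoteip": client_ip}
```

Logging the value returned by `client_ip_from_forwarded` on each UAT node is a quick way to confirm that it is a tester's address and not a `10.x` or other internal address belonging to the load balancer.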

3. Best Practices for UAT & Thresholds

Separate Keys: It is highly recommended to use separate keys for Production and Non-Production (UAT/Staging) environments. This prevents your UAT testing (which looks like bot traffic) from polluting the risk model for your Production site.
Lower Thresholds in UAT: Since UAT scores are naturally lower and less reliable due to the lack of real traffic data, you should set a lower threshold (e.g., 0.1 or 0.2) for the UAT environment, or run reCAPTCHA in "Audit Mode" (log scores but do not block) so validation testing isn't blocked.
Allowlist Testing IPs: If using reCAPTCHA Enterprise, you can create testing keys or allowlist specific testing IPs.
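The recommendations above could be wired together roughly as follows. This is a sketch, not a built-in reCAPTCHA feature: the environment names, key placeholders, threshold values, and the `audit_only` flag are assumptions for illustration, and "audit mode" here is implemented in application code by logging instead of blocking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecaptchaPolicy:
    site_key: str      # placeholder; load real keys from secure config
    threshold: float   # minimum score treated as human
    audit_only: bool   # True: record low scores but never block


# Hypothetical values: non-production shares one key that is separate
# from production, with a lower threshold and audit-only behaviour.
POLICIES = {
    "production": RecaptchaPolicy("PROD_SITE_KEY", 0.5, False),
    "uat": RecaptchaPolicy("NONPROD_SITE_KEY", 0.2, True),
    "qa": RecaptchaPolicy("NONPROD_SITE_KEY", 0.2, True),
}


def evaluate(score: float, environment: str) -> str:
    """Map a reCAPTCHA v3 score to an action for the given environment."""
    policy = POLICIES[environment]
    if score >= policy.threshold:
        return "allow"
    return "log_only" if policy.audit_only else "block"
```

With these placeholder values, a score of 0.3 is blocked in production but allowed in UAT, and a score of 0.1 in UAT is only logged, so testers are never locked out while the scores are still being collected for tuning.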

4. Production: Single Key vs. Multiple Keys

Same Key for All Nodes: YES. In a production setup with multiple nodes/servers serving the same application (same domain), you should use the same Site Key across all nodes.
Why? reCAPTCHA aggregates data to learn patterns. Using a single key allows it to see the full picture of your legitimate traffic across all servers, improving its accuracy. Splitting keys per node would fragment the data, making the model less effective and potentially lowering scores because each key sees less "good" traffic.
Risks: There are no risks to using a single key across multiple servers, provided they are for the same logical website/domain.
