Question

Lower reCAPTCHA v3 Scores on UAT (Multi-Server Setup) Compared to QA (Single Server) Using Same Keys

  • January 12, 2026
  • 2 replies
  • 96 views

SagarFromAurusTech

We are raising this concern to the Google Cloud Security / reCAPTCHA community to understand whether the behavior we are observing is expected and to seek guidance on best practices.

After deploying reCAPTCHA v3 to our UAT environment, we are consistently receiving lower scores compared to our QA environment, even though the same reCAPTCHA site key and secret key are used in both environments. The QA setup runs on a single server, whereas the UAT environment is deployed across multiple servers with different IP addresses.

From an application perspective, the reCAPTCHA integration is functioning correctly in both environments. However, the score variation is impacting our ability to apply a consistent validation threshold.

We would like to understand:

  • Is it expected behavior for reCAPTCHA v3 scores to differ between a single-server setup and a multi-server setup with multiple IP addresses, even when using the same keys?

  • Does reCAPTCHA v3’s risk analysis consider server IP distribution or environment topology when calculating scores?

  • Are there recommended best practices for defining and tuning thresholds in multi-server UAT environments?

Additionally, for production:

  • In a production setup with multiple nodes and multiple servers, is it recommended to use the same reCAPTCHA site key across all production nodes?

  • Are there any risks or limitations associated with using a single key across multiple production servers, or should separate keys be considered?

Any guidance or recommendations from the community would be greatly appreciated.

2 replies

SagarFromAurusTech

Following up on this thread, as we have not received any response yet and this issue is blocking our UAT validation.

To reiterate, we are observing consistently lower reCAPTCHA v3 scores in a multi-server UAT setup compared to a single-server QA setup, despite using the same site key and secret key and having a correctly implemented integration in both environments. The score variance is significant enough to impact threshold decisions and transaction outcomes.

We are specifically looking for clarity from the community or Google experts on the following points:

  • Is it expected behavior for reCAPTCHA v3 scores to differ between a single-server environment and a multi-server environment with multiple IP addresses, even when the same keys are used?

  • Does reCAPTCHA v3’s risk analysis model consider server IP distribution, environment topology, or traffic patterns across multiple nodes while calculating scores?

  • Are there recommended best practices for defining and tuning thresholds in multi-server UAT or staging environments, where traffic volume and patterns differ from QA?

Additionally, for production environments:

  • In a setup with multiple nodes and multiple servers for each node, is it recommended to use the same reCAPTCHA site key across all production nodes?

  • Are there any known risks, limitations, or scoring side effects when using a single key across multiple production servers, or should separate keys be considered per node, or per application on each node?

Any guidance, official clarification, or real-world experience shared by the community would be extremely helpful, as this information is critical for defining a stable and secure rollout strategy.


JustAnother22

1. Is it expected behavior for scores to differ between environments (QA vs. UAT)?

Yes, this is expected behavior.

  • reCAPTCHA v3 uses adaptive risk analysis that "learns" by observing real traffic on your site. Scores in non-production environments (like UAT, Staging, or QA) often differ from Production because they lack the volume and diversity of real human user behavior.

  • The "multi-server" aspect is less critical than the nature of the traffic. How is the test traffic being generated? UAT environments often feature:

    • Repetitive behavior (testers doing the same actions).

    • Internal IP addresses that may not have a "good" history established with Google.

    • Here is a relevant article from the team. 

2. Does risk analysis consider server IP distribution or topology?

The answer is a bit nuanced. The reCAPTCHA score is primarily based on the client's interaction and the client's IP address, not on your server's backend topology.

  • However, if your "multi-server" UAT setup involves internal testers accessing the site via a specific set of corporate IPs (VPN, NAT), Google's risk model sees all traffic coming from those few IPs. If those IPs are associated with automated traffic or lack "human" history, the scores will be lower.

  • Crucial Check: Ensure your multi-server setup correctly forwards the end user's IP to the reCAPTCHA API (if you are creating assessments server-side). If your servers pass their own internal IPs instead of the user's IP, Google will see all traffic as coming from a bot or server farm.
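To make the check above concrete, here is a minimal sketch. It assumes the app sits behind a load balancer that appends the address it received the request from to `X-Forwarded-For`, and that assessments are created through the reCAPTCHA Enterprise REST API, whose event object accepts a `userIpAddress` field. The helper names, the trusted-proxy count, and the sample addresses are hypothetical, and nothing is sent over the network here.

```python
from ipaddress import ip_address


def client_ip_from_headers(headers: dict, peer_ip: str, trusted_proxies: int = 1) -> str:
    """Return the end user's IP rather than the load balancer's.

    Assumption: each trusted proxy appends the address it received the
    request from, so the client address sits `trusted_proxies` entries
    from the right of X-Forwarded-For. Falls back to the socket peer.
    """
    forwarded = headers.get("X-Forwarded-For", "")
    hops = [h.strip() for h in forwarded.split(",") if h.strip()]
    if trusted_proxies > 0 and len(hops) >= trusted_proxies:
        candidate = hops[-trusted_proxies]
        try:
            ip_address(candidate)  # reject malformed values
            return candidate
        except ValueError:
            pass
    return peer_ip


def build_assessment_body(token: str, site_key: str, action: str,
                          headers: dict, peer_ip: str) -> dict:
    """Build the JSON body for an assessment request (not sent here)."""
    return {
        "event": {
            "token": token,
            "siteKey": site_key,
            "expectedAction": action,
            "userAgent": headers.get("User-Agent", ""),
            # The user's address, not the node's internal address.
            "userIpAddress": client_ip_from_headers(headers, peer_ip),
        }
    }
```

If you are on the classic `siteverify` endpoint instead, the equivalent is the optional `remoteip` parameter. Either way, it is worth logging the value each UAT node actually sends and comparing it with what the single QA server sends.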

3. Best Practices for UAT & Thresholds

  • Separate Keys: It is highly recommended to use separate keys for Production and Non-Production (UAT/Staging) environments. This prevents your UAT testing (which looks like bot traffic) from polluting the risk model for your Production site.

  • Lower Thresholds in UAT: Since UAT scores are naturally lower and less reliable due to the lack of real data, you should set a lower threshold (e.g., 0.1 or 0.2) for the UAT environment, or run your integration in an audit-style mode (log scores but do not block) so validation testing isn't blocked.

  • Allowlist Testing IPs: If using reCAPTCHA Enterprise, you can create testing keys or allowlist specific testing IPs.
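One way to apply the threshold advice above is a per-environment policy with a log-only switch. This is only a sketch: the `ScorePolicy` class, the environment names, and the threshold values are illustrative assumptions, not anything reCAPTCHA provides, and the numbers should be tuned against the scores you actually observe.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("recaptcha")


@dataclass(frozen=True)
class ScorePolicy:
    threshold: float  # minimum score treated as acceptable
    enforce: bool     # False = log only, never block


# Illustrative values only (assumptions, not recommendations).
POLICIES = {
    "production": ScorePolicy(threshold=0.5, enforce=True),
    "uat": ScorePolicy(threshold=0.2, enforce=True),
    "qa": ScorePolicy(threshold=0.2, enforce=False),
}


def allow_request(score: float, environment: str) -> bool:
    """Decide whether to let a request through for the given environment."""
    policy = POLICIES[environment]
    passed = score >= policy.threshold
    # Always log, so thresholds can be tuned from observed scores.
    logger.info(
        "env=%s score=%.2f threshold=%.2f passed=%s enforce=%s",
        environment, score, policy.threshold, passed, policy.enforce,
    )
    return passed or not policy.enforce
```

With these sample values, a score of 0.3 passes in UAT but would be rejected in production, and QA never blocks regardless of score.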

4. Production: Single Key vs. Multiple Keys

  • Same Key for All Nodes: YES. In a production setup with multiple nodes/servers serving the same application (same domain), you should use the same Site Key across all nodes.

  • Why? reCAPTCHA aggregates data to learn patterns. Using a single key allows it to see the full picture of your legitimate traffic across all servers, improving its accuracy. Splitting keys per node would fragment the data, making the model less effective and potentially lowering scores because each key sees less "good" traffic.

  • Risks: There are no risks to using a single key across multiple servers, provided they are for the same logical website/domain.
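In configuration terms, the advice above amounts to choosing the key by environment and never by node. A minimal sketch, assuming a hypothetical `APP_ENV` variable and placeholder key strings:

```python
import os
from typing import Mapping

# Placeholder values (assumptions): one site key per environment.
SITE_KEYS = {
    "production": "PROD_SITE_KEY",
    "uat": "UAT_SITE_KEY",
}


def resolve_site_key(env: Mapping[str, str] = os.environ) -> str:
    """Pick the site key from the environment name only.

    The node's hostname is deliberately not consulted, so every node
    serving the same site in an environment reports under the same key.
    """
    environment = env.get("APP_ENV", "uat")
    return SITE_KEYS[environment]
```

Two production nodes with different hostnames resolve to the same key, while UAT resolves to a different one, which keeps test traffic out of the production key's data.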