SonarQube API sync: performance and default configuration improvements #14732

@sylvain-combe-sonarsource

Description

Summary

The current SonarQube API sync has a few design choices that could become problematic as the number of projects and findings grows. This issue proposes improvements in two areas: sync efficiency and default import scope.


1. Sync efficiency

Current behavior

The periodic sync task update_findings_from_source_issues (running every 3 hours) currently works as follows:

findings = Finding.objects.filter(sonarqube_issue__isnull=False, active=True)  # entire table, all at once
for finding in findings:
    client, _ = SonarQubeApiImporter.prepare_client(finding.test)  # new client + DB queries per finding
    issue = client.get_issue(finding.sonarqube_issue.key)          # 1 HTTP request per finding

With 10,000 synced findings across many projects, this results in:

  • 10,000 sequential HTTP requests to SonarQube — one per finding, no batching

  • A new requests.Session and several DB queries per finding to reconstruct the client

  • All matching findings loaded into memory at once (no iterator() or chunking)

  • Everything in a single Celery task — no fan-out, no parallelism

  • No try/except per finding: one network error aborts the entire run for all remaining findings

Suggested improvements

a) Batch issue key lookups

GET /api/issues/search accepts a comma-separated issues parameter. Findings could be grouped by SonarQube instance and project, then fetched in batches of up to 500 keys per request — reducing 10,000 HTTP calls to ~20.

Note: GET /api/issues/pull also exists and supports a changedSince timestamp for incremental fetches, but it is marked as an internal endpoint and requires both projectKey and branchName as mandatory parameters. Given that DefectDojo tracks branch at the Test level rather than per-finding, this endpoint is not suitable for the sync use case as it stands.
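A minimal sketch of the batching step (helper names are hypothetical; 500 is SonarQube's page-size cap for issues/search):

```python
from itertools import islice

MAX_KEYS_PER_REQUEST = 500  # page-size cap for GET /api/issues/search


def chunked(iterable, size):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch


def batch_issue_params(issue_keys, batch_size=MAX_KEYS_PER_REQUEST):
    """Build one issues/search query dict per batch of keys.

    10,000 keys become ~20 parameter sets (one HTTP request each)
    instead of 10,000 individual per-issue lookups.
    """
    return [
        {"issues": ",".join(batch), "ps": batch_size}
        for batch in chunked(issue_keys, batch_size)
    ]
```

Each parameter set would then be passed to a single `GET /api/issues/search` call against the instance that owns those keys.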

b) Group by project and reuse the client

Construct the client once per project (rather than once per finding) to eliminate redundant DB queries and HTTP session overhead within each sync run.
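A sketch of the grouping step (assuming findings expose the same `test` attribute used in the current loop; `prepare_client` usage stays as in the existing code):

```python
from collections import defaultdict


def group_findings_by_test(findings):
    """Group findings by their parent Test, so one client serves each group.

    The client is then built once per group instead of once per finding:

        for test, group in group_findings_by_test(findings).items():
            client, _ = SonarQubeApiImporter.prepare_client(test)
            # ... sync every finding in `group` with this client
    """
    groups = defaultdict(list)
    for finding in findings:
        groups[finding.test].append(finding)
    return dict(groups)
```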

c) Use Finding.objects.iterator()

Use Django’s iterator() to avoid loading the entire result set into memory when syncing large numbers of findings.
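The change is a one-liner on the existing queryset (chunk_size shown for illustration; Django defaults to 2000):

```python
findings = Finding.objects.filter(
    sonarqube_issue__isnull=False, active=True,
).iterator(chunk_size=500)  # stream rows in chunks instead of loading all at once
```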

d) Fan out to per-project Celery subtasks

Fan out into per-project Celery subtasks so that:

  • A slow or unreachable SonarQube instance no longer blocks syncs for unrelated projects

  • Each project's sync can be retried independently
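A hedged sketch of the fan-out (task names and wiring are illustrative, not the current DefectDojo task layout; `group` is Celery's standard fan-out primitive):

```python
from celery import group, shared_task

@shared_task
def update_findings_from_source_issues():
    product_ids = ...  # distinct products that have synced SonarQube findings
    group(sync_project_findings.s(pid) for pid in product_ids).apply_async()

@shared_task(bind=True, max_retries=3)
def sync_project_findings(self, product_id):
    ...  # build one client, batch-fetch issues, update this project's findings
```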

e) Wrap each finding's update in try/except

Wrap each finding update in try/except so that:

  • One transient network error does not abort the entire 3‑hour sync task

  • The run can continue and log failures individually
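The loop structure could look like this (a sketch; `update_one` stands in for the per-finding SonarQube update):

```python
import logging

logger = logging.getLogger(__name__)


def sync_findings(findings, update_one):
    """Apply `update_one` to each finding, isolating per-finding failures.

    Any exception is logged and counted instead of aborting the run,
    so one transient network error no longer stops the whole task.
    """
    updated, failed = 0, 0
    for finding in findings:
        try:
            update_one(finding)
            updated += 1
        except Exception:
            failed += 1
            logger.exception("Failed to sync finding %s", finding)
    return updated, failed
```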

f) Consider SonarQube webhooks as a complement

SonarQube Server supports outbound webhooks that fire after each analysis. These could trigger a targeted re-import in DefectDojo immediately after a scan completes. With webhooks handling the freshness concern, the full periodic sync could be reduced from every 3 hours to once per day — serving only as a safety net for manual SQ status changes or missed webhook deliveries.
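On the receiving side, the first step would be identifying the project from the payload. A minimal sketch (the DefectDojo endpoint wiring is out of scope here; `status` and `project.key` are part of SonarQube's documented webhook payload):

```python
import json


def project_key_from_webhook(body):
    """Extract the project key from a SonarQube webhook payload.

    SonarQube POSTs a JSON body after each analysis; `project.key`
    identifies which product needs a targeted re-import. Returns None
    for payloads that are not successfully completed analyses.
    """
    payload = json.loads(body)
    if payload.get("status") != "SUCCESS":
        return None
    return payload.get("project", {}).get("key")
```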


2. Default import scope includes non-security issue types

Current behavior

When a SonarQube API import is configured without specifying the Extras field, DefectDojo imports all issue types:

  • BUG

  • VULNERABILITY

  • CODE_SMELL

  • SECURITY_HOTSPOT

Given that DefectDojo is a security-focused tool, importing BUG and CODE_SMELL findings:

  • Inflates finding counts and pollutes security dashboards

  • Skews metrics such as MTTR and severity distribution

  • Adds noise to deduplication logic

Suggested change

Change the default fallback to import only security-relevant types:

VULNERABILITY,SECURITY_HOTSPOT

Users who need BUG and CODE_SMELL can opt in explicitly via the Extras field. A short note in the documentation and tool configuration UI would make this opt-in visible.
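The fallback itself is small; a sketch of the resolution logic (function and constant names hypothetical):

```python
DEFAULT_ISSUE_TYPES = "VULNERABILITY,SECURITY_HOTSPOT"


def issue_types(extras):
    """Resolve the issue-type filter for an import.

    Uses the Extras field when provided; otherwise falls back to
    security-relevant types only.
    """
    value = extras or DEFAULT_ISSUE_TYPES
    return [t.strip() for t in value.split(",") if t.strip()]
```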


Impact summary

Default to VULNERABILITY,SECURITY_HOTSPOT
Effort: Low — Impact: High (cleaner data for all new integrations).

try/except per finding in sync loop
Effort: Low — Impact: Medium (prevents full-task aborts).

iterator() on findings queryset
Effort: Low — Impact: Medium (reduces memory pressure).

Batch issue key lookups (up to 500/request)
Effort: Medium — Impact: High (eliminates N+1 HTTP calls).

Group by project, reuse client
Effort: Medium — Impact: High (eliminates N+1 client construction).

Per-project Celery fan-out
Effort: Medium — Impact: Medium (improves isolation and throughput).

Webhook-triggered import
Effort: High — Impact: High (near-realtime sync, reduces polling load).


I'll be happy to help potential contributors as a Sonar expert.
