Summary
The current SonarQube API sync has a few design choices that could become problematic as the number of projects and findings grows. This issue proposes improvements in two areas: sync efficiency and default import scope.
1. Sync efficiency
Current behavior
The periodic sync task update_findings_from_source_issues (running every 3 hours) currently works as follows:
findings = Finding.objects.filter(sonarqube_issue__isnull=False, active=True)  # entire table, all at once
for finding in findings:
    client, _ = SonarQubeApiImporter.prepare_client(finding.test)  # new client + DB queries per finding
    issue = client.get_issue(finding.sonarqube_issue.key)  # 1 HTTP request per finding
With 10,000 synced findings across many projects, this results in:
10,000 sequential HTTP requests to SonarQube — one per finding, no batching
A new requests.Session and several DB queries per finding to reconstruct the client
All matching findings loaded into memory at once (no iterator() or chunking)
Everything in a single Celery task — no fan-out, no parallelism
No try/except per finding: one network error aborts the entire run for all remaining findings
Suggested improvements
a) Batch issue key lookups
GET /api/issues/search accepts a comma-separated issues parameter. Findings could be grouped by SonarQube instance and project, then fetched in batches of up to 500 keys per request — reducing 10,000 HTTP calls to ~20.
Note: GET /api/issues/pull also exists and supports a changedSince timestamp for incremental fetches, but it is marked as an internal endpoint and requires both projectKey and branchName as mandatory parameters. Given that DefectDojo tracks branch at the Test level rather than per-finding, this endpoint is not suitable for the sync use case as it stands.
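A minimal sketch of the batching helper (names are illustrative, not existing DefectDojo code; 500 matches the documented page-size limit of the search endpoint):

```python
BATCH_SIZE = 500  # maximum keys per GET /api/issues/search request

def batch_issue_keys(keys, batch_size=BATCH_SIZE):
    """Yield comma-separated key batches, one per search request."""
    for i in range(0, len(keys), batch_size):
        yield ",".join(keys[i:i + batch_size])

# 10,000 synced keys -> 20 requests instead of 10,000
params = list(batch_issue_keys([f"AX-{n}" for n in range(10_000)]))
```

Each yielded string would be passed as the issues parameter of one search call; the response then updates every matching finding in that batch.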
b) Group by project and reuse the client
Construct the client once per project (rather than once per finding) to eliminate redundant DB queries and HTTP session overhead within each sync run.
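A sketch of the grouping step, with a hypothetical make_client stand-in for SonarQubeApiImporter.prepare_client and findings reduced to plain dicts for illustration:

```python
from collections import defaultdict

def group_by_config(findings):
    """Group findings by their (SonarQube instance, project) configuration."""
    groups = defaultdict(list)
    for finding in findings:
        groups[finding["config"]].append(finding)
    return groups

def sync_all(findings, make_client):
    """Build one client per group, then update every finding with it."""
    clients_built = 0
    for config, group in group_by_config(findings).items():
        client = make_client(config)  # once per project, not per finding
        clients_built += 1
        for finding in group:
            pass  # client.get_issue(finding["key"]) ... update finding
    return clients_built

findings = [{"config": f"proj-{n % 3}", "key": f"AX-{n}"} for n in range(100)]
built = sync_all(findings, make_client=lambda cfg: object())
```

With 100 findings spread over 3 projects, only 3 clients (and 3 sets of config lookups) are constructed instead of 100.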
c) Use Finding.objects.iterator()
Use Django’s iterator() to avoid loading the entire result set into memory when syncing large numbers of findings.
d) Fan out to per-project Celery subtasks
Fan out into per-project Celery subtasks so that:
A slow or unreachable SonarQube instance no longer blocks syncs for unrelated projects
Each project's sync can be retried independently
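The dispatch pattern can be sketched without Celery itself; in production the enqueue callable would be a task signature such as sync_project.delay(...) (a hypothetical task name):

```python
from collections import defaultdict

def dispatch_per_project(findings, enqueue):
    """Enqueue one subtask per project instead of one monolithic task."""
    by_project = defaultdict(list)
    for finding in findings:
        by_project[finding["project"]].append(finding["id"])
    for project, finding_ids in by_project.items():
        enqueue(project, finding_ids)  # e.g. sync_project.delay(project, finding_ids)
    return len(by_project)

queued = []
n = dispatch_per_project(
    [{"project": f"p{i % 4}", "id": i} for i in range(12)],
    enqueue=lambda project, ids: queued.append((project, ids)),
)
```

Each subtask then carries its own retry policy, so a timeout against one SonarQube instance is retried in isolation.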
e) Wrap each finding's update in try/except
Wrap each finding update in try/except so that:
One transient network error does not abort the entire 3-hour sync task
The run can continue and log failures individually
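A minimal sketch of the loop (update_one stands in for the real per-finding update logic):

```python
def sync_findings(findings, update_one):
    """Update findings one by one; collect failures instead of aborting."""
    updated, errors = 0, []
    for finding in findings:
        try:
            update_one(finding)
            updated += 1
        except Exception as exc:  # isolate per-finding failures
            errors.append((finding, str(exc)))
    return updated, errors

def flaky_update(finding):
    if finding == 3:
        raise ConnectionError("SonarQube unreachable")

updated, errors = sync_findings(range(1, 6), flaky_update)
```

The collected errors list can then be logged or surfaced in the task result, so one unreachable issue key no longer hides the status of the other 9,999.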
f) Consider SonarQube webhooks as a complement
SonarQube Server supports outbound webhooks that fire after each analysis. These could trigger a targeted re-import in DefectDojo immediately after a scan completes. With webhooks handling the freshness concern, the full periodic sync could be reduced from every 3 hours to once per day — serving only as a safety net for manual SQ status changes or missed webhook deliveries.
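If a webhook receiver is added, the payload signature should be verified. A sketch of the check, assuming the webhook is configured with a secret (SonarQube then sends an X-Sonar-Webhook-HMAC-SHA256 header):

```python
import hashlib
import hmac

def verify_sonar_signature(payload, secret, received):
    """Validate the X-Sonar-Webhook-HMAC-SHA256 header on a webhook delivery."""
    expected = hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, received)

body = b'{"status": "SUCCESS", "project": {"key": "my-project"}}'
sig = hmac.new(b"s3cret", body, hashlib.sha256).hexdigest()
ok = verify_sonar_signature(body, "s3cret", sig)
```

On a valid delivery, the receiver would look up the matching product API scan configuration by project key and enqueue a targeted re-import for just that project.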
2. Default import scope includes non-security issue types
Current behavior
When a SonarQube API import is configured without specifying the Extras field, DefectDojo imports all issue types:
BUG
VULNERABILITY
CODE_SMELL
SECURITY_HOTSPOT
Given that DefectDojo is a security-focused tool, importing BUG and CODE_SMELL findings:
Inflates finding counts and pollutes security dashboards
Skews metrics such as MTTR and severity distribution
Adds noise to deduplication logic
Suggested change
Change the default fallback to import only security-relevant types:
VULNERABILITY,SECURITY_HOTSPOT
Users who need BUG and CODE_SMELL can opt in explicitly via the Extras field. A short note in the documentation and tool configuration UI would make this opt-in visible.
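The fallback itself is a one-line change; a sketch (resolve_issue_types is an illustrative name, not the existing function):

```python
DEFAULT_ISSUE_TYPES = "VULNERABILITY,SECURITY_HOTSPOT"

def resolve_issue_types(extras=None):
    """Use the Extras field verbatim when set; otherwise fall back to
    security-relevant issue types only."""
    raw = extras or DEFAULT_ISSUE_TYPES
    return [t.strip() for t in raw.split(",") if t.strip()]

default_types = resolve_issue_types()
opted_in = resolve_issue_types("BUG,CODE_SMELL,VULNERABILITY")
```

Existing configurations that already spell out their types in Extras are unaffected; only the empty-Extras fallback changes.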
Impact summary
Default to VULNERABILITY,SECURITY_HOTSPOT
Effort: Low — Impact: High (cleaner data for all new integrations).
try/except per finding in sync loop
Effort: Low — Impact: Medium (prevents full-task aborts).
iterator() on findings queryset
Effort: Low — Impact: Medium (reduces memory pressure).
Batch issue key lookups (up to 500/request)
Effort: Medium — Impact: High (eliminates N+1 HTTP calls).
Group by project, reuse client
Effort: Medium — Impact: High (eliminates N+1 client construction).
Per-project Celery fan-out
Effort: Medium — Impact: Medium (improves isolation and throughput).
Webhook-triggered import
Effort: High — Impact: High (near-realtime sync, reduces polling load).
I'll be happy to help potential contributors as a Sonar expert.