Commit d6e9023

userhas404d, anasseth, and Copilot committed
feat(extensions,presets): authenticate GitHub-hosted catalog and download requests with GITHUB_TOKEN/GH_TOKEN
Squashed from #2087 (original author: @anasseth).

Adds GitHub-token authentication to extension and preset catalog fetching and ZIP downloads so private GitHub repos work when GITHUB_TOKEN/GH_TOKEN is set, while preventing credential leakage to non-GitHub hosts.

- Introduces shared _github_http module with build_github_request() and open_github_url() helpers
- Routes ExtensionCatalog and PresetCatalog network calls through GitHub-auth-aware opener
- Adds comprehensive unit/integration tests for auth header behavior
- Updates user docs for both extensions and presets

Co-authored-by: anasseth <16745089+anasseth@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent 8fefd2a commit d6e9023

7 files changed

Lines changed: 482 additions & 20 deletions

extensions/EXTENSION-USER-GUIDE.md

Lines changed: 16 additions & 1 deletion
@@ -423,7 +423,7 @@ In addition to extension-specific environment variables (`SPECKIT_{EXT_ID}_*`),
 | Variable | Description | Default |
 |----------|-------------|---------|
 | `SPECKIT_CATALOG_URL` | Override the full catalog stack with a single URL (backward compat) | Built-in default stack |
-| `GH_TOKEN` / `GITHUB_TOKEN` | GitHub API token for downloads | None |
+| `GH_TOKEN` / `GITHUB_TOKEN` | GitHub token for authenticated requests to GitHub-hosted URLs (`raw.githubusercontent.com`, `github.com`, `api.github.com`). Required when your catalog JSON or extension ZIPs are hosted in a private GitHub repository. | None |
 
 #### Example: Using a custom catalog for testing
 
@@ -435,6 +435,21 @@ export SPECKIT_CATALOG_URL="http://localhost:8000/catalog.json"
 export SPECKIT_CATALOG_URL="https://example.com/staging/catalog.json"
 ```
 
+#### Example: Using a private GitHub-hosted catalog
+
+```bash
+# Authenticate with a token (gh CLI, PAT, or GITHUB_TOKEN in CI)
+export GITHUB_TOKEN=$(gh auth token)
+
+# Search a private catalog added via `specify extension catalog add`
+specify extension search jira
+
+# Install from a private catalog
+specify extension add jira-sync
+```
+
+The token is attached automatically to requests targeting GitHub domains. Non-GitHub catalog URLs are always fetched without credentials.
+
 ---
 
 ## Extension Catalogs
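
The documentation table lists both variables without saying which one wins. Going by the `_github_http.py` helper in this commit, `GITHUB_TOKEN` is checked first, `GH_TOKEN` is the fallback, and blank or whitespace-only values are ignored. A minimal sketch of that lookup follows; `resolve_github_token` and its `env` parameter are hypothetical names introduced only for illustration and are not part of the commit.

```python
import os
from typing import Mapping, Optional


def resolve_github_token(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the first non-blank token, preferring GITHUB_TOKEN over GH_TOKEN.

    Hypothetical helper: it mirrors the precedence seen in the diff but is
    not a function defined by the commit.
    """
    for name in ("GITHUB_TOKEN", "GH_TOKEN"):
        value = (env.get(name) or "").strip()
        if value:
            return value
    return None


# GITHUB_TOKEN wins when both variables are set.
assert resolve_github_token({"GITHUB_TOKEN": "a", "GH_TOKEN": "b"}) == "a"
# A whitespace-only GITHUB_TOKEN falls through to GH_TOKEN, which is stripped.
assert resolve_github_token({"GITHUB_TOKEN": "  ", "GH_TOKEN": "b\n"}) == "b"
# With neither variable set, requests would go out unauthenticated.
assert resolve_github_token({}) is None
```

Passing the environment as a mapping keeps the sketch testable without mutating `os.environ`.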

presets/README.md

Lines changed: 19 additions & 3 deletions
@@ -93,9 +93,25 @@ See [scaffold/](scaffold/) for a scaffold you can copy to create your own preset
 
 ## Environment Variables
 
-| Variable | Description |
-|----------|-------------|
-| `SPECKIT_PRESET_CATALOG_URL` | Override the catalog URL (replaces all defaults) |
+| Variable | Description | Default |
+|----------|-------------|---------|
+| `SPECKIT_PRESET_CATALOG_URL` | Override the full catalog stack with a single URL (replaces all defaults) | Built-in default stack |
+| `GH_TOKEN` / `GITHUB_TOKEN` | GitHub token for authenticated requests to GitHub-hosted URLs (`raw.githubusercontent.com`, `github.com`, `api.github.com`). Required when your catalog JSON or preset ZIPs are hosted in a private GitHub repository. | None |
+
+#### Example: Using a private GitHub-hosted catalog
+
+```bash
+# Authenticate with a token (gh CLI, PAT, or GITHUB_TOKEN in CI)
+export GITHUB_TOKEN=$(gh auth token)
+
+# Search a private catalog added via `specify preset catalog add`
+specify preset search my-template
+
+# Install from a private catalog
+specify preset add my-template
+```
+
+The token is attached automatically to requests targeting GitHub domains. Non-GitHub catalog URLs are always fetched without credentials.
 
 ## Configuration Files
 
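
Whether a URL counts as a "GitHub domain" is, per the `GITHUB_HOSTS` set in `_github_http.py`, an exact hostname match against four hosts (the three named in the table plus `codeload.github.com`). The sketch below illustrates what that implies for lookalike and unlisted hosts. `should_attach_token` is a hypothetical name, and the URLs are placeholders.

```python
from urllib.parse import urlparse

# Copied from the commit's _github_http.py.
GITHUB_HOSTS = frozenset({
    "raw.githubusercontent.com",
    "github.com",
    "api.github.com",
    "codeload.github.com",
})


def should_attach_token(url: str) -> bool:
    """Hypothetical predicate: True only for an exact match on a listed host."""
    hostname = (urlparse(url).hostname or "").lower()
    return hostname in GITHUB_HOSTS


checks = {
    "https://raw.githubusercontent.com/org/repo/main/catalog.json": True,
    "https://GitHub.com/org/repo/archive/refs/tags/v1.0.0.zip": True,
    "https://example.com/staging/catalog.json": False,
    # Lookalike suffix: the real host is github.com.evil.example.
    "https://github.com.evil.example/catalog.json": False,
    # Userinfo trick: the real host is evil.example.
    "https://github.com@evil.example/catalog.json": False,
    # Subdomains that are not listed get no token either.
    "https://gist.github.com/org/abc": False,
}

for url, expected in checks.items():
    assert should_attach_token(url) is expected, url
```

Matching on the parsed hostname rather than on a substring of the URL is what keeps the lookalike cases from receiving credentials.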
src/specify_cli/_github_http.py

Lines changed: 75 additions & 0 deletions
@@ -0,0 +1,75 @@
+"""Shared GitHub-authenticated HTTP helpers.
+
+Used by both ExtensionCatalog and PresetCatalog to attach
+GITHUB_TOKEN / GH_TOKEN credentials to requests targeting
+GitHub-hosted domains, while preventing token leakage to
+third-party hosts on redirects.
+"""
+
+import os
+import urllib.request
+from urllib.parse import urlparse
+from typing import Dict
+
+# GitHub-owned hostnames that should receive the Authorization header.
+# Includes codeload.github.com because GitHub archive URL downloads
+# (e.g. /archive/refs/tags/<tag>.zip) redirect there and require auth
+# for private repositories.
+GITHUB_HOSTS = frozenset({
+    "raw.githubusercontent.com",
+    "github.com",
+    "api.github.com",
+    "codeload.github.com",
+})
+
+
+def build_github_request(url: str) -> urllib.request.Request:
+    """Build a urllib Request, adding a GitHub auth header when available.
+
+    Reads GITHUB_TOKEN or GH_TOKEN from the environment and attaches an
+    ``Authorization: token <value>`` header when the target hostname is one
+    of the known GitHub-owned domains. Non-GitHub URLs are returned as plain
+    requests so credentials are never leaked to third-party hosts.
+    """
+    headers: Dict[str, str] = {}
+    github_token = (os.environ.get("GITHUB_TOKEN") or "").strip()
+    gh_token = (os.environ.get("GH_TOKEN") or "").strip()
+    token = github_token or gh_token or None
+    hostname = (urlparse(url).hostname or "").lower()
+    if token and hostname in GITHUB_HOSTS:
+        headers["Authorization"] = f"token {token}"
+    return urllib.request.Request(url, headers=headers)
+
+
+class _StripAuthOnRedirect(urllib.request.HTTPRedirectHandler):
+    """Redirect handler that drops the Authorization header when leaving GitHub.
+
+    Prevents token leakage to CDNs or other third-party hosts that GitHub
+    may redirect to (e.g. S3 for release asset downloads, objects.githubusercontent.com).
+    Auth is preserved as long as the redirect target remains within GITHUB_HOSTS.
+    """
+
+    def redirect_request(self, req, fp, code, msg, headers, newurl):
+        new_req = super().redirect_request(req, fp, code, msg, headers, newurl)
+        if new_req is not None:
+            hostname = (urlparse(newurl).hostname or "").lower()
+            if hostname not in GITHUB_HOSTS:
+                new_req.headers.pop("Authorization", None)
+        return new_req
+
+
+def open_github_url(url: str, timeout: int = 10):
+    """Open a URL with GitHub auth, stripping the header on cross-host redirects.
+
+    When the request carries an Authorization header, a custom redirect
+    handler drops that header if the redirect target is not a GitHub-owned
+    domain, preventing token leakage to CDNs or other third-party hosts
+    that GitHub may redirect to (e.g. S3 for release asset downloads).
+    """
+    req = build_github_request(url)
+
+    if not req.get_header("Authorization"):
+        return urllib.request.urlopen(req, timeout=timeout)
+
+    opener = urllib.request.build_opener(_StripAuthOnRedirect)
+    return opener.open(req, timeout=timeout)
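
The redirect behaviour described in the docstrings above can be exercised without any network access by calling `redirect_request` directly. The sketch below re-declares the handler under the name `StripAuthOnRedirect` so that it is self-contained. The URLs, the token value, and the `follow` helper are placeholders introduced for illustration. It also assumes that CPython's base `redirect_request` ignores `fp` and `headers` when it accepts a GET redirect, which holds for the standard library implementation I am aware of but is an implementation detail.

```python
import urllib.request
from urllib.parse import urlparse

# Copied from the commit's _github_http.py.
GITHUB_HOSTS = frozenset({
    "raw.githubusercontent.com",
    "github.com",
    "api.github.com",
    "codeload.github.com",
})


class StripAuthOnRedirect(urllib.request.HTTPRedirectHandler):
    """Same logic as the commit's _StripAuthOnRedirect, repeated for the demo."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        new_req = super().redirect_request(req, fp, code, msg, headers, newurl)
        if new_req is not None:
            hostname = (urlparse(newurl).hostname or "").lower()
            if hostname not in GITHUB_HOSTS:
                new_req.headers.pop("Authorization", None)
        return new_req


def follow(newurl: str) -> urllib.request.Request:
    """Hypothetical helper: simulate a 302 from github.com to `newurl`."""
    original = urllib.request.Request(
        "https://github.com/org/repo/archive/refs/tags/v1.0.0.zip",
        headers={"Authorization": "token example-token"},  # placeholder token
    )
    # fp=None and headers={} are placeholders; the base class only uses them
    # when it refuses the redirect and raises HTTPError.
    return StripAuthOnRedirect().redirect_request(
        original, None, 302, "Found", {}, newurl
    )


kept = follow("https://codeload.github.com/org/repo/zip/refs/tags/v1.0.0")
dropped = follow("https://objects.githubusercontent.com/some/asset")

# Redirects that stay inside GITHUB_HOSTS keep the header.
assert kept.get_header("Authorization") == "token example-token"
# Redirects that leave GITHUB_HOSTS lose it.
assert dropped.get_header("Authorization") is None
```

The header survives the first hop only because the base handler copies request headers onto the redirected request, which is why the commit needs the explicit strip step at all.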

src/specify_cli/extensions.py

Lines changed: 19 additions & 6 deletions
@@ -1539,6 +1539,22 @@ def _validate_catalog_url(self, url: str) -> None:
         if not parsed.netloc:
             raise ValidationError("Catalog URL must be a valid URL with a host.")
 
+    def _make_request(self, url: str):
+        """Build a urllib Request, adding a GitHub auth header when available.
+
+        Delegates to :func:`specify_cli._github_http.build_github_request`.
+        """
+        from specify_cli._github_http import build_github_request
+        return build_github_request(url)
+
+    def _open_url(self, url: str, timeout: int = 10):
+        """Open a URL with GitHub auth, stripping the header on cross-host redirects.
+
+        Delegates to :func:`specify_cli._github_http.open_github_url`.
+        """
+        from specify_cli._github_http import open_github_url
+        return open_github_url(url, timeout)
+
     def _load_catalog_config(self, config_path: Path) -> Optional[List[CatalogEntry]]:
         """Load catalog stack configuration from a YAML file.
 
@@ -1695,7 +1711,6 @@ def _fetch_single_catalog(self, entry: CatalogEntry, force_refresh: bool = False
         Raises:
             ExtensionError: If catalog cannot be fetched or has invalid format
         """
-        import urllib.request
         import urllib.error
 
         # Determine cache file paths (backward compat for default catalog)
@@ -1729,7 +1744,7 @@ def _fetch_single_catalog(self, entry: CatalogEntry, force_refresh: bool = False
 
         # Fetch from network
         try:
-            with urllib.request.urlopen(entry.url, timeout=10) as response:
+            with self._open_url(entry.url, timeout=10) as response:
                 catalog_data = json.loads(response.read())
 
                 if "schema_version" not in catalog_data or "extensions" not in catalog_data:
@@ -1843,10 +1858,9 @@ def fetch_catalog(self, force_refresh: bool = False) -> Dict[str, Any]:
         catalog_url = self.get_catalog_url()
 
         try:
-            import urllib.request
             import urllib.error
 
-            with urllib.request.urlopen(catalog_url, timeout=10) as response:
+            with self._open_url(catalog_url, timeout=10) as response:
                 catalog_data = json.loads(response.read())
 
                 # Validate catalog structure
@@ -1957,7 +1971,6 @@ def download_extension(self, extension_id: str, target_dir: Optional[Path] = Non
         Raises:
             ExtensionError: If extension not found or download fails
         """
-        import urllib.request
         import urllib.error
 
         # Get extension info from catalog
@@ -1997,7 +2010,7 @@ def download_extension(self, extension_id: str, target_dir: Optional[Path] = Non
 
         # Download the ZIP file
         try:
-            with urllib.request.urlopen(download_url, timeout=60) as response:
+            with self._open_url(download_url, timeout=60) as response:
                 zip_data = response.read()
 
                 zip_path.write_bytes(zip_data)
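
Routing every fetch through `_open_url` also gives tests a single seam to replace. The sketch below shows one way such a test might look. It is not taken from the commit's test files, which are not shown here. `CatalogFetcher`, `fake_open_url`, the `ValueError`, and the URL are all hypothetical stand-ins, and the opener is injected through the constructor purely to keep the example self-contained.

```python
import io
import json
from typing import Any, Callable, Dict, List, Tuple


class CatalogFetcher:
    """Hypothetical stand-in for a catalog class; not the real ExtensionCatalog."""

    def __init__(self, open_url: Callable[..., Any]) -> None:
        # In the commit, _open_url is a method delegating to open_github_url();
        # here it is injected so the example can swap in a fake.
        self._open_url = open_url

    def fetch(self, url: str) -> Dict[str, Any]:
        with self._open_url(url, timeout=10) as response:
            catalog_data = json.loads(response.read())
        # Same structural check as the context lines in the diff.
        if "schema_version" not in catalog_data or "extensions" not in catalog_data:
            raise ValueError("Invalid catalog format")  # placeholder error type
        return catalog_data


calls: List[Tuple[str, int]] = []


def fake_open_url(url: str, timeout: int = 10) -> io.BytesIO:
    """Record the call and return an in-memory response instead of using the network."""
    calls.append((url, timeout))
    payload = {"schema_version": "1.0", "extensions": {}}
    return io.BytesIO(json.dumps(payload).encode("utf-8"))


CATALOG_URL = "https://raw.githubusercontent.com/org/repo/main/catalog.json"
catalog = CatalogFetcher(fake_open_url).fetch(CATALOG_URL)

assert catalog == {"schema_version": "1.0", "extensions": {}}
assert calls == [(CATALOG_URL, 10)]
```

`io.BytesIO` works as the fake response because the calling code only needs a context manager with a `read()` method.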

src/specify_cli/presets.py

Lines changed: 19 additions & 10 deletions
@@ -1542,6 +1542,22 @@ def _validate_catalog_url(self, url: str) -> None:
                 "Catalog URL must be a valid URL with a host."
             )
 
+    def _make_request(self, url: str):
+        """Build a urllib Request, adding a GitHub auth header when available.
+
+        Delegates to :func:`specify_cli._github_http.build_github_request`.
+        """
+        from specify_cli._github_http import build_github_request
+        return build_github_request(url)
+
+    def _open_url(self, url: str, timeout: int = 10):
+        """Open a URL with GitHub auth, stripping the header on cross-host redirects.
+
+        Delegates to :func:`specify_cli._github_http.open_github_url`.
+        """
+        from specify_cli._github_http import open_github_url
+        return open_github_url(url, timeout)
+
     def _load_catalog_config(self, config_path: Path) -> Optional[List[PresetCatalogEntry]]:
         """Load catalog stack configuration from a YAML file.
 
@@ -1724,10 +1740,7 @@ def _fetch_single_catalog(self, entry: PresetCatalogEntry, force_refresh: bool =
                 pass
 
         try:
-            import urllib.request
-            import urllib.error
-
-            with urllib.request.urlopen(entry.url, timeout=10) as response:
+            with self._open_url(entry.url, timeout=10) as response:
                 catalog_data = json.loads(response.read())
 
                 if (
@@ -1820,10 +1833,7 @@ def fetch_catalog(self, force_refresh: bool = False) -> Dict[str, Any]:
                 pass
 
         try:
-            import urllib.request
-            import urllib.error
-
-            with urllib.request.urlopen(catalog_url, timeout=10) as response:
+            with self._open_url(catalog_url, timeout=10) as response:
                 catalog_data = json.loads(response.read())
 
                 if (
@@ -1942,7 +1952,6 @@ def download_pack(
         Raises:
             PresetError: If pack not found or download fails
         """
-        import urllib.request
         import urllib.error
 
         pack_info = self.get_pack_info(pack_id)
@@ -1994,7 +2003,7 @@ def download_pack(
         zip_path = target_dir / zip_filename
 
         try:
-            with urllib.request.urlopen(download_url, timeout=60) as response:
+            with self._open_url(download_url, timeout=60) as response:
                 zip_data = response.read()
 
                 zip_path.write_bytes(zip_data)
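
The download path follows the same pattern as the catalog fetch, with a 60-second timeout and raw bytes written to disk. A rough sketch of that step with the opener stubbed out is shown below. `download_zip`, `fake_open_url`, the archive member name `preset.yml`, and the URL are hypothetical, and the `zipfile` check at the end is an addition for the example rather than something the diff shows.

```python
import io
import tempfile
import zipfile
from pathlib import Path
from typing import Any, Callable, List, Tuple


def download_zip(
    open_url: Callable[..., Any],
    download_url: str,
    target_dir: Path,
    zip_filename: str,
) -> Path:
    """Hypothetical free-function version of the download step in the diff."""
    zip_path = target_dir / zip_filename
    with open_url(download_url, timeout=60) as response:
        zip_data = response.read()
    zip_path.write_bytes(zip_data)
    return zip_path


seen: List[Tuple[str, int]] = []


def fake_open_url(url: str, timeout: int = 10) -> io.BytesIO:
    """Return a small in-memory ZIP instead of contacting GitHub."""
    seen.append((url, timeout))
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("preset.yml", "id: my-template\n")  # placeholder content
    return io.BytesIO(buffer.getvalue())


ZIP_URL = "https://github.com/org/repo/archive/refs/tags/v1.0.0.zip"

with tempfile.TemporaryDirectory() as tmp:
    path = download_zip(fake_open_url, ZIP_URL, Path(tmp), "my-template.zip")
    is_zip = zipfile.is_zipfile(path)
    with zipfile.ZipFile(path) as archive:
        names = archive.namelist()

assert is_zip
assert names == ["preset.yml"]
assert seen == [(ZIP_URL, 60)]
```

Because the opener is the only network-facing call, the same function would work unchanged with the real GitHub-aware opener passed in.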
