Fix outbound URL parameters in links.
The initial versions of the Site content export threw
away any query parameters from any links.
This was done because no URLs on the Site actually use query
parameters, but in the actual original Sites content there
were unnecessary query strings all over the place.
*However*, for URLs that didn't point to Sites pages, the
query strings kinda matter :(.
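This CL puts them back: the query string is now stripped only
when the link has no network location (i.e. it is Site-relative).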
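
As a rough illustration of the intended behavior (a standalone sketch,
not the code in scripts/export.py; the helper name
strip_site_local_query and the sample URLs are made up for this
example):

```python
from urllib.parse import urlparse


def strip_site_local_query(href):
    """Drop the query string only from links without a network location.

    Hypothetical helper mirroring the condition used in this CL.
    """
    url = urlparse(href)
    # An empty netloc means the link is relative to the Site itself,
    # where query strings are treated as noise and removed.
    if '?' in href and url.netloc == '':
        return href[:href.index('?')]
    # Links to other hosts keep their query strings untouched.
    return href


if __name__ == '__main__':
    print(strip_site_local_query('/developers/testing?tab=1'))
    print(strip_site_local_query('https://example.com/search?q=chromium'))
```

The first call should lose its query string because the link has no
host; the second should come back unchanged.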
Bug: 1270890
Change-Id: I66e729ac2d40475c289500ed9077923d5cc909f5
Reviewed-on: https://chromium-review.googlesource.com/c/website/+/3286726
Commit-Queue: Dirk Pranke <dpranke@google.com>
Reviewed-by: Struan Shrimpton <sshrimp@google.com>
diff --git a/scripts/export.py b/scripts/export.py
index d6f8894..45cacea 100755
--- a/scripts/export.py
+++ b/scripts/export.py
@@ -43,6 +43,7 @@
import traceback
import xml.etree.ElementTree as ET
+from urllib.parse import urlparse
from urllib.request import urlopen
from urllib.error import HTTPError, URLError
@@ -270,7 +271,9 @@
if href.startswith('/_/rsrc'):
href = '/' + '/'.join(href.split('/')[4:])
- if '?' in href:
+
+ url = urlparse(href)
+ if '?' in href and url.netloc == '':
href = href[0:href.index('?')]
if 'Screenshot' in href:
head, tail = href.split('Screenshot')