fromupstream: add support for non-kernel.org archives
The 'also from' messages are not correct in cases where lore.kernel.org
doesn't have an appropriate archive. This is the case for the hostap
project:
http://lists.infradead.org/pipermail/hostap/
https://patchwork.ozlabs.org/project/hostap/list/
It's not strictly a kernel project, so it probably doesn't meet the
kernel.org requirements:
https://korg.wiki.kernel.org/userdoc/lore
It is, however, available via the https://marc.info/?i=${MESSAGE_ID}
redirector.
Let's probe for non-404 responses, and iterate through a few different
archive sources. Note that public-inbox.org currently gives a nice list
of possible other redirects, so even if nothing is directly found, let's
still encode the MessageId in a public-inbox URL, with the hope that
maybe it can still be useful in finding a good archive.
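The probe-and-fall-back order described above can be sketched roughly as
follows. This is only an illustration, not the code in this change: the
names ARCHIVE_TEMPLATES, url_exists() and pick_archive_url() are
hypothetical, and the probe is injectable so the selection logic can be
exercised without network access.

```python
import urllib.error
import urllib.request

# Templates in the order this change probes them; the last one doubles as
# the default when nothing is found.
ARCHIVE_TEMPLATES = [
    'https://lkml.kernel.org/r/%s',
    'https://marc.info/?i=%s',
    'https://public-inbox.org/git/%s',
]


def url_exists(url):
    """Return True if fetching url succeeds, False on an HTTP error status."""
    try:
        urllib.request.urlopen(url)
    except urllib.error.HTTPError:
        return False
    return True


def pick_archive_url(message_id, probe=url_exists):
    """Return (url, found).

    url is the first candidate that probes successfully; if none does, it
    is the last candidate tried (the public-inbox URL) and found is False.
    """
    for template in ARCHIVE_TEMPLATES:
        candidate = template % message_id
        if probe(candidate):
            return candidate, True
    return candidate, False
```

A caller would append '(also found at <url>)' in either case, and print a
warning when found is False.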
Note that py3 upgrades (https://crrev.com/c/1963397) overlooked the fact
that the new urllib raises HTTPError for non-200 error codes, so the
existing 'opener.getcode() != 200' check is obviated. Factor out a
_try_urlopen() to make this a little nicer to use.
BUG=none
TEST=`fromupstream.py -b= -t= pw://hostap/1255841`, where hostap points
at ozlabs; URL output:
(am from https://patchwork.ozlabs.org/patch/1255841/)
(also found at https://marc.info/?i=20200316211106.131858-1-matthewmwang@chromium.org)
Change-Id: I00b0ed3c52580cfc038b1ba3e9ee3d2cb62d16a1
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/platform/dev-util/+/2108789
Tested-by: Brian Norris <briannorris@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Commit-Queue: Brian Norris <briannorris@chromium.org>
diff --git a/contrib/fromupstream.py b/contrib/fromupstream.py
index a0ed1ac..db2875f 100755
--- a/contrib/fromupstream.py
+++ b/contrib/fromupstream.py
@@ -171,10 +171,10 @@
if args['tag'] is None:
args['tag'] = 'FROMLIST: '
- opener = urllib.request.urlopen('%s/patch/%d/mbox' % (url, patch_id))
- if opener.getcode() != 200:
- errprint('Error: could not download patch - error code %d'
- % opener.getcode())
+ try:
+ opener = urllib.request.urlopen('%s/patch/%d/mbox' % (url, patch_id))
+ except urllib.error.HTTPError as e:
+ errprint('Error: could not download patch: %s' % e)
sys.exit(1)
patch_contents = opener.read()
@@ -186,8 +186,37 @@
message_id = re.sub('^<|>$', '', message_id.strip())
if args['source_line'] is None:
args['source_line'] = '(am from %s/patch/%d/)' % (url, patch_id)
- args['source_line'] += (
- '\n(also found at https://lkml.kernel.org/r/%s)' % message_id)
+ for url_template in [
+ 'https://lkml.kernel.org/r/%s',
+ # hostap project (and others) are here, but not kernel.org.
+ 'https://marc.info/?i=%s',
+ # public-inbox comes last as a "default"; it has a nice error page
+ # pointing to other redirectors, even if it doesn't have what
+ # you're looking for directly.
+ 'https://public-inbox.org/git/%s',
+ ]:
+ alt_url = url_template % message_id
+ if args['debug']:
+ print('Probing archive for message at: %s' % alt_url)
+ try:
+ urllib.request.urlopen(alt_url)
+ except urllib.error.HTTPError as e:
+ # Skip all HTTP errors. We can expect 404 for archives that
+ # don't have this MessageId, or 300 for public-inbox ("not
+ # found, but try these other redirects"). It's less clear what
+ # to do with transitory (or is it permanent?) server failures.
+ if args['debug']:
+ print('Skipping URL %s, error: %s' % (alt_url, e))
+ continue
+ # Success!
+ if args['debug']:
+ print('Found at %s' % alt_url)
+ break
+ else:
+ errprint(
+ "WARNING: couldn't find working MessageId URL; "
+ 'defaulting to "%s"' % alt_url)
+ args['source_line'] += '\n(also found at %s)' % alt_url
# Auto-snarf the Change-Id if it was encoded into the Message-Id.
mo = re.match(r'.*(I[a-f0-9]{40})@changeid$', message_id)