fromupstream: add support for non-kernel.org archives

The 'also from' messages are not correct in cases where lore.kernel.org
doesn't have an appropriate archive. This is the case for the hostap
project:

http://lists.infradead.org/pipermail/hostap/
https://patchwork.ozlabs.org/project/hostap/list/

It's not strictly a kernel project, so it probably doesn't meet the
kernel.org requirements:

https://korg.wiki.kernel.org/userdoc/lore

It is, however, available via the https://marc.info/?i=${MESSAGE_ID}
redirector.

Let's probe for non-404 responses, and iterate through a few different
archive sources. Note that public-inbox.org currently gives a nice list
of possible other redirects, so even if nothing is directly found, let's
still encode the MessageId in a public-inbox URL, with the hope that
maybe it can still be useful in finding a good archive.
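
As a rough illustration of the fallback order described above, here is a
minimal sketch. It is not the code in this change: pick_archive_url() and
its `probe` callable are hypothetical names, and the probe is injected so
the selection logic can be exercised without network access.

```python
# Templates mirror the ones named in this commit; order matters, and
# public-inbox is last so it acts as the default.
ARCHIVE_TEMPLATES = (
    'https://lkml.kernel.org/r/%s',
    'https://marc.info/?i=%s',
    'https://public-inbox.org/git/%s',
)


def pick_archive_url(message_id, probe, templates=ARCHIVE_TEMPLATES):
    """Return (url, found) for the first template whose URL probes OK.

    `probe` is a hypothetical callable that takes a URL and returns True
    when the archive serves the message. If no template probes OK, the
    URL built from the last template is returned with found=False.
    """
    url = None
    for template in templates:
        url = template % message_id
        if probe(url):
            return url, True
    return url, False
```

With a probe that only accepts marc.info, the hostap Message-Id from the
TEST line resolves to the marc.info URL; with a probe that rejects
everything, the public-inbox URL comes back with found=False.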

Note that py3 upgrades (https://crrev.com/c/1963397) overlooked the fact
that the new urllib raises HTTPError for non-2xx status codes, so the
existing 'opener.getcode() != 200' check is obviated. Factor out a
_try_urlopen() to make this a little nicer to use.

BUG=none
TEST=`fromupstream.py -b= -t= pw://hostap/1255841`, where hostap points
     at ozlabs; URL output:
       (am from https://patchwork.ozlabs.org/patch/1255841/)
       (also found at https://marc.info/?i=20200316211106.131858-1-matthewmwang@chromium.org)

Change-Id: I00b0ed3c52580cfc038b1ba3e9ee3d2cb62d16a1
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/platform/dev-util/+/2108789
Tested-by: Brian Norris <briannorris@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Commit-Queue: Brian Norris <briannorris@chromium.org>
diff --git a/contrib/fromupstream.py b/contrib/fromupstream.py
index a0ed1ac..db2875f 100755
--- a/contrib/fromupstream.py
+++ b/contrib/fromupstream.py
@@ -171,10 +171,10 @@
     if args['tag'] is None:
         args['tag'] = 'FROMLIST: '
 
-    opener = urllib.request.urlopen('%s/patch/%d/mbox' % (url, patch_id))
-    if opener.getcode() != 200:
-        errprint('Error: could not download patch - error code %d'
-                 % opener.getcode())
+    try:
+        opener = urllib.request.urlopen('%s/patch/%d/mbox' % (url, patch_id))
+    except urllib.error.HTTPError as e:
+        errprint('Error: could not download patch: %s' % e)
         sys.exit(1)
     patch_contents = opener.read()
 
@@ -186,8 +186,37 @@
     message_id = re.sub('^<|>$', '', message_id.strip())
     if args['source_line'] is None:
         args['source_line'] = '(am from %s/patch/%d/)' % (url, patch_id)
-        args['source_line'] += (
-            '\n(also found at https://lkml.kernel.org/r/%s)' % message_id)
+        for url_template in [
+            'https://lkml.kernel.org/r/%s',
+            # hostap project (and others) are here, but not kernel.org.
+            'https://marc.info/?i=%s',
+            # public-inbox comes last as a "default"; it has a nice error page
+            # pointing to other redirectors, even if it doesn't have what
+            # you're looking for directly.
+            'https://public-inbox.org/git/%s',
+        ]:
+            alt_url = url_template % message_id
+            if args['debug']:
+                print('Probing archive for message at: %s' % alt_url)
+            try:
+                urllib.request.urlopen(alt_url)
+            except urllib.error.HTTPError as e:
+                # Skip all HTTP errors. We can expect 404 for archives that
+                # don't have this MessageId, or 300 for public-inbox ("not
+                # found, but try these other redirects"). It's less clear what
+                # to do with transitory (or is it permanent?) server failures.
+                if args['debug']:
+                    print('Skipping URL %s, error: %s' % (alt_url, e))
+                continue
+            # Success!
+            if args['debug']:
+                print('Found at %s' % alt_url)
+            break
+        else:
+            errprint(
+                "WARNING: couldn't find working MessageId URL; "
+                'defaulting to "%s"' % alt_url)
+        args['source_line'] += '\n(also found at %s)' % alt_url
 
     # Auto-snarf the Change-Id if it was encoded into the Message-Id.
     mo = re.match(r'.*(I[a-f0-9]{40})@changeid$', message_id)