fromupstream: add support for non kernel.org archives

The 'also found at' lines are incorrect in cases where lore.kernel.org doesn't have an appropriate archive. This is the case for the hostap project:

  http://lists.infradead.org/pipermail/hostap/
  https://patchwork.ozlabs.org/project/hostap/list/

It's not strictly a kernel project, so it probably doesn't meet the kernel.org requirements:

  https://korg.wiki.kernel.org/userdoc/lore

It is, however, available via the https://marc.info/?i=${MESSAGE_ID} redirector.

Let's probe for non-404 responses, and iterate through a few different archive sources.

Note that public-inbox.org currently gives a nice list of possible other redirects, so even if nothing is directly found, let's still encode the MessageId in a public-inbox URL, in the hope that it can still be useful in finding a good archive.

Note that the py3 upgrade (https://crrev.com/c/1963397) overlooked the fact that the new urllib raises HTTPError for non-200 error codes, so the existing 'opener.getcode() != 200' check is obviated. Factor out a _try_urlopen() to make this a little nicer to use.

BUG=none
TEST=`fromupstream.py -b= -t= pw://hostap/1255841`, where hostap points at ozlabs; URL output:
  (am from https://patchwork.ozlabs.org/patch/1255841/)
  (also found at https://marc.info/?i=20200316211106.131858-1-matthewmwang@chromium.org)

Change-Id: I00b0ed3c52580cfc038b1ba3e9ee3d2cb62d16a1
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/platform/dev-util/+/2108789
Tested-by: Brian Norris <briannorris@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Commit-Queue: Brian Norris <briannorris@chromium.org>
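
The urllib behavior mentioned in the message can be reproduced offline. The following is a minimal sketch, not part of the commit: `CannedStatusHandler`, `status_or_error`, and the `archive.invalid` host are hypothetical names made up for illustration. The handler answers every http:// request with a fixed status code, so no network access is involved.

```python
import email.message
import io
import urllib.error
import urllib.request
import urllib.response


class CannedStatusHandler(urllib.request.HTTPHandler):
    """Hypothetical handler: answers http:// requests with a fixed status."""

    def __init__(self, status):
        super().__init__()
        self.status = status

    def http_open(self, req):
        # Build an empty response object instead of touching the network.
        resp = urllib.response.addinfourl(
            io.BytesIO(b''), email.message.Message(), req.full_url,
            code=self.status)
        # The opener's error processor reads .msg, so provide one.
        resp.msg = 'canned'
        return resp


def status_or_error(status):
    """Open a dummy URL and report either the status code or the HTTPError."""
    opener = urllib.request.build_opener(CannedStatusHandler(status))
    try:
        return opener.open('http://archive.invalid/msg').getcode()
    except urllib.error.HTTPError as e:
        return 'HTTPError %d' % e.code


if __name__ == '__main__':
    for status in (200, 300, 404):
        print(status, '->', status_or_error(status))
```

Under these assumptions, only the 2xx response comes back as a normal object; 404 and 300 both surface as `HTTPError`, which is why a `getcode() != 200` check after a successful `urlopen()` has nothing left to catch.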

commit: 8553f031e8311e8be5503653cee8623f337b485d
author: Brian Norris <briannorris@chromium.org> Wed Mar 18 11:59:02 2020 -0700
committer: Commit Bot <commit-bot@chromium.org> Thu Mar 19 02:39:19 2020 +0000
tree: 383588709a5a05584d9a9bb59d742d6f695989ed
parent: 6baeb2ed1eab2775656180382f565642b0d44016
diff --git a/contrib/fromupstream.py b/contrib/fromupstream.py
index a0ed1ac..db2875f 100755
--- a/contrib/fromupstream.py
+++ b/contrib/fromupstream.py
@@ -171,10 +171,10 @@
     if args['tag'] is None:
         args['tag'] = 'FROMLIST: '
 
-    opener = urllib.request.urlopen('%s/patch/%d/mbox' % (url, patch_id))
-    if opener.getcode() != 200:
-        errprint('Error: could not download patch - error code %d'
-                 % opener.getcode())
+    try:
+        opener = urllib.request.urlopen('%s/patch/%d/mbox' % (url, patch_id))
+    except urllib.error.HTTPError as e:
+        errprint('Error: could not download patch: %s' % e)
         sys.exit(1)
     patch_contents = opener.read()
 
@@ -186,8 +186,37 @@
     message_id = re.sub('^<|>$', '', message_id.strip())
     if args['source_line'] is None:
         args['source_line'] = '(am from %s/patch/%d/)' % (url, patch_id)
-        args['source_line'] += (
-            '\n(also found at https://lkml.kernel.org/r/%s)' % message_id)
+        for url_template in [
+            'https://lkml.kernel.org/r/%s',
+            # hostap project (and others) are here, but not kernel.org.
+            'https://marc.info/?i=%s',
+            # public-inbox comes last as a "default"; it has a nice error page
+            # pointing to other redirectors, even if it doesn't have what
+            # you're looking for directly.
+            'https://public-inbox.org/git/%s',
+        ]:
+            alt_url = url_template % message_id
+            if args['debug']:
+                print('Probing archive for message at: %s' % alt_url)
+            try:
+                urllib.request.urlopen(alt_url)
+            except urllib.error.HTTPError as e:
+                # Skip all HTTP errors. We can expect 404 for archives that
+                # don't have this MessageId, or 300 for public-inbox ("not
+                # found, but try these other redirects"). It's less clear what
+                # to do with transitory (or is it permanent?) server failures.
+                if args['debug']:
+                    print('Skipping URL %s, error: %s' % (alt_url, e))
+                continue
+            # Success!
+            if args['debug']:
+                print('Found at %s' % alt_url)
+            break
+        else:
+            errprint(
+                "WARNING: couldn't find working MessageId URL; "
+                'defaulting to "%s"' % alt_url)
+        args['source_line'] += '\n(also found at %s)' % alt_url
 
     # Auto-snarf the Change-Id if it was encoded into the Message-Id.
     mo = re.match(r'.*(I[a-f0-9]{40})@changeid$', message_id)