// Copyright 2020 The Wuffs Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//    https://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

// ----------------

/*
jsonptr is a JSON formatter (pretty-printer) that supports the JSON Pointer
(RFC 6901) query syntax. It reads CBOR or UTF-8 JSON from stdin and writes CBOR
or canonicalized, formatted UTF-8 JSON to stdout.

See the "const char* g_usage" string below for details.

----

JSON Pointer (and this program's implementation) is one of many JSON query
languages and JSON tools, such as jq, jql and JMESPath. This one is relatively
simple and fewer-featured compared to those others.

One benefit of simplicity is that this program's CBOR, JSON and JSON Pointer
implementations do not dynamically allocate or free memory (yet the program
does not require that the entire input fits in memory at once). They are
therefore trivially protected against certain bug classes: memory leaks,
double-frees and use-after-frees.
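
As a minimal sketch of that idea (not code from this program), the snippet
below streams a file through one fixed-size, statically allocated buffer, so
the input can be far larger than the buffer. The count_newlines function and
the g_example_buffer array are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Hypothetical fixed-size buffer. No malloc, new or free is involved.
static uint8_t g_example_buffer[4096];

// count_newlines is a hypothetical helper that counts '\n' bytes in f. Its
// memory use is constant, regardless of how long the input is.
uint64_t count_newlines(FILE* f) {
  uint64_t total = 0;
  while (true) {
    size_t n = fread(g_example_buffer, 1, sizeof(g_example_buffer), f);
    assert(n <= sizeof(g_example_buffer));
    for (size_t i = 0; i < n; i++) {
      if (g_example_buffer[i] == '\n') {
        total++;
      }
    }
    if (n < sizeof(g_example_buffer)) {
      break;  // A short read means end-of-file or an I/O error.
    }
  }
  return total;
}
```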

The CBOR and JSON implementations are also written in the Wuffs programming
language (and then transpiled to C/C++), which is memory-safe (e.g. array
indexing is bounds-checked) and also prevents integer arithmetic overflows.

For defense in depth, on Linux, this program also self-imposes a
SECCOMP_MODE_STRICT sandbox before reading (or otherwise processing) its input
or writing its output. Under this sandbox, the only permitted system calls are
read, write, exit and sigreturn.

Altogether, this program aims to safely handle untrusted CBOR or JSON files
without fear of security bugs such as remote code execution.

----

As of 2020-02-24, this program passes all 318 "test_parsing" cases from the
JSON test suite (https://github.com/nst/JSONTestSuite), an appendix to the
"Parsing JSON is a Minefield" article (http://seriot.ch/parsing_json.php) that
was first published on 2016-10-26 and updated on 2018-03-30.

After modifying this program, run "build-example.sh example/jsonptr/" and then
"script/run-json-test-suite.sh" to catch correctness regressions.

----

This program uses Wuffs' JSON decoder at a relatively low level, processing the
decoder's token-stream output one token at a time. The core loop, in
pseudo-code, is "for_each_token { handle_token(etc); }", where the handle_token
function changes global state (e.g. the `g_depth` and `g_ctx` variables) and
prints output text based on that state and the token's source text. Notably,
handle_token is not recursive, even though JSON values can nest.
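
The shape of that loop can be sketched with a toy token alphabet. This is an
illustration under stated assumptions, not this program's actual handle_token:
here '[' and ']' open and close a list, and every other character is a
one-character scalar value. The State struct and Ctx enum are hypothetical,
simplified stand-ins for the g_depth and g_ctx globals. Unlike jsonptr, the
sketch uses std::string, which allocates dynamically.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical, simplified stand-in for this program's context enum.
enum class Ctx { none, after_open, after_value };

struct State {
  uint32_t depth = 0;
  Ctx ctx = Ctx::none;
  std::string out;
};

// handle_token processes exactly one token and never calls itself. Nesting
// is tracked by the depth counter, not by the call stack.
void handle_token(State& s, char tok) {
  if (tok == ']') {
    assert(s.depth > 0);  // The toy input is assumed to be balanced.
    s.depth--;
    if (s.ctx == Ctx::after_value) {
      s.out += '\n';
      s.out.append(2 * s.depth, ' ');
    }
    s.out += ']';
    s.ctx = Ctx::after_value;
    return;
  }
  if (s.ctx == Ctx::after_value) {
    s.out += ',';
  }
  if (s.ctx != Ctx::none) {
    s.out += '\n';
    s.out.append(2 * s.depth, ' ');
  }
  s.out += tok;
  if (tok == '[') {
    s.depth++;
    s.ctx = Ctx::after_open;
  } else {
    s.ctx = Ctx::after_value;
  }
}

// format_tokens is the "for_each_token { handle_token(etc); }" loop.
std::string format_tokens(const std::string& tokens) {
  State s;
  for (char tok : tokens) {
    handle_token(s, tok);
  }
  return s.out;
}
```

For example, format_tokens("[1[23]]") indents the inner list by four spaces
and the outer list's elements by two, while format_tokens("[]") stays "[]".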

This approach is centered around JSON tokens. Each JSON 'thing' (e.g. number,
string, object) comprises one or more JSON tokens.

An alternative, higher-level approach is in the sibling example/jsonfindptrs
program. Neither approach is better or worse per se, but when studying this
program, be aware that there are multiple ways to use Wuffs' JSON decoder.

The two programs, jsonfindptrs and jsonptr, also demonstrate different
trade-offs with regard to JSON object duplicate keys. The JSON spec permits
different implementations to allow or reject duplicate keys. It is not always
clear which approach is safer. Rejecting them is certainly unambiguous, and
security bugs can lurk in ambiguous corners of a file format, if two different
implementations both silently accept a file but differ on how to interpret it.
On the other hand, in the worst case, detecting duplicate keys requires O(N)
memory, where N is the size of the (potentially untrusted) input.
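
To see where the O(N) comes from, here is a minimal sketch (an illustration,
not code from either program) of duplicate key detection. The
has_duplicate_key function is a hypothetical helper. It has to remember every
distinct key seen so far, so when all keys are distinct the set grows to hold
all of them.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// has_duplicate_key is a hypothetical helper that returns whether any key
// appears more than once in a single object's keys.
bool has_duplicate_key(const std::vector<std::string>& keys) {
  std::unordered_set<std::string> seen;
  for (const std::string& key : keys) {
    if (!seen.insert(key).second) {
      return true;  // The insert failed because key was already present.
    }
    // Worst case: every key is distinct and seen ends up holding all N.
    assert(seen.size() <= keys.size());
  }
  return false;
}
```

A decoder that allows duplicate keys needs no such set, which is what lets it
run in constant memory.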

This program (jsonptr) allows duplicate keys and requires only O(1) memory. As
mentioned above, it doesn't dynamically allocate memory at all, and on Linux,
it runs in a SECCOMP_MODE_STRICT sandbox.

----

This example program differs from most other example Wuffs programs in that it
is written in C++, not C.

$CXX jsonptr.cc && ./a.out < ../../test/data/github-tags.json; rm -f a.out

for a C++ compiler $CXX, such as clang++ or g++.
*/

#if defined(__cplusplus) && (__cplusplus < 201103L)
#error "This C++ program requires -std=c++11 or later"
#endif

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

// Wuffs ships as a "single file C library" or "header file library" as per
// https://github.com/nothings/stb/blob/master/docs/stb_howto.txt
//
// To use that single file as a "foo.c"-like implementation, instead of a
// "foo.h"-like header, #define WUFFS_IMPLEMENTATION before #include'ing or
// compiling it.
#define WUFFS_IMPLEMENTATION

// Defining the WUFFS_CONFIG__MODULE* macros is optional, but it lets users of
// release/c/etc.c whitelist which parts of Wuffs to build. That file contains
// the entire Wuffs standard library, implementing a variety of codecs and file
// formats. Without this macro definition, an optimizing compiler or linker may
// very well discard Wuffs code for unused codecs, but listing the Wuffs
// modules we use makes that process explicit. Preprocessing means that such
// code simply isn't compiled.
#define WUFFS_CONFIG__MODULES
#define WUFFS_CONFIG__MODULE__BASE
#define WUFFS_CONFIG__MODULE__CBOR
#define WUFFS_CONFIG__MODULE__JSON

// If building this program in an environment that doesn't easily accommodate
// relative includes, you can use the script/inline-c-relative-includes.go
// program to generate a stand-alone C++ file.
#include "../../release/c/wuffs-unsupported-snapshot.c"

#if defined(__linux__)
#include <linux/prctl.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#define WUFFS_EXAMPLE_USE_SECCOMP
#endif

#define TRY(error_msg)         \
  do {                         \
    const char* z = error_msg; \
    if (z) {                   \
      return z;                \
    }                          \
  } while (false)

static const char* g_eod = "main: end of data";

static const char* g_usage =
    "Usage: jsonptr -flags input.json\n"
    "\n"
    "Flags:\n"
    "    -c      -compact-output\n"
    "    -d=NUM  -max-output-depth=NUM\n"
    "    -i=FMT  -input-format={json,cbor}\n"
    "    -o=FMT  -output-format={json,cbor}\n"
    "    -q=STR  -query=STR\n"
    "    -s=NUM  -spaces=NUM\n"
    "    -t      -tabs\n"
    "            -fail-if-unsandboxed\n"
    "            -input-allow-json-comments\n"
    "            -input-allow-json-extra-comma\n"
    "            -input-allow-json-inf-nan-numbers\n"
    "            -output-cbor-metadata-as-json-comments\n"
    "            -output-json-extra-comma\n"
    "            -output-json-inf-nan-numbers\n"
    "            -strict-json-pointer-syntax\n"
    "\n"
    "The input.json filename is optional. If absent, it reads from stdin.\n"
    "\n"
    "----\n"
    "\n"
    "jsonptr is a JSON formatter (pretty-printer) that supports the JSON\n"
    "Pointer (RFC 6901) query syntax. It reads CBOR or UTF-8 JSON from stdin\n"
    "and writes CBOR or canonicalized, formatted UTF-8 JSON to stdout. The\n"
    "input and output formats do not have to match, but conversion between\n"
    "formats may be lossy.\n"
    "\n"
    "Canonicalized JSON means that e.g. \"abc\\u000A\\tx\\u0177z\" is re-\n"
    "written as \"abc\\n\\txŷz\". It does not sort object keys or reject\n"
    "duplicate keys. Canonicalization does not imply Unicode normalization.\n"
    "\n"
    "CBOR output is non-canonical (in the RFC 7049 Section 3.9 sense), as\n"
    "sorting map keys and measuring indefinite-length containers requires\n"
    "O(input_length) memory but this program runs in O(1) memory.\n"
    "\n"
    "Formatted means that arrays' and objects' elements are indented, each\n"
    "on its own line. Configure this with the -c / -compact-output, -s=NUM /\n"
    "-spaces=NUM (for NUM ranging from 0 to 8) and -t / -tabs flags. Those\n"
    "flags only apply to JSON (not CBOR) output.\n"
    "\n"
    "The -input-format and -output-format flags select between reading and\n"
    "writing JSON (the default, a textual format) or CBOR (a binary format).\n"
    "\n"
    "The -input-allow-json-comments flag allows \"/*slash-star*/\" and\n"
    "\"//slash-slash\" C-style comments within JSON input.\n"
    "\n"
    "The -input-allow-json-extra-comma flag allows input like \"[1,2,]\",\n"
    "with a comma after the final element of a JSON list or dictionary.\n"
    "\n"
    "The -input-allow-json-inf-nan-numbers flag allows non-finite floating\n"
    "point numbers (infinities and not-a-numbers) within JSON input.\n"
    "\n"
    "The -output-cbor-metadata-as-json-comments flag writes CBOR tags and\n"
    "other metadata as /*comments*/, when -i=cbor and -o=json are also set.\n"
    "Such comments are non-compliant with the JSON specification but many\n"
    "parsers accept them.\n"
    "\n"
    "The -output-json-extra-comma flag writes extra commas, regardless of\n"
    "whether the input had them. Such commas are non-compliant with the JSON\n"
    "specification but many parsers accept them and they can produce simpler\n"
    "line-based diffs. This flag is ignored when -compact-output is set.\n"
    "\n"
    "The -output-json-inf-nan-numbers flag writes Inf and NaN instead of a\n"
    "substitute null value, when converting from -i=cbor to -o=json. Such\n"
    "values are non-compliant with the JSON specification but many parsers\n"
    "accept them.\n"
    "\n"
    "When converting from -i=cbor to -o=json, CBOR permits map keys other\n"
    "than (untagged) UTF-8 strings but JSON does not. This program rejects\n"
    "such input, as doing otherwise has complicated interactions with the\n"
    "-query=STR flag and streaming input.\n"
    "\n"
    "----\n"
    "\n"
    "The -q=STR or -query=STR flag gives an optional JSON Pointer query, to\n"
    "print a subset of the input. For example, given RFC 6901 section 5's\n"
    "sample input (https://tools.ietf.org/rfc/rfc6901.txt), this command:\n"
    "    jsonptr -query=/foo/1 rfc-6901-json-pointer.json\n"
    "will print:\n"
    "    \"baz\"\n"
    "\n"
    "An absent query is equivalent to the empty query, which identifies the\n"
    "entire input (the root value). Unlike a file system, the \"/\" query\n"
    "does not identify the root. Instead, \"\" is the root and \"/\" is the\n"
    "child (the value in a key-value pair) of the root whose key is the empty\n"
    "string. Similarly, \"/xyz\" and \"/xyz/\" are two different nodes.\n"
    "\n"
    "If the query found a valid JSON|CBOR value, this program will return a\n"
    "zero exit code even if the rest of the input isn't valid. If the query\n"
    "did not find a value, or found an invalid one, this program returns a\n"
    "non-zero exit code, but may still print partial output to stdout.\n"
    "\n"
    "The JSON and CBOR specifications (https://json.org/ or RFC 8259; RFC\n"
    "7049) permit implementations to allow duplicate keys, as this one does.\n"
    "This JSON Pointer implementation is also greedy, following the first\n"
    "match for each fragment without back-tracking. For example, the\n"
    "\"/foo/bar\" query will fail if the root object has multiple \"foo\"\n"
    "children but the first one doesn't have a \"bar\" child, even if later\n"
    "ones do.\n"
    "\n"
    "The -strict-json-pointer-syntax flag restricts the -query=STR string to\n"
    "exactly RFC 6901, with only two escape sequences: \"~0\" and \"~1\" for\n"
    "\"~\" and \"/\". Without this flag, this program also lets \"~n\" and\n"
    "\"~r\" escape the New Line and Carriage Return ASCII control characters,\n"
    "which can work better with line oriented Unix tools that assume exactly\n"
    "one value (i.e. one JSON Pointer string) per line.\n"
    "\n"
    "----\n"
    "\n"
    "The -d=NUM or -max-output-depth=NUM flag gives the maximum (inclusive)\n"
    "output depth. JSON|CBOR containers ([] arrays and {} objects) can hold\n"
    "other containers. When this flag is set, containers at depth NUM are\n"
    "replaced with \"[…]\" or \"{…}\". A bare -d or -max-output-depth is\n"
    "equivalent to -d=1. The flag's absence means an unlimited output depth.\n"
    "\n"
    "The -max-output-depth flag only affects the program's output. It doesn't\n"
    "affect whether or not the input is considered valid JSON|CBOR. The\n"
    "format specifications permit implementations to set their own maximum\n"
    "input depth. This JSON|CBOR implementation sets it to 1024.\n"
    "\n"
    "Depth is measured in terms of nested containers. It is unaffected by the\n"
    "number of spaces or tabs used to indent.\n"
    "\n"
    "When both -max-output-depth and -query are set, the output depth is\n"
    "measured from when the query resolves, not from the input root. The\n"
    "input depth (measured from the root) is still limited to 1024.\n"
    "\n"
    "----\n"
    "\n"
    "The -fail-if-unsandboxed flag causes the program to exit if it does not\n"
    "self-impose a sandbox. On Linux, it self-imposes a SECCOMP_MODE_STRICT\n"
    "sandbox, regardless of whether this flag was set.";

// ----

// Wuffs allows either statically or dynamically allocated work buffers. This
// program exercises static allocation.
#define WORK_BUFFER_ARRAY_SIZE \
  WUFFS_JSON__DECODER_WORKBUF_LEN_MAX_INCL_WORST_CASE
#if WORK_BUFFER_ARRAY_SIZE > 0
uint8_t g_work_buffer_array[WORK_BUFFER_ARRAY_SIZE];
#else
// Not all C/C++ compilers support 0-length arrays.
uint8_t g_work_buffer_array[1];
#endif

bool g_sandboxed = false;

int g_input_file_descriptor = 0;  // A 0 default means stdin.

#define MAX_INDENT 8
#define INDENT_SPACES_STRING "        "
#define INDENT_TAB_STRING "\t"

#ifndef DST_BUFFER_ARRAY_SIZE
#define DST_BUFFER_ARRAY_SIZE (32 * 1024)
#endif
#ifndef SRC_BUFFER_ARRAY_SIZE
#define SRC_BUFFER_ARRAY_SIZE (32 * 1024)
#endif
#ifndef TOKEN_BUFFER_ARRAY_SIZE
#define TOKEN_BUFFER_ARRAY_SIZE (4 * 1024)
#endif

uint8_t g_dst_array[DST_BUFFER_ARRAY_SIZE];
uint8_t g_src_array[SRC_BUFFER_ARRAY_SIZE];
wuffs_base__token g_tok_array[TOKEN_BUFFER_ARRAY_SIZE];

wuffs_base__io_buffer g_dst;
wuffs_base__io_buffer g_src;
wuffs_base__token_buffer g_tok;

// g_curr_token_end_src_index is the g_src.data.ptr index of the end of the
// current token. An invariant is that (g_curr_token_end_src_index <=
// g_src.meta.ri).
size_t g_curr_token_end_src_index;

// Valid tokens' VBCs range in 0 ..= 15. Values over that are for tokens from
// outside of the base package, such as the CBOR package.
#define CATEGORY_CBOR_TAG 16

struct {
  uint64_t category;
  uint64_t detail;
} g_token_extension;

uint32_t g_depth;

enum class context {
  none,
  in_list_after_bracket,
  in_list_after_value,
  in_dict_after_brace,
  in_dict_after_key,
  in_dict_after_value,
} g_ctx;

bool  //
in_dict_before_key() {
  return (g_ctx == context::in_dict_after_brace) ||
         (g_ctx == context::in_dict_after_value);
}

uint32_t g_suppress_write_dst;
bool g_wrote_to_dst;

wuffs_cbor__decoder g_cbor_decoder;
wuffs_json__decoder g_json_decoder;
wuffs_base__token_decoder* g_dec;

// g_spool_array is a 4 KiB buffer.
//
// For -o=cbor, strings up to SPOOL_ARRAY_SIZE long are written as a single
// definite-length string. Longer strings are written as an indefinite-length
// string containing multiple definite-length chunks, each of length up to
// SPOOL_ARRAY_SIZE. See RFC 7049 section 2.2.2 "Indefinite-Length Byte Strings
// and Text Strings". Byte strings and text strings are spooled prior to this
// chunking, so that the output is determinate even when the input is streamed.
//
// For -o=json, CBOR byte strings are spooled prior to base64url encoding,
// which maps multiples of 3 source bytes to 4 destination bytes.
//
// If raising SPOOL_ARRAY_SIZE above 0xFFFF then you will also have to update
// flush_cbor_output_string.
#define SPOOL_ARRAY_SIZE 4096
uint8_t g_spool_array[SPOOL_ARRAY_SIZE];

uint32_t g_cbor_output_string_length;
bool g_cbor_output_string_is_multiple_chunks;
bool g_cbor_output_string_is_utf_8;

uint32_t g_json_output_byte_string_length;

// ----

// Query is a JSON Pointer query. After initializing with a NUL-terminated C
// string, its multiple fragments are consumed as the program walks the JSON
// data from stdin. For example, letting "$" denote a NUL, suppose that we
// started with a query string of "/apple/banana/12/durian" and are currently
// trying to match the second fragment, "banana", so that Query::m_depth is 2:
//
// +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
//  / a p p l e / b a n a n a / 1 2 / d u r i a n $
// +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
//                ^           ^
//                m_frag_i    m_frag_k
//
// The two pointers m_frag_i and m_frag_k (abbreviated as mfi and mfk) are the
// start (inclusive) and end (exclusive) of the query fragment. They satisfy
// (mfi <= mfk) and may be equal if the fragment is empty (note that "" is a
// valid JSON object key).
//
// The m_frag_j (mfj) pointer moves between these two, or is nullptr. An
// invariant is that (((mfi <= mfj) && (mfj <= mfk)) || (mfj == nullptr)).
//
// Wuffs' JSON tokenizer can portray a single JSON string as multiple Wuffs
// tokens, as backslash-escaped values within that JSON string may each get
// their own token.
//
// At the start of each object key (a JSON string), mfj is set to mfi.
//
// While mfj remains non-nullptr, each token's unescaped contents are then
// compared to that part of the fragment from mfj to mfk. If it is a prefix
// (including the case of an exact match), then mfj is advanced by the
// unescaped length. Otherwise, mfj is set to nullptr.
//
// Comparison accounts for JSON Pointer's escaping notation: "~0" and "~1" in
// the query (not the JSON value) are unescaped to "~" and "/" respectively.
// "~n" and "~r" are also unescaped to "\n" and "\r". The program is
// responsible for calling Query::validate (with a strict_json_pointer_syntax
// argument) before otherwise using this class.
//
// The mfj pointer therefore advances from mfi to mfk, or drops out, as we
// incrementally match the object key with the query fragment. For example, if
// we have already matched the "ban" of "banana", then we would accept any of
// an "ana" token, an "a" token or a "\u0061" token, amongst others. They would
// advance mfj by 3, 1 or 1 bytes respectively.
//
//                      mfj
//                      v
// +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
//  / a p p l e / b a n a n a / 1 2 / d u r i a n $
// +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
//                ^           ^
//                mfi         mfk
//
// At the end of each object key (or equivalently, at the start of each object
// value), if mfj is non-nullptr and equal to (but not less than) mfk then we
// have a fragment match: the query fragment equals the object key. If there is
// a next fragment (in this example, "12") we move the frag_etc pointers to its
// start and end and increment Query::m_depth. Otherwise, we have matched the
// complete query, and the upcoming JSON value is the result of that query.
//
// The discussion above centers on object keys. If the query fragment is
// numeric then it can also match as an array index: the string fragment "12"
// will match an array's 13th element (starting counting from zero). See RFC
// 6901 for its precise definition of an "array index" number.
//
// Array index fragment match is represented by the Query::m_array_index field,
// whose type (wuffs_base__result_u64) is a result type. An error result means
// that the fragment is not an array index. A value result holds the number of
// list elements remaining. When matching a query fragment in an array (instead
// of in an object), each element ticks this number down towards zero. At zero,
// the upcoming JSON value is the one that matches the query fragment.
class Query {
 private:
  uint8_t* m_frag_i;
  uint8_t* m_frag_j;
  uint8_t* m_frag_k;

  uint32_t m_depth;

  wuffs_base__result_u64 m_array_index;

 public:
  void reset(char* query_c_string) {
    m_frag_i = (uint8_t*)query_c_string;
    m_frag_j = (uint8_t*)query_c_string;
    m_frag_k = (uint8_t*)query_c_string;
    m_depth = 0;
    m_array_index.status.repr = "#main: not an array index query fragment";
    m_array_index.value = 0;
  }

  void restart_fragment(bool enable) { m_frag_j = enable ? m_frag_i : nullptr; }

  bool is_at(uint32_t depth) { return m_depth == depth; }

  // tick returns whether the fragment is a valid array index whose value is
  // zero. If valid but non-zero, it decrements it and returns false.
  bool tick() {
    if (m_array_index.status.is_ok()) {
      if (m_array_index.value == 0) {
        return true;
      }
      m_array_index.value--;
    }
    return false;
  }

  // next_fragment moves to the next fragment, returning whether it existed.
  bool next_fragment() {
    uint8_t* k = m_frag_k;
    uint32_t d = m_depth;

    this->reset(nullptr);

    if (!k || (*k != '/')) {
      return false;
    }
    k++;

    bool all_digits = true;
    uint8_t* i = k;
    while ((*k != '\x00') && (*k != '/')) {
      all_digits = all_digits && ('0' <= *k) && (*k <= '9');
      k++;
    }
    m_frag_i = i;
    m_frag_j = i;
    m_frag_k = k;
    m_depth = d + 1;
    if (all_digits) {
      // wuffs_base__parse_number_u64 rejects leading zeroes, e.g. "00", "07".
      m_array_index = wuffs_base__parse_number_u64(
          wuffs_base__make_slice_u8(i, k - i),
          WUFFS_BASE__PARSE_NUMBER_XXX__DEFAULT_OPTIONS);
    }
    return true;
  }

  bool matched_all() { return m_frag_k == nullptr; }

  bool matched_fragment() { return m_frag_j && (m_frag_j == m_frag_k); }

  void incremental_match_slice(uint8_t* ptr, size_t len) {
    if (!m_frag_j) {
      return;
    }
    uint8_t* j = m_frag_j;
    while (true) {
      if (len == 0) {
        m_frag_j = j;
        return;
      }

      if (*j == '\x00') {
        break;

      } else if (*j == '~') {
        j++;
        if (*j == '0') {
          if (*ptr != '~') {
            break;
          }
        } else if (*j == '1') {
          if (*ptr != '/') {
            break;
          }
        } else if (*j == 'n') {
          if (*ptr != '\n') {
            break;
          }
        } else if (*j == 'r') {
          if (*ptr != '\r') {
            break;
          }
        } else {
          break;
        }

      } else if (*j != *ptr) {
        break;
      }

      j++;
      ptr++;
      len--;
    }
    m_frag_j = nullptr;
  }

  void incremental_match_code_point(uint32_t code_point) {
    if (!m_frag_j) {
      return;
    }
    uint8_t u[WUFFS_BASE__UTF_8__BYTE_LENGTH__MAX_INCL];
    size_t n = wuffs_base__utf_8__encode(
        wuffs_base__make_slice_u8(&u[0],
                                  WUFFS_BASE__UTF_8__BYTE_LENGTH__MAX_INCL),
        code_point);
    if (n > 0) {
      this->incremental_match_slice(&u[0], n);
    }
  }
588
  // validate returns whether the (query_c_string, length) arguments form a
  // valid JSON Pointer. In particular, it must be valid UTF-8, and either be
  // empty or start with a '/'. Any '~' within must immediately be followed by
  // either '0' or '1'. If strict_json_pointer_syntax is false, a '~' may also
  // be followed by either 'n' or 'r'.
  static bool validate(char* query_c_string,
                       size_t length,
                       bool strict_json_pointer_syntax) {
    if (length <= 0) {
      return true;
    }
    if (query_c_string[0] != '/') {
      return false;
    }
    wuffs_base__slice_u8 s =
        wuffs_base__make_slice_u8((uint8_t*)query_c_string, length);
    bool previous_was_tilde = false;
    while (s.len > 0) {
      wuffs_base__utf_8__next__output o = wuffs_base__utf_8__next(s.ptr, s.len);
      if (!o.is_valid()) {
        return false;
      }

      if (previous_was_tilde) {
        switch (o.code_point) {
          case '0':
          case '1':
            break;
          case 'n':
          case 'r':
            if (strict_json_pointer_syntax) {
              return false;
            }
            break;
          default:
            return false;
        }
      }
      previous_was_tilde = o.code_point == '~';

      s.ptr += o.byte_length;
      s.len -= o.byte_length;
    }
    return !previous_was_tilde;
  }
} g_query;
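
/*
The validation rules described for Query::validate can be sketched outside of
this program as follows. This is only an illustration, not the code jsonptr
runs: is_valid_json_pointer and json_pointer_self_check are hypothetical
names, the sketch uses std::string, and it checks only the leading '/' and the
'~' escapes. It assumes the input is already valid UTF-8, which the real
function verifies itself.

```cpp
#include <cassert>
#include <string>

// is_valid_json_pointer is a hypothetical helper. It applies the leading-'/'
// rule and the '~' escape rules, but performs no UTF-8 validation.
bool is_valid_json_pointer(const std::string& s, bool strict) {
  if (s.empty()) {
    return true;
  }
  if (s[0] != '/') {
    return false;
  }
  bool previous_was_tilde = false;
  for (char c : s) {
    if (previous_was_tilde) {
      // "~0" and "~1" are always allowed. "~n" and "~r" are an extension
      // that is only allowed when strict is false.
      bool ok = (c == '0') || (c == '1') ||
                (!strict && ((c == 'n') || (c == 'r')));
      if (!ok) {
        return false;
      }
    }
    previous_was_tilde = (c == '~');
  }
  // A trailing '~' has nothing following it, so it is invalid.
  return !previous_was_tilde;
}

// json_pointer_self_check shows typical accepted and rejected queries.
void json_pointer_self_check() {
  assert(is_valid_json_pointer("/a~1b", true));
  assert(!is_valid_json_pointer("/a~", true));
}
```

Tracking a single previous_was_tilde flag keeps the check to one pass with no
look-ahead, mirroring the shape of the loop above.
*/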

// ----

enum class file_format {
  json,
  cbor,
};

struct {
  int remaining_argc;
  char** remaining_argv;

  bool compact_output;
  bool fail_if_unsandboxed;
  file_format input_format;
  bool input_allow_json_comments;
  bool input_allow_json_extra_comma;
  bool input_allow_json_inf_nan_numbers;
  uint32_t max_output_depth;
  file_format output_format;
  bool output_cbor_metadata_as_json_comments;
  bool output_json_extra_comma;
  bool output_json_inf_nan_numbers;
  char* query_c_string;
  size_t spaces;
  bool strict_json_pointer_syntax;
  bool tabs;
} g_flags = {0};

const char*  //
parse_flags(int argc, char** argv) {
  g_flags.spaces = 4;
  g_flags.max_output_depth = 0xFFFFFFFF;

  int c = (argc > 0) ? 1 : 0;  // Skip argv[0], the program name.
  for (; c < argc; c++) {
    char* arg = argv[c];
    if (*arg++ != '-') {
      break;
    }

    // A double-dash "--foo" is equivalent to a single-dash "-foo". As special
    // cases, a bare "-" is not a flag (some programs may interpret it as
    // stdin) and a bare "--" means to stop parsing flags.
    if (*arg == '\x00') {
      break;
    } else if (*arg == '-') {
      arg++;
      if (*arg == '\x00') {
        c++;
        break;
      }
    }

    if (!strcmp(arg, "c") || !strcmp(arg, "compact-output")) {
      g_flags.compact_output = true;
      continue;
    }
    if (!strcmp(arg, "d") || !strcmp(arg, "max-output-depth")) {
      g_flags.max_output_depth = 1;
      continue;
    } else if (!strncmp(arg, "d=", 2) ||
               !strncmp(arg, "max-output-depth=", 17)) {
      // The 17 is the length of "max-output-depth=", including the '='. This
      // guarantees that the loop below finds a '=' before the NUL terminator.
      while (*arg++ != '=') {
      }
      wuffs_base__result_u64 u = wuffs_base__parse_number_u64(
          wuffs_base__make_slice_u8((uint8_t*)arg, strlen(arg)),
          WUFFS_BASE__PARSE_NUMBER_XXX__DEFAULT_OPTIONS);
      if (u.status.is_ok() && (u.value <= 0xFFFFFFFF)) {
        g_flags.max_output_depth = (uint32_t)(u.value);
        continue;
      }
      return g_usage;
    }
    if (!strcmp(arg, "fail-if-unsandboxed")) {
      g_flags.fail_if_unsandboxed = true;
      continue;
    }
    if (!strcmp(arg, "i=cbor") || !strcmp(arg, "input-format=cbor")) {
      g_flags.input_format = file_format::cbor;
      continue;
    }
    if (!strcmp(arg, "i=json") || !strcmp(arg, "input-format=json")) {
      g_flags.input_format = file_format::json;
      continue;
    }
    if (!strcmp(arg, "input-allow-json-comments")) {
      g_flags.input_allow_json_comments = true;
      continue;
    }
    if (!strcmp(arg, "input-allow-json-extra-comma")) {
      g_flags.input_allow_json_extra_comma = true;
      continue;
    }
    if (!strcmp(arg, "input-allow-json-inf-nan-numbers")) {
      g_flags.input_allow_json_inf_nan_numbers = true;
      continue;
    }
    if (!strcmp(arg, "o=cbor") || !strcmp(arg, "output-format=cbor")) {
      g_flags.output_format = file_format::cbor;
      continue;
    }
    if (!strcmp(arg, "o=json") || !strcmp(arg, "output-format=json")) {
      g_flags.output_format = file_format::json;
      continue;
    }
    if (!strcmp(arg, "output-cbor-metadata-as-json-comments")) {
      g_flags.output_cbor_metadata_as_json_comments = true;
      continue;
    }
    if (!strcmp(arg, "output-json-extra-comma")) {
      g_flags.output_json_extra_comma = true;
      continue;
    }
    if (!strcmp(arg, "output-json-inf-nan-numbers")) {
      g_flags.output_json_inf_nan_numbers = true;
      continue;
    }
    if (!strncmp(arg, "q=", 2) || !strncmp(arg, "query=", 6)) {
      while (*arg++ != '=') {
      }
      g_flags.query_c_string = arg;
      continue;
    }
    if (!strncmp(arg, "s=", 2) || !strncmp(arg, "spaces=", 7)) {
      while (*arg++ != '=') {
      }
      if (('0' <= arg[0]) && (arg[0] <= '8') && (arg[1] == '\x00')) {
        g_flags.spaces = arg[0] - '0';
        continue;
      }
      return g_usage;
    }
    if (!strcmp(arg, "strict-json-pointer-syntax")) {
      g_flags.strict_json_pointer_syntax = true;
      continue;
    }
    if (!strcmp(arg, "t") || !strcmp(arg, "tabs")) {
      g_flags.tabs = true;
      continue;
    }

    return g_usage;
  }

  if (g_flags.query_c_string &&
      !Query::validate(g_flags.query_c_string, strlen(g_flags.query_c_string),
                       g_flags.strict_json_pointer_syntax)) {
    return "main: bad JSON Pointer (RFC 6901) syntax for the -query=STR flag";
  }

  g_flags.remaining_argc = argc - c;
  g_flags.remaining_argv = argv + c;
  return nullptr;
}
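
/*
The dash-handling convention used by parse_flags ("--foo" equals "-foo", a
bare "-" is not a flag, a bare "--" stops flag parsing) can be isolated into a
small classifier. The sketch below is illustrative only: arg_kind,
classified_arg, classify_arg and classify_arg_self_check are hypothetical
names that do not exist in this program.

```cpp
#include <cassert>
#include <cstring>

enum class arg_kind { not_a_flag, end_of_flags, flag };

struct classified_arg {
  arg_kind kind;
  const char* name;  // The flag name without leading dashes, or nullptr.
};

// classify_arg is a hypothetical helper that strips one or two leading
// dashes from a command line argument.
classified_arg classify_arg(const char* arg) {
  if (*arg++ != '-') {
    return {arg_kind::not_a_flag, nullptr};
  }
  if (*arg == '\x00') {
    return {arg_kind::not_a_flag, nullptr};  // A bare "-".
  }
  if (*arg == '-') {
    arg++;
    if (*arg == '\x00') {
      return {arg_kind::end_of_flags, nullptr};  // A bare "--".
    }
  }
  return {arg_kind::flag, arg};
}

// classify_arg_self_check shows that both dash styles give the same name.
void classify_arg_self_check() {
  assert(std::strcmp(classify_arg("-tabs").name, "tabs") == 0);
  assert(std::strcmp(classify_arg("--tabs").name, "tabs") == 0);
}
```

Normalizing the dashes once, up front, is what lets every later comparison use
a single spelling of each flag name.
*/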

const char*  //
initialize_globals(int argc, char** argv) {
  g_dst = wuffs_base__make_io_buffer(
      wuffs_base__make_slice_u8(g_dst_array, DST_BUFFER_ARRAY_SIZE),
      wuffs_base__empty_io_buffer_meta());

  g_src = wuffs_base__make_io_buffer(
      wuffs_base__make_slice_u8(g_src_array, SRC_BUFFER_ARRAY_SIZE),
      wuffs_base__empty_io_buffer_meta());

  g_tok = wuffs_base__make_token_buffer(
      wuffs_base__make_slice_token(g_tok_array, TOKEN_BUFFER_ARRAY_SIZE),
      wuffs_base__empty_token_buffer_meta());

  g_curr_token_end_src_index = 0;

  g_token_extension.category = 0;
  g_token_extension.detail = 0;

  g_depth = 0;

  g_ctx = context::none;

  TRY(parse_flags(argc, argv));
  if (g_flags.fail_if_unsandboxed && !g_sandboxed) {
    return "main: unsandboxed";
  }
  const int stdin_fd = 0;
  if (g_flags.remaining_argc >
      ((g_input_file_descriptor != stdin_fd) ? 1 : 0)) {
    return g_usage;
  }

  g_query.reset(g_flags.query_c_string);

  // If the query is non-empty, suppress writing to stdout until we've
  // completed the query.
  g_suppress_write_dst = g_query.next_fragment() ? 1 : 0;
  g_wrote_to_dst = false;

  if (g_flags.input_format == file_format::json) {
    TRY(g_json_decoder
            .initialize(sizeof__wuffs_json__decoder(), WUFFS_VERSION, 0)
            .message());
    g_dec = g_json_decoder.upcast_as__wuffs_base__token_decoder();
  } else {
    TRY(g_cbor_decoder
            .initialize(sizeof__wuffs_cbor__decoder(), WUFFS_VERSION, 0)
            .message());
    g_dec = g_cbor_decoder.upcast_as__wuffs_base__token_decoder();
  }

  if (g_flags.input_allow_json_comments) {
    g_dec->set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_COMMENT_BLOCK, true);
    g_dec->set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_COMMENT_LINE, true);
  }
  if (g_flags.input_allow_json_extra_comma) {
    g_dec->set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_EXTRA_COMMA, true);
  }
  if (g_flags.input_allow_json_inf_nan_numbers) {
    g_dec->set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_INF_NAN_NUMBERS, true);
  }

  // Consume an optional whitespace trailer. This isn't part of the JSON spec,
  // but it works better with line oriented Unix tools (such as "echo 123 |
  // jsonptr" where it's "echo", not "echo -n") or hand-edited JSON files which
  // can accidentally contain trailing whitespace.
  g_dec->set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_TRAILING_NEW_LINE, true);

  return nullptr;
}

// ----

// ignore_return_value suppresses errors from -Wall -Werror.
static void  //
ignore_return_value(int ignored) {}

const char*  //
read_src() {
  if (g_src.meta.closed) {
    return "main: internal error: read requested on a closed source";
  }
  g_src.compact();
  if (g_src.meta.wi >= g_src.data.len) {
    return "main: g_src buffer is full";
  }
  while (true) {
    ssize_t n = read(g_input_file_descriptor, g_src.writer_pointer(),
                     g_src.writer_length());
    if (n >= 0) {
      g_src.meta.wi += n;
      g_src.meta.closed = n == 0;
      break;
    } else if (errno != EINTR) {
      return strerror(errno);
    }
  }
  return nullptr;
}

const char*  //
flush_dst() {
  while (true) {
    size_t n = g_dst.reader_length();
    if (n == 0) {
      break;
    }
    const int stdout_fd = 1;
    ssize_t i = write(stdout_fd, g_dst.reader_pointer(), n);
    if (i >= 0) {
      g_dst.meta.ri += i;
    } else if (errno != EINTR) {
      return strerror(errno);
    }
  }
  g_dst.compact();
  return nullptr;
}
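
/*
read_src and flush_dst both retry their system call when it fails with EINTR,
and flush_dst also loops until a short write has drained. The same pattern,
detached from the g_dst buffer, might look like the sketch below. It assumes a
POSIX system. write_all and pipe_round_trip_ok are hypothetical names, and
the plain (pointer, length) pair stands in for the wuffs_base__io_buffer used
above.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <unistd.h>

// write_all is a hypothetical helper. It returns nullptr on success or an
// error message on failure, the same convention that flush_dst uses.
const char* write_all(int fd, const uint8_t* p, size_t n) {
  while (n > 0) {
    ssize_t i = write(fd, p, n);
    if (i >= 0) {
      // A short write is not an error. Advance and try again.
      p += i;
      n -= static_cast<size_t>(i);
    } else if (errno != EINTR) {
      return strerror(errno);
    }
    // On EINTR, fall through and retry the same write.
  }
  return nullptr;
}

// pipe_round_trip_ok is a usage example. It pushes five bytes through a pipe
// and checks that they come out unchanged.
bool pipe_round_trip_ok() {
  int fds[2];
  if (pipe(fds) != 0) {
    return false;
  }
  const uint8_t msg[5] = {'h', 'e', 'l', 'l', 'o'};
  const char* err = write_all(fds[1], msg, sizeof msg);
  uint8_t got[5] = {0};
  ssize_t n = read(fds[0], got, sizeof got);
  close(fds[0]);
  close(fds[1]);
  assert(n <= 5);
  return !err && (n == 5) && (memcmp(msg, got, 5) == 0);
}
```
*/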

const char*  //
write_dst(const void* s, size_t n) {
  if (g_suppress_write_dst > 0) {
    return nullptr;
  }
  const uint8_t* p = static_cast<const uint8_t*>(s);
  while (n > 0) {
    size_t i = g_dst.writer_length();
    if (i == 0) {
      const char* z = flush_dst();
      if (z) {
        return z;
      }
      i = g_dst.writer_length();
      if (i == 0) {
        return "main: g_dst buffer is full";
      }
    }

    if (i > n) {
      i = n;
    }
    memcpy(g_dst.data.ptr + g_dst.meta.wi, p, i);
    g_dst.meta.wi += i;
    p += i;
    n -= i;
    g_wrote_to_dst = true;
  }
  return nullptr;
}

// ----

const char*  //
write_literal(uint64_t vbd) {
  const char* ptr = nullptr;
  size_t len = 0;
  if (vbd & WUFFS_BASE__TOKEN__VBD__LITERAL__UNDEFINED) {
    if (g_flags.output_format == file_format::json) {
      // JSON's closest approximation to "undefined" is "null".
      if (g_flags.output_cbor_metadata_as_json_comments) {
        ptr = "/*cbor:undefined*/null";
        len = 22;
      } else {
        ptr = "null";
        len = 4;
      }
    } else {
      ptr = "\xF7";
      len = 1;
    }
  } else if (vbd & WUFFS_BASE__TOKEN__VBD__LITERAL__NULL) {
    if (g_flags.output_format == file_format::json) {
      ptr = "null";
      len = 4;
    } else {
      ptr = "\xF6";
      len = 1;
    }
  } else if (vbd & WUFFS_BASE__TOKEN__VBD__LITERAL__FALSE) {
    if (g_flags.output_format == file_format::json) {
      ptr = "false";
      len = 5;
    } else {
      ptr = "\xF4";
      len = 1;
    }
  } else if (vbd & WUFFS_BASE__TOKEN__VBD__LITERAL__TRUE) {
    if (g_flags.output_format == file_format::json) {
      ptr = "true";
      len = 4;
    } else {
      ptr = "\xF5";
      len = 1;
    }
  } else {
    return "main: internal error: unexpected write_literal argument";
  }
  return write_dst(ptr, len);
}

// ----

const char*  //
write_number_as_cbor_f64(double f) {
  uint8_t buf[9];
  wuffs_base__lossy_value_u16 lv16 =
      wuffs_base__ieee_754_bit_representation__from_f64_to_u16_truncate(f);
  if (!lv16.lossy) {
    buf[0] = 0xF9;
    wuffs_base__store_u16be__no_bounds_check(&buf[1], lv16.value);
    return write_dst(&buf[0], 3);
  }
  wuffs_base__lossy_value_u32 lv32 =
      wuffs_base__ieee_754_bit_representation__from_f64_to_u32_truncate(f);
  if (!lv32.lossy) {
    buf[0] = 0xFA;
    wuffs_base__store_u32be__no_bounds_check(&buf[1], lv32.value);
    return write_dst(&buf[0], 5);
  }
  buf[0] = 0xFB;
  wuffs_base__store_u64be__no_bounds_check(
      &buf[1], wuffs_base__ieee_754_bit_representation__from_f64_to_u64(f));
  return write_dst(&buf[0], 9);
}

const char*  //
write_number_as_cbor_u64(uint8_t base, uint64_t u) {
  uint8_t buf[9];
  if (u < 0x18) {
    buf[0] = base | ((uint8_t)u);
    return write_dst(&buf[0], 1);
  } else if ((u >> 8) == 0) {
    buf[0] = base | 0x18;
    buf[1] = ((uint8_t)u);
    return write_dst(&buf[0], 2);
  } else if ((u >> 16) == 0) {
    buf[0] = base | 0x19;
    wuffs_base__store_u16be__no_bounds_check(&buf[1], ((uint16_t)u));
    return write_dst(&buf[0], 3);
  } else if ((u >> 32) == 0) {
    buf[0] = base | 0x1A;
    wuffs_base__store_u32be__no_bounds_check(&buf[1], ((uint32_t)u));
    return write_dst(&buf[0], 5);
  }
  buf[0] = base | 0x1B;
  wuffs_base__store_u64be__no_bounds_check(&buf[1], u);
  return write_dst(&buf[0], 9);
}
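
/*
write_number_as_cbor_u64 picks the shortest CBOR head for an unsigned
argument: values below 0x18 fit in the initial byte, otherwise the low five
bits of that byte (0x18, 0x19, 0x1A or 0x1B) announce a 1, 2, 4 or 8 byte
big-endian payload. A stand-alone sketch of the same selection follows.
encode_cbor_head and encode_cbor_head_self_check are hypothetical names, and
returning a std::vector is a convenience for illustration, whereas the
function above writes into a fixed-size stack buffer.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// encode_cbor_head is a hypothetical helper. base holds the major type in
// its high three bits, e.g. 0x00 for unsigned or 0x20 for negative integers.
std::vector<uint8_t> encode_cbor_head(uint8_t base, uint64_t u) {
  if (u < 0x18) {
    return {static_cast<uint8_t>(base | u)};
  }
  uint8_t info;
  int num_payload_bytes;
  if ((u >> 8) == 0) {
    info = 0x18;
    num_payload_bytes = 1;
  } else if ((u >> 16) == 0) {
    info = 0x19;
    num_payload_bytes = 2;
  } else if ((u >> 32) == 0) {
    info = 0x1A;
    num_payload_bytes = 4;
  } else {
    info = 0x1B;
    num_payload_bytes = 8;
  }
  std::vector<uint8_t> out;
  out.push_back(static_cast<uint8_t>(base | info));
  // Emit the payload most significant byte first (big-endian).
  for (int shift = 8 * (num_payload_bytes - 1); shift >= 0; shift -= 8) {
    out.push_back(static_cast<uint8_t>(u >> shift));
  }
  return out;
}

// encode_cbor_head_self_check encodes 1000 as 0x19 0x03 0xE8.
void encode_cbor_head_self_check() {
  std::vector<uint8_t> want = {0x19, 0x03, 0xE8};
  assert(encode_cbor_head(0x00, 1000) == want);
}
```
*/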

const char*  //
write_number_as_json_f64(wuffs_base__slice_u8 s) {
  double f;
  switch (s.len) {
    case 3:
      f = wuffs_base__ieee_754_bit_representation__from_u16_to_f64(
          wuffs_base__load_u16be__no_bounds_check(s.ptr + 1));
      break;
    case 5:
      f = wuffs_base__ieee_754_bit_representation__from_u32_to_f64(
          wuffs_base__load_u32be__no_bounds_check(s.ptr + 1));
      break;
    case 9:
      f = wuffs_base__ieee_754_bit_representation__from_u64_to_f64(
          wuffs_base__load_u64be__no_bounds_check(s.ptr + 1));
      break;
    default:
      return "main: internal error: unexpected write_number_as_json_f64 len";
  }
  uint8_t buf[512];
  const uint32_t precision = 0;
  size_t n = wuffs_base__render_number_f64(
      wuffs_base__make_slice_u8(&buf[0], sizeof buf), f, precision,
      WUFFS_BASE__RENDER_NUMBER_FXX__JUST_ENOUGH_PRECISION);

  if (!g_flags.output_json_inf_nan_numbers) {
    // JSON numbers don't include Infinities or NaNs. For such numbers, their
    // IEEE 754 bit representation's 11 exponent bits are all on.
    uint64_t u = wuffs_base__ieee_754_bit_representation__from_f64_to_u64(f);
    if (((u >> 52) & 0x7FF) == 0x7FF) {
      if (g_flags.output_cbor_metadata_as_json_comments) {
        TRY(write_dst("/*cbor:", 7));
        TRY(write_dst(&buf[0], n));
        TRY(write_dst("*/", 2));
      }
      return write_dst("null", 4);
    }
  }

  return write_dst(&buf[0], n);
}
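
/*
The exponent test in write_number_as_json_f64 (all 11 exponent bits set means
Infinity or NaN) can be tried in isolation. The sketch below assumes that
double is the 64-bit IEEE 754 binary64 format and uses memcpy in place of the
Wuffs bit-representation helper. is_inf_or_nan and is_inf_or_nan_self_check
are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// is_inf_or_nan is a hypothetical helper. It inspects the 11 exponent bits,
// which sit directly above the 52 mantissa bits.
bool is_inf_or_nan(double f) {
  static_assert(sizeof(double) == sizeof(uint64_t),
                "this sketch assumes a 64-bit double");
  uint64_t u;
  std::memcpy(&u, &f, sizeof u);
  return ((u >> 52) & 0x7FF) == 0x7FF;
}

// is_inf_or_nan_self_check contrasts a finite value with an infinite one.
void is_inf_or_nan_self_check() {
  assert(!is_inf_or_nan(1.5));
  assert(is_inf_or_nan(std::numeric_limits<double>::infinity()));
}
```

A single mask-and-compare covers both infinities and every NaN payload, so
there is no need to distinguish them before falling back to "null".
*/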

const char*  //
write_cbor_minus_1_minus_x(wuffs_base__slice_u8 s) {
  if (g_flags.output_format == file_format::cbor) {
    return write_dst(s.ptr, s.len);
  }

  if (s.len != 9) {
    return "main: internal error: invalid ETC__MINUS_1_MINUS_X token length";
  }
  uint64_t u = 1 + wuffs_base__load_u64be__no_bounds_check(s.ptr + 1);
  if (u == 0) {
    // See the cbor.TOKEN_VALUE_MINOR__MINUS_1_MINUS_X comment re overflow.
    return write_dst("-18446744073709551616", 21);
  }
  uint8_t buf[1 + WUFFS_BASE__U64__BYTE_LENGTH__MAX_INCL];
  uint8_t* b = &buf[0];
  *b++ = '-';
  size_t n = wuffs_base__render_number_u64(
      wuffs_base__make_slice_u8(b, WUFFS_BASE__U64__BYTE_LENGTH__MAX_INCL), u,
      WUFFS_BASE__RENDER_NUMBER_XXX__DEFAULT_OPTIONS);
  return write_dst(&buf[0], 1 + n);
}
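
/*
write_cbor_minus_1_minus_x renders the value (-1 - x) as decimal text. Adding
1 to x wraps around to zero only when x is 0xFFFFFFFFFFFFFFFF, which is why
that one case is written as a literal string. A sketch of the same arithmetic
using std::string follows. render_minus_1_minus_x and its self check are
hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// render_minus_1_minus_x is a hypothetical helper that returns the decimal
// text of (-1 - x) for any 64-bit unsigned x.
std::string render_minus_1_minus_x(uint64_t x) {
  uint64_t u = 1 + x;  // Wraps around to 0 when x is UINT64_MAX.
  if (u == 0) {
    // The magnitude is 2 to the power 64, which a uint64_t cannot hold.
    return "-18446744073709551616";
  }
  return "-" + std::to_string(u);
}

// render_minus_1_minus_x_self_check covers the smallest magnitude.
void render_minus_1_minus_x_self_check() {
  assert(render_minus_1_minus_x(0) == "-1");
}
```
*/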

const char*  //
write_cbor_simple_value(uint64_t tag, wuffs_base__slice_u8 s) {
  if (g_flags.output_format == file_format::cbor) {
    return write_dst(s.ptr, s.len);
  }

  if (!g_flags.output_cbor_metadata_as_json_comments) {
    return nullptr;
  }
  uint8_t buf[WUFFS_BASE__U64__BYTE_LENGTH__MAX_INCL];
  size_t n = wuffs_base__render_number_u64(
      wuffs_base__make_slice_u8(&buf[0],
                                WUFFS_BASE__U64__BYTE_LENGTH__MAX_INCL),
      tag, WUFFS_BASE__RENDER_NUMBER_XXX__DEFAULT_OPTIONS);
  TRY(write_dst("/*cbor:simple", 13));
  TRY(write_dst(&buf[0], n));
  return write_dst("*/null", 6);
}

const char*  //
write_cbor_tag(uint64_t tag, wuffs_base__slice_u8 s) {
  if (g_flags.output_format == file_format::cbor) {
    return write_dst(s.ptr, s.len);
  }

  if (!g_flags.output_cbor_metadata_as_json_comments) {
    return nullptr;
  }
  uint8_t buf[WUFFS_BASE__U64__BYTE_LENGTH__MAX_INCL];
  size_t n = wuffs_base__render_number_u64(
      wuffs_base__make_slice_u8(&buf[0],
                                WUFFS_BASE__U64__BYTE_LENGTH__MAX_INCL),
      tag, WUFFS_BASE__RENDER_NUMBER_XXX__DEFAULT_OPTIONS);
  TRY(write_dst("/*cbor:tag", 10));
  TRY(write_dst(&buf[0], n));
  return write_dst("*/", 2);
}

const char*  //
write_number(uint64_t vbd, wuffs_base__slice_u8 s) {
  if (g_flags.output_format == file_format::json) {
    const uint64_t cfp_fbbe_fifb =
        WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_FLOATING_POINT |
        WUFFS_BASE__TOKEN__VBD__NUMBER__FORMAT_BINARY_BIG_ENDIAN |
        WUFFS_BASE__TOKEN__VBD__NUMBER__FORMAT_IGNORE_FIRST_BYTE;
    if (g_flags.input_format == file_format::json) {
      return write_dst(s.ptr, s.len);
    } else if ((vbd & cfp_fbbe_fifb) == cfp_fbbe_fifb) {
      return write_number_as_json_f64(s);
    }

    // From here on, (g_flags.output_format == file_format::cbor).
  } else if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__FORMAT_TEXT) {
    // First try to parse s as an integer. Something like
    // "1180591620717411303424" is a valid number (in the JSON sense) but will
    // overflow int64_t or uint64_t, so fall back to parsing it as a float64.
    if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_INTEGER_SIGNED) {
      if ((s.len > 0) && (s.ptr[0] == '-')) {
        wuffs_base__result_i64 ri = wuffs_base__parse_number_i64(
            s, WUFFS_BASE__PARSE_NUMBER_XXX__DEFAULT_OPTIONS);
        if (ri.status.is_ok()) {
          return write_number_as_cbor_u64(0x20, ~ri.value);
        }
      } else {
        wuffs_base__result_u64 ru = wuffs_base__parse_number_u64(
            s, WUFFS_BASE__PARSE_NUMBER_XXX__DEFAULT_OPTIONS);
        if (ru.status.is_ok()) {
          return write_number_as_cbor_u64(0x00, ru.value);
        }
      }
    }

    if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_FLOATING_POINT) {
      wuffs_base__result_f64 rf = wuffs_base__parse_number_f64(
          s, WUFFS_BASE__PARSE_NUMBER_XXX__DEFAULT_OPTIONS);
      if (rf.status.is_ok()) {
        return write_number_as_cbor_f64(rf.value);
      }
    }
  } else if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_NEG_INF) {
    return write_dst("\xF9\xFC\x00", 3);
  } else if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_POS_INF) {
    return write_dst("\xF9\x7C\x00", 3);
  } else if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_NEG_NAN) {
    return write_dst("\xF9\xFF\xFF", 3);
  } else if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_POS_NAN) {
    return write_dst("\xF9\x7F\xFF", 3);
  }

fail:
  return "main: internal error: unexpected write_number argument";
}
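
/*
write_number passes ~ri.value as the argument of a CBOR negative integer
(base 0x20). CBOR stores a negative integer i as the unsigned value (-1 - i),
and in two's complement arithmetic (-1 - i) has the same bit pattern as ~i,
so the bitwise complement avoids any overflow, even for INT64_MIN. A sketch
of that identity follows. cbor_negative_argument and its self check are
hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// cbor_negative_argument is a hypothetical helper. For a negative i, it
// returns the unsigned argument n such that i equals (-1 - n).
uint64_t cbor_negative_argument(int64_t i) {
  return ~static_cast<uint64_t>(i);
}

// cbor_negative_argument_self_check maps -10 to an argument of 9.
void cbor_negative_argument_self_check() {
  assert(cbor_negative_argument(-10) == 9);
}
```
*/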

const char*  //
write_inline_integer(uint64_t x, bool x_is_signed, wuffs_base__slice_u8 s) {
  if (g_flags.output_format == file_format::cbor) {
    return write_dst(s.ptr, s.len);
  }

  // Adding the two ETC__BYTE_LENGTH__ETC constants is overkill, but it's
  // simpler (for producing a constant-expression array size) than taking the
  // maximum of the two.
  uint8_t buf[WUFFS_BASE__I64__BYTE_LENGTH__MAX_INCL +
              WUFFS_BASE__U64__BYTE_LENGTH__MAX_INCL];
  wuffs_base__slice_u8 dst = wuffs_base__make_slice_u8(&buf[0], sizeof buf);
  size_t n =
      x_is_signed
          ? wuffs_base__render_number_i64(
                dst, (int64_t)x, WUFFS_BASE__RENDER_NUMBER_XXX__DEFAULT_OPTIONS)
          : wuffs_base__render_number_u64(
                dst, x, WUFFS_BASE__RENDER_NUMBER_XXX__DEFAULT_OPTIONS);
  return write_dst(&buf[0], n);
}

// ----

uint8_t  //
hex_digit(uint8_t nibble) {
  nibble &= 0x0F;
  if (nibble <= 9) {
    return '0' + nibble;
  }
  return ('A' - 10) + nibble;
}

const char*  //
flush_cbor_output_string() {
  uint8_t prefix[3];
  prefix[0] = g_cbor_output_string_is_utf_8 ? 0x60 : 0x40;
  if (g_cbor_output_string_length < 0x18) {
    prefix[0] |= g_cbor_output_string_length;
    TRY(write_dst(&prefix[0], 1));
  } else if (g_cbor_output_string_length <= 0xFF) {
    prefix[0] |= 0x18;
    prefix[1] = g_cbor_output_string_length;
    TRY(write_dst(&prefix[0], 2));
  } else if (g_cbor_output_string_length <= 0xFFFF) {
    prefix[0] |= 0x19;
    prefix[1] = g_cbor_output_string_length >> 8;
    prefix[2] = g_cbor_output_string_length;
    TRY(write_dst(&prefix[0], 3));
  } else {
    return "main: internal error: CBOR string output is too long";
  }

  size_t n = g_cbor_output_string_length;
  g_cbor_output_string_length = 0;
  return write_dst(&g_spool_array[0], n);
}
1255
1256const char* //
Nigel Taoee6927f2020-07-27 12:08:33 +10001257write_cbor_output_string(wuffs_base__slice_u8 s, bool finish) {
Nigel Taoea532452020-07-27 00:03:00 +10001258 // Check that g_spool_array can hold any UTF-8 code point.
1259 if (SPOOL_ARRAY_SIZE < 4) {
    return "main: internal error: SPOOL_ARRAY_SIZE is too short";
  }

  uint8_t* ptr = s.ptr;
  size_t len = s.len;
  while (len > 0) {
    size_t available = SPOOL_ARRAY_SIZE - g_cbor_output_string_length;
    if (available >= len) {
      memcpy(&g_spool_array[g_cbor_output_string_length], ptr, len);
      g_cbor_output_string_length += len;
      ptr += len;
      len = 0;
      break;

    } else if (available > 0) {
      if (!g_cbor_output_string_is_multiple_chunks) {
        g_cbor_output_string_is_multiple_chunks = true;
        TRY(write_dst(g_cbor_output_string_is_utf_8 ? "\x7F" : "\x5F", 1));
      }

      if (g_cbor_output_string_is_utf_8) {
        // Walk the end backwards to a UTF-8 boundary, so that each chunk of
        // the multi-chunk string is also valid UTF-8.
        while (available > 0) {
          wuffs_base__utf_8__next__output o =
              wuffs_base__utf_8__next_from_end(ptr, available);
          if ((o.code_point != WUFFS_BASE__UNICODE_REPLACEMENT_CHARACTER) ||
              (o.byte_length != 1)) {
            break;
          }
          available--;
        }
      }

      memcpy(&g_spool_array[g_cbor_output_string_length], ptr, available);
      g_cbor_output_string_length += available;
      ptr += available;
      len -= available;
    }

    TRY(flush_cbor_output_string());
  }

  if (finish) {
    TRY(flush_cbor_output_string());
    if (g_cbor_output_string_is_multiple_chunks) {
      TRY(write_dst("\xFF", 1));
    }
  }
  return nullptr;
}
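
/*
The UTF-8 boundary walk in write_cbor_output_string can be illustrated in
isolation. The sketch below is not part of jsonptr: utf8_prefix_length is a
hypothetical helper, and it swaps the wuffs_base__utf_8__next_from_end call for
a simpler continuation-byte test, which assumes that the input is already valid
UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns the largest n <= limit such that the first n bytes of the valid
// UTF-8 string [ptr, ptr + len) end on a code point boundary.
size_t utf8_prefix_length(const uint8_t* ptr, size_t len, size_t limit) {
  assert((ptr != nullptr) || (len == 0));
  if (limit >= len) {
    return len;
  }
  size_t n = limit;
  // ptr[n] is the first byte that would be left out of the prefix. If it is
  // a continuation byte (0b10xxxxxx) then the prefix would end in the middle
  // of a code point, so shrink the prefix by one byte and try again.
  while ((n > 0) && ((ptr[n] & 0xC0) == 0x80)) {
    n--;
  }
  return n;
}
```

For example, splitting the 5 bytes of "a€b" with a limit of 3 gives a 1 byte
prefix, since the 3 byte "€" does not fit in the remaining 2 bytes.
*/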

const char*  //
flush_json_output_byte_string(bool finish) {
  uint8_t* ptr = &g_spool_array[0];
  size_t len = g_json_output_byte_string_length;
  while (len > 0) {
    wuffs_base__transform__output o = wuffs_base__base_64__encode(
        g_dst.writer_slice(), wuffs_base__make_slice_u8(ptr, len), finish,
        WUFFS_BASE__BASE_64__URL_ALPHABET);
    g_dst.meta.wi += o.num_dst;
    ptr += o.num_src;
    len -= o.num_src;
    if (o.status.repr == nullptr) {
      if (len != 0) {
        return "main: internal error: inconsistent spool length";
      }
      g_json_output_byte_string_length = 0;
      break;
    } else if (o.status.repr == wuffs_base__suspension__short_read) {
      memmove(&g_spool_array[0], ptr, len);
      g_json_output_byte_string_length = len;
      break;
    } else if (o.status.repr != wuffs_base__suspension__short_write) {
      return o.status.message();
    }
    TRY(flush_dst());
  }
  return nullptr;
}
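
/*
flush_json_output_byte_string keeps any bytes that the encoder did not consume
(the short read case) at the front of the spool. One plausible reason for an
encoder to leave bytes behind is that base64 maps each group of 3 source bytes
to 4 output characters, so a trailing group of 1 or 2 bytes cannot be encoded
until either more input arrives or the input is known to be finished. The
sketch below shows that idea only. It is not Wuffs' encoder:
base64url_encode_chunk is a hypothetical helper, and unpadded output is an
assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Appends the unpadded base64url encoding of src to dst and returns the
// number of source bytes consumed. Unless finish is true, a trailing partial
// group (1 or 2 bytes) is left unconsumed for the caller to present again.
size_t base64url_encode_chunk(std::string& dst,
                              const uint8_t* src,
                              size_t len,
                              bool finish) {
  assert((src != nullptr) || (len == 0));
  static const char alphabet[] =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
  size_t i = 0;
  for (; (len - i) >= 3; i += 3) {
    uint32_t x = (uint32_t(src[i + 0]) << 16) |  //
                 (uint32_t(src[i + 1]) << 8) |   //
                 (uint32_t(src[i + 2]) << 0);
    dst += alphabet[(x >> 18) & 63];
    dst += alphabet[(x >> 12) & 63];
    dst += alphabet[(x >> 6) & 63];
    dst += alphabet[(x >> 0) & 63];
  }
  if (finish && (i < len)) {
    bool two = (len - i) == 2;
    uint32_t x = uint32_t(src[i]) << 16;
    if (two) {
      x |= uint32_t(src[i + 1]) << 8;
    }
    dst += alphabet[(x >> 18) & 63];
    dst += alphabet[(x >> 12) & 63];
    if (two) {
      dst += alphabet[(x >> 6) & 63];
    }
    i = len;
  }
  return i;
}
```

A caller that feeds "hello" without finishing sees 3 bytes consumed and must
keep the remaining 2 bytes, much as the memmove above keeps the unconsumed tail
of g_spool_array.
*/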

const char*  //
write_json_output_byte_string(wuffs_base__slice_u8 s, bool finish) {
  uint8_t* ptr = s.ptr;
  size_t len = s.len;
  while (len > 0) {
    size_t available = SPOOL_ARRAY_SIZE - g_json_output_byte_string_length;
    if (available >= len) {
      memcpy(&g_spool_array[g_json_output_byte_string_length], ptr, len);
      g_json_output_byte_string_length += len;
      ptr += len;
      len = 0;
      break;

    } else if (available > 0) {
      memcpy(&g_spool_array[g_json_output_byte_string_length], ptr, available);
      g_json_output_byte_string_length += available;
      ptr += available;
      len -= available;
    }

    TRY(flush_json_output_byte_string(false));
  }

  if (finish) {
    TRY(flush_json_output_byte_string(true));
  }
  return nullptr;
}

// ----

const char*  //
handle_unicode_code_point(uint32_t ucp) {
  if (g_flags.output_format == file_format::json) {
    if (ucp < 0x0020) {
      switch (ucp) {
        case '\b':
          return write_dst("\\b", 2);
        case '\f':
          return write_dst("\\f", 2);
        case '\n':
          return write_dst("\\n", 2);
        case '\r':
          return write_dst("\\r", 2);
        case '\t':
          return write_dst("\\t", 2);
      }

      // Other bytes less than 0x0020 are valid UTF-8 but not valid in a
      // JSON string. They need to remain escaped.
      uint8_t esc6[6];
      esc6[0] = '\\';
      esc6[1] = 'u';
      esc6[2] = '0';
      esc6[3] = '0';
      esc6[4] = hex_digit(ucp >> 4);
      esc6[5] = hex_digit(ucp >> 0);
      return write_dst(&esc6[0], 6);

    } else if (ucp == '\"') {
      return write_dst("\\\"", 2);

    } else if (ucp == '\\') {
      return write_dst("\\\\", 2);
    }
  }

  uint8_t u[WUFFS_BASE__UTF_8__BYTE_LENGTH__MAX_INCL];
  size_t n = wuffs_base__utf_8__encode(
      wuffs_base__make_slice_u8(&u[0],
                                WUFFS_BASE__UTF_8__BYTE_LENGTH__MAX_INCL),
      ucp);
  if (n == 0) {
    return "main: internal error: unexpected Unicode code point";
  }

  if (g_flags.output_format == file_format::json) {
    return write_dst(&u[0], n);
  }
  return write_cbor_output_string(wuffs_base__make_slice_u8(&u[0], n), false);
}
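
/*
The JSON escaping rules applied by handle_unicode_code_point can be sketched as
a pure function over a single ASCII byte. This is an illustration, not part of
jsonptr: json_escape_ascii is a hypothetical helper that returns a std::string
instead of calling write_dst, and its upper case hex digits are an assumption
(hex_digit is defined elsewhere in this file).

```cpp
#include <cstdint>
#include <string>

// Returns the form that a single ASCII byte takes inside a JSON string: a two
// byte escape where one exists, a six byte hexadecimal escape for the other
// control characters, and the byte itself otherwise.
std::string json_escape_ascii(uint8_t c) {
  switch (c) {
    case '\b':
      return "\\b";
    case '\f':
      return "\\f";
    case '\n':
      return "\\n";
    case '\r':
      return "\\r";
    case '\t':
      return "\\t";
    case '"':
      return "\\\"";
    case '\\':
      return "\\\\";
  }
  if (c < 0x20) {
    // Upper case hex digits are an assumption of this sketch.
    static const char hex[] = "0123456789ABCDEF";
    return std::string{'\\', 'u', '0', '0', hex[c >> 4], hex[c & 0x0F]};
  }
  return std::string(1, static_cast<char>(c));
}
```

Only control characters, the double quote and the backslash need escaping, so
every other byte passes through unchanged.
*/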

const char*  //
write_json_output_text_string(wuffs_base__slice_u8 s) {
  uint8_t* ptr = s.ptr;
  size_t len = s.len;
restart:
  while (true) {
    size_t i;
    for (i = 0; i < len; i++) {
      uint8_t c = ptr[i];
      if ((c == '"') || (c == '\\') || (c < 0x20)) {
        TRY(write_dst(ptr, i));
        TRY(handle_unicode_code_point(c));
        ptr += i + 1;
        len -= i + 1;
        goto restart;
      }
    }
    TRY(write_dst(ptr, len));
    break;
  }
  return nullptr;
}

const char*  //
handle_string(uint64_t vbd,
              wuffs_base__slice_u8 s,
              bool start_of_token_chain,
              bool continued) {
  if (start_of_token_chain) {
    if (g_flags.output_format == file_format::json) {
      if (g_flags.output_cbor_metadata_as_json_comments &&
          !(vbd & WUFFS_BASE__TOKEN__VBD__STRING__CHAIN_MUST_BE_UTF_8)) {
        TRY(write_dst("/*cbor:base64url*/\"", 19));
        g_json_output_byte_string_length = 0;
      } else {
        TRY(write_dst("\"", 1));
      }
    } else {
      g_cbor_output_string_length = 0;
      g_cbor_output_string_is_multiple_chunks = false;
      g_cbor_output_string_is_utf_8 =
          vbd & WUFFS_BASE__TOKEN__VBD__STRING__CHAIN_MUST_BE_UTF_8;
    }
    g_query.restart_fragment(in_dict_before_key() && g_query.is_at(g_depth));
  }

  if (vbd & WUFFS_BASE__TOKEN__VBD__STRING__CONVERT_0_DST_1_SRC_DROP) {
    // No-op.
  } else if (vbd & WUFFS_BASE__TOKEN__VBD__STRING__CONVERT_1_DST_1_SRC_COPY) {
    if (g_flags.output_format == file_format::json) {
      if (g_flags.input_format == file_format::json) {
        TRY(write_dst(s.ptr, s.len));
      } else if (vbd & WUFFS_BASE__TOKEN__VBD__STRING__CHAIN_MUST_BE_UTF_8) {
        TRY(write_json_output_text_string(s));
      } else {
        TRY(write_json_output_byte_string(s, false));
      }
    } else {
      TRY(write_cbor_output_string(s, false));
    }
    g_query.incremental_match_slice(s.ptr, s.len);
  } else {
    return "main: internal error: unexpected string-token conversion";
  }

  if (continued) {
    return nullptr;
  }

  if (g_flags.output_format == file_format::json) {
    if (!(vbd & WUFFS_BASE__TOKEN__VBD__STRING__CHAIN_MUST_BE_UTF_8)) {
      TRY(write_json_output_byte_string(wuffs_base__empty_slice_u8(), true));
    }
    TRY(write_dst("\"", 1));
  } else {
    TRY(write_cbor_output_string(wuffs_base__empty_slice_u8(), true));
  }
  return nullptr;
}

// ----

const char*  //
handle_token(wuffs_base__token t, bool start_of_token_chain) {
  do {
    int64_t vbc = t.value_base_category();
    uint64_t vbd = t.value_base_detail();
    uint64_t token_length = t.length();
    wuffs_base__slice_u8 tok = wuffs_base__make_slice_u8(
        g_src.data.ptr + g_curr_token_end_src_index - token_length,
        token_length);

    // Handle ']' or '}'.
    if ((vbc == WUFFS_BASE__TOKEN__VBC__STRUCTURE) &&
        (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__POP)) {
      if (g_query.is_at(g_depth)) {
        return "main: no match for query";
      }
      if (g_depth <= 0) {
        return "main: internal error: inconsistent g_depth";
      }
      g_depth--;

      if (g_query.matched_all() && (g_depth >= g_flags.max_output_depth)) {
        g_suppress_write_dst--;
        // '…' is U+2026 HORIZONTAL ELLIPSIS, which is 3 UTF-8 bytes.
        if (g_flags.output_format == file_format::json) {
          TRY(write_dst((vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__FROM_LIST)
                            ? "\"[…]\""
                            : "\"{…}\"",
                        7));
        } else {
          TRY(write_dst((vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__FROM_LIST)
                            ? "\x65[…]"
                            : "\x65{…}",
                        6));
        }
      } else if (g_flags.output_format == file_format::json) {
        // Write preceding whitespace.
        if ((g_ctx != context::in_list_after_bracket) &&
            (g_ctx != context::in_dict_after_brace) &&
            !g_flags.compact_output) {
          if (g_flags.output_json_extra_comma) {
            TRY(write_dst(",\n", 2));
          } else {
            TRY(write_dst("\n", 1));
          }
          for (uint32_t i = 0; i < g_depth; i++) {
            TRY(write_dst(
                g_flags.tabs ? INDENT_TAB_STRING : INDENT_SPACES_STRING,
                g_flags.tabs ? 1 : g_flags.spaces));
          }
        }

        TRY(write_dst(
            (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__FROM_LIST) ? "]" : "}",
            1));
      } else {
        TRY(write_dst("\xFF", 1));
      }

      g_ctx = (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__TO_LIST)
                  ? context::in_list_after_value
                  : context::in_dict_after_key;
      goto after_value;
    }

    // Write preceding whitespace and punctuation, if it wasn't ']', '}' or a
    // continuation of a multi-token chain.
    if (start_of_token_chain) {
      if (g_flags.output_format != file_format::json) {
        // No-op.
      } else if (g_ctx == context::in_dict_after_key) {
        TRY(write_dst(": ", g_flags.compact_output ? 1 : 2));
      } else if (g_ctx != context::none) {
        if ((g_ctx == context::in_dict_after_brace) ||
            (g_ctx == context::in_dict_after_value)) {
          // Reject dict keys that aren't UTF-8 strings, which could otherwise
          // happen with -i=cbor -o=json.
          if ((vbc != WUFFS_BASE__TOKEN__VBC__STRING) ||
              !(vbd & WUFFS_BASE__TOKEN__VBD__STRING__CHAIN_MUST_BE_UTF_8)) {
            return "main: cannot convert CBOR non-text-string to JSON map key";
          }
        }
        if ((g_ctx == context::in_list_after_value) ||
            (g_ctx == context::in_dict_after_value)) {
          TRY(write_dst(",", 1));
        }
        if (!g_flags.compact_output) {
          TRY(write_dst("\n", 1));
          for (size_t i = 0; i < g_depth; i++) {
            TRY(write_dst(
                g_flags.tabs ? INDENT_TAB_STRING : INDENT_SPACES_STRING,
                g_flags.tabs ? 1 : g_flags.spaces));
          }
        }
      }

      bool query_matched_fragment = false;
      if (g_query.is_at(g_depth)) {
        switch (g_ctx) {
          case context::in_list_after_bracket:
          case context::in_list_after_value:
            query_matched_fragment = g_query.tick();
            break;
          case context::in_dict_after_key:
            query_matched_fragment = g_query.matched_fragment();
            break;
          default:
            break;
        }
      }
      if (!query_matched_fragment) {
        // No-op.
      } else if (!g_query.next_fragment()) {
        // There is no next fragment. We have matched the complete query, and
        // the upcoming JSON value is the result of that query.
        //
        // Un-suppress writing to stdout and reset the g_ctx and g_depth as if
        // we were about to decode a top-level value. This makes any subsequent
        // indentation be relative to this point, and we will return g_eod
        // after the upcoming JSON value is complete.
        if (g_suppress_write_dst != 1) {
          return "main: internal error: inconsistent g_suppress_write_dst";
        }
        g_suppress_write_dst = 0;
        g_ctx = context::none;
        g_depth = 0;
      } else if ((vbc != WUFFS_BASE__TOKEN__VBC__STRUCTURE) ||
                 !(vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__PUSH)) {
        // The query has moved on to the next fragment but the upcoming JSON
        // value is not a container.
        return "main: no match for query";
      }
    }

    // Handle the token itself: either a container ('[' or '{') or a simple
    // value: string (a chain of raw or escaped parts), literal or number.
    switch (vbc) {
      case WUFFS_BASE__TOKEN__VBC__STRUCTURE:
        if (g_query.matched_all() && (g_depth >= g_flags.max_output_depth)) {
          g_suppress_write_dst++;
        } else if (g_flags.output_format == file_format::json) {
          TRY(write_dst(
              (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__TO_LIST) ? "[" : "{",
              1));
        } else {
          TRY(write_dst((vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__TO_LIST)
                            ? "\x9F"
                            : "\xBF",
                        1));
        }
        g_depth++;
        g_ctx = (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__TO_LIST)
                    ? context::in_list_after_bracket
                    : context::in_dict_after_brace;
        return nullptr;

      case WUFFS_BASE__TOKEN__VBC__STRING:
        TRY(handle_string(vbd, tok, start_of_token_chain, t.continued()));
        if (t.continued()) {
          return nullptr;
        }
        goto after_value;

      case WUFFS_BASE__TOKEN__VBC__UNICODE_CODE_POINT:
        if (!t.continued()) {
          return "main: internal error: unexpected non-continued UCP token";
        }
        TRY(handle_unicode_code_point(vbd));
        g_query.incremental_match_code_point(vbd);
        return nullptr;

      case WUFFS_BASE__TOKEN__VBC__LITERAL:
        TRY(write_literal(vbd));
        goto after_value;

      case WUFFS_BASE__TOKEN__VBC__NUMBER:
        TRY(write_number(vbd, tok));
        goto after_value;

      case WUFFS_BASE__TOKEN__VBC__INLINE_INTEGER_SIGNED:
      case WUFFS_BASE__TOKEN__VBC__INLINE_INTEGER_UNSIGNED: {
        bool x_is_signed = vbc == WUFFS_BASE__TOKEN__VBC__INLINE_INTEGER_SIGNED;
        uint64_t x = x_is_signed
                         ? ((uint64_t)(t.value_base_detail__sign_extended()))
                         : vbd;
        if (t.continued()) {
          if (tok.len != 0) {
            return "main: internal error: unexpected to-be-extended length";
          }
          g_token_extension.category = vbc;
          g_token_extension.detail = x;
          return nullptr;
        }
        TRY(write_inline_integer(x, x_is_signed, tok));
        goto after_value;
      }
    }

    int64_t ext = t.value_extension();
    if (ext >= 0) {
      uint64_t x = (g_token_extension.detail
                    << WUFFS_BASE__TOKEN__VALUE_EXTENSION__NUM_BITS) |
                   ((uint64_t)ext);
      switch (g_token_extension.category) {
        case WUFFS_BASE__TOKEN__VBC__INLINE_INTEGER_SIGNED:
        case WUFFS_BASE__TOKEN__VBC__INLINE_INTEGER_UNSIGNED:
          TRY(write_inline_integer(
              x,
              g_token_extension.category ==
                  WUFFS_BASE__TOKEN__VBC__INLINE_INTEGER_SIGNED,
              tok));
          g_token_extension.category = 0;
          g_token_extension.detail = 0;
          goto after_value;
        case CATEGORY_CBOR_TAG:
          TRY(write_cbor_tag(x, tok));
          g_token_extension.category = 0;
          g_token_extension.detail = 0;
          return nullptr;
      }
    }

    if (t.value_major() == WUFFS_CBOR__TOKEN_VALUE_MAJOR) {
      uint64_t value_minor = t.value_minor();
      if (value_minor & WUFFS_CBOR__TOKEN_VALUE_MINOR__MINUS_1_MINUS_X) {
        TRY(write_cbor_minus_1_minus_x(tok));
        goto after_value;
      } else if (value_minor & WUFFS_CBOR__TOKEN_VALUE_MINOR__SIMPLE_VALUE) {
        TRY(write_cbor_simple_value(vbd, tok));
        goto after_value;
      } else if (value_minor & WUFFS_CBOR__TOKEN_VALUE_MINOR__TAG) {
        if (t.continued()) {
          if (tok.len != 0) {
            return "main: internal error: unexpected to-be-extended length";
          }
          g_token_extension.category = CATEGORY_CBOR_TAG;
          g_token_extension.detail = vbd;
          return nullptr;
        }
        return write_cbor_tag(vbd, tok);
      }
    }

    // Return an error if we didn't match the (value_major, value_minor) or
    // (vbc, vbd) pair.
    return "main: internal error: unexpected token";
  } while (0);

  // Book-keeping after completing a value (whether a container value or a
  // simple value). Empty parent containers are no longer empty. If the parent
  // container is a "{...}" object, toggle between keys and values.
after_value:
  if (g_depth == 0) {
    return g_eod;
  }
  switch (g_ctx) {
    case context::in_list_after_bracket:
      g_ctx = context::in_list_after_value;
      break;
    case context::in_dict_after_brace:
      g_ctx = context::in_dict_after_key;
      break;
    case context::in_dict_after_key:
      g_ctx = context::in_dict_after_value;
      break;
    case context::in_dict_after_value:
      g_ctx = context::in_dict_after_key;
      break;
    default:
      break;
  }
  return nullptr;
}
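
/*
The after_value book-keeping at the end of handle_token is a small state
machine. The sketch below restates it as a pure function so that the
transitions can be read (and tested) on their own. It is an illustration, not
part of jsonptr: the enum is re-declared only to make the sketch
self-contained, and next_context_after_value is a hypothetical name.

```cpp
enum class context {
  none,
  in_list_after_bracket,
  in_list_after_value,
  in_dict_after_brace,
  in_dict_after_key,
  in_dict_after_value,
};

// Returns the context that follows a completed value (or a completed key)
// within the current container.
context next_context_after_value(context c) {
  switch (c) {
    case context::in_list_after_bracket:
      return context::in_list_after_value;
    case context::in_dict_after_brace:
      return context::in_dict_after_key;
    case context::in_dict_after_key:
      return context::in_dict_after_value;
    case context::in_dict_after_value:
      return context::in_dict_after_key;
    default:
      // in_list_after_value and none are left unchanged.
      return c;
  }
}
```

Within a dict, successive completed items therefore alternate between
in_dict_after_key and in_dict_after_value, which is what lets the code above
choose between writing ": " and writing ",".
*/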

const char*  //
main1(int argc, char** argv) {
  TRY(initialize_globals(argc, argv));

  bool start_of_token_chain = true;
  while (true) {
    wuffs_base__status status = g_dec->decode_tokens(
        &g_tok, &g_src,
        wuffs_base__make_slice_u8(g_work_buffer_array, WORK_BUFFER_ARRAY_SIZE));

    while (g_tok.meta.ri < g_tok.meta.wi) {
      wuffs_base__token t = g_tok.data.ptr[g_tok.meta.ri++];
      uint64_t n = t.length();
      if ((g_src.meta.ri - g_curr_token_end_src_index) < n) {
        return "main: internal error: inconsistent g_src indexes";
      }
      g_curr_token_end_src_index += n;

      // Skip filler tokens (e.g. whitespace).
      if (t.value_base_category() == WUFFS_BASE__TOKEN__VBC__FILLER) {
        start_of_token_chain = !t.continued();
        continue;
      }

      const char* z = handle_token(t, start_of_token_chain);
      start_of_token_chain = !t.continued();
      if (z == nullptr) {
        continue;
      } else if (z == g_eod) {
        goto end_of_data;
      }
      return z;
    }

    if (status.repr == nullptr) {
      return "main: internal error: unexpected end of token stream";
    } else if (status.repr == wuffs_base__suspension__short_read) {
      if (g_curr_token_end_src_index != g_src.meta.ri) {
        return "main: internal error: inconsistent g_src indexes";
      }
      TRY(read_src());
      g_curr_token_end_src_index = g_src.meta.ri;
    } else if (status.repr == wuffs_base__suspension__short_write) {
      g_tok.compact();
    } else {
      return status.message();
    }
  }
end_of_data:

  // With a non-empty g_query, don't try to consume trailing whitespace or
  // confirm that we've processed all the tokens.
  if (g_flags.query_c_string && *g_flags.query_c_string) {
    return nullptr;
  }

  // Check that we've exhausted the input.
  if ((g_src.meta.ri == g_src.meta.wi) && !g_src.meta.closed) {
    TRY(read_src());
  }
  if ((g_src.meta.ri < g_src.meta.wi) || !g_src.meta.closed) {
    return "main: valid JSON|CBOR followed by further (unexpected) data";
  }

  // Check that we've used all of the decoded tokens, other than trailing
  // filler tokens. For example, "true\n" is valid JSON (and fully consumed
  // with WUFFS_JSON__QUIRK_ALLOW_TRAILING_NEW_LINE enabled) with a trailing
  // filler token for the "\n".
  for (; g_tok.meta.ri < g_tok.meta.wi; g_tok.meta.ri++) {
    if (g_tok.data.ptr[g_tok.meta.ri].value_base_category() !=
        WUFFS_BASE__TOKEN__VBC__FILLER) {
      return "main: internal error: decoded OK but unprocessed tokens remain";
    }
  }

  return nullptr;
}

int  //
compute_exit_code(const char* status_msg) {
  if (!status_msg) {
    return 0;
  }
  size_t n;
  if (status_msg == g_usage) {
    n = strlen(status_msg);
  } else {
    n = strnlen(status_msg, 2047);
    if (n >= 2047) {
      status_msg = "main: internal error: error message is too long";
      n = strnlen(status_msg, 2047);
    }
  }
  const int stderr_fd = 2;
  ignore_return_value(write(stderr_fd, status_msg, n));
  ignore_return_value(write(stderr_fd, "\n", 1));
  // Return an exit code of 1 for regular (foreseen) errors, e.g. badly
  // formatted or unsupported input.
  //
  // Return an exit code of 2 for internal (exceptional) errors, e.g. defensive
  // run-time checks found that an internal invariant did not hold.
  //
  // Automated testing, including badly formatted inputs, can therefore
  // discriminate between expected failure (exit code 1) and unexpected failure
  // (other non-zero exit codes). Specifically, exit code 2 for internal
  // invariant violation, exit code 139 (which is 128 + SIGSEGV on x86_64
  // linux) for a segmentation fault (e.g. null pointer dereference).
  return strstr(status_msg, "internal error:") ? 2 : 1;
}

int  //
main(int argc, char** argv) {
  // Look for an input filename (the first non-flag argument) in argv. If there
  // is one, open it (but do not read from it) before we self-impose a sandbox.
  //
  // Flags start with "-", unless they come after a bare "--" arg.
  {
    bool dash_dash = false;
    int a;
    for (a = 1; a < argc; a++) {
      char* arg = argv[a];
      if ((arg[0] == '-') && !dash_dash) {
        dash_dash = (arg[1] == '-') && (arg[2] == '\x00');
        continue;
      }
      g_input_file_descriptor = open(arg, O_RDONLY);
      if (g_input_file_descriptor < 0) {
        fprintf(stderr, "%s: %s\n", arg, strerror(errno));
        return 1;
      }
      break;
    }
  }

#if defined(WUFFS_EXAMPLE_USE_SECCOMP)
  prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT);
  g_sandboxed = true;
#endif

  const char* z = main1(argc, argv);
  if (g_wrote_to_dst) {
    const char* z1 = (g_flags.output_format == file_format::json)
                         ? write_dst("\n", 1)
                         : nullptr;
    const char* z2 = flush_dst();
    z = z ? z : (z1 ? z1 : z2);
  }
  int exit_code = compute_exit_code(z);

#if defined(WUFFS_EXAMPLE_USE_SECCOMP)
  // Call SYS_exit explicitly, instead of calling SYS_exit_group implicitly by
  // either calling _exit or returning from main. SECCOMP_MODE_STRICT allows
  // only SYS_exit.
  syscall(SYS_exit, exit_code);
#endif
  return exit_code;
}
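
/*
The argv scan at the top of main can be read on its own as "find the first
non-flag argument". The sketch below extracts just that loop. It is an
illustration, not part of jsonptr: first_non_flag_arg is a hypothetical helper
that returns an argv index (or -1 if there is none) instead of opening a file.

```cpp
#include <cassert>

// Returns the index of the first argument that is not a flag, or -1 if every
// argument is a flag. An argument starting with '-' is a flag, unless it
// comes after a bare "--" argument.
int first_non_flag_arg(int argc, const char* const* argv) {
  assert(argv != nullptr);
  bool dash_dash = false;
  for (int a = 1; a < argc; a++) {
    const char* arg = argv[a];
    if ((arg[0] == '-') && !dash_dash) {
      // The && short-circuits, so arg[2] is only read when arg[1] is '-'.
      dash_dash = (arg[1] == '-') && (arg[2] == '\x00');
      continue;
    }
    return a;
  }
  return -1;
}
```

The bare "--" matters for file names that themselves start with '-': after it,
such a name is treated as the input file name and not as a flag.
*/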