// Copyright 2020 The Wuffs Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//    https://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

// ----------------

/*
jsonptr is a JSON formatter (pretty-printer) that supports the JSON Pointer
(RFC 6901) query syntax. It reads CBOR or UTF-8 JSON from stdin and writes CBOR
or canonicalized, formatted UTF-8 JSON to stdout.

See the "const char* g_usage" string below for details.

----

JSON Pointer (and this program's implementation) is one of many JSON query
languages and JSON tools, such as jq, jql and JMESPath. This one is relatively
simple and fewer-featured compared to those others.

One benefit of simplicity is that this program's CBOR, JSON and JSON Pointer
implementations do not dynamically allocate or free memory (yet it does not
require that the entire input fits in memory at once). They are therefore
trivially protected against certain bug classes: memory leaks, double-frees and
use-after-frees.

The CBOR and JSON implementations are also written in the Wuffs programming
language (and then transpiled to C/C++), which is not only memory-safe (e.g.
array indexing is bounds-checked) but also prevents integer arithmetic
overflows.

For defense in depth, on Linux, this program also self-imposes a
SECCOMP_MODE_STRICT sandbox before reading (or otherwise processing) its input
or writing its output. Under this sandbox, the only permitted system calls are
read, write, exit and sigreturn.

Altogether, this program aims to safely handle untrusted CBOR or JSON files
without fear of security bugs such as remote code execution.

----

As of 2020-02-24, this program passes all 318 "test_parsing" cases from the
JSON test suite (https://github.com/nst/JSONTestSuite), an appendix to the
"Parsing JSON is a Minefield" article (http://seriot.ch/parsing_json.php) that
was first published on 2016-10-26 and updated on 2018-03-30.

After modifying this program, run "build-example.sh example/jsonptr/" and then
"script/run-json-test-suite.sh" to catch correctness regressions.

----

This program uses Wuffs' JSON decoder at a relatively low level, processing the
decoder's token-stream output individually. The core loop, in pseudo-code, is
"for_each_token { handle_token(etc); }", where the handle_token function
changes global state (e.g. the `g_depth` and `g_ctx` variables) and prints
output text based on that state and the token's source text. Notably,
handle_token is not recursive, even though JSON values can nest.

This approach is centered around JSON tokens. Each JSON 'thing' (e.g. number,
string, object) comprises one or more JSON tokens.
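
The flat loop described above can be sketched in isolation. This is a
hypothetical, self-contained illustration rather than this program's code.
The Tok enum and the format_tokens function are invented here and stand in
for Wuffs' real token types. The token sequence is assumed to be well-formed,
as a validating decoder would guarantee. Nesting is tracked by a depth counter,
playing the role of g_depth, instead of by recursion.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified token kinds. Wuffs' real tokens instead pack a
// value base category, detail bits and a length into a single 64-bit value.
enum class Tok { begin_list, end_list, begin_dict, end_dict, scalar };

// One flat for_each_token loop. Each token is handled exactly once, and the
// depth counter (the analogue of g_depth) replaces a recursive descent.
std::string format_tokens(const std::vector<Tok>& toks) {
  std::string out;
  size_t depth = 0;
  for (Tok t : toks) {
    // A closing token is printed at its parent's depth.
    if (((t == Tok::end_list) || (t == Tok::end_dict)) && (depth > 0)) {
      depth--;
    }
    out.append(2 * depth, ' ');
    switch (t) {
      case Tok::begin_list:
        out += '[';
        break;
      case Tok::end_list:
        out += ']';
        break;
      case Tok::begin_dict:
        out += '{';
        break;
      case Tok::end_dict:
        out += '}';
        break;
      case Tok::scalar:
        out += 'v';
        break;
    }
    out += '\n';
    // An opening token indents everything up to its matching close.
    if ((t == Tok::begin_list) || (t == Tok::begin_dict)) {
      depth++;
    }
  }
  return out;
}
```

The only per-nesting state is one integer, so stack usage stays constant
however deeply the input nests. jsonptr additionally tracks g_ctx (for
example, whether it is in a list after a bracket or after a value), which this
sketch omits.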

An alternative, higher-level approach is in the sibling example/jsonfindptrs
program. Neither approach is better or worse per se, but when studying this
program, be aware that there are multiple ways to use Wuffs' JSON decoder.

The two programs, jsonfindptrs and jsonptr, also demonstrate different
trade-offs with regard to JSON object duplicate keys. The JSON spec permits
different implementations to allow or reject duplicate keys. It is not always
clear which approach is safer. Rejecting them is certainly unambiguous, and
security bugs can lurk in ambiguous corners of a file format, if two different
implementations both silently accept a file but differ on how to interpret it.
On the other hand, in the worst case, detecting duplicate keys requires O(N)
memory, where N is the size of the (potentially untrusted) input.

This program (jsonptr) allows duplicate keys and requires only O(1) memory. As
mentioned above, it doesn't dynamically allocate memory at all, and on Linux,
it runs in a SECCOMP_MODE_STRICT sandbox.

----

To run:

$CXX jsonptr.cc && ./a.out < ../../test/data/github-tags.json; rm -f a.out

for a C++ compiler $CXX, such as clang++ or g++.
*/

#if defined(__cplusplus) && (__cplusplus < 201103L)
#error "This C++ program requires -std=c++11 or later"
#endif

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

// Wuffs ships as a "single file C library" or "header file library" as per
// https://github.com/nothings/stb/blob/master/docs/stb_howto.txt
//
// To use that single file as a "foo.c"-like implementation, instead of a
// "foo.h"-like header, #define WUFFS_IMPLEMENTATION before #include'ing or
// compiling it.
#define WUFFS_IMPLEMENTATION

// Defining the WUFFS_CONFIG__MODULE* macros is optional, but it lets users of
// release/c/etc.c whitelist which parts of Wuffs to build. That file contains
// the entire Wuffs standard library, implementing a variety of codecs and file
// formats. Without this macro definition, an optimizing compiler or linker may
// very well discard Wuffs code for unused codecs, but listing the Wuffs
// modules we use makes that process explicit. Preprocessing means that such
// code simply isn't compiled.
#define WUFFS_CONFIG__MODULES
#define WUFFS_CONFIG__MODULE__BASE
#define WUFFS_CONFIG__MODULE__CBOR
#define WUFFS_CONFIG__MODULE__JSON

// If building this program in an environment that doesn't easily accommodate
// relative includes, you can use the script/inline-c-relative-includes.go
// program to generate a stand-alone C++ file.
#include "../../release/c/wuffs-unsupported-snapshot.c"

#if defined(__linux__)
#include <linux/prctl.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#define WUFFS_EXAMPLE_USE_SECCOMP
#endif

#define TRY(error_msg)         \
  do {                         \
    const char* z = error_msg; \
    if (z) {                   \
      return z;                \
    }                          \
  } while (false)

static const char* g_eod = "main: end of data";

static const char* g_usage =
    "Usage: jsonptr -flags input.json\n"
    "\n"
    "Flags:\n"
    "    -c      -compact-output\n"
    "    -d=NUM  -max-output-depth=NUM\n"
    "    -i=FMT  -input-format={json,cbor}\n"
    "    -o=FMT  -output-format={json,cbor}\n"
    "    -q=STR  -query=STR\n"
    "    -s=NUM  -spaces=NUM\n"
    "    -t      -tabs\n"
    "            -fail-if-unsandboxed\n"
    "            -input-allow-json-comments\n"
    "            -input-allow-json-extra-comma\n"
    "            -input-allow-json-inf-nan-numbers\n"
    "            -output-cbor-metadata-as-json-comments\n"
    "            -output-json-extra-comma\n"
    "            -output-json-inf-nan-numbers\n"
    "            -strict-json-pointer-syntax\n"
    "\n"
    "The input.json filename is optional. If absent, it reads from stdin.\n"
    "\n"
    "----\n"
    "\n"
    "jsonptr is a JSON formatter (pretty-printer) that supports the JSON\n"
    "Pointer (RFC 6901) query syntax. It reads CBOR or UTF-8 JSON from stdin\n"
    "and writes CBOR or canonicalized, formatted UTF-8 JSON to stdout. The\n"
    "input and output formats do not have to match, but conversion between\n"
    "formats may be lossy.\n"
    "\n"
    "Canonicalized JSON means that e.g. \"abc\\u000A\\tx\\u0177z\" is\n"
    "rewritten as \"abc\\n\\txŷz\". It does not sort object keys or reject\n"
    "duplicate keys. Canonicalization does not imply Unicode normalization.\n"
    "\n"
    "CBOR output is non-canonical (in the RFC 7049 Section 3.9 sense), as\n"
    "sorting map keys and measuring indefinite-length containers require\n"
    "O(input_length) memory but this program runs in O(1) memory.\n"
    "\n"
    "Formatted means that arrays' and objects' elements are indented, each\n"
    "on its own line. Configure this with the -c / -compact-output, -s=NUM /\n"
    "-spaces=NUM (for NUM ranging from 0 to 8) and -t / -tabs flags. Those\n"
    "flags only apply to JSON (not CBOR) output.\n"
    "\n"
    "The -input-format and -output-format flags select between reading and\n"
    "writing JSON (the default, a textual format) or CBOR (a binary format).\n"
    "\n"
    "The -input-allow-json-comments flag allows \"/*slash-star*/\" and\n"
    "\"//slash-slash\" C-style comments within JSON input.\n"
    "\n"
    "The -input-allow-json-extra-comma flag allows input like \"[1,2,]\",\n"
    "with a comma after the final element of a JSON list or dictionary.\n"
    "\n"
    "The -input-allow-json-inf-nan-numbers flag allows non-finite floating\n"
    "point numbers (infinities and not-a-numbers) within JSON input.\n"
    "\n"
    "The -output-cbor-metadata-as-json-comments flag writes CBOR tags and\n"
    "other metadata as /*comments*/, when -i=cbor and -o=json are also set.\n"
    "Such comments are non-compliant with the JSON specification but many\n"
    "parsers accept them.\n"
    "\n"
    "The -output-json-extra-comma flag writes extra commas, regardless of\n"
    "whether the input had them. Such commas are non-compliant with the JSON\n"
    "specification but many parsers accept them and they can produce simpler\n"
    "line-based diffs. This flag is ignored when -compact-output is set.\n"
    "\n"
    "The -output-json-inf-nan-numbers flag writes Inf and NaN instead of a\n"
    "substitute null value, when converting from -i=cbor to -o=json. Such\n"
    "values are non-compliant with the JSON specification but many parsers\n"
    "accept them.\n"
    "\n"
    "CBOR is more permissive about map keys but JSON only allows strings.\n"
    "When converting from -i=cbor to -o=json, this program rejects keys other\n"
    "than text strings and non-negative integers (CBOR major types 3 and 0).\n"
    "Integer keys like 123 are quoted to become string keys like \"123\".\n"
    "Being even more permissive would have complicated interactions with the\n"
    "-query=STR flag and streaming input, so this program rejects other keys.\n"
    "\n"
    "----\n"
    "\n"
    "The -q=STR or -query=STR flag gives an optional JSON Pointer query, to\n"
    "print a subset of the input. For example, given RFC 6901 section 5's\n"
    "sample input (https://tools.ietf.org/rfc/rfc6901.txt), this command:\n"
    "    jsonptr -query=/foo/1 rfc-6901-json-pointer.json\n"
    "will print:\n"
    "    \"baz\"\n"
    "\n"
    "An absent query is equivalent to the empty query, which identifies the\n"
    "entire input (the root value). Unlike a file system, the \"/\" query\n"
    "does not identify the root. Instead, \"\" is the root and \"/\" is the\n"
    "child (the value in a key-value pair) of the root whose key is the empty\n"
    "string. Similarly, \"/xyz\" and \"/xyz/\" are two different nodes.\n"
    "\n"
    "If the query found a valid JSON|CBOR value, this program will return a\n"
    "zero exit code even if the rest of the input isn't valid. If the query\n"
    "did not find a value, or found an invalid one, this program returns a\n"
    "non-zero exit code, but may still print partial output to stdout.\n"
    "\n"
    "The JSON and CBOR specifications (https://json.org/ or RFC 8259; RFC\n"
    "7049) permit implementations to allow duplicate keys, as this one does.\n"
    "This JSON Pointer implementation is also greedy, following the first\n"
    "match for each fragment without back-tracking. For example, the\n"
    "\"/foo/bar\" query will fail if the root object has multiple \"foo\"\n"
    "children but the first one doesn't have a \"bar\" child, even if later\n"
    "ones do.\n"
    "\n"
    "The -strict-json-pointer-syntax flag restricts the -query=STR string to\n"
    "exactly RFC 6901, with only two escape sequences: \"~0\" and \"~1\" for\n"
    "\"~\" and \"/\". Without this flag, this program also lets \"~n\" and\n"
    "\"~r\" escape the New Line and Carriage Return ASCII control characters,\n"
    "which can work better with line-oriented Unix tools that assume exactly\n"
    "one value (i.e. one JSON Pointer string) per line.\n"
    "\n"
    "----\n"
    "\n"
    "The -d=NUM or -max-output-depth=NUM flag gives the maximum (inclusive)\n"
    "output depth. JSON|CBOR containers ([] arrays and {} objects) can hold\n"
    "other containers. When this flag is set, containers at depth NUM are\n"
    "replaced with \"[…]\" or \"{…}\". A bare -d or -max-output-depth is\n"
    "equivalent to -d=1. The flag's absence means an unlimited output depth.\n"
    "\n"
    "The -max-output-depth flag only affects the program's output. It doesn't\n"
    "affect whether or not the input is considered valid JSON|CBOR. The\n"
    "format specifications permit implementations to set their own maximum\n"
    "input depth. This JSON|CBOR implementation sets it to 1024.\n"
    "\n"
    "Depth is measured in terms of nested containers. It is unaffected by the\n"
    "number of spaces or tabs used to indent.\n"
    "\n"
    "When both -max-output-depth and -query are set, the output depth is\n"
    "measured from when the query resolves, not from the input root. The\n"
    "input depth (measured from the root) is still limited to 1024.\n"
    "\n"
    "----\n"
    "\n"
    "The -fail-if-unsandboxed flag causes the program to exit if it does not\n"
    "self-impose a sandbox. On Linux, it self-imposes a SECCOMP_MODE_STRICT\n"
    "sandbox, regardless of whether this flag was set.";
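
/*
The query escaping rules described in the usage text above can be sketched
with two small helpers. This sketch assumes plain std::string input rather
than Wuffs slices. The names unescape_fragment and parse_array_index are
hypothetical and invented for this illustration. The program itself matches
fragments incrementally in the Query class below and parses array indexes with
wuffs_base__parse_number_u64. Unlike Query::validate, this sketch does not
check that the query is valid UTF-8.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Unescapes one JSON Pointer fragment, the text between two '/' separators.
// RFC 6901 defines only "~0" (for "~") and "~1" (for "/"). Unless strict is
// set, this also accepts "~n" and "~r" for New Line and Carriage Return.
// Returns false for any other '~' sequence, including a trailing '~'.
bool unescape_fragment(const std::string& frag,
                       bool strict,
                       std::string* out) {
  out->clear();
  for (size_t i = 0; i < frag.size(); i++) {
    if (frag[i] != '~') {
      out->push_back(frag[i]);
      continue;
    }
    i++;  // Skip the '~' and look at the escape code that follows it.
    if (i >= frag.size()) {
      return false;
    }
    switch (frag[i]) {
      case '0':
        out->push_back('~');
        break;
      case '1':
        out->push_back('/');
        break;
      case 'n':
      case 'r':
        if (strict) {
          return false;
        }
        out->push_back((frag[i] == 'n') ? '\n' : '\r');
        break;
      default:
        return false;
    }
  }
  return true;
}

// Parses an RFC 6901 array index: "0", or ASCII digits without a leading
// zero. Values that would overflow a uint64_t are also rejected.
bool parse_array_index(const std::string& frag, uint64_t* out) {
  if (frag.empty() || ((frag[0] == '0') && (frag.size() > 1))) {
    return false;
  }
  uint64_t v = 0;
  for (char c : frag) {
    if ((c < '0') || ('9' < c)) {
      return false;
    }
    uint64_t d = static_cast<uint64_t>(c - '0');
    if (v > ((UINT64_MAX - d) / 10)) {
      return false;
    }
    v = (10 * v) + d;
  }
  *out = v;
  return true;
}
```

For example, the "/a~1b~nc" query has a single fragment. Without
-strict-json-pointer-syntax it unescapes to "a/b" followed by a New Line and
"c". With that flag set, the "~n" makes the query invalid.
*/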

// ----

// Wuffs allows either statically or dynamically allocated work buffers. This
// program exercises static allocation.
#define WORK_BUFFER_ARRAY_SIZE \
  WUFFS_JSON__DECODER_WORKBUF_LEN_MAX_INCL_WORST_CASE
#if WORK_BUFFER_ARRAY_SIZE > 0
uint8_t g_work_buffer_array[WORK_BUFFER_ARRAY_SIZE];
#else
// Not all C/C++ compilers support 0-length arrays.
uint8_t g_work_buffer_array[1];
#endif

bool g_sandboxed = false;

int g_input_file_descriptor = 0;  // A 0 default means stdin.

#define MAX_INDENT 8
#define INDENT_SPACES_STRING "        "
#define INDENT_TAB_STRING "\t"

#ifndef DST_BUFFER_ARRAY_SIZE
#define DST_BUFFER_ARRAY_SIZE (32 * 1024)
#endif
#ifndef SRC_BUFFER_ARRAY_SIZE
#define SRC_BUFFER_ARRAY_SIZE (32 * 1024)
#endif
#ifndef TOKEN_BUFFER_ARRAY_SIZE
#define TOKEN_BUFFER_ARRAY_SIZE (4 * 1024)
#endif

uint8_t g_dst_array[DST_BUFFER_ARRAY_SIZE];
uint8_t g_src_array[SRC_BUFFER_ARRAY_SIZE];
wuffs_base__token g_tok_array[TOKEN_BUFFER_ARRAY_SIZE];

wuffs_base__io_buffer g_dst;
wuffs_base__io_buffer g_src;
wuffs_base__token_buffer g_tok;

// g_curr_token_end_src_index is the g_src.data.ptr index of the end of the
// current token. An invariant is that (g_curr_token_end_src_index <=
// g_src.meta.ri).
size_t g_curr_token_end_src_index;

// Valid tokens' VBCs (value base categories) range in 0 ..= 15. Values over
// that are for tokens from outside of the base package, such as the CBOR
// package.
#define CATEGORY_CBOR_TAG 16

struct {
  uint64_t category;
  uint64_t detail;
} g_token_extension;

bool g_previous_token_was_cbor_tag;

uint32_t g_depth;

enum class context {
  none,
  in_list_after_bracket,
  in_list_after_value,
  in_dict_after_brace,
  in_dict_after_key,
  in_dict_after_value,
} g_ctx;

bool  //
in_dict_before_key() {
  return (g_ctx == context::in_dict_after_brace) ||
         (g_ctx == context::in_dict_after_value);
}

uint32_t g_suppress_write_dst;
bool g_wrote_to_dst;

wuffs_cbor__decoder g_cbor_decoder;
wuffs_json__decoder g_json_decoder;
wuffs_base__token_decoder* g_dec;

// g_spool_array is a 4 KiB buffer.
//
// For -o=cbor, strings up to SPOOL_ARRAY_SIZE long are written as a single
// definite-length string. Longer strings are written as an indefinite-length
// string containing multiple definite-length chunks, each of length up to
// SPOOL_ARRAY_SIZE. See RFC 7049 section 2.2.2 "Indefinite-Length Byte Strings
// and Text Strings". Byte strings and text strings are spooled prior to this
// chunking, so that the output is determinate even when the input is streamed.
//
// For -o=json, CBOR byte strings are spooled prior to base64url encoding,
// which maps multiples of 3 source bytes to 4 destination bytes.
//
// If raising SPOOL_ARRAY_SIZE above 0xFFFF then you will also have to update
// flush_cbor_output_string.
#define SPOOL_ARRAY_SIZE 4096
uint8_t g_spool_array[SPOOL_ARRAY_SIZE];
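
/*
The chunking described above can be sketched for CBOR byte strings (major
type 2), assuming std::string buffers. This is a hypothetical illustration,
not this program's flush_cbor_output_string, and the function names are
invented. It assumes chunk_size is in 1 ..= 0xFFFF and does not handle heads
for lengths above 0xFFFF. Text strings (major type 3) use 0x60-based heads and
a 0x7F indefinite-length marker instead. RFC 7049 also requires each chunk of
a text string to end on a UTF-8 character boundary.

```cpp
#include <cstddef>
#include <string>

// Appends the CBOR head for a definite-length byte string (major type 2).
// Assumes n <= 0xFFFF; longer lengths would need the 0x5A or 0x5B heads.
void append_byte_string_head(std::string* out, size_t n) {
  if (n < 24) {
    out->push_back(static_cast<char>(0x40 | n));
  } else if (n <= 0xFF) {
    out->push_back(static_cast<char>(0x58));
    out->push_back(static_cast<char>(n));
  } else {
    out->push_back(static_cast<char>(0x59));
    out->push_back(static_cast<char>((n >> 8) & 0xFF));
    out->push_back(static_cast<char>(n & 0xFF));
  }
}

// Encodes s as a single definite-length byte string when it fits in
// chunk_size bytes. Otherwise it emits an indefinite-length byte string: the
// 0x5F marker, a series of definite-length chunks of at most chunk_size bytes
// each, then the 0xFF "break" stop code. Assumes 1 <= chunk_size <= 0xFFFF.
std::string encode_cbor_byte_string(const std::string& s, size_t chunk_size) {
  std::string out;
  if (s.size() <= chunk_size) {
    append_byte_string_head(&out, s.size());
    out += s;
    return out;
  }
  out.push_back(static_cast<char>(0x5F));
  for (size_t i = 0; i < s.size(); i += chunk_size) {
    std::string chunk = s.substr(i, chunk_size);
    append_byte_string_head(&out, chunk.size());
    out += chunk;
  }
  out.push_back(static_cast<char>(0xFF));
  return out;
}
```

With chunk_size set to SPOOL_ARRAY_SIZE (4096), a 300-byte input becomes the
three-byte head 0x59 0x01 0x2C followed by the bytes themselves. A 10000-byte
input becomes 0x5F, then chunks of 4096, 4096 and 1808 bytes, then 0xFF.
*/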

uint32_t g_cbor_output_string_length;
bool g_cbor_output_string_is_multiple_chunks;
bool g_cbor_output_string_is_utf_8;

uint32_t g_json_output_byte_string_length;

// ----

// Query is a JSON Pointer query. After initializing with a NUL-terminated C
// string, its multiple fragments are consumed as the program walks the JSON
// data from stdin. For example, letting "$" denote a NUL, suppose that we
// started with a query string of "/apple/banana/12/durian" and are currently
// trying to match the second fragment, "banana", so that Query::m_depth is 2:
//
// +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
// / a p p l e / b a n a n a / 1 2 / d u r i a n $
// +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
//               ^           ^
//               m_frag_i    m_frag_k
//
// The two pointers m_frag_i and m_frag_k (abbreviated as mfi and mfk) are the
// start (inclusive) and end (exclusive) of the query fragment. They satisfy
// (mfi <= mfk) and may be equal if the fragment is empty (note that "" is a
// valid JSON object key).
//
// The m_frag_j (mfj) pointer moves between these two, or is nullptr. An
// invariant is that (((mfi <= mfj) && (mfj <= mfk)) || (mfj == nullptr)).
//
// Wuffs' JSON tokenizer can portray a single JSON string as multiple Wuffs
// tokens, as backslash-escaped values within that JSON string may each get
// their own token.
//
// At the start of each object key (a JSON string), mfj is set to mfi.
//
// While mfj remains non-nullptr, each token's unescaped contents are then
// compared to that part of the fragment from mfj to mfk. If it is a prefix
// (including the case of an exact match), then mfj is advanced by the
// unescaped length. Otherwise, mfj is set to nullptr.
//
// Comparison accounts for JSON Pointer's escaping notation: "~0" and "~1" in
// the query (not the JSON value) are unescaped to "~" and "/" respectively.
// "~n" and "~r" are also unescaped to "\n" and "\r". The program is
// responsible for calling Query::validate (with a strict_json_pointer_syntax
// argument) before otherwise using this class.
//
// The mfj pointer therefore advances from mfi to mfk, or drops out, as we
// incrementally match the object key with the query fragment. For example, if
// we have already matched the "ban" of "banana", then we would accept any of
// an "ana" token, an "a" token or a "\u0061" token, amongst others. They would
// advance mfj by 3, 1 or 1 bytes respectively.
//
//                     mfj
//                     v
// +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
// / a p p l e / b a n a n a / 1 2 / d u r i a n $
// +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
//               ^           ^
//               mfi         mfk
//
// At the end of each object key (or equivalently, at the start of each object
// value), if mfj is non-nullptr and equal to (but not less than) mfk then we
// have a fragment match: the query fragment equals the object key. If there is
// a next fragment (in this example, "12") we move the frag_etc pointers to its
// start and end and increment Query::m_depth. Otherwise, we have matched the
// complete query, and the upcoming JSON value is the result of that query.
//
// The discussion above centers on object keys. If the query fragment is
// numeric then it can also match as an array index: the string fragment "12"
// will match an array's 13th element (counting from zero). See RFC 6901 for
// its precise definition of an "array index" number.
//
// Array index fragment match is represented by the Query::m_array_index field,
// whose type (wuffs_base__result_u64) is a result type. An error result means
// that the fragment is not an array index. A value result holds the array
// index itself, and Query::m_array_index_remaining holds the number of list
// elements remaining. When matching a query fragment in an array (instead of
// in an object), each element ticks that remaining count down towards zero. At
// zero, the upcoming JSON value is the one that matches the query fragment.
class Query {
 private:
  uint8_t* m_frag_i;
  uint8_t* m_frag_j;
  uint8_t* m_frag_k;

  uint32_t m_depth;

  wuffs_base__result_u64 m_array_index;
  uint64_t m_array_index_remaining;

 public:
  void reset(char* query_c_string) {
    m_frag_i = (uint8_t*)query_c_string;
    m_frag_j = (uint8_t*)query_c_string;
    m_frag_k = (uint8_t*)query_c_string;
    m_depth = 0;
    m_array_index.status.repr = "#main: not an array index query fragment";
    m_array_index.value = 0;
    m_array_index_remaining = 0;
  }

  void restart_fragment(bool enable) { m_frag_j = enable ? m_frag_i : nullptr; }

  bool is_at(uint32_t depth) { return m_depth == depth; }

  // tick returns whether the fragment is a valid array index whose value is
  // zero. If valid but non-zero, it decrements it and returns false.
  bool tick() {
    if (m_array_index.status.is_ok()) {
      if (m_array_index_remaining == 0) {
        return true;
      }
      m_array_index_remaining--;
    }
    return false;
  }

  // next_fragment moves to the next fragment, returning whether it existed.
  bool next_fragment() {
    uint8_t* k = m_frag_k;
    uint32_t d = m_depth;

    this->reset(nullptr);

    if (!k || (*k != '/')) {
      return false;
    }
    k++;

    bool all_digits = true;
    uint8_t* i = k;
    while ((*k != '\x00') && (*k != '/')) {
      all_digits = all_digits && ('0' <= *k) && (*k <= '9');
      k++;
    }
    m_frag_i = i;
    m_frag_j = i;
    m_frag_k = k;
    m_depth = d + 1;
    if (all_digits) {
      // wuffs_base__parse_number_u64 rejects leading zeroes, e.g. "00", "07".
      m_array_index = wuffs_base__parse_number_u64(
          wuffs_base__make_slice_u8(i, k - i),
          WUFFS_BASE__PARSE_NUMBER_XXX__DEFAULT_OPTIONS);
      m_array_index_remaining = m_array_index.value;
    }
    return true;
  }

  bool matched_all() { return m_frag_k == nullptr; }

  bool matched_fragment() { return m_frag_j && (m_frag_j == m_frag_k); }

  void restart_and_match_unsigned_number(bool enable, uint64_t u) {
    m_frag_j =
        (enable && (m_array_index.status.is_ok()) && (m_array_index.value == u))
            ? m_frag_k
            : nullptr;
  }

  void incremental_match_slice(uint8_t* ptr, size_t len) {
    if (!m_frag_j) {
      return;
    }
    uint8_t* j = m_frag_j;
    while (true) {
      if (len == 0) {
        m_frag_j = j;
        return;
      }

      if (*j == '\x00') {
        break;

      } else if (*j == '~') {
        j++;
        if (*j == '0') {
          if (*ptr != '~') {
            break;
          }
        } else if (*j == '1') {
          if (*ptr != '/') {
            break;
          }
        } else if (*j == 'n') {
          if (*ptr != '\n') {
            break;
          }
        } else if (*j == 'r') {
          if (*ptr != '\r') {
            break;
          }
        } else {
          break;
        }

      } else if (*j != *ptr) {
        break;
      }

      j++;
      ptr++;
      len--;
    }
    m_frag_j = nullptr;
  }

  void incremental_match_code_point(uint32_t code_point) {
    if (!m_frag_j) {
      return;
    }
    uint8_t u[WUFFS_BASE__UTF_8__BYTE_LENGTH__MAX_INCL];
    size_t n = wuffs_base__utf_8__encode(
        wuffs_base__make_slice_u8(&u[0],
                                  WUFFS_BASE__UTF_8__BYTE_LENGTH__MAX_INCL),
        code_point);
    if (n > 0) {
      this->incremental_match_slice(&u[0], n);
    }
  }

  // validate returns whether the (query_c_string, length) arguments form a
  // valid JSON Pointer. In particular, it must be valid UTF-8, and either be
  // empty or start with a '/'. Any '~' within must immediately be followed by
  // either '0' or '1'. If strict_json_pointer_syntax is false, a '~' may also
  // be followed by either 'n' or 'r'.
  static bool validate(char* query_c_string,
                       size_t length,
                       bool strict_json_pointer_syntax) {
    if (length <= 0) {
      return true;
    }
    if (query_c_string[0] != '/') {
      return false;
    }
    wuffs_base__slice_u8 s =
        wuffs_base__make_slice_u8((uint8_t*)query_c_string, length);
    bool previous_was_tilde = false;
    while (s.len > 0) {
      wuffs_base__utf_8__next__output o = wuffs_base__utf_8__next(s.ptr, s.len);
      if (!o.is_valid()) {
        return false;
      }

      if (previous_was_tilde) {
        switch (o.code_point) {
          case '0':
          case '1':
            break;
          case 'n':
          case 'r':
            if (strict_json_pointer_syntax) {
              return false;
            }
            break;
          default:
            return false;
        }
      }
640 previous_was_tilde = o.code_point == '~';
Nigel Taod6fdfb12020-03-11 12:24:14 +1100641
Nigel Tao0cd2f982020-03-03 23:03:02 +1100642 s.ptr += o.byte_length;
643 s.len -= o.byte_length;
644 }
645 return !previous_was_tilde;
646 }
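
  // Illustrative examples of what validate accepts. This is a sketch that
  // applies the rules in the comment above; it is not an exhaustive test:
  //
  //   ""        valid (the empty pointer refers to the whole document)
  //   "/"       valid (refers to the member whose key is "")
  //   "/a~1b"   valid ("~1" stands for '/', so this matches the key "a/b")
  //   "/m~0n"   valid ("~0" stands for '~', so this matches the key "m~n")
  //   "a/b"     invalid (a non-empty pointer must start with '/')
  //   "/x~"     invalid (a trailing '~' has nothing to escape)
  //   "/x~2"    invalid ('~' must be followed by '0' or '1')
  //   "/x~n"    valid only when strict_json_pointer_syntax is false, in
  //             which case incremental_match_slice matches "~n" to '\n'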
} g_query;

// ----

enum class file_format {
  json,
  cbor,
};

struct {
  int remaining_argc;
  char** remaining_argv;

  bool compact_output;
  bool fail_if_unsandboxed;
  file_format input_format;
  bool input_allow_json_comments;
  bool input_allow_json_extra_comma;
  bool input_allow_json_inf_nan_numbers;
  uint32_t max_output_depth;
  file_format output_format;
  bool output_cbor_metadata_as_json_comments;
  bool output_json_extra_comma;
  bool output_json_inf_nan_numbers;
  char* query_c_string;
  size_t spaces;
  bool strict_json_pointer_syntax;
  bool tabs;
} g_flags = {0};

const char* //
parse_flags(int argc, char** argv) {
  g_flags.spaces = 4;
  g_flags.max_output_depth = 0xFFFFFFFF;

  int c = (argc > 0) ? 1 : 0;  // Skip argv[0], the program name.
  for (; c < argc; c++) {
    char* arg = argv[c];
    if (*arg++ != '-') {
      break;
    }

    // A double-dash "--foo" is equivalent to a single-dash "-foo". As special
    // cases, a bare "-" is not a flag (some programs may interpret it as
    // stdin) and a bare "--" means to stop parsing flags.
    if (*arg == '\x00') {
      break;
    } else if (*arg == '-') {
      arg++;
      if (*arg == '\x00') {
        c++;
        break;
      }
    }

    if (!strcmp(arg, "c") || !strcmp(arg, "compact-output")) {
      g_flags.compact_output = true;
      continue;
    }
    if (!strcmp(arg, "d") || !strcmp(arg, "max-output-depth")) {
      g_flags.max_output_depth = 1;
      continue;
    } else if (!strncmp(arg, "d=", 2) ||
               !strncmp(arg, "max-output-depth=", 17)) {
      // Compare all 17 bytes, including the '=', so that the loop below is
      // guaranteed to find an '=' before the terminating NUL.
      while (*arg++ != '=') {
      }
      wuffs_base__result_u64 u = wuffs_base__parse_number_u64(
          wuffs_base__make_slice_u8((uint8_t*)arg, strlen(arg)),
          WUFFS_BASE__PARSE_NUMBER_XXX__DEFAULT_OPTIONS);
      if (u.status.is_ok() && (u.value <= 0xFFFFFFFF)) {
        g_flags.max_output_depth = (uint32_t)(u.value);
        continue;
      }
      return g_usage;
    }
    if (!strcmp(arg, "fail-if-unsandboxed")) {
      g_flags.fail_if_unsandboxed = true;
      continue;
    }
    if (!strcmp(arg, "i=cbor") || !strcmp(arg, "input-format=cbor")) {
      g_flags.input_format = file_format::cbor;
      continue;
    }
    if (!strcmp(arg, "i=json") || !strcmp(arg, "input-format=json")) {
      g_flags.input_format = file_format::json;
      continue;
    }
    if (!strcmp(arg, "input-allow-json-comments")) {
      g_flags.input_allow_json_comments = true;
      continue;
    }
    if (!strcmp(arg, "input-allow-json-extra-comma")) {
      g_flags.input_allow_json_extra_comma = true;
      continue;
    }
    if (!strcmp(arg, "input-allow-json-inf-nan-numbers")) {
      g_flags.input_allow_json_inf_nan_numbers = true;
      continue;
    }
    if (!strcmp(arg, "o=cbor") || !strcmp(arg, "output-format=cbor")) {
      g_flags.output_format = file_format::cbor;
      continue;
    }
    if (!strcmp(arg, "o=json") || !strcmp(arg, "output-format=json")) {
      g_flags.output_format = file_format::json;
      continue;
    }
    if (!strcmp(arg, "output-cbor-metadata-as-json-comments")) {
      g_flags.output_cbor_metadata_as_json_comments = true;
      continue;
    }
    if (!strcmp(arg, "output-json-extra-comma")) {
      g_flags.output_json_extra_comma = true;
      continue;
    }
    if (!strcmp(arg, "output-json-inf-nan-numbers")) {
      g_flags.output_json_inf_nan_numbers = true;
      continue;
    }
    if (!strncmp(arg, "q=", 2) || !strncmp(arg, "query=", 6)) {
      while (*arg++ != '=') {
      }
      g_flags.query_c_string = arg;
      continue;
    }
    if (!strncmp(arg, "s=", 2) || !strncmp(arg, "spaces=", 7)) {
      while (*arg++ != '=') {
      }
      if (('0' <= arg[0]) && (arg[0] <= '8') && (arg[1] == '\x00')) {
        g_flags.spaces = arg[0] - '0';
        continue;
      }
      return g_usage;
    }
    if (!strcmp(arg, "strict-json-pointer-syntax")) {
      g_flags.strict_json_pointer_syntax = true;
      continue;
    }
    if (!strcmp(arg, "t") || !strcmp(arg, "tabs")) {
      g_flags.tabs = true;
      continue;
    }

    return g_usage;
  }

  if (g_flags.query_c_string &&
      !Query::validate(g_flags.query_c_string, strlen(g_flags.query_c_string),
                       g_flags.strict_json_pointer_syntax)) {
    return "main: bad JSON Pointer (RFC 6901) syntax for the -query=STR flag";
  }

  g_flags.remaining_argc = argc - c;
  g_flags.remaining_argv = argv + c;
  return nullptr;
}
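
// Example invocations. These are illustrative only: the input file names are
// hypothetical, and g_usage remains the authoritative description of each
// flag. As parse_flags shows, each flag may be written with one or two
// leading dashes:
//
//   jsonptr -q=/foo/0 < in.json
//       Query "/foo/0": element 0 of an array, or the member keyed "0" of an
//       object, under the top-level "foo" key.
//   jsonptr --input-format=cbor -o=json < in.cbor
//       Read CBOR input and write JSON output.
//   jsonptr -c -strict-json-pointer-syntax -query=/a~1b < in.json
//       Compact output. The query is validated without the non-standard "~n"
//       and "~r" escapes; "/a~1b" matches the key "a/b".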

const char* //
initialize_globals(int argc, char** argv) {
  g_dst = wuffs_base__make_io_buffer(
      wuffs_base__make_slice_u8(g_dst_array, DST_BUFFER_ARRAY_SIZE),
      wuffs_base__empty_io_buffer_meta());

  g_src = wuffs_base__make_io_buffer(
      wuffs_base__make_slice_u8(g_src_array, SRC_BUFFER_ARRAY_SIZE),
      wuffs_base__empty_io_buffer_meta());

  g_tok = wuffs_base__make_token_buffer(
      wuffs_base__make_slice_token(g_tok_array, TOKEN_BUFFER_ARRAY_SIZE),
      wuffs_base__empty_token_buffer_meta());

  g_curr_token_end_src_index = 0;

  g_token_extension.category = 0;
  g_token_extension.detail = 0;

  g_previous_token_was_cbor_tag = false;

  g_depth = 0;

  g_ctx = context::none;

  TRY(parse_flags(argc, argv));
  if (g_flags.fail_if_unsandboxed && !g_sandboxed) {
    return "main: unsandboxed";
  }
  const int stdin_fd = 0;
  if (g_flags.remaining_argc >
      ((g_input_file_descriptor != stdin_fd) ? 1 : 0)) {
    return g_usage;
  }

  g_query.reset(g_flags.query_c_string);

  // If the query is non-empty, suppress writing to stdout until we've
  // completed the query.
  g_suppress_write_dst = g_query.next_fragment() ? 1 : 0;
  g_wrote_to_dst = false;

  if (g_flags.input_format == file_format::json) {
    TRY(g_json_decoder
            .initialize(sizeof__wuffs_json__decoder(), WUFFS_VERSION, 0)
            .message());
    g_dec = g_json_decoder.upcast_as__wuffs_base__token_decoder();
  } else {
    TRY(g_cbor_decoder
            .initialize(sizeof__wuffs_cbor__decoder(), WUFFS_VERSION, 0)
            .message());
    g_dec = g_cbor_decoder.upcast_as__wuffs_base__token_decoder();
  }

  if (g_flags.input_allow_json_comments) {
    g_dec->set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_COMMENT_BLOCK, true);
    g_dec->set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_COMMENT_LINE, true);
  }
  if (g_flags.input_allow_json_extra_comma) {
    g_dec->set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_EXTRA_COMMA, true);
  }
  if (g_flags.input_allow_json_inf_nan_numbers) {
    g_dec->set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_INF_NAN_NUMBERS, true);
  }

  // Consume an optional whitespace trailer. This isn't part of the JSON spec,
  // but it works better with line oriented Unix tools (such as "echo 123 |
  // jsonptr" where it's "echo", not "echo -n") or hand-edited JSON files which
  // can accidentally contain trailing whitespace.
  g_dec->set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_TRAILING_NEW_LINE, true);

  return nullptr;
}

// ----

// ignore_return_value suppresses errors from -Wall -Werror.
static void //
ignore_return_value(int ignored) {}

const char* //
read_src() {
  if (g_src.meta.closed) {
    return "main: internal error: read requested on a closed source";
  }
  g_src.compact();
  if (g_src.meta.wi >= g_src.data.len) {
    return "main: g_src buffer is full";
  }
  while (true) {
    ssize_t n = read(g_input_file_descriptor, g_src.writer_pointer(),
                     g_src.writer_length());
    if (n >= 0) {
      g_src.meta.wi += n;
      g_src.meta.closed = n == 0;
      break;
    } else if (errno != EINTR) {
      return strerror(errno);
    }
  }
  return nullptr;
}

const char* //
flush_dst() {
  while (true) {
    size_t n = g_dst.reader_length();
    if (n == 0) {
      break;
    }
    const int stdout_fd = 1;
    ssize_t i = write(stdout_fd, g_dst.reader_pointer(), n);
    if (i >= 0) {
      g_dst.meta.ri += i;
    } else if (errno != EINTR) {
      return strerror(errno);
    }
  }
  g_dst.compact();
  return nullptr;
}

const char* //
write_dst(const void* s, size_t n) {
  if (g_suppress_write_dst > 0) {
    return nullptr;
  }
  const uint8_t* p = static_cast<const uint8_t*>(s);
  while (n > 0) {
    size_t i = g_dst.writer_length();
    if (i == 0) {
      const char* z = flush_dst();
      if (z) {
        return z;
      }
      i = g_dst.writer_length();
      if (i == 0) {
        return "main: g_dst buffer is full";
      }
    }

    if (i > n) {
      i = n;
    }
    memcpy(g_dst.data.ptr + g_dst.meta.wi, p, i);
    g_dst.meta.wi += i;
    p += i;
    n -= i;
    g_wrote_to_dst = true;
  }
  return nullptr;
}

// ----

const char* //
write_literal(uint64_t vbd) {
  const char* ptr = nullptr;
  size_t len = 0;
  if (vbd & WUFFS_BASE__TOKEN__VBD__LITERAL__UNDEFINED) {
    if (g_flags.output_format == file_format::json) {
      // JSON's closest approximation to "undefined" is "null".
      if (g_flags.output_cbor_metadata_as_json_comments) {
        ptr = "/*cbor:undefined*/null";
        len = 22;
      } else {
        ptr = "null";
        len = 4;
      }
    } else {
      ptr = "\xF7";
      len = 1;
    }
  } else if (vbd & WUFFS_BASE__TOKEN__VBD__LITERAL__NULL) {
    if (g_flags.output_format == file_format::json) {
      ptr = "null";
      len = 4;
    } else {
      ptr = "\xF6";
      len = 1;
    }
  } else if (vbd & WUFFS_BASE__TOKEN__VBD__LITERAL__FALSE) {
    if (g_flags.output_format == file_format::json) {
      ptr = "false";
      len = 5;
    } else {
      ptr = "\xF4";
      len = 1;
    }
  } else if (vbd & WUFFS_BASE__TOKEN__VBD__LITERAL__TRUE) {
    if (g_flags.output_format == file_format::json) {
      ptr = "true";
      len = 4;
    } else {
      ptr = "\xF5";
      len = 1;
    }
  } else {
    return "main: internal error: unexpected write_literal argument";
  }
  return write_dst(ptr, len);
}

// ----

const char* //
write_number_as_cbor_f64(double f) {
  uint8_t buf[9];
  wuffs_base__lossy_value_u16 lv16 =
      wuffs_base__ieee_754_bit_representation__from_f64_to_u16_truncate(f);
  if (!lv16.lossy) {
    buf[0] = 0xF9;
    wuffs_base__store_u16be__no_bounds_check(&buf[1], lv16.value);
    return write_dst(&buf[0], 3);
  }
  wuffs_base__lossy_value_u32 lv32 =
      wuffs_base__ieee_754_bit_representation__from_f64_to_u32_truncate(f);
  if (!lv32.lossy) {
    buf[0] = 0xFA;
    wuffs_base__store_u32be__no_bounds_check(&buf[1], lv32.value);
    return write_dst(&buf[0], 5);
  }
  buf[0] = 0xFB;
  wuffs_base__store_u64be__no_bounds_check(
      &buf[1], wuffs_base__ieee_754_bit_representation__from_f64_to_u64(f));
  return write_dst(&buf[0], 9);
}
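
// Worked examples. These are illustrative; the byte sequences match the
// corresponding entries in RFC 8949 Appendix A. The float16 (F9) or float32
// (FA) form is used only when converting f to it is not lossy. Otherwise the
// float64 (FB) form is used:
//
//   1.5      -> F9 3E 00                     (exact in float16)
//   100000.0 -> FA 47 C3 50 00               (beyond float16's range, exact
//                                             in float32)
//   1.1      -> FB 3F F1 99 99 99 99 99 9A   (not exact in float32)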

const char* //
write_number_as_cbor_u64(uint8_t base, uint64_t u) {
  uint8_t buf[9];
  if (u < 0x18) {
    buf[0] = base | ((uint8_t)u);
    return write_dst(&buf[0], 1);
  } else if ((u >> 8) == 0) {
    buf[0] = base | 0x18;
    buf[1] = ((uint8_t)u);
    return write_dst(&buf[0], 2);
  } else if ((u >> 16) == 0) {
    buf[0] = base | 0x19;
    wuffs_base__store_u16be__no_bounds_check(&buf[1], ((uint16_t)u));
    return write_dst(&buf[0], 3);
  } else if ((u >> 32) == 0) {
    buf[0] = base | 0x1A;
    wuffs_base__store_u32be__no_bounds_check(&buf[1], ((uint32_t)u));
    return write_dst(&buf[0], 5);
  }
  buf[0] = base | 0x1B;
  wuffs_base__store_u64be__no_bounds_check(&buf[1], u);
  return write_dst(&buf[0], 9);
}
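
// Worked examples. These are illustrative; the byte sequences match RFC 8949
// Appendix A. base is the CBOR major type shifted into the high 3 bits: 0x00
// for unsigned integers and 0x20 for negative integers. For a negative
// integer n, the argument is (-1 - n), which write_number computes as
// ~ri.value:
//
//   write_number_as_cbor_u64(0x00, 23)   -> 17
//   write_number_as_cbor_u64(0x00, 24)   -> 18 18
//   write_number_as_cbor_u64(0x00, 1000) -> 19 03 E8
//   write_number_as_cbor_u64(0x20, 999)  -> 39 03 E7   (encodes -1000)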

const char* //
write_number_as_json_f64(wuffs_base__slice_u8 s) {
  double f;
  switch (s.len) {
    case 3:
      f = wuffs_base__ieee_754_bit_representation__from_u16_to_f64(
          wuffs_base__load_u16be__no_bounds_check(s.ptr + 1));
      break;
    case 5:
      f = wuffs_base__ieee_754_bit_representation__from_u32_to_f64(
          wuffs_base__load_u32be__no_bounds_check(s.ptr + 1));
      break;
    case 9:
      f = wuffs_base__ieee_754_bit_representation__from_u64_to_f64(
          wuffs_base__load_u64be__no_bounds_check(s.ptr + 1));
      break;
    default:
      return "main: internal error: unexpected write_number_as_json_f64 len";
  }
  uint8_t buf[512];
  const uint32_t precision = 0;
  size_t n = wuffs_base__render_number_f64(
      wuffs_base__make_slice_u8(&buf[0], sizeof buf), f, precision,
      WUFFS_BASE__RENDER_NUMBER_FXX__JUST_ENOUGH_PRECISION);

  if (!g_flags.output_json_inf_nan_numbers) {
    // JSON numbers don't include Infinities or NaNs. For such numbers, their
    // IEEE 754 bit representation's 11 exponent bits are all on.
    uint64_t u = wuffs_base__ieee_754_bit_representation__from_f64_to_u64(f);
    if (((u >> 52) & 0x7FF) == 0x7FF) {
      if (g_flags.output_cbor_metadata_as_json_comments) {
        TRY(write_dst("/*cbor:", 7));
        TRY(write_dst(&buf[0], n));
        TRY(write_dst("*/", 2));
      }
      return write_dst("null", 4);
    }
  }

  return write_dst(&buf[0], n);
}

const char* //
write_cbor_minus_1_minus_x(wuffs_base__slice_u8 s) {
  if (g_flags.output_format == file_format::cbor) {
    return write_dst(s.ptr, s.len);
  }

  if (s.len != 9) {
    return "main: internal error: invalid ETC__MINUS_1_MINUS_X token length";
  }
  uint64_t u = 1 + wuffs_base__load_u64be__no_bounds_check(s.ptr + 1);
  if (u == 0) {
    // See the cbor.TOKEN_VALUE_MINOR__MINUS_1_MINUS_X comment re overflow.
    return write_dst("-18446744073709551616", 21);
  }
  uint8_t buf[1 + WUFFS_BASE__U64__BYTE_LENGTH__MAX_INCL];
  uint8_t* b = &buf[0];
  *b++ = '-';
  size_t n = wuffs_base__render_number_u64(
      wuffs_base__make_slice_u8(b, WUFFS_BASE__U64__BYTE_LENGTH__MAX_INCL), u,
      WUFFS_BASE__RENDER_NUMBER_XXX__DEFAULT_OPTIONS);
  return write_dst(&buf[0], 1 + n);
}
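
// Worked example (illustrative). CBOR encodes a negative integer n as the
// unsigned argument (-1 - n). The 9-byte token 3B 00 00 00 00 00 00 00 FF has
// argument 255, so u = 1 + 255 = 256 and the JSON output is "-256". The
// largest argument, 3B FF FF FF FF FF FF FF FF, makes u wrap around to 0. That
// is why its value, -(2**64), is spelled out as "-18446744073709551616".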

const char* //
write_cbor_simple_value(uint64_t tag, wuffs_base__slice_u8 s) {
  if (g_flags.output_format == file_format::cbor) {
    return write_dst(s.ptr, s.len);
  }

  if (!g_flags.output_cbor_metadata_as_json_comments) {
    return write_dst("null", 4);
  }
  uint8_t buf[WUFFS_BASE__U64__BYTE_LENGTH__MAX_INCL];
  size_t n = wuffs_base__render_number_u64(
      wuffs_base__make_slice_u8(&buf[0],
                                WUFFS_BASE__U64__BYTE_LENGTH__MAX_INCL),
      tag, WUFFS_BASE__RENDER_NUMBER_XXX__DEFAULT_OPTIONS);
  TRY(write_dst("/*cbor:simple", 13));
  TRY(write_dst(&buf[0], n));
  return write_dst("*/null", 6);
}

const char* //
write_cbor_tag(uint64_t tag, wuffs_base__slice_u8 s) {
  if (g_flags.output_format == file_format::cbor) {
    return write_dst(s.ptr, s.len);
  }

  if (!g_flags.output_cbor_metadata_as_json_comments) {
    return nullptr;
  }
  uint8_t buf[WUFFS_BASE__U64__BYTE_LENGTH__MAX_INCL];
  size_t n = wuffs_base__render_number_u64(
      wuffs_base__make_slice_u8(&buf[0],
                                WUFFS_BASE__U64__BYTE_LENGTH__MAX_INCL),
      tag, WUFFS_BASE__RENDER_NUMBER_XXX__DEFAULT_OPTIONS);
  TRY(write_dst("/*cbor:tag", 10));
  TRY(write_dst(&buf[0], n));
  return write_dst("*/", 2);
}

const char* //
write_number(uint64_t vbd, wuffs_base__slice_u8 s) {
  const uint64_t cfp_fbbe_fifb =
      WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_FLOATING_POINT |
      WUFFS_BASE__TOKEN__VBD__NUMBER__FORMAT_BINARY_BIG_ENDIAN |
      WUFFS_BASE__TOKEN__VBD__NUMBER__FORMAT_IGNORE_FIRST_BYTE;

  if (g_flags.output_format == file_format::json) {
    if (g_flags.input_format == file_format::json) {
      return write_dst(s.ptr, s.len);
    } else if ((vbd & cfp_fbbe_fifb) == cfp_fbbe_fifb) {
      return write_number_as_json_f64(s);
    }

    // From here on, (g_flags.output_format == file_format::cbor).
  } else if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__FORMAT_TEXT) {
    // First try to parse s as an integer. Something like
    // "1180591620717411303424" is a valid number (in the JSON sense) but will
    // overflow int64_t or uint64_t, so fall back to parsing it as a float64.
    if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_INTEGER_SIGNED) {
      if ((s.len > 0) && (s.ptr[0] == '-')) {
        wuffs_base__result_i64 ri = wuffs_base__parse_number_i64(
            s, WUFFS_BASE__PARSE_NUMBER_XXX__DEFAULT_OPTIONS);
        if (ri.status.is_ok()) {
          return write_number_as_cbor_u64(0x20, ~ri.value);
        }
      } else {
        wuffs_base__result_u64 ru = wuffs_base__parse_number_u64(
            s, WUFFS_BASE__PARSE_NUMBER_XXX__DEFAULT_OPTIONS);
        if (ru.status.is_ok()) {
          return write_number_as_cbor_u64(0x00, ru.value);
        }
      }
    }

    if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_FLOATING_POINT) {
      wuffs_base__result_f64 rf = wuffs_base__parse_number_f64(
          s, WUFFS_BASE__PARSE_NUMBER_XXX__DEFAULT_OPTIONS);
      if (rf.status.is_ok()) {
        return write_number_as_cbor_f64(rf.value);
      }
    }
  } else if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_NEG_INF) {
    return write_dst("\xF9\xFC\x00", 3);
  } else if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_POS_INF) {
    return write_dst("\xF9\x7C\x00", 3);
  } else if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_NEG_NAN) {
    return write_dst("\xF9\xFF\xFF", 3);
  } else if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_POS_NAN) {
    return write_dst("\xF9\x7F\xFF", 3);
  } else if ((vbd & cfp_fbbe_fifb) == cfp_fbbe_fifb) {
    return write_dst(s.ptr, s.len);
  }

  return "main: internal error: unexpected write_number argument";
}
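
// Worked example (illustrative). The four 3-byte literals above are CBOR
// float16 values (initial byte F9). A float16 has 1 sign bit, 5 exponent bits
// and 10 fraction bits. An all-ones exponent means Infinity (zero fraction)
// or NaN (non-zero fraction):
//
//   F9 7C 00  ->  0 11111 0000000000  ->  +Infinity
//   F9 FC 00  ->  1 11111 0000000000  ->  -Infinity
//   F9 7F FF  ->  0 11111 1111111111  ->  NaN, sign bit clear
//   F9 FF FF  ->  1 11111 1111111111  ->  NaN, sign bit set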

const char* //
write_inline_integer(uint64_t x, bool x_is_signed, wuffs_base__slice_u8 s) {
  bool is_key = in_dict_before_key();
  g_query.restart_and_match_unsigned_number(
      is_key && g_query.is_at(g_depth) && !x_is_signed, x);

  if (g_flags.output_format == file_format::cbor) {
    return write_dst(s.ptr, s.len);
  }

  if (is_key) {
    TRY(write_dst("\"", 1));
  }

  // Adding the two ETC__BYTE_LENGTH__ETC constants is overkill, but it's
  // simpler (for producing a constant-expression array size) than taking the
  // maximum of the two.
  uint8_t buf[WUFFS_BASE__I64__BYTE_LENGTH__MAX_INCL +
              WUFFS_BASE__U64__BYTE_LENGTH__MAX_INCL];
  wuffs_base__slice_u8 dst = wuffs_base__make_slice_u8(&buf[0], sizeof buf);
  size_t n =
      x_is_signed
          ? wuffs_base__render_number_i64(
                dst, (int64_t)x, WUFFS_BASE__RENDER_NUMBER_XXX__DEFAULT_OPTIONS)
          : wuffs_base__render_number_u64(
                dst, x, WUFFS_BASE__RENDER_NUMBER_XXX__DEFAULT_OPTIONS);
  TRY(write_dst(&buf[0], n));

  if (is_key) {
    TRY(write_dst("\"", 1));
  }
  return nullptr;
}

// ----

uint8_t //
hex_digit(uint8_t nibble) {
  nibble &= 0x0F;
  if (nibble <= 9) {
    return '0' + nibble;
  }
  return ('A' - 10) + nibble;
}

const char* //
flush_cbor_output_string() {
  uint8_t prefix[3];
  prefix[0] = g_cbor_output_string_is_utf_8 ? 0x60 : 0x40;
  if (g_cbor_output_string_length < 0x18) {
    prefix[0] |= g_cbor_output_string_length;
    TRY(write_dst(&prefix[0], 1));
  } else if (g_cbor_output_string_length <= 0xFF) {
    prefix[0] |= 0x18;
    prefix[1] = g_cbor_output_string_length;
    TRY(write_dst(&prefix[0], 2));
  } else if (g_cbor_output_string_length <= 0xFFFF) {
    prefix[0] |= 0x19;
    prefix[1] = g_cbor_output_string_length >> 8;
    prefix[2] = g_cbor_output_string_length;
    TRY(write_dst(&prefix[0], 3));
  } else {
    return "main: internal error: CBOR string output is too long";
  }

  size_t n = g_cbor_output_string_length;
  g_cbor_output_string_length = 0;
  return write_dst(&g_spool_array[0], n);
}
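
// Worked example (illustrative). The prefix is a CBOR head for major type 2
// (byte string, 0x40) or 3 (UTF-8 text string, 0x60), followed by the spooled
// length. A 5-byte text string such as "hello" gets the 1-byte prefix 65. A
// 300-byte byte string gets the 3-byte prefix 59 01 2C, that is 0x40 | 0x19
// followed by 300 as a big-endian uint16.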

const char*  //
write_cbor_output_string(wuffs_base__slice_u8 s, bool finish) {
  // Check that g_spool_array can hold any UTF-8 code point.
  if (SPOOL_ARRAY_SIZE < 4) {
    return "main: internal error: SPOOL_ARRAY_SIZE is too short";
  }

  uint8_t* ptr = s.ptr;
  size_t len = s.len;
  while (len > 0) {
    size_t available = SPOOL_ARRAY_SIZE - g_cbor_output_string_length;
    if (available >= len) {
      memcpy(&g_spool_array[g_cbor_output_string_length], ptr, len);
      g_cbor_output_string_length += len;
      ptr += len;
      len = 0;
      break;
    }

    // The rest of s does not fit in the spool, so this string becomes an
    // indefinite-length string of multiple chunks. Write that header before
    // flushing the first chunk, even when the spool is already exactly full
    // (available == 0).
    if (!g_cbor_output_string_is_multiple_chunks) {
      g_cbor_output_string_is_multiple_chunks = true;
      TRY(write_dst(g_cbor_output_string_is_utf_8 ? "\x7F" : "\x5F", 1));
    }

    if (g_cbor_output_string_is_utf_8) {
      // Walk the end backwards to a UTF-8 boundary, so that each chunk of
      // the multi-chunk string is also valid UTF-8.
      while (available > 0) {
        wuffs_base__utf_8__next__output o =
            wuffs_base__utf_8__next_from_end(ptr, available);
        if ((o.code_point != WUFFS_BASE__UNICODE_REPLACEMENT_CHARACTER) ||
            (o.byte_length != 1)) {
          break;
        }
        available--;
      }
    }

    if (available > 0) {
      memcpy(&g_spool_array[g_cbor_output_string_length], ptr, available);
      g_cbor_output_string_length += available;
      ptr += available;
      len -= available;
    }

    TRY(flush_cbor_output_string());
  }

  if (finish) {
    TRY(flush_cbor_output_string());
    if (g_cbor_output_string_is_multiple_chunks) {
      TRY(write_dst("\xFF", 1));
    }
  }
  return nullptr;
}
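
/*
When a string overflows the spool, write_cbor_output_string switches to an
indefinite-length string. The output is 0x7F (or 0x5F for bytes), then
definite-length chunks, then a 0xFF break byte. No text chunk may end in the
middle of a code point. The sketch below shows the same output shape for a
whole in-memory string, under some assumptions:

- cbor_chunked_text and max_chunk are hypothetical. max_chunk plays the role
  of SPOOL_ARRAY_SIZE.
- Instead of decoding backwards with wuffs_base__utf_8__next_from_end, it
  moves each cut back over UTF-8 continuation bytes. For valid UTF-8 input,
  this finds the same kind of boundary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Appends a definite-length CBOR text string header (major type 3) for
// lengths up to 0xFFFF.
static void append_text_header(std::vector<uint8_t>* out, size_t n) {
  if (n < 0x18) {
    out->push_back(static_cast<uint8_t>(0x60 | n));
  } else if (n <= 0xFF) {
    out->push_back(0x78);
    out->push_back(static_cast<uint8_t>(n));
  } else {
    out->push_back(0x79);
    out->push_back(static_cast<uint8_t>(n >> 8));
    out->push_back(static_cast<uint8_t>(n));
  }
}

// Encodes valid UTF-8 text s as CBOR. If s fits in max_chunk bytes, emit one
// definite-length string. Otherwise emit 0x7F, chunks of at most max_chunk
// bytes, and a 0xFF break. max_chunk must be in the range [4, 0xFFFF].
std::vector<uint8_t> cbor_chunked_text(const std::string& s, size_t max_chunk) {
  std::vector<uint8_t> out;
  if (s.size() <= max_chunk) {
    append_text_header(&out, s.size());
    out.insert(out.end(), s.begin(), s.end());
    return out;
  }
  out.push_back(0x7F);
  size_t pos = 0;
  while (pos < s.size()) {
    size_t cut = pos + max_chunk;
    if (cut >= s.size()) {
      cut = s.size();
    } else {
      // Never cut just before a continuation byte (0b10xxxxxx).
      while ((cut > pos) && ((static_cast<uint8_t>(s[cut]) & 0xC0) == 0x80)) {
        cut--;
      }
    }
    append_text_header(&out, cut - pos);
    out.insert(out.end(), s.begin() + pos, s.begin() + cut);
    pos = cut;
  }
  out.push_back(0xFF);
  return out;
}
```

With max_chunk set to 4, the 6-byte text "aaa€" splits into two chunks. The
euro sign takes 3 bytes, so the chunks are "aaa" and "€" rather than a cut
through the middle of the euro sign.
*/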

const char*  //
flush_json_output_byte_string(bool finish) {
  uint8_t* ptr = &g_spool_array[0];
  size_t len = g_json_output_byte_string_length;
  while (len > 0) {
    wuffs_base__transform__output o = wuffs_base__base_64__encode(
        g_dst.writer_slice(), wuffs_base__make_slice_u8(ptr, len), finish,
        WUFFS_BASE__BASE_64__URL_ALPHABET);
    g_dst.meta.wi += o.num_dst;
    ptr += o.num_src;
    len -= o.num_src;
    if (o.status.repr == nullptr) {
      if (len != 0) {
        return "main: internal error: inconsistent spool length";
      }
      g_json_output_byte_string_length = 0;
      break;
    } else if (o.status.repr == wuffs_base__suspension__short_read) {
      memmove(&g_spool_array[0], ptr, len);
      g_json_output_byte_string_length = len;
      break;
    } else if (o.status.repr != wuffs_base__suspension__short_write) {
      return o.status.message();
    }
    TRY(flush_dst());
  }
  return nullptr;
}

const char*  //
write_json_output_byte_string(wuffs_base__slice_u8 s, bool finish) {
  // This function and flush_json_output_byte_string don't call write_dst.
  // Instead, they call wuffs_base__base_64__encode to write directly to
  // g_dst, so this function is responsible for checking g_suppress_write_dst.
  if (g_suppress_write_dst > 0) {
    return nullptr;
  }

  uint8_t* ptr = s.ptr;
  size_t len = s.len;
  while (len > 0) {
    size_t available = SPOOL_ARRAY_SIZE - g_json_output_byte_string_length;
    if (available >= len) {
      memcpy(&g_spool_array[g_json_output_byte_string_length], ptr, len);
      g_json_output_byte_string_length += len;
      ptr += len;
      len = 0;
      break;

    } else if (available > 0) {
      memcpy(&g_spool_array[g_json_output_byte_string_length], ptr, available);
      g_json_output_byte_string_length += available;
      ptr += available;
      len -= available;
    }

    TRY(flush_json_output_byte_string(false));
  }

  if (finish) {
    TRY(flush_json_output_byte_string(true));
  }
  return nullptr;
}
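
/*
Base64 turns each 3 input bytes into 4 output characters. When
flush_json_output_byte_string is not finishing, the encoder can therefore only
consume whole 3-byte groups. The 1 or 2 leftover bytes are moved to the front
of the spool (the short-read case) and wait for more input. Below is a minimal
sketch of that carry-over, under these assumptions:

- It uses the RFC 4648 URL-safe alphabet and emits no '=' padding. Whether
  jsonptr's output is padded depends on Wuffs encoder flags that this sketch
  does not model.
- Base64UrlStream is a hypothetical class, not a Wuffs API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Streaming base64url encoder. Only whole 3-byte groups are encoded until
// finish is true. Any 1 or 2 leftover bytes are carried to the next call.
class Base64UrlStream {
 public:
  void write(const std::string& src, bool finish) {
    pending_ += src;
    size_t i = 0;
    for (; i + 3 <= pending_.size(); i += 3) {
      encode_group(i, 3);
    }
    pending_.erase(0, i);  // Keep only the incomplete tail.
    if (finish && !pending_.empty()) {
      encode_group(0, pending_.size());
      pending_.clear();
    }
  }
  const std::string& output() const { return out_; }

 private:
  // Encodes n (1 to 3) bytes of pending_, starting at index i, as n + 1
  // characters.
  void encode_group(size_t i, size_t n) {
    static const char kAlphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
    uint32_t v = 0;
    for (size_t j = 0; j < 3; j++) {
      v <<= 8;
      if (j < n) {
        v |= static_cast<uint8_t>(pending_[i + j]);
      }
    }
    for (size_t j = 0; j <= n; j++) {
      out_ += kAlphabet[(v >> (18 - 6 * j)) & 0x3F];
    }
  }

  std::string pending_;
  std::string out_;
};
```

Writing "M" and then "an" without finishing produces no output after the first
call. After the second call the output is "TWFu", because only that call
completes a 3-byte group.
*/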

// ----

const char*  //
handle_unicode_code_point(uint32_t ucp) {
  if (g_flags.output_format == file_format::json) {
    if (ucp < 0x0020) {
      switch (ucp) {
        case '\b':
          return write_dst("\\b", 2);
        case '\f':
          return write_dst("\\f", 2);
        case '\n':
          return write_dst("\\n", 2);
        case '\r':
          return write_dst("\\r", 2);
        case '\t':
          return write_dst("\\t", 2);
      }

      // Other code points below U+0020 are valid UTF-8 but cannot appear
      // unescaped in a JSON string, so write them as six-byte escapes.
      uint8_t esc6[6];
      esc6[0] = '\\';
      esc6[1] = 'u';
      esc6[2] = '0';
      esc6[3] = '0';
      esc6[4] = hex_digit(ucp >> 4);
      esc6[5] = hex_digit(ucp >> 0);
      return write_dst(&esc6[0], 6);

    } else if (ucp == '\"') {
      return write_dst("\\\"", 2);

    } else if (ucp == '\\') {
      return write_dst("\\\\", 2);
    }
  }

  uint8_t u[WUFFS_BASE__UTF_8__BYTE_LENGTH__MAX_INCL];
  size_t n = wuffs_base__utf_8__encode(
      wuffs_base__make_slice_u8(&u[0],
                                WUFFS_BASE__UTF_8__BYTE_LENGTH__MAX_INCL),
      ucp);
  if (n == 0) {
    return "main: internal error: unexpected Unicode code point";
  }

  if (g_flags.output_format == file_format::json) {
    return write_dst(&u[0], n);
  }
  return write_cbor_output_string(wuffs_base__make_slice_u8(&u[0], n), false);
}
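
/*
handle_unicode_code_point relies on wuffs_base__utf_8__encode to turn a code
point into 1 to 4 bytes. Below is a standalone sketch of that encoding.
encode_utf_8 is a hypothetical name. It returns 0 for surrogates and for values
above U+10FFFF; it is an assumption that the Wuffs function rejects the same
inputs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Writes the UTF-8 encoding of ucp to dst, which must have room for 4 bytes.
// Returns the number of bytes written, or 0 if ucp is a surrogate (U+D800 to
// U+DFFF) or above U+10FFFF.
size_t encode_utf_8(uint8_t* dst, uint32_t ucp) {
  if (ucp < 0x80) {
    dst[0] = static_cast<uint8_t>(ucp);
    return 1;
  } else if (ucp < 0x800) {
    dst[0] = static_cast<uint8_t>(0xC0 | (ucp >> 6));
    dst[1] = static_cast<uint8_t>(0x80 | (ucp & 0x3F));
    return 2;
  } else if (ucp < 0x10000) {
    if ((0xD800 <= ucp) && (ucp <= 0xDFFF)) {
      return 0;
    }
    dst[0] = static_cast<uint8_t>(0xE0 | (ucp >> 12));
    dst[1] = static_cast<uint8_t>(0x80 | ((ucp >> 6) & 0x3F));
    dst[2] = static_cast<uint8_t>(0x80 | (ucp & 0x3F));
    return 3;
  } else if (ucp <= 0x10FFFF) {
    dst[0] = static_cast<uint8_t>(0xF0 | (ucp >> 18));
    dst[1] = static_cast<uint8_t>(0x80 | ((ucp >> 12) & 0x3F));
    dst[2] = static_cast<uint8_t>(0x80 | ((ucp >> 6) & 0x3F));
    dst[3] = static_cast<uint8_t>(0x80 | (ucp & 0x3F));
    return 4;
  }
  return 0;
}
```

For example, U+2026 HORIZONTAL ELLIPSIS encodes as the 3 bytes E2 80 A6. This
matches the byte counts that handle_token uses for its "[…]" and "{…}"
placeholders.
*/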

const char*  //
write_json_output_text_string(wuffs_base__slice_u8 s) {
  uint8_t* ptr = s.ptr;
  size_t len = s.len;
restart:
  while (true) {
    size_t i;
    for (i = 0; i < len; i++) {
      uint8_t c = ptr[i];
      if ((c == '"') || (c == '\\') || (c < 0x20)) {
        TRY(write_dst(ptr, i));
        TRY(handle_unicode_code_point(c));
        ptr += i + 1;
        len -= i + 1;
        goto restart;
      }
    }
    TRY(write_dst(ptr, len));
    break;
  }
  return nullptr;
}
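
/*
write_json_output_text_string writes each run of bytes that needs no escaping
in a single call. Only double quotes, backslashes and control bytes go through
handle_unicode_code_point. Below is a sketch of the same run-copying idea as a
hypothetical standalone helper that returns a std::string instead of writing to
g_dst (escape_json_string is not part of jsonptr).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Escapes a UTF-8 string for use inside JSON double quotes. Unescaped runs
// are appended in one go. Double quotes, backslashes and bytes below 0x20 are
// rewritten, using the short forms where JSON has them and six-byte escapes
// with upper-case hex digits otherwise.
std::string escape_json_string(const std::string& s) {
  static const char kHex[] = "0123456789ABCDEF";
  std::string out;
  size_t run_start = 0;
  for (size_t i = 0; i < s.size(); i++) {
    uint8_t c = static_cast<uint8_t>(s[i]);
    if ((c != '"') && (c != '\\') && (c >= 0x20)) {
      continue;  // Part of the current unescaped run.
    }
    out.append(s, run_start, i - run_start);  // Flush the run so far.
    run_start = i + 1;
    switch (c) {
      case '"': out += "\\\""; break;
      case '\\': out += "\\\\"; break;
      case '\b': out += "\\b"; break;
      case '\f': out += "\\f"; break;
      case '\n': out += "\\n"; break;
      case '\r': out += "\\r"; break;
      case '\t': out += "\\t"; break;
      default:
        out += "\\u00";
        out += kHex[c >> 4];
        out += kHex[c & 0x0F];
        break;
    }
  }
  out.append(s, run_start, std::string::npos);  // Flush the final run.
  return out;
}
```

Bytes at or above 0x80 pass through unchanged, so valid UTF-8 input produces
valid UTF-8 output.
*/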

const char*  //
handle_string(uint64_t vbd,
              wuffs_base__slice_u8 s,
              bool start_of_token_chain,
              bool continued) {
  if (start_of_token_chain) {
    if (g_flags.output_format == file_format::json) {
      if (g_flags.output_cbor_metadata_as_json_comments &&
          !(vbd & WUFFS_BASE__TOKEN__VBD__STRING__CHAIN_MUST_BE_UTF_8)) {
        TRY(write_dst("/*cbor:base64url*/\"", 19));
        g_json_output_byte_string_length = 0;
      } else {
        TRY(write_dst("\"", 1));
      }
    } else {
      g_cbor_output_string_length = 0;
      g_cbor_output_string_is_multiple_chunks = false;
      g_cbor_output_string_is_utf_8 =
          vbd & WUFFS_BASE__TOKEN__VBD__STRING__CHAIN_MUST_BE_UTF_8;
    }
    g_query.restart_fragment(in_dict_before_key() && g_query.is_at(g_depth));
  }

  if (vbd & WUFFS_BASE__TOKEN__VBD__STRING__CONVERT_0_DST_1_SRC_DROP) {
    // No-op.
  } else if (vbd & WUFFS_BASE__TOKEN__VBD__STRING__CONVERT_1_DST_1_SRC_COPY) {
    if (g_flags.output_format == file_format::json) {
      if (g_flags.input_format == file_format::json) {
        TRY(write_dst(s.ptr, s.len));
      } else if (vbd & WUFFS_BASE__TOKEN__VBD__STRING__CHAIN_MUST_BE_UTF_8) {
        TRY(write_json_output_text_string(s));
      } else {
        TRY(write_json_output_byte_string(s, false));
      }
    } else {
      TRY(write_cbor_output_string(s, false));
    }
    g_query.incremental_match_slice(s.ptr, s.len);
  } else {
    return "main: internal error: unexpected string-token conversion";
  }

  if (continued) {
    return nullptr;
  }

  if (g_flags.output_format == file_format::json) {
    if (!(vbd & WUFFS_BASE__TOKEN__VBD__STRING__CHAIN_MUST_BE_UTF_8)) {
      TRY(write_json_output_byte_string(wuffs_base__empty_slice_u8(), true));
    }
    TRY(write_dst("\"", 1));
  } else {
    TRY(write_cbor_output_string(wuffs_base__empty_slice_u8(), true));
  }
  return nullptr;
}

// ----

const char*  //
handle_token(wuffs_base__token t, bool start_of_token_chain) {
  do {
    int64_t vbc = t.value_base_category();
    uint64_t vbd = t.value_base_detail();
    uint64_t token_length = t.length();
    wuffs_base__slice_u8 tok = wuffs_base__make_slice_u8(
        g_src.data.ptr + g_curr_token_end_src_index - token_length,
        token_length);

    // Handle ']' or '}'.
    if ((vbc == WUFFS_BASE__TOKEN__VBC__STRUCTURE) &&
        (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__POP)) {
      if (g_query.is_at(g_depth)) {
        return "main: no match for query";
      }
      if (g_depth <= 0) {
        return "main: internal error: inconsistent g_depth";
      }
      g_depth--;

      if (g_query.matched_all() && (g_depth >= g_flags.max_output_depth)) {
        g_suppress_write_dst--;
        // '…' is U+2026 HORIZONTAL ELLIPSIS, which is 3 UTF-8 bytes.
        if (g_flags.output_format == file_format::json) {
          TRY(write_dst((vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__FROM_LIST)
                            ? "\"[…]\""
                            : "\"{…}\"",
                        7));
        } else {
          TRY(write_dst((vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__FROM_LIST)
                            ? "\x65[…]"
                            : "\x65{…}",
                        6));
        }
      } else if (g_flags.output_format == file_format::json) {
        // Write preceding whitespace.
        if ((g_ctx != context::in_list_after_bracket) &&
            (g_ctx != context::in_dict_after_brace) &&
            !g_flags.compact_output) {
          if (g_flags.output_json_extra_comma) {
            TRY(write_dst(",\n", 2));
          } else {
            TRY(write_dst("\n", 1));
          }
          for (uint32_t i = 0; i < g_depth; i++) {
            TRY(write_dst(
                g_flags.tabs ? INDENT_TAB_STRING : INDENT_SPACES_STRING,
                g_flags.tabs ? 1 : g_flags.spaces));
          }
        }

        TRY(write_dst(
            (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__FROM_LIST) ? "]" : "}",
            1));
      } else {
        TRY(write_dst("\xFF", 1));
      }

      g_ctx = (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__TO_LIST)
                  ? context::in_list_after_value
                  : context::in_dict_after_key;
      goto after_value;
    }

    // Write preceding whitespace and punctuation, if it wasn't ']', '}' or a
    // continuation of a multi-token chain or a CBOR tagged data item.
    if (g_previous_token_was_cbor_tag) {
      g_previous_token_was_cbor_tag = false;
    } else if (start_of_token_chain) {
      if (g_flags.output_format != file_format::json) {
        // No-op.
      } else if (g_ctx == context::in_dict_after_key) {
        TRY(write_dst(": ", g_flags.compact_output ? 1 : 2));
      } else if (g_ctx != context::none) {
        if ((g_ctx == context::in_dict_after_brace) ||
            (g_ctx == context::in_dict_after_value)) {
          // Reject dict keys that aren't UTF-8 strings or non-negative
          // integers, which could otherwise happen with -i=cbor -o=json.
          if (vbc == WUFFS_BASE__TOKEN__VBC__INLINE_INTEGER_UNSIGNED) {
            // No-op.
          } else if ((vbc == WUFFS_BASE__TOKEN__VBC__STRING) &&
                     (vbd &
                      WUFFS_BASE__TOKEN__VBD__STRING__CHAIN_MUST_BE_UTF_8)) {
            // No-op.
          } else {
            return "main: cannot convert CBOR non-text-string to JSON map key";
          }
        }
        if ((g_ctx == context::in_list_after_value) ||
            (g_ctx == context::in_dict_after_value)) {
          TRY(write_dst(",", 1));
        }
        if (!g_flags.compact_output) {
          TRY(write_dst("\n", 1));
          for (size_t i = 0; i < g_depth; i++) {
            TRY(write_dst(
                g_flags.tabs ? INDENT_TAB_STRING : INDENT_SPACES_STRING,
                g_flags.tabs ? 1 : g_flags.spaces));
          }
        }
      }

      bool query_matched_fragment = false;
      if (g_query.is_at(g_depth)) {
        switch (g_ctx) {
          case context::in_list_after_bracket:
          case context::in_list_after_value:
            query_matched_fragment = g_query.tick();
            break;
          case context::in_dict_after_key:
            query_matched_fragment = g_query.matched_fragment();
            break;
          default:
            break;
        }
      }
      if (!query_matched_fragment) {
        // No-op.
      } else if (!g_query.next_fragment()) {
        // There is no next fragment. We have matched the complete query, and
        // the upcoming JSON value is the result of that query.
        //
        // Un-suppress writing to stdout and reset the g_ctx and g_depth as if
        // we were about to decode a top-level value. This makes any subsequent
        // indentation be relative to this point, and we will return g_eod
        // after the upcoming JSON value is complete.
        if (g_suppress_write_dst != 1) {
          return "main: internal error: inconsistent g_suppress_write_dst";
        }
        g_suppress_write_dst = 0;
        g_ctx = context::none;
        g_depth = 0;
      } else if ((vbc != WUFFS_BASE__TOKEN__VBC__STRUCTURE) ||
                 !(vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__PUSH)) {
        // The query has moved on to the next fragment but the upcoming JSON
        // value is not a container.
        return "main: no match for query";
      }
    }

    // Handle the token itself: either a container ('[' or '{') or a simple
    // value: string (a chain of raw or escaped parts), literal or number.
    switch (vbc) {
      case WUFFS_BASE__TOKEN__VBC__STRUCTURE:
        if (g_query.matched_all() && (g_depth >= g_flags.max_output_depth)) {
          g_suppress_write_dst++;
        } else if (g_flags.output_format == file_format::json) {
          TRY(write_dst(
              (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__TO_LIST) ? "[" : "{",
              1));
        } else {
          TRY(write_dst((vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__TO_LIST)
                            ? "\x9F"
                            : "\xBF",
                        1));
        }
        g_depth++;
        g_ctx = (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__TO_LIST)
                    ? context::in_list_after_bracket
                    : context::in_dict_after_brace;
        return nullptr;

      case WUFFS_BASE__TOKEN__VBC__STRING:
        TRY(handle_string(vbd, tok, start_of_token_chain, t.continued()));
        if (t.continued()) {
          return nullptr;
        }
        goto after_value;

      case WUFFS_BASE__TOKEN__VBC__UNICODE_CODE_POINT:
        if (!t.continued()) {
          return "main: internal error: unexpected non-continued UCP token";
        }
        TRY(handle_unicode_code_point(vbd));
        g_query.incremental_match_code_point(vbd);
        return nullptr;

      case WUFFS_BASE__TOKEN__VBC__LITERAL:
        TRY(write_literal(vbd));
        goto after_value;

      case WUFFS_BASE__TOKEN__VBC__NUMBER:
        TRY(write_number(vbd, tok));
        goto after_value;

      case WUFFS_BASE__TOKEN__VBC__INLINE_INTEGER_SIGNED:
      case WUFFS_BASE__TOKEN__VBC__INLINE_INTEGER_UNSIGNED: {
        bool x_is_signed = vbc == WUFFS_BASE__TOKEN__VBC__INLINE_INTEGER_SIGNED;
        uint64_t x = x_is_signed
                         ? ((uint64_t)(t.value_base_detail__sign_extended()))
                         : vbd;
        if (t.continued()) {
          if (tok.len != 0) {
            return "main: internal error: unexpected to-be-extended length";
          }
          g_token_extension.category = vbc;
          g_token_extension.detail = x;
          return nullptr;
        }
        TRY(write_inline_integer(x, x_is_signed, tok));
        goto after_value;
      }
    }

    int64_t ext = t.value_extension();
    if (ext >= 0) {
      uint64_t x = (g_token_extension.detail
                    << WUFFS_BASE__TOKEN__VALUE_EXTENSION__NUM_BITS) |
                   ((uint64_t)ext);
      switch (g_token_extension.category) {
        case WUFFS_BASE__TOKEN__VBC__INLINE_INTEGER_SIGNED:
        case WUFFS_BASE__TOKEN__VBC__INLINE_INTEGER_UNSIGNED:
          TRY(write_inline_integer(
              x,
              g_token_extension.category ==
                  WUFFS_BASE__TOKEN__VBC__INLINE_INTEGER_SIGNED,
              tok));
          g_token_extension.category = 0;
          g_token_extension.detail = 0;
          goto after_value;
        case CATEGORY_CBOR_TAG:
          g_previous_token_was_cbor_tag = true;
          TRY(write_cbor_tag(x, tok));
          g_token_extension.category = 0;
          g_token_extension.detail = 0;
          return nullptr;
      }
    }

    if (t.value_major() == WUFFS_CBOR__TOKEN_VALUE_MAJOR) {
      uint64_t value_minor = t.value_minor();
      if (value_minor & WUFFS_CBOR__TOKEN_VALUE_MINOR__MINUS_1_MINUS_X) {
        TRY(write_cbor_minus_1_minus_x(tok));
        goto after_value;
      } else if (value_minor & WUFFS_CBOR__TOKEN_VALUE_MINOR__SIMPLE_VALUE) {
        TRY(write_cbor_simple_value(vbd, tok));
        goto after_value;
      } else if (value_minor & WUFFS_CBOR__TOKEN_VALUE_MINOR__TAG) {
        g_previous_token_was_cbor_tag = true;
        if (t.continued()) {
          if (tok.len != 0) {
            return "main: internal error: unexpected to-be-extended length";
          }
          g_token_extension.category = CATEGORY_CBOR_TAG;
          g_token_extension.detail = vbd;
          return nullptr;
        }
        return write_cbor_tag(vbd, tok);
      }
    }

    // Return an error if we didn't match the (value_major, value_minor) or
    // (vbc, vbd) pair.
    return "main: internal error: unexpected token";
  } while (0);

  // Book-keeping after completing a value (whether a container value or a
  // simple value). Empty parent containers are no longer empty. If the parent
  // container is a "{...}" object, toggle between keys and values.
after_value:
  if (g_depth == 0) {
    return g_eod;
  }
  switch (g_ctx) {
    case context::in_list_after_bracket:
      g_ctx = context::in_list_after_value;
      break;
    case context::in_dict_after_brace:
      g_ctx = context::in_dict_after_key;
      break;
    case context::in_dict_after_key:
      g_ctx = context::in_dict_after_value;
      break;
    case context::in_dict_after_value:
      g_ctx = context::in_dict_after_key;
      break;
    default:
      break;
  }
  return nullptr;
}
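
/*
The punctuation written at the start of handle_token and the after_value
bookkeeping at its end together form a small state machine over context.
Below is a standalone sketch of both halves, under these assumptions:

- The enum re-declares jsonptr's context values for illustration only.
- after_value and separator_before are hypothetical helpers.
- The sketch ignores whitespace, indentation and output_json_extra_comma.

```cpp
#include <cassert>
#include <string>

// Re-declares jsonptr's context values so that this sketch stands alone.
enum class context {
  none,
  in_list_after_bracket,
  in_list_after_value,
  in_dict_after_brace,
  in_dict_after_key,
  in_dict_after_value,
};

// Compact punctuation to write before the next value starts.
std::string separator_before(context ctx) {
  switch (ctx) {
    case context::in_dict_after_key:
      return ":";
    case context::in_list_after_value:
    case context::in_dict_after_value:
      return ",";
    default:
      return "";
  }
}

// The context after a value (a key, an element or a whole container) ends.
context after_value(context ctx) {
  switch (ctx) {
    case context::in_list_after_bracket:
      return context::in_list_after_value;
    case context::in_dict_after_brace:
    case context::in_dict_after_value:
      return context::in_dict_after_key;  // The next value is a map value.
    case context::in_dict_after_key:
      return context::in_dict_after_value;  // The next value is a key.
    default:
      return ctx;
  }
}
```

Feeding the values "a", 1, "b", 2 through these helpers, starting from
in_dict_after_brace, yields {"a":1,"b":2}.
*/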

const char*  //
main1(int argc, char** argv) {
  TRY(initialize_globals(argc, argv));

  bool start_of_token_chain = true;
  while (true) {
    wuffs_base__status status = g_dec->decode_tokens(
        &g_tok, &g_src,
        wuffs_base__make_slice_u8(g_work_buffer_array, WORK_BUFFER_ARRAY_SIZE));

    while (g_tok.meta.ri < g_tok.meta.wi) {
      wuffs_base__token t = g_tok.data.ptr[g_tok.meta.ri++];
      uint64_t n = t.length();
      if ((g_src.meta.ri - g_curr_token_end_src_index) < n) {
        return "main: internal error: inconsistent g_src indexes";
      }
      g_curr_token_end_src_index += n;

      // Skip filler tokens (e.g. whitespace).
      if (t.value_base_category() == WUFFS_BASE__TOKEN__VBC__FILLER) {
        start_of_token_chain = !t.continued();
        continue;
      }

      const char* z = handle_token(t, start_of_token_chain);
      start_of_token_chain = !t.continued();
      if (z == nullptr) {
        continue;
      } else if (z == g_eod) {
        goto end_of_data;
      }
      return z;
    }

    if (status.repr == nullptr) {
      return "main: internal error: unexpected end of token stream";
    } else if (status.repr == wuffs_base__suspension__short_read) {
      if (g_curr_token_end_src_index != g_src.meta.ri) {
        return "main: internal error: inconsistent g_src indexes";
      }
      TRY(read_src());
      g_curr_token_end_src_index = g_src.meta.ri;
    } else if (status.repr == wuffs_base__suspension__short_write) {
      g_tok.compact();
    } else {
      return status.message();
    }
  }
end_of_data:

  // With a non-empty g_query, don't try to consume trailing whitespace or
  // confirm that we've processed all the tokens.
  if (g_flags.query_c_string && *g_flags.query_c_string) {
    return nullptr;
  }

  // Check that we've exhausted the input.
  if ((g_src.meta.ri == g_src.meta.wi) && !g_src.meta.closed) {
    TRY(read_src());
  }
  if ((g_src.meta.ri < g_src.meta.wi) || !g_src.meta.closed) {
    return "main: valid JSON|CBOR followed by further (unexpected) data";
  }

  // Check that we've used all of the decoded tokens, other than trailing
  // filler tokens. For example, "true\n" is valid JSON (and fully consumed
  // with WUFFS_JSON__QUIRK_ALLOW_TRAILING_NEW_LINE enabled) with a trailing
  // filler token for the "\n".
  for (; g_tok.meta.ri < g_tok.meta.wi; g_tok.meta.ri++) {
    if (g_tok.data.ptr[g_tok.meta.ri].value_base_category() !=
        WUFFS_BASE__TOKEN__VBC__FILLER) {
      return "main: internal error: decoded OK but unprocessed tokens remain";
    }
  }

  return nullptr;
}
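
/*
Wuffs tokens record only a length, not a position. main1 therefore keeps a
running end index (g_curr_token_end_src_index), and each token's bytes are the
token_length bytes just before it. handle_token uses the same rule when it
builds tok. Below is a simplified sketch over an in-memory string, under these
assumptions:

- slice_tokens is a hypothetical helper, and the token lengths in the example
  are illustrative.
- Unlike main1, it does not refill the source buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Returns the text of each token, given only the tokens' lengths in order.
// Returns an empty vector if the lengths run past the bytes available, which
// is the analog of main1's "inconsistent g_src indexes" check.
std::vector<std::string> slice_tokens(const std::string& src,
                                      const std::vector<uint64_t>& lengths) {
  std::vector<std::string> out;
  size_t end = 0;  // Index just past the most recent token.
  for (uint64_t n : lengths) {
    if ((src.size() - end) < n) {
      return {};
    }
    end += n;
    out.push_back(src.substr(end - n, n));
  }
  return out;
}
```
*/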

int  //
compute_exit_code(const char* status_msg) {
  if (!status_msg) {
    return 0;
  }
  size_t n;
  if (status_msg == g_usage) {
    n = strlen(status_msg);
  } else {
    n = strnlen(status_msg, 2047);
    if (n >= 2047) {
      status_msg = "main: internal error: error message is too long";
      n = strnlen(status_msg, 2047);
    }
  }
  const int stderr_fd = 2;
  ignore_return_value(write(stderr_fd, status_msg, n));
  ignore_return_value(write(stderr_fd, "\n", 1));
  // Return an exit code of 1 for regular (foreseen) errors, e.g. badly
  // formatted or unsupported input.
  //
  // Return an exit code of 2 for internal (exceptional) errors, e.g. defensive
  // run-time checks found that an internal invariant did not hold.
  //
  // Automated testing, including badly formatted inputs, can therefore
  // discriminate between expected failure (exit code 1) and unexpected failure
  // (other non-zero exit codes). Specifically, exit code 2 for internal
  // invariant violation, exit code 139 (which is 128 + SIGSEGV on x86_64
  // Linux) for a segmentation fault (e.g. null pointer dereference).
  return strstr(status_msg, "internal error:") ? 2 : 1;
}
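
/*
The exit-code policy above reduces to a pure function of the status message.
Below is a sketch without the stderr write, so that the classification can be
tested in isolation. exit_code_for is a hypothetical name.

```cpp
#include <cassert>
#include <cstring>

// 0 for success (no message), 2 for internal errors, 1 for everything else.
int exit_code_for(const char* status_msg) {
  if (!status_msg) {
    return 0;
  }
  return std::strstr(status_msg, "internal error:") ? 2 : 1;
}
```
*/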

int  //
main(int argc, char** argv) {
  // Look for an input filename (the first non-flag argument) in argv. If there
  // is one, open it (but do not read from it) before we self-impose a sandbox.
  //
  // Flags start with "-", unless they come after a bare "--" arg.
  {
    bool dash_dash = false;
    int a;
    for (a = 1; a < argc; a++) {
      char* arg = argv[a];
      if ((arg[0] == '-') && !dash_dash) {
        dash_dash = (arg[1] == '-') && (arg[2] == '\x00');
        continue;
      }
      g_input_file_descriptor = open(arg, O_RDONLY);
      if (g_input_file_descriptor < 0) {
        fprintf(stderr, "%s: %s\n", arg, strerror(errno));
        return 1;
      }
      break;
    }
  }

#if defined(WUFFS_EXAMPLE_USE_SECCOMP)
  prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT);
  g_sandboxed = true;
#endif

  const char* z = main1(argc, argv);
  if (g_wrote_to_dst) {
    const char* z1 = (g_flags.output_format == file_format::json)
                         ? write_dst("\n", 1)
                         : nullptr;
    const char* z2 = flush_dst();
    z = z ? z : (z1 ? z1 : z2);
  }
  int exit_code = compute_exit_code(z);

#if defined(WUFFS_EXAMPLE_USE_SECCOMP)
  // Call SYS_exit explicitly, instead of calling SYS_exit_group implicitly by
  // either calling _exit or returning from main. SECCOMP_MODE_STRICT allows
  // only SYS_exit.
  syscall(SYS_exit, exit_code);
#endif
  return exit_code;
}
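
/*
The argv scan at the top of main can be pulled out into a pure function, which
makes the "--" rule easy to check. Below is a hypothetical sketch
(first_non_flag_arg is not part of jsonptr). Instead of opening the would-be
input filename, it returns that argument's index.

```cpp
#include <cassert>

// Returns the index of the first non-flag argument, or argc if there is none.
// An argument starting with '-' is a flag unless a bare "--" came before it.
int first_non_flag_arg(int argc, const char* const* argv) {
  bool dash_dash = false;
  for (int a = 1; a < argc; a++) {
    const char* arg = argv[a];
    if ((arg[0] == '-') && !dash_dash) {
      dash_dash = (arg[1] == '-') && (arg[2] == '\0');
      continue;
    }
    return a;
  }
  return argc;
}
```

For example, with the arguments "jsonptr -- -data.json", the name "-data.json"
is treated as a filename because it follows "--".
*/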