// Copyright 2020 The Wuffs Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//    https://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

// ----------------

/*
jsonptr is a JSON formatter (pretty-printer) that supports the JSON Pointer
(RFC 6901) query syntax. It reads UTF-8 JSON from stdin and writes
canonicalized, formatted UTF-8 JSON to stdout.

See the "const char* g_usage" string below for details.

----

JSON Pointer (and this program's implementation) is one of many JSON query
languages and JSON tools, such as jq, jql and JMESPath. This one is relatively
simple and fewer-featured compared to those others.

One benefit of simplicity is that this program's JSON and JSON Pointer
implementations do not dynamically allocate or free memory (yet it does not
require that the entire input fits in memory at once). They are therefore
trivially protected against certain bug classes: memory leaks, double-frees and
use-after-frees.

The core JSON implementation is also written in the Wuffs programming language
(and then transpiled to C/C++), which is memory-safe (e.g. array indexing is
bounds-checked) but also guards against integer arithmetic overflows.

For defense in depth, on Linux, this program also self-imposes a
SECCOMP_MODE_STRICT sandbox before reading (or otherwise processing) its input
or writing its output. Under this sandbox, the only permitted system calls are
read, write, exit and sigreturn.

All together, this program aims to safely handle untrusted JSON files without
fear of security bugs such as remote code execution.

----

As of 2020-02-24, this program passes all 318 "test_parsing" cases from the
JSON test suite (https://github.com/nst/JSONTestSuite), an appendix to the
"Parsing JSON is a Minefield" article (http://seriot.ch/parsing_json.php) that
was first published on 2016-10-26 and updated on 2018-03-30.

After modifying this program, run "build-example.sh example/jsonptr/" and then
"script/run-json-test-suite.sh" to catch correctness regressions.

----

This program uses Wuffs' JSON decoder at a relatively low level, processing the
decoder's token-stream output individually. The core loop, in pseudo-code, is
"for_each_token { handle_token(etc); }", where the handle_token function
changes global state (e.g. the `g_depth` and `g_ctx` variables) and prints
output text based on that state and the token's source text. Notably,
handle_token is not recursive, even though JSON values can nest.
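
A minimal, hypothetical sketch of that non-recursive style follows. It does not
use Wuffs at all: the made-up Reindent function treats each byte of
already-minified, valid JSON as a "token", and a depth counter plus an
in-string flag stand in for g_depth and g_ctx. It illustrates the idea only and
is not how jsonptr itself is implemented.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical sketch, not jsonptr's implementation: re-indents minified,
// valid JSON one byte at a time. Nesting is tracked with a depth counter
// instead of recursion.
std::string Reindent(const std::string& in, uint32_t spaces_per_depth) {
  std::string out;
  uint32_t depth = 0;
  bool in_string = false;  // Inside a JSON string literal?
  bool escaped = false;    // Was the previous string byte a backslash?
  auto new_line = [&]() {
    out += '\n';
    out.append(static_cast<size_t>(depth) * spaces_per_depth, ' ');
  };
  for (size_t i = 0; i < in.size(); i++) {
    char c = in[i];
    if (in_string) {
      // String contents, including brackets and commas, are copied verbatim.
      out += c;
      if (escaped) {
        escaped = false;
      } else if (c == '\\') {
        escaped = true;
      } else if (c == '"') {
        in_string = false;
      }
      continue;
    }
    switch (c) {
      case '"':
        in_string = true;
        out += c;
        break;
      case '[':
      case '{': {
        out += c;
        char close = (c == '[') ? ']' : '}';
        if (((i + 1) < in.size()) && (in[i + 1] == close)) {
          out += close;  // Keep an empty container on one line.
          i++;
          break;
        }
        depth++;
        new_line();
        break;
      }
      case ']':
      case '}':
        depth--;
        new_line();
        out += c;
        break;
      case ',':
        out += c;
        new_line();
        break;
      case ':':
        out += ": ";
        break;
      default:
        out += c;
        break;
    }
  }
  return out;
}
```

Invalid input is out of scope for this sketch. For example, an unmatched
closing bracket would underflow the depth counter.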

This approach is centered around JSON tokens. Each JSON 'thing' (e.g. number,
string, object) comprises one or more JSON tokens.

An alternative, higher-level approach is in the sibling example/jsonfindptrs
program. Neither approach is better or worse per se, but when studying this
program, be aware that there are multiple ways to use Wuffs' JSON decoder.

The two programs, jsonfindptrs and jsonptr, also demonstrate different
trade-offs with regard to JSON object duplicate keys. The JSON spec permits
different implementations to allow or reject duplicate keys. It is not always
clear which approach is safer. Rejecting them is certainly unambiguous, and
security bugs can lurk in ambiguous corners of a file format, if two different
implementations both silently accept a file but differ on how to interpret it.
On the other hand, in the worst case, detecting duplicate keys requires O(N)
memory, where N is the size of the (potentially untrusted) input.

This program (jsonptr) allows duplicate keys and requires only O(1) memory. As
mentioned above, it doesn't dynamically allocate memory at all, and on Linux,
it runs in a SECCOMP_MODE_STRICT sandbox.

----

To run:

$CXX jsonptr.cc && ./a.out < ../../test/data/github-tags.json; rm -f a.out

for a C++ compiler $CXX, such as clang++ or g++.
*/

#if defined(__cplusplus) && (__cplusplus < 201103L)
#error "This C++ program requires -std=c++11 or later"
#endif

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

// Wuffs ships as a "single file C library" or "header file library" as per
// https://github.com/nothings/stb/blob/master/docs/stb_howto.txt
//
// To use that single file as a "foo.c"-like implementation, instead of a
// "foo.h"-like header, #define WUFFS_IMPLEMENTATION before #include'ing or
// compiling it.
#define WUFFS_IMPLEMENTATION

// Defining the WUFFS_CONFIG__MODULE* macros is optional, but it lets users of
// release/c/etc.c whitelist which parts of Wuffs to build. That file contains
// the entire Wuffs standard library, implementing a variety of codecs and file
// formats. Without this macro definition, an optimizing compiler or linker may
// very well discard Wuffs code for unused codecs, but listing the Wuffs
// modules we use makes that process explicit. Preprocessing means that such
// code simply isn't compiled.
#define WUFFS_CONFIG__MODULES
#define WUFFS_CONFIG__MODULE__BASE
#define WUFFS_CONFIG__MODULE__JSON

// If building this program in an environment that doesn't easily accommodate
// relative includes, you can use the script/inline-c-relative-includes.go
// program to generate a stand-alone C++ file.
#include "../../release/c/wuffs-unsupported-snapshot.c"

#if defined(__linux__)
#include <linux/prctl.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#define WUFFS_EXAMPLE_USE_SECCOMP
#endif

#define TRY(error_msg)         \
  do {                         \
    const char* z = error_msg; \
    if (z) {                   \
      return z;                \
    }                          \
  } while (false)

static const char* g_eod = "main: end of data";

Nigel Taod60815c2020-03-26 14:32:35 +1100148static const char* g_usage =
Nigel Tao01abc842020-03-06 21:42:33 +1100149 "Usage: jsonptr -flags input.json\n"
Nigel Tao0cd2f982020-03-03 23:03:02 +1100150 "\n"
Nigel Tao52c4d6a2020-03-08 21:12:38 +1100151 "Flags:\n"
Nigel Tao3690e832020-03-12 16:52:26 +1100152 " -c -compact-output\n"
Nigel Tao94440cf2020-04-02 22:28:24 +1100153 " -d=NUM -max-output-depth=NUM\n"
Nigel Tao52c4d6a2020-03-08 21:12:38 +1100154 " -q=STR -query=STR\n"
Nigel Taoecadf722020-07-13 08:22:34 +1000155 " -s=NUM -spaces=NUM\n"
Nigel Tao52c4d6a2020-03-08 21:12:38 +1100156 " -t -tabs\n"
157 " -fail-if-unsandboxed\n"
Nigel Tao0291a472020-08-13 22:40:10 +1000158 " -input-allow-comments\n"
159 " -input-allow-extra-comma\n"
160 " -input-allow-inf-nan-numbers\n"
Nigel Tao21042052020-08-19 23:13:54 +1000161 " -jwcc\n"
162 " -output-comments\n"
Nigel Tao0291a472020-08-13 22:40:10 +1000163 " -output-extra-comma\n"
Nigel Taoecadf722020-07-13 08:22:34 +1000164 " -strict-json-pointer-syntax\n"
Nigel Tao52c4d6a2020-03-08 21:12:38 +1100165 "\n"
Nigel Tao01abc842020-03-06 21:42:33 +1100166 "The input.json filename is optional. If absent, it reads from stdin.\n"
Nigel Tao0cd2f982020-03-03 23:03:02 +1100167 "\n"
Nigel Tao52c4d6a2020-03-08 21:12:38 +1100168 "----\n"
169 "\n"
Nigel Tao0cd2f982020-03-03 23:03:02 +1100170 "jsonptr is a JSON formatter (pretty-printer) that supports the JSON\n"
Nigel Tao0291a472020-08-13 22:40:10 +1000171 "Pointer (RFC 6901) query syntax. It reads UTF-8 JSON from stdin and\n"
172 "writes canonicalized, formatted UTF-8 JSON to stdout.\n"
Nigel Tao0cd2f982020-03-03 23:03:02 +1100173 "\n"
Nigel Tao0291a472020-08-13 22:40:10 +1000174 "Canonicalized means that e.g. \"abc\\u000A\\tx\\u0177z\" is re-written\n"
175 "as \"abc\\n\\txÅ·z\". It does not sort object keys, nor does it reject\n"
Nigel Tao01abc842020-03-06 21:42:33 +1100176 "duplicate keys. Canonicalization does not imply Unicode normalization.\n"
Nigel Tao0cd2f982020-03-03 23:03:02 +1100177 "\n"
178 "Formatted means that arrays' and objects' elements are indented, each\n"
Nigel Taoecadf722020-07-13 08:22:34 +1000179 "on its own line. Configure this with the -c / -compact-output, -s=NUM /\n"
Nigel Tao0291a472020-08-13 22:40:10 +1000180 "-spaces=NUM (for NUM ranging from 0 to 8) and -t / -tabs flags.\n"
Nigel Tao168f60a2020-07-14 13:19:33 +1000181 "\n"
Nigel Tao0291a472020-08-13 22:40:10 +1000182 "The -input-allow-comments flag allows \"/*slash-star*/\" and\n"
183 "\"//slash-slash\" C-style comments within JSON input. Such comments are\n"
Nigel Tao21042052020-08-19 23:13:54 +1000184 "stripped from the output unless -output-comments was also set.\n"
Nigel Tao0cd2f982020-03-03 23:03:02 +1100185 "\n"
Nigel Tao0291a472020-08-13 22:40:10 +1000186 "The -input-allow-extra-comma flag allows input like \"[1,2,]\", with a\n"
187 "comma after the final element of a JSON list or dictionary.\n"
Nigel Tao3c8589b2020-07-19 21:49:00 +1000188 "\n"
Nigel Tao0291a472020-08-13 22:40:10 +1000189 "The -input-allow-inf-nan-numbers flag allows non-finite floating point\n"
190 "numbers (infinities and not-a-numbers) within JSON input.\n"
Nigel Tao3c8589b2020-07-19 21:49:00 +1000191 "\n"
Nigel Tao21042052020-08-19 23:13:54 +1000192 "The -output-comments flag copies any input comments to the output. It\n"
193 "has no effect unless -input-allow-comments was also set. Comments look\n"
194 "better after commas than before them, but a closing \"]\" or \"}\" can\n"
195 "occur after arbitrarily many comments, so -output-comments also requires\n"
196 "that one or both of -compact-output and -output-extra-comma be set.\n"
197 "\n"
Nigel Tao0291a472020-08-13 22:40:10 +1000198 "The -output-extra-comma flag writes output like \"[1,2,]\", with a comma\n"
199 "after the final element of a JSON list or dictionary. Such commas are\n"
200 "non-compliant with the JSON specification but many parsers accept them\n"
201 "and they can produce simpler line-based diffs. This flag is ignored when\n"
202 "-compact-output is set.\n"
Nigel Taof8dfc762020-07-23 23:35:44 +1000203 "\n"
Nigel Tao21042052020-08-19 23:13:54 +1000204 "The -jwcc flag (JSON With Commas and Comments) enables all of:\n"
205 " -input-allow-comments\n"
206 " -input-allow-extra-comma\n"
207 " -output-comments\n"
208 " -output-extra-comma\n"
209 "\n"
Nigel Tao52c4d6a2020-03-08 21:12:38 +1100210 "----\n"
211 "\n"
212 "The -q=STR or -query=STR flag gives an optional JSON Pointer query, to\n"
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100213 "print a subset of the input. For example, given RFC 6901 section 5's\n"
Nigel Tao01abc842020-03-06 21:42:33 +1100214 "sample input (https://tools.ietf.org/rfc/rfc6901.txt), this command:\n"
215 " jsonptr -query=/foo/1 rfc-6901-json-pointer.json\n"
Nigel Tao0cd2f982020-03-03 23:03:02 +1100216 "will print:\n"
217 " \"baz\"\n"
218 "\n"
219 "An absent query is equivalent to the empty query, which identifies the\n"
Nigel Tao52c4d6a2020-03-08 21:12:38 +1100220 "entire input (the root value). Unlike a file system, the \"/\" query\n"
Nigel Taod0b16cb2020-03-14 10:15:54 +1100221 "does not identify the root. Instead, \"\" is the root and \"/\" is the\n"
222 "child (the value in a key-value pair) of the root whose key is the empty\n"
223 "string. Similarly, \"/xyz\" and \"/xyz/\" are two different nodes.\n"
Nigel Tao0cd2f982020-03-03 23:03:02 +1100224 "\n"
Nigel Tao0291a472020-08-13 22:40:10 +1000225 "If the query found a valid JSON value, this program will return a zero\n"
226 "exit code even if the rest of the input isn't valid JSON. If the query\n"
Nigel Tao0cd2f982020-03-03 23:03:02 +1100227 "did not find a value, or found an invalid one, this program returns a\n"
228 "non-zero exit code, but may still print partial output to stdout.\n"
229 "\n"
Nigel Tao0291a472020-08-13 22:40:10 +1000230 "The JSON specification (https://json.org/) permits implementations that\n"
231 "allow duplicate keys, as this one does. This JSON Pointer implementation\n"
232 "is also greedy, following the first match for each fragment without\n"
233 "back-tracking. For example, the \"/foo/bar\" query will fail if the root\n"
234 "object has multiple \"foo\" children but the first one doesn't have a\n"
235 "\"bar\" child, even if later ones do.\n"
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100236 "\n"
Nigel Taoecadf722020-07-13 08:22:34 +1000237 "The -strict-json-pointer-syntax flag restricts the -query=STR string to\n"
238 "exactly RFC 6901, with only two escape sequences: \"~0\" and \"~1\" for\n"
239 "\"~\" and \"/\". Without this flag, this program also lets \"~n\" and\n"
240 "\"~r\" escape the New Line and Carriage Return ASCII control characters,\n"
241 "which can work better with line oriented Unix tools that assume exactly\n"
242 "one value (i.e. one JSON Pointer string) per line.\n"
Nigel Taod6fdfb12020-03-11 12:24:14 +1100243 "\n"
Nigel Tao52c4d6a2020-03-08 21:12:38 +1100244 "----\n"
245 "\n"
Nigel Tao94440cf2020-04-02 22:28:24 +1100246 "The -d=NUM or -max-output-depth=NUM flag gives the maximum (inclusive)\n"
Nigel Tao0291a472020-08-13 22:40:10 +1000247 "output depth. JSON containers ([] arrays and {} objects) can hold other\n"
248 "containers. When this flag is set, containers at depth NUM are replaced\n"
249 "with \"[…]\" or \"{…}\". A bare -d or -max-output-depth is equivalent to\n"
250 "-d=1. The flag's absence is equivalent to an unlimited output depth.\n"
Nigel Tao52c4d6a2020-03-08 21:12:38 +1100251 "\n"
252 "The -max-output-depth flag only affects the program's output. It doesn't\n"
Nigel Tao0291a472020-08-13 22:40:10 +1000253 "affect whether or not the input is considered valid JSON. The JSON\n"
254 "specification permits implementations to set their own maximum input\n"
255 "depth. This JSON implementation sets it to 1024.\n"
Nigel Tao52c4d6a2020-03-08 21:12:38 +1100256 "\n"
257 "Depth is measured in terms of nested containers. It is unaffected by the\n"
258 "number of spaces or tabs used to indent.\n"
259 "\n"
260 "When both -max-output-depth and -query are set, the output depth is\n"
261 "measured from when the query resolves, not from the input root. The\n"
262 "input depth (measured from the root) is still limited to 1024.\n"
263 "\n"
264 "----\n"
265 "\n"
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100266 "The -fail-if-unsandboxed flag causes the program to exit if it does not\n"
267 "self-impose a sandbox. On Linux, it self-imposes a SECCOMP_MODE_STRICT\n"
Nigel Tao52c4d6a2020-03-08 21:12:38 +1100268 "sandbox, regardless of whether this flag was set.";

// ----

// ascii_escapes was created by script/print-json-ascii-escapes.go.
const uint8_t ascii_escapes[1024] = {
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x30, 0x30, 0x00,  // 0x00: "\\u0000"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x30, 0x31, 0x00,  // 0x01: "\\u0001"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x30, 0x32, 0x00,  // 0x02: "\\u0002"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x30, 0x33, 0x00,  // 0x03: "\\u0003"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x30, 0x34, 0x00,  // 0x04: "\\u0004"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x30, 0x35, 0x00,  // 0x05: "\\u0005"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x30, 0x36, 0x00,  // 0x06: "\\u0006"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x30, 0x37, 0x00,  // 0x07: "\\u0007"
    0x02, 0x5C, 0x62, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x08: "\\b"
    0x02, 0x5C, 0x74, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x09: "\\t"
    0x02, 0x5C, 0x6E, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x0A: "\\n"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x30, 0x42, 0x00,  // 0x0B: "\\u000B"
    0x02, 0x5C, 0x66, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x0C: "\\f"
    0x02, 0x5C, 0x72, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x0D: "\\r"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x30, 0x45, 0x00,  // 0x0E: "\\u000E"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x30, 0x46, 0x00,  // 0x0F: "\\u000F"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x30, 0x00,  // 0x10: "\\u0010"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x31, 0x00,  // 0x11: "\\u0011"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x32, 0x00,  // 0x12: "\\u0012"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x33, 0x00,  // 0x13: "\\u0013"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x34, 0x00,  // 0x14: "\\u0014"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x35, 0x00,  // 0x15: "\\u0015"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x36, 0x00,  // 0x16: "\\u0016"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x37, 0x00,  // 0x17: "\\u0017"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x38, 0x00,  // 0x18: "\\u0018"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x39, 0x00,  // 0x19: "\\u0019"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x41, 0x00,  // 0x1A: "\\u001A"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x42, 0x00,  // 0x1B: "\\u001B"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x43, 0x00,  // 0x1C: "\\u001C"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x44, 0x00,  // 0x1D: "\\u001D"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x45, 0x00,  // 0x1E: "\\u001E"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x46, 0x00,  // 0x1F: "\\u001F"
    0x01, 0x20, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x20: " "
    0x01, 0x21, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x21: "!"
    0x02, 0x5C, 0x22, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x22: "\\\""
    0x01, 0x23, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x23: "#"
    0x01, 0x24, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x24: "$"
    0x01, 0x25, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x25: "%"
    0x01, 0x26, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x26: "&"
    0x01, 0x27, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x27: "'"
    0x01, 0x28, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x28: "("
    0x01, 0x29, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x29: ")"
    0x01, 0x2A, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x2A: "*"
    0x01, 0x2B, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x2B: "+"
    0x01, 0x2C, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x2C: ","
    0x01, 0x2D, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x2D: "-"
    0x01, 0x2E, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x2E: "."
    0x01, 0x2F, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x2F: "/"
    0x01, 0x30, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x30: "0"
    0x01, 0x31, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x31: "1"
    0x01, 0x32, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x32: "2"
    0x01, 0x33, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x33: "3"
    0x01, 0x34, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x34: "4"
    0x01, 0x35, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x35: "5"
    0x01, 0x36, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x36: "6"
    0x01, 0x37, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x37: "7"
    0x01, 0x38, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x38: "8"
    0x01, 0x39, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x39: "9"
    0x01, 0x3A, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x3A: ":"
    0x01, 0x3B, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x3B: ";"
    0x01, 0x3C, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x3C: "<"
    0x01, 0x3D, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x3D: "="
    0x01, 0x3E, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x3E: ">"
    0x01, 0x3F, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x3F: "?"
    0x01, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x40: "@"
    0x01, 0x41, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x41: "A"
    0x01, 0x42, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x42: "B"
    0x01, 0x43, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x43: "C"
    0x01, 0x44, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x44: "D"
    0x01, 0x45, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x45: "E"
    0x01, 0x46, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x46: "F"
    0x01, 0x47, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x47: "G"
    0x01, 0x48, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x48: "H"
    0x01, 0x49, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x49: "I"
    0x01, 0x4A, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x4A: "J"
    0x01, 0x4B, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x4B: "K"
    0x01, 0x4C, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x4C: "L"
    0x01, 0x4D, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x4D: "M"
    0x01, 0x4E, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x4E: "N"
    0x01, 0x4F, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x4F: "O"
    0x01, 0x50, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x50: "P"
    0x01, 0x51, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x51: "Q"
    0x01, 0x52, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x52: "R"
    0x01, 0x53, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x53: "S"
    0x01, 0x54, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x54: "T"
    0x01, 0x55, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x55: "U"
    0x01, 0x56, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x56: "V"
    0x01, 0x57, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x57: "W"
    0x01, 0x58, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x58: "X"
    0x01, 0x59, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x59: "Y"
    0x01, 0x5A, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x5A: "Z"
    0x01, 0x5B, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x5B: "["
    0x02, 0x5C, 0x5C, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x5C: "\\\\"
    0x01, 0x5D, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x5D: "]"
    0x01, 0x5E, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x5E: "^"
    0x01, 0x5F, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x5F: "_"
    0x01, 0x60, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x60: "`"
    0x01, 0x61, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x61: "a"
    0x01, 0x62, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x62: "b"
    0x01, 0x63, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x63: "c"
    0x01, 0x64, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x64: "d"
    0x01, 0x65, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x65: "e"
    0x01, 0x66, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x66: "f"
    0x01, 0x67, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x67: "g"
    0x01, 0x68, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x68: "h"
    0x01, 0x69, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x69: "i"
    0x01, 0x6A, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x6A: "j"
    0x01, 0x6B, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x6B: "k"
    0x01, 0x6C, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x6C: "l"
    0x01, 0x6D, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x6D: "m"
    0x01, 0x6E, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x6E: "n"
    0x01, 0x6F, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x6F: "o"
    0x01, 0x70, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x70: "p"
    0x01, 0x71, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x71: "q"
    0x01, 0x72, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x72: "r"
    0x01, 0x73, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x73: "s"
    0x01, 0x74, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x74: "t"
    0x01, 0x75, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x75: "u"
    0x01, 0x76, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x76: "v"
    0x01, 0x77, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x77: "w"
    0x01, 0x78, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x78: "x"
    0x01, 0x79, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x79: "y"
    0x01, 0x7A, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x7A: "z"
    0x01, 0x7B, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x7B: "{"
    0x01, 0x7C, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x7C: "|"
    0x01, 0x7D, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x7D: "}"
    0x01, 0x7E, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x7E: "~"
    0x01, 0x7F, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x7F: "<DEL>"
};
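
/*
Each 8-byte row of the table above appears to hold a length byte followed by
the escaped bytes, zero-padded to 8 bytes in total. A minimal sketch of
building and using a table with that layout follows. BuildAsciiEscapes and
EscapeJsonString are hypothetical names; the generation rules are inferred from
the table's contents rather than taken from script/print-json-ascii-escapes.go,
and bytes of 0x80 and above are passed through unchanged on the assumption
that they are part of valid UTF-8.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical sketch of the 8-bytes-per-entry layout: entry[0] is the
// escaped length n (1, 2 or 6) and entry[1] .. entry[n] are the output bytes.
void BuildAsciiEscapes(uint8_t table[1024]) {
  for (int c = 0; c < 128; c++) {
    uint8_t* e = &table[8 * c];
    for (int i = 0; i < 8; i++) {
      e[i] = 0;
    }
    const char* short_escape = nullptr;
    switch (c) {
      case '\b': short_escape = "\\b"; break;
      case '\t': short_escape = "\\t"; break;
      case '\n': short_escape = "\\n"; break;
      case '\f': short_escape = "\\f"; break;
      case '\r': short_escape = "\\r"; break;
      case '"': short_escape = "\\\""; break;
      case '\\': short_escape = "\\\\"; break;
    }
    if (short_escape) {
      e[0] = 2;
      e[1] = static_cast<uint8_t>(short_escape[0]);
      e[2] = static_cast<uint8_t>(short_escape[1]);
    } else if (c < 0x20) {
      // Other control characters become "\u00XX" with upper-case hex.
      char buf[7];
      std::snprintf(buf, sizeof buf, "\\u%04X", c);
      e[0] = 6;
      for (int i = 0; i < 6; i++) {
        e[1 + i] = static_cast<uint8_t>(buf[i]);
      }
    } else {
      // Everything else, including space and DEL, is copied literally.
      e[0] = 1;
      e[1] = static_cast<uint8_t>(c);
    }
  }
}

// Escapes the contents of a JSON string by table lookup.
std::string EscapeJsonString(const std::string& in, const uint8_t table[1024]) {
  std::string out;
  for (unsigned char c : in) {
    if (c >= 0x80) {
      out += static_cast<char>(c);
      continue;
    }
    const uint8_t* e = &table[8 * c];
    out.append(reinterpret_cast<const char*>(e + 1), e[0]);
  }
  return out;
}
```

Keeping every entry at a fixed 8-byte stride means that a byte's entry is at
offset 8 * byte, with no per-entry pointers or variable-length records.
*/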

// Wuffs allows either statically or dynamically allocated work buffers. This
// program exercises static allocation.
#define WORK_BUFFER_ARRAY_SIZE \
  WUFFS_JSON__DECODER_WORKBUF_LEN_MAX_INCL_WORST_CASE
#if WORK_BUFFER_ARRAY_SIZE > 0
uint8_t g_work_buffer_array[WORK_BUFFER_ARRAY_SIZE];
#else
// Not all C/C++ compilers support 0-length arrays.
uint8_t g_work_buffer_array[1];
#endif

bool g_sandboxed = false;

int g_input_file_descriptor = 0;  // A 0 default means stdin.

#define NEW_LINE_THEN_256_SPACES \
  "\n" \
  "                                " \
  "                                " \
  "                                " \
  "                                " \
  "                                " \
  "                                " \
  "                                " \
  "                                "
#define NEW_LINE_THEN_256_TABS \
  "\n" \
  "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t" \
  "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t" \
  "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t" \
  "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t" \
  "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t" \
  "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t" \
  "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t" \
  "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t"

const char* g_new_line_then_256_indent_bytes;
uint32_t g_bytes_per_indent_depth;

#ifndef DST_BUFFER_ARRAY_SIZE
#define DST_BUFFER_ARRAY_SIZE (32 * 1024)
#endif
#ifndef SRC_BUFFER_ARRAY_SIZE
#define SRC_BUFFER_ARRAY_SIZE (32 * 1024)
#endif
#ifndef TOKEN_BUFFER_ARRAY_SIZE
#define TOKEN_BUFFER_ARRAY_SIZE (4 * 1024)
#endif

uint8_t g_dst_array[DST_BUFFER_ARRAY_SIZE];
uint8_t g_src_array[SRC_BUFFER_ARRAY_SIZE];
wuffs_base__token g_tok_array[TOKEN_BUFFER_ARRAY_SIZE];

wuffs_base__io_buffer g_dst;
wuffs_base__io_buffer g_src;
wuffs_base__token_buffer g_tok;

// g_cursor_index is the g_src.data.ptr index between the previous and current
// token. An invariant is that (g_cursor_index <= g_src.meta.ri).
size_t g_cursor_index;

uint32_t g_depth;

enum class context {
  none,
  in_list_after_bracket,
  in_list_after_value,
  in_dict_after_brace,
  in_dict_after_key,
  in_dict_after_value,
} g_ctx;

bool  //
in_dict_before_key() {
  return (g_ctx == context::in_dict_after_brace) ||
         (g_ctx == context::in_dict_after_value);
}

bool g_is_after_comment;

uint32_t g_suppress_write_dst;
bool g_wrote_to_dst;

wuffs_json__decoder g_dec;

// ----
483
484// Query is a JSON Pointer query. After initializing with a NUL-terminated C
485// string, its multiple fragments are consumed as the program walks the JSON
486// data from stdin. For example, letting "$" denote a NUL, suppose that we
487// started with a query string of "/apple/banana/12/durian" and are currently
Nigel Taob48ee752020-03-13 09:27:33 +1100488// trying to match the second fragment, "banana", so that Query::m_depth is 2:
Nigel Tao0cd2f982020-03-03 23:03:02 +1100489//
490// +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
491// / a p p l e / b a n a n a / 1 2 / d u r i a n $
492// +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
493// ^ ^
Nigel Taob48ee752020-03-13 09:27:33 +1100494// m_frag_i m_frag_k
Nigel Tao0cd2f982020-03-03 23:03:02 +1100495//
Nigel Taob48ee752020-03-13 09:27:33 +1100496// The two pointers m_frag_i and m_frag_k (abbreviated as mfi and mfk) are the
497// start (inclusive) and end (exclusive) of the query fragment. They satisfy
498// (mfi <= mfk) and may be equal if the fragment empty (note that "" is a valid
499// JSON object key).
//
// The m_frag_j (mfj) pointer moves between these two, or is nullptr. An
// invariant is that (((mfi <= mfj) && (mfj <= mfk)) || (mfj == nullptr)).
//
// Wuffs' JSON tokenizer can portray a single JSON string as multiple Wuffs
// tokens, as backslash-escaped values within that JSON string may each get
// their own token.
//
// At the start of each object key (a JSON string), mfj is set to mfi.
//
// While mfj remains non-nullptr, each token's unescaped contents are then
// compared to that part of the fragment from mfj to mfk. If it is a prefix
// (including the case of an exact match), then mfj is advanced by the
// unescaped length. Otherwise, mfj is set to nullptr.
//
// Comparison accounts for JSON Pointer's escaping notation: "~0" and "~1" in
// the query (not the JSON value) are unescaped to "~" and "/" respectively.
// As a non-standard extension, "~n" and "~r" are also unescaped to "\n" and
// "\r" (Query::validate rejects them when strict_json_pointer_syntax is true).
// The program is responsible for calling Query::validate (with a
// strict_json_pointer_syntax argument) before otherwise using this class.
//
// The mfj pointer therefore advances from mfi to mfk, or drops out, as we
// incrementally match the object key with the query fragment. For example, if
// we have already matched the "ban" of "banana", then we would accept any of
// an "ana" token, an "a" token or a "\u0061" token, amongst others. They would
// advance mfj by 3, 1 or 1 bytes respectively.
//
//                     mfj
//                      v
// +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
//  / a p p l e / b a n a n a / 1 2 / d u r i a n $
// +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
//                ^           ^
//               mfi         mfk
//
// At the end of each object key (or equivalently, at the start of each object
// value), if mfj is non-nullptr and equal to (but not less than) mfk then we
// have a fragment match: the query fragment equals the object key. If there is
// a next fragment (in this example, "12") we move mfi and mfj to its start and
// mfk to its end, and increment Query::m_depth. Otherwise, we have matched the
// complete query, and the upcoming JSON value is the result of that query.
//
// The discussion above centers on object keys. If the query fragment is
// numeric then it can also match as an array index: the string fragment "12"
// will match the element at index 12 of an array (its 13th element, as indexes
// start at zero). See RFC 6901 for its precise definition of an "array index"
// number.
//
// Array index fragment match is represented by the Query::m_array_index field,
// whose type (wuffs_base__result_u64) is a result type. An error result means
// that the fragment is not an array index. A value result holds the number of
// list elements remaining. When matching a query fragment in an array (instead
// of in an object), each element ticks this number down towards zero. At zero,
// the upcoming JSON value is the one that matches the query fragment.
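//
// For example, here is a hand-traced sketch (not captured program output) of
// the query "/a/1" against the input {"a":[10,20,30]}. The first fragment,
// "a", is not numeric, so m_array_index holds an error and the fragment can
// only match an object key. At the start of the object value for the key "a",
// mfj equals mfk, so the fragment matches and next_fragment moves on to "1",
// parsing it as an array index with value 1 and setting m_depth to 2. Inside
// the array, tick() decrements that 1 to 0 at the element 10 and returns true
// at the element 20. As "1" is the last fragment, 20 is the query's result.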
class Query {
 private:
  uint8_t* m_frag_i;
  uint8_t* m_frag_j;
  uint8_t* m_frag_k;

  uint32_t m_depth;

  wuffs_base__result_u64 m_array_index;

 public:
  void reset(char* query_c_string) {
    m_frag_i = (uint8_t*)query_c_string;
    m_frag_j = (uint8_t*)query_c_string;
    m_frag_k = (uint8_t*)query_c_string;
    m_depth = 0;
    m_array_index.status.repr = "#main: not an array index query fragment";
    m_array_index.value = 0;
  }

  void restart_fragment(bool enable) { m_frag_j = enable ? m_frag_i : nullptr; }

  bool is_at(uint32_t depth) { return m_depth == depth; }

  // tick returns whether the fragment is a valid array index whose value is
  // zero. If valid but non-zero, it decrements it and returns false.
  bool tick() {
    if (m_array_index.status.is_ok()) {
      if (m_array_index.value == 0) {
        return true;
      }
      m_array_index.value--;
    }
    return false;
  }

  // next_fragment moves to the next fragment, returning whether it existed.
  bool next_fragment() {
    uint8_t* k = m_frag_k;
    uint32_t d = m_depth;

    this->reset(nullptr);

    if (!k || (*k != '/')) {
      return false;
    }
    k++;

    bool all_digits = true;
    uint8_t* i = k;
    while ((*k != '\x00') && (*k != '/')) {
      all_digits = all_digits && ('0' <= *k) && (*k <= '9');
      k++;
    }
    m_frag_i = i;
    m_frag_j = i;
    m_frag_k = k;
    m_depth = d + 1;
    if (all_digits) {
      // wuffs_base__parse_number_u64 rejects leading zeroes, e.g. "00", "07".
      m_array_index = wuffs_base__parse_number_u64(
          wuffs_base__make_slice_u8(i, k - i),
          WUFFS_BASE__PARSE_NUMBER_XXX__DEFAULT_OPTIONS);
    }
    return true;
  }
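
  // For example (hand-derived from next_fragment above, assuming the query has
  // already passed Query::validate): the fragment "12" yields an m_array_index
  // value of 12, while "07" and "00" are all digits but are rejected by
  // wuffs_base__parse_number_u64, so m_array_index holds an error and those
  // fragments can only match object keys. A non-numeric fragment such as "a"
  // never calls the parser and keeps the error status set by reset.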

  bool matched_all() { return m_frag_k == nullptr; }

  bool matched_fragment() { return m_frag_j && (m_frag_j == m_frag_k); }

  void incremental_match_slice(uint8_t* ptr, size_t len) {
    if (!m_frag_j) {
      return;
    }
    uint8_t* j = m_frag_j;
    while (true) {
      if (len == 0) {
        m_frag_j = j;
        return;
      }

      if (*j == '\x00') {
        break;

      } else if (*j == '~') {
        j++;
        if (*j == '0') {
          if (*ptr != '~') {
            break;
          }
        } else if (*j == '1') {
          if (*ptr != '/') {
            break;
          }
        } else if (*j == 'n') {
          if (*ptr != '\n') {
            break;
          }
        } else if (*j == 'r') {
          if (*ptr != '\r') {
            break;
          }
        } else {
          break;
        }

      } else if (*j != *ptr) {
        break;
      }

      j++;
      ptr++;
      len--;
    }
    m_frag_j = nullptr;
  }

  void incremental_match_code_point(uint32_t code_point) {
    if (!m_frag_j) {
      return;
    }
    uint8_t u[WUFFS_BASE__UTF_8__BYTE_LENGTH__MAX_INCL];
    size_t n = wuffs_base__utf_8__encode(
        wuffs_base__make_slice_u8(&u[0],
                                  WUFFS_BASE__UTF_8__BYTE_LENGTH__MAX_INCL),
        code_point);
    if (n > 0) {
      this->incremental_match_slice(&u[0], n);
    }
  }
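
  // For example (a hand-traced sketch, assuming that the tokenizer emits the
  // "\/" escape as its own code point token): with the query fragment "a~1b",
  // the JSON key "a\/b" may arrive as the tokens "a", U+002F and "b".
  // incremental_match_slice consumes the "a", incremental_match_code_point
  // encodes U+002F as the single UTF-8 byte '/', which matches the query's
  // "~1", and the final "b" leaves mfj equal to mfk: a fragment match.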

  // validate returns whether the (query_c_string, length) arguments form a
  // valid JSON Pointer. In particular, it must be valid UTF-8, and either be
  // empty or start with a '/'. Any '~' within must immediately be followed by
  // either '0' or '1'. If strict_json_pointer_syntax is false, a '~' may also
  // be followed by either 'n' or 'r'.
  static bool validate(char* query_c_string,
                       size_t length,
                       bool strict_json_pointer_syntax) {
    if (length == 0) {
      return true;
    }
    if (query_c_string[0] != '/') {
      return false;
    }
    wuffs_base__slice_u8 s =
        wuffs_base__make_slice_u8((uint8_t*)query_c_string, length);
    bool previous_was_tilde = false;
    while (s.len > 0) {
      wuffs_base__utf_8__next__output o = wuffs_base__utf_8__next(s.ptr, s.len);
      if (!o.is_valid()) {
        return false;
      }

      if (previous_was_tilde) {
        switch (o.code_point) {
          case '0':
          case '1':
            break;
          case 'n':
          case 'r':
            if (strict_json_pointer_syntax) {
              return false;
            }
            break;
          default:
            return false;
        }
      }
      previous_was_tilde = o.code_point == '~';

      s.ptr += o.byte_length;
      s.len -= o.byte_length;
    }
    return !previous_was_tilde;
  }
} g_query;
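
// Illustrative Query::validate results, hand-derived from the code above (not
// from a test suite). Unless noted, strict_json_pointer_syntax is false:
//   ""       valid: the empty query selects the whole input.
//   "/"      valid: one fragment, matching the empty object key "".
//   "/a~1b"  valid: one fragment, matching the object key "a/b".
//   "a"      invalid: a non-empty query must start with '/'.
//   "/a~2"   invalid: a '~' must be followed by '0' or '1' (or 'n' or 'r').
//   "/a~"    invalid: a trailing '~'.
//   "/a~n"   valid, but invalid when strict_json_pointer_syntax is true.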
Nigel Tao0cd2f982020-03-03 23:03:02 +1100731
732// ----
733
Nigel Tao68920952020-03-03 11:25:18 +1100734struct {
735 int remaining_argc;
736 char** remaining_argv;
737
Nigel Tao3690e832020-03-12 16:52:26 +1100738 bool compact_output;
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100739 bool fail_if_unsandboxed;
Nigel Tao0291a472020-08-13 22:40:10 +1000740 bool input_allow_comments;
741 bool input_allow_extra_comma;
742 bool input_allow_inf_nan_numbers;
Nigel Tao21042052020-08-19 23:13:54 +1000743 bool output_comments;
Nigel Tao0291a472020-08-13 22:40:10 +1000744 bool output_extra_comma;
Nigel Taod6fdfb12020-03-11 12:24:14 +1100745 bool strict_json_pointer_syntax;
Nigel Tao68920952020-03-03 11:25:18 +1100746 bool tabs;
Nigel Tao0a0c7d62020-08-18 23:31:27 +1000747
748 uint32_t max_output_depth;
749 uint32_t spaces;
750
751 char* query_c_string;
Nigel Taod60815c2020-03-26 14:32:35 +1100752} g_flags = {0};

const char*  //
parse_flags(int argc, char** argv) {
  g_flags.spaces = 4;
  g_flags.max_output_depth = 0xFFFFFFFF;

  int c = (argc > 0) ? 1 : 0;  // Skip argv[0], the program name.
  for (; c < argc; c++) {
    char* arg = argv[c];
    if (*arg++ != '-') {
      break;
    }

    // A double-dash "--foo" is equivalent to a single-dash "-foo". As special
    // cases, a bare "-" is not a flag (some programs may interpret it as
    // stdin) and a bare "--" means to stop parsing flags.
    if (*arg == '\x00') {
      break;
    } else if (*arg == '-') {
      arg++;
      if (*arg == '\x00') {
        c++;
        break;
      }
    }

    if (!strcmp(arg, "c") || !strcmp(arg, "compact-output")) {
      g_flags.compact_output = true;
      continue;
    }
    if (!strcmp(arg, "d") || !strcmp(arg, "max-output-depth")) {
      g_flags.max_output_depth = 1;
      continue;
    } else if (!strncmp(arg, "d=", 2) ||
               !strncmp(arg, "max-output-depth=", 17)) {
      while (*arg++ != '=') {
      }
      wuffs_base__result_u64 u = wuffs_base__parse_number_u64(
          wuffs_base__make_slice_u8((uint8_t*)arg, strlen(arg)),
          WUFFS_BASE__PARSE_NUMBER_XXX__DEFAULT_OPTIONS);
      if (u.status.is_ok() && (u.value <= 0xFFFFFFFF)) {
        g_flags.max_output_depth = (uint32_t)(u.value);
        continue;
      }
      return g_usage;
    }
    if (!strcmp(arg, "fail-if-unsandboxed")) {
      g_flags.fail_if_unsandboxed = true;
      continue;
    }
    if (!strcmp(arg, "input-allow-comments")) {
      g_flags.input_allow_comments = true;
      continue;
    }
    if (!strcmp(arg, "input-allow-extra-comma")) {
      g_flags.input_allow_extra_comma = true;
      continue;
    }
    if (!strcmp(arg, "input-allow-inf-nan-numbers")) {
      g_flags.input_allow_inf_nan_numbers = true;
      continue;
    }
    if (!strcmp(arg, "jwcc")) {
      g_flags.input_allow_comments = true;
      g_flags.input_allow_extra_comma = true;
      g_flags.output_comments = true;
      g_flags.output_extra_comma = true;
      continue;
    }
    if (!strcmp(arg, "output-comments")) {
      g_flags.output_comments = true;
      continue;
    }
    if (!strcmp(arg, "output-extra-comma")) {
      g_flags.output_extra_comma = true;
      continue;
    }
    if (!strncmp(arg, "q=", 2) || !strncmp(arg, "query=", 6)) {
      while (*arg++ != '=') {
      }
      g_flags.query_c_string = arg;
      continue;
    }
    if (!strncmp(arg, "s=", 2) || !strncmp(arg, "spaces=", 7)) {
      while (*arg++ != '=') {
      }
      if (('0' <= arg[0]) && (arg[0] <= '8') && (arg[1] == '\x00')) {
        g_flags.spaces = arg[0] - '0';
        continue;
      }
      return g_usage;
    }
    if (!strcmp(arg, "strict-json-pointer-syntax")) {
      g_flags.strict_json_pointer_syntax = true;
      continue;
    }
    if (!strcmp(arg, "t") || !strcmp(arg, "tabs")) {
      g_flags.tabs = true;
      continue;
    }

    return g_usage;
  }

  if (g_flags.query_c_string &&
      !Query::validate(g_flags.query_c_string, strlen(g_flags.query_c_string),
                       g_flags.strict_json_pointer_syntax)) {
    return "main: bad JSON Pointer (RFC 6901) syntax for the -query=STR flag";
  }

  g_flags.remaining_argc = argc - c;
  g_flags.remaining_argv = argv + c;
  return nullptr;
}
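
// Example command lines, hand-derived from parse_flags above (in.json is a
// placeholder file name; each example reads its input from stdin):
//   jsonptr -query=/a/1 < in.json   Print only the value at "/a/1".
//   jsonptr -d < in.json            Same as -max-output-depth=1.
//   jsonptr -jwcc < in.json         Same as -input-allow-comments,
//                                   -input-allow-extra-comma, -output-comments
//                                   and -output-extra-comma combined.
//   jsonptr --tabs -- < in.json     "--tabs" equals "-tabs"; "--" ends flags.
//   jsonptr -s=9 < in.json          Rejected with g_usage: -s takes 0 to 8.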

const char*  //
initialize_globals(int argc, char** argv) {
  g_dst = wuffs_base__make_io_buffer(
      wuffs_base__make_slice_u8(g_dst_array, DST_BUFFER_ARRAY_SIZE),
      wuffs_base__empty_io_buffer_meta());

  g_src = wuffs_base__make_io_buffer(
      wuffs_base__make_slice_u8(g_src_array, SRC_BUFFER_ARRAY_SIZE),
      wuffs_base__empty_io_buffer_meta());

  g_tok = wuffs_base__make_token_buffer(
      wuffs_base__make_slice_token(g_tok_array, TOKEN_BUFFER_ARRAY_SIZE),
      wuffs_base__empty_token_buffer_meta());

  g_cursor_index = 0;

  g_depth = 0;

  g_ctx = context::none;

  g_is_after_comment = false;

  TRY(parse_flags(argc, argv));
  if (g_flags.fail_if_unsandboxed && !g_sandboxed) {
    return "main: unsandboxed";
  }
  if (g_flags.output_comments && !g_flags.compact_output &&
      !g_flags.output_extra_comma) {
    return "main: -output-comments requires one or both of -compact-output and "
           "-output-extra-comma";
  }
  const int stdin_fd = 0;
  if (g_flags.remaining_argc >
      ((g_input_file_descriptor != stdin_fd) ? 1 : 0)) {
    return g_usage;
  }

  g_new_line_then_256_indent_bytes =
      g_flags.tabs ? NEW_LINE_THEN_256_TABS : NEW_LINE_THEN_256_SPACES;
  g_bytes_per_indent_depth = g_flags.tabs ? 1 : g_flags.spaces;

  g_query.reset(g_flags.query_c_string);

  // If the query is non-empty, suppress writing to stdout until we've
  // completed the query.
  g_suppress_write_dst = g_query.next_fragment() ? 1 : 0;
  g_wrote_to_dst = false;

  TRY(g_dec.initialize(sizeof__wuffs_json__decoder(), WUFFS_VERSION, 0)
          .message());

  if (g_flags.input_allow_comments) {
    g_dec.set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_COMMENT_BLOCK, true);
    g_dec.set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_COMMENT_LINE, true);
  }
  if (g_flags.input_allow_extra_comma) {
    g_dec.set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_EXTRA_COMMA, true);
  }
  if (g_flags.input_allow_inf_nan_numbers) {
    g_dec.set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_INF_NAN_NUMBERS, true);
  }

  // Consume an optional whitespace trailer. This isn't part of the JSON spec,
  // but it works better with line-oriented Unix tools (such as "echo 123 |
  // jsonptr" where it's "echo", not "echo -n") or hand-edited JSON files which
  // can accidentally contain trailing whitespace.
  g_dec.set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_TRAILING_NEW_LINE, true);

  return nullptr;
}

// ----

// ignore_return_value suppresses errors from -Wall -Werror.
static void  //
ignore_return_value(int ignored) {}

const char*  //
read_src() {
  if (g_src.meta.closed) {
    return "main: internal error: read requested on a closed source";
  }
  g_src.compact();
  if (g_src.meta.wi >= g_src.data.len) {
    return "main: g_src buffer is full";
  }
  while (true) {
    ssize_t n = read(g_input_file_descriptor, g_src.writer_pointer(),
                     g_src.writer_length());
    if (n >= 0) {
      g_src.meta.wi += n;
      g_src.meta.closed = n == 0;
      break;
    } else if (errno != EINTR) {
      return strerror(errno);
    }
  }
  return nullptr;
}

const char*  //
flush_dst() {
  while (true) {
    size_t n = g_dst.reader_length();
    if (n == 0) {
      break;
    }
    const int stdout_fd = 1;
    ssize_t i = write(stdout_fd, g_dst.reader_pointer(), n);
    if (i >= 0) {
      g_dst.meta.ri += i;
    } else if (errno != EINTR) {
      return strerror(errno);
    }
  }
  g_dst.compact();
  return nullptr;
}

const char*  //
write_dst_slow(const void* s, size_t n) {
  const uint8_t* p = static_cast<const uint8_t*>(s);
  while (n > 0) {
    size_t i = g_dst.writer_length();
    if (i == 0) {
      const char* z = flush_dst();
      if (z) {
        return z;
      }
      i = g_dst.writer_length();
      if (i == 0) {
        return "main: g_dst buffer is full";
      }
    }

    if (i > n) {
      i = n;
    }
    memcpy(g_dst.data.ptr + g_dst.meta.wi, p, i);
    g_dst.meta.wi += i;
    p += i;
    n -= i;
    g_wrote_to_dst = true;
  }
  return nullptr;
}

inline const char*  //
write_dst(const void* s, size_t n) {
  if (g_suppress_write_dst > 0) {
    return nullptr;
  } else if (n <= (DST_BUFFER_ARRAY_SIZE - g_dst.meta.wi)) {
    memcpy(g_dst.data.ptr + g_dst.meta.wi, s, n);
    g_dst.meta.wi += n;
    g_wrote_to_dst = true;
    return nullptr;
  }
  return write_dst_slow(s, n);
}

#define TRY_INDENT_WITH_LEADING_NEW_LINE \
  do { \
    uint32_t indent = g_depth * g_bytes_per_indent_depth; \
    TRY(write_dst(g_new_line_then_256_indent_bytes, 1 + (indent & 0xFF))); \
    for (indent >>= 8; indent > 0; indent--) { \
      TRY(write_dst(g_new_line_then_256_indent_bytes + 1, 0x100)); \
    } \
  } while (false)

// TRY_INDENT_SANS_LEADING_NEW_LINE is used after comments, which print their
// own "\n".
#define TRY_INDENT_SANS_LEADING_NEW_LINE \
  do { \
    uint32_t indent = g_depth * g_bytes_per_indent_depth; \
    TRY(write_dst(g_new_line_then_256_indent_bytes + 1, (indent & 0xFF))); \
    for (indent >>= 8; indent > 0; indent--) { \
      TRY(write_dst(g_new_line_then_256_indent_bytes + 1, 0x100)); \
    } \
  } while (false)
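
// A worked example of the indentation arithmetic above (hand-computed, not
// program output): at g_depth 70 with the default 4 spaces per level, indent
// is 280 (0x118). The first write emits "\n" plus (280 & 0xFF) = 24 spaces;
// then indent >>= 8 leaves 1, so the loop emits one more run of 0x100 = 256
// spaces, for 24 + 256 = 280 spaces in total.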

// ----

const char*  //
handle_unicode_code_point(uint32_t ucp) {
  if (ucp < 0x80) {
    return write_dst(&ascii_escapes[8 * ucp + 1], ascii_escapes[8 * ucp]);
  }
  uint8_t u[WUFFS_BASE__UTF_8__BYTE_LENGTH__MAX_INCL];
  size_t n = wuffs_base__utf_8__encode(
      wuffs_base__make_slice_u8(&u[0],
                                WUFFS_BASE__UTF_8__BYTE_LENGTH__MAX_INCL),
      ucp);
  if (n == 0) {
    return "main: internal error: unexpected Unicode code point";
  }
  return write_dst(&u[0], n);
}
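
// For example, U+00E9 LATIN SMALL LETTER E WITH ACUTE is written as the two
// UTF-8 bytes 0xC3 0xA9, and U+2026 HORIZONTAL ELLIPSIS as the three bytes
// 0xE2 0x80 0xA6. Code points below 0x80 instead go through ascii_escapes,
// whose 8-byte entries (as indexed above) hold a length byte followed by up
// to 7 output bytes, presumably so that characters such as U+000A LINE FEED
// can be written back out in their escaped JSON form.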
1065
Nigel Taod191a3f2020-07-19 22:14:54 +10001066// ----
1067
Nigel Tao50db4a42020-08-20 11:31:28 +10001068inline const char* //
Nigel Tao2ef39992020-04-09 17:24:39 +10001069handle_token(wuffs_base__token t, bool start_of_token_chain) {
Nigel Tao2cf76db2020-02-27 22:42:01 +11001070 do {
Nigel Tao462f8662020-04-01 23:01:51 +11001071 int64_t vbc = t.value_base_category();
Nigel Tao2cf76db2020-02-27 22:42:01 +11001072 uint64_t vbd = t.value_base_detail();
Nigel Taoee6927f2020-07-27 12:08:33 +10001073 uint64_t token_length = t.length();
Nigel Tao991bd512020-08-19 09:38:16 +10001074 // The "- token_length" is because we incremented g_cursor_index before
1075 // calling handle_token.
Nigel Taoee6927f2020-07-27 12:08:33 +10001076 wuffs_base__slice_u8 tok = wuffs_base__make_slice_u8(
Nigel Tao991bd512020-08-19 09:38:16 +10001077 g_src.data.ptr + g_cursor_index - token_length, token_length);
Nigel Tao1b073492020-02-16 22:11:36 +11001078
1079 // Handle ']' or '}'.
Nigel Tao9f7a2502020-02-23 09:42:02 +11001080 if ((vbc == WUFFS_BASE__TOKEN__VBC__STRUCTURE) &&
Nigel Tao2cf76db2020-02-27 22:42:01 +11001081 (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__POP)) {
Nigel Taod60815c2020-03-26 14:32:35 +11001082 if (g_query.is_at(g_depth)) {
Nigel Tao0cd2f982020-03-03 23:03:02 +11001083 return "main: no match for query";
1084 }
Nigel Taod60815c2020-03-26 14:32:35 +11001085 if (g_depth <= 0) {
1086 return "main: internal error: inconsistent g_depth";
Nigel Tao1b073492020-02-16 22:11:36 +11001087 }
Nigel Taod60815c2020-03-26 14:32:35 +11001088 g_depth--;
Nigel Tao1b073492020-02-16 22:11:36 +11001089
Nigel Taod60815c2020-03-26 14:32:35 +11001090 if (g_query.matched_all() && (g_depth >= g_flags.max_output_depth)) {
1091 g_suppress_write_dst--;
Nigel Tao52c4d6a2020-03-08 21:12:38 +11001092 // '…' is U+2026 HORIZONTAL ELLIPSIS, which is 3 UTF-8 bytes.
Nigel Tao0291a472020-08-13 22:40:10 +10001093 TRY(write_dst((vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__FROM_LIST)
1094 ? "\"[…]\""
1095 : "\"{…}\"",
1096 7));
1097 } else {
Nigel Tao52c4d6a2020-03-08 21:12:38 +11001098 // Write preceding whitespace.
Nigel Taod60815c2020-03-26 14:32:35 +11001099 if ((g_ctx != context::in_list_after_bracket) &&
1100 (g_ctx != context::in_dict_after_brace) &&
1101 !g_flags.compact_output) {
Nigel Tao21042052020-08-19 23:13:54 +10001102 if (g_is_after_comment) {
1103 TRY_INDENT_SANS_LEADING_NEW_LINE;
1104 } else {
1105 if (g_flags.output_extra_comma) {
1106 TRY(write_dst(",", 1));
1107 }
1108 TRY_INDENT_WITH_LEADING_NEW_LINE;
Nigel Tao52c4d6a2020-03-08 21:12:38 +11001109 }
Nigel Tao1b073492020-02-16 22:11:36 +11001110 }
Nigel Tao52c4d6a2020-03-08 21:12:38 +11001111
1112 TRY(write_dst(
1113 (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__FROM_LIST) ? "]" : "}",
1114 1));
Nigel Tao1b073492020-02-16 22:11:36 +11001115 }
1116
Nigel Taod60815c2020-03-26 14:32:35 +11001117 g_ctx = (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__TO_LIST)
1118 ? context::in_list_after_value
1119 : context::in_dict_after_key;
Nigel Tao1b073492020-02-16 22:11:36 +11001120 goto after_value;
1121 }
1122
Nigel Taod1c928a2020-02-28 12:43:53 +11001123 // Write preceding whitespace and punctuation, if it wasn't ']', '}' or a
Nigel Tao0291a472020-08-13 22:40:10 +10001124 // continuation of a multi-token chain.
1125 if (start_of_token_chain) {
Nigel Tao21042052020-08-19 23:13:54 +10001126 if (g_is_after_comment) {
1127 TRY_INDENT_SANS_LEADING_NEW_LINE;
1128 } else if (g_ctx == context::in_dict_after_key) {
Nigel Taod60815c2020-03-26 14:32:35 +11001129 TRY(write_dst(": ", g_flags.compact_output ? 1 : 2));
1130 } else if (g_ctx != context::none) {
Nigel Tao0291a472020-08-13 22:40:10 +10001131 if ((g_ctx != context::in_list_after_bracket) &&
1132 (g_ctx != context::in_dict_after_brace)) {
Nigel Tao0cd2f982020-03-03 23:03:02 +11001133 TRY(write_dst(",", 1));
Nigel Tao107f0ef2020-03-01 21:35:02 +11001134 }
Nigel Taod60815c2020-03-26 14:32:35 +11001135 if (!g_flags.compact_output) {
Nigel Tao21042052020-08-19 23:13:54 +10001136 TRY_INDENT_WITH_LEADING_NEW_LINE;
Nigel Tao0cd2f982020-03-03 23:03:02 +11001137 }
1138 }
1139
Nigel Tao52c4d6a2020-03-08 21:12:38 +11001140 bool query_matched_fragment = false;
Nigel Taod60815c2020-03-26 14:32:35 +11001141 if (g_query.is_at(g_depth)) {
1142 switch (g_ctx) {
Nigel Tao0cd2f982020-03-03 23:03:02 +11001143 case context::in_list_after_bracket:
1144 case context::in_list_after_value:
Nigel Taod60815c2020-03-26 14:32:35 +11001145 query_matched_fragment = g_query.tick();
Nigel Tao0cd2f982020-03-03 23:03:02 +11001146 break;
1147 case context::in_dict_after_key:
Nigel Taod60815c2020-03-26 14:32:35 +11001148 query_matched_fragment = g_query.matched_fragment();
Nigel Tao0cd2f982020-03-03 23:03:02 +11001149 break;
Nigel Tao18ef5b42020-03-16 10:37:47 +11001150 default:
1151 break;
Nigel Tao0cd2f982020-03-03 23:03:02 +11001152 }
1153 }
Nigel Tao52c4d6a2020-03-08 21:12:38 +11001154 if (!query_matched_fragment) {
Nigel Tao0cd2f982020-03-03 23:03:02 +11001155 // No-op.
Nigel Taod60815c2020-03-26 14:32:35 +11001156 } else if (!g_query.next_fragment()) {
Nigel Tao0cd2f982020-03-03 23:03:02 +11001157 // There is no next fragment. We have matched the complete query, and
1158 // the upcoming JSON value is the result of that query.
1159 //
Nigel Taod60815c2020-03-26 14:32:35 +11001160 // Un-suppress writing to stdout and reset the g_ctx and g_depth as if
1161 // we were about to decode a top-level value. This makes any subsequent
1162 // indentation be relative to this point, and we will return g_eod
1163 // after the upcoming JSON value is complete.
1164 if (g_suppress_write_dst != 1) {
1165 return "main: internal error: inconsistent g_suppress_write_dst";
Nigel Tao52c4d6a2020-03-08 21:12:38 +11001166 }
Nigel Taod60815c2020-03-26 14:32:35 +11001167 g_suppress_write_dst = 0;
1168 g_ctx = context::none;
1169 g_depth = 0;
Nigel Tao0cd2f982020-03-03 23:03:02 +11001170 } else if ((vbc != WUFFS_BASE__TOKEN__VBC__STRUCTURE) ||
1171 !(vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__PUSH)) {
1172 // The query has moved on to the next fragment but the upcoming JSON
1173 // value is not a container.
1174 return "main: no match for query";
Nigel Tao1b073492020-02-16 22:11:36 +11001175 }
1176 }
1177
1178 // Handle the token itself: either a container ('[' or '{') or a simple
Nigel Tao85fba7f2020-02-29 16:28:06 +11001179 // value: string (a chain of raw or escaped parts), literal or number.
Nigel Tao1b073492020-02-16 22:11:36 +11001180 switch (vbc) {
Nigel Tao85fba7f2020-02-29 16:28:06 +11001181 case WUFFS_BASE__TOKEN__VBC__STRUCTURE:
Nigel Taod60815c2020-03-26 14:32:35 +11001182 if (g_query.matched_all() && (g_depth >= g_flags.max_output_depth)) {
1183 g_suppress_write_dst++;
Nigel Tao0291a472020-08-13 22:40:10 +10001184 } else {
Nigel Tao52c4d6a2020-03-08 21:12:38 +11001185 TRY(write_dst(
              (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__TO_LIST) ? "[" : "{",
              1));
        }
        g_depth++;
        g_ctx = (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__TO_LIST)
                    ? context::in_list_after_bracket
                    : context::in_dict_after_brace;
        return nullptr;

      case WUFFS_BASE__TOKEN__VBC__STRING:
        if (start_of_token_chain) {
          TRY(write_dst("\"", 1));
          g_query.restart_fragment(in_dict_before_key() &&
                                   g_query.is_at(g_depth));
        }

        if (vbd & WUFFS_BASE__TOKEN__VBD__STRING__CONVERT_0_DST_1_SRC_DROP) {
          // No-op.
        } else if (vbd &
                   WUFFS_BASE__TOKEN__VBD__STRING__CONVERT_1_DST_1_SRC_COPY) {
          TRY(write_dst(tok.ptr, tok.len));
          g_query.incremental_match_slice(tok.ptr, tok.len);
        } else {
          return "main: internal error: unexpected string-token conversion";
        }

        if (t.continued()) {
          return nullptr;
        }
        TRY(write_dst("\"", 1));
        goto after_value;

      case WUFFS_BASE__TOKEN__VBC__UNICODE_CODE_POINT:
        if (!t.continued()) {
          return "main: internal error: unexpected non-continued UCP token";
        }
        TRY(handle_unicode_code_point(vbd));
        g_query.incremental_match_code_point(vbd);
        return nullptr;

      case WUFFS_BASE__TOKEN__VBC__LITERAL:
      case WUFFS_BASE__TOKEN__VBC__NUMBER:
        TRY(write_dst(tok.ptr, tok.len));
        goto after_value;
    }

    // Return an error if we didn't match the (vbc, vbd) pair.
    return "main: internal error: unexpected token";
  } while (0);

  // Book-keeping after completing a value (whether a container value or a
  // simple value). Empty parent containers are no longer empty. If the parent
  // container is a "{...}" object, toggle between keys and values.
after_value:
  if (g_depth == 0) {
    return g_eod;
  }
  switch (g_ctx) {
    case context::in_list_after_bracket:
      g_ctx = context::in_list_after_value;
      break;
    case context::in_dict_after_brace:
      g_ctx = context::in_dict_after_key;
      break;
    case context::in_dict_after_key:
      g_ctx = context::in_dict_after_value;
      break;
    case context::in_dict_after_value:
      g_ctx = context::in_dict_after_key;
      break;
    default:
      break;
  }
  return nullptr;
}
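
// Illustrative sketch, not part of jsonptr itself: the "after_value"
// book-keeping in handle_token, restated as a pure function so that its
// transitions can be read (and checked) in isolation. The jsonptr_sketch
// namespace, the ctx enum and both functions are hypothetical; ctx assumes the
// same enumerator names as jsonptr's own context type.

#include <cassert>

namespace jsonptr_sketch {

enum class ctx {
  none,
  in_list_after_bracket,
  in_list_after_value,
  in_dict_after_brace,
  in_dict_after_key,
  in_dict_after_value,
};

// Returns the parent container's context once one of its children (a list
// element, or a "{...}" object's key or value) has been completed. Inside an
// object, completed children alternate between keys and values.
inline ctx next_context_after_value(ctx c) {
  switch (c) {
    case ctx::in_list_after_bracket:
      return ctx::in_list_after_value;
    case ctx::in_dict_after_brace:
      return ctx::in_dict_after_key;
    case ctx::in_dict_after_key:
      return ctx::in_dict_after_value;
    case ctx::in_dict_after_value:
      return ctx::in_dict_after_key;
    default:
      return c;  // For example, in_list_after_value stays as it is.
  }
}

// Usage: the children of {"a":1,"b":2} complete in the order "a", 1, "b", 2,
// so the context walks key, value, key, value.
inline void next_context_after_value_example() {
  ctx c = ctx::in_dict_after_brace;  // Just after the '{'.
  c = next_context_after_value(c);   // After "a".
  assert(c == ctx::in_dict_after_key);
  c = next_context_after_value(c);  // After 1.
  assert(c == ctx::in_dict_after_value);
  c = next_context_after_value(c);  // After "b".
  assert(c == ctx::in_dict_after_key);
  c = next_context_after_value(c);  // After 2.
  assert(c == ctx::in_dict_after_value);
}

}  // namespace jsonptr_sketch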

const char*  //
main1(int argc, char** argv) {
  TRY(initialize_globals(argc, argv));

  bool start_of_token_chain = true;
  while (true) {
    wuffs_base__status status = g_dec.decode_tokens(
        &g_tok, &g_src,
        wuffs_base__make_slice_u8(g_work_buffer_array, WORK_BUFFER_ARRAY_SIZE));

    while (g_tok.meta.ri < g_tok.meta.wi) {
      wuffs_base__token t = g_tok.data.ptr[g_tok.meta.ri++];
      uint64_t token_length = t.length();
      if ((g_src.meta.ri - g_cursor_index) < token_length) {
        return "main: internal error: inconsistent g_src indexes";
      }
      g_cursor_index += token_length;

      // Handle filler tokens (e.g. whitespace, punctuation and comments).
      // These are skipped, unless -output-comments is enabled.
      if (t.value_base_category() == WUFFS_BASE__TOKEN__VBC__FILLER) {
        if (g_flags.output_comments &&
            (t.value_base_detail() &
             WUFFS_BASE__TOKEN__VBD__FILLER__COMMENT_ANY)) {
          if (g_flags.compact_output) {
            TRY(write_dst(g_src.data.ptr + g_cursor_index - token_length,
                          token_length));
          } else {
            if (start_of_token_chain) {
              if (g_is_after_comment) {
                TRY_INDENT_SANS_LEADING_NEW_LINE;
              } else if (g_ctx != context::none) {
                if (g_ctx == context::in_dict_after_key) {
                  TRY(write_dst(":", 1));
                } else if ((g_ctx != context::in_list_after_bracket) &&
                           (g_ctx != context::in_dict_after_brace)) {
                  TRY(write_dst(",", 1));
                }
                if (!g_flags.compact_output) {
                  TRY_INDENT_WITH_LEADING_NEW_LINE;
                }
              }
            }
            TRY(write_dst(g_src.data.ptr + g_cursor_index - token_length,
                          token_length));
            if (!t.continued() &&
                (t.value_base_detail() &
                 WUFFS_BASE__TOKEN__VBD__FILLER__COMMENT_BLOCK)) {
              TRY(write_dst("\n", 1));
            }
            g_is_after_comment = true;
          }
        }
        start_of_token_chain = !t.continued();
        continue;
      }

      const char* z = handle_token(t, start_of_token_chain);
      g_is_after_comment = false;
      start_of_token_chain = !t.continued();
      if (z == nullptr) {
        continue;
      } else if (z == g_eod) {
        goto end_of_data;
      }
      return z;
    }

    if (status.repr == nullptr) {
      return "main: internal error: unexpected end of token stream";
    } else if (status.repr == wuffs_base__suspension__short_read) {
      if (g_cursor_index != g_src.meta.ri) {
        return "main: internal error: inconsistent g_src indexes";
      }
      TRY(read_src());
      g_cursor_index = g_src.meta.ri;
    } else if (status.repr == wuffs_base__suspension__short_write) {
      g_tok.compact();
    } else {
      return status.message();
    }
  }
end_of_data:

  // With a non-empty g_query, don't try to consume trailing whitespace or
  // confirm that we've processed all the tokens.
  if (g_flags.query_c_string && *g_flags.query_c_string) {
    return nullptr;
  }

  // Check that we've exhausted the input.
  if ((g_src.meta.ri == g_src.meta.wi) && !g_src.meta.closed) {
    TRY(read_src());
  }
  if ((g_src.meta.ri < g_src.meta.wi) || !g_src.meta.closed) {
    return "main: valid JSON followed by further (unexpected) data";
  }

  // Check that we've used all of the decoded tokens, other than trailing
  // filler tokens. For example, "true\n" is valid JSON (and fully consumed
  // with WUFFS_JSON__QUIRK_ALLOW_TRAILING_NEW_LINE enabled) with a trailing
  // filler token for the "\n".
  for (; g_tok.meta.ri < g_tok.meta.wi; g_tok.meta.ri++) {
    if (g_tok.data.ptr[g_tok.meta.ri].value_base_category() !=
        WUFFS_BASE__TOKEN__VBC__FILLER) {
      return "main: internal error: decoded OK but unprocessed tokens remain";
    }
  }

  return nullptr;
}
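
// Illustrative sketch, not part of jsonptr itself: in main1, each token only
// supplies its length (t.length()), so the loop keeps a running cursor
// (g_cursor_index), advances it by that length, and reads the token's bytes as
// the length-sized slice ending at the advanced cursor. The hypothetical
// helper below applies the same bookkeeping, including the "inconsistent
// g_src indexes" check, to a std::string. The token lengths in the usage
// example are made up; they are not necessarily what Wuffs' JSON decoder emits.

#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

namespace jsonptr_sketch {

// Splits src[0 .. read_index) into consecutive pieces, one per entry of
// lengths. Returns false if read_index is out of range or if a token would
// extend past read_index (the number of bytes consumed so far).
inline bool split_by_token_lengths(const std::string& src,
                                   uint64_t read_index,
                                   const std::vector<uint64_t>& lengths,
                                   std::vector<std::string>* out) {
  if (read_index > src.size()) {
    return false;
  }
  uint64_t cursor = 0;
  for (uint64_t len : lengths) {
    // Same shape as main1's check. cursor <= read_index always holds here,
    // so the subtraction cannot wrap around.
    if ((read_index - cursor) < len) {
      return false;
    }
    cursor += len;
    out->push_back(src.substr(cursor - len, len));
  }
  return true;
}

// Usage: five made-up tokens covering {"a":1}.
inline void split_by_token_lengths_example() {
  std::vector<std::string> parts;
  bool ok = split_by_token_lengths("{\"a\":1}", 7, {1, 3, 1, 1, 1}, &parts);
  assert(ok && (parts.size() == 5));
  assert(parts[1] == "\"a\"");  // The second piece is the quoted key.
  (void)ok;
}

}  // namespace jsonptr_sketch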

int  //
compute_exit_code(const char* status_msg) {
  if (!status_msg) {
    return 0;
  }
  size_t n;
  if (status_msg == g_usage) {
    n = strlen(status_msg);
  } else {
    n = strnlen(status_msg, 2047);
    if (n >= 2047) {
      status_msg = "main: internal error: error message is too long";
      n = strnlen(status_msg, 2047);
    }
  }
  const int stderr_fd = 2;
  ignore_return_value(write(stderr_fd, status_msg, n));
  ignore_return_value(write(stderr_fd, "\n", 1));
  // Return an exit code of 1 for regular (foreseen) errors, e.g. badly
  // formatted or unsupported input.
  //
  // Return an exit code of 2 for internal (exceptional) errors, e.g. defensive
  // run-time checks found that an internal invariant did not hold.
  //
  // Automated testing, including badly formatted inputs, can therefore
  // discriminate between expected failure (exit code 1) and unexpected failure
  // (other non-zero exit codes). Specifically, exit code 2 means an internal
  // invariant violation and exit code 139 (which is 128 + SIGSEGV on x86_64
  // Linux) means a segmentation fault (e.g. a null pointer dereference).
  return strstr(status_msg, "internal error:") ? 2 : 1;
}
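
// Illustrative sketch, not part of jsonptr itself: the exit-code convention
// from compute_exit_code, minus the writes to stderr, together with the shell
// side of that convention. POSIX shells report a child killed by signal N as
// exit status 128 + N, which is where 139 comes from (SIGSEGV is signal 11 on
// Linux). Both function names are hypothetical.

#include <cassert>
#include <cstring>

namespace jsonptr_sketch {

// 0 for success, 2 if the message marks an internal error, 1 otherwise.
inline int exit_code_for(const char* status_msg) {
  if (!status_msg) {
    return 0;
  }
  return std::strstr(status_msg, "internal error:") ? 2 : 1;
}

// Returns N if a shell-reported status of 128 + N reads as "killed by signal
// N", or 0 if the status looks like an ordinary exit code. This is only a
// heuristic: a program may also exit with a code above 128 on its own.
inline int signal_from_shell_status(int status) {
  return (status > 128) ? (status - 128) : 0;
}

// Usage, with messages taken from main1 and handle_token.
inline void exit_code_example() {
  assert(exit_code_for(nullptr) == 0);
  assert(exit_code_for(
             "main: valid JSON followed by further (unexpected) data") == 1);
  assert(exit_code_for("main: internal error: unexpected token") == 2);
}

}  // namespace jsonptr_sketch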

int  //
main(int argc, char** argv) {
  // Look for an input filename (the first non-flag argument) in argv. If there
  // is one, open it (but do not read from it) before we self-impose a sandbox.
  //
  // Args that start with "-" are flags, unless they come after a bare "--" arg.
  {
    bool dash_dash = false;
    int a;
    for (a = 1; a < argc; a++) {
      char* arg = argv[a];
      if ((arg[0] == '-') && !dash_dash) {
        dash_dash = (arg[1] == '-') && (arg[2] == '\x00');
        continue;
      }
      g_input_file_descriptor = open(arg, O_RDONLY);
      if (g_input_file_descriptor < 0) {
        fprintf(stderr, "%s: %s\n", arg, strerror(errno));
        return 1;
      }
      break;
    }
  }

#if defined(WUFFS_EXAMPLE_USE_SECCOMP)
  prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT);
  g_sandboxed = true;
#endif

  const char* z = main1(argc, argv);
  if (g_wrote_to_dst) {
    const char* z1 = write_dst("\n", 1);
    const char* z2 = flush_dst();
    z = z ? z : (z1 ? z1 : z2);
  }
  int exit_code = compute_exit_code(z);

#if defined(WUFFS_EXAMPLE_USE_SECCOMP)
  // Call SYS_exit explicitly, instead of calling SYS_exit_group implicitly by
  // either calling _exit or returning from main. SECCOMP_MODE_STRICT allows
  // only SYS_exit.
  syscall(SYS_exit, exit_code);
#endif
  return exit_code;
}
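
// Illustrative sketch, not part of jsonptr itself: the argv scan at the top
// of main, as a pure function that only reports which argument would be
// opened as the input file. An argument starting with "-" is skipped as a
// flag until a bare "--" is seen; after that, even "-foo" names a file. The
// function names are hypothetical.

#include <cassert>

namespace jsonptr_sketch {

// Returns the index of the first non-flag argument in argv, or -1 if there is
// none (in which case main leaves g_input_file_descriptor unchanged).
inline int first_non_flag_arg(int argc, const char* const* argv) {
  bool dash_dash = false;
  for (int a = 1; a < argc; a++) {
    const char* arg = argv[a];
    if ((arg[0] == '-') && !dash_dash) {
      // Short-circuiting means arg[2] is only read when arg[1] is '-', so a
      // lone "-" never reads past its terminating NUL.
      dash_dash = (arg[1] == '-') && (arg[2] == '\0');
      continue;
    }
    return a;
  }
  return -1;
}

// Usage: a plain filename, a dash-prefixed filename after "--", and no file.
inline void first_non_flag_arg_example() {
  const char* with_file[] = {"jsonptr", "-output-comments", "in.json"};
  const char* dash_file[] = {"jsonptr", "--", "-in.json"};
  const char* flags_only[] = {"jsonptr", "-output-comments", "--"};
  assert(first_non_flag_arg(3, with_file) == 2);
  assert(first_non_flag_arg(3, dash_file) == 2);
  assert(first_non_flag_arg(3, flags_only) == -1);
}

}  // namespace jsonptr_sketch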