// Copyright 2020 The Wuffs Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//    https://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

// ----------------

// jsonptr is discussed extensively at
// https://nigeltao.github.io/blog/2020/jsonptr.html

/*
jsonptr is a JSON formatter (pretty-printer) that supports the JSON Pointer
(RFC 6901) query syntax. It reads UTF-8 JSON from stdin and writes
canonicalized, formatted UTF-8 JSON to stdout.

See the "const char* g_usage" string below for details.

----

JSON Pointer (and this program's implementation) is one of many JSON query
languages and JSON tools, such as jq, jql and JMESPath. This one is relatively
simple, with fewer features than those others.

One benefit of simplicity is that this program's JSON and JSON Pointer
implementations do not dynamically allocate or free memory (yet they do not
require that the entire input fits in memory at once). They are therefore
trivially protected against certain bug classes: memory leaks, double-frees and
use-after-frees.

The core JSON implementation is also written in the Wuffs programming language
(and then transpiled to C/C++), which is memory-safe (e.g. array indexing is
bounds-checked) and also guards against integer arithmetic overflows.

For defense in depth, on Linux, this program also self-imposes a
SECCOMP_MODE_STRICT sandbox before reading (or otherwise processing) its input
or writing its output. Under this sandbox, the only permitted system calls are
read, write, exit and sigreturn.

All together, this program aims to safely handle untrusted JSON files without
fear of security bugs such as remote code execution.

----

As of 2020-02-24, this program passes all 318 "test_parsing" cases from the
JSON test suite (https://github.com/nst/JSONTestSuite), an appendix to the
"Parsing JSON is a Minefield" article (http://seriot.ch/parsing_json.php) that
was first published on 2016-10-26 and updated on 2018-03-30.

After modifying this program, run "build-example.sh example/jsonptr/" and then
"script/run-json-test-suite.sh" to catch correctness regressions.

----

This program uses Wuffs' JSON decoder at a relatively low level, processing the
decoder's token-stream output one token at a time. The core loop, in
pseudo-code, is "for_each_token { handle_token(etc); }", where the handle_token
function changes global state (e.g. the `g_depth` and `g_ctx` variables) and
prints output text based on that state and the token's source text. Notably,
handle_token is not recursive, even though JSON values can nest.
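
A minimal sketch of that non-recursive structure follows. It does not use
Wuffs: the Kind and Token types, handle_token and format_tokens are
hypothetical simplifications that only model lists and already-formatted
scalar values, and the fixed 2-space indent is an assumption. Real Wuffs
tokens carry much more detail (for example, one JSON string can span several
tokens). The point is that a depth counter and one bit of context, updated per
token, replace the call stack that a recursive pretty-printer would use.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical token kinds for a stream of lists and scalars only.
// (This is not Wuffs' token representation.)
enum class Kind { begin_list, end_list, value };

struct Token {
  Kind kind;
  std::string text;  // Source text, used only by Kind::value tokens.
};

// Global state, loosely analogous to jsonptr's g_depth and g_ctx.
static unsigned g_sketch_depth = 0;
static bool g_sketch_after_value = false;  // Does the next element need ','?
static std::string g_sketch_out;

static void write_new_line_and_indent() {
  g_sketch_out += '\n';
  g_sketch_out.append(2 * g_sketch_depth, ' ');
}

// handle_token is not recursive: nesting lives in g_sketch_depth, not in the
// call stack, so deeply nested input never grows the C++ stack.
static void handle_token(const Token& t) {
  switch (t.kind) {
    case Kind::begin_list:
    case Kind::value:
      if (g_sketch_after_value) {
        g_sketch_out += ',';
      }
      if (g_sketch_depth > 0) {
        write_new_line_and_indent();
      }
      if (t.kind == Kind::begin_list) {
        g_sketch_out += '[';
        g_sketch_depth++;
        g_sketch_after_value = false;
      } else {
        g_sketch_out += t.text;
        g_sketch_after_value = true;
      }
      break;
    case Kind::end_list:
      assert(g_sketch_depth > 0);
      g_sketch_depth--;
      if (g_sketch_after_value) {  // Non-empty list: ']' on its own line.
        write_new_line_and_indent();
      }
      g_sketch_out += ']';
      g_sketch_after_value = true;
      break;
  }
}

// The core loop: for_each_token { handle_token(etc); }.
std::string format_tokens(const std::vector<Token>& tokens) {
  g_sketch_depth = 0;
  g_sketch_after_value = false;
  g_sketch_out.clear();
  for (const Token& t : tokens) {
    handle_token(t);
  }
  return g_sketch_out;
}
```

For example, the token stream for [1,[2]] formats as "[", "  1,", "  [",
"    2", "  ]" and "]" on six lines, and an empty list stays as "[]".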

This approach is centered around JSON tokens. Each JSON 'thing' (e.g. number,
string, object) comprises one or more JSON tokens.

An alternative, higher-level approach is in the sibling example/jsonfindptrs
program. Neither approach is better or worse per se, but when studying this
program, be aware that there are multiple ways to use Wuffs' JSON decoder.

The two programs, jsonfindptrs and jsonptr, also demonstrate different
trade-offs with regard to JSON object duplicate keys. The JSON spec permits
different implementations to allow or reject duplicate keys. It is not always
clear which approach is safer. Rejecting them is certainly unambiguous, and
security bugs can lurk in ambiguous corners of a file format, if two different
implementations both silently accept a file but differ on how to interpret it.
On the other hand, in the worst case, detecting duplicate keys requires O(N)
memory, where N is the size of the (potentially untrusted) input.

This program (jsonptr) allows duplicate keys and requires only O(1) memory. As
mentioned above, it doesn't dynamically allocate memory at all, and on Linux,
it runs in a SECCOMP_MODE_STRICT sandbox.

----

To run:

$CXX jsonptr.cc && ./a.out < ../../test/data/github-tags.json; rm -f a.out

for a C++ compiler $CXX, such as clang++ or g++.
*/

#if defined(__cplusplus) && (__cplusplus < 201103L)
#error "This C++ program requires -std=c++11 or later"
#endif

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

// Wuffs ships as a "single file C library" or "header file library" as per
// https://github.com/nothings/stb/blob/master/docs/stb_howto.txt
//
// To use that single file as a "foo.c"-like implementation, instead of a
// "foo.h"-like header, #define WUFFS_IMPLEMENTATION before #include'ing or
// compiling it.
#define WUFFS_IMPLEMENTATION

// Defining the WUFFS_CONFIG__MODULE* macros is optional, but doing so lets
// users of release/c/etc.c whitelist which parts of Wuffs to build. That file
// contains the entire Wuffs standard library, implementing a variety of codecs
// and file formats. Without these macro definitions, an optimizing compiler or
// linker may very well discard Wuffs code for unused codecs, but listing the
// Wuffs modules we use makes that process explicit. Preprocessing means that
// such code simply isn't compiled.
#define WUFFS_CONFIG__MODULES
#define WUFFS_CONFIG__MODULE__BASE
#define WUFFS_CONFIG__MODULE__JSON

// If building this program in an environment that doesn't easily accommodate
// relative includes, you can use the script/inline-c-relative-includes.go
// program to generate a stand-alone C++ file.
#include "../../release/c/wuffs-unsupported-snapshot.c"

#if defined(__linux__)
#include <linux/prctl.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#define WUFFS_EXAMPLE_USE_SECCOMP
#endif

#define TRY(error_msg)         \
  do {                         \
    const char* z = error_msg; \
    if (z) {                   \
      return z;                \
    }                          \
  } while (false)

static const char* g_eod = "main: end of data";

static const char* g_usage =
    "Usage: jsonptr -flags input.json\n"
    "\n"
    "Flags:\n"
    "    -c      -compact-output\n"
    "    -d=NUM  -max-output-depth=NUM\n"
    "    -q=STR  -query=STR\n"
    "    -s=NUM  -spaces=NUM\n"
    "    -t      -tabs\n"
    "            -fail-if-unsandboxed\n"
    "            -input-allow-comments\n"
    "            -input-allow-extra-comma\n"
    "            -input-allow-inf-nan-numbers\n"
    "            -jwcc\n"
    "            -output-comments\n"
    "            -output-extra-comma\n"
    "            -output-inf-nan-numbers\n"
    "            -strict-json-pointer-syntax\n"
    "\n"
    "The input.json filename is optional. If absent, it reads from stdin.\n"
    "\n"
    "----\n"
    "\n"
    "jsonptr is a JSON formatter (pretty-printer) that supports the JSON\n"
    "Pointer (RFC 6901) query syntax. It reads UTF-8 JSON from stdin and\n"
    "writes canonicalized, formatted UTF-8 JSON to stdout.\n"
    "\n"
    "Canonicalized means that e.g. \"abc\\u000A\\tx\\u0177z\" is re-written\n"
    "as \"abc\\n\\txŷz\". It does not sort object keys, nor does it reject\n"
    "duplicate keys. Canonicalization does not imply Unicode normalization.\n"
    "\n"
    "Formatted means that arrays' and objects' elements are indented, each\n"
    "on its own line. Configure this with the -c / -compact-output, -s=NUM /\n"
    "-spaces=NUM (for NUM ranging from 0 to 8) and -t / -tabs flags.\n"
    "\n"
    "The -input-allow-comments flag allows \"/*slash-star*/\" and\n"
    "\"//slash-slash\" C-style comments within JSON input. Such comments are\n"
    "stripped from the output unless -output-comments was also set.\n"
    "\n"
    "The -input-allow-extra-comma flag allows input like \"[1,2,]\", with a\n"
    "comma after the final element of a JSON list or dictionary.\n"
    "\n"
    "The -input-allow-inf-nan-numbers flag allows non-finite floating point\n"
    "numbers (infinities and not-a-numbers) within JSON input. This flag\n"
    "requires that -output-inf-nan-numbers also be set.\n"
    "\n"
    "The -output-comments flag copies any input comments to the output. It\n"
    "has no effect unless -input-allow-comments was also set. Comments look\n"
    "better after commas than before them, but a closing \"]\" or \"}\" can\n"
    "occur after arbitrarily many comments, so -output-comments also requires\n"
    "that one or both of -compact-output and -output-extra-comma be set.\n"
    "\n"
    "The -output-extra-comma flag writes output like \"[1,2,]\", with a comma\n"
    "after the final element of a JSON list or dictionary. Such commas are\n"
    "non-compliant with the JSON specification but many parsers accept them\n"
    "and they can produce simpler line-based diffs. This flag is ignored when\n"
    "-compact-output is set.\n"
    "\n"
    "The -jwcc flag (JSON With Commas and Comments) enables all of:\n"
    "    -input-allow-comments\n"
    "    -input-allow-extra-comma\n"
    "    -output-comments\n"
    "    -output-extra-comma\n"
    "\n"
    "----\n"
    "\n"
    "The -q=STR or -query=STR flag gives an optional JSON Pointer query, to\n"
    "print a subset of the input. For example, given RFC 6901 section 5's\n"
    "sample input (https://tools.ietf.org/rfc/rfc6901.txt), this command:\n"
    "    jsonptr -query=/foo/1 rfc-6901-json-pointer.json\n"
    "will print:\n"
    "    \"baz\"\n"
    "\n"
    "An absent query is equivalent to the empty query, which identifies the\n"
    "entire input (the root value). Unlike a file system, the \"/\" query\n"
    "does not identify the root. Instead, \"\" is the root and \"/\" is the\n"
    "child (the value in a key-value pair) of the root whose key is the empty\n"
    "string. Similarly, \"/xyz\" and \"/xyz/\" are two different nodes.\n"
    "\n"
    "If the query found a valid JSON value, this program will return a zero\n"
    "exit code even if the rest of the input isn't valid JSON. If the query\n"
    "did not find a value, or found an invalid one, this program returns a\n"
    "non-zero exit code, but may still print partial output to stdout.\n"
    "\n"
    "The JSON specification (https://json.org/) permits implementations that\n"
    "allow duplicate keys, as this one does. This JSON Pointer implementation\n"
    "is also greedy, following the first match for each fragment without\n"
    "back-tracking. For example, the \"/foo/bar\" query will fail if the root\n"
    "object has multiple \"foo\" children but the first one doesn't have a\n"
    "\"bar\" child, even if later ones do.\n"
    "\n"
    "The -strict-json-pointer-syntax flag restricts the -query=STR string to\n"
    "exactly RFC 6901, with only two escape sequences: \"~0\" and \"~1\" for\n"
    "\"~\" and \"/\". Without this flag, this program also lets \"~n\" and\n"
    "\"~r\" escape the New Line and Carriage Return ASCII control characters,\n"
    "which can work better with line-oriented Unix tools that assume exactly\n"
    "one value (i.e. one JSON Pointer string) per line.\n"
    "\n"
    "----\n"
    "\n"
    "The -d=NUM or -max-output-depth=NUM flag gives the maximum (inclusive)\n"
    "output depth. JSON containers ([] arrays and {} objects) can hold other\n"
    "containers. When this flag is set, containers at depth NUM are replaced\n"
    "with \"[…]\" or \"{…}\". A bare -d or -max-output-depth is equivalent to\n"
    "-d=1. The flag's absence is equivalent to an unlimited output depth.\n"
    "\n"
    "The -max-output-depth flag only affects the program's output. It doesn't\n"
    "affect whether or not the input is considered valid JSON. The JSON\n"
    "specification permits implementations to set their own maximum input\n"
    "depth. This JSON implementation sets it to 1024.\n"
    "\n"
    "Depth is measured in terms of nested containers. It is unaffected by the\n"
    "number of spaces or tabs used to indent.\n"
    "\n"
    "When both -max-output-depth and -query are set, the output depth is\n"
    "measured from when the query resolves, not from the input root. The\n"
    "input depth (measured from the root) is still limited to 1024.\n"
    "\n"
    "----\n"
    "\n"
    "The -fail-if-unsandboxed flag causes the program to exit if it does not\n"
    "self-impose a sandbox. On Linux, it self-imposes a SECCOMP_MODE_STRICT\n"
    "sandbox, regardless of whether this flag was set.";

// ----

// ascii_escapes was created by script/print-json-ascii-escapes.go.
const uint8_t ascii_escapes[1024] = {
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x30, 0x30, 0x00,  // 0x00: "\\u0000"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x30, 0x31, 0x00,  // 0x01: "\\u0001"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x30, 0x32, 0x00,  // 0x02: "\\u0002"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x30, 0x33, 0x00,  // 0x03: "\\u0003"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x30, 0x34, 0x00,  // 0x04: "\\u0004"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x30, 0x35, 0x00,  // 0x05: "\\u0005"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x30, 0x36, 0x00,  // 0x06: "\\u0006"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x30, 0x37, 0x00,  // 0x07: "\\u0007"
    0x02, 0x5C, 0x62, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x08: "\\b"
    0x02, 0x5C, 0x74, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x09: "\\t"
    0x02, 0x5C, 0x6E, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x0A: "\\n"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x30, 0x42, 0x00,  // 0x0B: "\\u000B"
    0x02, 0x5C, 0x66, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x0C: "\\f"
    0x02, 0x5C, 0x72, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x0D: "\\r"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x30, 0x45, 0x00,  // 0x0E: "\\u000E"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x30, 0x46, 0x00,  // 0x0F: "\\u000F"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x30, 0x00,  // 0x10: "\\u0010"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x31, 0x00,  // 0x11: "\\u0011"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x32, 0x00,  // 0x12: "\\u0012"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x33, 0x00,  // 0x13: "\\u0013"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x34, 0x00,  // 0x14: "\\u0014"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x35, 0x00,  // 0x15: "\\u0015"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x36, 0x00,  // 0x16: "\\u0016"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x37, 0x00,  // 0x17: "\\u0017"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x38, 0x00,  // 0x18: "\\u0018"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x39, 0x00,  // 0x19: "\\u0019"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x41, 0x00,  // 0x1A: "\\u001A"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x42, 0x00,  // 0x1B: "\\u001B"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x43, 0x00,  // 0x1C: "\\u001C"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x44, 0x00,  // 0x1D: "\\u001D"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x45, 0x00,  // 0x1E: "\\u001E"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x46, 0x00,  // 0x1F: "\\u001F"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x32, 0x30, 0x00,  // 0x20: "\\u0020"
    0x01, 0x21, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x21: "!"
    0x02, 0x5C, 0x22, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x22: "\\\""
    0x01, 0x23, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x23: "#"
    0x01, 0x24, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x24: "$"
    0x01, 0x25, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x25: "%"
    0x01, 0x26, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x26: "&"
    0x01, 0x27, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x27: "'"
    0x01, 0x28, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x28: "("
    0x01, 0x29, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x29: ")"
    0x01, 0x2A, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x2A: "*"
    0x01, 0x2B, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x2B: "+"
    0x01, 0x2C, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x2C: ","
    0x01, 0x2D, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x2D: "-"
    0x01, 0x2E, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x2E: "."
    0x01, 0x2F, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x2F: "/"
    0x01, 0x30, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x30: "0"
    0x01, 0x31, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x31: "1"
    0x01, 0x32, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x32: "2"
    0x01, 0x33, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x33: "3"
    0x01, 0x34, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x34: "4"
    0x01, 0x35, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x35: "5"
    0x01, 0x36, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x36: "6"
    0x01, 0x37, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x37: "7"
    0x01, 0x38, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x38: "8"
    0x01, 0x39, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x39: "9"
    0x01, 0x3A, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x3A: ":"
    0x01, 0x3B, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x3B: ";"
    0x01, 0x3C, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x3C: "<"
    0x01, 0x3D, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x3D: "="
    0x01, 0x3E, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x3E: ">"
    0x01, 0x3F, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x3F: "?"
    0x01, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x40: "@"
    0x01, 0x41, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x41: "A"
    0x01, 0x42, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x42: "B"
    0x01, 0x43, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x43: "C"
    0x01, 0x44, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x44: "D"
    0x01, 0x45, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x45: "E"
    0x01, 0x46, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x46: "F"
    0x01, 0x47, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x47: "G"
    0x01, 0x48, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x48: "H"
    0x01, 0x49, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x49: "I"
    0x01, 0x4A, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x4A: "J"
    0x01, 0x4B, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x4B: "K"
    0x01, 0x4C, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x4C: "L"
    0x01, 0x4D, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x4D: "M"
    0x01, 0x4E, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x4E: "N"
    0x01, 0x4F, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x4F: "O"
    0x01, 0x50, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x50: "P"
    0x01, 0x51, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x51: "Q"
    0x01, 0x52, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x52: "R"
    0x01, 0x53, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x53: "S"
    0x01, 0x54, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x54: "T"
    0x01, 0x55, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x55: "U"
    0x01, 0x56, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x56: "V"
    0x01, 0x57, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x57: "W"
    0x01, 0x58, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x58: "X"
    0x01, 0x59, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x59: "Y"
    0x01, 0x5A, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x5A: "Z"
    0x01, 0x5B, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x5B: "["
    0x02, 0x5C, 0x5C, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x5C: "\\\\"
    0x01, 0x5D, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x5D: "]"
    0x01, 0x5E, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x5E: "^"
    0x01, 0x5F, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x5F: "_"
    0x01, 0x60, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x60: "`"
    0x01, 0x61, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x61: "a"
    0x01, 0x62, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x62: "b"
    0x01, 0x63, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x63: "c"
    0x01, 0x64, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x64: "d"
    0x01, 0x65, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x65: "e"
    0x01, 0x66, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x66: "f"
    0x01, 0x67, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x67: "g"
    0x01, 0x68, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x68: "h"
    0x01, 0x69, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x69: "i"
    0x01, 0x6A, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x6A: "j"
    0x01, 0x6B, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x6B: "k"
    0x01, 0x6C, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x6C: "l"
    0x01, 0x6D, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x6D: "m"
    0x01, 0x6E, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x6E: "n"
    0x01, 0x6F, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x6F: "o"
    0x01, 0x70, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x70: "p"
    0x01, 0x71, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x71: "q"
    0x01, 0x72, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x72: "r"
    0x01, 0x73, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x73: "s"
    0x01, 0x74, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x74: "t"
    0x01, 0x75, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x75: "u"
    0x01, 0x76, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x76: "v"
    0x01, 0x77, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x77: "w"
    0x01, 0x78, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x78: "x"
    0x01, 0x79, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x79: "y"
    0x01, 0x7A, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x7A: "z"
    0x01, 0x7B, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x7B: "{"
    0x01, 0x7C, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x7C: "|"
    0x01, 0x7D, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x7D: "}"
    0x01, 0x7E, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x7E: "~"
    0x01, 0x7F, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x7F: "<DEL>"
};
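
/*
A minimal sketch of reading one ascii_escapes entry. The layout is inferred
from the table above rather than from the generator script: entry c occupies
bytes 8*c to 8*c+7, its first byte is a length n and the next n bytes are the
text to emit for ASCII byte c, padded with zeroes. The escape_ascii_byte
helper is hypothetical; it does not necessarily match how this program itself
consumes the table.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: returns the text stored in the 8-byte entry for ASCII
// byte c, in a table laid out like ascii_escapes (length byte first).
std::string escape_ascii_byte(const uint8_t* table, uint8_t c) {
  assert(c < 0x80);  // The table covers ASCII only: 128 entries of 8 bytes.
  const uint8_t* entry = table + (8 * c);
  uint8_t n = entry[0];
  assert((1 <= n) && (n <= 7));  // At most 7 bytes fit after the length.
  return std::string(reinterpret_cast<const char*>(entry + 1), n);
}
```

The fixed 8-byte stride trades some zero padding for a lookup that needs no
search, only one multiplication.
*/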

// Wuffs allows either statically or dynamically allocated work buffers. This
// program exercises static allocation.
#define WORK_BUFFER_ARRAY_SIZE \
  WUFFS_JSON__DECODER_WORKBUF_LEN_MAX_INCL_WORST_CASE
#if WORK_BUFFER_ARRAY_SIZE > 0
uint8_t g_work_buffer_array[WORK_BUFFER_ARRAY_SIZE];
#else
// Not all C/C++ compilers support 0-length arrays.
uint8_t g_work_buffer_array[1];
#endif

bool g_sandboxed = false;

int g_input_file_descriptor = 0;  // A 0 default means stdin.

#define NEW_LINE_THEN_256_SPACES                                          \
  "\n                                                                " \
  "                                                                "   \
  "                                                                "   \
  "                                                                "
#define NEW_LINE_THEN_256_TABS \
  "\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t" \
  "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t" \
  "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t" \
  "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t" \
  "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t" \
  "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t" \
  "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t"

const char* g_new_line_then_256_indent_bytes;
uint32_t g_bytes_per_indent_depth;

#ifndef DST_BUFFER_ARRAY_SIZE
#define DST_BUFFER_ARRAY_SIZE (32 * 1024)
#endif
#ifndef SRC_BUFFER_ARRAY_SIZE
#define SRC_BUFFER_ARRAY_SIZE (32 * 1024)
#endif
// 1 token is 8 bytes. 4Ki tokens is 32KiB.
#ifndef TOKEN_BUFFER_ARRAY_SIZE
#define TOKEN_BUFFER_ARRAY_SIZE (4 * 1024)
#endif

uint8_t g_dst_array[DST_BUFFER_ARRAY_SIZE];
uint8_t g_src_array[SRC_BUFFER_ARRAY_SIZE];
wuffs_base__token g_tok_array[TOKEN_BUFFER_ARRAY_SIZE];

wuffs_base__io_buffer g_dst;
wuffs_base__io_buffer g_src;
wuffs_base__token_buffer g_tok;

// g_cursor_index is the g_src.data.ptr index between the previous and current
// token. An invariant is that (g_cursor_index <= g_src.meta.ri).
size_t g_cursor_index;

uint32_t g_depth;

enum class context {
  none,
  in_list_after_bracket,
  in_list_after_value,
  in_dict_after_brace,
  in_dict_after_key,
  in_dict_after_value,
} g_ctx;

bool  //
in_dict_before_key() {
  return (g_ctx == context::in_dict_after_brace) ||
         (g_ctx == context::in_dict_after_value);
}

bool g_is_after_comment;

uint32_t g_suppress_write_dst;
bool g_wrote_to_dst;

wuffs_json__decoder g_dec;

// ----

// Query is a JSON Pointer query. After initializing with a NUL-terminated C
// string, its multiple fragments are consumed as the program walks the JSON
// data from stdin. For example, letting "$" denote a NUL, suppose that we
// started with a query string of "/apple/banana/12/durian" and are currently
// trying to match the second fragment, "banana", so that Query::m_depth is 2:
//
//   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
//   / a p p l e / b a n a n a / 1 2 / d u r i a n $
//   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
//                 ^           ^
//                 m_frag_i    m_frag_k
//
// The two pointers m_frag_i and m_frag_k (abbreviated as mfi and mfk) are the
// start (inclusive) and end (exclusive) of the query fragment. They satisfy
// (mfi <= mfk) and may be equal if the fragment is empty (note that "" is a
// valid JSON object key).
//
// The m_frag_j (mfj) pointer moves between these two, or is nullptr. An
// invariant is that (((mfi <= mfj) && (mfj <= mfk)) || (mfj == nullptr)).
//
// Wuffs' JSON tokenizer can portray a single JSON string as multiple Wuffs
// tokens, as backslash-escaped values within that JSON string may each get
// their own token.
//
// At the start of each object key (a JSON string), mfj is set to mfi.
//
// While mfj remains non-nullptr, each token's unescaped contents are then
// compared to that part of the fragment from mfj to mfk. If it is a prefix
// (including the case of an exact match), then mfj is advanced by the
// unescaped length. Otherwise, mfj is set to nullptr.
//
// Comparison accounts for JSON Pointer's escaping notation: "~0" and "~1" in
// the query (not the JSON value) are unescaped to "~" and "/" respectively.
// "~n" and "~r" are also unescaped to "\n" and "\r". The program is
// responsible for calling Query::validate (with a strict_json_pointer_syntax
// argument) before otherwise using this class.
//
// The mfj pointer therefore advances from mfi to mfk, or drops out, as we
// incrementally match the object key with the query fragment. For example, if
// we have already matched the "ban" of "banana", then we would accept any of
// an "ana" token, an "a" token or a "\u0061" token, amongst others. They would
// advance mfj by 3, 1 or 1 bytes respectively.
//
//                       mfj
//                       v
//   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
//   / a p p l e / b a n a n a / 1 2 / d u r i a n $
//   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
//                 ^           ^
//                 mfi         mfk
//
// At the end of each object key (or equivalently, at the start of each object
// value), if mfj is non-nullptr and equal to (but not less than) mfk then we
// have a fragment match: the query fragment equals the object key. If there is
// a next fragment (in this example, "12") we move the frag_etc pointers to its
// start and end and increment Query::m_depth. Otherwise, we have matched the
// complete query, and the upcoming JSON value is the result of that query.
//
548// The discussion above centers on object keys. If the query fragment is
549// numeric then it can also match as an array index: the string fragment "12"
550// will match an array's 13th element (starting counting from zero). See RFC
551// 6901 for its precise definition of an "array index" number.
552//
Nigel Taob48ee752020-03-13 09:27:33 +1100553// Array index fragment match is represented by the Query::m_array_index field,
Nigel Tao0cd2f982020-03-03 23:03:02 +1100554// whose type (wuffs_base__result_u64) is a result type. An error result means
555// that the fragment is not an array index. A value result holds the number of
556// list elements remaining. When matching a query fragment in an array (instead
557// of in an object), each element ticks this number down towards zero. At zero,
558// the upcoming JSON value is the one that matches the query fragment.
class Query {
 private:
  uint8_t* m_frag_i;
  uint8_t* m_frag_j;
  uint8_t* m_frag_k;

  uint32_t m_depth;

  wuffs_base__result_u64 m_array_index;

 public:
  void reset(char* query_c_string) {
    m_frag_i = (uint8_t*)query_c_string;
    m_frag_j = (uint8_t*)query_c_string;
    m_frag_k = (uint8_t*)query_c_string;
    m_depth = 0;
    m_array_index.status.repr = "#main: not an array index query fragment";
    m_array_index.value = 0;
  }

  void restart_fragment(bool enable) { m_frag_j = enable ? m_frag_i : nullptr; }

  bool is_at(uint32_t depth) { return m_depth == depth; }

  // tick returns whether the fragment is a valid array index whose value is
  // zero. If valid but non-zero, it decrements it and returns false.
  bool tick() {
    if (m_array_index.status.is_ok()) {
      if (m_array_index.value == 0) {
        return true;
      }
      m_array_index.value--;
    }
    return false;
  }

  // next_fragment moves to the next fragment, returning whether it existed.
  bool next_fragment() {
    uint8_t* k = m_frag_k;
    uint32_t d = m_depth;

    this->reset(nullptr);

    if (!k || (*k != '/')) {
      return false;
    }
    k++;

    bool all_digits = true;
    uint8_t* i = k;
    while ((*k != '\x00') && (*k != '/')) {
      all_digits = all_digits && ('0' <= *k) && (*k <= '9');
      k++;
    }
    m_frag_i = i;
    m_frag_j = i;
    m_frag_k = k;
    m_depth = d + 1;
    if (all_digits) {
      // wuffs_base__parse_number_u64 rejects leading zeroes, e.g. "00", "07".
      m_array_index = wuffs_base__parse_number_u64(
          wuffs_base__make_slice_u8(i, k - i),
          WUFFS_BASE__PARSE_NUMBER_XXX__DEFAULT_OPTIONS);
    }
    return true;
  }

  bool matched_all() { return m_frag_k == nullptr; }

  bool matched_fragment() { return m_frag_j && (m_frag_j == m_frag_k); }

  void incremental_match_slice(uint8_t* ptr, size_t len) {
    if (!m_frag_j) {
      return;
    }
    uint8_t* j = m_frag_j;
    while (true) {
      if (len == 0) {
        m_frag_j = j;
        return;
      }

      if (*j == '\x00') {
        break;

      } else if (*j == '~') {
        j++;
        if (*j == '0') {
          if (*ptr != '~') {
            break;
          }
        } else if (*j == '1') {
          if (*ptr != '/') {
            break;
          }
        } else if (*j == 'n') {
          if (*ptr != '\n') {
            break;
          }
        } else if (*j == 'r') {
          if (*ptr != '\r') {
            break;
          }
        } else {
          break;
        }

      } else if (*j != *ptr) {
        break;
      }

      j++;
      ptr++;
      len--;
    }
    m_frag_j = nullptr;
  }

  void incremental_match_code_point(uint32_t code_point) {
    if (!m_frag_j) {
      return;
    }
    uint8_t u[WUFFS_BASE__UTF_8__BYTE_LENGTH__MAX_INCL];
    size_t n = wuffs_base__utf_8__encode(
        wuffs_base__make_slice_u8(&u[0],
                                  WUFFS_BASE__UTF_8__BYTE_LENGTH__MAX_INCL),
        code_point);
    if (n > 0) {
      this->incremental_match_slice(&u[0], n);
    }
  }

  // validate returns whether the (query_c_string, length) arguments form a
  // valid JSON Pointer. In particular, it must be valid UTF-8, and either be
  // empty or start with a '/'. Any '~' within must immediately be followed by
  // either '0' or '1'. If strict_json_pointer_syntax is false, a '~' may also
  // be followed by either 'n' or 'r'.
  static bool validate(char* query_c_string,
                       size_t length,
                       bool strict_json_pointer_syntax) {
    if (length == 0) {
      return true;
    }
    if (query_c_string[0] != '/') {
      return false;
    }
    wuffs_base__slice_u8 s =
        wuffs_base__make_slice_u8((uint8_t*)query_c_string, length);
    bool previous_was_tilde = false;
    while (s.len > 0) {
      wuffs_base__utf_8__next__output o = wuffs_base__utf_8__next(s.ptr, s.len);
      if (!o.is_valid()) {
        return false;
      }

      if (previous_was_tilde) {
        switch (o.code_point) {
          case '0':
          case '1':
            break;
          case 'n':
          case 'r':
            if (strict_json_pointer_syntax) {
              return false;
            }
            break;
          default:
            return false;
        }
      }
      previous_was_tilde = o.code_point == '~';

      s.ptr += o.byte_length;
      s.len -= o.byte_length;
    }
    return !previous_was_tilde;
  }
} g_query;
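
// The following is a minimal, self-contained sketch of the two matching rules
// described above Query, for illustration only: it is not part of jsonptr and
// does not use Wuffs. All names below are hypothetical. It assumes that the
// fragment is a NUL-terminated string and that an object key arrives as one
// or more already-unescaped chunks, mirroring how a single JSON string can
// span several tokens. Only the standard "~0" and "~1" escapes are handled.

#include <cstddef>
#include <cstdint>
#include <cstring>

// sketch_match_chunk advances *j through the fragment [*j, k) by one chunk of
// key bytes, returning false on a mismatch (including a chunk that would run
// past k). Like m_frag_j, *j moves 2 bytes for an escape and 1 otherwise.
inline bool sketch_match_chunk(const char** j,
                               const char* k,
                               const char* chunk,
                               size_t len) {
  const char* p = *j;
  for (size_t i = 0; i < len; i++) {
    if (p >= k) {
      return false;
    }
    char want = *p;
    if (want == '~') {
      if (((p + 1) >= k) || ((p[1] != '0') && (p[1] != '1'))) {
        return false;
      }
      p++;
      want = (*p == '0') ? '~' : '/';
    }
    if (chunk[i] != want) {
      return false;
    }
    p++;
  }
  *j = p;
  return true;
}

// sketch_key_matches_fragment reports an exact match: every chunk matched, in
// order, and together they consumed the whole fragment (cf. matched_fragment).
inline bool sketch_key_matches_fragment(const char* frag,
                                        const char* const* chunks,
                                        size_t num_chunks) {
  const char* j = frag;
  const char* k = frag + strlen(frag);
  for (size_t c = 0; c < num_chunks; c++) {
    if (!sketch_match_chunk(&j, k, chunks[c], strlen(chunks[c]))) {
      return false;
    }
  }
  return j == k;
}

// sketch_parse_array_index applies RFC 6901's array-index grammar: "0", or a
// non-zero digit followed by digits. It also rejects values that overflow 64
// bits. (Query itself relies on wuffs_base__parse_number_u64 instead.)
inline bool sketch_parse_array_index(const char* frag, uint64_t* out) {
  size_t n = strlen(frag);
  if ((n == 0) || ((n > 1) && (frag[0] == '0'))) {
    return false;
  }
  uint64_t v = 0;
  for (size_t i = 0; i < n; i++) {
    if ((frag[i] < '0') || ('9' < frag[i])) {
      return false;
    }
    uint64_t d = (uint64_t)(frag[i] - '0');
    if (v > ((UINT64_MAX - d) / 10)) {
      return false;  // 10 * v + d would overflow.
    }
    v = (10 * v) + d;
  }
  *out = v;
  return true;
}

// For example, the chunks {"ban", "ana"} match the fragment "banana" but
// {"ban"} alone does not, and the fragment "a~1b" matches the key "a/b".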

// ----

struct {
  int remaining_argc;
  char** remaining_argv;

  bool compact_output;
  bool fail_if_unsandboxed;
  bool input_allow_comments;
  bool input_allow_extra_comma;
  bool input_allow_inf_nan_numbers;
  bool output_comments;
  bool output_extra_comma;
  bool output_inf_nan_numbers;
  bool strict_json_pointer_syntax;
  bool tabs;

  uint32_t max_output_depth;
  uint32_t spaces;

  char* query_c_string;
} g_flags = {0};

const char*  //
parse_flags(int argc, char** argv) {
  g_flags.spaces = 4;
  g_flags.max_output_depth = 0xFFFFFFFF;

  int c = (argc > 0) ? 1 : 0;  // Skip argv[0], the program name.
  for (; c < argc; c++) {
    char* arg = argv[c];
    if (*arg++ != '-') {
      break;
    }

    // A double-dash "--foo" is equivalent to a single-dash "-foo". As special
    // cases, a bare "-" is not a flag (some programs may interpret it as
    // stdin) and a bare "--" means to stop parsing flags.
    if (*arg == '\x00') {
      break;
    } else if (*arg == '-') {
      arg++;
      if (*arg == '\x00') {
        c++;
        break;
      }
    }

    if (!strcmp(arg, "c") || !strcmp(arg, "compact-output")) {
      g_flags.compact_output = true;
      continue;
    }
    if (!strcmp(arg, "d") || !strcmp(arg, "max-output-depth")) {
      g_flags.max_output_depth = 1;
      continue;
    } else if (!strncmp(arg, "d=", 2) ||
               !strncmp(arg, "max-output-depth=", 17)) {
      while (*arg++ != '=') {
      }
      wuffs_base__result_u64 u = wuffs_base__parse_number_u64(
          wuffs_base__make_slice_u8((uint8_t*)arg, strlen(arg)),
          WUFFS_BASE__PARSE_NUMBER_XXX__DEFAULT_OPTIONS);
      if (u.status.is_ok() && (u.value <= 0xFFFFFFFF)) {
        g_flags.max_output_depth = (uint32_t)(u.value);
        continue;
      }
      return g_usage;
    }
    if (!strcmp(arg, "fail-if-unsandboxed")) {
      g_flags.fail_if_unsandboxed = true;
      continue;
    }
    if (!strcmp(arg, "input-allow-comments")) {
      g_flags.input_allow_comments = true;
      continue;
    }
    if (!strcmp(arg, "input-allow-extra-comma")) {
      g_flags.input_allow_extra_comma = true;
      continue;
    }
    if (!strcmp(arg, "input-allow-inf-nan-numbers")) {
      g_flags.input_allow_inf_nan_numbers = true;
      continue;
    }
    if (!strcmp(arg, "jwcc")) {
      g_flags.input_allow_comments = true;
      g_flags.input_allow_extra_comma = true;
      g_flags.output_comments = true;
      g_flags.output_extra_comma = true;
      continue;
    }
    if (!strcmp(arg, "output-comments")) {
      g_flags.output_comments = true;
      continue;
    }
    if (!strcmp(arg, "output-extra-comma")) {
      g_flags.output_extra_comma = true;
      continue;
    }
    if (!strcmp(arg, "output-inf-nan-numbers")) {
      g_flags.output_inf_nan_numbers = true;
      continue;
    }
    if (!strncmp(arg, "q=", 2) || !strncmp(arg, "query=", 6)) {
      while (*arg++ != '=') {
      }
      g_flags.query_c_string = arg;
      continue;
    }
    if (!strncmp(arg, "s=", 2) || !strncmp(arg, "spaces=", 7)) {
      while (*arg++ != '=') {
      }
      if (('0' <= arg[0]) && (arg[0] <= '8') && (arg[1] == '\x00')) {
        g_flags.spaces = arg[0] - '0';
        continue;
      }
      return g_usage;
    }
    if (!strcmp(arg, "strict-json-pointer-syntax")) {
      g_flags.strict_json_pointer_syntax = true;
      continue;
    }
    if (!strcmp(arg, "t") || !strcmp(arg, "tabs")) {
      g_flags.tabs = true;
      continue;
    }

    return g_usage;
  }

  if (g_flags.query_c_string &&
      !Query::validate(g_flags.query_c_string, strlen(g_flags.query_c_string),
                       g_flags.strict_json_pointer_syntax)) {
    return "main: bad JSON Pointer (RFC 6901) syntax for the -query=STR flag";
  }

  g_flags.remaining_argc = argc - c;
  g_flags.remaining_argv = argv + c;
  return nullptr;
}
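
// A hypothetical sketch, not part of jsonptr, of two conventions that
// parse_flags follows: one or two leading dashes are equivalent, and a
// "name=value" flag is recognized by a prefix comparison. Here the prefix
// length is computed from the string literal itself rather than written by
// hand, which keeps the two from drifting apart. For simplicity, this sketch
// treats a bare "-" and a bare "--" alike (as "not a flag"), whereas
// parse_flags additionally consumes the "--".

#include <cstddef>
#include <cstring>

// sketch_flag_name returns arg without its leading "-" or "--", or nullptr
// if arg is not a flag.
inline const char* sketch_flag_name(const char* arg) {
  if (*arg++ != '-') {
    return nullptr;
  }
  if (*arg == '-') {
    arg++;
  }
  return (*arg == '\x00') ? nullptr : arg;
}

// sketch_flag_value returns the text after prefix (such as "spaces=") if name
// starts with it, else nullptr. N counts the literal's trailing NUL, so N - 1
// is exactly the number of bytes to compare.
template <size_t N>
inline const char* sketch_flag_value(const char* name,
                                     const char (&prefix)[N]) {
  return strncmp(name, prefix, N - 1) ? nullptr : (name + N - 1);
}

// For example, sketch_flag_name("--spaces=2") is "spaces=2", and then
// sketch_flag_value("spaces=2", "spaces=") is "2".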

const char*  //
initialize_globals(int argc, char** argv) {
  g_dst = wuffs_base__make_io_buffer(
      wuffs_base__make_slice_u8(g_dst_array, DST_BUFFER_ARRAY_SIZE),
      wuffs_base__empty_io_buffer_meta());

  g_src = wuffs_base__make_io_buffer(
      wuffs_base__make_slice_u8(g_src_array, SRC_BUFFER_ARRAY_SIZE),
      wuffs_base__empty_io_buffer_meta());

  g_tok = wuffs_base__make_token_buffer(
      wuffs_base__make_slice_token(g_tok_array, TOKEN_BUFFER_ARRAY_SIZE),
      wuffs_base__empty_token_buffer_meta());

  g_cursor_index = 0;

  g_depth = 0;

  g_ctx = context::none;

  g_is_after_comment = false;

  TRY(parse_flags(argc, argv));
  if (g_flags.fail_if_unsandboxed && !g_sandboxed) {
    return "main: unsandboxed";
  }
  if (g_flags.output_comments && !g_flags.compact_output &&
      !g_flags.output_extra_comma) {
    return "main: -output-comments requires one or both of -compact-output and "
           "-output-extra-comma";
  }
  if (g_flags.input_allow_inf_nan_numbers && !g_flags.output_inf_nan_numbers) {
    return "main: -input-allow-inf-nan-numbers requires "
           "-output-inf-nan-numbers";
  }
  const int stdin_fd = 0;
  if (g_flags.remaining_argc >
      ((g_input_file_descriptor != stdin_fd) ? 1 : 0)) {
    return g_usage;
  }

  g_new_line_then_256_indent_bytes =
      g_flags.tabs ? NEW_LINE_THEN_256_TABS : NEW_LINE_THEN_256_SPACES;
  g_bytes_per_indent_depth = g_flags.tabs ? 1 : g_flags.spaces;

  g_query.reset(g_flags.query_c_string);

  // If the query is non-empty, suppress writing to stdout until we've
  // completed the query.
  g_suppress_write_dst = g_query.next_fragment() ? 1 : 0;
  g_wrote_to_dst = false;

  TRY(g_dec.initialize(sizeof__wuffs_json__decoder(), WUFFS_VERSION, 0)
          .message());

  if (g_flags.input_allow_comments) {
    g_dec.set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_COMMENT_BLOCK, true);
    g_dec.set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_COMMENT_LINE, true);
    g_dec.set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_TRAILING_COMMENT, true);
  }
  if (g_flags.input_allow_extra_comma) {
    g_dec.set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_EXTRA_COMMA, true);
  }
  if (g_flags.input_allow_inf_nan_numbers) {
    g_dec.set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_INF_NAN_NUMBERS, true);
  }

  // Consume an optional whitespace trailer. This isn't part of the JSON spec,
  // but it works better with line-oriented Unix tools (such as "echo 123 |
  // jsonptr" where it's "echo", not "echo -n") or hand-edited JSON files which
  // can accidentally contain trailing whitespace.
  g_dec.set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_TRAILING_NEW_LINE, true);

  return nullptr;
}

// ----

// ignore_return_value suppresses errors from -Wall -Werror.
static void  //
ignore_return_value(int ignored) {}

const char*  //
read_src() {
  if (g_src.meta.closed) {
    return "main: internal error: read requested on a closed source";
  }
  g_src.compact();
  if (g_src.meta.wi >= g_src.data.len) {
    return "main: g_src buffer is full";
  }
  while (true) {
    ssize_t n = read(g_input_file_descriptor, g_src.writer_pointer(),
                     g_src.writer_length());
    if (n >= 0) {
      g_src.meta.wi += n;
      g_src.meta.closed = n == 0;
      break;
    } else if (errno != EINTR) {
      return strerror(errno);
    }
  }
  return nullptr;
}

const char*  //
flush_dst() {
  while (true) {
    size_t n = g_dst.reader_length();
    if (n == 0) {
      break;
    }
    const int stdout_fd = 1;
    ssize_t i = write(stdout_fd, g_dst.reader_pointer(), n);
    if (i >= 0) {
      g_dst.meta.ri += i;
    } else if (errno != EINTR) {
      return strerror(errno);
    }
  }
  g_dst.compact();
  return nullptr;
}
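
// A minimal sketch (hypothetical name, not jsonptr code) of the retry pattern
// that read_src and flush_dst use, written as a standalone POSIX helper. A
// write call may transfer fewer bytes than requested, and it may fail with
// errno == EINTR when a signal interrupts it, in which case it is retried.
// Like the functions above, it returns nullptr on success or a strerror
// message otherwise.

#include <cerrno>
#include <cstddef>
#include <cstring>
#include <unistd.h>

inline const char* sketch_write_all(int fd, const void* buf, size_t n) {
  const char* p = static_cast<const char*>(buf);
  while (n > 0) {
    ssize_t i = write(fd, p, n);
    if (i >= 0) {
      // Partial writes are normal, e.g. for pipes: advance and loop.
      p += i;
      n -= (size_t)i;
    } else if (errno != EINTR) {
      return strerror(errno);
    }
  }
  return nullptr;
}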

const char*  //
write_dst_slow(const void* s, size_t n) {
  const uint8_t* p = static_cast<const uint8_t*>(s);
  while (n > 0) {
    size_t i = g_dst.writer_length();
    if (i == 0) {
      const char* z = flush_dst();
      if (z) {
        return z;
      }
      i = g_dst.writer_length();
      if (i == 0) {
        return "main: g_dst buffer is full";
      }
    }

    if (i > n) {
      i = n;
    }
    memcpy(g_dst.data.ptr + g_dst.meta.wi, p, i);
    g_dst.meta.wi += i;
    p += i;
    n -= i;
    g_wrote_to_dst = true;
  }
  return nullptr;
}

inline const char*  //
write_dst(const void* s, size_t n) {
  if (g_suppress_write_dst > 0) {
    return nullptr;
  } else if (n <= (DST_BUFFER_ARRAY_SIZE - g_dst.meta.wi)) {
    memcpy(g_dst.data.ptr + g_dst.meta.wi, s, n);
    g_dst.meta.wi += n;
    g_wrote_to_dst = true;
    return nullptr;
  }
  return write_dst_slow(s, n);
}
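
// A simplified sketch of the fast-path/slow-path split in write_dst, using
// hypothetical names and a tiny 8-byte buffer so that the slow path is easy to
// exercise. When n bytes fit in the remaining space, one memcpy suffices;
// otherwise the slow path alternates between flushing a full buffer and
// copying as much as fits. A std::string stands in for stdout here.

#include <cstddef>
#include <cstring>
#include <string>

struct SketchBufferedWriter {
  char buf[8];
  size_t wi = 0;     // Write index: the number of buffered bytes.
  std::string sink;  // Stands in for the output file descriptor.

  void flush() {
    sink.append(buf, wi);
    wi = 0;
  }

  void put(const char* s, size_t n) {
    if (n <= (sizeof(buf) - wi)) {  // Fast path: a single memcpy.
      memcpy(buf + wi, s, n);
      wi += n;
      return;
    }
    while (n > 0) {  // Slow path: flush when full, then copy what fits.
      if (wi == sizeof(buf)) {
        flush();
      }
      size_t i = sizeof(buf) - wi;
      if (i > n) {
        i = n;
      }
      memcpy(buf + wi, s, i);
      wi += i;
      s += i;
      n -= i;
    }
  }
};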

#define TRY_INDENT_WITH_LEADING_NEW_LINE \
  do { \
    uint32_t indent = g_depth * g_bytes_per_indent_depth; \
    TRY(write_dst(g_new_line_then_256_indent_bytes, 1 + (indent & 0xFF))); \
    for (indent >>= 8; indent > 0; indent--) { \
      TRY(write_dst(g_new_line_then_256_indent_bytes + 1, 0x100)); \
    } \
  } while (false)

// TRY_INDENT_SANS_LEADING_NEW_LINE is used after comments, which print their
// own "\n".
#define TRY_INDENT_SANS_LEADING_NEW_LINE \
  do { \
    uint32_t indent = g_depth * g_bytes_per_indent_depth; \
    TRY(write_dst(g_new_line_then_256_indent_bytes + 1, (indent & 0xFF))); \
    for (indent >>= 8; indent > 0; indent--) { \
      TRY(write_dst(g_new_line_then_256_indent_bytes + 1, 0x100)); \
    } \
  } while (false)
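
// A standalone sketch, with hypothetical names, of the arithmetic in the two
// indentation macros above: an indent of any size is produced from a fixed
// source of one '\n' followed by 256 indent bytes, by first writing the new
// line plus (indent & 0xFF) bytes, then (indent >> 8) further 256-byte
// chunks. For example, depth 100 with 4 bytes per depth is 400 bytes: one
// write of 1 + 144 bytes, then one write of 256 bytes.

#include <cstdint>
#include <string>

inline std::string sketch_indent(uint32_t depth, uint32_t bytes_per_depth) {
  static const std::string new_line_then_256_spaces =
      "\n" + std::string(256, ' ');
  std::string out;
  uint32_t indent = depth * bytes_per_depth;
  // The new line plus the low 8 bits' worth of indent bytes.
  out.append(new_line_then_256_spaces, 0, 1 + (indent & 0xFF));
  // Then whole 256-byte chunks, skipping the leading '\n'.
  for (indent >>= 8; indent > 0; indent--) {
    out.append(new_line_then_256_spaces, 1, 0x100);
  }
  return out;
}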
1063
Nigel Tao1b073492020-02-16 22:11:36 +11001064// ----
1065
Nigel Tao2914bae2020-02-26 09:40:30 +11001066const char* //
Nigel Tao7cb76542020-07-19 22:19:04 +10001067handle_unicode_code_point(uint32_t ucp) {
Nigel Tao63441812020-08-21 14:05:48 +10001068 if (ucp < 0x80) {
1069 return write_dst(&ascii_escapes[8 * ucp + 1], ascii_escapes[8 * ucp]);
Nigel Tao7cb76542020-07-19 22:19:04 +10001070 }
Nigel Tao7cb76542020-07-19 22:19:04 +10001071 uint8_t u[WUFFS_BASE__UTF_8__BYTE_LENGTH__MAX_INCL];
1072 size_t n = wuffs_base__utf_8__encode(
1073 wuffs_base__make_slice_u8(&u[0],
1074 WUFFS_BASE__UTF_8__BYTE_LENGTH__MAX_INCL),
1075 ucp);
1076 if (n == 0) {
1077 return "main: internal error: unexpected Unicode code point";
1078 }
Nigel Tao0291a472020-08-13 22:40:10 +10001079 return write_dst(&u[0], n);
Nigel Tao168f60a2020-07-14 13:19:33 +10001080}
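
// A standalone sketch of UTF-8 encoding, the job that handle_unicode_code_point
// delegates to wuffs_base__utf_8__encode for non-ASCII code points. This is an
// illustrative re-implementation with a hypothetical name, not Wuffs' code.
// Mirroring how the caller above treats a 0 return as an error, this sketch
// returns 0 for values it cannot encode: surrogates (U+D800 to U+DFFF) and
// values above U+10FFFF. dst must have room for 4 bytes.

#include <cstddef>
#include <cstdint>

inline size_t sketch_utf_8_encode(uint8_t* dst, uint32_t cp) {
  if (cp < 0x80) {
    dst[0] = (uint8_t)cp;
    return 1;
  } else if (cp < 0x800) {
    dst[0] = (uint8_t)(0xC0 | (cp >> 6));
    dst[1] = (uint8_t)(0x80 | (cp & 0x3F));
    return 2;
  } else if (cp < 0x10000) {
    if ((0xD800 <= cp) && (cp <= 0xDFFF)) {
      return 0;  // Surrogates are not Unicode scalar values.
    }
    dst[0] = (uint8_t)(0xE0 | (cp >> 12));
    dst[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
    dst[2] = (uint8_t)(0x80 | (cp & 0x3F));
    return 3;
  } else if (cp <= 0x10FFFF) {
    dst[0] = (uint8_t)(0xF0 | (cp >> 18));
    dst[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
    dst[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
    dst[3] = (uint8_t)(0x80 | (cp & 0x3F));
    return 4;
  }
  return 0;
}

// For example, U+2026 HORIZONTAL ELLIPSIS encodes as the 3 bytes E2 80 A6.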
1081
Nigel Taod191a3f2020-07-19 22:14:54 +10001082// ----
1083
Nigel Tao50db4a42020-08-20 11:31:28 +10001084inline const char* //
Nigel Tao2ef39992020-04-09 17:24:39 +10001085handle_token(wuffs_base__token t, bool start_of_token_chain) {
Nigel Tao2cf76db2020-02-27 22:42:01 +11001086 do {
Nigel Tao462f8662020-04-01 23:01:51 +11001087 int64_t vbc = t.value_base_category();
Nigel Tao2cf76db2020-02-27 22:42:01 +11001088 uint64_t vbd = t.value_base_detail();
Nigel Taoee6927f2020-07-27 12:08:33 +10001089 uint64_t token_length = t.length();
Nigel Tao991bd512020-08-19 09:38:16 +10001090 // The "- token_length" is because we incremented g_cursor_index before
1091 // calling handle_token.
Nigel Taoee6927f2020-07-27 12:08:33 +10001092 wuffs_base__slice_u8 tok = wuffs_base__make_slice_u8(
Nigel Tao991bd512020-08-19 09:38:16 +10001093 g_src.data.ptr + g_cursor_index - token_length, token_length);
Nigel Tao1b073492020-02-16 22:11:36 +11001094
1095 // Handle ']' or '}'.
Nigel Tao9f7a2502020-02-23 09:42:02 +11001096 if ((vbc == WUFFS_BASE__TOKEN__VBC__STRUCTURE) &&
Nigel Tao2cf76db2020-02-27 22:42:01 +11001097 (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__POP)) {
Nigel Taod60815c2020-03-26 14:32:35 +11001098 if (g_query.is_at(g_depth)) {
Nigel Tao0cd2f982020-03-03 23:03:02 +11001099 return "main: no match for query";
1100 }
Nigel Taod60815c2020-03-26 14:32:35 +11001101 if (g_depth <= 0) {
1102 return "main: internal error: inconsistent g_depth";
Nigel Tao1b073492020-02-16 22:11:36 +11001103 }
Nigel Taod60815c2020-03-26 14:32:35 +11001104 g_depth--;
Nigel Tao1b073492020-02-16 22:11:36 +11001105
Nigel Taod60815c2020-03-26 14:32:35 +11001106 if (g_query.matched_all() && (g_depth >= g_flags.max_output_depth)) {
1107 g_suppress_write_dst--;
Nigel Tao52c4d6a2020-03-08 21:12:38 +11001108 // '…' is U+2026 HORIZONTAL ELLIPSIS, which is 3 UTF-8 bytes.
Nigel Tao0291a472020-08-13 22:40:10 +10001109 TRY(write_dst((vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__FROM_LIST)
1110 ? "\"[…]\""
1111 : "\"{…}\"",
1112 7));
1113 } else {
Nigel Tao52c4d6a2020-03-08 21:12:38 +11001114 // Write preceding whitespace.
Nigel Taod60815c2020-03-26 14:32:35 +11001115 if ((g_ctx != context::in_list_after_bracket) &&
1116 (g_ctx != context::in_dict_after_brace) &&
1117 !g_flags.compact_output) {
Nigel Tao21042052020-08-19 23:13:54 +10001118 if (g_is_after_comment) {
1119 TRY_INDENT_SANS_LEADING_NEW_LINE;
1120 } else {
1121 if (g_flags.output_extra_comma) {
1122 TRY(write_dst(",", 1));
1123 }
1124 TRY_INDENT_WITH_LEADING_NEW_LINE;
Nigel Tao52c4d6a2020-03-08 21:12:38 +11001125 }
Nigel Tao1b073492020-02-16 22:11:36 +11001126 }
Nigel Tao52c4d6a2020-03-08 21:12:38 +11001127
1128 TRY(write_dst(
1129 (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__FROM_LIST) ? "]" : "}",
1130 1));
Nigel Tao1b073492020-02-16 22:11:36 +11001131 }
1132
Nigel Taod60815c2020-03-26 14:32:35 +11001133 g_ctx = (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__TO_LIST)
1134 ? context::in_list_after_value
1135 : context::in_dict_after_key;
Nigel Tao1b073492020-02-16 22:11:36 +11001136 goto after_value;
1137 }
1138
Nigel Taod1c928a2020-02-28 12:43:53 +11001139 // Write preceding whitespace and punctuation, if it wasn't ']', '}' or a
Nigel Tao0291a472020-08-13 22:40:10 +10001140 // continuation of a multi-token chain.
1141 if (start_of_token_chain) {
Nigel Tao21042052020-08-19 23:13:54 +10001142 if (g_is_after_comment) {
1143 TRY_INDENT_SANS_LEADING_NEW_LINE;
1144 } else if (g_ctx == context::in_dict_after_key) {
Nigel Taod60815c2020-03-26 14:32:35 +11001145 TRY(write_dst(": ", g_flags.compact_output ? 1 : 2));
1146 } else if (g_ctx != context::none) {
Nigel Tao0291a472020-08-13 22:40:10 +10001147 if ((g_ctx != context::in_list_after_bracket) &&
1148 (g_ctx != context::in_dict_after_brace)) {
Nigel Tao0cd2f982020-03-03 23:03:02 +11001149 TRY(write_dst(",", 1));
Nigel Tao107f0ef2020-03-01 21:35:02 +11001150 }
Nigel Taod60815c2020-03-26 14:32:35 +11001151 if (!g_flags.compact_output) {
Nigel Tao21042052020-08-19 23:13:54 +10001152 TRY_INDENT_WITH_LEADING_NEW_LINE;
Nigel Tao0cd2f982020-03-03 23:03:02 +11001153 }
1154 }
1155
Nigel Tao52c4d6a2020-03-08 21:12:38 +11001156 bool query_matched_fragment = false;
Nigel Taod60815c2020-03-26 14:32:35 +11001157 if (g_query.is_at(g_depth)) {
1158 switch (g_ctx) {
Nigel Tao0cd2f982020-03-03 23:03:02 +11001159 case context::in_list_after_bracket:
1160 case context::in_list_after_value:
Nigel Taod60815c2020-03-26 14:32:35 +11001161 query_matched_fragment = g_query.tick();
Nigel Tao0cd2f982020-03-03 23:03:02 +11001162 break;
1163 case context::in_dict_after_key:
Nigel Taod60815c2020-03-26 14:32:35 +11001164 query_matched_fragment = g_query.matched_fragment();
Nigel Tao0cd2f982020-03-03 23:03:02 +11001165 break;
Nigel Tao18ef5b42020-03-16 10:37:47 +11001166 default:
1167 break;
Nigel Tao0cd2f982020-03-03 23:03:02 +11001168 }
1169 }
Nigel Tao52c4d6a2020-03-08 21:12:38 +11001170 if (!query_matched_fragment) {
Nigel Tao0cd2f982020-03-03 23:03:02 +11001171 // No-op.
Nigel Taod60815c2020-03-26 14:32:35 +11001172 } else if (!g_query.next_fragment()) {
Nigel Tao0cd2f982020-03-03 23:03:02 +11001173 // There is no next fragment. We have matched the complete query, and
1174 // the upcoming JSON value is the result of that query.
1175 //
Nigel Taod60815c2020-03-26 14:32:35 +11001176 // Un-suppress writing to stdout and reset the g_ctx and g_depth as if
1177 // we were about to decode a top-level value. This makes any subsequent
1178 // indentation be relative to this point, and we will return g_eod
1179 // after the upcoming JSON value is complete.
1180 if (g_suppress_write_dst != 1) {
1181 return "main: internal error: inconsistent g_suppress_write_dst";
Nigel Tao52c4d6a2020-03-08 21:12:38 +11001182 }
Nigel Taod60815c2020-03-26 14:32:35 +11001183 g_suppress_write_dst = 0;
1184 g_ctx = context::none;
1185 g_depth = 0;
Nigel Tao0cd2f982020-03-03 23:03:02 +11001186 } else if ((vbc != WUFFS_BASE__TOKEN__VBC__STRUCTURE) ||
1187 !(vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__PUSH)) {
1188 // The query has moved on to the next fragment but the upcoming JSON
1189 // value is not a container.
1190 return "main: no match for query";
Nigel Tao1b073492020-02-16 22:11:36 +11001191 }
1192 }

    // Handle the token itself: either a container ('[' or '{') or a simple
    // value: string (a chain of raw or escaped parts), literal or number.
    switch (vbc) {
      case WUFFS_BASE__TOKEN__VBC__STRUCTURE:
        if (g_query.matched_all() && (g_depth >= g_flags.max_output_depth)) {
          g_suppress_write_dst++;
        } else {
          TRY(write_dst(
              (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__TO_LIST) ? "[" : "{",
              1));
        }
        g_depth++;
        g_ctx = (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__TO_LIST)
                    ? context::in_list_after_bracket
                    : context::in_dict_after_brace;
        return nullptr;

      case WUFFS_BASE__TOKEN__VBC__STRING:
        if (start_of_token_chain) {
          TRY(write_dst("\"", 1));
          g_query.restart_fragment(in_dict_before_key() &&
                                   g_query.is_at(g_depth));
        }

        if (vbd & WUFFS_BASE__TOKEN__VBD__STRING__CONVERT_1_DST_1_SRC_COPY) {
          TRY(write_dst(tok.ptr, tok.len));
          g_query.incremental_match_slice(tok.ptr, tok.len);
        }

        if (t.continued()) {
          return nullptr;
        }
        TRY(write_dst("\"", 1));
        goto after_value;

      case WUFFS_BASE__TOKEN__VBC__UNICODE_CODE_POINT:
        if (!t.continued()) {
          return "main: internal error: unexpected non-continued UCP token";
        }
        TRY(handle_unicode_code_point(vbd));
        g_query.incremental_match_code_point(vbd);
        return nullptr;
    }

    // We have a literal or a number.
    TRY(write_dst(tok.ptr, tok.len));
    goto after_value;
  } while (0);
Nigel Tao1b073492020-02-16 22:11:36 +11001242
Nigel Tao2cf76db2020-02-27 22:42:01 +11001243 // Book-keeping after completing a value (whether a container value or a
1244 // simple value). Empty parent containers are no longer empty. If the parent
1245 // container is a "{...}" object, toggle between keys and values.
1246after_value:
Nigel Taod60815c2020-03-26 14:32:35 +11001247 if (g_depth == 0) {
1248 return g_eod;
Nigel Tao2cf76db2020-02-27 22:42:01 +11001249 }
Nigel Taod60815c2020-03-26 14:32:35 +11001250 switch (g_ctx) {
Nigel Tao2cf76db2020-02-27 22:42:01 +11001251 case context::in_list_after_bracket:
Nigel Taod60815c2020-03-26 14:32:35 +11001252 g_ctx = context::in_list_after_value;
Nigel Tao2cf76db2020-02-27 22:42:01 +11001253 break;
1254 case context::in_dict_after_brace:
Nigel Taod60815c2020-03-26 14:32:35 +11001255 g_ctx = context::in_dict_after_key;
Nigel Tao2cf76db2020-02-27 22:42:01 +11001256 break;
1257 case context::in_dict_after_key:
Nigel Taod60815c2020-03-26 14:32:35 +11001258 g_ctx = context::in_dict_after_value;
Nigel Tao2cf76db2020-02-27 22:42:01 +11001259 break;
1260 case context::in_dict_after_value:
Nigel Taod60815c2020-03-26 14:32:35 +11001261 g_ctx = context::in_dict_after_key;
Nigel Tao2cf76db2020-02-27 22:42:01 +11001262 break;
Nigel Tao18ef5b42020-03-16 10:37:47 +11001263 default:
1264 break;
Nigel Tao2cf76db2020-02-27 22:42:01 +11001265 }
1266 return nullptr;
1267}
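
/*
Illustrative sketch only. This block comment is not compiled and is not part
of jsonptr. It restates the after_value book-keeping as a pure function, which
lets the key/value toggle inside "{...}" objects be exercised on its own. The
ctx_sketch enum is a hypothetical stand-in for jsonptr's context type. Only
the transitions themselves are taken from the switch in handle_token.

```cpp
#include <cassert>

// Hypothetical stand-in for jsonptr's context enum (same value names).
enum class ctx_sketch {
  none,
  in_list_after_bracket,
  in_list_after_value,
  in_dict_after_brace,
  in_dict_after_key,
  in_dict_after_value,
};

// Returns the context that follows a completed value. An empty container
// becomes non-empty, and inside a "{...}" object completed values alternate
// between keys and values. Other contexts (none, in_list_after_value) are
// left unchanged, like the default case of the after_value switch.
ctx_sketch next_ctx_after_value(ctx_sketch c) {
  switch (c) {
    case ctx_sketch::in_list_after_bracket:
      return ctx_sketch::in_list_after_value;
    case ctx_sketch::in_dict_after_brace:
      return ctx_sketch::in_dict_after_key;
    case ctx_sketch::in_dict_after_key:
      return ctx_sketch::in_dict_after_value;
    case ctx_sketch::in_dict_after_value:
      return ctx_sketch::in_dict_after_key;
    default:
      return c;
  }
}
```

For example, in {"a":1,"b":2} the completed values are "a", 1, "b" and 2.
Starting from in_dict_after_brace, the context therefore becomes
in_dict_after_key, in_dict_after_value, in_dict_after_key and then
in_dict_after_value. Lists need only one transition because every list
element is a value.
*/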

const char*  //
main1(int argc, char** argv) {
  TRY(initialize_globals(argc, argv));

  bool start_of_token_chain = true;
  while (true) {
    wuffs_base__status status = g_dec.decode_tokens(
        &g_tok, &g_src,
        wuffs_base__make_slice_u8(g_work_buffer_array, WORK_BUFFER_ARRAY_SIZE));

    while (g_tok.meta.ri < g_tok.meta.wi) {
      wuffs_base__token t = g_tok.data.ptr[g_tok.meta.ri++];
      uint64_t token_length = t.length();
      if ((g_src.meta.ri - g_cursor_index) < token_length) {
        return "main: internal error: inconsistent g_src indexes";
      }
      g_cursor_index += token_length;
Nigel Tao2cf76db2020-02-27 22:42:01 +11001286
Nigel Tao21042052020-08-19 23:13:54 +10001287 // Handle filler tokens (e.g. whitespace, punctuation and comments).
1288 // These are skipped, unless -output-comments is enabled.
Nigel Tao3c8589b2020-07-19 21:49:00 +10001289 if (t.value_base_category() == WUFFS_BASE__TOKEN__VBC__FILLER) {
Nigel Tao21042052020-08-19 23:13:54 +10001290 if (g_flags.output_comments &&
1291 (t.value_base_detail() &
1292 WUFFS_BASE__TOKEN__VBD__FILLER__COMMENT_ANY)) {
1293 if (g_flags.compact_output) {
1294 TRY(write_dst(g_src.data.ptr + g_cursor_index - token_length,
1295 token_length));
1296 } else {
1297 if (start_of_token_chain) {
1298 if (g_is_after_comment) {
1299 TRY_INDENT_SANS_LEADING_NEW_LINE;
1300 } else if (g_ctx != context::none) {
1301 if (g_ctx == context::in_dict_after_key) {
1302 TRY(write_dst(":", 1));
1303 } else if ((g_ctx != context::in_list_after_bracket) &&
1304 (g_ctx != context::in_dict_after_brace)) {
1305 TRY(write_dst(",", 1));
1306 }
1307 if (!g_flags.compact_output) {
1308 TRY_INDENT_WITH_LEADING_NEW_LINE;
1309 }
1310 }
1311 }
1312 TRY(write_dst(g_src.data.ptr + g_cursor_index - token_length,
1313 token_length));
1314 if (!t.continued() &&
1315 (t.value_base_detail() &
1316 WUFFS_BASE__TOKEN__VBD__FILLER__COMMENT_BLOCK)) {
1317 TRY(write_dst("\n", 1));
1318 }
1319 g_is_after_comment = true;
1320 }
1321 }
Nigel Tao496e88b2020-04-09 22:10:08 +10001322 start_of_token_chain = !t.continued();
Nigel Tao2cf76db2020-02-27 22:42:01 +11001323 continue;
1324 }

      const char* z = handle_token(t, start_of_token_chain);
      g_is_after_comment = false;
      start_of_token_chain = !t.continued();
      if (z == nullptr) {
        continue;
      } else if (z == g_eod) {
        goto end_of_data;
      }
      return z;
    }

    if (status.repr == nullptr) {
      return "main: internal error: unexpected end of token stream";
    } else if (status.repr == wuffs_base__suspension__short_read) {
      if (g_cursor_index != g_src.meta.ri) {
        return "main: internal error: inconsistent g_src indexes";
      }
      TRY(read_src());
      g_cursor_index = g_src.meta.ri;
    } else if (status.repr == wuffs_base__suspension__short_write) {
      g_tok.compact();
    } else {
      return status.message();
    }
  }
end_of_data:

  // With a non-empty g_query, don't try to consume trailing whitespace or
  // confirm that we've processed all the tokens.
  if (g_flags.query_c_string && *g_flags.query_c_string) {
    return nullptr;
  }

  // Check that we've exhausted the input.
  if ((g_src.meta.ri == g_src.meta.wi) && !g_src.meta.closed) {
    TRY(read_src());
  }
  if ((g_src.meta.ri < g_src.meta.wi) || !g_src.meta.closed) {
    return "main: valid JSON followed by further (unexpected) data";
  }

  // Check that we've used all of the decoded tokens, other than trailing
  // filler tokens. For example, "true\n" is valid JSON (and fully consumed
  // with WUFFS_JSON__QUIRK_ALLOW_TRAILING_NEW_LINE enabled) with a trailing
  // filler token for the "\n".
  for (; g_tok.meta.ri < g_tok.meta.wi; g_tok.meta.ri++) {
    if (g_tok.data.ptr[g_tok.meta.ri].value_base_category() !=
        WUFFS_BASE__TOKEN__VBC__FILLER) {
      return "main: internal error: decoded OK but unprocessed tokens remain";
    }
  }

  return nullptr;
}
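
/*
Illustrative sketch only. This block comment is not compiled and is not part
of jsonptr. The final loop in main1 tolerates leftover decoded tokens only if
they are filler, such as the token for the trailing "\n" in "true\n". Here,
vbc_sketch is a hypothetical stand-in for a token's value base category, and a
plain array with [ri, wi) bounds stands in for g_tok. The real code compares
value_base_category() against WUFFS_BASE__TOKEN__VBC__FILLER.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a token's value base category.
enum class vbc_sketch { filler, structure, string, literal, number };

// Checks the tokens in toks[ri, wi) that remain after the top-level value.
// Returns nullptr if they are all filler, or an error message otherwise.
const char* check_only_filler_remains(const vbc_sketch* toks,
                                      std::size_t ri,
                                      std::size_t wi) {
  for (; ri < wi; ri++) {
    if (toks[ri] != vbc_sketch::filler) {
      return "main: internal error: decoded OK but unprocessed tokens remain";
    }
  }
  return nullptr;
}
```

Reporting a non-filler leftover as an "internal error" (rather than as bad
input) makes sense because the decoder has already reported success. Any such
token would mean jsonptr's own book-keeping went wrong.
*/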

int  //
compute_exit_code(const char* status_msg) {
  if (!status_msg) {
    return 0;
  }
  size_t n;
  if (status_msg == g_usage) {
    n = strlen(status_msg);
  } else {
    n = strnlen(status_msg, 2047);
    if (n >= 2047) {
      status_msg = "main: internal error: error message is too long";
      n = strnlen(status_msg, 2047);
    }
  }
  const int stderr_fd = 2;
  ignore_return_value(write(stderr_fd, status_msg, n));
  ignore_return_value(write(stderr_fd, "\n", 1));
  // Return an exit code of 1 for regular (foreseen) errors, e.g. badly
  // formatted or unsupported input.
  //
  // Return an exit code of 2 for internal (exceptional) errors, e.g. defensive
  // run-time checks found that an internal invariant did not hold.
  //
  // Automated testing, including badly formatted inputs, can therefore
  // discriminate between expected failure (exit code 1) and unexpected failure
  // (other non-zero exit codes). Specifically, exit code 2 for internal
  // invariant violation, exit code 139 (which is 128 + SIGSEGV on x86_64
  // Linux) for a segmentation fault (e.g. null pointer dereference).
  return strstr(status_msg, "internal error:") ? 2 : 1;
}
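
/*
Illustrative sketch only. This block comment is not compiled and is not part
of jsonptr. It restates compute_exit_code's policy without writing to stderr,
so that the mapping to exit codes 0, 1 and 2 can be checked directly. The
names exit_code_for and bounded_len are hypothetical. The cap parameter stands
in for the 2047-byte limit. The special case that lets g_usage bypass that
limit is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Counts at most cap bytes of s, like POSIX strnlen.
std::size_t bounded_len(const char* s, std::size_t cap) {
  std::size_t n = 0;
  while ((n < cap) && (s[n] != '\0')) {
    n++;
  }
  return n;
}

// Side-effect-free version of the exit code policy. A message whose length
// reaches cap is replaced by an internal error, as in compute_exit_code.
int exit_code_for(const char* status_msg, std::size_t cap) {
  if (!status_msg) {
    return 0;  // Success.
  }
  if (bounded_len(status_msg, cap) >= cap) {
    status_msg = "main: internal error: error message is too long";
  }
  // "internal error:" marks a broken invariant (2). Anything else is an
  // expected failure such as malformed input (1).
  return std::strstr(status_msg, "internal error:") ? 2 : 1;
}
```

Classifying on a substring of the message keeps the error channel a plain
const char*, at the cost of relying on the "internal error:" naming
convention used throughout this file.
*/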

int  //
main(int argc, char** argv) {
  // Look for an input filename (the first non-flag argument) in argv. If there
  // is one, open it (but do not read from it) before we self-impose a sandbox.
  //
  // Flags start with "-", unless they come after a bare "--" arg.
  {
    bool dash_dash = false;
    int a;
    for (a = 1; a < argc; a++) {
      char* arg = argv[a];
      if ((arg[0] == '-') && !dash_dash) {
        dash_dash = (arg[1] == '-') && (arg[2] == '\x00');
        continue;
      }
      g_input_file_descriptor = open(arg, O_RDONLY);
      if (g_input_file_descriptor < 0) {
        fprintf(stderr, "%s: %s\n", arg, strerror(errno));
        return 1;
      }
      break;
    }
  }

#if defined(WUFFS_EXAMPLE_USE_SECCOMP)
  prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT);
  g_sandboxed = true;
#endif

  const char* z = main1(argc, argv);
  if (g_wrote_to_dst) {
    const char* z1 = write_dst("\n", 1);
    const char* z2 = flush_dst();
    z = z ? z : (z1 ? z1 : z2);
  }
  int exit_code = compute_exit_code(z);

#if defined(WUFFS_EXAMPLE_USE_SECCOMP)
  // Call SYS_exit explicitly, instead of calling SYS_exit_group implicitly by
  // either calling _exit or returning from main. SECCOMP_MODE_STRICT allows
  // only SYS_exit.
  syscall(SYS_exit, exit_code);
#endif
  return exit_code;
}
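
/*
Illustrative sketch only. This block comment is not compiled and is not part
of jsonptr. It isolates main's search for the input filename, where an
argument starting with "-" is a flag unless it follows a bare "--".
first_non_flag_arg is a hypothetical helper that returns the argument's index
instead of calling open, so the rule can be tested without touching the file
system. The "-x" style arguments used with it are placeholders, not real
jsonptr flags.

```cpp
#include <cassert>

// Returns the index of the first non-flag argument in argv[1 .. argc), or -1
// if there is none. A bare "--" ends flag parsing, so any later argument is
// treated as a filename even if it starts with '-'.
int first_non_flag_arg(int argc, const char* const* argv) {
  bool dash_dash = false;
  for (int a = 1; a < argc; a++) {
    const char* arg = argv[a];
    if ((arg[0] == '-') && !dash_dash) {
      // Short-circuiting means arg[2] is only read when arg[1] is '-', so a
      // lone "-" never reads past its terminating NUL.
      dash_dash = (arg[1] == '-') && (arg[2] == '\0');
      continue;
    }
    return a;
  }
  return -1;
}
```

With argv {"jsonptr", "--", "-odd.json"}, the "--" switches off flag parsing,
so index 2 is returned even though that argument starts with "-".
*/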