// Copyright 2020 The Wuffs Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//    https://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

// ----------------

// jsonptr is discussed extensively at
// https://nigeltao.github.io/blog/2020/jsonptr.html

/*
jsonptr is a JSON formatter (pretty-printer) that supports the JSON Pointer
(RFC 6901) query syntax. It reads UTF-8 JSON from stdin and writes
canonicalized, formatted UTF-8 JSON to stdout.

See the "const char* g_usage" string below for details.

----

JSON Pointer (and this program's implementation) is one of many JSON query
languages and JSON tools, such as jq, jql and JMESPath. This one is relatively
simple and has fewer features than those others.

One benefit of simplicity is that this program's JSON and JSON Pointer
implementations do not dynamically allocate or free memory (yet they do not
require that the entire input fit in memory at once). They are therefore
trivially protected against certain bug classes: memory leaks, double-frees and
use-after-frees.

The core JSON implementation is also written in the Wuffs programming language
(and then transpiled to C/C++), which is memory-safe (e.g. array indexing is
bounds-checked) and also guards against integer arithmetic overflows.

For defense in depth, on Linux, this program also self-imposes a
SECCOMP_MODE_STRICT sandbox before reading (or otherwise processing) its input
or writing its output. Under this sandbox, the only permitted system calls are
read, write, exit and sigreturn.

All together, this program aims to safely handle untrusted JSON files without
fear of security bugs such as remote code execution.

----

As of 2020-02-24, this program passes all 318 "test_parsing" cases from the
JSON test suite (https://github.com/nst/JSONTestSuite), an appendix to the
"Parsing JSON is a Minefield" article (http://seriot.ch/parsing_json.php) that
was first published on 2016-10-26 and updated on 2018-03-30.

After modifying this program, run "build-example.sh example/jsonptr/" and then
"script/run-json-test-suite.sh" to catch correctness regressions.

----

This program uses Wuffs' JSON decoder at a relatively low level, processing the
decoder's token-stream output individually. The core loop, in pseudo-code, is
"for_each_token { handle_token(etc); }", where the handle_token function
changes global state (e.g. the `g_depth` and `g_ctx` variables) and prints
output text based on that state and the token's source text. Notably,
handle_token is not recursive, even though JSON values can nest.

This approach is centered around JSON tokens. Each JSON 'thing' (e.g. number,
string, object) comprises one or more JSON tokens.

An alternative, higher-level approach is in the sibling example/jsonfindptrs
program. Neither approach is better or worse per se, but when studying this
program, be aware that there are multiple ways to use Wuffs' JSON decoder.

The two programs, jsonfindptrs and jsonptr, also demonstrate different
trade-offs with regard to JSON object duplicate keys. The JSON spec permits
different implementations to allow or reject duplicate keys. It is not always
clear which approach is safer. Rejecting them is certainly unambiguous, and
security bugs can lurk in ambiguous corners of a file format, if two different
implementations both silently accept a file but differ on how to interpret it.
On the other hand, in the worst case, detecting duplicate keys requires O(N)
memory, where N is the size of the (potentially untrusted) input.

This program (jsonptr) allows duplicate keys and requires only O(1) memory. As
mentioned above, it doesn't dynamically allocate memory at all, and on Linux,
it runs in a SECCOMP_MODE_STRICT sandbox.

----

To run:

$CXX jsonptr.cc && ./a.out < ../../test/data/github-tags.json; rm -f a.out

for a C++ compiler $CXX, such as clang++ or g++.
*/

#if defined(__cplusplus) && (__cplusplus < 201103L)
#error "This C++ program requires -std=c++11 or later"
#endif

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

// Wuffs ships as a "single file C library" or "header file library" as per
// https://github.com/nothings/stb/blob/master/docs/stb_howto.txt
//
// To use that single file as a "foo.c"-like implementation, instead of a
// "foo.h"-like header, #define WUFFS_IMPLEMENTATION before #include'ing or
// compiling it.
#define WUFFS_IMPLEMENTATION

// Defining the WUFFS_CONFIG__MODULE* macros is optional, but it lets users of
// release/c/etc.c choose which parts of Wuffs to build. That file contains the
// entire Wuffs standard library, implementing a variety of codecs and file
// formats. Without these macro definitions, an optimizing compiler or linker
// may very well discard Wuffs code for unused codecs, but listing the Wuffs
// modules we use makes that process explicit. Preprocessing means that such
// code simply isn't compiled.
#define WUFFS_CONFIG__MODULES
#define WUFFS_CONFIG__MODULE__BASE
#define WUFFS_CONFIG__MODULE__JSON

// If building this program in an environment that doesn't easily accommodate
// relative includes, you can use the script/inline-c-relative-includes.go
// program to generate a stand-alone C++ file.
#include "../../release/c/wuffs-unsupported-snapshot.c"

#if defined(__linux__)
#include <linux/prctl.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#define WUFFS_EXAMPLE_USE_SECCOMP
#endif

#define TRY(error_msg)         \
  do {                         \
    const char* z = error_msg; \
    if (z) {                   \
      return z;                \
    }                          \
  } while (false)

static const char* g_eod = "main: end of data";

static const char* g_usage =
    "Usage: jsonptr -flags input.json\n"
    "\n"
    "Flags:\n"
    "    -c      -compact-output\n"
    "    -d=NUM  -max-output-depth=NUM\n"
    "    -q=STR  -query=STR\n"
    "    -s=NUM  -spaces=NUM\n"
    "    -t      -tabs\n"
    "            -fail-if-unsandboxed\n"
    "            -input-allow-comments\n"
    "            -input-allow-extra-comma\n"
    "            -input-allow-inf-nan-numbers\n"
    "            -input-jwcc\n"
    "            -jwcc\n"
    "            -output-comments\n"
    "            -output-extra-comma\n"
    "            -output-inf-nan-numbers\n"
    "            -strict-json-pointer-syntax\n"
    "\n"
    "The input.json filename is optional. If absent, it reads from stdin.\n"
    "\n"
    "----\n"
    "\n"
    "jsonptr is a JSON formatter (pretty-printer) that supports the JSON\n"
    "Pointer (RFC 6901) query syntax. It reads UTF-8 JSON from stdin and\n"
    "writes canonicalized, formatted UTF-8 JSON to stdout.\n"
    "\n"
    "Canonicalized means that e.g. \"abc\\u000A\\tx\\u0177z\" is re-written\n"
    "as \"abc\\n\\txŷz\". It does not sort object keys, nor does it reject\n"
    "duplicate keys. Canonicalization does not imply Unicode normalization.\n"
    "\n"
    "Formatted means that arrays' and objects' elements are indented, each\n"
    "on its own line. Configure this with the -c / -compact-output, -s=NUM /\n"
    "-spaces=NUM (for NUM ranging from 0 to 8) and -t / -tabs flags.\n"
    "\n"
    "The -input-allow-comments flag allows \"/*slash-star*/\" and\n"
    "\"//slash-slash\" C-style comments within JSON input. Such comments are\n"
    "stripped from the output unless -output-comments was also set.\n"
    "\n"
    "The -input-allow-extra-comma flag allows input like \"[1,2,]\", with a\n"
    "comma after the final element of a JSON list or dictionary.\n"
    "\n"
    "The -input-allow-inf-nan-numbers flag allows non-finite floating point\n"
    "numbers (infinities and not-a-numbers) within JSON input. This flag\n"
    "requires that -output-inf-nan-numbers also be set.\n"
    "\n"
    "The -output-comments flag copies any input comments to the output. It\n"
    "has no effect unless -input-allow-comments was also set. Comments look\n"
    "better after commas than before them, but a closing \"]\" or \"}\" can\n"
    "occur after arbitrarily many comments, so -output-comments also requires\n"
    "that one or both of -compact-output and -output-extra-comma be set.\n"
    "\n"
    "With -output-comments, consecutive blank lines collapse to a single\n"
    "blank line. Without that flag, all blank lines are removed.\n"
    "\n"
    "The -output-extra-comma flag writes output like \"[1,2,]\", with a comma\n"
    "after the final element of a JSON list or dictionary. Such commas are\n"
    "non-compliant with the JSON specification but many parsers accept them\n"
    "and they can produce simpler line-based diffs. This flag is ignored when\n"
    "-compact-output is set.\n"
    "\n"
    "Combining some of those flags results in speaking JWCC (JSON With Commas\n"
    "and Comments), not plain JSON. For convenience, the -input-jwcc or -jwcc\n"
    "flag enables the first two or all four of:\n"
    "    -input-allow-comments\n"
    "    -input-allow-extra-comma\n"
    "    -output-comments\n"
    "    -output-extra-comma\n"
    "\n"
#if defined(WUFFS_EXAMPLE_SPEAK_JWCC_NOT_JSON)
    "This program was configured at compile time to always use -jwcc.\n"
    "\n"
#endif
    "----\n"
    "\n"
    "The -q=STR or -query=STR flag gives an optional JSON Pointer query, to\n"
    "print a subset of the input. For example, given RFC 6901 section 5's\n"
    "sample input (https://tools.ietf.org/rfc/rfc6901.txt), this command:\n"
    "    jsonptr -query=/foo/1 rfc-6901-json-pointer.json\n"
    "will print:\n"
    "    \"baz\"\n"
    "\n"
    "An absent query is equivalent to the empty query, which identifies the\n"
    "entire input (the root value). Unlike a file system, the \"/\" query\n"
    "does not identify the root. Instead, \"\" is the root and \"/\" is the\n"
    "child (the value in a key-value pair) of the root whose key is the empty\n"
    "string. Similarly, \"/xyz\" and \"/xyz/\" are two different nodes.\n"
    "\n"
    "If the query found a valid JSON value, this program will return a zero\n"
    "exit code even if the rest of the input isn't valid JSON. If the query\n"
    "did not find a value, or found an invalid one, this program returns a\n"
    "non-zero exit code, but may still print partial output to stdout.\n"
    "\n"
    "The JSON specification (https://json.org/) permits implementations that\n"
    "allow duplicate keys, as this one does. This JSON Pointer implementation\n"
    "is also greedy, following the first match for each fragment without\n"
    "back-tracking. For example, the \"/foo/bar\" query will fail if the root\n"
    "object has multiple \"foo\" children but the first one doesn't have a\n"
    "\"bar\" child, even if later ones do.\n"
    "\n"
    "The -strict-json-pointer-syntax flag restricts the -query=STR string to\n"
    "exactly RFC 6901, with only two escape sequences: \"~0\" and \"~1\" for\n"
    "\"~\" and \"/\". Without this flag, this program also lets \"~n\",\n"
    "\"~r\" and \"~t\" escape the New Line, Carriage Return and Horizontal\n"
    "Tab ASCII control characters, which can work better with line-oriented\n"
    "(and tab-separated) Unix tools that assume exactly one record (e.g. one\n"
    "JSON Pointer string) per line.\n"
    "\n"
    "----\n"
    "\n"
    "The -d=NUM or -max-output-depth=NUM flag gives the maximum (inclusive)\n"
    "output depth. JSON containers ([] arrays and {} objects) can hold other\n"
    "containers. When this flag is set, containers at depth NUM are replaced\n"
    "with \"[…]\" or \"{…}\". A bare -d or -max-output-depth is equivalent to\n"
    "-d=1. The flag's absence is equivalent to an unlimited output depth.\n"
    "\n"
    "The -max-output-depth flag only affects the program's output. It doesn't\n"
    "affect whether or not the input is considered valid JSON. The JSON\n"
    "specification permits implementations to set their own maximum input\n"
    "depth. This JSON implementation sets it to 1024.\n"
    "\n"
    "Depth is measured in terms of nested containers. It is unaffected by the\n"
    "number of spaces or tabs used to indent.\n"
    "\n"
    "When both -max-output-depth and -query are set, the output depth is\n"
    "measured from when the query resolves, not from the input root. The\n"
    "input depth (measured from the root) is still limited to 1024.\n"
    "\n"
    "----\n"
    "\n"
    "The -fail-if-unsandboxed flag causes the program to exit if it does not\n"
    "self-impose a sandbox. On Linux, it self-imposes a SECCOMP_MODE_STRICT\n"
    "sandbox, regardless of whether this flag was set.";
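
// The -strict-json-pointer-syntax flag, described in the usage text above,
// distinguishes RFC 6901's two escapes from this program's three extra ones.
// As a minimal sketch of that escape syntax (a hypothetical helper, not part
// of jsonptr and not called by it; its name and signature are assumptions),
// unescaping one whole query fragment in a single pass could look like the
// function below. The Query class further down instead matches fragments
// incrementally, token by token.
//
// It writes at most src_len bytes to dst, since unescaping never lengthens
// the text. It returns false if the fragment contains a "/" (which separates
// fragments) or an invalid "~" escape. If strict is true, only "~0" and "~1"
// are valid escapes.
bool  //
example_unescape_json_pointer_fragment(const char* src,
                                       size_t src_len,
                                       char* dst,
                                       size_t* dst_len,
                                       bool strict) {
  size_t n = 0;
  for (size_t i = 0; i < src_len; i++) {
    char c = src[i];
    if (c == '/') {
      return false;
    } else if (c != '~') {
      dst[n++] = c;
      continue;
    }
    // A "~" must be followed by a recognized escape character.
    i++;
    if (i >= src_len) {
      return false;
    }
    switch (src[i]) {
      case '0':
        dst[n++] = '~';
        break;
      case '1':
        dst[n++] = '/';
        break;
      case 'n':
      case 'r':
      case 't':
        if (strict) {
          return false;
        }
        dst[n++] = (src[i] == 'n') ? '\n' : ((src[i] == 'r') ? '\r' : '\t');
        break;
      default:
        return false;
    }
  }
  *dst_len = n;
  return true;
}
//
// For example, with either value of strict, "a~1b" unescapes to "a/b" and
// "m~0n" unescapes to "m~n" (two of RFC 6901 section 5's examples), while
// "~01" unescapes to "~1", not to "/".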

// ----

// ascii_escapes was created by script/print-json-ascii-escapes.go.
const uint8_t ascii_escapes[1024] = {
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x30, 0x30, 0x00,  // 0x00: "\\u0000"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x30, 0x31, 0x00,  // 0x01: "\\u0001"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x30, 0x32, 0x00,  // 0x02: "\\u0002"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x30, 0x33, 0x00,  // 0x03: "\\u0003"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x30, 0x34, 0x00,  // 0x04: "\\u0004"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x30, 0x35, 0x00,  // 0x05: "\\u0005"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x30, 0x36, 0x00,  // 0x06: "\\u0006"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x30, 0x37, 0x00,  // 0x07: "\\u0007"
    0x02, 0x5C, 0x62, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x08: "\\b"
    0x02, 0x5C, 0x74, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x09: "\\t"
    0x02, 0x5C, 0x6E, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x0A: "\\n"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x30, 0x42, 0x00,  // 0x0B: "\\u000B"
    0x02, 0x5C, 0x66, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x0C: "\\f"
    0x02, 0x5C, 0x72, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x0D: "\\r"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x30, 0x45, 0x00,  // 0x0E: "\\u000E"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x30, 0x46, 0x00,  // 0x0F: "\\u000F"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x30, 0x00,  // 0x10: "\\u0010"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x31, 0x00,  // 0x11: "\\u0011"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x32, 0x00,  // 0x12: "\\u0012"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x33, 0x00,  // 0x13: "\\u0013"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x34, 0x00,  // 0x14: "\\u0014"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x35, 0x00,  // 0x15: "\\u0015"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x36, 0x00,  // 0x16: "\\u0016"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x37, 0x00,  // 0x17: "\\u0017"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x38, 0x00,  // 0x18: "\\u0018"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x39, 0x00,  // 0x19: "\\u0019"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x41, 0x00,  // 0x1A: "\\u001A"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x42, 0x00,  // 0x1B: "\\u001B"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x43, 0x00,  // 0x1C: "\\u001C"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x44, 0x00,  // 0x1D: "\\u001D"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x45, 0x00,  // 0x1E: "\\u001E"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x46, 0x00,  // 0x1F: "\\u001F"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x32, 0x30, 0x00,  // 0x20: "\\u0020"
    0x01, 0x21, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x21: "!"
    0x02, 0x5C, 0x22, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x22: "\\\""
    0x01, 0x23, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x23: "#"
    0x01, 0x24, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x24: "$"
    0x01, 0x25, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x25: "%"
    0x01, 0x26, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x26: "&"
    0x01, 0x27, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x27: "'"
    0x01, 0x28, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x28: "("
    0x01, 0x29, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x29: ")"
    0x01, 0x2A, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x2A: "*"
    0x01, 0x2B, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x2B: "+"
    0x01, 0x2C, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x2C: ","
    0x01, 0x2D, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x2D: "-"
    0x01, 0x2E, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x2E: "."
    0x01, 0x2F, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x2F: "/"
    0x01, 0x30, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x30: "0"
    0x01, 0x31, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x31: "1"
    0x01, 0x32, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x32: "2"
    0x01, 0x33, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x33: "3"
    0x01, 0x34, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x34: "4"
    0x01, 0x35, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x35: "5"
    0x01, 0x36, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x36: "6"
    0x01, 0x37, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x37: "7"
    0x01, 0x38, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x38: "8"
    0x01, 0x39, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x39: "9"
    0x01, 0x3A, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x3A: ":"
    0x01, 0x3B, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x3B: ";"
    0x01, 0x3C, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x3C: "<"
    0x01, 0x3D, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x3D: "="
    0x01, 0x3E, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x3E: ">"
    0x01, 0x3F, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x3F: "?"
    0x01, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x40: "@"
    0x01, 0x41, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x41: "A"
    0x01, 0x42, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x42: "B"
    0x01, 0x43, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x43: "C"
    0x01, 0x44, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x44: "D"
    0x01, 0x45, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x45: "E"
    0x01, 0x46, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x46: "F"
    0x01, 0x47, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x47: "G"
    0x01, 0x48, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x48: "H"
    0x01, 0x49, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x49: "I"
    0x01, 0x4A, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x4A: "J"
    0x01, 0x4B, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x4B: "K"
    0x01, 0x4C, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x4C: "L"
    0x01, 0x4D, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x4D: "M"
    0x01, 0x4E, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x4E: "N"
    0x01, 0x4F, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x4F: "O"
    0x01, 0x50, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00,  // 0x50: "P"
    0x01, 0x51, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x51: "Q"
    0x01, 0x52, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x52: "R"
    0x01, 0x53, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x53: "S"
    0x01, 0x54, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x54: "T"
    0x01, 0x55, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x55: "U"
    0x01, 0x56, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x56: "V"
    0x01, 0x57, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x57: "W"
    0x01, 0x58, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x58: "X"
    0x01, 0x59, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x59: "Y"
    0x01, 0x5A, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x5A: "Z"
    0x01, 0x5B, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x5B: "["
    0x02, 0x5C, 0x5C, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x5C: "\\\\"
    0x01, 0x5D, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x5D: "]"
    0x01, 0x5E, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x5E: "^"
    0x01, 0x5F, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x5F: "_"
    0x01, 0x60, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x60: "`"
    0x01, 0x61, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x61: "a"
    0x01, 0x62, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x62: "b"
    0x01, 0x63, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x63: "c"
    0x01, 0x64, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x64: "d"
    0x01, 0x65, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x65: "e"
    0x01, 0x66, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x66: "f"
    0x01, 0x67, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x67: "g"
    0x01, 0x68, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x68: "h"
    0x01, 0x69, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x69: "i"
    0x01, 0x6A, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x6A: "j"
    0x01, 0x6B, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x6B: "k"
    0x01, 0x6C, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x6C: "l"
    0x01, 0x6D, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x6D: "m"
    0x01, 0x6E, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x6E: "n"
    0x01, 0x6F, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x6F: "o"
    0x01, 0x70, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x70: "p"
    0x01, 0x71, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x71: "q"
    0x01, 0x72, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x72: "r"
    0x01, 0x73, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x73: "s"
    0x01, 0x74, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x74: "t"
    0x01, 0x75, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x75: "u"
    0x01, 0x76, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x76: "v"
    0x01, 0x77, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x77: "w"
    0x01, 0x78, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x78: "x"
    0x01, 0x79, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x79: "y"
    0x01, 0x7A, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x7A: "z"
    0x01, 0x7B, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x7B: "{"
    0x01, 0x7C, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x7C: "|"
    0x01, 0x7D, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x7D: "}"
    0x01, 0x7E, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x7E: "~"
    0x01, 0x7F, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x7F: "<DEL>"
};
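
// Each ascii_escapes entry above is 8 bytes wide: entry c (for c < 0x80)
// starts at byte offset (8 * c) with a length byte n (1, 2 or 6 in this
// table) followed by n bytes of output, padded with zeroes. As a minimal
// sketch of reading that layout (a hypothetical helper, not part of jsonptr
// and not called by it; its name and signature are assumptions), copying one
// ASCII byte's escaped form to a dst buffer with room for at least 7 bytes
// could look like:
size_t  //
example_copy_ascii_escape(const uint8_t* table, uint8_t c, uint8_t* dst) {
  if (c >= 0x80) {
    return 0;  // The table only covers the 128 ASCII bytes.
  }
  const uint8_t* entry = table + (8 * (size_t)c);
  size_t n = entry[0];
  if (n > 7) {
    n = 7;  // An 8-byte entry has at most 7 bytes after its length byte.
  }
  memcpy(dst, entry + 1, n);
  return n;
}
//
// For example, passing ascii_escapes and 0x0A would write 2 bytes (a
// backslash then an 'n'), and passing ascii_escapes and 0x41 would write the
// single byte 'A'.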

// Wuffs allows either statically or dynamically allocated work buffers. This
// program exercises static allocation.
#define WORK_BUFFER_ARRAY_SIZE \
  WUFFS_JSON__DECODER_WORKBUF_LEN_MAX_INCL_WORST_CASE
#if WORK_BUFFER_ARRAY_SIZE > 0
uint8_t g_work_buffer_array[WORK_BUFFER_ARRAY_SIZE];
#else
// Not all C/C++ compilers support 0-length arrays.
uint8_t g_work_buffer_array[1];
#endif

bool g_sandboxed = false;

int g_input_file_descriptor = 0;  // A 0 default means stdin.

#define TWO_NEW_LINES_THEN_256_SPACES \
  "\n\n                                                                " \
  "                                                                " \
  "                                                                " \
  "                                                                "
#define TWO_NEW_LINES_THEN_256_TABS \
  "\n\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t" \
  "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t" \
  "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t" \
  "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t" \
  "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t" \
  "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t" \
  "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t"

const char* g_two_new_lines_then_256_indent_bytes;
uint32_t g_bytes_per_indent_depth;
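
// This file's use of g_two_new_lines_then_256_indent_bytes is not shown at
// this point, so the slicing scheme below is an assumption: the leading
// "\n\n" plausibly lets one buffer yield a blank line plus indentation
// (starting at offset 0) or a single new line plus indentation (starting at
// offset 1). Under that assumption, a minimal sketch (a hypothetical helper,
// not part of jsonptr and not called by it) of how many bytes to write, from
// offset 1, for one new line followed by depth levels of indentation is:
size_t  //
example_new_line_then_indent_len(uint32_t depth,
                                 uint32_t bytes_per_indent_depth) {
  // Widen before multiplying so that a large depth cannot overflow 32 bits.
  uint64_t indent = (uint64_t)depth * (uint64_t)bytes_per_indent_depth;
  if (indent > 256) {
    indent = 256;  // The buffer only holds 256 indent bytes after its "\n\n".
  }
  return 1 + (size_t)indent;  // One "\n" byte, then the indent bytes.
}
//
// For example, with 4 spaces per indent depth, depth 2 gives 9 bytes (a "\n"
// then 8 spaces). Capping matters because -s=NUM allows up to 8 spaces per
// depth while the input depth can reach 1024, far beyond 256 indent bytes.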

#ifndef DST_BUFFER_ARRAY_SIZE
#define DST_BUFFER_ARRAY_SIZE (32 * 1024)
#endif
#ifndef SRC_BUFFER_ARRAY_SIZE
#define SRC_BUFFER_ARRAY_SIZE (32 * 1024)
#endif
// 1 token is 8 bytes. 4Ki tokens is 32KiB.
#ifndef TOKEN_BUFFER_ARRAY_SIZE
#define TOKEN_BUFFER_ARRAY_SIZE (4 * 1024)
#endif

uint8_t g_dst_array[DST_BUFFER_ARRAY_SIZE];
uint8_t g_src_array[SRC_BUFFER_ARRAY_SIZE];
wuffs_base__token g_tok_array[TOKEN_BUFFER_ARRAY_SIZE];

wuffs_base__io_buffer g_dst;
wuffs_base__io_buffer g_src;
wuffs_base__token_buffer g_tok;

// g_cursor_index is the g_src.data.ptr index between the previous and current
// token. An invariant is that (g_cursor_index <= g_src.meta.ri).
size_t g_cursor_index;

uint32_t g_depth;

enum class context {
  none,
  in_list_after_bracket,
  in_list_after_value,
  in_dict_after_brace,
  in_dict_after_key,
  in_dict_after_value,
  end_of_data,
} g_ctx;

bool  //
in_dict_before_key() {
  return (g_ctx == context::in_dict_after_brace) ||
         (g_ctx == context::in_dict_after_value);
}

uint64_t g_num_input_blank_lines;

bool g_is_after_comment;

uint32_t g_suppress_write_dst;
bool g_wrote_to_dst;

wuffs_json__decoder g_dec;

// ----

// Query is a JSON Pointer query. After initializing with a NUL-terminated C
// string, its multiple fragments are consumed as the program walks the JSON
// data from stdin. For example, letting "$" denote a NUL, suppose that we
// started with a query string of "/apple/banana/12/durian" and are currently
// trying to match the second fragment, "banana", so that Query::m_depth is 2:
//
//   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
//   / a p p l e / b a n a n a / 1 2 / d u r i a n $
//   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
//                 ^           ^
//                 m_frag_i    m_frag_k
//
// The two pointers m_frag_i and m_frag_k (abbreviated as mfi and mfk) are the
// start (inclusive) and end (exclusive) of the query fragment. They satisfy
// (mfi <= mfk) and may be equal if the fragment is empty (note that "" is a
// valid JSON object key).
//
// The m_frag_j (mfj) pointer moves between these two, or is nullptr. An
// invariant is that (((mfi <= mfj) && (mfj <= mfk)) || (mfj == nullptr)).
//
// Wuffs' JSON tokenizer can portray a single JSON string as multiple Wuffs
// tokens, as backslash-escaped values within that JSON string may each get
// their own token.
//
// At the start of each object key (a JSON string), mfj is set to mfi.
//
// While mfj remains non-nullptr, each token's unescaped contents are then
// compared to that part of the fragment from mfj to mfk. If it is a prefix
// (including the case of an exact match), then mfj is advanced by the
// unescaped length. Otherwise, mfj is set to nullptr.
//
535// Comparison accounts for JSON Pointer's escaping notation: "~0" and "~1" in
536// the query (not the JSON value) are unescaped to "~" and "/" respectively.
Nigel Taob48ee752020-03-13 09:27:33 +1100537// "~n", "~r" and "~t" are also unescaped to "\n", "\r" and "\t". The program is
538// responsible for calling Query::validate (whose strict_json_pointer_syntax
539// argument, if true, rejects those three) before otherwise using this class.
Nigel Tao0cd2f982020-03-03 23:03:02 +1100540//
Nigel Taob48ee752020-03-13 09:27:33 +1100541// The mfj pointer therefore advances from mfi to mfk, or drops out, as we
542// incrementally match the object key with the query fragment. For example, if
543// we have already matched the "ban" of "banana", then we would accept any of
544// an "ana" token, an "a" token or a "\u0061" token, amongst others. They would
545// advance mfj by 3, 1 or 1 bytes respectively.
Nigel Tao0cd2f982020-03-03 23:03:02 +1100546//
Nigel Taob48ee752020-03-13 09:27:33 +1100547//                    mfj
Nigel Tao0cd2f982020-03-03 23:03:02 +1100548//                     v
549// +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
550// / a p p l e / b a n a n a / 1 2 / d u r i a n $
551// +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
552//               ^           ^
Nigel Taob48ee752020-03-13 09:27:33 +1100553//               mfi         mfk
Nigel Tao0cd2f982020-03-03 23:03:02 +1100554//
555// At the end of each object key (or equivalently, at the start of each object
Nigel Taob48ee752020-03-13 09:27:33 +1100556// value), if mfj is non-nullptr and equal to (but not less than) mfk then we
557// have a fragment match: the query fragment equals the object key. If there is
558// a next fragment (in this example, "12"), we move the mfi, mfj and mfk pointers to its
559// start and end and increment Query::m_depth. Otherwise, we have matched the
560// complete query, and the upcoming JSON value is the result of that query.
Nigel Tao0cd2f982020-03-03 23:03:02 +1100561//
562// The discussion above centers on object keys. If the query fragment is
563// numeric, then it can also match as an array index: the string fragment "12"
564// will match an array's 13th element (index 12, counting from zero). See RFC
565// 6901 for its precise definition of an "array index" number.
566//
Nigel Taob48ee752020-03-13 09:27:33 +1100567// Array index fragment match is represented by the Query::m_array_index field,
Nigel Tao0cd2f982020-03-03 23:03:02 +1100568// whose type (wuffs_base__result_u64) is a result type. An error result means
569// that the fragment is not an array index. A value result holds the number of
570// list elements remaining. When matching a query fragment in an array (instead
571// of in an object), each element ticks this number down towards zero. At zero,
572// the upcoming JSON value is the one that matches the query fragment.
573class Query {
574 private:
Nigel Taob48ee752020-03-13 09:27:33 +1100575 uint8_t* m_frag_i;
576 uint8_t* m_frag_j;
577 uint8_t* m_frag_k;
Nigel Tao0cd2f982020-03-03 23:03:02 +1100578
Nigel Taob48ee752020-03-13 09:27:33 +1100579 uint32_t m_depth;
Nigel Tao0cd2f982020-03-03 23:03:02 +1100580
Nigel Taob48ee752020-03-13 09:27:33 +1100581 wuffs_base__result_u64 m_array_index;
Nigel Tao0cd2f982020-03-03 23:03:02 +1100582
583 public:
584 void reset(char* query_c_string) {
Nigel Taob48ee752020-03-13 09:27:33 +1100585 m_frag_i = (uint8_t*)query_c_string;
586 m_frag_j = (uint8_t*)query_c_string;
587 m_frag_k = (uint8_t*)query_c_string;
588 m_depth = 0;
589 m_array_index.status.repr = "#main: not an array index query fragment";
590 m_array_index.value = 0;
Nigel Tao0cd2f982020-03-03 23:03:02 +1100591 }
592
Nigel Taob48ee752020-03-13 09:27:33 +1100593 void restart_fragment(bool enable) { m_frag_j = enable ? m_frag_i : nullptr; }
Nigel Tao0cd2f982020-03-03 23:03:02 +1100594
Nigel Taob48ee752020-03-13 09:27:33 +1100595 bool is_at(uint32_t depth) { return m_depth == depth; }
Nigel Tao0cd2f982020-03-03 23:03:02 +1100596
597 // tick returns whether the fragment is a valid array index whose value is
598 // zero. If valid but non-zero, it decrements it and returns false.
599 bool tick() {
Nigel Taob48ee752020-03-13 09:27:33 +1100600 if (m_array_index.status.is_ok()) {
Nigel Tao0291a472020-08-13 22:40:10 +1000601 if (m_array_index.value == 0) {
Nigel Tao0cd2f982020-03-03 23:03:02 +1100602 return true;
603 }
Nigel Tao0291a472020-08-13 22:40:10 +1000604 m_array_index.value--;
Nigel Tao0cd2f982020-03-03 23:03:02 +1100605 }
606 return false;
607 }
608
609 // next_fragment moves to the next fragment, returning whether it existed.
610 bool next_fragment() {
Nigel Taob48ee752020-03-13 09:27:33 +1100611 uint8_t* k = m_frag_k;
612 uint32_t d = m_depth;
Nigel Tao0cd2f982020-03-03 23:03:02 +1100613
614 this->reset(nullptr);
615
616 if (!k || (*k != '/')) {
617 return false;
618 }
619 k++;
620
621 bool all_digits = true;
622 uint8_t* i = k;
623 while ((*k != '\x00') && (*k != '/')) {
624 all_digits = all_digits && ('0' <= *k) && (*k <= '9');
625 k++;
626 }
Nigel Taob48ee752020-03-13 09:27:33 +1100627 m_frag_i = i;
628 m_frag_j = i;
629 m_frag_k = k;
630 m_depth = d + 1;
Nigel Tao0cd2f982020-03-03 23:03:02 +1100631 if (all_digits) {
632 // wuffs_base__parse_number_u64 rejects leading zeroes, e.g. "00", "07".
Nigel Tao6b7ce302020-07-07 16:19:46 +1000633 m_array_index = wuffs_base__parse_number_u64(
634 wuffs_base__make_slice_u8(i, k - i),
635 WUFFS_BASE__PARSE_NUMBER_XXX__DEFAULT_OPTIONS);
Nigel Tao0cd2f982020-03-03 23:03:02 +1100636 }
637 return true;
638 }
639
Nigel Taob48ee752020-03-13 09:27:33 +1100640 bool matched_all() { return m_frag_k == nullptr; }
Nigel Tao52c4d6a2020-03-08 21:12:38 +1100641
Nigel Taob48ee752020-03-13 09:27:33 +1100642 bool matched_fragment() { return m_frag_j && (m_frag_j == m_frag_k); }
Nigel Tao0cd2f982020-03-03 23:03:02 +1100643
644 void incremental_match_slice(uint8_t* ptr, size_t len) {
Nigel Taob48ee752020-03-13 09:27:33 +1100645 if (!m_frag_j) {
Nigel Tao0cd2f982020-03-03 23:03:02 +1100646 return;
647 }
Nigel Taob48ee752020-03-13 09:27:33 +1100648 uint8_t* j = m_frag_j;
Nigel Tao0cd2f982020-03-03 23:03:02 +1100649 while (true) {
650 if (len == 0) {
Nigel Taob48ee752020-03-13 09:27:33 +1100651 m_frag_j = j;
Nigel Tao0cd2f982020-03-03 23:03:02 +1100652 return;
653 }
654
655 if (*j == '\x00') {
656 break;
657
658 } else if (*j == '~') {
659 j++;
660 if (*j == '0') {
661 if (*ptr != '~') {
662 break;
663 }
664 } else if (*j == '1') {
665 if (*ptr != '/') {
666 break;
667 }
Nigel Taod6fdfb12020-03-11 12:24:14 +1100668 } else if (*j == 'n') {
669 if (*ptr != '\n') {
670 break;
671 }
672 } else if (*j == 'r') {
673 if (*ptr != '\r') {
674 break;
675 }
Nigel Tao904004e2020-11-15 20:56:04 +1100676 } else if (*j == 't') {
677 if (*ptr != '\t') {
678 break;
679 }
Nigel Tao0cd2f982020-03-03 23:03:02 +1100680 } else {
681 break;
682 }
683
684 } else if (*j != *ptr) {
685 break;
686 }
687
688 j++;
689 ptr++;
690 len--;
691 }
Nigel Taob48ee752020-03-13 09:27:33 +1100692 m_frag_j = nullptr;
Nigel Tao0cd2f982020-03-03 23:03:02 +1100693 }
694
695 void incremental_match_code_point(uint32_t code_point) {
Nigel Taob48ee752020-03-13 09:27:33 +1100696 if (!m_frag_j) {
Nigel Tao0cd2f982020-03-03 23:03:02 +1100697 return;
698 }
699 uint8_t u[WUFFS_BASE__UTF_8__BYTE_LENGTH__MAX_INCL];
700 size_t n = wuffs_base__utf_8__encode(
701 wuffs_base__make_slice_u8(&u[0],
702 WUFFS_BASE__UTF_8__BYTE_LENGTH__MAX_INCL),
703 code_point);
704 if (n > 0) {
705 this->incremental_match_slice(&u[0], n);
706 }
707 }
708
709 // validate returns whether the (query_c_string, length) arguments form a
710 // valid JSON Pointer. In particular, it must be valid UTF-8, and either be
711 // empty or start with a '/'. Any '~' within must immediately be followed
Nigel Taod6fdfb12020-03-11 12:24:14 +1100712 // by either '0' or '1'. If strict_json_pointer_syntax is false, a '~' may
Nigel Tao904004e2020-11-15 20:56:04 +1100713 // also be followed by either 'n', 'r' or 't'.
Nigel Taod6fdfb12020-03-11 12:24:14 +1100714 static bool validate(char* query_c_string,
715 size_t length,
716 bool strict_json_pointer_syntax) {
Nigel Tao0cd2f982020-03-03 23:03:02 +1100717 if (length <= 0) {
718 return true;
719 }
720 if (query_c_string[0] != '/') {
721 return false;
722 }
723 wuffs_base__slice_u8 s =
724 wuffs_base__make_slice_u8((uint8_t*)query_c_string, length);
725 bool previous_was_tilde = false;
726 while (s.len > 0) {
Nigel Tao702c7b22020-07-22 15:42:54 +1000727 wuffs_base__utf_8__next__output o = wuffs_base__utf_8__next(s.ptr, s.len);
Nigel Tao0cd2f982020-03-03 23:03:02 +1100728 if (!o.is_valid()) {
729 return false;
730 }
Nigel Taod6fdfb12020-03-11 12:24:14 +1100731
732 if (previous_was_tilde) {
733 switch (o.code_point) {
734 case '0':
735 case '1':
736 break;
737 case 'n':
738 case 'r':
Nigel Tao904004e2020-11-15 20:56:04 +1100739 case 't':
Nigel Taod6fdfb12020-03-11 12:24:14 +1100740 if (strict_json_pointer_syntax) {
741 return false;
742 }
743 break;
744 default:
745 return false;
746 }
Nigel Tao0cd2f982020-03-03 23:03:02 +1100747 }
748 previous_was_tilde = o.code_point == '~';
Nigel Taod6fdfb12020-03-11 12:24:14 +1100749
Nigel Tao0cd2f982020-03-03 23:03:02 +1100750 s.ptr += o.byte_length;
751 s.len -= o.byte_length;
752 }
753 return !previous_was_tilde;
754 }
Nigel Taod60815c2020-03-26 14:32:35 +1100755} g_query;
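// The following is a minimal, self-contained sketch of the incremental key
// matching described above the Query class. It is not Wuffs code. The names
// FragmentMatcher, feed and make_matcher are hypothetical, and each key arrives
// as already-unescaped chunks that stand in for Wuffs tokens. Unlike
// Query::incremental_match_slice, the cursor here is bounded by the fragment
// end rather than by the query's NUL terminator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for Query's m_frag_i / m_frag_j / m_frag_k trio.
struct FragmentMatcher {
  const char* frag_i;  // Start of the query fragment (inclusive).
  const char* frag_j;  // Match cursor, or nullptr once the key has mismatched.
  const char* frag_k;  // End of the query fragment (exclusive).

  // Called at the start of each object key.
  void restart() { frag_j = frag_i; }

  // Called once per unescaped chunk of the object key.
  void feed(const char* ptr, size_t len) {
    if (!frag_j) {
      return;
    }
    const char* j = frag_j;
    for (; len > 0; ptr++, len--) {
      if (j >= frag_k) {  // The key is longer than the fragment.
        frag_j = nullptr;
        return;
      }
      char want = *j;
      if (want == '~') {  // Undo "~0", "~1" (and the "~n", "~r", "~t" extension).
        if (++j >= frag_k) {
          frag_j = nullptr;
          return;
        }
        switch (*j) {
          case '0': want = '~'; break;
          case '1': want = '/'; break;
          case 'n': want = '\n'; break;
          case 'r': want = '\r'; break;
          case 't': want = '\t'; break;
          default: frag_j = nullptr; return;
        }
      }
      if (want != *ptr) {
        frag_j = nullptr;
        return;
      }
      j++;
    }
    frag_j = j;
  }

  // Called at the end of each object key: an exact match, not a prefix.
  bool matched() const { return frag_j && (frag_j == frag_k); }
};

// Hypothetical helper: the fragment starts at query[start] and runs up to the
// next '/' or NUL.
FragmentMatcher make_matcher(const char* query, size_t start) {
  const char* i = query + start;
  const char* k = i + strcspn(i, "/");
  return FragmentMatcher{i, i, k};
}
```

// Feeding "ban", "an" and "a" as three chunks matches the fragment "banana"
// just as feeding "banana" in one chunk would.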
Nigel Tao0cd2f982020-03-03 23:03:02 +1100756
757// ----
758
Nigel Tao68920952020-03-03 11:25:18 +1100759struct {
760 int remaining_argc;
761 char** remaining_argv;
762
Nigel Tao3690e832020-03-12 16:52:26 +1100763 bool compact_output;
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100764 bool fail_if_unsandboxed;
Nigel Tao0291a472020-08-13 22:40:10 +1000765 bool input_allow_comments;
766 bool input_allow_extra_comma;
767 bool input_allow_inf_nan_numbers;
Nigel Tao21042052020-08-19 23:13:54 +1000768 bool output_comments;
Nigel Tao0291a472020-08-13 22:40:10 +1000769 bool output_extra_comma;
Nigel Tao75682542020-08-22 21:40:18 +1000770 bool output_inf_nan_numbers;
Nigel Taod6fdfb12020-03-11 12:24:14 +1100771 bool strict_json_pointer_syntax;
Nigel Tao68920952020-03-03 11:25:18 +1100772 bool tabs;
Nigel Tao0a0c7d62020-08-18 23:31:27 +1000773
774 uint32_t max_output_depth;
775 uint32_t spaces;
776
777 char* query_c_string;
Nigel Taod60815c2020-03-26 14:32:35 +1100778} g_flags = {0};
Nigel Tao68920952020-03-03 11:25:18 +1100779
780const char* //
781parse_flags(int argc, char** argv) {
Nigel Taoecadf722020-07-13 08:22:34 +1000782 g_flags.spaces = 4;
Nigel Taod60815c2020-03-26 14:32:35 +1100783 g_flags.max_output_depth = 0xFFFFFFFF;
Nigel Tao68920952020-03-03 11:25:18 +1100784
Nigel Tao04126792021-02-22 12:23:57 +1100785#if defined(WUFFS_EXAMPLE_SPEAK_JWCC_NOT_JSON)
786 g_flags.input_allow_comments = true;
787 g_flags.input_allow_extra_comma = true;
788 g_flags.output_comments = true;
789 g_flags.output_extra_comma = true;
790#endif
791
Nigel Tao68920952020-03-03 11:25:18 +1100792 int c = (argc > 0) ? 1 : 0; // Skip argv[0], the program name.
793 for (; c < argc; c++) {
794 char* arg = argv[c];
795 if (*arg++ != '-') {
796 break;
797 }
798
799 // A double-dash "--foo" is equivalent to a single-dash "-foo". As special
800 // cases, a bare "-" is not a flag (some programs may interpret it as
801 // stdin) and a bare "--" means to stop parsing flags.
802 if (*arg == '\x00') {
803 break;
804 } else if (*arg == '-') {
805 arg++;
806 if (*arg == '\x00') {
807 c++;
808 break;
809 }
810 }
811
Nigel Tao3690e832020-03-12 16:52:26 +1100812 if (!strcmp(arg, "c") || !strcmp(arg, "compact-output")) {
Nigel Taod60815c2020-03-26 14:32:35 +1100813 g_flags.compact_output = true;
Nigel Tao68920952020-03-03 11:25:18 +1100814 continue;
815 }
Nigel Tao94440cf2020-04-02 22:28:24 +1100816 if (!strcmp(arg, "d") || !strcmp(arg, "max-output-depth")) {
817 g_flags.max_output_depth = 1;
818 continue;
819 } else if (!strncmp(arg, "d=", 2) ||
820 !strncmp(arg, "max-output-depth=", 17)) {
821 while (*arg++ != '=') {
822 }
823 wuffs_base__result_u64 u = wuffs_base__parse_number_u64(
Nigel Tao6b7ce302020-07-07 16:19:46 +1000824 wuffs_base__make_slice_u8((uint8_t*)arg, strlen(arg)),
825 WUFFS_BASE__PARSE_NUMBER_XXX__DEFAULT_OPTIONS);
Nigel Taoaf757722020-07-18 17:27:11 +1000826 if (u.status.is_ok() && (u.value <= 0xFFFFFFFF)) {
Nigel Tao94440cf2020-04-02 22:28:24 +1100827 g_flags.max_output_depth = (uint32_t)(u.value);
828 continue;
829 }
830 return g_usage;
831 }
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100832 if (!strcmp(arg, "fail-if-unsandboxed")) {
Nigel Taod60815c2020-03-26 14:32:35 +1100833 g_flags.fail_if_unsandboxed = true;
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100834 continue;
835 }
Nigel Tao0291a472020-08-13 22:40:10 +1000836 if (!strcmp(arg, "input-allow-comments")) {
837 g_flags.input_allow_comments = true;
Nigel Tao4e193592020-07-15 12:48:57 +1000838 continue;
839 }
Nigel Tao0291a472020-08-13 22:40:10 +1000840 if (!strcmp(arg, "input-allow-extra-comma")) {
841 g_flags.input_allow_extra_comma = true;
Nigel Tao4e193592020-07-15 12:48:57 +1000842 continue;
843 }
Nigel Tao0291a472020-08-13 22:40:10 +1000844 if (!strcmp(arg, "input-allow-inf-nan-numbers")) {
845 g_flags.input_allow_inf_nan_numbers = true;
Nigel Tao3c8589b2020-07-19 21:49:00 +1000846 continue;
847 }
Nigel Tao04126792021-02-22 12:23:57 +1100848 if (!strcmp(arg, "input-jwcc")) {
849 g_flags.input_allow_comments = true;
850 g_flags.input_allow_extra_comma = true;
851 continue;
852 }
Nigel Tao21042052020-08-19 23:13:54 +1000853 if (!strcmp(arg, "jwcc")) {
854 g_flags.input_allow_comments = true;
855 g_flags.input_allow_extra_comma = true;
856 g_flags.output_comments = true;
857 g_flags.output_extra_comma = true;
858 continue;
859 }
860 if (!strcmp(arg, "output-comments")) {
861 g_flags.output_comments = true;
862 continue;
863 }
Nigel Tao0291a472020-08-13 22:40:10 +1000864 if (!strcmp(arg, "output-extra-comma")) {
865 g_flags.output_extra_comma = true;
Nigel Taodd114692020-07-25 21:54:12 +1000866 continue;
867 }
Nigel Tao75682542020-08-22 21:40:18 +1000868 if (!strcmp(arg, "output-inf-nan-numbers")) {
869 g_flags.output_inf_nan_numbers = true;
870 continue;
871 }
Nigel Tao0cd2f982020-03-03 23:03:02 +1100872 if (!strncmp(arg, "q=", 2) || !strncmp(arg, "query=", 6)) {
873 while (*arg++ != '=') {
874 }
Nigel Taod60815c2020-03-26 14:32:35 +1100875 g_flags.query_c_string = arg;
Nigel Taod6fdfb12020-03-11 12:24:14 +1100876 continue;
877 }
Nigel Taoecadf722020-07-13 08:22:34 +1000878 if (!strncmp(arg, "s=", 2) || !strncmp(arg, "spaces=", 7)) {
879 while (*arg++ != '=') {
880 }
881 if (('0' <= arg[0]) && (arg[0] <= '8') && (arg[1] == '\x00')) {
882 g_flags.spaces = arg[0] - '0';
883 continue;
884 }
885 return g_usage;
886 }
887 if (!strcmp(arg, "strict-json-pointer-syntax")) {
Nigel Taod60815c2020-03-26 14:32:35 +1100888 g_flags.strict_json_pointer_syntax = true;
Nigel Taod6fdfb12020-03-11 12:24:14 +1100889 continue;
Nigel Tao68920952020-03-03 11:25:18 +1100890 }
891 if (!strcmp(arg, "t") || !strcmp(arg, "tabs")) {
Nigel Taod60815c2020-03-26 14:32:35 +1100892 g_flags.tabs = true;
Nigel Tao68920952020-03-03 11:25:18 +1100893 continue;
894 }
895
Nigel Taod60815c2020-03-26 14:32:35 +1100896 return g_usage;
Nigel Tao68920952020-03-03 11:25:18 +1100897 }
898
Nigel Taod60815c2020-03-26 14:32:35 +1100899 if (g_flags.query_c_string &&
900 !Query::validate(g_flags.query_c_string, strlen(g_flags.query_c_string),
901 g_flags.strict_json_pointer_syntax)) {
Nigel Taod6fdfb12020-03-11 12:24:14 +1100902 return "main: bad JSON Pointer (RFC 6901) syntax for the -query=STR flag";
903 }
904
Nigel Taod60815c2020-03-26 14:32:35 +1100905 g_flags.remaining_argc = argc - c;
906 g_flags.remaining_argv = argv + c;
Nigel Tao0cd2f982020-03-03 23:03:02 +1100907 return nullptr;
Nigel Tao68920952020-03-03 11:25:18 +1100908}
909
Nigel Tao2cf76db2020-02-27 22:42:01 +1100910const char* //
911initialize_globals(int argc, char** argv) {
Nigel Taod60815c2020-03-26 14:32:35 +1100912 g_dst = wuffs_base__make_io_buffer(
913 wuffs_base__make_slice_u8(g_dst_array, DST_BUFFER_ARRAY_SIZE),
Nigel Tao2cf76db2020-02-27 22:42:01 +1100914 wuffs_base__empty_io_buffer_meta());
Nigel Tao1b073492020-02-16 22:11:36 +1100915
Nigel Taod60815c2020-03-26 14:32:35 +1100916 g_src = wuffs_base__make_io_buffer(
917 wuffs_base__make_slice_u8(g_src_array, SRC_BUFFER_ARRAY_SIZE),
Nigel Tao2cf76db2020-02-27 22:42:01 +1100918 wuffs_base__empty_io_buffer_meta());
919
Nigel Taod60815c2020-03-26 14:32:35 +1100920 g_tok = wuffs_base__make_token_buffer(
921 wuffs_base__make_slice_token(g_tok_array, TOKEN_BUFFER_ARRAY_SIZE),
Nigel Tao2cf76db2020-02-27 22:42:01 +1100922 wuffs_base__empty_token_buffer_meta());
923
Nigel Tao991bd512020-08-19 09:38:16 +1000924 g_cursor_index = 0;
Nigel Tao2cf76db2020-02-27 22:42:01 +1100925
Nigel Taod60815c2020-03-26 14:32:35 +1100926 g_depth = 0;
Nigel Tao2cf76db2020-02-27 22:42:01 +1100927
Nigel Taod60815c2020-03-26 14:32:35 +1100928 g_ctx = context::none;
Nigel Tao2cf76db2020-02-27 22:42:01 +1100929
Nigel Tao773994c2021-02-22 10:50:08 +1100930 g_num_input_blank_lines = 0;
931
Nigel Tao21042052020-08-19 23:13:54 +1000932 g_is_after_comment = false;
933
Nigel Tao68920952020-03-03 11:25:18 +1100934 TRY(parse_flags(argc, argv));
Nigel Taod60815c2020-03-26 14:32:35 +1100935 if (g_flags.fail_if_unsandboxed && !g_sandboxed) {
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100936 return "main: unsandboxed";
937 }
Nigel Tao21042052020-08-19 23:13:54 +1000938 if (g_flags.output_comments && !g_flags.compact_output &&
939 !g_flags.output_extra_comma) {
940 return "main: -output-comments requires one or both of -compact-output and "
941 "-output-extra-comma";
942 }
Nigel Tao75682542020-08-22 21:40:18 +1000943 if (g_flags.input_allow_inf_nan_numbers && !g_flags.output_inf_nan_numbers) {
944 return "main: -input-allow-inf-nan-numbers requires "
945 "-output-inf-nan-numbers";
946 }
Nigel Tao01abc842020-03-06 21:42:33 +1100947 const int stdin_fd = 0;
Nigel Taod60815c2020-03-26 14:32:35 +1100948 if (g_flags.remaining_argc >
949 ((g_input_file_descriptor != stdin_fd) ? 1 : 0)) {
950 return g_usage;
Nigel Tao107f0ef2020-03-01 21:35:02 +1100951 }
952
Nigel Tao773994c2021-02-22 10:50:08 +1100953 g_two_new_lines_then_256_indent_bytes = g_flags.tabs
954 ? TWO_NEW_LINES_THEN_256_TABS
955 : TWO_NEW_LINES_THEN_256_SPACES;
Nigel Tao0a0c7d62020-08-18 23:31:27 +1000956 g_bytes_per_indent_depth = g_flags.tabs ? 1 : g_flags.spaces;
957
Nigel Taod60815c2020-03-26 14:32:35 +1100958 g_query.reset(g_flags.query_c_string);
Nigel Tao0cd2f982020-03-03 23:03:02 +1100959
Nigel Taoc96b31c2020-07-27 22:37:23 +1000960 // If the query is non-empty, suppress writing to stdout until we've
Nigel Tao0cd2f982020-03-03 23:03:02 +1100961 // completed the query.
Nigel Taod60815c2020-03-26 14:32:35 +1100962 g_suppress_write_dst = g_query.next_fragment() ? 1 : 0;
963 g_wrote_to_dst = false;
Nigel Tao0cd2f982020-03-03 23:03:02 +1100964
Nigel Tao0291a472020-08-13 22:40:10 +1000965 TRY(g_dec.initialize(sizeof__wuffs_json__decoder(), WUFFS_VERSION, 0)
966 .message());
Nigel Tao4b186b02020-03-18 14:25:21 +1100967
Nigel Tao0291a472020-08-13 22:40:10 +1000968 if (g_flags.input_allow_comments) {
969 g_dec.set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_COMMENT_BLOCK, true);
970 g_dec.set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_COMMENT_LINE, true);
Nigel Tao3c8589b2020-07-19 21:49:00 +1000971 }
Nigel Tao0291a472020-08-13 22:40:10 +1000972 if (g_flags.input_allow_extra_comma) {
973 g_dec.set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_EXTRA_COMMA, true);
Nigel Taoc766bb72020-07-09 12:59:32 +1000974 }
Nigel Tao0291a472020-08-13 22:40:10 +1000975 if (g_flags.input_allow_inf_nan_numbers) {
976 g_dec.set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_INF_NAN_NUMBERS, true);
Nigel Tao51a38292020-07-19 22:43:17 +1000977 }
Nigel Taoc766bb72020-07-09 12:59:32 +1000978
Nigel Taocd4cbc92020-09-22 22:22:15 +1000979 // Consume any optional trailing whitespace and comments. This isn't part of
980// the JSON spec, but it works better with line-oriented Unix tools (such as
981// "echo 123 | jsonptr" where it's "echo", not "echo -n") or hand-edited JSON
982// files that can accidentally contain trailing whitespace.
983 g_dec.set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_TRAILING_FILLER, true);
Nigel Tao4b186b02020-03-18 14:25:21 +1100984
985 return nullptr;
Nigel Tao2cf76db2020-02-27 22:42:01 +1100986}
Nigel Tao1b073492020-02-16 22:11:36 +1100987
988// ----
989
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100990// ignore_return_value suppresses errors from -Wall -Werror.
991static void //
992ignore_return_value(int ignored) {}
993
Nigel Tao2914bae2020-02-26 09:40:30 +1100994const char* //
995read_src() {
Nigel Taod60815c2020-03-26 14:32:35 +1100996 if (g_src.meta.closed) {
Nigel Tao9cc2c252020-02-23 17:05:49 +1100997 return "main: internal error: read requested on a closed source";
Nigel Taoa8406922020-02-19 12:22:00 +1100998 }
Nigel Taod60815c2020-03-26 14:32:35 +1100999 g_src.compact();
1000 if (g_src.meta.wi >= g_src.data.len) {
1001 return "main: g_src buffer is full";
Nigel Tao1b073492020-02-16 22:11:36 +11001002 }
Nigel Taofe0cbbd2020-03-05 22:01:30 +11001003 while (true) {
Nigel Taod6a10df2020-07-27 11:47:47 +10001004 ssize_t n = read(g_input_file_descriptor, g_src.writer_pointer(),
1005 g_src.writer_length());
Nigel Taofe0cbbd2020-03-05 22:01:30 +11001006 if (n >= 0) {
Nigel Taod60815c2020-03-26 14:32:35 +11001007 g_src.meta.wi += n;
1008 g_src.meta.closed = n == 0;
Nigel Taofe0cbbd2020-03-05 22:01:30 +11001009 break;
1010 } else if (errno != EINTR) {
1011 return strerror(errno);
1012 }
Nigel Tao1b073492020-02-16 22:11:36 +11001013 }
1014 return nullptr;
1015}
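// read_src retries a read that fails with EINTR, meaning a signal interrupted
// it before any data was transferred. It treats any other negative return as
// a real error and treats 0 as end of input. The following is a minimal
// POSIX-only sketch of that pattern. read_retrying_eintr is a hypothetical
// name.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <unistd.h>

// Returns the number of bytes read (0 means end of input), or -1 with errno
// set for any error other than EINTR.
ssize_t read_retrying_eintr(int fd, void* buf, size_t len) {
  while (true) {
    ssize_t n = read(fd, buf, len);
    if ((n >= 0) || (errno != EINTR)) {
      return n;
    }
  }
}
```

// As in read_src, a 0 return is what marks the source as closed.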
1016
Nigel Tao2914bae2020-02-26 09:40:30 +11001017const char* //
1018flush_dst() {
Nigel Taofe0cbbd2020-03-05 22:01:30 +11001019 while (true) {
Nigel Taod6a10df2020-07-27 11:47:47 +10001020 size_t n = g_dst.reader_length();
Nigel Taofe0cbbd2020-03-05 22:01:30 +11001021 if (n == 0) {
1022 break;
Nigel Tao1b073492020-02-16 22:11:36 +11001023 }
Nigel Taofe0cbbd2020-03-05 22:01:30 +11001024 const int stdout_fd = 1;
Nigel Taod6a10df2020-07-27 11:47:47 +10001025 ssize_t i = write(stdout_fd, g_dst.reader_pointer(), n);
Nigel Taofe0cbbd2020-03-05 22:01:30 +11001026 if (i >= 0) {
Nigel Taod60815c2020-03-26 14:32:35 +11001027 g_dst.meta.ri += i;
Nigel Taofe0cbbd2020-03-05 22:01:30 +11001028 } else if (errno != EINTR) {
1029 return strerror(errno);
1030 }
Nigel Tao1b073492020-02-16 22:11:36 +11001031 }
Nigel Taod60815c2020-03-26 14:32:35 +11001032 g_dst.compact();
Nigel Tao1b073492020-02-16 22:11:36 +11001033 return nullptr;
1034}
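// flush_dst must keep calling write until every buffered byte is out, because
// write may perform a partial write. It does so by advancing g_dst.meta.ri by
// each write's return value. The following is a standalone sketch of the same
// loop over a plain byte array. write_all is a hypothetical name.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <unistd.h>

// Writes all len bytes of buf to fd, resuming after partial writes and
// retrying on EINTR. Returns false (with errno set) on any other error.
bool write_all(int fd, const void* buf, size_t len) {
  const char* p = static_cast<const char*>(buf);
  while (len > 0) {
    ssize_t n = write(fd, p, len);
    if (n >= 0) {
      p += n;
      len -= static_cast<size_t>(n);
    } else if (errno != EINTR) {
      return false;
    }
  }
  return true;
}
```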
1035
Nigel Tao2914bae2020-02-26 09:40:30 +11001036const char* //
Nigel Tao6b86cbc2020-08-19 11:39:56 +10001037write_dst_slow(const void* s, size_t n) {
Nigel Tao1b073492020-02-16 22:11:36 +11001038 const uint8_t* p = static_cast<const uint8_t*>(s);
1039 while (n > 0) {
Nigel Taod6a10df2020-07-27 11:47:47 +10001040 size_t i = g_dst.writer_length();
Nigel Tao1b073492020-02-16 22:11:36 +11001041 if (i == 0) {
1042 const char* z = flush_dst();
1043 if (z) {
1044 return z;
1045 }
Nigel Taod6a10df2020-07-27 11:47:47 +10001046 i = g_dst.writer_length();
Nigel Tao1b073492020-02-16 22:11:36 +11001047 if (i == 0) {
Nigel Taod60815c2020-03-26 14:32:35 +11001048 return "main: g_dst buffer is full";
Nigel Tao1b073492020-02-16 22:11:36 +11001049 }
1050 }
1051
1052 if (i > n) {
1053 i = n;
1054 }
Nigel Taod60815c2020-03-26 14:32:35 +11001055 memcpy(g_dst.data.ptr + g_dst.meta.wi, p, i);
1056 g_dst.meta.wi += i;
Nigel Tao1b073492020-02-16 22:11:36 +11001057 p += i;
1058 n -= i;
Nigel Taod60815c2020-03-26 14:32:35 +11001059 g_wrote_to_dst = true;
Nigel Tao1b073492020-02-16 22:11:36 +11001060 }
1061 return nullptr;
1062}
1063
Nigel Tao6b86cbc2020-08-19 11:39:56 +10001064inline const char* //
1065write_dst(const void* s, size_t n) {
1066 if (g_suppress_write_dst > 0) {
1067 return nullptr;
1068 } else if (n <= (DST_BUFFER_ARRAY_SIZE - g_dst.meta.wi)) {
1069 memcpy(g_dst.data.ptr + g_dst.meta.wi, s, n);
1070 g_dst.meta.wi += n;
1071 g_wrote_to_dst = true;
1072 return nullptr;
1073 }
1074 return write_dst_slow(s, n);
1075}
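// write_dst has a fast path, taken when the bytes fit in g_dst and needing
// only one memcpy. It also has a slow path, write_dst_slow, which alternately
// fills and flushes the buffer. The following is a minimal sketch of that
// split. It assumes a hypothetical BufferedWriter whose "stdout" is a
// std::string, and it omits g_suppress_write_dst and g_wrote_to_dst.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// A fixed-capacity output buffer of N bytes that flushes into *sink.
template <size_t N>
class BufferedWriter {
 public:
  explicit BufferedWriter(std::string* sink) : sink_(sink) {}

  // Fast path: if the bytes fit, copy them in one memcpy.
  void write(const void* s, size_t n) {
    if (n <= (N - wi_)) {
      memcpy(buf_ + wi_, s, n);
      wi_ += n;
      return;
    }
    write_slow(s, n);
  }

  void flush() {
    sink_->append(buf_, wi_);
    wi_ = 0;
  }

 private:
  // Slow path: fill the buffer, flush it, and repeat until every byte is in.
  void write_slow(const void* s, size_t n) {
    const char* p = static_cast<const char*>(s);
    while (n > 0) {
      size_t room = N - wi_;
      if (room == 0) {
        flush();
        room = N;
      }
      size_t i = (room < n) ? room : n;
      memcpy(buf_ + wi_, p, i);
      wi_ += i;
      p += i;
      n -= i;
    }
  }

  std::string* sink_;
  char buf_[N];
  size_t wi_ = 0;
};
```

// Keeping the fast path small, as write_dst does by being declared inline,
// presumably lets the common case avoid a function call while the loop stays
// out of line.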
1076
Nigel Tao77f85522021-07-19 00:00:13 +10001077#define TRY_INDENT \
Nigel Tao773994c2021-02-22 10:50:08 +11001078 do { \
1079 uint32_t adj = (g_num_input_blank_lines > 1) ? 1 : 0; \
1080 g_num_input_blank_lines = 0; \
1081 uint32_t indent = g_depth * g_bytes_per_indent_depth; \
1082 TRY(write_dst(g_two_new_lines_then_256_indent_bytes + 1 - adj, \
1083 1 + adj + (indent & 0xFF))); \
1084 for (indent >>= 8; indent > 0; indent--) { \
1085 TRY(write_dst(g_two_new_lines_then_256_indent_bytes + 2, 0x100)); \
1086 } \
Nigel Tao21042052020-08-19 23:13:54 +10001087 } while (false)
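// TRY_INDENT never builds an indentation string at run time. Instead, it
// slices a constant buffer made of two newlines followed by 256 indent bytes.
// Starting at offset 1 yields one newline. Starting at offset 0 yields two,
// which happens when g_num_input_blank_lines exceeds 1. The low 8 bits of the
// indent come from that same slice, and each further multiple of 256 becomes a
// separate 256-byte write from offset 2. The following sketch reproduces that
// arithmetic by appending to a std::string. kTable and append_indent are
// hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for TWO_NEW_LINES_THEN_256_SPACES.
static const std::string kTable = std::string("\n\n") + std::string(256, ' ');

// Mirrors TRY_INDENT's slicing of the constant table.
void append_indent(std::string* out,
                   uint32_t depth,
                   uint32_t bytes_per_indent_depth,
                   uint64_t num_input_blank_lines) {
  uint32_t adj = (num_input_blank_lines > 1) ? 1 : 0;
  uint32_t indent = depth * bytes_per_indent_depth;
  // One or two '\n's, then (indent % 256) indent bytes, in a single slice.
  out->append(kTable, 1 - adj, 1 + adj + (indent & 0xFF));
  // Then one 256-byte run (skipping the two '\n's) per remaining 256.
  for (indent >>= 8; indent > 0; indent--) {
    out->append(kTable, 2, 0x100);
  }
}
```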
1088
Nigel Tao1b073492020-02-16 22:11:36 +11001089// ----
1090
Nigel Tao2914bae2020-02-26 09:40:30 +11001091const char* //
Nigel Tao7cb76542020-07-19 22:19:04 +10001092handle_unicode_code_point(uint32_t ucp) {
Nigel Tao63441812020-08-21 14:05:48 +10001093 if (ucp < 0x80) {
1094 return write_dst(&ascii_escapes[8 * ucp + 1], ascii_escapes[8 * ucp]);
Nigel Tao7cb76542020-07-19 22:19:04 +10001095 }
Nigel Tao7cb76542020-07-19 22:19:04 +10001096 uint8_t u[WUFFS_BASE__UTF_8__BYTE_LENGTH__MAX_INCL];
1097 size_t n = wuffs_base__utf_8__encode(
1098 wuffs_base__make_slice_u8(&u[0],
1099 WUFFS_BASE__UTF_8__BYTE_LENGTH__MAX_INCL),
1100 ucp);
1101 if (n == 0) {
1102 return "main: internal error: unexpected Unicode code point";
1103 }
Nigel Tao0291a472020-08-13 22:40:10 +10001104 return write_dst(&u[0], n);
Nigel Tao168f60a2020-07-14 13:19:33 +10001105}
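// handle_unicode_code_point writes ASCII through the ascii_escapes table and
// UTF-8-encodes everything else via wuffs_base__utf_8__encode. The following
// is a minimal stand-alone encoder that shows the 1- to 4-byte forms.
// utf8_encode is a hypothetical name. Rejecting surrogates (U+D800 to U+DFFF)
// by returning 0 is this sketch's choice, not necessarily Wuffs' behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Writes the UTF-8 encoding of cp to dst, which must have room for 4 bytes.
// Returns the number of bytes written, or 0 if cp is a surrogate or is above
// U+10FFFF.
size_t utf8_encode(uint8_t* dst, uint32_t cp) {
  if (cp < 0x80) {
    dst[0] = static_cast<uint8_t>(cp);
    return 1;
  } else if (cp < 0x800) {
    dst[0] = static_cast<uint8_t>(0xC0 | (cp >> 6));
    dst[1] = static_cast<uint8_t>(0x80 | (cp & 0x3F));
    return 2;
  } else if ((0xD800 <= cp) && (cp <= 0xDFFF)) {
    return 0;
  } else if (cp < 0x10000) {
    dst[0] = static_cast<uint8_t>(0xE0 | (cp >> 12));
    dst[1] = static_cast<uint8_t>(0x80 | ((cp >> 6) & 0x3F));
    dst[2] = static_cast<uint8_t>(0x80 | (cp & 0x3F));
    return 3;
  } else if (cp <= 0x10FFFF) {
    dst[0] = static_cast<uint8_t>(0xF0 | (cp >> 18));
    dst[1] = static_cast<uint8_t>(0x80 | ((cp >> 12) & 0x3F));
    dst[2] = static_cast<uint8_t>(0x80 | ((cp >> 6) & 0x3F));
    dst[3] = static_cast<uint8_t>(0x80 | (cp & 0x3F));
    return 4;
  }
  return 0;
}
```

// For example, U+2026 encodes to E2 80 A6, the 3 bytes behind the "…" that
// handle_token writes for elided containers.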
1106
Nigel Taod191a3f2020-07-19 22:14:54 +10001107// ----
1108
Nigel Tao50db4a42020-08-20 11:31:28 +10001109inline const char* //
Nigel Tao2ef39992020-04-09 17:24:39 +10001110handle_token(wuffs_base__token t, bool start_of_token_chain) {
Nigel Tao2cf76db2020-02-27 22:42:01 +11001111 do {
Nigel Tao462f8662020-04-01 23:01:51 +11001112 int64_t vbc = t.value_base_category();
Nigel Tao2cf76db2020-02-27 22:42:01 +11001113 uint64_t vbd = t.value_base_detail();
Nigel Taoee6927f2020-07-27 12:08:33 +10001114 uint64_t token_length = t.length();
Nigel Tao991bd512020-08-19 09:38:16 +10001115 // The "- token_length" is because we incremented g_cursor_index before
1116 // calling handle_token.
Nigel Taoee6927f2020-07-27 12:08:33 +10001117 wuffs_base__slice_u8 tok = wuffs_base__make_slice_u8(
Nigel Tao991bd512020-08-19 09:38:16 +10001118 g_src.data.ptr + g_cursor_index - token_length, token_length);
Nigel Tao1b073492020-02-16 22:11:36 +11001119
1120 // Handle ']' or '}'.
Nigel Tao9f7a2502020-02-23 09:42:02 +11001121 if ((vbc == WUFFS_BASE__TOKEN__VBC__STRUCTURE) &&
Nigel Tao2cf76db2020-02-27 22:42:01 +11001122 (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__POP)) {
Nigel Taod60815c2020-03-26 14:32:35 +11001123 if (g_query.is_at(g_depth)) {
Nigel Tao0cd2f982020-03-03 23:03:02 +11001124 return "main: no match for query";
1125 }
Nigel Taod60815c2020-03-26 14:32:35 +11001126 if (g_depth <= 0) {
1127 return "main: internal error: inconsistent g_depth";
Nigel Tao1b073492020-02-16 22:11:36 +11001128 }
Nigel Taod60815c2020-03-26 14:32:35 +11001129 g_depth--;
Nigel Tao1b073492020-02-16 22:11:36 +11001130
Nigel Taod60815c2020-03-26 14:32:35 +11001131 if (g_query.matched_all() && (g_depth >= g_flags.max_output_depth)) {
1132 g_suppress_write_dst--;
Nigel Tao52c4d6a2020-03-08 21:12:38 +11001133 // '…' is U+2026 HORIZONTAL ELLIPSIS, which is 3 UTF-8 bytes.
Nigel Tao0291a472020-08-13 22:40:10 +10001134 TRY(write_dst((vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__FROM_LIST)
1135 ? "\"[…]\""
1136 : "\"{…}\"",
1137 7));
1138 } else {
Nigel Tao52c4d6a2020-03-08 21:12:38 +11001139 // Write preceding whitespace.
Nigel Taod60815c2020-03-26 14:32:35 +11001140 if ((g_ctx != context::in_list_after_bracket) &&
1141 (g_ctx != context::in_dict_after_brace) &&
1142 !g_flags.compact_output) {
Nigel Tao21042052020-08-19 23:13:54 +10001143 if (g_is_after_comment) {
Nigel Tao77f85522021-07-19 00:00:13 +10001144 TRY_INDENT;
Nigel Tao21042052020-08-19 23:13:54 +10001145 } else {
1146 if (g_flags.output_extra_comma) {
1147 TRY(write_dst(",", 1));
1148 }
Nigel Tao77f85522021-07-19 00:00:13 +10001149 TRY_INDENT;
Nigel Tao52c4d6a2020-03-08 21:12:38 +11001150 }
Nigel Tao773994c2021-02-22 10:50:08 +11001151 } else {
1152 g_num_input_blank_lines = 0;
Nigel Tao1b073492020-02-16 22:11:36 +11001153 }
Nigel Tao52c4d6a2020-03-08 21:12:38 +11001154
1155 TRY(write_dst(
1156 (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__FROM_LIST) ? "]" : "}",
1157 1));
Nigel Tao1b073492020-02-16 22:11:36 +11001158 }
1159
Nigel Taod60815c2020-03-26 14:32:35 +11001160 g_ctx = (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__TO_LIST)
1161 ? context::in_list_after_value
1162 : context::in_dict_after_key;
Nigel Tao1b073492020-02-16 22:11:36 +11001163 goto after_value;
1164 }
1165
Nigel Taod1c928a2020-02-28 12:43:53 +11001166 // Write preceding whitespace and punctuation, if it wasn't ']', '}' or a
Nigel Tao0291a472020-08-13 22:40:10 +10001167 // continuation of a multi-token chain.
1168 if (start_of_token_chain) {
Nigel Tao21042052020-08-19 23:13:54 +10001169 if (g_is_after_comment) {
Nigel Tao77f85522021-07-19 00:00:13 +10001170 TRY_INDENT;
Nigel Tao21042052020-08-19 23:13:54 +10001171 } else if (g_ctx == context::in_dict_after_key) {
Nigel Taod60815c2020-03-26 14:32:35 +11001172 TRY(write_dst(": ", g_flags.compact_output ? 1 : 2));
1173 } else if (g_ctx != context::none) {
Nigel Tao0291a472020-08-13 22:40:10 +10001174 if ((g_ctx != context::in_list_after_bracket) &&
1175 (g_ctx != context::in_dict_after_brace)) {
Nigel Tao0cd2f982020-03-03 23:03:02 +11001176 TRY(write_dst(",", 1));
Nigel Tao107f0ef2020-03-01 21:35:02 +11001177 }
Nigel Taod60815c2020-03-26 14:32:35 +11001178 if (!g_flags.compact_output) {
Nigel Tao77f85522021-07-19 00:00:13 +10001179 TRY_INDENT;
Nigel Tao0cd2f982020-03-03 23:03:02 +11001180 }
1181 }
1182
Nigel Tao52c4d6a2020-03-08 21:12:38 +11001183 bool query_matched_fragment = false;
Nigel Taod60815c2020-03-26 14:32:35 +11001184 if (g_query.is_at(g_depth)) {
1185 switch (g_ctx) {
Nigel Tao0cd2f982020-03-03 23:03:02 +11001186 case context::in_list_after_bracket:
1187 case context::in_list_after_value:
Nigel Taod60815c2020-03-26 14:32:35 +11001188 query_matched_fragment = g_query.tick();
Nigel Tao0cd2f982020-03-03 23:03:02 +11001189 break;
1190 case context::in_dict_after_key:
Nigel Taod60815c2020-03-26 14:32:35 +11001191 query_matched_fragment = g_query.matched_fragment();
Nigel Tao0cd2f982020-03-03 23:03:02 +11001192 break;
Nigel Tao18ef5b42020-03-16 10:37:47 +11001193 default:
1194 break;
Nigel Tao0cd2f982020-03-03 23:03:02 +11001195 }
1196 }
Nigel Tao52c4d6a2020-03-08 21:12:38 +11001197 if (!query_matched_fragment) {
Nigel Tao0cd2f982020-03-03 23:03:02 +11001198 // No-op.
Nigel Taod60815c2020-03-26 14:32:35 +11001199 } else if (!g_query.next_fragment()) {
Nigel Tao0cd2f982020-03-03 23:03:02 +11001200 // There is no next fragment. We have matched the complete query, and
1201 // the upcoming JSON value is the result of that query.
1202 //
Nigel Taod60815c2020-03-26 14:32:35 +11001203 // Un-suppress writing to stdout and reset the g_ctx and g_depth as if
1204 // we were about to decode a top-level value. This makes any subsequent
1205 // indentation be relative to this point, and we will return g_eod
1206 // after the upcoming JSON value is complete.
        if (g_suppress_write_dst != 1) {
          return "main: internal error: inconsistent g_suppress_write_dst";
        }
        g_suppress_write_dst = 0;
        g_ctx = context::none;
        g_depth = 0;
      } else if ((vbc != WUFFS_BASE__TOKEN__VBC__STRUCTURE) ||
                 !(vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__PUSH)) {
        // The query has moved on to the next fragment but the upcoming JSON
        // value is not a container.
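        //
        // For example (illustrative), the query /a/0 applied to {"a":1} ends
        // here: the "a" fragment matches, but the value 1 is a number, not a
        // container that the "0" fragment could index into.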
        return "main: no match for query";
      }
    }

    // Handle the token itself: either a container ('[' or '{') or a simple
    // value: string (a chain of raw or escaped parts), literal or number.
    switch (vbc) {
      case WUFFS_BASE__TOKEN__VBC__STRUCTURE:
        if (g_query.matched_all() && (g_depth >= g_flags.max_output_depth)) {
          g_suppress_write_dst++;
        } else {
          TRY(write_dst(
              (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__TO_LIST) ? "[" : "{",
              1));
        }
        g_depth++;
        g_ctx = (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__TO_LIST)
                    ? context::in_list_after_bracket
                    : context::in_dict_after_brace;
        g_num_input_blank_lines = 0;
        return nullptr;

      case WUFFS_BASE__TOKEN__VBC__STRING:
        if (start_of_token_chain) {
          TRY(write_dst("\"", 1));
          g_query.restart_fragment(in_dict_before_key() &&
                                   g_query.is_at(g_depth));
        }

        if (vbd & WUFFS_BASE__TOKEN__VBD__STRING__CONVERT_1_DST_1_SRC_COPY) {
          TRY(write_dst(tok.ptr, tok.len));
          g_query.incremental_match_slice(tok.ptr, tok.len);
        }

        if (t.continued()) {
          return nullptr;
        }
        TRY(write_dst("\"", 1));
        goto after_value;

      case WUFFS_BASE__TOKEN__VBC__UNICODE_CODE_POINT:
        if (!t.continued()) {
          return "main: internal error: unexpected non-continued UCP token";
        }
        TRY(handle_unicode_code_point(vbd));
        g_query.incremental_match_code_point(vbd);
        return nullptr;
    }

    // We have a literal or a number.
    TRY(write_dst(tok.ptr, tok.len));
    goto after_value;
  } while (0);

  // Book-keeping after completing a value (whether a container value or a
  // simple value). Empty parent containers are no longer empty. If the parent
  // container is a "{...}" object, toggle between keys and values.
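  //
  // For example (an illustrative trace), while decoding {"k":"v","k2":1} the
  // '{' sets g_ctx to in_dict_after_brace, and each completed value then
  // moves it as follows:
  //
  //   "k"   in_dict_after_brace -> in_dict_after_key
  //   "v"   in_dict_after_key   -> in_dict_after_value
  //   "k2"  in_dict_after_value -> in_dict_after_key
  //   1     in_dict_after_key   -> in_dict_after_value
  //
  // In a "[...]" list, the first element moves in_list_after_bracket to
  // in_list_after_value, and later elements leave g_ctx unchanged.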
after_value:
  if (g_depth == 0) {
    return g_eod;
  }
  switch (g_ctx) {
    case context::in_list_after_bracket:
      g_ctx = context::in_list_after_value;
      break;
    case context::in_dict_after_brace:
      g_ctx = context::in_dict_after_key;
      break;
    case context::in_dict_after_key:
      g_ctx = context::in_dict_after_value;
      break;
    case context::in_dict_after_value:
      g_ctx = context::in_dict_after_key;
      break;
    default:
      break;
  }
  return nullptr;
}

const char*  //
main1(int argc, char** argv) {
  TRY(initialize_globals(argc, argv));

  bool start_of_token_chain = true;
  while (true) {
    wuffs_base__status status = g_dec.decode_tokens(
        &g_tok, &g_src,
        wuffs_base__make_slice_u8(g_work_buffer_array, WORK_BUFFER_ARRAY_SIZE));

    while (g_tok.meta.ri < g_tok.meta.wi) {
      wuffs_base__token t = g_tok.data.ptr[g_tok.meta.ri++];
      uint64_t token_length = t.length();
      if ((g_src.meta.ri - g_cursor_index) < token_length) {
        return "main: internal error: inconsistent g_src indexes";
      }
      g_cursor_index += token_length;

      // Handle filler tokens (e.g. whitespace, punctuation and comments).
      // These are skipped, unless -output-comments is enabled.
      if (t.value_base_category() == WUFFS_BASE__TOKEN__VBC__FILLER) {
        if (!g_flags.output_comments) {
          // No-op.
        } else if (t.value_base_detail() &
                   WUFFS_BASE__TOKEN__VBD__FILLER__COMMENT_ANY) {
          if (g_flags.compact_output) {
            TRY(write_dst(g_src.data.ptr + g_cursor_index - token_length,
                          token_length));
            if (!t.continued() &&
                (t.value_base_detail() &
                 WUFFS_BASE__TOKEN__VBD__FILLER__COMMENT_LINE)) {
              TRY(write_dst("\n", 1));
            }

          } else {
            if (start_of_token_chain) {
              if (g_is_after_comment) {
                TRY_INDENT;
              } else if (g_ctx != context::none) {
                if (g_ctx == context::in_dict_after_key) {
                  TRY(write_dst(":", 1));
                } else if ((g_ctx != context::in_list_after_bracket) &&
                           (g_ctx != context::in_dict_after_brace) &&
                           (g_ctx != context::end_of_data)) {
                  TRY(write_dst(",", 1));
                }
                TRY_INDENT;
              }
            }
            TRY(write_dst(g_src.data.ptr + g_cursor_index - token_length,
                          token_length));
            g_is_after_comment = true;
          }
          if (g_ctx == context::in_list_after_bracket) {
            g_ctx = context::in_list_after_value;
          } else if (g_ctx == context::in_dict_after_brace) {
            g_ctx = context::in_dict_after_value;
          }
          g_num_input_blank_lines = 0;

        } else {
          uint8_t* p = g_src.data.ptr + g_cursor_index - token_length;
          uint8_t* q = g_src.data.ptr + g_cursor_index;
          for (; p < q; p++) {
            if (*p == '\n') {
              g_num_input_blank_lines++;
            }
          }
        }

        start_of_token_chain = !t.continued();
        continue;
      }

      const char* z = handle_token(t, start_of_token_chain);
      g_is_after_comment = false;
      start_of_token_chain = !t.continued();
      if (z == nullptr) {
        continue;
      } else if (z != g_eod) {
        return z;
      } else if (g_flags.query_c_string && *g_flags.query_c_string) {
        // With a non-empty g_query, don't try to consume trailing filler or
        // confirm that we've processed all the tokens.
        return nullptr;
      }
      g_ctx = context::end_of_data;
    }

    if (status.repr == nullptr) {
      if (g_ctx != context::end_of_data) {
        return "main: internal error: unexpected end of token stream";
      }
      // Check that we've exhausted the input.
      if ((g_src.meta.ri == g_src.meta.wi) && !g_src.meta.closed) {
        TRY(read_src());
      }
      if ((g_src.meta.ri < g_src.meta.wi) || !g_src.meta.closed) {
        return "main: valid JSON followed by further (unexpected) data";
      }
      // All done.
      return nullptr;
    } else if (status.repr == wuffs_base__suspension__short_read) {
      if (g_cursor_index != g_src.meta.ri) {
        return "main: internal error: inconsistent g_src indexes";
      }
      TRY(read_src());
      g_cursor_index = g_src.meta.ri;
    } else if (status.repr == wuffs_base__suspension__short_write) {
      g_tok.compact();
    } else {
      return status.message();
    }
  }
}

int  //
compute_exit_code(const char* status_msg) {
  if (!status_msg) {
    return 0;
  }
  size_t n;
  if (status_msg == g_usage) {
    n = strlen(status_msg);
  } else {
    n = strnlen(status_msg, 2047);
    if (n >= 2047) {
      status_msg = "main: internal error: error message is too long";
      n = strnlen(status_msg, 2047);
    }
  }
  const int stderr_fd = 2;
  ignore_return_value(write(stderr_fd, status_msg, n));
  ignore_return_value(write(stderr_fd, "\n", 1));
  // Return an exit code of 1 for regular (foreseen) errors, e.g. badly
  // formatted or unsupported input.
  //
  // Return an exit code of 2 for internal (exceptional) errors, e.g. defensive
  // run-time checks found that an internal invariant did not hold.
  //
  // Automated testing, including badly formatted inputs, can therefore
  // discriminate between expected failure (exit code 1) and unexpected failure
  // (other non-zero exit codes). Specifically, exit code 2 for internal
  // invariant violation, exit code 139 (which is 128 + SIGSEGV on x86_64
  // Linux) for a segmentation fault (e.g. null pointer dereference).
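  //
  // For example, in an illustrative shell session (assuming a jsonptr binary
  // on the PATH), a valid input and a badly formatted input exit with 0 and 1:
  //
  //   $ echo '[1,2]' | jsonptr > /dev/null; echo $?
  //   0
  //   $ echo '[1,,2]' | jsonptr > /dev/null 2>&1; echo $?
  //   1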
  return strstr(status_msg, "internal error:") ? 2 : 1;
}

int  //
main(int argc, char** argv) {
  // Look for an input filename (the first non-flag argument) in argv. If there
  // is one, open it (but do not read from it) before we self-impose a sandbox.
  //
  // Flags start with "-", unless they come after a bare "--" arg.
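  //
  // For example (illustrative), with "jsonptr -c -- -data.json" this loop
  // skips "-c" as a flag, the "--" sets dash_dash, and "-data.json" is then
  // opened as the input file despite its leading "-".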
  {
    bool dash_dash = false;
    int a;
    for (a = 1; a < argc; a++) {
      char* arg = argv[a];
      if ((arg[0] == '-') && !dash_dash) {
        dash_dash = (arg[1] == '-') && (arg[2] == '\x00');
        continue;
      }
      g_input_file_descriptor = open(arg, O_RDONLY);
      if (g_input_file_descriptor < 0) {
        fprintf(stderr, "%s: %s\n", arg, strerror(errno));
        return 1;
      }
      break;
    }
  }

#if defined(WUFFS_EXAMPLE_USE_SECCOMP)
  prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT);
  g_sandboxed = true;
#endif

  const char* z = main1(argc, argv);
  if (g_wrote_to_dst) {
    const char* z1 = g_is_after_comment ? nullptr : write_dst("\n", 1);
    const char* z2 = flush_dst();
    z = z ? z : (z1 ? z1 : z2);
  }
  int exit_code = compute_exit_code(z);

#if defined(WUFFS_EXAMPLE_USE_SECCOMP)
  // Call SYS_exit explicitly, instead of calling SYS_exit_group implicitly by
  // either calling _exit or returning from main. SECCOMP_MODE_STRICT allows
  // only SYS_exit.
  syscall(SYS_exit, exit_code);
#endif
  return exit_code;
}