// Copyright 2020 The Wuffs Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//    https://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

// ----------------

// jsonptr is discussed extensively at
// https://nigeltao.github.io/blog/2020/jsonptr.html

/*
jsonptr is a JSON formatter (pretty-printer) that supports the JSON Pointer
(RFC 6901) query syntax. It reads UTF-8 JSON from stdin and writes
canonicalized, formatted UTF-8 JSON to stdout.

See the "const char* g_usage" string below for details.

----

JSON Pointer (and this program's implementation) is one of many JSON query
languages and JSON tools, such as jq, jql and JMESPath. This one is relatively
simple and fewer-featured compared to those others.

One benefit of simplicity is that this program's JSON and JSON Pointer
implementations do not dynamically allocate or free memory (yet they do not
require that the entire input fit in memory at once). They are therefore
trivially protected against certain bug classes: memory leaks, double-frees and
use-after-frees.

The core JSON implementation is also written in the Wuffs programming language
(and then transpiled to C/C++), which is not only memory-safe (e.g. array
indexing is bounds-checked) but also guards against integer arithmetic
overflows.

For defense in depth, on Linux, this program also self-imposes a
SECCOMP_MODE_STRICT sandbox before reading (or otherwise processing) its input
or writing its output. Under this sandbox, the only permitted system calls are
read, write, exit and sigreturn.

All together, this program aims to safely handle untrusted JSON files without
fear of security bugs such as remote code execution.

----

As of 2020-02-24, this program passes all 318 "test_parsing" cases from the
JSON test suite (https://github.com/nst/JSONTestSuite), an appendix to the
"Parsing JSON is a Minefield" article (http://seriot.ch/parsing_json.php) that
was first published on 2016-10-26 and updated on 2018-03-30.

After modifying this program, run "build-example.sh example/jsonptr/" and then
"script/run-json-test-suite.sh" to catch correctness regressions.

----

This program uses Wuffs' JSON decoder at a relatively low level, processing the
decoder's token-stream output individually. The core loop, in pseudo-code, is
"for_each_token { handle_token(etc); }", where the handle_token function
changes global state (e.g. the `g_depth` and `g_ctx` variables) and prints
output text based on that state and the token's source text. Notably,
handle_token is not recursive, even though JSON values can nest.

This approach is centered around JSON tokens. Each JSON 'thing' (e.g. number,
string, object) comprises one or more JSON tokens.

An alternative, higher-level approach is in the sibling example/jsonfindptrs
program. Neither approach is better or worse per se, but when studying this
program, be aware that there are multiple ways to use Wuffs' JSON decoder.

The two programs, jsonfindptrs and jsonptr, also demonstrate different
trade-offs with regard to JSON object duplicate keys. The JSON spec permits
different implementations to allow or reject duplicate keys. It is not always
clear which approach is safer. Rejecting them is certainly unambiguous, and
security bugs can lurk in ambiguous corners of a file format, if two different
implementations both silently accept a file but differ on how to interpret it.
On the other hand, in the worst case, detecting duplicate keys requires O(N)
memory, where N is the size of the (potentially untrusted) input.

This program (jsonptr) allows duplicate keys and requires only O(1) memory. As
mentioned above, it doesn't dynamically allocate memory at all, and on Linux,
it runs in a SECCOMP_MODE_STRICT sandbox.

----

To run:

$CXX jsonptr.cc && ./a.out < ../../test/data/github-tags.json; rm -f a.out

for a C++ compiler $CXX, such as clang++ or g++.
*/

#if defined(__cplusplus) && (__cplusplus < 201103L)
#error "This C++ program requires -std=c++11 or later"
#endif

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

// Wuffs ships as a "single file C library" or "header file library" as per
// https://github.com/nothings/stb/blob/master/docs/stb_howto.txt
//
// To use that single file as a "foo.c"-like implementation, instead of a
// "foo.h"-like header, #define WUFFS_IMPLEMENTATION before #include'ing or
// compiling it.
#define WUFFS_IMPLEMENTATION

// Defining the WUFFS_CONFIG__MODULE* macros is optional, but it lets users of
// release/c/etc.c choose which parts of Wuffs to build. That file contains the
// entire Wuffs standard library, implementing a variety of codecs and file
// formats. Without this macro definition, an optimizing compiler or linker may
// very well discard Wuffs code for unused codecs, but listing the Wuffs
// modules we use makes that process explicit. Preprocessing means that such
// code simply isn't compiled.
#define WUFFS_CONFIG__MODULES
#define WUFFS_CONFIG__MODULE__BASE
#define WUFFS_CONFIG__MODULE__JSON

// If building this program in an environment that doesn't easily accommodate
// relative includes, you can use the script/inline-c-relative-includes.go
// program to generate a stand-alone C++ file.
#include "../../release/c/wuffs-unsupported-snapshot.c"

#if defined(__linux__)
#include <linux/prctl.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#define WUFFS_EXAMPLE_USE_SECCOMP
#endif

#define TRY(error_msg)         \
  do {                         \
    const char* z = error_msg; \
    if (z) {                   \
      return z;                \
    }                          \
  } while (false)
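
// The TRY macro above propagates a non-null error message to the caller. The
// block below is a minimal sketch of that pattern, not code from this program.
// EXAMPLE_TRY and the example_* functions are hypothetical names, chosen to
// avoid clashing with anything else in this file.

```cpp
#include <cassert>
#include <cstring>

// EXAMPLE_TRY mirrors TRY under a different (hypothetical) name: it returns
// early from the enclosing function if the expression yields an error string.
#define EXAMPLE_TRY(error_msg)  \
  do {                          \
    const char* z = error_msg;  \
    if (z) {                    \
      return z;                 \
    }                           \
  } while (false)

// Each step returns nullptr on success or a static error message on failure.
const char* example_check_positive(int x) {
  return (x > 0) ? nullptr : "example: not positive";
}

const char* example_check_small(int x) {
  return (x < 100) ? nullptr : "example: too large";
}

// The first failing step short-circuits the rest.
const char* example_validate(int x) {
  EXAMPLE_TRY(example_check_positive(x));
  EXAMPLE_TRY(example_check_small(x));
  return nullptr;
}
```

// A caller would treat a nullptr result as success and print any other result
// as the error message.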

static const char* g_eod = "main: end of data";

static const char* g_usage =
    "Usage: jsonptr -flags input.json\n"
    "\n"
    "Flags:\n"
    "    -c      -compact-output\n"
    "    -d=NUM  -max-output-depth=NUM\n"
    "    -q=STR  -query=STR\n"
    "    -s=NUM  -spaces=NUM\n"
    "    -t      -tabs\n"
    "            -fail-if-unsandboxed\n"
    "            -input-allow-comments\n"
    "            -input-allow-extra-comma\n"
    "            -input-allow-inf-nan-numbers\n"
    "            -jwcc\n"
    "            -output-comments\n"
    "            -output-extra-comma\n"
    "            -output-inf-nan-numbers\n"
    "            -strict-json-pointer-syntax\n"
    "\n"
    "The input.json filename is optional. If absent, it reads from stdin.\n"
    "\n"
    "----\n"
    "\n"
    "jsonptr is a JSON formatter (pretty-printer) that supports the JSON\n"
    "Pointer (RFC 6901) query syntax. It reads UTF-8 JSON from stdin and\n"
    "writes canonicalized, formatted UTF-8 JSON to stdout.\n"
    "\n"
    "Canonicalized means that e.g. \"abc\\u000A\\tx\\u0177z\" is re-written\n"
    "as \"abc\\n\\txŷz\". It does not sort object keys, nor does it reject\n"
    "duplicate keys. Canonicalization does not imply Unicode normalization.\n"
    "\n"
    "Formatted means that arrays' and objects' elements are indented, each\n"
    "on its own line. Configure this with the -c / -compact-output, -s=NUM /\n"
    "-spaces=NUM (for NUM ranging from 0 to 8) and -t / -tabs flags.\n"
    "\n"
    "The -input-allow-comments flag allows \"/*slash-star*/\" and\n"
    "\"//slash-slash\" C-style comments within JSON input. Such comments are\n"
    "stripped from the output unless -output-comments was also set.\n"
    "\n"
    "The -input-allow-extra-comma flag allows input like \"[1,2,]\", with a\n"
    "comma after the final element of a JSON list or dictionary.\n"
    "\n"
    "The -input-allow-inf-nan-numbers flag allows non-finite floating point\n"
    "numbers (infinities and not-a-numbers) within JSON input. This flag\n"
    "requires that -output-inf-nan-numbers also be set.\n"
    "\n"
    "The -output-comments flag copies any input comments to the output. It\n"
    "has no effect unless -input-allow-comments was also set. Comments look\n"
    "better after commas than before them, but a closing \"]\" or \"}\" can\n"
    "occur after arbitrarily many comments, so -output-comments also requires\n"
    "that one or both of -compact-output and -output-extra-comma be set.\n"
    "\n"
    "With -output-comments, consecutive blank lines collapse to a single\n"
    "blank line. Without that flag, all blank lines are removed.\n"
    "\n"
    "The -output-extra-comma flag writes output like \"[1,2,]\", with a comma\n"
    "after the final element of a JSON list or dictionary. Such commas are\n"
    "non-compliant with the JSON specification but many parsers accept them\n"
    "and they can produce simpler line-based diffs. This flag is ignored when\n"
    "-compact-output is set.\n"
    "\n"
    "The -jwcc flag (JSON With Commas and Comments) enables all of:\n"
    "    -input-allow-comments\n"
    "    -input-allow-extra-comma\n"
    "    -output-comments\n"
    "    -output-extra-comma\n"
    "\n"
    "----\n"
    "\n"
    "The -q=STR or -query=STR flag gives an optional JSON Pointer query, to\n"
    "print a subset of the input. For example, given RFC 6901 section 5's\n"
    "sample input (https://tools.ietf.org/rfc/rfc6901.txt), this command:\n"
    "    jsonptr -query=/foo/1 rfc-6901-json-pointer.json\n"
    "will print:\n"
    "    \"baz\"\n"
    "\n"
    "An absent query is equivalent to the empty query, which identifies the\n"
    "entire input (the root value). Unlike a file system, the \"/\" query\n"
    "does not identify the root. Instead, \"\" is the root and \"/\" is the\n"
    "child (the value in a key-value pair) of the root whose key is the empty\n"
    "string. Similarly, \"/xyz\" and \"/xyz/\" are two different nodes.\n"
    "\n"
    "If the query found a valid JSON value, this program will return a zero\n"
    "exit code even if the rest of the input isn't valid JSON. If the query\n"
    "did not find a value, or found an invalid one, this program returns a\n"
    "non-zero exit code, but may still print partial output to stdout.\n"
    "\n"
    "The JSON specification (https://json.org/) permits implementations that\n"
    "allow duplicate keys, as this one does. This JSON Pointer implementation\n"
    "is also greedy, following the first match for each fragment without\n"
    "back-tracking. For example, the \"/foo/bar\" query will fail if the root\n"
    "object has multiple \"foo\" children but the first one doesn't have a\n"
    "\"bar\" child, even if later ones do.\n"
    "\n"
    "The -strict-json-pointer-syntax flag restricts the -query=STR string to\n"
    "exactly RFC 6901, with only two escape sequences: \"~0\" and \"~1\" for\n"
    "\"~\" and \"/\". Without this flag, this program also lets \"~n\",\n"
    "\"~r\" and \"~t\" escape the New Line, Carriage Return and Horizontal\n"
    "Tab ASCII control characters, which can work better with line oriented\n"
    "(and tab separated) Unix tools that assume exactly one record (e.g. one\n"
    "JSON Pointer string) per line.\n"
    "\n"
    "----\n"
    "\n"
    "The -d=NUM or -max-output-depth=NUM flag gives the maximum (inclusive)\n"
    "output depth. JSON containers ([] arrays and {} objects) can hold other\n"
    "containers. When this flag is set, containers at depth NUM are replaced\n"
    "with \"[…]\" or \"{…}\". A bare -d or -max-output-depth is equivalent to\n"
    "-d=1. The flag's absence is equivalent to an unlimited output depth.\n"
    "\n"
    "The -max-output-depth flag only affects the program's output. It doesn't\n"
    "affect whether or not the input is considered valid JSON. The JSON\n"
    "specification permits implementations to set their own maximum input\n"
    "depth. This JSON implementation sets it to 1024.\n"
    "\n"
    "Depth is measured in terms of nested containers. It is unaffected by the\n"
    "number of spaces or tabs used to indent.\n"
    "\n"
    "When both -max-output-depth and -query are set, the output depth is\n"
    "measured from when the query resolves, not from the input root. The\n"
    "input depth (measured from the root) is still limited to 1024.\n"
    "\n"
    "----\n"
    "\n"
    "The -fail-if-unsandboxed flag causes the program to exit if it does not\n"
    "self-impose a sandbox. On Linux, it self-imposes a SECCOMP_MODE_STRICT\n"
    "sandbox, regardless of whether this flag was set.";
Nigel Tao0cd2f982020-03-03 23:03:02 +1100278
Nigel Tao2cf76db2020-02-27 22:42:01 +1100279// ----
280
Nigel Tao63441812020-08-21 14:05:48 +1000281// ascii_escapes was created by script/print-json-ascii-escapes.go.
282const uint8_t ascii_escapes[1024] = {
283 0x06, 0x5C, 0x75, 0x30, 0x30, 0x30, 0x30, 0x00, // 0x00: "\\u0000"
284 0x06, 0x5C, 0x75, 0x30, 0x30, 0x30, 0x31, 0x00, // 0x01: "\\u0001"
285 0x06, 0x5C, 0x75, 0x30, 0x30, 0x30, 0x32, 0x00, // 0x02: "\\u0002"
286 0x06, 0x5C, 0x75, 0x30, 0x30, 0x30, 0x33, 0x00, // 0x03: "\\u0003"
287 0x06, 0x5C, 0x75, 0x30, 0x30, 0x30, 0x34, 0x00, // 0x04: "\\u0004"
288 0x06, 0x5C, 0x75, 0x30, 0x30, 0x30, 0x35, 0x00, // 0x05: "\\u0005"
289 0x06, 0x5C, 0x75, 0x30, 0x30, 0x30, 0x36, 0x00, // 0x06: "\\u0006"
290 0x06, 0x5C, 0x75, 0x30, 0x30, 0x30, 0x37, 0x00, // 0x07: "\\u0007"
291 0x02, 0x5C, 0x62, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x08: "\\b"
292 0x02, 0x5C, 0x74, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x09: "\\t"
293 0x02, 0x5C, 0x6E, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x0A: "\\n"
294 0x06, 0x5C, 0x75, 0x30, 0x30, 0x30, 0x42, 0x00, // 0x0B: "\\u000B"
295 0x02, 0x5C, 0x66, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x0C: "\\f"
296 0x02, 0x5C, 0x72, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x0D: "\\r"
297 0x06, 0x5C, 0x75, 0x30, 0x30, 0x30, 0x45, 0x00, // 0x0E: "\\u000E"
298 0x06, 0x5C, 0x75, 0x30, 0x30, 0x30, 0x46, 0x00, // 0x0F: "\\u000F"
299 0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x30, 0x00, // 0x10: "\\u0010"
300 0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x31, 0x00, // 0x11: "\\u0011"
301 0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x32, 0x00, // 0x12: "\\u0012"
302 0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x33, 0x00, // 0x13: "\\u0013"
303 0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x34, 0x00, // 0x14: "\\u0014"
304 0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x35, 0x00, // 0x15: "\\u0015"
305 0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x36, 0x00, // 0x16: "\\u0016"
306 0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x37, 0x00, // 0x17: "\\u0017"
307 0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x38, 0x00, // 0x18: "\\u0018"
308 0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x39, 0x00, // 0x19: "\\u0019"
309 0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x41, 0x00, // 0x1A: "\\u001A"
310 0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x42, 0x00, // 0x1B: "\\u001B"
311 0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x43, 0x00, // 0x1C: "\\u001C"
312 0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x44, 0x00, // 0x1D: "\\u001D"
313 0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x45, 0x00, // 0x1E: "\\u001E"
314 0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x46, 0x00, // 0x1F: "\\u001F"
315 0x06, 0x5C, 0x75, 0x30, 0x30, 0x32, 0x30, 0x00, // 0x20: "\\u0020"
316 0x01, 0x21, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x21: "!"
317 0x02, 0x5C, 0x22, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x22: "\\\""
318 0x01, 0x23, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x23: "#"
319 0x01, 0x24, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x24: "$"
320 0x01, 0x25, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x25: "%"
321 0x01, 0x26, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x26: "&"
322 0x01, 0x27, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x27: "'"
323 0x01, 0x28, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x28: "("
324 0x01, 0x29, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x29: ")"
325 0x01, 0x2A, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x2A: "*"
326 0x01, 0x2B, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x2B: "+"
327 0x01, 0x2C, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x2C: ","
328 0x01, 0x2D, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x2D: "-"
329 0x01, 0x2E, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x2E: "."
330 0x01, 0x2F, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x2F: "/"
331 0x01, 0x30, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x30: "0"
332 0x01, 0x31, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x31: "1"
333 0x01, 0x32, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x32: "2"
334 0x01, 0x33, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x33: "3"
335 0x01, 0x34, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x34: "4"
336 0x01, 0x35, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x35: "5"
337 0x01, 0x36, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x36: "6"
338 0x01, 0x37, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x37: "7"
339 0x01, 0x38, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x38: "8"
340 0x01, 0x39, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x39: "9"
341 0x01, 0x3A, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x3A: ":"
342 0x01, 0x3B, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x3B: ";"
343 0x01, 0x3C, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x3C: "<"
344 0x01, 0x3D, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x3D: "="
345 0x01, 0x3E, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x3E: ">"
346 0x01, 0x3F, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x3F: "?"
347 0x01, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x40: "@"
348 0x01, 0x41, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x41: "A"
349 0x01, 0x42, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x42: "B"
350 0x01, 0x43, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x43: "C"
351 0x01, 0x44, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x44: "D"
352 0x01, 0x45, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x45: "E"
353 0x01, 0x46, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x46: "F"
354 0x01, 0x47, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x47: "G"
355 0x01, 0x48, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x48: "H"
356 0x01, 0x49, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x49: "I"
357 0x01, 0x4A, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x4A: "J"
358 0x01, 0x4B, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x4B: "K"
359 0x01, 0x4C, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x4C: "L"
360 0x01, 0x4D, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x4D: "M"
361 0x01, 0x4E, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x4E: "N"
362 0x01, 0x4F, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x4F: "O"
363 0x01, 0x50, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x50: "P"
364 0x01, 0x51, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x51: "Q"
365 0x01, 0x52, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x52: "R"
366 0x01, 0x53, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x53: "S"
367 0x01, 0x54, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x54: "T"
368 0x01, 0x55, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x55: "U"
369 0x01, 0x56, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x56: "V"
370 0x01, 0x57, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x57: "W"
371 0x01, 0x58, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x58: "X"
372 0x01, 0x59, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x59: "Y"
373 0x01, 0x5A, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x5A: "Z"
374 0x01, 0x5B, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x5B: "["
375 0x02, 0x5C, 0x5C, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x5C: "\\\\"
376 0x01, 0x5D, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x5D: "]"
377 0x01, 0x5E, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x5E: "^"
378 0x01, 0x5F, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x5F: "_"
379 0x01, 0x60, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x60: "`"
380 0x01, 0x61, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x61: "a"
381 0x01, 0x62, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x62: "b"
382 0x01, 0x63, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x63: "c"
383 0x01, 0x64, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x64: "d"
384 0x01, 0x65, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x65: "e"
385 0x01, 0x66, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x66: "f"
386 0x01, 0x67, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x67: "g"
387 0x01, 0x68, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x68: "h"
388 0x01, 0x69, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x69: "i"
389 0x01, 0x6A, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x6A: "j"
390 0x01, 0x6B, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x6B: "k"
391 0x01, 0x6C, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x6C: "l"
392 0x01, 0x6D, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x6D: "m"
393 0x01, 0x6E, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x6E: "n"
394 0x01, 0x6F, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x6F: "o"
395 0x01, 0x70, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x70: "p"
396 0x01, 0x71, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x71: "q"
397 0x01, 0x72, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x72: "r"
398 0x01, 0x73, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x73: "s"
399 0x01, 0x74, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x74: "t"
400 0x01, 0x75, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x75: "u"
401 0x01, 0x76, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x76: "v"
402 0x01, 0x77, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x77: "w"
403 0x01, 0x78, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x78: "x"
404 0x01, 0x79, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x79: "y"
405 0x01, 0x7A, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x7A: "z"
406 0x01, 0x7B, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x7B: "{"
407 0x01, 0x7C, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x7C: "|"
408 0x01, 0x7D, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x7D: "}"
409 0x01, 0x7E, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x7E: "~"
410 0x01, 0x7F, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 0x7F: "<DEL>"
411};
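
// The table above holds one 8-byte row per ASCII byte: the first byte of each
// row is the length of the escaped form and the bytes after it hold that
// escaped form. The block below is a minimal sketch of a lookup over such a
// table. example_escape_ascii is a hypothetical helper, not a function of this
// program, and it takes the table as a parameter so the snippet stands alone.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Returns the escaped form of the ASCII byte c by reading its 8-byte row:
// row[0] is the length (at most 7) and row[1..] holds the escaped bytes. The
// input is masked to 7 bits so that the row index stays within 128 rows.
std::string example_escape_ascii(const uint8_t* table, uint8_t c) {
  const uint8_t* row = table + (8 * static_cast<size_t>(c & 0x7F));
  return std::string(reinterpret_cast<const char*>(row + 1), row[0]);
}
```

// With the table above, looking up 0x0A would yield the two bytes of "\\n" and
// looking up 0x41 would yield "A".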

// Wuffs allows either statically or dynamically allocated work buffers. This
// program exercises static allocation.
#define WORK_BUFFER_ARRAY_SIZE \
  WUFFS_JSON__DECODER_WORKBUF_LEN_MAX_INCL_WORST_CASE
#if WORK_BUFFER_ARRAY_SIZE > 0
uint8_t g_work_buffer_array[WORK_BUFFER_ARRAY_SIZE];
#else
// Not all C/C++ compilers support 0-length arrays.
uint8_t g_work_buffer_array[1];
#endif

bool g_sandboxed = false;

int g_input_file_descriptor = 0;  // A 0 default means stdin.
Nigel Tao01abc842020-03-06 21:42:33 +1100427
Nigel Tao773994c2021-02-22 10:50:08 +1100428#define TWO_NEW_LINES_THEN_256_SPACES \
429 "\n\n " \
Nigel Tao0a0c7d62020-08-18 23:31:27 +1000430 " " \
431 " " \
Nigel Tao773994c2021-02-22 10:50:08 +1100432 " "
#define TWO_NEW_LINES_THEN_256_TABS \
  "\n\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t" \
  "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t" \
  "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t" \
  "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t" \
  "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t" \
  "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t" \
  "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t"

const char* g_two_new_lines_then_256_indent_bytes;
uint32_t g_bytes_per_indent_depth;
Nigel Tao107f0ef2020-03-01 21:35:02 +1100444
Nigel Taofdac24a2020-03-06 21:53:08 +1100445#ifndef DST_BUFFER_ARRAY_SIZE
446#define DST_BUFFER_ARRAY_SIZE (32 * 1024)
Nigel Tao1b073492020-02-16 22:11:36 +1100447#endif
Nigel Taofdac24a2020-03-06 21:53:08 +1100448#ifndef SRC_BUFFER_ARRAY_SIZE
449#define SRC_BUFFER_ARRAY_SIZE (32 * 1024)
Nigel Tao1b073492020-02-16 22:11:36 +1100450#endif
Nigel Tao63e67962020-08-26 00:00:32 +1000451// 1 token is 8 bytes. 4Ki tokens is 32KiB.
Nigel Taofdac24a2020-03-06 21:53:08 +1100452#ifndef TOKEN_BUFFER_ARRAY_SIZE
453#define TOKEN_BUFFER_ARRAY_SIZE (4 * 1024)
Nigel Tao1b073492020-02-16 22:11:36 +1100454#endif
455
Nigel Taod60815c2020-03-26 14:32:35 +1100456uint8_t g_dst_array[DST_BUFFER_ARRAY_SIZE];
457uint8_t g_src_array[SRC_BUFFER_ARRAY_SIZE];
458wuffs_base__token g_tok_array[TOKEN_BUFFER_ARRAY_SIZE];
Nigel Tao1b073492020-02-16 22:11:36 +1100459
Nigel Taod60815c2020-03-26 14:32:35 +1100460wuffs_base__io_buffer g_dst;
461wuffs_base__io_buffer g_src;
462wuffs_base__token_buffer g_tok;
Nigel Tao1b073492020-02-16 22:11:36 +1100463
Nigel Tao991bd512020-08-19 09:38:16 +1000464// g_cursor_index is the g_src.data.ptr index between the previous and current
465// token. An invariant is that (g_cursor_index <= g_src.meta.ri).
466size_t g_cursor_index;
Nigel Tao2cf76db2020-02-27 22:42:01 +1100467
Nigel Taod60815c2020-03-26 14:32:35 +1100468uint32_t g_depth;
Nigel Tao2cf76db2020-02-27 22:42:01 +1100469
470enum class context {
471 none,
472 in_list_after_bracket,
473 in_list_after_value,
474 in_dict_after_brace,
475 in_dict_after_key,
476 in_dict_after_value,
Nigel Taocd4cbc92020-09-22 22:22:15 +1000477 end_of_data,
Nigel Taod60815c2020-03-26 14:32:35 +1100478} g_ctx;
Nigel Tao2cf76db2020-02-27 22:42:01 +1100479
Nigel Tao0cd2f982020-03-03 23:03:02 +1100480bool //
481in_dict_before_key() {
Nigel Taod60815c2020-03-26 14:32:35 +1100482 return (g_ctx == context::in_dict_after_brace) ||
483 (g_ctx == context::in_dict_after_value);
Nigel Tao0cd2f982020-03-03 23:03:02 +1100484}
485
Nigel Tao773994c2021-02-22 10:50:08 +1100486uint64_t g_num_input_blank_lines;
487
Nigel Tao21042052020-08-19 23:13:54 +1000488bool g_is_after_comment;
489
Nigel Taod60815c2020-03-26 14:32:35 +1100490uint32_t g_suppress_write_dst;
491bool g_wrote_to_dst;
Nigel Tao0cd2f982020-03-03 23:03:02 +1100492
Nigel Tao0291a472020-08-13 22:40:10 +1000493wuffs_json__decoder g_dec;
Nigel Taoea532452020-07-27 00:03:00 +1000494
Nigel Tao0cd2f982020-03-03 23:03:02 +1100495// ----
496
497// Query is a JSON Pointer query. After initializing with a NUL-terminated C
498// string, its multiple fragments are consumed as the program walks the JSON
499// data from stdin. For example, letting "$" denote a NUL, suppose that we
500// started with a query string of "/apple/banana/12/durian" and are currently
Nigel Taob48ee752020-03-13 09:27:33 +1100501// trying to match the second fragment, "banana", so that Query::m_depth is 2:
Nigel Tao0cd2f982020-03-03 23:03:02 +1100502//
503// +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
504// / a p p l e / b a n a n a / 1 2 / d u r i a n $
505// +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
506// ^ ^
Nigel Taob48ee752020-03-13 09:27:33 +1100507// m_frag_i m_frag_k
Nigel Tao0cd2f982020-03-03 23:03:02 +1100508//
Nigel Taob48ee752020-03-13 09:27:33 +1100509// The two pointers m_frag_i and m_frag_k (abbreviated as mfi and mfk) are the
510// start (inclusive) and end (exclusive) of the query fragment. They satisfy
511// (mfi <= mfk) and may be equal if the fragment empty (note that "" is a valid
512// JSON object key).
Nigel Tao0cd2f982020-03-03 23:03:02 +1100513//
Nigel Taob48ee752020-03-13 09:27:33 +1100514// The m_frag_j (mfj) pointer moves between these two, or is nullptr. An
515// invariant is that (((mfi <= mfj) && (mfj <= mfk)) || (mfj == nullptr)).
Nigel Tao0cd2f982020-03-03 23:03:02 +1100516//
517// Wuffs' JSON tokenizer can portray a single JSON string as multiple Wuffs
518// tokens, as backslash-escaped values within that JSON string may each get
519// their own token.
520//
Nigel Taob48ee752020-03-13 09:27:33 +1100521// At the start of each object key (a JSON string), mfj is set to mfi.
Nigel Tao0cd2f982020-03-03 23:03:02 +1100522//
Nigel Taob48ee752020-03-13 09:27:33 +1100523// While mfj remains non-nullptr, each token's unescaped contents are then
524// compared to that part of the fragment from mfj to mfk. If it is a prefix
525// (including the case of an exact match), then mfj is advanced by the
526// unescaped length. Otherwise, mfj is set to nullptr.
Nigel Tao0cd2f982020-03-03 23:03:02 +1100527//
528// Comparison accounts for JSON Pointer's escaping notation: "~0" and "~1" in
529// the query (not the JSON value) are unescaped to "~" and "/" respectively.
Nigel Taob48ee752020-03-13 09:27:33 +1100530// "~n" and "~r" are also unescaped to "\n" and "\r". The program is
531// responsible for calling Query::validate (with a strict_json_pointer_syntax
532// argument) before otherwise using this class.
Nigel Tao0cd2f982020-03-03 23:03:02 +1100533//
Nigel Taob48ee752020-03-13 09:27:33 +1100534// The mfj pointer therefore advances from mfi to mfk, or drops out, as we
535// incrementally match the object key with the query fragment. For example, if
536// we have already matched the "ban" of "banana", then we would accept any of
537// an "ana" token, an "a" token or a "\u0061" token, amongst others. They would
538// advance mfj by 3, 1 or 1 bytes respectively.
Nigel Tao0cd2f982020-03-03 23:03:02 +1100539//
Nigel Taob48ee752020-03-13 09:27:33 +1100540// mfj
Nigel Tao0cd2f982020-03-03 23:03:02 +1100541// v
542// +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
543// / a p p l e / b a n a n a / 1 2 / d u r i a n $
544// +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
545// ^ ^
Nigel Taob48ee752020-03-13 09:27:33 +1100546// mfi mfk
Nigel Tao0cd2f982020-03-03 23:03:02 +1100547//
548// At the end of each object key (or equivalently, at the start of each object
Nigel Taob48ee752020-03-13 09:27:33 +1100549// value), if mfj is non-nullptr and equal to (but not less than) mfk then we
550// have a fragment match: the query fragment equals the object key. If there is
551// a next fragment (in this example, "12") we move the frag_etc pointers to its
552// start and end and increment Query::m_depth. Otherwise, we have matched the
553// complete query, and the upcoming JSON value is the result of that query.
Nigel Tao0cd2f982020-03-03 23:03:02 +1100554//
555// The discussion above centers on object keys. If the query fragment is
556// numeric then it can also match as an array index: the string fragment "12"
557// will match an array's 13th element (starting counting from zero). See RFC
558// 6901 for its precise definition of an "array index" number.
559//
Nigel Taob48ee752020-03-13 09:27:33 +1100560// Array index fragment match is represented by the Query::m_array_index field,
Nigel Tao0cd2f982020-03-03 23:03:02 +1100561// whose type (wuffs_base__result_u64) is a result type. An error result means
562// that the fragment is not an array index. A value result holds the number of
563// list elements remaining. When matching a query fragment in an array (instead
564// of in an object), each element ticks this number down towards zero. At zero,
565// the upcoming JSON value is the one that matches the query fragment.
class Query {
 private:
  uint8_t* m_frag_i;
  uint8_t* m_frag_j;
  uint8_t* m_frag_k;

  uint32_t m_depth;

  wuffs_base__result_u64 m_array_index;

 public:
  void reset(char* query_c_string) {
    m_frag_i = (uint8_t*)query_c_string;
    m_frag_j = (uint8_t*)query_c_string;
    m_frag_k = (uint8_t*)query_c_string;
    m_depth = 0;
    m_array_index.status.repr = "#main: not an array index query fragment";
    m_array_index.value = 0;
  }

  void restart_fragment(bool enable) { m_frag_j = enable ? m_frag_i : nullptr; }

  bool is_at(uint32_t depth) { return m_depth == depth; }

  // tick returns whether the fragment is a valid array index whose value is
  // zero. If valid but non-zero, it decrements it and returns false.
  bool tick() {
    if (m_array_index.status.is_ok()) {
      if (m_array_index.value == 0) {
        return true;
      }
      m_array_index.value--;
    }
    return false;
  }

  // next_fragment moves to the next fragment, returning whether it existed.
  bool next_fragment() {
    uint8_t* k = m_frag_k;
    uint32_t d = m_depth;

    this->reset(nullptr);

    if (!k || (*k != '/')) {
      return false;
    }
    k++;

    bool all_digits = true;
    uint8_t* i = k;
    while ((*k != '\x00') && (*k != '/')) {
      all_digits = all_digits && ('0' <= *k) && (*k <= '9');
      k++;
    }
    m_frag_i = i;
    m_frag_j = i;
    m_frag_k = k;
    m_depth = d + 1;
    if (all_digits) {
      // wuffs_base__parse_number_u64 rejects leading zeroes, e.g. "00", "07".
      m_array_index = wuffs_base__parse_number_u64(
          wuffs_base__make_slice_u8(i, k - i),
          WUFFS_BASE__PARSE_NUMBER_XXX__DEFAULT_OPTIONS);
    }
    return true;
  }

  bool matched_all() { return m_frag_k == nullptr; }

  bool matched_fragment() { return m_frag_j && (m_frag_j == m_frag_k); }

  void incremental_match_slice(uint8_t* ptr, size_t len) {
    if (!m_frag_j) {
      return;
    }
    uint8_t* j = m_frag_j;
    while (true) {
      if (len == 0) {
        m_frag_j = j;
        return;
      }

      if (*j == '\x00') {
        break;

      } else if (*j == '~') {
        j++;
        if (*j == '0') {
          if (*ptr != '~') {
            break;
          }
        } else if (*j == '1') {
          if (*ptr != '/') {
            break;
          }
        } else if (*j == 'n') {
          if (*ptr != '\n') {
            break;
          }
        } else if (*j == 'r') {
          if (*ptr != '\r') {
            break;
          }
        } else if (*j == 't') {
          if (*ptr != '\t') {
            break;
          }
        } else {
          break;
        }

      } else if (*j != *ptr) {
        break;
      }

      j++;
      ptr++;
      len--;
    }
    m_frag_j = nullptr;
  }

  void incremental_match_code_point(uint32_t code_point) {
    if (!m_frag_j) {
      return;
    }
    uint8_t u[WUFFS_BASE__UTF_8__BYTE_LENGTH__MAX_INCL];
    size_t n = wuffs_base__utf_8__encode(
        wuffs_base__make_slice_u8(&u[0],
                                  WUFFS_BASE__UTF_8__BYTE_LENGTH__MAX_INCL),
        code_point);
    if (n > 0) {
      this->incremental_match_slice(&u[0], n);
    }
  }

  // validate returns whether the (query_c_string, length) arguments form a
  // valid JSON Pointer. In particular, it must be valid UTF-8, and either be
  // empty or start with a '/'. Any '~' within must immediately be followed by
  // either '0' or '1'. If strict_json_pointer_syntax is false, a '~' may also
  // be followed by either 'n', 'r' or 't'.
  static bool validate(char* query_c_string,
                       size_t length,
                       bool strict_json_pointer_syntax) {
    if (length <= 0) {
      return true;
    }
    if (query_c_string[0] != '/') {
      return false;
    }
    wuffs_base__slice_u8 s =
        wuffs_base__make_slice_u8((uint8_t*)query_c_string, length);
    bool previous_was_tilde = false;
    while (s.len > 0) {
      wuffs_base__utf_8__next__output o = wuffs_base__utf_8__next(s.ptr, s.len);
      if (!o.is_valid()) {
        return false;
      }

      if (previous_was_tilde) {
        switch (o.code_point) {
          case '0':
          case '1':
            break;
          case 'n':
          case 'r':
          case 't':
            if (strict_json_pointer_syntax) {
              return false;
            }
            break;
          default:
            return false;
        }
      }
      previous_was_tilde = o.code_point == '~';

      s.ptr += o.byte_length;
      s.len -= o.byte_length;
    }
    return !previous_was_tilde;
  }
} g_query;
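
/*
Example (illustrative only, not part of jsonptr). As a minimal sketch of the
incremental matching that Query performs, the snippet below feeds an object key
to a matcher one chunk at a time, the way separate Wuffs tokens would arrive.
The names fragment_end and match_chunk are hypothetical, and only the strict
"~0" and "~1" escapes are handled.

```cpp
#include <assert.h>
#include <stddef.h>

// fragment_end returns the exclusive end of the fragment starting at i: the
// next '/' or the terminating NUL. It plays the role of m_frag_k.
static inline const char* fragment_end(const char* i) {
  assert(i != nullptr);
  while ((*i != '\x00') && (*i != '/')) {
    i++;
  }
  return i;
}

// match_chunk compares one unescaped chunk of an object key against the query
// at j (the role of m_frag_j). It returns the advanced pointer on a match, or
// nullptr on a mismatch. A nullptr input stays nullptr, so one failed chunk
// fails the whole key.
static inline const char* match_chunk(const char* j,
                                      const char* chunk,
                                      size_t len) {
  if (!j) {
    return nullptr;
  }
  for (; len > 0; len--, chunk++, j++) {
    if (*j == '\x00') {
      return nullptr;
    } else if (*j == '~') {
      j++;  // Step over the '~' to the escape's second byte.
      if (!(((*j == '0') && (*chunk == '~')) ||
            ((*j == '1') && (*chunk == '/')))) {
        return nullptr;
      }
    } else if (*j != *chunk) {
      return nullptr;
    }
  }
  return j;
}
```

For the query "/apple/ban~1ana/12", the fragment "ban~1ana" should match the
key "ban/ana" delivered as the chunks "ban", "/" and "ana". After the last
chunk, the returned pointer equals fragment_end, which mirrors the check in
Query::matched_fragment.
*/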

// ----

struct {
  int remaining_argc;
  char** remaining_argv;

  bool compact_output;
  bool fail_if_unsandboxed;
  bool input_allow_comments;
  bool input_allow_extra_comma;
  bool input_allow_inf_nan_numbers;
  bool output_comments;
  bool output_extra_comma;
  bool output_inf_nan_numbers;
  bool strict_json_pointer_syntax;
  bool tabs;

  uint32_t max_output_depth;
  uint32_t spaces;

  char* query_c_string;
} g_flags = {0};

const char*  //
parse_flags(int argc, char** argv) {
  g_flags.spaces = 4;
  g_flags.max_output_depth = 0xFFFFFFFF;

  int c = (argc > 0) ? 1 : 0;  // Skip argv[0], the program name.
  for (; c < argc; c++) {
    char* arg = argv[c];
    if (*arg++ != '-') {
      break;
    }

    // A double-dash "--foo" is equivalent to a single-dash "-foo". As special
    // cases, a bare "-" is not a flag (some programs may interpret it as
    // stdin) and a bare "--" means to stop parsing flags.
    if (*arg == '\x00') {
      break;
    } else if (*arg == '-') {
      arg++;
      if (*arg == '\x00') {
        c++;
        break;
      }
    }

    if (!strcmp(arg, "c") || !strcmp(arg, "compact-output")) {
      g_flags.compact_output = true;
      continue;
    }
    if (!strcmp(arg, "d") || !strcmp(arg, "max-output-depth")) {
      g_flags.max_output_depth = 1;
      continue;
    } else if (!strncmp(arg, "d=", 2) ||
               !strncmp(arg, "max-output-depth=", 17)) {
      // The 17 is the length of "max-output-depth=", including the '=', so
      // that the scan for '=' below is guaranteed to terminate.
      while (*arg++ != '=') {
      }
      wuffs_base__result_u64 u = wuffs_base__parse_number_u64(
          wuffs_base__make_slice_u8((uint8_t*)arg, strlen(arg)),
          WUFFS_BASE__PARSE_NUMBER_XXX__DEFAULT_OPTIONS);
      if (u.status.is_ok() && (u.value <= 0xFFFFFFFF)) {
        g_flags.max_output_depth = (uint32_t)(u.value);
        continue;
      }
      return g_usage;
    }
    if (!strcmp(arg, "fail-if-unsandboxed")) {
      g_flags.fail_if_unsandboxed = true;
      continue;
    }
    if (!strcmp(arg, "input-allow-comments")) {
      g_flags.input_allow_comments = true;
      continue;
    }
    if (!strcmp(arg, "input-allow-extra-comma")) {
      g_flags.input_allow_extra_comma = true;
      continue;
    }
    if (!strcmp(arg, "input-allow-inf-nan-numbers")) {
      g_flags.input_allow_inf_nan_numbers = true;
      continue;
    }
    if (!strcmp(arg, "jwcc")) {
      g_flags.input_allow_comments = true;
      g_flags.input_allow_extra_comma = true;
      g_flags.output_comments = true;
      g_flags.output_extra_comma = true;
      continue;
    }
    if (!strcmp(arg, "output-comments")) {
      g_flags.output_comments = true;
      continue;
    }
    if (!strcmp(arg, "output-extra-comma")) {
      g_flags.output_extra_comma = true;
      continue;
    }
    if (!strcmp(arg, "output-inf-nan-numbers")) {
      g_flags.output_inf_nan_numbers = true;
      continue;
    }
    if (!strncmp(arg, "q=", 2) || !strncmp(arg, "query=", 6)) {
      while (*arg++ != '=') {
      }
      g_flags.query_c_string = arg;
      continue;
    }
    if (!strncmp(arg, "s=", 2) || !strncmp(arg, "spaces=", 7)) {
      while (*arg++ != '=') {
      }
      if (('0' <= arg[0]) && (arg[0] <= '8') && (arg[1] == '\x00')) {
        g_flags.spaces = arg[0] - '0';
        continue;
      }
      return g_usage;
    }
    if (!strcmp(arg, "strict-json-pointer-syntax")) {
      g_flags.strict_json_pointer_syntax = true;
      continue;
    }
    if (!strcmp(arg, "t") || !strcmp(arg, "tabs")) {
      g_flags.tabs = true;
      continue;
    }

    return g_usage;
  }

  if (g_flags.query_c_string &&
      !Query::validate(g_flags.query_c_string, strlen(g_flags.query_c_string),
                       g_flags.strict_json_pointer_syntax)) {
    return "main: bad JSON Pointer (RFC 6901) syntax for the -query=STR flag";
  }

  g_flags.remaining_argc = argc - c;
  g_flags.remaining_argv = argv + c;
  return nullptr;
}
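
/*
Example (illustrative only, not part of jsonptr). The flag conventions used by
parse_flags can be sketched as two small helpers. Both flag_name and flag_value
are hypothetical names. flag_value derives the prefix length from strlen
instead of a hand-counted constant, which is one way to keep a "name=" prefix
and its length in step.

```cpp
#include <assert.h>
#include <stddef.h>
#include <string.h>

// flag_name strips one or two leading dashes, so that "--foo" and "-foo" both
// yield "foo". It returns nullptr if arg is not a flag: no leading dash, a
// bare "-" or a bare "--".
static inline const char* flag_name(const char* arg) {
  assert(arg != nullptr);
  if (*arg++ != '-') {
    return nullptr;
  }
  if (*arg == '-') {
    arg++;
  }
  return (*arg == '\x00') ? nullptr : arg;
}

// flag_value returns the text after prefix (which should end with '=') if
// name starts with prefix, or nullptr otherwise.
static inline const char* flag_value(const char* name, const char* prefix) {
  assert((name != nullptr) && (prefix != nullptr));
  size_t n = strlen(prefix);
  return (strncmp(name, prefix, n) == 0) ? (name + n) : nullptr;
}
```

With these, flag_value(flag_name("--max-output-depth=3"), "max-output-depth=")
points at "3", while a name such as "max-output-depthX" (no '=') is rejected
rather than scanned for a '=' that is not there.
*/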

const char*  //
initialize_globals(int argc, char** argv) {
  g_dst = wuffs_base__make_io_buffer(
      wuffs_base__make_slice_u8(g_dst_array, DST_BUFFER_ARRAY_SIZE),
      wuffs_base__empty_io_buffer_meta());

  g_src = wuffs_base__make_io_buffer(
      wuffs_base__make_slice_u8(g_src_array, SRC_BUFFER_ARRAY_SIZE),
      wuffs_base__empty_io_buffer_meta());

  g_tok = wuffs_base__make_token_buffer(
      wuffs_base__make_slice_token(g_tok_array, TOKEN_BUFFER_ARRAY_SIZE),
      wuffs_base__empty_token_buffer_meta());

  g_cursor_index = 0;

  g_depth = 0;

  g_ctx = context::none;

  g_num_input_blank_lines = 0;

  g_is_after_comment = false;

  TRY(parse_flags(argc, argv));
  if (g_flags.fail_if_unsandboxed && !g_sandboxed) {
    return "main: unsandboxed";
  }
  if (g_flags.output_comments && !g_flags.compact_output &&
      !g_flags.output_extra_comma) {
    return "main: -output-comments requires one or both of -compact-output and "
           "-output-extra-comma";
  }
  if (g_flags.input_allow_inf_nan_numbers && !g_flags.output_inf_nan_numbers) {
    return "main: -input-allow-inf-nan-numbers requires "
           "-output-inf-nan-numbers";
  }
  const int stdin_fd = 0;
  if (g_flags.remaining_argc >
      ((g_input_file_descriptor != stdin_fd) ? 1 : 0)) {
    return g_usage;
  }

  g_two_new_lines_then_256_indent_bytes = g_flags.tabs
                                              ? TWO_NEW_LINES_THEN_256_TABS
                                              : TWO_NEW_LINES_THEN_256_SPACES;
  g_bytes_per_indent_depth = g_flags.tabs ? 1 : g_flags.spaces;

  g_query.reset(g_flags.query_c_string);

  // If the query is non-empty, suppress writing to stdout until we've
  // completed the query.
  g_suppress_write_dst = g_query.next_fragment() ? 1 : 0;
  g_wrote_to_dst = false;

  TRY(g_dec.initialize(sizeof__wuffs_json__decoder(), WUFFS_VERSION, 0)
          .message());

  if (g_flags.input_allow_comments) {
    g_dec.set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_COMMENT_BLOCK, true);
    g_dec.set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_COMMENT_LINE, true);
  }
  if (g_flags.input_allow_extra_comma) {
    g_dec.set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_EXTRA_COMMA, true);
  }
  if (g_flags.input_allow_inf_nan_numbers) {
    g_dec.set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_INF_NAN_NUMBERS, true);
  }

  // Consume any optional trailing whitespace and comments. This isn't part of
  // the JSON spec, but it works better with line oriented Unix tools (such as
  // "echo 123 | jsonptr" where it's "echo", not "echo -n") or hand-edited JSON
  // files which can accidentally contain trailing whitespace.
  g_dec.set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_TRAILING_FILLER, true);

  return nullptr;
}

// ----

// ignore_return_value suppresses errors from -Wall -Werror.
static void  //
ignore_return_value(int ignored) {}

const char*  //
read_src() {
  if (g_src.meta.closed) {
    return "main: internal error: read requested on a closed source";
  }
  g_src.compact();
  if (g_src.meta.wi >= g_src.data.len) {
    return "main: g_src buffer is full";
  }
  while (true) {
    ssize_t n = read(g_input_file_descriptor, g_src.writer_pointer(),
                     g_src.writer_length());
    if (n >= 0) {
      g_src.meta.wi += n;
      g_src.meta.closed = n == 0;
      break;
    } else if (errno != EINTR) {
      return strerror(errno);
    }
  }
  return nullptr;
}

const char*  //
flush_dst() {
  while (true) {
    size_t n = g_dst.reader_length();
    if (n == 0) {
      break;
    }
    const int stdout_fd = 1;
    ssize_t i = write(stdout_fd, g_dst.reader_pointer(), n);
    if (i >= 0) {
      g_dst.meta.ri += i;
    } else if (errno != EINTR) {
      return strerror(errno);
    }
  }
  g_dst.compact();
  return nullptr;
}

const char*  //
write_dst_slow(const void* s, size_t n) {
  const uint8_t* p = static_cast<const uint8_t*>(s);
  while (n > 0) {
    size_t i = g_dst.writer_length();
    if (i == 0) {
      const char* z = flush_dst();
      if (z) {
        return z;
      }
      i = g_dst.writer_length();
      if (i == 0) {
        return "main: g_dst buffer is full";
      }
    }

    if (i > n) {
      i = n;
    }
    memcpy(g_dst.data.ptr + g_dst.meta.wi, p, i);
    g_dst.meta.wi += i;
    p += i;
    n -= i;
    g_wrote_to_dst = true;
  }
  return nullptr;
}

inline const char*  //
write_dst(const void* s, size_t n) {
  if (g_suppress_write_dst > 0) {
    return nullptr;
  } else if (n <= (DST_BUFFER_ARRAY_SIZE - g_dst.meta.wi)) {
    memcpy(g_dst.data.ptr + g_dst.meta.wi, s, n);
    g_dst.meta.wi += n;
    g_wrote_to_dst = true;
    return nullptr;
  }
  return write_dst_slow(s, n);
}

#define TRY_INDENT_WITH_LEADING_NEW_LINE                                \
  do {                                                                  \
    uint32_t adj = (g_num_input_blank_lines > 1) ? 1 : 0;               \
    g_num_input_blank_lines = 0;                                        \
    uint32_t indent = g_depth * g_bytes_per_indent_depth;               \
    TRY(write_dst(g_two_new_lines_then_256_indent_bytes + 1 - adj,      \
                  1 + adj + (indent & 0xFF)));                          \
    for (indent >>= 8; indent > 0; indent--) {                          \
      TRY(write_dst(g_two_new_lines_then_256_indent_bytes + 2, 0x100)); \
    }                                                                   \
  } while (false)

// TRY_INDENT_SANS_LEADING_NEW_LINE is used after comments, which print their
// own "\n".
#define TRY_INDENT_SANS_LEADING_NEW_LINE                                \
  do {                                                                  \
    uint32_t adj = (g_num_input_blank_lines > 1) ? 1 : 0;               \
    g_num_input_blank_lines = 0;                                        \
    uint32_t indent = g_depth * g_bytes_per_indent_depth;               \
    TRY(write_dst(g_two_new_lines_then_256_indent_bytes + 2 - adj,      \
                  adj + (indent & 0xFF)));                              \
    for (indent >>= 8; indent > 0; indent--) {                          \
      TRY(write_dst(g_two_new_lines_then_256_indent_bytes + 2, 0x100)); \
    }                                                                   \
  } while (false)
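
/*
Example (illustrative only, not part of jsonptr). The two TRY_INDENT macros
appear to rely on one constant table that holds two new lines followed by 256
indent bytes. A first copy that starts near the front of the table emits the
new line(s) plus (indent % 256) indent bytes, and every further 256 bytes of
indentation is copied from the tail of the same table. The sketch below
reproduces that arithmetic for the WITH_LEADING_NEW_LINE case, writing into a
caller-supplied buffer instead of calling write_dst. ExampleIndentTable and
example_indent are hypothetical names.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// ExampleIndentTable stands in for TWO_NEW_LINES_THEN_256_SPACES: two '\n'
// bytes followed by 256 ' ' bytes.
struct ExampleIndentTable {
  char bytes[2 + 256];
  ExampleIndentTable() {
    bytes[0] = '\n';
    bytes[1] = '\n';
    memset(bytes + 2, ' ', 256);
  }
};

// example_indent writes one new line (two if blank_line is true) and then
// indent spaces to dst. It returns the number of bytes written.
static inline size_t example_indent(char* dst,
                                    size_t dst_cap,
                                    uint32_t indent,
                                    bool blank_line) {
  static const ExampleIndentTable table;
  uint32_t adj = blank_line ? 1 : 0;
  size_t total = 1 + adj + (size_t)indent;
  assert(total <= dst_cap);

  // The first copy covers the new line(s) and the low 8 bits of indent.
  size_t n = 1 + adj + (indent & 0xFF);
  memcpy(dst, table.bytes + 1 - adj, n);
  dst += n;

  // Each remaining unit of (indent >> 8) is a full run of 256 indent bytes.
  for (indent >>= 8; indent > 0; indent--) {
    memcpy(dst, table.bytes + 2, 0x100);
    dst += 0x100;
  }
  return total;
}
```

An indent of 300 with blank_line set is written as one 46 byte copy (two new
lines plus 44 spaces) followed by one 256 byte copy, 302 bytes in total, with
no per-byte loop.
*/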

// ----

const char*  //
handle_unicode_code_point(uint32_t ucp) {
  if (ucp < 0x80) {
    return write_dst(&ascii_escapes[8 * ucp + 1], ascii_escapes[8 * ucp]);
  }
  uint8_t u[WUFFS_BASE__UTF_8__BYTE_LENGTH__MAX_INCL];
  size_t n = wuffs_base__utf_8__encode(
      wuffs_base__make_slice_u8(&u[0],
                                WUFFS_BASE__UTF_8__BYTE_LENGTH__MAX_INCL),
      ucp);
  if (n == 0) {
    return "main: internal error: unexpected Unicode code point";
  }
  return write_dst(&u[0], n);
}

// ----

inline const char*  //
handle_token(wuffs_base__token t, bool start_of_token_chain) {
  do {
    int64_t vbc = t.value_base_category();
    uint64_t vbd = t.value_base_detail();
    uint64_t token_length = t.length();
    // The "- token_length" is because we incremented g_cursor_index before
    // calling handle_token.
    wuffs_base__slice_u8 tok = wuffs_base__make_slice_u8(
        g_src.data.ptr + g_cursor_index - token_length, token_length);

    // Handle ']' or '}'.
    if ((vbc == WUFFS_BASE__TOKEN__VBC__STRUCTURE) &&
        (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__POP)) {
      if (g_query.is_at(g_depth)) {
        return "main: no match for query";
      }
      if (g_depth <= 0) {
        return "main: internal error: inconsistent g_depth";
      }
      g_depth--;

      if (g_query.matched_all() && (g_depth >= g_flags.max_output_depth)) {
        g_suppress_write_dst--;
        // '…' is U+2026 HORIZONTAL ELLIPSIS, which is 3 UTF-8 bytes.
        TRY(write_dst((vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__FROM_LIST)
                          ? "\"[…]\""
                          : "\"{…}\"",
                      7));
      } else {
        // Write preceding whitespace.
        if ((g_ctx != context::in_list_after_bracket) &&
            (g_ctx != context::in_dict_after_brace) &&
            !g_flags.compact_output) {
          if (g_is_after_comment) {
            TRY_INDENT_SANS_LEADING_NEW_LINE;
          } else {
            if (g_flags.output_extra_comma) {
              TRY(write_dst(",", 1));
            }
            TRY_INDENT_WITH_LEADING_NEW_LINE;
          }
        } else {
          g_num_input_blank_lines = 0;
        }

        TRY(write_dst(
            (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__FROM_LIST) ? "]" : "}",
            1));
      }

      g_ctx = (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__TO_LIST)
                  ? context::in_list_after_value
                  : context::in_dict_after_key;
      goto after_value;
    }

    // Write preceding whitespace and punctuation, if it wasn't ']', '}' or a
    // continuation of a multi-token chain.
    if (start_of_token_chain) {
      if (g_is_after_comment) {
        TRY_INDENT_SANS_LEADING_NEW_LINE;
      } else if (g_ctx == context::in_dict_after_key) {
        TRY(write_dst(": ", g_flags.compact_output ? 1 : 2));
      } else if (g_ctx != context::none) {
        if ((g_ctx != context::in_list_after_bracket) &&
            (g_ctx != context::in_dict_after_brace)) {
          TRY(write_dst(",", 1));
        }
        if (!g_flags.compact_output) {
          TRY_INDENT_WITH_LEADING_NEW_LINE;
        }
      }

      bool query_matched_fragment = false;
      if (g_query.is_at(g_depth)) {
        switch (g_ctx) {
          case context::in_list_after_bracket:
          case context::in_list_after_value:
            query_matched_fragment = g_query.tick();
            break;
          case context::in_dict_after_key:
            query_matched_fragment = g_query.matched_fragment();
            break;
          default:
            break;
        }
      }
      if (!query_matched_fragment) {
        // No-op.
      } else if (!g_query.next_fragment()) {
        // There is no next fragment. We have matched the complete query, and
        // the upcoming JSON value is the result of that query.
        //
        // Un-suppress writing to stdout and reset the g_ctx and g_depth as if
        // we were about to decode a top-level value. This makes any subsequent
        // indentation be relative to this point, and we will return g_eod
        // after the upcoming JSON value is complete.
        if (g_suppress_write_dst != 1) {
          return "main: internal error: inconsistent g_suppress_write_dst";
        }
        g_suppress_write_dst = 0;
        g_ctx = context::none;
        g_depth = 0;
      } else if ((vbc != WUFFS_BASE__TOKEN__VBC__STRUCTURE) ||
                 !(vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__PUSH)) {
        // The query has moved on to the next fragment but the upcoming JSON
        // value is not a container.
        return "main: no match for query";
      }
    }
1215
    // Handle the token itself: either a container ('[' or '{') or a simple
    // value: string (a chain of raw or escaped parts), literal or number.
    switch (vbc) {
      case WUFFS_BASE__TOKEN__VBC__STRUCTURE:
        if (g_query.matched_all() && (g_depth >= g_flags.max_output_depth)) {
          g_suppress_write_dst++;
        } else {
          TRY(write_dst(
              (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__TO_LIST) ? "[" : "{",
              1));
        }
        g_depth++;
        g_ctx = (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__TO_LIST)
                    ? context::in_list_after_bracket
                    : context::in_dict_after_brace;
        g_num_input_blank_lines = 0;
        return nullptr;

      case WUFFS_BASE__TOKEN__VBC__STRING:
        if (start_of_token_chain) {
          TRY(write_dst("\"", 1));
          g_query.restart_fragment(in_dict_before_key() &&
                                   g_query.is_at(g_depth));
        }

        if (vbd & WUFFS_BASE__TOKEN__VBD__STRING__CONVERT_1_DST_1_SRC_COPY) {
          TRY(write_dst(tok.ptr, tok.len));
          g_query.incremental_match_slice(tok.ptr, tok.len);
        }

        if (t.continued()) {
          return nullptr;
        }
        TRY(write_dst("\"", 1));
        goto after_value;

      case WUFFS_BASE__TOKEN__VBC__UNICODE_CODE_POINT:
        if (!t.continued()) {
          return "main: internal error: unexpected non-continued UCP token";
        }
        TRY(handle_unicode_code_point(vbd));
        g_query.incremental_match_code_point(vbd);
        return nullptr;
    }

    // We have a literal or a number.
    TRY(write_dst(tok.ptr, tok.len));
    goto after_value;
  } while (0);

  // Book-keeping after completing a value (whether a container value or a
  // simple value). Empty parent containers are no longer empty. If the parent
  // container is a "{...}" object, toggle between keys and values.
after_value:
  if (g_depth == 0) {
    return g_eod;
  }
  switch (g_ctx) {
    case context::in_list_after_bracket:
      g_ctx = context::in_list_after_value;
      break;
    case context::in_dict_after_brace:
      g_ctx = context::in_dict_after_key;
      break;
    case context::in_dict_after_key:
      g_ctx = context::in_dict_after_value;
      break;
    case context::in_dict_after_value:
      g_ctx = context::in_dict_after_key;
      break;
    default:
      break;
  }
  return nullptr;
}

const char*  //
main1(int argc, char** argv) {
  TRY(initialize_globals(argc, argv));

  bool start_of_token_chain = true;
  while (true) {
    wuffs_base__status status = g_dec.decode_tokens(
        &g_tok, &g_src,
        wuffs_base__make_slice_u8(g_work_buffer_array, WORK_BUFFER_ARRAY_SIZE));

    while (g_tok.meta.ri < g_tok.meta.wi) {
      wuffs_base__token t = g_tok.data.ptr[g_tok.meta.ri++];
      uint64_t token_length = t.length();
      if ((g_src.meta.ri - g_cursor_index) < token_length) {
        return "main: internal error: inconsistent g_src indexes";
      }
      g_cursor_index += token_length;

      // Handle filler tokens (e.g. whitespace, punctuation and comments).
      // These are skipped, unless -output-comments is enabled.
      if (t.value_base_category() == WUFFS_BASE__TOKEN__VBC__FILLER) {
        if (!g_flags.output_comments) {
          // No-op.
        } else if (t.value_base_detail() &
                   WUFFS_BASE__TOKEN__VBD__FILLER__COMMENT_ANY) {
          if (g_flags.compact_output) {
            TRY(write_dst(g_src.data.ptr + g_cursor_index - token_length,
                          token_length));

          } else {
            if (start_of_token_chain) {
              if (g_is_after_comment) {
                TRY_INDENT_SANS_LEADING_NEW_LINE;
              } else if (g_ctx != context::none) {
                if (g_ctx == context::in_dict_after_key) {
                  TRY(write_dst(":", 1));
                } else if ((g_ctx != context::in_list_after_bracket) &&
                           (g_ctx != context::in_dict_after_brace) &&
                           (g_ctx != context::end_of_data)) {
                  TRY(write_dst(",", 1));
                }
                TRY_INDENT_WITH_LEADING_NEW_LINE;
              }
            }
            TRY(write_dst(g_src.data.ptr + g_cursor_index - token_length,
                          token_length));
            if (!t.continued() &&
                (t.value_base_detail() &
                 WUFFS_BASE__TOKEN__VBD__FILLER__COMMENT_BLOCK)) {
              TRY(write_dst("\n", 1));
            }
            g_is_after_comment = true;
          }
          if (g_ctx == context::in_list_after_bracket) {
            g_ctx = context::in_list_after_value;
          } else if (g_ctx == context::in_dict_after_brace) {
            g_ctx = context::in_dict_after_value;
          }
          g_num_input_blank_lines =
              (t.value_base_detail() &
               WUFFS_BASE__TOKEN__VBD__FILLER__COMMENT_LINE)
                  ? 1
                  : 0;

        } else {
          uint8_t* p = g_src.data.ptr + g_cursor_index - token_length;
          uint8_t* q = g_src.data.ptr + g_cursor_index;
          for (; p < q; p++) {
            if (*p == '\n') {
              g_num_input_blank_lines++;
            }
          }
        }

        start_of_token_chain = !t.continued();
        continue;
      }

      const char* z = handle_token(t, start_of_token_chain);
      g_is_after_comment = false;
      start_of_token_chain = !t.continued();
      if (z == nullptr) {
        continue;
      } else if (z != g_eod) {
        return z;
      } else if (g_flags.query_c_string && *g_flags.query_c_string) {
        // With a non-empty g_query, don't try to consume trailing filler or
        // confirm that we've processed all the tokens.
        return nullptr;
      }
      g_ctx = context::end_of_data;
    }

    if (status.repr == nullptr) {
      if (g_ctx != context::end_of_data) {
        return "main: internal error: unexpected end of token stream";
      }
      // Check that we've exhausted the input.
      if ((g_src.meta.ri == g_src.meta.wi) && !g_src.meta.closed) {
        TRY(read_src());
      }
      if ((g_src.meta.ri < g_src.meta.wi) || !g_src.meta.closed) {
        return "main: valid JSON followed by further (unexpected) data";
      }
      // All done.
      return nullptr;
    } else if (status.repr == wuffs_base__suspension__short_read) {
      if (g_cursor_index != g_src.meta.ri) {
        return "main: internal error: inconsistent g_src indexes";
      }
      TRY(read_src());
      g_cursor_index = g_src.meta.ri;
    } else if (status.repr == wuffs_base__suspension__short_write) {
      g_tok.compact();
    } else {
      return status.message();
    }
  }
}

int  //
compute_exit_code(const char* status_msg) {
  if (!status_msg) {
    return 0;
  }
  size_t n;
  if (status_msg == g_usage) {
    n = strlen(status_msg);
  } else {
    n = strnlen(status_msg, 2047);
    if (n >= 2047) {
      status_msg = "main: internal error: error message is too long";
      n = strnlen(status_msg, 2047);
    }
  }
  const int stderr_fd = 2;
  ignore_return_value(write(stderr_fd, status_msg, n));
  ignore_return_value(write(stderr_fd, "\n", 1));
  // Return an exit code of 1 for regular (foreseen) errors, e.g. badly
  // formatted or unsupported input.
  //
  // Return an exit code of 2 for internal (exceptional) errors, e.g. defensive
  // run-time checks found that an internal invariant did not hold.
  //
  // Automated testing, including badly formatted inputs, can therefore
  // discriminate between expected failure (exit code 1) and unexpected failure
  // (other non-zero exit codes). Specifically, exit code 2 for internal
  // invariant violation, exit code 139 (which is 128 + SIGSEGV on x86_64
  // linux) for a segmentation fault (e.g. null pointer dereference).
  return strstr(status_msg, "internal error:") ? 2 : 1;
}

int  //
main(int argc, char** argv) {
  // Look for an input filename (the first non-flag argument) in argv. If there
  // is one, open it (but do not read from it) before we self-impose a sandbox.
  //
  // Flags start with "-", unless they come after a bare "--" arg.
  {
    bool dash_dash = false;
    int a;
    for (a = 1; a < argc; a++) {
      char* arg = argv[a];
      if ((arg[0] == '-') && !dash_dash) {
        dash_dash = (arg[1] == '-') && (arg[2] == '\x00');
        continue;
      }
      g_input_file_descriptor = open(arg, O_RDONLY);
      if (g_input_file_descriptor < 0) {
        fprintf(stderr, "%s: %s\n", arg, strerror(errno));
        return 1;
      }
      break;
    }
  }

#if defined(WUFFS_EXAMPLE_USE_SECCOMP)
  prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT);
  g_sandboxed = true;
#endif

  const char* z = main1(argc, argv);
  if (g_wrote_to_dst) {
    const char* z1 = g_is_after_comment ? nullptr : write_dst("\n", 1);
    const char* z2 = flush_dst();
    z = z ? z : (z1 ? z1 : z2);
  }
  int exit_code = compute_exit_code(z);

#if defined(WUFFS_EXAMPLE_USE_SECCOMP)
  // Call SYS_exit explicitly, instead of calling SYS_exit_group implicitly by
  // either calling _exit or returning from main. SECCOMP_MODE_STRICT allows
  // only SYS_exit.
  syscall(SYS_exit, exit_code);
#endif
  return exit_code;
}