// Copyright 2020 The Wuffs Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//    https://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

// ----------------

// jsonptr is discussed extensively at
// https://nigeltao.github.io/blog/2020/jsonptr.html

/*
jsonptr is a JSON formatter (pretty-printer) that supports the JSON Pointer
(RFC 6901) query syntax. It reads UTF-8 JSON from stdin and writes
canonicalized, formatted UTF-8 JSON to stdout.

See the "const char* g_usage" string below for details.

----

JSON Pointer (and this program's implementation) is one of many JSON query
languages and JSON tools, such as jq, jql and JMESPath. This one is relatively
simple and has fewer features than those others.

One benefit of simplicity is that this program's JSON and JSON Pointer
implementations do not dynamically allocate or free memory (yet they do not
require that the entire input fits in memory at once). They are therefore
trivially protected against certain bug classes: memory leaks, double-frees and
use-after-frees.

The core JSON implementation is also written in the Wuffs programming language
(and then transpiled to C/C++), which is memory-safe (e.g. array indexing is
bounds-checked) and also guards against integer arithmetic overflows.

For defense in depth, on Linux, this program also self-imposes a
SECCOMP_MODE_STRICT sandbox before reading (or otherwise processing) its input
or writing its output. Under this sandbox, the only permitted system calls are
read, write, exit and sigreturn.

All together, this program aims to safely handle untrusted JSON files without
fear of security bugs such as remote code execution.

----

As of 2020-02-24, this program passes all 318 "test_parsing" cases from the
JSON test suite (https://github.com/nst/JSONTestSuite), an appendix to the
"Parsing JSON is a Minefield" article (http://seriot.ch/parsing_json.php) that
was first published on 2016-10-26 and updated on 2018-03-30.

After modifying this program, run "build-example.sh example/jsonptr/" and then
"script/run-json-test-suite.sh" to catch correctness regressions.

----

This program uses Wuffs' JSON decoder at a relatively low level, processing the
decoder's token-stream output one token at a time. The core loop, in
pseudo-code, is "for_each_token { handle_token(etc); }", where the handle_token
function changes global state (e.g. the `g_depth` and `g_ctx` variables) and
prints output text based on that state and the token's source text. Notably,
handle_token is not recursive, even though JSON values can nest.
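
As a rough illustration of that non-recursive shape (a sketch only, not Wuffs'
actual token API: TokenKind, ToyState, handle_toy_token and max_nesting_depth
are hypothetical names, and the toy "tokenizer" looks only at bracket bytes),
nesting can be tracked with a depth counter instead of the C++ call stack:

```cpp
#include <cstdint>
#include <string>

// Hypothetical, simplified token kinds. Wuffs' real tokens carry more detail.
enum class TokenKind { open, close, other };

// Stands in for this program's global state (e.g. g_depth).
struct ToyState {
  uint32_t depth = 0;
  uint32_t max_depth = 0;
};

// Updates the state for one token. It never calls itself: nesting lives in
// the depth counter, not in the call stack.
void handle_toy_token(ToyState& s, TokenKind k) {
  if (k == TokenKind::open) {
    s.depth++;
    if (s.max_depth < s.depth) {
      s.max_depth = s.depth;
    }
  } else if ((k == TokenKind::close) && (s.depth > 0)) {
    s.depth--;
  }
}

// The "for_each_token" loop. Each byte is treated as one toy token, so this
// ignores strings, numbers and escapes entirely.
uint32_t max_nesting_depth(const std::string& src) {
  ToyState s;
  for (char c : src) {
    if ((c == '[') || (c == '{')) {
      handle_toy_token(s, TokenKind::open);
    } else if ((c == ']') || (c == '}')) {
      handle_toy_token(s, TokenKind::close);
    } else {
      handle_toy_token(s, TokenKind::other);
    }
  }
  return s.max_depth;
}
```

Because the per-token work is constant-size, the memory used does not grow with
how deeply the input nests.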

This approach is centered around JSON tokens. Each JSON 'thing' (e.g. number,
string, object) comprises one or more JSON tokens.

An alternative, higher-level approach is in the sibling example/jsonfindptrs
program. Neither approach is better or worse per se, but when studying this
program, be aware that there are multiple ways to use Wuffs' JSON decoder.

The two programs, jsonfindptrs and jsonptr, also demonstrate different
trade-offs with regard to JSON object duplicate keys. The JSON spec permits
different implementations to allow or reject duplicate keys. It is not always
clear which approach is safer. Rejecting them is certainly unambiguous, and
security bugs can lurk in ambiguous corners of a file format, if two different
implementations both silently accept a file but differ on how to interpret it.
On the other hand, in the worst case, detecting duplicate keys requires O(N)
memory, where N is the size of the (potentially untrusted) input.
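
To make that memory trade-off concrete, here is a minimal sketch (not code from
either program; has_duplicate_key and first_match are hypothetical helpers, and
the keys are assumed to be already collected in a vector purely for
illustration). Rejecting duplicates has to remember every key seen so far,
whereas following only the first match needs no per-key bookkeeping:

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Returns true if any key appears more than once. The `seen` set can grow to
// hold every key, which is the O(N) worst case described above.
bool has_duplicate_key(const std::vector<std::string>& keys) {
  std::set<std::string> seen;
  for (const std::string& k : keys) {
    if (!seen.insert(k).second) {
      return true;
    }
  }
  return false;
}

// Returns the index of the first key equal to `want`, or -1 if there is none.
// Later duplicates are never examined, so no extra memory is needed.
int first_match(const std::vector<std::string>& keys,
                const std::string& want) {
  for (size_t i = 0; i < keys.size(); i++) {
    if (keys[i] == want) {
      return static_cast<int>(i);
    }
  }
  return -1;
}
```

A streaming implementation would apply the second idea token by token rather
than over a stored vector.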

This program (jsonptr) allows duplicate keys and requires only O(1) memory. As
mentioned above, it doesn't dynamically allocate memory at all, and on Linux,
it runs in a SECCOMP_MODE_STRICT sandbox.

----

To run:

$CXX jsonptr.cc && ./a.out < ../../test/data/github-tags.json; rm -f a.out

for a C++ compiler $CXX, such as clang++ or g++.
*/

#if defined(__cplusplus) && (__cplusplus < 201103L)
#error "This C++ program requires -std=c++11 or later"
#endif

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

// Wuffs ships as a "single file C library" or "header file library" as per
// https://github.com/nothings/stb/blob/master/docs/stb_howto.txt
//
// To use that single file as a "foo.c"-like implementation, instead of a
// "foo.h"-like header, #define WUFFS_IMPLEMENTATION before #include'ing or
// compiling it.
#define WUFFS_IMPLEMENTATION

// Defining the WUFFS_CONFIG__STATIC_FUNCTIONS macro is optional, but when
// combined with WUFFS_IMPLEMENTATION, it demonstrates making all of Wuffs'
// functions static (giving them internal linkage).
//
// This can help the compiler ignore or discard unused code, which can produce
// faster compiles and smaller binaries. Other motivations are discussed in the
// "ALLOW STATIC IMPLEMENTATION" section of
// https://raw.githubusercontent.com/nothings/stb/master/docs/stb_howto.txt
#define WUFFS_CONFIG__STATIC_FUNCTIONS

// Defining the WUFFS_CONFIG__MODULE* macros is optional, but doing so lets
// users of release/c/etc.c choose which parts of Wuffs to build. That file
// contains the entire Wuffs standard library, implementing a variety of codecs
// and file formats. Without these macro definitions, an optimizing compiler or
// linker may very well discard Wuffs code for unused codecs, but listing the
// Wuffs modules we use makes that process explicit. Preprocessing means that
// such code simply isn't compiled.
#define WUFFS_CONFIG__MODULES
#define WUFFS_CONFIG__MODULE__BASE
#define WUFFS_CONFIG__MODULE__JSON

// If building this program in an environment that doesn't easily accommodate
// relative includes, you can use the script/inline-c-relative-includes.go
// program to generate a stand-alone C++ file.
#include "../../release/c/wuffs-unsupported-snapshot.c"

#if defined(__linux__)
#include <linux/prctl.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#define WUFFS_EXAMPLE_USE_SECCOMP
#endif

#define TRY(error_msg)         \
  do {                         \
    const char* z = error_msg; \
    if (z) {                   \
      return z;                \
    }                          \
  } while (false)

static const char* g_eod = "main: end of data";

static const char* g_usage =
    "Usage: jsonptr -flags input.json\n"
    "\n"
    "Flags:\n"
    "    -c      -compact-output\n"
    "    -d=NUM  -max-output-depth=NUM\n"
    "    -q=STR  -query=STR\n"
    "    -s=NUM  -spaces=NUM\n"
    "    -t      -tabs\n"
    "            -fail-if-unsandboxed\n"
    "            -input-allow-comments\n"
    "            -input-allow-extra-comma\n"
    "            -input-allow-inf-nan-numbers\n"
    "            -input-jwcc\n"
    "            -jwcc\n"
    "            -output-comments\n"
    "            -output-extra-comma\n"
    "            -output-inf-nan-numbers\n"
    "            -strict-json-pointer-syntax\n"
    "\n"
    "The input.json filename is optional. If absent, it reads from stdin.\n"
    "\n"
    "----\n"
    "\n"
    "jsonptr is a JSON formatter (pretty-printer) that supports the JSON\n"
    "Pointer (RFC 6901) query syntax. It reads UTF-8 JSON from stdin and\n"
    "writes canonicalized, formatted UTF-8 JSON to stdout.\n"
    "\n"
    "Canonicalized means that e.g. \"abc\\u000A\\tx\\u0177z\" is re-written\n"
    "as \"abc\\n\\txŷz\". It does not sort object keys, nor does it reject\n"
    "duplicate keys. Canonicalization does not imply Unicode normalization.\n"
    "\n"
    "Formatted means that arrays' and objects' elements are indented, each\n"
    "on its own line. Configure this with the -c / -compact-output, -s=NUM /\n"
    "-spaces=NUM (for NUM ranging from 0 to 8) and -t / -tabs flags.\n"
    "\n"
    "The -input-allow-comments flag allows \"/*slash-star*/\" and\n"
    "\"//slash-slash\" C-style comments within JSON input. Such comments are\n"
    "stripped from the output unless -output-comments was also set.\n"
    "\n"
    "The -input-allow-extra-comma flag allows input like \"[1,2,]\", with a\n"
    "comma after the final element of a JSON list or dictionary.\n"
    "\n"
    "The -input-allow-inf-nan-numbers flag allows non-finite floating point\n"
    "numbers (infinities and not-a-numbers) within JSON input. This flag\n"
    "requires that -output-inf-nan-numbers also be set.\n"
    "\n"
    "The -output-comments flag copies any input comments to the output. It\n"
    "has no effect unless -input-allow-comments was also set. Comments look\n"
    "better after commas than before them, but a closing \"]\" or \"}\" can\n"
    "occur after arbitrarily many comments, so -output-comments also requires\n"
    "that one or both of -compact-output and -output-extra-comma be set.\n"
    "\n"
    "With -output-comments, consecutive blank lines collapse to a single\n"
    "blank line. Without that flag, all blank lines are removed.\n"
    "\n"
    "The -output-extra-comma flag writes output like \"[1,2,]\", with a comma\n"
    "after the final element of a JSON list or dictionary. Such commas are\n"
    "non-compliant with the JSON specification but many parsers accept them\n"
    "and they can produce simpler line-based diffs. This flag is ignored when\n"
    "-compact-output is set.\n"
    "\n"
    "Combining some of those flags results in speaking JWCC (JSON With Commas\n"
    "and Comments), not plain JSON. For convenience, the -input-jwcc or -jwcc\n"
    "flag enables the first two or all four of:\n"
    "    -input-allow-comments\n"
    "    -input-allow-extra-comma\n"
    "    -output-comments\n"
    "    -output-extra-comma\n"
    "\n"
#if defined(WUFFS_EXAMPLE_SPEAK_JWCC_NOT_JSON)
    "This program was configured at compile time to always use -jwcc.\n"
    "\n"
#endif
    "----\n"
    "\n"
    "The -q=STR or -query=STR flag gives an optional JSON Pointer query, to\n"
    "print a subset of the input. For example, given RFC 6901 section 5's\n"
    "sample input (https://tools.ietf.org/rfc/rfc6901.txt), this command:\n"
    "    jsonptr -query=/foo/1 rfc-6901-json-pointer.json\n"
    "will print:\n"
    "    \"baz\"\n"
    "\n"
    "An absent query is equivalent to the empty query, which identifies the\n"
    "entire input (the root value). Unlike a file system, the \"/\" query\n"
    "does not identify the root. Instead, \"\" is the root and \"/\" is the\n"
    "child (the value in a key-value pair) of the root whose key is the empty\n"
    "string. Similarly, \"/xyz\" and \"/xyz/\" are two different nodes.\n"
    "\n"
    "If the query found a valid JSON value, this program will return a zero\n"
    "exit code even if the rest of the input isn't valid JSON. If the query\n"
    "did not find a value, or found an invalid one, this program returns a\n"
    "non-zero exit code, but may still print partial output to stdout.\n"
    "\n"
    "The JSON specification (https://json.org/) permits implementations that\n"
    "allow duplicate keys, as this one does. This JSON Pointer implementation\n"
    "is also greedy, following the first match for each fragment without\n"
    "back-tracking. For example, the \"/foo/bar\" query will fail if the root\n"
    "object has multiple \"foo\" children but the first one doesn't have a\n"
    "\"bar\" child, even if later ones do.\n"
    "\n"
    "The -strict-json-pointer-syntax flag restricts the -query=STR string to\n"
    "exactly RFC 6901, with only two escape sequences: \"~0\" and \"~1\" for\n"
    "\"~\" and \"/\". Without this flag, this program also lets \"~n\",\n"
    "\"~r\" and \"~t\" escape the New Line, Carriage Return and Horizontal\n"
    "Tab ASCII control characters, which can work better with line oriented\n"
    "(and tab separated) Unix tools that assume exactly one record (e.g. one\n"
    "JSON Pointer string) per line.\n"
    "\n"
    "----\n"
    "\n"
    "The -d=NUM or -max-output-depth=NUM flag gives the maximum (inclusive)\n"
    "output depth. JSON containers ([] arrays and {} objects) can hold other\n"
    "containers. When this flag is set, containers at depth NUM are replaced\n"
    "with \"[…]\" or \"{…}\". A bare -d or -max-output-depth is equivalent to\n"
    "-d=1. The flag's absence is equivalent to an unlimited output depth.\n"
    "\n"
    "The -max-output-depth flag only affects the program's output. It doesn't\n"
    "affect whether or not the input is considered valid JSON. The JSON\n"
    "specification permits implementations to set their own maximum input\n"
    "depth. This JSON implementation sets it to 1024.\n"
    "\n"
    "Depth is measured in terms of nested containers. It is unaffected by the\n"
    "number of spaces or tabs used to indent.\n"
    "\n"
    "When both -max-output-depth and -query are set, the output depth is\n"
    "measured from when the query resolves, not from the input root. The\n"
    "input depth (measured from the root) is still limited to 1024.\n"
    "\n"
    "----\n"
    "\n"
    "The -fail-if-unsandboxed flag causes the program to exit if it does not\n"
    "self-impose a sandbox. On Linux, it self-imposes a SECCOMP_MODE_STRICT\n"
    "sandbox, regardless of whether this flag was set.";

// ----

// ascii_escapes was created by script/print-json-ascii-escapes.go.
const uint8_t ascii_escapes[1024] = {
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x30, 0x30, 0x00,  // 0x00: "\\u0000"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x30, 0x31, 0x00,  // 0x01: "\\u0001"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x30, 0x32, 0x00,  // 0x02: "\\u0002"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x30, 0x33, 0x00,  // 0x03: "\\u0003"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x30, 0x34, 0x00,  // 0x04: "\\u0004"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x30, 0x35, 0x00,  // 0x05: "\\u0005"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x30, 0x36, 0x00,  // 0x06: "\\u0006"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x30, 0x37, 0x00,  // 0x07: "\\u0007"
    0x02, 0x5C, 0x62, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x08: "\\b"
    0x02, 0x5C, 0x74, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x09: "\\t"
    0x02, 0x5C, 0x6E, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x0A: "\\n"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x30, 0x42, 0x00,  // 0x0B: "\\u000B"
    0x02, 0x5C, 0x66, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x0C: "\\f"
    0x02, 0x5C, 0x72, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x0D: "\\r"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x30, 0x45, 0x00,  // 0x0E: "\\u000E"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x30, 0x46, 0x00,  // 0x0F: "\\u000F"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x30, 0x00,  // 0x10: "\\u0010"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x31, 0x00,  // 0x11: "\\u0011"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x32, 0x00,  // 0x12: "\\u0012"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x33, 0x00,  // 0x13: "\\u0013"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x34, 0x00,  // 0x14: "\\u0014"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x35, 0x00,  // 0x15: "\\u0015"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x36, 0x00,  // 0x16: "\\u0016"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x37, 0x00,  // 0x17: "\\u0017"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x38, 0x00,  // 0x18: "\\u0018"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x39, 0x00,  // 0x19: "\\u0019"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x41, 0x00,  // 0x1A: "\\u001A"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x42, 0x00,  // 0x1B: "\\u001B"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x43, 0x00,  // 0x1C: "\\u001C"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x44, 0x00,  // 0x1D: "\\u001D"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x45, 0x00,  // 0x1E: "\\u001E"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x31, 0x46, 0x00,  // 0x1F: "\\u001F"
    0x06, 0x5C, 0x75, 0x30, 0x30, 0x32, 0x30, 0x00,  // 0x20: "\\u0020"
    0x01, 0x21, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x21: "!"
    0x02, 0x5C, 0x22, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x22: "\\\""
    0x01, 0x23, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x23: "#"
    0x01, 0x24, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x24: "$"
    0x01, 0x25, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x25: "%"
    0x01, 0x26, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x26: "&"
    0x01, 0x27, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x27: "'"
    0x01, 0x28, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x28: "("
    0x01, 0x29, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x29: ")"
    0x01, 0x2A, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x2A: "*"
    0x01, 0x2B, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x2B: "+"
    0x01, 0x2C, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x2C: ","
    0x01, 0x2D, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x2D: "-"
    0x01, 0x2E, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x2E: "."
    0x01, 0x2F, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x2F: "/"
    0x01, 0x30, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x30: "0"
    0x01, 0x31, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x31: "1"
    0x01, 0x32, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x32: "2"
    0x01, 0x33, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x33: "3"
    0x01, 0x34, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x34: "4"
    0x01, 0x35, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x35: "5"
    0x01, 0x36, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x36: "6"
    0x01, 0x37, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x37: "7"
    0x01, 0x38, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x38: "8"
    0x01, 0x39, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x39: "9"
    0x01, 0x3A, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x3A: ":"
    0x01, 0x3B, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x3B: ";"
    0x01, 0x3C, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x3C: "<"
    0x01, 0x3D, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x3D: "="
    0x01, 0x3E, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x3E: ">"
    0x01, 0x3F, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x3F: "?"
    0x01, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x40: "@"
    0x01, 0x41, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x41: "A"
    0x01, 0x42, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x42: "B"
    0x01, 0x43, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x43: "C"
    0x01, 0x44, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x44: "D"
    0x01, 0x45, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x45: "E"
    0x01, 0x46, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x46: "F"
    0x01, 0x47, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x47: "G"
    0x01, 0x48, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x48: "H"
    0x01, 0x49, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x49: "I"
    0x01, 0x4A, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x4A: "J"
    0x01, 0x4B, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x4B: "K"
    0x01, 0x4C, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x4C: "L"
    0x01, 0x4D, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x4D: "M"
    0x01, 0x4E, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x4E: "N"
    0x01, 0x4F, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x4F: "O"
    0x01, 0x50, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x50: "P"
    0x01, 0x51, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x51: "Q"
    0x01, 0x52, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x52: "R"
    0x01, 0x53, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x53: "S"
    0x01, 0x54, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x54: "T"
    0x01, 0x55, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x55: "U"
    0x01, 0x56, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x56: "V"
    0x01, 0x57, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x57: "W"
    0x01, 0x58, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x58: "X"
    0x01, 0x59, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x59: "Y"
    0x01, 0x5A, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x5A: "Z"
    0x01, 0x5B, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x5B: "["
    0x02, 0x5C, 0x5C, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x5C: "\\\\"
    0x01, 0x5D, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x5D: "]"
    0x01, 0x5E, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x5E: "^"
    0x01, 0x5F, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x5F: "_"
    0x01, 0x60, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x60: "`"
    0x01, 0x61, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x61: "a"
    0x01, 0x62, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x62: "b"
    0x01, 0x63, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x63: "c"
    0x01, 0x64, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x64: "d"
    0x01, 0x65, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x65: "e"
    0x01, 0x66, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x66: "f"
    0x01, 0x67, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x67: "g"
    0x01, 0x68, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x68: "h"
    0x01, 0x69, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x69: "i"
    0x01, 0x6A, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x6A: "j"
    0x01, 0x6B, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x6B: "k"
    0x01, 0x6C, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x6C: "l"
    0x01, 0x6D, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x6D: "m"
    0x01, 0x6E, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x6E: "n"
    0x01, 0x6F, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x6F: "o"
    0x01, 0x70, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x70: "p"
    0x01, 0x71, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x71: "q"
    0x01, 0x72, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x72: "r"
    0x01, 0x73, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x73: "s"
    0x01, 0x74, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x74: "t"
    0x01, 0x75, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x75: "u"
    0x01, 0x76, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x76: "v"
    0x01, 0x77, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x77: "w"
    0x01, 0x78, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x78: "x"
    0x01, 0x79, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x79: "y"
    0x01, 0x7A, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x7A: "z"
    0x01, 0x7B, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x7B: "{"
    0x01, 0x7C, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x7C: "|"
    0x01, 0x7D, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x7D: "}"
    0x01, 0x7E, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x7E: "~"
    0x01, 0x7F, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 0x7F: "<DEL>"
};

// Wuffs allows either statically or dynamically allocated work buffers. This
// program exercises static allocation.
#define WORK_BUFFER_ARRAY_SIZE \
  WUFFS_JSON__DECODER_WORKBUF_LEN_MAX_INCL_WORST_CASE
#if WORK_BUFFER_ARRAY_SIZE > 0
uint8_t g_work_buffer_array[WORK_BUFFER_ARRAY_SIZE];
#else
// Not all C/C++ compilers support 0-length arrays.
uint8_t g_work_buffer_array[1];
#endif

bool g_sandboxed = false;

int g_input_file_descriptor = 0;  // A 0 default means stdin.

#define TWO_NEW_LINES_THEN_256_SPACES                                \
  "\n\n"                                                             \
  "                                                                " \
  "                                                                " \
  "                                                                " \
  "                                                                "
#define TWO_NEW_LINES_THEN_256_TABS                                            \
  "\n\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t" \
  "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t" \
  "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t" \
  "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t" \
  "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t" \
  "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t" \
  "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t"

const char* g_two_new_lines_then_256_indent_bytes;
uint32_t g_bytes_per_indent_depth;

#ifndef DST_BUFFER_ARRAY_SIZE
#define DST_BUFFER_ARRAY_SIZE (32 * 1024)
#endif
#ifndef SRC_BUFFER_ARRAY_SIZE
#define SRC_BUFFER_ARRAY_SIZE (32 * 1024)
#endif
// 1 token is 8 bytes. 4Ki tokens is 32KiB.
#ifndef TOKEN_BUFFER_ARRAY_SIZE
#define TOKEN_BUFFER_ARRAY_SIZE (4 * 1024)
#endif

uint8_t g_dst_array[DST_BUFFER_ARRAY_SIZE];
uint8_t g_src_array[SRC_BUFFER_ARRAY_SIZE];
wuffs_base__token g_tok_array[TOKEN_BUFFER_ARRAY_SIZE];

wuffs_base__io_buffer g_dst;
wuffs_base__io_buffer g_src;
wuffs_base__token_buffer g_tok;

// g_cursor_index is the g_src.data.ptr index between the previous and current
// token. An invariant is that (g_cursor_index <= g_src.meta.ri).
size_t g_cursor_index;

uint32_t g_depth;

enum class context {
  none,
  in_list_after_bracket,
  in_list_after_value,
  in_dict_after_brace,
  in_dict_after_key,
  in_dict_after_value,
  end_of_data,
} g_ctx;

bool  //
in_dict_before_key() {
  return (g_ctx == context::in_dict_after_brace) ||
         (g_ctx == context::in_dict_after_value);
}

uint64_t g_num_input_blank_lines;

bool g_is_after_comment;

uint32_t g_suppress_write_dst;
bool g_wrote_to_dst;

wuffs_json__decoder g_dec;

// ----

// Query is a JSON Pointer query. After initializing with a NUL-terminated C
// string, its multiple fragments are consumed as the program walks the JSON
// data from stdin. For example, letting "$" denote a NUL, suppose that we
// started with a query string of "/apple/banana/12/durian" and are currently
// trying to match the second fragment, "banana", so that Query::m_depth is 2:
//
//  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
//  / a p p l e / b a n a n a / 1 2 / d u r i a n $
//  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
//                ^           ^
//                m_frag_i    m_frag_k
//
// The two pointers m_frag_i and m_frag_k (abbreviated as mfi and mfk) are the
// start (inclusive) and end (exclusive) of the query fragment. They satisfy
// (mfi <= mfk) and may be equal if the fragment is empty (note that "" is a
// valid JSON object key).
//
// The m_frag_j (mfj) pointer moves between these two, or is nullptr. An
// invariant is that (((mfi <= mfj) && (mfj <= mfk)) || (mfj == nullptr)).
//
// Wuffs' JSON tokenizer can portray a single JSON string as multiple Wuffs
// tokens, as backslash-escaped values within that JSON string may each get
// their own token.
//
// At the start of each object key (a JSON string), mfj is set to mfi.
//
// While mfj remains non-nullptr, each token's unescaped contents are then
// compared to that part of the fragment from mfj to mfk. If it is a prefix
// (including the case of an exact match), then mfj is advanced by the
// unescaped length. Otherwise, mfj is set to nullptr.
//
// Comparison accounts for JSON Pointer's escaping notation: "~0" and "~1" in
// the query (not the JSON value) are unescaped to "~" and "/" respectively.
// "~n", "~r" and "~t" are also unescaped to "\n", "\r" and "\t". The program
// is responsible for calling Query::validate (with a
// strict_json_pointer_syntax argument) before otherwise using this class.
//
// The mfj pointer therefore advances from mfi to mfk, or drops out, as we
// incrementally match the object key with the query fragment. For example, if
// we have already matched the "ban" of "banana", then we would accept any of
// an "ana" token, an "a" token or a "\u0061" token, amongst others. They would
555// advance mfj by 3, 1 or 1 bytes respectively.
Nigel Tao0cd2f982020-03-03 23:03:02 +1100556//
Nigel Taob48ee752020-03-13 09:27:33 +1100557// mfj
Nigel Tao0cd2f982020-03-03 23:03:02 +1100558// v
559// +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
560// / a p p l e / b a n a n a / 1 2 / d u r i a n $
561// +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
562// ^ ^
Nigel Taob48ee752020-03-13 09:27:33 +1100563// mfi mfk
Nigel Tao0cd2f982020-03-03 23:03:02 +1100564//
565// At the end of each object key (or equivalently, at the start of each object
Nigel Taob48ee752020-03-13 09:27:33 +1100566// value), if mfj is non-nullptr and equal to (but not less than) mfk then we
567// have a fragment match: the query fragment equals the object key. If there is
568// a next fragment (in this example, "12") we move the frag_etc pointers to its
569// start and end and increment Query::m_depth. Otherwise, we have matched the
570// complete query, and the upcoming JSON value is the result of that query.
Nigel Tao0cd2f982020-03-03 23:03:02 +1100571//
572// The discussion above centers on object keys. If the query fragment is
573// numeric then it can also match as an array index: the string fragment "12"
574// will match an array's 13th element (starting counting from zero). See RFC
575// 6901 for its precise definition of an "array index" number.
576//
Nigel Taob48ee752020-03-13 09:27:33 +1100577// Array index fragment match is represented by the Query::m_array_index field,
Nigel Tao0cd2f982020-03-03 23:03:02 +1100578// whose type (wuffs_base__result_u64) is a result type. An error result means
579// that the fragment is not an array index. A value result holds the number of
580// list elements remaining. When matching a query fragment in an array (instead
581// of in an object), each element ticks this number down towards zero. At zero,
582// the upcoming JSON value is the one that matches the query fragment.
class Query {
 private:
  uint8_t* m_frag_i;
  uint8_t* m_frag_j;
  uint8_t* m_frag_k;

  uint32_t m_depth;

  wuffs_base__result_u64 m_array_index;

 public:
  void reset(char* query_c_string) {
    m_frag_i = (uint8_t*)query_c_string;
    m_frag_j = (uint8_t*)query_c_string;
    m_frag_k = (uint8_t*)query_c_string;
    m_depth = 0;
    m_array_index.status.repr = "#main: not an array index query fragment";
    m_array_index.value = 0;
  }

  void restart_fragment(bool enable) { m_frag_j = enable ? m_frag_i : nullptr; }

  bool is_at(uint32_t depth) { return m_depth == depth; }

  // tick returns whether the fragment is a valid array index whose value is
  // zero. If valid but non-zero, it decrements it and returns false.
  bool tick() {
    if (m_array_index.status.is_ok()) {
      if (m_array_index.value == 0) {
        return true;
      }
      m_array_index.value--;
    }
    return false;
  }

  // next_fragment moves to the next fragment, returning whether it existed.
  bool next_fragment() {
    uint8_t* k = m_frag_k;
    uint32_t d = m_depth;

    this->reset(nullptr);

    if (!k || (*k != '/')) {
      return false;
    }
    k++;

    bool all_digits = true;
    uint8_t* i = k;
    while ((*k != '\x00') && (*k != '/')) {
      all_digits = all_digits && ('0' <= *k) && (*k <= '9');
      k++;
    }
    m_frag_i = i;
    m_frag_j = i;
    m_frag_k = k;
    m_depth = d + 1;
    if (all_digits) {
      // wuffs_base__parse_number_u64 rejects leading zeroes, e.g. "00", "07".
      m_array_index = wuffs_base__parse_number_u64(
          wuffs_base__make_slice_u8(i, k - i),
          WUFFS_BASE__PARSE_NUMBER_XXX__DEFAULT_OPTIONS);
    }
    return true;
  }

  bool matched_all() { return m_frag_k == nullptr; }

  bool matched_fragment() { return m_frag_j && (m_frag_j == m_frag_k); }

  void incremental_match_slice(uint8_t* ptr, size_t len) {
    if (!m_frag_j) {
      return;
    }
    uint8_t* j = m_frag_j;
    while (true) {
      if (len == 0) {
        m_frag_j = j;
        return;
      }

      if (*j == '\x00') {
        break;

      } else if (*j == '~') {
        j++;
        if (*j == '0') {
          if (*ptr != '~') {
            break;
          }
        } else if (*j == '1') {
          if (*ptr != '/') {
            break;
          }
        } else if (*j == 'n') {
          if (*ptr != '\n') {
            break;
          }
        } else if (*j == 'r') {
          if (*ptr != '\r') {
            break;
          }
        } else if (*j == 't') {
          if (*ptr != '\t') {
            break;
          }
        } else {
          break;
        }

      } else if (*j != *ptr) {
        break;
      }

      j++;
      ptr++;
      len--;
    }
    m_frag_j = nullptr;
  }

  void incremental_match_code_point(uint32_t code_point) {
    if (!m_frag_j) {
      return;
    }
    uint8_t u[WUFFS_BASE__UTF_8__BYTE_LENGTH__MAX_INCL];
    size_t n = wuffs_base__utf_8__encode(
        wuffs_base__make_slice_u8(&u[0],
                                  WUFFS_BASE__UTF_8__BYTE_LENGTH__MAX_INCL),
        code_point);
    if (n > 0) {
      this->incremental_match_slice(&u[0], n);
    }
  }

  // validate returns whether the (ptr, len) arguments form a valid JSON
  // Pointer. In particular, it must be valid UTF-8, and either be empty or
  // start with a '/'. Any '~' within must immediately be followed by either
  // '0' or '1'. If strict_json_pointer_syntax is false, a '~' may also be
  // followed by either 'n', 'r' or 't'.
  static bool validate(char* query_c_string,
                       size_t length,
                       bool strict_json_pointer_syntax) {
    if (length <= 0) {
      return true;
    }
    if (query_c_string[0] != '/') {
      return false;
    }
    wuffs_base__slice_u8 s =
        wuffs_base__make_slice_u8((uint8_t*)query_c_string, length);
    bool previous_was_tilde = false;
    while (s.len > 0) {
      wuffs_base__utf_8__next__output o = wuffs_base__utf_8__next(s.ptr, s.len);
      if (!o.is_valid()) {
        return false;
      }

      if (previous_was_tilde) {
        switch (o.code_point) {
          case '0':
          case '1':
            break;
          case 'n':
          case 'r':
          case 't':
            if (strict_json_pointer_syntax) {
              return false;
            }
            break;
          default:
            return false;
        }
      }
      previous_was_tilde = o.code_point == '~';

      s.ptr += o.byte_length;
      s.len -= o.byte_length;
    }
    return !previous_was_tilde;
  }
} g_query;
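
// Illustrative sketch only, not used by the program. It restates the
// incremental fragment matching described above Query in a standalone form,
// using byte offsets instead of the m_frag_i / m_frag_j / m_frag_k pointers.
// The query_sketch namespace and the match_chunk name are hypothetical, and
// only the RFC 6901 escapes ("~0" and "~1") are handled here.
#include <cstddef>

namespace query_sketch {

// match_chunk compares one unescaped key chunk against a NUL-terminated query
// fragment (one that contains no '/'), starting at byte offset j. It returns
// the advanced offset if the chunk is a prefix of the remaining fragment, or
// -1 on a mismatch (the analogue of setting m_frag_j to nullptr).
inline long  //
match_chunk(const char* frag, long j, const char* chunk, std::size_t len) {
  if (j < 0) {
    return -1;
  }
  for (; len > 0; len--, chunk++) {
    char want = frag[j];
    if (want == '\x00') {
      return -1;
    } else if (want == '~') {
      j++;
      if (frag[j] == '0') {
        want = '~';
      } else if (frag[j] == '1') {
        want = '/';
      } else {
        return -1;
      }
    }
    if (want != *chunk) {
      return -1;
    }
    j++;
  }
  return j;
}

}  // namespace query_sketch

// Under these assumptions, matching the key "banana" in the chunks "ban" and
// "ana" moves the offset from 0 to 3 to 6. The fragment matches when the final
// offset equals the fragment's length, mirroring (mfj == mfk).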

// ----

struct {
  int remaining_argc;
  char** remaining_argv;

  bool compact_output;
  bool fail_if_unsandboxed;
  bool input_allow_comments;
  bool input_allow_extra_comma;
  bool input_allow_inf_nan_numbers;
  bool output_comments;
  bool output_extra_comma;
  bool output_inf_nan_numbers;
  bool strict_json_pointer_syntax;
  bool tabs;

  uint32_t max_output_depth;
  uint32_t spaces;

  char* query_c_string;
} g_flags = {0};

const char*  //
parse_flags(int argc, char** argv) {
  g_flags.spaces = 4;
  g_flags.max_output_depth = 0xFFFFFFFF;

#if defined(WUFFS_EXAMPLE_SPEAK_JWCC_NOT_JSON)
  g_flags.input_allow_comments = true;
  g_flags.input_allow_extra_comma = true;
  g_flags.output_comments = true;
  g_flags.output_extra_comma = true;
#endif

  int c = (argc > 0) ? 1 : 0;  // Skip argv[0], the program name.
  for (; c < argc; c++) {
    char* arg = argv[c];
    if (*arg++ != '-') {
      break;
    }

    // A double-dash "--foo" is equivalent to a single-dash "-foo". As special
    // cases, a bare "-" is not a flag (some programs may interpret it as
    // stdin) and a bare "--" means to stop parsing flags.
    if (*arg == '\x00') {
      break;
    } else if (*arg == '-') {
      arg++;
      if (*arg == '\x00') {
        c++;
        break;
      }
    }

    if (!strcmp(arg, "c") || !strcmp(arg, "compact-output")) {
      g_flags.compact_output = true;
      continue;
    }
    if (!strcmp(arg, "d") || !strcmp(arg, "max-output-depth")) {
      g_flags.max_output_depth = 1;
      continue;
    } else if (!strncmp(arg, "d=", 2) ||
               !strncmp(arg, "max-output-depth=", 17)) {
      // The prefix lengths above (2 and 17) include the '=', so the loop below
      // always finds one before reaching the NUL terminator.
      while (*arg++ != '=') {
      }
      wuffs_base__result_u64 u = wuffs_base__parse_number_u64(
          wuffs_base__make_slice_u8((uint8_t*)arg, strlen(arg)),
          WUFFS_BASE__PARSE_NUMBER_XXX__DEFAULT_OPTIONS);
      if (u.status.is_ok() && (u.value <= 0xFFFFFFFF)) {
        g_flags.max_output_depth = (uint32_t)(u.value);
        continue;
      }
      return g_usage;
    }
    if (!strcmp(arg, "fail-if-unsandboxed")) {
      g_flags.fail_if_unsandboxed = true;
      continue;
    }
    if (!strcmp(arg, "input-allow-comments")) {
      g_flags.input_allow_comments = true;
      continue;
    }
    if (!strcmp(arg, "input-allow-extra-comma")) {
      g_flags.input_allow_extra_comma = true;
      continue;
    }
    if (!strcmp(arg, "input-allow-inf-nan-numbers")) {
      g_flags.input_allow_inf_nan_numbers = true;
      continue;
    }
    if (!strcmp(arg, "input-jwcc")) {
      g_flags.input_allow_comments = true;
      g_flags.input_allow_extra_comma = true;
      continue;
    }
    if (!strcmp(arg, "jwcc")) {
      g_flags.input_allow_comments = true;
      g_flags.input_allow_extra_comma = true;
      g_flags.output_comments = true;
      g_flags.output_extra_comma = true;
      continue;
    }
    if (!strcmp(arg, "output-comments")) {
      g_flags.output_comments = true;
      continue;
    }
    if (!strcmp(arg, "output-extra-comma")) {
      g_flags.output_extra_comma = true;
      continue;
    }
    if (!strcmp(arg, "output-inf-nan-numbers")) {
      g_flags.output_inf_nan_numbers = true;
      continue;
    }
    if (!strncmp(arg, "q=", 2) || !strncmp(arg, "query=", 6)) {
      while (*arg++ != '=') {
      }
      g_flags.query_c_string = arg;
      continue;
    }
    if (!strncmp(arg, "s=", 2) || !strncmp(arg, "spaces=", 7)) {
      while (*arg++ != '=') {
      }
      if (('0' <= arg[0]) && (arg[0] <= '8') && (arg[1] == '\x00')) {
        g_flags.spaces = arg[0] - '0';
        continue;
      }
      return g_usage;
    }
    if (!strcmp(arg, "strict-json-pointer-syntax")) {
      g_flags.strict_json_pointer_syntax = true;
      continue;
    }
    if (!strcmp(arg, "t") || !strcmp(arg, "tabs")) {
      g_flags.tabs = true;
      continue;
    }

    return g_usage;
  }

  if (g_flags.query_c_string &&
      !Query::validate(g_flags.query_c_string, strlen(g_flags.query_c_string),
                       g_flags.strict_json_pointer_syntax)) {
    return "main: bad JSON Pointer (RFC 6901) syntax for the -query=STR flag";
  }

  g_flags.remaining_argc = argc - c;
  g_flags.remaining_argv = argv + c;
  return nullptr;
}
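
// Illustrative sketch only, not used by the program. It isolates the dash
// handling at the top of parse_flags' loop: "--foo" and "-foo" name the same
// flag, while a bare "-" and a bare "--" are not flags. The flags_sketch
// namespace and both function names are hypothetical. Unlike parse_flags, this
// sketch does not distinguish a bare "--" (which also consumes the argument)
// from a bare "-" (which does not).
#include <cstring>

namespace flags_sketch {

// flag_name returns arg with its leading "-" or "--" removed, or nullptr if
// arg is not a flag.
inline const char*  //
flag_name(const char* arg) {
  if (*arg++ != '-') {
    return nullptr;
  }
  if (*arg == '\x00') {
    return nullptr;  // A bare "-".
  } else if (*arg == '-') {
    arg++;
    if (*arg == '\x00') {
      return nullptr;  // A bare "--".
    }
  }
  return arg;
}

// same_flag returns whether two command line arguments name the same flag.
inline bool  //
same_flag(const char* a, const char* b) {
  const char* x = flag_name(a);
  const char* y = flag_name(b);
  return x && y && !strcmp(x, y);
}

}  // namespace flags_sketch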

const char*  //
initialize_globals(int argc, char** argv) {
  g_dst = wuffs_base__make_io_buffer(
      wuffs_base__make_slice_u8(g_dst_array, DST_BUFFER_ARRAY_SIZE),
      wuffs_base__empty_io_buffer_meta());

  g_src = wuffs_base__make_io_buffer(
      wuffs_base__make_slice_u8(g_src_array, SRC_BUFFER_ARRAY_SIZE),
      wuffs_base__empty_io_buffer_meta());

  g_tok = wuffs_base__make_token_buffer(
      wuffs_base__make_slice_token(g_tok_array, TOKEN_BUFFER_ARRAY_SIZE),
      wuffs_base__empty_token_buffer_meta());

  g_cursor_index = 0;

  g_depth = 0;

  g_ctx = context::none;

  g_num_input_blank_lines = 0;

  g_is_after_comment = false;

  TRY(parse_flags(argc, argv));
  if (g_flags.fail_if_unsandboxed && !g_sandboxed) {
    return "main: unsandboxed";
  }
  if (g_flags.output_comments && !g_flags.compact_output &&
      !g_flags.output_extra_comma) {
    return "main: -output-comments requires one or both of -compact-output and "
           "-output-extra-comma";
  }
  if (g_flags.input_allow_inf_nan_numbers && !g_flags.output_inf_nan_numbers) {
    return "main: -input-allow-inf-nan-numbers requires "
           "-output-inf-nan-numbers";
  }
  const int stdin_fd = 0;
  if (g_flags.remaining_argc >
      ((g_input_file_descriptor != stdin_fd) ? 1 : 0)) {
    return g_usage;
  }

  g_two_new_lines_then_256_indent_bytes = g_flags.tabs
                                              ? TWO_NEW_LINES_THEN_256_TABS
                                              : TWO_NEW_LINES_THEN_256_SPACES;
  g_bytes_per_indent_depth = g_flags.tabs ? 1 : g_flags.spaces;

  g_query.reset(g_flags.query_c_string);

  // If the query is non-empty, suppress writing to stdout until we've
  // completed the query.
  g_suppress_write_dst = g_query.next_fragment() ? 1 : 0;
  g_wrote_to_dst = false;

  TRY(g_dec.initialize(sizeof__wuffs_json__decoder(), WUFFS_VERSION, 0)
          .message());

  if (g_flags.input_allow_comments) {
    g_dec.set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_COMMENT_BLOCK, true);
    g_dec.set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_COMMENT_LINE, true);
  }
  if (g_flags.input_allow_extra_comma) {
    g_dec.set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_EXTRA_COMMA, true);
  }
  if (g_flags.input_allow_inf_nan_numbers) {
    g_dec.set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_INF_NAN_NUMBERS, true);
  }

  // Consume any optional trailing whitespace and comments. This isn't part of
  // the JSON spec, but it works better with line oriented Unix tools (such as
  // "echo 123 | jsonptr" where it's "echo", not "echo -n") or hand-edited JSON
  // files which can accidentally contain trailing whitespace.
  g_dec.set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_TRAILING_FILLER, true);

  return nullptr;
}

// ----

// ignore_return_value suppresses errors from -Wall -Werror.
static void  //
ignore_return_value(int ignored) {}

const char*  //
read_src() {
  if (g_src.meta.closed) {
    return "main: internal error: read requested on a closed source";
  }
  g_src.compact();
  if (g_src.meta.wi >= g_src.data.len) {
    return "main: g_src buffer is full";
  }
  while (true) {
    ssize_t n = read(g_input_file_descriptor, g_src.writer_pointer(),
                     g_src.writer_length());
    if (n >= 0) {
      g_src.meta.wi += n;
      g_src.meta.closed = n == 0;
      break;
    } else if (errno != EINTR) {
      return strerror(errno);
    }
  }
  return nullptr;
}

const char*  //
flush_dst() {
  while (true) {
    size_t n = g_dst.reader_length();
    if (n == 0) {
      break;
    }
    const int stdout_fd = 1;
    ssize_t i = write(stdout_fd, g_dst.reader_pointer(), n);
    if (i >= 0) {
      g_dst.meta.ri += i;
    } else if (errno != EINTR) {
      return strerror(errno);
    }
  }
  g_dst.compact();
  return nullptr;
}

const char*  //
write_dst_slow(const void* s, size_t n) {
  const uint8_t* p = static_cast<const uint8_t*>(s);
  while (n > 0) {
    size_t i = g_dst.writer_length();
    if (i == 0) {
      const char* z = flush_dst();
      if (z) {
        return z;
      }
      i = g_dst.writer_length();
      if (i == 0) {
        return "main: g_dst buffer is full";
      }
    }

    if (i > n) {
      i = n;
    }
    memcpy(g_dst.data.ptr + g_dst.meta.wi, p, i);
    g_dst.meta.wi += i;
    p += i;
    n -= i;
    g_wrote_to_dst = true;
  }
  return nullptr;
}

inline const char*  //
write_dst(const void* s, size_t n) {
  if (g_suppress_write_dst > 0) {
    return nullptr;
  } else if (n <= (DST_BUFFER_ARRAY_SIZE - g_dst.meta.wi)) {
    memcpy(g_dst.data.ptr + g_dst.meta.wi, s, n);
    g_dst.meta.wi += n;
    g_wrote_to_dst = true;
    return nullptr;
  }
  return write_dst_slow(s, n);
}

#define TRY_INDENT                                                      \
  do {                                                                  \
    uint32_t adj = (g_num_input_blank_lines > 1) ? 1 : 0;               \
    g_num_input_blank_lines = 0;                                        \
    uint32_t indent = g_depth * g_bytes_per_indent_depth;               \
    TRY(write_dst(g_two_new_lines_then_256_indent_bytes + 1 - adj,      \
                  1 + adj + (indent & 0xFF)));                          \
    for (indent >>= 8; indent > 0; indent--) {                          \
      TRY(write_dst(g_two_new_lines_then_256_indent_bytes + 2, 0x100)); \
    }                                                                   \
  } while (false)
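
// Illustrative sketch only, not used by the program. It restates TRY_INDENT's
// arithmetic without the global state. This assumes (from its name) that the
// g_two_new_lines_then_256_indent_bytes table is two '\n' bytes followed by 256
// indent bytes. The first write then covers one or two new lines plus the low
// 8 bits of the indent, and every further write covers a full 256 indent
// bytes. The indent_sketch namespace, the indent_plan type and the function
// names are hypothetical.
#include <cstdint>

namespace indent_sketch {

struct indent_plan {
  uint32_t first_write_offset;  // Offset into the table: 0 or 1.
  uint32_t first_write_length;  // New line(s) plus (indent & 0xFF) bytes.
  uint32_t num_full_writes;     // Number of additional 256-byte writes.
};

// make_indent_plan computes the writes for the given nesting depth.
// keep_blank_line stands in for (g_num_input_blank_lines > 1).
inline indent_plan  //
make_indent_plan(uint32_t depth,
                 uint32_t bytes_per_indent_depth,
                 bool keep_blank_line) {
  uint32_t adj = keep_blank_line ? 1 : 0;
  uint32_t indent = depth * bytes_per_indent_depth;
  indent_plan p;
  p.first_write_offset = 1 - adj;
  p.first_write_length = 1 + adj + (indent & 0xFF);
  p.num_full_writes = indent >> 8;
  return p;
}

// total_bytes returns how many bytes the plan writes overall.
inline uint64_t  //
total_bytes(indent_plan p) {
  return ((uint64_t)p.first_write_length) +
         (((uint64_t)p.num_full_writes) << 8);
}

}  // namespace indent_sketch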
1098
Nigel Tao1b073492020-02-16 22:11:36 +11001099// ----
1100
Nigel Tao2914bae2020-02-26 09:40:30 +11001101const char* //
Nigel Tao7cb76542020-07-19 22:19:04 +10001102handle_unicode_code_point(uint32_t ucp) {
Nigel Tao63441812020-08-21 14:05:48 +10001103 if (ucp < 0x80) {
1104 return write_dst(&ascii_escapes[8 * ucp + 1], ascii_escapes[8 * ucp]);
Nigel Tao7cb76542020-07-19 22:19:04 +10001105 }
Nigel Tao7cb76542020-07-19 22:19:04 +10001106 uint8_t u[WUFFS_BASE__UTF_8__BYTE_LENGTH__MAX_INCL];
1107 size_t n = wuffs_base__utf_8__encode(
1108 wuffs_base__make_slice_u8(&u[0],
1109 WUFFS_BASE__UTF_8__BYTE_LENGTH__MAX_INCL),
1110 ucp);
1111 if (n == 0) {
1112 return "main: internal error: unexpected Unicode code point";
1113 }
Nigel Tao0291a472020-08-13 22:40:10 +10001114 return write_dst(&u[0], n);
Nigel Tao168f60a2020-07-14 13:19:33 +10001115}
1116
Nigel Taod191a3f2020-07-19 22:14:54 +10001117// ----
1118
Nigel Tao50db4a42020-08-20 11:31:28 +10001119inline const char* //
Nigel Tao2ef39992020-04-09 17:24:39 +10001120handle_token(wuffs_base__token t, bool start_of_token_chain) {
Nigel Tao2cf76db2020-02-27 22:42:01 +11001121 do {
Nigel Tao462f8662020-04-01 23:01:51 +11001122 int64_t vbc = t.value_base_category();
Nigel Tao2cf76db2020-02-27 22:42:01 +11001123 uint64_t vbd = t.value_base_detail();
Nigel Taoee6927f2020-07-27 12:08:33 +10001124 uint64_t token_length = t.length();
Nigel Tao991bd512020-08-19 09:38:16 +10001125 // The "- token_length" is because we incremented g_cursor_index before
1126 // calling handle_token.
Nigel Taoee6927f2020-07-27 12:08:33 +10001127 wuffs_base__slice_u8 tok = wuffs_base__make_slice_u8(
Nigel Tao991bd512020-08-19 09:38:16 +10001128 g_src.data.ptr + g_cursor_index - token_length, token_length);
Nigel Tao1b073492020-02-16 22:11:36 +11001129
1130 // Handle ']' or '}'.
Nigel Tao9f7a2502020-02-23 09:42:02 +11001131 if ((vbc == WUFFS_BASE__TOKEN__VBC__STRUCTURE) &&
Nigel Tao2cf76db2020-02-27 22:42:01 +11001132 (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__POP)) {
Nigel Taod60815c2020-03-26 14:32:35 +11001133 if (g_query.is_at(g_depth)) {
Nigel Tao0cd2f982020-03-03 23:03:02 +11001134 return "main: no match for query";
1135 }
Nigel Taod60815c2020-03-26 14:32:35 +11001136 if (g_depth <= 0) {
1137 return "main: internal error: inconsistent g_depth";
Nigel Tao1b073492020-02-16 22:11:36 +11001138 }
Nigel Taod60815c2020-03-26 14:32:35 +11001139 g_depth--;
Nigel Tao1b073492020-02-16 22:11:36 +11001140
Nigel Taod60815c2020-03-26 14:32:35 +11001141 if (g_query.matched_all() && (g_depth >= g_flags.max_output_depth)) {
1142 g_suppress_write_dst--;
Nigel Tao52c4d6a2020-03-08 21:12:38 +11001143 // '…' is U+2026 HORIZONTAL ELLIPSIS, which is 3 UTF-8 bytes.
Nigel Tao0291a472020-08-13 22:40:10 +10001144 TRY(write_dst((vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__FROM_LIST)
1145 ? "\"[…]\""
1146 : "\"{…}\"",
1147 7));
1148 } else {
Nigel Tao52c4d6a2020-03-08 21:12:38 +11001149 // Write preceding whitespace.
Nigel Taod60815c2020-03-26 14:32:35 +11001150 if ((g_ctx != context::in_list_after_bracket) &&
1151 (g_ctx != context::in_dict_after_brace) &&
1152 !g_flags.compact_output) {
Nigel Tao21042052020-08-19 23:13:54 +10001153 if (g_is_after_comment) {
Nigel Tao77f85522021-07-19 00:00:13 +10001154 TRY_INDENT;
Nigel Tao21042052020-08-19 23:13:54 +10001155 } else {
1156 if (g_flags.output_extra_comma) {
1157 TRY(write_dst(",", 1));
1158 }
Nigel Tao77f85522021-07-19 00:00:13 +10001159 TRY_INDENT;
Nigel Tao52c4d6a2020-03-08 21:12:38 +11001160 }
Nigel Tao773994c2021-02-22 10:50:08 +11001161 } else {
1162 g_num_input_blank_lines = 0;
Nigel Tao1b073492020-02-16 22:11:36 +11001163 }
Nigel Tao52c4d6a2020-03-08 21:12:38 +11001164
1165 TRY(write_dst(
1166 (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__FROM_LIST) ? "]" : "}",
1167 1));
Nigel Tao1b073492020-02-16 22:11:36 +11001168 }
1169
Nigel Taod60815c2020-03-26 14:32:35 +11001170 g_ctx = (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__TO_LIST)
1171 ? context::in_list_after_value
1172 : context::in_dict_after_key;
Nigel Tao1b073492020-02-16 22:11:36 +11001173 goto after_value;
1174 }
1175
Nigel Taod1c928a2020-02-28 12:43:53 +11001176 // Write preceding whitespace and punctuation, if it wasn't ']', '}' or a
Nigel Tao0291a472020-08-13 22:40:10 +10001177 // continuation of a multi-token chain.
    if (start_of_token_chain) {
      if (g_is_after_comment) {
        TRY_INDENT;
      } else if (g_ctx == context::in_dict_after_key) {
        TRY(write_dst(": ", g_flags.compact_output ? 1 : 2));
      } else if (g_ctx != context::none) {
        if ((g_ctx != context::in_list_after_bracket) &&
            (g_ctx != context::in_dict_after_brace)) {
          TRY(write_dst(",", 1));
        }
        if (!g_flags.compact_output) {
          TRY_INDENT;
        }
      }

      bool query_matched_fragment = false;
      if (g_query.is_at(g_depth)) {
        switch (g_ctx) {
          case context::in_list_after_bracket:
          case context::in_list_after_value:
            query_matched_fragment = g_query.tick();
            break;
          case context::in_dict_after_key:
            query_matched_fragment = g_query.matched_fragment();
            break;
          default:
            break;
        }
      }
      if (!query_matched_fragment) {
        // No-op.
      } else if (!g_query.next_fragment()) {
        // There is no next fragment. We have matched the complete query, and
        // the upcoming JSON value is the result of that query.
        //
        // Un-suppress writing to stdout and reset the g_ctx and g_depth as if
        // we were about to decode a top-level value. This makes any subsequent
        // indentation be relative to this point, and we will return g_eod
        // after the upcoming JSON value is complete.
        if (g_suppress_write_dst != 1) {
          return "main: internal error: inconsistent g_suppress_write_dst";
        }
        g_suppress_write_dst = 0;
        g_ctx = context::none;
        g_depth = 0;
      } else if ((vbc != WUFFS_BASE__TOKEN__VBC__STRUCTURE) ||
                 !(vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__PUSH)) {
        // The query has moved on to the next fragment but the upcoming JSON
        // value is not a container.
        return "main: no match for query";
      }
    }

    // Handle the token itself: either a container ('[' or '{') or a simple
    // value: string (a chain of raw or escaped parts), literal or number.
    switch (vbc) {
      case WUFFS_BASE__TOKEN__VBC__STRUCTURE:
        if (g_query.matched_all() && (g_depth >= g_flags.max_output_depth)) {
          g_suppress_write_dst++;
        } else {
          TRY(write_dst(
              (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__TO_LIST) ? "[" : "{",
              1));
        }
        g_depth++;
        g_ctx = (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__TO_LIST)
                    ? context::in_list_after_bracket
                    : context::in_dict_after_brace;
        g_num_input_blank_lines = 0;
        return nullptr;

      case WUFFS_BASE__TOKEN__VBC__STRING:
        if (start_of_token_chain) {
          TRY(write_dst("\"", 1));
          g_query.restart_fragment(in_dict_before_key() &&
                                   g_query.is_at(g_depth));
        }

        if (vbd & WUFFS_BASE__TOKEN__VBD__STRING__CONVERT_1_DST_1_SRC_COPY) {
          TRY(write_dst(tok.ptr, tok.len));
          g_query.incremental_match_slice(tok.ptr, tok.len);
        }

        if (t.continued()) {
          return nullptr;
        }
        TRY(write_dst("\"", 1));
        goto after_value;

      case WUFFS_BASE__TOKEN__VBC__UNICODE_CODE_POINT:
        if (!t.continued()) {
          return "main: internal error: unexpected non-continued UCP token";
        }
        TRY(handle_unicode_code_point(vbd));
        g_query.incremental_match_code_point(vbd);
        return nullptr;
    }

    // We have a literal or a number.
    TRY(write_dst(tok.ptr, tok.len));
    goto after_value;
  } while (0);

  // Book-keeping after completing a value (whether a container value or a
  // simple value). Empty parent containers are no longer empty. If the parent
  // container is a "{...}" object, toggle between keys and values.
after_value:
  if (g_depth == 0) {
    return g_eod;
  }
  switch (g_ctx) {
    case context::in_list_after_bracket:
      g_ctx = context::in_list_after_value;
      break;
    case context::in_dict_after_brace:
      g_ctx = context::in_dict_after_key;
      break;
    case context::in_dict_after_key:
      g_ctx = context::in_dict_after_value;
      break;
    case context::in_dict_after_value:
      g_ctx = context::in_dict_after_key;
      break;
    default:
      break;
  }
  return nullptr;
}

const char*  //
main1(int argc, char** argv) {
  TRY(initialize_globals(argc, argv));

  bool start_of_token_chain = true;
  while (true) {
    wuffs_base__status status = g_dec.decode_tokens(
        &g_tok, &g_src,
        wuffs_base__make_slice_u8(g_work_buffer_array, WORK_BUFFER_ARRAY_SIZE));

    while (g_tok.meta.ri < g_tok.meta.wi) {
      wuffs_base__token t = g_tok.data.ptr[g_tok.meta.ri++];
      uint64_t token_length = t.length();
      if ((g_src.meta.ri - g_cursor_index) < token_length) {
        return "main: internal error: inconsistent g_src indexes";
      }
      g_cursor_index += token_length;

      // Handle filler tokens (e.g. whitespace, punctuation and comments).
      // These are skipped, unless -output-comments is enabled.
      if (t.value_base_category() == WUFFS_BASE__TOKEN__VBC__FILLER) {
        if (!g_flags.output_comments) {
          // No-op.
        } else if (t.value_base_detail() &
                   WUFFS_BASE__TOKEN__VBD__FILLER__COMMENT_ANY) {
          if (g_flags.compact_output) {
            TRY(write_dst(g_src.data.ptr + g_cursor_index - token_length,
                          token_length));
            if (!t.continued() &&
                (t.value_base_detail() &
                 WUFFS_BASE__TOKEN__VBD__FILLER__COMMENT_LINE)) {
              TRY(write_dst("\n", 1));
            }

          } else {
            if (start_of_token_chain) {
              if (g_is_after_comment) {
                TRY_INDENT;
              } else if (g_ctx != context::none) {
                if (g_ctx == context::in_dict_after_key) {
                  TRY(write_dst(":", 1));
                } else if ((g_ctx != context::in_list_after_bracket) &&
                           (g_ctx != context::in_dict_after_brace) &&
                           (g_ctx != context::end_of_data)) {
                  TRY(write_dst(",", 1));
                }
                TRY_INDENT;
              }
            }
            TRY(write_dst(g_src.data.ptr + g_cursor_index - token_length,
                          token_length));
            g_is_after_comment = true;
          }
          if (g_ctx == context::in_list_after_bracket) {
            g_ctx = context::in_list_after_value;
          } else if (g_ctx == context::in_dict_after_brace) {
            g_ctx = context::in_dict_after_value;
          }
          g_num_input_blank_lines = 0;

        } else {
          uint8_t* p = g_src.data.ptr + g_cursor_index - token_length;
          uint8_t* q = g_src.data.ptr + g_cursor_index;
          for (; p < q; p++) {
            if (*p == '\n') {
              g_num_input_blank_lines++;
            }
          }
        }

        start_of_token_chain = !t.continued();
        continue;
      }

      const char* z = handle_token(t, start_of_token_chain);
      g_is_after_comment = false;
      start_of_token_chain = !t.continued();
      if (z == nullptr) {
        continue;
      } else if (z != g_eod) {
        return z;
      } else if (g_flags.query_c_string && *g_flags.query_c_string) {
        // With a non-empty g_query, don't try to consume trailing filler or
        // confirm that we've processed all the tokens.
        return nullptr;
      }
      g_ctx = context::end_of_data;
    }

    if (status.repr == nullptr) {
      if (g_ctx != context::end_of_data) {
        return "main: internal error: unexpected end of token stream";
      }
      // Check that we've exhausted the input.
      if ((g_src.meta.ri == g_src.meta.wi) && !g_src.meta.closed) {
        TRY(read_src());
      }
      if ((g_src.meta.ri < g_src.meta.wi) || !g_src.meta.closed) {
        return "main: valid JSON followed by further (unexpected) data";
      }
      // All done.
      return nullptr;
    } else if (status.repr == wuffs_base__suspension__short_read) {
      if (g_cursor_index != g_src.meta.ri) {
        return "main: internal error: inconsistent g_src indexes";
      }
      TRY(read_src());
      g_cursor_index = g_src.meta.ri;
    } else if (status.repr == wuffs_base__suspension__short_write) {
      g_tok.compact();
    } else {
      return status.message();
    }
  }
}

int  //
compute_exit_code(const char* status_msg) {
  if (!status_msg) {
    return 0;
  }
  size_t n;
  if (status_msg == g_usage) {
    n = strlen(status_msg);
  } else {
    n = strnlen(status_msg, 2047);
    if (n >= 2047) {
      status_msg = "main: internal error: error message is too long";
      n = strnlen(status_msg, 2047);
    }
  }
  const int stderr_fd = 2;
  ignore_return_value(write(stderr_fd, status_msg, n));
  ignore_return_value(write(stderr_fd, "\n", 1));
  // Return an exit code of 1 for regular (foreseen) errors, e.g. badly
  // formatted or unsupported input.
  //
  // Return an exit code of 2 for internal (exceptional) errors, e.g. defensive
  // run-time checks found that an internal invariant did not hold.
  //
  // Automated testing, including badly formatted inputs, can therefore
  // discriminate between expected failure (exit code 1) and unexpected failure
  // (other non-zero exit codes). Specifically, exit code 2 for internal
  // invariant violation, exit code 139 (which is 128 + SIGSEGV on x86_64
  // linux) for a segmentation fault (e.g. null pointer dereference).
  return strstr(status_msg, "internal error:") ? 2 : 1;
}

int  //
main(int argc, char** argv) {
  // Look for an input filename (the first non-flag argument) in argv. If there
  // is one, open it (but do not read from it) before we self-impose a sandbox.
  //
  // Flags start with "-", unless it comes after a bare "--" arg.
  {
    bool dash_dash = false;
    int a;
    for (a = 1; a < argc; a++) {
      char* arg = argv[a];
      if ((arg[0] == '-') && !dash_dash) {
        dash_dash = (arg[1] == '-') && (arg[2] == '\x00');
        continue;
      }
      g_input_file_descriptor = open(arg, O_RDONLY);
      if (g_input_file_descriptor < 0) {
        fprintf(stderr, "%s: %s\n", arg, strerror(errno));
        return 1;
      }
      break;
    }
  }

#if defined(WUFFS_EXAMPLE_USE_SECCOMP)
  prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT);
  g_sandboxed = true;
#endif

  const char* z = main1(argc, argv);
  if (g_wrote_to_dst) {
    const char* z1 = g_is_after_comment ? nullptr : write_dst("\n", 1);
    const char* z2 = flush_dst();
    z = z ? z : (z1 ? z1 : z2);
  }
  int exit_code = compute_exit_code(z);

#if defined(WUFFS_EXAMPLE_USE_SECCOMP)
  // Call SYS_exit explicitly, instead of calling SYS_exit_group implicitly by
  // either calling _exit or returning from main. SECCOMP_MODE_STRICT allows
  // only SYS_exit.
  syscall(SYS_exit, exit_code);
#endif
  return exit_code;
}