// Copyright 2020 The Wuffs Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//    https://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

// ----------------

/*
jsonptr is a JSON formatter (pretty-printer) that supports the JSON Pointer
(RFC 6901) query syntax. It reads CBOR or UTF-8 JSON from stdin and writes CBOR
or canonicalized, formatted UTF-8 JSON to stdout.

See the "const char* g_usage" string below for details.

----

JSON Pointer (and this program's implementation) is one of many JSON query
languages and JSON tools, such as jq, jql and JMESPath. This one is relatively
simple and fewer-featured compared to those others.

One benefit of simplicity is that this program's CBOR, JSON and JSON Pointer
implementations do not dynamically allocate or free memory (yet it does not
require that the entire input fits in memory at once). They are therefore
trivially protected against certain bug classes: memory leaks, double-frees and
use-after-frees.

The CBOR and JSON implementations are also written in the Wuffs programming
language (and then transpiled to C/C++), which is memory-safe (e.g. array
indexing is bounds-checked) but also prevents integer arithmetic overflows.

For defense in depth, on Linux, this program also self-imposes a
SECCOMP_MODE_STRICT sandbox before reading (or otherwise processing) its input
or writing its output. Under this sandbox, the only permitted system calls are
read, write, exit and sigreturn.

All together, this program aims to safely handle untrusted CBOR or JSON files
without fear of security bugs such as remote code execution.

----

As of 2020-02-24, this program passes all 318 "test_parsing" cases from the
JSON test suite (https://github.com/nst/JSONTestSuite), an appendix to the
"Parsing JSON is a Minefield" article (http://seriot.ch/parsing_json.php) that
was first published on 2016-10-26 and updated on 2018-03-30.

After modifying this program, run "build-example.sh example/jsonptr/" and then
"script/run-json-test-suite.sh" to catch correctness regressions.

----

This program uses Wuffs' JSON decoder at a relatively low level, processing the
decoder's token-stream output individually. The core loop, in pseudo-code, is
"for_each_token { handle_token(etc); }", where the handle_token function
changes global state (e.g. the `g_depth` and `g_ctx` variables) and prints
output text based on that state and the token's source text. Notably,
handle_token is not recursive, even though JSON values can nest.

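As a rough illustration of that non-recursive core loop, here is a minimal
sketch. It does not use Wuffs: the Token, TokenKind and State types and the
format_tokens function are hypothetical stand-ins for the decoder's token
stream and for the g_depth and g_ctx globals, and only arrays and scalars are
handled.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified token kinds. Wuffs' real tokens carry more detail.
enum class TokenKind { open_list, close_list, scalar };

struct Token {
  TokenKind kind;
  std::string text;  // Source text. Only used for scalar tokens here.
};

// Hypothetical formatter state, playing the role of g_depth and g_ctx.
struct State {
  uint32_t depth = 0;
  bool after_value = false;  // Whether the next value needs a leading comma.
  std::string out;
};

// handle_token updates the state and appends compact output. It never calls
// itself: nesting is tracked by the depth counter, not by the call stack.
void handle_token(State& s, const Token& t) {
  if (t.kind == TokenKind::close_list) {
    assert(s.depth > 0);  // A well-formed stream never closes at depth 0.
    s.depth--;
    s.out += "]";
    s.after_value = true;
    return;
  }
  if (s.after_value) {
    s.out += ",";
  }
  if (t.kind == TokenKind::open_list) {
    s.depth++;
    s.out += "[";
    s.after_value = false;
  } else {
    s.out += t.text;
    s.after_value = true;
  }
}

// format_tokens is the "for_each_token { handle_token(etc); }" loop.
std::string format_tokens(const std::vector<Token>& tokens) {
  State s;
  for (const Token& t : tokens) {
    handle_token(s, t);
  }
  return s.out;
}
```

Given the tokens for [1, [2, 3], 4], format_tokens returns the compact text
[1,[2,3],4]. The nested array is handled by the depth counter rather than by a
recursive call.
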
This approach is centered around JSON tokens. Each JSON 'thing' (e.g. number,
string, object) comprises one or more JSON tokens.

An alternative, higher-level approach is in the sibling example/jsonfindptrs
program. Neither approach is better or worse per se, but when studying this
program, be aware that there are multiple ways to use Wuffs' JSON decoder.

The two programs, jsonfindptrs and jsonptr, also demonstrate different
trade-offs with regard to JSON object duplicate keys. The JSON spec permits
different implementations to allow or reject duplicate keys. It is not always
clear which approach is safer. Rejecting them is certainly unambiguous, and
security bugs can lurk in ambiguous corners of a file format, if two different
implementations both silently accept a file but differ on how to interpret it.
On the other hand, in the worst case, detecting duplicate keys requires O(N)
memory, where N is the size of the (potentially untrusted) input.

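To make that O(N) memory cost concrete, here is a minimal sketch of duplicate
key detection for a single JSON object. It is not part of this program (which
allows duplicate keys): has_duplicate_key is a hypothetical helper that
assumes the object's keys have already been decoded into strings.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// has_duplicate_key reports whether any key in one object's key list repeats.
// The set of keys seen so far can grow as large as the input itself, which is
// where the worst-case O(N) memory comes from.
bool has_duplicate_key(const std::vector<std::string>& keys) {
  std::unordered_set<std::string> seen;
  for (const std::string& k : keys) {
    if (!seen.insert(k).second) {
      return true;  // The insert failed, so k was already present.
    }
  }
  return false;
}

// usage_example shows the expected results for two small key lists.
void usage_example() {
  assert(!has_duplicate_key({"foo", "bar", "baz"}));
  assert(has_duplicate_key({"foo", "bar", "foo"}));
}
```

A streaming formatter that skips this check needs no such set, which is the
trade-off described above.
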
This program (jsonptr) allows duplicate keys and requires only O(1) memory. As
mentioned above, it doesn't dynamically allocate memory at all, and on Linux,
it runs in a SECCOMP_MODE_STRICT sandbox.

----

This example program differs from most other example Wuffs programs in that it
is written in C++, not C.

  $CXX jsonptr.cc && ./a.out < ../../test/data/github-tags.json; rm -f a.out

for a C++ compiler $CXX, such as clang++ or g++.
*/

#if defined(__cplusplus) && (__cplusplus < 201103L)
#error "This C++ program requires -std=c++11 or later"
#endif

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

// Wuffs ships as a "single file C library" or "header file library" as per
// https://github.com/nothings/stb/blob/master/docs/stb_howto.txt
//
// To use that single file as a "foo.c"-like implementation, instead of a
// "foo.h"-like header, #define WUFFS_IMPLEMENTATION before #include'ing or
// compiling it.
#define WUFFS_IMPLEMENTATION

// Defining the WUFFS_CONFIG__MODULE* macros is optional, but it lets users of
// release/c/etc.c whitelist which parts of Wuffs to build. That file contains
// the entire Wuffs standard library, implementing a variety of codecs and file
// formats. Without this macro definition, an optimizing compiler or linker may
// very well discard Wuffs code for unused codecs, but listing the Wuffs
// modules we use makes that process explicit. Preprocessing means that such
// code simply isn't compiled.
#define WUFFS_CONFIG__MODULES
#define WUFFS_CONFIG__MODULE__BASE
#define WUFFS_CONFIG__MODULE__CBOR
#define WUFFS_CONFIG__MODULE__JSON

// If building this program in an environment that doesn't easily accommodate
// relative includes, you can use the script/inline-c-relative-includes.go
// program to generate a stand-alone C++ file.
#include "../../release/c/wuffs-unsupported-snapshot.c"

#if defined(__linux__)
#include <linux/prctl.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#define WUFFS_EXAMPLE_USE_SECCOMP
#endif

#define TRY(error_msg)         \
  do {                         \
    const char* z = error_msg; \
    if (z) {                   \
      return z;                \
    }                          \
  } while (false)

static const char* g_eod = "main: end of data";

static const char* g_usage =
    "Usage: jsonptr -flags input.json\n"
    "\n"
    "Flags:\n"
    "    -c      -compact-output\n"
    "    -d=NUM  -max-output-depth=NUM\n"
    "    -i=FMT  -input-format={json,cbor}\n"
    "    -o=FMT  -output-format={json,cbor}\n"
    "    -q=STR  -query=STR\n"
    "    -s=NUM  -spaces=NUM\n"
    "    -t      -tabs\n"
    "            -fail-if-unsandboxed\n"
    "            -input-allow-json-comments\n"
    "            -input-allow-json-extra-comma\n"
    "            -input-allow-json-inf-nan-numbers\n"
    "            -output-cbor-metadata-as-json-comments\n"
    "            -output-json-extra-comma\n"
    "            -strict-json-pointer-syntax\n"
    "\n"
    "The input.json filename is optional. If absent, it reads from stdin.\n"
    "\n"
    "----\n"
    "\n"
    "jsonptr is a JSON formatter (pretty-printer) that supports the JSON\n"
    "Pointer (RFC 6901) query syntax. It reads CBOR or UTF-8 JSON from stdin\n"
    "and writes CBOR or canonicalized, formatted UTF-8 JSON to stdout. The\n"
    "input and output formats do not have to match, but conversion between\n"
    "formats may be lossy.\n"
    "\n"
    "Canonicalized means that e.g. \"abc\\u000A\\tx\\u0177z\" is re-written\n"
    "as \"abc\\n\\txŷz\". It does not sort object keys, nor does it reject\n"
    "duplicate keys. Canonicalization does not imply Unicode normalization.\n"
    "\n"
    "Formatted means that arrays' and objects' elements are indented, each\n"
    "on its own line. Configure this with the -c / -compact-output, -s=NUM /\n"
    "-spaces=NUM (for NUM ranging from 0 to 8) and -t / -tabs flags. Those\n"
    "flags only apply to JSON (not CBOR) output.\n"
    "\n"
    "The -input-format and -output-format flags select between reading and\n"
    "writing JSON (the default, a textual format) or CBOR (a binary format).\n"
    "\n"
    "The -input-allow-json-comments flag allows \"/*slash-star*/\" and\n"
    "\"//slash-slash\" C-style comments within JSON input.\n"
    "\n"
    "The -input-allow-json-extra-comma flag allows input like \"[1,2,]\",\n"
    "with a comma after the final element of a JSON list or dictionary.\n"
    "\n"
    "The -input-allow-json-inf-nan-numbers flag allows non-finite floating\n"
    "point numbers (infinities and not-a-numbers) within JSON input.\n"
    "\n"
    "The -output-cbor-metadata-as-json-comments flag writes CBOR tags and\n"
    "other metadata as /*comments*/, when -i=cbor and -o=json are also set.\n"
    "Such comments are non-compliant with the JSON specification but many\n"
    "parsers accept them.\n"
    "\n"
    "The -output-json-extra-comma flag writes extra commas, regardless of\n"
    "whether the input had it. Extra commas are non-compliant with the JSON\n"
    "specification but many parsers accept them and they can produce simpler\n"
    "line-based diffs. This flag is ignored when -compact-output is set.\n"
    "\n"
    "----\n"
    "\n"
    "The -q=STR or -query=STR flag gives an optional JSON Pointer query, to\n"
    "print a subset of the input. For example, given RFC 6901 section 5's\n"
    "sample input (https://tools.ietf.org/rfc/rfc6901.txt), this command:\n"
    "    jsonptr -query=/foo/1 rfc-6901-json-pointer.json\n"
    "will print:\n"
    "    \"baz\"\n"
    "\n"
    "An absent query is equivalent to the empty query, which identifies the\n"
    "entire input (the root value). Unlike a file system, the \"/\" query\n"
    "does not identify the root. Instead, \"\" is the root and \"/\" is the\n"
    "child (the value in a key-value pair) of the root whose key is the empty\n"
    "string. Similarly, \"/xyz\" and \"/xyz/\" are two different nodes.\n"
    "\n"
    "If the query found a valid JSON|CBOR value, this program will return a\n"
    "zero exit code even if the rest of the input isn't valid. If the query\n"
    "did not find a value, or found an invalid one, this program returns a\n"
    "non-zero exit code, but may still print partial output to stdout.\n"
    "\n"
    "The JSON and CBOR specifications (https://json.org/ or RFC 8259; RFC\n"
    "7049) permit implementations to allow duplicate keys, as this one does.\n"
    "This JSON Pointer implementation is also greedy, following the first\n"
    "match for each fragment without back-tracking. For example, the\n"
    "\"/foo/bar\" query will fail if the root object has multiple \"foo\"\n"
    "children but the first one doesn't have a \"bar\" child, even if later\n"
    "ones do.\n"
    "\n"
    "The -strict-json-pointer-syntax flag restricts the -query=STR string to\n"
    "exactly RFC 6901, with only two escape sequences: \"~0\" and \"~1\" for\n"
    "\"~\" and \"/\". Without this flag, this program also lets \"~n\" and\n"
    "\"~r\" escape the New Line and Carriage Return ASCII control characters,\n"
    "which can work better with line oriented Unix tools that assume exactly\n"
    "one value (i.e. one JSON Pointer string) per line.\n"
    "\n"
    "----\n"
    "\n"
    "The -d=NUM or -max-output-depth=NUM flag gives the maximum (inclusive)\n"
    "output depth. JSON|CBOR containers ([] arrays and {} objects) can hold\n"
    "other containers. When this flag is set, containers at depth NUM are\n"
    "replaced with \"[…]\" or \"{…}\". A bare -d or -max-output-depth is\n"
    "equivalent to -d=1. The flag's absence means an unlimited output depth.\n"
    "\n"
    "The -max-output-depth flag only affects the program's output. It doesn't\n"
    "affect whether or not the input is considered valid JSON|CBOR. The\n"
    "format specifications permit implementations to set their own maximum\n"
    "input depth. This JSON|CBOR implementation sets it to 1024.\n"
    "\n"
    "Depth is measured in terms of nested containers. It is unaffected by the\n"
    "number of spaces or tabs used to indent.\n"
    "\n"
    "When both -max-output-depth and -query are set, the output depth is\n"
    "measured from when the query resolves, not from the input root. The\n"
    "input depth (measured from the root) is still limited to 1024.\n"
    "\n"
    "----\n"
    "\n"
    "The -fail-if-unsandboxed flag causes the program to exit if it does not\n"
    "self-impose a sandbox. On Linux, it self-imposes a SECCOMP_MODE_STRICT\n"
    "sandbox, regardless of whether this flag was set.";

// ----

// Wuffs allows either statically or dynamically allocated work buffers. This
// program exercises static allocation.
#define WORK_BUFFER_ARRAY_SIZE \
  WUFFS_JSON__DECODER_WORKBUF_LEN_MAX_INCL_WORST_CASE
#if WORK_BUFFER_ARRAY_SIZE > 0
uint8_t g_work_buffer_array[WORK_BUFFER_ARRAY_SIZE];
#else
// Not all C/C++ compilers support 0-length arrays.
uint8_t g_work_buffer_array[1];
#endif

bool g_sandboxed = false;

int g_input_file_descriptor = 0;  // A 0 default means stdin.

#define MAX_INDENT 8
#define INDENT_SPACES_STRING "        "
#define INDENT_TAB_STRING "\t"

#ifndef DST_BUFFER_ARRAY_SIZE
#define DST_BUFFER_ARRAY_SIZE (32 * 1024)
#endif
#ifndef SRC_BUFFER_ARRAY_SIZE
#define SRC_BUFFER_ARRAY_SIZE (32 * 1024)
#endif
#ifndef TOKEN_BUFFER_ARRAY_SIZE
#define TOKEN_BUFFER_ARRAY_SIZE (4 * 1024)
#endif

uint8_t g_dst_array[DST_BUFFER_ARRAY_SIZE];
uint8_t g_src_array[SRC_BUFFER_ARRAY_SIZE];
wuffs_base__token g_tok_array[TOKEN_BUFFER_ARRAY_SIZE];

wuffs_base__io_buffer g_dst;
wuffs_base__io_buffer g_src;
wuffs_base__token_buffer g_tok;

// g_curr_token_end_src_index is the g_src.data.ptr index of the end of the
// current token. An invariant is that (g_curr_token_end_src_index <=
// g_src.meta.ri).
size_t g_curr_token_end_src_index;

struct {
  uint64_t category;
  uint64_t detail;
} g_token_extension;

uint32_t g_depth;

enum class context {
  none,
  in_list_after_bracket,
  in_list_after_value,
  in_dict_after_brace,
  in_dict_after_key,
  in_dict_after_value,
} g_ctx;

bool  //
in_dict_before_key() {
  return (g_ctx == context::in_dict_after_brace) ||
         (g_ctx == context::in_dict_after_value);
}

uint32_t g_suppress_write_dst;
bool g_wrote_to_dst;

wuffs_cbor__decoder g_cbor_decoder;
wuffs_json__decoder g_json_decoder;
wuffs_base__token_decoder* g_dec;

// g_cbor_output_string_array is a 4 KiB buffer. For -output-format=cbor,
// strings whose length is 4096 or less are written as a single
// definite-length string. Longer strings are written as an indefinite-length
// string containing multiple definite-length chunks, each of length up to 4
// KiB. See the CBOR RFC (RFC 7049) section 2.2.2 "Indefinite-Length Byte
// Strings and Text Strings". The output is determinate even when the input is
// streamed.
//
// If raising CBOR_OUTPUT_STRING_ARRAY_SIZE above 0xFFFF then you will also
// have to update flush_cbor_output_string.
#define CBOR_OUTPUT_STRING_ARRAY_SIZE 4096
uint8_t g_cbor_output_string_array[CBOR_OUTPUT_STRING_ARRAY_SIZE];

uint32_t g_cbor_output_string_length;
bool g_cbor_output_string_is_multiple_chunks;
bool g_cbor_output_string_is_utf_8;

// ----

// Query is a JSON Pointer query. After initializing with a NUL-terminated C
// string, its multiple fragments are consumed as the program walks the JSON
// data from stdin. For example, letting "$" denote a NUL, suppose that we
// started with a query string of "/apple/banana/12/durian" and are currently
// trying to match the second fragment, "banana", so that Query::m_depth is 2:
//
//    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
//    / a p p l e / b a n a n a / 1 2 / d u r i a n $
//    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
//                  ^           ^
//                  m_frag_i    m_frag_k
//
// The two pointers m_frag_i and m_frag_k (abbreviated as mfi and mfk) are the
// start (inclusive) and end (exclusive) of the query fragment. They satisfy
// (mfi <= mfk) and may be equal if the fragment is empty (note that "" is a
// valid JSON object key).
//
// The m_frag_j (mfj) pointer moves between these two, or is nullptr. An
// invariant is that (((mfi <= mfj) && (mfj <= mfk)) || (mfj == nullptr)).
//
// Wuffs' JSON tokenizer can portray a single JSON string as multiple Wuffs
// tokens, as backslash-escaped values within that JSON string may each get
// their own token.
//
// At the start of each object key (a JSON string), mfj is set to mfi.
//
// While mfj remains non-nullptr, each token's unescaped contents are then
// compared to that part of the fragment from mfj to mfk. If it is a prefix
// (including the case of an exact match), then mfj is advanced by the
// unescaped length. Otherwise, mfj is set to nullptr.
//
// Comparison accounts for JSON Pointer's escaping notation: "~0" and "~1" in
// the query (not the JSON value) are unescaped to "~" and "/" respectively.
// "~n" and "~r" are also unescaped to "\n" and "\r". The program is
// responsible for calling Query::validate (with a strict_json_pointer_syntax
// argument) before otherwise using this class.
//
// The mfj pointer therefore advances from mfi to mfk, or drops out, as we
// incrementally match the object key with the query fragment. For example, if
// we have already matched the "ban" of "banana", then we would accept any of
// an "ana" token, an "a" token or a "\u0061" token, amongst others. They would
// advance mfj by 3, 1 or 1 bytes respectively.
//
//                        mfj
//                        v
//    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
//    / a p p l e / b a n a n a / 1 2 / d u r i a n $
//    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
//                  ^           ^
//                  mfi         mfk
//
// At the end of each object key (or equivalently, at the start of each object
// value), if mfj is non-nullptr and equal to (but not less than) mfk then we
// have a fragment match: the query fragment equals the object key. If there is
// a next fragment (in this example, "12") we move the frag_etc pointers to its
// start and end and increment Query::m_depth. Otherwise, we have matched the
// complete query, and the upcoming JSON value is the result of that query.
//
// The discussion above centers on object keys. If the query fragment is
// numeric then it can also match as an array index: the string fragment "12"
// will match an array's 13th element (starting counting from zero). See RFC
// 6901 for its precise definition of an "array index" number.
//
// Array index fragment match is represented by the Query::m_array_index field,
// whose type (wuffs_base__result_u64) is a result type. An error result means
// that the fragment is not an array index. A value result holds the number of
// list elements remaining. When matching a query fragment in an array (instead
// of in an object), each element ticks this number down towards zero. At zero,
// the upcoming JSON value is the one that matches the query fragment.
class Query {
 private:
  uint8_t* m_frag_i;
  uint8_t* m_frag_j;
  uint8_t* m_frag_k;

  uint32_t m_depth;

  wuffs_base__result_u64 m_array_index;

 public:
  void reset(char* query_c_string) {
    m_frag_i = (uint8_t*)query_c_string;
    m_frag_j = (uint8_t*)query_c_string;
    m_frag_k = (uint8_t*)query_c_string;
    m_depth = 0;
    m_array_index.status.repr = "#main: not an array index query fragment";
    m_array_index.value = 0;
  }

  void restart_fragment(bool enable) { m_frag_j = enable ? m_frag_i : nullptr; }

  bool is_at(uint32_t depth) { return m_depth == depth; }

  // tick returns whether the fragment is a valid array index whose value is
  // zero. If valid but non-zero, it decrements it and returns false.
  bool tick() {
    if (m_array_index.status.is_ok()) {
      if (m_array_index.value == 0) {
        return true;
      }
      m_array_index.value--;
    }
    return false;
  }

  // next_fragment moves to the next fragment, returning whether it existed.
  bool next_fragment() {
    uint8_t* k = m_frag_k;
    uint32_t d = m_depth;

    this->reset(nullptr);

    if (!k || (*k != '/')) {
      return false;
    }
    k++;

    bool all_digits = true;
    uint8_t* i = k;
    while ((*k != '\x00') && (*k != '/')) {
      all_digits = all_digits && ('0' <= *k) && (*k <= '9');
      k++;
    }
    m_frag_i = i;
    m_frag_j = i;
    m_frag_k = k;
    m_depth = d + 1;
    if (all_digits) {
      // wuffs_base__parse_number_u64 rejects leading zeroes, e.g. "00", "07".
      m_array_index = wuffs_base__parse_number_u64(
          wuffs_base__make_slice_u8(i, k - i),
          WUFFS_BASE__PARSE_NUMBER_XXX__DEFAULT_OPTIONS);
    }
    return true;
  }

  bool matched_all() { return m_frag_k == nullptr; }

  bool matched_fragment() { return m_frag_j && (m_frag_j == m_frag_k); }

  void incremental_match_slice(uint8_t* ptr, size_t len) {
    if (!m_frag_j) {
      return;
    }
    uint8_t* j = m_frag_j;
    while (true) {
      if (len == 0) {
        m_frag_j = j;
        return;
      }

      if (*j == '\x00') {
        break;

      } else if (*j == '~') {
        j++;
        if (*j == '0') {
          if (*ptr != '~') {
            break;
          }
        } else if (*j == '1') {
          if (*ptr != '/') {
            break;
          }
        } else if (*j == 'n') {
          if (*ptr != '\n') {
            break;
          }
        } else if (*j == 'r') {
          if (*ptr != '\r') {
            break;
          }
        } else {
          break;
        }

      } else if (*j != *ptr) {
        break;
      }

      j++;
      ptr++;
      len--;
    }
    m_frag_j = nullptr;
  }

  void incremental_match_code_point(uint32_t code_point) {
    if (!m_frag_j) {
      return;
    }
    uint8_t u[WUFFS_BASE__UTF_8__BYTE_LENGTH__MAX_INCL];
    size_t n = wuffs_base__utf_8__encode(
        wuffs_base__make_slice_u8(&u[0],
                                  WUFFS_BASE__UTF_8__BYTE_LENGTH__MAX_INCL),
        code_point);
    if (n > 0) {
      this->incremental_match_slice(&u[0], n);
    }
  }

  // validate returns whether the (query_c_string, length) arguments form a
  // valid JSON Pointer. In particular, it must be valid UTF-8, and either be
  // empty or start with a '/'. Any '~' within must immediately be followed by
  // either '0' or '1'. If strict_json_pointer_syntax is false, a '~' may also
  // be followed by either 'n' or 'r'.
  static bool validate(char* query_c_string,
                       size_t length,
                       bool strict_json_pointer_syntax) {
    if (length <= 0) {
      return true;
    }
    if (query_c_string[0] != '/') {
      return false;
    }
    wuffs_base__slice_u8 s =
        wuffs_base__make_slice_u8((uint8_t*)query_c_string, length);
    bool previous_was_tilde = false;
    while (s.len > 0) {
      wuffs_base__utf_8__next__output o = wuffs_base__utf_8__next(s.ptr, s.len);
      if (!o.is_valid()) {
        return false;
      }

      if (previous_was_tilde) {
        switch (o.code_point) {
          case '0':
          case '1':
            break;
          case 'n':
          case 'r':
            if (strict_json_pointer_syntax) {
              return false;
            }
            break;
          default:
            return false;
        }
      }
      previous_was_tilde = o.code_point == '~';

      s.ptr += o.byte_length;
      s.len -= o.byte_length;
    }
    return !previous_was_tilde;
  }
} g_query;

// ----

enum class file_format {
  json,
  cbor,
};

struct {
  int remaining_argc;
  char** remaining_argv;

  bool compact_output;
  bool fail_if_unsandboxed;
  file_format input_format;
  bool input_allow_json_comments;
  bool input_allow_json_extra_comma;
  bool input_allow_json_inf_nan_numbers;
  uint32_t max_output_depth;
  file_format output_format;
  bool output_cbor_metadata_as_json_comments;
  bool output_json_extra_comma;
  char* query_c_string;
  size_t spaces;
  bool strict_json_pointer_syntax;
  bool tabs;
} g_flags = {0};

const char*  //
parse_flags(int argc, char** argv) {
  g_flags.spaces = 4;
  g_flags.max_output_depth = 0xFFFFFFFF;

  int c = (argc > 0) ? 1 : 0;  // Skip argv[0], the program name.
  for (; c < argc; c++) {
    char* arg = argv[c];
    if (*arg++ != '-') {
      break;
    }

    // A double-dash "--foo" is equivalent to a single-dash "-foo". As special
    // cases, a bare "-" is not a flag (some programs may interpret it as
    // stdin) and a bare "--" means to stop parsing flags.
    if (*arg == '\x00') {
      break;
    } else if (*arg == '-') {
      arg++;
      if (*arg == '\x00') {
        c++;
        break;
      }
    }

    if (!strcmp(arg, "c") || !strcmp(arg, "compact-output")) {
Nigel Tao3690e832020-03-12 16:52:26 +1100662 if (!strcmp(arg, "c") || !strcmp(arg, "compact-output")) {
Nigel Taod60815c2020-03-26 14:32:35 +1100663 g_flags.compact_output = true;
Nigel Tao68920952020-03-03 11:25:18 +1100664 continue;
665 }
Nigel Tao94440cf2020-04-02 22:28:24 +1100666 if (!strcmp(arg, "d") || !strcmp(arg, "max-output-depth")) {
667 g_flags.max_output_depth = 1;
668 continue;
    } else if (!strncmp(arg, "d=", 2) ||
               !strncmp(arg, "max-output-depth=", 17)) {
671 while (*arg++ != '=') {
672 }
673 wuffs_base__result_u64 u = wuffs_base__parse_number_u64(
Nigel Tao6b7ce302020-07-07 16:19:46 +1000674 wuffs_base__make_slice_u8((uint8_t*)arg, strlen(arg)),
675 WUFFS_BASE__PARSE_NUMBER_XXX__DEFAULT_OPTIONS);
Nigel Taoaf757722020-07-18 17:27:11 +1000676 if (u.status.is_ok() && (u.value <= 0xFFFFFFFF)) {
Nigel Tao94440cf2020-04-02 22:28:24 +1100677 g_flags.max_output_depth = (uint32_t)(u.value);
678 continue;
679 }
680 return g_usage;
681 }
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100682 if (!strcmp(arg, "fail-if-unsandboxed")) {
Nigel Taod60815c2020-03-26 14:32:35 +1100683 g_flags.fail_if_unsandboxed = true;
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100684 continue;
685 }
Nigel Tao4e193592020-07-15 12:48:57 +1000686 if (!strcmp(arg, "i=cbor") || !strcmp(arg, "input-format=cbor")) {
687 g_flags.input_format = file_format::cbor;
688 continue;
689 }
690 if (!strcmp(arg, "i=json") || !strcmp(arg, "input-format=json")) {
691 g_flags.input_format = file_format::json;
692 continue;
693 }
Nigel Tao3c8589b2020-07-19 21:49:00 +1000694 if (!strcmp(arg, "input-allow-json-comments")) {
695 g_flags.input_allow_json_comments = true;
696 continue;
697 }
698 if (!strcmp(arg, "input-allow-json-extra-comma")) {
699 g_flags.input_allow_json_extra_comma = true;
Nigel Taoc766bb72020-07-09 12:59:32 +1000700 continue;
701 }
Nigel Tao51a38292020-07-19 22:43:17 +1000702 if (!strcmp(arg, "input-allow-json-inf-nan-numbers")) {
703 g_flags.input_allow_json_inf_nan_numbers = true;
704 continue;
705 }
Nigel Tao168f60a2020-07-14 13:19:33 +1000706 if (!strcmp(arg, "o=cbor") || !strcmp(arg, "output-format=cbor")) {
707 g_flags.output_format = file_format::cbor;
708 continue;
709 }
710 if (!strcmp(arg, "o=json") || !strcmp(arg, "output-format=json")) {
711 g_flags.output_format = file_format::json;
712 continue;
713 }
Nigel Tao3c8589b2020-07-19 21:49:00 +1000714 if (!strcmp(arg, "output-cbor-metadata-as-json-comments")) {
715 g_flags.output_cbor_metadata_as_json_comments = true;
716 continue;
717 }
Nigel Taoc766bb72020-07-09 12:59:32 +1000718 if (!strcmp(arg, "output-json-extra-comma")) {
719 g_flags.output_json_extra_comma = true;
720 continue;
721 }
Nigel Tao0cd2f982020-03-03 23:03:02 +1100722 if (!strncmp(arg, "q=", 2) || !strncmp(arg, "query=", 6)) {
723 while (*arg++ != '=') {
724 }
Nigel Taod60815c2020-03-26 14:32:35 +1100725 g_flags.query_c_string = arg;
Nigel Taod6fdfb12020-03-11 12:24:14 +1100726 continue;
727 }
Nigel Taoecadf722020-07-13 08:22:34 +1000728 if (!strncmp(arg, "s=", 2) || !strncmp(arg, "spaces=", 7)) {
729 while (*arg++ != '=') {
730 }
731 if (('0' <= arg[0]) && (arg[0] <= '8') && (arg[1] == '\x00')) {
732 g_flags.spaces = arg[0] - '0';
733 continue;
734 }
735 return g_usage;
736 }
737 if (!strcmp(arg, "strict-json-pointer-syntax")) {
Nigel Taod60815c2020-03-26 14:32:35 +1100738 g_flags.strict_json_pointer_syntax = true;
Nigel Taod6fdfb12020-03-11 12:24:14 +1100739 continue;
Nigel Tao68920952020-03-03 11:25:18 +1100740 }
741 if (!strcmp(arg, "t") || !strcmp(arg, "tabs")) {
Nigel Taod60815c2020-03-26 14:32:35 +1100742 g_flags.tabs = true;
Nigel Tao68920952020-03-03 11:25:18 +1100743 continue;
744 }
745
Nigel Taod60815c2020-03-26 14:32:35 +1100746 return g_usage;
Nigel Tao68920952020-03-03 11:25:18 +1100747 }
748
Nigel Taod60815c2020-03-26 14:32:35 +1100749 if (g_flags.query_c_string &&
750 !Query::validate(g_flags.query_c_string, strlen(g_flags.query_c_string),
751 g_flags.strict_json_pointer_syntax)) {
Nigel Taod6fdfb12020-03-11 12:24:14 +1100752 return "main: bad JSON Pointer (RFC 6901) syntax for the -query=STR flag";
753 }
754
Nigel Taod60815c2020-03-26 14:32:35 +1100755 g_flags.remaining_argc = argc - c;
756 g_flags.remaining_argv = argv + c;
Nigel Tao0cd2f982020-03-03 23:03:02 +1100757 return nullptr;
Nigel Tao68920952020-03-03 11:25:18 +1100758}
759
Nigel Tao2cf76db2020-02-27 22:42:01 +1100760const char* //
761initialize_globals(int argc, char** argv) {
Nigel Taod60815c2020-03-26 14:32:35 +1100762 g_dst = wuffs_base__make_io_buffer(
763 wuffs_base__make_slice_u8(g_dst_array, DST_BUFFER_ARRAY_SIZE),
Nigel Tao2cf76db2020-02-27 22:42:01 +1100764 wuffs_base__empty_io_buffer_meta());
Nigel Tao1b073492020-02-16 22:11:36 +1100765
Nigel Taod60815c2020-03-26 14:32:35 +1100766 g_src = wuffs_base__make_io_buffer(
767 wuffs_base__make_slice_u8(g_src_array, SRC_BUFFER_ARRAY_SIZE),
Nigel Tao2cf76db2020-02-27 22:42:01 +1100768 wuffs_base__empty_io_buffer_meta());
769
Nigel Taod60815c2020-03-26 14:32:35 +1100770 g_tok = wuffs_base__make_token_buffer(
771 wuffs_base__make_slice_token(g_tok_array, TOKEN_BUFFER_ARRAY_SIZE),
Nigel Tao2cf76db2020-02-27 22:42:01 +1100772 wuffs_base__empty_token_buffer_meta());
773
Nigel Taod60815c2020-03-26 14:32:35 +1100774 g_curr_token_end_src_index = 0;
Nigel Tao2cf76db2020-02-27 22:42:01 +1100775
Nigel Tao850dc182020-07-21 22:52:04 +1000776 g_token_extension.category = 0;
777 g_token_extension.detail = 0;
778
Nigel Taod60815c2020-03-26 14:32:35 +1100779 g_depth = 0;
Nigel Tao2cf76db2020-02-27 22:42:01 +1100780
Nigel Taod60815c2020-03-26 14:32:35 +1100781 g_ctx = context::none;
Nigel Tao2cf76db2020-02-27 22:42:01 +1100782
Nigel Tao68920952020-03-03 11:25:18 +1100783 TRY(parse_flags(argc, argv));
Nigel Taod60815c2020-03-26 14:32:35 +1100784 if (g_flags.fail_if_unsandboxed && !g_sandboxed) {
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100785 return "main: unsandboxed";
786 }
Nigel Tao01abc842020-03-06 21:42:33 +1100787 const int stdin_fd = 0;
Nigel Taod60815c2020-03-26 14:32:35 +1100788 if (g_flags.remaining_argc >
789 ((g_input_file_descriptor != stdin_fd) ? 1 : 0)) {
790 return g_usage;
Nigel Tao107f0ef2020-03-01 21:35:02 +1100791 }
792
Nigel Taod60815c2020-03-26 14:32:35 +1100793 g_query.reset(g_flags.query_c_string);
Nigel Tao0cd2f982020-03-03 23:03:02 +1100794
  // If the query is non-empty, suppress writing to stdout until we've
  // completed the query.
Nigel Taod60815c2020-03-26 14:32:35 +1100797 g_suppress_write_dst = g_query.next_fragment() ? 1 : 0;
798 g_wrote_to_dst = false;
Nigel Tao0cd2f982020-03-03 23:03:02 +1100799
Nigel Tao4e193592020-07-15 12:48:57 +1000800 if (g_flags.input_format == file_format::json) {
801 TRY(g_json_decoder
802 .initialize(sizeof__wuffs_json__decoder(), WUFFS_VERSION, 0)
803 .message());
804 g_dec = g_json_decoder.upcast_as__wuffs_base__token_decoder();
805 } else {
806 TRY(g_cbor_decoder
807 .initialize(sizeof__wuffs_cbor__decoder(), WUFFS_VERSION, 0)
808 .message());
809 g_dec = g_cbor_decoder.upcast_as__wuffs_base__token_decoder();
810 }
Nigel Tao4b186b02020-03-18 14:25:21 +1100811
Nigel Tao3c8589b2020-07-19 21:49:00 +1000812 if (g_flags.input_allow_json_comments) {
813 g_dec->set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_COMMENT_BLOCK, true);
814 g_dec->set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_COMMENT_LINE, true);
815 }
816 if (g_flags.input_allow_json_extra_comma) {
Nigel Tao4e193592020-07-15 12:48:57 +1000817 g_dec->set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_EXTRA_COMMA, true);
Nigel Taoc766bb72020-07-09 12:59:32 +1000818 }
Nigel Tao51a38292020-07-19 22:43:17 +1000819 if (g_flags.input_allow_json_inf_nan_numbers) {
820 g_dec->set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_INF_NAN_NUMBERS, true);
821 }
Nigel Taoc766bb72020-07-09 12:59:32 +1000822
  // Consume an optional whitespace trailer. This isn't part of the JSON spec,
  // but it works better with line-oriented Unix tools (such as "echo 123 |
  // jsonptr" where it's "echo", not "echo -n") or hand-edited JSON files which
  // can accidentally contain trailing whitespace.
Nigel Tao4e193592020-07-15 12:48:57 +1000827 g_dec->set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_TRAILING_NEW_LINE, true);
Nigel Tao4b186b02020-03-18 14:25:21 +1100828
829 return nullptr;
Nigel Tao2cf76db2020-02-27 22:42:01 +1100830}
Nigel Tao1b073492020-02-16 22:11:36 +1100831
832// ----
833
// ignore_return_value suppresses errors from -Wall -Werror.
static void  //
ignore_return_value(int ignored) {}
837
Nigel Tao2914bae2020-02-26 09:40:30 +1100838const char* //
839read_src() {
Nigel Taod60815c2020-03-26 14:32:35 +1100840 if (g_src.meta.closed) {
Nigel Tao9cc2c252020-02-23 17:05:49 +1100841 return "main: internal error: read requested on a closed source";
Nigel Taoa8406922020-02-19 12:22:00 +1100842 }
Nigel Taod60815c2020-03-26 14:32:35 +1100843 g_src.compact();
844 if (g_src.meta.wi >= g_src.data.len) {
845 return "main: g_src buffer is full";
Nigel Tao1b073492020-02-16 22:11:36 +1100846 }
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100847 while (true) {
Nigel Taod60815c2020-03-26 14:32:35 +1100848 ssize_t n = read(g_input_file_descriptor, g_src.data.ptr + g_src.meta.wi,
849 g_src.data.len - g_src.meta.wi);
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100850 if (n >= 0) {
Nigel Taod60815c2020-03-26 14:32:35 +1100851 g_src.meta.wi += n;
852 g_src.meta.closed = n == 0;
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100853 break;
854 } else if (errno != EINTR) {
855 return strerror(errno);
856 }
Nigel Tao1b073492020-02-16 22:11:36 +1100857 }
858 return nullptr;
859}
860
Nigel Tao2914bae2020-02-26 09:40:30 +1100861const char* //
862flush_dst() {
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100863 while (true) {
Nigel Taod60815c2020-03-26 14:32:35 +1100864 size_t n = g_dst.meta.wi - g_dst.meta.ri;
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100865 if (n == 0) {
866 break;
Nigel Tao1b073492020-02-16 22:11:36 +1100867 }
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100868 const int stdout_fd = 1;
Nigel Taod60815c2020-03-26 14:32:35 +1100869 ssize_t i = write(stdout_fd, g_dst.data.ptr + g_dst.meta.ri, n);
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100870 if (i >= 0) {
Nigel Taod60815c2020-03-26 14:32:35 +1100871 g_dst.meta.ri += i;
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100872 } else if (errno != EINTR) {
873 return strerror(errno);
874 }
Nigel Tao1b073492020-02-16 22:11:36 +1100875 }
Nigel Taod60815c2020-03-26 14:32:35 +1100876 g_dst.compact();
Nigel Tao1b073492020-02-16 22:11:36 +1100877 return nullptr;
878}
879
Nigel Tao2914bae2020-02-26 09:40:30 +1100880const char* //
881write_dst(const void* s, size_t n) {
Nigel Taod60815c2020-03-26 14:32:35 +1100882 if (g_suppress_write_dst > 0) {
Nigel Tao0cd2f982020-03-03 23:03:02 +1100883 return nullptr;
884 }
Nigel Tao1b073492020-02-16 22:11:36 +1100885 const uint8_t* p = static_cast<const uint8_t*>(s);
886 while (n > 0) {
Nigel Taod60815c2020-03-26 14:32:35 +1100887 size_t i = g_dst.writer_available();
Nigel Tao1b073492020-02-16 22:11:36 +1100888 if (i == 0) {
889 const char* z = flush_dst();
890 if (z) {
891 return z;
892 }
Nigel Taod60815c2020-03-26 14:32:35 +1100893 i = g_dst.writer_available();
Nigel Tao1b073492020-02-16 22:11:36 +1100894 if (i == 0) {
Nigel Taod60815c2020-03-26 14:32:35 +1100895 return "main: g_dst buffer is full";
Nigel Tao1b073492020-02-16 22:11:36 +1100896 }
897 }
898
899 if (i > n) {
900 i = n;
901 }
Nigel Taod60815c2020-03-26 14:32:35 +1100902 memcpy(g_dst.data.ptr + g_dst.meta.wi, p, i);
903 g_dst.meta.wi += i;
Nigel Tao1b073492020-02-16 22:11:36 +1100904 p += i;
905 n -= i;
Nigel Taod60815c2020-03-26 14:32:35 +1100906 g_wrote_to_dst = true;
Nigel Tao1b073492020-02-16 22:11:36 +1100907 }
908 return nullptr;
909}
910
911// ----
912
Nigel Tao168f60a2020-07-14 13:19:33 +1000913const char* //
914write_literal(uint64_t vbd) {
915 const char* ptr = nullptr;
916 size_t len = 0;
917 if (vbd & WUFFS_BASE__TOKEN__VBD__LITERAL__UNDEFINED) {
918 if (g_flags.output_format == file_format::json) {
Nigel Tao3c8589b2020-07-19 21:49:00 +1000919 // JSON's closest approximation to "undefined" is "null".
920 if (g_flags.output_cbor_metadata_as_json_comments) {
921 ptr = "/*cbor:undefined*/null";
922 len = 22;
923 } else {
924 ptr = "null";
925 len = 4;
926 }
Nigel Tao168f60a2020-07-14 13:19:33 +1000927 } else {
928 ptr = "\xF7";
929 len = 1;
930 }
931 } else if (vbd & WUFFS_BASE__TOKEN__VBD__LITERAL__NULL) {
932 if (g_flags.output_format == file_format::json) {
933 ptr = "null";
934 len = 4;
935 } else {
936 ptr = "\xF6";
937 len = 1;
938 }
939 } else if (vbd & WUFFS_BASE__TOKEN__VBD__LITERAL__FALSE) {
940 if (g_flags.output_format == file_format::json) {
941 ptr = "false";
942 len = 5;
943 } else {
944 ptr = "\xF4";
945 len = 1;
946 }
947 } else if (vbd & WUFFS_BASE__TOKEN__VBD__LITERAL__TRUE) {
948 if (g_flags.output_format == file_format::json) {
949 ptr = "true";
950 len = 4;
951 } else {
952 ptr = "\xF5";
953 len = 1;
954 }
955 } else {
956 return "main: internal error: unexpected write_literal argument";
957 }
958 return write_dst(ptr, len);
959}
960
961// ----
962
963const char* //
Nigel Tao664f8432020-07-16 21:25:14 +1000964write_number_as_cbor_f64(double f) {
Nigel Tao168f60a2020-07-14 13:19:33 +1000965 uint8_t buf[9];
966 wuffs_base__lossy_value_u16 lv16 =
967 wuffs_base__ieee_754_bit_representation__from_f64_to_u16_truncate(f);
968 if (!lv16.lossy) {
969 buf[0] = 0xF9;
970 wuffs_base__store_u16be__no_bounds_check(&buf[1], lv16.value);
971 return write_dst(&buf[0], 3);
972 }
973 wuffs_base__lossy_value_u32 lv32 =
974 wuffs_base__ieee_754_bit_representation__from_f64_to_u32_truncate(f);
975 if (!lv32.lossy) {
976 buf[0] = 0xFA;
977 wuffs_base__store_u32be__no_bounds_check(&buf[1], lv32.value);
978 return write_dst(&buf[0], 5);
979 }
980 buf[0] = 0xFB;
981 wuffs_base__store_u64be__no_bounds_check(
982 &buf[1], wuffs_base__ieee_754_bit_representation__from_f64_to_u64(f));
983 return write_dst(&buf[0], 9);
984}
985
986const char* //
Nigel Tao664f8432020-07-16 21:25:14 +1000987write_number_as_cbor_u64(uint8_t base, uint64_t u) {
Nigel Tao168f60a2020-07-14 13:19:33 +1000988 uint8_t buf[9];
989 if (u < 0x18) {
990 buf[0] = base | ((uint8_t)u);
991 return write_dst(&buf[0], 1);
992 } else if ((u >> 8) == 0) {
993 buf[0] = base | 0x18;
994 buf[1] = ((uint8_t)u);
995 return write_dst(&buf[0], 2);
996 } else if ((u >> 16) == 0) {
997 buf[0] = base | 0x19;
998 wuffs_base__store_u16be__no_bounds_check(&buf[1], ((uint16_t)u));
999 return write_dst(&buf[0], 3);
1000 } else if ((u >> 32) == 0) {
1001 buf[0] = base | 0x1A;
1002 wuffs_base__store_u32be__no_bounds_check(&buf[1], ((uint32_t)u));
1003 return write_dst(&buf[0], 5);
1004 }
1005 buf[0] = base | 0x1B;
1006 wuffs_base__store_u64be__no_bounds_check(&buf[1], u);
1007 return write_dst(&buf[0], 9);
1008}
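
// As an illustration of the shortest-form encoding that
// write_number_as_cbor_u64 performs, here is a sketch that returns the bytes
// instead of writing them to g_dst. The name example_cbor_head and the use of
// std::vector are assumptions made for the example; this program itself does
// not allocate. The base argument carries the CBOR major type bits (0x00 for
// an unsigned integer, 0x20 for a negative integer encoding -1-u).

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: encodes (base, u) as a CBOR head in the shortest form.
static std::vector<uint8_t> example_cbor_head(uint8_t base, uint64_t u) {
  std::vector<uint8_t> out;
  int num_extra_bytes = 0;
  if (u < 0x18) {
    // Small values fit in the low 5 bits of the initial byte.
    out.push_back(static_cast<uint8_t>(base | u));
    return out;
  } else if ((u >> 8) == 0) {
    out.push_back(static_cast<uint8_t>(base | 0x18));
    num_extra_bytes = 1;
  } else if ((u >> 16) == 0) {
    out.push_back(static_cast<uint8_t>(base | 0x19));
    num_extra_bytes = 2;
  } else if ((u >> 32) == 0) {
    out.push_back(static_cast<uint8_t>(base | 0x1A));
    num_extra_bytes = 4;
  } else {
    out.push_back(static_cast<uint8_t>(base | 0x1B));
    num_extra_bytes = 8;
  }
  // The argument follows in big-endian byte order.
  for (int i = num_extra_bytes - 1; i >= 0; i--) {
    out.push_back(static_cast<uint8_t>(u >> (8 * i)));
  }
  return out;
}
```

// For a negative integer v, the caller passes base 0x20 and u = ~v (that is,
// -1 - v), as write_number does for text-format negative integers.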
1009
1010const char* //
Nigel Tao850dc182020-07-21 22:52:04 +10001011write_cbor_minus_1_minus_x(uint8_t* ptr, size_t len) {
1012 if (len != 9) {
1013 return "main: internal error: invalid ETC__MINUS_1_MINUS_X token length";
Nigel Tao664f8432020-07-16 21:25:14 +10001014 }
Nigel Tao850dc182020-07-21 22:52:04 +10001015 uint64_t u = 1 + wuffs_base__load_u64be__no_bounds_check(ptr + 1);
1016 if (u == 0) {
1017 // See the cbor.TOKEN_VALUE_MINOR__MINUS_1_MINUS_X comment re overflow.
1018 return write_dst("-18446744073709551616", 21);
Nigel Tao664f8432020-07-16 21:25:14 +10001019 }
1020 uint8_t buf[1 + WUFFS_BASE__U64__BYTE_LENGTH__MAX_INCL];
1021 uint8_t* b = &buf[0];
Nigel Tao850dc182020-07-21 22:52:04 +10001022 *b++ = '-';
Nigel Tao664f8432020-07-16 21:25:14 +10001023 size_t n = wuffs_base__render_number_u64(
1024 wuffs_base__make_slice_u8(b, WUFFS_BASE__U64__BYTE_LENGTH__MAX_INCL), u,
1025 WUFFS_BASE__RENDER_NUMBER_XXX__DEFAULT_OPTIONS);
Nigel Tao850dc182020-07-21 22:52:04 +10001026 return write_dst(&buf[0], 1 + n);
Nigel Tao664f8432020-07-16 21:25:14 +10001027}
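
// The overflow special case in write_cbor_minus_1_minus_x can be seen in a
// small sketch. The helper name example_minus_1_minus_x and the use of
// std::string are assumptions for illustration (this program renders into a
// fixed-size buffer instead). Given x, it renders the value -1-x in decimal.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: renders (-1 - x) as decimal text.
static std::string example_minus_1_minus_x(uint64_t x) {
  // The magnitude is x+1, which wraps around to 0 when x is UINT64_MAX.
  uint64_t u = x + 1;
  if (u == 0) {
    // The magnitude 2^64 does not fit in a uint64_t, so spell it out.
    return "-18446744073709551616";
  }
  return "-" + std::to_string(u);
}
```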
1028
1029const char* //
Nigel Tao168f60a2020-07-14 13:19:33 +10001030write_number(uint64_t vbd, uint8_t* ptr, size_t len) {
  if (g_flags.output_format == file_format::json) {
    if (g_flags.input_format == file_format::json) {
      return write_dst(ptr, len);
    }
    // Otherwise (non-JSON input, JSON output), none of the branches below
    // apply and control falls through to the error return at "fail".

    // From here on, (g_flags.output_format == file_format::cbor).
  } else if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__FORMAT_TEXT) {
Nigel Tao168f60a2020-07-14 13:19:33 +10001038 // First try to parse (ptr, len) as an integer. Something like
1039 // "1180591620717411303424" is a valid number (in the JSON sense) but will
1040 // overflow int64_t or uint64_t, so fall back to parsing it as a float64.
1041 if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_INTEGER_SIGNED) {
1042 if ((len > 0) && (ptr[0] == '-')) {
1043 wuffs_base__result_i64 ri = wuffs_base__parse_number_i64(
1044 wuffs_base__make_slice_u8(ptr, len),
1045 WUFFS_BASE__PARSE_NUMBER_XXX__DEFAULT_OPTIONS);
1046 if (ri.status.is_ok()) {
Nigel Tao664f8432020-07-16 21:25:14 +10001047 return write_number_as_cbor_u64(0x20, ~ri.value);
Nigel Tao168f60a2020-07-14 13:19:33 +10001048 }
1049 } else {
1050 wuffs_base__result_u64 ru = wuffs_base__parse_number_u64(
1051 wuffs_base__make_slice_u8(ptr, len),
1052 WUFFS_BASE__PARSE_NUMBER_XXX__DEFAULT_OPTIONS);
1053 if (ru.status.is_ok()) {
Nigel Tao664f8432020-07-16 21:25:14 +10001054 return write_number_as_cbor_u64(0x00, ru.value);
Nigel Tao168f60a2020-07-14 13:19:33 +10001055 }
1056 }
1057 }
1058
1059 if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_FLOATING_POINT) {
1060 wuffs_base__result_f64 rf = wuffs_base__parse_number_f64(
1061 wuffs_base__make_slice_u8(ptr, len),
1062 WUFFS_BASE__PARSE_NUMBER_XXX__DEFAULT_OPTIONS);
1063 if (rf.status.is_ok()) {
Nigel Tao664f8432020-07-16 21:25:14 +10001064 return write_number_as_cbor_f64(rf.value);
Nigel Tao168f60a2020-07-14 13:19:33 +10001065 }
1066 }
Nigel Tao51a38292020-07-19 22:43:17 +10001067 } else if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_NEG_INF) {
1068 return write_dst("\xF9\xFC\x00", 3);
1069 } else if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_POS_INF) {
1070 return write_dst("\xF9\x7C\x00", 3);
1071 } else if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_NEG_NAN) {
1072 return write_dst("\xF9\xFF\xFF", 3);
1073 } else if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_POS_NAN) {
1074 return write_dst("\xF9\x7F\xFF", 3);
Nigel Tao168f60a2020-07-14 13:19:33 +10001075 }
1076
Nigel Tao4e193592020-07-15 12:48:57 +10001077fail:
Nigel Tao168f60a2020-07-14 13:19:33 +10001078 return "main: internal error: unexpected write_number argument";
1079}
1080
Nigel Tao4e193592020-07-15 12:48:57 +10001081const char* //
Nigel Taoc9d4e342020-07-21 15:20:34 +10001082write_inline_integer(uint64_t x, bool x_is_signed, uint8_t* ptr, size_t len) {
Nigel Tao4e193592020-07-15 12:48:57 +10001083 if (g_flags.output_format == file_format::cbor) {
1084 return write_dst(ptr, len);
1085 }
1086
Nigel Taoc9d4e342020-07-21 15:20:34 +10001087 // Adding the two ETC__BYTE_LENGTH__ETC constants is overkill, but it's
1088 // simpler (for producing a constant-expression array size) than taking the
1089 // maximum of the two.
1090 uint8_t buf[WUFFS_BASE__I64__BYTE_LENGTH__MAX_INCL +
1091 WUFFS_BASE__U64__BYTE_LENGTH__MAX_INCL];
1092 wuffs_base__slice_u8 dst = wuffs_base__make_slice_u8(&buf[0], sizeof buf);
1093 size_t n =
1094 x_is_signed
1095 ? wuffs_base__render_number_i64(
1096 dst, (int64_t)x, WUFFS_BASE__RENDER_NUMBER_XXX__DEFAULT_OPTIONS)
1097 : wuffs_base__render_number_u64(
1098 dst, x, WUFFS_BASE__RENDER_NUMBER_XXX__DEFAULT_OPTIONS);
Nigel Tao4e193592020-07-15 12:48:57 +10001099 return write_dst(&buf[0], n);
1100}
1101
Nigel Tao168f60a2020-07-14 13:19:33 +10001102// ----
1103
uint8_t  //
hex_digit(uint8_t nibble) {
  nibble &= 0x0F;
  if (nibble <= 9) {
    return '0' + nibble;
  }
  return ('A' - 10) + nibble;
}
1112
Nigel Tao2914bae2020-02-26 09:40:30 +11001113const char* //
Nigel Tao168f60a2020-07-14 13:19:33 +10001114flush_cbor_output_string() {
1115 uint8_t prefix[3];
1116 prefix[0] = g_cbor_output_string_is_utf_8 ? 0x60 : 0x40;
1117 if (g_cbor_output_string_length < 0x18) {
1118 prefix[0] |= g_cbor_output_string_length;
1119 TRY(write_dst(&prefix[0], 1));
1120 } else if (g_cbor_output_string_length <= 0xFF) {
1121 prefix[0] |= 0x18;
1122 prefix[1] = g_cbor_output_string_length;
1123 TRY(write_dst(&prefix[0], 2));
1124 } else if (g_cbor_output_string_length <= 0xFFFF) {
1125 prefix[0] |= 0x19;
1126 prefix[1] = g_cbor_output_string_length >> 8;
1127 prefix[2] = g_cbor_output_string_length;
1128 TRY(write_dst(&prefix[0], 3));
1129 } else {
1130 return "main: internal error: CBOR string output is too long";
1131 }
1132
1133 size_t n = g_cbor_output_string_length;
1134 g_cbor_output_string_length = 0;
1135 return write_dst(&g_cbor_output_string_array[0], n);
1136}
1137
1138const char* //
1139write_cbor_output_string(uint8_t* ptr, size_t len, bool finish) {
1140 // Check that g_cbor_output_string_array can hold any UTF-8 code point.
1141 if (CBOR_OUTPUT_STRING_ARRAY_SIZE < 4) {
1142 return "main: internal error: CBOR_OUTPUT_STRING_ARRAY_SIZE is too short";
1143 }
1144
1145 while (len > 0) {
1146 size_t available =
1147 CBOR_OUTPUT_STRING_ARRAY_SIZE - g_cbor_output_string_length;
1148 if (available >= len) {
1149 memcpy(&g_cbor_output_string_array[g_cbor_output_string_length], ptr,
1150 len);
1151 g_cbor_output_string_length += len;
1152 ptr += len;
1153 len = 0;
1154 break;
1155
1156 } else if (available > 0) {
1157 if (!g_cbor_output_string_is_multiple_chunks) {
1158 g_cbor_output_string_is_multiple_chunks = true;
1159 TRY(write_dst(g_cbor_output_string_is_utf_8 ? "\x7F" : "\x5F", 1));
Nigel Tao3b486982020-02-27 15:05:59 +11001160 }
Nigel Tao168f60a2020-07-14 13:19:33 +10001161
1162 if (g_cbor_output_string_is_utf_8) {
1163 // Walk the end backwards to a UTF-8 boundary, so that each chunk of
1164 // the multi-chunk string is also valid UTF-8.
1165 while (available > 0) {
Nigel Tao702c7b22020-07-22 15:42:54 +10001166 wuffs_base__utf_8__next__output o =
1167 wuffs_base__utf_8__next_from_end(ptr, available);
Nigel Tao168f60a2020-07-14 13:19:33 +10001168 if ((o.code_point != WUFFS_BASE__UNICODE_REPLACEMENT_CHARACTER) ||
1169 (o.byte_length != 1)) {
1170 break;
1171 }
1172 available--;
1173 }
1174 }
1175
1176 memcpy(&g_cbor_output_string_array[g_cbor_output_string_length], ptr,
1177 available);
1178 g_cbor_output_string_length += available;
1179 ptr += available;
1180 len -= available;
Nigel Tao3b486982020-02-27 15:05:59 +11001181 }
1182
Nigel Tao168f60a2020-07-14 13:19:33 +10001183 TRY(flush_cbor_output_string());
1184 }
Nigel Taob9ad34f2020-03-03 12:44:01 +11001185
Nigel Tao168f60a2020-07-14 13:19:33 +10001186 if (finish) {
1187 TRY(flush_cbor_output_string());
1188 if (g_cbor_output_string_is_multiple_chunks) {
1189 TRY(write_dst("\xFF", 1));
1190 }
1191 }
1192 return nullptr;
1193}
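
// write_cbor_output_string walks a chunk's end backwards so that each chunk
// of a multi-chunk string is itself valid UTF-8. The sketch below shows the
// same idea with a different technique: rather than calling
// wuffs_base__utf_8__next_from_end, it inspects continuation bytes directly.
// The name example_utf8_prefix_len is hypothetical, and the sketch assumes
// that the input is already valid UTF-8.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: returns the largest n <= limit such that p[0..n) does
// not end in the middle of a code point. Assumes p[0..len) is valid UTF-8.
static size_t example_utf8_prefix_len(const uint8_t* p, size_t len,
                                      size_t limit) {
  if (limit >= len) {
    return len;
  }
  // p[limit] is the first excluded byte. If it is a continuation byte
  // (0b10xxxxxx) then cutting here would split a code point, so back up.
  while ((limit > 0) && ((p[limit] & 0xC0) == 0x80)) {
    limit--;
  }
  return limit;
}
```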
Nigel Taob9ad34f2020-03-03 12:44:01 +11001194
Nigel Tao168f60a2020-07-14 13:19:33 +10001195const char* //
Nigel Tao7cb76542020-07-19 22:19:04 +10001196handle_unicode_code_point(uint32_t ucp) {
1197 if (g_flags.output_format == file_format::json) {
1198 if (ucp < 0x0020) {
1199 switch (ucp) {
1200 case '\b':
1201 return write_dst("\\b", 2);
1202 case '\f':
1203 return write_dst("\\f", 2);
1204 case '\n':
1205 return write_dst("\\n", 2);
1206 case '\r':
1207 return write_dst("\\r", 2);
1208 case '\t':
1209 return write_dst("\\t", 2);
1210 }
1211
1212 // Other bytes less than 0x0020 are valid UTF-8 but not valid in a
1213 // JSON string. They need to remain escaped.
1214 uint8_t esc6[6];
1215 esc6[0] = '\\';
1216 esc6[1] = 'u';
1217 esc6[2] = '0';
1218 esc6[3] = '0';
1219 esc6[4] = hex_digit(ucp >> 4);
1220 esc6[5] = hex_digit(ucp >> 0);
1221 return write_dst(&esc6[0], 6);
1222
1223 } else if (ucp == '\"') {
1224 return write_dst("\\\"", 2);
1225
1226 } else if (ucp == '\\') {
1227 return write_dst("\\\\", 2);
1228 }
1229 }
1230
1231 uint8_t u[WUFFS_BASE__UTF_8__BYTE_LENGTH__MAX_INCL];
1232 size_t n = wuffs_base__utf_8__encode(
1233 wuffs_base__make_slice_u8(&u[0],
1234 WUFFS_BASE__UTF_8__BYTE_LENGTH__MAX_INCL),
1235 ucp);
1236 if (n == 0) {
1237 return "main: internal error: unexpected Unicode code point";
1238 }
1239
1240 if (g_flags.output_format == file_format::json) {
1241 return write_dst(&u[0], n);
1242 }
1243 return write_cbor_output_string(&u[0], n, false);
1244}
Nigel Taod191a3f2020-07-19 22:14:54 +10001245
1246const char* //
1247write_json_escaped_string(uint8_t* ptr, size_t len) {
1248restart:
1249 while (true) {
1250 size_t i;
1251 for (i = 0; i < len; i++) {
1252 uint8_t c = ptr[i];
1253 if ((c == '"') || (c == '\\') || (c < 0x20)) {
1254 TRY(write_dst(ptr, i));
1255 TRY(handle_unicode_code_point(c));
1256 ptr += i + 1;
1257 len -= i + 1;
1258 goto restart;
1259 }
1260 }
1261 TRY(write_dst(ptr, len));
1262 break;
1263 }
1264 return nullptr;
1265}
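
// The escaping rules applied by handle_unicode_code_point and
// write_json_escaped_string (short escapes for \b \f \n \r \t, \u00XX for
// other bytes below 0x20, and backslash escapes for '"' and '\\') can be
// sketched as a single function. The name example_json_escape and the use of
// std::string are assumptions for illustration; this program writes straight
// to g_dst and does not allocate.

```cpp
#include <string>

// Hypothetical helper: returns the body of a JSON string literal (without
// the surrounding quotes) for the given bytes. Bytes >= 0x20 other than '"'
// and '\\' are copied through unchanged, so valid UTF-8 stays valid UTF-8.
static std::string example_json_escape(const std::string& in) {
  static const char hex[] = "0123456789ABCDEF";
  std::string out;
  for (unsigned char c : in) {
    switch (c) {
      case '\b': out += "\\b"; break;
      case '\f': out += "\\f"; break;
      case '\n': out += "\\n"; break;
      case '\r': out += "\\r"; break;
      case '\t': out += "\\t"; break;
      case '"':  out += "\\\""; break;
      case '\\': out += "\\\\"; break;
      default:
        if (c < 0x20) {
          // Other control bytes use the six-byte \u00XX form.
          out += "\\u00";
          out += hex[c >> 4];
          out += hex[c & 0x0F];
        } else {
          out += static_cast<char>(c);
        }
        break;
    }
  }
  return out;
}
```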
1266
1267const char* //
Nigel Tao168f60a2020-07-14 13:19:33 +10001268handle_string(uint64_t vbd,
1269 uint64_t len,
1270 bool start_of_token_chain,
1271 bool continued) {
1272 if (start_of_token_chain) {
1273 if (g_flags.output_format == file_format::json) {
Nigel Tao3c8589b2020-07-19 21:49:00 +10001274 if (g_flags.output_cbor_metadata_as_json_comments &&
1275 !(vbd & WUFFS_BASE__TOKEN__VBD__STRING__CHAIN_MUST_BE_UTF_8)) {
1276 TRY(write_dst("/*cbor:hex*/\"", 13));
1277 } else {
1278 TRY(write_dst("\"", 1));
1279 }
Nigel Tao168f60a2020-07-14 13:19:33 +10001280 } else {
1281 g_cbor_output_string_length = 0;
1282 g_cbor_output_string_is_multiple_chunks = false;
1283 g_cbor_output_string_is_utf_8 =
1284 vbd & WUFFS_BASE__TOKEN__VBD__STRING__CHAIN_MUST_BE_UTF_8;
1285 }
1286 g_query.restart_fragment(in_dict_before_key() && g_query.is_at(g_depth));
1287 }
1288
1289 if (vbd & WUFFS_BASE__TOKEN__VBD__STRING__CONVERT_0_DST_1_SRC_DROP) {
1290 // No-op.
1291 } else if (vbd & WUFFS_BASE__TOKEN__VBD__STRING__CONVERT_1_DST_1_SRC_COPY) {
1292 uint8_t* ptr = g_src.data.ptr + g_curr_token_end_src_index - len;
1293 if (g_flags.output_format == file_format::json) {
Nigel Taoaf757722020-07-18 17:27:11 +10001294 if (g_flags.input_format == file_format::json) {
1295 TRY(write_dst(ptr, len));
1296 } else if (vbd & WUFFS_BASE__TOKEN__VBD__STRING__CHAIN_MUST_BE_UTF_8) {
Nigel Taod191a3f2020-07-19 22:14:54 +10001297 TRY(write_json_escaped_string(ptr, len));
Nigel Taoaf757722020-07-18 17:27:11 +10001298 } else {
1299 uint8_t as_hex[512];
1300 uint8_t* p = ptr;
1301 size_t n = len;
1302 while (n > 0) {
1303 wuffs_base__transform__output o = wuffs_base__base_16__encode2(
1304 wuffs_base__make_slice_u8(&as_hex[0], sizeof as_hex),
1305 wuffs_base__make_slice_u8(p, n), true,
1306 WUFFS_BASE__BASE_16__DEFAULT_OPTIONS);
1307 TRY(write_dst(&as_hex[0], o.num_dst));
1308 p += o.num_src;
1309 n -= o.num_src;
1310 if (!o.status.is_ok()) {
1311 return o.status.message();
1312 }
1313 }
1314 }
Nigel Tao168f60a2020-07-14 13:19:33 +10001315 } else {
1316 TRY(write_cbor_output_string(ptr, len, false));
1317 }
1318 g_query.incremental_match_slice(ptr, len);
Nigel Taob9ad34f2020-03-03 12:44:01 +11001319 } else {
Nigel Tao168f60a2020-07-14 13:19:33 +10001320 return "main: internal error: unexpected string-token conversion";
1321 }
1322
1323 if (continued) {
1324 return nullptr;
1325 }
1326
1327 if (g_flags.output_format == file_format::json) {
1328 TRY(write_dst("\"", 1));
1329 } else {
1330 TRY(write_cbor_output_string(nullptr, 0, true));
1331 }
1332 return nullptr;
1333}
1334
Nigel Taod191a3f2020-07-19 22:14:54 +10001335// ----
1336
Nigel Tao3b486982020-02-27 15:05:59 +11001337const char* //
Nigel Tao2ef39992020-04-09 17:24:39 +10001338handle_token(wuffs_base__token t, bool start_of_token_chain) {
Nigel Tao2cf76db2020-02-27 22:42:01 +11001339 do {
Nigel Tao462f8662020-04-01 23:01:51 +11001340 int64_t vbc = t.value_base_category();
    uint64_t vbd = t.value_base_detail();
    uint64_t len = t.length();

    // Handle ']' or '}'.
    if ((vbc == WUFFS_BASE__TOKEN__VBC__STRUCTURE) &&
        (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__POP)) {
      if (g_query.is_at(g_depth)) {
        return "main: no match for query";
      }
      if (g_depth <= 0) {
        return "main: internal error: inconsistent g_depth";
      }
      g_depth--;

      if (g_query.matched_all() && (g_depth >= g_flags.max_output_depth)) {
        g_suppress_write_dst--;
        // '…' is U+2026 HORIZONTAL ELLIPSIS, which is 3 UTF-8 bytes.
        if (g_flags.output_format == file_format::json) {
          TRY(write_dst((vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__FROM_LIST)
                            ? "\"[…]\""
                            : "\"{…}\"",
                        7));
        } else {
          TRY(write_dst((vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__FROM_LIST)
                            ? "\x65[…]"
                            : "\x65{…}",
                        6));
        }
      } else if (g_flags.output_format == file_format::json) {
        // Write preceding whitespace.
        if ((g_ctx != context::in_list_after_bracket) &&
            (g_ctx != context::in_dict_after_brace) &&
            !g_flags.compact_output) {
          if (g_flags.output_json_extra_comma) {
            TRY(write_dst(",\n", 2));
          } else {
            TRY(write_dst("\n", 1));
          }
          for (uint32_t i = 0; i < g_depth; i++) {
            TRY(write_dst(
                g_flags.tabs ? INDENT_TAB_STRING : INDENT_SPACES_STRING,
                g_flags.tabs ? 1 : g_flags.spaces));
          }
        }

        TRY(write_dst(
            (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__FROM_LIST) ? "]" : "}",
            1));
      } else {
        TRY(write_dst("\xFF", 1));
      }

      g_ctx = (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__TO_LIST)
                  ? context::in_list_after_value
                  : context::in_dict_after_key;
      goto after_value;
    }

    // Write preceding whitespace and punctuation, if it wasn't ']', '}' or a
    // continuation of a multi-token chain.
    if (start_of_token_chain) {
      if (g_flags.output_format != file_format::json) {
        // No-op.
      } else if (g_ctx == context::in_dict_after_key) {
        TRY(write_dst(": ", g_flags.compact_output ? 1 : 2));
      } else if (g_ctx != context::none) {
        if ((g_ctx != context::in_list_after_bracket) &&
            (g_ctx != context::in_dict_after_brace)) {
          TRY(write_dst(",", 1));
        }
        if (!g_flags.compact_output) {
          TRY(write_dst("\n", 1));
          for (size_t i = 0; i < g_depth; i++) {
            TRY(write_dst(
                g_flags.tabs ? INDENT_TAB_STRING : INDENT_SPACES_STRING,
                g_flags.tabs ? 1 : g_flags.spaces));
          }
        }
      }

      bool query_matched_fragment = false;
      if (g_query.is_at(g_depth)) {
        switch (g_ctx) {
          case context::in_list_after_bracket:
          case context::in_list_after_value:
            query_matched_fragment = g_query.tick();
            break;
          case context::in_dict_after_key:
            query_matched_fragment = g_query.matched_fragment();
            break;
          default:
            break;
        }
      }
      if (!query_matched_fragment) {
        // No-op.
      } else if (!g_query.next_fragment()) {
        // There is no next fragment. We have matched the complete query, and
        // the upcoming JSON value is the result of that query.
        //
        // Un-suppress writing to stdout and reset the g_ctx and g_depth as if
        // we were about to decode a top-level value. This makes any subsequent
        // indentation be relative to this point, and we will return g_eod
        // after the upcoming JSON value is complete.
        if (g_suppress_write_dst != 1) {
          return "main: internal error: inconsistent g_suppress_write_dst";
        }
        g_suppress_write_dst = 0;
        g_ctx = context::none;
        g_depth = 0;
      } else if ((vbc != WUFFS_BASE__TOKEN__VBC__STRUCTURE) ||
                 !(vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__PUSH)) {
        // The query has moved on to the next fragment but the upcoming JSON
        // value is not a container.
        return "main: no match for query";
      }
    }

    // Handle the token itself: either a container ('[' or '{') or a simple
    // value: string (a chain of raw or escaped parts), literal or number.
    switch (vbc) {
      case WUFFS_BASE__TOKEN__VBC__STRUCTURE:
        if (g_query.matched_all() && (g_depth >= g_flags.max_output_depth)) {
          g_suppress_write_dst++;
        } else if (g_flags.output_format == file_format::json) {
          TRY(write_dst(
              (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__TO_LIST) ? "[" : "{",
              1));
        } else {
          TRY(write_dst((vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__TO_LIST)
                            ? "\x9F"
                            : "\xBF",
                        1));
        }
        g_depth++;
        g_ctx = (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__TO_LIST)
                    ? context::in_list_after_bracket
                    : context::in_dict_after_brace;
        return nullptr;

      case WUFFS_BASE__TOKEN__VBC__STRING:
        TRY(handle_string(vbd, len, start_of_token_chain, t.continued()));
        if (t.continued()) {
          return nullptr;
        }
        goto after_value;

      case WUFFS_BASE__TOKEN__VBC__UNICODE_CODE_POINT:
        if (!t.continued()) {
          return "main: internal error: unexpected non-continued UCP token";
        }
        TRY(handle_unicode_code_point(vbd));
        g_query.incremental_match_code_point(vbd);
        return nullptr;

      case WUFFS_BASE__TOKEN__VBC__LITERAL:
        TRY(write_literal(vbd));
        goto after_value;

      case WUFFS_BASE__TOKEN__VBC__NUMBER:
        TRY(write_number(vbd, g_src.data.ptr + g_curr_token_end_src_index - len,
                         len));
        goto after_value;

      case WUFFS_BASE__TOKEN__VBC__INLINE_INTEGER_SIGNED:
      case WUFFS_BASE__TOKEN__VBC__INLINE_INTEGER_UNSIGNED: {
        bool x_is_signed = vbc == WUFFS_BASE__TOKEN__VBC__INLINE_INTEGER_SIGNED;
        uint64_t x = x_is_signed
                         ? ((uint64_t)(t.value_base_detail__sign_extended()))
                         : vbd;
        if (t.continued()) {
          if (len != 0) {
            return "main: internal error: unexpected to-be-extended length";
          }
          g_token_extension.category = vbc;
          g_token_extension.detail = x;
          return nullptr;
        }
        TRY(write_inline_integer(
            x, x_is_signed, g_src.data.ptr + g_curr_token_end_src_index - len,
            len));
        goto after_value;
      }
    }

    int64_t ext = t.value_extension();
    if (ext >= 0) {
      switch (g_token_extension.category) {
        case WUFFS_BASE__TOKEN__VBC__INLINE_INTEGER_SIGNED:
        case WUFFS_BASE__TOKEN__VBC__INLINE_INTEGER_UNSIGNED:
          uint64_t x = (g_token_extension.detail
                        << WUFFS_BASE__TOKEN__VALUE_EXTENSION__NUM_BITS) |
                       ((uint64_t)ext);
          TRY(write_inline_integer(
              x,
              g_token_extension.category ==
                  WUFFS_BASE__TOKEN__VBC__INLINE_INTEGER_SIGNED,
              g_src.data.ptr + g_curr_token_end_src_index - len, len));
          g_token_extension.category = 0;
          g_token_extension.detail = 0;
          goto after_value;
      }
    }

    if (t.value_major() == WUFFS_CBOR__TOKEN_VALUE_MAJOR) {
      uint64_t value_minor = t.value_minor();
      if (value_minor & WUFFS_CBOR__TOKEN_VALUE_MINOR__TAG) {
        // TODO: CBOR tags.
      } else if (value_minor & WUFFS_CBOR__TOKEN_VALUE_MINOR__MINUS_1_MINUS_X) {
        TRY(write_cbor_minus_1_minus_x(
            g_src.data.ptr + g_curr_token_end_src_index - len, len));
        goto after_value;
      }
    }

    // Return an error if we didn't match the (value_major, value_minor) or
    // (vbc, vbd) pair.
    return "main: internal error: unexpected token";
  } while (0);

  // Book-keeping after completing a value (whether a container value or a
  // simple value). Empty parent containers are no longer empty. If the parent
  // container is a "{...}" object, toggle between keys and values.
after_value:
  if (g_depth == 0) {
    return g_eod;
  }
  switch (g_ctx) {
    case context::in_list_after_bracket:
      g_ctx = context::in_list_after_value;
      break;
    case context::in_dict_after_brace:
      g_ctx = context::in_dict_after_key;
      break;
    case context::in_dict_after_key:
      g_ctx = context::in_dict_after_value;
      break;
    case context::in_dict_after_value:
      g_ctx = context::in_dict_after_key;
      break;
    default:
      break;
  }
  return nullptr;
}
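
/*
The book-keeping under the after_value label is a small state machine over
g_ctx. As a minimal sketch (not part of this program), assuming a stand-in
context enum and a hypothetical next_context helper, the same transitions can
be written as a pure function:

```cpp
#include <cassert>

// Hypothetical stand-in for this file's context enum. The real definition is
// earlier in the file and may differ.
enum class context {
  none,
  in_list_after_bracket,
  in_list_after_value,
  in_dict_after_brace,
  in_dict_after_key,
  in_dict_after_value,
};

// Hypothetical helper: returns the context that follows a completed value,
// mirroring the switch under the after_value label.
context next_context(context c) {
  switch (c) {
    case context::in_list_after_bracket:
      return context::in_list_after_value;
    case context::in_dict_after_brace:
    case context::in_dict_after_value:
      return context::in_dict_after_key;
    case context::in_dict_after_key:
      return context::in_dict_after_value;
    default:
      // in_list_after_value and none are left unchanged.
      return c;
  }
}
```

Under these assumptions, a "{...}" object alternates between
in_dict_after_key and in_dict_after_value as each key and value completes,
while a list settles on in_list_after_value after its first element.
*/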

const char*  //
main1(int argc, char** argv) {
  TRY(initialize_globals(argc, argv));

  bool start_of_token_chain = true;
  while (true) {
    wuffs_base__status status = g_dec->decode_tokens(
        &g_tok, &g_src,
        wuffs_base__make_slice_u8(g_work_buffer_array, WORK_BUFFER_ARRAY_SIZE));

    while (g_tok.meta.ri < g_tok.meta.wi) {
      wuffs_base__token t = g_tok.data.ptr[g_tok.meta.ri++];
      uint64_t n = t.length();
      if ((g_src.meta.ri - g_curr_token_end_src_index) < n) {
        return "main: internal error: inconsistent g_src indexes";
      }
      g_curr_token_end_src_index += n;

      // Skip filler tokens (e.g. whitespace).
      if (t.value_base_category() == WUFFS_BASE__TOKEN__VBC__FILLER) {
        start_of_token_chain = !t.continued();
        continue;
      }

      const char* z = handle_token(t, start_of_token_chain);
      start_of_token_chain = !t.continued();
      if (z == nullptr) {
        continue;
      } else if (z == g_eod) {
        goto end_of_data;
      }
      return z;
    }

    if (status.repr == nullptr) {
      return "main: internal error: unexpected end of token stream";
    } else if (status.repr == wuffs_base__suspension__short_read) {
      if (g_curr_token_end_src_index != g_src.meta.ri) {
        return "main: internal error: inconsistent g_src indexes";
      }
      TRY(read_src());
      g_curr_token_end_src_index = g_src.meta.ri;
    } else if (status.repr == wuffs_base__suspension__short_write) {
      g_tok.compact();
    } else {
      return status.message();
    }
  }
end_of_data:

  // With a non-empty g_query, don't try to consume trailing whitespace or
  // confirm that we've processed all the tokens.
  if (g_flags.query_c_string && *g_flags.query_c_string) {
    return nullptr;
  }

  // Check that we've exhausted the input.
  if ((g_src.meta.ri == g_src.meta.wi) && !g_src.meta.closed) {
    TRY(read_src());
  }
  if ((g_src.meta.ri < g_src.meta.wi) || !g_src.meta.closed) {
    return "main: valid JSON|CBOR followed by further (unexpected) data";
  }

  // Check that we've used all of the decoded tokens, other than trailing
  // filler tokens. For example, "true\n" is valid JSON (and fully consumed
  // with WUFFS_JSON__QUIRK_ALLOW_TRAILING_NEW_LINE enabled) with a trailing
  // filler token for the "\n".
  for (; g_tok.meta.ri < g_tok.meta.wi; g_tok.meta.ri++) {
    if (g_tok.data.ptr[g_tok.meta.ri].value_base_category() !=
        WUFFS_BASE__TOKEN__VBC__FILLER) {
      return "main: internal error: decoded OK but unprocessed tokens remain";
    }
  }

  return nullptr;
}

int  //
compute_exit_code(const char* status_msg) {
  if (!status_msg) {
    return 0;
  }
  size_t n;
  if (status_msg == g_usage) {
    n = strlen(status_msg);
  } else {
    n = strnlen(status_msg, 2047);
    if (n >= 2047) {
      status_msg = "main: internal error: error message is too long";
      n = strnlen(status_msg, 2047);
    }
  }
  const int stderr_fd = 2;
  ignore_return_value(write(stderr_fd, status_msg, n));
  ignore_return_value(write(stderr_fd, "\n", 1));
  // Return an exit code of 1 for regular (foreseen) errors, e.g. badly
  // formatted or unsupported input.
  //
  // Return an exit code of 2 for internal (exceptional) errors, e.g. defensive
  // run-time checks found that an internal invariant did not hold.
  //
  // Automated testing, including badly formatted inputs, can therefore
  // discriminate between expected failure (exit code 1) and unexpected failure
  // (other non-zero exit codes). Specifically, exit code 2 for internal
  // invariant violation, exit code 139 (which is 128 + SIGSEGV on x86_64
  // linux) for a segmentation fault (e.g. null pointer dereference).
  return strstr(status_msg, "internal error:") ? 2 : 1;
}
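
/*
The exit code convention above depends only on whether the status message
contains the substring "internal error:". A minimal sketch of that
classification step alone, assuming a hypothetical classify_status helper
that skips the write to stderr and the message length cap:

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper, not part of this program: maps a status message to the
// exit code that compute_exit_code would return for it.
int classify_status(const char* status_msg) {
  if (!status_msg) {
    return 0;  // Success.
  }
  // Internal (exceptional) errors get 2. Regular (foreseen) errors get 1.
  return std::strstr(status_msg, "internal error:") ? 2 : 1;
}
```

With this sketch, "main: no match for query" maps to 1 and
"main: internal error: unexpected token" maps to 2, which is what lets
automated tests tell expected failures from invariant violations.
*/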

int  //
main(int argc, char** argv) {
  // Look for an input filename (the first non-flag argument) in argv. If there
  // is one, open it (but do not read from it) before we self-impose a sandbox.
  //
  // Flags start with "-", unless it comes after a bare "--" arg.
  {
    bool dash_dash = false;
    int a;
    for (a = 1; a < argc; a++) {
      char* arg = argv[a];
      if ((arg[0] == '-') && !dash_dash) {
        dash_dash = (arg[1] == '-') && (arg[2] == '\x00');
        continue;
      }
      g_input_file_descriptor = open(arg, O_RDONLY);
      if (g_input_file_descriptor < 0) {
        fprintf(stderr, "%s: %s\n", arg, strerror(errno));
        return 1;
      }
      break;
    }
  }

#if defined(WUFFS_EXAMPLE_USE_SECCOMP)
  prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT);
  g_sandboxed = true;
#endif

  const char* z = main1(argc, argv);
  if (g_wrote_to_dst) {
    const char* z1 = (g_flags.output_format == file_format::json)
                         ? write_dst("\n", 1)
                         : nullptr;
    const char* z2 = flush_dst();
    z = z ? z : (z1 ? z1 : z2);
  }
  int exit_code = compute_exit_code(z);

#if defined(WUFFS_EXAMPLE_USE_SECCOMP)
  // Call SYS_exit explicitly, instead of calling SYS_exit_group implicitly by
  // either calling _exit or returning from main. SECCOMP_MODE_STRICT allows
  // only SYS_exit.
  syscall(SYS_exit, exit_code);
#endif
  return exit_code;
}
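
/*
The filename scan at the top of main can be read in isolation. A minimal
sketch, assuming a hypothetical first_non_flag_arg helper that returns an
argv index instead of opening a file (the flag and file names in the usage
note below are made up for illustration):

```cpp
#include <cassert>

// Hypothetical helper, not part of this program: returns the index of the
// first non-flag argument, or -1 if there is none. An argument starting with
// "-" is a flag unless a bare "--" argument has already been seen.
int first_non_flag_arg(int argc, const char* const* argv) {
  bool dash_dash = false;
  for (int a = 1; a < argc; a++) {
    const char* arg = argv[a];
    if ((arg[0] == '-') && !dash_dash) {
      // A bare "--" ends flag processing for the remaining arguments.
      dash_dash = (arg[1] == '-') && (arg[2] == '\x00');
      continue;
    }
    return a;
  }
  return -1;
}
```

Under this sketch, {"prog", "-some-flag", "in.json"} yields index 2, and
{"prog", "--", "-in.json"} also yields index 2 because the bare "--" makes
the following argument a filename even though it starts with "-".
*/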