// Copyright 2020 The Wuffs Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//    https://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

// ----------------

/*
jsonptr is a JSON formatter (pretty-printer) that supports the JSON Pointer
(RFC 6901) query syntax. It reads CBOR or UTF-8 JSON from stdin and writes CBOR
or canonicalized, formatted UTF-8 JSON to stdout.

See the "const char* g_usage" string below for details.

----

JSON Pointer (and this program's implementation) is one of many JSON query
languages and JSON tools, such as jq, jql and JMESPath. This one is relatively
simple and fewer-featured compared to those others.

One benefit of simplicity is that this program's CBOR, JSON and JSON Pointer
implementations do not dynamically allocate or free memory (yet it does not
require that the entire input fits in memory at once). They are therefore
trivially protected against certain bug classes: memory leaks, double-frees and
use-after-frees.

The CBOR and JSON implementations are also written in the Wuffs programming
language (and then transpiled to C/C++), which is memory-safe (e.g. array
indexing is bounds-checked) but also prevents integer arithmetic overflows.

For defense in depth, on Linux, this program also self-imposes a
SECCOMP_MODE_STRICT sandbox before reading (or otherwise processing) its input
or writing its output. Under this sandbox, the only permitted system calls are
read, write, exit and sigreturn.

All together, this program aims to safely handle untrusted CBOR or JSON files
without fear of security bugs such as remote code execution.

----

As of 2020-02-24, this program passes all 318 "test_parsing" cases from the
JSON test suite (https://github.com/nst/JSONTestSuite), an appendix to the
"Parsing JSON is a Minefield" article (http://seriot.ch/parsing_json.php) that
was first published on 2016-10-26 and updated on 2018-03-30.

After modifying this program, run "build-example.sh example/jsonptr/" and then
"script/run-json-test-suite.sh" to catch correctness regressions.

----

This program uses Wuffs' JSON decoder at a relatively low level, processing the
decoder's token-stream output individually. The core loop, in pseudo-code, is
"for_each_token { handle_token(etc); }", where the handle_token function
changes global state (e.g. the `g_depth` and `g_ctx` variables) and prints
output text based on that state and the token's source text. Notably,
handle_token is not recursive, even though JSON values can nest.
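
The token-centered loop described above can be sketched in isolation. This is
only an illustration under simplifying assumptions, not this program's actual
code: the Token, Ctx and Formatter types and the format_tokens function are
hypothetical, only lists and scalars are modeled (no dictionaries), and the
token stream is assumed to be well formed. It shows how a depth counter and a
small context enum can replace recursion.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified token. The real tokens come from Wuffs' decoders.
struct Token {
  enum Kind { open_list, close_list, scalar } kind;
  std::string text;  // Source text, used only for scalars.
};

enum class Ctx { none, after_bracket, after_value };

struct Formatter {
  uint32_t depth = 0;
  Ctx ctx = Ctx::none;
  std::string out;

  void newline_and_indent() {
    out += '\n';
    out.append(2 * depth, ' ');
  }

  // handle_token updates (depth, ctx) and appends to out. It never recurses:
  // nesting is tracked by the depth counter alone.
  void handle_token(const Token& t) {
    if (t.kind == Token::close_list) {
      depth--;
      if (ctx == Ctx::after_value) {
        newline_and_indent();
      }
      out += ']';
      ctx = Ctx::after_value;
      return;
    }
    if (ctx == Ctx::after_value) {
      out += ',';
    }
    if (ctx != Ctx::none) {
      newline_and_indent();
    }
    if (t.kind == Token::open_list) {
      out += '[';
      depth++;
      ctx = Ctx::after_bracket;
    } else {
      out += t.text;
      ctx = Ctx::after_value;
    }
  }
};

// format_tokens is the "for_each_token { handle_token(etc); }" loop.
std::string format_tokens(const std::vector<Token>& tokens) {
  Formatter f;
  for (const Token& t : tokens) {
    f.handle_token(t);
  }
  return f.out;
}
```

Because only lists are modeled, the context after a closing bracket is always
"after a value". With dictionaries as well, the enclosing container's kind
would also have to be recovered at that point.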

This approach is centered around JSON tokens. Each JSON 'thing' (e.g. number,
string, object) comprises one or more JSON tokens.

An alternative, higher-level approach is in the sibling example/jsonfindptrs
program. Neither approach is better or worse per se, but when studying this
program, be aware that there are multiple ways to use Wuffs' JSON decoder.

The two programs, jsonfindptrs and jsonptr, also demonstrate different
trade-offs with regard to JSON object duplicate keys. The JSON spec permits
different implementations to allow or reject duplicate keys. It is not always
clear which approach is safer. Rejecting them is certainly unambiguous, and
security bugs can lurk in ambiguous corners of a file format, if two different
implementations both silently accept a file but differ on how to interpret it.
On the other hand, in the worst case, detecting duplicate keys requires O(N)
memory, where N is the size of the (potentially untrusted) input.

This program (jsonptr) allows duplicate keys and requires only O(1) memory. As
mentioned above, it doesn't dynamically allocate memory at all, and on Linux,
it runs in a SECCOMP_MODE_STRICT sandbox.

----

This example program differs from most other example Wuffs programs in that it
is written in C++, not C.

    $CXX jsonptr.cc && ./a.out < ../../test/data/github-tags.json; rm -f a.out

for a C++ compiler $CXX, such as clang++ or g++.
*/

#if defined(__cplusplus) && (__cplusplus < 201103L)
#error "This C++ program requires -std=c++11 or later"
#endif

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

// Wuffs ships as a "single file C library" or "header file library" as per
// https://github.com/nothings/stb/blob/master/docs/stb_howto.txt
//
// To use that single file as a "foo.c"-like implementation, instead of a
// "foo.h"-like header, #define WUFFS_IMPLEMENTATION before #include'ing or
// compiling it.
#define WUFFS_IMPLEMENTATION

// Defining the WUFFS_CONFIG__MODULE* macros is optional, but it lets users of
// release/c/etc.c whitelist which parts of Wuffs to build. That file contains
// the entire Wuffs standard library, implementing a variety of codecs and file
// formats. Without this macro definition, an optimizing compiler or linker may
// very well discard Wuffs code for unused codecs, but listing the Wuffs
// modules we use makes that process explicit. Preprocessing means that such
// code simply isn't compiled.
#define WUFFS_CONFIG__MODULES
#define WUFFS_CONFIG__MODULE__BASE
#define WUFFS_CONFIG__MODULE__CBOR
#define WUFFS_CONFIG__MODULE__JSON

// If building this program in an environment that doesn't easily accommodate
// relative includes, you can use the script/inline-c-relative-includes.go
// program to generate a stand-alone C++ file.
#include "../../release/c/wuffs-unsupported-snapshot.c"

#if defined(__linux__)
#include <linux/prctl.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#define WUFFS_EXAMPLE_USE_SECCOMP
#endif

#define TRY(error_msg)         \
  do {                         \
    const char* z = error_msg; \
    if (z) {                   \
      return z;                \
    }                          \
  } while (false)

static const char* g_eod = "main: end of data";

static const char* g_usage =
    "Usage: jsonptr -flags input.json\n"
    "\n"
    "Flags:\n"
    "    -c      -compact-output\n"
    "    -d=NUM  -max-output-depth=NUM\n"
    "    -i=FMT  -input-format={json,cbor}\n"
    "    -o=FMT  -output-format={json,cbor}\n"
    "    -q=STR  -query=STR\n"
    "    -s=NUM  -spaces=NUM\n"
    "    -t      -tabs\n"
    "            -fail-if-unsandboxed\n"
    "            -input-allow-json-comments\n"
    "            -input-allow-json-extra-comma\n"
    "            -input-allow-json-inf-nan-numbers\n"
    "            -output-cbor-metadata-as-json-comments\n"
    "            -output-json-extra-comma\n"
    "            -strict-json-pointer-syntax\n"
    "\n"
    "The input.json filename is optional. If absent, it reads from stdin.\n"
    "\n"
    "----\n"
    "\n"
    "jsonptr is a JSON formatter (pretty-printer) that supports the JSON\n"
    "Pointer (RFC 6901) query syntax. It reads CBOR or UTF-8 JSON from stdin\n"
    "and writes CBOR or canonicalized, formatted UTF-8 JSON to stdout. The\n"
    "input and output formats do not have to match, but conversion between\n"
    "formats may be lossy.\n"
    "\n"
    "Canonicalized means that e.g. \"abc\\u000A\\tx\\u0177z\" is re-written\n"
    "as \"abc\\n\\txŷz\". It does not sort object keys, nor does it reject\n"
    "duplicate keys. Canonicalization does not imply Unicode normalization.\n"
    "\n"
    "Formatted means that arrays' and objects' elements are indented, each\n"
    "on its own line. Configure this with the -c / -compact-output, -s=NUM /\n"
    "-spaces=NUM (for NUM ranging from 0 to 8) and -t / -tabs flags. Those\n"
    "flags only apply to JSON (not CBOR) output.\n"
    "\n"
    "The -input-format and -output-format flags select between reading and\n"
    "writing JSON (the default, a textual format) or CBOR (a binary format).\n"
    "\n"
    "The -input-allow-json-comments flag allows \"/*slash-star*/\" and\n"
    "\"//slash-slash\" C-style comments within JSON input.\n"
    "\n"
    "The -input-allow-json-extra-comma flag allows input like \"[1,2,]\",\n"
    "with a comma after the final element of a JSON list or dictionary.\n"
    "\n"
    "The -input-allow-json-inf-nan-numbers flag allows non-finite floating\n"
    "point numbers (infinities and not-a-numbers) within JSON input.\n"
    "\n"
    "The -output-cbor-metadata-as-json-comments flag writes CBOR tags and\n"
    "other metadata as /*comments*/, when -i=cbor and -o=json are also set.\n"
    "Such comments are non-compliant with the JSON specification but many\n"
    "parsers accept them.\n"
    "\n"
    "The -output-json-extra-comma flag writes extra commas, regardless of\n"
    "whether the input had them. Extra commas are non-compliant with the JSON\n"
    "specification but many parsers accept them and they can produce simpler\n"
    "line-based diffs. This flag is ignored when -compact-output is set.\n"
    "\n"
    "----\n"
    "\n"
    "The -q=STR or -query=STR flag gives an optional JSON Pointer query, to\n"
    "print a subset of the input. For example, given RFC 6901 section 5's\n"
    "sample input (https://tools.ietf.org/rfc/rfc6901.txt), this command:\n"
    "    jsonptr -query=/foo/1 rfc-6901-json-pointer.json\n"
    "will print:\n"
    "    \"baz\"\n"
    "\n"
    "An absent query is equivalent to the empty query, which identifies the\n"
    "entire input (the root value). Unlike a file system, the \"/\" query\n"
    "does not identify the root. Instead, \"\" is the root and \"/\" is the\n"
    "child (the value in a key-value pair) of the root whose key is the empty\n"
    "string. Similarly, \"/xyz\" and \"/xyz/\" are two different nodes.\n"
    "\n"
    "If the query found a valid JSON|CBOR value, this program will return a\n"
    "zero exit code even if the rest of the input isn't valid. If the query\n"
    "did not find a value, or found an invalid one, this program returns a\n"
    "non-zero exit code, but may still print partial output to stdout.\n"
    "\n"
    "The JSON and CBOR specifications (https://json.org/ or RFC 8259; RFC\n"
    "7049) permit implementations to allow duplicate keys, as this one does.\n"
    "This JSON Pointer implementation is also greedy, following the first\n"
    "match for each fragment without back-tracking. For example, the\n"
    "\"/foo/bar\" query will fail if the root object has multiple \"foo\"\n"
    "children but the first one doesn't have a \"bar\" child, even if later\n"
    "ones do.\n"
    "\n"
    "The -strict-json-pointer-syntax flag restricts the -query=STR string to\n"
    "exactly RFC 6901, with only two escape sequences: \"~0\" and \"~1\" for\n"
    "\"~\" and \"/\". Without this flag, this program also lets \"~n\" and\n"
    "\"~r\" escape the New Line and Carriage Return ASCII control characters,\n"
    "which can work better with line oriented Unix tools that assume exactly\n"
    "one value (i.e. one JSON Pointer string) per line.\n"
    "\n"
    "----\n"
    "\n"
    "The -d=NUM or -max-output-depth=NUM flag gives the maximum (inclusive)\n"
    "output depth. JSON|CBOR containers ([] arrays and {} objects) can hold\n"
    "other containers. When this flag is set, containers at depth NUM are\n"
    "replaced with \"[…]\" or \"{…}\". A bare -d or -max-output-depth is\n"
    "equivalent to -d=1. The flag's absence means an unlimited output depth.\n"
    "\n"
    "The -max-output-depth flag only affects the program's output. It doesn't\n"
    "affect whether or not the input is considered valid JSON|CBOR. The\n"
    "format specifications permit implementations to set their own maximum\n"
    "input depth. This JSON|CBOR implementation sets it to 1024.\n"
    "\n"
    "Depth is measured in terms of nested containers. It is unaffected by the\n"
    "number of spaces or tabs used to indent.\n"
    "\n"
    "When both -max-output-depth and -query are set, the output depth is\n"
    "measured from when the query resolves, not from the input root. The\n"
    "input depth (measured from the root) is still limited to 1024.\n"
    "\n"
    "----\n"
    "\n"
    "The -fail-if-unsandboxed flag causes the program to exit if it does not\n"
    "self-impose a sandbox. On Linux, it self-imposes a SECCOMP_MODE_STRICT\n"
    "sandbox, regardless of whether this flag was set.";

// ----

// Wuffs allows either statically or dynamically allocated work buffers. This
// program exercises static allocation.
#define WORK_BUFFER_ARRAY_SIZE \
  WUFFS_JSON__DECODER_WORKBUF_LEN_MAX_INCL_WORST_CASE
#if WORK_BUFFER_ARRAY_SIZE > 0
uint8_t g_work_buffer_array[WORK_BUFFER_ARRAY_SIZE];
#else
// Not all C/C++ compilers support 0-length arrays.
uint8_t g_work_buffer_array[1];
#endif

bool g_sandboxed = false;

int g_input_file_descriptor = 0;  // A 0 default means stdin.

#define MAX_INDENT 8
#define INDENT_SPACES_STRING "        "
#define INDENT_TAB_STRING "\t"

#ifndef DST_BUFFER_ARRAY_SIZE
#define DST_BUFFER_ARRAY_SIZE (32 * 1024)
#endif
#ifndef SRC_BUFFER_ARRAY_SIZE
#define SRC_BUFFER_ARRAY_SIZE (32 * 1024)
#endif
#ifndef TOKEN_BUFFER_ARRAY_SIZE
#define TOKEN_BUFFER_ARRAY_SIZE (4 * 1024)
#endif

uint8_t g_dst_array[DST_BUFFER_ARRAY_SIZE];
uint8_t g_src_array[SRC_BUFFER_ARRAY_SIZE];
wuffs_base__token g_tok_array[TOKEN_BUFFER_ARRAY_SIZE];

wuffs_base__io_buffer g_dst;
wuffs_base__io_buffer g_src;
wuffs_base__token_buffer g_tok;

// g_curr_token_end_src_index is the g_src.data.ptr index of the end of the
// current token. An invariant is that (g_curr_token_end_src_index <=
// g_src.meta.ri).
size_t g_curr_token_end_src_index;

uint32_t g_depth;

enum class context {
  none,
  in_list_after_bracket,
  in_list_after_value,
  in_dict_after_brace,
  in_dict_after_key,
  in_dict_after_value,
} g_ctx;

bool  //
in_dict_before_key() {
  return (g_ctx == context::in_dict_after_brace) ||
         (g_ctx == context::in_dict_after_value);
}

uint32_t g_suppress_write_dst;
bool g_wrote_to_dst;

wuffs_cbor__decoder g_cbor_decoder;
wuffs_json__decoder g_json_decoder;
wuffs_base__token_decoder* g_dec;

// g_cbor_output_string_array is a 4 KiB buffer. For -output-format=cbor,
// strings whose length is 4096 bytes or less are written as a single
// definite-length string. Longer strings are written as an indefinite-length
// string containing multiple definite-length chunks, each of length up to 4
// KiB. See the CBOR RFC (RFC 7049) section 2.2.2 "Indefinite-Length Byte
// Strings and Text Strings". The output is determinate even when the input is
// streamed.
//
// If raising CBOR_OUTPUT_STRING_ARRAY_SIZE above 0xFFFF then you will also
// have to update flush_cbor_output_string.
#define CBOR_OUTPUT_STRING_ARRAY_SIZE 4096
uint8_t g_cbor_output_string_array[CBOR_OUTPUT_STRING_ARRAY_SIZE];

uint32_t g_cbor_output_string_length;
bool g_cbor_output_string_is_multiple_chunks;
bool g_cbor_output_string_is_utf_8;
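
/*
The definite-length versus indefinite-length rule described above can be
sketched as a stand-alone function. This is an illustration only, not this
program's flush_cbor_output_string: encode_text and append_text_header are
hypothetical names, the whole string is assumed to be available at once, and
the input is assumed to be ASCII so that splitting at arbitrary byte offsets
cannot cut a UTF-8 sequence in half. Only text strings (CBOR major type 3)
are handled.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Appends the CBOR header for a definite-length text string (major type 3).
// This sketch only handles lengths up to 0xFFFF.
static void append_text_header(std::vector<uint8_t>& out, size_t n) {
  if (n < 24) {
    out.push_back(static_cast<uint8_t>(0x60 | n));
  } else if (n < 256) {
    out.push_back(0x78);
    out.push_back(static_cast<uint8_t>(n));
  } else {
    out.push_back(0x79);
    out.push_back(static_cast<uint8_t>(n >> 8));
    out.push_back(static_cast<uint8_t>(n & 0xFF));
  }
}

// chunk_size plays the role of CBOR_OUTPUT_STRING_ARRAY_SIZE. It must be in
// the range 1 to 0xFFFF inclusive.
std::vector<uint8_t> encode_text(const std::string& s, size_t chunk_size) {
  std::vector<uint8_t> out;
  if (s.size() <= chunk_size) {
    // Short enough: one definite-length string.
    append_text_header(out, s.size());
    out.insert(out.end(), s.begin(), s.end());
    return out;
  }
  out.push_back(0x7F);  // Start of an indefinite-length text string.
  for (size_t i = 0; i < s.size(); i += chunk_size) {
    size_t n = s.size() - i;
    if (n > chunk_size) {
      n = chunk_size;
    }
    append_text_header(out, n);
    out.insert(out.end(), s.begin() + i, s.begin() + i + n);
  }
  out.push_back(0xFF);  // The "break" stop code.
  return out;
}
```

The two-byte length form (0x79) is why a chunk size above 0xFFFF would need a
wider header in this sketch.
*/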

// ----

// Query is a JSON Pointer query. After initializing with a NUL-terminated C
// string, its multiple fragments are consumed as the program walks the JSON
// data from stdin. For example, letting "$" denote a NUL, suppose that we
// started with a query string of "/apple/banana/12/durian" and are currently
// trying to match the second fragment, "banana", so that Query::m_depth is 2:
//
//  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
//  / a p p l e / b a n a n a / 1 2 / d u r i a n $
//  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
//                ^           ^
//                m_frag_i    m_frag_k
//
// The two pointers m_frag_i and m_frag_k (abbreviated as mfi and mfk) are the
// start (inclusive) and end (exclusive) of the query fragment. They satisfy
// (mfi <= mfk) and may be equal if the fragment is empty (note that "" is a
// valid JSON object key).
//
// The m_frag_j (mfj) pointer moves between these two, or is nullptr. An
// invariant is that (((mfi <= mfj) && (mfj <= mfk)) || (mfj == nullptr)).
//
// Wuffs' JSON tokenizer can portray a single JSON string as multiple Wuffs
// tokens, as backslash-escaped values within that JSON string may each get
// their own token.
//
// At the start of each object key (a JSON string), mfj is set to mfi.
//
// While mfj remains non-nullptr, each token's unescaped contents are then
// compared to that part of the fragment from mfj to mfk. If it is a prefix
// (including the case of an exact match), then mfj is advanced by the
// unescaped length. Otherwise, mfj is set to nullptr.
//
// Comparison accounts for JSON Pointer's escaping notation: "~0" and "~1" in
// the query (not the JSON value) are unescaped to "~" and "/" respectively.
// "~n" and "~r" are also unescaped to "\n" and "\r". The program is
// responsible for calling Query::validate (with a strict_json_pointer_syntax
// argument) before otherwise using this class.
//
// The mfj pointer therefore advances from mfi to mfk, or drops out, as we
// incrementally match the object key with the query fragment. For example, if
// we have already matched the "ban" of "banana", then we would accept any of
// an "ana" token, an "a" token or a "\u0061" token, amongst others. They would
// advance mfj by 3, 1 or 1 bytes respectively.
//
//                      mfj
//                      v
//  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
//  / a p p l e / b a n a n a / 1 2 / d u r i a n $
//  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
//                ^           ^
//                mfi         mfk
//
// At the end of each object key (or equivalently, at the start of each object
// value), if mfj is non-nullptr and equal to (but not less than) mfk then we
// have a fragment match: the query fragment equals the object key. If there is
// a next fragment (in this example, "12") we move the frag_etc pointers to its
// start and end and increment Query::m_depth. Otherwise, we have matched the
// complete query, and the upcoming JSON value is the result of that query.
//
// The discussion above centers on object keys. If the query fragment is
// numeric then it can also match as an array index: the string fragment "12"
// will match an array's 13th element (starting counting from zero). See RFC
// 6901 for its precise definition of an "array index" number.
//
// Array index fragment match is represented by the Query::m_array_index field,
// whose type (wuffs_base__result_u64) is a result type. An error result means
// that the fragment is not an array index. A value result holds the number of
// list elements remaining. When matching a query fragment in an array (instead
// of in an object), each element ticks this number down towards zero. At zero,
// the upcoming JSON value is the one that matches the query fragment.
class Query {
 private:
  uint8_t* m_frag_i;
  uint8_t* m_frag_j;
  uint8_t* m_frag_k;

  uint32_t m_depth;

  wuffs_base__result_u64 m_array_index;

 public:
  void reset(char* query_c_string) {
    m_frag_i = (uint8_t*)query_c_string;
    m_frag_j = (uint8_t*)query_c_string;
    m_frag_k = (uint8_t*)query_c_string;
    m_depth = 0;
    m_array_index.status.repr = "#main: not an array index query fragment";
    m_array_index.value = 0;
  }

  void restart_fragment(bool enable) { m_frag_j = enable ? m_frag_i : nullptr; }

  bool is_at(uint32_t depth) { return m_depth == depth; }

  // tick returns whether the fragment is a valid array index whose value is
  // zero. If valid but non-zero, it decrements it and returns false.
  bool tick() {
    if (m_array_index.status.is_ok()) {
      if (m_array_index.value == 0) {
        return true;
      }
      m_array_index.value--;
    }
    return false;
  }

  // next_fragment moves to the next fragment, returning whether it existed.
  bool next_fragment() {
    uint8_t* k = m_frag_k;
    uint32_t d = m_depth;

    this->reset(nullptr);

    if (!k || (*k != '/')) {
      return false;
    }
    k++;

    bool all_digits = true;
    uint8_t* i = k;
    while ((*k != '\x00') && (*k != '/')) {
      all_digits = all_digits && ('0' <= *k) && (*k <= '9');
      k++;
    }
    m_frag_i = i;
    m_frag_j = i;
    m_frag_k = k;
    m_depth = d + 1;
    if (all_digits) {
      // wuffs_base__parse_number_u64 rejects leading zeroes, e.g. "00", "07".
      m_array_index = wuffs_base__parse_number_u64(
          wuffs_base__make_slice_u8(i, k - i),
          WUFFS_BASE__PARSE_NUMBER_XXX__DEFAULT_OPTIONS);
    }
    return true;
  }

  bool matched_all() { return m_frag_k == nullptr; }

  bool matched_fragment() { return m_frag_j && (m_frag_j == m_frag_k); }

  void incremental_match_slice(uint8_t* ptr, size_t len) {
    if (!m_frag_j) {
      return;
    }
    uint8_t* j = m_frag_j;
    while (true) {
      if (len == 0) {
        m_frag_j = j;
        return;
      }

      if (*j == '\x00') {
        break;

      } else if (*j == '~') {
        j++;
        if (*j == '0') {
          if (*ptr != '~') {
            break;
          }
        } else if (*j == '1') {
          if (*ptr != '/') {
            break;
          }
        } else if (*j == 'n') {
          if (*ptr != '\n') {
            break;
          }
        } else if (*j == 'r') {
          if (*ptr != '\r') {
            break;
          }
        } else {
          break;
        }

      } else if (*j != *ptr) {
        break;
      }

      j++;
      ptr++;
      len--;
    }
    m_frag_j = nullptr;
  }

  void incremental_match_code_point(uint32_t code_point) {
    if (!m_frag_j) {
      return;
    }
    uint8_t u[WUFFS_BASE__UTF_8__BYTE_LENGTH__MAX_INCL];
    size_t n = wuffs_base__utf_8__encode(
        wuffs_base__make_slice_u8(&u[0],
                                  WUFFS_BASE__UTF_8__BYTE_LENGTH__MAX_INCL),
        code_point);
    if (n > 0) {
      this->incremental_match_slice(&u[0], n);
    }
  }

  // validate returns whether the (query_c_string, length) arguments form a
  // valid JSON Pointer. In particular, it must be valid UTF-8, and either be
  // empty or start with a '/'. Any '~' within must immediately be followed by
  // either '0' or '1'. If strict_json_pointer_syntax is false, a '~' may also
  // be followed by either 'n' or 'r'.
  static bool validate(char* query_c_string,
                       size_t length,
                       bool strict_json_pointer_syntax) {
    if (length <= 0) {
      return true;
    }
    if (query_c_string[0] != '/') {
      return false;
    }
    wuffs_base__slice_u8 s =
        wuffs_base__make_slice_u8((uint8_t*)query_c_string, length);
    bool previous_was_tilde = false;
    while (s.len > 0) {
      wuffs_base__utf_8__next__output o = wuffs_base__utf_8__next(s);
      if (!o.is_valid()) {
        return false;
      }

      if (previous_was_tilde) {
        switch (o.code_point) {
          case '0':
          case '1':
            break;
          case 'n':
          case 'r':
            if (strict_json_pointer_syntax) {
              return false;
            }
            break;
          default:
            return false;
        }
      }
      previous_was_tilde = o.code_point == '~';

      s.ptr += o.byte_length;
      s.len -= o.byte_length;
    }
    return !previous_was_tilde;
  }
} g_query;
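
/*
The incremental fragment matching that the Query class performs can be
sketched without Wuffs. The FragmentMatcher class below is hypothetical and
simplified: it holds a single, already-isolated fragment (no '/' separators)
as a std::string, uses an index instead of the mfi, mfj and mfk pointers, and
assumes the fragment has already been validated. It accepts an object key
that arrives in several unescaped chunks, much as a key split across multiple
tokens would.

```cpp
#include <cstddef>
#include <string>
#include <utility>

class FragmentMatcher {
 public:
  explicit FragmentMatcher(std::string fragment)
      : m_fragment(std::move(fragment)), m_j(0), m_alive(true) {}

  // feed consumes one chunk of the unescaped key. The m_j index plays the
  // role of mfj and m_alive == false plays the role of mfj == nullptr.
  void feed(const std::string& chunk) {
    for (char c : chunk) {
      if (!m_alive || (m_j >= m_fragment.size())) {
        m_alive = false;
        return;
      }
      char want = m_fragment[m_j++];
      if (want == '~') {
        if (m_j >= m_fragment.size()) {
          m_alive = false;
          return;
        }
        switch (m_fragment[m_j++]) {
          case '0': want = '~'; break;
          case '1': want = '/'; break;
          case 'n': want = '\n'; break;
          case 'r': want = '\r'; break;
          default: m_alive = false; return;
        }
      }
      if (c != want) {
        m_alive = false;
        return;
      }
    }
  }

  // matched returns whether the chunks fed so far equal the whole fragment.
  bool matched() const { return m_alive && (m_j == m_fragment.size()); }

 private:
  std::string m_fragment;
  size_t m_j;
  bool m_alive;
};
```

For example, feeding "ban", "a" and "na" to a matcher for "banana" ends in a
match, while feeding only "ban" does not, since a prefix alone is not a
fragment match.
*/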

// ----

enum class file_format {
  json,
  cbor,
};

struct {
  int remaining_argc;
  char** remaining_argv;

  bool compact_output;
  bool fail_if_unsandboxed;
  file_format input_format;
  bool input_allow_json_comments;
  bool input_allow_json_extra_comma;
  bool input_allow_json_inf_nan_numbers;
  uint32_t max_output_depth;
  file_format output_format;
  bool output_cbor_metadata_as_json_comments;
  bool output_json_extra_comma;
  char* query_c_string;
  size_t spaces;
  bool strict_json_pointer_syntax;
  bool tabs;
} g_flags = {0};
Nigel Tao68920952020-03-03 11:25:18 +1100631
const char*  //
parse_flags(int argc, char** argv) {
  g_flags.spaces = 4;
  g_flags.max_output_depth = 0xFFFFFFFF;

  int c = (argc > 0) ? 1 : 0;  // Skip argv[0], the program name.
  for (; c < argc; c++) {
    char* arg = argv[c];
    if (*arg++ != '-') {
      break;
    }

    // A double-dash "--foo" is equivalent to a single-dash "-foo". As special
    // cases, a bare "-" is not a flag (some programs may interpret it as
    // stdin) and a bare "--" means to stop parsing flags.
    if (*arg == '\x00') {
      break;
    } else if (*arg == '-') {
      arg++;
      if (*arg == '\x00') {
        c++;
        break;
      }
    }

    if (!strcmp(arg, "c") || !strcmp(arg, "compact-output")) {
      g_flags.compact_output = true;
      continue;
    }
    if (!strcmp(arg, "d") || !strcmp(arg, "max-output-depth")) {
      g_flags.max_output_depth = 1;
      continue;
    } else if (!strncmp(arg, "d=", 2) ||
               !strncmp(arg, "max-output-depth=", 17)) {
      // 17 is the length of "max-output-depth=", so the '=' is part of the
      // match and the loop below is guaranteed to find it.
      while (*arg++ != '=') {
      }
      wuffs_base__result_u64 u = wuffs_base__parse_number_u64(
          wuffs_base__make_slice_u8((uint8_t*)arg, strlen(arg)),
          WUFFS_BASE__PARSE_NUMBER_XXX__DEFAULT_OPTIONS);
      if (u.status.is_ok() && (u.value <= 0xFFFFFFFF)) {
        g_flags.max_output_depth = (uint32_t)(u.value);
        continue;
      }
      return g_usage;
    }
    if (!strcmp(arg, "fail-if-unsandboxed")) {
      g_flags.fail_if_unsandboxed = true;
      continue;
    }
    if (!strcmp(arg, "i=cbor") || !strcmp(arg, "input-format=cbor")) {
      g_flags.input_format = file_format::cbor;
      continue;
    }
    if (!strcmp(arg, "i=json") || !strcmp(arg, "input-format=json")) {
      g_flags.input_format = file_format::json;
      continue;
    }
    if (!strcmp(arg, "input-allow-json-comments")) {
      g_flags.input_allow_json_comments = true;
      continue;
    }
    if (!strcmp(arg, "input-allow-json-extra-comma")) {
      g_flags.input_allow_json_extra_comma = true;
      continue;
    }
    if (!strcmp(arg, "input-allow-json-inf-nan-numbers")) {
      g_flags.input_allow_json_inf_nan_numbers = true;
      continue;
    }
    if (!strcmp(arg, "o=cbor") || !strcmp(arg, "output-format=cbor")) {
      g_flags.output_format = file_format::cbor;
      continue;
    }
    if (!strcmp(arg, "o=json") || !strcmp(arg, "output-format=json")) {
      g_flags.output_format = file_format::json;
      continue;
    }
    if (!strcmp(arg, "output-cbor-metadata-as-json-comments")) {
      g_flags.output_cbor_metadata_as_json_comments = true;
      continue;
    }
    if (!strcmp(arg, "output-json-extra-comma")) {
      g_flags.output_json_extra_comma = true;
      continue;
    }
    if (!strncmp(arg, "q=", 2) || !strncmp(arg, "query=", 6)) {
      while (*arg++ != '=') {
      }
      g_flags.query_c_string = arg;
      continue;
    }
    if (!strncmp(arg, "s=", 2) || !strncmp(arg, "spaces=", 7)) {
      while (*arg++ != '=') {
      }
      if (('0' <= arg[0]) && (arg[0] <= '8') && (arg[1] == '\x00')) {
        g_flags.spaces = arg[0] - '0';
        continue;
      }
      return g_usage;
    }
    if (!strcmp(arg, "strict-json-pointer-syntax")) {
      g_flags.strict_json_pointer_syntax = true;
      continue;
    }
    if (!strcmp(arg, "t") || !strcmp(arg, "tabs")) {
      g_flags.tabs = true;
      continue;
    }

    return g_usage;
  }

  if (g_flags.query_c_string &&
      !Query::validate(g_flags.query_c_string, strlen(g_flags.query_c_string),
                       g_flags.strict_json_pointer_syntax)) {
    return "main: bad JSON Pointer (RFC 6901) syntax for the -query=STR flag";
  }

  g_flags.remaining_argc = argc - c;
  g_flags.remaining_argv = argv + c;
  return nullptr;
}

const char*  //
initialize_globals(int argc, char** argv) {
  g_dst = wuffs_base__make_io_buffer(
      wuffs_base__make_slice_u8(g_dst_array, DST_BUFFER_ARRAY_SIZE),
      wuffs_base__empty_io_buffer_meta());

  g_src = wuffs_base__make_io_buffer(
      wuffs_base__make_slice_u8(g_src_array, SRC_BUFFER_ARRAY_SIZE),
      wuffs_base__empty_io_buffer_meta());

  g_tok = wuffs_base__make_token_buffer(
      wuffs_base__make_slice_token(g_tok_array, TOKEN_BUFFER_ARRAY_SIZE),
      wuffs_base__empty_token_buffer_meta());

  g_curr_token_end_src_index = 0;

  g_depth = 0;

  g_ctx = context::none;

  TRY(parse_flags(argc, argv));
  if (g_flags.fail_if_unsandboxed && !g_sandboxed) {
    return "main: unsandboxed";
  }
  const int stdin_fd = 0;
  if (g_flags.remaining_argc >
      ((g_input_file_descriptor != stdin_fd) ? 1 : 0)) {
    return g_usage;
  }

  g_query.reset(g_flags.query_c_string);

  // If the query is non-empty, suppress writing to stdout until we've
  // completed the query.
  g_suppress_write_dst = g_query.next_fragment() ? 1 : 0;
  g_wrote_to_dst = false;

  if (g_flags.input_format == file_format::json) {
    TRY(g_json_decoder
            .initialize(sizeof__wuffs_json__decoder(), WUFFS_VERSION, 0)
            .message());
    g_dec = g_json_decoder.upcast_as__wuffs_base__token_decoder();
  } else {
    TRY(g_cbor_decoder
            .initialize(sizeof__wuffs_cbor__decoder(), WUFFS_VERSION, 0)
            .message());
    g_dec = g_cbor_decoder.upcast_as__wuffs_base__token_decoder();
  }

  if (g_flags.input_allow_json_comments) {
    g_dec->set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_COMMENT_BLOCK, true);
    g_dec->set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_COMMENT_LINE, true);
  }
  if (g_flags.input_allow_json_extra_comma) {
    g_dec->set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_EXTRA_COMMA, true);
  }
  if (g_flags.input_allow_json_inf_nan_numbers) {
    g_dec->set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_INF_NAN_NUMBERS, true);
  }

  // Consume an optional whitespace trailer. This isn't part of the JSON spec,
  // but it works better with line oriented Unix tools (such as "echo 123 |
  // jsonptr" where it's "echo", not "echo -n") or hand-edited JSON files which
  // can accidentally contain trailing whitespace.
  g_dec->set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_TRAILING_NEW_LINE, true);

  return nullptr;
}

// ----

// ignore_return_value suppresses errors from -Wall -Werror.
static void  //
ignore_return_value(int ignored) {}

const char*  //
read_src() {
  if (g_src.meta.closed) {
    return "main: internal error: read requested on a closed source";
  }
  g_src.compact();
  if (g_src.meta.wi >= g_src.data.len) {
    return "main: g_src buffer is full";
  }
  while (true) {
    ssize_t n = read(g_input_file_descriptor, g_src.data.ptr + g_src.meta.wi,
                     g_src.data.len - g_src.meta.wi);
    if (n >= 0) {
      g_src.meta.wi += n;
      g_src.meta.closed = n == 0;
      break;
    } else if (errno != EINTR) {
      return strerror(errno);
    }
  }
  return nullptr;
}

const char*  //
flush_dst() {
  while (true) {
    size_t n = g_dst.meta.wi - g_dst.meta.ri;
    if (n == 0) {
      break;
    }
    const int stdout_fd = 1;
    ssize_t i = write(stdout_fd, g_dst.data.ptr + g_dst.meta.ri, n);
    if (i >= 0) {
      g_dst.meta.ri += i;
    } else if (errno != EINTR) {
      return strerror(errno);
    }
  }
  g_dst.compact();
  return nullptr;
}

const char*  //
write_dst(const void* s, size_t n) {
  if (g_suppress_write_dst > 0) {
    return nullptr;
  }
  const uint8_t* p = static_cast<const uint8_t*>(s);
  while (n > 0) {
    size_t i = g_dst.writer_available();
    if (i == 0) {
      const char* z = flush_dst();
      if (z) {
        return z;
      }
      i = g_dst.writer_available();
      if (i == 0) {
        return "main: g_dst buffer is full";
      }
    }

    if (i > n) {
      i = n;
    }
    memcpy(g_dst.data.ptr + g_dst.meta.wi, p, i);
    g_dst.meta.wi += i;
    p += i;
    n -= i;
    g_wrote_to_dst = true;
  }
  return nullptr;
}

// ----

const char*  //
write_literal(uint64_t vbd) {
  const char* ptr = nullptr;
  size_t len = 0;
  if (vbd & WUFFS_BASE__TOKEN__VBD__LITERAL__UNDEFINED) {
    if (g_flags.output_format == file_format::json) {
      // JSON's closest approximation to "undefined" is "null".
      if (g_flags.output_cbor_metadata_as_json_comments) {
        ptr = "/*cbor:undefined*/null";
        len = 22;
      } else {
        ptr = "null";
        len = 4;
      }
    } else {
      ptr = "\xF7";
      len = 1;
    }
  } else if (vbd & WUFFS_BASE__TOKEN__VBD__LITERAL__NULL) {
    if (g_flags.output_format == file_format::json) {
      ptr = "null";
      len = 4;
    } else {
      ptr = "\xF6";
      len = 1;
    }
  } else if (vbd & WUFFS_BASE__TOKEN__VBD__LITERAL__FALSE) {
    if (g_flags.output_format == file_format::json) {
      ptr = "false";
      len = 5;
    } else {
      ptr = "\xF4";
      len = 1;
    }
  } else if (vbd & WUFFS_BASE__TOKEN__VBD__LITERAL__TRUE) {
    if (g_flags.output_format == file_format::json) {
      ptr = "true";
      len = 4;
    } else {
      ptr = "\xF5";
      len = 1;
    }
  } else {
    return "main: internal error: unexpected write_literal argument";
  }
  return write_dst(ptr, len);
}

// ----

const char*  //
write_number_as_cbor_f64(double f) {
  uint8_t buf[9];
  wuffs_base__lossy_value_u16 lv16 =
      wuffs_base__ieee_754_bit_representation__from_f64_to_u16_truncate(f);
  if (!lv16.lossy) {
    buf[0] = 0xF9;
    wuffs_base__store_u16be__no_bounds_check(&buf[1], lv16.value);
    return write_dst(&buf[0], 3);
  }
  wuffs_base__lossy_value_u32 lv32 =
      wuffs_base__ieee_754_bit_representation__from_f64_to_u32_truncate(f);
  if (!lv32.lossy) {
    buf[0] = 0xFA;
    wuffs_base__store_u32be__no_bounds_check(&buf[1], lv32.value);
    return write_dst(&buf[0], 5);
  }
  buf[0] = 0xFB;
  wuffs_base__store_u64be__no_bounds_check(
      &buf[1], wuffs_base__ieee_754_bit_representation__from_f64_to_u64(f));
  return write_dst(&buf[0], 9);
}

const char*  //
write_number_as_cbor_u64(uint8_t base, uint64_t u) {
  uint8_t buf[9];
  if (u < 0x18) {
    buf[0] = base | ((uint8_t)u);
    return write_dst(&buf[0], 1);
  } else if ((u >> 8) == 0) {
    buf[0] = base | 0x18;
    buf[1] = ((uint8_t)u);
    return write_dst(&buf[0], 2);
  } else if ((u >> 16) == 0) {
    buf[0] = base | 0x19;
    wuffs_base__store_u16be__no_bounds_check(&buf[1], ((uint16_t)u));
    return write_dst(&buf[0], 3);
  } else if ((u >> 32) == 0) {
    buf[0] = base | 0x1A;
    wuffs_base__store_u32be__no_bounds_check(&buf[1], ((uint32_t)u));
    return write_dst(&buf[0], 5);
  }
  buf[0] = base | 0x1B;
  wuffs_base__store_u64be__no_bounds_check(&buf[1], u);
  return write_dst(&buf[0], 9);
}

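// The function below is an illustrative sketch and is not part of the
// original program. It shows the same shortest-form CBOR integer encoding
// that write_number_as_cbor_u64 performs, but it writes into a caller-supplied
// buffer (assumed to hold at least 9 bytes) instead of calling write_dst, and
// it uses plain shifts instead of the wuffs_base__store_uNNbe helpers. The
// name example_encode_cbor_head is hypothetical. As in the function above,
// base is 0x00 for an unsigned integer and 0x20 for a negative one.
#include <stddef.h>
#include <stdint.h>

static size_t  //
example_encode_cbor_head(uint8_t base, uint64_t u, uint8_t* dst9) {
  size_t num_extra_bytes;
  if (u < 0x18) {
    // Small values are packed into the low 5 bits of the first byte.
    dst9[0] = (uint8_t)(base | ((uint8_t)u));
    return 1;
  } else if ((u >> 8) == 0) {
    dst9[0] = (uint8_t)(base | 0x18);
    num_extra_bytes = 1;
  } else if ((u >> 16) == 0) {
    dst9[0] = (uint8_t)(base | 0x19);
    num_extra_bytes = 2;
  } else if ((u >> 32) == 0) {
    dst9[0] = (uint8_t)(base | 0x1A);
    num_extra_bytes = 4;
  } else {
    dst9[0] = (uint8_t)(base | 0x1B);
    num_extra_bytes = 8;
  }
  // The argument follows in big-endian (network) byte order.
  for (size_t i = 0; i < num_extra_bytes; i++) {
    dst9[1 + i] = (uint8_t)(u >> (8 * (num_extra_bytes - 1 - i)));
  }
  return 1 + num_extra_bytes;
}
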
const char*  //
write_cbor_number_as_json(uint8_t* ptr,
                          size_t len,
                          bool ignore_first_byte,
                          bool minus_1_minus_x) {
  if (ignore_first_byte) {
    if (len == 0) {
      return "main: internal error: ignore_first_byte with no bytes";
    }
    ptr++;
    len--;
  }
  uint64_t u;
  switch (len) {
    case 1:
      u = wuffs_base__load_u8__no_bounds_check(ptr);
      break;
    case 2:
      u = wuffs_base__load_u16be__no_bounds_check(ptr);
      break;
    case 4:
      u = wuffs_base__load_u32be__no_bounds_check(ptr);
      break;
    case 8:
      u = wuffs_base__load_u64be__no_bounds_check(ptr);
      break;
    default:
      return "main: internal error: unexpected cbor number byte length";
  }
  uint8_t buf[1 + WUFFS_BASE__U64__BYTE_LENGTH__MAX_INCL];
  uint8_t* b = &buf[0];
  if (minus_1_minus_x) {
    u++;
    if (u == 0) {
      // See the cbor.TOKEN_VALUE_MINOR__MINUS_1_MINUS_X comment re overflow.
      return write_dst("-18446744073709551616", 21);
    }
    *b++ = '-';
  }
  size_t n = wuffs_base__render_number_u64(
      wuffs_base__make_slice_u8(b, WUFFS_BASE__U64__BYTE_LENGTH__MAX_INCL), u,
      WUFFS_BASE__RENDER_NUMBER_XXX__DEFAULT_OPTIONS);
  return write_dst(&buf[0], n + (minus_1_minus_x ? 1 : 0));
}

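// The function below is an illustrative sketch and is not part of the
// original program. CBOR encodes a negative integer as an unsigned argument x
// that stands for the value (-1 - x). The sketch renders that value as
// decimal text, including the one case that write_cbor_number_as_json
// special-cases: when x is UINT64_MAX, (x + 1) wraps around to zero and the
// result, -(2 to the power 64), is written out as a literal. It uses snprintf
// rather than wuffs_base__render_number_u64, and the name
// example_render_minus_1_minus_x is hypothetical. It returns what snprintf
// returns: the number of characters written, not counting the trailing NUL.
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

static int  //
example_render_minus_1_minus_x(uint64_t x, char* dst, size_t dst_len) {
  uint64_t u = x + 1;  // Unsigned arithmetic wraps around on overflow.
  if (u == 0) {
    return snprintf(dst, dst_len, "-18446744073709551616");
  }
  return snprintf(dst, dst_len, "-%llu", (unsigned long long)u);
}
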
const char*  //
write_number(uint64_t vbd, uint8_t* ptr, size_t len) {
  if (g_flags.output_format == file_format::json) {
    if (g_flags.input_format == file_format::json) {
      return write_dst(ptr, len);
    } else if ((vbd &
                WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_INTEGER_UNSIGNED) &&
               (vbd &
                WUFFS_BASE__TOKEN__VBD__NUMBER__FORMAT_BINARY_BIG_ENDIAN)) {
      return write_cbor_number_as_json(
          ptr, len,
          vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__FORMAT_IGNORE_FIRST_BYTE,
          false);
    }

    // From here on, (g_flags.output_format == file_format::cbor).
  } else if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__FORMAT_BINARY_BIG_ENDIAN) {
    return write_dst(ptr, len);
  } else if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__FORMAT_TEXT) {
    // First try to parse (ptr, len) as an integer. Something like
    // "1180591620717411303424" is a valid number (in the JSON sense) but will
    // overflow int64_t or uint64_t, so fall back to parsing it as a float64.
    if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_INTEGER_SIGNED) {
      if ((len > 0) && (ptr[0] == '-')) {
        wuffs_base__result_i64 ri = wuffs_base__parse_number_i64(
            wuffs_base__make_slice_u8(ptr, len),
            WUFFS_BASE__PARSE_NUMBER_XXX__DEFAULT_OPTIONS);
        if (ri.status.is_ok()) {
          return write_number_as_cbor_u64(0x20, ~ri.value);
        }
      } else {
        wuffs_base__result_u64 ru = wuffs_base__parse_number_u64(
            wuffs_base__make_slice_u8(ptr, len),
            WUFFS_BASE__PARSE_NUMBER_XXX__DEFAULT_OPTIONS);
        if (ru.status.is_ok()) {
          return write_number_as_cbor_u64(0x00, ru.value);
        }
      }
    }

    if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_FLOATING_POINT) {
      wuffs_base__result_f64 rf = wuffs_base__parse_number_f64(
          wuffs_base__make_slice_u8(ptr, len),
          WUFFS_BASE__PARSE_NUMBER_XXX__DEFAULT_OPTIONS);
      if (rf.status.is_ok()) {
        return write_number_as_cbor_f64(rf.value);
      }
    }
  } else if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_NEG_INF) {
    return write_dst("\xF9\xFC\x00", 3);
  } else if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_POS_INF) {
    return write_dst("\xF9\x7C\x00", 3);
  } else if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_NEG_NAN) {
    return write_dst("\xF9\xFF\xFF", 3);
  } else if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_POS_NAN) {
    return write_dst("\xF9\x7F\xFF", 3);
  }

fail:
  return "main: internal error: unexpected write_number argument";
}

const char*  //
write_inline_integer(uint64_t x, bool x_is_signed, uint8_t* ptr, size_t len) {
  if (g_flags.output_format == file_format::cbor) {
    return write_dst(ptr, len);
  }

  // Adding the two ETC__BYTE_LENGTH__ETC constants is overkill, but it's
  // simpler (for producing a constant-expression array size) than taking the
  // maximum of the two.
  uint8_t buf[WUFFS_BASE__I64__BYTE_LENGTH__MAX_INCL +
              WUFFS_BASE__U64__BYTE_LENGTH__MAX_INCL];
  wuffs_base__slice_u8 dst = wuffs_base__make_slice_u8(&buf[0], sizeof buf);
  size_t n =
      x_is_signed
          ? wuffs_base__render_number_i64(
                dst, (int64_t)x, WUFFS_BASE__RENDER_NUMBER_XXX__DEFAULT_OPTIONS)
          : wuffs_base__render_number_u64(
                dst, x, WUFFS_BASE__RENDER_NUMBER_XXX__DEFAULT_OPTIONS);
  return write_dst(&buf[0], n);
}

// ----

uint8_t  //
hex_digit(uint8_t nibble) {
  nibble &= 0x0F;
  if (nibble <= 9) {
    return '0' + nibble;
  }
  return ('A' - 10) + nibble;
}

const char*  //
flush_cbor_output_string() {
  uint8_t prefix[3];
  prefix[0] = g_cbor_output_string_is_utf_8 ? 0x60 : 0x40;
  if (g_cbor_output_string_length < 0x18) {
    prefix[0] |= g_cbor_output_string_length;
    TRY(write_dst(&prefix[0], 1));
  } else if (g_cbor_output_string_length <= 0xFF) {
    prefix[0] |= 0x18;
    prefix[1] = g_cbor_output_string_length;
    TRY(write_dst(&prefix[0], 2));
  } else if (g_cbor_output_string_length <= 0xFFFF) {
    prefix[0] |= 0x19;
    prefix[1] = g_cbor_output_string_length >> 8;
    prefix[2] = g_cbor_output_string_length;
    TRY(write_dst(&prefix[0], 3));
  } else {
    return "main: internal error: CBOR string output is too long";
  }

  size_t n = g_cbor_output_string_length;
  g_cbor_output_string_length = 0;
  return write_dst(&g_cbor_output_string_array[0], n);
}

const char*  //
write_cbor_output_string(uint8_t* ptr, size_t len, bool finish) {
  // Check that g_cbor_output_string_array can hold any UTF-8 code point.
  if (CBOR_OUTPUT_STRING_ARRAY_SIZE < 4) {
    return "main: internal error: CBOR_OUTPUT_STRING_ARRAY_SIZE is too short";
  }

  while (len > 0) {
    size_t available =
        CBOR_OUTPUT_STRING_ARRAY_SIZE - g_cbor_output_string_length;
    if (available >= len) {
      memcpy(&g_cbor_output_string_array[g_cbor_output_string_length], ptr,
             len);
      g_cbor_output_string_length += len;
      ptr += len;
      len = 0;
      break;

    } else if (available > 0) {
      if (!g_cbor_output_string_is_multiple_chunks) {
        g_cbor_output_string_is_multiple_chunks = true;
        TRY(write_dst(g_cbor_output_string_is_utf_8 ? "\x7F" : "\x5F", 1));
      }

      if (g_cbor_output_string_is_utf_8) {
        // Walk the end backwards to a UTF-8 boundary, so that each chunk of
        // the multi-chunk string is also valid UTF-8.
        while (available > 0) {
          wuffs_base__utf_8__next__output o = wuffs_base__utf_8__next_from_end(
              wuffs_base__make_slice_u8(ptr, available));
          if ((o.code_point != WUFFS_BASE__UNICODE_REPLACEMENT_CHARACTER) ||
              (o.byte_length != 1)) {
            break;
          }
          available--;
        }
      }

      memcpy(&g_cbor_output_string_array[g_cbor_output_string_length], ptr,
             available);
      g_cbor_output_string_length += available;
      ptr += available;
      len -= available;
    }

    TRY(flush_cbor_output_string());
  }

  if (finish) {
    TRY(flush_cbor_output_string());
    if (g_cbor_output_string_is_multiple_chunks) {
      TRY(write_dst("\xFF", 1));
    }
  }
  return nullptr;
}

const char*  //
handle_unicode_code_point(uint32_t ucp) {
  if (g_flags.output_format == file_format::json) {
    if (ucp < 0x0020) {
      switch (ucp) {
        case '\b':
          return write_dst("\\b", 2);
        case '\f':
          return write_dst("\\f", 2);
        case '\n':
          return write_dst("\\n", 2);
        case '\r':
          return write_dst("\\r", 2);
        case '\t':
          return write_dst("\\t", 2);
      }

      // Other bytes less than 0x0020 are valid UTF-8 but not valid in a
      // JSON string. They need to remain escaped.
      uint8_t esc6[6];
      esc6[0] = '\\';
      esc6[1] = 'u';
      esc6[2] = '0';
      esc6[3] = '0';
      esc6[4] = hex_digit(ucp >> 4);
      esc6[5] = hex_digit(ucp >> 0);
      return write_dst(&esc6[0], 6);

    } else if (ucp == '\"') {
      return write_dst("\\\"", 2);

    } else if (ucp == '\\') {
      return write_dst("\\\\", 2);
    }
  }

  uint8_t u[WUFFS_BASE__UTF_8__BYTE_LENGTH__MAX_INCL];
  size_t n = wuffs_base__utf_8__encode(
      wuffs_base__make_slice_u8(&u[0],
                                WUFFS_BASE__UTF_8__BYTE_LENGTH__MAX_INCL),
      ucp);
  if (n == 0) {
    return "main: internal error: unexpected Unicode code point";
  }

  if (g_flags.output_format == file_format::json) {
    return write_dst(&u[0], n);
  }
  return write_cbor_output_string(&u[0], n, false);
}

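// The function below is an illustrative sketch and is not part of the
// original program. It restates the JSON branch of handle_unicode_code_point
// for a single byte, writing into a caller-supplied buffer (assumed to hold
// at least 6 bytes) instead of calling write_dst. Bytes of 0x80 and above are
// copied through unchanged, on the assumption that they belong to UTF-8 that
// was validated elsewhere. The name example_json_escape_byte is hypothetical.
#include <stddef.h>
#include <stdint.h>

static size_t  //
example_json_escape_byte(uint8_t c, char* dst6) {
  static const char hex[] = "0123456789ABCDEF";
  char short_escape = 0;
  switch (c) {
    case '\b':
      short_escape = 'b';
      break;
    case '\f':
      short_escape = 'f';
      break;
    case '\n':
      short_escape = 'n';
      break;
    case '\r':
      short_escape = 'r';
      break;
    case '\t':
      short_escape = 't';
      break;
    case '"':
      short_escape = '"';
      break;
    case '\\':
      short_escape = '\\';
      break;
  }
  if (short_escape != 0) {
    dst6[0] = '\\';
    dst6[1] = short_escape;
    return 2;
  }
  if (c < 0x20) {
    // Other control characters have no two-character escape, so they take
    // the six-character "\u00XX" form.
    dst6[0] = '\\';
    dst6[1] = 'u';
    dst6[2] = '0';
    dst6[3] = '0';
    dst6[4] = hex[c >> 4];
    dst6[5] = hex[c & 0x0F];
    return 6;
  }
  dst6[0] = (char)c;
  return 1;
}
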
const char*  //
write_json_escaped_string(uint8_t* ptr, size_t len) {
restart:
  while (true) {
    size_t i;
    for (i = 0; i < len; i++) {
      uint8_t c = ptr[i];
      if ((c == '"') || (c == '\\') || (c < 0x20)) {
        TRY(write_dst(ptr, i));
        TRY(handle_unicode_code_point(c));
        ptr += i + 1;
        len -= i + 1;
        goto restart;
      }
    }
    TRY(write_dst(ptr, len));
    break;
  }
  return nullptr;
}

const char*  //
handle_string(uint64_t vbd,
              uint64_t len,
              bool start_of_token_chain,
              bool continued) {
  if (start_of_token_chain) {
    if (g_flags.output_format == file_format::json) {
      if (g_flags.output_cbor_metadata_as_json_comments &&
          !(vbd & WUFFS_BASE__TOKEN__VBD__STRING__CHAIN_MUST_BE_UTF_8)) {
        TRY(write_dst("/*cbor:hex*/\"", 13));
      } else {
        TRY(write_dst("\"", 1));
      }
    } else {
      g_cbor_output_string_length = 0;
      g_cbor_output_string_is_multiple_chunks = false;
      g_cbor_output_string_is_utf_8 =
          vbd & WUFFS_BASE__TOKEN__VBD__STRING__CHAIN_MUST_BE_UTF_8;
    }
    g_query.restart_fragment(in_dict_before_key() && g_query.is_at(g_depth));
  }

  if (vbd & WUFFS_BASE__TOKEN__VBD__STRING__CONVERT_0_DST_1_SRC_DROP) {
    // No-op.
  } else if (vbd & WUFFS_BASE__TOKEN__VBD__STRING__CONVERT_1_DST_1_SRC_COPY) {
    uint8_t* ptr = g_src.data.ptr + g_curr_token_end_src_index - len;
    if (g_flags.output_format == file_format::json) {
      if (g_flags.input_format == file_format::json) {
        TRY(write_dst(ptr, len));
      } else if (vbd & WUFFS_BASE__TOKEN__VBD__STRING__CHAIN_MUST_BE_UTF_8) {
        TRY(write_json_escaped_string(ptr, len));
      } else {
        uint8_t as_hex[512];
        uint8_t* p = ptr;
        size_t n = len;
        while (n > 0) {
          wuffs_base__transform__output o = wuffs_base__base_16__encode2(
              wuffs_base__make_slice_u8(&as_hex[0], sizeof as_hex),
              wuffs_base__make_slice_u8(p, n), true,
              WUFFS_BASE__BASE_16__DEFAULT_OPTIONS);
          TRY(write_dst(&as_hex[0], o.num_dst));
          p += o.num_src;
          n -= o.num_src;
          if (!o.status.is_ok()) {
            return o.status.message();
          }
        }
      }
    } else {
      TRY(write_cbor_output_string(ptr, len, false));
    }
    g_query.incremental_match_slice(ptr, len);
  } else {
    return "main: internal error: unexpected string-token conversion";
  }

  if (continued) {
    return nullptr;
  }

  if (g_flags.output_format == file_format::json) {
    TRY(write_dst("\"", 1));
  } else {
    TRY(write_cbor_output_string(nullptr, 0, true));
  }
  return nullptr;
}

Nigel Taod191a3f2020-07-19 22:14:54 +10001363// ----
1364
const char*  //
handle_token(wuffs_base__token t, bool start_of_token_chain) {
  do {
    int64_t vbc = t.value_base_category();
    uint64_t vbd = t.value_base_detail();
    uint64_t len = t.length();

    // Handle ']' or '}'.
    if ((vbc == WUFFS_BASE__TOKEN__VBC__STRUCTURE) &&
        (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__POP)) {
      if (g_query.is_at(g_depth)) {
        return "main: no match for query";
      }
      if (g_depth <= 0) {
        return "main: internal error: inconsistent g_depth";
      }
      g_depth--;

      if (g_query.matched_all() && (g_depth >= g_flags.max_output_depth)) {
        g_suppress_write_dst--;
        // '…' is U+2026 HORIZONTAL ELLIPSIS, which is 3 UTF-8 bytes.
        if (g_flags.output_format == file_format::json) {
          TRY(write_dst((vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__FROM_LIST)
                            ? "\"[…]\""
                            : "\"{…}\"",
                        7));
        } else {
          TRY(write_dst((vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__FROM_LIST)
                            ? "\x65[…]"
                            : "\x65{…}",
                        6));
        }
      } else if (g_flags.output_format == file_format::json) {
        // Write preceding whitespace.
        if ((g_ctx != context::in_list_after_bracket) &&
            (g_ctx != context::in_dict_after_brace) &&
            !g_flags.compact_output) {
          if (g_flags.output_json_extra_comma) {
            TRY(write_dst(",\n", 2));
          } else {
            TRY(write_dst("\n", 1));
          }
          for (uint32_t i = 0; i < g_depth; i++) {
            TRY(write_dst(
                g_flags.tabs ? INDENT_TAB_STRING : INDENT_SPACES_STRING,
                g_flags.tabs ? 1 : g_flags.spaces));
          }
        }

        TRY(write_dst(
            (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__FROM_LIST) ? "]" : "}",
            1));
      } else {
        TRY(write_dst("\xFF", 1));
      }

      g_ctx = (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__TO_LIST)
                  ? context::in_list_after_value
                  : context::in_dict_after_key;
      goto after_value;
    }

    // Write preceding whitespace and punctuation, if it wasn't ']', '}' or a
    // continuation of a multi-token chain.
    if (start_of_token_chain) {
      if (g_flags.output_format != file_format::json) {
        // No-op.
      } else if (g_ctx == context::in_dict_after_key) {
        TRY(write_dst(": ", g_flags.compact_output ? 1 : 2));
      } else if (g_ctx != context::none) {
        if ((g_ctx != context::in_list_after_bracket) &&
            (g_ctx != context::in_dict_after_brace)) {
          TRY(write_dst(",", 1));
        }
        if (!g_flags.compact_output) {
          TRY(write_dst("\n", 1));
          for (size_t i = 0; i < g_depth; i++) {
            TRY(write_dst(
                g_flags.tabs ? INDENT_TAB_STRING : INDENT_SPACES_STRING,
                g_flags.tabs ? 1 : g_flags.spaces));
          }
        }
      }

      bool query_matched_fragment = false;
      if (g_query.is_at(g_depth)) {
        switch (g_ctx) {
          case context::in_list_after_bracket:
          case context::in_list_after_value:
            query_matched_fragment = g_query.tick();
            break;
          case context::in_dict_after_key:
            query_matched_fragment = g_query.matched_fragment();
            break;
          default:
            break;
        }
      }
      if (!query_matched_fragment) {
        // No-op.
      } else if (!g_query.next_fragment()) {
        // There is no next fragment. We have matched the complete query, and
        // the upcoming JSON value is the result of that query.
        //
        // Un-suppress writing to stdout and reset the g_ctx and g_depth as if
        // we were about to decode a top-level value. This makes any subsequent
        // indentation be relative to this point, and we will return g_eod
        // after the upcoming JSON value is complete.
        if (g_suppress_write_dst != 1) {
          return "main: internal error: inconsistent g_suppress_write_dst";
        }
        g_suppress_write_dst = 0;
        g_ctx = context::none;
        g_depth = 0;
      } else if ((vbc != WUFFS_BASE__TOKEN__VBC__STRUCTURE) ||
                 !(vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__PUSH)) {
        // The query has moved on to the next fragment but the upcoming JSON
        // value is not a container.
        return "main: no match for query";
      }
    }

    // Handle the token itself: either a container ('[' or '{') or a simple
    // value: string (a chain of raw or escaped parts), literal or number.
    switch (vbc) {
      case WUFFS_BASE__TOKEN__VBC__STRUCTURE:
        if (g_query.matched_all() && (g_depth >= g_flags.max_output_depth)) {
          g_suppress_write_dst++;
        } else if (g_flags.output_format == file_format::json) {
          TRY(write_dst(
              (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__TO_LIST) ? "[" : "{",
              1));
        } else {
          TRY(write_dst((vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__TO_LIST)
                            ? "\x9F"
                            : "\xBF",
                        1));
        }
        g_depth++;
        g_ctx = (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__TO_LIST)
                    ? context::in_list_after_bracket
                    : context::in_dict_after_brace;
        return nullptr;

      case WUFFS_BASE__TOKEN__VBC__STRING:
        TRY(handle_string(vbd, len, start_of_token_chain, t.continued()));
        if (t.continued()) {
          return nullptr;
        }
        goto after_value;

      case WUFFS_BASE__TOKEN__VBC__UNICODE_CODE_POINT:
        if (!t.continued()) {
          return "main: internal error: unexpected non-continued UCP token";
        }
        TRY(handle_unicode_code_point(vbd));
        g_query.incremental_match_code_point(vbd);
        return nullptr;

      case WUFFS_BASE__TOKEN__VBC__LITERAL:
        TRY(write_literal(vbd));
        goto after_value;

      case WUFFS_BASE__TOKEN__VBC__NUMBER:
        TRY(write_number(vbd, g_src.data.ptr + g_curr_token_end_src_index - len,
                         len));
        goto after_value;

      case WUFFS_BASE__TOKEN__VBC__INLINE_INTEGER_SIGNED:
      case WUFFS_BASE__TOKEN__VBC__INLINE_INTEGER_UNSIGNED: {
        bool x_is_signed = vbc == WUFFS_BASE__TOKEN__VBC__INLINE_INTEGER_SIGNED;
        uint64_t x = x_is_signed
                         ? ((uint64_t)(t.value_base_detail__sign_extended()))
                         : vbd;
        TRY(write_inline_integer(
            x, x_is_signed, g_src.data.ptr + g_curr_token_end_src_index - len,
            len));
        goto after_value;
      }
    }

    if (t.value_major() == WUFFS_CBOR__TOKEN_VALUE_MAJOR) {
      uint64_t value_minor = t.value_minor();
      if (value_minor & WUFFS_CBOR__TOKEN_VALUE_MINOR__TAG) {
        // TODO: CBOR tags.
      } else if (value_minor & WUFFS_CBOR__TOKEN_VALUE_MINOR__MINUS_1_MINUS_X) {
        TRY(write_cbor_number_as_json(
            g_src.data.ptr + g_curr_token_end_src_index - len, len, true,
            true));
        goto after_value;
      }
    }

    // Return an error if we didn't match the (value_major, value_minor) or
    // (vbc, vbd) pair.
    return "main: internal error: unexpected token";
  } while (0);

  // Book-keeping after completing a value (whether a container value or a
  // simple value). Empty parent containers are no longer empty. If the parent
  // container is a "{...}" object, toggle between keys and values.
after_value:
  if (g_depth == 0) {
    return g_eod;
  }
  switch (g_ctx) {
    case context::in_list_after_bracket:
      g_ctx = context::in_list_after_value;
      break;
    case context::in_dict_after_brace:
      g_ctx = context::in_dict_after_key;
      break;
    case context::in_dict_after_key:
      g_ctx = context::in_dict_after_value;
      break;
    case context::in_dict_after_value:
      g_ctx = context::in_dict_after_key;
      break;
    default:
      break;
  }
  return nullptr;
}

const char*  //
main1(int argc, char** argv) {
  TRY(initialize_globals(argc, argv));

  bool start_of_token_chain = true;
  while (true) {
    wuffs_base__status status = g_dec->decode_tokens(
        &g_tok, &g_src,
        wuffs_base__make_slice_u8(g_work_buffer_array, WORK_BUFFER_ARRAY_SIZE));

    while (g_tok.meta.ri < g_tok.meta.wi) {
      wuffs_base__token t = g_tok.data.ptr[g_tok.meta.ri++];
      uint64_t n = t.length();
      if ((g_src.meta.ri - g_curr_token_end_src_index) < n) {
        return "main: internal error: inconsistent g_src indexes";
      }
      g_curr_token_end_src_index += n;

      // Skip filler tokens (e.g. whitespace).
      if (t.value_base_category() == WUFFS_BASE__TOKEN__VBC__FILLER) {
        start_of_token_chain = !t.continued();
        continue;
      }

      const char* z = handle_token(t, start_of_token_chain);
      start_of_token_chain = !t.continued();
      if (z == nullptr) {
        continue;
      } else if (z == g_eod) {
        goto end_of_data;
      }
      return z;
    }

    if (status.repr == nullptr) {
      return "main: internal error: unexpected end of token stream";
    } else if (status.repr == wuffs_base__suspension__short_read) {
      if (g_curr_token_end_src_index != g_src.meta.ri) {
        return "main: internal error: inconsistent g_src indexes";
      }
      TRY(read_src());
      g_curr_token_end_src_index = g_src.meta.ri;
    } else if (status.repr == wuffs_base__suspension__short_write) {
      g_tok.compact();
    } else {
      return status.message();
    }
  }
end_of_data:

  // With a non-empty g_query, don't try to consume trailing whitespace or
  // confirm that we've processed all the tokens.
  if (g_flags.query_c_string && *g_flags.query_c_string) {
    return nullptr;
  }

  // Check that we've exhausted the input.
  if ((g_src.meta.ri == g_src.meta.wi) && !g_src.meta.closed) {
    TRY(read_src());
  }
  if ((g_src.meta.ri < g_src.meta.wi) || !g_src.meta.closed) {
    return "main: valid JSON|CBOR followed by further (unexpected) data";
  }

  // Check that we've used all of the decoded tokens, other than trailing
  // filler tokens. For example, "true\n" is valid JSON (and fully consumed
  // with WUFFS_JSON__QUIRK_ALLOW_TRAILING_NEW_LINE enabled) with a trailing
  // filler token for the "\n".
  for (; g_tok.meta.ri < g_tok.meta.wi; g_tok.meta.ri++) {
    if (g_tok.data.ptr[g_tok.meta.ri].value_base_category() !=
        WUFFS_BASE__TOKEN__VBC__FILLER) {
      return "main: internal error: decoded OK but unprocessed tokens remain";
    }
  }

  return nullptr;
}

int  //
compute_exit_code(const char* status_msg) {
  if (!status_msg) {
    return 0;
  }
  size_t n;
  if (status_msg == g_usage) {
    n = strlen(status_msg);
  } else {
    n = strnlen(status_msg, 2047);
    if (n >= 2047) {
      status_msg = "main: internal error: error message is too long";
      n = strnlen(status_msg, 2047);
    }
  }
  const int stderr_fd = 2;
  ignore_return_value(write(stderr_fd, status_msg, n));
  ignore_return_value(write(stderr_fd, "\n", 1));
  // Return an exit code of 1 for regular (foreseen) errors, e.g. badly
  // formatted or unsupported input.
  //
  // Return an exit code of 2 for internal (exceptional) errors, e.g. defensive
  // run-time checks found that an internal invariant did not hold.
  //
  // Automated testing, including badly formatted inputs, can therefore
  // discriminate between expected failure (exit code 1) and unexpected failure
  // (other non-zero exit codes). Specifically, exit code 2 for internal
  // invariant violation, exit code 139 (which is 128 + SIGSEGV on x86_64
  // Linux) for a segmentation fault (e.g. null pointer dereference).
  return strstr(status_msg, "internal error:") ? 2 : 1;
}

int  //
main(int argc, char** argv) {
  // Look for an input filename (the first non-flag argument) in argv. If there
  // is one, open it (but do not read from it) before we self-impose a sandbox.
  //
  // Flags start with "-", unless they come after a bare "--" arg.
  {
    bool dash_dash = false;
    int a;
    for (a = 1; a < argc; a++) {
      char* arg = argv[a];
      if ((arg[0] == '-') && !dash_dash) {
        dash_dash = (arg[1] == '-') && (arg[2] == '\x00');
        continue;
      }
      g_input_file_descriptor = open(arg, O_RDONLY);
      if (g_input_file_descriptor < 0) {
        fprintf(stderr, "%s: %s\n", arg, strerror(errno));
        return 1;
      }
      break;
    }
  }

#if defined(WUFFS_EXAMPLE_USE_SECCOMP)
  prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT);
  g_sandboxed = true;
#endif

  const char* z = main1(argc, argv);
  if (g_wrote_to_dst) {
    const char* z1 = (g_flags.output_format == file_format::json)
                         ? write_dst("\n", 1)
                         : nullptr;
    const char* z2 = flush_dst();
    z = z ? z : (z1 ? z1 : z2);
  }
  int exit_code = compute_exit_code(z);

#if defined(WUFFS_EXAMPLE_USE_SECCOMP)
  // Call SYS_exit explicitly, instead of calling SYS_exit_group implicitly by
  // either calling _exit or returning from main. SECCOMP_MODE_STRICT allows
  // only SYS_exit.
  syscall(SYS_exit, exit_code);
#endif
  return exit_code;
}