blob: beac1461557d2e760cf5aaea6fcc353c5f22b0b6 [file] [log] [blame]
Nigel Tao1b073492020-02-16 22:11:36 +11001// Copyright 2020 The Wuffs Authors.
2//
3// Licensed under the Apache License, Version 2.0 (the "License");
4// you may not use this file except in compliance with the License.
5// You may obtain a copy of the License at
6//
7// https://www.apache.org/licenses/LICENSE-2.0
8//
9// Unless required by applicable law or agreed to in writing, software
10// distributed under the License is distributed on an "AS IS" BASIS,
11// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12// See the License for the specific language governing permissions and
13// limitations under the License.
14
15// ----------------
16
17/*
Nigel Tao0cd2f982020-03-03 23:03:02 +110018jsonptr is a JSON formatter (pretty-printer) that supports the JSON Pointer
Nigel Tao168f60a2020-07-14 13:19:33 +100019(RFC 6901) query syntax. It reads CBOR or UTF-8 JSON from stdin and writes CBOR
20or canonicalized, formatted UTF-8 JSON to stdout.
Nigel Tao0cd2f982020-03-03 23:03:02 +110021
Nigel Taod60815c2020-03-26 14:32:35 +110022See the "const char* g_usage" string below for details.
Nigel Tao0cd2f982020-03-03 23:03:02 +110023
24----
25
26JSON Pointer (and this program's implementation) is one of many JSON query
27languages and JSON tools, such as jq, jql and JMESPath. This one is relatively
28simple and fewer-featured compared to those others.
29
Nigel Tao168f60a2020-07-14 13:19:33 +100030One benefit of simplicity is that this program's CBOR, JSON and JSON Pointer
Nigel Tao0cd2f982020-03-03 23:03:02 +110031implementations do not dynamically allocate or free memory (yet it does not
32require that the entire input fits in memory at once). They are therefore
33trivially protected against certain bug classes: memory leaks, double-frees and
34use-after-frees.
35
Nigel Tao168f60a2020-07-14 13:19:33 +100036The CBOR and JSON implementations are also written in the Wuffs programming
37language (and then transpiled to C/C++), which is memory-safe (e.g. array
38indexing is bounds-checked) but also prevents integer arithmetic overflows.
Nigel Tao0cd2f982020-03-03 23:03:02 +110039
Nigel Taofe0cbbd2020-03-05 22:01:30 +110040For defense in depth, on Linux, this program also self-imposes a
41SECCOMP_MODE_STRICT sandbox before reading (or otherwise processing) its input
42or writing its output. Under this sandbox, the only permitted system calls are
43read, write, exit and sigreturn.
44
Nigel Tao168f60a2020-07-14 13:19:33 +100045All together, this program aims to safely handle untrusted CBOR or JSON files
46without fear of security bugs such as remote code execution.
Nigel Tao0cd2f982020-03-03 23:03:02 +110047
48----
Nigel Tao1b073492020-02-16 22:11:36 +110049
Nigel Taoc5b3a9e2020-02-24 11:54:35 +110050As of 2020-02-24, this program passes all 318 "test_parsing" cases from the
51JSON test suite (https://github.com/nst/JSONTestSuite), an appendix to the
52"Parsing JSON is a Minefield" article (http://seriot.ch/parsing_json.php) that
53was first published on 2016-10-26 and updated on 2018-03-30.
54
Nigel Tao0cd2f982020-03-03 23:03:02 +110055After modifying this program, run "build-example.sh example/jsonptr/" and then
56"script/run-json-test-suite.sh" to catch correctness regressions.
57
58----
59
Nigel Taod0b16cb2020-03-14 10:15:54 +110060This program uses Wuffs' JSON decoder at a relatively low level, processing the
61decoder's token-stream output individually. The core loop, in pseudo-code, is
62"for_each_token { handle_token(etc); }", where the handle_token function
Nigel Taod60815c2020-03-26 14:32:35 +110063changes global state (e.g. the `g_depth` and `g_ctx` variables) and prints
Nigel Taod0b16cb2020-03-14 10:15:54 +110064output text based on that state and the token's source text. Notably,
65handle_token is not recursive, even though JSON values can nest.
66
67This approach is centered around JSON tokens. Each JSON 'thing' (e.g. number,
68string, object) comprises one or more JSON tokens.
69
70An alternative, higher-level approach is in the sibling example/jsonfindptrs
71program. Neither approach is better or worse per se, but when studying this
72program, be aware that there are multiple ways to use Wuffs' JSON decoder.
73
74The two programs, jsonfindptrs and jsonptr, also demonstrate different
75trade-offs with regard to JSON object duplicate keys. The JSON spec permits
76different implementations to allow or reject duplicate keys. It is not always
77clear which approach is safer. Rejecting them is certainly unambiguous, and
78security bugs can lurk in ambiguous corners of a file format, if two different
79implementations both silently accept a file but differ on how to interpret it.
80On the other hand, in the worst case, detecting duplicate keys requires O(N)
81memory, where N is the size of the (potentially untrusted) input.
82
83This program (jsonptr) allows duplicate keys and requires only O(1) memory. As
84mentioned above, it doesn't dynamically allocate memory at all, and on Linux,
85it runs in a SECCOMP_MODE_STRICT sandbox.
86
87----
88
Nigel Tao1b073492020-02-16 22:11:36 +110089This example program differs from most other example Wuffs programs in that it
90is written in C++, not C.
91
92$CXX jsonptr.cc && ./a.out < ../../test/data/github-tags.json; rm -f a.out
93
94for a C++ compiler $CXX, such as clang++ or g++.
95*/
96
Nigel Tao721190a2020-04-03 22:25:21 +110097#if defined(__cplusplus) && (__cplusplus < 201103L)
98#error "This C++ program requires -std=c++11 or later"
99#endif
100
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100101#include <errno.h>
Nigel Tao01abc842020-03-06 21:42:33 +1100102#include <fcntl.h>
103#include <stdio.h>
Nigel Tao9cc2c252020-02-23 17:05:49 +1100104#include <string.h>
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100105#include <unistd.h>
Nigel Tao1b073492020-02-16 22:11:36 +1100106
107// Wuffs ships as a "single file C library" or "header file library" as per
108// https://github.com/nothings/stb/blob/master/docs/stb_howto.txt
109//
110// To use that single file as a "foo.c"-like implementation, instead of a
111// "foo.h"-like header, #define WUFFS_IMPLEMENTATION before #include'ing or
112// compiling it.
113#define WUFFS_IMPLEMENTATION
114
115// Defining the WUFFS_CONFIG__MODULE* macros are optional, but it lets users of
116// release/c/etc.c whitelist which parts of Wuffs to build. That file contains
117// the entire Wuffs standard library, implementing a variety of codecs and file
118// formats. Without this macro definition, an optimizing compiler or linker may
119// very well discard Wuffs code for unused codecs, but listing the Wuffs
120// modules we use makes that process explicit. Preprocessing means that such
121// code simply isn't compiled.
122#define WUFFS_CONFIG__MODULES
123#define WUFFS_CONFIG__MODULE__BASE
Nigel Tao4e193592020-07-15 12:48:57 +1000124#define WUFFS_CONFIG__MODULE__CBOR
Nigel Tao1b073492020-02-16 22:11:36 +1100125#define WUFFS_CONFIG__MODULE__JSON
126
127// If building this program in an environment that doesn't easily accommodate
128// relative includes, you can use the script/inline-c-relative-includes.go
129// program to generate a stand-alone C++ file.
130#include "../../release/c/wuffs-unsupported-snapshot.c"
131
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100132#if defined(__linux__)
133#include <linux/prctl.h>
134#include <linux/seccomp.h>
135#include <sys/prctl.h>
136#include <sys/syscall.h>
137#define WUFFS_EXAMPLE_USE_SECCOMP
138#endif
139
Nigel Tao2cf76db2020-02-27 22:42:01 +1100140#define TRY(error_msg) \
141 do { \
142 const char* z = error_msg; \
143 if (z) { \
144 return z; \
145 } \
146 } while (false)
147
Nigel Taod60815c2020-03-26 14:32:35 +1100148static const char* g_eod = "main: end of data";
Nigel Tao2cf76db2020-02-27 22:42:01 +1100149
Nigel Taod60815c2020-03-26 14:32:35 +1100150static const char* g_usage =
Nigel Tao01abc842020-03-06 21:42:33 +1100151 "Usage: jsonptr -flags input.json\n"
Nigel Tao0cd2f982020-03-03 23:03:02 +1100152 "\n"
Nigel Tao52c4d6a2020-03-08 21:12:38 +1100153 "Flags:\n"
Nigel Tao3690e832020-03-12 16:52:26 +1100154 " -c -compact-output\n"
Nigel Tao94440cf2020-04-02 22:28:24 +1100155 " -d=NUM -max-output-depth=NUM\n"
Nigel Tao4e193592020-07-15 12:48:57 +1000156 " -i=FMT -input-format={json,cbor}\n"
Nigel Tao168f60a2020-07-14 13:19:33 +1000157 " -o=FMT -output-format={json,cbor}\n"
Nigel Tao52c4d6a2020-03-08 21:12:38 +1100158 " -q=STR -query=STR\n"
Nigel Taoecadf722020-07-13 08:22:34 +1000159 " -s=NUM -spaces=NUM\n"
Nigel Tao52c4d6a2020-03-08 21:12:38 +1100160 " -t -tabs\n"
161 " -fail-if-unsandboxed\n"
Nigel Tao3c8589b2020-07-19 21:49:00 +1000162 " -input-allow-json-comments\n"
163 " -input-allow-json-extra-comma\n"
Nigel Tao51a38292020-07-19 22:43:17 +1000164 " -input-allow-json-inf-nan-numbers\n"
Nigel Tao3c8589b2020-07-19 21:49:00 +1000165 " -output-cbor-metadata-as-json-comments\n"
Nigel Taoc766bb72020-07-09 12:59:32 +1000166 " -output-json-extra-comma\n"
Nigel Taoecadf722020-07-13 08:22:34 +1000167 " -strict-json-pointer-syntax\n"
Nigel Tao52c4d6a2020-03-08 21:12:38 +1100168 "\n"
Nigel Tao01abc842020-03-06 21:42:33 +1100169 "The input.json filename is optional. If absent, it reads from stdin.\n"
Nigel Tao0cd2f982020-03-03 23:03:02 +1100170 "\n"
Nigel Tao52c4d6a2020-03-08 21:12:38 +1100171 "----\n"
172 "\n"
Nigel Tao0cd2f982020-03-03 23:03:02 +1100173 "jsonptr is a JSON formatter (pretty-printer) that supports the JSON\n"
Nigel Tao168f60a2020-07-14 13:19:33 +1000174 "Pointer (RFC 6901) query syntax. It reads CBOR or UTF-8 JSON from stdin\n"
175 "and writes CBOR or canonicalized, formatted UTF-8 JSON to stdout. The\n"
176 "input and output formats do not have to match, but conversion between\n"
177 "formats may be lossy.\n"
Nigel Tao0cd2f982020-03-03 23:03:02 +1100178 "\n"
Nigel Taof8dfc762020-07-23 23:35:44 +1000179 "Canonicalized JSON means that e.g. \"abc\\u000A\\tx\\u0177z\" is re-\n"
180 "written as \"abc\\n\\txÅ·z\". It does not sort object keys or reject\n"
Nigel Tao01abc842020-03-06 21:42:33 +1100181 "duplicate keys. Canonicalization does not imply Unicode normalization.\n"
Nigel Tao0cd2f982020-03-03 23:03:02 +1100182 "\n"
Nigel Taof8dfc762020-07-23 23:35:44 +1000183 "CBOR output is non-canonical (in the RFC 7049 Section 3.9 sense), as\n"
184 "sorting map keys and measuring indefinite-length containers requires\n"
185 "O(input_length) memory but this program runs in O(1) memory.\n"
186 "\n"
Nigel Tao0cd2f982020-03-03 23:03:02 +1100187 "Formatted means that arrays' and objects' elements are indented, each\n"
Nigel Taoecadf722020-07-13 08:22:34 +1000188 "on its own line. Configure this with the -c / -compact-output, -s=NUM /\n"
Nigel Tao168f60a2020-07-14 13:19:33 +1000189 "-spaces=NUM (for NUM ranging from 0 to 8) and -t / -tabs flags. Those\n"
190 "flags only apply to JSON (not CBOR) output.\n"
191 "\n"
192 "The -input-format and -output-format flags select between reading and\n"
193 "writing JSON (the default, a textual format) or CBOR (a binary format).\n"
Nigel Tao0cd2f982020-03-03 23:03:02 +1100194 "\n"
Nigel Tao3c8589b2020-07-19 21:49:00 +1000195 "The -input-allow-json-comments flag allows \"/*slash-star*/\" and\n"
196 "\"//slash-slash\" C-style comments within JSON input.\n"
197 "\n"
198 "The -input-allow-json-extra-comma flag allows input like \"[1,2,]\",\n"
199 "with a comma after the final element of a JSON list or dictionary.\n"
200 "\n"
Nigel Tao51a38292020-07-19 22:43:17 +1000201 "The -input-allow-json-inf-nan-numbers flag allows non-finite floating\n"
202 "point numbers (infinities and not-a-numbers) within JSON input.\n"
203 "\n"
Nigel Tao3c8589b2020-07-19 21:49:00 +1000204 "The -output-cbor-metadata-as-json-comments writes CBOR tags and other\n"
205 "metadata as /*comments*/, when -i=json and -o=cbor are also set. Such\n"
206 "comments are non-compliant with the JSON specification but many parsers\n"
207 "accept them.\n"
Nigel Taoc766bb72020-07-09 12:59:32 +1000208 "\n"
209 "The -output-json-extra-comma flag writes extra commas, regardless of\n"
210 "whether the input had it. Extra commas are non-compliant with the JSON\n"
Nigel Tao3c8589b2020-07-19 21:49:00 +1000211 "specification but many parsers accept them and they can produce simpler\n"
Nigel Taoc766bb72020-07-09 12:59:32 +1000212 "line-based diffs. This flag is ignored when -compact-output is set.\n"
213 "\n"
Nigel Taof8dfc762020-07-23 23:35:44 +1000214 "When converting from -i=cbor to -o=json, CBOR permits map keys other\n"
215 "than (untagged) UTF-8 strings but JSON does not. This program rejects\n"
216 "such input, as doing otherwise has complicated interactions with the\n"
217 "-query=STR flag and streaming input.\n"
218 "\n"
Nigel Tao52c4d6a2020-03-08 21:12:38 +1100219 "----\n"
220 "\n"
221 "The -q=STR or -query=STR flag gives an optional JSON Pointer query, to\n"
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100222 "print a subset of the input. For example, given RFC 6901 section 5's\n"
Nigel Tao01abc842020-03-06 21:42:33 +1100223 "sample input (https://tools.ietf.org/rfc/rfc6901.txt), this command:\n"
224 " jsonptr -query=/foo/1 rfc-6901-json-pointer.json\n"
Nigel Tao0cd2f982020-03-03 23:03:02 +1100225 "will print:\n"
226 " \"baz\"\n"
227 "\n"
228 "An absent query is equivalent to the empty query, which identifies the\n"
Nigel Tao52c4d6a2020-03-08 21:12:38 +1100229 "entire input (the root value). Unlike a file system, the \"/\" query\n"
Nigel Taod0b16cb2020-03-14 10:15:54 +1100230 "does not identify the root. Instead, \"\" is the root and \"/\" is the\n"
231 "child (the value in a key-value pair) of the root whose key is the empty\n"
232 "string. Similarly, \"/xyz\" and \"/xyz/\" are two different nodes.\n"
Nigel Tao0cd2f982020-03-03 23:03:02 +1100233 "\n"
Nigel Tao168f60a2020-07-14 13:19:33 +1000234 "If the query found a valid JSON|CBOR value, this program will return a\n"
235 "zero exit code even if the rest of the input isn't valid. If the query\n"
Nigel Tao0cd2f982020-03-03 23:03:02 +1100236 "did not find a value, or found an invalid one, this program returns a\n"
237 "non-zero exit code, but may still print partial output to stdout.\n"
238 "\n"
Nigel Tao168f60a2020-07-14 13:19:33 +1000239 "The JSON and CBOR specifications (https://json.org/ or RFC 8259; RFC\n"
240 "7049) permit implementations to allow duplicate keys, as this one does.\n"
241 "This JSON Pointer implementation is also greedy, following the first\n"
242 "match for each fragment without back-tracking. For example, the\n"
243 "\"/foo/bar\" query will fail if the root object has multiple \"foo\"\n"
244 "children but the first one doesn't have a \"bar\" child, even if later\n"
245 "ones do.\n"
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100246 "\n"
Nigel Taoecadf722020-07-13 08:22:34 +1000247 "The -strict-json-pointer-syntax flag restricts the -query=STR string to\n"
248 "exactly RFC 6901, with only two escape sequences: \"~0\" and \"~1\" for\n"
249 "\"~\" and \"/\". Without this flag, this program also lets \"~n\" and\n"
250 "\"~r\" escape the New Line and Carriage Return ASCII control characters,\n"
251 "which can work better with line oriented Unix tools that assume exactly\n"
252 "one value (i.e. one JSON Pointer string) per line.\n"
Nigel Taod6fdfb12020-03-11 12:24:14 +1100253 "\n"
Nigel Tao52c4d6a2020-03-08 21:12:38 +1100254 "----\n"
255 "\n"
Nigel Tao94440cf2020-04-02 22:28:24 +1100256 "The -d=NUM or -max-output-depth=NUM flag gives the maximum (inclusive)\n"
Nigel Tao168f60a2020-07-14 13:19:33 +1000257 "output depth. JSON|CBOR containers ([] arrays and {} objects) can hold\n"
258 "other containers. When this flag is set, containers at depth NUM are\n"
259 "replaced with \"[…]\" or \"{…}\". A bare -d or -max-output-depth is\n"
260 "equivalent to -d=1. The flag's absence means an unlimited output depth.\n"
Nigel Tao52c4d6a2020-03-08 21:12:38 +1100261 "\n"
262 "The -max-output-depth flag only affects the program's output. It doesn't\n"
Nigel Tao168f60a2020-07-14 13:19:33 +1000263 "affect whether or not the input is considered valid JSON|CBOR. The\n"
264 "format specifications permit implementations to set their own maximum\n"
265 "input depth. This JSON|CBOR implementation sets it to 1024.\n"
Nigel Tao52c4d6a2020-03-08 21:12:38 +1100266 "\n"
267 "Depth is measured in terms of nested containers. It is unaffected by the\n"
268 "number of spaces or tabs used to indent.\n"
269 "\n"
270 "When both -max-output-depth and -query are set, the output depth is\n"
271 "measured from when the query resolves, not from the input root. The\n"
272 "input depth (measured from the root) is still limited to 1024.\n"
273 "\n"
274 "----\n"
275 "\n"
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100276 "The -fail-if-unsandboxed flag causes the program to exit if it does not\n"
277 "self-impose a sandbox. On Linux, it self-imposes a SECCOMP_MODE_STRICT\n"
Nigel Tao52c4d6a2020-03-08 21:12:38 +1100278 "sandbox, regardless of whether this flag was set.";
Nigel Tao0cd2f982020-03-03 23:03:02 +1100279
Nigel Tao2cf76db2020-02-27 22:42:01 +1100280// ----
281
Nigel Taof3146c22020-03-26 08:47:42 +1100282// Wuffs allows either statically or dynamically allocated work buffers. This
283// program exercises static allocation.
284#define WORK_BUFFER_ARRAY_SIZE \
285 WUFFS_JSON__DECODER_WORKBUF_LEN_MAX_INCL_WORST_CASE
286#if WORK_BUFFER_ARRAY_SIZE > 0
Nigel Taod60815c2020-03-26 14:32:35 +1100287uint8_t g_work_buffer_array[WORK_BUFFER_ARRAY_SIZE];
Nigel Taof3146c22020-03-26 08:47:42 +1100288#else
289// Not all C/C++ compilers support 0-length arrays.
Nigel Taod60815c2020-03-26 14:32:35 +1100290uint8_t g_work_buffer_array[1];
Nigel Taof3146c22020-03-26 08:47:42 +1100291#endif
292
Nigel Taod60815c2020-03-26 14:32:35 +1100293bool g_sandboxed = false;
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100294
Nigel Taod60815c2020-03-26 14:32:35 +1100295int g_input_file_descriptor = 0; // A 0 default means stdin.
Nigel Tao01abc842020-03-06 21:42:33 +1100296
Nigel Tao2cf76db2020-02-27 22:42:01 +1100297#define MAX_INDENT 8
Nigel Tao107f0ef2020-03-01 21:35:02 +1100298#define INDENT_SPACES_STRING " "
Nigel Tao6e7d1412020-03-06 09:21:35 +1100299#define INDENT_TAB_STRING "\t"
Nigel Tao107f0ef2020-03-01 21:35:02 +1100300
Nigel Taofdac24a2020-03-06 21:53:08 +1100301#ifndef DST_BUFFER_ARRAY_SIZE
302#define DST_BUFFER_ARRAY_SIZE (32 * 1024)
Nigel Tao1b073492020-02-16 22:11:36 +1100303#endif
Nigel Taofdac24a2020-03-06 21:53:08 +1100304#ifndef SRC_BUFFER_ARRAY_SIZE
305#define SRC_BUFFER_ARRAY_SIZE (32 * 1024)
Nigel Tao1b073492020-02-16 22:11:36 +1100306#endif
Nigel Taofdac24a2020-03-06 21:53:08 +1100307#ifndef TOKEN_BUFFER_ARRAY_SIZE
308#define TOKEN_BUFFER_ARRAY_SIZE (4 * 1024)
Nigel Tao1b073492020-02-16 22:11:36 +1100309#endif
310
Nigel Taod60815c2020-03-26 14:32:35 +1100311uint8_t g_dst_array[DST_BUFFER_ARRAY_SIZE];
312uint8_t g_src_array[SRC_BUFFER_ARRAY_SIZE];
313wuffs_base__token g_tok_array[TOKEN_BUFFER_ARRAY_SIZE];
Nigel Tao1b073492020-02-16 22:11:36 +1100314
Nigel Taod60815c2020-03-26 14:32:35 +1100315wuffs_base__io_buffer g_dst;
316wuffs_base__io_buffer g_src;
317wuffs_base__token_buffer g_tok;
Nigel Tao1b073492020-02-16 22:11:36 +1100318
Nigel Taod60815c2020-03-26 14:32:35 +1100319// g_curr_token_end_src_index is the g_src.data.ptr index of the end of the
320// current token. An invariant is that (g_curr_token_end_src_index <=
321// g_src.meta.ri).
322size_t g_curr_token_end_src_index;
Nigel Tao2cf76db2020-02-27 22:42:01 +1100323
Nigel Tao27168032020-07-24 13:05:05 +1000324// Valid token's VBCs range in 0 ..= 15. Values over that are for tokens from
325// outside of the base package, such as the CBOR package.
326#define CATEGORY_CBOR_TAG 16
327
Nigel Tao850dc182020-07-21 22:52:04 +1000328struct {
329 uint64_t category;
330 uint64_t detail;
331} g_token_extension;
332
Nigel Taod60815c2020-03-26 14:32:35 +1100333uint32_t g_depth;
Nigel Tao2cf76db2020-02-27 22:42:01 +1100334
335enum class context {
336 none,
337 in_list_after_bracket,
338 in_list_after_value,
339 in_dict_after_brace,
340 in_dict_after_key,
341 in_dict_after_value,
Nigel Taod60815c2020-03-26 14:32:35 +1100342} g_ctx;
Nigel Tao2cf76db2020-02-27 22:42:01 +1100343
Nigel Tao0cd2f982020-03-03 23:03:02 +1100344bool //
345in_dict_before_key() {
Nigel Taod60815c2020-03-26 14:32:35 +1100346 return (g_ctx == context::in_dict_after_brace) ||
347 (g_ctx == context::in_dict_after_value);
Nigel Tao0cd2f982020-03-03 23:03:02 +1100348}
349
Nigel Taod60815c2020-03-26 14:32:35 +1100350uint32_t g_suppress_write_dst;
351bool g_wrote_to_dst;
Nigel Tao0cd2f982020-03-03 23:03:02 +1100352
Nigel Tao4e193592020-07-15 12:48:57 +1000353wuffs_cbor__decoder g_cbor_decoder;
354wuffs_json__decoder g_json_decoder;
355wuffs_base__token_decoder* g_dec;
Nigel Tao1b073492020-02-16 22:11:36 +1100356
Nigel Tao168f60a2020-07-14 13:19:33 +1000357// cbor_output_string_array is a 4 KiB buffer. For -output-format=cbor, strings
358// whose length are 4096 or less are written as a single definite-length
359// string. Longer strings are written as an indefinite-length string containing
360// multiple definite-length chunks, each of length up to 4 KiB. See the CBOR
361// RFC (RFC 7049) section 2.2.2 "Indefinite-Length Byte Strings and Text
362// Strings". The output is determinate even when the input is streamed.
363//
364// If raising CBOR_OUTPUT_STRING_ARRAY_SIZE above 0xFFFF then you will also
365// have to update flush_cbor_output_string.
366#define CBOR_OUTPUT_STRING_ARRAY_SIZE 4096
367uint8_t g_cbor_output_string_array[CBOR_OUTPUT_STRING_ARRAY_SIZE];
368
369uint32_t g_cbor_output_string_length;
370bool g_cbor_output_string_is_multiple_chunks;
371bool g_cbor_output_string_is_utf_8;
372
Nigel Tao0cd2f982020-03-03 23:03:02 +1100373// ----
374
375// Query is a JSON Pointer query. After initializing with a NUL-terminated C
376// string, its multiple fragments are consumed as the program walks the JSON
377// data from stdin. For example, letting "$" denote a NUL, suppose that we
378// started with a query string of "/apple/banana/12/durian" and are currently
Nigel Taob48ee752020-03-13 09:27:33 +1100379// trying to match the second fragment, "banana", so that Query::m_depth is 2:
Nigel Tao0cd2f982020-03-03 23:03:02 +1100380//
381// +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
382// / a p p l e / b a n a n a / 1 2 / d u r i a n $
383// +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
384// ^ ^
Nigel Taob48ee752020-03-13 09:27:33 +1100385// m_frag_i m_frag_k
Nigel Tao0cd2f982020-03-03 23:03:02 +1100386//
Nigel Taob48ee752020-03-13 09:27:33 +1100387// The two pointers m_frag_i and m_frag_k (abbreviated as mfi and mfk) are the
388// start (inclusive) and end (exclusive) of the query fragment. They satisfy
389// (mfi <= mfk) and may be equal if the fragment empty (note that "" is a valid
390// JSON object key).
Nigel Tao0cd2f982020-03-03 23:03:02 +1100391//
Nigel Taob48ee752020-03-13 09:27:33 +1100392// The m_frag_j (mfj) pointer moves between these two, or is nullptr. An
393// invariant is that (((mfi <= mfj) && (mfj <= mfk)) || (mfj == nullptr)).
Nigel Tao0cd2f982020-03-03 23:03:02 +1100394//
395// Wuffs' JSON tokenizer can portray a single JSON string as multiple Wuffs
396// tokens, as backslash-escaped values within that JSON string may each get
397// their own token.
398//
Nigel Taob48ee752020-03-13 09:27:33 +1100399// At the start of each object key (a JSON string), mfj is set to mfi.
Nigel Tao0cd2f982020-03-03 23:03:02 +1100400//
Nigel Taob48ee752020-03-13 09:27:33 +1100401// While mfj remains non-nullptr, each token's unescaped contents are then
402// compared to that part of the fragment from mfj to mfk. If it is a prefix
403// (including the case of an exact match), then mfj is advanced by the
404// unescaped length. Otherwise, mfj is set to nullptr.
Nigel Tao0cd2f982020-03-03 23:03:02 +1100405//
406// Comparison accounts for JSON Pointer's escaping notation: "~0" and "~1" in
407// the query (not the JSON value) are unescaped to "~" and "/" respectively.
Nigel Taob48ee752020-03-13 09:27:33 +1100408// "~n" and "~r" are also unescaped to "\n" and "\r". The program is
409// responsible for calling Query::validate (with a strict_json_pointer_syntax
410// argument) before otherwise using this class.
Nigel Tao0cd2f982020-03-03 23:03:02 +1100411//
Nigel Taob48ee752020-03-13 09:27:33 +1100412// The mfj pointer therefore advances from mfi to mfk, or drops out, as we
413// incrementally match the object key with the query fragment. For example, if
414// we have already matched the "ban" of "banana", then we would accept any of
415// an "ana" token, an "a" token or a "\u0061" token, amongst others. They would
416// advance mfj by 3, 1 or 1 bytes respectively.
Nigel Tao0cd2f982020-03-03 23:03:02 +1100417//
Nigel Taob48ee752020-03-13 09:27:33 +1100418// mfj
Nigel Tao0cd2f982020-03-03 23:03:02 +1100419// v
420// +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
421// / a p p l e / b a n a n a / 1 2 / d u r i a n $
422// +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
423// ^ ^
Nigel Taob48ee752020-03-13 09:27:33 +1100424// mfi mfk
Nigel Tao0cd2f982020-03-03 23:03:02 +1100425//
426// At the end of each object key (or equivalently, at the start of each object
Nigel Taob48ee752020-03-13 09:27:33 +1100427// value), if mfj is non-nullptr and equal to (but not less than) mfk then we
428// have a fragment match: the query fragment equals the object key. If there is
429// a next fragment (in this example, "12") we move the frag_etc pointers to its
430// start and end and increment Query::m_depth. Otherwise, we have matched the
431// complete query, and the upcoming JSON value is the result of that query.
Nigel Tao0cd2f982020-03-03 23:03:02 +1100432//
433// The discussion above centers on object keys. If the query fragment is
434// numeric then it can also match as an array index: the string fragment "12"
435// will match an array's 13th element (starting counting from zero). See RFC
436// 6901 for its precise definition of an "array index" number.
437//
Nigel Taob48ee752020-03-13 09:27:33 +1100438// Array index fragment match is represented by the Query::m_array_index field,
Nigel Tao0cd2f982020-03-03 23:03:02 +1100439// whose type (wuffs_base__result_u64) is a result type. An error result means
440// that the fragment is not an array index. A value result holds the number of
441// list elements remaining. When matching a query fragment in an array (instead
442// of in an object), each element ticks this number down towards zero. At zero,
443// the upcoming JSON value is the one that matches the query fragment.
444class Query {
445 private:
Nigel Taob48ee752020-03-13 09:27:33 +1100446 uint8_t* m_frag_i;
447 uint8_t* m_frag_j;
448 uint8_t* m_frag_k;
Nigel Tao0cd2f982020-03-03 23:03:02 +1100449
Nigel Taob48ee752020-03-13 09:27:33 +1100450 uint32_t m_depth;
Nigel Tao0cd2f982020-03-03 23:03:02 +1100451
Nigel Taob48ee752020-03-13 09:27:33 +1100452 wuffs_base__result_u64 m_array_index;
Nigel Tao0cd2f982020-03-03 23:03:02 +1100453
454 public:
455 void reset(char* query_c_string) {
Nigel Taob48ee752020-03-13 09:27:33 +1100456 m_frag_i = (uint8_t*)query_c_string;
457 m_frag_j = (uint8_t*)query_c_string;
458 m_frag_k = (uint8_t*)query_c_string;
459 m_depth = 0;
460 m_array_index.status.repr = "#main: not an array index query fragment";
461 m_array_index.value = 0;
Nigel Tao0cd2f982020-03-03 23:03:02 +1100462 }
463
Nigel Taob48ee752020-03-13 09:27:33 +1100464 void restart_fragment(bool enable) { m_frag_j = enable ? m_frag_i : nullptr; }
Nigel Tao0cd2f982020-03-03 23:03:02 +1100465
Nigel Taob48ee752020-03-13 09:27:33 +1100466 bool is_at(uint32_t depth) { return m_depth == depth; }
Nigel Tao0cd2f982020-03-03 23:03:02 +1100467
468 // tick returns whether the fragment is a valid array index whose value is
469 // zero. If valid but non-zero, it decrements it and returns false.
470 bool tick() {
Nigel Taob48ee752020-03-13 09:27:33 +1100471 if (m_array_index.status.is_ok()) {
472 if (m_array_index.value == 0) {
Nigel Tao0cd2f982020-03-03 23:03:02 +1100473 return true;
474 }
Nigel Taob48ee752020-03-13 09:27:33 +1100475 m_array_index.value--;
Nigel Tao0cd2f982020-03-03 23:03:02 +1100476 }
477 return false;
478 }
479
480 // next_fragment moves to the next fragment, returning whether it existed.
481 bool next_fragment() {
Nigel Taob48ee752020-03-13 09:27:33 +1100482 uint8_t* k = m_frag_k;
483 uint32_t d = m_depth;
Nigel Tao0cd2f982020-03-03 23:03:02 +1100484
485 this->reset(nullptr);
486
487 if (!k || (*k != '/')) {
488 return false;
489 }
490 k++;
491
492 bool all_digits = true;
493 uint8_t* i = k;
494 while ((*k != '\x00') && (*k != '/')) {
495 all_digits = all_digits && ('0' <= *k) && (*k <= '9');
496 k++;
497 }
Nigel Taob48ee752020-03-13 09:27:33 +1100498 m_frag_i = i;
499 m_frag_j = i;
500 m_frag_k = k;
501 m_depth = d + 1;
Nigel Tao0cd2f982020-03-03 23:03:02 +1100502 if (all_digits) {
503 // wuffs_base__parse_number_u64 rejects leading zeroes, e.g. "00", "07".
Nigel Tao6b7ce302020-07-07 16:19:46 +1000504 m_array_index = wuffs_base__parse_number_u64(
505 wuffs_base__make_slice_u8(i, k - i),
506 WUFFS_BASE__PARSE_NUMBER_XXX__DEFAULT_OPTIONS);
Nigel Tao0cd2f982020-03-03 23:03:02 +1100507 }
508 return true;
509 }
510
Nigel Taob48ee752020-03-13 09:27:33 +1100511 bool matched_all() { return m_frag_k == nullptr; }
Nigel Tao52c4d6a2020-03-08 21:12:38 +1100512
Nigel Taob48ee752020-03-13 09:27:33 +1100513 bool matched_fragment() { return m_frag_j && (m_frag_j == m_frag_k); }
Nigel Tao0cd2f982020-03-03 23:03:02 +1100514
515 void incremental_match_slice(uint8_t* ptr, size_t len) {
Nigel Taob48ee752020-03-13 09:27:33 +1100516 if (!m_frag_j) {
Nigel Tao0cd2f982020-03-03 23:03:02 +1100517 return;
518 }
Nigel Taob48ee752020-03-13 09:27:33 +1100519 uint8_t* j = m_frag_j;
Nigel Tao0cd2f982020-03-03 23:03:02 +1100520 while (true) {
521 if (len == 0) {
Nigel Taob48ee752020-03-13 09:27:33 +1100522 m_frag_j = j;
Nigel Tao0cd2f982020-03-03 23:03:02 +1100523 return;
524 }
525
526 if (*j == '\x00') {
527 break;
528
529 } else if (*j == '~') {
530 j++;
531 if (*j == '0') {
532 if (*ptr != '~') {
533 break;
534 }
535 } else if (*j == '1') {
536 if (*ptr != '/') {
537 break;
538 }
Nigel Taod6fdfb12020-03-11 12:24:14 +1100539 } else if (*j == 'n') {
540 if (*ptr != '\n') {
541 break;
542 }
543 } else if (*j == 'r') {
544 if (*ptr != '\r') {
545 break;
546 }
Nigel Tao0cd2f982020-03-03 23:03:02 +1100547 } else {
548 break;
549 }
550
551 } else if (*j != *ptr) {
552 break;
553 }
554
555 j++;
556 ptr++;
557 len--;
558 }
Nigel Taob48ee752020-03-13 09:27:33 +1100559 m_frag_j = nullptr;
Nigel Tao0cd2f982020-03-03 23:03:02 +1100560 }
561
562 void incremental_match_code_point(uint32_t code_point) {
Nigel Taob48ee752020-03-13 09:27:33 +1100563 if (!m_frag_j) {
Nigel Tao0cd2f982020-03-03 23:03:02 +1100564 return;
565 }
566 uint8_t u[WUFFS_BASE__UTF_8__BYTE_LENGTH__MAX_INCL];
567 size_t n = wuffs_base__utf_8__encode(
568 wuffs_base__make_slice_u8(&u[0],
569 WUFFS_BASE__UTF_8__BYTE_LENGTH__MAX_INCL),
570 code_point);
571 if (n > 0) {
572 this->incremental_match_slice(&u[0], n);
573 }
574 }
575
576 // validate returns whether the (ptr, len) arguments form a valid JSON
577 // Pointer. In particular, it must be valid UTF-8, and either be empty or
578 // start with a '/'. Any '~' within must immediately be followed by either
Nigel Taod6fdfb12020-03-11 12:24:14 +1100579 // '0' or '1'. If strict_json_pointer_syntax is false, a '~' may also be
580 // followed by either 'n' or 'r'.
581 static bool validate(char* query_c_string,
582 size_t length,
583 bool strict_json_pointer_syntax) {
Nigel Tao0cd2f982020-03-03 23:03:02 +1100584 if (length <= 0) {
585 return true;
586 }
587 if (query_c_string[0] != '/') {
588 return false;
589 }
590 wuffs_base__slice_u8 s =
591 wuffs_base__make_slice_u8((uint8_t*)query_c_string, length);
592 bool previous_was_tilde = false;
593 while (s.len > 0) {
Nigel Tao702c7b22020-07-22 15:42:54 +1000594 wuffs_base__utf_8__next__output o = wuffs_base__utf_8__next(s.ptr, s.len);
Nigel Tao0cd2f982020-03-03 23:03:02 +1100595 if (!o.is_valid()) {
596 return false;
597 }
Nigel Taod6fdfb12020-03-11 12:24:14 +1100598
599 if (previous_was_tilde) {
600 switch (o.code_point) {
601 case '0':
602 case '1':
603 break;
604 case 'n':
605 case 'r':
606 if (strict_json_pointer_syntax) {
607 return false;
608 }
609 break;
610 default:
611 return false;
612 }
Nigel Tao0cd2f982020-03-03 23:03:02 +1100613 }
614 previous_was_tilde = o.code_point == '~';
Nigel Taod6fdfb12020-03-11 12:24:14 +1100615
Nigel Tao0cd2f982020-03-03 23:03:02 +1100616 s.ptr += o.byte_length;
617 s.len -= o.byte_length;
618 }
619 return !previous_was_tilde;
620 }
Nigel Taod60815c2020-03-26 14:32:35 +1100621} g_query;
Nigel Tao0cd2f982020-03-03 23:03:02 +1100622
623// ----
624
Nigel Tao168f60a2020-07-14 13:19:33 +1000625enum class file_format {
626 json,
627 cbor,
628};
629
Nigel Tao68920952020-03-03 11:25:18 +1100630struct {
631 int remaining_argc;
632 char** remaining_argv;
633
Nigel Tao3690e832020-03-12 16:52:26 +1100634 bool compact_output;
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100635 bool fail_if_unsandboxed;
Nigel Tao4e193592020-07-15 12:48:57 +1000636 file_format input_format;
Nigel Tao3c8589b2020-07-19 21:49:00 +1000637 bool input_allow_json_comments;
638 bool input_allow_json_extra_comma;
Nigel Tao51a38292020-07-19 22:43:17 +1000639 bool input_allow_json_inf_nan_numbers;
Nigel Tao52c4d6a2020-03-08 21:12:38 +1100640 uint32_t max_output_depth;
Nigel Tao168f60a2020-07-14 13:19:33 +1000641 file_format output_format;
Nigel Tao3c8589b2020-07-19 21:49:00 +1000642 bool output_cbor_metadata_as_json_comments;
Nigel Taoc766bb72020-07-09 12:59:32 +1000643 bool output_json_extra_comma;
Nigel Tao0cd2f982020-03-03 23:03:02 +1100644 char* query_c_string;
Nigel Taoecadf722020-07-13 08:22:34 +1000645 size_t spaces;
Nigel Taod6fdfb12020-03-11 12:24:14 +1100646 bool strict_json_pointer_syntax;
Nigel Tao68920952020-03-03 11:25:18 +1100647 bool tabs;
Nigel Taod60815c2020-03-26 14:32:35 +1100648} g_flags = {0};
Nigel Tao68920952020-03-03 11:25:18 +1100649
650const char* //
651parse_flags(int argc, char** argv) {
Nigel Taoecadf722020-07-13 08:22:34 +1000652 g_flags.spaces = 4;
Nigel Taod60815c2020-03-26 14:32:35 +1100653 g_flags.max_output_depth = 0xFFFFFFFF;
Nigel Tao68920952020-03-03 11:25:18 +1100654
655 int c = (argc > 0) ? 1 : 0; // Skip argv[0], the program name.
656 for (; c < argc; c++) {
657 char* arg = argv[c];
658 if (*arg++ != '-') {
659 break;
660 }
661
662 // A double-dash "--foo" is equivalent to a single-dash "-foo". As special
663 // cases, a bare "-" is not a flag (some programs may interpret it as
664 // stdin) and a bare "--" means to stop parsing flags.
665 if (*arg == '\x00') {
666 break;
667 } else if (*arg == '-') {
668 arg++;
669 if (*arg == '\x00') {
670 c++;
671 break;
672 }
673 }
674
Nigel Tao3690e832020-03-12 16:52:26 +1100675 if (!strcmp(arg, "c") || !strcmp(arg, "compact-output")) {
Nigel Taod60815c2020-03-26 14:32:35 +1100676 g_flags.compact_output = true;
Nigel Tao68920952020-03-03 11:25:18 +1100677 continue;
678 }
Nigel Tao94440cf2020-04-02 22:28:24 +1100679 if (!strcmp(arg, "d") || !strcmp(arg, "max-output-depth")) {
680 g_flags.max_output_depth = 1;
681 continue;
682 } else if (!strncmp(arg, "d=", 2) ||
683 !strncmp(arg, "max-output-depth=", 16)) {
684 while (*arg++ != '=') {
685 }
686 wuffs_base__result_u64 u = wuffs_base__parse_number_u64(
Nigel Tao6b7ce302020-07-07 16:19:46 +1000687 wuffs_base__make_slice_u8((uint8_t*)arg, strlen(arg)),
688 WUFFS_BASE__PARSE_NUMBER_XXX__DEFAULT_OPTIONS);
Nigel Taoaf757722020-07-18 17:27:11 +1000689 if (u.status.is_ok() && (u.value <= 0xFFFFFFFF)) {
Nigel Tao94440cf2020-04-02 22:28:24 +1100690 g_flags.max_output_depth = (uint32_t)(u.value);
691 continue;
692 }
693 return g_usage;
694 }
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100695 if (!strcmp(arg, "fail-if-unsandboxed")) {
Nigel Taod60815c2020-03-26 14:32:35 +1100696 g_flags.fail_if_unsandboxed = true;
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100697 continue;
698 }
Nigel Tao4e193592020-07-15 12:48:57 +1000699 if (!strcmp(arg, "i=cbor") || !strcmp(arg, "input-format=cbor")) {
700 g_flags.input_format = file_format::cbor;
701 continue;
702 }
703 if (!strcmp(arg, "i=json") || !strcmp(arg, "input-format=json")) {
704 g_flags.input_format = file_format::json;
705 continue;
706 }
Nigel Tao3c8589b2020-07-19 21:49:00 +1000707 if (!strcmp(arg, "input-allow-json-comments")) {
708 g_flags.input_allow_json_comments = true;
709 continue;
710 }
711 if (!strcmp(arg, "input-allow-json-extra-comma")) {
712 g_flags.input_allow_json_extra_comma = true;
Nigel Taoc766bb72020-07-09 12:59:32 +1000713 continue;
714 }
Nigel Tao51a38292020-07-19 22:43:17 +1000715 if (!strcmp(arg, "input-allow-json-inf-nan-numbers")) {
716 g_flags.input_allow_json_inf_nan_numbers = true;
717 continue;
718 }
Nigel Tao168f60a2020-07-14 13:19:33 +1000719 if (!strcmp(arg, "o=cbor") || !strcmp(arg, "output-format=cbor")) {
720 g_flags.output_format = file_format::cbor;
721 continue;
722 }
723 if (!strcmp(arg, "o=json") || !strcmp(arg, "output-format=json")) {
724 g_flags.output_format = file_format::json;
725 continue;
726 }
Nigel Tao3c8589b2020-07-19 21:49:00 +1000727 if (!strcmp(arg, "output-cbor-metadata-as-json-comments")) {
728 g_flags.output_cbor_metadata_as_json_comments = true;
729 continue;
730 }
Nigel Taoc766bb72020-07-09 12:59:32 +1000731 if (!strcmp(arg, "output-json-extra-comma")) {
732 g_flags.output_json_extra_comma = true;
733 continue;
734 }
Nigel Tao0cd2f982020-03-03 23:03:02 +1100735 if (!strncmp(arg, "q=", 2) || !strncmp(arg, "query=", 6)) {
736 while (*arg++ != '=') {
737 }
Nigel Taod60815c2020-03-26 14:32:35 +1100738 g_flags.query_c_string = arg;
Nigel Taod6fdfb12020-03-11 12:24:14 +1100739 continue;
740 }
Nigel Taoecadf722020-07-13 08:22:34 +1000741 if (!strncmp(arg, "s=", 2) || !strncmp(arg, "spaces=", 7)) {
742 while (*arg++ != '=') {
743 }
744 if (('0' <= arg[0]) && (arg[0] <= '8') && (arg[1] == '\x00')) {
745 g_flags.spaces = arg[0] - '0';
746 continue;
747 }
748 return g_usage;
749 }
750 if (!strcmp(arg, "strict-json-pointer-syntax")) {
Nigel Taod60815c2020-03-26 14:32:35 +1100751 g_flags.strict_json_pointer_syntax = true;
Nigel Taod6fdfb12020-03-11 12:24:14 +1100752 continue;
Nigel Tao68920952020-03-03 11:25:18 +1100753 }
754 if (!strcmp(arg, "t") || !strcmp(arg, "tabs")) {
Nigel Taod60815c2020-03-26 14:32:35 +1100755 g_flags.tabs = true;
Nigel Tao68920952020-03-03 11:25:18 +1100756 continue;
757 }
758
Nigel Taod60815c2020-03-26 14:32:35 +1100759 return g_usage;
Nigel Tao68920952020-03-03 11:25:18 +1100760 }
761
Nigel Taod60815c2020-03-26 14:32:35 +1100762 if (g_flags.query_c_string &&
763 !Query::validate(g_flags.query_c_string, strlen(g_flags.query_c_string),
764 g_flags.strict_json_pointer_syntax)) {
Nigel Taod6fdfb12020-03-11 12:24:14 +1100765 return "main: bad JSON Pointer (RFC 6901) syntax for the -query=STR flag";
766 }
767
Nigel Taod60815c2020-03-26 14:32:35 +1100768 g_flags.remaining_argc = argc - c;
769 g_flags.remaining_argv = argv + c;
Nigel Tao0cd2f982020-03-03 23:03:02 +1100770 return nullptr;
Nigel Tao68920952020-03-03 11:25:18 +1100771}
772
Nigel Tao2cf76db2020-02-27 22:42:01 +1100773const char* //
774initialize_globals(int argc, char** argv) {
Nigel Taod60815c2020-03-26 14:32:35 +1100775 g_dst = wuffs_base__make_io_buffer(
776 wuffs_base__make_slice_u8(g_dst_array, DST_BUFFER_ARRAY_SIZE),
Nigel Tao2cf76db2020-02-27 22:42:01 +1100777 wuffs_base__empty_io_buffer_meta());
Nigel Tao1b073492020-02-16 22:11:36 +1100778
Nigel Taod60815c2020-03-26 14:32:35 +1100779 g_src = wuffs_base__make_io_buffer(
780 wuffs_base__make_slice_u8(g_src_array, SRC_BUFFER_ARRAY_SIZE),
Nigel Tao2cf76db2020-02-27 22:42:01 +1100781 wuffs_base__empty_io_buffer_meta());
782
Nigel Taod60815c2020-03-26 14:32:35 +1100783 g_tok = wuffs_base__make_token_buffer(
784 wuffs_base__make_slice_token(g_tok_array, TOKEN_BUFFER_ARRAY_SIZE),
Nigel Tao2cf76db2020-02-27 22:42:01 +1100785 wuffs_base__empty_token_buffer_meta());
786
Nigel Taod60815c2020-03-26 14:32:35 +1100787 g_curr_token_end_src_index = 0;
Nigel Tao2cf76db2020-02-27 22:42:01 +1100788
Nigel Tao850dc182020-07-21 22:52:04 +1000789 g_token_extension.category = 0;
790 g_token_extension.detail = 0;
791
Nigel Taod60815c2020-03-26 14:32:35 +1100792 g_depth = 0;
Nigel Tao2cf76db2020-02-27 22:42:01 +1100793
Nigel Taod60815c2020-03-26 14:32:35 +1100794 g_ctx = context::none;
Nigel Tao2cf76db2020-02-27 22:42:01 +1100795
Nigel Tao68920952020-03-03 11:25:18 +1100796 TRY(parse_flags(argc, argv));
Nigel Taod60815c2020-03-26 14:32:35 +1100797 if (g_flags.fail_if_unsandboxed && !g_sandboxed) {
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100798 return "main: unsandboxed";
799 }
Nigel Tao01abc842020-03-06 21:42:33 +1100800 const int stdin_fd = 0;
Nigel Taod60815c2020-03-26 14:32:35 +1100801 if (g_flags.remaining_argc >
802 ((g_input_file_descriptor != stdin_fd) ? 1 : 0)) {
803 return g_usage;
Nigel Tao107f0ef2020-03-01 21:35:02 +1100804 }
805
Nigel Taod60815c2020-03-26 14:32:35 +1100806 g_query.reset(g_flags.query_c_string);
Nigel Tao0cd2f982020-03-03 23:03:02 +1100807
808 // If the query is non-empty, suprress writing to stdout until we've
809 // completed the query.
Nigel Taod60815c2020-03-26 14:32:35 +1100810 g_suppress_write_dst = g_query.next_fragment() ? 1 : 0;
811 g_wrote_to_dst = false;
Nigel Tao0cd2f982020-03-03 23:03:02 +1100812
Nigel Tao4e193592020-07-15 12:48:57 +1000813 if (g_flags.input_format == file_format::json) {
814 TRY(g_json_decoder
815 .initialize(sizeof__wuffs_json__decoder(), WUFFS_VERSION, 0)
816 .message());
817 g_dec = g_json_decoder.upcast_as__wuffs_base__token_decoder();
818 } else {
819 TRY(g_cbor_decoder
820 .initialize(sizeof__wuffs_cbor__decoder(), WUFFS_VERSION, 0)
821 .message());
822 g_dec = g_cbor_decoder.upcast_as__wuffs_base__token_decoder();
823 }
Nigel Tao4b186b02020-03-18 14:25:21 +1100824
Nigel Tao3c8589b2020-07-19 21:49:00 +1000825 if (g_flags.input_allow_json_comments) {
826 g_dec->set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_COMMENT_BLOCK, true);
827 g_dec->set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_COMMENT_LINE, true);
828 }
829 if (g_flags.input_allow_json_extra_comma) {
Nigel Tao4e193592020-07-15 12:48:57 +1000830 g_dec->set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_EXTRA_COMMA, true);
Nigel Taoc766bb72020-07-09 12:59:32 +1000831 }
Nigel Tao51a38292020-07-19 22:43:17 +1000832 if (g_flags.input_allow_json_inf_nan_numbers) {
833 g_dec->set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_INF_NAN_NUMBERS, true);
834 }
Nigel Taoc766bb72020-07-09 12:59:32 +1000835
Nigel Tao4b186b02020-03-18 14:25:21 +1100836 // Consume an optional whitespace trailer. This isn't part of the JSON spec,
837 // but it works better with line oriented Unix tools (such as "echo 123 |
838 // jsonptr" where it's "echo", not "echo -n") or hand-edited JSON files which
839 // can accidentally contain trailing whitespace.
Nigel Tao4e193592020-07-15 12:48:57 +1000840 g_dec->set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_TRAILING_NEW_LINE, true);
Nigel Tao4b186b02020-03-18 14:25:21 +1100841
842 return nullptr;
Nigel Tao2cf76db2020-02-27 22:42:01 +1100843}
Nigel Tao1b073492020-02-16 22:11:36 +1100844
845// ----
846
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100847// ignore_return_value suppresses errors from -Wall -Werror.
848static void //
849ignore_return_value(int ignored) {}
850
Nigel Tao2914bae2020-02-26 09:40:30 +1100851const char* //
852read_src() {
Nigel Taod60815c2020-03-26 14:32:35 +1100853 if (g_src.meta.closed) {
Nigel Tao9cc2c252020-02-23 17:05:49 +1100854 return "main: internal error: read requested on a closed source";
Nigel Taoa8406922020-02-19 12:22:00 +1100855 }
Nigel Taod60815c2020-03-26 14:32:35 +1100856 g_src.compact();
857 if (g_src.meta.wi >= g_src.data.len) {
858 return "main: g_src buffer is full";
Nigel Tao1b073492020-02-16 22:11:36 +1100859 }
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100860 while (true) {
Nigel Taod60815c2020-03-26 14:32:35 +1100861 ssize_t n = read(g_input_file_descriptor, g_src.data.ptr + g_src.meta.wi,
862 g_src.data.len - g_src.meta.wi);
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100863 if (n >= 0) {
Nigel Taod60815c2020-03-26 14:32:35 +1100864 g_src.meta.wi += n;
865 g_src.meta.closed = n == 0;
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100866 break;
867 } else if (errno != EINTR) {
868 return strerror(errno);
869 }
Nigel Tao1b073492020-02-16 22:11:36 +1100870 }
871 return nullptr;
872}
873
Nigel Tao2914bae2020-02-26 09:40:30 +1100874const char* //
875flush_dst() {
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100876 while (true) {
Nigel Taod60815c2020-03-26 14:32:35 +1100877 size_t n = g_dst.meta.wi - g_dst.meta.ri;
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100878 if (n == 0) {
879 break;
Nigel Tao1b073492020-02-16 22:11:36 +1100880 }
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100881 const int stdout_fd = 1;
Nigel Taod60815c2020-03-26 14:32:35 +1100882 ssize_t i = write(stdout_fd, g_dst.data.ptr + g_dst.meta.ri, n);
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100883 if (i >= 0) {
Nigel Taod60815c2020-03-26 14:32:35 +1100884 g_dst.meta.ri += i;
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100885 } else if (errno != EINTR) {
886 return strerror(errno);
887 }
Nigel Tao1b073492020-02-16 22:11:36 +1100888 }
Nigel Taod60815c2020-03-26 14:32:35 +1100889 g_dst.compact();
Nigel Tao1b073492020-02-16 22:11:36 +1100890 return nullptr;
891}
892
Nigel Tao2914bae2020-02-26 09:40:30 +1100893const char* //
894write_dst(const void* s, size_t n) {
Nigel Taod60815c2020-03-26 14:32:35 +1100895 if (g_suppress_write_dst > 0) {
Nigel Tao0cd2f982020-03-03 23:03:02 +1100896 return nullptr;
897 }
Nigel Tao1b073492020-02-16 22:11:36 +1100898 const uint8_t* p = static_cast<const uint8_t*>(s);
899 while (n > 0) {
Nigel Taod60815c2020-03-26 14:32:35 +1100900 size_t i = g_dst.writer_available();
Nigel Tao1b073492020-02-16 22:11:36 +1100901 if (i == 0) {
902 const char* z = flush_dst();
903 if (z) {
904 return z;
905 }
Nigel Taod60815c2020-03-26 14:32:35 +1100906 i = g_dst.writer_available();
Nigel Tao1b073492020-02-16 22:11:36 +1100907 if (i == 0) {
Nigel Taod60815c2020-03-26 14:32:35 +1100908 return "main: g_dst buffer is full";
Nigel Tao1b073492020-02-16 22:11:36 +1100909 }
910 }
911
912 if (i > n) {
913 i = n;
914 }
Nigel Taod60815c2020-03-26 14:32:35 +1100915 memcpy(g_dst.data.ptr + g_dst.meta.wi, p, i);
916 g_dst.meta.wi += i;
Nigel Tao1b073492020-02-16 22:11:36 +1100917 p += i;
918 n -= i;
Nigel Taod60815c2020-03-26 14:32:35 +1100919 g_wrote_to_dst = true;
Nigel Tao1b073492020-02-16 22:11:36 +1100920 }
921 return nullptr;
922}
923
924// ----
925
Nigel Tao168f60a2020-07-14 13:19:33 +1000926const char* //
927write_literal(uint64_t vbd) {
928 const char* ptr = nullptr;
929 size_t len = 0;
930 if (vbd & WUFFS_BASE__TOKEN__VBD__LITERAL__UNDEFINED) {
931 if (g_flags.output_format == file_format::json) {
Nigel Tao3c8589b2020-07-19 21:49:00 +1000932 // JSON's closest approximation to "undefined" is "null".
933 if (g_flags.output_cbor_metadata_as_json_comments) {
934 ptr = "/*cbor:undefined*/null";
935 len = 22;
936 } else {
937 ptr = "null";
938 len = 4;
939 }
Nigel Tao168f60a2020-07-14 13:19:33 +1000940 } else {
941 ptr = "\xF7";
942 len = 1;
943 }
944 } else if (vbd & WUFFS_BASE__TOKEN__VBD__LITERAL__NULL) {
945 if (g_flags.output_format == file_format::json) {
946 ptr = "null";
947 len = 4;
948 } else {
949 ptr = "\xF6";
950 len = 1;
951 }
952 } else if (vbd & WUFFS_BASE__TOKEN__VBD__LITERAL__FALSE) {
953 if (g_flags.output_format == file_format::json) {
954 ptr = "false";
955 len = 5;
956 } else {
957 ptr = "\xF4";
958 len = 1;
959 }
960 } else if (vbd & WUFFS_BASE__TOKEN__VBD__LITERAL__TRUE) {
961 if (g_flags.output_format == file_format::json) {
962 ptr = "true";
963 len = 4;
964 } else {
965 ptr = "\xF5";
966 len = 1;
967 }
968 } else {
969 return "main: internal error: unexpected write_literal argument";
970 }
971 return write_dst(ptr, len);
972}
973
974// ----
975
976const char* //
Nigel Tao664f8432020-07-16 21:25:14 +1000977write_number_as_cbor_f64(double f) {
Nigel Tao168f60a2020-07-14 13:19:33 +1000978 uint8_t buf[9];
979 wuffs_base__lossy_value_u16 lv16 =
980 wuffs_base__ieee_754_bit_representation__from_f64_to_u16_truncate(f);
981 if (!lv16.lossy) {
982 buf[0] = 0xF9;
983 wuffs_base__store_u16be__no_bounds_check(&buf[1], lv16.value);
984 return write_dst(&buf[0], 3);
985 }
986 wuffs_base__lossy_value_u32 lv32 =
987 wuffs_base__ieee_754_bit_representation__from_f64_to_u32_truncate(f);
988 if (!lv32.lossy) {
989 buf[0] = 0xFA;
990 wuffs_base__store_u32be__no_bounds_check(&buf[1], lv32.value);
991 return write_dst(&buf[0], 5);
992 }
993 buf[0] = 0xFB;
994 wuffs_base__store_u64be__no_bounds_check(
995 &buf[1], wuffs_base__ieee_754_bit_representation__from_f64_to_u64(f));
996 return write_dst(&buf[0], 9);
997}
998
999const char* //
Nigel Tao664f8432020-07-16 21:25:14 +10001000write_number_as_cbor_u64(uint8_t base, uint64_t u) {
Nigel Tao168f60a2020-07-14 13:19:33 +10001001 uint8_t buf[9];
1002 if (u < 0x18) {
1003 buf[0] = base | ((uint8_t)u);
1004 return write_dst(&buf[0], 1);
1005 } else if ((u >> 8) == 0) {
1006 buf[0] = base | 0x18;
1007 buf[1] = ((uint8_t)u);
1008 return write_dst(&buf[0], 2);
1009 } else if ((u >> 16) == 0) {
1010 buf[0] = base | 0x19;
1011 wuffs_base__store_u16be__no_bounds_check(&buf[1], ((uint16_t)u));
1012 return write_dst(&buf[0], 3);
1013 } else if ((u >> 32) == 0) {
1014 buf[0] = base | 0x1A;
1015 wuffs_base__store_u32be__no_bounds_check(&buf[1], ((uint32_t)u));
1016 return write_dst(&buf[0], 5);
1017 }
1018 buf[0] = base | 0x1B;
1019 wuffs_base__store_u64be__no_bounds_check(&buf[1], u);
1020 return write_dst(&buf[0], 9);
1021}
1022
1023const char* //
Nigel Tao5a616b62020-07-24 23:54:52 +10001024write_number_as_json_f64(uint8_t* ptr, size_t len) {
1025 double f;
1026 switch (len) {
1027 case 3:
1028 f = wuffs_base__ieee_754_bit_representation__from_u16_to_f64(
1029 wuffs_base__load_u16be__no_bounds_check(ptr + 1));
1030 break;
1031 case 5:
1032 f = wuffs_base__ieee_754_bit_representation__from_u32_to_f64(
1033 wuffs_base__load_u32be__no_bounds_check(ptr + 1));
1034 break;
1035 case 9:
1036 f = wuffs_base__ieee_754_bit_representation__from_u64_to_f64(
1037 wuffs_base__load_u64be__no_bounds_check(ptr + 1));
1038 break;
1039 default:
1040 return "main: internal error: unexpected write_number_as_json_f64 len";
1041 }
1042 uint8_t buf[512];
1043 const uint32_t precision = 0;
1044 size_t n = wuffs_base__render_number_f64(
1045 wuffs_base__make_slice_u8(&buf[0], sizeof buf), f, precision,
1046 WUFFS_BASE__RENDER_NUMBER_FXX__JUST_ENOUGH_PRECISION);
1047
1048 // JSON numbers don't include Infinities or NaNs. For such numbers, their
1049 // IEEE 754 bit representation's 11 exponent bits are all on.
1050 uint64_t u = wuffs_base__ieee_754_bit_representation__from_f64_to_u64(f);
1051 if (((u >> 52) & 0x7FF) == 0x7FF) {
1052 if (g_flags.output_cbor_metadata_as_json_comments) {
1053 TRY(write_dst("/*cbor:", 7));
1054 TRY(write_dst(&buf[0], n));
1055 TRY(write_dst("*/", 2));
1056 }
1057 return write_dst("null", 4);
1058 }
1059
1060 return write_dst(&buf[0], n);
1061}
1062
1063const char* //
Nigel Tao850dc182020-07-21 22:52:04 +10001064write_cbor_minus_1_minus_x(uint8_t* ptr, size_t len) {
Nigel Tao27168032020-07-24 13:05:05 +10001065 if (g_flags.output_format == file_format::cbor) {
1066 return write_dst(ptr, len);
1067 }
1068
Nigel Tao850dc182020-07-21 22:52:04 +10001069 if (len != 9) {
1070 return "main: internal error: invalid ETC__MINUS_1_MINUS_X token length";
Nigel Tao664f8432020-07-16 21:25:14 +10001071 }
Nigel Tao850dc182020-07-21 22:52:04 +10001072 uint64_t u = 1 + wuffs_base__load_u64be__no_bounds_check(ptr + 1);
1073 if (u == 0) {
1074 // See the cbor.TOKEN_VALUE_MINOR__MINUS_1_MINUS_X comment re overflow.
1075 return write_dst("-18446744073709551616", 21);
Nigel Tao664f8432020-07-16 21:25:14 +10001076 }
1077 uint8_t buf[1 + WUFFS_BASE__U64__BYTE_LENGTH__MAX_INCL];
1078 uint8_t* b = &buf[0];
Nigel Tao850dc182020-07-21 22:52:04 +10001079 *b++ = '-';
Nigel Tao664f8432020-07-16 21:25:14 +10001080 size_t n = wuffs_base__render_number_u64(
1081 wuffs_base__make_slice_u8(b, WUFFS_BASE__U64__BYTE_LENGTH__MAX_INCL), u,
1082 WUFFS_BASE__RENDER_NUMBER_XXX__DEFAULT_OPTIONS);
Nigel Tao850dc182020-07-21 22:52:04 +10001083 return write_dst(&buf[0], 1 + n);
Nigel Tao664f8432020-07-16 21:25:14 +10001084}
1085
1086const char* //
Nigel Tao042e94f2020-07-24 23:14:27 +10001087write_cbor_simple_value(uint64_t tag, uint8_t* ptr, size_t len) {
1088 if (g_flags.output_format == file_format::cbor) {
1089 return write_dst(ptr, len);
1090 }
1091
1092 if (!g_flags.output_cbor_metadata_as_json_comments) {
1093 return nullptr;
1094 }
1095 uint8_t buf[WUFFS_BASE__U64__BYTE_LENGTH__MAX_INCL];
1096 size_t n = wuffs_base__render_number_u64(
1097 wuffs_base__make_slice_u8(&buf[0],
1098 WUFFS_BASE__U64__BYTE_LENGTH__MAX_INCL),
1099 tag, WUFFS_BASE__RENDER_NUMBER_XXX__DEFAULT_OPTIONS);
1100 TRY(write_dst("/*cbor:simple", 13));
1101 TRY(write_dst(&buf[0], n));
1102 return write_dst("*/null", 6);
1103}
1104
1105const char* //
Nigel Tao27168032020-07-24 13:05:05 +10001106write_cbor_tag(uint64_t tag, uint8_t* ptr, size_t len) {
1107 if (g_flags.output_format == file_format::cbor) {
1108 return write_dst(ptr, len);
1109 }
1110
1111 if (!g_flags.output_cbor_metadata_as_json_comments) {
1112 return nullptr;
1113 }
1114 uint8_t buf[WUFFS_BASE__U64__BYTE_LENGTH__MAX_INCL];
1115 size_t n = wuffs_base__render_number_u64(
1116 wuffs_base__make_slice_u8(&buf[0],
1117 WUFFS_BASE__U64__BYTE_LENGTH__MAX_INCL),
1118 tag, WUFFS_BASE__RENDER_NUMBER_XXX__DEFAULT_OPTIONS);
1119 TRY(write_dst("/*cbor:tag", 10));
1120 TRY(write_dst(&buf[0], n));
1121 return write_dst("*/", 2);
1122}
1123
1124const char* //
Nigel Tao168f60a2020-07-14 13:19:33 +10001125write_number(uint64_t vbd, uint8_t* ptr, size_t len) {
Nigel Tao4e193592020-07-15 12:48:57 +10001126 if (g_flags.output_format == file_format::json) {
Nigel Tao5a616b62020-07-24 23:54:52 +10001127 const uint64_t cfp_fbbe_fifb =
1128 WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_FLOATING_POINT |
1129 WUFFS_BASE__TOKEN__VBD__NUMBER__FORMAT_BINARY_BIG_ENDIAN |
1130 WUFFS_BASE__TOKEN__VBD__NUMBER__FORMAT_IGNORE_FIRST_BYTE;
Nigel Tao51a38292020-07-19 22:43:17 +10001131 if (g_flags.input_format == file_format::json) {
Nigel Tao168f60a2020-07-14 13:19:33 +10001132 return write_dst(ptr, len);
Nigel Tao5a616b62020-07-24 23:54:52 +10001133 } else if ((vbd & cfp_fbbe_fifb) == cfp_fbbe_fifb) {
1134 return write_number_as_json_f64(ptr, len);
Nigel Tao168f60a2020-07-14 13:19:33 +10001135 }
1136
Nigel Tao4e193592020-07-15 12:48:57 +10001137 // From here on, (g_flags.output_format == file_format::cbor).
Nigel Tao4e193592020-07-15 12:48:57 +10001138 } else if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__FORMAT_TEXT) {
Nigel Tao168f60a2020-07-14 13:19:33 +10001139 // First try to parse (ptr, len) as an integer. Something like
1140 // "1180591620717411303424" is a valid number (in the JSON sense) but will
1141 // overflow int64_t or uint64_t, so fall back to parsing it as a float64.
1142 if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_INTEGER_SIGNED) {
1143 if ((len > 0) && (ptr[0] == '-')) {
1144 wuffs_base__result_i64 ri = wuffs_base__parse_number_i64(
1145 wuffs_base__make_slice_u8(ptr, len),
1146 WUFFS_BASE__PARSE_NUMBER_XXX__DEFAULT_OPTIONS);
1147 if (ri.status.is_ok()) {
Nigel Tao664f8432020-07-16 21:25:14 +10001148 return write_number_as_cbor_u64(0x20, ~ri.value);
Nigel Tao168f60a2020-07-14 13:19:33 +10001149 }
1150 } else {
1151 wuffs_base__result_u64 ru = wuffs_base__parse_number_u64(
1152 wuffs_base__make_slice_u8(ptr, len),
1153 WUFFS_BASE__PARSE_NUMBER_XXX__DEFAULT_OPTIONS);
1154 if (ru.status.is_ok()) {
Nigel Tao664f8432020-07-16 21:25:14 +10001155 return write_number_as_cbor_u64(0x00, ru.value);
Nigel Tao168f60a2020-07-14 13:19:33 +10001156 }
1157 }
1158 }
1159
1160 if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_FLOATING_POINT) {
1161 wuffs_base__result_f64 rf = wuffs_base__parse_number_f64(
1162 wuffs_base__make_slice_u8(ptr, len),
1163 WUFFS_BASE__PARSE_NUMBER_XXX__DEFAULT_OPTIONS);
1164 if (rf.status.is_ok()) {
Nigel Tao664f8432020-07-16 21:25:14 +10001165 return write_number_as_cbor_f64(rf.value);
Nigel Tao168f60a2020-07-14 13:19:33 +10001166 }
1167 }
Nigel Tao51a38292020-07-19 22:43:17 +10001168 } else if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_NEG_INF) {
1169 return write_dst("\xF9\xFC\x00", 3);
1170 } else if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_POS_INF) {
1171 return write_dst("\xF9\x7C\x00", 3);
1172 } else if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_NEG_NAN) {
1173 return write_dst("\xF9\xFF\xFF", 3);
1174 } else if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_POS_NAN) {
1175 return write_dst("\xF9\x7F\xFF", 3);
Nigel Tao168f60a2020-07-14 13:19:33 +10001176 }
1177
Nigel Tao4e193592020-07-15 12:48:57 +10001178fail:
Nigel Tao168f60a2020-07-14 13:19:33 +10001179 return "main: internal error: unexpected write_number argument";
1180}
1181
Nigel Tao4e193592020-07-15 12:48:57 +10001182const char* //
Nigel Taoc9d4e342020-07-21 15:20:34 +10001183write_inline_integer(uint64_t x, bool x_is_signed, uint8_t* ptr, size_t len) {
Nigel Tao4e193592020-07-15 12:48:57 +10001184 if (g_flags.output_format == file_format::cbor) {
1185 return write_dst(ptr, len);
1186 }
1187
Nigel Taoc9d4e342020-07-21 15:20:34 +10001188 // Adding the two ETC__BYTE_LENGTH__ETC constants is overkill, but it's
1189 // simpler (for producing a constant-expression array size) than taking the
1190 // maximum of the two.
1191 uint8_t buf[WUFFS_BASE__I64__BYTE_LENGTH__MAX_INCL +
1192 WUFFS_BASE__U64__BYTE_LENGTH__MAX_INCL];
1193 wuffs_base__slice_u8 dst = wuffs_base__make_slice_u8(&buf[0], sizeof buf);
1194 size_t n =
1195 x_is_signed
1196 ? wuffs_base__render_number_i64(
1197 dst, (int64_t)x, WUFFS_BASE__RENDER_NUMBER_XXX__DEFAULT_OPTIONS)
1198 : wuffs_base__render_number_u64(
1199 dst, x, WUFFS_BASE__RENDER_NUMBER_XXX__DEFAULT_OPTIONS);
Nigel Tao4e193592020-07-15 12:48:57 +10001200 return write_dst(&buf[0], n);
1201}
1202
Nigel Tao168f60a2020-07-14 13:19:33 +10001203// ----
1204
Nigel Tao2914bae2020-02-26 09:40:30 +11001205uint8_t //
1206hex_digit(uint8_t nibble) {
Nigel Taob5461bd2020-02-21 14:13:37 +11001207 nibble &= 0x0F;
1208 if (nibble <= 9) {
1209 return '0' + nibble;
1210 }
1211 return ('A' - 10) + nibble;
1212}
1213
Nigel Tao2914bae2020-02-26 09:40:30 +11001214const char* //
Nigel Tao168f60a2020-07-14 13:19:33 +10001215flush_cbor_output_string() {
1216 uint8_t prefix[3];
1217 prefix[0] = g_cbor_output_string_is_utf_8 ? 0x60 : 0x40;
1218 if (g_cbor_output_string_length < 0x18) {
1219 prefix[0] |= g_cbor_output_string_length;
1220 TRY(write_dst(&prefix[0], 1));
1221 } else if (g_cbor_output_string_length <= 0xFF) {
1222 prefix[0] |= 0x18;
1223 prefix[1] = g_cbor_output_string_length;
1224 TRY(write_dst(&prefix[0], 2));
1225 } else if (g_cbor_output_string_length <= 0xFFFF) {
1226 prefix[0] |= 0x19;
1227 prefix[1] = g_cbor_output_string_length >> 8;
1228 prefix[2] = g_cbor_output_string_length;
1229 TRY(write_dst(&prefix[0], 3));
1230 } else {
1231 return "main: internal error: CBOR string output is too long";
1232 }
1233
1234 size_t n = g_cbor_output_string_length;
1235 g_cbor_output_string_length = 0;
1236 return write_dst(&g_cbor_output_string_array[0], n);
1237}
1238
1239const char* //
1240write_cbor_output_string(uint8_t* ptr, size_t len, bool finish) {
1241 // Check that g_cbor_output_string_array can hold any UTF-8 code point.
1242 if (CBOR_OUTPUT_STRING_ARRAY_SIZE < 4) {
1243 return "main: internal error: CBOR_OUTPUT_STRING_ARRAY_SIZE is too short";
1244 }
1245
1246 while (len > 0) {
1247 size_t available =
1248 CBOR_OUTPUT_STRING_ARRAY_SIZE - g_cbor_output_string_length;
1249 if (available >= len) {
1250 memcpy(&g_cbor_output_string_array[g_cbor_output_string_length], ptr,
1251 len);
1252 g_cbor_output_string_length += len;
1253 ptr += len;
1254 len = 0;
1255 break;
1256
1257 } else if (available > 0) {
1258 if (!g_cbor_output_string_is_multiple_chunks) {
1259 g_cbor_output_string_is_multiple_chunks = true;
1260 TRY(write_dst(g_cbor_output_string_is_utf_8 ? "\x7F" : "\x5F", 1));
Nigel Tao3b486982020-02-27 15:05:59 +11001261 }
Nigel Tao168f60a2020-07-14 13:19:33 +10001262
1263 if (g_cbor_output_string_is_utf_8) {
1264 // Walk the end backwards to a UTF-8 boundary, so that each chunk of
1265 // the multi-chunk string is also valid UTF-8.
1266 while (available > 0) {
Nigel Tao702c7b22020-07-22 15:42:54 +10001267 wuffs_base__utf_8__next__output o =
1268 wuffs_base__utf_8__next_from_end(ptr, available);
Nigel Tao168f60a2020-07-14 13:19:33 +10001269 if ((o.code_point != WUFFS_BASE__UNICODE_REPLACEMENT_CHARACTER) ||
1270 (o.byte_length != 1)) {
1271 break;
1272 }
1273 available--;
1274 }
1275 }
1276
1277 memcpy(&g_cbor_output_string_array[g_cbor_output_string_length], ptr,
1278 available);
1279 g_cbor_output_string_length += available;
1280 ptr += available;
1281 len -= available;
Nigel Tao3b486982020-02-27 15:05:59 +11001282 }
1283
Nigel Tao168f60a2020-07-14 13:19:33 +10001284 TRY(flush_cbor_output_string());
1285 }
Nigel Taob9ad34f2020-03-03 12:44:01 +11001286
Nigel Tao168f60a2020-07-14 13:19:33 +10001287 if (finish) {
1288 TRY(flush_cbor_output_string());
1289 if (g_cbor_output_string_is_multiple_chunks) {
1290 TRY(write_dst("\xFF", 1));
1291 }
1292 }
1293 return nullptr;
1294}
Nigel Taob9ad34f2020-03-03 12:44:01 +11001295
Nigel Tao168f60a2020-07-14 13:19:33 +10001296const char* //
Nigel Tao7cb76542020-07-19 22:19:04 +10001297handle_unicode_code_point(uint32_t ucp) {
1298 if (g_flags.output_format == file_format::json) {
1299 if (ucp < 0x0020) {
1300 switch (ucp) {
1301 case '\b':
1302 return write_dst("\\b", 2);
1303 case '\f':
1304 return write_dst("\\f", 2);
1305 case '\n':
1306 return write_dst("\\n", 2);
1307 case '\r':
1308 return write_dst("\\r", 2);
1309 case '\t':
1310 return write_dst("\\t", 2);
1311 }
1312
1313 // Other bytes less than 0x0020 are valid UTF-8 but not valid in a
1314 // JSON string. They need to remain escaped.
1315 uint8_t esc6[6];
1316 esc6[0] = '\\';
1317 esc6[1] = 'u';
1318 esc6[2] = '0';
1319 esc6[3] = '0';
1320 esc6[4] = hex_digit(ucp >> 4);
1321 esc6[5] = hex_digit(ucp >> 0);
1322 return write_dst(&esc6[0], 6);
1323
1324 } else if (ucp == '\"') {
1325 return write_dst("\\\"", 2);
1326
1327 } else if (ucp == '\\') {
1328 return write_dst("\\\\", 2);
1329 }
1330 }
1331
1332 uint8_t u[WUFFS_BASE__UTF_8__BYTE_LENGTH__MAX_INCL];
1333 size_t n = wuffs_base__utf_8__encode(
1334 wuffs_base__make_slice_u8(&u[0],
1335 WUFFS_BASE__UTF_8__BYTE_LENGTH__MAX_INCL),
1336 ucp);
1337 if (n == 0) {
1338 return "main: internal error: unexpected Unicode code point";
1339 }
1340
1341 if (g_flags.output_format == file_format::json) {
1342 return write_dst(&u[0], n);
1343 }
1344 return write_cbor_output_string(&u[0], n, false);
1345}
Nigel Taod191a3f2020-07-19 22:14:54 +10001346
1347const char* //
1348write_json_escaped_string(uint8_t* ptr, size_t len) {
1349restart:
1350 while (true) {
1351 size_t i;
1352 for (i = 0; i < len; i++) {
1353 uint8_t c = ptr[i];
1354 if ((c == '"') || (c == '\\') || (c < 0x20)) {
1355 TRY(write_dst(ptr, i));
1356 TRY(handle_unicode_code_point(c));
1357 ptr += i + 1;
1358 len -= i + 1;
1359 goto restart;
1360 }
1361 }
1362 TRY(write_dst(ptr, len));
1363 break;
1364 }
1365 return nullptr;
1366}
1367
1368const char* //
Nigel Tao168f60a2020-07-14 13:19:33 +10001369handle_string(uint64_t vbd,
1370 uint64_t len,
1371 bool start_of_token_chain,
1372 bool continued) {
1373 if (start_of_token_chain) {
1374 if (g_flags.output_format == file_format::json) {
Nigel Tao3c8589b2020-07-19 21:49:00 +10001375 if (g_flags.output_cbor_metadata_as_json_comments &&
1376 !(vbd & WUFFS_BASE__TOKEN__VBD__STRING__CHAIN_MUST_BE_UTF_8)) {
1377 TRY(write_dst("/*cbor:hex*/\"", 13));
1378 } else {
1379 TRY(write_dst("\"", 1));
1380 }
Nigel Tao168f60a2020-07-14 13:19:33 +10001381 } else {
1382 g_cbor_output_string_length = 0;
1383 g_cbor_output_string_is_multiple_chunks = false;
1384 g_cbor_output_string_is_utf_8 =
1385 vbd & WUFFS_BASE__TOKEN__VBD__STRING__CHAIN_MUST_BE_UTF_8;
1386 }
1387 g_query.restart_fragment(in_dict_before_key() && g_query.is_at(g_depth));
1388 }
1389
1390 if (vbd & WUFFS_BASE__TOKEN__VBD__STRING__CONVERT_0_DST_1_SRC_DROP) {
1391 // No-op.
1392 } else if (vbd & WUFFS_BASE__TOKEN__VBD__STRING__CONVERT_1_DST_1_SRC_COPY) {
1393 uint8_t* ptr = g_src.data.ptr + g_curr_token_end_src_index - len;
1394 if (g_flags.output_format == file_format::json) {
Nigel Taoaf757722020-07-18 17:27:11 +10001395 if (g_flags.input_format == file_format::json) {
1396 TRY(write_dst(ptr, len));
1397 } else if (vbd & WUFFS_BASE__TOKEN__VBD__STRING__CHAIN_MUST_BE_UTF_8) {
Nigel Taod191a3f2020-07-19 22:14:54 +10001398 TRY(write_json_escaped_string(ptr, len));
Nigel Taoaf757722020-07-18 17:27:11 +10001399 } else {
1400 uint8_t as_hex[512];
1401 uint8_t* p = ptr;
1402 size_t n = len;
1403 while (n > 0) {
1404 wuffs_base__transform__output o = wuffs_base__base_16__encode2(
1405 wuffs_base__make_slice_u8(&as_hex[0], sizeof as_hex),
1406 wuffs_base__make_slice_u8(p, n), true,
1407 WUFFS_BASE__BASE_16__DEFAULT_OPTIONS);
1408 TRY(write_dst(&as_hex[0], o.num_dst));
1409 p += o.num_src;
1410 n -= o.num_src;
1411 if (!o.status.is_ok()) {
1412 return o.status.message();
1413 }
1414 }
1415 }
Nigel Tao168f60a2020-07-14 13:19:33 +10001416 } else {
1417 TRY(write_cbor_output_string(ptr, len, false));
1418 }
1419 g_query.incremental_match_slice(ptr, len);
Nigel Taob9ad34f2020-03-03 12:44:01 +11001420 } else {
Nigel Tao168f60a2020-07-14 13:19:33 +10001421 return "main: internal error: unexpected string-token conversion";
1422 }
1423
1424 if (continued) {
1425 return nullptr;
1426 }
1427
1428 if (g_flags.output_format == file_format::json) {
1429 TRY(write_dst("\"", 1));
1430 } else {
1431 TRY(write_cbor_output_string(nullptr, 0, true));
1432 }
1433 return nullptr;
1434}
1435
Nigel Taod191a3f2020-07-19 22:14:54 +10001436// ----
1437
Nigel Tao3b486982020-02-27 15:05:59 +11001438const char* //
Nigel Tao2ef39992020-04-09 17:24:39 +10001439handle_token(wuffs_base__token t, bool start_of_token_chain) {
Nigel Tao2cf76db2020-02-27 22:42:01 +11001440 do {
Nigel Tao462f8662020-04-01 23:01:51 +11001441 int64_t vbc = t.value_base_category();
Nigel Tao2cf76db2020-02-27 22:42:01 +11001442 uint64_t vbd = t.value_base_detail();
1443 uint64_t len = t.length();
Nigel Tao1b073492020-02-16 22:11:36 +11001444
1445 // Handle ']' or '}'.
Nigel Tao9f7a2502020-02-23 09:42:02 +11001446 if ((vbc == WUFFS_BASE__TOKEN__VBC__STRUCTURE) &&
Nigel Tao2cf76db2020-02-27 22:42:01 +11001447 (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__POP)) {
Nigel Taod60815c2020-03-26 14:32:35 +11001448 if (g_query.is_at(g_depth)) {
Nigel Tao0cd2f982020-03-03 23:03:02 +11001449 return "main: no match for query";
1450 }
Nigel Taod60815c2020-03-26 14:32:35 +11001451 if (g_depth <= 0) {
1452 return "main: internal error: inconsistent g_depth";
Nigel Tao1b073492020-02-16 22:11:36 +11001453 }
Nigel Taod60815c2020-03-26 14:32:35 +11001454 g_depth--;
Nigel Tao1b073492020-02-16 22:11:36 +11001455
Nigel Taod60815c2020-03-26 14:32:35 +11001456 if (g_query.matched_all() && (g_depth >= g_flags.max_output_depth)) {
1457 g_suppress_write_dst--;
Nigel Tao52c4d6a2020-03-08 21:12:38 +11001458 // '…' is U+2026 HORIZONTAL ELLIPSIS, which is 3 UTF-8 bytes.
Nigel Tao168f60a2020-07-14 13:19:33 +10001459 if (g_flags.output_format == file_format::json) {
1460 TRY(write_dst((vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__FROM_LIST)
1461 ? "\"[…]\""
1462 : "\"{…}\"",
1463 7));
1464 } else {
1465 TRY(write_dst((vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__FROM_LIST)
1466 ? "\x65[…]"
1467 : "\x65{…}",
1468 6));
1469 }
1470 } else if (g_flags.output_format == file_format::json) {
Nigel Tao52c4d6a2020-03-08 21:12:38 +11001471 // Write preceding whitespace.
Nigel Taod60815c2020-03-26 14:32:35 +11001472 if ((g_ctx != context::in_list_after_bracket) &&
1473 (g_ctx != context::in_dict_after_brace) &&
1474 !g_flags.compact_output) {
Nigel Taoc766bb72020-07-09 12:59:32 +10001475 if (g_flags.output_json_extra_comma) {
1476 TRY(write_dst(",\n", 2));
1477 } else {
1478 TRY(write_dst("\n", 1));
1479 }
Nigel Taod60815c2020-03-26 14:32:35 +11001480 for (uint32_t i = 0; i < g_depth; i++) {
1481 TRY(write_dst(
1482 g_flags.tabs ? INDENT_TAB_STRING : INDENT_SPACES_STRING,
Nigel Taoecadf722020-07-13 08:22:34 +10001483 g_flags.tabs ? 1 : g_flags.spaces));
Nigel Tao52c4d6a2020-03-08 21:12:38 +11001484 }
Nigel Tao1b073492020-02-16 22:11:36 +11001485 }
Nigel Tao52c4d6a2020-03-08 21:12:38 +11001486
1487 TRY(write_dst(
1488 (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__FROM_LIST) ? "]" : "}",
1489 1));
Nigel Tao168f60a2020-07-14 13:19:33 +10001490 } else {
1491 TRY(write_dst("\xFF", 1));
Nigel Tao1b073492020-02-16 22:11:36 +11001492 }
1493
Nigel Taod60815c2020-03-26 14:32:35 +11001494 g_ctx = (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__TO_LIST)
1495 ? context::in_list_after_value
1496 : context::in_dict_after_key;
Nigel Tao1b073492020-02-16 22:11:36 +11001497 goto after_value;
1498 }
1499
Nigel Taod1c928a2020-02-28 12:43:53 +11001500 // Write preceding whitespace and punctuation, if it wasn't ']', '}' or a
1501 // continuation of a multi-token chain.
Nigel Tao2ef39992020-04-09 17:24:39 +10001502 if (start_of_token_chain) {
Nigel Tao168f60a2020-07-14 13:19:33 +10001503 if (g_flags.output_format != file_format::json) {
1504 // No-op.
1505 } else if (g_ctx == context::in_dict_after_key) {
Nigel Taod60815c2020-03-26 14:32:35 +11001506 TRY(write_dst(": ", g_flags.compact_output ? 1 : 2));
1507 } else if (g_ctx != context::none) {
Nigel Taof8dfc762020-07-23 23:35:44 +10001508 if ((g_ctx == context::in_dict_after_brace) ||
1509 (g_ctx == context::in_dict_after_value)) {
1510 // Reject dict keys that aren't UTF-8 strings, which could otherwise
1511 // happen with -i=cbor -o=json.
1512 if ((vbc != WUFFS_BASE__TOKEN__VBC__STRING) ||
1513 !(vbd & WUFFS_BASE__TOKEN__VBD__STRING__CHAIN_MUST_BE_UTF_8)) {
1514 return "main: cannot convert CBOR non-text-string to JSON map key";
1515 }
1516 }
1517 if ((g_ctx == context::in_list_after_value) ||
1518 (g_ctx == context::in_dict_after_value)) {
Nigel Tao0cd2f982020-03-03 23:03:02 +11001519 TRY(write_dst(",", 1));
Nigel Tao107f0ef2020-03-01 21:35:02 +11001520 }
Nigel Taod60815c2020-03-26 14:32:35 +11001521 if (!g_flags.compact_output) {
Nigel Tao0cd2f982020-03-03 23:03:02 +11001522 TRY(write_dst("\n", 1));
Nigel Taod60815c2020-03-26 14:32:35 +11001523 for (size_t i = 0; i < g_depth; i++) {
1524 TRY(write_dst(
1525 g_flags.tabs ? INDENT_TAB_STRING : INDENT_SPACES_STRING,
Nigel Taoecadf722020-07-13 08:22:34 +10001526 g_flags.tabs ? 1 : g_flags.spaces));
Nigel Tao0cd2f982020-03-03 23:03:02 +11001527 }
1528 }
1529 }
1530
Nigel Tao52c4d6a2020-03-08 21:12:38 +11001531 bool query_matched_fragment = false;
Nigel Taod60815c2020-03-26 14:32:35 +11001532 if (g_query.is_at(g_depth)) {
1533 switch (g_ctx) {
Nigel Tao0cd2f982020-03-03 23:03:02 +11001534 case context::in_list_after_bracket:
1535 case context::in_list_after_value:
Nigel Taod60815c2020-03-26 14:32:35 +11001536 query_matched_fragment = g_query.tick();
Nigel Tao0cd2f982020-03-03 23:03:02 +11001537 break;
1538 case context::in_dict_after_key:
Nigel Taod60815c2020-03-26 14:32:35 +11001539 query_matched_fragment = g_query.matched_fragment();
Nigel Tao0cd2f982020-03-03 23:03:02 +11001540 break;
Nigel Tao18ef5b42020-03-16 10:37:47 +11001541 default:
1542 break;
Nigel Tao0cd2f982020-03-03 23:03:02 +11001543 }
1544 }
Nigel Tao52c4d6a2020-03-08 21:12:38 +11001545 if (!query_matched_fragment) {
Nigel Tao0cd2f982020-03-03 23:03:02 +11001546 // No-op.
Nigel Taod60815c2020-03-26 14:32:35 +11001547 } else if (!g_query.next_fragment()) {
Nigel Tao0cd2f982020-03-03 23:03:02 +11001548 // There is no next fragment. We have matched the complete query, and
1549 // the upcoming JSON value is the result of that query.
1550 //
Nigel Taod60815c2020-03-26 14:32:35 +11001551 // Un-suppress writing to stdout and reset the g_ctx and g_depth as if
1552 // we were about to decode a top-level value. This makes any subsequent
1553 // indentation be relative to this point, and we will return g_eod
1554 // after the upcoming JSON value is complete.
1555 if (g_suppress_write_dst != 1) {
1556 return "main: internal error: inconsistent g_suppress_write_dst";
Nigel Tao52c4d6a2020-03-08 21:12:38 +11001557 }
Nigel Taod60815c2020-03-26 14:32:35 +11001558 g_suppress_write_dst = 0;
1559 g_ctx = context::none;
1560 g_depth = 0;
Nigel Tao0cd2f982020-03-03 23:03:02 +11001561 } else if ((vbc != WUFFS_BASE__TOKEN__VBC__STRUCTURE) ||
1562 !(vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__PUSH)) {
1563 // The query has moved on to the next fragment but the upcoming JSON
1564 // value is not a container.
1565 return "main: no match for query";
Nigel Tao1b073492020-02-16 22:11:36 +11001566 }
1567 }
1568
1569 // Handle the token itself: either a container ('[' or '{') or a simple
Nigel Tao85fba7f2020-02-29 16:28:06 +11001570 // value: string (a chain of raw or escaped parts), literal or number.
Nigel Tao1b073492020-02-16 22:11:36 +11001571 switch (vbc) {
Nigel Tao85fba7f2020-02-29 16:28:06 +11001572 case WUFFS_BASE__TOKEN__VBC__STRUCTURE:
Nigel Taod60815c2020-03-26 14:32:35 +11001573 if (g_query.matched_all() && (g_depth >= g_flags.max_output_depth)) {
1574 g_suppress_write_dst++;
Nigel Tao168f60a2020-07-14 13:19:33 +10001575 } else if (g_flags.output_format == file_format::json) {
Nigel Tao52c4d6a2020-03-08 21:12:38 +11001576 TRY(write_dst(
1577 (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__TO_LIST) ? "[" : "{",
1578 1));
Nigel Tao168f60a2020-07-14 13:19:33 +10001579 } else {
1580 TRY(write_dst((vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__TO_LIST)
1581 ? "\x9F"
1582 : "\xBF",
1583 1));
Nigel Tao52c4d6a2020-03-08 21:12:38 +11001584 }
Nigel Taod60815c2020-03-26 14:32:35 +11001585 g_depth++;
1586 g_ctx = (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__TO_LIST)
1587 ? context::in_list_after_bracket
1588 : context::in_dict_after_brace;
Nigel Tao85fba7f2020-02-29 16:28:06 +11001589 return nullptr;
1590
Nigel Tao2cf76db2020-02-27 22:42:01 +11001591 case WUFFS_BASE__TOKEN__VBC__STRING:
Nigel Tao168f60a2020-07-14 13:19:33 +10001592 TRY(handle_string(vbd, len, start_of_token_chain, t.continued()));
Nigel Tao496e88b2020-04-09 22:10:08 +10001593 if (t.continued()) {
Nigel Tao2cf76db2020-02-27 22:42:01 +11001594 return nullptr;
1595 }
Nigel Tao2cf76db2020-02-27 22:42:01 +11001596 goto after_value;
1597
1598 case WUFFS_BASE__TOKEN__VBC__UNICODE_CODE_POINT:
Nigel Tao496e88b2020-04-09 22:10:08 +10001599 if (!t.continued()) {
1600 return "main: internal error: unexpected non-continued UCP token";
Nigel Tao0cd2f982020-03-03 23:03:02 +11001601 }
1602 TRY(handle_unicode_code_point(vbd));
Nigel Taod60815c2020-03-26 14:32:35 +11001603 g_query.incremental_match_code_point(vbd);
Nigel Tao0cd2f982020-03-03 23:03:02 +11001604 return nullptr;
Nigel Tao2cf76db2020-02-27 22:42:01 +11001605
Nigel Tao85fba7f2020-02-29 16:28:06 +11001606 case WUFFS_BASE__TOKEN__VBC__LITERAL:
Nigel Tao168f60a2020-07-14 13:19:33 +10001607 TRY(write_literal(vbd));
1608 goto after_value;
1609
Nigel Tao2cf76db2020-02-27 22:42:01 +11001610 case WUFFS_BASE__TOKEN__VBC__NUMBER:
Nigel Tao168f60a2020-07-14 13:19:33 +10001611 TRY(write_number(vbd, g_src.data.ptr + g_curr_token_end_src_index - len,
1612 len));
Nigel Tao2cf76db2020-02-27 22:42:01 +11001613 goto after_value;
Nigel Tao4e193592020-07-15 12:48:57 +10001614
Nigel Taoc9d4e342020-07-21 15:20:34 +10001615 case WUFFS_BASE__TOKEN__VBC__INLINE_INTEGER_SIGNED:
1616 case WUFFS_BASE__TOKEN__VBC__INLINE_INTEGER_UNSIGNED: {
1617 bool x_is_signed = vbc == WUFFS_BASE__TOKEN__VBC__INLINE_INTEGER_SIGNED;
1618 uint64_t x = x_is_signed
1619 ? ((uint64_t)(t.value_base_detail__sign_extended()))
1620 : vbd;
Nigel Tao850dc182020-07-21 22:52:04 +10001621 if (t.continued()) {
Nigel Tao03a87ea2020-07-21 23:29:26 +10001622 if (len != 0) {
1623 return "main: internal error: unexpected to-be-extended length";
1624 }
Nigel Tao850dc182020-07-21 22:52:04 +10001625 g_token_extension.category = vbc;
1626 g_token_extension.detail = x;
1627 return nullptr;
1628 }
Nigel Tao4e193592020-07-15 12:48:57 +10001629 TRY(write_inline_integer(
Nigel Taoc9d4e342020-07-21 15:20:34 +10001630 x, x_is_signed, g_src.data.ptr + g_curr_token_end_src_index - len,
1631 len));
Nigel Tao4e193592020-07-15 12:48:57 +10001632 goto after_value;
Nigel Taoc9d4e342020-07-21 15:20:34 +10001633 }
Nigel Tao1b073492020-02-16 22:11:36 +11001634 }
1635
Nigel Tao850dc182020-07-21 22:52:04 +10001636 int64_t ext = t.value_extension();
1637 if (ext >= 0) {
Nigel Tao27168032020-07-24 13:05:05 +10001638 uint64_t x = (g_token_extension.detail
1639 << WUFFS_BASE__TOKEN__VALUE_EXTENSION__NUM_BITS) |
1640 ((uint64_t)ext);
Nigel Tao850dc182020-07-21 22:52:04 +10001641 switch (g_token_extension.category) {
1642 case WUFFS_BASE__TOKEN__VBC__INLINE_INTEGER_SIGNED:
1643 case WUFFS_BASE__TOKEN__VBC__INLINE_INTEGER_UNSIGNED:
Nigel Tao850dc182020-07-21 22:52:04 +10001644 TRY(write_inline_integer(
1645 x,
1646 g_token_extension.category ==
1647 WUFFS_BASE__TOKEN__VBC__INLINE_INTEGER_SIGNED,
1648 g_src.data.ptr + g_curr_token_end_src_index - len, len));
1649 g_token_extension.category = 0;
1650 g_token_extension.detail = 0;
1651 goto after_value;
Nigel Tao27168032020-07-24 13:05:05 +10001652 case CATEGORY_CBOR_TAG:
1653 TRY(write_cbor_tag(
1654 x, g_src.data.ptr + g_curr_token_end_src_index - len, len));
1655 g_token_extension.category = 0;
1656 g_token_extension.detail = 0;
1657 return nullptr;
Nigel Tao850dc182020-07-21 22:52:04 +10001658 }
1659 }
1660
Nigel Tao664f8432020-07-16 21:25:14 +10001661 if (t.value_major() == WUFFS_CBOR__TOKEN_VALUE_MAJOR) {
1662 uint64_t value_minor = t.value_minor();
Nigel Taoc9e20102020-07-24 23:19:12 +10001663 if (value_minor & WUFFS_CBOR__TOKEN_VALUE_MINOR__MINUS_1_MINUS_X) {
1664 TRY(write_cbor_minus_1_minus_x(
1665 g_src.data.ptr + g_curr_token_end_src_index - len, len));
1666 goto after_value;
1667 } else if (value_minor & WUFFS_CBOR__TOKEN_VALUE_MINOR__SIMPLE_VALUE) {
1668 TRY(write_cbor_simple_value(
1669 vbd, g_src.data.ptr + g_curr_token_end_src_index - len, len));
1670 goto after_value;
1671 } else if (value_minor & WUFFS_CBOR__TOKEN_VALUE_MINOR__TAG) {
Nigel Tao27168032020-07-24 13:05:05 +10001672 if (t.continued()) {
1673 if (len != 0) {
1674 return "main: internal error: unexpected to-be-extended length";
1675 }
1676 g_token_extension.category = CATEGORY_CBOR_TAG;
1677 g_token_extension.detail = vbd;
1678 return nullptr;
1679 }
1680 return write_cbor_tag(
1681 vbd, g_src.data.ptr + g_curr_token_end_src_index - len, len);
Nigel Tao664f8432020-07-16 21:25:14 +10001682 }
1683 }
1684
1685 // Return an error if we didn't match the (value_major, value_minor) or
1686 // (vbc, vbd) pair.
Nigel Tao2cf76db2020-02-27 22:42:01 +11001687 return "main: internal error: unexpected token";
1688 } while (0);
Nigel Tao1b073492020-02-16 22:11:36 +11001689
Nigel Tao2cf76db2020-02-27 22:42:01 +11001690 // Book-keeping after completing a value (whether a container value or a
1691 // simple value). Empty parent containers are no longer empty. If the parent
1692 // container is a "{...}" object, toggle between keys and values.
1693after_value:
Nigel Taod60815c2020-03-26 14:32:35 +11001694 if (g_depth == 0) {
1695 return g_eod;
Nigel Tao2cf76db2020-02-27 22:42:01 +11001696 }
Nigel Taod60815c2020-03-26 14:32:35 +11001697 switch (g_ctx) {
Nigel Tao2cf76db2020-02-27 22:42:01 +11001698 case context::in_list_after_bracket:
Nigel Taod60815c2020-03-26 14:32:35 +11001699 g_ctx = context::in_list_after_value;
Nigel Tao2cf76db2020-02-27 22:42:01 +11001700 break;
1701 case context::in_dict_after_brace:
Nigel Taod60815c2020-03-26 14:32:35 +11001702 g_ctx = context::in_dict_after_key;
Nigel Tao2cf76db2020-02-27 22:42:01 +11001703 break;
1704 case context::in_dict_after_key:
Nigel Taod60815c2020-03-26 14:32:35 +11001705 g_ctx = context::in_dict_after_value;
Nigel Tao2cf76db2020-02-27 22:42:01 +11001706 break;
1707 case context::in_dict_after_value:
Nigel Taod60815c2020-03-26 14:32:35 +11001708 g_ctx = context::in_dict_after_key;
Nigel Tao2cf76db2020-02-27 22:42:01 +11001709 break;
Nigel Tao18ef5b42020-03-16 10:37:47 +11001710 default:
1711 break;
Nigel Tao2cf76db2020-02-27 22:42:01 +11001712 }
1713 return nullptr;
1714}
1715
1716const char* //
1717main1(int argc, char** argv) {
1718 TRY(initialize_globals(argc, argv));
1719
Nigel Taocd183f92020-07-14 12:11:05 +10001720 bool start_of_token_chain = true;
Nigel Tao2cf76db2020-02-27 22:42:01 +11001721 while (true) {
Nigel Tao4e193592020-07-15 12:48:57 +10001722 wuffs_base__status status = g_dec->decode_tokens(
Nigel Taod60815c2020-03-26 14:32:35 +11001723 &g_tok, &g_src,
1724 wuffs_base__make_slice_u8(g_work_buffer_array, WORK_BUFFER_ARRAY_SIZE));
Nigel Tao2cf76db2020-02-27 22:42:01 +11001725
Nigel Taod60815c2020-03-26 14:32:35 +11001726 while (g_tok.meta.ri < g_tok.meta.wi) {
1727 wuffs_base__token t = g_tok.data.ptr[g_tok.meta.ri++];
Nigel Tao2cf76db2020-02-27 22:42:01 +11001728 uint64_t n = t.length();
Nigel Taod60815c2020-03-26 14:32:35 +11001729 if ((g_src.meta.ri - g_curr_token_end_src_index) < n) {
1730 return "main: internal error: inconsistent g_src indexes";
Nigel Tao2cf76db2020-02-27 22:42:01 +11001731 }
Nigel Taod60815c2020-03-26 14:32:35 +11001732 g_curr_token_end_src_index += n;
Nigel Tao2cf76db2020-02-27 22:42:01 +11001733
Nigel Taod0b16cb2020-03-14 10:15:54 +11001734 // Skip filler tokens (e.g. whitespace).
Nigel Tao3c8589b2020-07-19 21:49:00 +10001735 if (t.value_base_category() == WUFFS_BASE__TOKEN__VBC__FILLER) {
Nigel Tao496e88b2020-04-09 22:10:08 +10001736 start_of_token_chain = !t.continued();
Nigel Tao2cf76db2020-02-27 22:42:01 +11001737 continue;
1738 }
1739
Nigel Tao2ef39992020-04-09 17:24:39 +10001740 const char* z = handle_token(t, start_of_token_chain);
Nigel Tao496e88b2020-04-09 22:10:08 +10001741 start_of_token_chain = !t.continued();
Nigel Tao2cf76db2020-02-27 22:42:01 +11001742 if (z == nullptr) {
1743 continue;
Nigel Taod60815c2020-03-26 14:32:35 +11001744 } else if (z == g_eod) {
Nigel Tao0cd2f982020-03-03 23:03:02 +11001745 goto end_of_data;
Nigel Tao2cf76db2020-02-27 22:42:01 +11001746 }
1747 return z;
Nigel Tao1b073492020-02-16 22:11:36 +11001748 }
Nigel Tao2cf76db2020-02-27 22:42:01 +11001749
1750 if (status.repr == nullptr) {
Nigel Tao0cd2f982020-03-03 23:03:02 +11001751 return "main: internal error: unexpected end of token stream";
Nigel Tao2cf76db2020-02-27 22:42:01 +11001752 } else if (status.repr == wuffs_base__suspension__short_read) {
Nigel Taod60815c2020-03-26 14:32:35 +11001753 if (g_curr_token_end_src_index != g_src.meta.ri) {
1754 return "main: internal error: inconsistent g_src indexes";
Nigel Tao2cf76db2020-02-27 22:42:01 +11001755 }
1756 TRY(read_src());
Nigel Taod60815c2020-03-26 14:32:35 +11001757 g_curr_token_end_src_index = g_src.meta.ri;
Nigel Tao2cf76db2020-02-27 22:42:01 +11001758 } else if (status.repr == wuffs_base__suspension__short_write) {
Nigel Taod60815c2020-03-26 14:32:35 +11001759 g_tok.compact();
Nigel Tao2cf76db2020-02-27 22:42:01 +11001760 } else {
1761 return status.message();
Nigel Tao1b073492020-02-16 22:11:36 +11001762 }
1763 }
Nigel Tao0cd2f982020-03-03 23:03:02 +11001764end_of_data:
1765
Nigel Taod60815c2020-03-26 14:32:35 +11001766 // With a non-empty g_query, don't try to consume trailing whitespace or
Nigel Tao0cd2f982020-03-03 23:03:02 +11001767 // confirm that we've processed all the tokens.
Nigel Taod60815c2020-03-26 14:32:35 +11001768 if (g_flags.query_c_string && *g_flags.query_c_string) {
Nigel Tao0cd2f982020-03-03 23:03:02 +11001769 return nullptr;
1770 }
Nigel Tao6b161af2020-02-24 11:01:48 +11001771
Nigel Tao6b161af2020-02-24 11:01:48 +11001772 // Check that we've exhausted the input.
Nigel Taod60815c2020-03-26 14:32:35 +11001773 if ((g_src.meta.ri == g_src.meta.wi) && !g_src.meta.closed) {
Nigel Taofe0cbbd2020-03-05 22:01:30 +11001774 TRY(read_src());
1775 }
Nigel Taod60815c2020-03-26 14:32:35 +11001776 if ((g_src.meta.ri < g_src.meta.wi) || !g_src.meta.closed) {
Nigel Tao51a38292020-07-19 22:43:17 +10001777 return "main: valid JSON|CBOR followed by further (unexpected) data";
Nigel Tao6b161af2020-02-24 11:01:48 +11001778 }
1779
1780 // Check that we've used all of the decoded tokens, other than trailing
Nigel Tao4b186b02020-03-18 14:25:21 +11001781 // filler tokens. For example, "true\n" is valid JSON (and fully consumed
1782 // with WUFFS_JSON__QUIRK_ALLOW_TRAILING_NEW_LINE enabled) with a trailing
1783 // filler token for the "\n".
Nigel Taod60815c2020-03-26 14:32:35 +11001784 for (; g_tok.meta.ri < g_tok.meta.wi; g_tok.meta.ri++) {
1785 if (g_tok.data.ptr[g_tok.meta.ri].value_base_category() !=
Nigel Tao6b161af2020-02-24 11:01:48 +11001786 WUFFS_BASE__TOKEN__VBC__FILLER) {
1787 return "main: internal error: decoded OK but unprocessed tokens remain";
1788 }
1789 }
1790
1791 return nullptr;
Nigel Tao1b073492020-02-16 22:11:36 +11001792}
1793
Nigel Tao2914bae2020-02-26 09:40:30 +11001794int //
1795compute_exit_code(const char* status_msg) {
Nigel Tao9cc2c252020-02-23 17:05:49 +11001796 if (!status_msg) {
1797 return 0;
1798 }
Nigel Tao01abc842020-03-06 21:42:33 +11001799 size_t n;
Nigel Taod60815c2020-03-26 14:32:35 +11001800 if (status_msg == g_usage) {
Nigel Tao01abc842020-03-06 21:42:33 +11001801 n = strlen(status_msg);
1802 } else {
Nigel Tao9cc2c252020-02-23 17:05:49 +11001803 n = strnlen(status_msg, 2047);
Nigel Tao01abc842020-03-06 21:42:33 +11001804 if (n >= 2047) {
1805 status_msg = "main: internal error: error message is too long";
1806 n = strnlen(status_msg, 2047);
1807 }
Nigel Tao9cc2c252020-02-23 17:05:49 +11001808 }
Nigel Taofe0cbbd2020-03-05 22:01:30 +11001809 const int stderr_fd = 2;
1810 ignore_return_value(write(stderr_fd, status_msg, n));
1811 ignore_return_value(write(stderr_fd, "\n", 1));
Nigel Tao9cc2c252020-02-23 17:05:49 +11001812 // Return an exit code of 1 for regular (forseen) errors, e.g. badly
1813 // formatted or unsupported input.
1814 //
1815 // Return an exit code of 2 for internal (exceptional) errors, e.g. defensive
1816 // run-time checks found that an internal invariant did not hold.
1817 //
1818 // Automated testing, including badly formatted inputs, can therefore
1819 // discriminate between expected failure (exit code 1) and unexpected failure
1820 // (other non-zero exit codes). Specifically, exit code 2 for internal
1821 // invariant violation, exit code 139 (which is 128 + SIGSEGV on x86_64
1822 // linux) for a segmentation fault (e.g. null pointer dereference).
1823 return strstr(status_msg, "internal error:") ? 2 : 1;
1824}
1825
Nigel Tao2914bae2020-02-26 09:40:30 +11001826int //
1827main(int argc, char** argv) {
Nigel Tao01abc842020-03-06 21:42:33 +11001828 // Look for an input filename (the first non-flag argument) in argv. If there
1829 // is one, open it (but do not read from it) before we self-impose a sandbox.
1830 //
1831 // Flags start with "-", unless it comes after a bare "--" arg.
1832 {
1833 bool dash_dash = false;
1834 int a;
1835 for (a = 1; a < argc; a++) {
1836 char* arg = argv[a];
1837 if ((arg[0] == '-') && !dash_dash) {
1838 dash_dash = (arg[1] == '-') && (arg[2] == '\x00');
1839 continue;
1840 }
Nigel Taod60815c2020-03-26 14:32:35 +11001841 g_input_file_descriptor = open(arg, O_RDONLY);
1842 if (g_input_file_descriptor < 0) {
Nigel Tao01abc842020-03-06 21:42:33 +11001843 fprintf(stderr, "%s: %s\n", arg, strerror(errno));
1844 return 1;
1845 }
1846 break;
1847 }
1848 }
1849
Nigel Taofe0cbbd2020-03-05 22:01:30 +11001850#if defined(WUFFS_EXAMPLE_USE_SECCOMP)
1851 prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT);
Nigel Taod60815c2020-03-26 14:32:35 +11001852 g_sandboxed = true;
Nigel Taofe0cbbd2020-03-05 22:01:30 +11001853#endif
1854
Nigel Tao0cd2f982020-03-03 23:03:02 +11001855 const char* z = main1(argc, argv);
Nigel Taod60815c2020-03-26 14:32:35 +11001856 if (g_wrote_to_dst) {
Nigel Tao168f60a2020-07-14 13:19:33 +10001857 const char* z1 = (g_flags.output_format == file_format::json)
1858 ? write_dst("\n", 1)
1859 : nullptr;
Nigel Tao0cd2f982020-03-03 23:03:02 +11001860 const char* z2 = flush_dst();
1861 z = z ? z : (z1 ? z1 : z2);
1862 }
1863 int exit_code = compute_exit_code(z);
Nigel Taofe0cbbd2020-03-05 22:01:30 +11001864
1865#if defined(WUFFS_EXAMPLE_USE_SECCOMP)
1866 // Call SYS_exit explicitly, instead of calling SYS_exit_group implicitly by
1867 // either calling _exit or returning from main. SECCOMP_MODE_STRICT allows
1868 // only SYS_exit.
1869 syscall(SYS_exit, exit_code);
1870#endif
Nigel Tao9cc2c252020-02-23 17:05:49 +11001871 return exit_code;
Nigel Tao1b073492020-02-16 22:11:36 +11001872}