// Copyright 2020 The Wuffs Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//    https://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

// ----------------

/*
jsonptr is a JSON formatter (pretty-printer) that supports the JSON Pointer
(RFC 6901) query syntax. It reads CBOR or UTF-8 JSON from stdin and writes CBOR
or canonicalized, formatted UTF-8 JSON to stdout.

See the "const char* g_usage" string below for details.

----

JSON Pointer (and this program's implementation) is one of many JSON query
languages and JSON tools, such as jq, jql and JMESPath. This one is relatively
simple and fewer-featured compared to those others.

One benefit of simplicity is that this program's CBOR, JSON and JSON Pointer
implementations do not dynamically allocate or free memory (yet they do not
require that the entire input fits in memory at once). They are therefore
trivially protected against certain bug classes: memory leaks, double-frees and
use-after-frees.

The CBOR and JSON implementations are also written in the Wuffs programming
language (and then transpiled to C/C++), which is memory-safe (e.g. array
indexing is bounds-checked) and also prevents integer arithmetic overflows.

For defense in depth, on Linux, this program also self-imposes a
SECCOMP_MODE_STRICT sandbox before reading (or otherwise processing) its input
or writing its output. Under this sandbox, the only permitted system calls are
read, write, exit and sigreturn.

All together, this program aims to safely handle untrusted CBOR or JSON files
without fear of security bugs such as remote code execution.

----

As of 2020-02-24, this program passes all 318 "test_parsing" cases from the
JSON test suite (https://github.com/nst/JSONTestSuite), an appendix to the
"Parsing JSON is a Minefield" article (http://seriot.ch/parsing_json.php) that
was first published on 2016-10-26 and updated on 2018-03-30.

After modifying this program, run "build-example.sh example/jsonptr/" and then
"script/run-json-test-suite.sh" to catch correctness regressions.

----

This program uses Wuffs' JSON decoder at a relatively low level, processing the
decoder's token-stream output individually. The core loop, in pseudo-code, is
"for_each_token { handle_token(etc); }", where the handle_token function
changes global state (e.g. the `g_depth` and `g_ctx` variables) and prints
output text based on that state and the token's source text. Notably,
handle_token is not recursive, even though JSON values can nest.
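
As a rough illustration of that non-recursive shape (a sketch only, not this
program's code), the C++ below formats nested lists with a flat loop. The Kind,
Token, State and format names are hypothetical, invented for this sketch, and
the tokens are far simpler than Wuffs' real ones: there are no objects, no
strings and no multi-token values. The depth and after_value fields play the
role that g_depth and g_ctx play in this program.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified token kinds (lists and scalars only).
enum class Kind { open_list, close_list, scalar };

struct Token {
  Kind kind;
  std::string text;  // Source text, used only for scalars.
};

// State bundles what this sketch would otherwise keep in global variables.
struct State {
  uint32_t depth = 0;
  bool after_value = false;  // False just after '[' or at the top level.
  std::string out;
};

static void newline_and_indent(State& s) {
  s.out += '\n';
  s.out.append(2 * s.depth, ' ');
}

// handle_token updates the state and emits text. It never calls itself.
void handle_token(State& s, const Token& t) {
  if (t.kind == Kind::close_list) {
    assert(s.depth > 0);
    s.depth--;
    if (s.after_value) {  // Non-empty list: the ']' gets its own line.
      newline_and_indent(s);
    }
    s.out += ']';
    s.after_value = true;
    return;
  }
  if (s.after_value) {
    s.out += ',';
  }
  if (s.depth > 0) {
    newline_and_indent(s);
  }
  if (t.kind == Kind::open_list) {
    s.out += '[';
    s.depth++;
    s.after_value = false;
  } else {
    s.out += t.text;
    s.after_value = true;
  }
}

// format is the "for_each_token { handle_token(etc); }" loop.
std::string format(const std::vector<Token>& tokens) {
  State s;
  for (const Token& t : tokens) {
    handle_token(s, t);
  }
  return s.out;
}
```

Nesting is tracked by a counter rather than by the call stack, so deeply nested
input costs no extra stack space.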

This approach is centered around JSON tokens. Each JSON 'thing' (e.g. number,
string, object) comprises one or more JSON tokens.

An alternative, higher-level approach is in the sibling example/jsonfindptrs
program. Neither approach is better or worse per se, but when studying this
program, be aware that there are multiple ways to use Wuffs' JSON decoder.

The two programs, jsonfindptrs and jsonptr, also demonstrate different
trade-offs with regard to JSON object duplicate keys. The JSON spec permits
different implementations to allow or reject duplicate keys. It is not always
clear which approach is safer. Rejecting them is certainly unambiguous, and
security bugs can lurk in ambiguous corners of a file format, if two different
implementations both silently accept a file but differ on how to interpret it.
On the other hand, in the worst case, detecting duplicate keys requires O(N)
memory, where N is the size of the (potentially untrusted) input.
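
To see where that O(N) memory goes, here is a minimal sketch (not part of this
program, which never rejects duplicate keys) of duplicate detection for a
single object's keys. The has_duplicate_key name is hypothetical, and the
sketch assumes that the keys have already been decoded into strings.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// has_duplicate_key reports whether any key occurs more than once. In the
// worst case (no duplicates), the seen set ends up holding every key, so its
// memory use grows with the size of the input.
bool has_duplicate_key(const std::vector<std::string>& keys) {
  std::unordered_set<std::string> seen;
  for (const std::string& k : keys) {
    if (!seen.insert(k).second) {
      return true;
    }
  }
  return false;
}
```

An attacker who controls the input also controls how large that set becomes,
which is the trade-off weighed above.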

This program (jsonptr) allows duplicate keys and requires only O(1) memory. As
mentioned above, it doesn't dynamically allocate memory at all, and on Linux,
it runs in a SECCOMP_MODE_STRICT sandbox.

----

This example program differs from most other example Wuffs programs in that it
is written in C++, not C.

    $CXX jsonptr.cc && ./a.out < ../../test/data/github-tags.json; rm -f a.out

for a C++ compiler $CXX, such as clang++ or g++.
*/

#if defined(__cplusplus) && (__cplusplus < 201103L)
#error "This C++ program requires -std=c++11 or later"
#endif

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

// Wuffs ships as a "single file C library" or "header file library" as per
// https://github.com/nothings/stb/blob/master/docs/stb_howto.txt
//
// To use that single file as a "foo.c"-like implementation, instead of a
// "foo.h"-like header, #define WUFFS_IMPLEMENTATION before #include'ing or
// compiling it.
#define WUFFS_IMPLEMENTATION

// Defining the WUFFS_CONFIG__MODULE* macros is optional, but it lets users of
// release/c/etc.c whitelist which parts of Wuffs to build. That file contains
// the entire Wuffs standard library, implementing a variety of codecs and file
// formats. Without this macro definition, an optimizing compiler or linker may
// very well discard Wuffs code for unused codecs, but listing the Wuffs
// modules we use makes that process explicit. Preprocessing means that such
// code simply isn't compiled.
#define WUFFS_CONFIG__MODULES
#define WUFFS_CONFIG__MODULE__BASE
#define WUFFS_CONFIG__MODULE__CBOR
#define WUFFS_CONFIG__MODULE__JSON

// If building this program in an environment that doesn't easily accommodate
// relative includes, you can use the script/inline-c-relative-includes.go
// program to generate a stand-alone C++ file.
#include "../../release/c/wuffs-unsupported-snapshot.c"

#if defined(__linux__)
#include <linux/prctl.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#define WUFFS_EXAMPLE_USE_SECCOMP
#endif

#define TRY(error_msg)         \
  do {                         \
    const char* z = error_msg; \
    if (z) {                   \
      return z;                \
    }                          \
  } while (false)

static const char* g_eod = "main: end of data";

static const char* g_usage =
    "Usage: jsonptr -flags input.json\n"
    "\n"
    "Flags:\n"
    "    -c      -compact-output\n"
    "    -d=NUM  -max-output-depth=NUM\n"
    "    -i=FMT  -input-format={json,cbor}\n"
    "    -o=FMT  -output-format={json,cbor}\n"
    "    -q=STR  -query=STR\n"
    "    -s=NUM  -spaces=NUM\n"
    "    -t      -tabs\n"
    "            -fail-if-unsandboxed\n"
    "            -input-allow-json-comments\n"
    "            -input-allow-json-extra-comma\n"
    "            -input-allow-json-inf-nan-numbers\n"
    "            -output-cbor-metadata-as-json-comments\n"
    "            -output-json-extra-comma\n"
    "            -output-json-inf-nan-numbers\n"
    "            -strict-json-pointer-syntax\n"
    "\n"
    "The input.json filename is optional. If absent, it reads from stdin.\n"
    "\n"
    "----\n"
    "\n"
    "jsonptr is a JSON formatter (pretty-printer) that supports the JSON\n"
    "Pointer (RFC 6901) query syntax. It reads CBOR or UTF-8 JSON from stdin\n"
    "and writes CBOR or canonicalized, formatted UTF-8 JSON to stdout. The\n"
    "input and output formats do not have to match, but conversion between\n"
    "formats may be lossy.\n"
    "\n"
    "Canonicalized JSON means that e.g. \"abc\\u000A\\tx\\u0177z\" is\n"
    "rewritten as \"abc\\n\\txŷz\". It does not sort object keys or reject\n"
    "duplicate keys. Canonicalization does not imply Unicode normalization.\n"
    "\n"
    "CBOR output is non-canonical (in the RFC 7049 Section 3.9 sense), as\n"
    "sorting map keys and measuring indefinite-length containers requires\n"
    "O(input_length) memory but this program runs in O(1) memory.\n"
    "\n"
    "Formatted means that arrays' and objects' elements are indented, each\n"
    "on its own line. Configure this with the -c / -compact-output, -s=NUM /\n"
    "-spaces=NUM (for NUM ranging from 0 to 8) and -t / -tabs flags. Those\n"
    "flags only apply to JSON (not CBOR) output.\n"
    "\n"
    "The -input-format and -output-format flags select between reading and\n"
    "writing JSON (the default, a textual format) or CBOR (a binary format).\n"
    "\n"
    "The -input-allow-json-comments flag allows \"/*slash-star*/\" and\n"
    "\"//slash-slash\" C-style comments within JSON input.\n"
    "\n"
    "The -input-allow-json-extra-comma flag allows input like \"[1,2,]\",\n"
    "with a comma after the final element of a JSON list or dictionary.\n"
    "\n"
    "The -input-allow-json-inf-nan-numbers flag allows non-finite floating\n"
    "point numbers (infinities and not-a-numbers) within JSON input.\n"
    "\n"
    "The -output-cbor-metadata-as-json-comments flag writes CBOR tags and\n"
    "other metadata as /*comments*/, when -i=cbor and -o=json are also set.\n"
    "Such comments are non-compliant with the JSON specification but many\n"
    "parsers accept them.\n"
    "\n"
    "The -output-json-extra-comma flag writes extra commas, regardless of\n"
    "whether the input had them. Such commas are non-compliant with the JSON\n"
    "specification but many parsers accept them and they can produce simpler\n"
    "line-based diffs. This flag is ignored when -compact-output is set.\n"
    "\n"
    "The -output-json-inf-nan-numbers flag writes Inf and NaN instead of a\n"
    "substitute null value, when converting from -i=cbor to -o=json. Such\n"
    "values are non-compliant with the JSON specification but many parsers\n"
    "accept them.\n"
    "\n"
    "When converting from -i=cbor to -o=json, CBOR permits map keys other\n"
    "than (untagged) UTF-8 strings but JSON does not. This program rejects\n"
    "such input, as doing otherwise has complicated interactions with the\n"
    "-query=STR flag and streaming input.\n"
    "\n"
    "----\n"
    "\n"
    "The -q=STR or -query=STR flag gives an optional JSON Pointer query, to\n"
    "print a subset of the input. For example, given RFC 6901 section 5's\n"
    "sample input (https://tools.ietf.org/rfc/rfc6901.txt), this command:\n"
    "    jsonptr -query=/foo/1 rfc-6901-json-pointer.json\n"
    "will print:\n"
    "    \"baz\"\n"
    "\n"
    "An absent query is equivalent to the empty query, which identifies the\n"
    "entire input (the root value). Unlike a file system, the \"/\" query\n"
    "does not identify the root. Instead, \"\" is the root and \"/\" is the\n"
    "child (the value in a key-value pair) of the root whose key is the empty\n"
    "string. Similarly, \"/xyz\" and \"/xyz/\" are two different nodes.\n"
    "\n"
    "If the query found a valid JSON|CBOR value, this program will return a\n"
    "zero exit code even if the rest of the input isn't valid. If the query\n"
    "did not find a value, or found an invalid one, this program returns a\n"
    "non-zero exit code, but may still print partial output to stdout.\n"
    "\n"
    "The JSON and CBOR specifications (https://json.org/ or RFC 8259; RFC\n"
    "7049) permit implementations to allow duplicate keys, as this one does.\n"
    "This JSON Pointer implementation is also greedy, following the first\n"
    "match for each fragment without back-tracking. For example, the\n"
    "\"/foo/bar\" query will fail if the root object has multiple \"foo\"\n"
    "children but the first one doesn't have a \"bar\" child, even if later\n"
    "ones do.\n"
    "\n"
    "The -strict-json-pointer-syntax flag restricts the -query=STR string to\n"
    "exactly RFC 6901, with only two escape sequences: \"~0\" and \"~1\" for\n"
    "\"~\" and \"/\". Without this flag, this program also lets \"~n\" and\n"
    "\"~r\" escape the New Line and Carriage Return ASCII control characters,\n"
    "which can work better with line oriented Unix tools that assume exactly\n"
    "one value (i.e. one JSON Pointer string) per line.\n"
    "\n"
    "----\n"
    "\n"
    "The -d=NUM or -max-output-depth=NUM flag gives the maximum (inclusive)\n"
    "output depth. JSON|CBOR containers ([] arrays and {} objects) can hold\n"
    "other containers. When this flag is set, containers at depth NUM are\n"
    "replaced with \"[…]\" or \"{…}\". A bare -d or -max-output-depth is\n"
    "equivalent to -d=1. The flag's absence means an unlimited output depth.\n"
    "\n"
    "The -max-output-depth flag only affects the program's output. It doesn't\n"
    "affect whether or not the input is considered valid JSON|CBOR. The\n"
    "format specifications permit implementations to set their own maximum\n"
    "input depth. This JSON|CBOR implementation sets it to 1024.\n"
    "\n"
    "Depth is measured in terms of nested containers. It is unaffected by the\n"
    "number of spaces or tabs used to indent.\n"
    "\n"
    "When both -max-output-depth and -query are set, the output depth is\n"
    "measured from when the query resolves, not from the input root. The\n"
    "input depth (measured from the root) is still limited to 1024.\n"
    "\n"
    "----\n"
    "\n"
    "The -fail-if-unsandboxed flag causes the program to exit if it does not\n"
    "self-impose a sandbox. On Linux, it self-imposes a SECCOMP_MODE_STRICT\n"
    "sandbox, regardless of whether this flag was set.";

// ----

// Wuffs allows either statically or dynamically allocated work buffers. This
// program exercises static allocation.
#define WORK_BUFFER_ARRAY_SIZE \
  WUFFS_JSON__DECODER_WORKBUF_LEN_MAX_INCL_WORST_CASE
#if WORK_BUFFER_ARRAY_SIZE > 0
uint8_t g_work_buffer_array[WORK_BUFFER_ARRAY_SIZE];
#else
// Not all C/C++ compilers support 0-length arrays.
uint8_t g_work_buffer_array[1];
#endif

bool g_sandboxed = false;

int g_input_file_descriptor = 0;  // A 0 default means stdin.

#define MAX_INDENT 8
#define INDENT_SPACES_STRING "        "
#define INDENT_TAB_STRING "\t"

#ifndef DST_BUFFER_ARRAY_SIZE
#define DST_BUFFER_ARRAY_SIZE (32 * 1024)
#endif
#ifndef SRC_BUFFER_ARRAY_SIZE
#define SRC_BUFFER_ARRAY_SIZE (32 * 1024)
#endif
#ifndef TOKEN_BUFFER_ARRAY_SIZE
#define TOKEN_BUFFER_ARRAY_SIZE (4 * 1024)
#endif

uint8_t g_dst_array[DST_BUFFER_ARRAY_SIZE];
uint8_t g_src_array[SRC_BUFFER_ARRAY_SIZE];
wuffs_base__token g_tok_array[TOKEN_BUFFER_ARRAY_SIZE];

wuffs_base__io_buffer g_dst;
wuffs_base__io_buffer g_src;
wuffs_base__token_buffer g_tok;

// g_curr_token_end_src_index is the g_src.data.ptr index of the end of the
// current token. An invariant is that (g_curr_token_end_src_index <=
// g_src.meta.ri).
size_t g_curr_token_end_src_index;

// Valid tokens' VBCs (value base categories) range in 0 ..= 15. Values over
// that are for tokens from outside of the base package, such as the CBOR
// package.
#define CATEGORY_CBOR_TAG 16

struct {
  uint64_t category;
  uint64_t detail;
} g_token_extension;

uint32_t g_depth;

enum class context {
  none,
  in_list_after_bracket,
  in_list_after_value,
  in_dict_after_brace,
  in_dict_after_key,
  in_dict_after_value,
} g_ctx;

bool  //
in_dict_before_key() {
  return (g_ctx == context::in_dict_after_brace) ||
         (g_ctx == context::in_dict_after_value);
}

uint32_t g_suppress_write_dst;
bool g_wrote_to_dst;

wuffs_cbor__decoder g_cbor_decoder;
wuffs_json__decoder g_json_decoder;
wuffs_base__token_decoder* g_dec;

// g_cbor_output_string_array is a 4 KiB buffer. For -output-format=cbor,
// strings whose length is 4096 or less are written as a single
// definite-length string. Longer strings are written as an indefinite-length
// string containing multiple definite-length chunks, each of length up to 4
// KiB. See the CBOR RFC (RFC 7049) section 2.2.2 "Indefinite-Length Byte
// Strings and Text Strings". The output is determinate even when the input is
// streamed.
//
// If raising CBOR_OUTPUT_STRING_ARRAY_SIZE above 0xFFFF then you will also
// have to update flush_cbor_output_string.
#define CBOR_OUTPUT_STRING_ARRAY_SIZE 4096
uint8_t g_cbor_output_string_array[CBOR_OUTPUT_STRING_ARRAY_SIZE];

uint32_t g_cbor_output_string_length;
bool g_cbor_output_string_is_multiple_chunks;
bool g_cbor_output_string_is_utf_8;

// ----

// Query is a JSON Pointer query. After initializing with a NUL-terminated C
// string, its multiple fragments are consumed as the program walks the JSON
// data from stdin. For example, letting "$" denote a NUL, suppose that we
// started with a query string of "/apple/banana/12/durian" and are currently
// trying to match the second fragment, "banana", so that Query::m_depth is 2:
//
//  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
//  / a p p l e / b a n a n a / 1 2 / d u r i a n $
//  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
//                ^           ^
//                m_frag_i    m_frag_k
//
// The two pointers m_frag_i and m_frag_k (abbreviated as mfi and mfk) are the
// start (inclusive) and end (exclusive) of the query fragment. They satisfy
// (mfi <= mfk) and may be equal if the fragment is empty (note that "" is a
// valid JSON object key).
//
// The m_frag_j (mfj) pointer moves between these two, or is nullptr. An
// invariant is that (((mfi <= mfj) && (mfj <= mfk)) || (mfj == nullptr)).
//
// Wuffs' JSON tokenizer can portray a single JSON string as multiple Wuffs
// tokens, as backslash-escaped values within that JSON string may each get
// their own token.
//
// At the start of each object key (a JSON string), mfj is set to mfi.
//
// While mfj remains non-nullptr, each token's unescaped contents are then
// compared to that part of the fragment from mfj to mfk. If it is a prefix
// (including the case of an exact match), then mfj is advanced by the
// unescaped length. Otherwise, mfj is set to nullptr.
//
// Comparison accounts for JSON Pointer's escaping notation: "~0" and "~1" in
// the query (not the JSON value) are unescaped to "~" and "/" respectively.
// "~n" and "~r" are also unescaped to "\n" and "\r". The program is
// responsible for calling Query::validate (with a strict_json_pointer_syntax
// argument) before otherwise using this class.
//
// The mfj pointer therefore advances from mfi to mfk, or drops out, as we
// incrementally match the object key with the query fragment. For example, if
// we have already matched the "ban" of "banana", then we would accept any of
// an "ana" token, an "a" token or a "\u0061" token, amongst others. They would
// advance mfj by 3, 1 or 1 bytes respectively.
//
//                     mfj
//                      v
//  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
//  / a p p l e / b a n a n a / 1 2 / d u r i a n $
//  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
//                ^           ^
//                mfi         mfk
//
// At the end of each object key (or equivalently, at the start of each object
// value), if mfj is non-nullptr and equal to (but not less than) mfk then we
// have a fragment match: the query fragment equals the object key. If there is
// a next fragment (in this example, "12") we move the frag_etc pointers to its
// start and end and increment Query::m_depth. Otherwise, we have matched the
// complete query, and the upcoming JSON value is the result of that query.
//
// The discussion above centers on object keys. If the query fragment is
// numeric then it can also match as an array index: the string fragment "12"
// will match an array's 13th element (starting counting from zero). See RFC
// 6901 for its precise definition of an "array index" number.
//
// Array index fragment match is represented by the Query::m_array_index field,
// whose type (wuffs_base__result_u64) is a result type. An error result means
// that the fragment is not an array index. A value result holds the number of
// list elements remaining. When matching a query fragment in an array (instead
// of in an object), each element ticks this number down towards zero. At zero,
// the upcoming JSON value is the one that matches the query fragment.
class Query {
 private:
  uint8_t* m_frag_i;
  uint8_t* m_frag_j;
  uint8_t* m_frag_k;

  uint32_t m_depth;

  wuffs_base__result_u64 m_array_index;

 public:
  void reset(char* query_c_string) {
    m_frag_i = (uint8_t*)query_c_string;
    m_frag_j = (uint8_t*)query_c_string;
    m_frag_k = (uint8_t*)query_c_string;
    m_depth = 0;
    m_array_index.status.repr = "#main: not an array index query fragment";
    m_array_index.value = 0;
  }

  void restart_fragment(bool enable) { m_frag_j = enable ? m_frag_i : nullptr; }

  bool is_at(uint32_t depth) { return m_depth == depth; }

  // tick returns whether the fragment is a valid array index whose value is
  // zero. If valid but non-zero, it decrements it and returns false.
  bool tick() {
    if (m_array_index.status.is_ok()) {
      if (m_array_index.value == 0) {
        return true;
      }
      m_array_index.value--;
    }
    return false;
  }

  // next_fragment moves to the next fragment, returning whether it existed.
  bool next_fragment() {
    uint8_t* k = m_frag_k;
    uint32_t d = m_depth;

    this->reset(nullptr);

    if (!k || (*k != '/')) {
      return false;
    }
    k++;

    bool all_digits = true;
    uint8_t* i = k;
    while ((*k != '\x00') && (*k != '/')) {
      all_digits = all_digits && ('0' <= *k) && (*k <= '9');
      k++;
    }
    m_frag_i = i;
    m_frag_j = i;
    m_frag_k = k;
    m_depth = d + 1;
    if (all_digits) {
      // wuffs_base__parse_number_u64 rejects leading zeroes, e.g. "00", "07".
      m_array_index = wuffs_base__parse_number_u64(
          wuffs_base__make_slice_u8(i, k - i),
          WUFFS_BASE__PARSE_NUMBER_XXX__DEFAULT_OPTIONS);
    }
    return true;
  }

  bool matched_all() { return m_frag_k == nullptr; }

  bool matched_fragment() { return m_frag_j && (m_frag_j == m_frag_k); }

  void incremental_match_slice(uint8_t* ptr, size_t len) {
    if (!m_frag_j) {
      return;
    }
    uint8_t* j = m_frag_j;
    while (true) {
      if (len == 0) {
        m_frag_j = j;
        return;
      }

      if (*j == '\x00') {
        break;

      } else if (*j == '~') {
        j++;
        if (*j == '0') {
          if (*ptr != '~') {
            break;
          }
        } else if (*j == '1') {
          if (*ptr != '/') {
            break;
          }
        } else if (*j == 'n') {
          if (*ptr != '\n') {
            break;
          }
        } else if (*j == 'r') {
          if (*ptr != '\r') {
            break;
          }
        } else {
          break;
        }

      } else if (*j != *ptr) {
        break;
      }

      j++;
      ptr++;
      len--;
    }
    m_frag_j = nullptr;
  }

  void incremental_match_code_point(uint32_t code_point) {
    if (!m_frag_j) {
      return;
    }
    uint8_t u[WUFFS_BASE__UTF_8__BYTE_LENGTH__MAX_INCL];
    size_t n = wuffs_base__utf_8__encode(
        wuffs_base__make_slice_u8(&u[0],
                                  WUFFS_BASE__UTF_8__BYTE_LENGTH__MAX_INCL),
        code_point);
    if (n > 0) {
      this->incremental_match_slice(&u[0], n);
    }
  }

  // validate returns whether the (query_c_string, length) arguments form a
  // valid JSON Pointer. In particular, it must be valid UTF-8, and either be
  // empty or start with a '/'. Any '~' within must immediately be followed by
  // either '0' or '1'. If strict_json_pointer_syntax is false, a '~' may also
  // be followed by either 'n' or 'r'.
  static bool validate(char* query_c_string,
                       size_t length,
                       bool strict_json_pointer_syntax) {
    if (length <= 0) {
      return true;
    }
    if (query_c_string[0] != '/') {
      return false;
    }
    wuffs_base__slice_u8 s =
        wuffs_base__make_slice_u8((uint8_t*)query_c_string, length);
    bool previous_was_tilde = false;
    while (s.len > 0) {
      wuffs_base__utf_8__next__output o = wuffs_base__utf_8__next(s.ptr, s.len);
      if (!o.is_valid()) {
        return false;
      }

      if (previous_was_tilde) {
        switch (o.code_point) {
          case '0':
          case '1':
            break;
          case 'n':
          case 'r':
            if (strict_json_pointer_syntax) {
              return false;
            }
            break;
          default:
            return false;
        }
      }
      previous_was_tilde = o.code_point == '~';

      s.ptr += o.byte_length;
      s.len -= o.byte_length;
    }
    return !previous_was_tilde;
  }
} g_query;
Nigel Tao0cd2f982020-03-03 23:03:02 +1100628
629// ----
630
Nigel Tao168f60a2020-07-14 13:19:33 +1000631enum class file_format {
632 json,
633 cbor,
634};
635
Nigel Tao68920952020-03-03 11:25:18 +1100636struct {
637 int remaining_argc;
638 char** remaining_argv;
639
Nigel Tao3690e832020-03-12 16:52:26 +1100640 bool compact_output;
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100641 bool fail_if_unsandboxed;
Nigel Tao4e193592020-07-15 12:48:57 +1000642 file_format input_format;
Nigel Tao3c8589b2020-07-19 21:49:00 +1000643 bool input_allow_json_comments;
644 bool input_allow_json_extra_comma;
Nigel Tao51a38292020-07-19 22:43:17 +1000645 bool input_allow_json_inf_nan_numbers;
Nigel Tao52c4d6a2020-03-08 21:12:38 +1100646 uint32_t max_output_depth;
Nigel Tao168f60a2020-07-14 13:19:33 +1000647 file_format output_format;
Nigel Tao3c8589b2020-07-19 21:49:00 +1000648 bool output_cbor_metadata_as_json_comments;
Nigel Taoc766bb72020-07-09 12:59:32 +1000649 bool output_json_extra_comma;
Nigel Taodd114692020-07-25 21:54:12 +1000650 bool output_json_inf_nan_numbers;
Nigel Tao0cd2f982020-03-03 23:03:02 +1100651 char* query_c_string;
Nigel Taoecadf722020-07-13 08:22:34 +1000652 size_t spaces;
Nigel Taod6fdfb12020-03-11 12:24:14 +1100653 bool strict_json_pointer_syntax;
Nigel Tao68920952020-03-03 11:25:18 +1100654 bool tabs;
Nigel Taod60815c2020-03-26 14:32:35 +1100655} g_flags = {0};
Nigel Tao68920952020-03-03 11:25:18 +1100656
657const char* //
658parse_flags(int argc, char** argv) {
Nigel Taoecadf722020-07-13 08:22:34 +1000659 g_flags.spaces = 4;
Nigel Taod60815c2020-03-26 14:32:35 +1100660 g_flags.max_output_depth = 0xFFFFFFFF;
Nigel Tao68920952020-03-03 11:25:18 +1100661
662 int c = (argc > 0) ? 1 : 0; // Skip argv[0], the program name.
663 for (; c < argc; c++) {
664 char* arg = argv[c];
665 if (*arg++ != '-') {
666 break;
667 }
668
669 // A double-dash "--foo" is equivalent to a single-dash "-foo". As special
670 // cases, a bare "-" is not a flag (some programs may interpret it as
671 // stdin) and a bare "--" means to stop parsing flags.
672 if (*arg == '\x00') {
673 break;
674 } else if (*arg == '-') {
675 arg++;
676 if (*arg == '\x00') {
677 c++;
678 break;
679 }
680 }
681
Nigel Tao3690e832020-03-12 16:52:26 +1100682 if (!strcmp(arg, "c") || !strcmp(arg, "compact-output")) {
Nigel Taod60815c2020-03-26 14:32:35 +1100683 g_flags.compact_output = true;
Nigel Tao68920952020-03-03 11:25:18 +1100684 continue;
685 }
    if (!strcmp(arg, "d") || !strcmp(arg, "max-output-depth")) {
      g_flags.max_output_depth = 1;
      continue;
    } else if (!strncmp(arg, "d=", 2) ||
               // 17 is the length of "max-output-depth=", including the '='.
               // Comparing fewer bytes would accept an arg with no '=' in it,
               // and the loop below would then read past the end of arg.
               !strncmp(arg, "max-output-depth=", 17)) {
      while (*arg++ != '=') {
      }
      wuffs_base__result_u64 u = wuffs_base__parse_number_u64(
          wuffs_base__make_slice_u8((uint8_t*)arg, strlen(arg)),
          WUFFS_BASE__PARSE_NUMBER_XXX__DEFAULT_OPTIONS);
      if (u.status.is_ok() && (u.value <= 0xFFFFFFFF)) {
        g_flags.max_output_depth = (uint32_t)(u.value);
        continue;
      }
      return g_usage;
    }
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100702 if (!strcmp(arg, "fail-if-unsandboxed")) {
Nigel Taod60815c2020-03-26 14:32:35 +1100703 g_flags.fail_if_unsandboxed = true;
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100704 continue;
705 }
Nigel Tao4e193592020-07-15 12:48:57 +1000706 if (!strcmp(arg, "i=cbor") || !strcmp(arg, "input-format=cbor")) {
707 g_flags.input_format = file_format::cbor;
708 continue;
709 }
710 if (!strcmp(arg, "i=json") || !strcmp(arg, "input-format=json")) {
711 g_flags.input_format = file_format::json;
712 continue;
713 }
Nigel Tao3c8589b2020-07-19 21:49:00 +1000714 if (!strcmp(arg, "input-allow-json-comments")) {
715 g_flags.input_allow_json_comments = true;
716 continue;
717 }
718 if (!strcmp(arg, "input-allow-json-extra-comma")) {
719 g_flags.input_allow_json_extra_comma = true;
Nigel Taoc766bb72020-07-09 12:59:32 +1000720 continue;
721 }
Nigel Tao51a38292020-07-19 22:43:17 +1000722 if (!strcmp(arg, "input-allow-json-inf-nan-numbers")) {
723 g_flags.input_allow_json_inf_nan_numbers = true;
724 continue;
725 }
Nigel Tao168f60a2020-07-14 13:19:33 +1000726 if (!strcmp(arg, "o=cbor") || !strcmp(arg, "output-format=cbor")) {
727 g_flags.output_format = file_format::cbor;
728 continue;
729 }
730 if (!strcmp(arg, "o=json") || !strcmp(arg, "output-format=json")) {
731 g_flags.output_format = file_format::json;
732 continue;
733 }
Nigel Tao3c8589b2020-07-19 21:49:00 +1000734 if (!strcmp(arg, "output-cbor-metadata-as-json-comments")) {
735 g_flags.output_cbor_metadata_as_json_comments = true;
736 continue;
737 }
Nigel Taoc766bb72020-07-09 12:59:32 +1000738 if (!strcmp(arg, "output-json-extra-comma")) {
739 g_flags.output_json_extra_comma = true;
740 continue;
741 }
Nigel Taodd114692020-07-25 21:54:12 +1000742 if (!strcmp(arg, "output-json-inf-nan-numbers")) {
743 g_flags.output_json_inf_nan_numbers = true;
744 continue;
745 }
Nigel Tao0cd2f982020-03-03 23:03:02 +1100746 if (!strncmp(arg, "q=", 2) || !strncmp(arg, "query=", 6)) {
747 while (*arg++ != '=') {
748 }
Nigel Taod60815c2020-03-26 14:32:35 +1100749 g_flags.query_c_string = arg;
Nigel Taod6fdfb12020-03-11 12:24:14 +1100750 continue;
751 }
Nigel Taoecadf722020-07-13 08:22:34 +1000752 if (!strncmp(arg, "s=", 2) || !strncmp(arg, "spaces=", 7)) {
753 while (*arg++ != '=') {
754 }
755 if (('0' <= arg[0]) && (arg[0] <= '8') && (arg[1] == '\x00')) {
756 g_flags.spaces = arg[0] - '0';
757 continue;
758 }
759 return g_usage;
760 }
761 if (!strcmp(arg, "strict-json-pointer-syntax")) {
Nigel Taod60815c2020-03-26 14:32:35 +1100762 g_flags.strict_json_pointer_syntax = true;
Nigel Taod6fdfb12020-03-11 12:24:14 +1100763 continue;
Nigel Tao68920952020-03-03 11:25:18 +1100764 }
765 if (!strcmp(arg, "t") || !strcmp(arg, "tabs")) {
Nigel Taod60815c2020-03-26 14:32:35 +1100766 g_flags.tabs = true;
Nigel Tao68920952020-03-03 11:25:18 +1100767 continue;
768 }
769
Nigel Taod60815c2020-03-26 14:32:35 +1100770 return g_usage;
Nigel Tao68920952020-03-03 11:25:18 +1100771 }
772
Nigel Taod60815c2020-03-26 14:32:35 +1100773 if (g_flags.query_c_string &&
774 !Query::validate(g_flags.query_c_string, strlen(g_flags.query_c_string),
775 g_flags.strict_json_pointer_syntax)) {
Nigel Taod6fdfb12020-03-11 12:24:14 +1100776 return "main: bad JSON Pointer (RFC 6901) syntax for the -query=STR flag";
777 }
778
Nigel Taod60815c2020-03-26 14:32:35 +1100779 g_flags.remaining_argc = argc - c;
780 g_flags.remaining_argv = argv + c;
Nigel Tao0cd2f982020-03-03 23:03:02 +1100781 return nullptr;
Nigel Tao68920952020-03-03 11:25:18 +1100782}
783
Nigel Tao2cf76db2020-02-27 22:42:01 +1100784const char* //
785initialize_globals(int argc, char** argv) {
Nigel Taod60815c2020-03-26 14:32:35 +1100786 g_dst = wuffs_base__make_io_buffer(
787 wuffs_base__make_slice_u8(g_dst_array, DST_BUFFER_ARRAY_SIZE),
Nigel Tao2cf76db2020-02-27 22:42:01 +1100788 wuffs_base__empty_io_buffer_meta());
Nigel Tao1b073492020-02-16 22:11:36 +1100789
Nigel Taod60815c2020-03-26 14:32:35 +1100790 g_src = wuffs_base__make_io_buffer(
791 wuffs_base__make_slice_u8(g_src_array, SRC_BUFFER_ARRAY_SIZE),
Nigel Tao2cf76db2020-02-27 22:42:01 +1100792 wuffs_base__empty_io_buffer_meta());
793
Nigel Taod60815c2020-03-26 14:32:35 +1100794 g_tok = wuffs_base__make_token_buffer(
795 wuffs_base__make_slice_token(g_tok_array, TOKEN_BUFFER_ARRAY_SIZE),
Nigel Tao2cf76db2020-02-27 22:42:01 +1100796 wuffs_base__empty_token_buffer_meta());
797
Nigel Taod60815c2020-03-26 14:32:35 +1100798 g_curr_token_end_src_index = 0;
Nigel Tao2cf76db2020-02-27 22:42:01 +1100799
Nigel Tao850dc182020-07-21 22:52:04 +1000800 g_token_extension.category = 0;
801 g_token_extension.detail = 0;
802
Nigel Taod60815c2020-03-26 14:32:35 +1100803 g_depth = 0;
Nigel Tao2cf76db2020-02-27 22:42:01 +1100804
Nigel Taod60815c2020-03-26 14:32:35 +1100805 g_ctx = context::none;
Nigel Tao2cf76db2020-02-27 22:42:01 +1100806
Nigel Tao68920952020-03-03 11:25:18 +1100807 TRY(parse_flags(argc, argv));
Nigel Taod60815c2020-03-26 14:32:35 +1100808 if (g_flags.fail_if_unsandboxed && !g_sandboxed) {
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100809 return "main: unsandboxed";
810 }
Nigel Tao01abc842020-03-06 21:42:33 +1100811 const int stdin_fd = 0;
Nigel Taod60815c2020-03-26 14:32:35 +1100812 if (g_flags.remaining_argc >
813 ((g_input_file_descriptor != stdin_fd) ? 1 : 0)) {
814 return g_usage;
Nigel Tao107f0ef2020-03-01 21:35:02 +1100815 }
816
Nigel Taod60815c2020-03-26 14:32:35 +1100817 g_query.reset(g_flags.query_c_string);
Nigel Tao0cd2f982020-03-03 23:03:02 +1100818
819 // If the query is non-empty, suprress writing to stdout until we've
820 // completed the query.
Nigel Taod60815c2020-03-26 14:32:35 +1100821 g_suppress_write_dst = g_query.next_fragment() ? 1 : 0;
822 g_wrote_to_dst = false;
Nigel Tao0cd2f982020-03-03 23:03:02 +1100823
Nigel Tao4e193592020-07-15 12:48:57 +1000824 if (g_flags.input_format == file_format::json) {
825 TRY(g_json_decoder
826 .initialize(sizeof__wuffs_json__decoder(), WUFFS_VERSION, 0)
827 .message());
828 g_dec = g_json_decoder.upcast_as__wuffs_base__token_decoder();
829 } else {
830 TRY(g_cbor_decoder
831 .initialize(sizeof__wuffs_cbor__decoder(), WUFFS_VERSION, 0)
832 .message());
833 g_dec = g_cbor_decoder.upcast_as__wuffs_base__token_decoder();
834 }
Nigel Tao4b186b02020-03-18 14:25:21 +1100835
Nigel Tao3c8589b2020-07-19 21:49:00 +1000836 if (g_flags.input_allow_json_comments) {
837 g_dec->set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_COMMENT_BLOCK, true);
838 g_dec->set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_COMMENT_LINE, true);
839 }
840 if (g_flags.input_allow_json_extra_comma) {
Nigel Tao4e193592020-07-15 12:48:57 +1000841 g_dec->set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_EXTRA_COMMA, true);
Nigel Taoc766bb72020-07-09 12:59:32 +1000842 }
Nigel Tao51a38292020-07-19 22:43:17 +1000843 if (g_flags.input_allow_json_inf_nan_numbers) {
844 g_dec->set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_INF_NAN_NUMBERS, true);
845 }
Nigel Taoc766bb72020-07-09 12:59:32 +1000846
  // Consume an optional whitespace trailer. This isn't part of the JSON spec,
  // but it works better with line-oriented Unix tools (such as "echo 123 |
  // jsonptr" where it's "echo", not "echo -n") or hand-edited JSON files which
  // can accidentally contain trailing whitespace.
Nigel Tao4e193592020-07-15 12:48:57 +1000851 g_dec->set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_TRAILING_NEW_LINE, true);
Nigel Tao4b186b02020-03-18 14:25:21 +1100852
853 return nullptr;
Nigel Tao2cf76db2020-02-27 22:42:01 +1100854}
Nigel Tao1b073492020-02-16 22:11:36 +1100855
856// ----
857
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100858// ignore_return_value suppresses errors from -Wall -Werror.
859static void //
860ignore_return_value(int ignored) {}
861
Nigel Tao2914bae2020-02-26 09:40:30 +1100862const char* //
863read_src() {
Nigel Taod60815c2020-03-26 14:32:35 +1100864 if (g_src.meta.closed) {
Nigel Tao9cc2c252020-02-23 17:05:49 +1100865 return "main: internal error: read requested on a closed source";
Nigel Taoa8406922020-02-19 12:22:00 +1100866 }
Nigel Taod60815c2020-03-26 14:32:35 +1100867 g_src.compact();
868 if (g_src.meta.wi >= g_src.data.len) {
869 return "main: g_src buffer is full";
Nigel Tao1b073492020-02-16 22:11:36 +1100870 }
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100871 while (true) {
Nigel Taod60815c2020-03-26 14:32:35 +1100872 ssize_t n = read(g_input_file_descriptor, g_src.data.ptr + g_src.meta.wi,
873 g_src.data.len - g_src.meta.wi);
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100874 if (n >= 0) {
Nigel Taod60815c2020-03-26 14:32:35 +1100875 g_src.meta.wi += n;
876 g_src.meta.closed = n == 0;
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100877 break;
878 } else if (errno != EINTR) {
879 return strerror(errno);
880 }
Nigel Tao1b073492020-02-16 22:11:36 +1100881 }
882 return nullptr;
883}
884
Nigel Tao2914bae2020-02-26 09:40:30 +1100885const char* //
886flush_dst() {
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100887 while (true) {
Nigel Taod60815c2020-03-26 14:32:35 +1100888 size_t n = g_dst.meta.wi - g_dst.meta.ri;
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100889 if (n == 0) {
890 break;
Nigel Tao1b073492020-02-16 22:11:36 +1100891 }
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100892 const int stdout_fd = 1;
Nigel Taod60815c2020-03-26 14:32:35 +1100893 ssize_t i = write(stdout_fd, g_dst.data.ptr + g_dst.meta.ri, n);
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100894 if (i >= 0) {
Nigel Taod60815c2020-03-26 14:32:35 +1100895 g_dst.meta.ri += i;
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100896 } else if (errno != EINTR) {
897 return strerror(errno);
898 }
Nigel Tao1b073492020-02-16 22:11:36 +1100899 }
Nigel Taod60815c2020-03-26 14:32:35 +1100900 g_dst.compact();
Nigel Tao1b073492020-02-16 22:11:36 +1100901 return nullptr;
902}
903
Nigel Tao2914bae2020-02-26 09:40:30 +1100904const char* //
905write_dst(const void* s, size_t n) {
Nigel Taod60815c2020-03-26 14:32:35 +1100906 if (g_suppress_write_dst > 0) {
Nigel Tao0cd2f982020-03-03 23:03:02 +1100907 return nullptr;
908 }
Nigel Tao1b073492020-02-16 22:11:36 +1100909 const uint8_t* p = static_cast<const uint8_t*>(s);
910 while (n > 0) {
Nigel Taod60815c2020-03-26 14:32:35 +1100911 size_t i = g_dst.writer_available();
Nigel Tao1b073492020-02-16 22:11:36 +1100912 if (i == 0) {
913 const char* z = flush_dst();
914 if (z) {
915 return z;
916 }
Nigel Taod60815c2020-03-26 14:32:35 +1100917 i = g_dst.writer_available();
Nigel Tao1b073492020-02-16 22:11:36 +1100918 if (i == 0) {
Nigel Taod60815c2020-03-26 14:32:35 +1100919 return "main: g_dst buffer is full";
Nigel Tao1b073492020-02-16 22:11:36 +1100920 }
921 }
922
923 if (i > n) {
924 i = n;
925 }
Nigel Taod60815c2020-03-26 14:32:35 +1100926 memcpy(g_dst.data.ptr + g_dst.meta.wi, p, i);
927 g_dst.meta.wi += i;
Nigel Tao1b073492020-02-16 22:11:36 +1100928 p += i;
929 n -= i;
Nigel Taod60815c2020-03-26 14:32:35 +1100930 g_wrote_to_dst = true;
Nigel Tao1b073492020-02-16 22:11:36 +1100931 }
932 return nullptr;
933}
934
935// ----
936
Nigel Tao168f60a2020-07-14 13:19:33 +1000937const char* //
938write_literal(uint64_t vbd) {
939 const char* ptr = nullptr;
940 size_t len = 0;
941 if (vbd & WUFFS_BASE__TOKEN__VBD__LITERAL__UNDEFINED) {
942 if (g_flags.output_format == file_format::json) {
Nigel Tao3c8589b2020-07-19 21:49:00 +1000943 // JSON's closest approximation to "undefined" is "null".
944 if (g_flags.output_cbor_metadata_as_json_comments) {
945 ptr = "/*cbor:undefined*/null";
946 len = 22;
947 } else {
948 ptr = "null";
949 len = 4;
950 }
Nigel Tao168f60a2020-07-14 13:19:33 +1000951 } else {
952 ptr = "\xF7";
953 len = 1;
954 }
955 } else if (vbd & WUFFS_BASE__TOKEN__VBD__LITERAL__NULL) {
956 if (g_flags.output_format == file_format::json) {
957 ptr = "null";
958 len = 4;
959 } else {
960 ptr = "\xF6";
961 len = 1;
962 }
963 } else if (vbd & WUFFS_BASE__TOKEN__VBD__LITERAL__FALSE) {
964 if (g_flags.output_format == file_format::json) {
965 ptr = "false";
966 len = 5;
967 } else {
968 ptr = "\xF4";
969 len = 1;
970 }
971 } else if (vbd & WUFFS_BASE__TOKEN__VBD__LITERAL__TRUE) {
972 if (g_flags.output_format == file_format::json) {
973 ptr = "true";
974 len = 4;
975 } else {
976 ptr = "\xF5";
977 len = 1;
978 }
979 } else {
980 return "main: internal error: unexpected write_literal argument";
981 }
982 return write_dst(ptr, len);
983}
984
985// ----
986
const char*  //
write_number_as_cbor_f64(double f) {
  // Use the shortest CBOR floating point encoding that represents f without
  // loss. The initial byte 0xF9 means half precision (2 payload bytes), 0xFA
  // means single precision (4 bytes) and 0xFB means double precision (8
  // bytes). The payload is big-endian.
  uint8_t buf[9];
  wuffs_base__lossy_value_u16 lv16 =
      wuffs_base__ieee_754_bit_representation__from_f64_to_u16_truncate(f);
  if (!lv16.lossy) {
    buf[0] = 0xF9;
    wuffs_base__store_u16be__no_bounds_check(&buf[1], lv16.value);
    return write_dst(&buf[0], 3);
  }
  wuffs_base__lossy_value_u32 lv32 =
      wuffs_base__ieee_754_bit_representation__from_f64_to_u32_truncate(f);
  if (!lv32.lossy) {
    buf[0] = 0xFA;
    wuffs_base__store_u32be__no_bounds_check(&buf[1], lv32.value);
    return write_dst(&buf[0], 5);
  }
  buf[0] = 0xFB;
  wuffs_base__store_u64be__no_bounds_check(
      &buf[1], wuffs_base__ieee_754_bit_representation__from_f64_to_u64(f));
  return write_dst(&buf[0], 9);
}
1009
const char*  //
write_number_as_cbor_u64(uint8_t base, uint64_t u) {
  // base holds the CBOR major type in its high 3 bits. Values below 0x18 fit
  // in the low 5 bits of the initial byte. Otherwise, low bits of 0x18, 0x19,
  // 0x1A or 0x1B mean that a 1, 2, 4 or 8 byte big-endian integer follows.
  uint8_t buf[9];
  if (u < 0x18) {
    buf[0] = base | ((uint8_t)u);
    return write_dst(&buf[0], 1);
  } else if ((u >> 8) == 0) {
    buf[0] = base | 0x18;
    buf[1] = ((uint8_t)u);
    return write_dst(&buf[0], 2);
  } else if ((u >> 16) == 0) {
    buf[0] = base | 0x19;
    wuffs_base__store_u16be__no_bounds_check(&buf[1], ((uint16_t)u));
    return write_dst(&buf[0], 3);
  } else if ((u >> 32) == 0) {
    buf[0] = base | 0x1A;
    wuffs_base__store_u32be__no_bounds_check(&buf[1], ((uint32_t)u));
    return write_dst(&buf[0], 5);
  }
  buf[0] = base | 0x1B;
  wuffs_base__store_u64be__no_bounds_check(&buf[1], u);
  return write_dst(&buf[0], 9);
}
1033
1034const char* //
Nigel Tao5a616b62020-07-24 23:54:52 +10001035write_number_as_json_f64(uint8_t* ptr, size_t len) {
1036 double f;
1037 switch (len) {
1038 case 3:
1039 f = wuffs_base__ieee_754_bit_representation__from_u16_to_f64(
1040 wuffs_base__load_u16be__no_bounds_check(ptr + 1));
1041 break;
1042 case 5:
1043 f = wuffs_base__ieee_754_bit_representation__from_u32_to_f64(
1044 wuffs_base__load_u32be__no_bounds_check(ptr + 1));
1045 break;
1046 case 9:
1047 f = wuffs_base__ieee_754_bit_representation__from_u64_to_f64(
1048 wuffs_base__load_u64be__no_bounds_check(ptr + 1));
1049 break;
1050 default:
1051 return "main: internal error: unexpected write_number_as_json_f64 len";
1052 }
1053 uint8_t buf[512];
1054 const uint32_t precision = 0;
1055 size_t n = wuffs_base__render_number_f64(
1056 wuffs_base__make_slice_u8(&buf[0], sizeof buf), f, precision,
1057 WUFFS_BASE__RENDER_NUMBER_FXX__JUST_ENOUGH_PRECISION);
1058
Nigel Taodd114692020-07-25 21:54:12 +10001059 if (!g_flags.output_json_inf_nan_numbers) {
1060 // JSON numbers don't include Infinities or NaNs. For such numbers, their
1061 // IEEE 754 bit representation's 11 exponent bits are all on.
1062 uint64_t u = wuffs_base__ieee_754_bit_representation__from_f64_to_u64(f);
1063 if (((u >> 52) & 0x7FF) == 0x7FF) {
1064 if (g_flags.output_cbor_metadata_as_json_comments) {
1065 TRY(write_dst("/*cbor:", 7));
1066 TRY(write_dst(&buf[0], n));
1067 TRY(write_dst("*/", 2));
1068 }
1069 return write_dst("null", 4);
Nigel Tao5a616b62020-07-24 23:54:52 +10001070 }
Nigel Tao5a616b62020-07-24 23:54:52 +10001071 }
1072
1073 return write_dst(&buf[0], n);
1074}
1075
1076const char* //
Nigel Tao850dc182020-07-21 22:52:04 +10001077write_cbor_minus_1_minus_x(uint8_t* ptr, size_t len) {
Nigel Tao27168032020-07-24 13:05:05 +10001078 if (g_flags.output_format == file_format::cbor) {
1079 return write_dst(ptr, len);
1080 }
1081
Nigel Tao850dc182020-07-21 22:52:04 +10001082 if (len != 9) {
1083 return "main: internal error: invalid ETC__MINUS_1_MINUS_X token length";
Nigel Tao664f8432020-07-16 21:25:14 +10001084 }
Nigel Tao850dc182020-07-21 22:52:04 +10001085 uint64_t u = 1 + wuffs_base__load_u64be__no_bounds_check(ptr + 1);
1086 if (u == 0) {
1087 // See the cbor.TOKEN_VALUE_MINOR__MINUS_1_MINUS_X comment re overflow.
1088 return write_dst("-18446744073709551616", 21);
Nigel Tao664f8432020-07-16 21:25:14 +10001089 }
1090 uint8_t buf[1 + WUFFS_BASE__U64__BYTE_LENGTH__MAX_INCL];
1091 uint8_t* b = &buf[0];
Nigel Tao850dc182020-07-21 22:52:04 +10001092 *b++ = '-';
Nigel Tao664f8432020-07-16 21:25:14 +10001093 size_t n = wuffs_base__render_number_u64(
1094 wuffs_base__make_slice_u8(b, WUFFS_BASE__U64__BYTE_LENGTH__MAX_INCL), u,
1095 WUFFS_BASE__RENDER_NUMBER_XXX__DEFAULT_OPTIONS);
Nigel Tao850dc182020-07-21 22:52:04 +10001096 return write_dst(&buf[0], 1 + n);
Nigel Tao664f8432020-07-16 21:25:14 +10001097}
1098
1099const char* //
Nigel Tao042e94f2020-07-24 23:14:27 +10001100write_cbor_simple_value(uint64_t tag, uint8_t* ptr, size_t len) {
1101 if (g_flags.output_format == file_format::cbor) {
1102 return write_dst(ptr, len);
1103 }
1104
1105 if (!g_flags.output_cbor_metadata_as_json_comments) {
1106 return nullptr;
1107 }
1108 uint8_t buf[WUFFS_BASE__U64__BYTE_LENGTH__MAX_INCL];
1109 size_t n = wuffs_base__render_number_u64(
1110 wuffs_base__make_slice_u8(&buf[0],
1111 WUFFS_BASE__U64__BYTE_LENGTH__MAX_INCL),
1112 tag, WUFFS_BASE__RENDER_NUMBER_XXX__DEFAULT_OPTIONS);
1113 TRY(write_dst("/*cbor:simple", 13));
1114 TRY(write_dst(&buf[0], n));
1115 return write_dst("*/null", 6);
1116}
1117
1118const char* //
Nigel Tao27168032020-07-24 13:05:05 +10001119write_cbor_tag(uint64_t tag, uint8_t* ptr, size_t len) {
1120 if (g_flags.output_format == file_format::cbor) {
1121 return write_dst(ptr, len);
1122 }
1123
1124 if (!g_flags.output_cbor_metadata_as_json_comments) {
1125 return nullptr;
1126 }
1127 uint8_t buf[WUFFS_BASE__U64__BYTE_LENGTH__MAX_INCL];
1128 size_t n = wuffs_base__render_number_u64(
1129 wuffs_base__make_slice_u8(&buf[0],
1130 WUFFS_BASE__U64__BYTE_LENGTH__MAX_INCL),
1131 tag, WUFFS_BASE__RENDER_NUMBER_XXX__DEFAULT_OPTIONS);
1132 TRY(write_dst("/*cbor:tag", 10));
1133 TRY(write_dst(&buf[0], n));
1134 return write_dst("*/", 2);
1135}
1136
1137const char* //
Nigel Tao168f60a2020-07-14 13:19:33 +10001138write_number(uint64_t vbd, uint8_t* ptr, size_t len) {
Nigel Tao4e193592020-07-15 12:48:57 +10001139 if (g_flags.output_format == file_format::json) {
Nigel Tao5a616b62020-07-24 23:54:52 +10001140 const uint64_t cfp_fbbe_fifb =
1141 WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_FLOATING_POINT |
1142 WUFFS_BASE__TOKEN__VBD__NUMBER__FORMAT_BINARY_BIG_ENDIAN |
1143 WUFFS_BASE__TOKEN__VBD__NUMBER__FORMAT_IGNORE_FIRST_BYTE;
Nigel Tao51a38292020-07-19 22:43:17 +10001144 if (g_flags.input_format == file_format::json) {
Nigel Tao168f60a2020-07-14 13:19:33 +10001145 return write_dst(ptr, len);
Nigel Tao5a616b62020-07-24 23:54:52 +10001146 } else if ((vbd & cfp_fbbe_fifb) == cfp_fbbe_fifb) {
1147 return write_number_as_json_f64(ptr, len);
Nigel Tao168f60a2020-07-14 13:19:33 +10001148 }
1149
Nigel Tao4e193592020-07-15 12:48:57 +10001150 // From here on, (g_flags.output_format == file_format::cbor).
Nigel Tao4e193592020-07-15 12:48:57 +10001151 } else if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__FORMAT_TEXT) {
Nigel Tao168f60a2020-07-14 13:19:33 +10001152 // First try to parse (ptr, len) as an integer. Something like
1153 // "1180591620717411303424" is a valid number (in the JSON sense) but will
1154 // overflow int64_t or uint64_t, so fall back to parsing it as a float64.
1155 if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_INTEGER_SIGNED) {
1156 if ((len > 0) && (ptr[0] == '-')) {
1157 wuffs_base__result_i64 ri = wuffs_base__parse_number_i64(
1158 wuffs_base__make_slice_u8(ptr, len),
1159 WUFFS_BASE__PARSE_NUMBER_XXX__DEFAULT_OPTIONS);
1160 if (ri.status.is_ok()) {
Nigel Tao664f8432020-07-16 21:25:14 +10001161 return write_number_as_cbor_u64(0x20, ~ri.value);
Nigel Tao168f60a2020-07-14 13:19:33 +10001162 }
1163 } else {
1164 wuffs_base__result_u64 ru = wuffs_base__parse_number_u64(
1165 wuffs_base__make_slice_u8(ptr, len),
1166 WUFFS_BASE__PARSE_NUMBER_XXX__DEFAULT_OPTIONS);
1167 if (ru.status.is_ok()) {
Nigel Tao664f8432020-07-16 21:25:14 +10001168 return write_number_as_cbor_u64(0x00, ru.value);
Nigel Tao168f60a2020-07-14 13:19:33 +10001169 }
1170 }
1171 }
1172
1173 if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_FLOATING_POINT) {
1174 wuffs_base__result_f64 rf = wuffs_base__parse_number_f64(
1175 wuffs_base__make_slice_u8(ptr, len),
1176 WUFFS_BASE__PARSE_NUMBER_XXX__DEFAULT_OPTIONS);
1177 if (rf.status.is_ok()) {
Nigel Tao664f8432020-07-16 21:25:14 +10001178 return write_number_as_cbor_f64(rf.value);
Nigel Tao168f60a2020-07-14 13:19:33 +10001179 }
1180 }
Nigel Tao51a38292020-07-19 22:43:17 +10001181 } else if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_NEG_INF) {
1182 return write_dst("\xF9\xFC\x00", 3);
1183 } else if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_POS_INF) {
1184 return write_dst("\xF9\x7C\x00", 3);
1185 } else if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_NEG_NAN) {
1186 return write_dst("\xF9\xFF\xFF", 3);
1187 } else if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_POS_NAN) {
1188 return write_dst("\xF9\x7F\xFF", 3);
Nigel Tao168f60a2020-07-14 13:19:33 +10001189 }
1190
  return "main: internal error: unexpected write_number argument";
1193}
1194
Nigel Tao4e193592020-07-15 12:48:57 +10001195const char* //
Nigel Taoc9d4e342020-07-21 15:20:34 +10001196write_inline_integer(uint64_t x, bool x_is_signed, uint8_t* ptr, size_t len) {
Nigel Tao4e193592020-07-15 12:48:57 +10001197 if (g_flags.output_format == file_format::cbor) {
1198 return write_dst(ptr, len);
1199 }
1200
Nigel Taoc9d4e342020-07-21 15:20:34 +10001201 // Adding the two ETC__BYTE_LENGTH__ETC constants is overkill, but it's
1202 // simpler (for producing a constant-expression array size) than taking the
1203 // maximum of the two.
1204 uint8_t buf[WUFFS_BASE__I64__BYTE_LENGTH__MAX_INCL +
1205 WUFFS_BASE__U64__BYTE_LENGTH__MAX_INCL];
1206 wuffs_base__slice_u8 dst = wuffs_base__make_slice_u8(&buf[0], sizeof buf);
1207 size_t n =
1208 x_is_signed
1209 ? wuffs_base__render_number_i64(
1210 dst, (int64_t)x, WUFFS_BASE__RENDER_NUMBER_XXX__DEFAULT_OPTIONS)
1211 : wuffs_base__render_number_u64(
1212 dst, x, WUFFS_BASE__RENDER_NUMBER_XXX__DEFAULT_OPTIONS);
Nigel Tao4e193592020-07-15 12:48:57 +10001213 return write_dst(&buf[0], n);
1214}
1215
Nigel Tao168f60a2020-07-14 13:19:33 +10001216// ----
1217
Nigel Tao2914bae2020-02-26 09:40:30 +11001218uint8_t //
1219hex_digit(uint8_t nibble) {
Nigel Taob5461bd2020-02-21 14:13:37 +11001220 nibble &= 0x0F;
1221 if (nibble <= 9) {
1222 return '0' + nibble;
1223 }
1224 return ('A' - 10) + nibble;
1225}
1226
Nigel Tao2914bae2020-02-26 09:40:30 +11001227const char* //
Nigel Tao168f60a2020-07-14 13:19:33 +10001228flush_cbor_output_string() {
1229 uint8_t prefix[3];
1230 prefix[0] = g_cbor_output_string_is_utf_8 ? 0x60 : 0x40;
1231 if (g_cbor_output_string_length < 0x18) {
1232 prefix[0] |= g_cbor_output_string_length;
1233 TRY(write_dst(&prefix[0], 1));
1234 } else if (g_cbor_output_string_length <= 0xFF) {
1235 prefix[0] |= 0x18;
1236 prefix[1] = g_cbor_output_string_length;
1237 TRY(write_dst(&prefix[0], 2));
1238 } else if (g_cbor_output_string_length <= 0xFFFF) {
1239 prefix[0] |= 0x19;
1240 prefix[1] = g_cbor_output_string_length >> 8;
1241 prefix[2] = g_cbor_output_string_length;
1242 TRY(write_dst(&prefix[0], 3));
1243 } else {
1244 return "main: internal error: CBOR string output is too long";
1245 }
1246
1247 size_t n = g_cbor_output_string_length;
1248 g_cbor_output_string_length = 0;
1249 return write_dst(&g_cbor_output_string_array[0], n);
1250}
1251
1252const char* //
1253write_cbor_output_string(uint8_t* ptr, size_t len, bool finish) {
1254 // Check that g_cbor_output_string_array can hold any UTF-8 code point.
1255 if (CBOR_OUTPUT_STRING_ARRAY_SIZE < 4) {
1256 return "main: internal error: CBOR_OUTPUT_STRING_ARRAY_SIZE is too short";
1257 }
1258
1259 while (len > 0) {
1260 size_t available =
1261 CBOR_OUTPUT_STRING_ARRAY_SIZE - g_cbor_output_string_length;
1262 if (available >= len) {
1263 memcpy(&g_cbor_output_string_array[g_cbor_output_string_length], ptr,
1264 len);
1265 g_cbor_output_string_length += len;
1266 ptr += len;
1267 len = 0;
1268 break;
1269
1270 } else if (available > 0) {
1271 if (!g_cbor_output_string_is_multiple_chunks) {
1272 g_cbor_output_string_is_multiple_chunks = true;
1273 TRY(write_dst(g_cbor_output_string_is_utf_8 ? "\x7F" : "\x5F", 1));
Nigel Tao3b486982020-02-27 15:05:59 +11001274 }
Nigel Tao168f60a2020-07-14 13:19:33 +10001275
1276 if (g_cbor_output_string_is_utf_8) {
1277 // Walk the end backwards to a UTF-8 boundary, so that each chunk of
1278 // the multi-chunk string is also valid UTF-8.
1279 while (available > 0) {
Nigel Tao702c7b22020-07-22 15:42:54 +10001280 wuffs_base__utf_8__next__output o =
1281 wuffs_base__utf_8__next_from_end(ptr, available);
Nigel Tao168f60a2020-07-14 13:19:33 +10001282 if ((o.code_point != WUFFS_BASE__UNICODE_REPLACEMENT_CHARACTER) ||
1283 (o.byte_length != 1)) {
1284 break;
1285 }
1286 available--;
1287 }
1288 }
1289
1290 memcpy(&g_cbor_output_string_array[g_cbor_output_string_length], ptr,
1291 available);
1292 g_cbor_output_string_length += available;
1293 ptr += available;
1294 len -= available;
Nigel Tao3b486982020-02-27 15:05:59 +11001295 }
1296
Nigel Tao168f60a2020-07-14 13:19:33 +10001297 TRY(flush_cbor_output_string());
1298 }
Nigel Taob9ad34f2020-03-03 12:44:01 +11001299
Nigel Tao168f60a2020-07-14 13:19:33 +10001300 if (finish) {
1301 TRY(flush_cbor_output_string());
1302 if (g_cbor_output_string_is_multiple_chunks) {
1303 TRY(write_dst("\xFF", 1));
1304 }
1305 }
1306 return nullptr;
1307}
Nigel Taob9ad34f2020-03-03 12:44:01 +11001308
Nigel Tao168f60a2020-07-14 13:19:33 +10001309const char* //
Nigel Tao7cb76542020-07-19 22:19:04 +10001310handle_unicode_code_point(uint32_t ucp) {
1311 if (g_flags.output_format == file_format::json) {
1312 if (ucp < 0x0020) {
1313 switch (ucp) {
1314 case '\b':
1315 return write_dst("\\b", 2);
1316 case '\f':
1317 return write_dst("\\f", 2);
1318 case '\n':
1319 return write_dst("\\n", 2);
1320 case '\r':
1321 return write_dst("\\r", 2);
1322 case '\t':
1323 return write_dst("\\t", 2);
1324 }
1325
1326 // Other bytes less than 0x0020 are valid UTF-8 but not valid in a
1327 // JSON string. They need to remain escaped.
1328 uint8_t esc6[6];
1329 esc6[0] = '\\';
1330 esc6[1] = 'u';
1331 esc6[2] = '0';
1332 esc6[3] = '0';
1333 esc6[4] = hex_digit(ucp >> 4);
1334 esc6[5] = hex_digit(ucp >> 0);
1335 return write_dst(&esc6[0], 6);
1336
1337 } else if (ucp == '\"') {
1338 return write_dst("\\\"", 2);
1339
1340 } else if (ucp == '\\') {
1341 return write_dst("\\\\", 2);
1342 }
1343 }
1344
1345 uint8_t u[WUFFS_BASE__UTF_8__BYTE_LENGTH__MAX_INCL];
1346 size_t n = wuffs_base__utf_8__encode(
1347 wuffs_base__make_slice_u8(&u[0],
1348 WUFFS_BASE__UTF_8__BYTE_LENGTH__MAX_INCL),
1349 ucp);
1350 if (n == 0) {
1351 return "main: internal error: unexpected Unicode code point";
1352 }
1353
1354 if (g_flags.output_format == file_format::json) {
1355 return write_dst(&u[0], n);
1356 }
1357 return write_cbor_output_string(&u[0], n, false);
1358}

const char*  //
write_json_escaped_string(uint8_t* ptr, size_t len) {
restart:
  while (true) {
    size_t i;
    for (i = 0; i < len; i++) {
      uint8_t c = ptr[i];
      if ((c == '"') || (c == '\\') || (c < 0x20)) {
        TRY(write_dst(ptr, i));
        TRY(handle_unicode_code_point(c));
        ptr += i + 1;
        len -= i + 1;
        goto restart;
      }
    }
    TRY(write_dst(ptr, len));
    break;
  }
  return nullptr;
}
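
/*
Illustrative sketch only, not part of jsonptr: the escaping rules applied by
write_json_escaped_string and handle_unicode_code_point can be shown with a
standalone helper that builds a std::string instead of streaming through
write_dst. The names escape_json_string_sketch and check_escape_sketch are
hypothetical, and the upper-case hex digits are an assumption about what
hex_digit produces.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns src with '"', '\\' and bytes below 0x20
// escaped so that the result can sit between JSON double quotes.
std::string escape_json_string_sketch(const std::string& src) {
  static const char hex[] = "0123456789ABCDEF";  // Assumed digit case.
  std::string dst;
  for (unsigned char c : src) {
    switch (c) {
      case '\b': dst += "\\b"; break;
      case '\f': dst += "\\f"; break;
      case '\n': dst += "\\n"; break;
      case '\r': dst += "\\r"; break;
      case '\t': dst += "\\t"; break;
      case '"':  dst += "\\\""; break;
      case '\\': dst += "\\\\"; break;
      default:
        if (c < 0x20) {
          // Other control bytes use the six-byte \u00XY form.
          dst += "\\u00";
          dst += hex[c >> 4];
          dst += hex[c & 0x0F];
        } else {
          dst += static_cast<char>(c);
        }
        break;
    }
  }
  return dst;
}

// Hypothetical self-check showing one input and its escaped form.
void check_escape_sketch() {
  assert(escape_json_string_sketch("tab\there") == "tab\\there");
}
```

Unlike this sketch, the real code writes unescaped runs directly from the
source buffer and allocates nothing.
*/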

const char*  //
handle_string(uint64_t vbd,
              uint64_t len,
              bool start_of_token_chain,
              bool continued) {
  if (start_of_token_chain) {
    if (g_flags.output_format == file_format::json) {
      if (g_flags.output_cbor_metadata_as_json_comments &&
          !(vbd & WUFFS_BASE__TOKEN__VBD__STRING__CHAIN_MUST_BE_UTF_8)) {
        TRY(write_dst("/*cbor:hex*/\"", 13));
      } else {
        TRY(write_dst("\"", 1));
      }
    } else {
      g_cbor_output_string_length = 0;
      g_cbor_output_string_is_multiple_chunks = false;
      g_cbor_output_string_is_utf_8 =
          vbd & WUFFS_BASE__TOKEN__VBD__STRING__CHAIN_MUST_BE_UTF_8;
    }
    g_query.restart_fragment(in_dict_before_key() && g_query.is_at(g_depth));
  }

  if (vbd & WUFFS_BASE__TOKEN__VBD__STRING__CONVERT_0_DST_1_SRC_DROP) {
    // No-op.
  } else if (vbd & WUFFS_BASE__TOKEN__VBD__STRING__CONVERT_1_DST_1_SRC_COPY) {
    uint8_t* ptr = g_src.data.ptr + g_curr_token_end_src_index - len;
    if (g_flags.output_format == file_format::json) {
      if (g_flags.input_format == file_format::json) {
        TRY(write_dst(ptr, len));
      } else if (vbd & WUFFS_BASE__TOKEN__VBD__STRING__CHAIN_MUST_BE_UTF_8) {
        TRY(write_json_escaped_string(ptr, len));
      } else {
        uint8_t as_hex[512];
        uint8_t* p = ptr;
        size_t n = len;
        while (n > 0) {
          wuffs_base__transform__output o = wuffs_base__base_16__encode2(
              wuffs_base__make_slice_u8(&as_hex[0], sizeof as_hex),
              wuffs_base__make_slice_u8(p, n), true,
              WUFFS_BASE__BASE_16__DEFAULT_OPTIONS);
          TRY(write_dst(&as_hex[0], o.num_dst));
          p += o.num_src;
          n -= o.num_src;
          if (!o.status.is_ok()) {
            return o.status.message();
          }
        }
      }
    } else {
      TRY(write_cbor_output_string(ptr, len, false));
    }
    g_query.incremental_match_slice(ptr, len);
  } else {
    return "main: internal error: unexpected string-token conversion";
  }

  if (continued) {
    return nullptr;
  }

  if (g_flags.output_format == file_format::json) {
    TRY(write_dst("\"", 1));
  } else {
    TRY(write_cbor_output_string(nullptr, 0, true));
  }
  return nullptr;
}

// ----

const char*  //
handle_token(wuffs_base__token t, bool start_of_token_chain) {
  do {
    int64_t vbc = t.value_base_category();
    uint64_t vbd = t.value_base_detail();
    uint64_t len = t.length();

    // Handle ']' or '}'.
    if ((vbc == WUFFS_BASE__TOKEN__VBC__STRUCTURE) &&
        (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__POP)) {
      if (g_query.is_at(g_depth)) {
        return "main: no match for query";
      }
      if (g_depth <= 0) {
        return "main: internal error: inconsistent g_depth";
      }
      g_depth--;

      if (g_query.matched_all() && (g_depth >= g_flags.max_output_depth)) {
        g_suppress_write_dst--;
        // '…' is U+2026 HORIZONTAL ELLIPSIS, which is 3 UTF-8 bytes.
        if (g_flags.output_format == file_format::json) {
          TRY(write_dst((vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__FROM_LIST)
                            ? "\"[…]\""
                            : "\"{…}\"",
                        7));
        } else {
          TRY(write_dst((vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__FROM_LIST)
                            ? "\x65[…]"
                            : "\x65{…}",
                        6));
        }
      } else if (g_flags.output_format == file_format::json) {
        // Write preceding whitespace.
        if ((g_ctx != context::in_list_after_bracket) &&
            (g_ctx != context::in_dict_after_brace) &&
            !g_flags.compact_output) {
          if (g_flags.output_json_extra_comma) {
            TRY(write_dst(",\n", 2));
          } else {
            TRY(write_dst("\n", 1));
          }
          for (uint32_t i = 0; i < g_depth; i++) {
            TRY(write_dst(
                g_flags.tabs ? INDENT_TAB_STRING : INDENT_SPACES_STRING,
                g_flags.tabs ? 1 : g_flags.spaces));
          }
        }

        TRY(write_dst(
            (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__FROM_LIST) ? "]" : "}",
            1));
      } else {
        TRY(write_dst("\xFF", 1));
      }

      g_ctx = (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__TO_LIST)
                  ? context::in_list_after_value
                  : context::in_dict_after_key;
      goto after_value;
    }

    // Write preceding whitespace and punctuation, if it wasn't ']', '}' or a
    // continuation of a multi-token chain.
    if (start_of_token_chain) {
      if (g_flags.output_format != file_format::json) {
        // No-op.
      } else if (g_ctx == context::in_dict_after_key) {
        TRY(write_dst(": ", g_flags.compact_output ? 1 : 2));
      } else if (g_ctx != context::none) {
        if ((g_ctx == context::in_dict_after_brace) ||
            (g_ctx == context::in_dict_after_value)) {
          // Reject dict keys that aren't UTF-8 strings, which could otherwise
          // happen with -i=cbor -o=json.
          if ((vbc != WUFFS_BASE__TOKEN__VBC__STRING) ||
              !(vbd & WUFFS_BASE__TOKEN__VBD__STRING__CHAIN_MUST_BE_UTF_8)) {
            return "main: cannot convert CBOR non-text-string to JSON map key";
          }
        }
        if ((g_ctx == context::in_list_after_value) ||
            (g_ctx == context::in_dict_after_value)) {
          TRY(write_dst(",", 1));
        }
        if (!g_flags.compact_output) {
          TRY(write_dst("\n", 1));
          for (size_t i = 0; i < g_depth; i++) {
            TRY(write_dst(
                g_flags.tabs ? INDENT_TAB_STRING : INDENT_SPACES_STRING,
                g_flags.tabs ? 1 : g_flags.spaces));
          }
        }
      }

      bool query_matched_fragment = false;
      if (g_query.is_at(g_depth)) {
        switch (g_ctx) {
          case context::in_list_after_bracket:
          case context::in_list_after_value:
            query_matched_fragment = g_query.tick();
            break;
          case context::in_dict_after_key:
            query_matched_fragment = g_query.matched_fragment();
            break;
          default:
            break;
        }
      }
      if (!query_matched_fragment) {
        // No-op.
      } else if (!g_query.next_fragment()) {
        // There is no next fragment. We have matched the complete query, and
        // the upcoming JSON value is the result of that query.
        //
        // Un-suppress writing to stdout and reset the g_ctx and g_depth as if
        // we were about to decode a top-level value. This makes any subsequent
        // indentation be relative to this point, and we will return g_eod
        // after the upcoming JSON value is complete.
        if (g_suppress_write_dst != 1) {
          return "main: internal error: inconsistent g_suppress_write_dst";
        }
        g_suppress_write_dst = 0;
        g_ctx = context::none;
        g_depth = 0;
      } else if ((vbc != WUFFS_BASE__TOKEN__VBC__STRUCTURE) ||
                 !(vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__PUSH)) {
        // The query has moved on to the next fragment but the upcoming JSON
        // value is not a container.
        return "main: no match for query";
      }
    }

    // Handle the token itself: either a container ('[' or '{') or a simple
    // value: string (a chain of raw or escaped parts), literal or number.
    switch (vbc) {
      case WUFFS_BASE__TOKEN__VBC__STRUCTURE:
        if (g_query.matched_all() && (g_depth >= g_flags.max_output_depth)) {
          g_suppress_write_dst++;
        } else if (g_flags.output_format == file_format::json) {
          TRY(write_dst(
              (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__TO_LIST) ? "[" : "{",
              1));
        } else {
          TRY(write_dst((vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__TO_LIST)
                            ? "\x9F"
                            : "\xBF",
                        1));
        }
        g_depth++;
        g_ctx = (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__TO_LIST)
                    ? context::in_list_after_bracket
                    : context::in_dict_after_brace;
        return nullptr;

      case WUFFS_BASE__TOKEN__VBC__STRING:
        TRY(handle_string(vbd, len, start_of_token_chain, t.continued()));
        if (t.continued()) {
          return nullptr;
        }
        goto after_value;

      case WUFFS_BASE__TOKEN__VBC__UNICODE_CODE_POINT:
        if (!t.continued()) {
          return "main: internal error: unexpected non-continued UCP token";
        }
        TRY(handle_unicode_code_point(vbd));
        g_query.incremental_match_code_point(vbd);
        return nullptr;

      case WUFFS_BASE__TOKEN__VBC__LITERAL:
        TRY(write_literal(vbd));
        goto after_value;

      case WUFFS_BASE__TOKEN__VBC__NUMBER:
        TRY(write_number(vbd, g_src.data.ptr + g_curr_token_end_src_index - len,
                         len));
        goto after_value;

      case WUFFS_BASE__TOKEN__VBC__INLINE_INTEGER_SIGNED:
      case WUFFS_BASE__TOKEN__VBC__INLINE_INTEGER_UNSIGNED: {
        bool x_is_signed = vbc == WUFFS_BASE__TOKEN__VBC__INLINE_INTEGER_SIGNED;
        uint64_t x = x_is_signed
                         ? ((uint64_t)(t.value_base_detail__sign_extended()))
                         : vbd;
        if (t.continued()) {
          if (len != 0) {
            return "main: internal error: unexpected to-be-extended length";
          }
          g_token_extension.category = vbc;
          g_token_extension.detail = x;
          return nullptr;
        }
        TRY(write_inline_integer(
            x, x_is_signed, g_src.data.ptr + g_curr_token_end_src_index - len,
            len));
        goto after_value;
      }
    }

    int64_t ext = t.value_extension();
    if (ext >= 0) {
      uint64_t x = (g_token_extension.detail
                    << WUFFS_BASE__TOKEN__VALUE_EXTENSION__NUM_BITS) |
                   ((uint64_t)ext);
      switch (g_token_extension.category) {
        case WUFFS_BASE__TOKEN__VBC__INLINE_INTEGER_SIGNED:
        case WUFFS_BASE__TOKEN__VBC__INLINE_INTEGER_UNSIGNED:
          TRY(write_inline_integer(
              x,
              g_token_extension.category ==
                  WUFFS_BASE__TOKEN__VBC__INLINE_INTEGER_SIGNED,
              g_src.data.ptr + g_curr_token_end_src_index - len, len));
          g_token_extension.category = 0;
          g_token_extension.detail = 0;
          goto after_value;
        case CATEGORY_CBOR_TAG:
          TRY(write_cbor_tag(
              x, g_src.data.ptr + g_curr_token_end_src_index - len, len));
          g_token_extension.category = 0;
          g_token_extension.detail = 0;
          return nullptr;
      }
    }

    if (t.value_major() == WUFFS_CBOR__TOKEN_VALUE_MAJOR) {
      uint64_t value_minor = t.value_minor();
      if (value_minor & WUFFS_CBOR__TOKEN_VALUE_MINOR__MINUS_1_MINUS_X) {
        TRY(write_cbor_minus_1_minus_x(
            g_src.data.ptr + g_curr_token_end_src_index - len, len));
        goto after_value;
      } else if (value_minor & WUFFS_CBOR__TOKEN_VALUE_MINOR__SIMPLE_VALUE) {
        TRY(write_cbor_simple_value(
            vbd, g_src.data.ptr + g_curr_token_end_src_index - len, len));
        goto after_value;
      } else if (value_minor & WUFFS_CBOR__TOKEN_VALUE_MINOR__TAG) {
        if (t.continued()) {
          if (len != 0) {
            return "main: internal error: unexpected to-be-extended length";
          }
          g_token_extension.category = CATEGORY_CBOR_TAG;
          g_token_extension.detail = vbd;
          return nullptr;
        }
        return write_cbor_tag(
            vbd, g_src.data.ptr + g_curr_token_end_src_index - len, len);
      }
    }

    // Return an error if we didn't match the (value_major, value_minor) or
    // (vbc, vbd) pair.
    return "main: internal error: unexpected token";
  } while (0);

  // Book-keeping after completing a value (whether a container value or a
  // simple value). Empty parent containers are no longer empty. If the parent
  // container is a "{...}" object, toggle between keys and values.
after_value:
  if (g_depth == 0) {
    return g_eod;
  }
  switch (g_ctx) {
    case context::in_list_after_bracket:
      g_ctx = context::in_list_after_value;
      break;
    case context::in_dict_after_brace:
      g_ctx = context::in_dict_after_key;
      break;
    case context::in_dict_after_key:
      g_ctx = context::in_dict_after_value;
      break;
    case context::in_dict_after_value:
      g_ctx = context::in_dict_after_key;
      break;
    default:
      break;
  }
  return nullptr;
}
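
/*
Illustrative sketch only, not part of jsonptr: the after_value book-keeping in
handle_token amounts to a small state machine over the context values. The
standalone version below mirrors those transitions so that they can be
exercised in isolation. The names ctx_sketch and next_ctx_sketch are
hypothetical.

```cpp
#include <cassert>

// Hypothetical stand-in for jsonptr's context enum.
enum class ctx_sketch {
  none,
  in_list_after_bracket,
  in_list_after_value,
  in_dict_after_brace,
  in_dict_after_key,
  in_dict_after_value,
};

// Returns the context that follows c once one more value (a key or a
// non-key value) has been completed inside the current container.
ctx_sketch next_ctx_sketch(ctx_sketch c) {
  switch (c) {
    case ctx_sketch::in_list_after_bracket:
      return ctx_sketch::in_list_after_value;
    case ctx_sketch::in_dict_after_brace:
      return ctx_sketch::in_dict_after_key;
    case ctx_sketch::in_dict_after_key:
      return ctx_sketch::in_dict_after_value;
    case ctx_sketch::in_dict_after_value:
      return ctx_sketch::in_dict_after_key;
    default:
      return c;
  }
}

// Hypothetical self-check: walking {"k": 1, "j": 2} alternates between the
// after_key and after_value states.
void check_ctx_sketch() {
  ctx_sketch c = ctx_sketch::in_dict_after_brace;
  c = next_ctx_sketch(c);  // After "k".
  assert(c == ctx_sketch::in_dict_after_key);
  c = next_ctx_sketch(c);  // After 1.
  assert(c == ctx_sketch::in_dict_after_value);
  c = next_ctx_sketch(c);  // After "j".
  assert(c == ctx_sketch::in_dict_after_key);
}
```

Lists stay in in_list_after_value once they hold at least one element, and
the none context is left unchanged.
*/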

const char*  //
main1(int argc, char** argv) {
  TRY(initialize_globals(argc, argv));

  bool start_of_token_chain = true;
  while (true) {
    wuffs_base__status status = g_dec->decode_tokens(
        &g_tok, &g_src,
        wuffs_base__make_slice_u8(g_work_buffer_array, WORK_BUFFER_ARRAY_SIZE));

    while (g_tok.meta.ri < g_tok.meta.wi) {
      wuffs_base__token t = g_tok.data.ptr[g_tok.meta.ri++];
      uint64_t n = t.length();
      if ((g_src.meta.ri - g_curr_token_end_src_index) < n) {
        return "main: internal error: inconsistent g_src indexes";
      }
      g_curr_token_end_src_index += n;

      // Skip filler tokens (e.g. whitespace).
      if (t.value_base_category() == WUFFS_BASE__TOKEN__VBC__FILLER) {
        start_of_token_chain = !t.continued();
        continue;
      }

      const char* z = handle_token(t, start_of_token_chain);
      start_of_token_chain = !t.continued();
      if (z == nullptr) {
        continue;
      } else if (z == g_eod) {
        goto end_of_data;
      }
      return z;
    }

    if (status.repr == nullptr) {
      return "main: internal error: unexpected end of token stream";
    } else if (status.repr == wuffs_base__suspension__short_read) {
      if (g_curr_token_end_src_index != g_src.meta.ri) {
        return "main: internal error: inconsistent g_src indexes";
      }
      TRY(read_src());
      g_curr_token_end_src_index = g_src.meta.ri;
    } else if (status.repr == wuffs_base__suspension__short_write) {
      g_tok.compact();
    } else {
      return status.message();
    }
  }
end_of_data:

  // With a non-empty g_query, don't try to consume trailing whitespace or
  // confirm that we've processed all the tokens.
  if (g_flags.query_c_string && *g_flags.query_c_string) {
    return nullptr;
  }

  // Check that we've exhausted the input.
  if ((g_src.meta.ri == g_src.meta.wi) && !g_src.meta.closed) {
    TRY(read_src());
  }
  if ((g_src.meta.ri < g_src.meta.wi) || !g_src.meta.closed) {
    return "main: valid JSON|CBOR followed by further (unexpected) data";
  }

  // Check that we've used all of the decoded tokens, other than trailing
  // filler tokens. For example, "true\n" is valid JSON (and fully consumed
  // with WUFFS_JSON__QUIRK_ALLOW_TRAILING_NEW_LINE enabled) with a trailing
  // filler token for the "\n".
  for (; g_tok.meta.ri < g_tok.meta.wi; g_tok.meta.ri++) {
    if (g_tok.data.ptr[g_tok.meta.ri].value_base_category() !=
        WUFFS_BASE__TOKEN__VBC__FILLER) {
      return "main: internal error: decoded OK but unprocessed tokens remain";
    }
  }

  return nullptr;
}

int  //
compute_exit_code(const char* status_msg) {
  if (!status_msg) {
    return 0;
  }
  size_t n;
  if (status_msg == g_usage) {
    n = strlen(status_msg);
  } else {
    n = strnlen(status_msg, 2047);
    if (n >= 2047) {
      status_msg = "main: internal error: error message is too long";
      n = strnlen(status_msg, 2047);
    }
  }
  const int stderr_fd = 2;
  ignore_return_value(write(stderr_fd, status_msg, n));
  ignore_return_value(write(stderr_fd, "\n", 1));
  // Return an exit code of 1 for regular (foreseen) errors, e.g. badly
  // formatted or unsupported input.
  //
  // Return an exit code of 2 for internal (exceptional) errors, e.g. defensive
  // run-time checks found that an internal invariant did not hold.
  //
  // Automated testing, including badly formatted inputs, can therefore
  // discriminate between expected failure (exit code 1) and unexpected failure
  // (other non-zero exit codes). Specifically, exit code 2 for internal
  // invariant violation, exit code 139 (which is 128 + SIGSEGV on x86_64
  // linux) for a segmentation fault (e.g. null pointer dereference).
  return strstr(status_msg, "internal error:") ? 2 : 1;
}
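
/*
Illustrative sketch only, not part of jsonptr: the exit-code convention in
compute_exit_code can be exercised without writing to stderr. The name
exit_code_sketch is hypothetical. The mapping itself (0 for success, 2 for
messages containing "internal error:", 1 otherwise) is taken from the
function above.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: maps a status message to the process exit code.
int exit_code_sketch(const char* status_msg) {
  if (!status_msg) {
    return 0;  // Success.
  }
  // Internal invariant violations are distinguished by a message substring.
  return std::strstr(status_msg, "internal error:") ? 2 : 1;
}

// Hypothetical self-check covering the three outcomes.
void check_exit_code_sketch() {
  assert(exit_code_sketch(nullptr) == 0);
  assert(exit_code_sketch("main: no match for query") == 1);
  assert(exit_code_sketch("main: internal error: unexpected token") == 2);
}
```

A test harness could treat exit code 1 as an expected rejection of bad input
and any other non-zero code as a bug.
*/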

int  //
main(int argc, char** argv) {
  // Look for an input filename (the first non-flag argument) in argv. If there
  // is one, open it (but do not read from it) before we self-impose a sandbox.
  //
  // Flags start with "-", unless they come after a bare "--" arg.
  {
    bool dash_dash = false;
    int a;
    for (a = 1; a < argc; a++) {
      char* arg = argv[a];
      if ((arg[0] == '-') && !dash_dash) {
        dash_dash = (arg[1] == '-') && (arg[2] == '\x00');
        continue;
      }
      g_input_file_descriptor = open(arg, O_RDONLY);
      if (g_input_file_descriptor < 0) {
        fprintf(stderr, "%s: %s\n", arg, strerror(errno));
        return 1;
      }
      break;
    }
  }

#if defined(WUFFS_EXAMPLE_USE_SECCOMP)
  prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT);
  g_sandboxed = true;
#endif

  const char* z = main1(argc, argv);
  if (g_wrote_to_dst) {
    const char* z1 = (g_flags.output_format == file_format::json)
                         ? write_dst("\n", 1)
                         : nullptr;
    const char* z2 = flush_dst();
    z = z ? z : (z1 ? z1 : z2);
  }
  int exit_code = compute_exit_code(z);

#if defined(WUFFS_EXAMPLE_USE_SECCOMP)
  // Call SYS_exit explicitly, instead of calling SYS_exit_group implicitly by
  // either calling _exit or returning from main. SECCOMP_MODE_STRICT allows
  // only SYS_exit.
  syscall(SYS_exit, exit_code);
#endif
  return exit_code;
}