// Copyright 2020 The Wuffs Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//    https://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

// ----------------

/*
jsonptr is a JSON formatter (pretty-printer) that supports the JSON Pointer
(RFC 6901) query syntax. It reads CBOR or UTF-8 JSON from stdin and writes CBOR
or canonicalized, formatted UTF-8 JSON to stdout.

See the "const char* g_usage" string below for details.

----

JSON Pointer (and this program's implementation) is one of many JSON query
languages and JSON tools, such as jq, jql and JMESPath. This one is relatively
simple and fewer-featured compared to those others.

One benefit of simplicity is that this program's CBOR, JSON and JSON Pointer
implementations do not dynamically allocate or free memory (yet the program
does not require that the entire input fits in memory at once). They are
therefore trivially protected against certain bug classes: memory leaks,
double-frees and use-after-frees.

The CBOR and JSON implementations are also written in the Wuffs programming
language (and then transpiled to C/C++), which is memory-safe (e.g. array
indexing is bounds-checked) and also prevents integer arithmetic overflows.

For defense in depth, on Linux, this program also self-imposes a
SECCOMP_MODE_STRICT sandbox before reading (or otherwise processing) its input
or writing its output. Under this sandbox, the only permitted system calls are
read, write, exit and sigreturn.

All together, this program aims to safely handle untrusted CBOR or JSON files
without fear of security bugs such as remote code execution.

----

As of 2020-02-24, this program passes all 318 "test_parsing" cases from the
JSON test suite (https://github.com/nst/JSONTestSuite), an appendix to the
"Parsing JSON is a Minefield" article (http://seriot.ch/parsing_json.php) that
was first published on 2016-10-26 and updated on 2018-03-30.

After modifying this program, run "build-example.sh example/jsonptr/" and then
"script/run-json-test-suite.sh" to catch correctness regressions.

----

This program uses Wuffs' JSON decoder at a relatively low level, processing the
decoder's token-stream output individually. The core loop, in pseudo-code, is
"for_each_token { handle_token(etc); }", where the handle_token function
changes global state (e.g. the `g_depth` and `g_ctx` variables) and prints
output text based on that state and the token's source text. Notably,
handle_token is not recursive, even though JSON values can nest.
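
The core loop described above can be sketched in miniature. This is a
hypothetical illustration, not this program's code: the Formatter struct, the
Ctx enum and the format_tokens function are invented names, the tokens are
pre-split strings rather than Wuffs tokens, and only arrays and scalars are
handled (the real program also tracks dictionary contexts). It shows
handle_token updating a depth counter and a context value instead of recursing.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for this program's g_ctx values.
enum class Ctx { none, after_open, after_value };

struct Formatter {
  unsigned depth = 0;  // Stand-in for g_depth.
  Ctx ctx = Ctx::none;
  std::string out;

  void newline_and_indent() {
    out += "\n";
    out.append(2 * depth, ' ');
  }

  // handle_token consumes one token: "[", "]" or a scalar such as "12". It
  // never calls itself. Nesting is tracked by the depth counter alone.
  void handle_token(const std::string& tok) {
    if (tok == "]") {
      assert(depth > 0);
      depth--;
      if (ctx == Ctx::after_value) {
        newline_and_indent();  // A non-empty array closes on its own line.
      }
      out += "]";
      ctx = Ctx::after_value;
      return;
    }
    if (ctx == Ctx::after_value) {
      out += ",";  // Separate this value from the previous sibling.
    }
    if (depth > 0) {
      newline_and_indent();  // Each nested element gets its own line.
    }
    out += tok;
    if (tok == "[") {
      depth++;
      ctx = Ctx::after_open;
    } else {
      ctx = Ctx::after_value;
    }
  }
};

// format_tokens is the "for_each_token { handle_token(etc); }" loop.
std::string format_tokens(const std::vector<std::string>& tokens) {
  Formatter f;
  for (const std::string& tok : tokens) {
    f.handle_token(tok);
  }
  return f.out;
}
```

Under these assumptions, format_tokens({"[", "1", "[", "]", "2", "]"}) returns
the array with each element on its own line, indented by two spaces per depth,
and with the empty inner array kept as "[]".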

This approach is centered around JSON tokens. Each JSON 'thing' (e.g. number,
string, object) comprises one or more JSON tokens.

An alternative, higher-level approach is in the sibling example/jsonfindptrs
program. Neither approach is better or worse per se, but when studying this
program, be aware that there are multiple ways to use Wuffs' JSON decoder.

The two programs, jsonfindptrs and jsonptr, also demonstrate different
trade-offs with regard to JSON object duplicate keys. The JSON spec permits
different implementations to allow or reject duplicate keys. It is not always
clear which approach is safer. Rejecting them is certainly unambiguous, and
security bugs can lurk in ambiguous corners of a file format, if two different
implementations both silently accept a file but differ on how to interpret it.
On the other hand, in the worst case, detecting duplicate keys requires O(N)
memory, where N is the size of the (potentially untrusted) input.

This program (jsonptr) allows duplicate keys and requires only O(1) memory. As
mentioned above, it doesn't dynamically allocate memory at all, and on Linux,
it runs in a SECCOMP_MODE_STRICT sandbox.
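
The memory trade-off above can be made concrete with a small, hypothetical
sketch. It is not code from jsonptr or jsonfindptrs: KeyValue, first_match and
has_duplicate_keys are invented names, and an in-memory vector stands in for a
streamed JSON object. Taking the first match needs no bookkeeping, whereas
detecting duplicates has to remember every key seen so far.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// One key-value pair of a JSON object, simplified to an int value.
using KeyValue = std::pair<std::string, int>;

// first_match returns the first value whose key equals the argument, or
// nullptr if there is none. It needs O(1) extra memory: duplicate keys are
// allowed, and later duplicates are simply never looked at.
const int* first_match(const std::vector<KeyValue>& kvs,
                       const std::string& key) {
  for (const KeyValue& kv : kvs) {
    if (kv.first == key) {
      return &kv.second;
    }
  }
  return nullptr;
}

// has_duplicate_keys returns whether any key occurs more than once. In the
// worst case (all keys distinct), the seen set grows to hold every key, which
// is O(N) extra memory for an input of size N.
bool has_duplicate_keys(const std::vector<KeyValue>& kvs) {
  std::unordered_set<std::string> seen;
  for (const KeyValue& kv : kvs) {
    if (!seen.insert(kv.first).second) {
      return true;
    }
  }
  return false;
}
```

With the pairs ("foo", 1), ("bar", 2), ("foo", 3), first_match finds 1 for
"foo" and never inspects the later 3, while has_duplicate_keys reports true.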

----

This example program differs from most other example Wuffs programs in that it
is written in C++, not C.

    $CXX jsonptr.cc && ./a.out < ../../test/data/github-tags.json; rm -f a.out

for a C++ compiler $CXX, such as clang++ or g++.
*/

#if defined(__cplusplus) && (__cplusplus < 201103L)
#error "This C++ program requires -std=c++11 or later"
#endif

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

// Wuffs ships as a "single file C library" or "header file library" as per
// https://github.com/nothings/stb/blob/master/docs/stb_howto.txt
//
// To use that single file as a "foo.c"-like implementation, instead of a
// "foo.h"-like header, #define WUFFS_IMPLEMENTATION before #include'ing or
// compiling it.
#define WUFFS_IMPLEMENTATION

// Defining the WUFFS_CONFIG__MODULE* macros is optional, but it lets users of
// release/c/etc.c whitelist which parts of Wuffs to build. That file contains
// the entire Wuffs standard library, implementing a variety of codecs and file
// formats. Without this macro definition, an optimizing compiler or linker may
// very well discard Wuffs code for unused codecs, but listing the Wuffs
// modules we use makes that process explicit. Preprocessing means that such
// code simply isn't compiled.
#define WUFFS_CONFIG__MODULES
#define WUFFS_CONFIG__MODULE__BASE
#define WUFFS_CONFIG__MODULE__CBOR
#define WUFFS_CONFIG__MODULE__JSON

// If building this program in an environment that doesn't easily accommodate
// relative includes, you can use the script/inline-c-relative-includes.go
// program to generate a stand-alone C++ file.
#include "../../release/c/wuffs-unsupported-snapshot.c"

#if defined(__linux__)
#include <linux/prctl.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#define WUFFS_EXAMPLE_USE_SECCOMP
#endif

#define TRY(error_msg)         \
  do {                         \
    const char* z = error_msg; \
    if (z) {                   \
      return z;                \
    }                          \
  } while (false)

static const char* g_eod = "main: end of data";

static const char* g_usage =
    "Usage: jsonptr -flags input.json\n"
    "\n"
    "Flags:\n"
    "    -c      -compact-output\n"
    "    -d=NUM  -max-output-depth=NUM\n"
    "    -i=FMT  -input-format={json,cbor}\n"
    "    -o=FMT  -output-format={json,cbor}\n"
    "    -q=STR  -query=STR\n"
    "    -s=NUM  -spaces=NUM\n"
    "    -t      -tabs\n"
    "            -fail-if-unsandboxed\n"
    "            -input-allow-json-comments\n"
    "            -input-allow-json-extra-comma\n"
    "            -input-allow-json-inf-nan-numbers\n"
    "            -output-cbor-metadata-as-json-comments\n"
    "            -output-json-extra-comma\n"
    "            -output-json-inf-nan-numbers\n"
    "            -strict-json-pointer-syntax\n"
    "\n"
    "The input.json filename is optional. If absent, it reads from stdin.\n"
    "\n"
    "----\n"
    "\n"
    "jsonptr is a JSON formatter (pretty-printer) that supports the JSON\n"
    "Pointer (RFC 6901) query syntax. It reads CBOR or UTF-8 JSON from stdin\n"
    "and writes CBOR or canonicalized, formatted UTF-8 JSON to stdout. The\n"
    "input and output formats do not have to match, but conversion between\n"
    "formats may be lossy.\n"
    "\n"
    "Canonicalized JSON means that e.g. \"abc\\u000A\\tx\\u0177z\" is re-\n"
    "written as \"abc\\n\\txŷz\". It does not sort object keys or reject\n"
    "duplicate keys. Canonicalization does not imply Unicode normalization.\n"
    "\n"
    "CBOR output is non-canonical (in the RFC 7049 Section 3.9 sense), as\n"
    "sorting map keys and measuring indefinite-length containers requires\n"
    "O(input_length) memory but this program runs in O(1) memory.\n"
    "\n"
    "Formatted means that arrays' and objects' elements are indented, each\n"
    "on its own line. Configure this with the -c / -compact-output, -s=NUM /\n"
    "-spaces=NUM (for NUM ranging from 0 to 8) and -t / -tabs flags. Those\n"
    "flags only apply to JSON (not CBOR) output.\n"
    "\n"
    "The -input-format and -output-format flags select between reading and\n"
    "writing JSON (the default, a textual format) or CBOR (a binary format).\n"
    "\n"
    "The -input-allow-json-comments flag allows \"/*slash-star*/\" and\n"
    "\"//slash-slash\" C-style comments within JSON input.\n"
    "\n"
    "The -input-allow-json-extra-comma flag allows input like \"[1,2,]\",\n"
    "with a comma after the final element of a JSON list or dictionary.\n"
    "\n"
    "The -input-allow-json-inf-nan-numbers flag allows non-finite floating\n"
    "point numbers (infinities and not-a-numbers) within JSON input.\n"
    "\n"
    "The -output-cbor-metadata-as-json-comments flag writes CBOR tags and\n"
    "other metadata as /*comments*/, when -i=cbor and -o=json are also set.\n"
    "Such comments are non-compliant with the JSON specification but many\n"
    "parsers accept them.\n"
    "\n"
    "The -output-json-extra-comma flag writes extra commas, regardless of\n"
    "whether the input had them. Such commas are non-compliant with the JSON\n"
    "specification but many parsers accept them and they can produce simpler\n"
    "line-based diffs. This flag is ignored when -compact-output is set.\n"
    "\n"
    "The -output-json-inf-nan-numbers flag writes Inf and NaN instead of a\n"
    "substitute null value, when converting from -i=cbor to -o=json. Such\n"
    "values are non-compliant with the JSON specification but many parsers\n"
    "accept them.\n"
    "\n"
    "CBOR is more permissive about map keys but JSON only allows strings.\n"
    "When converting from -i=cbor to -o=json, this program rejects keys other\n"
    "than text strings and non-negative integers (CBOR major types 3 and 0).\n"
    "Integer keys like 123 are quoted to become string keys like \"123\". Being\n"
    "even more permissive would have complicated interactions with the\n"
    "-query=STR flag and streaming input, so this program just rejects other\n"
    "keys.\n"
    "\n"
    "----\n"
    "\n"
    "The -q=STR or -query=STR flag gives an optional JSON Pointer query, to\n"
    "print a subset of the input. For example, given RFC 6901 section 5's\n"
    "sample input (https://tools.ietf.org/rfc/rfc6901.txt), this command:\n"
    "    jsonptr -query=/foo/1 rfc-6901-json-pointer.json\n"
    "will print:\n"
    "    \"baz\"\n"
    "\n"
    "An absent query is equivalent to the empty query, which identifies the\n"
    "entire input (the root value). Unlike a file system, the \"/\" query\n"
    "does not identify the root. Instead, \"\" is the root and \"/\" is the\n"
    "child (the value in a key-value pair) of the root whose key is the empty\n"
    "string. Similarly, \"/xyz\" and \"/xyz/\" are two different nodes.\n"
    "\n"
    "If the query found a valid JSON|CBOR value, this program will return a\n"
    "zero exit code even if the rest of the input isn't valid. If the query\n"
    "did not find a value, or found an invalid one, this program returns a\n"
    "non-zero exit code, but may still print partial output to stdout.\n"
    "\n"
    "The JSON and CBOR specifications (https://json.org/ or RFC 8259; RFC\n"
    "7049) permit implementations to allow duplicate keys, as this one does.\n"
    "This JSON Pointer implementation is also greedy, following the first\n"
    "match for each fragment without back-tracking. For example, the\n"
    "\"/foo/bar\" query will fail if the root object has multiple \"foo\"\n"
    "children but the first one doesn't have a \"bar\" child, even if later\n"
    "ones do.\n"
    "\n"
    "The -strict-json-pointer-syntax flag restricts the -query=STR string to\n"
    "exactly RFC 6901, with only two escape sequences: \"~0\" and \"~1\" for\n"
    "\"~\" and \"/\". Without this flag, this program also lets \"~n\" and\n"
    "\"~r\" escape the New Line and Carriage Return ASCII control characters,\n"
    "which can work better with line oriented Unix tools that assume exactly\n"
    "one value (i.e. one JSON Pointer string) per line.\n"
    "\n"
    "----\n"
    "\n"
    "The -d=NUM or -max-output-depth=NUM flag gives the maximum (inclusive)\n"
    "output depth. JSON|CBOR containers ([] arrays and {} objects) can hold\n"
    "other containers. When this flag is set, containers at depth NUM are\n"
    "replaced with \"[…]\" or \"{…}\". A bare -d or -max-output-depth is\n"
    "equivalent to -d=1. The flag's absence means an unlimited output depth.\n"
    "\n"
    "The -max-output-depth flag only affects the program's output. It doesn't\n"
    "affect whether or not the input is considered valid JSON|CBOR. The\n"
    "format specifications permit implementations to set their own maximum\n"
    "input depth. This JSON|CBOR implementation sets it to 1024.\n"
    "\n"
    "Depth is measured in terms of nested containers. It is unaffected by the\n"
    "number of spaces or tabs used to indent.\n"
    "\n"
    "When both -max-output-depth and -query are set, the output depth is\n"
    "measured from when the query resolves, not from the input root. The\n"
    "input depth (measured from the root) is still limited to 1024.\n"
    "\n"
    "----\n"
    "\n"
    "The -fail-if-unsandboxed flag causes the program to exit if it does not\n"
    "self-impose a sandbox. On Linux, it self-imposes a SECCOMP_MODE_STRICT\n"
    "sandbox, regardless of whether this flag was set.";

// ----

// Wuffs allows either statically or dynamically allocated work buffers. This
// program exercises static allocation.
#define WORK_BUFFER_ARRAY_SIZE \
  WUFFS_JSON__DECODER_WORKBUF_LEN_MAX_INCL_WORST_CASE
#if WORK_BUFFER_ARRAY_SIZE > 0
uint8_t g_work_buffer_array[WORK_BUFFER_ARRAY_SIZE];
#else
// Not all C/C++ compilers support 0-length arrays.
uint8_t g_work_buffer_array[1];
#endif

bool g_sandboxed = false;

int g_input_file_descriptor = 0;  // A 0 default means stdin.

#define MAX_INDENT 8
#define INDENT_SPACES_STRING "        "
#define INDENT_TAB_STRING "\t"

#ifndef DST_BUFFER_ARRAY_SIZE
#define DST_BUFFER_ARRAY_SIZE (32 * 1024)
#endif
#ifndef SRC_BUFFER_ARRAY_SIZE
#define SRC_BUFFER_ARRAY_SIZE (32 * 1024)
#endif
#ifndef TOKEN_BUFFER_ARRAY_SIZE
#define TOKEN_BUFFER_ARRAY_SIZE (4 * 1024)
#endif

uint8_t g_dst_array[DST_BUFFER_ARRAY_SIZE];
uint8_t g_src_array[SRC_BUFFER_ARRAY_SIZE];
wuffs_base__token g_tok_array[TOKEN_BUFFER_ARRAY_SIZE];

wuffs_base__io_buffer g_dst;
wuffs_base__io_buffer g_src;
wuffs_base__token_buffer g_tok;

// g_curr_token_end_src_index is the g_src.data.ptr index of the end of the
// current token. An invariant is that (g_curr_token_end_src_index <=
// g_src.meta.ri).
size_t g_curr_token_end_src_index;

// Valid token's VBCs range in 0 ..= 15. Values over that are for tokens from
// outside of the base package, such as the CBOR package.
#define CATEGORY_CBOR_TAG 16

struct {
  uint64_t category;
  uint64_t detail;
} g_token_extension;

bool g_previous_token_was_cbor_tag;

uint32_t g_depth;

enum class context {
  none,
  in_list_after_bracket,
  in_list_after_value,
  in_dict_after_brace,
  in_dict_after_key,
  in_dict_after_value,
} g_ctx;

bool  //
in_dict_before_key() {
  return (g_ctx == context::in_dict_after_brace) ||
         (g_ctx == context::in_dict_after_value);
}

uint32_t g_suppress_write_dst;
bool g_wrote_to_dst;

wuffs_cbor__decoder g_cbor_decoder;
wuffs_json__decoder g_json_decoder;
wuffs_base__token_decoder* g_dec;

// g_spool_array is a 4 KiB buffer.
//
// For -o=cbor, strings up to SPOOL_ARRAY_SIZE long are written as a single
// definite-length string. Longer strings are written as an indefinite-length
// string containing multiple definite-length chunks, each of length up to
// SPOOL_ARRAY_SIZE. See RFC 7049 section 2.2.2 "Indefinite-Length Byte Strings
// and Text Strings". Byte strings and text strings are spooled prior to this
// chunking, so that the output is determinate even when the input is streamed.
//
// For -o=json, CBOR byte strings are spooled prior to base64url encoding,
// which maps multiples of 3 source bytes to 4 destination bytes.
//
// If raising SPOOL_ARRAY_SIZE above 0xFFFF then you will also have to update
// flush_cbor_output_string.
#define SPOOL_ARRAY_SIZE 4096
uint8_t g_spool_array[SPOOL_ARRAY_SIZE];

uint32_t g_cbor_output_string_length;
bool g_cbor_output_string_is_multiple_chunks;
bool g_cbor_output_string_is_utf_8;

uint32_t g_json_output_byte_string_length;

// ----

// Query is a JSON Pointer query. After initializing with a NUL-terminated C
// string, its multiple fragments are consumed as the program walks the JSON
// data from stdin. For example, letting "$" denote a NUL, suppose that we
// started with a query string of "/apple/banana/12/durian" and are currently
// trying to match the second fragment, "banana", so that Query::m_depth is 2:
//
//  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
//   / a p p l e / b a n a n a / 1 2 / d u r i a n $
//  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
//                 ^           ^
//                 m_frag_i    m_frag_k
//
// The two pointers m_frag_i and m_frag_k (abbreviated as mfi and mfk) are the
// start (inclusive) and end (exclusive) of the query fragment. They satisfy
// (mfi <= mfk) and may be equal if the fragment is empty (note that "" is a
// valid JSON object key).
//
// The m_frag_j (mfj) pointer moves between these two, or is nullptr. An
// invariant is that (((mfi <= mfj) && (mfj <= mfk)) || (mfj == nullptr)).
//
// Wuffs' JSON tokenizer can portray a single JSON string as multiple Wuffs
// tokens, as backslash-escaped values within that JSON string may each get
// their own token.
//
// At the start of each object key (a JSON string), mfj is set to mfi.
//
// While mfj remains non-nullptr, each token's unescaped contents are then
// compared to that part of the fragment from mfj to mfk. If it is a prefix
// (including the case of an exact match), then mfj is advanced by the
// unescaped length. Otherwise, mfj is set to nullptr.
//
// Comparison accounts for JSON Pointer's escaping notation: "~0" and "~1" in
// the query (not the JSON value) are unescaped to "~" and "/" respectively.
// "~n" and "~r" are also unescaped to "\n" and "\r". The program is
// responsible for calling Query::validate (with a strict_json_pointer_syntax
// argument) before otherwise using this class.
//
// The mfj pointer therefore advances from mfi to mfk, or drops out, as we
// incrementally match the object key with the query fragment. For example, if
// we have already matched the "ban" of "banana", then we would accept any of
// an "ana" token, an "a" token or a "\u0061" token, amongst others. They would
// advance mfj by 3, 1 or 1 bytes respectively.
//
//                       mfj
//                       v
//  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
//   / a p p l e / b a n a n a / 1 2 / d u r i a n $
//  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
//                 ^           ^
//                 mfi         mfk
//
// At the end of each object key (or equivalently, at the start of each object
// value), if mfj is non-nullptr and equal to (but not less than) mfk then we
// have a fragment match: the query fragment equals the object key. If there is
// a next fragment (in this example, "12") we move the frag_etc pointers to its
// start and end and increment Query::m_depth. Otherwise, we have matched the
// complete query, and the upcoming JSON value is the result of that query.
//
// The discussion above centers on object keys. If the query fragment is
// numeric then it can also match as an array index: the string fragment "12"
// will match an array's 13th element (starting counting from zero). See RFC
// 6901 for its precise definition of an "array index" number.
//
// Array index fragment match is represented by the Query::m_array_index field,
// whose type (wuffs_base__result_u64) is a result type. An error result means
// that the fragment is not an array index. A value result holds the number of
// list elements remaining. When matching a query fragment in an array (instead
// of in an object), each element ticks this number down towards zero. At zero,
// the upcoming JSON value is the one that matches the query fragment.
460// the upcoming JSON value is the one that matches the query fragment.
461class Query {
462 private:
Nigel Taob48ee752020-03-13 09:27:33 +1100463 uint8_t* m_frag_i;
464 uint8_t* m_frag_j;
465 uint8_t* m_frag_k;
Nigel Tao0cd2f982020-03-03 23:03:02 +1100466
Nigel Taob48ee752020-03-13 09:27:33 +1100467 uint32_t m_depth;
Nigel Tao0cd2f982020-03-03 23:03:02 +1100468
Nigel Taob48ee752020-03-13 09:27:33 +1100469 wuffs_base__result_u64 m_array_index;
Nigel Tao983a74f2020-07-27 15:17:46 +1000470 uint64_t m_array_index_remaining;
Nigel Tao0cd2f982020-03-03 23:03:02 +1100471
472 public:
473 void reset(char* query_c_string) {
Nigel Taob48ee752020-03-13 09:27:33 +1100474 m_frag_i = (uint8_t*)query_c_string;
475 m_frag_j = (uint8_t*)query_c_string;
476 m_frag_k = (uint8_t*)query_c_string;
477 m_depth = 0;
478 m_array_index.status.repr = "#main: not an array index query fragment";
479 m_array_index.value = 0;
Nigel Tao983a74f2020-07-27 15:17:46 +1000480 m_array_index_remaining = 0;
Nigel Tao0cd2f982020-03-03 23:03:02 +1100481 }
482
Nigel Taob48ee752020-03-13 09:27:33 +1100483 void restart_fragment(bool enable) { m_frag_j = enable ? m_frag_i : nullptr; }
Nigel Tao0cd2f982020-03-03 23:03:02 +1100484
Nigel Taob48ee752020-03-13 09:27:33 +1100485 bool is_at(uint32_t depth) { return m_depth == depth; }
Nigel Tao0cd2f982020-03-03 23:03:02 +1100486
487 // tick returns whether the fragment is a valid array index whose value is
488 // zero. If valid but non-zero, it decrements it and returns false.
489 bool tick() {
Nigel Taob48ee752020-03-13 09:27:33 +1100490 if (m_array_index.status.is_ok()) {
Nigel Tao983a74f2020-07-27 15:17:46 +1000491 if (m_array_index_remaining == 0) {
Nigel Tao0cd2f982020-03-03 23:03:02 +1100492 return true;
493 }
Nigel Tao983a74f2020-07-27 15:17:46 +1000494 m_array_index_remaining--;
Nigel Tao0cd2f982020-03-03 23:03:02 +1100495 }
496 return false;
497 }
498
499 // next_fragment moves to the next fragment, returning whether it existed.
500 bool next_fragment() {
Nigel Taob48ee752020-03-13 09:27:33 +1100501 uint8_t* k = m_frag_k;
502 uint32_t d = m_depth;
Nigel Tao0cd2f982020-03-03 23:03:02 +1100503
504 this->reset(nullptr);
505
506 if (!k || (*k != '/')) {
507 return false;
508 }
509 k++;
510
511 bool all_digits = true;
512 uint8_t* i = k;
513 while ((*k != '\x00') && (*k != '/')) {
514 all_digits = all_digits && ('0' <= *k) && (*k <= '9');
515 k++;
516 }
Nigel Taob48ee752020-03-13 09:27:33 +1100517 m_frag_i = i;
518 m_frag_j = i;
519 m_frag_k = k;
520 m_depth = d + 1;
Nigel Tao0cd2f982020-03-03 23:03:02 +1100521 if (all_digits) {
522 // wuffs_base__parse_number_u64 rejects leading zeroes, e.g. "00", "07".
Nigel Tao6b7ce302020-07-07 16:19:46 +1000523 m_array_index = wuffs_base__parse_number_u64(
524 wuffs_base__make_slice_u8(i, k - i),
525 WUFFS_BASE__PARSE_NUMBER_XXX__DEFAULT_OPTIONS);
Nigel Tao983a74f2020-07-27 15:17:46 +1000526 m_array_index_remaining = m_array_index.value;
Nigel Tao0cd2f982020-03-03 23:03:02 +1100527 }
528 return true;
529 }
530
Nigel Taob48ee752020-03-13 09:27:33 +1100531 bool matched_all() { return m_frag_k == nullptr; }
Nigel Tao52c4d6a2020-03-08 21:12:38 +1100532
Nigel Taob48ee752020-03-13 09:27:33 +1100533 bool matched_fragment() { return m_frag_j && (m_frag_j == m_frag_k); }
Nigel Tao0cd2f982020-03-03 23:03:02 +1100534
Nigel Tao983a74f2020-07-27 15:17:46 +1000535 void restart_and_match_unsigned_number(bool enable, uint64_t u) {
536 m_frag_j =
537 (enable && (m_array_index.status.is_ok()) && (m_array_index.value == u))
538 ? m_frag_k
539 : nullptr;
540 }

  void incremental_match_slice(uint8_t* ptr, size_t len) {
    if (!m_frag_j) {
      return;
    }
    uint8_t* j = m_frag_j;
    while (true) {
      if (len == 0) {
        m_frag_j = j;
        return;
      }

      if (*j == '\x00') {
        break;

      } else if (*j == '~') {
        j++;
        if (*j == '0') {
          if (*ptr != '~') {
            break;
          }
        } else if (*j == '1') {
          if (*ptr != '/') {
            break;
          }
        } else if (*j == 'n') {
          if (*ptr != '\n') {
            break;
          }
        } else if (*j == 'r') {
          if (*ptr != '\r') {
            break;
          }
        } else {
          break;
        }

      } else if (*j != *ptr) {
        break;
      }

      j++;
      ptr++;
      len--;
    }
    m_frag_j = nullptr;
  }

  void incremental_match_code_point(uint32_t code_point) {
    if (!m_frag_j) {
      return;
    }
    uint8_t u[WUFFS_BASE__UTF_8__BYTE_LENGTH__MAX_INCL];
    size_t n = wuffs_base__utf_8__encode(
        wuffs_base__make_slice_u8(&u[0],
                                  WUFFS_BASE__UTF_8__BYTE_LENGTH__MAX_INCL),
        code_point);
    if (n > 0) {
      this->incremental_match_slice(&u[0], n);
    }
  }
601 }
602
  // validate returns whether the (query_c_string, length) arguments form a
  // valid JSON Pointer. In particular, it must be valid UTF-8, and either be
  // empty or start with a '/'. Any '~' within must immediately be followed by
  // either '0' or '1'. If strict_json_pointer_syntax is false, a '~' may also
  // be followed by either 'n' or 'r'.
  static bool validate(char* query_c_string,
                       size_t length,
                       bool strict_json_pointer_syntax) {
    if (length <= 0) {
      return true;
    }
    if (query_c_string[0] != '/') {
      return false;
    }
    wuffs_base__slice_u8 s =
        wuffs_base__make_slice_u8((uint8_t*)query_c_string, length);
    bool previous_was_tilde = false;
    while (s.len > 0) {
      wuffs_base__utf_8__next__output o = wuffs_base__utf_8__next(s.ptr, s.len);
      if (!o.is_valid()) {
        return false;
      }

      if (previous_was_tilde) {
        switch (o.code_point) {
          case '0':
          case '1':
            break;
          case 'n':
          case 'r':
            if (strict_json_pointer_syntax) {
              return false;
            }
            break;
          default:
            return false;
        }
      }
      previous_was_tilde = o.code_point == '~';

      s.ptr += o.byte_length;
      s.len -= o.byte_length;
    }
    return !previous_was_tilde;
  }
} g_query;

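/*
Illustrative sketch (not part of jsonptr): the '~' rules that Query::validate
checks can be tried out in isolation. The name is_valid_pointer_sketch is
hypothetical, and for brevity this assumes ASCII input, skipping the UTF-8
validation that the real code performs via wuffs_base__utf_8__next.

```cpp
#include <string>

// Returns whether s is empty or starts with '/', and every '~' is followed
// by '0' or '1' (or, when strict is false, also by 'n' or 'r').
static bool is_valid_pointer_sketch(const std::string& s, bool strict) {
  if (s.empty()) {
    return true;
  }
  if (s[0] != '/') {
    return false;
  }
  bool previous_was_tilde = false;
  for (char c : s) {
    if (previous_was_tilde) {
      bool ok = (c == '0') || (c == '1') ||
                (!strict && ((c == 'n') || (c == 'r')));
      if (!ok) {
        return false;
      }
    }
    previous_was_tilde = (c == '~');
  }
  // A trailing '~' is an incomplete escape.
  return !previous_was_tilde;
}
```

Under these assumptions, "" and "/a~1b" are accepted, "a/b" and "/a~" are
rejected, and "/a~n" is accepted only when strict is false.
*/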
// ----

enum class file_format {
  json,
  cbor,
};

struct {
  int remaining_argc;
  char** remaining_argv;

  bool compact_output;
  bool fail_if_unsandboxed;
  file_format input_format;
  bool input_allow_json_comments;
  bool input_allow_json_extra_comma;
  bool input_allow_json_inf_nan_numbers;
  uint32_t max_output_depth;
  file_format output_format;
  bool output_cbor_metadata_as_json_comments;
  bool output_json_extra_comma;
  bool output_json_inf_nan_numbers;
  char* query_c_string;
  size_t spaces;
  bool strict_json_pointer_syntax;
  bool tabs;
} g_flags = {0};

const char*  //
parse_flags(int argc, char** argv) {
  g_flags.spaces = 4;
  g_flags.max_output_depth = 0xFFFFFFFF;

  int c = (argc > 0) ? 1 : 0;  // Skip argv[0], the program name.
  for (; c < argc; c++) {
    char* arg = argv[c];
    if (*arg++ != '-') {
      break;
    }

    // A double-dash "--foo" is equivalent to a single-dash "-foo". As special
    // cases, a bare "-" is not a flag (some programs may interpret it as
    // stdin) and a bare "--" means to stop parsing flags.
    if (*arg == '\x00') {
      break;
    } else if (*arg == '-') {
      arg++;
      if (*arg == '\x00') {
        c++;
        break;
      }
    }

    if (!strcmp(arg, "c") || !strcmp(arg, "compact-output")) {
      g_flags.compact_output = true;
      continue;
    }
    if (!strcmp(arg, "d") || !strcmp(arg, "max-output-depth")) {
      g_flags.max_output_depth = 1;
      continue;
    } else if (!strncmp(arg, "d=", 2) ||
               !strncmp(arg, "max-output-depth=", 17)) {
      // The length 17 includes the '=' so that the scan below always finds
      // one before reaching arg's NUL terminator.
      while (*arg++ != '=') {
      }
      wuffs_base__result_u64 u = wuffs_base__parse_number_u64(
          wuffs_base__make_slice_u8((uint8_t*)arg, strlen(arg)),
          WUFFS_BASE__PARSE_NUMBER_XXX__DEFAULT_OPTIONS);
      if (u.status.is_ok() && (u.value <= 0xFFFFFFFF)) {
        g_flags.max_output_depth = (uint32_t)(u.value);
        continue;
      }
      return g_usage;
    }
    if (!strcmp(arg, "fail-if-unsandboxed")) {
      g_flags.fail_if_unsandboxed = true;
      continue;
    }
    if (!strcmp(arg, "i=cbor") || !strcmp(arg, "input-format=cbor")) {
      g_flags.input_format = file_format::cbor;
      continue;
    }
    if (!strcmp(arg, "i=json") || !strcmp(arg, "input-format=json")) {
      g_flags.input_format = file_format::json;
      continue;
    }
    if (!strcmp(arg, "input-allow-json-comments")) {
      g_flags.input_allow_json_comments = true;
      continue;
    }
    if (!strcmp(arg, "input-allow-json-extra-comma")) {
      g_flags.input_allow_json_extra_comma = true;
      continue;
    }
    if (!strcmp(arg, "input-allow-json-inf-nan-numbers")) {
      g_flags.input_allow_json_inf_nan_numbers = true;
      continue;
    }
    if (!strcmp(arg, "o=cbor") || !strcmp(arg, "output-format=cbor")) {
      g_flags.output_format = file_format::cbor;
      continue;
    }
    if (!strcmp(arg, "o=json") || !strcmp(arg, "output-format=json")) {
      g_flags.output_format = file_format::json;
      continue;
    }
    if (!strcmp(arg, "output-cbor-metadata-as-json-comments")) {
      g_flags.output_cbor_metadata_as_json_comments = true;
      continue;
    }
    if (!strcmp(arg, "output-json-extra-comma")) {
      g_flags.output_json_extra_comma = true;
      continue;
    }
    if (!strcmp(arg, "output-json-inf-nan-numbers")) {
      g_flags.output_json_inf_nan_numbers = true;
      continue;
    }
    if (!strncmp(arg, "q=", 2) || !strncmp(arg, "query=", 6)) {
      while (*arg++ != '=') {
      }
      g_flags.query_c_string = arg;
      continue;
    }
    if (!strncmp(arg, "s=", 2) || !strncmp(arg, "spaces=", 7)) {
      while (*arg++ != '=') {
      }
      if (('0' <= arg[0]) && (arg[0] <= '8') && (arg[1] == '\x00')) {
        g_flags.spaces = arg[0] - '0';
        continue;
      }
      return g_usage;
    }
    if (!strcmp(arg, "strict-json-pointer-syntax")) {
      g_flags.strict_json_pointer_syntax = true;
      continue;
    }
    if (!strcmp(arg, "t") || !strcmp(arg, "tabs")) {
      g_flags.tabs = true;
      continue;
    }

    return g_usage;
  }

  if (g_flags.query_c_string &&
      !Query::validate(g_flags.query_c_string, strlen(g_flags.query_c_string),
                       g_flags.strict_json_pointer_syntax)) {
    return "main: bad JSON Pointer (RFC 6901) syntax for the -query=STR flag";
  }

  g_flags.remaining_argc = argc - c;
  g_flags.remaining_argv = argv + c;
  return nullptr;
}

const char*  //
initialize_globals(int argc, char** argv) {
  g_dst = wuffs_base__make_io_buffer(
      wuffs_base__make_slice_u8(g_dst_array, DST_BUFFER_ARRAY_SIZE),
      wuffs_base__empty_io_buffer_meta());

  g_src = wuffs_base__make_io_buffer(
      wuffs_base__make_slice_u8(g_src_array, SRC_BUFFER_ARRAY_SIZE),
      wuffs_base__empty_io_buffer_meta());

  g_tok = wuffs_base__make_token_buffer(
      wuffs_base__make_slice_token(g_tok_array, TOKEN_BUFFER_ARRAY_SIZE),
      wuffs_base__empty_token_buffer_meta());

  g_curr_token_end_src_index = 0;

  g_token_extension.category = 0;
  g_token_extension.detail = 0;

  g_previous_token_was_cbor_tag = false;

  g_depth = 0;

  g_ctx = context::none;

  TRY(parse_flags(argc, argv));
  if (g_flags.fail_if_unsandboxed && !g_sandboxed) {
    return "main: unsandboxed";
  }
  const int stdin_fd = 0;
  if (g_flags.remaining_argc >
      ((g_input_file_descriptor != stdin_fd) ? 1 : 0)) {
    return g_usage;
  }

  g_query.reset(g_flags.query_c_string);

  // If the query is non-empty, suppress writing to stdout until we've
  // completed the query.
  g_suppress_write_dst = g_query.next_fragment() ? 1 : 0;
  g_wrote_to_dst = false;

  if (g_flags.input_format == file_format::json) {
    TRY(g_json_decoder
            .initialize(sizeof__wuffs_json__decoder(), WUFFS_VERSION, 0)
            .message());
    g_dec = g_json_decoder.upcast_as__wuffs_base__token_decoder();
  } else {
    TRY(g_cbor_decoder
            .initialize(sizeof__wuffs_cbor__decoder(), WUFFS_VERSION, 0)
            .message());
    g_dec = g_cbor_decoder.upcast_as__wuffs_base__token_decoder();
  }

  if (g_flags.input_allow_json_comments) {
    g_dec->set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_COMMENT_BLOCK, true);
    g_dec->set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_COMMENT_LINE, true);
  }
  if (g_flags.input_allow_json_extra_comma) {
    g_dec->set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_EXTRA_COMMA, true);
  }
  if (g_flags.input_allow_json_inf_nan_numbers) {
    g_dec->set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_INF_NAN_NUMBERS, true);
  }

  // Consume an optional whitespace trailer. This isn't part of the JSON spec,
  // but it works better with line oriented Unix tools (such as "echo 123 |
  // jsonptr" where it's "echo", not "echo -n") or hand-edited JSON files which
  // can accidentally contain trailing whitespace.
  g_dec->set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_TRAILING_NEW_LINE, true);

  return nullptr;
}

// ----

// ignore_return_value suppresses errors from -Wall -Werror.
static void  //
ignore_return_value(int ignored) {}

const char*  //
read_src() {
  if (g_src.meta.closed) {
    return "main: internal error: read requested on a closed source";
  }
  g_src.compact();
  if (g_src.meta.wi >= g_src.data.len) {
    return "main: g_src buffer is full";
  }
  while (true) {
    ssize_t n = read(g_input_file_descriptor, g_src.writer_pointer(),
                     g_src.writer_length());
    if (n >= 0) {
      g_src.meta.wi += n;
      g_src.meta.closed = n == 0;
      break;
    } else if (errno != EINTR) {
      return strerror(errno);
    }
  }
  return nullptr;
}

const char*  //
flush_dst() {
  while (true) {
    size_t n = g_dst.reader_length();
    if (n == 0) {
      break;
    }
    const int stdout_fd = 1;
    ssize_t i = write(stdout_fd, g_dst.reader_pointer(), n);
    if (i >= 0) {
      g_dst.meta.ri += i;
    } else if (errno != EINTR) {
      return strerror(errno);
    }
  }
  g_dst.compact();
  return nullptr;
}

const char*  //
write_dst(const void* s, size_t n) {
  if (g_suppress_write_dst > 0) {
    return nullptr;
  }
  const uint8_t* p = static_cast<const uint8_t*>(s);
  while (n > 0) {
    size_t i = g_dst.writer_length();
    if (i == 0) {
      const char* z = flush_dst();
      if (z) {
        return z;
      }
      i = g_dst.writer_length();
      if (i == 0) {
        return "main: g_dst buffer is full";
      }
    }

    if (i > n) {
      i = n;
    }
    memcpy(g_dst.data.ptr + g_dst.meta.wi, p, i);
    g_dst.meta.wi += i;
    p += i;
    n -= i;
    g_wrote_to_dst = true;
  }
  return nullptr;
}

// ----

const char*  //
write_literal(uint64_t vbd) {
  const char* ptr = nullptr;
  size_t len = 0;
  if (vbd & WUFFS_BASE__TOKEN__VBD__LITERAL__UNDEFINED) {
    if (g_flags.output_format == file_format::json) {
      // JSON's closest approximation to "undefined" is "null".
      if (g_flags.output_cbor_metadata_as_json_comments) {
        ptr = "/*cbor:undefined*/null";
        len = 22;
      } else {
        ptr = "null";
        len = 4;
      }
    } else {
      ptr = "\xF7";
      len = 1;
    }
  } else if (vbd & WUFFS_BASE__TOKEN__VBD__LITERAL__NULL) {
    if (g_flags.output_format == file_format::json) {
      ptr = "null";
      len = 4;
    } else {
      ptr = "\xF6";
      len = 1;
    }
  } else if (vbd & WUFFS_BASE__TOKEN__VBD__LITERAL__FALSE) {
    if (g_flags.output_format == file_format::json) {
      ptr = "false";
      len = 5;
    } else {
      ptr = "\xF4";
      len = 1;
    }
  } else if (vbd & WUFFS_BASE__TOKEN__VBD__LITERAL__TRUE) {
    if (g_flags.output_format == file_format::json) {
      ptr = "true";
      len = 4;
    } else {
      ptr = "\xF5";
      len = 1;
    }
  } else {
    return "main: internal error: unexpected write_literal argument";
  }
  return write_dst(ptr, len);
}

// ----

const char*  //
write_number_as_cbor_f64(double f) {
  uint8_t buf[9];
  wuffs_base__lossy_value_u16 lv16 =
      wuffs_base__ieee_754_bit_representation__from_f64_to_u16_truncate(f);
  if (!lv16.lossy) {
    buf[0] = 0xF9;
    wuffs_base__store_u16be__no_bounds_check(&buf[1], lv16.value);
    return write_dst(&buf[0], 3);
  }
  wuffs_base__lossy_value_u32 lv32 =
      wuffs_base__ieee_754_bit_representation__from_f64_to_u32_truncate(f);
  if (!lv32.lossy) {
    buf[0] = 0xFA;
    wuffs_base__store_u32be__no_bounds_check(&buf[1], lv32.value);
    return write_dst(&buf[0], 5);
  }
  buf[0] = 0xFB;
  wuffs_base__store_u64be__no_bounds_check(
      &buf[1], wuffs_base__ieee_754_bit_representation__from_f64_to_u64(f));
  return write_dst(&buf[0], 9);
}

const char*  //
write_number_as_cbor_u64(uint8_t base, uint64_t u) {
  uint8_t buf[9];
  if (u < 0x18) {
    buf[0] = base | ((uint8_t)u);
    return write_dst(&buf[0], 1);
  } else if ((u >> 8) == 0) {
    buf[0] = base | 0x18;
    buf[1] = ((uint8_t)u);
    return write_dst(&buf[0], 2);
  } else if ((u >> 16) == 0) {
    buf[0] = base | 0x19;
    wuffs_base__store_u16be__no_bounds_check(&buf[1], ((uint16_t)u));
    return write_dst(&buf[0], 3);
  } else if ((u >> 32) == 0) {
    buf[0] = base | 0x1A;
    wuffs_base__store_u32be__no_bounds_check(&buf[1], ((uint32_t)u));
    return write_dst(&buf[0], 5);
  }
  buf[0] = base | 0x1B;
  wuffs_base__store_u64be__no_bounds_check(&buf[1], u);
  return write_dst(&buf[0], 9);
}

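/*
Illustrative sketch (not part of jsonptr): the shortest-form integer encoding
used by write_number_as_cbor_u64 can be exercised on its own by collecting the
bytes in a std::vector instead of calling write_dst. The name cbor_head_sketch
is hypothetical and no Wuffs helpers are used here.

```cpp
#include <cstdint>
#include <vector>

// Encodes a CBOR head. base carries the major type in its high 3 bits (0x00
// for unsigned integers, 0x20 for negative integers) and u is the argument,
// stored big-endian in the shortest of 0, 1, 2, 4 or 8 following bytes.
static std::vector<uint8_t> cbor_head_sketch(uint8_t base, uint64_t u) {
  std::vector<uint8_t> out;
  int num_bytes = 0;
  if (u < 0x18) {
    // Small arguments fit in the low 5 bits of the initial byte.
    out.push_back(static_cast<uint8_t>(base | u));
    return out;
  } else if ((u >> 8) == 0) {
    out.push_back(static_cast<uint8_t>(base | 0x18));
    num_bytes = 1;
  } else if ((u >> 16) == 0) {
    out.push_back(static_cast<uint8_t>(base | 0x19));
    num_bytes = 2;
  } else if ((u >> 32) == 0) {
    out.push_back(static_cast<uint8_t>(base | 0x1A));
    num_bytes = 4;
  } else {
    out.push_back(static_cast<uint8_t>(base | 0x1B));
    num_bytes = 8;
  }
  for (int i = num_bytes - 1; i >= 0; i--) {
    out.push_back(static_cast<uint8_t>(u >> (8 * i)));
  }
  return out;
}
```

With base 0x00, the value 1000 encodes as 0x19 0x03 0xE8. A negative integer
-1-x uses base 0x20 with argument x, so -10 (argument 9) encodes as 0x29, which
is why write_number passes ~ri.value for negative inputs.
*/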
const char*  //
write_number_as_json_f64(wuffs_base__slice_u8 s) {
  double f;
  switch (s.len) {
    case 3:
      f = wuffs_base__ieee_754_bit_representation__from_u16_to_f64(
          wuffs_base__load_u16be__no_bounds_check(s.ptr + 1));
      break;
    case 5:
      f = wuffs_base__ieee_754_bit_representation__from_u32_to_f64(
          wuffs_base__load_u32be__no_bounds_check(s.ptr + 1));
      break;
    case 9:
      f = wuffs_base__ieee_754_bit_representation__from_u64_to_f64(
          wuffs_base__load_u64be__no_bounds_check(s.ptr + 1));
      break;
    default:
      return "main: internal error: unexpected write_number_as_json_f64 len";
  }
  uint8_t buf[512];
  const uint32_t precision = 0;
  size_t n = wuffs_base__render_number_f64(
      wuffs_base__make_slice_u8(&buf[0], sizeof buf), f, precision,
      WUFFS_BASE__RENDER_NUMBER_FXX__JUST_ENOUGH_PRECISION);

  if (!g_flags.output_json_inf_nan_numbers) {
    // JSON numbers don't include Infinities or NaNs. For such numbers, their
    // IEEE 754 bit representation's 11 exponent bits are all on.
    uint64_t u = wuffs_base__ieee_754_bit_representation__from_f64_to_u64(f);
    if (((u >> 52) & 0x7FF) == 0x7FF) {
      if (g_flags.output_cbor_metadata_as_json_comments) {
        TRY(write_dst("/*cbor:", 7));
        TRY(write_dst(&buf[0], n));
        TRY(write_dst("*/", 2));
      }
      return write_dst("null", 4);
    }
  }

  return write_dst(&buf[0], n);
}

const char*  //
write_cbor_minus_1_minus_x(wuffs_base__slice_u8 s) {
  if (g_flags.output_format == file_format::cbor) {
    return write_dst(s.ptr, s.len);
  }

  if (s.len != 9) {
    return "main: internal error: invalid ETC__MINUS_1_MINUS_X token length";
  }
  uint64_t u = 1 + wuffs_base__load_u64be__no_bounds_check(s.ptr + 1);
  if (u == 0) {
    // See the cbor.TOKEN_VALUE_MINOR__MINUS_1_MINUS_X comment re overflow.
    return write_dst("-18446744073709551616", 21);
  }
  uint8_t buf[1 + WUFFS_BASE__U64__BYTE_LENGTH__MAX_INCL];
  uint8_t* b = &buf[0];
  *b++ = '-';
  size_t n = wuffs_base__render_number_u64(
      wuffs_base__make_slice_u8(b, WUFFS_BASE__U64__BYTE_LENGTH__MAX_INCL), u,
      WUFFS_BASE__RENDER_NUMBER_XXX__DEFAULT_OPTIONS);
  return write_dst(&buf[0], 1 + n);
}

const char*  //
write_cbor_simple_value(uint64_t tag, wuffs_base__slice_u8 s) {
  if (g_flags.output_format == file_format::cbor) {
    return write_dst(s.ptr, s.len);
  }

  if (!g_flags.output_cbor_metadata_as_json_comments) {
    return write_dst("null", 4);
  }
  uint8_t buf[WUFFS_BASE__U64__BYTE_LENGTH__MAX_INCL];
  size_t n = wuffs_base__render_number_u64(
      wuffs_base__make_slice_u8(&buf[0],
                                WUFFS_BASE__U64__BYTE_LENGTH__MAX_INCL),
      tag, WUFFS_BASE__RENDER_NUMBER_XXX__DEFAULT_OPTIONS);
  TRY(write_dst("/*cbor:simple", 13));
  TRY(write_dst(&buf[0], n));
  return write_dst("*/null", 6);
}

const char*  //
write_cbor_tag(uint64_t tag, wuffs_base__slice_u8 s) {
  if (g_flags.output_format == file_format::cbor) {
    return write_dst(s.ptr, s.len);
  }

  if (!g_flags.output_cbor_metadata_as_json_comments) {
    return nullptr;
  }
  uint8_t buf[WUFFS_BASE__U64__BYTE_LENGTH__MAX_INCL];
  size_t n = wuffs_base__render_number_u64(
      wuffs_base__make_slice_u8(&buf[0],
                                WUFFS_BASE__U64__BYTE_LENGTH__MAX_INCL),
      tag, WUFFS_BASE__RENDER_NUMBER_XXX__DEFAULT_OPTIONS);
  TRY(write_dst("/*cbor:tag", 10));
  TRY(write_dst(&buf[0], n));
  return write_dst("*/", 2);
}

const char*  //
write_number(uint64_t vbd, wuffs_base__slice_u8 s) {
  const uint64_t cfp_fbbe_fifb =
      WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_FLOATING_POINT |
      WUFFS_BASE__TOKEN__VBD__NUMBER__FORMAT_BINARY_BIG_ENDIAN |
      WUFFS_BASE__TOKEN__VBD__NUMBER__FORMAT_IGNORE_FIRST_BYTE;

  if (g_flags.output_format == file_format::json) {
    if (g_flags.input_format == file_format::json) {
      return write_dst(s.ptr, s.len);
    } else if ((vbd & cfp_fbbe_fifb) == cfp_fbbe_fifb) {
      return write_number_as_json_f64(s);
    }

    // From here on, (g_flags.output_format == file_format::cbor).
  } else if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__FORMAT_TEXT) {
    // First try to parse s as an integer. Something like
    // "1180591620717411303424" is a valid number (in the JSON sense) but will
    // overflow int64_t or uint64_t, so fall back to parsing it as a float64.
    if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_INTEGER_SIGNED) {
      if ((s.len > 0) && (s.ptr[0] == '-')) {
        wuffs_base__result_i64 ri = wuffs_base__parse_number_i64(
            s, WUFFS_BASE__PARSE_NUMBER_XXX__DEFAULT_OPTIONS);
        if (ri.status.is_ok()) {
          return write_number_as_cbor_u64(0x20, ~ri.value);
        }
      } else {
        wuffs_base__result_u64 ru = wuffs_base__parse_number_u64(
            s, WUFFS_BASE__PARSE_NUMBER_XXX__DEFAULT_OPTIONS);
        if (ru.status.is_ok()) {
          return write_number_as_cbor_u64(0x00, ru.value);
        }
      }
    }

    if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_FLOATING_POINT) {
      wuffs_base__result_f64 rf = wuffs_base__parse_number_f64(
          s, WUFFS_BASE__PARSE_NUMBER_XXX__DEFAULT_OPTIONS);
      if (rf.status.is_ok()) {
        return write_number_as_cbor_f64(rf.value);
      }
    }
  } else if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_NEG_INF) {
    return write_dst("\xF9\xFC\x00", 3);
  } else if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_POS_INF) {
    return write_dst("\xF9\x7C\x00", 3);
  } else if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_NEG_NAN) {
    return write_dst("\xF9\xFF\xFF", 3);
  } else if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_POS_NAN) {
    return write_dst("\xF9\x7F\xFF", 3);
  } else if ((vbd & cfp_fbbe_fifb) == cfp_fbbe_fifb) {
    return write_dst(s.ptr, s.len);
  }

fail:
  return "main: internal error: unexpected write_number argument";
}

const char*  //
write_inline_integer(uint64_t x, bool x_is_signed, wuffs_base__slice_u8 s) {
  bool is_key = in_dict_before_key();
  g_query.restart_and_match_unsigned_number(
      is_key && g_query.is_at(g_depth) && !x_is_signed, x);

  if (g_flags.output_format == file_format::cbor) {
    return write_dst(s.ptr, s.len);
  }

  if (is_key) {
    TRY(write_dst("\"", 1));
  }

  // Adding the two ETC__BYTE_LENGTH__ETC constants is overkill, but it's
  // simpler (for producing a constant-expression array size) than taking the
  // maximum of the two.
  uint8_t buf[WUFFS_BASE__I64__BYTE_LENGTH__MAX_INCL +
              WUFFS_BASE__U64__BYTE_LENGTH__MAX_INCL];
  wuffs_base__slice_u8 dst = wuffs_base__make_slice_u8(&buf[0], sizeof buf);
  size_t n =
      x_is_signed
          ? wuffs_base__render_number_i64(
                dst, (int64_t)x, WUFFS_BASE__RENDER_NUMBER_XXX__DEFAULT_OPTIONS)
          : wuffs_base__render_number_u64(
                dst, x, WUFFS_BASE__RENDER_NUMBER_XXX__DEFAULT_OPTIONS);
  TRY(write_dst(&buf[0], n));

  if (is_key) {
    TRY(write_dst("\"", 1));
  }
  return nullptr;
}

// ----

uint8_t  //
hex_digit(uint8_t nibble) {
  nibble &= 0x0F;
  if (nibble <= 9) {
    return '0' + nibble;
  }
  return ('A' - 10) + nibble;
}

const char*  //
flush_cbor_output_string() {
  uint8_t prefix[3];
  prefix[0] = g_cbor_output_string_is_utf_8 ? 0x60 : 0x40;
  if (g_cbor_output_string_length < 0x18) {
    prefix[0] |= g_cbor_output_string_length;
    TRY(write_dst(&prefix[0], 1));
  } else if (g_cbor_output_string_length <= 0xFF) {
    prefix[0] |= 0x18;
    prefix[1] = g_cbor_output_string_length;
    TRY(write_dst(&prefix[0], 2));
  } else if (g_cbor_output_string_length <= 0xFFFF) {
    prefix[0] |= 0x19;
    prefix[1] = g_cbor_output_string_length >> 8;
    prefix[2] = g_cbor_output_string_length;
    TRY(write_dst(&prefix[0], 3));
  } else {
    return "main: internal error: CBOR string output is too long";
  }

  size_t n = g_cbor_output_string_length;
  g_cbor_output_string_length = 0;
  return write_dst(&g_spool_array[0], n);
}

1288const char* //
Nigel Taoee6927f2020-07-27 12:08:33 +10001289write_cbor_output_string(wuffs_base__slice_u8 s, bool finish) {
Nigel Taoea532452020-07-27 00:03:00 +10001290 // Check that g_spool_array can hold any UTF-8 code point.
1291 if (SPOOL_ARRAY_SIZE < 4) {
1292 return "main: internal error: SPOOL_ARRAY_SIZE is too short";
Nigel Tao168f60a2020-07-14 13:19:33 +10001293 }
1294
Nigel Taoee6927f2020-07-27 12:08:33 +10001295 uint8_t* ptr = s.ptr;
1296 size_t len = s.len;
Nigel Tao168f60a2020-07-14 13:19:33 +10001297 while (len > 0) {
Nigel Taoea532452020-07-27 00:03:00 +10001298 size_t available = SPOOL_ARRAY_SIZE - g_cbor_output_string_length;
Nigel Tao168f60a2020-07-14 13:19:33 +10001299 if (available >= len) {
Nigel Taoea532452020-07-27 00:03:00 +10001300 memcpy(&g_spool_array[g_cbor_output_string_length], ptr, len);
Nigel Tao168f60a2020-07-14 13:19:33 +10001301 g_cbor_output_string_length += len;
1302 ptr += len;
1303 len = 0;
1304 break;
1305
1306 } else if (available > 0) {
1307 if (!g_cbor_output_string_is_multiple_chunks) {
1308 g_cbor_output_string_is_multiple_chunks = true;
1309 TRY(write_dst(g_cbor_output_string_is_utf_8 ? "\x7F" : "\x5F", 1));
Nigel Tao3b486982020-02-27 15:05:59 +11001310 }
Nigel Tao168f60a2020-07-14 13:19:33 +10001311
1312 if (g_cbor_output_string_is_utf_8) {
1313 // Walk the end backwards to a UTF-8 boundary, so that each chunk of
1314 // the multi-chunk string is also valid UTF-8.
1315 while (available > 0) {
Nigel Tao702c7b22020-07-22 15:42:54 +10001316 wuffs_base__utf_8__next__output o =
1317 wuffs_base__utf_8__next_from_end(ptr, available);
Nigel Tao168f60a2020-07-14 13:19:33 +10001318 if ((o.code_point != WUFFS_BASE__UNICODE_REPLACEMENT_CHARACTER) ||
1319 (o.byte_length != 1)) {
1320 break;
1321 }
1322 available--;
1323 }
1324 }
1325
Nigel Taoea532452020-07-27 00:03:00 +10001326 memcpy(&g_spool_array[g_cbor_output_string_length], ptr, available);
Nigel Tao168f60a2020-07-14 13:19:33 +10001327 g_cbor_output_string_length += available;
1328 ptr += available;
1329 len -= available;
Nigel Tao3b486982020-02-27 15:05:59 +11001330 }
1331
Nigel Tao168f60a2020-07-14 13:19:33 +10001332 TRY(flush_cbor_output_string());
1333 }
Nigel Taob9ad34f2020-03-03 12:44:01 +11001334
Nigel Tao168f60a2020-07-14 13:19:33 +10001335 if (finish) {
1336 TRY(flush_cbor_output_string());
1337 if (g_cbor_output_string_is_multiple_chunks) {
1338 TRY(write_dst("\xFF", 1));
1339 }
1340 }
1341 return nullptr;
1342}
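/*
The "walk the end backwards to a UTF-8 boundary" step in
write_cbor_output_string can be sketched without Wuffs. This is a minimal
illustration only: it swaps in a hand-rolled lead-byte check instead of
wuffs_base__utf_8__next_from_end, the utf_8_boundary name is hypothetical,
and it assumes that the input is a prefix of valid UTF-8.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of jsonptr): returns the largest m <= n such
// that the first m bytes of ptr do not end partway through a multi-byte UTF-8
// sequence. It assumes that the first n bytes are a prefix of valid UTF-8.
inline size_t  //
utf_8_boundary(const uint8_t* ptr, size_t n) {
  // Step back over trailing continuation bytes (0b10xxxxxx).
  size_t i = n;
  while ((i > 0) && ((ptr[i - 1] & 0xC0) == 0x80)) {
    i--;
  }
  if (i == 0) {
    return 0;
  }
  // ptr[i - 1] is the lead byte of the last (possibly incomplete) sequence.
  uint8_t lead = ptr[i - 1];
  size_t want = (lead < 0x80) ? 1 : (lead < 0xE0) ? 2 : (lead < 0xF0) ? 3 : 4;
  size_t have = n - (i - 1);
  // Keep the last sequence only if all of its bytes are present.
  return (have >= want) ? n : (i - 1);
}
```

Splitting a chunk at the returned offset keeps each chunk valid UTF-8 on its
own, which is what the multi-chunk CBOR text string output relies on.
*/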

const char*  //
flush_json_output_byte_string(bool finish) {
  uint8_t* ptr = &g_spool_array[0];
  size_t len = g_json_output_byte_string_length;
  while (len > 0) {
    wuffs_base__transform__output o = wuffs_base__base_64__encode(
        g_dst.writer_slice(), wuffs_base__make_slice_u8(ptr, len), finish,
        WUFFS_BASE__BASE_64__URL_ALPHABET);
    g_dst.meta.wi += o.num_dst;
    ptr += o.num_src;
    len -= o.num_src;
    if (o.status.repr == nullptr) {
      if (len != 0) {
        return "main: internal error: inconsistent spool length";
      }
      g_json_output_byte_string_length = 0;
      break;
    } else if (o.status.repr == wuffs_base__suspension__short_read) {
      memmove(&g_spool_array[0], ptr, len);
      g_json_output_byte_string_length = len;
      break;
    } else if (o.status.repr != wuffs_base__suspension__short_write) {
      return o.status.message();
    }
    TRY(flush_dst());
  }
  return nullptr;
}

const char*  //
write_json_output_byte_string(wuffs_base__slice_u8 s, bool finish) {
  // Neither this function nor flush_json_output_byte_string calls write_dst.
  // Instead, they use wuffs_base__base_64__encode to write directly to g_dst.
  // This function is therefore responsible for checking g_suppress_write_dst.
  if (g_suppress_write_dst > 0) {
    return nullptr;
  }

  uint8_t* ptr = s.ptr;
  size_t len = s.len;
  while (len > 0) {
    size_t available = SPOOL_ARRAY_SIZE - g_json_output_byte_string_length;
    if (available >= len) {
      memcpy(&g_spool_array[g_json_output_byte_string_length], ptr, len);
      g_json_output_byte_string_length += len;
      ptr += len;
      len = 0;
      break;

    } else if (available > 0) {
      memcpy(&g_spool_array[g_json_output_byte_string_length], ptr, available);
      g_json_output_byte_string_length += available;
      ptr += available;
      len -= available;
    }

    TRY(flush_json_output_byte_string(false));
  }

  if (finish) {
    TRY(flush_json_output_byte_string(true));
  }
  return nullptr;
}

// ----

const char*  //
handle_unicode_code_point(uint32_t ucp) {
  if (g_flags.output_format == file_format::json) {
    if (ucp < 0x0020) {
      switch (ucp) {
        case '\b':
          return write_dst("\\b", 2);
        case '\f':
          return write_dst("\\f", 2);
        case '\n':
          return write_dst("\\n", 2);
        case '\r':
          return write_dst("\\r", 2);
        case '\t':
          return write_dst("\\t", 2);
      }

      // Other bytes less than 0x0020 are valid UTF-8 but not valid in a
      // JSON string. They need to remain escaped.
      uint8_t esc6[6];
      esc6[0] = '\\';
      esc6[1] = 'u';
      esc6[2] = '0';
      esc6[3] = '0';
      esc6[4] = hex_digit(ucp >> 4);
      esc6[5] = hex_digit(ucp >> 0);
      return write_dst(&esc6[0], 6);

    } else if (ucp == '\"') {
      return write_dst("\\\"", 2);

    } else if (ucp == '\\') {
      return write_dst("\\\\", 2);
    }
  }

  uint8_t u[WUFFS_BASE__UTF_8__BYTE_LENGTH__MAX_INCL];
  size_t n = wuffs_base__utf_8__encode(
      wuffs_base__make_slice_u8(&u[0],
                                WUFFS_BASE__UTF_8__BYTE_LENGTH__MAX_INCL),
      ucp);
  if (n == 0) {
    return "main: internal error: unexpected Unicode code point";
  }

  if (g_flags.output_format == file_format::json) {
    return write_dst(&u[0], n);
  }
  return write_cbor_output_string(wuffs_base__make_slice_u8(&u[0], n), false);
}
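/*
The JSON branch of handle_unicode_code_point can be sketched as a pure
function that returns the escape instead of writing it. This is a minimal
illustration, not part of jsonptr: the json_escape name is hypothetical and
it uses std::string, which the real program avoids because it does not
dynamically allocate memory.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not part of jsonptr): returns the JSON string escape
// for a code point that must be escaped (anything below 0x20, a double quote
// or a backslash), or an empty string if no escape is needed.
inline std::string  //
json_escape(uint32_t ucp) {
  switch (ucp) {
    case '\b':
      return "\\b";
    case '\f':
      return "\\f";
    case '\n':
      return "\\n";
    case '\r':
      return "\\r";
    case '\t':
      return "\\t";
    case '"':
      return "\\\"";
    case '\\':
      return "\\\\";
  }
  if (ucp < 0x20) {
    // Other control code points use the six-byte form: backslash, 'u', two
    // zeroes and two upper-case hexadecimal digits.
    static const char hex[] = "0123456789ABCDEF";
    std::string s = {'\\', 'u', '0', '0'};
    s += hex[(ucp >> 4) & 0x0F];
    s += hex[ucp & 0x0F];
    return s;
  }
  return "";
}
```

Under these assumptions, U+000A maps to the two-byte escape for a new line,
U+001F maps to a six-byte escape ending in "1F", and 'A' needs no escape.
*/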

const char*  //
write_json_output_text_string(wuffs_base__slice_u8 s) {
  uint8_t* ptr = s.ptr;
  size_t len = s.len;
restart:
  while (true) {
    size_t i;
    for (i = 0; i < len; i++) {
      uint8_t c = ptr[i];
      if ((c == '"') || (c == '\\') || (c < 0x20)) {
        TRY(write_dst(ptr, i));
        TRY(handle_unicode_code_point(c));
        ptr += i + 1;
        len -= i + 1;
        goto restart;
      }
    }
    TRY(write_dst(ptr, len));
    break;
  }
  return nullptr;
}

const char*  //
handle_string(uint64_t vbd,
              wuffs_base__slice_u8 s,
              bool start_of_token_chain,
              bool continued) {
  if (start_of_token_chain) {
    if (g_flags.output_format == file_format::json) {
      if (g_flags.output_cbor_metadata_as_json_comments &&
          !(vbd & WUFFS_BASE__TOKEN__VBD__STRING__CHAIN_MUST_BE_UTF_8)) {
        TRY(write_dst("/*cbor:base64url*/\"", 19));
        g_json_output_byte_string_length = 0;
      } else {
        TRY(write_dst("\"", 1));
      }
    } else {
      g_cbor_output_string_length = 0;
      g_cbor_output_string_is_multiple_chunks = false;
      g_cbor_output_string_is_utf_8 =
          vbd & WUFFS_BASE__TOKEN__VBD__STRING__CHAIN_MUST_BE_UTF_8;
    }
    g_query.restart_fragment(in_dict_before_key() && g_query.is_at(g_depth));
  }

  if (vbd & WUFFS_BASE__TOKEN__VBD__STRING__CONVERT_0_DST_1_SRC_DROP) {
    // No-op.
  } else if (vbd & WUFFS_BASE__TOKEN__VBD__STRING__CONVERT_1_DST_1_SRC_COPY) {
    if (g_flags.output_format == file_format::json) {
      if (g_flags.input_format == file_format::json) {
        TRY(write_dst(s.ptr, s.len));
      } else if (vbd & WUFFS_BASE__TOKEN__VBD__STRING__CHAIN_MUST_BE_UTF_8) {
        TRY(write_json_output_text_string(s));
      } else {
        TRY(write_json_output_byte_string(s, false));
      }
    } else {
      TRY(write_cbor_output_string(s, false));
    }
    g_query.incremental_match_slice(s.ptr, s.len);
  } else {
    return "main: internal error: unexpected string-token conversion";
  }

  if (continued) {
    return nullptr;
  }

  if (g_flags.output_format == file_format::json) {
    if (!(vbd & WUFFS_BASE__TOKEN__VBD__STRING__CHAIN_MUST_BE_UTF_8)) {
      TRY(write_json_output_byte_string(wuffs_base__empty_slice_u8(), true));
    }
    TRY(write_dst("\"", 1));
  } else {
    TRY(write_cbor_output_string(wuffs_base__empty_slice_u8(), true));
  }
  return nullptr;
}

// ----

const char*  //
handle_token(wuffs_base__token t, bool start_of_token_chain) {
  do {
    int64_t vbc = t.value_base_category();
    uint64_t vbd = t.value_base_detail();
    uint64_t token_length = t.length();
    wuffs_base__slice_u8 tok = wuffs_base__make_slice_u8(
        g_src.data.ptr + g_curr_token_end_src_index - token_length,
        token_length);

    // Handle ']' or '}'.
    if ((vbc == WUFFS_BASE__TOKEN__VBC__STRUCTURE) &&
        (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__POP)) {
      if (g_query.is_at(g_depth)) {
        return "main: no match for query";
      }
      if (g_depth <= 0) {
        return "main: internal error: inconsistent g_depth";
      }
      g_depth--;

      if (g_query.matched_all() && (g_depth >= g_flags.max_output_depth)) {
        g_suppress_write_dst--;
        // '…' is U+2026 HORIZONTAL ELLIPSIS, which is 3 UTF-8 bytes.
        if (g_flags.output_format == file_format::json) {
          TRY(write_dst((vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__FROM_LIST)
                            ? "\"[…]\""
                            : "\"{…}\"",
                        7));
        } else {
          TRY(write_dst((vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__FROM_LIST)
                            ? "\x65[…]"
                            : "\x65{…}",
                        6));
        }
      } else if (g_flags.output_format == file_format::json) {
        // Write preceding whitespace.
        if ((g_ctx != context::in_list_after_bracket) &&
            (g_ctx != context::in_dict_after_brace) &&
            !g_flags.compact_output) {
          if (g_flags.output_json_extra_comma) {
            TRY(write_dst(",\n", 2));
          } else {
            TRY(write_dst("\n", 1));
          }
          for (uint32_t i = 0; i < g_depth; i++) {
            TRY(write_dst(
                g_flags.tabs ? INDENT_TAB_STRING : INDENT_SPACES_STRING,
                g_flags.tabs ? 1 : g_flags.spaces));
          }
        }

        TRY(write_dst(
            (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__FROM_LIST) ? "]" : "}",
            1));
      } else {
        TRY(write_dst("\xFF", 1));
      }

      g_ctx = (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__TO_LIST)
                  ? context::in_list_after_value
                  : context::in_dict_after_key;
      goto after_value;
    }

    // Write preceding whitespace and punctuation, if it wasn't ']', '}' or a
    // continuation of a multi-token chain or a CBOR tagged data item.
    if (g_previous_token_was_cbor_tag) {
      g_previous_token_was_cbor_tag = false;
    } else if (start_of_token_chain) {
      if (g_flags.output_format != file_format::json) {
        // No-op.
      } else if (g_ctx == context::in_dict_after_key) {
        TRY(write_dst(": ", g_flags.compact_output ? 1 : 2));
      } else if (g_ctx != context::none) {
        if ((g_ctx == context::in_dict_after_brace) ||
            (g_ctx == context::in_dict_after_value)) {
          // Reject dict keys that aren't UTF-8 strings or non-negative
          // integers, which could otherwise happen with -i=cbor -o=json.
          if (vbc == WUFFS_BASE__TOKEN__VBC__INLINE_INTEGER_UNSIGNED) {
            // No-op.
          } else if ((vbc == WUFFS_BASE__TOKEN__VBC__STRING) &&
                     (vbd &
                      WUFFS_BASE__TOKEN__VBD__STRING__CHAIN_MUST_BE_UTF_8)) {
            // No-op.
          } else {
            return "main: cannot convert CBOR non-text-string to JSON map key";
          }
        }
        if ((g_ctx == context::in_list_after_value) ||
            (g_ctx == context::in_dict_after_value)) {
          TRY(write_dst(",", 1));
        }
        if (!g_flags.compact_output) {
          TRY(write_dst("\n", 1));
          for (size_t i = 0; i < g_depth; i++) {
            TRY(write_dst(
                g_flags.tabs ? INDENT_TAB_STRING : INDENT_SPACES_STRING,
                g_flags.tabs ? 1 : g_flags.spaces));
          }
        }
      }

      bool query_matched_fragment = false;
      if (g_query.is_at(g_depth)) {
        switch (g_ctx) {
          case context::in_list_after_bracket:
          case context::in_list_after_value:
            query_matched_fragment = g_query.tick();
            break;
          case context::in_dict_after_key:
            query_matched_fragment = g_query.matched_fragment();
            break;
          default:
            break;
        }
      }
      if (!query_matched_fragment) {
        // No-op.
      } else if (!g_query.next_fragment()) {
        // There is no next fragment. We have matched the complete query, and
        // the upcoming JSON value is the result of that query.
        //
        // Un-suppress writing to stdout and reset the g_ctx and g_depth as if
        // we were about to decode a top-level value. This makes any subsequent
        // indentation be relative to this point, and we will return g_eod
        // after the upcoming JSON value is complete.
        if (g_suppress_write_dst != 1) {
          return "main: internal error: inconsistent g_suppress_write_dst";
        }
        g_suppress_write_dst = 0;
        g_ctx = context::none;
        g_depth = 0;
      } else if ((vbc != WUFFS_BASE__TOKEN__VBC__STRUCTURE) ||
                 !(vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__PUSH)) {
        // The query has moved on to the next fragment but the upcoming JSON
        // value is not a container.
        return "main: no match for query";
      }
    }

    // Handle the token itself: either a container ('[' or '{') or a simple
    // value: string (a chain of raw or escaped parts), literal or number.
    switch (vbc) {
      case WUFFS_BASE__TOKEN__VBC__STRUCTURE:
        if (g_query.matched_all() && (g_depth >= g_flags.max_output_depth)) {
          g_suppress_write_dst++;
        } else if (g_flags.output_format == file_format::json) {
          TRY(write_dst(
              (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__TO_LIST) ? "[" : "{",
              1));
        } else {
          TRY(write_dst((vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__TO_LIST)
                            ? "\x9F"
                            : "\xBF",
                        1));
        }
        g_depth++;
        g_ctx = (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__TO_LIST)
                    ? context::in_list_after_bracket
                    : context::in_dict_after_brace;
        return nullptr;

      case WUFFS_BASE__TOKEN__VBC__STRING:
        TRY(handle_string(vbd, tok, start_of_token_chain, t.continued()));
        if (t.continued()) {
          return nullptr;
        }
        goto after_value;

      case WUFFS_BASE__TOKEN__VBC__UNICODE_CODE_POINT:
        if (!t.continued()) {
          return "main: internal error: unexpected non-continued UCP token";
        }
        TRY(handle_unicode_code_point(vbd));
        g_query.incremental_match_code_point(vbd);
        return nullptr;

      case WUFFS_BASE__TOKEN__VBC__LITERAL:
        TRY(write_literal(vbd));
        goto after_value;

      case WUFFS_BASE__TOKEN__VBC__NUMBER:
        TRY(write_number(vbd, tok));
        goto after_value;

      case WUFFS_BASE__TOKEN__VBC__INLINE_INTEGER_SIGNED:
      case WUFFS_BASE__TOKEN__VBC__INLINE_INTEGER_UNSIGNED: {
        bool x_is_signed = vbc == WUFFS_BASE__TOKEN__VBC__INLINE_INTEGER_SIGNED;
        uint64_t x = x_is_signed
                         ? ((uint64_t)(t.value_base_detail__sign_extended()))
                         : vbd;
        if (t.continued()) {
          if (tok.len != 0) {
            return "main: internal error: unexpected to-be-extended length";
          }
          g_token_extension.category = vbc;
          g_token_extension.detail = x;
          return nullptr;
        }
        TRY(write_inline_integer(x, x_is_signed, tok));
        goto after_value;
      }
    }

    int64_t ext = t.value_extension();
    if (ext >= 0) {
      uint64_t x = (g_token_extension.detail
                    << WUFFS_BASE__TOKEN__VALUE_EXTENSION__NUM_BITS) |
                   ((uint64_t)ext);
      switch (g_token_extension.category) {
        case WUFFS_BASE__TOKEN__VBC__INLINE_INTEGER_SIGNED:
        case WUFFS_BASE__TOKEN__VBC__INLINE_INTEGER_UNSIGNED:
          TRY(write_inline_integer(
              x,
              g_token_extension.category ==
                  WUFFS_BASE__TOKEN__VBC__INLINE_INTEGER_SIGNED,
              tok));
          g_token_extension.category = 0;
          g_token_extension.detail = 0;
          goto after_value;
        case CATEGORY_CBOR_TAG:
          g_previous_token_was_cbor_tag = true;
          TRY(write_cbor_tag(x, tok));
          g_token_extension.category = 0;
          g_token_extension.detail = 0;
          return nullptr;
      }
    }

    if (t.value_major() == WUFFS_CBOR__TOKEN_VALUE_MAJOR) {
      uint64_t value_minor = t.value_minor();
      if (value_minor & WUFFS_CBOR__TOKEN_VALUE_MINOR__MINUS_1_MINUS_X) {
        TRY(write_cbor_minus_1_minus_x(tok));
        goto after_value;
      } else if (value_minor & WUFFS_CBOR__TOKEN_VALUE_MINOR__SIMPLE_VALUE) {
        TRY(write_cbor_simple_value(vbd, tok));
        goto after_value;
      } else if (value_minor & WUFFS_CBOR__TOKEN_VALUE_MINOR__TAG) {
        g_previous_token_was_cbor_tag = true;
        if (t.continued()) {
          if (tok.len != 0) {
            return "main: internal error: unexpected to-be-extended length";
          }
          g_token_extension.category = CATEGORY_CBOR_TAG;
          g_token_extension.detail = vbd;
          return nullptr;
        }
        return write_cbor_tag(vbd, tok);
      }
    }

    // Return an error if we didn't match the (value_major, value_minor) or
    // (vbc, vbd) pair.
    return "main: internal error: unexpected token";
  } while (0);

  // Book-keeping after completing a value (whether a container value or a
  // simple value). Empty parent containers are no longer empty. If the parent
  // container is a "{...}" object, toggle between keys and values.
after_value:
  if (g_depth == 0) {
    return g_eod;
  }
  switch (g_ctx) {
    case context::in_list_after_bracket:
      g_ctx = context::in_list_after_value;
      break;
    case context::in_dict_after_brace:
      g_ctx = context::in_dict_after_key;
      break;
    case context::in_dict_after_key:
      g_ctx = context::in_dict_after_value;
      break;
    case context::in_dict_after_value:
      g_ctx = context::in_dict_after_key;
      break;
    default:
      break;
  }
  return nullptr;
}

const char*  //
main1(int argc, char** argv) {
  TRY(initialize_globals(argc, argv));

  bool start_of_token_chain = true;
  while (true) {
    wuffs_base__status status = g_dec->decode_tokens(
        &g_tok, &g_src,
        wuffs_base__make_slice_u8(g_work_buffer_array, WORK_BUFFER_ARRAY_SIZE));

    while (g_tok.meta.ri < g_tok.meta.wi) {
      wuffs_base__token t = g_tok.data.ptr[g_tok.meta.ri++];
      uint64_t n = t.length();
      if ((g_src.meta.ri - g_curr_token_end_src_index) < n) {
        return "main: internal error: inconsistent g_src indexes";
      }
      g_curr_token_end_src_index += n;

      // Skip filler tokens (e.g. whitespace).
      if (t.value_base_category() == WUFFS_BASE__TOKEN__VBC__FILLER) {
        start_of_token_chain = !t.continued();
        continue;
      }

      const char* z = handle_token(t, start_of_token_chain);
      start_of_token_chain = !t.continued();
      if (z == nullptr) {
        continue;
      } else if (z == g_eod) {
        goto end_of_data;
      }
      return z;
    }

    if (status.repr == nullptr) {
      return "main: internal error: unexpected end of token stream";
    } else if (status.repr == wuffs_base__suspension__short_read) {
      if (g_curr_token_end_src_index != g_src.meta.ri) {
        return "main: internal error: inconsistent g_src indexes";
      }
      TRY(read_src());
      g_curr_token_end_src_index = g_src.meta.ri;
    } else if (status.repr == wuffs_base__suspension__short_write) {
      g_tok.compact();
    } else {
      return status.message();
    }
  }
end_of_data:

  // With a non-empty g_query, don't try to consume trailing whitespace or
  // confirm that we've processed all the tokens.
  if (g_flags.query_c_string && *g_flags.query_c_string) {
    return nullptr;
  }

  // Check that we've exhausted the input.
  if ((g_src.meta.ri == g_src.meta.wi) && !g_src.meta.closed) {
    TRY(read_src());
  }
  if ((g_src.meta.ri < g_src.meta.wi) || !g_src.meta.closed) {
    return "main: valid JSON|CBOR followed by further (unexpected) data";
  }

  // Check that we've used all of the decoded tokens, other than trailing
  // filler tokens. For example, "true\n" is valid JSON (and fully consumed
  // with WUFFS_JSON__QUIRK_ALLOW_TRAILING_NEW_LINE enabled) with a trailing
  // filler token for the "\n".
  for (; g_tok.meta.ri < g_tok.meta.wi; g_tok.meta.ri++) {
    if (g_tok.data.ptr[g_tok.meta.ri].value_base_category() !=
        WUFFS_BASE__TOKEN__VBC__FILLER) {
      return "main: internal error: decoded OK but unprocessed tokens remain";
    }
  }

  return nullptr;
}

int  //
compute_exit_code(const char* status_msg) {
  if (!status_msg) {
    return 0;
  }
  size_t n;
  if (status_msg == g_usage) {
    n = strlen(status_msg);
  } else {
    n = strnlen(status_msg, 2047);
    if (n >= 2047) {
      status_msg = "main: internal error: error message is too long";
      n = strnlen(status_msg, 2047);
    }
  }
  const int stderr_fd = 2;
  ignore_return_value(write(stderr_fd, status_msg, n));
  ignore_return_value(write(stderr_fd, "\n", 1));
  // Return an exit code of 1 for regular (foreseen) errors, e.g. badly
  // formatted or unsupported input.
  //
  // Return an exit code of 2 for internal (exceptional) errors, e.g. defensive
  // run-time checks found that an internal invariant did not hold.
  //
  // Automated testing, including badly formatted inputs, can therefore
  // discriminate between expected failure (exit code 1) and unexpected failure
  // (other non-zero exit codes). Specifically, exit code 2 for internal
  // invariant violation, exit code 139 (which is 128 + SIGSEGV on x86_64
  // linux) for a segmentation fault (e.g. null pointer dereference).
  return strstr(status_msg, "internal error:") ? 2 : 1;
}
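/*
The exit code convention described in compute_exit_code can be sketched as a
side-effect-free function, which may be convenient in a test harness. This is
a minimal illustration, not part of jsonptr: the classify_status name is
hypothetical and, unlike compute_exit_code, it neither writes to stderr nor
caps the message length.

```cpp
#include <cstring>

// Hypothetical helper (not part of jsonptr): returns 0 for success (a null
// message), 2 for messages containing "internal error:" and 1 for any other
// error message.
inline int  //
classify_status(const char* status_msg) {
  if (!status_msg) {
    return 0;
  }
  return std::strstr(status_msg, "internal error:") ? 2 : 1;
}
```

Under this convention, "main: no match for query" is a regular failure (1)
and "main: internal error: unexpected token" is an invariant violation (2).
*/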

int  //
main(int argc, char** argv) {
  // Look for an input filename (the first non-flag argument) in argv. If there
  // is one, open it (but do not read from it) before we self-impose a sandbox.
  //
  // Flags start with "-", unless they come after a bare "--" arg.
  {
    bool dash_dash = false;
    int a;
    for (a = 1; a < argc; a++) {
      char* arg = argv[a];
      if ((arg[0] == '-') && !dash_dash) {
        dash_dash = (arg[1] == '-') && (arg[2] == '\x00');
        continue;
      }
      g_input_file_descriptor = open(arg, O_RDONLY);
      if (g_input_file_descriptor < 0) {
        fprintf(stderr, "%s: %s\n", arg, strerror(errno));
        return 1;
      }
      break;
    }
  }

#if defined(WUFFS_EXAMPLE_USE_SECCOMP)
  prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT);
  g_sandboxed = true;
#endif

  const char* z = main1(argc, argv);
  if (g_wrote_to_dst) {
    const char* z1 = (g_flags.output_format == file_format::json)
                         ? write_dst("\n", 1)
                         : nullptr;
    const char* z2 = flush_dst();
    z = z ? z : (z1 ? z1 : z2);
  }
  int exit_code = compute_exit_code(z);

#if defined(WUFFS_EXAMPLE_USE_SECCOMP)
  // Call SYS_exit explicitly, instead of calling SYS_exit_group implicitly by
  // either calling _exit or returning from main. SECCOMP_MODE_STRICT allows
  // only SYS_exit.
  syscall(SYS_exit, exit_code);
#endif
  return exit_code;
}