// Copyright 2020 The Wuffs Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//    https://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

// ----------------

/*
jsonptr is a JSON formatter (pretty-printer) that supports the JSON Pointer
(RFC 6901) query syntax. It reads CBOR or UTF-8 JSON from stdin and writes CBOR
or canonicalized, formatted UTF-8 JSON to stdout.

See the "const char* g_usage" string below for details.

----

JSON Pointer (and this program's implementation) is one of many JSON query
languages and JSON tools, such as jq, jql and JMESPath. This one is relatively
simple and fewer-featured compared to those others.

One benefit of simplicity is that this program's CBOR, JSON and JSON Pointer
implementations do not dynamically allocate or free memory (yet it does not
require that the entire input fits in memory at once). They are therefore
trivially protected against certain bug classes: memory leaks, double-frees and
use-after-frees.

The CBOR and JSON implementations are also written in the Wuffs programming
language (and then transpiled to C/C++), which is memory-safe (e.g. array
indexing is bounds-checked) and also prevents integer arithmetic overflows.

For defense in depth, on Linux, this program also self-imposes a
SECCOMP_MODE_STRICT sandbox before reading (or otherwise processing) its input
or writing its output. Under this sandbox, the only permitted system calls are
read, write, exit and sigreturn.

Altogether, this program aims to safely handle untrusted CBOR or JSON files
without fear of security bugs such as remote code execution.

----

As of 2020-02-24, this program passes all 318 "test_parsing" cases from the
JSON test suite (https://github.com/nst/JSONTestSuite), an appendix to the
"Parsing JSON is a Minefield" article (http://seriot.ch/parsing_json.php) that
was first published on 2016-10-26 and updated on 2018-03-30.

After modifying this program, run "build-example.sh example/jsonptr/" and then
"script/run-json-test-suite.sh" to catch correctness regressions.

----

This program uses Wuffs' JSON decoder at a relatively low level, processing the
decoder's token-stream output individually. The core loop, in pseudo-code, is
"for_each_token { handle_token(etc); }", where the handle_token function
changes global state (e.g. the `g_depth` and `g_ctx` variables) and prints
output text based on that state and the token's source text. Notably,
handle_token is not recursive, even though JSON values can nest.

This approach is centered around JSON tokens. Each JSON 'thing' (e.g. number,
string, object) comprises one or more JSON tokens.
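
The token-centric loop above can be sketched in miniature. This is a
simplified illustration, not this program's code: the Token, Kind, Ctx,
Formatter and format_all names are hypothetical, the tokens are pre-made
rather than produced by Wuffs' decoder, the input is assumed to be well-formed
and, unlike this program, the sketch uses std::string, which allocates
dynamically.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical token kinds. Wuffs' real tokens carry more detail.
enum class Kind { open_list, close_list, open_dict, close_dict, scalar };

struct Token {
  Kind kind;
  std::string text;  // Source text. Only used for scalars.
};

enum class Ctx {
  none,
  in_list_after_bracket,
  in_list_after_value,
  in_dict_after_brace,
  in_dict_after_key,
  in_dict_after_value,
};

constexpr uint32_t kMaxDepth = 64;

struct Formatter {
  bool is_dict[kMaxDepth] = {};  // One entry per open container.
  uint32_t depth = 0;
  Ctx ctx = Ctx::none;
  std::string out;

  // handle_token updates the state and appends compact output. It returns
  // false if a close has no matching open or if nesting is too deep.
  bool handle_token(const Token& t) {
    if ((t.kind == Kind::close_list) || (t.kind == Kind::close_dict)) {
      if (depth == 0) {
        return false;
      }
      depth--;
      out += (t.kind == Kind::close_list) ? "]" : "}";
      // The closed container was a value of its parent (if any).
      ctx = (depth == 0)           ? Ctx::none
            : is_dict[depth - 1] ? Ctx::in_dict_after_value
                                   : Ctx::in_list_after_value;
      return true;
    }

    // Every other token starts a key or a value: write a separator first.
    if ((ctx == Ctx::in_list_after_value) ||
        (ctx == Ctx::in_dict_after_value)) {
      out += ",";
    } else if (ctx == Ctx::in_dict_after_key) {
      out += ":";
    }

    if (t.kind == Kind::scalar) {
      out += t.text;
      if ((ctx == Ctx::in_dict_after_brace) ||
          (ctx == Ctx::in_dict_after_value)) {
        ctx = Ctx::in_dict_after_key;  // The scalar was a key.
      } else if (ctx == Ctx::in_dict_after_key) {
        ctx = Ctx::in_dict_after_value;
      } else if (ctx != Ctx::none) {
        ctx = Ctx::in_list_after_value;
      }
      return true;
    }

    if (depth >= kMaxDepth) {
      return false;
    }
    bool d = (t.kind == Kind::open_dict);
    is_dict[depth++] = d;
    out += d ? "{" : "[";
    ctx = d ? Ctx::in_dict_after_brace : Ctx::in_list_after_bracket;
    return true;
  }
};

// format_all is the "for_each_token { handle_token(etc); }" loop. It returns
// an empty string on failure.
std::string format_all(const std::vector<Token>& tokens) {
  Formatter f;
  for (const Token& t : tokens) {
    if (!f.handle_token(t)) {
      return "";
    }
  }
  return f.out;
}
```

The fixed-size is_dict array stands in for recursion: closing a container
steps back one entry to recover the parent's context.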

An alternative, higher-level approach is in the sibling example/jsonfindptrs
program. Neither approach is better or worse per se, but when studying this
program, be aware that there are multiple ways to use Wuffs' JSON decoder.

The two programs, jsonfindptrs and jsonptr, also demonstrate different
trade-offs with regard to JSON object duplicate keys. The JSON spec permits
different implementations to allow or reject duplicate keys. It is not always
clear which approach is safer. Rejecting them is certainly unambiguous, and
security bugs can lurk in ambiguous corners of a file format, if two different
implementations both silently accept a file but differ on how to interpret it.
On the other hand, in the worst case, detecting duplicate keys requires O(N)
memory, where N is the size of the (potentially untrusted) input.

This program (jsonptr) allows duplicate keys and requires only O(1) memory. As
mentioned above, it doesn't dynamically allocate memory at all, and on Linux,
it runs in a SECCOMP_MODE_STRICT sandbox.
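
The memory trade-off above can be illustrated with a small sketch. It is not
code from either program: the has_duplicate_key and first_match_index names
are hypothetical, and the std::vector argument merely stands in for a stream
of object keys arriving one at a time.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// has_duplicate_key remembers every distinct key seen so far, so its memory
// use grows with the input: O(N) in the worst case.
bool has_duplicate_key(const std::vector<std::string>& keys) {
  std::unordered_set<std::string> seen;
  for (const std::string& k : keys) {
    if (!seen.insert(k).second) {
      return true;
    }
  }
  return false;
}

// first_match_index stops at the first key equal to want, without remembering
// earlier keys: O(1) extra state. It returns -1 if there is no match.
long first_match_index(const std::vector<std::string>& keys,
                       const std::string& want) {
  for (size_t i = 0; i < keys.size(); i++) {
    if (keys[i] == want) {
      return (long)i;
    }
  }
  return -1;
}
```

Accepting the first match and ignoring later duplicates is what lets a
streaming reader avoid keeping earlier keys around.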

----

This example program differs from most other example Wuffs programs in that it
is written in C++, not C.

    $CXX jsonptr.cc && ./a.out < ../../test/data/github-tags.json; rm -f a.out

for a C++ compiler $CXX, such as clang++ or g++.
*/

#if defined(__cplusplus) && (__cplusplus < 201103L)
#error "This C++ program requires -std=c++11 or later"
#endif

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

// Wuffs ships as a "single file C library" or "header file library" as per
// https://github.com/nothings/stb/blob/master/docs/stb_howto.txt
//
// To use that single file as a "foo.c"-like implementation, instead of a
// "foo.h"-like header, #define WUFFS_IMPLEMENTATION before #include'ing or
// compiling it.
#define WUFFS_IMPLEMENTATION

// Defining the WUFFS_CONFIG__MODULE* macros is optional, but it lets users of
// release/c/etc.c whitelist which parts of Wuffs to build. That file contains
// the entire Wuffs standard library, implementing a variety of codecs and file
// formats. Without this macro definition, an optimizing compiler or linker may
// very well discard Wuffs code for unused codecs, but listing the Wuffs
// modules we use makes that process explicit. Preprocessing means that such
// code simply isn't compiled.
#define WUFFS_CONFIG__MODULES
#define WUFFS_CONFIG__MODULE__BASE
#define WUFFS_CONFIG__MODULE__CBOR
#define WUFFS_CONFIG__MODULE__JSON

// If building this program in an environment that doesn't easily accommodate
// relative includes, you can use the script/inline-c-relative-includes.go
// program to generate a stand-alone C++ file.
#include "../../release/c/wuffs-unsupported-snapshot.c"

#if defined(__linux__)
#include <linux/prctl.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#define WUFFS_EXAMPLE_USE_SECCOMP
#endif

#define TRY(error_msg)         \
  do {                         \
    const char* z = error_msg; \
    if (z) {                   \
      return z;                \
    }                          \
  } while (false)

static const char* g_eod = "main: end of data";

static const char* g_usage =
    "Usage: jsonptr -flags input.json\n"
    "\n"
    "Flags:\n"
    "    -c      -compact-output\n"
    "    -d=NUM  -max-output-depth=NUM\n"
    "    -i=FMT  -input-format={json,cbor}\n"
    "    -o=FMT  -output-format={json,cbor}\n"
    "    -q=STR  -query=STR\n"
    "    -s=NUM  -spaces=NUM\n"
    "    -t      -tabs\n"
    "            -fail-if-unsandboxed\n"
    "            -input-allow-json-comments\n"
    "            -input-allow-json-extra-comma\n"
    "            -input-allow-json-inf-nan-numbers\n"
    "            -output-cbor-metadata-as-json-comments\n"
    "            -output-json-extra-comma\n"
    "            -output-json-inf-nan-numbers\n"
    "            -strict-json-pointer-syntax\n"
    "\n"
    "The input.json filename is optional. If absent, it reads from stdin.\n"
    "\n"
    "----\n"
    "\n"
    "jsonptr is a JSON formatter (pretty-printer) that supports the JSON\n"
    "Pointer (RFC 6901) query syntax. It reads CBOR or UTF-8 JSON from stdin\n"
    "and writes CBOR or canonicalized, formatted UTF-8 JSON to stdout. The\n"
    "input and output formats do not have to match, but conversion between\n"
    "formats may be lossy.\n"
    "\n"
    "Canonicalized JSON means that e.g. \"abc\\u000A\\tx\\u0177z\" is\n"
    "rewritten as \"abc\\n\\txŷz\". It does not sort object keys or reject\n"
    "duplicate keys. Canonicalization does not imply Unicode normalization.\n"
    "\n"
    "CBOR output is non-canonical (in the RFC 7049 Section 3.9 sense), as\n"
    "sorting map keys and measuring indefinite-length containers requires\n"
    "O(input_length) memory but this program runs in O(1) memory.\n"
    "\n"
    "Formatted means that arrays' and objects' elements are indented, each\n"
    "on its own line. Configure this with the -c / -compact-output, -s=NUM /\n"
    "-spaces=NUM (for NUM ranging from 0 to 8) and -t / -tabs flags. Those\n"
    "flags only apply to JSON (not CBOR) output.\n"
    "\n"
    "The -input-format and -output-format flags select between reading and\n"
    "writing JSON (the default, a textual format) or CBOR (a binary format).\n"
    "\n"
    "The -input-allow-json-comments flag allows \"/*slash-star*/\" and\n"
    "\"//slash-slash\" C-style comments within JSON input.\n"
    "\n"
    "The -input-allow-json-extra-comma flag allows input like \"[1,2,]\",\n"
    "with a comma after the final element of a JSON list or dictionary.\n"
    "\n"
    "The -input-allow-json-inf-nan-numbers flag allows non-finite floating\n"
    "point numbers (infinities and not-a-numbers) within JSON input.\n"
    "\n"
    "The -output-cbor-metadata-as-json-comments flag writes CBOR tags and\n"
    "other metadata as /*comments*/, when -i=cbor and -o=json are also set.\n"
    "Such comments are non-compliant with the JSON specification but many\n"
    "parsers accept them.\n"
    "\n"
    "The -output-json-extra-comma flag writes extra commas, regardless of\n"
    "whether the input had them. Such commas are non-compliant with the JSON\n"
    "specification but many parsers accept them and they can produce simpler\n"
    "line-based diffs. This flag is ignored when -compact-output is set.\n"
    "\n"
    "The -output-json-inf-nan-numbers flag writes Inf and NaN instead of a\n"
    "substitute null value, when converting from -i=cbor to -o=json. Such\n"
    "values are non-compliant with the JSON specification but many parsers\n"
    "accept them.\n"
    "\n"
    "CBOR is more permissive about map keys but JSON only allows strings.\n"
    "When converting from -i=cbor to -o=json, this program rejects keys other\n"
    "than text strings and non-negative integers (CBOR major types 3 and 0).\n"
    "Integer keys like 123 are quoted to become string keys like \"123\".\n"
    "Being even more permissive would have complicated interactions with\n"
    "the -query=STR flag and streaming input, so this program just rejects\n"
    "other keys.\n"
    "\n"
    "----\n"
    "\n"
    "The -q=STR or -query=STR flag gives an optional JSON Pointer query, to\n"
    "print a subset of the input. For example, given RFC 6901 section 5's\n"
    "sample input (https://tools.ietf.org/rfc/rfc6901.txt), this command:\n"
    "    jsonptr -query=/foo/1 rfc-6901-json-pointer.json\n"
    "will print:\n"
    "    \"baz\"\n"
    "\n"
    "An absent query is equivalent to the empty query, which identifies the\n"
    "entire input (the root value). Unlike a file system, the \"/\" query\n"
    "does not identify the root. Instead, \"\" is the root and \"/\" is the\n"
    "child (the value in a key-value pair) of the root whose key is the empty\n"
    "string. Similarly, \"/xyz\" and \"/xyz/\" are two different nodes.\n"
    "\n"
    "If the query found a valid JSON|CBOR value, this program will return a\n"
    "zero exit code even if the rest of the input isn't valid. If the query\n"
    "did not find a value, or found an invalid one, this program returns a\n"
    "non-zero exit code, but may still print partial output to stdout.\n"
    "\n"
    "The JSON and CBOR specifications (https://json.org/ or RFC 8259; RFC\n"
    "7049) permit implementations to allow duplicate keys, as this one does.\n"
    "This JSON Pointer implementation is also greedy, following the first\n"
    "match for each fragment without back-tracking. For example, the\n"
    "\"/foo/bar\" query will fail if the root object has multiple \"foo\"\n"
    "children but the first one doesn't have a \"bar\" child, even if later\n"
    "ones do.\n"
    "\n"
    "The -strict-json-pointer-syntax flag restricts the -query=STR string to\n"
    "exactly RFC 6901, with only two escape sequences: \"~0\" and \"~1\" for\n"
    "\"~\" and \"/\". Without this flag, this program also lets \"~n\" and\n"
    "\"~r\" escape the New Line and Carriage Return ASCII control characters,\n"
    "which can work better with line oriented Unix tools that assume exactly\n"
    "one value (i.e. one JSON Pointer string) per line.\n"
    "\n"
    "----\n"
    "\n"
    "The -d=NUM or -max-output-depth=NUM flag gives the maximum (inclusive)\n"
    "output depth. JSON|CBOR containers ([] arrays and {} objects) can hold\n"
    "other containers. When this flag is set, containers at depth NUM are\n"
    "replaced with \"[…]\" or \"{…}\". A bare -d or -max-output-depth is\n"
    "equivalent to -d=1. The flag's absence means an unlimited output depth.\n"
    "\n"
    "The -max-output-depth flag only affects the program's output. It doesn't\n"
    "affect whether or not the input is considered valid JSON|CBOR. The\n"
    "format specifications permit implementations to set their own maximum\n"
    "input depth. This JSON|CBOR implementation sets it to 1024.\n"
    "\n"
    "Depth is measured in terms of nested containers. It is unaffected by the\n"
    "number of spaces or tabs used to indent.\n"
    "\n"
    "When both -max-output-depth and -query are set, the output depth is\n"
    "measured from when the query resolves, not from the input root. The\n"
    "input depth (measured from the root) is still limited to 1024.\n"
    "\n"
    "----\n"
    "\n"
    "The -fail-if-unsandboxed flag causes the program to exit if it does not\n"
    "self-impose a sandbox. On Linux, it self-imposes a SECCOMP_MODE_STRICT\n"
    "sandbox, regardless of whether this flag was set.";
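
/*
The usage text's distinction between the "" and "/" queries (and between
"/xyz" and "/xyz/") can be made concrete by splitting a query into fragments.
This is only an illustrative sketch: split_json_pointer is a hypothetical
helper that is not part of this program, it assumes an already valid RFC 6901
query and, unlike this program, it allocates memory dynamically.

```cpp
#include <string>
#include <vector>

// split_json_pointer returns the unescaped fragments of a JSON Pointer,
// applying only RFC 6901's "~0" and "~1" escapes. It does not validate its
// input: characters before the first '/' are silently ignored.
std::vector<std::string> split_json_pointer(const std::string& query) {
  std::vector<std::string> fragments;
  for (size_t i = 0; i < query.size(); i++) {
    char c = query[i];
    if (c == '/') {
      // Each '/' starts a new (possibly empty) fragment.
      fragments.push_back(std::string());
    } else if (fragments.empty()) {
      continue;
    } else if ((c == '~') && ((i + 1) < query.size())) {
      char e = query[++i];
      fragments.back() += (e == '0') ? '~' : ((e == '1') ? '/' : e);
    } else {
      fragments.back() += c;
    }
  }
  return fragments;
}
```

The empty query has zero fragments (the root), "/" has one empty fragment (the
root's child whose key is the empty string) and "/xyz/" has two fragments, the
second of which is empty.
*/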

// ----

// Wuffs allows either statically or dynamically allocated work buffers. This
// program exercises static allocation.
#define WORK_BUFFER_ARRAY_SIZE \
  WUFFS_JSON__DECODER_WORKBUF_LEN_MAX_INCL_WORST_CASE
#if WORK_BUFFER_ARRAY_SIZE > 0
uint8_t g_work_buffer_array[WORK_BUFFER_ARRAY_SIZE];
#else
// Not all C/C++ compilers support 0-length arrays.
uint8_t g_work_buffer_array[1];
#endif

bool g_sandboxed = false;

int g_input_file_descriptor = 0;  // A 0 default means stdin.

#define MAX_INDENT 8
#define INDENT_SPACES_STRING "        "
#define INDENT_TAB_STRING "\t"

#ifndef DST_BUFFER_ARRAY_SIZE
#define DST_BUFFER_ARRAY_SIZE (32 * 1024)
#endif
#ifndef SRC_BUFFER_ARRAY_SIZE
#define SRC_BUFFER_ARRAY_SIZE (32 * 1024)
#endif
#ifndef TOKEN_BUFFER_ARRAY_SIZE
#define TOKEN_BUFFER_ARRAY_SIZE (4 * 1024)
#endif

uint8_t g_dst_array[DST_BUFFER_ARRAY_SIZE];
uint8_t g_src_array[SRC_BUFFER_ARRAY_SIZE];
wuffs_base__token g_tok_array[TOKEN_BUFFER_ARRAY_SIZE];

wuffs_base__io_buffer g_dst;
wuffs_base__io_buffer g_src;
wuffs_base__token_buffer g_tok;

// g_curr_token_end_src_index is the g_src.data.ptr index of the end of the
// current token. An invariant is that (g_curr_token_end_src_index <=
// g_src.meta.ri).
size_t g_curr_token_end_src_index;

// A valid token's VBC ranges in 0 ..= 15. Values over that are for tokens from
// outside of the base package, such as the CBOR package.
#define CATEGORY_CBOR_TAG 16

struct {
  uint64_t category;
  uint64_t detail;
} g_token_extension;

uint32_t g_depth;

enum class context {
  none,
  in_list_after_bracket,
  in_list_after_value,
  in_dict_after_brace,
  in_dict_after_key,
  in_dict_after_value,
} g_ctx;

bool  //
in_dict_before_key() {
  return (g_ctx == context::in_dict_after_brace) ||
         (g_ctx == context::in_dict_after_value);
}

uint32_t g_suppress_write_dst;
bool g_wrote_to_dst;

wuffs_cbor__decoder g_cbor_decoder;
wuffs_json__decoder g_json_decoder;
wuffs_base__token_decoder* g_dec;

// g_spool_array is a 4 KiB buffer.
//
// For -o=cbor, strings up to SPOOL_ARRAY_SIZE long are written as a single
// definite-length string. Longer strings are written as an indefinite-length
// string containing multiple definite-length chunks, each of length up to
// SPOOL_ARRAY_SIZE. See RFC 7049 section 2.2.2 "Indefinite-Length Byte Strings
// and Text Strings". Byte strings and text strings are spooled prior to this
// chunking, so that the output is determinate even when the input is streamed.
//
// For -o=json, CBOR byte strings are spooled prior to base64url encoding,
// which maps multiples of 3 source bytes to 4 destination bytes.
//
// If raising SPOOL_ARRAY_SIZE above 0xFFFF then you will also have to update
// flush_cbor_output_string.
#define SPOOL_ARRAY_SIZE 4096
uint8_t g_spool_array[SPOOL_ARRAY_SIZE];

uint32_t g_cbor_output_string_length;
bool g_cbor_output_string_is_multiple_chunks;
bool g_cbor_output_string_is_utf_8;

uint32_t g_json_output_byte_string_length;

// ----

// Query is a JSON Pointer query. After initializing with a NUL-terminated C
// string, its multiple fragments are consumed as the program walks the JSON
// data from stdin. For example, letting "$" denote a NUL, suppose that we
// started with a query string of "/apple/banana/12/durian" and are currently
// trying to match the second fragment, "banana", so that Query::m_depth is 2:
//
//  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
//  / a p p l e / b a n a n a / 1 2 / d u r i a n $
//  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
//                ^           ^
//                m_frag_i    m_frag_k
//
// The two pointers m_frag_i and m_frag_k (abbreviated as mfi and mfk) are the
// start (inclusive) and end (exclusive) of the query fragment. They satisfy
// (mfi <= mfk) and may be equal if the fragment is empty (note that "" is a
// valid JSON object key).
//
// The m_frag_j (mfj) pointer moves between these two, or is nullptr. An
// invariant is that (((mfi <= mfj) && (mfj <= mfk)) || (mfj == nullptr)).
//
// Wuffs' JSON tokenizer can portray a single JSON string as multiple Wuffs
// tokens, as backslash-escaped values within that JSON string may each get
// their own token.
//
// At the start of each object key (a JSON string), mfj is set to mfi.
//
// While mfj remains non-nullptr, each token's unescaped contents are then
// compared to that part of the fragment from mfj to mfk. If it is a prefix
// (including the case of an exact match), then mfj is advanced by the
// unescaped length. Otherwise, mfj is set to nullptr.
//
// Comparison accounts for JSON Pointer's escaping notation: "~0" and "~1" in
// the query (not the JSON value) are unescaped to "~" and "/" respectively.
// "~n" and "~r" are also unescaped to "\n" and "\r". The program is
// responsible for calling Query::validate (with a strict_json_pointer_syntax
// argument) before otherwise using this class.
//
// The mfj pointer therefore advances from mfi to mfk, or drops out, as we
// incrementally match the object key with the query fragment. For example, if
// we have already matched the "ban" of "banana", then we would accept any of
// an "ana" token, an "a" token or a "\u0061" token, amongst others. They would
// advance mfj by 3, 1 or 1 bytes respectively.
//
//                      mfj
//                      v
//  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
//  / a p p l e / b a n a n a / 1 2 / d u r i a n $
//  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
//                ^           ^
//                mfi         mfk
//
// At the end of each object key (or equivalently, at the start of each object
// value), if mfj is non-nullptr and equal to (but not less than) mfk then we
// have a fragment match: the query fragment equals the object key. If there is
// a next fragment (in this example, "12") we move the frag_etc pointers to its
// start and end and increment Query::m_depth. Otherwise, we have matched the
// complete query, and the upcoming JSON value is the result of that query.
//
// The discussion above centers on object keys. If the query fragment is
// numeric then it can also match as an array index: the string fragment "12"
// will match an array's 13th element (starting counting from zero). See RFC
// 6901 for its precise definition of an "array index" number.
//
// Array index fragment match is represented by the Query::m_array_index field,
// whose type (wuffs_base__result_u64) is a result type. An error result means
// that the fragment is not an array index. A value result holds the array
// index itself. The separate Query::m_array_index_remaining field holds the
// number of list elements remaining. When matching a query fragment in an
// array (instead of in an object), each element ticks this number down towards
// zero. At zero, the upcoming JSON value is the one that matches the query
// fragment.
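/*
The array index rule described above can be sketched on its own. This is a
minimal illustration under stated assumptions, not this program's code, which
relies on wuffs_base__parse_number_u64 instead. The parse_array_index and
Countdown names are hypothetical.

```cpp
#include <cstdint>
#include <string>

// parse_array_index returns whether s is an RFC 6901 array index that fits in
// a uint64_t: either "0" or a run of digits without a leading zero. On
// success, it stores the parsed number through the out pointer.
bool parse_array_index(const std::string& s, uint64_t* out) {
  if (s.empty() || ((s.size() > 1) && (s[0] == '0'))) {
    return false;
  }
  uint64_t n = 0;
  for (char c : s) {
    if ((c < '0') || ('9' < c)) {
      return false;
    }
    uint64_t digit = (uint64_t)(c - '0');
    if (n > ((UINT64_MAX - digit) / 10)) {
      return false;  // (10 * n) + digit would overflow.
    }
    n = (10 * n) + digit;
  }
  *out = n;
  return true;
}

// Countdown mimics the "tick down towards zero" behavior: tick returns true
// exactly when the element about to be visited is the one being queried.
struct Countdown {
  uint64_t remaining;

  bool tick() {
    if (remaining == 0) {
      return true;
    }
    remaining--;
    return false;
  }
};
```

For the "12" fragment, the first twelve ticks return false and the thirteenth
returns true, matching the array's 13th element.
*/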
class Query {
 private:
  uint8_t* m_frag_i;
  uint8_t* m_frag_j;
  uint8_t* m_frag_k;

  uint32_t m_depth;

  wuffs_base__result_u64 m_array_index;
  uint64_t m_array_index_remaining;

 public:
  void reset(char* query_c_string) {
    m_frag_i = (uint8_t*)query_c_string;
    m_frag_j = (uint8_t*)query_c_string;
    m_frag_k = (uint8_t*)query_c_string;
    m_depth = 0;
    m_array_index.status.repr = "#main: not an array index query fragment";
    m_array_index.value = 0;
    m_array_index_remaining = 0;
  }

  void restart_fragment(bool enable) { m_frag_j = enable ? m_frag_i : nullptr; }

  bool is_at(uint32_t depth) { return m_depth == depth; }

  // tick returns whether the fragment is a valid array index whose value is
  // zero. If valid but non-zero, it decrements it and returns false.
  bool tick() {
    if (m_array_index.status.is_ok()) {
      if (m_array_index_remaining == 0) {
        return true;
      }
      m_array_index_remaining--;
    }
    return false;
  }

  // next_fragment moves to the next fragment, returning whether it existed.
  bool next_fragment() {
    uint8_t* k = m_frag_k;
    uint32_t d = m_depth;

    this->reset(nullptr);

    if (!k || (*k != '/')) {
      return false;
    }
    k++;

    bool all_digits = true;
    uint8_t* i = k;
    while ((*k != '\x00') && (*k != '/')) {
      all_digits = all_digits && ('0' <= *k) && (*k <= '9');
      k++;
    }
    m_frag_i = i;
    m_frag_j = i;
    m_frag_k = k;
    m_depth = d + 1;
    if (all_digits) {
      // wuffs_base__parse_number_u64 rejects leading zeroes, e.g. "00", "07".
      m_array_index = wuffs_base__parse_number_u64(
          wuffs_base__make_slice_u8(i, k - i),
          WUFFS_BASE__PARSE_NUMBER_XXX__DEFAULT_OPTIONS);
      m_array_index_remaining = m_array_index.value;
    }
    return true;
  }

  bool matched_all() { return m_frag_k == nullptr; }

  bool matched_fragment() { return m_frag_j && (m_frag_j == m_frag_k); }

  void restart_and_match_unsigned_number(bool enable, uint64_t u) {
    m_frag_j =
        (enable && (m_array_index.status.is_ok()) && (m_array_index.value == u))
            ? m_frag_k
            : nullptr;
  }
539
  void incremental_match_slice(uint8_t* ptr, size_t len) {
    if (!m_frag_j) {
      return;
    }
    uint8_t* j = m_frag_j;
    while (true) {
      if (len == 0) {
        m_frag_j = j;
        return;
      }

      if (*j == '\x00') {
        break;

      } else if (*j == '~') {
        j++;
        if (*j == '0') {
          if (*ptr != '~') {
            break;
          }
        } else if (*j == '1') {
          if (*ptr != '/') {
            break;
          }
        } else if (*j == 'n') {
          if (*ptr != '\n') {
            break;
          }
        } else if (*j == 'r') {
          if (*ptr != '\r') {
            break;
          }
        } else {
          break;
        }

      } else if (*j != *ptr) {
        break;
      }

      j++;
      ptr++;
      len--;
    }
    m_frag_j = nullptr;
  }

  void incremental_match_code_point(uint32_t code_point) {
    if (!m_frag_j) {
      return;
    }
    uint8_t u[WUFFS_BASE__UTF_8__BYTE_LENGTH__MAX_INCL];
    size_t n = wuffs_base__utf_8__encode(
        wuffs_base__make_slice_u8(&u[0],
                                  WUFFS_BASE__UTF_8__BYTE_LENGTH__MAX_INCL),
        code_point);
    if (n > 0) {
      this->incremental_match_slice(&u[0], n);
    }
  }

  // validate returns whether the (query_c_string, length) arguments form a
  // valid JSON Pointer. In particular, it must be valid UTF-8, and either be
  // empty or start with a '/'. Any '~' within must immediately be followed by
  // either '0' or '1'. If strict_json_pointer_syntax is false, a '~' may also
  // be followed by either 'n' or 'r'.
  static bool validate(char* query_c_string,
                       size_t length,
                       bool strict_json_pointer_syntax) {
    if (length <= 0) {
      return true;
    }
    if (query_c_string[0] != '/') {
      return false;
    }
    wuffs_base__slice_u8 s =
        wuffs_base__make_slice_u8((uint8_t*)query_c_string, length);
    bool previous_was_tilde = false;
    while (s.len > 0) {
      wuffs_base__utf_8__next__output o = wuffs_base__utf_8__next(s.ptr, s.len);
      if (!o.is_valid()) {
        return false;
      }

      if (previous_was_tilde) {
        switch (o.code_point) {
          case '0':
          case '1':
            break;
          case 'n':
          case 'r':
            if (strict_json_pointer_syntax) {
              return false;
            }
            break;
          default:
            return false;
        }
      }
      previous_was_tilde = o.code_point == '~';

      s.ptr += o.byte_length;
      s.len -= o.byte_length;
    }
    return !previous_was_tilde;
  }
} g_query;

// ----

enum class file_format {
  json,
  cbor,
};

struct {
  int remaining_argc;
  char** remaining_argv;

  bool compact_output;
  bool fail_if_unsandboxed;
  file_format input_format;
  bool input_allow_json_comments;
  bool input_allow_json_extra_comma;
  bool input_allow_json_inf_nan_numbers;
  uint32_t max_output_depth;
  file_format output_format;
  bool output_cbor_metadata_as_json_comments;
  bool output_json_extra_comma;
  bool output_json_inf_nan_numbers;
  char* query_c_string;
  size_t spaces;
  bool strict_json_pointer_syntax;
  bool tabs;
} g_flags = {0};

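// Illustrative invocations (a sketch only; the g_usage string is the
// authoritative reference, and the file names here are hypothetical):
//
//   jsonptr -query=/foo/0 < in.json
//   jsonptr --compact-output -o=cbor < in.json > out.cbor
//   jsonptr -i=cbor -output-cbor-metadata-as-json-comments < in.cbor
//
// Per the parsing below, "-foo" and "--foo" are interchangeable, and flags
// that take a value use "=", as in "-spaces=2" or "-max-output-depth=3".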
const char*  //
parse_flags(int argc, char** argv) {
  g_flags.spaces = 4;
  g_flags.max_output_depth = 0xFFFFFFFF;

  int c = (argc > 0) ? 1 : 0;  // Skip argv[0], the program name.
  for (; c < argc; c++) {
    char* arg = argv[c];
    if (*arg++ != '-') {
      break;
    }

    // A double-dash "--foo" is equivalent to a single-dash "-foo". As special
    // cases, a bare "-" is not a flag (some programs may interpret it as
    // stdin) and a bare "--" means to stop parsing flags.
    if (*arg == '\x00') {
      break;
    } else if (*arg == '-') {
      arg++;
      if (*arg == '\x00') {
        c++;
        break;
      }
    }

    if (!strcmp(arg, "c") || !strcmp(arg, "compact-output")) {
      g_flags.compact_output = true;
      continue;
    }
    if (!strcmp(arg, "d") || !strcmp(arg, "max-output-depth")) {
      g_flags.max_output_depth = 1;
      continue;
    } else if (!strncmp(arg, "d=", 2) ||
               !strncmp(arg, "max-output-depth=", 17)) {
      // The length 17 includes the trailing '=', so the loop below is
      // guaranteed to find a '=' before arg's NUL terminator.
      while (*arg++ != '=') {
      }
      wuffs_base__result_u64 u = wuffs_base__parse_number_u64(
          wuffs_base__make_slice_u8((uint8_t*)arg, strlen(arg)),
          WUFFS_BASE__PARSE_NUMBER_XXX__DEFAULT_OPTIONS);
      if (u.status.is_ok() && (u.value <= 0xFFFFFFFF)) {
        g_flags.max_output_depth = (uint32_t)(u.value);
        continue;
      }
      return g_usage;
    }
    if (!strcmp(arg, "fail-if-unsandboxed")) {
      g_flags.fail_if_unsandboxed = true;
      continue;
    }
    if (!strcmp(arg, "i=cbor") || !strcmp(arg, "input-format=cbor")) {
      g_flags.input_format = file_format::cbor;
      continue;
    }
    if (!strcmp(arg, "i=json") || !strcmp(arg, "input-format=json")) {
      g_flags.input_format = file_format::json;
      continue;
    }
    if (!strcmp(arg, "input-allow-json-comments")) {
      g_flags.input_allow_json_comments = true;
      continue;
    }
    if (!strcmp(arg, "input-allow-json-extra-comma")) {
      g_flags.input_allow_json_extra_comma = true;
      continue;
    }
    if (!strcmp(arg, "input-allow-json-inf-nan-numbers")) {
      g_flags.input_allow_json_inf_nan_numbers = true;
      continue;
    }
    if (!strcmp(arg, "o=cbor") || !strcmp(arg, "output-format=cbor")) {
      g_flags.output_format = file_format::cbor;
      continue;
    }
    if (!strcmp(arg, "o=json") || !strcmp(arg, "output-format=json")) {
      g_flags.output_format = file_format::json;
      continue;
    }
    if (!strcmp(arg, "output-cbor-metadata-as-json-comments")) {
      g_flags.output_cbor_metadata_as_json_comments = true;
      continue;
    }
    if (!strcmp(arg, "output-json-extra-comma")) {
      g_flags.output_json_extra_comma = true;
      continue;
    }
    if (!strcmp(arg, "output-json-inf-nan-numbers")) {
      g_flags.output_json_inf_nan_numbers = true;
      continue;
    }
    if (!strncmp(arg, "q=", 2) || !strncmp(arg, "query=", 6)) {
      while (*arg++ != '=') {
      }
      g_flags.query_c_string = arg;
      continue;
    }
    if (!strncmp(arg, "s=", 2) || !strncmp(arg, "spaces=", 7)) {
      while (*arg++ != '=') {
      }
      if (('0' <= arg[0]) && (arg[0] <= '8') && (arg[1] == '\x00')) {
        g_flags.spaces = arg[0] - '0';
        continue;
      }
      return g_usage;
    }
    if (!strcmp(arg, "strict-json-pointer-syntax")) {
      g_flags.strict_json_pointer_syntax = true;
      continue;
    }
    if (!strcmp(arg, "t") || !strcmp(arg, "tabs")) {
      g_flags.tabs = true;
      continue;
    }

    return g_usage;
  }

  if (g_flags.query_c_string &&
      !Query::validate(g_flags.query_c_string, strlen(g_flags.query_c_string),
                       g_flags.strict_json_pointer_syntax)) {
    return "main: bad JSON Pointer (RFC 6901) syntax for the -query=STR flag";
  }

  g_flags.remaining_argc = argc - c;
  g_flags.remaining_argv = argv + c;
  return nullptr;
}

const char*  //
initialize_globals(int argc, char** argv) {
  g_dst = wuffs_base__make_io_buffer(
      wuffs_base__make_slice_u8(g_dst_array, DST_BUFFER_ARRAY_SIZE),
      wuffs_base__empty_io_buffer_meta());

  g_src = wuffs_base__make_io_buffer(
      wuffs_base__make_slice_u8(g_src_array, SRC_BUFFER_ARRAY_SIZE),
      wuffs_base__empty_io_buffer_meta());

  g_tok = wuffs_base__make_token_buffer(
      wuffs_base__make_slice_token(g_tok_array, TOKEN_BUFFER_ARRAY_SIZE),
      wuffs_base__empty_token_buffer_meta());

  g_curr_token_end_src_index = 0;

  g_token_extension.category = 0;
  g_token_extension.detail = 0;

  g_depth = 0;

  g_ctx = context::none;

  TRY(parse_flags(argc, argv));
  if (g_flags.fail_if_unsandboxed && !g_sandboxed) {
    return "main: unsandboxed";
  }
  const int stdin_fd = 0;
  if (g_flags.remaining_argc >
      ((g_input_file_descriptor != stdin_fd) ? 1 : 0)) {
    return g_usage;
  }

  g_query.reset(g_flags.query_c_string);

  // If the query is non-empty, suppress writing to stdout until we've
  // completed the query.
  g_suppress_write_dst = g_query.next_fragment() ? 1 : 0;
  g_wrote_to_dst = false;

  if (g_flags.input_format == file_format::json) {
    TRY(g_json_decoder
            .initialize(sizeof__wuffs_json__decoder(), WUFFS_VERSION, 0)
            .message());
    g_dec = g_json_decoder.upcast_as__wuffs_base__token_decoder();
  } else {
    TRY(g_cbor_decoder
            .initialize(sizeof__wuffs_cbor__decoder(), WUFFS_VERSION, 0)
            .message());
    g_dec = g_cbor_decoder.upcast_as__wuffs_base__token_decoder();
  }

  if (g_flags.input_allow_json_comments) {
    g_dec->set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_COMMENT_BLOCK, true);
    g_dec->set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_COMMENT_LINE, true);
  }
  if (g_flags.input_allow_json_extra_comma) {
    g_dec->set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_EXTRA_COMMA, true);
  }
  if (g_flags.input_allow_json_inf_nan_numbers) {
    g_dec->set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_INF_NAN_NUMBERS, true);
  }

  // Consume an optional whitespace trailer. This isn't part of the JSON spec,
  // but it works better with line-oriented Unix tools (such as "echo 123 |
  // jsonptr" where it's "echo", not "echo -n") or hand-edited JSON files which
  // can accidentally contain trailing whitespace.
  g_dec->set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_TRAILING_NEW_LINE, true);

  return nullptr;
}

// ----

// ignore_return_value suppresses errors from -Wall -Werror.
static void  //
ignore_return_value(int ignored) {}

const char*  //
read_src() {
  if (g_src.meta.closed) {
    return "main: internal error: read requested on a closed source";
  }
  g_src.compact();
  if (g_src.meta.wi >= g_src.data.len) {
    return "main: g_src buffer is full";
  }
  while (true) {
    ssize_t n = read(g_input_file_descriptor, g_src.writer_pointer(),
                     g_src.writer_length());
    if (n >= 0) {
      g_src.meta.wi += n;
      g_src.meta.closed = n == 0;
      break;
    } else if (errno != EINTR) {
      return strerror(errno);
    }
  }
  return nullptr;
}

const char*  //
flush_dst() {
  while (true) {
    size_t n = g_dst.reader_length();
    if (n == 0) {
      break;
    }
    const int stdout_fd = 1;
    ssize_t i = write(stdout_fd, g_dst.reader_pointer(), n);
    if (i >= 0) {
      g_dst.meta.ri += i;
    } else if (errno != EINTR) {
      return strerror(errno);
    }
  }
  g_dst.compact();
  return nullptr;
}

const char*  //
write_dst(const void* s, size_t n) {
  if (g_suppress_write_dst > 0) {
    return nullptr;
  }
  const uint8_t* p = static_cast<const uint8_t*>(s);
  while (n > 0) {
    size_t i = g_dst.writer_length();
    if (i == 0) {
      const char* z = flush_dst();
      if (z) {
        return z;
      }
      i = g_dst.writer_length();
      if (i == 0) {
        return "main: g_dst buffer is full";
      }
    }

    if (i > n) {
      i = n;
    }
    memcpy(g_dst.data.ptr + g_dst.meta.wi, p, i);
    g_dst.meta.wi += i;
    p += i;
    n -= i;
    g_wrote_to_dst = true;
  }
  return nullptr;
}

// ----

const char*  //
write_literal(uint64_t vbd) {
  const char* ptr = nullptr;
  size_t len = 0;
  if (vbd & WUFFS_BASE__TOKEN__VBD__LITERAL__UNDEFINED) {
    if (g_flags.output_format == file_format::json) {
      // JSON's closest approximation to "undefined" is "null".
      if (g_flags.output_cbor_metadata_as_json_comments) {
        ptr = "/*cbor:undefined*/null";
        len = 22;
      } else {
        ptr = "null";
        len = 4;
      }
    } else {
      ptr = "\xF7";
      len = 1;
    }
  } else if (vbd & WUFFS_BASE__TOKEN__VBD__LITERAL__NULL) {
    if (g_flags.output_format == file_format::json) {
      ptr = "null";
      len = 4;
    } else {
      ptr = "\xF6";
      len = 1;
    }
  } else if (vbd & WUFFS_BASE__TOKEN__VBD__LITERAL__FALSE) {
    if (g_flags.output_format == file_format::json) {
      ptr = "false";
      len = 5;
    } else {
      ptr = "\xF4";
      len = 1;
    }
  } else if (vbd & WUFFS_BASE__TOKEN__VBD__LITERAL__TRUE) {
    if (g_flags.output_format == file_format::json) {
      ptr = "true";
      len = 4;
    } else {
      ptr = "\xF5";
      len = 1;
    }
  } else {
    return "main: internal error: unexpected write_literal argument";
  }
  return write_dst(ptr, len);
}

// ----

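// Illustrative examples (a sketch, assuming that the "truncate" conversions
// report lossy exactly when the narrower IEEE 754 format cannot represent f).
// The byte sequences match RFC 8949 Appendix A:
//
//   1.5 is exact as a 16-bit float:       F9 3E 00
//   100000.0 needs a 32-bit float:        FA 47 C3 50 00
//   1.1 is inexact below 64 bits, so:     FB 3F F1 99 99 99 99 99 9A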
const char*  //
write_number_as_cbor_f64(double f) {
  uint8_t buf[9];
  wuffs_base__lossy_value_u16 lv16 =
      wuffs_base__ieee_754_bit_representation__from_f64_to_u16_truncate(f);
  if (!lv16.lossy) {
    buf[0] = 0xF9;
    wuffs_base__store_u16be__no_bounds_check(&buf[1], lv16.value);
    return write_dst(&buf[0], 3);
  }
  wuffs_base__lossy_value_u32 lv32 =
      wuffs_base__ieee_754_bit_representation__from_f64_to_u32_truncate(f);
  if (!lv32.lossy) {
    buf[0] = 0xFA;
    wuffs_base__store_u32be__no_bounds_check(&buf[1], lv32.value);
    return write_dst(&buf[0], 5);
  }
  buf[0] = 0xFB;
  wuffs_base__store_u64be__no_bounds_check(
      &buf[1], wuffs_base__ieee_754_bit_representation__from_f64_to_u64(f));
  return write_dst(&buf[0], 9);
}

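// Illustrative examples, matching RFC 8949 Appendix A. The base argument
// carries the CBOR major type in its top 3 bits: callers in this file pass
// 0x00 for unsigned integers and 0x20 for negative integers (where u holds
// -1 minus the value).
//
//   base=0x00, u=10:       0A
//   base=0x00, u=24:       18 18
//   base=0x00, u=1000:     19 03 E8
//   base=0x00, u=1000000:  1A 00 0F 42 40
//   base=0x20, u=99:       38 63  (the integer -100)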
const char*  //
write_number_as_cbor_u64(uint8_t base, uint64_t u) {
  uint8_t buf[9];
  if (u < 0x18) {
    buf[0] = base | ((uint8_t)u);
    return write_dst(&buf[0], 1);
  } else if ((u >> 8) == 0) {
    buf[0] = base | 0x18;
    buf[1] = ((uint8_t)u);
    return write_dst(&buf[0], 2);
  } else if ((u >> 16) == 0) {
    buf[0] = base | 0x19;
    wuffs_base__store_u16be__no_bounds_check(&buf[1], ((uint16_t)u));
    return write_dst(&buf[0], 3);
  } else if ((u >> 32) == 0) {
    buf[0] = base | 0x1A;
    wuffs_base__store_u32be__no_bounds_check(&buf[1], ((uint32_t)u));
    return write_dst(&buf[0], 5);
  }
  buf[0] = base | 0x1B;
  wuffs_base__store_u64be__no_bounds_check(&buf[1], u);
  return write_dst(&buf[0], 9);
}

const char*  //
write_number_as_json_f64(wuffs_base__slice_u8 s) {
  double f;
  switch (s.len) {
    case 3:
      f = wuffs_base__ieee_754_bit_representation__from_u16_to_f64(
          wuffs_base__load_u16be__no_bounds_check(s.ptr + 1));
      break;
    case 5:
      f = wuffs_base__ieee_754_bit_representation__from_u32_to_f64(
          wuffs_base__load_u32be__no_bounds_check(s.ptr + 1));
      break;
    case 9:
      f = wuffs_base__ieee_754_bit_representation__from_u64_to_f64(
          wuffs_base__load_u64be__no_bounds_check(s.ptr + 1));
      break;
    default:
      return "main: internal error: unexpected write_number_as_json_f64 len";
  }
  uint8_t buf[512];
  const uint32_t precision = 0;
  size_t n = wuffs_base__render_number_f64(
      wuffs_base__make_slice_u8(&buf[0], sizeof buf), f, precision,
      WUFFS_BASE__RENDER_NUMBER_FXX__JUST_ENOUGH_PRECISION);

  if (!g_flags.output_json_inf_nan_numbers) {
    // JSON numbers don't include Infinities or NaNs. For such numbers, their
    // IEEE 754 bit representation's 11 exponent bits are all on.
    uint64_t u = wuffs_base__ieee_754_bit_representation__from_f64_to_u64(f);
    if (((u >> 52) & 0x7FF) == 0x7FF) {
      if (g_flags.output_cbor_metadata_as_json_comments) {
        TRY(write_dst("/*cbor:", 7));
        TRY(write_dst(&buf[0], n));
        TRY(write_dst("*/", 2));
      }
      return write_dst("null", 4);
    }
  }

  return write_dst(&buf[0], n);
}

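// Illustrative example for JSON output (a sketch, assuming such a token
// reaches this function): the 9 CBOR bytes 3B 80 00 00 00 00 00 00 00 hold
// x = 9223372036854775808, denoting the integer (-1 - x), so the text written
// would be "-9223372036854775809". When x is 0xFFFFFFFFFFFFFFFF, computing
// (1 + x) in a uint64_t wraps around to zero, hence the special case below.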
const char*  //
write_cbor_minus_1_minus_x(wuffs_base__slice_u8 s) {
  if (g_flags.output_format == file_format::cbor) {
    return write_dst(s.ptr, s.len);
  }

  if (s.len != 9) {
    return "main: internal error: invalid ETC__MINUS_1_MINUS_X token length";
  }
  uint64_t u = 1 + wuffs_base__load_u64be__no_bounds_check(s.ptr + 1);
  if (u == 0) {
    // See the cbor.TOKEN_VALUE_MINOR__MINUS_1_MINUS_X comment re overflow.
    return write_dst("-18446744073709551616", 21);
  }
  uint8_t buf[1 + WUFFS_BASE__U64__BYTE_LENGTH__MAX_INCL];
  uint8_t* b = &buf[0];
  *b++ = '-';
  size_t n = wuffs_base__render_number_u64(
      wuffs_base__make_slice_u8(b, WUFFS_BASE__U64__BYTE_LENGTH__MAX_INCL), u,
      WUFFS_BASE__RENDER_NUMBER_XXX__DEFAULT_OPTIONS);
  return write_dst(&buf[0], 1 + n);
}

const char*  //
write_cbor_simple_value(uint64_t tag, wuffs_base__slice_u8 s) {
  if (g_flags.output_format == file_format::cbor) {
    return write_dst(s.ptr, s.len);
  }

  if (!g_flags.output_cbor_metadata_as_json_comments) {
    return nullptr;
  }
  uint8_t buf[WUFFS_BASE__U64__BYTE_LENGTH__MAX_INCL];
  size_t n = wuffs_base__render_number_u64(
      wuffs_base__make_slice_u8(&buf[0],
                                WUFFS_BASE__U64__BYTE_LENGTH__MAX_INCL),
      tag, WUFFS_BASE__RENDER_NUMBER_XXX__DEFAULT_OPTIONS);
  TRY(write_dst("/*cbor:simple", 13));
  TRY(write_dst(&buf[0], n));
  return write_dst("*/null", 6);
}

const char*  //
write_cbor_tag(uint64_t tag, wuffs_base__slice_u8 s) {
  if (g_flags.output_format == file_format::cbor) {
    return write_dst(s.ptr, s.len);
  }

  if (!g_flags.output_cbor_metadata_as_json_comments) {
    return nullptr;
  }
  uint8_t buf[WUFFS_BASE__U64__BYTE_LENGTH__MAX_INCL];
  size_t n = wuffs_base__render_number_u64(
      wuffs_base__make_slice_u8(&buf[0],
                                WUFFS_BASE__U64__BYTE_LENGTH__MAX_INCL),
      tag, WUFFS_BASE__RENDER_NUMBER_XXX__DEFAULT_OPTIONS);
  TRY(write_dst("/*cbor:tag", 10));
  TRY(write_dst(&buf[0], n));
  return write_dst("*/", 2);
}

const char*  //
write_number(uint64_t vbd, wuffs_base__slice_u8 s) {
  if (g_flags.output_format == file_format::json) {
    const uint64_t cfp_fbbe_fifb =
        WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_FLOATING_POINT |
        WUFFS_BASE__TOKEN__VBD__NUMBER__FORMAT_BINARY_BIG_ENDIAN |
        WUFFS_BASE__TOKEN__VBD__NUMBER__FORMAT_IGNORE_FIRST_BYTE;
    if (g_flags.input_format == file_format::json) {
      return write_dst(s.ptr, s.len);
    } else if ((vbd & cfp_fbbe_fifb) == cfp_fbbe_fifb) {
      return write_number_as_json_f64(s);
    }

    // From here on, (g_flags.output_format == file_format::cbor).
  } else if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__FORMAT_TEXT) {
    // First try to parse s as an integer. Something like
    // "1180591620717411303424" is a valid number (in the JSON sense) but will
    // overflow int64_t or uint64_t, so fall back to parsing it as a float64.
    if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_INTEGER_SIGNED) {
      if ((s.len > 0) && (s.ptr[0] == '-')) {
        wuffs_base__result_i64 ri = wuffs_base__parse_number_i64(
            s, WUFFS_BASE__PARSE_NUMBER_XXX__DEFAULT_OPTIONS);
        if (ri.status.is_ok()) {
          return write_number_as_cbor_u64(0x20, ~ri.value);
        }
      } else {
        wuffs_base__result_u64 ru = wuffs_base__parse_number_u64(
            s, WUFFS_BASE__PARSE_NUMBER_XXX__DEFAULT_OPTIONS);
        if (ru.status.is_ok()) {
          return write_number_as_cbor_u64(0x00, ru.value);
        }
      }
    }

    if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_FLOATING_POINT) {
      wuffs_base__result_f64 rf = wuffs_base__parse_number_f64(
          s, WUFFS_BASE__PARSE_NUMBER_XXX__DEFAULT_OPTIONS);
      if (rf.status.is_ok()) {
        return write_number_as_cbor_f64(rf.value);
      }
    }
  } else if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_NEG_INF) {
    return write_dst("\xF9\xFC\x00", 3);
  } else if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_POS_INF) {
    return write_dst("\xF9\x7C\x00", 3);
  } else if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_NEG_NAN) {
    return write_dst("\xF9\xFF\xFF", 3);
  } else if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_POS_NAN) {
    return write_dst("\xF9\x7F\xFF", 3);
  }

fail:
  return "main: internal error: unexpected write_number argument";
}

const char*  //
write_inline_integer(uint64_t x, bool x_is_signed, wuffs_base__slice_u8 s) {
  bool is_key = in_dict_before_key();
  g_query.restart_and_match_unsigned_number(
      is_key && g_query.is_at(g_depth) && !x_is_signed, x);

  if (g_flags.output_format == file_format::cbor) {
    return write_dst(s.ptr, s.len);
  }

  if (is_key) {
    TRY(write_dst("\"", 1));
  }

  // Adding the two ETC__BYTE_LENGTH__ETC constants is overkill, but it's
  // simpler (for producing a constant-expression array size) than taking the
  // maximum of the two.
  uint8_t buf[WUFFS_BASE__I64__BYTE_LENGTH__MAX_INCL +
              WUFFS_BASE__U64__BYTE_LENGTH__MAX_INCL];
  wuffs_base__slice_u8 dst = wuffs_base__make_slice_u8(&buf[0], sizeof buf);
  size_t n =
      x_is_signed
          ? wuffs_base__render_number_i64(
                dst, (int64_t)x, WUFFS_BASE__RENDER_NUMBER_XXX__DEFAULT_OPTIONS)
          : wuffs_base__render_number_u64(
                dst, x, WUFFS_BASE__RENDER_NUMBER_XXX__DEFAULT_OPTIONS);
  TRY(write_dst(&buf[0], n));

  if (is_key) {
    TRY(write_dst("\"", 1));
  }
  return nullptr;
}

// ----

uint8_t  //
hex_digit(uint8_t nibble) {
  nibble &= 0x0F;
  if (nibble <= 9) {
    return '0' + nibble;
  }
  return ('A' - 10) + nibble;
}

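// Illustrative examples of the length prefix written below (a sketch, using
// the CBOR head encoding for major type 2, byte strings at 0x40, and major
// type 3, text strings at 0x60):
//
//   5 bytes of UTF-8 text:  65
//   5 bytes of binary:      45
//   200 bytes of text:      78 C8
//   1000 bytes of binary:   59 03 E8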
const char*  //
flush_cbor_output_string() {
  uint8_t prefix[3];
  prefix[0] = g_cbor_output_string_is_utf_8 ? 0x60 : 0x40;
  if (g_cbor_output_string_length < 0x18) {
    prefix[0] |= g_cbor_output_string_length;
    TRY(write_dst(&prefix[0], 1));
  } else if (g_cbor_output_string_length <= 0xFF) {
    prefix[0] |= 0x18;
    prefix[1] = g_cbor_output_string_length;
    TRY(write_dst(&prefix[0], 2));
  } else if (g_cbor_output_string_length <= 0xFFFF) {
    prefix[0] |= 0x19;
    prefix[1] = g_cbor_output_string_length >> 8;
    prefix[2] = g_cbor_output_string_length;
    TRY(write_dst(&prefix[0], 3));
  } else {
    return "main: internal error: CBOR string output is too long";
  }

  size_t n = g_cbor_output_string_length;
  g_cbor_output_string_length = 0;
  return write_dst(&g_spool_array[0], n);
}

1281const char* //
Nigel Taoee6927f2020-07-27 12:08:33 +10001282write_cbor_output_string(wuffs_base__slice_u8 s, bool finish) {
Nigel Taoea532452020-07-27 00:03:00 +10001283 // Check that g_spool_array can hold any UTF-8 code point.
1284 if (SPOOL_ARRAY_SIZE < 4) {
1285 return "main: internal error: SPOOL_ARRAY_SIZE is too short";
Nigel Tao168f60a2020-07-14 13:19:33 +10001286 }
1287
Nigel Taoee6927f2020-07-27 12:08:33 +10001288 uint8_t* ptr = s.ptr;
1289 size_t len = s.len;
Nigel Tao168f60a2020-07-14 13:19:33 +10001290 while (len > 0) {
Nigel Taoea532452020-07-27 00:03:00 +10001291 size_t available = SPOOL_ARRAY_SIZE - g_cbor_output_string_length;
Nigel Tao168f60a2020-07-14 13:19:33 +10001292 if (available >= len) {
Nigel Taoea532452020-07-27 00:03:00 +10001293 memcpy(&g_spool_array[g_cbor_output_string_length], ptr, len);
Nigel Tao168f60a2020-07-14 13:19:33 +10001294 g_cbor_output_string_length += len;
1295 ptr += len;
1296 len = 0;
1297 break;
1298
1299 } else if (available > 0) {
1300 if (!g_cbor_output_string_is_multiple_chunks) {
1301 g_cbor_output_string_is_multiple_chunks = true;
1302 TRY(write_dst(g_cbor_output_string_is_utf_8 ? "\x7F" : "\x5F", 1));
Nigel Tao3b486982020-02-27 15:05:59 +11001303 }
Nigel Tao168f60a2020-07-14 13:19:33 +10001304
1305 if (g_cbor_output_string_is_utf_8) {
1306 // Walk the end backwards to a UTF-8 boundary, so that each chunk of
1307 // the multi-chunk string is also valid UTF-8.
1308 while (available > 0) {
Nigel Tao702c7b22020-07-22 15:42:54 +10001309 wuffs_base__utf_8__next__output o =
1310 wuffs_base__utf_8__next_from_end(ptr, available);
Nigel Tao168f60a2020-07-14 13:19:33 +10001311 if ((o.code_point != WUFFS_BASE__UNICODE_REPLACEMENT_CHARACTER) ||
1312 (o.byte_length != 1)) {
1313 break;
1314 }
1315 available--;
1316 }
1317 }
1318
Nigel Taoea532452020-07-27 00:03:00 +10001319 memcpy(&g_spool_array[g_cbor_output_string_length], ptr, available);
Nigel Tao168f60a2020-07-14 13:19:33 +10001320 g_cbor_output_string_length += available;
1321 ptr += available;
1322 len -= available;
Nigel Tao3b486982020-02-27 15:05:59 +11001323 }
1324
Nigel Tao168f60a2020-07-14 13:19:33 +10001325 TRY(flush_cbor_output_string());
1326 }
Nigel Taob9ad34f2020-03-03 12:44:01 +11001327
Nigel Tao168f60a2020-07-14 13:19:33 +10001328 if (finish) {
1329 TRY(flush_cbor_output_string());
1330 if (g_cbor_output_string_is_multiple_chunks) {
1331 TRY(write_dst("\xFF", 1));
1332 }
1333 }
1334 return nullptr;
1335}
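
// Illustrative sketch, not part of the original jsonptr source: one way to
// find a UTF-8 boundary at or before ptr[available] without calling into
// Wuffs. The name example_trim_to_utf_8_boundary is hypothetical. Unlike the
// wuffs_base__utf_8__next_from_end loop above, it does not validate anything.
// It assumes that the bytes are a prefix of well-formed UTF-8, so that the
// only possible defect is a final code point that was cut short.
#include <stddef.h>
#include <stdint.h>

static inline size_t  //
example_trim_to_utf_8_boundary(const uint8_t* ptr, size_t available) {
  // Step back over at most 3 trailing continuation bytes (0b10xxxxxx).
  size_t i = available;
  while ((i > 0) && ((available - i) < 3) && ((ptr[i - 1] & 0xC0) == 0x80)) {
    i--;
  }
  if (i == 0) {
    return 0;  // No lead byte in range, so no chunk can safely end here.
  }

  // ptr[i - 1] should now be a lead byte. Its high bits give the length of
  // the code point that it starts.
  uint8_t lead = ptr[i - 1];
  size_t need = 1;
  if ((lead & 0xE0) == 0xC0) {
    need = 2;
  } else if ((lead & 0xF0) == 0xE0) {
    need = 3;
  } else if ((lead & 0xF8) == 0xF0) {
    need = 4;
  }

  // If the final code point is incomplete, end the chunk just before it.
  size_t have = available - (i - 1);
  return (have < need) ? (i - 1) : available;
}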

const char*  //
flush_json_output_byte_string(bool finish) {
  uint8_t* ptr = &g_spool_array[0];
  size_t len = g_json_output_byte_string_length;
  while (len > 0) {
    wuffs_base__transform__output o = wuffs_base__base_64__encode(
        g_dst.writer_slice(), wuffs_base__make_slice_u8(ptr, len), finish,
        WUFFS_BASE__BASE_64__URL_ALPHABET);
    g_dst.meta.wi += o.num_dst;
    ptr += o.num_src;
    len -= o.num_src;
    if (o.status.repr == nullptr) {
      if (len != 0) {
        return "main: internal error: inconsistent spool length";
      }
      g_json_output_byte_string_length = 0;
      break;
    } else if (o.status.repr == wuffs_base__suspension__short_read) {
      memmove(&g_spool_array[0], ptr, len);
      g_json_output_byte_string_length = len;
      break;
    } else if (o.status.repr != wuffs_base__suspension__short_write) {
      return o.status.message();
    }
    TRY(flush_dst());
  }
  return nullptr;
}

const char*  //
write_json_output_byte_string(wuffs_base__slice_u8 s, bool finish) {
  uint8_t* ptr = s.ptr;
  size_t len = s.len;
  while (len > 0) {
    size_t available = SPOOL_ARRAY_SIZE - g_json_output_byte_string_length;
    if (available >= len) {
      memcpy(&g_spool_array[g_json_output_byte_string_length], ptr, len);
      g_json_output_byte_string_length += len;
      ptr += len;
      len = 0;
      break;

    } else if (available > 0) {
      memcpy(&g_spool_array[g_json_output_byte_string_length], ptr, available);
      g_json_output_byte_string_length += available;
      ptr += available;
      len -= available;
    }

    TRY(flush_json_output_byte_string(false));
  }

  if (finish) {
    TRY(flush_json_output_byte_string(true));
  }
  return nullptr;
}

// ----

const char*  //
handle_unicode_code_point(uint32_t ucp) {
  if (g_flags.output_format == file_format::json) {
    if (ucp < 0x0020) {
      switch (ucp) {
        case '\b':
          return write_dst("\\b", 2);
        case '\f':
          return write_dst("\\f", 2);
        case '\n':
          return write_dst("\\n", 2);
        case '\r':
          return write_dst("\\r", 2);
        case '\t':
          return write_dst("\\t", 2);
      }

      // Other bytes less than 0x0020 are valid UTF-8 but not valid in a
      // JSON string. They need to remain escaped.
      uint8_t esc6[6];
      esc6[0] = '\\';
      esc6[1] = 'u';
      esc6[2] = '0';
      esc6[3] = '0';
      esc6[4] = hex_digit(ucp >> 4);
      esc6[5] = hex_digit(ucp >> 0);
      return write_dst(&esc6[0], 6);

    } else if (ucp == '\"') {
      return write_dst("\\\"", 2);

    } else if (ucp == '\\') {
      return write_dst("\\\\", 2);
    }
  }

  uint8_t u[WUFFS_BASE__UTF_8__BYTE_LENGTH__MAX_INCL];
  size_t n = wuffs_base__utf_8__encode(
      wuffs_base__make_slice_u8(&u[0],
                                WUFFS_BASE__UTF_8__BYTE_LENGTH__MAX_INCL),
      ucp);
  if (n == 0) {
    return "main: internal error: unexpected Unicode code point";
  }

  if (g_flags.output_format == file_format::json) {
    return write_dst(&u[0], n);
  }
  return write_cbor_output_string(wuffs_base__make_slice_u8(&u[0], n), false);
}
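
// Illustrative sketch, not part of the original jsonptr source: how a control
// character with no short escape, such as U+001F, becomes a six-byte JSON
// escape. The name example_json_escape_control is hypothetical. The real code
// above calls hex_digit, which is defined elsewhere in this file, so the
// upper-case digit table used here is an assumption.
#include <stdint.h>

static inline void  //
example_json_escape_control(uint8_t c, char out[6]) {
  static const char digits[] = "0123456789ABCDEF";
  out[0] = '\\';
  out[1] = 'u';
  out[2] = '0';
  out[3] = '0';
  out[4] = digits[(c >> 4) & 0x0F];  // High nibble.
  out[5] = digits[c & 0x0F];         // Low nibble.
}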

const char*  //
write_json_output_text_string(wuffs_base__slice_u8 s) {
  uint8_t* ptr = s.ptr;
  size_t len = s.len;
restart:
  while (true) {
    size_t i;
    for (i = 0; i < len; i++) {
      uint8_t c = ptr[i];
      if ((c == '"') || (c == '\\') || (c < 0x20)) {
        TRY(write_dst(ptr, i));
        TRY(handle_unicode_code_point(c));
        ptr += i + 1;
        len -= i + 1;
        goto restart;
      }
    }
    TRY(write_dst(ptr, len));
    break;
  }
  return nullptr;
}

const char*  //
handle_string(uint64_t vbd,
              wuffs_base__slice_u8 s,
              bool start_of_token_chain,
              bool continued) {
  if (start_of_token_chain) {
    if (g_flags.output_format == file_format::json) {
      if (g_flags.output_cbor_metadata_as_json_comments &&
          !(vbd & WUFFS_BASE__TOKEN__VBD__STRING__CHAIN_MUST_BE_UTF_8)) {
        TRY(write_dst("/*cbor:base64url*/\"", 19));
        g_json_output_byte_string_length = 0;
      } else {
        TRY(write_dst("\"", 1));
      }
    } else {
      g_cbor_output_string_length = 0;
      g_cbor_output_string_is_multiple_chunks = false;
      g_cbor_output_string_is_utf_8 =
          vbd & WUFFS_BASE__TOKEN__VBD__STRING__CHAIN_MUST_BE_UTF_8;
    }
    g_query.restart_fragment(in_dict_before_key() && g_query.is_at(g_depth));
  }

  if (vbd & WUFFS_BASE__TOKEN__VBD__STRING__CONVERT_0_DST_1_SRC_DROP) {
    // No-op.
  } else if (vbd & WUFFS_BASE__TOKEN__VBD__STRING__CONVERT_1_DST_1_SRC_COPY) {
    if (g_flags.output_format == file_format::json) {
      if (g_flags.input_format == file_format::json) {
        TRY(write_dst(s.ptr, s.len));
      } else if (vbd & WUFFS_BASE__TOKEN__VBD__STRING__CHAIN_MUST_BE_UTF_8) {
        TRY(write_json_output_text_string(s));
      } else {
        TRY(write_json_output_byte_string(s, false));
      }
    } else {
      TRY(write_cbor_output_string(s, false));
    }
    g_query.incremental_match_slice(s.ptr, s.len);
  } else {
    return "main: internal error: unexpected string-token conversion";
  }

  if (continued) {
    return nullptr;
  }

  if (g_flags.output_format == file_format::json) {
    if (!(vbd & WUFFS_BASE__TOKEN__VBD__STRING__CHAIN_MUST_BE_UTF_8)) {
      TRY(write_json_output_byte_string(wuffs_base__empty_slice_u8(), true));
    }
    TRY(write_dst("\"", 1));
  } else {
    TRY(write_cbor_output_string(wuffs_base__empty_slice_u8(), true));
  }
  return nullptr;
}

// ----

const char*  //
handle_token(wuffs_base__token t, bool start_of_token_chain) {
  do {
    int64_t vbc = t.value_base_category();
    uint64_t vbd = t.value_base_detail();
    uint64_t token_length = t.length();
    wuffs_base__slice_u8 tok = wuffs_base__make_slice_u8(
        g_src.data.ptr + g_curr_token_end_src_index - token_length,
        token_length);

    // Handle ']' or '}'.
    if ((vbc == WUFFS_BASE__TOKEN__VBC__STRUCTURE) &&
        (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__POP)) {
      if (g_query.is_at(g_depth)) {
        return "main: no match for query";
      }
      if (g_depth <= 0) {
        return "main: internal error: inconsistent g_depth";
      }
      g_depth--;

      if (g_query.matched_all() && (g_depth >= g_flags.max_output_depth)) {
        g_suppress_write_dst--;
        // '…' is U+2026 HORIZONTAL ELLIPSIS, which is 3 UTF-8 bytes.
        if (g_flags.output_format == file_format::json) {
          TRY(write_dst((vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__FROM_LIST)
                            ? "\"[…]\""
                            : "\"{…}\"",
                        7));
        } else {
          TRY(write_dst((vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__FROM_LIST)
                            ? "\x65[…]"
                            : "\x65{…}",
                        6));
        }
      } else if (g_flags.output_format == file_format::json) {
        // Write preceding whitespace.
        if ((g_ctx != context::in_list_after_bracket) &&
            (g_ctx != context::in_dict_after_brace) &&
            !g_flags.compact_output) {
          if (g_flags.output_json_extra_comma) {
            TRY(write_dst(",\n", 2));
          } else {
            TRY(write_dst("\n", 1));
          }
          for (uint32_t i = 0; i < g_depth; i++) {
            TRY(write_dst(
                g_flags.tabs ? INDENT_TAB_STRING : INDENT_SPACES_STRING,
                g_flags.tabs ? 1 : g_flags.spaces));
          }
        }

        TRY(write_dst(
            (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__FROM_LIST) ? "]" : "}",
            1));
      } else {
        TRY(write_dst("\xFF", 1));
      }

      g_ctx = (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__TO_LIST)
                  ? context::in_list_after_value
                  : context::in_dict_after_key;
      goto after_value;
    }

    // Write preceding whitespace and punctuation, if it wasn't ']', '}' or a
    // continuation of a multi-token chain.
    if (start_of_token_chain) {
      if (g_flags.output_format != file_format::json) {
        // No-op.
      } else if (g_ctx == context::in_dict_after_key) {
        TRY(write_dst(": ", g_flags.compact_output ? 1 : 2));
      } else if (g_ctx != context::none) {
        if ((g_ctx == context::in_dict_after_brace) ||
            (g_ctx == context::in_dict_after_value)) {
          // Reject dict keys that aren't UTF-8 strings or non-negative
          // integers, which could otherwise happen with -i=cbor -o=json.
          if (vbc == WUFFS_BASE__TOKEN__VBC__INLINE_INTEGER_UNSIGNED) {
            // No-op.
          } else if ((vbc == WUFFS_BASE__TOKEN__VBC__STRING) &&
                     (vbd &
                      WUFFS_BASE__TOKEN__VBD__STRING__CHAIN_MUST_BE_UTF_8)) {
            // No-op.
          } else {
            return "main: cannot convert CBOR non-text-string to JSON map key";
          }
        }
        if ((g_ctx == context::in_list_after_value) ||
            (g_ctx == context::in_dict_after_value)) {
          TRY(write_dst(",", 1));
        }
        if (!g_flags.compact_output) {
          TRY(write_dst("\n", 1));
          for (size_t i = 0; i < g_depth; i++) {
            TRY(write_dst(
                g_flags.tabs ? INDENT_TAB_STRING : INDENT_SPACES_STRING,
                g_flags.tabs ? 1 : g_flags.spaces));
          }
        }
      }

      bool query_matched_fragment = false;
      if (g_query.is_at(g_depth)) {
        switch (g_ctx) {
          case context::in_list_after_bracket:
          case context::in_list_after_value:
            query_matched_fragment = g_query.tick();
            break;
          case context::in_dict_after_key:
            query_matched_fragment = g_query.matched_fragment();
            break;
          default:
            break;
        }
      }
      if (!query_matched_fragment) {
        // No-op.
      } else if (!g_query.next_fragment()) {
        // There is no next fragment. We have matched the complete query, and
        // the upcoming JSON value is the result of that query.
        //
        // Un-suppress writing to stdout and reset the g_ctx and g_depth as if
        // we were about to decode a top-level value. This makes any subsequent
        // indentation be relative to this point, and we will return g_eod
        // after the upcoming JSON value is complete.
        if (g_suppress_write_dst != 1) {
          return "main: internal error: inconsistent g_suppress_write_dst";
        }
        g_suppress_write_dst = 0;
        g_ctx = context::none;
        g_depth = 0;
      } else if ((vbc != WUFFS_BASE__TOKEN__VBC__STRUCTURE) ||
                 !(vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__PUSH)) {
        // The query has moved on to the next fragment but the upcoming JSON
        // value is not a container.
        return "main: no match for query";
      }
    }

    // Handle the token itself: either a container ('[' or '{') or a simple
    // value: string (a chain of raw or escaped parts), literal or number.
    switch (vbc) {
      case WUFFS_BASE__TOKEN__VBC__STRUCTURE:
        if (g_query.matched_all() && (g_depth >= g_flags.max_output_depth)) {
          g_suppress_write_dst++;
        } else if (g_flags.output_format == file_format::json) {
          TRY(write_dst(
              (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__TO_LIST) ? "[" : "{",
              1));
        } else {
          TRY(write_dst((vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__TO_LIST)
                            ? "\x9F"
                            : "\xBF",
                        1));
        }
        g_depth++;
        g_ctx = (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__TO_LIST)
                    ? context::in_list_after_bracket
                    : context::in_dict_after_brace;
        return nullptr;

      case WUFFS_BASE__TOKEN__VBC__STRING:
        TRY(handle_string(vbd, tok, start_of_token_chain, t.continued()));
        if (t.continued()) {
          return nullptr;
        }
        goto after_value;

      case WUFFS_BASE__TOKEN__VBC__UNICODE_CODE_POINT:
        if (!t.continued()) {
          return "main: internal error: unexpected non-continued UCP token";
        }
        TRY(handle_unicode_code_point(vbd));
        g_query.incremental_match_code_point(vbd);
        return nullptr;

      case WUFFS_BASE__TOKEN__VBC__LITERAL:
        TRY(write_literal(vbd));
        goto after_value;

      case WUFFS_BASE__TOKEN__VBC__NUMBER:
        TRY(write_number(vbd, tok));
        goto after_value;

      case WUFFS_BASE__TOKEN__VBC__INLINE_INTEGER_SIGNED:
      case WUFFS_BASE__TOKEN__VBC__INLINE_INTEGER_UNSIGNED: {
        bool x_is_signed = vbc == WUFFS_BASE__TOKEN__VBC__INLINE_INTEGER_SIGNED;
        uint64_t x = x_is_signed
                         ? ((uint64_t)(t.value_base_detail__sign_extended()))
                         : vbd;
        if (t.continued()) {
          if (tok.len != 0) {
            return "main: internal error: unexpected to-be-extended length";
          }
          g_token_extension.category = vbc;
          g_token_extension.detail = x;
          return nullptr;
        }
        TRY(write_inline_integer(x, x_is_signed, tok));
        goto after_value;
      }
    }

    int64_t ext = t.value_extension();
    if (ext >= 0) {
      uint64_t x = (g_token_extension.detail
                    << WUFFS_BASE__TOKEN__VALUE_EXTENSION__NUM_BITS) |
                   ((uint64_t)ext);
      switch (g_token_extension.category) {
        case WUFFS_BASE__TOKEN__VBC__INLINE_INTEGER_SIGNED:
        case WUFFS_BASE__TOKEN__VBC__INLINE_INTEGER_UNSIGNED:
          TRY(write_inline_integer(
              x,
              g_token_extension.category ==
                  WUFFS_BASE__TOKEN__VBC__INLINE_INTEGER_SIGNED,
              tok));
          g_token_extension.category = 0;
          g_token_extension.detail = 0;
          goto after_value;
        case CATEGORY_CBOR_TAG:
          TRY(write_cbor_tag(x, tok));
          g_token_extension.category = 0;
          g_token_extension.detail = 0;
          return nullptr;
      }
    }

    if (t.value_major() == WUFFS_CBOR__TOKEN_VALUE_MAJOR) {
      uint64_t value_minor = t.value_minor();
      if (value_minor & WUFFS_CBOR__TOKEN_VALUE_MINOR__MINUS_1_MINUS_X) {
        TRY(write_cbor_minus_1_minus_x(tok));
        goto after_value;
      } else if (value_minor & WUFFS_CBOR__TOKEN_VALUE_MINOR__SIMPLE_VALUE) {
        TRY(write_cbor_simple_value(vbd, tok));
        goto after_value;
      } else if (value_minor & WUFFS_CBOR__TOKEN_VALUE_MINOR__TAG) {
        if (t.continued()) {
          if (tok.len != 0) {
            return "main: internal error: unexpected to-be-extended length";
          }
          g_token_extension.category = CATEGORY_CBOR_TAG;
          g_token_extension.detail = vbd;
          return nullptr;
        }
        return write_cbor_tag(vbd, tok);
      }
    }

    // Return an error if we didn't match the (value_major, value_minor) or
    // (vbc, vbd) pair.
    return "main: internal error: unexpected token";
  } while (0);

  // Book-keeping after completing a value (whether a container value or a
  // simple value). Empty parent containers are no longer empty. If the parent
  // container is a "{...}" object, toggle between keys and values.
after_value:
  if (g_depth == 0) {
    return g_eod;
  }
  switch (g_ctx) {
    case context::in_list_after_bracket:
      g_ctx = context::in_list_after_value;
      break;
    case context::in_dict_after_brace:
      g_ctx = context::in_dict_after_key;
      break;
    case context::in_dict_after_key:
      g_ctx = context::in_dict_after_value;
      break;
    case context::in_dict_after_value:
      g_ctx = context::in_dict_after_key;
      break;
    default:
      break;
  }
  return nullptr;
}
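
// Illustrative sketch, not part of the original jsonptr source: the
// after_value book-keeping above, restated as a pure transition function. The
// names example_context and example_next_context are hypothetical. The
// enumerators mirror the context:: values that handle_token uses, although the
// real enum is declared elsewhere in this file.
enum class example_context {
  none,
  in_list_after_bracket,
  in_list_after_value,
  in_dict_after_brace,
  in_dict_after_key,
  in_dict_after_value,
};

static inline example_context  //
example_next_context(example_context c) {
  switch (c) {
    case example_context::in_list_after_bracket:
      return example_context::in_list_after_value;  // List is now non-empty.
    case example_context::in_dict_after_brace:
      return example_context::in_dict_after_key;  // First key completed.
    case example_context::in_dict_after_key:
      return example_context::in_dict_after_value;  // Value completed.
    case example_context::in_dict_after_value:
      return example_context::in_dict_after_key;  // Next key completed.
    default:
      return c;  // none and in_list_after_value are unchanged.
  }
}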

const char*  //
main1(int argc, char** argv) {
  TRY(initialize_globals(argc, argv));

  bool start_of_token_chain = true;
  while (true) {
    wuffs_base__status status = g_dec->decode_tokens(
        &g_tok, &g_src,
        wuffs_base__make_slice_u8(g_work_buffer_array, WORK_BUFFER_ARRAY_SIZE));

    while (g_tok.meta.ri < g_tok.meta.wi) {
      wuffs_base__token t = g_tok.data.ptr[g_tok.meta.ri++];
      uint64_t n = t.length();
      if ((g_src.meta.ri - g_curr_token_end_src_index) < n) {
        return "main: internal error: inconsistent g_src indexes";
      }
      g_curr_token_end_src_index += n;

      // Skip filler tokens (e.g. whitespace).
      if (t.value_base_category() == WUFFS_BASE__TOKEN__VBC__FILLER) {
        start_of_token_chain = !t.continued();
        continue;
      }

      const char* z = handle_token(t, start_of_token_chain);
      start_of_token_chain = !t.continued();
      if (z == nullptr) {
        continue;
      } else if (z == g_eod) {
        goto end_of_data;
      }
      return z;
    }

    if (status.repr == nullptr) {
      return "main: internal error: unexpected end of token stream";
    } else if (status.repr == wuffs_base__suspension__short_read) {
      if (g_curr_token_end_src_index != g_src.meta.ri) {
        return "main: internal error: inconsistent g_src indexes";
      }
      TRY(read_src());
      g_curr_token_end_src_index = g_src.meta.ri;
    } else if (status.repr == wuffs_base__suspension__short_write) {
      g_tok.compact();
    } else {
      return status.message();
    }
  }
end_of_data:

  // With a non-empty g_query, don't try to consume trailing whitespace or
  // confirm that we've processed all the tokens.
  if (g_flags.query_c_string && *g_flags.query_c_string) {
    return nullptr;
  }

  // Check that we've exhausted the input.
  if ((g_src.meta.ri == g_src.meta.wi) && !g_src.meta.closed) {
    TRY(read_src());
  }
  if ((g_src.meta.ri < g_src.meta.wi) || !g_src.meta.closed) {
    return "main: valid JSON|CBOR followed by further (unexpected) data";
  }

  // Check that we've used all of the decoded tokens, other than trailing
  // filler tokens. For example, "true\n" is valid JSON (and fully consumed
  // with WUFFS_JSON__QUIRK_ALLOW_TRAILING_NEW_LINE enabled) with a trailing
  // filler token for the "\n".
  for (; g_tok.meta.ri < g_tok.meta.wi; g_tok.meta.ri++) {
    if (g_tok.data.ptr[g_tok.meta.ri].value_base_category() !=
        WUFFS_BASE__TOKEN__VBC__FILLER) {
      return "main: internal error: decoded OK but unprocessed tokens remain";
    }
  }

  return nullptr;
}

int  //
compute_exit_code(const char* status_msg) {
  if (!status_msg) {
    return 0;
  }
  size_t n;
  if (status_msg == g_usage) {
    n = strlen(status_msg);
  } else {
    n = strnlen(status_msg, 2047);
    if (n >= 2047) {
      status_msg = "main: internal error: error message is too long";
      n = strnlen(status_msg, 2047);
    }
  }
  const int stderr_fd = 2;
  ignore_return_value(write(stderr_fd, status_msg, n));
  ignore_return_value(write(stderr_fd, "\n", 1));
  // Return an exit code of 1 for regular (foreseen) errors, e.g. badly
  // formatted or unsupported input.
  //
  // Return an exit code of 2 for internal (exceptional) errors, e.g. defensive
  // run-time checks found that an internal invariant did not hold.
  //
  // Automated testing, including badly formatted inputs, can therefore
  // discriminate between expected failure (exit code 1) and unexpected failure
  // (other non-zero exit codes). Specifically, exit code 2 for internal
  // invariant violation, exit code 139 (which is 128 + SIGSEGV on x86_64
  // Linux) for a segmentation fault (e.g. null pointer dereference).
  return strstr(status_msg, "internal error:") ? 2 : 1;
}
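
// Illustrative sketch, not part of the original jsonptr source: how a test
// harness might interpret the exit codes that the comment in compute_exit_code
// describes. The names example_outcome and example_classify_exit_code are
// hypothetical.
enum class example_outcome {
  success,             // Exit code 0.
  expected_failure,    // Exit code 1: badly formatted or unsupported input.
  internal_error,      // Exit code 2: an internal invariant did not hold.
  unexpected_failure,  // Anything else, e.g. 139 for a segmentation fault.
};

static inline example_outcome  //
example_classify_exit_code(int exit_code) {
  switch (exit_code) {
    case 0:
      return example_outcome::success;
    case 1:
      return example_outcome::expected_failure;
    case 2:
      return example_outcome::internal_error;
    default:
      return example_outcome::unexpected_failure;
  }
}
// A fuzzing or regression test would then typically treat everything other
// than success and expected_failure as a bug to be reported.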

int  //
main(int argc, char** argv) {
  // Look for an input filename (the first non-flag argument) in argv. If there
  // is one, open it (but do not read from it) before we self-impose a sandbox.
  //
  // Flags start with "-", unless they come after a bare "--" arg.
  {
    bool dash_dash = false;
    int a;
    for (a = 1; a < argc; a++) {
      char* arg = argv[a];
      if ((arg[0] == '-') && !dash_dash) {
        dash_dash = (arg[1] == '-') && (arg[2] == '\x00');
        continue;
      }
      g_input_file_descriptor = open(arg, O_RDONLY);
      if (g_input_file_descriptor < 0) {
        fprintf(stderr, "%s: %s\n", arg, strerror(errno));
        return 1;
      }
      break;
    }
  }

#if defined(WUFFS_EXAMPLE_USE_SECCOMP)
  prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT);
  g_sandboxed = true;
#endif

  const char* z = main1(argc, argv);
  if (g_wrote_to_dst) {
    const char* z1 = (g_flags.output_format == file_format::json)
                         ? write_dst("\n", 1)
                         : nullptr;
    const char* z2 = flush_dst();
    z = z ? z : (z1 ? z1 : z2);
  }
  int exit_code = compute_exit_code(z);

#if defined(WUFFS_EXAMPLE_USE_SECCOMP)
  // Call SYS_exit explicitly, instead of calling SYS_exit_group implicitly by
  // either calling _exit or returning from main. SECCOMP_MODE_STRICT allows
  // only SYS_exit.
  syscall(SYS_exit, exit_code);
#endif
  return exit_code;
}