blob: ddafad0d1162b44e3da183dfe191e04fb711a0db [file] [log] [blame]
Nigel Tao1b073492020-02-16 22:11:36 +11001// Copyright 2020 The Wuffs Authors.
2//
3// Licensed under the Apache License, Version 2.0 (the "License");
4// you may not use this file except in compliance with the License.
5// You may obtain a copy of the License at
6//
7// https://www.apache.org/licenses/LICENSE-2.0
8//
9// Unless required by applicable law or agreed to in writing, software
10// distributed under the License is distributed on an "AS IS" BASIS,
11// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12// See the License for the specific language governing permissions and
13// limitations under the License.
14
15// ----------------
16
17/*
Nigel Tao0cd2f982020-03-03 23:03:02 +110018jsonptr is a JSON formatter (pretty-printer) that supports the JSON Pointer
Nigel Tao168f60a2020-07-14 13:19:33 +100019(RFC 6901) query syntax. It reads CBOR or UTF-8 JSON from stdin and writes CBOR
20or canonicalized, formatted UTF-8 JSON to stdout.
Nigel Tao0cd2f982020-03-03 23:03:02 +110021
Nigel Taod60815c2020-03-26 14:32:35 +110022See the "const char* g_usage" string below for details.
Nigel Tao0cd2f982020-03-03 23:03:02 +110023
24----
25
26JSON Pointer (and this program's implementation) is one of many JSON query
27languages and JSON tools, such as jq, jql and JMESPath. This one is relatively
28simple and fewer-featured compared to those others.
29
Nigel Tao168f60a2020-07-14 13:19:33 +100030One benefit of simplicity is that this program's CBOR, JSON and JSON Pointer
Nigel Tao0cd2f982020-03-03 23:03:02 +110031implementations do not dynamically allocate or free memory (yet it does not
32require that the entire input fits in memory at once). They are therefore
33trivially protected against certain bug classes: memory leaks, double-frees and
34use-after-frees.
35
Nigel Tao168f60a2020-07-14 13:19:33 +100036The CBOR and JSON implementations are also written in the Wuffs programming
37language (and then transpiled to C/C++), which is memory-safe (e.g. array
38indexing is bounds-checked) but also prevents integer arithmetic overflows.
Nigel Tao0cd2f982020-03-03 23:03:02 +110039
Nigel Taofe0cbbd2020-03-05 22:01:30 +110040For defense in depth, on Linux, this program also self-imposes a
41SECCOMP_MODE_STRICT sandbox before reading (or otherwise processing) its input
42or writing its output. Under this sandbox, the only permitted system calls are
43read, write, exit and sigreturn.
44
Nigel Tao168f60a2020-07-14 13:19:33 +100045All together, this program aims to safely handle untrusted CBOR or JSON files
46without fear of security bugs such as remote code execution.
Nigel Tao0cd2f982020-03-03 23:03:02 +110047
48----
Nigel Tao1b073492020-02-16 22:11:36 +110049
Nigel Taoc5b3a9e2020-02-24 11:54:35 +110050As of 2020-02-24, this program passes all 318 "test_parsing" cases from the
51JSON test suite (https://github.com/nst/JSONTestSuite), an appendix to the
52"Parsing JSON is a Minefield" article (http://seriot.ch/parsing_json.php) that
53was first published on 2016-10-26 and updated on 2018-03-30.
54
Nigel Tao0cd2f982020-03-03 23:03:02 +110055After modifying this program, run "build-example.sh example/jsonptr/" and then
56"script/run-json-test-suite.sh" to catch correctness regressions.
57
58----
59
Nigel Taod0b16cb2020-03-14 10:15:54 +110060This program uses Wuffs' JSON decoder at a relatively low level, processing the
61decoder's token-stream output individually. The core loop, in pseudo-code, is
62"for_each_token { handle_token(etc); }", where the handle_token function
Nigel Taod60815c2020-03-26 14:32:35 +110063changes global state (e.g. the `g_depth` and `g_ctx` variables) and prints
Nigel Taod0b16cb2020-03-14 10:15:54 +110064output text based on that state and the token's source text. Notably,
65handle_token is not recursive, even though JSON values can nest.
66
67This approach is centered around JSON tokens. Each JSON 'thing' (e.g. number,
68string, object) comprises one or more JSON tokens.
69
70An alternative, higher-level approach is in the sibling example/jsonfindptrs
71program. Neither approach is better or worse per se, but when studying this
72program, be aware that there are multiple ways to use Wuffs' JSON decoder.
73
74The two programs, jsonfindptrs and jsonptr, also demonstrate different
75trade-offs with regard to JSON object duplicate keys. The JSON spec permits
76different implementations to allow or reject duplicate keys. It is not always
77clear which approach is safer. Rejecting them is certainly unambiguous, and
78security bugs can lurk in ambiguous corners of a file format, if two different
79implementations both silently accept a file but differ on how to interpret it.
80On the other hand, in the worst case, detecting duplicate keys requires O(N)
81memory, where N is the size of the (potentially untrusted) input.
82
83This program (jsonptr) allows duplicate keys and requires only O(1) memory. As
84mentioned above, it doesn't dynamically allocate memory at all, and on Linux,
85it runs in a SECCOMP_MODE_STRICT sandbox.
86
87----
88
Nigel Tao1b073492020-02-16 22:11:36 +110089This example program differs from most other example Wuffs programs in that it
90is written in C++, not C.
91
92$CXX jsonptr.cc && ./a.out < ../../test/data/github-tags.json; rm -f a.out
93
94for a C++ compiler $CXX, such as clang++ or g++.
95*/
96
Nigel Tao721190a2020-04-03 22:25:21 +110097#if defined(__cplusplus) && (__cplusplus < 201103L)
98#error "This C++ program requires -std=c++11 or later"
99#endif
100
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100101#include <errno.h>
Nigel Tao01abc842020-03-06 21:42:33 +1100102#include <fcntl.h>
103#include <stdio.h>
Nigel Tao9cc2c252020-02-23 17:05:49 +1100104#include <string.h>
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100105#include <unistd.h>
Nigel Tao1b073492020-02-16 22:11:36 +1100106
107// Wuffs ships as a "single file C library" or "header file library" as per
108// https://github.com/nothings/stb/blob/master/docs/stb_howto.txt
109//
110// To use that single file as a "foo.c"-like implementation, instead of a
111// "foo.h"-like header, #define WUFFS_IMPLEMENTATION before #include'ing or
112// compiling it.
113#define WUFFS_IMPLEMENTATION
114
115// Defining the WUFFS_CONFIG__MODULE* macros are optional, but it lets users of
116// release/c/etc.c whitelist which parts of Wuffs to build. That file contains
117// the entire Wuffs standard library, implementing a variety of codecs and file
118// formats. Without this macro definition, an optimizing compiler or linker may
119// very well discard Wuffs code for unused codecs, but listing the Wuffs
120// modules we use makes that process explicit. Preprocessing means that such
121// code simply isn't compiled.
122#define WUFFS_CONFIG__MODULES
123#define WUFFS_CONFIG__MODULE__BASE
Nigel Tao4e193592020-07-15 12:48:57 +1000124#define WUFFS_CONFIG__MODULE__CBOR
Nigel Tao1b073492020-02-16 22:11:36 +1100125#define WUFFS_CONFIG__MODULE__JSON
126
127// If building this program in an environment that doesn't easily accommodate
128// relative includes, you can use the script/inline-c-relative-includes.go
129// program to generate a stand-alone C++ file.
130#include "../../release/c/wuffs-unsupported-snapshot.c"
131
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100132#if defined(__linux__)
133#include <linux/prctl.h>
134#include <linux/seccomp.h>
135#include <sys/prctl.h>
136#include <sys/syscall.h>
137#define WUFFS_EXAMPLE_USE_SECCOMP
138#endif
139
Nigel Tao2cf76db2020-02-27 22:42:01 +1100140#define TRY(error_msg) \
141 do { \
142 const char* z = error_msg; \
143 if (z) { \
144 return z; \
145 } \
146 } while (false)
147
Nigel Taod60815c2020-03-26 14:32:35 +1100148static const char* g_eod = "main: end of data";
Nigel Tao2cf76db2020-02-27 22:42:01 +1100149
Nigel Taod60815c2020-03-26 14:32:35 +1100150static const char* g_usage =
Nigel Tao01abc842020-03-06 21:42:33 +1100151 "Usage: jsonptr -flags input.json\n"
Nigel Tao0cd2f982020-03-03 23:03:02 +1100152 "\n"
Nigel Tao52c4d6a2020-03-08 21:12:38 +1100153 "Flags:\n"
Nigel Tao3690e832020-03-12 16:52:26 +1100154 " -c -compact-output\n"
Nigel Tao94440cf2020-04-02 22:28:24 +1100155 " -d=NUM -max-output-depth=NUM\n"
Nigel Tao4e193592020-07-15 12:48:57 +1000156 " -i=FMT -input-format={json,cbor}\n"
Nigel Tao168f60a2020-07-14 13:19:33 +1000157 " -o=FMT -output-format={json,cbor}\n"
Nigel Tao52c4d6a2020-03-08 21:12:38 +1100158 " -q=STR -query=STR\n"
Nigel Taoecadf722020-07-13 08:22:34 +1000159 " -s=NUM -spaces=NUM\n"
Nigel Tao52c4d6a2020-03-08 21:12:38 +1100160 " -t -tabs\n"
161 " -fail-if-unsandboxed\n"
Nigel Tao3c8589b2020-07-19 21:49:00 +1000162 " -input-allow-json-comments\n"
163 " -input-allow-json-extra-comma\n"
164 " -output-cbor-metadata-as-json-comments\n"
Nigel Taoc766bb72020-07-09 12:59:32 +1000165 " -output-json-extra-comma\n"
Nigel Taoecadf722020-07-13 08:22:34 +1000166 " -strict-json-pointer-syntax\n"
Nigel Tao52c4d6a2020-03-08 21:12:38 +1100167 "\n"
Nigel Tao01abc842020-03-06 21:42:33 +1100168 "The input.json filename is optional. If absent, it reads from stdin.\n"
Nigel Tao0cd2f982020-03-03 23:03:02 +1100169 "\n"
Nigel Tao52c4d6a2020-03-08 21:12:38 +1100170 "----\n"
171 "\n"
Nigel Tao0cd2f982020-03-03 23:03:02 +1100172 "jsonptr is a JSON formatter (pretty-printer) that supports the JSON\n"
Nigel Tao168f60a2020-07-14 13:19:33 +1000173 "Pointer (RFC 6901) query syntax. It reads CBOR or UTF-8 JSON from stdin\n"
174 "and writes CBOR or canonicalized, formatted UTF-8 JSON to stdout. The\n"
175 "input and output formats do not have to match, but conversion between\n"
176 "formats may be lossy.\n"
Nigel Tao0cd2f982020-03-03 23:03:02 +1100177 "\n"
178 "Canonicalized means that e.g. \"abc\\u000A\\tx\\u0177z\" is re-written\n"
179 "as \"abc\\n\\txÅ·z\". It does not sort object keys, nor does it reject\n"
Nigel Tao01abc842020-03-06 21:42:33 +1100180 "duplicate keys. Canonicalization does not imply Unicode normalization.\n"
Nigel Tao0cd2f982020-03-03 23:03:02 +1100181 "\n"
182 "Formatted means that arrays' and objects' elements are indented, each\n"
Nigel Taoecadf722020-07-13 08:22:34 +1000183 "on its own line. Configure this with the -c / -compact-output, -s=NUM /\n"
Nigel Tao168f60a2020-07-14 13:19:33 +1000184 "-spaces=NUM (for NUM ranging from 0 to 8) and -t / -tabs flags. Those\n"
185 "flags only apply to JSON (not CBOR) output.\n"
186 "\n"
187 "The -input-format and -output-format flags select between reading and\n"
188 "writing JSON (the default, a textual format) or CBOR (a binary format).\n"
Nigel Tao0cd2f982020-03-03 23:03:02 +1100189 "\n"
Nigel Tao3c8589b2020-07-19 21:49:00 +1000190 "The -input-allow-json-comments flag allows \"/*slash-star*/\" and\n"
191 "\"//slash-slash\" C-style comments within JSON input.\n"
192 "\n"
193 "The -input-allow-json-extra-comma flag allows input like \"[1,2,]\",\n"
194 "with a comma after the final element of a JSON list or dictionary.\n"
195 "\n"
196 "The -output-cbor-metadata-as-json-comments writes CBOR tags and other\n"
197 "metadata as /*comments*/, when -i=json and -o=cbor are also set. Such\n"
198 "comments are non-compliant with the JSON specification but many parsers\n"
199 "accept them.\n"
Nigel Taoc766bb72020-07-09 12:59:32 +1000200 "\n"
201 "The -output-json-extra-comma flag writes extra commas, regardless of\n"
202 "whether the input had it. Extra commas are non-compliant with the JSON\n"
Nigel Tao3c8589b2020-07-19 21:49:00 +1000203 "specification but many parsers accept them and they can produce simpler\n"
Nigel Taoc766bb72020-07-09 12:59:32 +1000204 "line-based diffs. This flag is ignored when -compact-output is set.\n"
205 "\n"
Nigel Tao52c4d6a2020-03-08 21:12:38 +1100206 "----\n"
207 "\n"
208 "The -q=STR or -query=STR flag gives an optional JSON Pointer query, to\n"
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100209 "print a subset of the input. For example, given RFC 6901 section 5's\n"
Nigel Tao01abc842020-03-06 21:42:33 +1100210 "sample input (https://tools.ietf.org/rfc/rfc6901.txt), this command:\n"
211 " jsonptr -query=/foo/1 rfc-6901-json-pointer.json\n"
Nigel Tao0cd2f982020-03-03 23:03:02 +1100212 "will print:\n"
213 " \"baz\"\n"
214 "\n"
215 "An absent query is equivalent to the empty query, which identifies the\n"
Nigel Tao52c4d6a2020-03-08 21:12:38 +1100216 "entire input (the root value). Unlike a file system, the \"/\" query\n"
Nigel Taod0b16cb2020-03-14 10:15:54 +1100217 "does not identify the root. Instead, \"\" is the root and \"/\" is the\n"
218 "child (the value in a key-value pair) of the root whose key is the empty\n"
219 "string. Similarly, \"/xyz\" and \"/xyz/\" are two different nodes.\n"
Nigel Tao0cd2f982020-03-03 23:03:02 +1100220 "\n"
Nigel Tao168f60a2020-07-14 13:19:33 +1000221 "If the query found a valid JSON|CBOR value, this program will return a\n"
222 "zero exit code even if the rest of the input isn't valid. If the query\n"
Nigel Tao0cd2f982020-03-03 23:03:02 +1100223 "did not find a value, or found an invalid one, this program returns a\n"
224 "non-zero exit code, but may still print partial output to stdout.\n"
225 "\n"
Nigel Tao168f60a2020-07-14 13:19:33 +1000226 "The JSON and CBOR specifications (https://json.org/ or RFC 8259; RFC\n"
227 "7049) permit implementations to allow duplicate keys, as this one does.\n"
228 "This JSON Pointer implementation is also greedy, following the first\n"
229 "match for each fragment without back-tracking. For example, the\n"
230 "\"/foo/bar\" query will fail if the root object has multiple \"foo\"\n"
231 "children but the first one doesn't have a \"bar\" child, even if later\n"
232 "ones do.\n"
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100233 "\n"
Nigel Taoecadf722020-07-13 08:22:34 +1000234 "The -strict-json-pointer-syntax flag restricts the -query=STR string to\n"
235 "exactly RFC 6901, with only two escape sequences: \"~0\" and \"~1\" for\n"
236 "\"~\" and \"/\". Without this flag, this program also lets \"~n\" and\n"
237 "\"~r\" escape the New Line and Carriage Return ASCII control characters,\n"
238 "which can work better with line oriented Unix tools that assume exactly\n"
239 "one value (i.e. one JSON Pointer string) per line.\n"
Nigel Taod6fdfb12020-03-11 12:24:14 +1100240 "\n"
Nigel Tao52c4d6a2020-03-08 21:12:38 +1100241 "----\n"
242 "\n"
Nigel Tao94440cf2020-04-02 22:28:24 +1100243 "The -d=NUM or -max-output-depth=NUM flag gives the maximum (inclusive)\n"
Nigel Tao168f60a2020-07-14 13:19:33 +1000244 "output depth. JSON|CBOR containers ([] arrays and {} objects) can hold\n"
245 "other containers. When this flag is set, containers at depth NUM are\n"
246 "replaced with \"[…]\" or \"{…}\". A bare -d or -max-output-depth is\n"
247 "equivalent to -d=1. The flag's absence means an unlimited output depth.\n"
Nigel Tao52c4d6a2020-03-08 21:12:38 +1100248 "\n"
249 "The -max-output-depth flag only affects the program's output. It doesn't\n"
Nigel Tao168f60a2020-07-14 13:19:33 +1000250 "affect whether or not the input is considered valid JSON|CBOR. The\n"
251 "format specifications permit implementations to set their own maximum\n"
252 "input depth. This JSON|CBOR implementation sets it to 1024.\n"
Nigel Tao52c4d6a2020-03-08 21:12:38 +1100253 "\n"
254 "Depth is measured in terms of nested containers. It is unaffected by the\n"
255 "number of spaces or tabs used to indent.\n"
256 "\n"
257 "When both -max-output-depth and -query are set, the output depth is\n"
258 "measured from when the query resolves, not from the input root. The\n"
259 "input depth (measured from the root) is still limited to 1024.\n"
260 "\n"
261 "----\n"
262 "\n"
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100263 "The -fail-if-unsandboxed flag causes the program to exit if it does not\n"
264 "self-impose a sandbox. On Linux, it self-imposes a SECCOMP_MODE_STRICT\n"
Nigel Tao52c4d6a2020-03-08 21:12:38 +1100265 "sandbox, regardless of whether this flag was set.";
Nigel Tao0cd2f982020-03-03 23:03:02 +1100266
Nigel Tao2cf76db2020-02-27 22:42:01 +1100267// ----
268
Nigel Taof3146c22020-03-26 08:47:42 +1100269// Wuffs allows either statically or dynamically allocated work buffers. This
270// program exercises static allocation.
271#define WORK_BUFFER_ARRAY_SIZE \
272 WUFFS_JSON__DECODER_WORKBUF_LEN_MAX_INCL_WORST_CASE
273#if WORK_BUFFER_ARRAY_SIZE > 0
Nigel Taod60815c2020-03-26 14:32:35 +1100274uint8_t g_work_buffer_array[WORK_BUFFER_ARRAY_SIZE];
Nigel Taof3146c22020-03-26 08:47:42 +1100275#else
276// Not all C/C++ compilers support 0-length arrays.
Nigel Taod60815c2020-03-26 14:32:35 +1100277uint8_t g_work_buffer_array[1];
Nigel Taof3146c22020-03-26 08:47:42 +1100278#endif
279
Nigel Taod60815c2020-03-26 14:32:35 +1100280bool g_sandboxed = false;
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100281
Nigel Taod60815c2020-03-26 14:32:35 +1100282int g_input_file_descriptor = 0; // A 0 default means stdin.
Nigel Tao01abc842020-03-06 21:42:33 +1100283
Nigel Tao2cf76db2020-02-27 22:42:01 +1100284#define MAX_INDENT 8
Nigel Tao107f0ef2020-03-01 21:35:02 +1100285#define INDENT_SPACES_STRING " "
Nigel Tao6e7d1412020-03-06 09:21:35 +1100286#define INDENT_TAB_STRING "\t"
Nigel Tao107f0ef2020-03-01 21:35:02 +1100287
Nigel Taofdac24a2020-03-06 21:53:08 +1100288#ifndef DST_BUFFER_ARRAY_SIZE
289#define DST_BUFFER_ARRAY_SIZE (32 * 1024)
Nigel Tao1b073492020-02-16 22:11:36 +1100290#endif
Nigel Taofdac24a2020-03-06 21:53:08 +1100291#ifndef SRC_BUFFER_ARRAY_SIZE
292#define SRC_BUFFER_ARRAY_SIZE (32 * 1024)
Nigel Tao1b073492020-02-16 22:11:36 +1100293#endif
Nigel Taofdac24a2020-03-06 21:53:08 +1100294#ifndef TOKEN_BUFFER_ARRAY_SIZE
295#define TOKEN_BUFFER_ARRAY_SIZE (4 * 1024)
Nigel Tao1b073492020-02-16 22:11:36 +1100296#endif
297
Nigel Taod60815c2020-03-26 14:32:35 +1100298uint8_t g_dst_array[DST_BUFFER_ARRAY_SIZE];
299uint8_t g_src_array[SRC_BUFFER_ARRAY_SIZE];
300wuffs_base__token g_tok_array[TOKEN_BUFFER_ARRAY_SIZE];
Nigel Tao1b073492020-02-16 22:11:36 +1100301
Nigel Taod60815c2020-03-26 14:32:35 +1100302wuffs_base__io_buffer g_dst;
303wuffs_base__io_buffer g_src;
304wuffs_base__token_buffer g_tok;
Nigel Tao1b073492020-02-16 22:11:36 +1100305
Nigel Taod60815c2020-03-26 14:32:35 +1100306// g_curr_token_end_src_index is the g_src.data.ptr index of the end of the
307// current token. An invariant is that (g_curr_token_end_src_index <=
308// g_src.meta.ri).
309size_t g_curr_token_end_src_index;
Nigel Tao2cf76db2020-02-27 22:42:01 +1100310
Nigel Taod60815c2020-03-26 14:32:35 +1100311uint32_t g_depth;
Nigel Tao2cf76db2020-02-27 22:42:01 +1100312
313enum class context {
314 none,
315 in_list_after_bracket,
316 in_list_after_value,
317 in_dict_after_brace,
318 in_dict_after_key,
319 in_dict_after_value,
Nigel Taod60815c2020-03-26 14:32:35 +1100320} g_ctx;
Nigel Tao2cf76db2020-02-27 22:42:01 +1100321
Nigel Tao0cd2f982020-03-03 23:03:02 +1100322bool //
323in_dict_before_key() {
Nigel Taod60815c2020-03-26 14:32:35 +1100324 return (g_ctx == context::in_dict_after_brace) ||
325 (g_ctx == context::in_dict_after_value);
Nigel Tao0cd2f982020-03-03 23:03:02 +1100326}
327
Nigel Taod60815c2020-03-26 14:32:35 +1100328uint32_t g_suppress_write_dst;
329bool g_wrote_to_dst;
Nigel Tao0cd2f982020-03-03 23:03:02 +1100330
Nigel Tao4e193592020-07-15 12:48:57 +1000331wuffs_cbor__decoder g_cbor_decoder;
332wuffs_json__decoder g_json_decoder;
333wuffs_base__token_decoder* g_dec;
Nigel Tao1b073492020-02-16 22:11:36 +1100334
Nigel Tao168f60a2020-07-14 13:19:33 +1000335// cbor_output_string_array is a 4 KiB buffer. For -output-format=cbor, strings
336// whose length are 4096 or less are written as a single definite-length
337// string. Longer strings are written as an indefinite-length string containing
338// multiple definite-length chunks, each of length up to 4 KiB. See the CBOR
339// RFC (RFC 7049) section 2.2.2 "Indefinite-Length Byte Strings and Text
340// Strings". The output is determinate even when the input is streamed.
341//
342// If raising CBOR_OUTPUT_STRING_ARRAY_SIZE above 0xFFFF then you will also
343// have to update flush_cbor_output_string.
344#define CBOR_OUTPUT_STRING_ARRAY_SIZE 4096
345uint8_t g_cbor_output_string_array[CBOR_OUTPUT_STRING_ARRAY_SIZE];
346
347uint32_t g_cbor_output_string_length;
348bool g_cbor_output_string_is_multiple_chunks;
349bool g_cbor_output_string_is_utf_8;
350
Nigel Tao0cd2f982020-03-03 23:03:02 +1100351// ----
352
353// Query is a JSON Pointer query. After initializing with a NUL-terminated C
354// string, its multiple fragments are consumed as the program walks the JSON
355// data from stdin. For example, letting "$" denote a NUL, suppose that we
356// started with a query string of "/apple/banana/12/durian" and are currently
Nigel Taob48ee752020-03-13 09:27:33 +1100357// trying to match the second fragment, "banana", so that Query::m_depth is 2:
Nigel Tao0cd2f982020-03-03 23:03:02 +1100358//
359// +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
360// / a p p l e / b a n a n a / 1 2 / d u r i a n $
361// +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
362// ^ ^
Nigel Taob48ee752020-03-13 09:27:33 +1100363// m_frag_i m_frag_k
Nigel Tao0cd2f982020-03-03 23:03:02 +1100364//
Nigel Taob48ee752020-03-13 09:27:33 +1100365// The two pointers m_frag_i and m_frag_k (abbreviated as mfi and mfk) are the
366// start (inclusive) and end (exclusive) of the query fragment. They satisfy
367// (mfi <= mfk) and may be equal if the fragment empty (note that "" is a valid
368// JSON object key).
Nigel Tao0cd2f982020-03-03 23:03:02 +1100369//
Nigel Taob48ee752020-03-13 09:27:33 +1100370// The m_frag_j (mfj) pointer moves between these two, or is nullptr. An
371// invariant is that (((mfi <= mfj) && (mfj <= mfk)) || (mfj == nullptr)).
Nigel Tao0cd2f982020-03-03 23:03:02 +1100372//
373// Wuffs' JSON tokenizer can portray a single JSON string as multiple Wuffs
374// tokens, as backslash-escaped values within that JSON string may each get
375// their own token.
376//
Nigel Taob48ee752020-03-13 09:27:33 +1100377// At the start of each object key (a JSON string), mfj is set to mfi.
Nigel Tao0cd2f982020-03-03 23:03:02 +1100378//
Nigel Taob48ee752020-03-13 09:27:33 +1100379// While mfj remains non-nullptr, each token's unescaped contents are then
380// compared to that part of the fragment from mfj to mfk. If it is a prefix
381// (including the case of an exact match), then mfj is advanced by the
382// unescaped length. Otherwise, mfj is set to nullptr.
Nigel Tao0cd2f982020-03-03 23:03:02 +1100383//
384// Comparison accounts for JSON Pointer's escaping notation: "~0" and "~1" in
385// the query (not the JSON value) are unescaped to "~" and "/" respectively.
Nigel Taob48ee752020-03-13 09:27:33 +1100386// "~n" and "~r" are also unescaped to "\n" and "\r". The program is
387// responsible for calling Query::validate (with a strict_json_pointer_syntax
388// argument) before otherwise using this class.
Nigel Tao0cd2f982020-03-03 23:03:02 +1100389//
Nigel Taob48ee752020-03-13 09:27:33 +1100390// The mfj pointer therefore advances from mfi to mfk, or drops out, as we
391// incrementally match the object key with the query fragment. For example, if
392// we have already matched the "ban" of "banana", then we would accept any of
393// an "ana" token, an "a" token or a "\u0061" token, amongst others. They would
394// advance mfj by 3, 1 or 1 bytes respectively.
Nigel Tao0cd2f982020-03-03 23:03:02 +1100395//
Nigel Taob48ee752020-03-13 09:27:33 +1100396// mfj
Nigel Tao0cd2f982020-03-03 23:03:02 +1100397// v
398// +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
399// / a p p l e / b a n a n a / 1 2 / d u r i a n $
400// +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
401// ^ ^
Nigel Taob48ee752020-03-13 09:27:33 +1100402// mfi mfk
Nigel Tao0cd2f982020-03-03 23:03:02 +1100403//
404// At the end of each object key (or equivalently, at the start of each object
Nigel Taob48ee752020-03-13 09:27:33 +1100405// value), if mfj is non-nullptr and equal to (but not less than) mfk then we
406// have a fragment match: the query fragment equals the object key. If there is
407// a next fragment (in this example, "12") we move the frag_etc pointers to its
408// start and end and increment Query::m_depth. Otherwise, we have matched the
409// complete query, and the upcoming JSON value is the result of that query.
Nigel Tao0cd2f982020-03-03 23:03:02 +1100410//
411// The discussion above centers on object keys. If the query fragment is
412// numeric then it can also match as an array index: the string fragment "12"
413// will match an array's 13th element (starting counting from zero). See RFC
414// 6901 for its precise definition of an "array index" number.
415//
Nigel Taob48ee752020-03-13 09:27:33 +1100416// Array index fragment match is represented by the Query::m_array_index field,
Nigel Tao0cd2f982020-03-03 23:03:02 +1100417// whose type (wuffs_base__result_u64) is a result type. An error result means
418// that the fragment is not an array index. A value result holds the number of
419// list elements remaining. When matching a query fragment in an array (instead
420// of in an object), each element ticks this number down towards zero. At zero,
421// the upcoming JSON value is the one that matches the query fragment.
422class Query {
423 private:
Nigel Taob48ee752020-03-13 09:27:33 +1100424 uint8_t* m_frag_i;
425 uint8_t* m_frag_j;
426 uint8_t* m_frag_k;
Nigel Tao0cd2f982020-03-03 23:03:02 +1100427
Nigel Taob48ee752020-03-13 09:27:33 +1100428 uint32_t m_depth;
Nigel Tao0cd2f982020-03-03 23:03:02 +1100429
Nigel Taob48ee752020-03-13 09:27:33 +1100430 wuffs_base__result_u64 m_array_index;
Nigel Tao0cd2f982020-03-03 23:03:02 +1100431
432 public:
433 void reset(char* query_c_string) {
Nigel Taob48ee752020-03-13 09:27:33 +1100434 m_frag_i = (uint8_t*)query_c_string;
435 m_frag_j = (uint8_t*)query_c_string;
436 m_frag_k = (uint8_t*)query_c_string;
437 m_depth = 0;
438 m_array_index.status.repr = "#main: not an array index query fragment";
439 m_array_index.value = 0;
Nigel Tao0cd2f982020-03-03 23:03:02 +1100440 }
441
Nigel Taob48ee752020-03-13 09:27:33 +1100442 void restart_fragment(bool enable) { m_frag_j = enable ? m_frag_i : nullptr; }
Nigel Tao0cd2f982020-03-03 23:03:02 +1100443
Nigel Taob48ee752020-03-13 09:27:33 +1100444 bool is_at(uint32_t depth) { return m_depth == depth; }
Nigel Tao0cd2f982020-03-03 23:03:02 +1100445
446 // tick returns whether the fragment is a valid array index whose value is
447 // zero. If valid but non-zero, it decrements it and returns false.
448 bool tick() {
Nigel Taob48ee752020-03-13 09:27:33 +1100449 if (m_array_index.status.is_ok()) {
450 if (m_array_index.value == 0) {
Nigel Tao0cd2f982020-03-03 23:03:02 +1100451 return true;
452 }
Nigel Taob48ee752020-03-13 09:27:33 +1100453 m_array_index.value--;
Nigel Tao0cd2f982020-03-03 23:03:02 +1100454 }
455 return false;
456 }
457
458 // next_fragment moves to the next fragment, returning whether it existed.
459 bool next_fragment() {
Nigel Taob48ee752020-03-13 09:27:33 +1100460 uint8_t* k = m_frag_k;
461 uint32_t d = m_depth;
Nigel Tao0cd2f982020-03-03 23:03:02 +1100462
463 this->reset(nullptr);
464
465 if (!k || (*k != '/')) {
466 return false;
467 }
468 k++;
469
470 bool all_digits = true;
471 uint8_t* i = k;
472 while ((*k != '\x00') && (*k != '/')) {
473 all_digits = all_digits && ('0' <= *k) && (*k <= '9');
474 k++;
475 }
Nigel Taob48ee752020-03-13 09:27:33 +1100476 m_frag_i = i;
477 m_frag_j = i;
478 m_frag_k = k;
479 m_depth = d + 1;
Nigel Tao0cd2f982020-03-03 23:03:02 +1100480 if (all_digits) {
481 // wuffs_base__parse_number_u64 rejects leading zeroes, e.g. "00", "07".
Nigel Tao6b7ce302020-07-07 16:19:46 +1000482 m_array_index = wuffs_base__parse_number_u64(
483 wuffs_base__make_slice_u8(i, k - i),
484 WUFFS_BASE__PARSE_NUMBER_XXX__DEFAULT_OPTIONS);
Nigel Tao0cd2f982020-03-03 23:03:02 +1100485 }
486 return true;
487 }
488
Nigel Taob48ee752020-03-13 09:27:33 +1100489 bool matched_all() { return m_frag_k == nullptr; }
Nigel Tao52c4d6a2020-03-08 21:12:38 +1100490
Nigel Taob48ee752020-03-13 09:27:33 +1100491 bool matched_fragment() { return m_frag_j && (m_frag_j == m_frag_k); }
Nigel Tao0cd2f982020-03-03 23:03:02 +1100492
493 void incremental_match_slice(uint8_t* ptr, size_t len) {
Nigel Taob48ee752020-03-13 09:27:33 +1100494 if (!m_frag_j) {
Nigel Tao0cd2f982020-03-03 23:03:02 +1100495 return;
496 }
Nigel Taob48ee752020-03-13 09:27:33 +1100497 uint8_t* j = m_frag_j;
Nigel Tao0cd2f982020-03-03 23:03:02 +1100498 while (true) {
499 if (len == 0) {
Nigel Taob48ee752020-03-13 09:27:33 +1100500 m_frag_j = j;
Nigel Tao0cd2f982020-03-03 23:03:02 +1100501 return;
502 }
503
504 if (*j == '\x00') {
505 break;
506
507 } else if (*j == '~') {
508 j++;
509 if (*j == '0') {
510 if (*ptr != '~') {
511 break;
512 }
513 } else if (*j == '1') {
514 if (*ptr != '/') {
515 break;
516 }
Nigel Taod6fdfb12020-03-11 12:24:14 +1100517 } else if (*j == 'n') {
518 if (*ptr != '\n') {
519 break;
520 }
521 } else if (*j == 'r') {
522 if (*ptr != '\r') {
523 break;
524 }
Nigel Tao0cd2f982020-03-03 23:03:02 +1100525 } else {
526 break;
527 }
528
529 } else if (*j != *ptr) {
530 break;
531 }
532
533 j++;
534 ptr++;
535 len--;
536 }
Nigel Taob48ee752020-03-13 09:27:33 +1100537 m_frag_j = nullptr;
Nigel Tao0cd2f982020-03-03 23:03:02 +1100538 }
539
540 void incremental_match_code_point(uint32_t code_point) {
Nigel Taob48ee752020-03-13 09:27:33 +1100541 if (!m_frag_j) {
Nigel Tao0cd2f982020-03-03 23:03:02 +1100542 return;
543 }
544 uint8_t u[WUFFS_BASE__UTF_8__BYTE_LENGTH__MAX_INCL];
545 size_t n = wuffs_base__utf_8__encode(
546 wuffs_base__make_slice_u8(&u[0],
547 WUFFS_BASE__UTF_8__BYTE_LENGTH__MAX_INCL),
548 code_point);
549 if (n > 0) {
550 this->incremental_match_slice(&u[0], n);
551 }
552 }
553
554 // validate returns whether the (ptr, len) arguments form a valid JSON
555 // Pointer. In particular, it must be valid UTF-8, and either be empty or
556 // start with a '/'. Any '~' within must immediately be followed by either
Nigel Taod6fdfb12020-03-11 12:24:14 +1100557 // '0' or '1'. If strict_json_pointer_syntax is false, a '~' may also be
558 // followed by either 'n' or 'r'.
559 static bool validate(char* query_c_string,
560 size_t length,
561 bool strict_json_pointer_syntax) {
Nigel Tao0cd2f982020-03-03 23:03:02 +1100562 if (length <= 0) {
563 return true;
564 }
565 if (query_c_string[0] != '/') {
566 return false;
567 }
568 wuffs_base__slice_u8 s =
569 wuffs_base__make_slice_u8((uint8_t*)query_c_string, length);
570 bool previous_was_tilde = false;
571 while (s.len > 0) {
572 wuffs_base__utf_8__next__output o = wuffs_base__utf_8__next(s);
573 if (!o.is_valid()) {
574 return false;
575 }
Nigel Taod6fdfb12020-03-11 12:24:14 +1100576
577 if (previous_was_tilde) {
578 switch (o.code_point) {
579 case '0':
580 case '1':
581 break;
582 case 'n':
583 case 'r':
584 if (strict_json_pointer_syntax) {
585 return false;
586 }
587 break;
588 default:
589 return false;
590 }
Nigel Tao0cd2f982020-03-03 23:03:02 +1100591 }
592 previous_was_tilde = o.code_point == '~';
Nigel Taod6fdfb12020-03-11 12:24:14 +1100593
Nigel Tao0cd2f982020-03-03 23:03:02 +1100594 s.ptr += o.byte_length;
595 s.len -= o.byte_length;
596 }
597 return !previous_was_tilde;
598 }
Nigel Taod60815c2020-03-26 14:32:35 +1100599} g_query;
Nigel Tao0cd2f982020-03-03 23:03:02 +1100600
601// ----
602
Nigel Tao168f60a2020-07-14 13:19:33 +1000603enum class file_format {
604 json,
605 cbor,
606};
607
Nigel Tao68920952020-03-03 11:25:18 +1100608struct {
609 int remaining_argc;
610 char** remaining_argv;
611
Nigel Tao3690e832020-03-12 16:52:26 +1100612 bool compact_output;
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100613 bool fail_if_unsandboxed;
Nigel Tao4e193592020-07-15 12:48:57 +1000614 file_format input_format;
Nigel Tao3c8589b2020-07-19 21:49:00 +1000615 bool input_allow_json_comments;
616 bool input_allow_json_extra_comma;
Nigel Tao52c4d6a2020-03-08 21:12:38 +1100617 uint32_t max_output_depth;
Nigel Tao168f60a2020-07-14 13:19:33 +1000618 file_format output_format;
Nigel Tao3c8589b2020-07-19 21:49:00 +1000619 bool output_cbor_metadata_as_json_comments;
Nigel Taoc766bb72020-07-09 12:59:32 +1000620 bool output_json_extra_comma;
Nigel Tao0cd2f982020-03-03 23:03:02 +1100621 char* query_c_string;
Nigel Taoecadf722020-07-13 08:22:34 +1000622 size_t spaces;
Nigel Taod6fdfb12020-03-11 12:24:14 +1100623 bool strict_json_pointer_syntax;
Nigel Tao68920952020-03-03 11:25:18 +1100624 bool tabs;
Nigel Taod60815c2020-03-26 14:32:35 +1100625} g_flags = {0};
Nigel Tao68920952020-03-03 11:25:18 +1100626
627const char* //
628parse_flags(int argc, char** argv) {
Nigel Taoecadf722020-07-13 08:22:34 +1000629 g_flags.spaces = 4;
Nigel Taod60815c2020-03-26 14:32:35 +1100630 g_flags.max_output_depth = 0xFFFFFFFF;
Nigel Tao68920952020-03-03 11:25:18 +1100631
632 int c = (argc > 0) ? 1 : 0; // Skip argv[0], the program name.
633 for (; c < argc; c++) {
634 char* arg = argv[c];
635 if (*arg++ != '-') {
636 break;
637 }
638
639 // A double-dash "--foo" is equivalent to a single-dash "-foo". As special
640 // cases, a bare "-" is not a flag (some programs may interpret it as
641 // stdin) and a bare "--" means to stop parsing flags.
642 if (*arg == '\x00') {
643 break;
644 } else if (*arg == '-') {
645 arg++;
646 if (*arg == '\x00') {
647 c++;
648 break;
649 }
650 }
651
Nigel Tao3690e832020-03-12 16:52:26 +1100652 if (!strcmp(arg, "c") || !strcmp(arg, "compact-output")) {
Nigel Taod60815c2020-03-26 14:32:35 +1100653 g_flags.compact_output = true;
Nigel Tao68920952020-03-03 11:25:18 +1100654 continue;
655 }
Nigel Tao94440cf2020-04-02 22:28:24 +1100656 if (!strcmp(arg, "d") || !strcmp(arg, "max-output-depth")) {
657 g_flags.max_output_depth = 1;
658 continue;
659 } else if (!strncmp(arg, "d=", 2) ||
660 !strncmp(arg, "max-output-depth=", 16)) {
661 while (*arg++ != '=') {
662 }
663 wuffs_base__result_u64 u = wuffs_base__parse_number_u64(
Nigel Tao6b7ce302020-07-07 16:19:46 +1000664 wuffs_base__make_slice_u8((uint8_t*)arg, strlen(arg)),
665 WUFFS_BASE__PARSE_NUMBER_XXX__DEFAULT_OPTIONS);
Nigel Taoaf757722020-07-18 17:27:11 +1000666 if (u.status.is_ok() && (u.value <= 0xFFFFFFFF)) {
Nigel Tao94440cf2020-04-02 22:28:24 +1100667 g_flags.max_output_depth = (uint32_t)(u.value);
668 continue;
669 }
670 return g_usage;
671 }
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100672 if (!strcmp(arg, "fail-if-unsandboxed")) {
Nigel Taod60815c2020-03-26 14:32:35 +1100673 g_flags.fail_if_unsandboxed = true;
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100674 continue;
675 }
Nigel Tao4e193592020-07-15 12:48:57 +1000676 if (!strcmp(arg, "i=cbor") || !strcmp(arg, "input-format=cbor")) {
677 g_flags.input_format = file_format::cbor;
678 continue;
679 }
680 if (!strcmp(arg, "i=json") || !strcmp(arg, "input-format=json")) {
681 g_flags.input_format = file_format::json;
682 continue;
683 }
Nigel Tao3c8589b2020-07-19 21:49:00 +1000684 if (!strcmp(arg, "input-allow-json-comments")) {
685 g_flags.input_allow_json_comments = true;
686 continue;
687 }
688 if (!strcmp(arg, "input-allow-json-extra-comma")) {
689 g_flags.input_allow_json_extra_comma = true;
Nigel Taoc766bb72020-07-09 12:59:32 +1000690 continue;
691 }
Nigel Tao168f60a2020-07-14 13:19:33 +1000692 if (!strcmp(arg, "o=cbor") || !strcmp(arg, "output-format=cbor")) {
693 g_flags.output_format = file_format::cbor;
694 continue;
695 }
696 if (!strcmp(arg, "o=json") || !strcmp(arg, "output-format=json")) {
697 g_flags.output_format = file_format::json;
698 continue;
699 }
Nigel Tao3c8589b2020-07-19 21:49:00 +1000700 if (!strcmp(arg, "output-cbor-metadata-as-json-comments")) {
701 g_flags.output_cbor_metadata_as_json_comments = true;
702 continue;
703 }
Nigel Taoc766bb72020-07-09 12:59:32 +1000704 if (!strcmp(arg, "output-json-extra-comma")) {
705 g_flags.output_json_extra_comma = true;
706 continue;
707 }
Nigel Tao0cd2f982020-03-03 23:03:02 +1100708 if (!strncmp(arg, "q=", 2) || !strncmp(arg, "query=", 6)) {
709 while (*arg++ != '=') {
710 }
Nigel Taod60815c2020-03-26 14:32:35 +1100711 g_flags.query_c_string = arg;
Nigel Taod6fdfb12020-03-11 12:24:14 +1100712 continue;
713 }
Nigel Taoecadf722020-07-13 08:22:34 +1000714 if (!strncmp(arg, "s=", 2) || !strncmp(arg, "spaces=", 7)) {
715 while (*arg++ != '=') {
716 }
717 if (('0' <= arg[0]) && (arg[0] <= '8') && (arg[1] == '\x00')) {
718 g_flags.spaces = arg[0] - '0';
719 continue;
720 }
721 return g_usage;
722 }
723 if (!strcmp(arg, "strict-json-pointer-syntax")) {
Nigel Taod60815c2020-03-26 14:32:35 +1100724 g_flags.strict_json_pointer_syntax = true;
Nigel Taod6fdfb12020-03-11 12:24:14 +1100725 continue;
Nigel Tao68920952020-03-03 11:25:18 +1100726 }
727 if (!strcmp(arg, "t") || !strcmp(arg, "tabs")) {
Nigel Taod60815c2020-03-26 14:32:35 +1100728 g_flags.tabs = true;
Nigel Tao68920952020-03-03 11:25:18 +1100729 continue;
730 }
731
Nigel Taod60815c2020-03-26 14:32:35 +1100732 return g_usage;
Nigel Tao68920952020-03-03 11:25:18 +1100733 }
734
Nigel Taod60815c2020-03-26 14:32:35 +1100735 if (g_flags.query_c_string &&
736 !Query::validate(g_flags.query_c_string, strlen(g_flags.query_c_string),
737 g_flags.strict_json_pointer_syntax)) {
Nigel Taod6fdfb12020-03-11 12:24:14 +1100738 return "main: bad JSON Pointer (RFC 6901) syntax for the -query=STR flag";
739 }
740
Nigel Taod60815c2020-03-26 14:32:35 +1100741 g_flags.remaining_argc = argc - c;
742 g_flags.remaining_argv = argv + c;
Nigel Tao0cd2f982020-03-03 23:03:02 +1100743 return nullptr;
Nigel Tao68920952020-03-03 11:25:18 +1100744}
745
Nigel Tao2cf76db2020-02-27 22:42:01 +1100746const char* //
747initialize_globals(int argc, char** argv) {
Nigel Taod60815c2020-03-26 14:32:35 +1100748 g_dst = wuffs_base__make_io_buffer(
749 wuffs_base__make_slice_u8(g_dst_array, DST_BUFFER_ARRAY_SIZE),
Nigel Tao2cf76db2020-02-27 22:42:01 +1100750 wuffs_base__empty_io_buffer_meta());
Nigel Tao1b073492020-02-16 22:11:36 +1100751
Nigel Taod60815c2020-03-26 14:32:35 +1100752 g_src = wuffs_base__make_io_buffer(
753 wuffs_base__make_slice_u8(g_src_array, SRC_BUFFER_ARRAY_SIZE),
Nigel Tao2cf76db2020-02-27 22:42:01 +1100754 wuffs_base__empty_io_buffer_meta());
755
Nigel Taod60815c2020-03-26 14:32:35 +1100756 g_tok = wuffs_base__make_token_buffer(
757 wuffs_base__make_slice_token(g_tok_array, TOKEN_BUFFER_ARRAY_SIZE),
Nigel Tao2cf76db2020-02-27 22:42:01 +1100758 wuffs_base__empty_token_buffer_meta());
759
Nigel Taod60815c2020-03-26 14:32:35 +1100760 g_curr_token_end_src_index = 0;
Nigel Tao2cf76db2020-02-27 22:42:01 +1100761
Nigel Taod60815c2020-03-26 14:32:35 +1100762 g_depth = 0;
Nigel Tao2cf76db2020-02-27 22:42:01 +1100763
Nigel Taod60815c2020-03-26 14:32:35 +1100764 g_ctx = context::none;
Nigel Tao2cf76db2020-02-27 22:42:01 +1100765
Nigel Tao68920952020-03-03 11:25:18 +1100766 TRY(parse_flags(argc, argv));
Nigel Taod60815c2020-03-26 14:32:35 +1100767 if (g_flags.fail_if_unsandboxed && !g_sandboxed) {
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100768 return "main: unsandboxed";
769 }
Nigel Tao01abc842020-03-06 21:42:33 +1100770 const int stdin_fd = 0;
Nigel Taod60815c2020-03-26 14:32:35 +1100771 if (g_flags.remaining_argc >
772 ((g_input_file_descriptor != stdin_fd) ? 1 : 0)) {
773 return g_usage;
Nigel Tao107f0ef2020-03-01 21:35:02 +1100774 }
775
Nigel Taod60815c2020-03-26 14:32:35 +1100776 g_query.reset(g_flags.query_c_string);
Nigel Tao0cd2f982020-03-03 23:03:02 +1100777
778 // If the query is non-empty, suprress writing to stdout until we've
779 // completed the query.
Nigel Taod60815c2020-03-26 14:32:35 +1100780 g_suppress_write_dst = g_query.next_fragment() ? 1 : 0;
781 g_wrote_to_dst = false;
Nigel Tao0cd2f982020-03-03 23:03:02 +1100782
Nigel Tao4e193592020-07-15 12:48:57 +1000783 if (g_flags.input_format == file_format::json) {
784 TRY(g_json_decoder
785 .initialize(sizeof__wuffs_json__decoder(), WUFFS_VERSION, 0)
786 .message());
787 g_dec = g_json_decoder.upcast_as__wuffs_base__token_decoder();
788 } else {
789 TRY(g_cbor_decoder
790 .initialize(sizeof__wuffs_cbor__decoder(), WUFFS_VERSION, 0)
791 .message());
792 g_dec = g_cbor_decoder.upcast_as__wuffs_base__token_decoder();
793 }
Nigel Tao4b186b02020-03-18 14:25:21 +1100794
Nigel Tao3c8589b2020-07-19 21:49:00 +1000795 if (g_flags.input_allow_json_comments) {
796 g_dec->set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_COMMENT_BLOCK, true);
797 g_dec->set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_COMMENT_LINE, true);
798 }
799 if (g_flags.input_allow_json_extra_comma) {
Nigel Tao4e193592020-07-15 12:48:57 +1000800 g_dec->set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_EXTRA_COMMA, true);
Nigel Taoc766bb72020-07-09 12:59:32 +1000801 }
802
Nigel Tao4b186b02020-03-18 14:25:21 +1100803 // Consume an optional whitespace trailer. This isn't part of the JSON spec,
804 // but it works better with line oriented Unix tools (such as "echo 123 |
805 // jsonptr" where it's "echo", not "echo -n") or hand-edited JSON files which
806 // can accidentally contain trailing whitespace.
Nigel Tao4e193592020-07-15 12:48:57 +1000807 g_dec->set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_TRAILING_NEW_LINE, true);
Nigel Tao4b186b02020-03-18 14:25:21 +1100808
809 return nullptr;
Nigel Tao2cf76db2020-02-27 22:42:01 +1100810}
Nigel Tao1b073492020-02-16 22:11:36 +1100811
812// ----
813
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100814// ignore_return_value suppresses errors from -Wall -Werror.
815static void //
816ignore_return_value(int ignored) {}
817
Nigel Tao2914bae2020-02-26 09:40:30 +1100818const char* //
819read_src() {
Nigel Taod60815c2020-03-26 14:32:35 +1100820 if (g_src.meta.closed) {
Nigel Tao9cc2c252020-02-23 17:05:49 +1100821 return "main: internal error: read requested on a closed source";
Nigel Taoa8406922020-02-19 12:22:00 +1100822 }
Nigel Taod60815c2020-03-26 14:32:35 +1100823 g_src.compact();
824 if (g_src.meta.wi >= g_src.data.len) {
825 return "main: g_src buffer is full";
Nigel Tao1b073492020-02-16 22:11:36 +1100826 }
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100827 while (true) {
Nigel Taod60815c2020-03-26 14:32:35 +1100828 ssize_t n = read(g_input_file_descriptor, g_src.data.ptr + g_src.meta.wi,
829 g_src.data.len - g_src.meta.wi);
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100830 if (n >= 0) {
Nigel Taod60815c2020-03-26 14:32:35 +1100831 g_src.meta.wi += n;
832 g_src.meta.closed = n == 0;
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100833 break;
834 } else if (errno != EINTR) {
835 return strerror(errno);
836 }
Nigel Tao1b073492020-02-16 22:11:36 +1100837 }
838 return nullptr;
839}
840
Nigel Tao2914bae2020-02-26 09:40:30 +1100841const char* //
842flush_dst() {
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100843 while (true) {
Nigel Taod60815c2020-03-26 14:32:35 +1100844 size_t n = g_dst.meta.wi - g_dst.meta.ri;
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100845 if (n == 0) {
846 break;
Nigel Tao1b073492020-02-16 22:11:36 +1100847 }
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100848 const int stdout_fd = 1;
Nigel Taod60815c2020-03-26 14:32:35 +1100849 ssize_t i = write(stdout_fd, g_dst.data.ptr + g_dst.meta.ri, n);
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100850 if (i >= 0) {
Nigel Taod60815c2020-03-26 14:32:35 +1100851 g_dst.meta.ri += i;
Nigel Taofe0cbbd2020-03-05 22:01:30 +1100852 } else if (errno != EINTR) {
853 return strerror(errno);
854 }
Nigel Tao1b073492020-02-16 22:11:36 +1100855 }
Nigel Taod60815c2020-03-26 14:32:35 +1100856 g_dst.compact();
Nigel Tao1b073492020-02-16 22:11:36 +1100857 return nullptr;
858}
859
Nigel Tao2914bae2020-02-26 09:40:30 +1100860const char* //
861write_dst(const void* s, size_t n) {
Nigel Taod60815c2020-03-26 14:32:35 +1100862 if (g_suppress_write_dst > 0) {
Nigel Tao0cd2f982020-03-03 23:03:02 +1100863 return nullptr;
864 }
Nigel Tao1b073492020-02-16 22:11:36 +1100865 const uint8_t* p = static_cast<const uint8_t*>(s);
866 while (n > 0) {
Nigel Taod60815c2020-03-26 14:32:35 +1100867 size_t i = g_dst.writer_available();
Nigel Tao1b073492020-02-16 22:11:36 +1100868 if (i == 0) {
869 const char* z = flush_dst();
870 if (z) {
871 return z;
872 }
Nigel Taod60815c2020-03-26 14:32:35 +1100873 i = g_dst.writer_available();
Nigel Tao1b073492020-02-16 22:11:36 +1100874 if (i == 0) {
Nigel Taod60815c2020-03-26 14:32:35 +1100875 return "main: g_dst buffer is full";
Nigel Tao1b073492020-02-16 22:11:36 +1100876 }
877 }
878
879 if (i > n) {
880 i = n;
881 }
Nigel Taod60815c2020-03-26 14:32:35 +1100882 memcpy(g_dst.data.ptr + g_dst.meta.wi, p, i);
883 g_dst.meta.wi += i;
Nigel Tao1b073492020-02-16 22:11:36 +1100884 p += i;
885 n -= i;
Nigel Taod60815c2020-03-26 14:32:35 +1100886 g_wrote_to_dst = true;
Nigel Tao1b073492020-02-16 22:11:36 +1100887 }
888 return nullptr;
889}
890
891// ----
892
Nigel Tao168f60a2020-07-14 13:19:33 +1000893const char* //
894write_literal(uint64_t vbd) {
895 const char* ptr = nullptr;
896 size_t len = 0;
897 if (vbd & WUFFS_BASE__TOKEN__VBD__LITERAL__UNDEFINED) {
898 if (g_flags.output_format == file_format::json) {
Nigel Tao3c8589b2020-07-19 21:49:00 +1000899 // JSON's closest approximation to "undefined" is "null".
900 if (g_flags.output_cbor_metadata_as_json_comments) {
901 ptr = "/*cbor:undefined*/null";
902 len = 22;
903 } else {
904 ptr = "null";
905 len = 4;
906 }
Nigel Tao168f60a2020-07-14 13:19:33 +1000907 } else {
908 ptr = "\xF7";
909 len = 1;
910 }
911 } else if (vbd & WUFFS_BASE__TOKEN__VBD__LITERAL__NULL) {
912 if (g_flags.output_format == file_format::json) {
913 ptr = "null";
914 len = 4;
915 } else {
916 ptr = "\xF6";
917 len = 1;
918 }
919 } else if (vbd & WUFFS_BASE__TOKEN__VBD__LITERAL__FALSE) {
920 if (g_flags.output_format == file_format::json) {
921 ptr = "false";
922 len = 5;
923 } else {
924 ptr = "\xF4";
925 len = 1;
926 }
927 } else if (vbd & WUFFS_BASE__TOKEN__VBD__LITERAL__TRUE) {
928 if (g_flags.output_format == file_format::json) {
929 ptr = "true";
930 len = 4;
931 } else {
932 ptr = "\xF5";
933 len = 1;
934 }
935 } else {
936 return "main: internal error: unexpected write_literal argument";
937 }
938 return write_dst(ptr, len);
939}
940
941// ----
942
943const char* //
Nigel Tao664f8432020-07-16 21:25:14 +1000944write_number_as_cbor_f64(double f) {
Nigel Tao168f60a2020-07-14 13:19:33 +1000945 uint8_t buf[9];
946 wuffs_base__lossy_value_u16 lv16 =
947 wuffs_base__ieee_754_bit_representation__from_f64_to_u16_truncate(f);
948 if (!lv16.lossy) {
949 buf[0] = 0xF9;
950 wuffs_base__store_u16be__no_bounds_check(&buf[1], lv16.value);
951 return write_dst(&buf[0], 3);
952 }
953 wuffs_base__lossy_value_u32 lv32 =
954 wuffs_base__ieee_754_bit_representation__from_f64_to_u32_truncate(f);
955 if (!lv32.lossy) {
956 buf[0] = 0xFA;
957 wuffs_base__store_u32be__no_bounds_check(&buf[1], lv32.value);
958 return write_dst(&buf[0], 5);
959 }
960 buf[0] = 0xFB;
961 wuffs_base__store_u64be__no_bounds_check(
962 &buf[1], wuffs_base__ieee_754_bit_representation__from_f64_to_u64(f));
963 return write_dst(&buf[0], 9);
964}
965
966const char* //
Nigel Tao664f8432020-07-16 21:25:14 +1000967write_number_as_cbor_u64(uint8_t base, uint64_t u) {
Nigel Tao168f60a2020-07-14 13:19:33 +1000968 uint8_t buf[9];
969 if (u < 0x18) {
970 buf[0] = base | ((uint8_t)u);
971 return write_dst(&buf[0], 1);
972 } else if ((u >> 8) == 0) {
973 buf[0] = base | 0x18;
974 buf[1] = ((uint8_t)u);
975 return write_dst(&buf[0], 2);
976 } else if ((u >> 16) == 0) {
977 buf[0] = base | 0x19;
978 wuffs_base__store_u16be__no_bounds_check(&buf[1], ((uint16_t)u));
979 return write_dst(&buf[0], 3);
980 } else if ((u >> 32) == 0) {
981 buf[0] = base | 0x1A;
982 wuffs_base__store_u32be__no_bounds_check(&buf[1], ((uint32_t)u));
983 return write_dst(&buf[0], 5);
984 }
985 buf[0] = base | 0x1B;
986 wuffs_base__store_u64be__no_bounds_check(&buf[1], u);
987 return write_dst(&buf[0], 9);
988}
989
990const char* //
Nigel Tao664f8432020-07-16 21:25:14 +1000991write_cbor_number_as_json(uint8_t* ptr,
992 size_t len,
993 bool ignore_first_byte,
994 bool minus_1_minus_x) {
995 if (ignore_first_byte) {
996 if (len == 0) {
997 return "main: internal error: ignore_first_byte with no bytes";
998 }
999 ptr++;
1000 len--;
1001 }
1002 uint64_t u;
1003 switch (len) {
1004 case 1:
1005 u = wuffs_base__load_u8__no_bounds_check(ptr);
1006 break;
1007 case 2:
1008 u = wuffs_base__load_u16be__no_bounds_check(ptr);
1009 break;
1010 case 4:
1011 u = wuffs_base__load_u32be__no_bounds_check(ptr);
1012 break;
1013 case 8:
1014 u = wuffs_base__load_u64be__no_bounds_check(ptr);
1015 break;
1016 default:
1017 return "main: internal error: unexpected cbor number byte length";
1018 }
1019 uint8_t buf[1 + WUFFS_BASE__U64__BYTE_LENGTH__MAX_INCL];
1020 uint8_t* b = &buf[0];
1021 if (minus_1_minus_x) {
1022 u++;
1023 if (u == 0) {
1024 // See the cbor.TOKEN_VALUE_MINOR__MINUS_1_MINUS_X comment re overflow.
1025 return write_dst("-18446744073709551616", 21);
1026 }
1027 *b++ = '-';
1028 }
1029 size_t n = wuffs_base__render_number_u64(
1030 wuffs_base__make_slice_u8(b, WUFFS_BASE__U64__BYTE_LENGTH__MAX_INCL), u,
1031 WUFFS_BASE__RENDER_NUMBER_XXX__DEFAULT_OPTIONS);
1032 return write_dst(&buf[0], n + (minus_1_minus_x ? 1 : 0));
1033}
1034
1035const char* //
Nigel Tao168f60a2020-07-14 13:19:33 +10001036write_number(uint64_t vbd, uint8_t* ptr, size_t len) {
Nigel Tao4e193592020-07-15 12:48:57 +10001037 if (g_flags.output_format == file_format::json) {
1038 if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__FORMAT_TEXT) {
Nigel Tao168f60a2020-07-14 13:19:33 +10001039 return write_dst(ptr, len);
Nigel Tao4e193592020-07-15 12:48:57 +10001040 } else if ((vbd &
1041 WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_INTEGER_UNSIGNED) &&
1042 (vbd &
1043 WUFFS_BASE__TOKEN__VBD__NUMBER__FORMAT_BINARY_BIG_ENDIAN)) {
Nigel Tao664f8432020-07-16 21:25:14 +10001044 return write_cbor_number_as_json(
1045 ptr, len,
1046 vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__FORMAT_IGNORE_FIRST_BYTE,
1047 false);
Nigel Tao168f60a2020-07-14 13:19:33 +10001048 }
1049
Nigel Tao4e193592020-07-15 12:48:57 +10001050 // From here on, (g_flags.output_format == file_format::cbor).
1051 } else if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__FORMAT_BINARY_BIG_ENDIAN) {
1052 return write_dst(ptr, len);
1053 } else if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__FORMAT_TEXT) {
Nigel Tao168f60a2020-07-14 13:19:33 +10001054 // First try to parse (ptr, len) as an integer. Something like
1055 // "1180591620717411303424" is a valid number (in the JSON sense) but will
1056 // overflow int64_t or uint64_t, so fall back to parsing it as a float64.
1057 if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_INTEGER_SIGNED) {
1058 if ((len > 0) && (ptr[0] == '-')) {
1059 wuffs_base__result_i64 ri = wuffs_base__parse_number_i64(
1060 wuffs_base__make_slice_u8(ptr, len),
1061 WUFFS_BASE__PARSE_NUMBER_XXX__DEFAULT_OPTIONS);
1062 if (ri.status.is_ok()) {
Nigel Tao664f8432020-07-16 21:25:14 +10001063 return write_number_as_cbor_u64(0x20, ~ri.value);
Nigel Tao168f60a2020-07-14 13:19:33 +10001064 }
1065 } else {
1066 wuffs_base__result_u64 ru = wuffs_base__parse_number_u64(
1067 wuffs_base__make_slice_u8(ptr, len),
1068 WUFFS_BASE__PARSE_NUMBER_XXX__DEFAULT_OPTIONS);
1069 if (ru.status.is_ok()) {
Nigel Tao664f8432020-07-16 21:25:14 +10001070 return write_number_as_cbor_u64(0x00, ru.value);
Nigel Tao168f60a2020-07-14 13:19:33 +10001071 }
1072 }
1073 }
1074
1075 if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_FLOATING_POINT) {
1076 wuffs_base__result_f64 rf = wuffs_base__parse_number_f64(
1077 wuffs_base__make_slice_u8(ptr, len),
1078 WUFFS_BASE__PARSE_NUMBER_XXX__DEFAULT_OPTIONS);
1079 if (rf.status.is_ok()) {
Nigel Tao664f8432020-07-16 21:25:14 +10001080 return write_number_as_cbor_f64(rf.value);
Nigel Tao168f60a2020-07-14 13:19:33 +10001081 }
1082 }
1083 }
1084
Nigel Tao4e193592020-07-15 12:48:57 +10001085fail:
Nigel Tao168f60a2020-07-14 13:19:33 +10001086 return "main: internal error: unexpected write_number argument";
1087}
1088
Nigel Tao4e193592020-07-15 12:48:57 +10001089const char* //
1090write_inline_integer(uint64_t vbd, uint8_t* ptr, size_t len) {
1091 if (g_flags.output_format == file_format::cbor) {
1092 return write_dst(ptr, len);
1093 }
1094
1095 uint8_t buf[WUFFS_BASE__I64__BYTE_LENGTH__MAX_INCL];
1096 size_t n = wuffs_base__render_number_i64(
1097 wuffs_base__make_slice_u8(&buf[0], sizeof buf), (int16_t)vbd,
1098 WUFFS_BASE__RENDER_NUMBER_XXX__DEFAULT_OPTIONS);
1099 return write_dst(&buf[0], n);
1100}
1101
Nigel Tao168f60a2020-07-14 13:19:33 +10001102// ----
1103
Nigel Tao2914bae2020-02-26 09:40:30 +11001104uint8_t //
1105hex_digit(uint8_t nibble) {
Nigel Taob5461bd2020-02-21 14:13:37 +11001106 nibble &= 0x0F;
1107 if (nibble <= 9) {
1108 return '0' + nibble;
1109 }
1110 return ('A' - 10) + nibble;
1111}
1112
Nigel Tao2914bae2020-02-26 09:40:30 +11001113const char* //
Nigel Tao168f60a2020-07-14 13:19:33 +10001114flush_cbor_output_string() {
1115 uint8_t prefix[3];
1116 prefix[0] = g_cbor_output_string_is_utf_8 ? 0x60 : 0x40;
1117 if (g_cbor_output_string_length < 0x18) {
1118 prefix[0] |= g_cbor_output_string_length;
1119 TRY(write_dst(&prefix[0], 1));
1120 } else if (g_cbor_output_string_length <= 0xFF) {
1121 prefix[0] |= 0x18;
1122 prefix[1] = g_cbor_output_string_length;
1123 TRY(write_dst(&prefix[0], 2));
1124 } else if (g_cbor_output_string_length <= 0xFFFF) {
1125 prefix[0] |= 0x19;
1126 prefix[1] = g_cbor_output_string_length >> 8;
1127 prefix[2] = g_cbor_output_string_length;
1128 TRY(write_dst(&prefix[0], 3));
1129 } else {
1130 return "main: internal error: CBOR string output is too long";
1131 }
1132
1133 size_t n = g_cbor_output_string_length;
1134 g_cbor_output_string_length = 0;
1135 return write_dst(&g_cbor_output_string_array[0], n);
1136}
1137
1138const char* //
1139write_cbor_output_string(uint8_t* ptr, size_t len, bool finish) {
1140 // Check that g_cbor_output_string_array can hold any UTF-8 code point.
1141 if (CBOR_OUTPUT_STRING_ARRAY_SIZE < 4) {
1142 return "main: internal error: CBOR_OUTPUT_STRING_ARRAY_SIZE is too short";
1143 }
1144
1145 while (len > 0) {
1146 size_t available =
1147 CBOR_OUTPUT_STRING_ARRAY_SIZE - g_cbor_output_string_length;
1148 if (available >= len) {
1149 memcpy(&g_cbor_output_string_array[g_cbor_output_string_length], ptr,
1150 len);
1151 g_cbor_output_string_length += len;
1152 ptr += len;
1153 len = 0;
1154 break;
1155
1156 } else if (available > 0) {
1157 if (!g_cbor_output_string_is_multiple_chunks) {
1158 g_cbor_output_string_is_multiple_chunks = true;
1159 TRY(write_dst(g_cbor_output_string_is_utf_8 ? "\x7F" : "\x5F", 1));
Nigel Tao3b486982020-02-27 15:05:59 +11001160 }
Nigel Tao168f60a2020-07-14 13:19:33 +10001161
1162 if (g_cbor_output_string_is_utf_8) {
1163 // Walk the end backwards to a UTF-8 boundary, so that each chunk of
1164 // the multi-chunk string is also valid UTF-8.
1165 while (available > 0) {
1166 wuffs_base__utf_8__next__output o = wuffs_base__utf_8__next_from_end(
1167 wuffs_base__make_slice_u8(ptr, available));
1168 if ((o.code_point != WUFFS_BASE__UNICODE_REPLACEMENT_CHARACTER) ||
1169 (o.byte_length != 1)) {
1170 break;
1171 }
1172 available--;
1173 }
1174 }
1175
1176 memcpy(&g_cbor_output_string_array[g_cbor_output_string_length], ptr,
1177 available);
1178 g_cbor_output_string_length += available;
1179 ptr += available;
1180 len -= available;
Nigel Tao3b486982020-02-27 15:05:59 +11001181 }
1182
Nigel Tao168f60a2020-07-14 13:19:33 +10001183 TRY(flush_cbor_output_string());
1184 }
Nigel Taob9ad34f2020-03-03 12:44:01 +11001185
Nigel Tao168f60a2020-07-14 13:19:33 +10001186 if (finish) {
1187 TRY(flush_cbor_output_string());
1188 if (g_cbor_output_string_is_multiple_chunks) {
1189 TRY(write_dst("\xFF", 1));
1190 }
1191 }
1192 return nullptr;
1193}
Nigel Taob9ad34f2020-03-03 12:44:01 +11001194
Nigel Tao168f60a2020-07-14 13:19:33 +10001195const char* //
Nigel Taod191a3f2020-07-19 22:14:54 +10001196handle_unicode_code_point(uint32_t ucp);
1197
1198const char* //
1199write_json_escaped_string(uint8_t* ptr, size_t len) {
1200restart:
1201 while (true) {
1202 size_t i;
1203 for (i = 0; i < len; i++) {
1204 uint8_t c = ptr[i];
1205 if ((c == '"') || (c == '\\') || (c < 0x20)) {
1206 TRY(write_dst(ptr, i));
1207 TRY(handle_unicode_code_point(c));
1208 ptr += i + 1;
1209 len -= i + 1;
1210 goto restart;
1211 }
1212 }
1213 TRY(write_dst(ptr, len));
1214 break;
1215 }
1216 return nullptr;
1217}
1218
1219const char* //
Nigel Tao168f60a2020-07-14 13:19:33 +10001220handle_string(uint64_t vbd,
1221 uint64_t len,
1222 bool start_of_token_chain,
1223 bool continued) {
1224 if (start_of_token_chain) {
1225 if (g_flags.output_format == file_format::json) {
Nigel Tao3c8589b2020-07-19 21:49:00 +10001226 if (g_flags.output_cbor_metadata_as_json_comments &&
1227 !(vbd & WUFFS_BASE__TOKEN__VBD__STRING__CHAIN_MUST_BE_UTF_8)) {
1228 TRY(write_dst("/*cbor:hex*/\"", 13));
1229 } else {
1230 TRY(write_dst("\"", 1));
1231 }
Nigel Tao168f60a2020-07-14 13:19:33 +10001232 } else {
1233 g_cbor_output_string_length = 0;
1234 g_cbor_output_string_is_multiple_chunks = false;
1235 g_cbor_output_string_is_utf_8 =
1236 vbd & WUFFS_BASE__TOKEN__VBD__STRING__CHAIN_MUST_BE_UTF_8;
1237 }
1238 g_query.restart_fragment(in_dict_before_key() && g_query.is_at(g_depth));
1239 }
1240
1241 if (vbd & WUFFS_BASE__TOKEN__VBD__STRING__CONVERT_0_DST_1_SRC_DROP) {
1242 // No-op.
1243 } else if (vbd & WUFFS_BASE__TOKEN__VBD__STRING__CONVERT_1_DST_1_SRC_COPY) {
1244 uint8_t* ptr = g_src.data.ptr + g_curr_token_end_src_index - len;
1245 if (g_flags.output_format == file_format::json) {
Nigel Taoaf757722020-07-18 17:27:11 +10001246 if (g_flags.input_format == file_format::json) {
1247 TRY(write_dst(ptr, len));
1248 } else if (vbd & WUFFS_BASE__TOKEN__VBD__STRING__CHAIN_MUST_BE_UTF_8) {
Nigel Taod191a3f2020-07-19 22:14:54 +10001249 TRY(write_json_escaped_string(ptr, len));
Nigel Taoaf757722020-07-18 17:27:11 +10001250 } else {
1251 uint8_t as_hex[512];
1252 uint8_t* p = ptr;
1253 size_t n = len;
1254 while (n > 0) {
1255 wuffs_base__transform__output o = wuffs_base__base_16__encode2(
1256 wuffs_base__make_slice_u8(&as_hex[0], sizeof as_hex),
1257 wuffs_base__make_slice_u8(p, n), true,
1258 WUFFS_BASE__BASE_16__DEFAULT_OPTIONS);
1259 TRY(write_dst(&as_hex[0], o.num_dst));
1260 p += o.num_src;
1261 n -= o.num_src;
1262 if (!o.status.is_ok()) {
1263 return o.status.message();
1264 }
1265 }
1266 }
Nigel Tao168f60a2020-07-14 13:19:33 +10001267 } else {
1268 TRY(write_cbor_output_string(ptr, len, false));
1269 }
1270 g_query.incremental_match_slice(ptr, len);
Nigel Taob9ad34f2020-03-03 12:44:01 +11001271 } else {
Nigel Tao168f60a2020-07-14 13:19:33 +10001272 return "main: internal error: unexpected string-token conversion";
1273 }
1274
1275 if (continued) {
1276 return nullptr;
1277 }
1278
1279 if (g_flags.output_format == file_format::json) {
1280 TRY(write_dst("\"", 1));
1281 } else {
1282 TRY(write_cbor_output_string(nullptr, 0, true));
1283 }
1284 return nullptr;
1285}
1286
1287const char* //
1288handle_unicode_code_point(uint32_t ucp) {
1289 if (g_flags.output_format == file_format::json) {
1290 if (ucp < 0x0020) {
1291 switch (ucp) {
1292 case '\b':
1293 return write_dst("\\b", 2);
1294 case '\f':
1295 return write_dst("\\f", 2);
1296 case '\n':
1297 return write_dst("\\n", 2);
1298 case '\r':
1299 return write_dst("\\r", 2);
1300 case '\t':
1301 return write_dst("\\t", 2);
1302 }
1303
1304 // Other bytes less than 0x0020 are valid UTF-8 but not valid in a
1305 // JSON string. They need to remain escaped.
1306 uint8_t esc6[6];
1307 esc6[0] = '\\';
1308 esc6[1] = 'u';
1309 esc6[2] = '0';
1310 esc6[3] = '0';
1311 esc6[4] = hex_digit(ucp >> 4);
1312 esc6[5] = hex_digit(ucp >> 0);
1313 return write_dst(&esc6[0], 6);
1314
1315 } else if (ucp == '\"') {
1316 return write_dst("\\\"", 2);
1317
1318 } else if (ucp == '\\') {
1319 return write_dst("\\\\", 2);
Nigel Tao3b486982020-02-27 15:05:59 +11001320 }
Nigel Tao3b486982020-02-27 15:05:59 +11001321 }
1322
Nigel Tao168f60a2020-07-14 13:19:33 +10001323 uint8_t u[WUFFS_BASE__UTF_8__BYTE_LENGTH__MAX_INCL];
1324 size_t n = wuffs_base__utf_8__encode(
1325 wuffs_base__make_slice_u8(&u[0],
1326 WUFFS_BASE__UTF_8__BYTE_LENGTH__MAX_INCL),
1327 ucp);
1328 if (n == 0) {
1329 return "main: internal error: unexpected Unicode code point";
1330 }
1331
1332 if (g_flags.output_format == file_format::json) {
1333 return write_dst(&u[0], n);
1334 }
1335 return write_cbor_output_string(&u[0], n, false);
Nigel Tao3b486982020-02-27 15:05:59 +11001336}
1337
Nigel Taod191a3f2020-07-19 22:14:54 +10001338// ----
1339
Nigel Tao3b486982020-02-27 15:05:59 +11001340const char* //
Nigel Tao2ef39992020-04-09 17:24:39 +10001341handle_token(wuffs_base__token t, bool start_of_token_chain) {
Nigel Tao2cf76db2020-02-27 22:42:01 +11001342 do {
Nigel Tao462f8662020-04-01 23:01:51 +11001343 int64_t vbc = t.value_base_category();
Nigel Tao2cf76db2020-02-27 22:42:01 +11001344 uint64_t vbd = t.value_base_detail();
1345 uint64_t len = t.length();
Nigel Tao1b073492020-02-16 22:11:36 +11001346
1347 // Handle ']' or '}'.
Nigel Tao9f7a2502020-02-23 09:42:02 +11001348 if ((vbc == WUFFS_BASE__TOKEN__VBC__STRUCTURE) &&
Nigel Tao2cf76db2020-02-27 22:42:01 +11001349 (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__POP)) {
Nigel Taod60815c2020-03-26 14:32:35 +11001350 if (g_query.is_at(g_depth)) {
Nigel Tao0cd2f982020-03-03 23:03:02 +11001351 return "main: no match for query";
1352 }
Nigel Taod60815c2020-03-26 14:32:35 +11001353 if (g_depth <= 0) {
1354 return "main: internal error: inconsistent g_depth";
Nigel Tao1b073492020-02-16 22:11:36 +11001355 }
Nigel Taod60815c2020-03-26 14:32:35 +11001356 g_depth--;
Nigel Tao1b073492020-02-16 22:11:36 +11001357
Nigel Taod60815c2020-03-26 14:32:35 +11001358 if (g_query.matched_all() && (g_depth >= g_flags.max_output_depth)) {
1359 g_suppress_write_dst--;
Nigel Tao52c4d6a2020-03-08 21:12:38 +11001360 // '…' is U+2026 HORIZONTAL ELLIPSIS, which is 3 UTF-8 bytes.
Nigel Tao168f60a2020-07-14 13:19:33 +10001361 if (g_flags.output_format == file_format::json) {
1362 TRY(write_dst((vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__FROM_LIST)
1363 ? "\"[…]\""
1364 : "\"{…}\"",
1365 7));
1366 } else {
1367 TRY(write_dst((vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__FROM_LIST)
1368 ? "\x65[…]"
1369 : "\x65{…}",
1370 6));
1371 }
1372 } else if (g_flags.output_format == file_format::json) {
Nigel Tao52c4d6a2020-03-08 21:12:38 +11001373 // Write preceding whitespace.
Nigel Taod60815c2020-03-26 14:32:35 +11001374 if ((g_ctx != context::in_list_after_bracket) &&
1375 (g_ctx != context::in_dict_after_brace) &&
1376 !g_flags.compact_output) {
Nigel Taoc766bb72020-07-09 12:59:32 +10001377 if (g_flags.output_json_extra_comma) {
1378 TRY(write_dst(",\n", 2));
1379 } else {
1380 TRY(write_dst("\n", 1));
1381 }
Nigel Taod60815c2020-03-26 14:32:35 +11001382 for (uint32_t i = 0; i < g_depth; i++) {
1383 TRY(write_dst(
1384 g_flags.tabs ? INDENT_TAB_STRING : INDENT_SPACES_STRING,
Nigel Taoecadf722020-07-13 08:22:34 +10001385 g_flags.tabs ? 1 : g_flags.spaces));
Nigel Tao52c4d6a2020-03-08 21:12:38 +11001386 }
Nigel Tao1b073492020-02-16 22:11:36 +11001387 }
Nigel Tao52c4d6a2020-03-08 21:12:38 +11001388
1389 TRY(write_dst(
1390 (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__FROM_LIST) ? "]" : "}",
1391 1));
Nigel Tao168f60a2020-07-14 13:19:33 +10001392 } else {
1393 TRY(write_dst("\xFF", 1));
Nigel Tao1b073492020-02-16 22:11:36 +11001394 }
1395
Nigel Taod60815c2020-03-26 14:32:35 +11001396 g_ctx = (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__TO_LIST)
1397 ? context::in_list_after_value
1398 : context::in_dict_after_key;
Nigel Tao1b073492020-02-16 22:11:36 +11001399 goto after_value;
1400 }
1401
Nigel Taod1c928a2020-02-28 12:43:53 +11001402 // Write preceding whitespace and punctuation, if it wasn't ']', '}' or a
1403 // continuation of a multi-token chain.
Nigel Tao2ef39992020-04-09 17:24:39 +10001404 if (start_of_token_chain) {
Nigel Tao168f60a2020-07-14 13:19:33 +10001405 if (g_flags.output_format != file_format::json) {
1406 // No-op.
1407 } else if (g_ctx == context::in_dict_after_key) {
Nigel Taod60815c2020-03-26 14:32:35 +11001408 TRY(write_dst(": ", g_flags.compact_output ? 1 : 2));
1409 } else if (g_ctx != context::none) {
1410 if ((g_ctx != context::in_list_after_bracket) &&
1411 (g_ctx != context::in_dict_after_brace)) {
Nigel Tao0cd2f982020-03-03 23:03:02 +11001412 TRY(write_dst(",", 1));
Nigel Tao107f0ef2020-03-01 21:35:02 +11001413 }
Nigel Taod60815c2020-03-26 14:32:35 +11001414 if (!g_flags.compact_output) {
Nigel Tao0cd2f982020-03-03 23:03:02 +11001415 TRY(write_dst("\n", 1));
Nigel Taod60815c2020-03-26 14:32:35 +11001416 for (size_t i = 0; i < g_depth; i++) {
1417 TRY(write_dst(
1418 g_flags.tabs ? INDENT_TAB_STRING : INDENT_SPACES_STRING,
Nigel Taoecadf722020-07-13 08:22:34 +10001419 g_flags.tabs ? 1 : g_flags.spaces));
Nigel Tao0cd2f982020-03-03 23:03:02 +11001420 }
1421 }
1422 }
1423
Nigel Tao52c4d6a2020-03-08 21:12:38 +11001424 bool query_matched_fragment = false;
Nigel Taod60815c2020-03-26 14:32:35 +11001425 if (g_query.is_at(g_depth)) {
1426 switch (g_ctx) {
Nigel Tao0cd2f982020-03-03 23:03:02 +11001427 case context::in_list_after_bracket:
1428 case context::in_list_after_value:
Nigel Taod60815c2020-03-26 14:32:35 +11001429 query_matched_fragment = g_query.tick();
Nigel Tao0cd2f982020-03-03 23:03:02 +11001430 break;
1431 case context::in_dict_after_key:
Nigel Taod60815c2020-03-26 14:32:35 +11001432 query_matched_fragment = g_query.matched_fragment();
Nigel Tao0cd2f982020-03-03 23:03:02 +11001433 break;
Nigel Tao18ef5b42020-03-16 10:37:47 +11001434 default:
1435 break;
Nigel Tao0cd2f982020-03-03 23:03:02 +11001436 }
1437 }
Nigel Tao52c4d6a2020-03-08 21:12:38 +11001438 if (!query_matched_fragment) {
Nigel Tao0cd2f982020-03-03 23:03:02 +11001439 // No-op.
Nigel Taod60815c2020-03-26 14:32:35 +11001440 } else if (!g_query.next_fragment()) {
Nigel Tao0cd2f982020-03-03 23:03:02 +11001441 // There is no next fragment. We have matched the complete query, and
1442 // the upcoming JSON value is the result of that query.
1443 //
Nigel Taod60815c2020-03-26 14:32:35 +11001444 // Un-suppress writing to stdout and reset the g_ctx and g_depth as if
1445 // we were about to decode a top-level value. This makes any subsequent
1446 // indentation be relative to this point, and we will return g_eod
1447 // after the upcoming JSON value is complete.
1448 if (g_suppress_write_dst != 1) {
1449 return "main: internal error: inconsistent g_suppress_write_dst";
Nigel Tao52c4d6a2020-03-08 21:12:38 +11001450 }
Nigel Taod60815c2020-03-26 14:32:35 +11001451 g_suppress_write_dst = 0;
1452 g_ctx = context::none;
1453 g_depth = 0;
Nigel Tao0cd2f982020-03-03 23:03:02 +11001454 } else if ((vbc != WUFFS_BASE__TOKEN__VBC__STRUCTURE) ||
1455 !(vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__PUSH)) {
1456 // The query has moved on to the next fragment but the upcoming JSON
1457 // value is not a container.
1458 return "main: no match for query";
Nigel Tao1b073492020-02-16 22:11:36 +11001459 }
1460 }
1461
1462 // Handle the token itself: either a container ('[' or '{') or a simple
Nigel Tao85fba7f2020-02-29 16:28:06 +11001463 // value: string (a chain of raw or escaped parts), literal or number.
Nigel Tao1b073492020-02-16 22:11:36 +11001464 switch (vbc) {
Nigel Tao85fba7f2020-02-29 16:28:06 +11001465 case WUFFS_BASE__TOKEN__VBC__STRUCTURE:
Nigel Taod60815c2020-03-26 14:32:35 +11001466 if (g_query.matched_all() && (g_depth >= g_flags.max_output_depth)) {
1467 g_suppress_write_dst++;
Nigel Tao168f60a2020-07-14 13:19:33 +10001468 } else if (g_flags.output_format == file_format::json) {
Nigel Tao52c4d6a2020-03-08 21:12:38 +11001469 TRY(write_dst(
1470 (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__TO_LIST) ? "[" : "{",
1471 1));
Nigel Tao168f60a2020-07-14 13:19:33 +10001472 } else {
1473 TRY(write_dst((vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__TO_LIST)
1474 ? "\x9F"
1475 : "\xBF",
1476 1));
Nigel Tao52c4d6a2020-03-08 21:12:38 +11001477 }
Nigel Taod60815c2020-03-26 14:32:35 +11001478 g_depth++;
1479 g_ctx = (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__TO_LIST)
1480 ? context::in_list_after_bracket
1481 : context::in_dict_after_brace;
Nigel Tao85fba7f2020-02-29 16:28:06 +11001482 return nullptr;
1483
Nigel Tao2cf76db2020-02-27 22:42:01 +11001484 case WUFFS_BASE__TOKEN__VBC__STRING:
Nigel Tao168f60a2020-07-14 13:19:33 +10001485 TRY(handle_string(vbd, len, start_of_token_chain, t.continued()));
Nigel Tao496e88b2020-04-09 22:10:08 +10001486 if (t.continued()) {
Nigel Tao2cf76db2020-02-27 22:42:01 +11001487 return nullptr;
1488 }
Nigel Tao2cf76db2020-02-27 22:42:01 +11001489 goto after_value;
1490
1491 case WUFFS_BASE__TOKEN__VBC__UNICODE_CODE_POINT:
Nigel Tao496e88b2020-04-09 22:10:08 +10001492 if (!t.continued()) {
1493 return "main: internal error: unexpected non-continued UCP token";
Nigel Tao0cd2f982020-03-03 23:03:02 +11001494 }
1495 TRY(handle_unicode_code_point(vbd));
Nigel Taod60815c2020-03-26 14:32:35 +11001496 g_query.incremental_match_code_point(vbd);
Nigel Tao0cd2f982020-03-03 23:03:02 +11001497 return nullptr;
Nigel Tao2cf76db2020-02-27 22:42:01 +11001498
Nigel Tao85fba7f2020-02-29 16:28:06 +11001499 case WUFFS_BASE__TOKEN__VBC__LITERAL:
Nigel Tao168f60a2020-07-14 13:19:33 +10001500 TRY(write_literal(vbd));
1501 goto after_value;
1502
Nigel Tao2cf76db2020-02-27 22:42:01 +11001503 case WUFFS_BASE__TOKEN__VBC__NUMBER:
Nigel Tao168f60a2020-07-14 13:19:33 +10001504 TRY(write_number(vbd, g_src.data.ptr + g_curr_token_end_src_index - len,
1505 len));
Nigel Tao2cf76db2020-02-27 22:42:01 +11001506 goto after_value;
Nigel Tao4e193592020-07-15 12:48:57 +10001507
1508 case WUFFS_BASE__TOKEN__VBC__INLINE_INTEGER:
1509 TRY(write_inline_integer(
1510 vbd, g_src.data.ptr + g_curr_token_end_src_index - len, len));
1511 goto after_value;
Nigel Tao1b073492020-02-16 22:11:36 +11001512 }
1513
Nigel Tao664f8432020-07-16 21:25:14 +10001514 if (t.value_major() == WUFFS_CBOR__TOKEN_VALUE_MAJOR) {
1515 uint64_t value_minor = t.value_minor();
1516 if (value_minor & WUFFS_CBOR__TOKEN_VALUE_MINOR__TAG) {
1517 // TODO: CBOR tags.
1518 } else if (value_minor & WUFFS_CBOR__TOKEN_VALUE_MINOR__MINUS_1_MINUS_X) {
1519 TRY(write_cbor_number_as_json(
1520 g_src.data.ptr + g_curr_token_end_src_index - len, len, true,
1521 true));
1522 goto after_value;
1523 }
1524 }
1525
1526 // Return an error if we didn't match the (value_major, value_minor) or
1527 // (vbc, vbd) pair.
Nigel Tao2cf76db2020-02-27 22:42:01 +11001528 return "main: internal error: unexpected token";
1529 } while (0);
Nigel Tao1b073492020-02-16 22:11:36 +11001530
Nigel Tao2cf76db2020-02-27 22:42:01 +11001531 // Book-keeping after completing a value (whether a container value or a
1532 // simple value). Empty parent containers are no longer empty. If the parent
1533 // container is a "{...}" object, toggle between keys and values.
1534after_value:
Nigel Taod60815c2020-03-26 14:32:35 +11001535 if (g_depth == 0) {
1536 return g_eod;
Nigel Tao2cf76db2020-02-27 22:42:01 +11001537 }
Nigel Taod60815c2020-03-26 14:32:35 +11001538 switch (g_ctx) {
Nigel Tao2cf76db2020-02-27 22:42:01 +11001539 case context::in_list_after_bracket:
Nigel Taod60815c2020-03-26 14:32:35 +11001540 g_ctx = context::in_list_after_value;
Nigel Tao2cf76db2020-02-27 22:42:01 +11001541 break;
1542 case context::in_dict_after_brace:
Nigel Taod60815c2020-03-26 14:32:35 +11001543 g_ctx = context::in_dict_after_key;
Nigel Tao2cf76db2020-02-27 22:42:01 +11001544 break;
1545 case context::in_dict_after_key:
Nigel Taod60815c2020-03-26 14:32:35 +11001546 g_ctx = context::in_dict_after_value;
Nigel Tao2cf76db2020-02-27 22:42:01 +11001547 break;
1548 case context::in_dict_after_value:
Nigel Taod60815c2020-03-26 14:32:35 +11001549 g_ctx = context::in_dict_after_key;
Nigel Tao2cf76db2020-02-27 22:42:01 +11001550 break;
Nigel Tao18ef5b42020-03-16 10:37:47 +11001551 default:
1552 break;
Nigel Tao2cf76db2020-02-27 22:42:01 +11001553 }
1554 return nullptr;
1555}
1556
1557const char* //
1558main1(int argc, char** argv) {
1559 TRY(initialize_globals(argc, argv));
1560
Nigel Taocd183f92020-07-14 12:11:05 +10001561 bool start_of_token_chain = true;
Nigel Tao2cf76db2020-02-27 22:42:01 +11001562 while (true) {
Nigel Tao4e193592020-07-15 12:48:57 +10001563 wuffs_base__status status = g_dec->decode_tokens(
Nigel Taod60815c2020-03-26 14:32:35 +11001564 &g_tok, &g_src,
1565 wuffs_base__make_slice_u8(g_work_buffer_array, WORK_BUFFER_ARRAY_SIZE));
Nigel Tao2cf76db2020-02-27 22:42:01 +11001566
Nigel Taod60815c2020-03-26 14:32:35 +11001567 while (g_tok.meta.ri < g_tok.meta.wi) {
1568 wuffs_base__token t = g_tok.data.ptr[g_tok.meta.ri++];
Nigel Tao2cf76db2020-02-27 22:42:01 +11001569 uint64_t n = t.length();
Nigel Taod60815c2020-03-26 14:32:35 +11001570 if ((g_src.meta.ri - g_curr_token_end_src_index) < n) {
1571 return "main: internal error: inconsistent g_src indexes";
Nigel Tao2cf76db2020-02-27 22:42:01 +11001572 }
Nigel Taod60815c2020-03-26 14:32:35 +11001573 g_curr_token_end_src_index += n;
Nigel Tao2cf76db2020-02-27 22:42:01 +11001574
Nigel Taod0b16cb2020-03-14 10:15:54 +11001575 // Skip filler tokens (e.g. whitespace).
Nigel Tao3c8589b2020-07-19 21:49:00 +10001576 if (t.value_base_category() == WUFFS_BASE__TOKEN__VBC__FILLER) {
Nigel Tao496e88b2020-04-09 22:10:08 +10001577 start_of_token_chain = !t.continued();
Nigel Tao2cf76db2020-02-27 22:42:01 +11001578 continue;
1579 }
1580
Nigel Tao2ef39992020-04-09 17:24:39 +10001581 const char* z = handle_token(t, start_of_token_chain);
Nigel Tao496e88b2020-04-09 22:10:08 +10001582 start_of_token_chain = !t.continued();
Nigel Tao2cf76db2020-02-27 22:42:01 +11001583 if (z == nullptr) {
1584 continue;
Nigel Taod60815c2020-03-26 14:32:35 +11001585 } else if (z == g_eod) {
Nigel Tao0cd2f982020-03-03 23:03:02 +11001586 goto end_of_data;
Nigel Tao2cf76db2020-02-27 22:42:01 +11001587 }
1588 return z;
Nigel Tao1b073492020-02-16 22:11:36 +11001589 }
Nigel Tao2cf76db2020-02-27 22:42:01 +11001590
1591 if (status.repr == nullptr) {
Nigel Tao0cd2f982020-03-03 23:03:02 +11001592 return "main: internal error: unexpected end of token stream";
Nigel Tao2cf76db2020-02-27 22:42:01 +11001593 } else if (status.repr == wuffs_base__suspension__short_read) {
Nigel Taod60815c2020-03-26 14:32:35 +11001594 if (g_curr_token_end_src_index != g_src.meta.ri) {
1595 return "main: internal error: inconsistent g_src indexes";
Nigel Tao2cf76db2020-02-27 22:42:01 +11001596 }
1597 TRY(read_src());
Nigel Taod60815c2020-03-26 14:32:35 +11001598 g_curr_token_end_src_index = g_src.meta.ri;
Nigel Tao2cf76db2020-02-27 22:42:01 +11001599 } else if (status.repr == wuffs_base__suspension__short_write) {
Nigel Taod60815c2020-03-26 14:32:35 +11001600 g_tok.compact();
Nigel Tao2cf76db2020-02-27 22:42:01 +11001601 } else {
1602 return status.message();
Nigel Tao1b073492020-02-16 22:11:36 +11001603 }
1604 }
Nigel Tao0cd2f982020-03-03 23:03:02 +11001605end_of_data:
1606
Nigel Taod60815c2020-03-26 14:32:35 +11001607 // With a non-empty g_query, don't try to consume trailing whitespace or
Nigel Tao0cd2f982020-03-03 23:03:02 +11001608 // confirm that we've processed all the tokens.
Nigel Taod60815c2020-03-26 14:32:35 +11001609 if (g_flags.query_c_string && *g_flags.query_c_string) {
Nigel Tao0cd2f982020-03-03 23:03:02 +11001610 return nullptr;
1611 }
Nigel Tao6b161af2020-02-24 11:01:48 +11001612
Nigel Tao6b161af2020-02-24 11:01:48 +11001613 // Check that we've exhausted the input.
Nigel Taod60815c2020-03-26 14:32:35 +11001614 if ((g_src.meta.ri == g_src.meta.wi) && !g_src.meta.closed) {
Nigel Taofe0cbbd2020-03-05 22:01:30 +11001615 TRY(read_src());
1616 }
Nigel Taod60815c2020-03-26 14:32:35 +11001617 if ((g_src.meta.ri < g_src.meta.wi) || !g_src.meta.closed) {
Nigel Tao6b161af2020-02-24 11:01:48 +11001618 return "main: valid JSON followed by further (unexpected) data";
1619 }
1620
1621 // Check that we've used all of the decoded tokens, other than trailing
Nigel Tao4b186b02020-03-18 14:25:21 +11001622 // filler tokens. For example, "true\n" is valid JSON (and fully consumed
1623 // with WUFFS_JSON__QUIRK_ALLOW_TRAILING_NEW_LINE enabled) with a trailing
1624 // filler token for the "\n".
Nigel Taod60815c2020-03-26 14:32:35 +11001625 for (; g_tok.meta.ri < g_tok.meta.wi; g_tok.meta.ri++) {
1626 if (g_tok.data.ptr[g_tok.meta.ri].value_base_category() !=
Nigel Tao6b161af2020-02-24 11:01:48 +11001627 WUFFS_BASE__TOKEN__VBC__FILLER) {
1628 return "main: internal error: decoded OK but unprocessed tokens remain";
1629 }
1630 }
1631
1632 return nullptr;
Nigel Tao1b073492020-02-16 22:11:36 +11001633}
1634
Nigel Tao2914bae2020-02-26 09:40:30 +11001635int //
1636compute_exit_code(const char* status_msg) {
Nigel Tao9cc2c252020-02-23 17:05:49 +11001637 if (!status_msg) {
1638 return 0;
1639 }
Nigel Tao01abc842020-03-06 21:42:33 +11001640 size_t n;
Nigel Taod60815c2020-03-26 14:32:35 +11001641 if (status_msg == g_usage) {
Nigel Tao01abc842020-03-06 21:42:33 +11001642 n = strlen(status_msg);
1643 } else {
Nigel Tao9cc2c252020-02-23 17:05:49 +11001644 n = strnlen(status_msg, 2047);
Nigel Tao01abc842020-03-06 21:42:33 +11001645 if (n >= 2047) {
1646 status_msg = "main: internal error: error message is too long";
1647 n = strnlen(status_msg, 2047);
1648 }
Nigel Tao9cc2c252020-02-23 17:05:49 +11001649 }
Nigel Taofe0cbbd2020-03-05 22:01:30 +11001650 const int stderr_fd = 2;
1651 ignore_return_value(write(stderr_fd, status_msg, n));
1652 ignore_return_value(write(stderr_fd, "\n", 1));
Nigel Tao9cc2c252020-02-23 17:05:49 +11001653 // Return an exit code of 1 for regular (forseen) errors, e.g. badly
1654 // formatted or unsupported input.
1655 //
1656 // Return an exit code of 2 for internal (exceptional) errors, e.g. defensive
1657 // run-time checks found that an internal invariant did not hold.
1658 //
1659 // Automated testing, including badly formatted inputs, can therefore
1660 // discriminate between expected failure (exit code 1) and unexpected failure
1661 // (other non-zero exit codes). Specifically, exit code 2 for internal
1662 // invariant violation, exit code 139 (which is 128 + SIGSEGV on x86_64
1663 // linux) for a segmentation fault (e.g. null pointer dereference).
1664 return strstr(status_msg, "internal error:") ? 2 : 1;
1665}
1666
Nigel Tao2914bae2020-02-26 09:40:30 +11001667int //
1668main(int argc, char** argv) {
Nigel Tao01abc842020-03-06 21:42:33 +11001669 // Look for an input filename (the first non-flag argument) in argv. If there
1670 // is one, open it (but do not read from it) before we self-impose a sandbox.
1671 //
1672 // Flags start with "-", unless it comes after a bare "--" arg.
1673 {
1674 bool dash_dash = false;
1675 int a;
1676 for (a = 1; a < argc; a++) {
1677 char* arg = argv[a];
1678 if ((arg[0] == '-') && !dash_dash) {
1679 dash_dash = (arg[1] == '-') && (arg[2] == '\x00');
1680 continue;
1681 }
Nigel Taod60815c2020-03-26 14:32:35 +11001682 g_input_file_descriptor = open(arg, O_RDONLY);
1683 if (g_input_file_descriptor < 0) {
Nigel Tao01abc842020-03-06 21:42:33 +11001684 fprintf(stderr, "%s: %s\n", arg, strerror(errno));
1685 return 1;
1686 }
1687 break;
1688 }
1689 }
1690
Nigel Taofe0cbbd2020-03-05 22:01:30 +11001691#if defined(WUFFS_EXAMPLE_USE_SECCOMP)
1692 prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT);
Nigel Taod60815c2020-03-26 14:32:35 +11001693 g_sandboxed = true;
Nigel Taofe0cbbd2020-03-05 22:01:30 +11001694#endif
1695
Nigel Tao0cd2f982020-03-03 23:03:02 +11001696 const char* z = main1(argc, argv);
Nigel Taod60815c2020-03-26 14:32:35 +11001697 if (g_wrote_to_dst) {
Nigel Tao168f60a2020-07-14 13:19:33 +10001698 const char* z1 = (g_flags.output_format == file_format::json)
1699 ? write_dst("\n", 1)
1700 : nullptr;
Nigel Tao0cd2f982020-03-03 23:03:02 +11001701 const char* z2 = flush_dst();
1702 z = z ? z : (z1 ? z1 : z2);
1703 }
1704 int exit_code = compute_exit_code(z);
Nigel Taofe0cbbd2020-03-05 22:01:30 +11001705
1706#if defined(WUFFS_EXAMPLE_USE_SECCOMP)
1707 // Call SYS_exit explicitly, instead of calling SYS_exit_group implicitly by
1708 // either calling _exit or returning from main. SECCOMP_MODE_STRICT allows
1709 // only SYS_exit.
1710 syscall(SYS_exit, exit_code);
1711#endif
Nigel Tao9cc2c252020-02-23 17:05:49 +11001712 return exit_code;
Nigel Tao1b073492020-02-16 22:11:36 +11001713}