// Copyright 2020 The Wuffs Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//    https://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

// ----------------

/*
jsonfindptrs reads UTF-8 JSON from stdin and writes every node's JSON Pointer
(RFC 6901) to stdout.

See the "const char* g_usage" string below for details.

----

This program uses Wuffs' JSON decoder at a relatively high level, building
in-memory representations of JSON 'things' (e.g. numbers, strings, objects).
After the entire input has been converted, walking the tree prints the output
(in sorted order). The core conversion mechanism is to call JsonThing::parse,
which consumes a variable number of tokens (the output of Wuffs' JSON decoder).
JsonThing::parse can call itself recursively, as JSON values can nest.

This approach is centered around JSON things. Each JSON thing comprises one or
more JSON tokens.

An alternative, lower-level approach is in the sibling example/jsonptr program.
Neither approach is better or worse per se, but when studying this program, be
aware that there are multiple ways to use Wuffs' JSON decoder.

The two programs, jsonfindptrs and jsonptr, also demonstrate different
trade-offs with regard to JSON object duplicate keys. The JSON spec permits
different implementations to allow or reject duplicate keys. It is not always
clear which approach is safer. Rejecting them is certainly unambiguous, and
security bugs can lurk in ambiguous corners of a file format, if two different
implementations both silently accept a file but differ on how to interpret it.
On the other hand, in the worst case, detecting duplicate keys requires O(N)
memory, where N is the size of the (potentially untrusted) input.

This program (jsonfindptrs) rejects duplicate keys.
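
As a minimal sketch of that trade-off (not the code this program uses), the
snippet below rejects duplicate keys by remembering every key seen so far in a
std::map, which is where the O(N) memory goes. The insert_unique helper and the
int value type are assumptions made for illustration only.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// insert_unique is a hypothetical helper, not part of jsonfindptrs. It returns
// "" on success, or an error message if the key was already present. Every
// accepted key stays in the map, so memory grows with the number of keys.
std::string insert_unique(std::map<std::string, int>& m,
                          std::string key,
                          int value) {
  auto iter = m.find(key);
  if (iter != m.end()) {
    return "duplicate key: " + key;
  }
  m.emplace(std::move(key), value);
  return "";
}
```

A caller would stop parsing at the first non-empty return value, much like the
TRY macro further below does for other errors.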

----

This example program differs from most other example Wuffs programs in that it
is written in C++, not C.

$CXX jsonfindptrs.cc && ./a.out < ../../test/data/github-tags.json; rm -f a.out

for a C++ compiler $CXX, such as clang++ or g++.
*/

#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <iostream>
#include <map>
#include <string>
#include <vector>

// Wuffs ships as a "single file C library" or "header file library" as per
// https://github.com/nothings/stb/blob/master/docs/stb_howto.txt
//
// To use that single file as a "foo.c"-like implementation, instead of a
// "foo.h"-like header, #define WUFFS_IMPLEMENTATION before #include'ing or
// compiling it.
#define WUFFS_IMPLEMENTATION

// Defining the WUFFS_CONFIG__MODULE* macros is optional, but it lets users of
// release/c/etc.c whitelist which parts of Wuffs to build. That file contains
// the entire Wuffs standard library, implementing a variety of codecs and file
// formats. Without this macro definition, an optimizing compiler or linker may
// very well discard Wuffs code for unused codecs, but listing the Wuffs
// modules we use makes that process explicit. Preprocessing means that such
// code simply isn't compiled.
#define WUFFS_CONFIG__MODULES
#define WUFFS_CONFIG__MODULE__BASE
#define WUFFS_CONFIG__MODULE__JSON

// If building this program in an environment that doesn't easily accommodate
// relative includes, you can use the script/inline-c-relative-includes.go
// program to generate a stand-alone C++ file.
#include "../../release/c/wuffs-unsupported-snapshot.c"

#define TRY(error_msg)         \
  do {                         \
    std::string z = error_msg; \
    if (!z.empty()) {          \
      return z;                \
    }                          \
  } while (false)
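
/*
The TRY macro above treats an empty std::string as success and any non-empty
string as an error message to return immediately. The stand-alone sketch below
repeats the macro so that it compiles on its own. The check_positive and
check_both functions are hypothetical, invented only to show the early return.

```cpp
#include <cassert>
#include <string>

#define TRY(error_msg)         \
  do {                         \
    std::string z = error_msg; \
    if (!z.empty()) {          \
      return z;                \
    }                          \
  } while (false)

// check_positive is a hypothetical helper. "" means OK.
std::string check_positive(int x) {
  return (x > 0) ? "" : "not positive";
}

// check_both returns the first error message, or "" if both checks pass.
std::string check_both(int a, int b) {
  TRY(check_positive(a));  // Returns early if a fails; b is never checked.
  TRY(check_positive(b));
  return "";
}
```

TRY only works inside functions whose return type can be constructed from a
std::string, which is why main1 below returns a std::string instead of an int.
*/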

static const char* g_usage =
    "Usage: jsonfindptrs -flags input.json\n"
    "\n"
    "Flags:\n"
    "    -o=NUM  -max-output-depth=NUM\n"
    "    -s      -strict-json-pointer-syntax\n"
    "\n"
    "The input.json filename is optional. If absent, it reads from stdin.\n"
    "\n"
    "----\n"
    "\n"
    "jsonfindptrs reads UTF-8 JSON from stdin and writes every node's JSON\n"
    "Pointer (RFC 6901) to stdout.\n"
    "\n"
    "For example, given RFC 6901 section 5's sample input\n"
    "(https://tools.ietf.org/rfc/rfc6901.txt), this command:\n"
    "    jsonfindptrs rfc-6901-json-pointer.json\n"
    "will print:\n"
    "    \n"
    "    /\n"
    "    / \n"
    "    /a~1b\n"
    "    /c%d\n"
    "    /e^f\n"
    "    /foo\n"
    "    /foo/0\n"
    "    /foo/1\n"
    "    /g|h\n"
    "    /i\\j\n"
    "    /k\"l\n"
    "    /m~0n\n"
    "\n"
    "The first three lines are (1) a 0-byte \"\", (2) a 1-byte \"/\" and (3)\n"
    "a 2-byte \"/ \". Unlike a file system, the \"/\" JSON Pointer does not\n"
    "identify the root. Instead, \"\" is the root and \"/\" is the child (the\n"
    "value in a key-value pair) of the root whose key is the empty string.\n"
    "Similarly, \"/xyz\" and \"/xyz/\" are two different nodes.\n"
    "\n"
    "----\n"
    "\n"
    "The JSON specification (https://json.org/) permits implementations that\n"
    "allow duplicate keys, but this one does not. Conversely, it prints keys\n"
    "in sorted order, but the overall output is not necessarily sorted\n"
    "lexicographically. For example, \"/a/9\" would come before \"/a/10\",\n"
    "and \"/b/c\", a child of \"/b\", would come before \"/b+\".\n"
    "\n"
    "This JSON implementation also rejects integer values outside ±M, where\n"
    "M is ((1<<53)-1), also known as JavaScript's Number.MAX_SAFE_INTEGER.\n"
    "\n"
    "----\n"
    "\n"
    "The -s or -strict-json-pointer-syntax flag restricts the output lines\n"
    "to exactly RFC 6901, with only two escape sequences: \"~0\" and \"~1\"\n"
    "for \"~\" and \"/\". Without this flag, this program also lets \"~n\"\n"
    "and \"~r\" escape the New Line and Carriage Return ASCII control\n"
    "characters, which can work better with line oriented Unix tools that\n"
    "assume exactly one value (i.e. one JSON Pointer string) per line. With\n"
    "this flag, the program will fail if the input JSON's object keys contain\n"
    "\"\\u000A\" or \"\\u000D\".\n"
    "\n"
    "----\n"
    "\n"
    "The JSON specification permits implementations to set their own maximum\n"
    "input depth. This JSON implementation sets it to 1024.\n"
    "\n"
    "The -o=NUM or -max-output-depth=NUM flag gives the maximum (inclusive)\n"
    "output depth. JSON containers ([] arrays and {} objects) can hold other\n"
    "containers. A bare -o or -max-output-depth is equivalent to -o=1,\n"
    "analogous to the Unix ls command. The flag's absence is equivalent to an\n"
    "unlimited output depth, analogous to the Unix find command (and hence\n"
    "the name of this program: jsonfindptrs).";

// ----

struct {
  int remaining_argc;
  char** remaining_argv;

  uint32_t max_output_depth;
  bool strict_json_pointer_syntax;
} g_flags = {0};

std::string  //
parse_flags(int argc, char** argv) {
  g_flags.max_output_depth = 0xFFFFFFFF;

  int c = (argc > 0) ? 1 : 0;  // Skip argv[0], the program name.
  for (; c < argc; c++) {
    char* arg = argv[c];
    if (*arg++ != '-') {
      break;
    }

    // A double-dash "--foo" is equivalent to a single-dash "-foo". As special
    // cases, a bare "-" is not a flag (some programs may interpret it as
    // stdin) and a bare "--" means to stop parsing flags.
    if (*arg == '\x00') {
      break;
    } else if (*arg == '-') {
      arg++;
      if (*arg == '\x00') {
        c++;
        break;
      }
    }

    if (!strcmp(arg, "o") || !strcmp(arg, "max-output-depth")) {
      g_flags.max_output_depth = 1;
      continue;
    } else if (!strncmp(arg, "o=", 2) ||
               // 17 is strlen("max-output-depth="). Comparing only 16 bytes
               // would accept a prefix with no '=', and the loop below would
               // then scan past the end of the string.
               !strncmp(arg, "max-output-depth=", 17)) {
      while (*arg++ != '=') {
      }
      wuffs_base__result_u64 u = wuffs_base__parse_number_u64(
          wuffs_base__make_slice_u8((uint8_t*)arg, strlen(arg)));
      if (wuffs_base__status__is_ok(&u.status) && (u.value <= 0xFFFFFFFF)) {
        g_flags.max_output_depth = (uint32_t)(u.value);
        continue;
      }
      return g_usage;
    }
    if (!strcmp(arg, "s") || !strcmp(arg, "strict-json-pointer-syntax")) {
      g_flags.strict_json_pointer_syntax = true;
      continue;
    }

    return g_usage;
  }

  g_flags.remaining_argc = argc - c;
  g_flags.remaining_argv = argv + c;
  return "";
}
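
/*
The dash handling in parse_flags can be looked at in isolation. The sketch
below is an illustration under simplifying assumptions, not a replacement for
parse_flags: flag_name is a hypothetical helper that strips one or two leading
dashes and reports "" for anything that is not a flag, which folds together the
bare "-" and bare "--" cases that parse_flags distinguishes.

```cpp
#include <cassert>
#include <string>

// flag_name is a hypothetical helper. It returns the flag's name without its
// leading dash or dashes, or "" if arg is not a flag. A bare "-" and a bare
// "--" both yield "".
std::string flag_name(const char* arg) {
  if (*arg++ != '-') {
    return "";
  }
  if (*arg == '-') {
    arg++;  // "--foo" is treated the same as "-foo".
  }
  return std::string(arg);
}
```

With this helper, "-s" and "--s" both map to "s", so a single comparison per
flag name covers both spellings.
*/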

// ----

#define WORK_BUFFER_ARRAY_SIZE \
  WUFFS_JSON__DECODER_WORKBUF_LEN_MAX_INCL_WORST_CASE

#ifndef SRC_BUFFER_ARRAY_SIZE
#define SRC_BUFFER_ARRAY_SIZE (4 * 1024)
#endif
#ifndef TOKEN_BUFFER_ARRAY_SIZE
#define TOKEN_BUFFER_ARRAY_SIZE (1 * 1024)
#endif

class TokenStream {
 public:
  struct Result {
    std::string status_msg;
    wuffs_base__token token;
    // src_data is a sub-slice of m_src (a slice is a pointer-length pair).
    // Calling TokenStream::peek or TokenStream::next may change the backing
    // array's contents, so handling a TokenStream::Result may require copying
    // this src_data slice's contents.
    wuffs_base__slice_u8 src_data;

    Result(std::string s)
        : status_msg(s),
          token(wuffs_base__make_token(0)),
          src_data(wuffs_base__empty_slice_u8()) {}

    Result(std::string s, wuffs_base__token t, wuffs_base__slice_u8 d)
        : status_msg(s), token(t), src_data(d) {}
  };

  TokenStream(int input_file_descriptor)
      : m_status(wuffs_base__make_status(nullptr)),
        m_src(wuffs_base__make_io_buffer(
            wuffs_base__make_slice_u8(m_src_array, SRC_BUFFER_ARRAY_SIZE),
            wuffs_base__empty_io_buffer_meta())),
        m_tok(wuffs_base__make_token_buffer(
            wuffs_base__make_slice_token(m_tok_array, TOKEN_BUFFER_ARRAY_SIZE),
            wuffs_base__empty_token_buffer_meta())),
        m_input_file_descriptor(input_file_descriptor),
        m_curr_token_end_src_index(0) {
    m_status =
        m_dec.initialize(sizeof__wuffs_json__decoder(), WUFFS_VERSION, 0);

    // Uncomment these lines to enable the WUFFS_JSON__QUIRK_ALLOW_BACKSLASH_X
    // option, discussed in a separate comment.
    //
    // if (m_status.is_ok()) {
    //   m_dec.set_quirk_enabled(WUFFS_JSON__QUIRK_ALLOW_BACKSLASH_X, true);
    // }
  }

  Result peek() { return peek_or_next(false); }
  Result next() { return peek_or_next(true); }

 private:
  Result peek_or_next(bool next) {
    while (m_tok.meta.ri >= m_tok.meta.wi) {
      if (m_status.repr == nullptr) {
        // No-op.
      } else if (m_status.repr == wuffs_base__suspension__short_read) {
        if (m_curr_token_end_src_index != m_src.meta.ri) {
          return Result(
              "TokenStream: internal error: inconsistent src indexes");
        }
        const char* z = read_src();
        m_curr_token_end_src_index = m_src.meta.ri;
        if (z) {
          return Result(z);
        }
      } else if (m_status.repr == wuffs_base__suspension__short_write) {
        m_tok.compact();
      } else {
        return Result(m_status.message());
      }

      m_status =
          m_dec.decode_tokens(&m_tok, &m_src,
                              wuffs_base__make_slice_u8(
                                  m_work_buffer_array, WORK_BUFFER_ARRAY_SIZE));
    }

    wuffs_base__token t = m_tok.data.ptr[m_tok.meta.ri];
    size_t i = m_curr_token_end_src_index;
    uint64_t n = t.length();
    if ((m_src.meta.ri < i) || ((m_src.meta.ri - i) < n)) {
      return Result("TokenStream: internal error: inconsistent src indexes");
    }
    if (next) {
      m_tok.meta.ri++;
      m_curr_token_end_src_index += n;
    }
    return Result("", t, wuffs_base__make_slice_u8(m_src.data.ptr + i, n));
  }

  const char*  //
  read_src() {
    if (m_src.meta.closed) {
      return "main: internal error: read requested on a closed source";
    }
    m_src.compact();
    if (m_src.meta.wi >= m_src.data.len) {
      return "main: src buffer is full";
    }
    while (true) {
      ssize_t n = read(m_input_file_descriptor, m_src.data.ptr + m_src.meta.wi,
                       m_src.data.len - m_src.meta.wi);
      if (n >= 0) {
        m_src.meta.wi += n;
        m_src.meta.closed = n == 0;
        break;
      } else if (errno != EINTR) {
        return strerror(errno);
      }
    }
    return nullptr;
  }

  wuffs_base__status m_status;
  wuffs_base__io_buffer m_src;
  wuffs_base__token_buffer m_tok;
  int m_input_file_descriptor;
  // m_curr_token_end_src_index is the m_src.data.ptr index of the end of the
  // current token. An invariant is that (m_curr_token_end_src_index <=
  // m_src.meta.ri).
  size_t m_curr_token_end_src_index;

  wuffs_base__token m_tok_array[TOKEN_BUFFER_ARRAY_SIZE];
  uint8_t m_src_array[SRC_BUFFER_ARRAY_SIZE];
#if WORK_BUFFER_ARRAY_SIZE > 0
  uint8_t m_work_buffer_array[WORK_BUFFER_ARRAY_SIZE];
#else
  // Not all C/C++ compilers support 0-length arrays.
  uint8_t m_work_buffer_array[1];
#endif
  wuffs_json__decoder m_dec;
};

// ----

class JsonThing {
 public:
  struct Result;

  using Vector = std::vector<JsonThing>;

  // We use a std::map in this example program to avoid dependencies outside of
  // the C++ standard library. If you're copy/pasting this JsonThing code,
  // consider a more efficient data structure such as an absl::btree_map.
  //
  // See CppCon 2014: Chandler Carruth "Efficiency with Algorithms, Performance
  // with Data Structures" at https://www.youtube.com/watch?v=fHNmRkzxHWs
  using Map = std::map<std::string, JsonThing>;

  enum class Kind {
    Null,
    Bool,
    Int64,
    Float64,
    String,
    Array,
    Object,
  } kind = Kind::Null;

  struct Value {
    bool b = false;
    int64_t i = 0;
    double f = 0;
    std::string s;
    Vector a;
    Map o;
  } value;

  static JsonThing::Result parse(TokenStream& ts);

 private:
  static JsonThing::Result parse_array(TokenStream& ts);
  static JsonThing::Result parse_literal(TokenStream::Result tsr);
  static JsonThing::Result parse_number(TokenStream::Result tsr);
  static JsonThing::Result parse_object(TokenStream& ts);
  static JsonThing::Result parse_string(TokenStream& ts,
                                        TokenStream::Result tsr);
};

struct JsonThing::Result {
  std::string status_msg;
  JsonThing thing;

  Result(std::string s) : status_msg(s), thing(JsonThing()) {}

  Result(std::string s, JsonThing t) : status_msg(s), thing(t) {}
};

JsonThing::Result  //
JsonThing::parse(TokenStream& ts) {
  while (true) {
    TokenStream::Result tsr = ts.next();
    if (!tsr.status_msg.empty()) {
      return Result(std::move(tsr.status_msg));
    }

    uint64_t vbc = tsr.token.value_base_category();
    uint64_t vbd = tsr.token.value_base_detail();
    switch (vbc) {
      case WUFFS_BASE__TOKEN__VBC__FILLER:
        continue;
      case WUFFS_BASE__TOKEN__VBC__STRUCTURE:
        if (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__PUSH) {
          if (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__TO_LIST) {
            return parse_array(ts);
          } else if (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__TO_DICT) {
            return parse_object(ts);
          }
        }
        break;
      case WUFFS_BASE__TOKEN__VBC__STRING:
        return parse_string(ts, tsr);
      case WUFFS_BASE__TOKEN__VBC__LITERAL:
        return parse_literal(tsr);
      case WUFFS_BASE__TOKEN__VBC__NUMBER: {
        return parse_number(tsr);
      }
    }

    return Result("main: internal error: unexpected token");
  }
}

JsonThing::Result  //
JsonThing::parse_array(TokenStream& ts) {
  JsonThing jt;
  jt.kind = Kind::Array;
  while (true) {
    TokenStream::Result tsr = ts.peek();
    if (!tsr.status_msg.empty()) {
      return Result(std::move(tsr.status_msg));
    }
    uint64_t vbc = tsr.token.value_base_category();
    uint64_t vbd = tsr.token.value_base_detail();
    if (vbc == WUFFS_BASE__TOKEN__VBC__FILLER) {
      ts.next();
      continue;
    } else if ((vbc == WUFFS_BASE__TOKEN__VBC__STRUCTURE) &&
               (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__POP)) {
      ts.next();
      break;
    }

    JsonThing::Result jtr = JsonThing::parse(ts);
    if (!jtr.status_msg.empty()) {
      return Result(std::move(jtr.status_msg));
    }
    jt.value.a.push_back(std::move(jtr.thing));
  }
  return Result("", jt);
}

JsonThing::Result  //
JsonThing::parse_literal(TokenStream::Result tsr) {
  uint64_t vbd = tsr.token.value_base_detail();
  if (vbd & WUFFS_BASE__TOKEN__VBD__LITERAL__NULL) {
    JsonThing jt;
    jt.kind = Kind::Null;
    return Result("", jt);
  } else if (vbd & WUFFS_BASE__TOKEN__VBD__LITERAL__FALSE) {
    JsonThing jt;
    jt.kind = Kind::Bool;
    jt.value.b = false;
    return Result("", jt);
  } else if (vbd & WUFFS_BASE__TOKEN__VBD__LITERAL__TRUE) {
    JsonThing jt;
    jt.kind = Kind::Bool;
    jt.value.b = true;
    return Result("", jt);
  }
  return Result("main: internal error: unexpected token");
}

JsonThing::Result  //
JsonThing::parse_number(TokenStream::Result tsr) {
  // Parsing the number from its string representation (converting from "123"
  // to 123) isn't necessary for the jsonfindptrs program, but if you're
  // copy/pasting this JsonThing code, here's how to do it.
  uint64_t vbd = tsr.token.value_base_detail();
  if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__FORMAT_TEXT) {
    if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_INTEGER_SIGNED) {
      static constexpr int64_t m = 0x001FFFFFFFFFFFFF;  // ((1<<53) - 1).
      wuffs_base__result_i64 r = wuffs_base__parse_number_i64(tsr.src_data);
      if (!r.status.is_ok()) {
        return Result(r.status.message());
      } else if ((r.value < -m) || (+m < r.value)) {
        return Result(wuffs_base__error__out_of_bounds);
      }
      JsonThing jt;
      jt.kind = Kind::Int64;
      jt.value.i = r.value;
      return Result("", jt);
    } else if (vbd & WUFFS_BASE__TOKEN__VBD__NUMBER__CONTENT_FLOATING_POINT) {
      wuffs_base__result_f64 r = wuffs_base__parse_number_f64(tsr.src_data);
      if (!r.status.is_ok()) {
        return Result(r.status.message());
      }
      JsonThing jt;
      jt.kind = Kind::Float64;
      jt.value.f = r.value;
      return Result("", jt);
    }
  }
  return Result("main: internal error: unexpected number");
}
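
/*
The integer bounds check in parse_number can be read on its own. The sketch
below restates it as a stand-alone predicate. The in_safe_integer_range name is
hypothetical (it is not a Wuffs function). The bound is the same ((1<<53) - 1)
that the usage text calls JavaScript's Number.MAX_SAFE_INTEGER. Every integer
in this range is exactly representable as a 64-bit IEEE 754 double.

```cpp
#include <cassert>
#include <cstdint>

// in_safe_integer_range is a hypothetical helper. It reports whether x lies
// in the closed interval [-m, +m], where m is ((1<<53) - 1).
bool in_safe_integer_range(int64_t x) {
  constexpr int64_t m = (INT64_C(1) << 53) - 1;  // 9007199254740991.
  return (-m <= x) && (x <= m);
}
```

The check is symmetric, so (-m - 1) is rejected even though it fits easily in
an int64_t.
*/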

JsonThing::Result  //
JsonThing::parse_object(TokenStream& ts) {
  JsonThing jt;
  jt.kind = Kind::Object;

  std::string key;
  bool have_key = false;

  while (true) {
    TokenStream::Result tsr = ts.peek();
    if (!tsr.status_msg.empty()) {
      return Result(std::move(tsr.status_msg));
    }
    uint64_t vbc = tsr.token.value_base_category();
    uint64_t vbd = tsr.token.value_base_detail();
    if (vbc == WUFFS_BASE__TOKEN__VBC__FILLER) {
      ts.next();
      continue;
    } else if ((vbc == WUFFS_BASE__TOKEN__VBC__STRUCTURE) &&
               (vbd & WUFFS_BASE__TOKEN__VBD__STRUCTURE__POP)) {
      ts.next();
      break;
    }

    JsonThing::Result jtr = JsonThing::parse(ts);
    if (!jtr.status_msg.empty()) {
      return Result(std::move(jtr.status_msg));
    }

    if (have_key) {
      have_key = false;
      auto iter = jt.value.o.find(key);
      if (iter == jt.value.o.end()) {
        jt.value.o.insert(
            iter, Map::value_type(std::move(key), std::move(jtr.thing)));
      } else {
        return Result("main: duplicate key: " + key);
      }
    } else if (jtr.thing.kind == Kind::String) {
      have_key = true;
      key = std::move(jtr.thing.value.s);
    } else {
      return Result("main: internal error: unexpected non-string key");
    }
  }
  return Result("", jt);
}

JsonThing::Result  //
JsonThing::parse_string(TokenStream& ts, TokenStream::Result tsr) {
  JsonThing jt;
  jt.kind = Kind::String;
  while (true) {
    uint64_t vbc = tsr.token.value_base_category();
    uint64_t vbd = tsr.token.value_base_detail();

    switch (vbc) {
      case WUFFS_BASE__TOKEN__VBC__STRING: {
        if (vbd & WUFFS_BASE__TOKEN__VBD__STRING__CONVERT_0_DST_1_SRC_DROP) {
          // No-op.

        } else if (vbd &
                   WUFFS_BASE__TOKEN__VBD__STRING__CONVERT_1_DST_1_SRC_COPY) {
          const char* ptr =  // Convert from (uint8_t*).
              static_cast<const char*>(static_cast<void*>(tsr.src_data.ptr));
          jt.value.s.append(ptr, tsr.src_data.len);

        } else if (
            vbd &
            WUFFS_BASE__TOKEN__VBD__STRING__CONVERT_1_DST_4_SRC_BACKSLASH_X) {
          // We shouldn't get here unless we enable the
          // WUFFS_JSON__QUIRK_ALLOW_BACKSLASH_X option. The jsonfindptrs
          // program doesn't enable that by default, but if you're copy/pasting
          // this JsonThing code and your program does enable that option,
          // here's how to handle it.
          wuffs_base__slice_u8 encoded = tsr.src_data;
          if (encoded.len & 3) {
            return Result(
                "main: internal error: \\x token length not a multiple of 4",
                JsonThing());
          }
          while (encoded.len) {
            uint8_t decoded[64];
            size_t len = wuffs_base__hexadecimal__decode4(
                wuffs_base__make_slice_u8(&decoded[0], 64), encoded);
            if ((len > 64) || ((len * 4) > encoded.len)) {
              return Result(
                  "main: internal error: inconsistent hexadecimal decoding",
                  JsonThing());
            }
            const char* ptr =  // Convert from (uint8_t*).
                static_cast<const char*>(static_cast<void*>(&decoded[0]));
            jt.value.s.append(ptr, len);
            encoded.ptr += len * 4;
            encoded.len -= len * 4;
          }

        } else {
          return Result(
              "main: internal error: unexpected string-token conversion",
              JsonThing());
        }
        break;
      }

      case WUFFS_BASE__TOKEN__VBC__UNICODE_CODE_POINT: {
        uint8_t u[WUFFS_BASE__UTF_8__BYTE_LENGTH__MAX_INCL];
        size_t n = wuffs_base__utf_8__encode(
            wuffs_base__make_slice_u8(&u[0],
                                      WUFFS_BASE__UTF_8__BYTE_LENGTH__MAX_INCL),
            vbd);
        const char* ptr =  // Convert from (uint8_t*).
            static_cast<const char*>(static_cast<void*>(&u[0]));
        jt.value.s.append(ptr, n);
        break;
      }

      default:
        return Result("main: internal error: unexpected token");
    }

    if (!tsr.token.link_next()) {
      break;
    }
    tsr = ts.next();
    if (!tsr.status_msg.empty()) {
      return Result(std::move(tsr.status_msg));
    }
  }
  return Result("", jt);
}

// ----

std::string  //
escape(std::string s) {
  for (char& c : s) {
    if ((c == '~') || (c == '/') || (c == '\n') || (c == '\r')) {
      goto escape_needed;
    }
  }
  return s;

escape_needed:
  std::string e;
  e.reserve(8 + s.length());
  for (char& c : s) {
    switch (c) {
      case '~':
        e += "~0";
        break;
      case '/':
        e += "~1";
        break;
      case '\n':
        if (g_flags.strict_json_pointer_syntax) {
          return "";
        }
        e += "~n";
        break;
      case '\r':
        if (g_flags.strict_json_pointer_syntax) {
          return "";
        }
        e += "~r";
        break;
      default:
        e += c;
        break;
    }
  }
  return e;
}
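
/*
For readers who only need the two RFC 6901 escape sequences, a smaller variant
of escape is sketched below. It is an illustration, not the function this
program calls: escape_reference_token is a hypothetical name, and it
deliberately omits the "~n" and "~r" extensions and the g_flags dependency.

```cpp
#include <cassert>
#include <string>

// escape_reference_token is a hypothetical, strict-only variant of escape. It
// replaces "~" with "~0" and "/" with "~1", and copies every other byte as is.
std::string escape_reference_token(const std::string& s) {
  std::string e;
  e.reserve(s.length());
  for (char c : s) {
    if (c == '~') {
      e += "~0";
    } else if (c == '/') {
      e += "~1";
    } else {
      e += c;
    }
  }
  return e;
}
```

Applied to the RFC 6901 sample keys "a/b" and "m~n", it yields the "a~1b" and
"m~0n" tokens that appear in the usage text's example output.
*/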

std::string  //
print_json_pointers(JsonThing& jt, std::string s, uint32_t depth) {
  std::cout << s << std::endl;
  if (depth++ >= g_flags.max_output_depth) {
    return "";
  }

  switch (jt.kind) {
    case JsonThing::Kind::Array:
      s += "/";
      for (size_t i = 0; i < jt.value.a.size(); i++) {
        TRY(print_json_pointers(jt.value.a[i], s + std::to_string(i), depth));
      }
      break;
    case JsonThing::Kind::Object:
      s += "/";
      for (auto& kv : jt.value.o) {
        std::string e = escape(kv.first);
        if (e.empty() && !kv.first.empty()) {
          return "main: unsupported \"\\u000A\" or \"\\u000D\" in object key";
        }
        TRY(print_json_pointers(kv.second, s + e, depth));
      }
      break;
    default:
      break;
  }
  return "";
}

std::string  //
main1(int argc, char** argv) {
  TRY(parse_flags(argc, argv));

  int input_file_descriptor = 0;  // A 0 default means stdin.
  if (g_flags.remaining_argc > 1) {
    return g_usage;
  } else if (g_flags.remaining_argc == 1) {
    const char* arg = g_flags.remaining_argv[0];
    input_file_descriptor = open(arg, O_RDONLY);
    if (input_file_descriptor < 0) {
      return std::string("main: cannot read ") + arg + ": " + strerror(errno);
    }
  }

  TokenStream ts(input_file_descriptor);
  JsonThing::Result jtr = JsonThing::parse(ts);
  if (!jtr.status_msg.empty()) {
    return jtr.status_msg;
  }
  return print_json_pointers(jtr.thing, "", 0);
}

// ----

int  //
compute_exit_code(std::string status_msg) {
  if (status_msg.empty()) {
    return 0;
  }
  std::cerr << status_msg << std::endl;
  // Return an exit code of 1 for regular (foreseen) errors, e.g. badly
  // formatted or unsupported input.
  //
  // Return an exit code of 2 for internal (exceptional) errors, e.g. defensive
  // run-time checks found that an internal invariant did not hold.
  //
  // Automated testing, including badly formatted inputs, can therefore
  // discriminate between expected failure (exit code 1) and unexpected failure
  // (other non-zero exit codes). Specifically, exit code 2 for internal
  // invariant violation, exit code 139 (which is 128 + SIGSEGV on x86_64
  // linux) for a segmentation fault (e.g. null pointer dereference).
  return (status_msg.find("internal error:") != std::string::npos) ? 2 : 1;
}

int  //
main(int argc, char** argv) {
  std::string z = main1(argc, argv);
  int exit_code = compute_exit_code(z);
  return exit_code;
}
800}