Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: encoding/json/v2: new API for encoding/json #71497

Open
dsnet opened this issue Jan 31, 2025 · 89 comments
Open

proposal: encoding/json/v2: new API for encoding/json #71497

dsnet opened this issue Jan 31, 2025 · 89 comments
Labels
LibraryProposal Issues describing a requested change to the Go standard library or x/ libraries, but not to a tool Proposal v2 An incompatible library change
Milestone

Comments

@dsnet
Copy link
Member

dsnet commented Jan 31, 2025

Proposal Details

This is a formal proposal for the addition of "encoding/json/v2" and "encoding/json/jsontext" packages that has previously been discussed in #63397.

This focuses on just the newly added API. An argument justifying the need for a v2 package can be found in prior discussion. Alternatively, you can watch the GopherCon talk entitled "The Future of JSON in Go".

Most of the API proposal below is copied from the discussion. If you've already read the discussion and only want to know what changed relative to the discussion, skip over to the "Changes from discussion" section.

This is the largest major revision of a standard Go package to date, so there will be many reasonable threads of further discussion. Before commenting, please check the list of sub-issues to see if your comment is better suited in a particular sub-issue. We'll be using sub-issues to isolate and focus discussion on particular topics.

Thank you to everyone who has been involved with the discussion, design review, code review, etc. This proposal is better off because of all your feedback.

This proposal was written with feedback from @mvdan, @johanbrandhorst, @rogpeppe, @ChrisHines, @neild, and @rsc.

Overview

In general, we propose the addition of the following:

  • Package "encoding/json/jsontext", which handles processing of JSON purely at a syntactic layer (with no dependencies on Go reflection). This is a lower-level package that most users will not use, but still sufficiently useful to expose as a standalone package.
  • Package "encoding/json/v2", which will serve as the second major version of the v1 "encoding/json" package. It is implemented in terms of "jsontext".
  • Options in v1 "encoding/json" to provide inter-operability with v2. The v1 package will be implemented entirely in terms of "json/v2".

JSON serialization can be broken down into two primary components:

  • syntactic functionality that is concerned with processing JSON based on its grammar, and
  • semantic functionality that determines the meaning of JSON values as Go values and vice-versa.

We use the terms "encode" and "decode" to describe syntactic functionality and the terms "marshal" and "unmarshal" to describe semantic functionality.

We aim to provide a clear distinction between functionality that is purely concerned with encoding versus that of marshaling. For example, it should be possible to encode a stream of JSON tokens without needing to marshal a concrete Go value representing them. Similarly, it should be possible to decode a stream of JSON tokens without needing to unmarshal them into a concrete Go value.

block-diagram

This diagram provides a high-level overview of the v2 API. Purple blocks represent types, while blue blocks represent functions or methods. The direction of the arrows represent the approximate flow of data. The bottom half (as implemented by the "jsontext" package) of the diagram contains functionality that is only concerned with syntax, while the upper half (as implemented by the "json" package) contains functionality that assigns semantic meaning to syntactic data handled by the bottom half.

Package "encoding/json/jsontext"

The jsontext package provides functionality to process JSON purely according to the grammar.

Overview

The basic API consists of the following:

package jsontext // "encoding/json/jsontext"

type Encoder struct { /* no exported fields */ }
func NewEncoder(io.Writer, ...Options) *Encoder
func (*Encoder) WriteToken(Token) error
func (*Encoder) WriteValue(Value) error

type Decoder struct { /* no exported fields */ }
func NewDecoder(io.Reader, ...Options) *Decoder
func (*Decoder) PeekKind() Kind
func (*Decoder) ReadToken() (Token, error)
func (*Decoder) ReadValue() (Value, error)
func (*Decoder) SkipValue() error

type Kind byte
type Token struct { /* no exported fields */ }
func (Token) Kind() Kind
type Value []byte
func (Value) Kind() Kind

Tokens and Values

The primary data types for interacting with JSON are Kind, Token, and Value.

The Kind is an enumeration that describes the kind of a token or value.

// Kind represents each possible JSON token kind with a single byte,
// which is the first byte of that kind's grammar:
//   - 'n': null
//   - 'f': false
//   - 't': true
//   - '"': string
//   - '0': number
//   - '{': object start
//   - '}': object end
//   - '[': array start
//   - ']': array end
type Kind byte
func (k Kind) String() string

At present, there are no constants declared for individual kinds since each value is humanly readable. Declaring constants will lead to inconsistent usage where some users use the 'n' byte literal, while other users reference the jsontext.KindNull constant. This is a similar problem to the introduction of the http.MethodGet constant, which has led to inconsistency in codebases where the "GET" literal is more frequently used (~75% of the time).

A Token represents a lexical JSON token, which cannot represent entire array or object values. It is analogous to the v1 Token type, but is designed to be allocation-free by being an opaque struct type.

type Token struct { /* no exported fields */ }

var (
	Null  Token = rawToken("null")
	False Token = rawToken("false")
	True  Token = rawToken("true")

	ObjectStart Token = rawToken("{")
	ObjectEnd   Token = rawToken("}")
	ArrayStart  Token = rawToken("[")
	ArrayEnd    Token = rawToken("]")
)

func Bool(b bool) Token
func Int(n int64) Token
func Uint(n uint64) Token
func Float(n float64) Token
func String(s string) Token
func (t Token) Clone() Token
func (t Token) Bool() bool
func (t Token) Int() int64
func (t Token) Uint() uint64
func (t Token) Float() float64
func (t Token) String() string
func (t Token) Kind() Kind

A Value is the raw representation of a single JSON value so, unlike Token, can also represent entire array or object values. It is analogous to the v1 RawMessage type.

type Value []byte

func (v Value) Clone() Value
func (v Value) String() string
func (v Value) IsValid(opts ...Options) bool
func (v *Value) Format(opts ...Options) error
func (v *Value) Compact(opts ...Options) error
func (v *Value) Indent(opts ...Options) error
func (v *Value) Canonicalize(opts ...Options) error
func (v Value) MarshalJSON() ([]byte, error)
func (v *Value) UnmarshalJSON(b []byte) error
func (v Value) Kind() Kind // never ']' or '}' if valid

By default, IsValid validates according to RFC 7493, but accepts options to validate according to looser guarantees (such as allowing duplicate names or invalid UTF-8).

The Format method formats the value according to the specified encoder options.
The Compact and Indent methods operate similar to the v1 Compact and Indent functions.
The Canonicalize method canonicalizes the JSON value according to the JSON Canonicalization Scheme as defined in RFC 8785.
The Compact, Indent, and Canonicalize each call Format with a default list of options. The caller may provide additional options to override the defaults.

Formatting

Some top-level functions are provided for formatting JSON values and strings.

// AppendFormat formats the JSON value in src and appends it to dst
// according to the specified options.
// See [Value.Format] for more details about the formatting behavior.
func AppendFormat(dst, src []byte, opts ...Options) ([]byte, error)

// AppendQuote appends a double-quoted JSON string literal representing src
// to dst and returns the extended buffer.
func AppendQuote[Bytes ~[]byte | ~string](dst []byte, src Bytes) ([]byte, error)

// AppendUnquote appends the decoded interpretation of src as a
// double-quoted JSON string literal to dst and returns the extended buffer.
// The input src must be a JSON string without any surrounding whitespace.
func AppendUnquote[Bytes ~[]byte | ~string](dst []byte, src Bytes) ([]byte, error)

Encoder and Decoder

The Encoder and Decoder types provide the functionality for encoding to or decoding from an io.Writer or an io.Reader. An Encoder or Decoder can be constructed with NewEncoder or NewDecoder using default options.

The Encoder is a streaming encoder from raw JSON tokens and values. It is used to write a stream of top-level JSON values, each terminated with a newline character.

type Encoder struct { /* no exported fields */ }

func NewEncoder(w io.Writer, opts ...Options) *Encoder
func (e *Encoder) Reset(w io.Writer, opts ...Options)

// WriteToken writes the next token and advances the internal write offset.
// The provided token must be consistent with the JSON grammar.
func (e *Encoder) WriteToken(t Token) error

// WriteValue writes the next raw value and advances the internal write offset.
// The provided value must be consistent with the JSON grammar.
func (e *Encoder) WriteValue(v Value) error

// UnusedBuffer returns a zero-length buffer with a possible non-zero capacity.
// This buffer is intended to be used to populate a Value
// being passed to an immediately succeeding WriteValue call.
//
// Example usage:
//
//	b := d.UnusedBuffer()
//	b = append(b, '"')
//	b = appendString(b, v) // append the string formatting of v
//	b = append(b, '"')
//	... := d.WriteValue(b)
func (e *Encoder) UnusedBuffer() []byte

// OutputOffset returns the current output byte offset, which is the location
// of the next byte immediately after the most recently written token or value.
func (e *Encoder) OutputOffset() int64

The Decoder is a streaming decoder for raw JSON tokens and values. It is used to read a stream of top-level JSON values, each separated by optional whitespace characters.

type Decoder struct { /* no exported fields */ }

func NewDecoder(r io.Reader, opts ...Options) *Decoder
func (d *Decoder) Reset(r io.Reader, opts ...Options)

// PeekKind returns the kind of the token that would be returned by ReadToken.
// It does not advance the read offset.
func (d *Decoder) PeekKind() Kind

// ReadToken reads the next Token, advancing the read offset.
// The returned token is only valid until the next Peek, Read, or Skip call.
// It returns io.EOF if there are no more tokens.
func (d *Decoder) ReadToken() (Token, error)

// ReadValue returns the next raw JSON value, advancing the read offset.
// The returned value is only valid until the next Peek, Read, or Skip call
// and may not be mutated while the Decoder remains in use.
// It returns io.EOF if there are no more values.
func (d *Decoder) ReadValue() (Value, error)

// SkipValue is equivalent to calling ReadValue and discarding the result except
// that memory is not wasted trying to hold the entire value.
func (d *Decoder) SkipValue() error

// UnreadBuffer returns the data remaining in the unread buffer.
// The returned buffer must not be mutated while Decoder continues to be used.
// The buffer contents are valid until the next Peek, Read, or Skip call.
func (d *Decoder) UnreadBuffer() []byte

// InputOffset returns the current input byte offset, which is the location
// of the next byte immediately after the most recently returned token or value.
func (d *Decoder) InputOffset() int64

Some methods common to both Encoder and Decoder report information about the current automaton state.

// StackDepth returns the depth of the state machine.
// Each level on the stack represents a nested JSON object or array.
// It is incremented whenever an ObjectStart or ArrayStart token is encountered
// and decremented whenever an ObjectEnd or ArrayEnd token is encountered.
// The depth is zero-indexed, where zero represents the top-level JSON value.
func (e *Encoder) StackDepth() int
func (d *Decoder) StackDepth() int

// StackIndex returns information about the specified stack level.
// It must be a number between 0 and StackDepth, inclusive.
// For each level, it reports the kind:
//
//   - 0 for a level of zero,
//   - '{' for a level representing a JSON object, and
//   - '[' for a level representing a JSON array.
//
// It also reports the length so far of that JSON object or array.
// Each name and value in a JSON object is counted separately,
// so the effective number of members would be half the length.
// A complete JSON object must have an even length.
func (e *Encoder) StackIndex(i int) (Kind, int64)
func (d *Decoder) StackIndex(i int) (Kind, int64)

// StackPointer returns a JSON Pointer (RFC 6901) to the most recently handled value.
func (e *Encoder) StackPointer() Pointer
func (d *Decoder) StackPointer() Pointer

Options

The behavior of Encoder and Decoder may be altered by passing options to NewEncoder and NewDecoder, which take in a variadic list of options.

type Options = jsonopts.Options

// AllowDuplicateNames specifies that JSON objects may contain
// duplicate member names.
func AllowDuplicateNames(v bool) Options // affects encode and decode

// AllowInvalidUTF8 specifies that JSON strings may contain invalid UTF-8,
// which will be mangled as the Unicode replacement character, U+FFFD.
func AllowInvalidUTF8(v bool) Options // affects encode and decode

// CanonicalizeRawFloats specifies that when encoding a raw JSON floating-point number
// (i.e., a number with a fraction or exponent) in a [Token] or [Value],
// the number is canonicalized according to RFC 8785, section 3.2.2.3.
func CanonicalizeRawFloats(v bool) Options // affects encode only

// CanonicalizeRawInts specifies that when encoding a raw JSON integer number
// (i.e., a number without a fraction and exponent) in a [Token] or [Value],
// the number is canonicalized according to RFC 8785, section 3.2.2.3.
func CanonicalizeRawInts(v bool) Options // affects encode only

// PreserveRawStrings specifies that when encoding a raw JSON string
// in a [Token] or [Value], pre-escaped sequences in a JSON string
// are preserved to the output.
func PreserveRawStrings(v bool) Options // affects encode only

// ReorderRawObjects specifies that when encoding a raw JSON object in a [Value],
// the object members are reordered according to RFC 8785, section 3.2.3.
func ReorderRawObjects(v bool) Options // affects encode only

// EscapeForHTML specifies that '<', '>', and '&' characters within JSON strings
// should be escaped as a hexadecimal Unicode codepoint (e.g., \u003c)
// so that the output is safe to embed within HTML.
func EscapeForHTML(v bool) Options // affects encode only

// EscapeForJS specifies that U+2028 and U+2029 characters within JSON strings
// should be escaped as a hexadecimal Unicode codepoint (e.g., \u2028)
// so that the output is valid to embed within JavaScript.
// See RFC 8259, section 12.
func EscapeForJS(v bool) Options // affects encode only

// Multiline specifies that the JSON output should be expanded, where
// every JSON object member or JSON array element appears on a new, indented line
// according to the nesting depth.
// If an indent is not already specified, then it defaults to using "\t".
func Multiline(v bool) Options // affects encode only

// WithIndent specifies that the encoder should emit multiline output
// where each element in a JSON object or array begins on a new, indented line
// beginning with the indent prefix (see WithIndentPrefix) followed by
// one or more copies of indent according to the nesting depth.
func WithIndent(indent string) Options // affects encode only

// WithIndentPrefix specifies that the encoder should emit multiline output
// where each element in a JSON object or array begins on a new, indented line
// beginning with the indent prefix followed by
// one or more copies of indent (see WithIndent) according to the nesting depth.
func WithIndentPrefix(prefix string) Options // affects encode only

// SpaceAfterColon specifies that the JSON output should emit a space character
// after each colon separator following a JSON object name.
func SpaceAfterColon(v bool) Options // affects encode only

// SpaceAfterComma specifies that the JSON output should emit a space character
// after each comma separator following a JSON object value or array element.
func SpaceAfterComma(v bool) Options // affects encode only

The Options type is a type alias to an internal type that is an interface type with no exported methods. It is used simply as a marker type for options declared in the "json" and "jsontext" packages.

Latter options specified in the variadic list passed to NewEncoder and NewDecoder take precedence over prior option values. For example, NewEncoder(AllowInvalidUTF8(false), AllowInvalidUTF8(true)) results in AllowInvalidUTF8(true) taking precedence.

Options that do not affect the operation in question are ignored. For example, passing Multiline to NewDecoder does nothing.

The WithIndent and WithIndentPrefix flags configure the appearance of whitespace in the output. Their semantics are identical to the v1 Encoder.SetIndent method.

Errors

Errors due to non-compliance with the JSON grammar are reported as a SyntacticError.

type SyntacticError struct {
	// ByteOffset indicates that an error occurred after this byte offset.
	ByteOffset int64
	// JSONPointer indicates that an error occurred within this JSON value
	// as indicated using the JSON Pointer notation (see RFC 6901).
	JSONPointer Pointer
	// Err is the underlying error.
	Err error // always non-nil
}
func (e *SyntacticError) Error() string
func (e *SyntacticError) Unwrap() error

Errors due to I/O are returned as an opaque error that unwrap to the original error returned by the failing io.Reader.Read or io.Writer.Write call.

// ErrDuplicateName indicates that a JSON token could not be
// encoded or decoded because it results in a duplicate JSON object name.
var ErrDuplicateName = errors.New("duplicate object member name")

// ErrNonStringName indicates that a JSON token could not be
// encoded or decoded because it is not a string,
// as required for JSON object names according to RFC 8259, section 4.
var ErrNonStringName = errors.New("object member name must be a string")

ErrDuplicateName and ErrNonStringName are sentinel errors that are
returned while being wrapped within a SyntacticError.

// Pointer is a JSON Pointer (RFC 6901) that references a particular JSON value
// relative to the root of the top-level JSON value.
//
// A Pointer is a slash-separated list of tokens, where each token is
// either a JSON object name or an index to a JSON array element
// encoded as a base-10 integer value.
type Pointer string

// IsValid reports whether p is a valid JSON Pointer according to RFC 6901.
func (p Pointer) IsValid() bool

// AppendToken appends a token to the end of p and returns the full pointer.
func (p Pointer) AppendToken(tok string) Pointer

// Parent strips off the last token and returns the remaining pointer.
func (p Pointer) Parent() Pointer

// Contains reports whether the JSON value that p points to
// is equal to or contains the JSON value that pc points to.
func (p Pointer) Contains(pc Pointer) bool

// LastToken returns the last token in the pointer.
func (p Pointer) LastToken() string

// Tokens returns an iterator over the reference tokens in the JSON pointer.
func (p Pointer) Tokens() iter.Seq[string]

Pointer is a named type representing a JSON Pointer (RFC 6901) and references a particular JSON value relative to a top-level JSON value. It is primarily used for error reporting, but its utility could be expanded in the future (e.g. extracting or modifying a portion of a Value by Pointer reference alone).

Package "encoding/json/v2"

The v2 "json" package provides functionality to marshal or unmarshal JSON data from or into Go value types. This package depends on "jsontext" to process JSON text and the "reflect" package to dynamically introspect Go values at runtime.

Most users will interact directly with the "json" package without ever needing to interact with the lower-level "jsontext package.

Overview

The basic API consists of the following:

package json // "encoding/json/v2"

func Marshal(in any, opts ...Options) (out []byte, err error)
func MarshalWrite(out io.Writer, in any, opts ...Options) error
func MarshalEncode(out *jsontext.Encoder, in any, opts ...Options) error

func Unmarshal(in []byte, out any, opts ...Options) error
func UnmarshalRead(in io.Reader, out any, opts ...Options) error
func UnmarshalDecode(in *jsontext.Decoder, out any, opts ...Options) error

The Marshal and Unmarshal functions mostly match the signature of the same functions in v1, however their behavior differs.

The MarshalWrite and UnmarshalRead functions are equivalent functionality that operate on an io.Writer and io.Reader instead of []byte. The UnmarshalRead function consumes the entire input until io.EOF and reports an error if any invalid tokens appear after the end of the JSON value (#36225).

The MarshalEncode and UnmarshalDecode functions are equivalent functionality that operate on an *jsontext.Encoder and *jsontext.Decoder instead of []byte. Unlike UnmarshalRead, UnmarshalDecode does not read until io.EOF, allowing successive calls to process each JSON value as a stream.

All marshal and unmarshal functions accept a variadic list of options
that configure the behavior of serialization.

Default behavior

The marshal and unmarshal logic in v2 is mostly identical to v1 with following changes:

  • In v1, JSON object members are unmarshaled into a Go struct using a case-insensitive name match with the JSON name of the fields. In contrast, v2 matches fields using an exact, case-sensitive match. The MatchCaseInsensitiveNames and jsonv1.MatchCaseSensitiveDelimiter options control this behavior difference. To explicitly specify a Go struct field to use a particular name matching scheme, either the nocase or the strictcase field option can be specified. Field-specified options take precedence over caller-specified options.

  • In v1, when marshaling a Go struct, a field marked as omitempty is omitted if the field value is an "empty" Go value, which is defined as false, 0, a nil pointer, a nil interface value, and any empty array, slice, map, or string. In contrast, v2 redefines omitempty to omit a field if it encodes as an "empty" JSON value, which is defined as a JSON null, or an empty JSON string, object, or array. The jsonv1.OmitEmptyWithLegacyDefinition option controls this behavior difference. Note that omitempty behaves identically in both v1 and v2 for a Go array, slice, map, or string (assuming no user-defined MarshalJSON method overrides the default representation). Existing usages of omitempty on a Go bool, number, pointer, or interface value should migrate to specifying omitzero instead (which is identically supported in both v1 and v2). See prior discussion for more information.

  • In v1, a Go struct field marked as string can be used to quote a Go string, bool, or number as a JSON string. It does not recursively take effect on composite Go types. In contrast, v2 restricts the string option to only quote a Go number as a JSON string. It does recursively take effect on Go numbers within a composite Go type. The jsonv1.StringifyWithLegacySemantics option controls this behavior difference.

  • In v1, a nil Go slice or Go map is marshaled as a JSON null. In contrast, v2 marshals a nil Go slice or Go map as an empty JSON array or JSON object, respectively. The FormatNilSliceAsNull and FormatNilMapAsNull options control this behavior difference. To explicitly specify a Go struct field to use a particular representation for nil, either the format:emitempty or format:emitnull field option can be specified. Field-specified options take precedence over caller-specified options. See prior discussion for more information.

  • In v1, a Go array may be unmarshaled from a JSON array of any length. In contrast, in v2 a Go array must be unmarshaled from a JSON array of the same length, otherwise it results in an error. The jsonv1.UnmarshalArrayFromAnyLength option controls this behavior difference.

  • In v1, a Go byte array (i.e., ~[N]byte) is represented as a JSON array of JSON numbers. In contrast, in v2 a Go byte array is represented as a Base64-encoded JSON string. The jsonv1.FormatBytesWithLegacySemantics option controls this behavior difference. To explicitly specify a Go struct field to use a particular representation, either the format:array or format:base64 field option can be specified. Field-specified options take precedence over caller-specified options.

  • In v1, MarshalJSON methods declared on a pointer receiver are only called if the Go value is addressable. In contrast, in v2 a MarshalJSON method is always callable regardless of addressability. The jsonv1.CallMethodsWithLegacySemantics option controls this behavior difference.

  • In v1, MarshalJSON and UnmarshalJSON methods are never called for Go map keys. In contrast, in v2 a MarshalJSON or UnmarshalJSON method is eligible for being called for Go map keys. The jsonv1.CallMethodsWithLegacySemantics option controls this behavior difference.

  • In v1, a Go map is marshaled in a deterministic order. In contrast, in v2 a Go map is marshaled in a non-deterministic order. The Deterministic option controls this behavior difference. See prior discussion for more information.

  • In v1, JSON strings are encoded with HTML-specific or JavaScript-specific characters being escaped. In contrast, in v2 JSON strings use the minimal encoding and only escape if required by the JSON grammar. The jsontext.EscapeForHTML and jsontext.EscapeForJS options control this behavior difference.

  • In v1, bytes of invalid UTF-8 within a string are silently replaced with the Unicode replacement character. In contrast, in v2 the presence of invalid UTF-8 results in an error. The jsontext.AllowInvalidUTF8 option controls this behavior difference.

  • In v1, a JSON object with duplicate names is permitted. In contrast, in v2 a JSON object with duplicate names results in an error. The jsontext.AllowDuplicateNames option controls this behavior difference.

  • In v1, when unmarshaling a JSON null into a non-empty Go value it will inconsistently either zero out the value or do nothing. In contrast, in v2 unmarshaling a JSON null will consistently and always zero out the underlying Go value. The jsonv1.MergeWithLegacySemantics option controls this behavior difference.

  • In v1, when unmarshaling a JSON value into a non-zero Go value, it merges into the original Go value for array elements, slice elements, struct fields (but not map values), pointer values, and interface values (only if a non-nil pointer). In contrast, in v2 unmarshal merges into the Go value for struct fields, map values, pointer values, and interface values. In general, the v2 semantic merges when unmarshaling a JSON object, otherwise it replaces the value. The jsonv1.MergeWithLegacySemantics option controls this behavior difference.

  • In v1, a time.Duration is represented as a JSON number containing the decimal number of nanoseconds. In contrast, in v2 a time.Duration is represented as a JSON string containing the formatted duration (e.g., "1h2m3.456s") according to time.Duration.String. The jsonv1.FormatTimeWithLegacySemantics option controls this behavior difference. To explicitly specify a Go struct field to use a particular representation, either the format:nano or format:units field option can be specified. Field-specified options take precedence over caller-specified options.

  • In v1, errors are never reported at runtime for Go struct types that have some form of structural error (e.g., a malformed tag option). In contrast, v2 reports a runtime error for Go types that are invalid as they relate to JSON serialization. For example, a Go struct with only unexported fields cannot be serialized. The jsonv1.ReportErrorsWithLegacySemantics option controls this behavior difference.

While the behavior of Marshal and Unmarshal in "json/v2" is changing relative to v1 "json", note that the behavior of v1 "json" remains as is.

Struct tag options

Similar to v1, v2 also supports customized representation of Go struct fields through the use of struct tags. As before, the json tag will be used. The following tag options are supported:

  • omitzero: When marshaling, the "omitzero" option specifies that the struct field should be omitted if the field value is zero, as determined by the "IsZero() bool" method, if present, otherwise based on whether the field is the zero Go value (per reflect.Value.IsZero). This option has no effect when unmarshaling. (example)

  • omitempty: When marshaling, the "omitempty" option specifies that the struct field should be omitted if the field value would have been encoded as a JSON null, empty string, empty object, or empty array. This option has no effect when unmarshaling. (example)

    • Changed in v2. In v1, the "omitempty" option was narrowly defined as only omitting a field if it is a Go false, 0, a nil pointer, a nil interface value, and any empty array, slice, map, or string. In v2, it has been redefined in terms of the JSON type system, rather than the Go type system. They are practically equivalent except for Go bools, numbers, pointers, and interfaces for which the "omitzero" option can be used instead.
  • string: The "string" option specifies that StringifyNumbers be set when marshaling or unmarshaling a struct field value. This causes numeric types to be encoded as a JSON number within a JSON string, and to be decoded from a JSON string containing a JSON number. This extra level of encoding is often necessary since many JSON parsers cannot precisely represent 64-bit integers.

    • Changed in v2. In v1, the "string" option applied to certain types where use of a JSON string did not make sense (e.g., a bool) and could not be applied recursively (e.g., a slice of integers). In v2, this feature only applies to numeric types and applies recursively.
  • nocase: When unmarshaling, the "nocase" option specifies that if the JSON object name does not exactly match the JSON name for any of the struct fields, then it attempts to match the struct field using a case-insensitive match that also ignores dashes and underscores. (example)

    • New in v2. Since v2 no longer performs a case-insensitive match of JSON object names, this option provides a means to opt-into the v1-like behavior. However, the case-insensitive match is altered relative to v1 in that it also ignores dashes and underscores. This makes the feature more broadly useful for JSON objects with different naming conventions to be unmarshaled. For example, "fooBar", "FOO_BAR", or "foo-bar" will all match with a field named "FooBar".
  • strictcase: When unmarshaling, the "strictcase" option specifies that the JSON object name must exactly match the JSON name for the struct field. This takes precedence even if MatchCaseInsensitiveNames is set to true. This cannot be specified together with the "nocase" option.

    • New in v2 to provide an explicit author-specified way to prevent MatchCaseInsensitiveNames from taking effect on a particular field. This option provides a means to opt-into the v2-like behavior.
  • inline: The "inline" option specifies that the JSON object representation of this field is to be promoted as if it were specified in the parent struct. It is the JSON equivalent of Go struct embedding. A Go embedded field is implicitly inlined unless an explicit JSON name is specified. The inlined field must be a Go struct that does not implement Marshaler or Unmarshaler. Inlined fields of type jsontext.Value and map[~string]T are called “inlined fallbacks”, as they can represent all possible JSON object members not directly handled by the parent struct. Only one inlined fallback field may be specified in a struct, while many non-fallback fields may be specified. This option must not be specified with any other tag option. (example)

    • New in v2. Inlining is an explicit way to embed a JSON object within another JSON object without relying on Go struct embedding. The feature is capable of inlining Go maps and jsontext.Value (#6213).
  • unknown: The "unknown" option is a specialized variant of the inlined fallback to indicate that this Go struct field contains any number of “unknown” JSON object members. The field type must be a jsontext.Value or a map[~string]T. If DiscardUnknownMembers is specified when marshaling, the contents of this field are ignored. If RejectUnknownMembers is specified when unmarshaling, any unknown object members are rejected even if a field exists with the "unknown" option. This option must not be specified with any other tag option. (example)

    • New in v2. The "inline" feature technically provides a way to preserve unknown member (#22533). However, the "inline" feature alone does not semantically tell us whether this field is meant to store unknown members. The "unknown" option gives us this extra bit of information so that we can cooperate with options that affect unknown membership.
  • format: The "format" option specifies a format flag used to specialize the formatting of the field value. The option is a key-value pair specified as "format:value" where the value must be either a literal consisting of letters and numbers (e.g., "format:RFC3339") or a single-quoted string literal (e.g., "format:'2006-01-02'"). The interpretation of the format flag is determined by the struct field type. (example)

    • New in v2. The "format" option provides a general way to customize formatting of arbitrary types.

    • []byte and [N]byte types accept "format" values of either "base64", "base64url", "base32", "base32hex", "base16", or "hex", where it represents the binary bytes as a JSON string encoded using the specified format in RFC 4648. It may also be "array" to treat the slice or array as a JSON array of numbers. The "array" format exists for backwards compatibility since the default representation of an array of bytes now uses Base-64.

    • float32 and float64 types accept a "format" value of "nonfinite", where NaN and infinity are represented as JSON strings.

    • Slice types accept a "format" value of "emitnull" to marshal a nil slice as a JSON null instead of an empty JSON array. (more discussion).

    • Map types accept a "format" value of "emitnull" to marshal a nil map as a JSON null instead of an empty JSON object. (more discussion).

    • The time.Time type accepts a "format" value which may either be a Go identifier for one of the format constants (e.g., "RFC3339") or the format string itself to use with time.Time.Format or time.Parse (#21990). It can also be "unix", "unixmilli", "unixmicro", or "unixnano" to be represented as a decimal number reporting the number of seconds (or milliseconds, etc.) since the Unix epoch.

    • The time.Duration type accepts a "format" value of "sec", "milli", "micro", or "nano" to represent it as the number of seconds (or milliseconds, etc.) formatted as a JSON number. This exists for backwards compatibility since the default representation now uses a string representation (e.g., "53.241s"). If the format is "base60", it is encoded as a JSON string using the "H:MM:SS.SSSSSSSSS" representation.

The "omitzero" and "omitempty" options are similar. The former is defined in terms of the Go type system, while the latter in terms of the JSON type system. Consequently they behave differently in some circumstances. For example, only a nil slice or map is omitted under "omitzero", while an empty slice or map is omitted under "omitempty" regardless of nilness. The "omitzero" option is useful for types with a well-defined zero value (e.g., netip.Addr) or have an IsZero method (e.g., time.Time).

Note that all tag options labeled with "Changed in v2" will behave as it has always historically behaved when using v1 "json". However, all tag options labeled with "New in v2" will be implicitly and retroactively supported in v1 "json" because v1 will be implemented under-the-hood using "json/v2".

Type-specified customization

Go types may customize their own JSON representation by implementing certain interfaces that the "json" package knows to look for:

type Marshaler interface {
	MarshalJSON() ([]byte, error)
}
type MarshalerTo interface {
	MarshalJSONTo(*jsontext.Encoder, Options) error
}

type Unmarshaler interface {
	UnmarshalJSON([]byte) error
}
type UnmarshalerFrom interface {
	UnmarshalJSONFrom(*jsontext.Decoder, Options) error
}

The v1 Marshaler and Unmarshaler interfaces are supported in v2 to provide greater degrees of backward compatibility.

The MarshalerTo and UnmarshalerFrom interfaces operate in a purely streaming manner and provide a means for plumbing down options. This API can provide dramatic performance improvements (see "Performance").

If a type implements both sets of marshaling or unmarshaling interfaces, then the streaming variant takes precedence.

Just like v1, encoding.TextMarshaler and encoding.TextUnmarshaler interfaces remain supported in v2, where these interfaces are treated with lower precedence than JSON-specific serialization interfaces.

Caller-specified customization

In addition to Go types being able to specify their own JSON representation, the caller of the marshal or unmarshal functionality can also specify their own JSON representation for specific Go types (#5901). Caller-specified customization takes precedence over type-specified customization.

// SkipFunc may be returned by MarshalToFunc and UnmarshalFromFunc functions.
// Any function that returns SkipFunc must not cause observable side effects
// on the provided Encoder or Decoder.
const SkipFunc = jsonError("skip function")

// Marshalers holds a list of functions that may override the marshal behavior
// of specific types. Populate WithMarshalers to use it.
// A nil *Marshalers is equivalent to an empty list.
type Marshalers struct { /* no exported fields */ }

// JoinMarshalers constructs a flattened list of marshal functions.
// If multiple functions in the list are applicable for a value of a given type,
// then those earlier in the list take precedence over those that come later.
// If a function returns SkipFunc, then the next applicable function is called,
// otherwise the default marshaling behavior is used.
//
// For example:
//
//	m1 := JoinMarshalers(f1, f2)
//	m2 := JoinMarshalers(f0, m1, f3)     // equivalent to m3
//	m3 := JoinMarshalers(f0, f1, f2, f3) // equivalent to m2
func JoinMarshalers(ms ...*Marshalers) *Marshalers

// MarshalFunc constructs a type-specific marshaler that
// specifies how to marshal values of type T.
func MarshalFunc[T any](fn func(T) ([]byte, error)) *Marshalers

// MarshalToFunc constructs a type-specific marshaler that
// specifies how to marshal values of type T.
// The function is always provided with a non-nil pointer value
// if T is an interface or pointer type.
func MarshalToFunc[T any](fn func(*jsontext.Encoder, T, Options) error) *Marshalers

// Unmarshalers holds a list of functions that may override the unmarshal behavior
// of specific types. Populate WithUnmarshalers to use it.
// A nil *Unmarshalers is equivalent to an empty list.
type Unmarshalers struct { /* no exported fields */ }

// JoinUnmarshalers constructs a flattened list of unmarshal functions.
// It operates in a similar manner as [JoinMarshalers].
func JoinUnmarshalers(us ...*Unmarshalers) *Unmarshalers

// UnmarshalFunc constructs a type-specific unmarshaler that
// specifies how to unmarshal values of type T.
func UnmarshalFunc[T any](fn func([]byte, T) error) *Unmarshalers

// UnmarshalFromFunc constructs a type-specific unmarshaler that
// specifies how to unmarshal values of type T.
// T must be an unnamed pointer or an interface type.
// The function is always provided with a non-nil pointer value.
func UnmarshalFromFunc[T any](fn func(*jsontext.Decoder, T, Options) error) *Unmarshalers

Caller-specified customization is a powerful feature. For example:

  • It can be used to marshal Go errors (example).
  • It can be used to preserve the raw representation of JSON numbers (example). Note that v2 does not have the v1 RawNumber type.
  • It can be used to preserve the input offset of JSON values for error reporting purposes (example).

Options

Options may be specified that configure how marshal and unmarshal operates:

// Options configure Marshal, MarshalWrite, MarshalEncode,
// Unmarshal, UnmarshalRead, and UnmarshalDecode with specific features.
// Each function takes in a variadic list of options, where properties set
// in latter options override the value of previously set properties.
//
// Options represent either a singular option or a set of options.
// It can be functionally thought of as a Go map of option properties
// (even though the underlying implementation avoids Go maps for performance).
//
// The constructors (e.g., Deterministic) return a singular option value:
//	opt := Deterministic(true)
// which is analogous to creating a single entry map:
//	opt := Options{"Deterministic": true}
//
// JoinOptions composes multiple options values to together:
//	out := JoinOptions(opts...)
// which is analogous to making a new map and copying the options over:
//	out := make(Options)
//	for _, m := range opts {
//		for k, v := range m {
//			out[k] = v
//		}
//	}
//
// GetOption looks up the value of options parameter:
//	v, ok := GetOption(opts, Deterministic)
// which is analogous to a Go map lookup:
//	v, ok := opts["Deterministic"]
//
// There is a single Options type, which is used with both marshal and unmarshal.
// Options that do not affect a particular operation are ignored.
type Options = jsonopts.Options

// DefaultOptionsV2 is the full set of all options that define v2 semantics.
// It is equivalent to all options under [Options], [encoding/json.Options],
// and [encoding/json/jsontext.Options] being set to false or the zero value,
// except for the options related to whitespace formatting.
func DefaultOptionsV2() Options

// StringifyNumbers specifies that numeric Go types should be marshaled as
// a JSON string containing the equivalent JSON number value.
// When unmarshaling, numeric Go types are parsed from a JSON string
// containing the JSON number without any surrounding whitespace.
func StringifyNumbers(v bool) Options // affects marshal and unmarshal

// Deterministic specifies that the same input value will be serialized
// as the exact same output bytes. Different processes of
// the same program will serialize equal values to the same bytes,
// but different versions of the same program are not guaranteed
// to produce the exact same sequence of bytes.
func Deterministic(v bool) Options // affects marshal only

// FormatNilMapAsNull specifies that a nil Go map should marshal as a
// JSON null instead of the default representation as an empty JSON object.
func FormatNilMapAsNull(v bool) Options // affects marshal only

// FormatNilSliceAsNull specifies that a nil Go slice should marshal as a
// JSON null instead of the default representation as an empty JSON array
// (or an empty JSON string in the case of ~[]byte).
func FormatNilSliceAsNull(v bool) Options // affects marshal only

// MatchCaseInsensitiveNames specifies that JSON object members are matched
// against Go struct fields using a case-insensitive match of the name.
func MatchCaseInsensitiveNames(v bool) Options // affects marshal and unmarshal

// DiscardUnknownMembers specifies that marshaling should ignore any
// JSON object members stored in Go struct fields dedicated to storing
// unknown JSON object members.
func DiscardUnknownMembers(v bool) Options // affects marshal only

// RejectUnknownMembers specifies that unknown members should be rejected
// when unmarshaling a JSON object, regardless of whether there is a field
// to store unknown members.
func RejectUnknownMembers(v bool) Options // affects unmarshal only

// OmitZeroStructFields specifies that a Go struct should marshal in such a way
// that all struct fields that are zero are omitted from the marshaled output
// if the value is zero as determined by the "IsZero() bool" method if present,
// otherwise based on whether the field is the zero Go value.
func OmitZeroStructFields(v bool) Options // affects marshal only

// NonFatalSemanticErrors specifies that [SemanticErrors] encountered
// while marshaling or unmarshaling should not immediately terminate
// the procedure, but that processing should continue and that all
// errors be returned as a multi-error.
func NonFatalSemanticErrors(v bool) Options // affects marshal and unmarshal

// WithMarshalers specifies a list of type-specific marshalers to use,
// which can be used to override the default marshal behavior
// for values of particular types.
func WithMarshalers(v *Marshalers) Options // affects marshal only

// WithUnmarshalers specifies a list of type-specific unmarshalers to use,
// which can be used to override the default unmarshal behavior
// for values of particular types.
func WithUnmarshalers(v *Unmarshalers) Options // affects unmarshal only

// JoinOptions coalesces the provided list of options into a single Options.
// Properties set in latter options override the value of previously set properties.
func JoinOptions(srcs ...Options) Options

// GetOption returns the value stored in opts with the provided constructor,
// reporting whether the value is present.
func GetOption[T any](opts Options, constructor func(T) Options) (T, bool)

The Options type is a type alias to an internal type that is an interface type with no exported methods. It is used simply as a marker type for options declared in the "json" and "jsontext" package. This is exactly the same Options type as the one in the "jsontext" package.

The same Options type is used for both Marshal and Unmarshal as some options affect both operations.

The MarshalJSONTo, UnmarshalJSONFrom, MarshalToFunc, and UnmarshalFromFunc methods and functions take in a singular Options value instead of a variadic list because the Options type can represent a set of options. The caller (which is the "json" package) can coalesce a list of options before calling the user-specified method or function. Being given a single Options value is more ergonomic for the user as there is only one options value to introspect with GetOption.

Errors

Errors due to the inability to correlate JSON data with Go data are reported as SemanticError.

type SemanticError struct {
	// ByteOffset indicates that an error occurred after this byte offset.
	ByteOffset int64
	// JSONPointer indicates that an error occurred within this JSON value
	// as indicated using the JSON Pointer notation (see RFC 6901).
	JSONPointer jsontext.Pointer

	// JSONKind is the JSON kind that could not be handled.
	JSONKind Kind // may be zero if unknown
	// JSONValue is the JSON number or string that could not be unmarshaled.
	JSONValue jsontext.Value // may be nil if irrelevant or unknown
	// GoType is the Go type that could not be handled.
	GoType reflect.Type // may be nil if unknown

	// Err is the underlying error.
	Err error // may be nil
}
func (e *SemanticError) Error() string
func (e *SemanticError) Unwrap() error
// ErrUnknownName indicates that a JSON object member could not be
// unmarshaled because the name is not known to the target Go struct.
// This error is directly wrapped within a [SemanticError] when produced.
var ErrUnknownName = errors.New("unknown object member name")

ErrUnknownName is a sentinel error that is returned while being wrapped within a SemanticError.

Package "encoding/json"

The API and behavior for v1 "json" remains unchanged except for the addition of new options to configure v2 to operate with legacy v1 behavior.

Options

Options may be specified that configures v2 "json" to operate with legacy v1 behavior:

type Options = jsonopts.Options

// DefaultOptionsV1 is the full set of all options that define v1 semantics.
// It is equivalent to the following boolean options being set to true:
//
//   - [CallMethodsWithLegacySemantics]
//   - [EscapeInvalidUTF8]
//   - [FormatBytesWithLegacySemantics]
//   - [FormatTimeWithLegacySemantics]
//   - [MatchCaseSensitiveDelimiter]
//   - [MergeWithLegacySemantics]
//   - [OmitEmptyWithLegacyDefinition]
//   - [ReportErrorsWithLegacySemantics]
//   - [StringifyWithLegacySemantics]
//   - [UnmarshalArrayFromAnyLength]
//   - [jsonv2.Deterministic]
//   - [jsonv2.FormatNilMapAsNull]
//   - [jsonv2.FormatNilSliceAsNull]
//   - [jsonv2.MatchCaseInsensitiveNames]
//   - [jsontext.AllowDuplicateNames]
//   - [jsontext.AllowInvalidUTF8]
//   - [jsontext.EscapeForHTML]
//   - [jsontext.EscapeForJS]
//   - [jsontext.PreserveRawString]
//
// The [Marshal] and [Unmarshal] functions in this package are
// semantically identical to calling the v2 equivalents with this option:
//
//	jsonv2.Marshal(v, jsonv1.DefaultOptionsV1())
//	jsonv2.Unmarshal(b, v, jsonv1.DefaultOptionsV1())
func DefaultOptionsV1() jsonopts.Options

// CallMethodsWithLegacySemantics specifies that calling of type-provided
// marshal and unmarshal methods follow legacy semantics:
//
//   - When marshaling, a marshal method declared on a pointer receiver
//     is only called if the Go value is addressable.
//     Values obtained from an interface or map element are not addressable.
//     Values obtained from a pointer or slice element are addressable.
//     Values obtained from an array element or struct field inherit
//     the addressability of the parent. In contrast, the v2 semantic
//     is to always call marshal methods regardless of addressability.
//
//   - When marshaling or unmarshaling, the [Marshaler] or [Unmarshaler]
//     methods are ignored for map keys. However, [encoding.TextMarshaler]
//     or [encoding.TextUnmarshaler] are still callable.
//     In contrast, the v2 semantic is to serialize map keys
//     like any other value (with regard to calling methods),
//     which may include calling [Marshaler] or [Unmarshaler] methods,
//     where it is the implementation's responsibility to represent the
//     Go value as a JSON string (as required for JSON object names).
//
//   - When marshaling, if a map key value implements a marshal method
//     and is a nil pointer, then it is serialized as an empty JSON string.
//     In contrast, the v2 semantic is to report an error.
//
//   - When marshaling, if an interface type implements a marshal method
//     and the interface value is a nil pointer to a concrete type,
//     then the marshal method is always called.
//     In contrast, the v2 semantic is to never directly call methods
//     on interface values and to instead defer evaluation based upon
//     the underlying concrete value. Similar to non-interface values,
//     marshal methods are not called on nil pointers and
//     are instead serialized as a JSON null.
//
// This affects either marshaling or unmarshaling.
func CallMethodsWithLegacySemantics(bool) jsonopts.Options // affects marshal and unmarshal

// EscapeInvalidUTF8 specifies that when encoding a [jsontext.String]
// with bytes of invalid UTF-8, such bytes are escaped as
// a hexadecimal Unicode codepoint (i.e., \ufffd).
// In contrast, the v2 default is to use the minimal representation,
// which is to encode invalid UTF-8 as the Unicode replacement rune itself
// (without any form of escaping).
func EscapeInvalidUTF8(bool) jsonopts.Options // affects encoding only

// FormatBytesWithLegacySemantics specifies that handling of
// []~byte and [N]~byte types follow legacy semantics:
//
//   - A Go [N]~byte is always treated as as a normal Go array
//     in contrast to the v2 default of treating [N]byte as
//     using some form of binary data encoding (RFC 4648).
//
//   - A Go []~byte is to be treated as using some form of
//     binary data encoding (RFC 4648) in contrast to the v2 default
//     of only treating []byte as such. In particular, v2 does not
//     treat slices of named byte types as representing binary data.
//
//   - When marshaling, if a named byte implements a marshal method,
//     then the slice is serialized as a JSON array of elements,
//     each of which call the marshal method.
//
//   - When unmarshaling, if the input is a JSON array,
//     then unmarshal into the []~byte as if it were a normal Go slice.
//     In contrast, the v2 default is to report an error unmarshaling
//     a JSON array when expecting some form of binary data encoding.
//
//   - When unmarshaling, '\r' and '\n' characters are ignored
//     within the encoded "base32" and "base64" data.
//     In contrast, the v2 default is to report an error in order to be
//     strictly compliant with RFC 4648, section 3.3,
//     which specifies that non-alphabet characters must be rejected.
func FormatBytesWithLegacySemantics(bool) jsonopts.Options // affects marshal and unmarshal

// FormatTimeWithLegacySemantics specifies that [time] types are formatted
// with legacy semantics:
//
//   - When marshaling or unmarshaling, a [time.Duration] is formatted as
//     a JSON number representing the number of nanoseconds.
//     In contrast, the default v2 behavior uses a JSON string
//     with the duration formatted with [time.Duration.String].
//     If a duration field has a `format` tag option,
//     then the specified formatting takes precedence.
//
//   - When unmarshaling, a [time.Time] follows loose adherence to RFC 3339.
//     In particular, it permits historically incorrect representations,
//     allowing for deviations in hour format, sub-second separator,
//     and timezone representation. In contrast, the default v2 behavior
//     is to strictly comply with the grammar specified in RFC 3339.
func FormatTimeWithLegacySemantics(bool) jsonopts.Options // affects marshal and unmarshal

// MatchCaseSensitiveDelimiter specifies that underscores and dashes are
// not to be ignored when performing case-insensitive name matching which
// occurs under [jsonv2.MatchCaseInsensitiveNames] or the `nocase` tag option.
// Thus, case-insensitive name matching is identical to [strings.EqualFold].
// Use of this option diminishes the ability of case-insensitive matching
// to be able to match common case variants (e.g, "foo_bar" with "fooBar").
func MatchCaseSensitiveDelimiter(bool) jsonopts.Options // affects marshal and unmarshal

// MergeWithLegacySemantics specifies that unmarshaling into a non-zero
// Go value follows legacy semantics:
//
//   - When unmarshaling a JSON null, this preserves the original Go value
//     if the kind is a bool, int, uint, float, string, array, or struct.
//     Otherwise, it zeros the Go value.
//     In contrast, the default v2 behavior is to consistently and always
//     zero the Go value when unmarshaling a JSON null into it.
//
//   - When unmarshaling a JSON value other than null, this merges into
//     the original Go value for array elements, slice elements,
//     struct fields (but not map values),
//     pointer values, and interface values (only if a non-nil pointer).
//     In contrast, the default v2 behavior is to merge into the Go value
//     for struct fields, map values, pointer values, and interface values.
//     In general, the v2 semantic merges when unmarshaling a JSON object,
//     otherwise it replaces the original value.
func MergeWithLegacySemantics(bool) jsonopts.Options // affects unmarshal only

// OmitEmptyWithLegacyDefinition specifies that the `omitempty` tag option
// follows a definition of empty where a field is omitted if the Go value is
// false, 0, a nil pointer, a nil interface value,
// or any empty array, slice, map, or string.
// This overrides the v2 semantic where a field is empty if the value
// marshals as a JSON null or an empty JSON string, object, or array.
//
// The v1 and v2 definitions of `omitempty` are practically the same for
// Go strings, slices, arrays, and maps. Usages of `omitempty` on
// Go bools, ints, uints floats, pointers, and interfaces should migrate to use
// the `omitzero` tag option, which omits a field if it is the zero Go value.
func OmitEmptyWithLegacyDefinition(bool) jsonopts.Options // affects marshal only

// ReportErrorsWithLegacySemantics specifies that Marshal and Unmarshal
// should report errors with legacy semantics:
//
//   - When marshaling or unmarshaling, the returned error values are
//     usually of types such as [SyntaxError], [MarshalerError],
//     [UnsupportedTypeError], [UnsupportedValueError],
//     [InvalidUnmarshalError], or [UnmarshalTypeError].
//     In contrast, the v2 semantic is to always return errors as either
//     [jsonv2.SemanticError] or [jsontext.SyntacticError].
//
//   - When marshaling, if a user-defined marshal method reports an error,
//     it is always wrapped in a [MarshalerError], even if the error itself
//     is already a [MarshalerError], which may lead to multiple redundant
//     layers of wrapping. In contrast, the v2 semantic is to
//     always wrap an error within [jsonv2.SemanticError]
//     unless it is already a semantic error.
//
//   - When unmarshaling, if a user-defined unmarshal method reports an error,
//     it is never wrapped and reported verbatim. In contrast, the v2 semantic
//     is to always wrap an error within [jsonv2.SemanticError]
//     unless it is already a semantic error.
//
//   - When marshaling or unmarshaling, if a Go struct contains type errors
//     (e.g., conflicting names or malformed field tags), then such errors
//     are ignored and the Go struct uses a best-effort representation.
//     In contrast, the v2 semantic is to report a runtime error.
//
//   - When unmarshaling, the syntactic structure of the JSON input
//     is fully validated before performing the semantic unmarshaling
//     of the JSON data into the Go value. Practically speaking,
//     this means that JSON input with syntactic errors do not result
//     in any mutations of the target Go value. In contrast, the v2 semantic
//     is to perform a streaming decode and gradually unmarshal the JSON input
//     into the target Go value, which means that the Go value may be
//     partially mutated when a syntactic error is encountered.
//
//   - When unmarshaling, a semantic error does not immediately terminate the
//     unmarshal procedure, but rather evaluation continues.
//     When unmarshal returns, only the first semantic error is reported.
//     In contrast, the v2 semantic is to terminate unmarshal the moment
//     an error is encountered.
func ReportErrorsWithLegacySemantics(bool) jsonopts.Options // affects marshal and unmarshal

// StringifyWithLegacySemantics specifies that the `string` tag option
// may stringify bools and string values. It only takes effect on fields
// where the top-level type is a bool, string, numeric kind, or a pointer to
// such a kind. Specifically, `string` will not stringify bool, string,
// or numeric kinds within a composite data type
// (e.g., array, slice, struct, map, or interface).
//
// When marshaling, such Go values are serialized as their usual
// JSON representation, but quoted within a JSON string.
// When unmarshaling, such Go values must be deserialized from
// a JSON string containing their usual JSON representation.
// A JSON null quoted in a JSON string is a valid substitute for JSON null
// while unmarshaling into a Go value that `string` takes effect on.
func StringifyWithLegacySemantics(bool) jsonopts.Options // affects marshal only

// UnmarshalArrayFromAnyLength specifies that Go arrays can be unmarshaled
// from input JSON arrays of any length. If the JSON array is too short,
// then the remaining Go array elements are zeroed. If the JSON array
// is too long, then the excess JSON array elements are skipped over.
func UnmarshalArrayFromAnyLength(bool) jsonopts.Options // affects unmarshal only

Many of the options configure fairly obscure behavior. Unfortunately, many of the behaviors cannot be changed in order to maintain backwards compatibility. This is a major justification for a v2 "json" package.

Let jsonv1 be v1 "encoding/json" and jsonv2 be "encoding/json/v2", then the v1 and v2 options can be composed together to obtain behavior that is identical to v1, identical to v2, or anywhere in between. For example:

  • jsonv1.Marshal(v)
    • uses default v1 semantics
  • jsonv2.Marshal(in, jsonv1.DefaultOptionsV1())
    • semantically equivalent to jsonv1.Marshal
  • jsonv2.Marshal(in, jsonv1.DefaultOptionsV1(), jsontext.AllowDuplicateNames(false))
    • uses mostly v1 semantics, but opts into one v2-specific behavior
  • jsonv2.Marshal(in, jsonv1.StringifyWithLegacySemantics(true), jsonv1.ReportErrorsWithLegacySemantics(true))
    • uses mostly v2 semantics, but opts into two v1-specific behaviors
  • jsonv2.Marshal(v, ..., jsonv2.DefaultOptionsV2())
    • semantically equivalent to jsonv2.Marshal since jsonv2.DefaultOptionsV2 overrides any options specified earlier in the ...
  • jsonv2.Marshal(v)
    • uses default v2 semantics

Types aliases

The following types are moved to v2 "json":

type Marshaler = jsonv2.Marshaler
type Unmarshaler = jsonv2.Unmarshaler
type RawMessage = jsontext.Value

Number methods

The Number type no longer has special-case support in the "json" implementation itself.

func (Number) MarshalJSONTo(*jsontext.Encoder, jsonopts.Options) error
func (*Number) UnmarshalJSONFrom(*jsontext.Decoder, jsonopts.Options) error

So methods are added to have it implement the v2 MarshalerTo and UnmarshalerFrom methods to preserve equivalent behavior.

Errors

The UnmarshalTypeError type is extended to wrap an underlying error:

type UnmarshalTypeError struct {
	...
	Err error
}

func (*UnmarshalTypeError) Unwrap() error

Errors returned by v2 "json" are much richer, so the wrapped error provides a way for v1 "json" to preserve some of that context, while still using the UnmarshalTypeError type, which many programs may still be expecting.

The UnmarshalTypeError.Field now reports a dot-delimited path to the error value where each path segment is either a JSON array and map index operation. This is a divergence from prior behavior which was always inconsistent about whether the position was reported according to the Go namespace or the JSON namespace (see #43126).

@dsnet dsnet added the Proposal label Jan 31, 2025
@gopherbot gopherbot added this to the Proposal milestone Jan 31, 2025
@dsnet
Copy link
Member Author

dsnet commented Jan 31, 2025

Changes from discussion

If you have already read the discussion in #63397, then much of the API presented above may be familiar. This section records the differences made relative to the discussion.

Package "encoding/json/jsontext"

The following Value methods were altered to accept options.

- func (v Value) IsValid() bool
+ func (v Value) IsValid(opts ...Options) bool
- func (v *Value) Compact() error
+ func (v *Value) Compact(opts ...Options) error
- func (v *Value) Indent(prefix, indent string) error
+ func (v *Value) Indent(opts ...Options) error
- func (v *Value) Canonicalize() error
+ func (v *Value) Canonicalize(opts ...Options) error

+ func (v *Value) Format(opts ...Options) error

Accepting options allows the default behavior of these methods to be overridden, providing greater flexibility in usage.

The removal of the prefix and indent argument from Indent improves the ergonomics of the method as most users just want indented output without thinking about the particular indent string used. These can still be specified using the WithIndentPrefix and WithIndent options.

One major criticism of Canonicalize (per RFC 8785) is that it mangles the precision of wide integers. By accepting options, users can additionally specify CanonicalizeRawInts(false) to prevent this behavior, while still having canonicalization for all other JSON artifacts.

The Format method was newly added as the primary implementation backing Compact, Indent, and Canonicalize.

The following options were added to provide greater flexibility to formatting:

+ func CanonicalizeRawFloats(v bool) Options
+ func CanonicalizeRawInts(v bool) Options
+ func PreserveRawStrings(v bool) Options
+ func ReorderRawObjects(v bool) Options
+ func SpaceAfterComma(v bool) Options
+ func SpaceAfterColon(v bool) Options

- func Expand(v bool) Options
+ func Multiline(v bool) Options

The Expand option was renamed as Multiline to be more clear and to distinguish it from SpaceAfterComma and SpaceAfterColon (which both technically "expand" the output).

The following formatting API has been added:

+ func AppendFormat(dst, src []byte, opts ...Options) ([]byte, error)
+ func AppendQuote[Bytes ~[]byte | ~string](dst []byte, src Bytes) ([]byte, error)
+ func AppendUnquote[Bytes ~[]byte | ~string](dst []byte, src Bytes) ([]byte, error)

The length returned by StackIndex is now a int64 instead of int since the length of a JSON array or object could theoretically exceed int when handling JSON in a purely streaming manner. The stack depth, however, remains fundamentally limited by the amount of system memory, so an int is still an appropriate type.

-func (e *Encoder) StackIndex(i int) (Kind, int)
+func (e *Encoder) StackIndex(i int) (Kind, int64)
-func (d *Decoder) StackIndex(i int) (Kind, int)
+func (d *Decoder) StackIndex(i int) (Kind, int64)

The pointer returned by StackPointer is now the named Pointer type.

- func (e *Encoder) StackPointer() string
+ func (e *Encoder) StackPointer() Pointer
- func (d *Decoder) StackPointer() string
+ func (d *Decoder) StackPointer() Pointer

An explicit Pointer type was added to represent a JSON Pointer (RFC 6901) as a means to identify exactly where an error occurred. Convenience methods are defined for interacting with a pointer.

+ type Pointer string
+ func (p Pointer) IsValid() bool
+ func (p Pointer) AppendToken(tok string) Pointer
+ func (p Pointer) Parent() Pointer
+ func (p Pointer) Contains(pc Pointer) bool
+ func (p Pointer) LastToken() string
+ func (p Pointer) Tokens() iter.Seq[string]

Handling of errors was improved:

  type SyntacticError struct {
  	...
- 	JSONPointer string
+ 	JSONPointer Pointer
  }

+ var ErrDuplicateName = errors.New("duplicate object member name")
+ var ErrNonStringName = errors.New("object member name must be a string")

The ErrDuplicateName and ErrNonStringName errors were added to support common error conditions users may want to distinguish upon through the use of errors.Is.

Package "encoding/json/v2"

Interface types and methods were renamed to avoid the V1 and V2 suffixes, which were aesthetically unpleasant. Instead, the V2 declarations now generally use the To and From suffixes to indicate that they support streaming. This follows after the convention established by io.WriterTo and io.ReaderFrom.

- type MarshalerV1 interface { MarshalJSON() ([]byte, error) }
+ type Marshaler interface { MarshalJSON() ([]byte, error)}

- type MarshalerV2 interface { MarshalJSONV2(*jsontext.Encoder, Options) error }
+ type MarshalerTo interface { MarshalJSONTo(*jsontext.Encoder, Options) error}

- type UnmarshalerV1 interface { UnmarshalJSON([]byte) error }
+ type Unmarshaler interface { UnmarshalJSON([]byte) error}

- type UnmarshalerV2 interface { UnmarshalJSONV2(*jsontext.Decoder, Options) error }
+ type UnmarshalerFrom interface { UnmarshalJSONFrom(*jsontext.Decoder, Options) error}

- func MarshalFuncV1[T any](fn func(T) ([]byte, error)) *Marshalers
+ func MarshalFunc[T any](fn func(T) ([]byte, error)) *Marshalers

- func MarshalFuncV2[T any](fn func(*jsontext.Encoder, T, Options) error) *Marshalers
+ func MarshalToFunc[T any](fn func(*jsontext.Encoder, T, Options) error) *Marshalers

- func UnmarshalFuncV1[T any](fn func([]byte, T) error) *Unmarshalers
+ func UnmarshalFunc[T any](fn func([]byte, T) error) *Unmarshalers

- func UnmarshalFuncV2[T any](fn func(*jsontext.Decoder, T, Options) error) *Unmarshalers
+ func UnmarshalFromFunc[T any](fn func(*jsontext.Decoder, T, Options) error) *Unmarshalers

The constructor for Marshalers and Unmarshalers were renamed using the Join prefix to be more consistent with the existing JoinOptions constructor. It also more clearly matches exactly what the constructor does.

- func NewMarshalers(ms ...*Marshalers) *Marshalers
+ func JoinMarshalers(ms ...*Marshalers) *Marshalers

- func NewUnmarshalers(us ...*Unmarshalers) *Unmarshalers
+ func JoinUnmarshalers(us ...*Unmarshalers) *Unmarshalers

The following options were added:

+ func OmitZeroStructFields(v bool) Options
+ func NonFatalSemanticErrors(v bool) Options

The OmitZeroStructFields is a caller-specified option that mirrors the addition of the omitzero struct tag option.

Implementing v1 in terms of v2 required the latter to support non-fatal errors. The NonFatalSemanticErrors option exposes that functionality in a more consistent (i.e., handling both marshal and unmarshal) and in a more modern way (i.e., returning a multi error).

  type SemanticError struct {
  	...
- 	JSONPointer string
+ 	JSONPointer jsontext.Pointer
+ 	JSONValue   jsontext.Value
  }

+ var ErrUnknownName = errors.New("unknown object member name")

The ErrUnknownName error was added to support common use-cases wanting to distinguish this particular condition (see #29035).

The following behavior changes were made to marshal and unmarshal:

  • There is newly added support for the strictcase option to provide a better migration path between users of both v1 and v2.

  • Specifying the string tag option now rejects unmarshaling from a JSON number and only permits unmarshaling from a JSON string. This exactly matches the behavior of v1.

  • When unmarshaling, a floating-point overflow results in an error. This exactly matches the behavior of v1.

  • Serialization now supports embedded fields of unexported struct types with exported fields. This exactly matches the behavior of v1.

Package "encoding/json"

Some options for legacy v1 support were renamed or had similar options folded together.

- func RejectFloatOverflow(v bool) Options

- func IgnoreStructErrors(v bool) Options
- func ReportLegacyErrorValues(v bool) Options
+ func ReportErrorsWithLegacySemantics(bool)

- func SkipUnaddressableMethods(v bool) Options
+ func CallMethodsWithLegacySemantics(bool)

- func FormatByteArrayAsArray(v bool) Options
+ func FormatBytesWithLegacySemantics(bool)

- func FormatTimeDurationAsNanosecond(v bool) Options
+ func FormatTimeWithLegacySemantics(bool)

+ func EscapeInvalidUTF8(bool)

In general, many options were renamed with a WithLegacySemantics suffix because they convered a multitude of behavior differences that could not be adequently described with a concise name.

The RejectFloatOverflow option was removed because v2 now rejects floating-point overflows just like v1.

The EscapeInvalidUTF8 option was added in order to support a behavior difference that was discovered while implementing v1 support in terms of v2. We may avoid adding this as it controls fairly esoteric and undocumented behavior.

The Number type implements MarshalerTo and UnmarshalerFrom for better compatibility with v2.

+ func (Number) MarshalJSONTo(*jsontext.Encoder, jsonopts.Options) error
+ func (*Number) UnmarshalJSONFrom(*jsontext.Decoder, jsonopts.Options) error

The UnmarshalTypeError type now supports error wrapping.

  type UnmarshalTypeError struct {
  	...
+ 	Err error
  }

+ func (*UnmarshalTypeError) Unwrap() error

Changes not made from discussion

There were many ideas discussed in #63397 that did not result in changes to the current proposal. Some ideas are still worth pursuing, while others were declined for various reasons. In general, ideas that could later be built on top of the initial release of v2 were deferred so that we could focus on the current API. We prioritized ideas that could not be explored after the initial API was finalized.

The following is a non-exhaustive list of such considerations:

  • Option structs instead of variadic options: An older prototype of the v2 implementation used option structs, but was refactored to use variadic options once work began implementing v1 in terms of v2. In general, options operate upon several dimensions: v1 versus v2, marshal versus unmarshal, and encode versus decode. Some options are valid in multiple dimensions (e.g., marshal and unmarshal). Some call sites accept options from multiple dimensions (e.g., marshal and encode). A singular options type is more ergonomic, but loses type safety. We deem the wins of the former stronger than the losses of the latter.

  • User-defined option values: json.Options should support user-provided values. One possible future proposal could be a WithOption[T any](v T) constructor to convert a user-provided value into an option. See #71664 for a separate proposal.

  • User-defined format values: json.Options should support specifying format values (e.g., that all time.Time types use the "unix" format). One possible future proposal could be a WithFormat[T any](s string) constructor to declare arbitrary formats for custom use. See #71664 for a separate proposal.

  • context.Context plumbing: json.MarshalerTo and json.UnmarshalerFrom could accept a context.Context. A context serves two purposes: 1) cancelation of an operation, 2) plumbing of user-defined options. Cancelation does not make sense in "json" since the io.Reader and io.Writer interfaces provide no direct way to cancel a read or write operation. Plumbing of options should be done by extending the existing json.Options type to support user-defined options (see above ideas). Thus, there is little utility to plumbing a context.Context.

  • Treat []byte as just strings: ~[]byte and ~[N]byte types should support a format:string option that treats such types as if they were Go strings. This can be a future proposal.

  • Use ISO 8601 for time.Duration: While JavaScript in TC39 recently defined a grammar for durations based on a particular profile of ISO 8601, the specification does not define the meaning of "years", "months", "weeks", etc., so we cannot convert ISO 8601 into a specific time.Duration value. Any value chosen for each unit will inevitably lead to interoperability issues when other systems use different definitions of such units. See #71631 for further discussion about the right default for time.Duration.

  • First-class support for ternary values: Some usages of JSON expect the Go value to distinguish between whether an object member was present, is null, or an explicit value. It is out of scope of v2 to directly support this use case, but the omitzero tag option does make this easier to implement externally.

  • First-class support for ordered maps: While JSON specifies that objects are unordered, many usages of JSON rely on the ordering and expect Go to provide first-class support for handling ordered objects. However, a native Go type that preserves ordering does not exist, so support for this is deferred for now.

  • First-class support for union types: It is common for the type of a particular JSON value to be dynamically changed based on context. This is difficult to support in Go as the equivalent of a dynamic value is a Go interface. When unmarshaling, there is no way in Go reflection to enumerate the set of possible Go types that can be stored in a Go interface in order to choose the right type to automatically unmarshal a dynamic JSON value. Support for such use cases is deferred until better Go language support exists.

  • Split marshal/unmarshal into separate packages: We already split JSON functionality apart based on whether it dealt with JSON at a syntactic (i.e., "jsontext") or semantic (i.e., "json/v2") level. This particular split is justified by the fact that syntactic processing should not depend on Go reflection (which is a relatively heavy dependency). The benefits of splitting marshal and unmarshal apart is less clear. This seems like the job of the Go compiler/linker to perform better dead-code elimination (DCE) of unused functionality.

  • jsontext.Decoder.PeekKind should return an error: PeekKind is often called in a loop to decode all elements of a JSON array or object. There is both an ergonomic and performance reason to avoid reporting an error. While returning an error may signal an error earlier, properly validating the JSON input fundamentally requires calling a ReadToken or ReadValue method until io.EOF. Thus, if Read is always eventually called, then an error during an intermediate PeekKind call is guaranteed to eventually be surfaced.

  • jsontext.Token accessors should return errors: The Int, Uint, and Float accessors are intended to be symmetric accessors to the Int, Uint, and Float constructors. So long as the kind is a JSON number, there is a reasonable way to coerce the JSON number to the closest representation for an int64, uint64, or float64. Users that desire stricter conversion can call strconv.ParseInt, strconv.ParseUint, or strconv.ParseFloat with the String accessor (e.g., strconv.ParseInt(tok.String(), 10, 16)).

  • Remove jsontext.Value.Canonicalize: The primary objection to supporting RFC 8785 for canonicalizing a JSON value is that it mangles the precision of 64-bit integers. This is due to JSON's heritage in JavaScript, which uses floating-point numbers. Rather than removing Canonicalize, we modified it to accept options, so that users could explicitly avoid the integer mangling behavior by specifying jsontext.CanonicalizeRawInts(false).

  • Enforce a max depth or max bytes: The initial v2 release will not implement this, but proposal proposal: encoding/json/jsontext: add WithByteLimit and WithDepthLimit #56733 has already been accepted. An implementation for this may happen soon after v2 lands in the standard library.

  • Immutable jsontext.Token variables: The Null, False, True, ObjectStart, ObjectEnd, ArrayStart, ArrayEnd global variables in jsontext could use constructor functions to be immutable. However, much of the Go standard library already exposes mutable globals and this does not seem to be a problem.

  • Declare jsontext.Kind constants: Using the HTTP method names as prior precedence shows that a vast majority of Go code use a string literal (e.g., "GET") over referencing the constant (e.g., http.MethodGet). It is unclear whether Kind constants will actually provide value or serve to cause greater inconsistency. The addition of constants can be separately proposed in the future.

  • Support JSON5 or JWCC: This is out of scope for v2 and can be a future proposal if those related JSON formats become sufficiently popular.

  • Make Encoder and Decoder an interface: The json.MarshalerTo and json.UnmarshalerFrom interfaces reference a concrete jsontext.Encoder and jsontext.Decoder implementation, which prevents use of a customer encoder or decoder. We considered making these an interface, but the performance cost of constantly calling a virtual method was expensive when a vast majority of usages are for the standard implementation.

Changes since proposal

Since the filing of this proposal, some changes were made in response to feedback:

  • #153 The nocase and strictcase tag options were renamed to case:ignore and case:strict.
  • #159 The ArrayStart, ArrayEnd, ObjectStart, and ObjectEnd variables in the "jsontext" package were renamed as BeginArray, EndArray, BeginObject, and EndObject to match the formal names of these tokens in RFC 8259, section 2.
  • #163 The Options argument was dropped from the MarshalerTo and UnmarshalerFrom interfaces and the MarshalToFunc and UnmarshalFromFunc functions in the "json/v2" package. Instead, an Options method is added to jsontext.Encoder and jsontext.Decoder.
  • #166 Support for the base60 format was dropped for time.Duration due to lack of popular demand.

@dsnet
Copy link
Member Author

dsnet commented Jan 31, 2025

Proposed implementation

This proposal has been implemented by the github.com/go-json-experiment/json module.

If this proposal is accepted, the implementation in github.com/go-json-experiment/json will be moved into the standard library.

We may also provide a golang.org/x/json module that contains an identical copy of the implementation so that users on older Go releases can make use of v2. This module will use type-aliases to the Go standard library if the user is compiling with a sufficiently new version of the Go toolchain.

Performance

For more information, see the github.com/go-json-experiment/jsonbench module.

The following benchmarks compares performance across several different JSON implementations:

  • JSONv1 is encoding/json at v1.23.5
  • JSONv1in2 is github.com/go-json-experiment/json/v1 at v0.0.0-20250127181117-bbe7ee0d7d2c
  • JSONv2 is github.com/go-json-experiment/json at v0.0.0-20250127181117-bbe7ee0d7d2c

The JSONv1in2 implementation replicates the JSONv1 API and behavior purely in terms of the JSONv2 implementation by setting the appropriate set of options to reproduce legacy v1 behavior.

Benchmarks were run across various datasets:

  • CanadaGeometry is a GeoJSON (RFC 7946) representation of Canada. It contains many JSON arrays of arrays of two-element arrays of numbers.
  • CITMCatalog contains many JSON objects using numeric names.
  • SyntheaFHIR is sample JSON data from the healthcare industry. It contains many nested JSON objects with mostly string values, where the set of unique string values is relatively small.
  • TwitterStatus is the JSON response from the Twitter API. It contains a mix of all different JSON kinds, where string values are a mix of both single-byte ASCII and multi-byte Unicode.
  • GolangSource is a simple tree representing the Go source code. It contains many nested JSON objects, each with the same schema.
  • StringUnicode contains many strings with multi-byte Unicode runes.

JSONv2 has several semantic changes relative to JSONv1 that impact performance:

  • When marshaling, JSONv2 no longer sorts the keys of a Go map. This will improve performance.

  • When marshaling or unmarshaling, JSONv2 always checks to make sure JSON object names are unique. This will hurt performance, but is more correct.

  • When unmarshaling, JSONv2 always performs a case-sensitive match for JSON object names.
    This will improve performance and is generally more correct.

  • When marshaling or unmarshaling, JSONv2 always shallow copies the underlying value for a Go interface and shallow copies the key and value for entries in a Go map. This is done to keep the value as addressable so that JSONv2 can call methods and functions that operate on a pointer receiver. This will hurt performance, but is more correct.

  • When marshaling or unmarshaling, JSONv2 supports calling type-defined methods or caller-defined functions with the current jsontext.Encoder or jsontext.Decoder. The Encoder or Decoder must contain a state machine to validate calls according to the JSON grammar. Maintaining this state will hurt performance. The JSONv1 API provides no means for obtaining the Encoder or Decoder so it never needed to explicitly maintain a state machine. Conformance to the JSON grammar is implicitly accomplished by matching against the structure of the call stack.

All of the charts are unit-less since the values are normalized relative to JSONv1, which is why JSONv1 always has a value of 1. A lower value is better (i.e., runs faster).

marshal-performance

When marshaling, JSONv1in2 and JSONv2 is roughly at parity in performance with JSONv1. It is faster for some datasets, yet slower in others.

Compared to high-performance third-party alternatives, the proposed "encoding/json/v2" implementation performs within the same order of magnitude, indicating near-optimal efficiency.

unmarshal-performance

When unmarshaling, JSONv2 is 2.7x to 10.2x faster than JSONv1. Most of the performance gained is due to a faster syntactic parser. JSONv1 takes a lexical scanning approach, which performs a virtual function call for every byte of input. In contrast, JSONv2 makes heavy use of iterative and linear parsing logic (with extra complexity to resume parsing when encountering segmented buffers).

Compared to high-performance third-party alternatives, the proposed "encoding/json/v2" implementation performs within the same order of magnitude, indicating near-optimal efficiency.

While maintaining a JSON state machine hurts the v2 implementation in terms of performance, it provides the ability to marshal or unmarshal in a purely streaming manner. This feature is necessary to convert certain pathological O(N²) runtime scenarios into O(N). For example, switching from UnmarshalJSON to UnmarshalJSONFrom for spec.Swagger resulted in an ~40x performance improvement. These performance gains are not unique to streaming unmarshal, but also apply to streaming marshal. The benchmark charts above do not exercise recursive MarshalJSON or UnmarshalJSON calls, and thus do not demonstrate the significant gains of a pure streaming API.

@mitar
Copy link
Contributor

mitar commented Jan 31, 2025

Thanks! This really looks great.

Does MarshalJSONTo allow custom implementation to skip marshaling the value? What happens if it does not call any method of Encoder while encoding the value of an object? Does this produce invalid JSON or does this skip the value then?

It seems to me there is no way to provide custom (i.e., non-standard) options?

@dsnet
Copy link
Member Author

dsnet commented Jan 31, 2025

Does MarshalJSONTo allow custom implementation to skip marshaling the value?

The current behavior is that MarshalJSONTo and UnmarshalJSONFrom methods are not allowed to return SkipFunc, which will result in an error. We could support this, but there are benefits and detriments. The detriment of allowing methods to return SkipFunc is that directly calling a MarshalJSONTo does not always do what you expect, and the caller is now responsible for checking the error value to perform some fallback. Also, the equivalent behavior of SkipFunc could be accomplished in the implementation of MarshalJSONTo itself:

func (v T) MarshalJSONTo(enc *jsontext.Encoder, opts json.Options) error {
    type TNoMethods T
    return json.MarshalEncode(enc, TNoMethods(v), opts)
}

It seems to me there is no way to provide custom (i.e., non-standard) options?

We're going to withhold this for the initial release of v2, but propose something as a follow-up. I still believe it's an important feature, but we're making a conscious decision to limit the scope of v2, which is already large.

If you're interested, there's a prototype API for user-defined options in go-json-experiment/json#138. One of my comments also scopes out a possible API for specifying format flags for particular types.

@a-pav
Copy link

a-pav commented Jan 31, 2025

Thank you for this great work.

Will json/v2 also include string format for unmarshaling into []byte and [N]byte types with no translation as it was discussed here?

@mitar
Copy link
Contributor

mitar commented Jan 31, 2025

@dsnet Thanks for the response. Seems reasonable.

and the caller is now responsible for checking the error value to perform some fallback

Yes, and this is why I worry that in the future we will never be able to add this in backwards compatible way, unless this is added from the beginning. The rest can be extended easier.

My design goal is really that MarshalJSONTo should be able to reimplement everything struct tags can achieve. Currently you cannot replicate omitempty for example.

(Oh, are all struct tags for a given field even available through json.Options? I guess this can be added in the future? But being able to have custom MarshalJSONTo on your struct and being able to know if the user of your struct used omitempty for a field with your struct as a type is really a useful feature.)

the equivalent behavior of SkipFunc could be accomplished in the implementation of MarshalJSONTo itself:

I must say I do not get how your example can simulate behavior of, for example, omitempty. It just does the default implementation of JSON marshal for the value, but it does not allow one to signal that the value should be omitted or not. Maybe I am missing something?

@duckbrain
Copy link

At present, there are no constants declared for individual kinds since each value is humanly readable. Declaring constants will lead
to inconsistent usage where some users use the 'n' byte literal, while other users reference the jsontext.KindNull constant. This is
similar problem to the introduction of the http.MethodGet constant, which has led to inconsistency in codebases where the "GET"
literal is more frequently used (~75% of the time).

I don't feel like this comparison is fair since:

  • net/http doesn't define a type Method string; http.MethodGet isn't a typed constant
  • The enumerated values are literally the value of the HTTP method vs a single-character representation used here.

I don't think it's difficult to learn/understand, but I'd prefer constants. I tend to prefer them so gopls recommends correct values and to avoid accidental misspellings that the compiler can catch.

@prattmic

This comment has been minimized.

@timbray
Copy link

timbray commented Jan 31, 2025

The design seems sound, although I'd have hoped for fewer options.

On the performance front, what do the benchmarks show about memory consumption and gc behavior? I found v1 to use unreasonable amounts of memory, and this included the streaming interface.

@davecb
Copy link

davecb commented Jan 31, 2025

Yay , you implemented v1 in terms of v2 !!!

In case you think that's minor, updaters and downdaters are much-reinvented improvements, used in Multics, Solaris and GNU libc. All too many other folks don't and suffer from "flag days" when everything has to change at once (:-))

For more detail on why this is cool, see Paul Stachour's "Jack" article , https://cacm.acm.org/practice/you-dont-know-jack-about-software-maintenance/

@nemith
Copy link
Contributor

nemith commented Jan 31, 2025

While I agree with the reasoning of the WithLegacySemantics suffix on options I have concerns over it use

  1. The options are now much more ambiguous in what they do. If the more specific option names were nonoptimal cause they didn't cover all behaviors the new ones are the opposite and now loose a lost of semantic meaning.
  2. Many options have this suffix which means that at a cursory glance it may be easy to mix them up or look over them.

Given the options I would rather have a more specific name that may include other behaviors than a bunch of similarly named ambiguous options that really require historical context to fully understand.

However given these are mostly used for transitional and discourages for general use maybe that is ok?

@dsnet dsnet added the v2 An incompatible library change label Jan 31, 2025
@dsnet

This comment has been minimized.

@dsnet
Copy link
Member Author

dsnet commented Jan 31, 2025

what do the benchmarks show about memory consumption and gc behavior?

@timbray: The v2 implementation allocates less than most alternatives:
Image

This is most likely due to v2's use of a string intern cache.

Aside from strings, unfortunately most other data structures fundamentally have to be allocated. The memory regions discussion #70257 could provide a way to batch allocations together in a single region, which is freed all together.

@dsnet
Copy link
Member Author

dsnet commented Jan 31, 2025

@nemith, the number of legacy options makes me sad as well. When we first started to implement v1 in terms of v2, we thought we could have just a few targeted options with clear names, but it became increasingly clear that there were too many odd behaviors of v1 to have individual options for. Many of these behaviors are arguably bugs, but have practically become stable behavior in v1 as a result of Hyrum's Law.

Of all the options to achieve 100% known compatibility, they roughly fell into three categories:

  1. Features that were reasonable for someone to want, but not as the default behavior (e.g., jsonv2.Deterministic, jsonv2.FormatNilMapAsNull, jsonv2.FormatNilSliceAsNull, jsonv2.MatchCaseInsensitiveNames, jsontext.AllowDuplicateNames, jsontext.AllowInvalidUTF8, jsontext.EscapeForHTML, and jsontext.EscapeForJS). These were all given specific names and are declared in the "json/v2" or "jsontext" packages for easy accessibility.

  2. Behavior that are arguably bugs, unspecified, or mostly backwards compatible (e.g., jsonv1.CallMethodsWithLegacySemantics, jsonv1.EscapeInvalidUTF8, jsonv1.MatchCaseSensitiveDelimiter, jsonv1.MergeWithLegacySemantics, jsonv1.ReportErrorsWithLegacySemantics, jsonv1.StringifyWithLegacySemantics, jsonv1.UnmarshalArrayFromAnyLength). I suspect that 99% of use cases will not be affected by these options.

  3. Functionality that notably changed in v2, but there is some backwards compatible change that type authors can make to make the representation identical under both v1 or v2:

    • OmitEmptyWithLegacyDefinition controls the behavior of omitempty, where v1 and v2 diverge for Go bools, numbers, pointers, and interfaces. These can be migrated to use omitzero, which will behave the same way as legacy omitempty.

    • FormatBytesWithLegacySemantics controls several buggy behavior with binary encoding and also controls how [N]byte are serialized. Using the format:array option, type authors could make byte arrays serialize as they do in v1.

    • FormatTimeWithLegacySemantics controls a parsing bug with RFC 3339 and also controls how time.Duration is serialized. Using the format:nano option, type authors could make durations serialize as they do in v1.

Options in category 3 could use further refinement. For example, it might make sense to split:

  • FormatByteArraysAsArrays out from FormatBytesWithLegacySemantics
  • FormatDurationAsNanos out from FormatTimeWithLegacySemantics

@dsnet
Copy link
Member Author

dsnet commented Feb 1, 2025

Will json/v2 also include string format for unmarshaling into []byte and [N]byte types with no translation

@a-pav In the end we decided to focus on what's blocking v2 from getting the stdlib, so we made a conscious decision not to include that for the initial release. Supporting format:string for []byte and [N]byte types is worth proposing soon after as a follow-up.

I'll update the "Changed from discussion" to include a sub-section on changes that we did not end making.

@dsnet dsnet added the LibraryProposal Issues describing a requested change to the Go standard library or x/ libraries, but not to a tool label Feb 2, 2025
@willfaught
Copy link
Contributor

willfaught commented Feb 2, 2025

It seems odd for Encoder method names to use "Write" instead of "Encode", and for Decoder method names to use "Read" instead of "Decode":

func (*Encoder) WriteToken(Token) error
func (*Encoder) WriteValue(Value) error

func (*Decoder) ReadToken() (Token, error)
func (*Decoder) ReadValue() (Value, error)

because encoding/json uses "Encode" and "Decode":

func (enc *Encoder) Encode(v any) error

func (dec *Decoder) Decode(v any) error

and encoding/xml does too:

func (enc *Encoder) Encode(v any) error
func (enc *Encoder) EncodeToken(t Token) error

func (d *Decoder) Decode(v any) error

Instead, the V2 declarations now generally use the To and From suffixes to indicate that they support streaming. This follows after the convention established by io.WriterTo and io.ReaderFrom.

WriterTo and ReaderFrom push to, or pull from, entire byte streams. That seems useful for JSON marshaling too. Users have already written MarshalJSON/UnmarshalJSON methods, and enabling users to add a byte stream version of those methods alongside would be an nice way to opt into better performance with little effort. The MarshalJSON/UnmarshalJSON implementations could just be calls to the stream version with a bytes.Buffer.

Perhaps something like:

MarshalJSONToWriter(io.Writer, Options) error

UnmarshalJSONFromReader(io.Reader, Options) error

Then we'd have:

MarshalJSONToEncoder(*jsontext.Encoder, Options) error

UnmarshalJSONFromDecoder(*jsontext.Decoder, Options) error

This is a similar problem to the introduction of the http.MethodGet constant, which has led to inconsistency in codebases where the "GET" literal is more frequently used (~75% of the time).

I would guess that had more to do with enabling custom or future standard HTTP methods.

Having to remember " is for strings and 0 is for numbers seems error-prone. Having declared constants seems safer. I suspect most users won't write kind values by hand. Personally, I would end up declaring my own constants just to be safe. The library should save me the trouble. If someone wants to take their chances with literals, they can still do that.


There appears to be a misspelling in the name jsonflags.WithinArshalCall at https://github.com/go-json-experiment/json/blob/4e0381018ad6/jsontext/encode.go#L105C21-L105C47.

@willfaught
Copy link
Contributor

The MarshalJSONTo, UnmarshalJSONFrom, MarshalToFunc, and UnmarshalFromFunc methods and functions take in a singular Options value instead of a variadic list because the Options type can represent a set of options.

Why do these take an Options? Is it just in case they invoke the JSON library themselves? If so, what is an example of when that would be useful? And if so, why not have Encoder.Options() Options and Decoder.Options() Options instead to avoid the Options parameter?

// DefaultOptionsV2 is the full set of all options that define v2 semantics.
// It is equivalent to all options under [Options], [encoding/json.Options],
// and [encoding/json/jsontext.Options] being set to false or the zero value,
// except for the options related to whitespace formatting.

Does this mean the whitespace formatting options are all true by default?

Which options are related to whitespace formatting? I see comments like // affects marshal and unmarshal and // affects encode only, but none flagging whitespace behavior. In order to understand what this behavior is, I have to read the documentation for every option, and even then, perhaps have to guess.

Is it possible to have all default values be zero values? If so, then we wouldn't need this declaration.

// JoinOptions composes multiple options values to together:
// out := JoinOptions(opts...)
// which is analogous to making a new map and copying the options over:
// out := make(Options)

Apparently Options is a map? Or is that just for illustrative purposes? What is the underlying type of Options?

func NewEncoder(io.Writer, ...Options) *Encoder
[...]
func UnmarshalDecode(in *jsontext.Decoder, out any, opts ...Options) error

There are a lot of functions that take variadic Options. Why not a single Options, since multiple Options can be combined with JoinOptions, like for MarshalJSONTo, UnmarshalJSONFrom, MarshalToFunc, and UnmarshalFromFunc? Conversely, why not just deal with []Option everywhere, like functional options?

@willfaught
Copy link
Contributor

jsontext.SyntacticError.JSONPointer and json.SemanticError.{JSONPointer,JSONKind,JSONValue} seem to stutter. Dropping the "JSON" wouldn't be confusing.

type Options = jsonopts.Options

This declaration in jsonv1 doesn't seem to be used.

// Options configure Marshal, MarshalWrite, MarshalEncode,
// Unmarshal, UnmarshalRead, and UnmarshalDecode with specific features.
// Each function takes in a variadic list of options, where properties set
// in latter options override the value of previously set properties.

Missing square brackets around declaration names.

"Options configure" seems like a number disagreement between subject and verb. Options is a singular type, so it should be "Options configures".

func MarshalToFunc[T any](fn func(*jsontext.Encoder, T, Options) error) *Marshalers
func UnmarshalFromFunc[T any](fn func(*jsontext.Decoder, T, Options) error) *Unmarshalers

It seems odd for Options to not be the second param in fn, like in MarshalJSONTo and UnmarshalJSONFrom.

@willfaught
Copy link
Contributor

func DefaultOptionsV2() Options

I don't see a DefaultOptionsV1 in jsonv2, so I don't see a need for the V2 suffix in jsonv2. Users could use qualified imports to distinguish between the two declarations in jsonv1 and jsonv2.

func DefaultOptionsV1() jsonopts.Options

Exporting this perpetuates bad JSON and bad behavior with jsonv2, and complicates the public options API. In my opinion, there shouldn't be a way to enable this behavior in jsonv2 unless it's coming from jsonv1 under the hood. Options like jsontext.AllowInvalidUTF8 should be moved to an internal package shared by jsonv1, jsonv2, and jsontext to hide them. That way, jsonv1 gets the improved performance of jsonv2, the jsonv1 and jsonv2 public API isn't cluttered by compatibility concerns, and jsonv1 users are motivated to upgrade to jsonv2 for better behavior and features.

@willfaught
Copy link
Contributor

func AllowDuplicateNames(v bool) Options // affects encode and decode

How would this affect encoding?

@willfaught
Copy link
Contributor

willfaught commented Feb 3, 2025

Why aren't https://github.com/go-json-experiment/json/blob/master/arshal*.go files spelled marshal*.go?


Perhaps GitHub IntelliSense is failing me, but I don't see that the Value methods that take Options are used in the implementation, so I assume they're meant for users only. jsonv1.RawMessage is changed to alias jsontext.Value, so they seem to be analogous. I've never used RawMessage myself, but its documentation says it's for delaying a JSON decoding, or precomputing a JSON encoding, in the context of being a Marshaler/Unmarshaler value. I don't see how the above-mentioned Value methods relate to those use cases; they seem to only be for encoding. When would we want to use them instead of using the Value with an Encoder or Decoder?

@willfaught
Copy link
Contributor

The names in encoding/json/jsontext stutter, and don't seem to conform to the stdlib pattern of using directories as namespaces, even three directories deep:

  • crypto/tls/fipsonly
  • crypto/x509/pkix
  • database/sql/driver
  • go/build/constraint
  • go/doc/comment
  • image/color/palette
  • net/http/cgi
  • net/http/cookiejar
  • net/http/fcgi
  • net/http/pprof
  • net/rpc/jsonrpc
  • text/template/parse

The only exceptions to this pattern I could find are HTTP-related, which were probably authored at around the same time:

  • net/http/httptest
  • net/http/httptrace
  • net/http/httputil

Yet even those names are the minority of HTTP sub-package names. No "http" prefixes here:

  • net/http/cgi
  • net/http/cookiejar
  • net/http/fcgi
  • net/http/pprof

I don't see why encoding/json/jsontext is idiomatic, or qualifies for an exception. What is the reasoning for using encoding/json/jsontext that would not also require encoding/json to be encoding/encodingjson, or encoding/json/jsontext to be encoding/encodingjson/encodingjsontext? What would justify it being an exception to the stdlib pattern that would not also apply to text/template or go/doc/comment? The one exception would seem to be net/rpc/jsonrpc (except for the parent name "rpc" being at the end instead of the beginning), but it's not an exception because it implements something actually called "JSON-RPC".

encoding/json/text doesn't stutter, it fits the stdlib pattern, and imports of it can be qualified to avoid conflicts with other identifiers.

@willfaught
Copy link
Contributor

It seems like jsontext belongs under encoding/json/v2. It seems strange for v1 to have both encoding and marshaling, but for v2 to have only marshaling. If encoding compatibility needs to be broken in the future, we would end up with encoding/json/jsontext/v2 and encoding/json/v3, and it would be unclear that they both go together. It seems better to version the encoding and marshaling together with encoding/json/v2/jsontext.

@willfaught
Copy link
Contributor

The package name "jsontext" doesn't seem right to me. JSON is text. It would be like naming a package "jpegbinary". Perhaps "syntax" or "jsonsyntax" would be better than "text" or "jsontext".

encoding/json/jsontext having the encoding code breaks the stdlib pattern of encoders being in an encoding (possibly versioned) child package. It seems wrong to reach for an encoding/foo package for marshaling, and an encoding/foo/bar package for encoding. If anything, to match the stdlib pattern, it seems like we should have encoding/json/v2 be the encoding code and encoding/json/marshaling be the marshaling code, but then the marshaling import path is the longer one, which is what 99.999999% of users will use, so that doesn't make sense either.

Why are we splitting the encoding and marshaling code into separate packages, again? What I mean is, where is the concrete evidence that justifies the split in terms of who asked for it, the scenarios they need it for, the min/avg/max space savings in those scenarios, how important those savings are in those scenarios, etc. @dsnet said in the GitHub discussion for this that his employer Tailscale needs it to avoid the slightly larger binary size that using reflection causes, but one employer saving a few megabytes of memory is alone hardly justification for the cost of splitting the encoding and marshaling code. Having all the code together is easy to use and fits the stdlib pattern, so it seems to me there needs to be a very compelling reason for the interests of the majority of the Go Community to be served by doing this split. As shown above, this design isn't all upside, it's striking a trade-off, and for us to judge whether it's the right trade-off, we need that concrete evidence. (Apologies if I missed it.)

Where is the line drawn for using reflection? Is reflect a poison pill? Is reflect code to be avoided as much as possible? It appears that net/http uses reflect; was that a mistake? Was it a mistake to not put all marshaling code into separate marshaling/* packages from the outset? If the Go Team really does go for this split, what is their guidance for best practices regarding designing APIs and package boundaries around the impact that reflection has on surrounding code? I'd never heard of wanting to avoid the size cost of reflection until the GitHub discussion for this (but perhaps that's just me).

@josharian
Copy link
Contributor

Love it.

I’ve painted myself into naming corners using the word Legacy. I’d suggest s/Legacy/V1/g. Shorter and more precise.

@dsnet
Copy link
Member Author

dsnet commented Feb 3, 2025

Hi @willfaught, I appreciate your great enthusiasm in providing feedback. I believe it be more productive to condense your thoughts down to the most significant ideas so that we don’t overwhelm the discussion. 20+ thoughts spread throughout 9 distinct posts is challenging for others to follow and engage with even they are worth discussing. We appreciate your thoughts, but not all thoughts are equally fruitful to discuss.

This particular proposal is paired with a working prototype, so some of the questions could be answered on your own by running some code in a playground. Suggestions about spelling errors or nuances of documentation are better filed at github.com/go-json-experiment/json. Some of the points you raised have already been addressed in the prior discussion (#63397). While ideas may have merit, engineering is about tradeoffs and so we sometimes still choose to go down a different path even when presented with valid counter-arguments.

For the sake of this proposal, we should focus on API that cannot be changed once this has been proposed and merged. I recommend choosing a small set of issues that you believe are the most significant and bring the most value. You’re welcome to re-raise a concern already discussed, but let’s aim to keep it a singular issue or two that you believe is most important.

In general, it’s most effective to keep a single comment to a single cohesive thought. GitHub supports emojis, which allows others to signal whether they agree (or disagree) with the idea. Multiple thoughts per comment confuses this reaction mechanism. For example, @prattmic’s comment on naming of nocase and strictcase was a concise and singular idea, making it possible for people to 👍 the comment signaling support for the suggestion. If someone has already made a similar argument to what you’re about to make, then it’s best to upvote the earlier argument rather than to reiterate the same thing.

@nemith
Copy link
Contributor

nemith commented Feb 14, 2025

From the example for Number I see this

   			if dec.PeekKind() == '0' {
   				*val = jsontext.Value(nil)
   			}

Which makes me even more think that constants are the way to go. Reading this as "peeking at the kind to see if it is zero" and then taking a couple of seconds to realize that 0 mean number is just too confusing for me. This is made worse that 0 (not '0' an invalid kind.

I still find the analogy to HTTP methods to be flawed as the use in spec and intentional expansion (new methods are allowed in HTTP are new tokens going to ever be in JSON?) to not be the same.

If the worry is abuse in use then just make the Kind completely opaque? (Edit: It is also unclear why Kind needs to be byte character at all. I am not against it and it feels clever but the HTTP argument goes away when they are represented by anything else)

I am also already missing json.Number (although it seems like it can be reimplemented pretty straightforwardly)

@dsnet
Copy link
Member Author

dsnet commented Feb 14, 2025

I just filed #71756 for adding Kind constants. Let's continue any further discussion there. Thanks.

@seankhliao
Copy link
Member

#71756 raises the prospect of another package. Can enough declarations move such that the MarshalerTo / UnmarshalerFrom interface methods take an interface instead of a concrete Encoder / Decoder?

@dsnet
Copy link
Member Author

dsnet commented Feb 14, 2025

I'm not sure I see the connection with splitting "jsontext" apart and whether MarshalerTo should use an encoder interface instead of a concrete *jsontext.Encoder. The primary issue there was regarding performance since the Write methods are hotly called (where the calling overhead is often more bytes than the JSON payload itself!). An interface is great at allowing for multiple implementations, but makes them all equally slower since every single method is now a virtual method call. A vast majority of interface implementations will probably be *jsontext.Encoder and so it seems like an unfortunate hit to performance for almost all users for some benefit in flexibility. Also, the compiler can no longer prove that arguments that go through a interface method do not escape, further hurting performance.

In the future, we could technically still support third-party encoder implementations by allowing users to register a custom implementation into the jsontext.Encoder implementation. When jsontext.Encoder.WriteToken is called, it checks (with a single nil function pointer check) if there is a custom implementation registered and calls that. This approach would keep the common-case usage with *jsontext.Encoder fast, while still supporting custom implementations (albeit with a somewhat strange API).

Thus far, I haven't been convinced of the need for third-party implementations to justify that the flexibility wins outweights the performance hit of interfaces.

@seankhliao
Copy link
Member

I was thinking less about third party implementations, and more about making the interfaces implementable without pulling in so many dependencies.

@dsnet
Copy link
Member Author

dsnet commented Feb 14, 2025

I see, so in relation to the earlier discussion that "time" should directly implement MarshalerTo? I can see a benefit to that. While it would be nice if "time" could properly implement JSON itself by only referencing a lightweight interface (with minimal dependencies), is that benefit worth the performance costs? I think probably not.

@huww98
Copy link
Contributor

huww98 commented Feb 16, 2025

I'm trying to implement UnmarshalerFrom for my struct. It seems that I cannot reach comparable performance without access to the DisableNamespace method. Can we somehow expose this from jsontext package? So that I can implement my own more efficient duplicate name check in UnmarshalFrom method. User can set AllowDuplicateNames, but that will affects all types.

I propose adding this Method to jsontext.Decoder:

// DisableDuplicateNameCheck disables the duplicate name check for the current decoding object for better performance.
// Call this just after reading '{' token.
// Returns whether the caller should run its own check, depending on whether [AllowDuplicateNames] is set.
DisableDuplicateNameCheck() bool

@willfaught

This comment has been minimized.

@willfaught
Copy link
Contributor

jsontext.SyntacticError.JSONPointer and json.SemanticError.{JSONPointer,JSONKind,JSONValue} seem to stutter.

A SemanticError describes an error bridging two different type systems (i.e., Go and JSON). The Go or JSON prefix in the field name is to be explicit about which type system some error context is stemming from. Given that SemanticError already uses the JSON prefix, we made SyntacticError be consistent in its naming of fields.

@dsnet It seems to me that the types of the fields in SemanticError make it clear which type system they pertain to:

Pointer jsontext.Pointer
Kind jsontext.Kind
Value jsontext.Value
Type reflect.Type

By the way, the type of JSONKind in the proposal is Kind instead of jsontext.Kind.

@willfaught
Copy link
Contributor

I believe it be more productive to condense your thoughts down to the most significant ideas so that we don’t overwhelm the discussion. 20+ thoughts spread throughout 9 distinct posts is challenging for others to follow and engage with even they are worth discussing. We appreciate your thoughts, but not all thoughts are equally fruitful to discuss.

For the sake of this proposal, we should focus on API that cannot be changed once this has been proposed and merged. I recommend choosing a small set of issues that you believe are the most significant and bring the most value. You’re welcome to re-raise a concern already discussed, but let’s aim to keep it a singular issue or two that you believe is most important.

@dsnet It strikes me as improper for you to moderate, shape, or discourage the discussion of your own proposal because you have a conflict of interest. For the proposal process to be perceived as fair, productive, and useful, it should avoid the appearance of impropriety. Frankly, I was so dismayed by your behavior that I walked away from this discussion a couple weeks ago, and I regret participating in it. I don't plan to stay involved past wrapping up this comment.

Some of the points you raised have already been addressed in the prior discussion (#63397).

The prior GitHub Discussion wasn't part of the proposal process. That was for you to gather early feedback. This GitHub Issue is for us, the Go Community, to vet the final proposal, and part of that vetting is pointing out issues that were raised before that haven't been properly addressed in some peoples' estimation. Many people here may have never read that GitHub Discussion, so listing those issues here is a service to them, just as listing the changes to the proposal since the Discussion is a service to those who had.


(The rest of this comment was drafted two weeks ago.)

In general, it’s most effective to keep a single comment to a single cohesive thought. GitHub supports emojis, which allows others to signal whether they agree (or disagree) with the idea. Multiple thoughts per comment confuses this reaction mechanism. For example, #71497 (comment) on naming of nocase and strictcase was a concise and singular idea, making it possible for people to 👍 the comment signaling support for the suggestion. If someone has already made a similar argument to what you’re about to make, then it’s best to upvote the earlier argument rather than to reiterate the same thing.

For the most part, I split my comments to facilitate voting on the suggestions/ideas, but it looks like there were a couple comments that mistakenly grouped stuff together they shouldn't have. The questions and corrections were batched together because they're not meant to be voted on. (I note that you still haven't answered the questions.) Some things were posted as they came up while I was drafting other comments. The comments weren't all written or submitted at once.

Perhaps I should just put every separate thought in its own comment. That would be simpler, but I hate the thought of the noise.

This particular proposal is paired with a working prototype, so some of the questions could be answered on your own by running some code in a playground.

I'm not sure what you're referring to. Can you be specific? If I could have knowingly run code to answer a question, I would have.

Suggestions about spelling errors or nuances of documentation are better filed at github.com/go-json-experiment/json.

The proposal says that the prototype will be used, presumably as-is. For all I know, issues filed there now will never be addressed. This seems to be the appropriate place to report them now.

Some of the points you raised have already been addressed in the prior discussion (#63397). While ideas may have merit, engineering is about tradeoffs and so we sometimes still choose to go down a different path even when presented with valid counter-arguments.

Can you identify which specific points you're referring to, and link to or quote how you addressed them, instead of painting all the points with the same brush? For all we know, you may be mistaken. The community needs to be able to judge for themselves. Unfortunately, your assurances that you've struck the right design and engineering trade-offs aren't convincing in and of themselves.

@timbray
Copy link

timbray commented Feb 17, 2025

It's called "jsontext" because it literally handles "JSON text", which is specifically called out as a term in RFC 8259, section 1.2.

This was my response:

I haven't read the JSON RFCs, and I've only skimmed 8259 and searched for uses of "JSON text" in it, but it seems to me

Speaking as the editor of 8259, although most credit has to go to Doug Crockford who wrote the original RFC4627 many years ago… the spec says what it says, not what Doug or I thought we meant, but FWIW I think that “JSON text” refers to the bits on the wire or the bytes on the disk, which earn that name by conforming to the grammar productions in the RFC.

Given that, "jsontext" seems like a perfectly appropriate name for the lowest-level API used for addressing those bytes & bytes.

@doggedOwl
Copy link

doggedOwl commented Feb 17, 2025

Many people here may have never read that GitHub Discussion, so listing those issues here is a service to them,

@willfaught the discussion is there for anyone interested to read. There is no need to replicate everything here, either wise there is no need for a discussion before a proposal in the first place.
And I agree too that your initial flood of comments many of them just reiterating the same points you or others had raised in the discussion is not helpful. The fact that the answers where not to your satisfaction does not mean that you need to bring them again and again when the discussion in general reached a very satisfactory equilibrium and has been active for almost two years now.

@burdiyan
Copy link

I haven’t followed the full conversation leading up to this proposal, so apologies if I’m bringing up something already discussed.

I really like this proposal! But I wonder—could this be an opportunity to introduce a more generalized approach to marshaling/unmarshaling data in Go? Something like Rust’s Serde (not a big Rust fan myself, but having a standard serialization framework is pretty nice).

For example, there could be a generic way to (un)marshal data, similar to what’s outlined here, taking an Encoder + options, but making the Encoder/Decoder more general-purpose, such that various encoding formats (JSON, CBOR, etc.) could implement their own. That way, different formats could share the same reflection machine, walking of the structs, etc.

On top of that, maybe encoders could use a well-known registry interface, so custom encodings for types wouldn’t require modifying those types. That’d make it easier to define custom codecs (e.g., for time.Time) without conflicts across packages. A program could set up its own codec early on and use it consistently for marshaling/unmarshaling.

@mvdan
Copy link
Member

mvdan commented Feb 17, 2025

@burdiyan this was briefly discussed in #63397 (comment), but it didn't really go anywhere - primarily as it's not clear if such an approach would work in Go, or what it would look like. It seems to me like it would need to be a separate proposal, as it would affect more than just JSON if accepted.

And ideally we don't hold up json/v2 for another year or two while we figure out if such an approach is viable.

@willfaught
Copy link
Contributor

willfaught commented Feb 18, 2025

@doggedOwl I'm going to reply because you bring up points I forgot to make in my last comment.

@willfaught the discussion is there for anyone interested to read. There is no need to replicate everything here, either wise there is no need for a discussion before a proposal in the first place.

That isn't compatible with the Go proposal process. Preliminary, informal discussions elsewhere on GitHub, Reddit, Slack, Discord, Twitter, email, etc. don't count, and pose an undue burden on people trying to participate as commenters in the proposal process.

And I agree too that your initial flood of comments many of them just reiterating the same points you or others had raised in the discussion is not helpful. The fact that the answers where not to your satisfaction does not mean that you need to bring them again and again when the discussion in general reached a very satisfactory equilibrium and has been active for almost two years now.

To my recollection, every point I raised here except for one (the name of jsontext) was new. I'd never brought them up before, and neither had others. That's why I asked @dsnet to cite where he addressed them previously: because I didn't know what he was talking about. What he said was (I assume unintentionally) misleading, and many readers won't go back and check the old GitHub discussion for themselves, which is why the first quotation above is a bad idea.

@josharian
Copy link
Contributor

[...] because he can't. He misled you [...]

This is a serious accusation. I have known Joe personally and professionally for many years. He is serious, impeccably honest, and dedicated. (And, I will note, is doing this work as a service to the community, not as his job.)

Please rethink your tone.

@willfaught
Copy link
Contributor

willfaught commented Feb 18, 2025

@josharian What I meant was that what he said was misleading. "To mislead" can be done with malicious intent or unintentionally. I suggest you remember that part of the Go code of conduct is to be charitable. I've never questioned his motives. I've thanked @dsnet more than once for his hard work on this project, and I would have thanked him once more in my planned last feedback comment here, but I never got there. I apologize if my phrasing implied malicious intent.

@ianlancetaylor
Copy link
Member

This conversation has unfortunately gotten heated. I think everybody needs to step back and focus on any remaining technical details, not on discussions of what was said before and when, not on discussions for how to approach this proposal, not on requests for citations. Thanks.

@andig
Copy link
Contributor

andig commented Mar 5, 2025

How far does this proposal address cleaning up invalid json tags in the standard library?
One notable example is oauth2.Token with it's time struct marked as

Expiry time.Time `json:"expiry,omitempty"`

@dsnet
Copy link
Member Author

dsnet commented Mar 5, 2025

@andig, perhaps that's better addressed by #51261?

For existing usages, it seems these should be fixed by having such cases migrate to omitzero rather than muddle the meaning of omitempty.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
LibraryProposal Issues describing a requested change to the Go standard library or x/ libraries, but not to a tool Proposal v2 An incompatible library change
Projects
None yet
Development

No branches or pull requests