Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: encoding/json/jsontext: add constants for each Kind #71756

Open
dsnet opened this issue Feb 14, 2025 · 24 comments
Open

proposal: encoding/json/jsontext: add constants for each Kind #71756

dsnet opened this issue Feb 14, 2025 · 24 comments
Labels
LibraryProposal Issues describing a requested change to the Go standard library or x/ libraries, but not to a tool Proposal
Milestone

Comments

@dsnet
Copy link
Member

dsnet commented Feb 14, 2025

Proposal Details

This is a sub-issue of the "encoding/json/v2" proposal (#71497).

Here, we discuss whether to add constants for each Kind.
This builds on top of the v2 API and does not block the acceptance of v2.

Specifically, this issue would result in the following changes to the API:

  package jsontext

  type Kind byte

  const (
+ 	KindNull        Kind = 'n'
+ 	KindFalse       Kind = 'f'
+ 	KindTrue        Kind = 't'
+ 	KindString      Kind = '"'
+ 	KindNumber      Kind = '0'
+ 	KindBeginObject Kind = '{'
+ 	KindEndObject   Kind = '}'
+ 	KindBeginArray  Kind = '['
+ 	KindEndArray    Kind = ']'
  )

The choice of using a Kind prefix matches the http.MethodGet-like constants. We need at least a prefix or a suffix since Null, False, True, etc. conflict with the existing Token declarations.

Analysis of this change:

  • ✔️ Always using the constants protects against accidental typos with literals (e.g., 0 vs '0'). This problem could be alleviated by govet or staticcheck.
  • ❌ The literals are already sufficiently readable that many authors will avoid referencing the constants. For example, 75% of code avoid http.MethodGet literal and instead just use "GET". It is almost certain that the world will be inconsistent about usage. In the case of "jsontext", it is notably more effort to reference jsontext.KindNull rather than use the 'n' literal.

Alternatives

What about making Kind an opaque type?

We could, but there was a performance hit since you are no longer performing a comparison against a constant or
constant literal. For example, let's suppose we had:

type Kind struct { k kind }

var (
	KindNull  Kind = Kind{'n'}
	KindFalse Kind = Kind{'f'}
	KindTrue  Kind = Kind{'t'}
	...
)

Opaque types cannot be declared as constants. Referencing a global variable requires referencing the memory location for that variable, which is slower.

What about a separate package?

It should be noted that this could also just be a third-party package:

package jsonkind // could be even shorter as jsonk or even jk

const (
	Null        jsontext.Kind = 'n'
	False       jsontext.Kind = 'f'
	True        jsontext.Kind = 't'
	...
)

which is shorter to reference as jsonkind.Null instead of jsontext.KindNull.

@dsnet dsnet added the Proposal label Feb 14, 2025
@gopherbot gopherbot added this to the Proposal milestone Feb 14, 2025
@dsnet dsnet added the LibraryProposal Issues describing a requested change to the Go standard library or x/ libraries, but not to a tool label Feb 14, 2025
@mateusz834
Copy link
Member

I am in favour of this change (with the Kind prefix), because these kinds of constants are autocompleted by gopls, thus it is easier to work with them.

@mateusz834
Copy link
Member

Personally, i would change the names to:

- KindBeginObject Kind = '{'
- KindEndObject   Kind = '}'
- KindBeginArray  Kind = '['
- KindEndArray    Kind = ']'
+ KindObjectBegin Kind = '{'
+ KindObjectEnd   Kind = '}'
+ KindArrayBegin  Kind = '['
+ KindArrayEnd    Kind = ']'

@dsnet
Copy link
Member Author

dsnet commented Feb 14, 2025

RFC 8259, section 2 uses the terms begin-array, begin-object, end-array, and end-object as the formal grammatical names for these constructs. We aim to use proper JSON terminology if relevant.

As an aside, the term "kind" is unique to the Go "jsontext" package, but we need a way to describe the kind of a token. The term "type" would be incorrect since that's used by the RFC to describe null, strings, numbers, booleans, objects, and arrays (i.e., all complete values).

@nemith
Copy link
Contributor

nemith commented Feb 14, 2025

For example, 75% of code avoid http.MethodGet literal and instead just use "GET". It is almost certain that the world will be inconsistent about usage. In the case of "jsontext", it is notably more effort to reference jsontext.KindNull rather than use the 'n' literal.

I don't think this is a fair comparison and trying to abstract lessons from that doesn't feel right for this situation.

  • HTTP RFC doesn't actually limit the methods being used. So the API must be able to support additional methods so string MUST be the underlying type. There is no great way in making it opaque. Comparison with the jsontext.Token where the value is from a limited set and shouldn't ever support additional values. You could make the types anything besides characters and things will works just fine (iota with int8?)
  • HTTP Methods in their non-const form are very readable and, I dare say, more readable than the constant versions. "GET", "POST" , "DELETE" are whole words, capitalized and are exactly map to what is on the wire. Compare that to jsontext.Token and the only 4([,],{,}) out of the 7 map directly. The other are contrived replacements which require some careful document reading to figure out. "GET" means HTTP Get is different than '0' means number.

Also on the "effort" front are we talking just about number of character to type? Code is read more than it's is written. 'n' may be low effort to write than jsontext.KindNull but much higher effort to read and comprehend which, in my opinion, is much more important.

It should be noted that this could also just be a third-party package.

That would increase the representations rather than shrink them which seems to be the exact argument against making constants?

As for a complete separate stdlib package jsonkind I am not in huge favor of packages for pure namespacing and seems against the rest of packages in the stdlib.

@gabyhelp
Copy link

Related Issues

(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)

@neild
Copy link
Contributor

neild commented Feb 14, 2025

If we want opaque(-ish) constants, we could define:

type Kind byte

const (
  kindNull = Kind(' ')
  // ...
)

const (
  KindNull = kindNull
  // ...
)

...although if we do that, I think the values should be integers (iota+1, etc.) rather than characters.

Either way, I think we should define these constants:

  • I find the argument for better code completion compelling.
  • The character values are cute, but not obvious. KindNull is obvious, 'n' is not.
  • While Go does not have support for enums, we have a convention for how to define one. I don't know if any linters will detect an enum defined in the conventional form as a set of values in a single const ( ) block and warn if a switch statement doesn't exhaustively cover the options, but it's the sort of thing you could do. Using character values would subvert this.
  • ...and even in the absence of such a linter, I can hit "go to definition" on KindNull and have my editor take me to an exhaustive list of cases that need covering.

I also agree with @nemith's comments regarding http.MethodGet. (I started writing something, but they said it better than I was going to.)

@mateusz834
Copy link
Member

Yeah, I think that if we go with constants it should be using iota.

@nemith
Copy link
Contributor

nemith commented Feb 14, 2025

While Go does not have support for enums, we have a convention for how to define one. I don't know if any linters will detect an enum defined in the conventional form as a set of values in a single const ( ) block and warn if a switch statement doesn't exhaustively cover the options, but it's the sort of thing you could do. Using character values would subvert this.

https://golangci-lint.run/usage/linters/#exhaustive

Now that being said the use of exhaustive switch statement outside of the stdlib is probably limited. Most cases you are going to look for a specific token and if not move on.

@dsnet
Copy link
Member Author

dsnet commented Feb 14, 2025

...although if we do that, I think the values should be integers (iota+1, etc.) rather than characters.

We tried that in the past, and renumbering the constants was also a performance hit since it involved many unpredictable branches. We could use a 256B lookup table, but I'm hesitant about using LUTs to solve re-mapping as it's adding more pressure on the data cache. The current approach is fast since it's just grabbing the first byte of the token (and only needs to branch for numbers).

@jimmyfrasche
Copy link
Member

If it were just "{}[] I'd be fine without constants since there's a 1:1 mapping with the actual token. The other kinds may have sensible one character names assigned but they are not obvious. That is not readable. Constants are needed here for clarity.

@nemith
Copy link
Contributor

nemith commented Feb 14, 2025

Also for the record I didn't much care that there was no Kind constants and agreed with the initial assessment that they were not needed. That was until I went to read this example: https://pkg.go.dev/github.com/go-json-experiment/json#example-WithUnmarshalers-RawNumber

				if dec.PeekKind() == '0' {
					*val = jsontext.Value(nil)
				}

Iit look a significant amount of time to realize that '0' meant a number and that was with the context of the example saying that. Without proper context we better hope that every instance of matching a token to number, string, null, or boolean is well commented.

With constants it's obvious/self-commenting.

@ianlancetaylor ianlancetaylor moved this to Incoming in Proposals Feb 14, 2025
@ChrisHines
Copy link
Contributor

IMHO the readability argument wins the day and we should add constants for these.

@willfaught
Copy link
Contributor

We tried that in the past, and renumbering the constants was also a performance hit since it involved many unpredictable branches.

@dsnet I don't follow. What constants are you renumbering, and why? What perf is involved? I'm imagining the code doing something like this:

func (e *Encoder) PeekKind() Kind {
  switch e.token {
  case "{": return KindObjectBegin
...

I don't see how there's a perf difference there when KindObjectBegin is a custom byte value like '{') vs. is an automatic number from iota like 3.

@puellanivis
Copy link

Do you see a performance difference between:

func (e *Encoder) PeekKind() Kind {
  return e.token[0]
}

and:

func (e *Encoder) PeekKind() Kind {
  switch e.token {
  case "{": return 1

@godcong
Copy link

godcong commented Feb 18, 2025

Don't you need to add an exception Kind?
What if an incorrect json is passed in to get a Kind that is not in the currently defined scope?

@willfaught
Copy link
Contributor

func (e *Encoder) PeekKind() Kind {
 return e.token[0]
}

token[0] doesn't work for numbers, null, true, false, etc.

@puellanivis
Copy link

func (e *Encoder) PeekKind() Kind {
 return e.token[0]
}

token[0] doesn't work for numbers, null, true, false, etc.

What do you mean? It absolutely does work for null, true, and false. That was the whole point of using 'n', 't', 'f'.

@willfaught
Copy link
Contributor

What do you mean? It absolutely does work for null, true, and false. That was the whole point of using 'n', 't', 'f'.

I see what you mean about those. Still, it doesn't work for numbers. In any case, the code just becomes

func (e *Encoder) PeekKind() Kind {
  switch e.token[0] {
  case "{": return KindObjectBegin
  case "}": ...
  ...
  default: return KindNumber
}

I don't see how using iota for Kind* values makes that perform worse.

@puellanivis
Copy link

puellanivis commented Feb 20, 2025

func (e *Encoder) PeekCharKind() Kind {
	tok := Kind(e.token[0])
	if tok >= '1' && tok <= '9' {
		return '0'
	}
	return tok // bug, I didn’t consider '-'
}

func (e *Encoder) PeekIotaKind() IotaKind {
	switch e.token[0] {
	case 'n': return IotaKindNull
	case 'f': return IotaKindFalse
...
	default: return IotaKindNumber
	}
}

Running off a loop of tokens, one of each possible starting character:

$ go test -bench=. -count=10
BenchmarkCharKind-32            1000000000               0.2803 ns/op
BenchmarkCharKind-32            1000000000               0.2816 ns/op
BenchmarkCharKind-32            1000000000               0.2820 ns/op
BenchmarkCharKind-32            1000000000               0.2816 ns/op
BenchmarkCharKind-32            1000000000               0.2802 ns/op
BenchmarkCharKind-32            1000000000               0.2803 ns/op
BenchmarkCharKind-32            1000000000               0.2806 ns/op
BenchmarkCharKind-32            1000000000               0.2821 ns/op
BenchmarkCharKind-32            1000000000               0.2812 ns/op
BenchmarkCharKind-32            1000000000               0.2808 ns/op
BenchmarkIotaKind-32            1000000000               0.5613 ns/op
BenchmarkIotaKind-32            1000000000               0.5620 ns/op
BenchmarkIotaKind-32            1000000000               0.5599 ns/op
BenchmarkIotaKind-32            1000000000               0.5598 ns/op
BenchmarkIotaKind-32            1000000000               0.5603 ns/op
BenchmarkIotaKind-32            1000000000               0.5608 ns/op
BenchmarkIotaKind-32            1000000000               0.5627 ns/op
BenchmarkIotaKind-32            1000000000               0.5588 ns/op
BenchmarkIotaKind-32            1000000000               0.5616 ns/op
BenchmarkIotaKind-32            1000000000               0.5575 ns/op

@extemporalgenome
Copy link
Contributor

Since json/v2 would be a net-new stdlib addition (thus any decision we make now can't break anyone yet), and since vet is automatically run via go test, it seems that we can avoid the http.MethodGet adoption issue by launching with vet enforcement.

@extemporalgenome
Copy link
Contributor

extemporalgenome commented Feb 21, 2025

Would '-' be a better byte value to represent numbers? It's punctuation, like many of the other constants, and it has a unique role among numeric leading bytes, i.e. there's no + or any other modifier that's permitted in that position, whereas 0 has somewhat less distinctiveness compared to 1-9. Aside from numbers, all other kinds have a unique leading byte.

@jimmyfrasche
Copy link
Member

If the constants are named it doesn't especially matter what the values assigned to each name are.

@puellanivis
Copy link

Since json/v2 would be a net-new stdlib addition (thus any decision we make now can't break anyone yet), and since vet is automatically run via go test, it seems that we can avoid the http.MethodGet adoption issue by launching with vet enforcement.

Honestly, I think we’ll avoid the http.MethodGet in general, because json.KindNull != "null". I presume a chief cause of the problem with the http.Method* names is that they’re shorter and more readable as a bare string, and that value is repeated essentially verbatim in the constants’ names.

@extemporalgenome
Copy link
Contributor

Nonetheless, we can add vet coverage for these, and not rely on adoption via path-of-least-resistance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
LibraryProposal Issues describing a requested change to the Go standard library or x/ libraries, but not to a tool Proposal
Projects
Status: Incoming
Development

No branches or pull requests