Strips HTML from its input and outputs plain text. The input is processed as a stream, so the whole document is never loaded into memory.
Easy-to-use Writer interface:
io.Copy(&htmlstrip.Writer{W: os.Stdout}, os.Stdin)
All it does is strip HTML into plain text.
It should never use excessive memory, as it does not buffer the whole document.
Script, style and head tags are removed entirely, as they are not part of the page's text.
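To illustrate how stripping can work on a stream with constant-size state, here is a minimal sketch. It is not htmlstrip's actual implementation: the names stripWriter, stripChunks and maxName are hypothetical, and the sketch ignores comments, entities and a literal '<' inside scripts. It only assumes the behaviour described above: tags are dropped, and the contents of script, style and head elements are skipped.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// maxName caps the tag-name buffer so the writer's state stays constant-size.
const maxName = 16

// stripWriter is a hypothetical, simplified stand-in for a streaming HTML
// stripper. It keeps only a few fields of state between Write calls.
type stripWriter struct {
	w        io.Writer // destination for plain text
	inTag    bool      // currently between '<' and '>'
	nameDone bool      // the tag name has ended (whitespace was seen)
	name     []byte    // lowercased tag name, capped at maxName bytes
	skip     string    // element whose content is being dropped, or ""
}

// Write strips tags from p and forwards the remaining text to s.w.
func (s *stripWriter) Write(p []byte) (int, error) {
	out := make([]byte, 0, len(p))
	for _, b := range p {
		switch {
		case s.inTag && b == '>':
			s.endTag()
		case s.inTag:
			s.addNameByte(b)
		case b == '<':
			s.inTag, s.nameDone, s.name = true, false, s.name[:0]
		case s.skip == "":
			out = append(out, b)
		}
	}
	if len(out) > 0 {
		if _, err := s.w.Write(out); err != nil {
			return 0, err
		}
	}
	return len(p), nil
}

// addNameByte records the tag name up to the first whitespace byte.
func (s *stripWriter) addNameByte(b byte) {
	if s.nameDone {
		return
	}
	if b == ' ' || b == '\t' || b == '\n' || b == '\r' {
		s.nameDone = true
		return
	}
	if len(s.name) < maxName {
		if 'A' <= b && b <= 'Z' {
			b += 'a' - 'A'
		}
		s.name = append(s.name, b)
	}
}

// endTag closes the current tag and starts or stops skipping content.
func (s *stripWriter) endTag() {
	s.inTag = false
	name := string(s.name)
	switch {
	case s.skip == "" && (name == "script" || name == "style" || name == "head"):
		s.skip = name
	case s.skip != "" && name == "/"+s.skip:
		s.skip = ""
	}
}

// stripChunks feeds each chunk through a separate Write call, as a stream would.
func stripChunks(chunks ...string) string {
	var buf bytes.Buffer
	w := &stripWriter{w: &buf}
	for _, c := range chunks {
		w.Write([]byte(c)) // bytes.Buffer writes do not fail
	}
	return buf.String()
}

func main() {
	// The tag and the style element are split across chunk boundaries on purpose.
	fmt.Println(stripChunks("<p cl", "ass=\"x\">Hello</p><style>b{}</st", "yle> world"))
}
```

Because the state carried between Write calls is a handful of fields, memory use does not grow with document size. A writer of this shape can be passed to io.Copy in the same way as the Writer shown earlier.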
The provided command strips HTML from standard input or the specified files and writes plain text to standard output.
go install
It could be used as an extremely basic, non-interactive text browser:
curl -s -S | htmlstrip | less