ppKrauss changed the title from "Abandon use of rows, all datasets will be expanded by demand, and any other JSON can be added" to "Abandon use of SQL-rows representation, expand datasets to rows by demand" on Nov 20, 2017.
(This issue was for v0.1.0, and the design changed from v0.2+, so use the v0.1 benchmarking as reference.)
The benchmark (see the Wiki) shows that dataset expansion by the jsonb_array_elements() function is about as fast as storing per-row JSONs.
More speed is only needed when there are a lot of rows (more than ~9000)... When a really fast SELECT is needed, the best option (~2 to 5 times faster, not that much) is a table or MATERIALIZED VIEW, and an indexed one is faster still.
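For illustration, a minimal sketch of such an on-demand expansion, assuming dataset.big stores each dataset packed in a JSONb-array column (the column names id and j are guesses, not the project's exact schema):

```sql
-- Expand one packed dataset back into ordinary rows on demand.
-- Assumed layout: dataset.big(id int, j jsonb), where j is a JSONb array.
SELECT d.id, t.row_j
FROM dataset.big AS d,
     LATERAL jsonb_array_elements(d.j) AS t(row_j)
WHERE d.id = 1;
```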
On the other hand, there is some demand to add "any other free JSON" dataset into dataset.big. Conclusion: the best is "1 dataset per row" at dataset.big, a free-JSONb dataset with "some JSON-schema" controlled by dataset.meta. We named it "JSON TYPE DEFINITION" (JTD): a set of labels indicating standard structures. Only some JTDs are tabular; others need a rule to join/split into rows... The best is to keep everything as "1 dataset per row".
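A minimal sketch of this "1 dataset per row" layout; everything beyond the dataset.big and dataset.meta names (columns, comments) is an assumption for illustration, not the project's real schema:

```sql
CREATE SCHEMA IF NOT EXISTS dataset;

-- One row per dataset: metadata plus a JTD label telling how to read it.
CREATE TABLE dataset.meta (
  id   serial PRIMARY KEY,
  name text NOT NULL UNIQUE,
  jtd  text NOT NULL,   -- JTD label, e.g. a tabular one vs. a free-JSON one
  info jsonb            -- "some JSON-schema" and other free metadata
);

-- One row per dataset: the whole content as a single free JSONb value.
CREATE TABLE dataset.big (
  id int PRIMARY KEY REFERENCES dataset.meta(id),
  j  jsonb NOT NULL
);
```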
Reference Benchmark of v0.1.0
15 datasets, loaded as "many rows per dataset". In terms of disk usage, the whole dataset schema sums to a table_size of 6080 kB. If all datasets of dataset.big were translated to usual SQL tables, the cost would be lower (!), ~75% or less. Total disk-usage reduction: from 6080 kB to 72 kB, ~84 times smaller (6080 ≈ 72 × 84.4). No disk cost for datasets!
Comparing JSON and JSONb
A test with fewer rows (before adding "datasets:world_cities"), but with good results for dataset.big total usage: there is no expressive advantage in using JSON, with only ~2% of gain. Better to stay with JSONb, a "first-class citizen".
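One way to reproduce this kind of comparison is with PostgreSQL's size functions; the JSON twin table name here is hypothetical:

```sql
-- Compare total disk usage of the JSONb table and a JSON copy of it.
-- dataset.big_json is a hypothetical twin table using the json type.
SELECT pg_size_pretty(pg_total_relation_size('dataset.big'))      AS jsonb_usage,
       pg_size_pretty(pg_total_relation_size('dataset.big_json')) AS json_usage;
```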
Disk-usage reduction when all rows in one JSONb-array
Checking table_usage with "datasets-br:br_city_codes", the conclusion is that "all rows in one" is ~10 times better!
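The packing step behind that reduction can be sketched like this, assuming a hypothetical per-row staging table dataset.big_perrow(dataset_id, row_n, row_j):

```sql
-- Aggregate the per-row JSONb values of each dataset into one array,
-- preserving row order; staging table and column names are assumptions.
INSERT INTO dataset.big (id, j)
SELECT p.dataset_id, jsonb_agg(p.row_j ORDER BY p.row_n)
FROM dataset.big_perrow AS p
GROUP BY p.dataset_id;
```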
Checking performance
No loss of performance according to PostgreSQL's EXPLAIN ANALYZE: as shown in the Wiki, the jsonb_array_elements() function is very fast.
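A quick way to check it yourself (illustrative query, same assumed columns as in the sketches above):

```sql
-- Timing of on-demand expansion of one dataset into rows.
EXPLAIN ANALYZE
SELECT t.row_j
FROM dataset.big AS d,
     LATERAL jsonb_array_elements(d.j) AS t(row_j)
WHERE d.id = 1;
```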