First and foremost, thanks to sqlframe I discovered that we were computing skewness incorrectly in narwhals with the DuckDB backend!
It appears that PySpark computes the sample skewness while DuckDB computes the population skewness. The two differ by a correction factor of
$$\frac{\sqrt{n(n-1)}}{n-2}$$
Let me know if this is out of scope (as it would only be needed to match pyspark behavior).
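To make the correction factor concrete, here is a minimal sketch in plain Python, with no Spark or DuckDB involved. The function names are hypothetical and chosen only for illustration. It assumes the unadjusted value is the moment-based coefficient m3 / m2^1.5 and that the adjusted value is that coefficient multiplied by the factor above. It does not assert which engine returns which value.

```python
import math


def moment_skewness(values):
    """Unadjusted skewness: third central moment over the 1.5 power of the second."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


def correction_factor(n):
    """The factor sqrt(n * (n - 1)) / (n - 2); only defined for n > 2."""
    return math.sqrt(n * (n - 1)) / (n - 2)


def adjusted_skewness(values):
    """Unadjusted skewness scaled by the correction factor."""
    return correction_factor(len(values)) * moment_skewness(values)


data = [1.0, 2.0, 3.0, 10.0]
print("unadjusted:", moment_skewness(data))
print("adjusted:  ", adjusted_skewness(data))
print("factor:    ", correction_factor(len(data)))
```

The factor approaches 1 as n grows, so the two values only diverge noticeably on small inputs. The sketch needs more than two values with non-zero variance, otherwise it divides by zero.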
In code/numbers:
Spark:
DuckDB:
I just opened a PR to fix this in narwhals for the native DuckDB backend, in case it is of interest.