Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Summing empty Pandas DataFrame #1711

Closed
akmalsoliev opened this issue Jan 3, 2025 · 2 comments · Fixed by #1715
Closed

[Bug]: Summing empty Pandas DataFrame #1711

akmalsoliev opened this issue Jan 3, 2025 · 2 comments · Fixed by #1715

Comments

@akmalsoliev
Copy link
Contributor

Describe the bug

ValueError when attempting to sum an empty Pandas DataFrame

Steps or code to reproduce the bug

import narwhals as nw
import pandas as pd
from narwhals import narwhalify

df = pd.DataFrame(
    {
        "name": ["John", "Doe", "Jane"],
    },
)


@narwhalify
def bla(df):
    df = df.filter(nw.col("name") == "Boo")
    df.with_columns(nw.col("name").sum())


bla(df)

Expected results

An empty DataFrame

Actual results

ValueError: Length of values (1) does not match length of index (0)

Please run narwhals.show_version() and enter the output below.

System:
    python: 3.12.8 (main, Dec  6 2024, 19:42:06) [Clang 18.1.8 ]
executable: ...
   machine: macOS-15.2-arm64-arm-64bit

Python dependencies:
     narwhals: 1.20.1
       pandas: 2.2.3
       polars: 1.18.0
         cudf:
        modin:
      pyarrow:
        numpy: 2.2.1

Relevant log output

No response

@AlessandroMiola
Copy link
Contributor

AlessandroMiola commented Jan 3, 2025

Partly unrelated, but, should we (also) raise here to prevent sum to be called on str as Polars does? (also because pandas would "mimic" Polars' .str.join in these cases)

import narwhals as nw
import polars as pl
import pandas as pd

df_pd = pd.DataFrame({"name": ["John", "Doe", "Jane"]})
df_pl = pl.DataFrame({"name": ["John", "Doe", "Jane"]})

def func(df):
    df = nw.from_native(df)
    return df.select(nw.col("name").sum()).to_native()

func(df_pd)
          name
0  JohnDoeJane

func(df_pl)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in func
  File "/home/ales/projects/narwhals/narwhals/narwhals/dataframe.py", line 1791, in select
    return super().select(*exprs, **named_exprs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ales/projects/narwhals/narwhals/narwhals/dataframe.py", line 133, in select
    self._compliant_frame.select(*exprs, **named_exprs),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ales/projects/narwhals/narwhals/narwhals/_polars/dataframe.py", line 109, in func
    getattr(self._native_frame, attr)(*args, **kwargs)
  File "/home/ales/projects/narwhals/narwhals/.venv/lib/python3.12/site-packages/polars/dataframe/frame.py", line 9322, in select
    return self.lazy().select(*exprs, **named_exprs).collect(_eager=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ales/projects/narwhals/narwhals/.venv/lib/python3.12/site-packages/polars/lazyframe/frame.py", line 2031, in collect
    return wrap_df(ldf.collect(callback))
                   ^^^^^^^^^^^^^^^^^^^^^
polars.exceptions.InvalidOperationError: `sum` operation not supported for dtype `str`

Hint: you may mean to call `str.concat` or `list.join`

Some time ago, I wanted to take the chance to standardize the behaviour of reduction methods being called on str so as to mimic Polars and as we did for median and as we're also doing now for is_nan. I stopped though because of pola-rs/polars#20304 (previously spotted at #1212), namely because the slight inconsistencies reported in the issue might require us to intercept Polars on pl.mean (if we want to stay fully consistent), which I don't think is desirable in this case, given their relatively low impact. Hope it makes sense 😇

@camriddell
Copy link
Contributor

Spoke with @skritsotalakis today and we decided that I'll take this one!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants