Specification of Datafuse SQL #1492

Closed
leiysky opened this issue Aug 17, 2021 · 9 comments
Labels
C-refactoring Category: refactor

Comments

@leiysky
Contributor

leiysky commented Aug 17, 2021

We need a standard and stable SQL specification to guide Datafuse developers and users.

This specification should cover:

  • SQL grammar, e.g. in BNF
  • Type system, including storage scheme, codec scheme, conversion rules, etc.
  • Functions, including scalar functions and aggregate functions

Comments and advice are welcome.

@leiysky leiysky added C-refactoring Category: refactor SQL labels Aug 17, 2021
@leiysky
Contributor Author

leiysky commented Aug 17, 2021

IMHO, it's a good choice to follow the SQL dialect of a mainstream database (e.g. Postgres, MySQL), so users can get up to speed with Datafuse more easily.

PostgreSQL is widely used in data warehousing thanks to its rich OLAP features and strict type system. AWS Redshift, Pivotal Greenplum, and AliCloud Hologres are all based on PostgreSQL.

Therefore, I suggest that we build the Datafuse SQL specification on PostgreSQL.

@leiysky
Contributor Author

leiysky commented Aug 17, 2021

@BohuTANG
Member

> IMHO, it's a good choice to follow the SQL dialect of a mainstream database (e.g. Postgres, MySQL), so users can get up to speed with Datafuse more easily.
>
> PostgreSQL is widely used in data warehousing thanks to its rich OLAP features and strict type system. AWS Redshift, Pivotal Greenplum, and AliCloud Hologres are all based on PostgreSQL.
>
> Therefore, I suggest that we build the Datafuse SQL specification on PostgreSQL.

Thanks for sharing, it's OK with me.
In OLAP, I think the syntax (Postgres, MySQL and others) probably doesn't differ much; the differences are mostly in function names.
And we can do the transformation in the server handlers before the parser, such as MySQLHandler or ClickHouseHandler.
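
To illustrate the handler-level transformation, here is a minimal sketch of a per-dialect function-name lookup. Everything in it is an assumption made for illustration: the `ClientDialect` enum, the `canonical_function_name` helper, and the alias table are hypothetical and not part of Datafuse. It covers only the name mapping, not how a handler would locate function calls in the query text.

```rust
/// Hypothetical client protocols, one per server handler (not a Datafuse type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ClientDialect {
    MySql,
    ClickHouse,
}

/// Returns a canonical, PostgreSQL-style name for a dialect-specific function
/// name. Names without a registered alias are returned unchanged, so
/// case-sensitive names such as `windowFunnel` pass through untouched.
pub fn canonical_function_name(dialect: ClientDialect, name: &str) -> String {
    let lower = name.to_ascii_lowercase();
    // Illustrative alias table only; a real table would be much larger.
    let alias: Option<&'static str> = match (dialect, lower.as_str()) {
        (ClientDialect::MySql, "ifnull") => Some("coalesce"),
        (ClientDialect::MySql, "lcase") => Some("lower"),
        (ClientDialect::MySql, "ucase") => Some("upper"),
        (ClientDialect::ClickHouse, "ifnull") => Some("coalesce"),
        _ => None,
    };
    alias
        .map(str::to_string)
        .unwrap_or_else(|| name.to_string())
}
```

With such a table, a MySQL-side handler could rewrite `IFNULL(a, b)` to `coalesce(a, b)` before handing the query to the shared parser, which keeps the parser itself dialect-agnostic.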

@PsiACE
Member

PsiACE commented Aug 17, 2021

In general, I agree with your point. However, there are a few things to consider beyond a standardized draft:

  • A solid reference, but don't get tied down by it. I could even accept something based on GraphQL.
  • Easy to extend and maintain, including code and documentation, preferably supported by an integrated solution.

@sundy-li
Member

sundy-li commented Aug 17, 2021

It's OK with me. It seems sqlparser-rs chose PostgreSQL. If we choose PostgreSQL as the main SQL standard, do we need to implement the functions/behavior of PostgreSQL?

@leiysky
Contributor Author

leiysky commented Aug 17, 2021

> It's OK with me. It seems sqlparser-rs chose PostgreSQL. If we choose PostgreSQL as the main SQL standard, do we need to implement the functions/behavior of PostgreSQL?

This part is a little tricky: unless a database directly uses the code of MySQL or Postgres (e.g. AWS Aurora, AliCloud PolarDB), it can never really be fully compatible with them.

sqlparser-rs chose PostgreSQL, but it doesn't actually support the full PostgreSQL syntax (I've just found an issue about this: apache/datafusion-sqlparser-rs#329).

Almost every XxxDB-compatible database wins its customers by implementing the common/core features of the corresponding database to achieve basic compatibility.

Generally speaking, SQL features can be categorized as: functions, SQL operators, and data types.

For Datafuse, we can basically follow PostgreSQL's grammar (maybe only the ANSI SQL part) and ensure only limited compatibility by providing common data types (e.g. NUMERIC, DATE/DATETIME, STRING, JSON) and some common functions. With this functionality, PostgreSQL users can get up to speed with Datafuse quickly.

Then we can add our own extensions on top of this, and no longer need to worry about compatibility.

@zhang2014
Member

> Almost every XxxDB-compatible database wins its customers by implementing the common/core features of the corresponding database to achieve basic compatibility.

It's OK with me. As you said, we may need to consider compatibility with other databases in the future, so perhaps we should account for it in the design.

@sundy-li
Member

sundy-li commented Aug 18, 2021

Some functions need to take parameters in addition to their arguments, like: windowFunnel(3600)(...)

For now, I'll create a PR in https://github.com/datafuse-extras/sqlparser-rs first.

The function expression will be:

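#[derive(Debug, Clone, PartialEq, Eq, Hash)]
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
pub struct Function {
    // Function name, e.g. `windowFunnel`
    pub name: ObjectName,
    // Arguments from the argument list, e.g. the `(...)` in `windowFunnel(3600)(...)`
    pub args: Vec<FunctionArg>,
    // Parameters from the leading list, e.g. the `3600` in `windowFunnel(3600)(...)`
    pub params: Vec<Value>,
    // Window specification, if any
    pub over: Option<WindowSpec>,
    // Aggregate functions may specify DISTINCT, e.g. `COUNT(DISTINCT x)`
    pub distinct: bool,
}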
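
As a rough illustration of how a parameterized call could populate this struct, the sketch below builds the value for a call shaped like `windowFunnel(3600)(...)` and renders it back to SQL text. The types are simplified stand-ins, not the real sqlparser-rs definitions: `ObjectName`, `Value`, and `FunctionArg` are reduced to the minimum, the `over` field is left out, and the arguments `ts` and `event = 'view'` are hypothetical placeholders for the elided `(...)`.

```rust
use std::fmt;

// Simplified stand-ins for the sqlparser-rs AST types (assumptions for this sketch).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct ObjectName(pub Vec<String>);

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum Value {
    Number(String),
}

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum FunctionArg {
    // An unnamed argument, kept as raw SQL text for brevity.
    Unnamed(String),
}

// Same shape as the proposed struct, minus `over`.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Function {
    pub name: ObjectName,
    pub args: Vec<FunctionArg>,
    pub params: Vec<Value>,
    pub distinct: bool,
}

impl fmt::Display for Function {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.name.0.join("."))?;
        // The parameter list is printed only when present: `name(params)(args)`.
        if !self.params.is_empty() {
            let params: Vec<String> = self
                .params
                .iter()
                .map(|p| match p {
                    Value::Number(n) => n.clone(),
                })
                .collect();
            write!(f, "({})", params.join(", "))?;
        }
        let args: Vec<String> = self
            .args
            .iter()
            .map(|a| match a {
                FunctionArg::Unnamed(s) => s.clone(),
            })
            .collect();
        let distinct = if self.distinct { "DISTINCT " } else { "" };
        write!(f, "({}{})", distinct, args.join(", "))
    }
}

/// Builds the AST value for a hypothetical `windowFunnel(3600)(ts, event = 'view')` call.
pub fn window_funnel_example() -> Function {
    Function {
        name: ObjectName(vec!["windowFunnel".to_string()]),
        args: vec![
            FunctionArg::Unnamed("ts".to_string()),
            FunctionArg::Unnamed("event = 'view'".to_string()),
        ],
        params: vec![Value::Number("3600".to_string())],
        distinct: false,
    }
}
```

Keeping `params` separate from `args` means ordinary functions simply carry an empty `params` vector, so existing call sites that only read `args` would not need to change.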

@leiysky
Contributor Author

leiysky commented May 13, 2022

Closed by #4916

@leiysky leiysky closed this as completed May 13, 2022
5 participants