Skip to content

Support for multi-stage-query (sql-msq-task) #302

@emanueledomingo

Description

@emanueledomingo

Hi everyone.

With Druid 24.0.0 the team included the possibility to start an ingestion using SQL. This uses a new SQL engine called multi-stage-query. Changelog here.

This seems to not be supported in pydruid (using SqlAlchemy). If i run

import sqlalchemy as sqla

query = """
INSERT INTO w000
SELECT
    TIME_PARSE("timestamp") AS __time,
    isRobot,
    channel,
    flags,
    isUnpatrolled,
    page,
    diffUrl,
    added,
    comment,
    commentLength,
    isNew,
    isMinor,
    delta,
    isAnonymous,
    user,
    deltaBucket,
    deleted,
    namespace,
    cityName,
    countryName,
    regionIsoCode,
    metroCode,
    countryIsoCode,
    regionName
FROM TABLE(
    EXTERN(
        '{"type":"http","uris":["https://druid.apache.org/data/wikipedia.json.gz"]}',
        '{"type":"json"}',
        '[{"name":"isRobot","type":"string"},{"name":"channel","type":"string"},{"name":"timestamp","type":"string"},{"name":"flags","type":"string"},{"name":"isUnpatrolled","type":"string"},{"name":"page","type":"string"},{"name":"diffUrl","type":"string"},{"name":"added","type":"long"},{"name":"comment","type":"string"},{"name":"commentLength","type":"long"},{"name":"isNew","type":"string"},{"name":"isMinor","type":"string"},{"name":"delta","type":"long"},{"name":"isAnonymous","type":"string"},{"name":"user","type":"string"},{"name":"deltaBucket","type":"long"},{"name":"deleted","type":"long"},{"name":"namespace","type":"string"},{"name":"cityName","type":"string"},{"name":"countryName","type":"string"},{"name":"regionIsoCode","type":"string"},{"name":"metroCode","type":"long"},{"name":"countryIsoCode","type":"string"},{"name":"regionName","type":"string"}]'
    )
)
PARTITIONED BY HOUR
CLUSTERED BY channel
"""

engine = sqla.create_engine("druid://localhost:8888/druid/v2/sql/")
engine.execute(query)

Query taken from this example

i get

ProgrammingError: (pydruid.db.exceptions.ProgrammingError) Plan validation failed (org.apache.calcite.tools.ValidationException): Cannot execute INSERT with SQL engine 'native'.

Using:

  • pydruid[sqlalchemy]==0.6.5
  • sqlalchemy==1.4.46

Is there a way to specify a different engine? it will be great to use sql-msq-task instead of native.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions