Description
When querying a Druid datasource via PyDruid, I noticed that applying millis_to_timestamp(CAST(created_on AS bigint)) to a column ingested as a timestamp returns fewer rows than selecting the raw column directly.
Environment:
Druid version: 30.0.1
PyDruid version: 0.6.9
Python version: 3.8.8
Datasource ingestion type: Kafka ingestion with timestamp transformation
Ingestion spec snippet:
{
  "type": "expression",
  "name": "created_on",
  "expression": "timestamp_parse(created_on, 'yyyy-MM-dd HH:mm:ss.SSS')"
}
**Sample raw value:**
"created_on": "2025-08-08 03:45:07.009"
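For reference, here is a small sketch of the epoch-millisecond value that the transform above should produce for this sample, assuming timestamp_parse treats the string as UTC when no time zone is given. The helper name to_epoch_millis is hypothetical and only mirrors the 'yyyy-MM-dd HH:mm:ss.SSS' pattern on the Python side; it is not part of Druid or PyDruid.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(raw: str) -> int:
    """Parse 'yyyy-MM-dd HH:mm:ss.SSS' as UTC and return epoch milliseconds.

    Assumption: the ingestion-side parse uses UTC for strings without a zone.
    """
    parsed = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S.%f")
    parsed = parsed.replace(tzinfo=timezone.utc)
    # Integer division on timedeltas avoids float rounding of the millis part.
    return (parsed - EPOCH) // timedelta(milliseconds=1)


def from_epoch_millis(millis: int) -> datetime:
    """Inverse of to_epoch_millis, returning a UTC-aware datetime."""
    return EPOCH + timedelta(milliseconds=millis)


if __name__ == "__main__":
    millis = to_epoch_millis("2025-08-08 03:45:07.009")
    print(millis, from_epoch_millis(millis).isoformat())
```

If the stored column already holds a value like this, the CAST to bigint followed by millis_to_timestamp should be a round trip, which is why matching row counts were expected.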
Query and Python code:
from pydruid.db import connect
import pandas as pd

def druid_runner():
    conn = connect(
        host='<host>',
        port=443,
        path='/druid/v2/sql/',
        scheme='https'
    )
    curs = conn.cursor()

    # Query 1: Using millis_to_timestamp + CAST
    sql_query_ts = """
        SELECT millis_to_timestamp(CAST(created_on AS bigint)) AS created_on
        FROM <datasource_name>
        WHERE opCode <> 'D'
    """
    df_ts = pd.DataFrame(curs.execute(sql_query_ts))
    print("Rows with timestamp conversion:", len(df_ts))

    # Query 2: Selecting raw column
    sql_query_raw = """
        SELECT created_on
        FROM <datasource_name>
        WHERE opCode <> 'D'
    """
    df_raw = pd.DataFrame(curs.execute(sql_query_raw))
    print("Rows with raw created_on:", len(df_raw))

if __name__ == '__main__':
    druid_runner()
**Observed result**
Query with timestamp conversion → 74,945 rows
Query with raw created_on → 101,332 rows
Expected both queries to return the same number of rows.
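To help narrow this down, one option is to compare server-side counts instead of the length of the DataFrames built on the client. The sketch below is only a diagnostic aid and does not identify the cause. The helpers build_count_queries and run_counts are hypothetical names, the datasource placeholder is kept as in the queries above, and the cursor is assumed to follow the DB-API style (execute, then fetchone) that pydruid.db cursors expose.

```python
CONVERTED = "millis_to_timestamp(CAST(created_on AS bigint))"


def build_count_queries(datasource, where="opCode <> 'D'"):
    """Return labelled COUNT queries for the raw and converted column.

    COUNT(expr) skips NULLs, so comparing it with COUNT(*) shows whether the
    conversion yields NULL for some rows.
    """
    base = f"FROM {datasource} WHERE {where}"
    return {
        "all_rows": f"SELECT COUNT(*) AS n {base}",
        "raw_not_null": f"SELECT COUNT(created_on) AS n {base}",
        "converted_not_null": f"SELECT COUNT({CONVERTED}) AS n {base}",
    }


def run_counts(cursor, queries):
    """Execute each query on a DB-API style cursor and collect the counts."""
    counts = {}
    for label, sql in queries.items():
        cursor.execute(sql)
        row = cursor.fetchone()
        counts[label] = row[0]
    return counts
```

With the connection from the script above, usage would look like run_counts(conn.cursor(), build_count_queries("<datasource_name>")). If all three counts agree on the server, the difference is more likely to arise while fetching or materializing rows on the client than in the SQL itself.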