Skip to content

Commit d384cca

Browse files
committed
Add admin/properties-session.rst
1 parent fe3b959 commit d384cca

File tree

3 files changed

+384
-59
lines changed

3 files changed

+384
-59
lines changed

presto-docs/src/main/sphinx/admin.rst

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -8,6 +8,7 @@ Administration
88
admin/web-interface
99
admin/tuning
1010
admin/properties
11+
admin/properties-session
1112
admin/spill
1213
admin/exchange-materialization
1314
admin/cte-materialization
Lines changed: 327 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,327 @@
1+
=========================
2+
Presto Session Properties
3+
=========================
4+
5+
This section describes session properties that may be used to tune
6+
Presto or alter its behavior when required.
7+
8+
The following is not a complete list of all session properties
9+
available in Presto, and does not include any connector-specific
10+
catalog properties.
11+
12+
For information on catalog properties, see the :doc:`connector documentation </connector/>`.
13+
14+
For information on configuration properties, see :doc:`properties`.
15+
16+
17+
.. contents::
18+
:local:
19+
:backlinks: none
20+
:depth: 1
21+
22+
General Properties
23+
------------------
24+
25+
``join_distribution_type``
26+
^^^^^^^^^^^^^^^^^^^^^^^^^^
27+
28+
* **Type:** ``string``
29+
* **Allowed values:** ``AUTOMATIC``, ``PARTITIONED``, ``BROADCAST``
30+
* **Default value:** ``AUTOMATIC``
31+
32+
The type of distributed join to use. When set to ``PARTITIONED``, presto will
33+
use hash distributed joins. When set to ``BROADCAST``, it will broadcast the
34+
right table to all nodes in the cluster that have data from the left table.
35+
Partitioned joins require redistributing both tables using a hash of the join key.
36+
This can be slower (sometimes substantially) than broadcast joins, but allows much
37+
larger joins. In particular broadcast joins will be faster if the right table is
38+
much smaller than the left. However, broadcast joins require that the tables on the right
39+
side of the join after filtering fit in memory on each node, whereas distributed joins
40+
only need to fit in distributed memory across all nodes. When set to ``AUTOMATIC``,
41+
Presto will make a cost based decision as to which distribution type is optimal.
42+
It will also consider switching the left and right inputs to the join. In ``AUTOMATIC``
43+
mode, Presto will default to hash distributed joins if no cost could be computed, such as if
44+
the tables do not have statistics.
45+
46+
The corresponding configuration property is :ref:`admin/properties:\`\`join-distribution-type\`\``.
47+
48+
49+
``redistribute_writes``
50+
^^^^^^^^^^^^^^^^^^^^^^^
51+
52+
* **Type:** ``boolean``
53+
* **Default value:** ``true``
54+
55+
This property enables redistribution of data before writing. This can
56+
eliminate the performance impact of data skew when writing by hashing it
57+
across nodes in the cluster. It can be disabled when it is known that the
58+
output data set is not skewed in order to avoid the overhead of hashing and
59+
redistributing all the data across the network.
60+
61+
The corresponding configuration property is :ref:`admin/properties:\`\`redistribute-writes\`\``.
62+
63+
``task_writer_count``
64+
^^^^^^^^^^^^^^^^^^^^^
65+
66+
* **Type:** ``integer``
67+
* **Default value:** ``1``
68+
69+
Default number of local parallel table writer threads per worker. It is required
70+
to be a power of two for a Java query engine.
71+
72+
The corresponding configuration property is :ref:`admin/properties:\`\`task.writer-count\`\``.
73+
74+
``task_partitioned_writer_count``
75+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
76+
77+
* **Type:** ``integer``
78+
* **Default value:** ``task_writer_count``
79+
80+
Number of local parallel table writer threads per worker for partitioned writes. If not
81+
set, the number set by ``task_writer_count`` will be used. It is required to be a power
82+
of two for a Java query engine.
83+
84+
Spilling Properties
85+
-------------------
86+
87+
``spill_enabled``
88+
^^^^^^^^^^^^^^^^^
89+
90+
* **Type:** ``boolean``
91+
* **Default value:** ``false``
92+
93+
Try spilling memory to disk to avoid exceeding memory limits for the query.
94+
95+
Spilling works by offloading memory to disk. This process can allow a query with a large memory
96+
footprint to pass at the cost of slower execution times. Currently, spilling is supported only for
97+
aggregations and joins (inner and outer), so this property will not reduce memory usage required for
98+
window functions, sorting and other join types.
99+
100+
Be aware that this is an experimental feature and should be used with care.
101+
102+
The corresponding configuration property is :ref:`admin/properties:\`\`experimental.spill-enabled\`\``.
103+
104+
``join_spill_enabled``
105+
^^^^^^^^^^^^^^^^^^^^^^
106+
107+
* **Type:** ``boolean``
108+
* **Default value:** ``true``
109+
110+
When ``spill_enabled`` is ``true``, this determines whether Presto will try spilling memory to disk for joins to
111+
avoid exceeding memory limits for the query.
112+
113+
The corresponding configuration property is :ref:`admin/properties:\`\`experimental.join-spill-enabled\`\``.
114+
115+
``aggregation_spill_enabled``
116+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
117+
118+
* **Type:** ``boolean``
119+
* **Default value:** ``true``
120+
121+
When ``spill_enabled`` is ``true``, this determines whether Presto will try spilling memory to disk for aggregations to
122+
avoid exceeding memory limits for the query.
123+
124+
The corresponding configuration property is :ref:`admin/properties:\`\`experimental.aggregation-spill-enabled\`\``.
125+
126+
``distinct_aggregation_spill_enabled``
127+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
128+
129+
* **Type:** ``boolean``
130+
* **Default value:** ``true``
131+
132+
When ``aggregation_spill_enabled`` is ``true``, this determines whether Presto will try spilling memory to disk for distinct
133+
aggregations to avoid exceeding memory limits for the query.
134+
135+
The corresponding configuration property is :ref:`admin/properties:\`\`experimental.distinct-aggregation-spill-enabled\`\``.
136+
137+
``order_by_aggregation_spill_enabled``
138+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
139+
140+
* **Type:** ``boolean``
141+
* **Default value:** ``true``
142+
143+
When ``aggregation_spill_enabled`` is ``true``, this determines whether Presto will try spilling memory to disk for order by
144+
aggregations to avoid exceeding memory limits for the query.
145+
146+
The corresponding configuration property is :ref:`admin/properties:\`\`experimental.order-by-aggregation-spill-enabled\`\``.
147+
148+
``window_spill_enabled``
149+
^^^^^^^^^^^^^^^^^^^^^^^^
150+
151+
* **Type:** ``boolean``
152+
* **Default value:** ``true``
153+
154+
When ``spill_enabled`` is ``true``, this determines whether Presto will try spilling memory to disk for window functions to
155+
avoid exceeding memory limits for the query.
156+
157+
The corresponding configuration property is :ref:`admin/properties:\`\`experimental.window-spill-enabled\`\``.
158+
159+
``order_by_spill_enabled``
160+
^^^^^^^^^^^^^^^^^^^^^^^^^^
161+
162+
* **Type:** ``boolean``
163+
* **Default value:** ``true``
164+
165+
When ``spill_enabled`` is ``true``, this determines whether Presto will try spilling memory to disk for order by to
166+
avoid exceeding memory limits for the query.
167+
168+
The corresponding configuration property is :ref:`admin/properties:\`\`experimental.order-by-spill-enabled\`\``.
169+
170+
``aggregation_operator_unspill_memory_limit``
171+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
172+
173+
* **Type:** ``data size``
174+
* **Default value:** ``4 MB``
175+
176+
Limit for memory used for unspilling a single aggregation operator instance.
177+
178+
The corresponding configuration property is :ref:`admin/properties:\`\`experimental.aggregation-operator-unspill-memory-limit\`\``.
179+
180+
Task Properties
181+
---------------
182+
183+
``task_concurrency``
184+
^^^^^^^^^^^^^^^^^^^^
185+
186+
* **Type:** ``integer``
187+
* **Restrictions:** must be a power of two
188+
* **Default value:** ``16``
189+
190+
Default local concurrency for parallel operators such as joins and aggregations.
191+
This value should be adjusted up or down based on the query concurrency and worker
192+
resource utilization. Lower values are better for clusters that run many queries
193+
concurrently because the cluster will already be utilized by all the running
194+
queries, so adding more concurrency will result in slow downs due to context
195+
switching and other overhead. Higher values are better for clusters that only run
196+
one or a few queries at a time.
197+
198+
The corresponding configuration property is :ref:`admin/properties:\`\`task.concurrency\`\``.
199+
200+
``task_writer_count``
201+
^^^^^^^^^^^^^^^^^^^^^
202+
203+
* **Type:** ``integer``
204+
* **Restrictions:** must be a power of two
205+
* **Default value:** ``1``
206+
207+
The number of concurrent writer threads per worker per query. Increasing this value may
208+
increase write speed, especially when a query is not I/O bound and can take advantage
209+
of additional CPU for parallel writes (some connectors can be bottlenecked on CPU when
210+
writing due to compression or other factors). Setting this too high may cause the cluster
211+
to become overloaded due to excessive resource utilization.
212+
213+
The corresponding configuration property is :ref:`admin/properties:\`\`task.writer-count\`\``.
214+
215+
Optimizer Properties
216+
--------------------
217+
218+
``dictionary_aggregation``
219+
^^^^^^^^^^^^^^^^^^^^^^^^^^
220+
221+
* **Type:** ``boolean``
222+
* **Default value:** ``false``
223+
224+
Enables optimization for aggregations on dictionaries.
225+
226+
The corresponding configuration property is :ref:`admin/properties:\`\`optimizer.dictionary-aggregation\`\``.
227+
228+
``optimize_hash_generation``
229+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
230+
231+
* **Type:** ``boolean``
232+
* **Default value:** ``true``
233+
234+
Compute hash codes for distribution, joins, and aggregations early during execution,
235+
allowing result to be shared between operations later in the query. This can reduce
236+
CPU usage by avoiding computing the same hash multiple times, but at the cost of
237+
additional network transfer for the hashes. In most cases it will decrease overall
238+
query processing time.
239+
240+
It is often helpful to disable this property when using :doc:`/sql/explain` in order
241+
to make the query plan easier to read.
242+
243+
The corresponding configuration property is :ref:`admin/properties:\`\`optimizer.optimize-hash-generation\`\``.
244+
245+
``push_aggregation_through_join``
246+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
247+
248+
* **Type:** ``boolean``
249+
* **Default value:** ``true``
250+
251+
When an aggregation is above an outer join and all columns from the outer side of the join
252+
are in the grouping clause, the aggregation is pushed below the outer join. This optimization
253+
is particularly useful for correlated scalar subqueries, which get rewritten to an aggregation
254+
over an outer join. For example::
255+
256+
SELECT * FROM item i
257+
WHERE i.i_current_price > (
258+
SELECT AVG(j.i_current_price) FROM item j
259+
WHERE i.i_category = j.i_category);
260+
261+
Enabling this optimization can substantially speed up queries by reducing
262+
the amount of data that needs to be processed by the join. However, it may slow down some
263+
queries that have very selective joins.
264+
265+
The corresponding configuration property is :ref:`admin/properties:\`\`optimizer.push-aggregation-through-join\`\``.
266+
267+
``push_table_write_through_union``
268+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
269+
270+
* **Type:** ``boolean``
271+
* **Default value:** ``true``
272+
273+
Parallelize writes when using ``UNION ALL`` in queries that write data. This improves the
274+
speed of writing output tables in ``UNION ALL`` queries because these writes do not require
275+
additional synchronization when collecting results. Enabling this optimization can improve
276+
``UNION ALL`` speed when write speed is not yet saturated. However, it may slow down queries
277+
in an already heavily loaded system.
278+
279+
The corresponding configuration property is :ref:`admin/properties:\`\`optimizer.push-table-write-through-union\`\``.
280+
281+
``join_reordering_strategy``
282+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
283+
284+
* **Type:** ``string``
285+
* **Allowed values:** ``AUTOMATIC``, ``ELIMINATE_CROSS_JOINS``, ``NONE``
286+
* **Default value:** ``AUTOMATIC``
287+
288+
The join reordering strategy to use. ``NONE`` maintains the order the tables are listed in the
289+
query. ``ELIMINATE_CROSS_JOINS`` reorders joins to eliminate cross joins where possible and
290+
otherwise maintains the original query order. When reordering joins it also strives to maintain the
291+
original table order as much as possible. ``AUTOMATIC`` enumerates possible orders and uses
292+
statistics-based cost estimation to determine the least cost order. If stats are not available or if
293+
for any reason a cost could not be computed, the ``ELIMINATE_CROSS_JOINS`` strategy is used.
294+
295+
The corresponding configuration property is :ref:`admin/properties:\`\`optimizer.join-reordering-strategy\`\``.
296+
297+
``confidence_based_broadcast``
298+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
299+
300+
* **Type:** ``boolean``
301+
* **Default value:** ``false``
302+
303+
Enable broadcasting based on the confidence of the statistics that are being used, by
304+
broadcasting the side of a joinNode which has the highest (``HIGH`` or ``FACT``) confidence statistics.
305+
If both sides have the same confidence statistics, then the original behavior will be followed.
306+
307+
The corresponding configuration property is :ref:`admin/properties:\`\`optimizer.confidence-based-broadcast\`\``.
308+
309+
``treat-low-confidence-zero-estimation-as-unknown``
310+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
311+
312+
* **Type:** ``boolean``
313+
* **Default value:** ``false``
314+
315+
Enable treating ``LOW`` confidence, zero estimations as ``UNKNOWN`` during joins.
316+
317+
The corresponding configuration property is :ref:`admin/properties:\`\`optimizer.treat-low-confidence-zero-estimation-as-unknown\`\``.
318+
319+
``retry-query-with-history-based-optimization``
320+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
321+
322+
* **Type:** ``boolean``
323+
* **Default value:** ``false``
324+
325+
Enable retry for failed queries who can potentially be helped by HBO.
326+
327+
The corresponding configuration property is :ref:`admin/properties:\`\`optimizer.retry-query-with-history-based-optimization\`\``.

0 commit comments

Comments
 (0)