|
| 1 | +========================= |
| 2 | +Presto Session Properties |
| 3 | +========================= |
| 4 | + |
| 5 | +This section describes session properties that may be used to tune |
| 6 | +Presto or alter its behavior when required. |
| 7 | + |
| 8 | +The following is not a complete list of all session properties |
| 9 | +available in Presto, and does not include any connector-specific |
| 10 | +catalog properties. |
| 11 | + |
| 12 | +For information on catalog properties, see the :doc:`connector documentation </connector/>`. |
| 13 | + |
| 14 | +For information on configuration properties, see :doc:`properties`. |
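| | + |
| | +As a minimal sketch, session properties are inspected with ``SHOW SESSION`` and changed |
| | +for the current session with ``SET SESSION``. The property and value below are only |
| | +illustrative:: |
| | + |
| | +    -- List session properties with their current and default values |
| | +    SHOW SESSION; |
| | + |
| | +    -- Override a property for the current session only |
| | +    SET SESSION redistribute_writes = false; |
| | + |
| | +    -- Restore the property to its default value |
| | +    RESET SESSION redistribute_writes; |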
| 15 | + |
| 16 | + |
| 17 | +.. contents:: |
| 18 | + :local: |
| 19 | + :backlinks: none |
| 20 | + :depth: 1 |
| 21 | + |
| 22 | +General Properties |
| 23 | +------------------ |
| 24 | + |
| 25 | +``join_distribution_type`` |
| 26 | +^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 27 | + |
| 28 | +* **Type:** ``string`` |
| 29 | +* **Allowed values:** ``AUTOMATIC``, ``PARTITIONED``, ``BROADCAST`` |
| 30 | +* **Default value:** ``AUTOMATIC`` |
| 31 | + |
| 32 | +The type of distributed join to use. When set to ``PARTITIONED``, Presto will |
| 33 | +use hash distributed joins. When set to ``BROADCAST``, it will broadcast the |
| 34 | +right table to all nodes in the cluster that have data from the left table. |
| 35 | +Partitioned joins require redistributing both tables using a hash of the join key. |
| 36 | +This can be slower (sometimes substantially) than broadcast joins, but allows much |
| 37 | +larger joins. In particular, broadcast joins will be faster if the right table is |
| 38 | +much smaller than the left. However, broadcast joins require that the tables on the right |
| 39 | +side of the join after filtering fit in memory on each node, whereas partitioned joins |
| 40 | +only need to fit in distributed memory across all nodes. When set to ``AUTOMATIC``, |
| 41 | +Presto will make a cost-based decision as to which distribution type is optimal. |
| 42 | +It will also consider switching the left and right inputs to the join. In ``AUTOMATIC`` |
| 43 | +mode, Presto will default to hash distributed joins if no cost could be computed, such as if |
| 44 | +the tables do not have statistics. |
| 45 | + |
| 46 | +The corresponding configuration property is :ref:`admin/properties:\`\`join-distribution-type\`\``. |
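| | + |
| | +For example, when the right-hand table is known to be small enough to fit in memory on |
| | +each node, a broadcast join could be requested for the session. This is only a sketch, and |
| | +the table and column names are hypothetical:: |
| | + |
| | +    SET SESSION join_distribution_type = 'BROADCAST'; |
| | + |
| | +    -- "customer" and "nation" are hypothetical tables; "nation" is the small right side |
| | +    SELECT c.name, n.name |
| | +    FROM customer c |
| | +    JOIN nation n ON c.nationkey = n.nationkey; |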
| 47 | + |
| 48 | + |
| 49 | +``redistribute_writes`` |
| 50 | +^^^^^^^^^^^^^^^^^^^^^^^ |
| 51 | + |
| 52 | +* **Type:** ``boolean`` |
| 53 | +* **Default value:** ``true`` |
| 54 | + |
| 55 | +This property enables redistribution of data before writing. This can |
| 56 | +eliminate the performance impact of data skew when writing by hashing it |
| 57 | +across nodes in the cluster. It can be disabled when it is known that the |
| 58 | +output data set is not skewed in order to avoid the overhead of hashing and |
| 59 | +redistributing all the data across the network. |
| 60 | + |
| 61 | +The corresponding configuration property is :ref:`admin/properties:\`\`redistribute-writes\`\``. |
| 62 | + |
| 63 | +``task_writer_count`` |
| 64 | +^^^^^^^^^^^^^^^^^^^^^ |
| 65 | + |
| 66 | +* **Type:** ``integer`` |
| 67 | +* **Default value:** ``1`` |
| 68 | + |
| 69 | +Default number of local parallel table writer threads per worker. The value must |
| 70 | +be a power of two when using the Java query engine. |
| 71 | + |
| 72 | +The corresponding configuration property is :ref:`admin/properties:\`\`task.writer-count\`\``. |
| 73 | + |
| 74 | +``task_partitioned_writer_count`` |
| 75 | +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 76 | + |
| 77 | +* **Type:** ``integer`` |
| 78 | +* **Default value:** ``task_writer_count`` |
| 79 | + |
| 80 | +Number of local parallel table writer threads per worker for partitioned writes. If not |
| 81 | +set, the number set by ``task_writer_count`` will be used. The value must be a power |
| 82 | +of two when using the Java query engine. |
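| | + |
| | +A short sketch of setting the two writer counts independently. The values are |
| | +illustrative powers of two, not recommendations:: |
| | + |
| | +    -- Writer threads per worker for unpartitioned writes |
| | +    SET SESSION task_writer_count = 4; |
| | + |
| | +    -- Writer threads per worker for partitioned writes |
| | +    SET SESSION task_partitioned_writer_count = 8; |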
| 83 | + |
| 84 | +Spilling Properties |
| 85 | +------------------- |
| 86 | + |
| 87 | +``spill_enabled`` |
| 88 | +^^^^^^^^^^^^^^^^^ |
| 89 | + |
| 90 | +* **Type:** ``boolean`` |
| 91 | +* **Default value:** ``false`` |
| 92 | + |
| 93 | +Try spilling memory to disk to avoid exceeding memory limits for the query. |
| 94 | + |
| 95 | +Spilling works by offloading memory to disk. This process can allow a query with a large memory |
| 96 | +footprint to pass at the cost of slower execution times. Currently, spilling is supported only for |
| 97 | +aggregations and joins (inner and outer), so this property will not reduce memory usage required for |
| 98 | +window functions, sorting and other join types. |
| 99 | + |
| 100 | +Be aware that this is an experimental feature and should be used with care. |
| 101 | + |
| 102 | +The corresponding configuration property is :ref:`admin/properties:\`\`experimental.spill-enabled\`\``. |
| 103 | + |
| 104 | +``join_spill_enabled`` |
| 105 | +^^^^^^^^^^^^^^^^^^^^^^ |
| 106 | + |
| 107 | +* **Type:** ``boolean`` |
| 108 | +* **Default value:** ``true`` |
| 109 | + |
| 110 | +When ``spill_enabled`` is ``true``, this determines whether Presto will try spilling memory to disk for joins to |
| 111 | +avoid exceeding memory limits for the query. |
| 112 | + |
| 113 | +The corresponding configuration property is :ref:`admin/properties:\`\`experimental.join-spill-enabled\`\``. |
| 114 | + |
| 115 | +``aggregation_spill_enabled`` |
| 116 | +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 117 | + |
| 118 | +* **Type:** ``boolean`` |
| 119 | +* **Default value:** ``true`` |
| 120 | + |
| 121 | +When ``spill_enabled`` is ``true``, this determines whether Presto will try spilling memory to disk for aggregations to |
| 122 | +avoid exceeding memory limits for the query. |
| 123 | + |
| 124 | +The corresponding configuration property is :ref:`admin/properties:\`\`experimental.aggregation-spill-enabled\`\``. |
| 125 | + |
| 126 | +``distinct_aggregation_spill_enabled`` |
| 127 | +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 128 | + |
| 129 | +* **Type:** ``boolean`` |
| 130 | +* **Default value:** ``true`` |
| 131 | + |
| 132 | +When ``aggregation_spill_enabled`` is ``true``, this determines whether Presto will try spilling memory to disk for distinct |
| 133 | +aggregations to avoid exceeding memory limits for the query. |
| 134 | + |
| 135 | +The corresponding configuration property is :ref:`admin/properties:\`\`experimental.distinct-aggregation-spill-enabled\`\``. |
| 136 | + |
| 137 | +``order_by_aggregation_spill_enabled`` |
| 138 | +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 139 | + |
| 140 | +* **Type:** ``boolean`` |
| 141 | +* **Default value:** ``true`` |
| 142 | + |
| 143 | +When ``aggregation_spill_enabled`` is ``true``, this determines whether Presto will try spilling memory to disk for order by |
| 144 | +aggregations to avoid exceeding memory limits for the query. |
| 145 | + |
| 146 | +The corresponding configuration property is :ref:`admin/properties:\`\`experimental.order-by-aggregation-spill-enabled\`\``. |
| 147 | + |
| 148 | +``window_spill_enabled`` |
| 149 | +^^^^^^^^^^^^^^^^^^^^^^^^ |
| 150 | + |
| 151 | +* **Type:** ``boolean`` |
| 152 | +* **Default value:** ``true`` |
| 153 | + |
| 154 | +When ``spill_enabled`` is ``true``, this determines whether Presto will try spilling memory to disk for window functions to |
| 155 | +avoid exceeding memory limits for the query. |
| 156 | + |
| 157 | +The corresponding configuration property is :ref:`admin/properties:\`\`experimental.window-spill-enabled\`\``. |
| 158 | + |
| 159 | +``order_by_spill_enabled`` |
| 160 | +^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 161 | + |
| 162 | +* **Type:** ``boolean`` |
| 163 | +* **Default value:** ``true`` |
| 164 | + |
| 165 | +When ``spill_enabled`` is ``true``, this determines whether Presto will try spilling memory to disk for order by to |
| 166 | +avoid exceeding memory limits for the query. |
| 167 | + |
| 168 | +The corresponding configuration property is :ref:`admin/properties:\`\`experimental.order-by-spill-enabled\`\``. |
| 169 | + |
| 170 | +``aggregation_operator_unspill_memory_limit`` |
| 171 | +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 172 | + |
| 173 | +* **Type:** ``data size`` |
| 174 | +* **Default value:** ``4 MB`` |
| 175 | + |
| 176 | +Limit for memory used for unspilling a single aggregation operator instance. |
| 177 | + |
| 178 | +The corresponding configuration property is :ref:`admin/properties:\`\`experimental.aggregation-operator-unspill-memory-limit\`\``. |
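| | + |
| | +The dependencies between the spilling properties described in this section can be sketched |
| | +as follows. The chosen values are illustrative only:: |
| | + |
| | +    -- Master switch: the operator-specific properties only take effect when this is true |
| | +    SET SESSION spill_enabled = true; |
| | + |
| | +    -- Keep spilling for aggregations in general, but not for distinct aggregations |
| | +    SET SESSION aggregation_spill_enabled = true; |
| | +    SET SESSION distinct_aggregation_spill_enabled = false; |
| | + |
| | +    -- Data size values are passed as strings |
| | +    SET SESSION aggregation_operator_unspill_memory_limit = '8MB'; |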
| 179 | + |
| 180 | +Task Properties |
| 181 | +--------------- |
| 182 | + |
| 183 | +``task_concurrency`` |
| 184 | +^^^^^^^^^^^^^^^^^^^^ |
| 185 | + |
| 186 | +* **Type:** ``integer`` |
| 187 | +* **Restrictions:** must be a power of two |
| 188 | +* **Default value:** ``16`` |
| 189 | + |
| 190 | +Default local concurrency for parallel operators such as joins and aggregations. |
| 191 | +This value should be adjusted up or down based on the query concurrency and worker |
| 192 | +resource utilization. Lower values are better for clusters that run many queries |
| 193 | +concurrently because the cluster will already be utilized by all the running |
| 194 | +queries, so adding more concurrency will result in slowdowns due to context |
| 195 | +switching and other overhead. Higher values are better for clusters that only run |
| 196 | +one or a few queries at a time. |
| 197 | + |
| 198 | +The corresponding configuration property is :ref:`admin/properties:\`\`task.concurrency\`\``. |
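| | + |
| | +As a rough sketch, a session on an otherwise idle cluster might raise the value, while a |
| | +session sharing a busy cluster might lower it. The numbers are illustrative, not |
| | +recommendations:: |
| | + |
| | +    -- Lightly loaded cluster: allow more local parallelism (must be a power of two) |
| | +    SET SESSION task_concurrency = 32; |
| | + |
| | +    -- Heavily shared cluster: reduce local parallelism |
| | +    SET SESSION task_concurrency = 8; |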
| 199 | + |
| 200 | +``task_writer_count`` |
| 201 | +^^^^^^^^^^^^^^^^^^^^^ |
| 202 | + |
| 203 | +* **Type:** ``integer`` |
| 204 | +* **Restrictions:** must be a power of two |
| 205 | +* **Default value:** ``1`` |
| 206 | + |
| 207 | +The number of concurrent writer threads per worker per query. Increasing this value may |
| 208 | +increase write speed, especially when a query is not I/O bound and can take advantage |
| 209 | +of additional CPU for parallel writes (some connectors can be bottlenecked on CPU when |
| 210 | +writing due to compression or other factors). Setting this too high may cause the cluster |
| 211 | +to become overloaded due to excessive resource utilization. |
| 212 | + |
| 213 | +The corresponding configuration property is :ref:`admin/properties:\`\`task.writer-count\`\``. |
| 214 | + |
| 215 | +Optimizer Properties |
| 216 | +-------------------- |
| 217 | + |
| 218 | +``dictionary_aggregation`` |
| 219 | +^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 220 | + |
| 221 | +* **Type:** ``boolean`` |
| 222 | +* **Default value:** ``false`` |
| 223 | + |
| 224 | +Enables optimization for aggregations on dictionaries. |
| 225 | + |
| 226 | +The corresponding configuration property is :ref:`admin/properties:\`\`optimizer.dictionary-aggregation\`\``. |
| 227 | + |
| 228 | +``optimize_hash_generation`` |
| 229 | +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 230 | + |
| 231 | +* **Type:** ``boolean`` |
| 232 | +* **Default value:** ``true`` |
| 233 | + |
| 234 | +Compute hash codes for distribution, joins, and aggregations early during execution, |
| 235 | +allowing the result to be shared between operations later in the query. This can reduce |
| 236 | +CPU usage by avoiding computing the same hash multiple times, but at the cost of |
| 237 | +additional network transfer for the hashes. In most cases it will decrease overall |
| 238 | +query processing time. |
| 239 | + |
| 240 | +It is often helpful to disable this property when using :doc:`/sql/explain` in order |
| 241 | +to make the query plan easier to read. |
| 242 | + |
| 243 | +The corresponding configuration property is :ref:`admin/properties:\`\`optimizer.optimize-hash-generation\`\``. |
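| | + |
| | +For instance, the property could be turned off just before inspecting a plan. The query is a |
| | +hypothetical example against the ``item`` table used elsewhere on this page:: |
| | + |
| | +    SET SESSION optimize_hash_generation = false; |
| | + |
| | +    -- The plan no longer carries the precomputed hash columns |
| | +    EXPLAIN SELECT i_category, count(*) FROM item GROUP BY i_category; |
| | + |
| | +    -- Restore the default afterwards |
| | +    RESET SESSION optimize_hash_generation; |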
| 244 | + |
| 245 | +``push_aggregation_through_join`` |
| 246 | +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 247 | + |
| 248 | +* **Type:** ``boolean`` |
| 249 | +* **Default value:** ``true`` |
| 250 | + |
| 251 | +When an aggregation is above an outer join and all columns from the outer side of the join |
| 252 | +are in the grouping clause, the aggregation is pushed below the outer join. This optimization |
| 253 | +is particularly useful for correlated scalar subqueries, which get rewritten to an aggregation |
| 254 | +over an outer join. For example:: |
| 255 | + |
| 256 | +    SELECT * FROM item i |
| 257 | +    WHERE i.i_current_price > ( |
| 258 | +        SELECT AVG(j.i_current_price) FROM item j |
| 259 | +        WHERE i.i_category = j.i_category); |
| 260 | + |
| 261 | +Enabling this optimization can substantially speed up queries by reducing |
| 262 | +the amount of data that needs to be processed by the join. However, it may slow down some |
| 263 | +queries that have very selective joins. |
| 264 | + |
| 265 | +The corresponding configuration property is :ref:`admin/properties:\`\`optimizer.push-aggregation-through-join\`\``. |
| 266 | + |
| 267 | +``push_table_write_through_union`` |
| 268 | +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 269 | + |
| 270 | +* **Type:** ``boolean`` |
| 271 | +* **Default value:** ``true`` |
| 272 | + |
| 273 | +Parallelize writes when using ``UNION ALL`` in queries that write data. This improves the |
| 274 | +speed of writing output tables in ``UNION ALL`` queries because these writes do not require |
| 275 | +additional synchronization when collecting results. Enabling this optimization can improve |
| 276 | +``UNION ALL`` speed when write speed is not yet saturated. However, it may slow down queries |
| 277 | +in an already heavily loaded system. |
| 278 | + |
| 279 | +The corresponding configuration property is :ref:`admin/properties:\`\`optimizer.push-table-write-through-union\`\``. |
| 280 | + |
| 281 | +``join_reordering_strategy`` |
| 282 | +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 283 | + |
| 284 | +* **Type:** ``string`` |
| 285 | +* **Allowed values:** ``AUTOMATIC``, ``ELIMINATE_CROSS_JOINS``, ``NONE`` |
| 286 | +* **Default value:** ``AUTOMATIC`` |
| 287 | + |
| 288 | +The join reordering strategy to use. ``NONE`` maintains the order the tables are listed in the |
| 289 | +query. ``ELIMINATE_CROSS_JOINS`` reorders joins to eliminate cross joins where possible and |
| 290 | +otherwise maintains the original query order. When reordering joins it also strives to maintain the |
| 291 | +original table order as much as possible. ``AUTOMATIC`` enumerates possible orders and uses |
| 292 | +statistics-based cost estimation to determine the least cost order. If stats are not available or if |
| 293 | +for any reason a cost could not be computed, the ``ELIMINATE_CROSS_JOINS`` strategy is used. |
| 294 | + |
| 295 | +The corresponding configuration property is :ref:`admin/properties:\`\`optimizer.join-reordering-strategy\`\``. |
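| | + |
| | +A minimal sketch of pinning the join order for a session, for example while comparing |
| | +plans for a hand-ordered query:: |
| | + |
| | +    -- Join tables exactly in the order they are written in the query |
| | +    SET SESSION join_reordering_strategy = 'NONE'; |
| | + |
| | +    -- Return to the default cost-based reordering |
| | +    RESET SESSION join_reordering_strategy; |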
| 296 | + |
| 297 | +``confidence_based_broadcast`` |
| 298 | +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 299 | + |
| 300 | +* **Type:** ``boolean`` |
| 301 | +* **Default value:** ``false`` |
| 302 | + |
| 303 | +Enable broadcasting based on the confidence of the statistics that are being used, by |
| 304 | +broadcasting the side of a join node that has the highest-confidence (``HIGH`` or ``FACT``) statistics. |
| 305 | +If both sides have the same confidence level, the original behavior is followed. |
| 306 | + |
| 307 | +The corresponding configuration property is :ref:`admin/properties:\`\`optimizer.confidence-based-broadcast\`\``. |
| 308 | + |
| 309 | +``treat_low_confidence_zero_estimation_as_unknown`` |
| 310 | +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 311 | + |
| 312 | +* **Type:** ``boolean`` |
| 313 | +* **Default value:** ``false`` |
| 314 | + |
| 315 | +Treat zero estimates with ``LOW`` confidence as ``UNKNOWN`` during joins. |
| 316 | + |
| 317 | +The corresponding configuration property is :ref:`admin/properties:\`\`optimizer.treat-low-confidence-zero-estimation-as-unknown\`\``. |
| 318 | + |
| 319 | +``retry_query_with_history_based_optimization`` |
| 320 | +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 321 | + |
| 322 | +* **Type:** ``boolean`` |
| 323 | +* **Default value:** ``false`` |
| 324 | + |
| 325 | +Enable retrying failed queries that could potentially benefit from history-based optimization (HBO). |
| 326 | + |
| 327 | +The corresponding configuration property is :ref:`admin/properties:\`\`optimizer.retry-query-with-history-based-optimization\`\``. |