fix(parquet store gateways): correctly locate labels parquet files locally #11894
Merged
francoposa approved these changes Jun 27, 2025
npazosmendez added a commit that referenced this pull request Jun 27, 2025
jesusvazquez pushed a commit that referenced this pull request Jun 30, 2025
npazosmendez added a commit that referenced this pull request Jul 1, 2025
npazosmendez added a commit that referenced this pull request Jul 7, 2025
francoposa pushed a commit that referenced this pull request Jul 8, 2025
francoposa pushed a commit that referenced this pull request Jul 8, 2025
jesusvazquez pushed a commit that referenced this pull request Jul 10, 2025
jesusvazquez pushed a commit that referenced this pull request Jul 14, 2025
npazosmendez added a commit that referenced this pull request Jul 18, 2025
francoposa pushed a commit that referenced this pull request Jul 21, 2025
francoposa added a commit that referenced this pull request Jul 31, 2025
* bring in prometheus/parquet-common code to new package; replace efficient-go errors with pkg/errors; satisfy mimir-prometheus ChunkSeries interface
* revert breaking upgrade to thanos/objstore
* fix test require
* attempt to update go version for strange errors
* fix stringlabels issues
* update license headers with AGPL and upstream attribution
* fix errors.Is lints
* fix sort and cancel cause lints
* correct go.mod & vendor in from main to solve conflicts
* use env var to flag parquet promql acceptance
* fix deps from main again
* fix deps from main again
* fix deps from main again
* fix deps from main again

implement new parquet-converter service (#11499)

* bring in parquet-converter from parquet-mimir PoC
* make docs
* make reference-help
* stop using the compactor's config
* remove BlockRanges config, convert all levels of blocks
* drop unused BlockWithExtension struct
* rename ownBlock to own
* move index fetch outside of for loop
* lowercase logs
* wording: compact => convert
* some cleanup
* skip blocks for which compaction mark failed download
* simplify convertBlock function
* cleanup
* Write Compact Mark
* remove parquetIndex, we don't need it yet at least
* use MetaFetcher to discover blocks
* make reference-help and mark as experimental
* cleanup: we don't need indexes anymore
* revert index loader changes
* basic TestParquetConverter
* make reference-help
* lint
* happy linter
* make docs
* fix: correctly initialize memberlist KV for parquet converter
* lint: sort lines
* more wording fixes: compact => convert
* licence header
* version 1
* remove parquet-converter from 'backend' and 'all' modules (it's experimental and meant to be run alone)
* address docs feedback
* remove unused consts
* increase timeout for a test TestPartitionReader_ShouldNotMissRecordsIfKafkaReturnsAFetchBothWithAnErrorAndSomeRecords

parquet-converter: Introduce metrics and ring test (#11600)

* parquet-converter: Introduce metrics and ring test

This commit introduces a ring test to verify that sharding is working as expected. It also introduces metrics to measure total conversions, failures and durations.

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

converter: proper error handling to measure failures

parquet converter in docker compose (#11633)

* add parquet-converter to docker-compose microservices setup
* format jsonnet

fix(parquet converter): close TSDB block after conversion (#11635)

parquet: vendor back from parquet-common (#11644)

introduce store-gateway.parquet-enabled flag & docs (#11722)

upgrade prometheus parquet-common dependency (#11723)

parquet store-gateways introduce stores interface (#11724)

* declare Stores interface satisfied by BucketStores and future Parquet store
* add casts for uses of existing impl which are not protected by interface
* stub out parquet bucket stores implementation
* most minimal initialization of Parquet Bucket Stores when flag is enabled
* license header

parquet: Scaffolding for parquet bucket store Series() (#11729)

* parquet: Scaffolding for parquet bucket store
* use parquetshardopener and be sure to close them
* gci pkg/storegateway/parquet_bucket_stores.go

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
---------
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

fix split between Parquet Stores and each tenant's Store (#11735)

parquet store-gateways blocks sync and lazy reader (#11759)

parquet-bucket-store: finish implementing Stores interface (#11772)

We're trying to mirror the existing bucket store structure for the parquet implementation, and in this PR I'm just trying to implement some of the necessary methods, starting with building up the series sets for labels calls.

- Series
- LabelNames
- LabelValues

---------
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com>
Co-authored-by: Nicolas Pazos <nicolas.pazos-mendez@grafana.com>
Co-authored-by: Nicolás Pazos <npazosmendez@gmail.com>

fix(parquet): share `ReaderPoolMetrics` instance (#11851)

We create multiple instances of `ReaderPool`; passing the registry and creating the metrics on the fly causes panics.

fix(parquet store gateway): close things that should be closed (#11865)

feat(parquet store gateway): support download labels file without validating (#11866)

Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com>
Co-authored-by: francoposa <franco@francoposa.io>

fix(parquet store gateway): pass blockReader to bucket block constructor (#11875)

fix: don't stop nil services

fix(parquet store gateways): correctly locate labels parquet files locally (#11894)

parquet bucket store: add some debug logging (#11925)

Adding a few log statements to the existing code path with useful information to understand when and why we are returning 0 series.

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

parquet store gateways: several fixes and basic tests (#11929)

Co-authored-by: francoposa <franco@francoposa.io>
Co-authored-by: Jesus Vazquez <jesus.vazquez@grafana.com>

parquet converter: include user id in converter counter metrics (#11966)

Adding user id to the converter metrics to better track converter progress through tenants.

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

Parquet converter: Implement priority queue for block conversion (#11980)

This PR redesigns the parquet converter to use a non-blocking priority queue that prioritises recently uploaded blocks for conversion.

* Priority Queue Implementation:
  - Replaces blocking nested loops with a thread-safe priority queue using container/heap
  - Blocks are prioritized by ULID timestamp, ensuring older blocks are processed first
* Separate block discovery:
  - There is a new discovery goroutine that periodically discovers users and blocks, enqueuing them for processing
  - If the block was previously processed, it will be marked as converted and skipped the next time it's discovered.
  - There is a new configuration flag `parquet-converter.max-block-age` that allows us to have a rolling window of blocks so we don't queue up all the work at once. We can set this to 30 days and only blocks up to 30 days old will be converted; when the work is completed, we can go and increase that window again.
  - There is a new processing goroutine that continuously consumes from the priority queue and converts blocks
  - Main Loop remains responsive and handles only service lifecycle events
* New metrics
  - Since we added a priority queue, I added 5 new metrics for queue monitoring:
    - cortex_parquet_converter_queue_size - Current queue depth
    - cortex_parquet_converter_queue_wait_time_seconds - Time blocks spend queued
    - cortex_parquet_converter_queue_items_enqueued_total - Total blocks enqueued
    - cortex_parquet_converter_queue_items_processed_total - Total blocks processed
    - cortex_parquet_converter_queue_items_dropped_total - Total blocks dropped when queue closed

The idea here is that by looking at the queue metrics we can get an idea of how much scaling up we need to deal with the pending work. Also, before this PR we had no idea of how much work was left to be done, but now we will.

---------
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

fix(parquet store gateway): obey query sharding matchers (#12018)

Inefficient, but at least correct query sharding. The new test on sharding fails on the base branch.
It's not trivial to add caching to the hashes like the main path does, because we don't have a `SeriesRef` to use as a cache key at the block level (to match what the main path does). We could in theory use something like the row number in the parquet file, but we don't have easy access to that in this part of the code. In any case, the priority right now is correctness; we'll work on optimizing later as appropriate.

For reference, see how query sharding is handled on the main path: https://github.com/grafana/mimir/blob/604775d447c0a9e893fa6930ef8f2d403ebe6757/pkg/storegateway/series_refs.go#L1021-L1047

fix(parquet store gateway): panic in Series call with SkipChunks (#12020)

`chunksIt` is `nil` when `SkipChunks` is `true`.

parquet-converter debug log messages (#12021)

Co-authored-by: Jesus Vazquez <jesus.vazquez@grafana.com>

chore(parquet): Bump parquet-common dependency (#12023)

Brings the last commit from parquet-common [0811a700a852759c16799358b4424d9888afec3f](prometheus-community/parquet-common@0811a70)

See link for the diff between the two commits: prometheus-community/parquet-common@76512c6...0811a70

---------
Co-authored-by: francoposa <franco@francoposa.io>

feature(parquet): Implement store-gateway limits (#12040)

This PR is based on the upstream work prometheus-community/parquet-common#81

The idea is to implement a set of basic quota limiters that can protect us against potential bad queries for the gateways. Note we had to bring in bits of the code available in the upstream querier because we have our own chunk querier in Mimir.

---------
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
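
As an illustration of the priority-queue design described for #11980 above, here is a minimal sketch of a mutex-guarded queue built on `container/heap`. It is not the converter's actual code: the `block` and `priorityQueue` types, their methods, and the `createdMs` field (standing in for the millisecond timestamp encoded in a block ULID) are all hypothetical. The commit message mentions both "recently uploaded" and "older blocks first"; this sketch arbitrarily puts the newest block first, and flipping the comparison in `Less` gives the opposite order.

```go
package main

import (
	"container/heap"
	"fmt"
	"sync"
)

// block is a hypothetical stand-in for a TSDB block awaiting conversion.
// createdMs mimics the millisecond timestamp encoded in a block's ULID.
type block struct {
	id        string
	createdMs uint64
}

// blockHeap implements heap.Interface. Less puts the newest block first;
// flip the comparison to process the oldest block first instead.
type blockHeap []block

func (h blockHeap) Len() int           { return len(h) }
func (h blockHeap) Less(i, j int) bool { return h[i].createdMs > h[j].createdMs }
func (h blockHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *blockHeap) Push(x any)        { *h = append(*h, x.(block)) }
func (h *blockHeap) Pop() any {
	old := *h
	n := len(old)
	b := old[n-1]
	*h = old[:n-1]
	return b
}

// priorityQueue guards the heap with a mutex so that a discovery goroutine
// can enqueue while a processing goroutine dequeues.
type priorityQueue struct {
	mu sync.Mutex
	h  blockHeap
}

// Enqueue adds a block to the queue.
func (q *priorityQueue) Enqueue(b block) {
	q.mu.Lock()
	defer q.mu.Unlock()
	heap.Push(&q.h, b)
}

// Dequeue never blocks; ok is false when the queue is empty.
func (q *priorityQueue) Dequeue() (b block, ok bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.h.Len() == 0 {
		return block{}, false
	}
	return heap.Pop(&q.h).(block), true
}

func main() {
	q := &priorityQueue{}
	q.Enqueue(block{id: "a", createdMs: 100})
	q.Enqueue(block{id: "b", createdMs: 300})
	q.Enqueue(block{id: "c", createdMs: 200})
	for {
		b, ok := q.Dequeue()
		if !ok {
			break
		}
		fmt.Println(b.id, b.createdMs)
	}
}
```

Because `Dequeue` returns immediately instead of waiting, a caller's main loop can stay free to handle lifecycle events, which is the property the PR description emphasises.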
jesusvazquez pushed a commit that referenced this pull request Aug 2, 2025
jesusvazquez pushed a commit that referenced this pull request Aug 2, 2025
jesusvazquez pushed a commit that referenced this pull request Aug 8, 2025
It's not trivial to add caching to the hashes like the main path does, because we don't have a `SeriesRef` to use as a cache key at the block level (to match what the main path does). We could in theory use something like the row number in the parquet file, but we don't have easy access to that in this part of the code. In any case, the priority right now is correctness, we'll work on optimizing later as appropriate. For referece, see how query sharding is handled on the main path: https://github.com/grafana/mimir/blob/604775d447c0a9e893fa6930ef8f2d403ebe6757/pkg/storegateway/series_refs.go#L1021-L1047 fix(parquet store gateway): panic in Series call with SkipChunks (#12020) `chunksIt` is `nil` when `SkipChunks` is `true`. parquet-converter debug log messages (#12021) Co-authored-by: Jesus Vazquez <jesus.vazquez@grafana.com> chore(parquet): Bump parquet-common dependency (#12023) Brings the last commit from parquet-common [0811a700a852759c16799358b4424d9888afec3f](prometheus-community/parquet-common@0811a70) See link for the diff between the two commits prometheus-community/parquet-common@76512c6...0811a70 --------- Co-authored-by: francoposa <franco@francoposa.io> feature(parquet): Implement store-gateway limits (#12040) This PR is based on the upstream work prometheus-community/parquet-common#81 The idea is to implement a set of basic quota limiters that can protect us against potential bad queries for the gateways. Note we had to bring bits of the code available in the querier in upstream because we have our own chunk querier in Mimir. --------- Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
jesusvazquez
pushed a commit
that referenced
this pull request
Aug 8, 2025
francoposa
added a commit
that referenced
this pull request
Aug 11, 2025
francoposa
added a commit
that referenced
this pull request
Aug 11, 2025
francoposa
added a commit
that referenced
this pull request
Aug 12, 2025