Skip to content

Add OpenTelemetry Collector extension/component #603

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

martinjt
Copy link

@martinjt martinjt commented Mar 26, 2025

**Closes #602 **

PR Checklist

  • Created a feature/dev branch in your fork (vs. submitting directly from a commit on main)
  • Based off latest main branch of toolkit
  • PR doesn't include merge commits (always rebase on top of our main, if needed)
  • New integration
    • Docs are written
    • Added description of major feature to project description for NuGet package (4000 total character limit, so don't push entire description over that)
  • Tests for the changes have been added (for bug fixes / features) (if applicable)
  • Contains NO breaking changes
  • Every new API (including internal ones) has full XML docs
  • Code follows all style conventions

Other information

@martinjt
Copy link
Author

@dotnet-policy-service agree

@martinjt
Copy link
Author

I'm not sure how ready this is yet. I've been playing with the API a bit to give it enough flexibility around the certificates, images and config files.

Feedback very welcome, I wanted to get this up as is before I added any tests and worked on the documentation, etc.

Copy link
Member

@aaronpowell aaronpowell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense that, rather than having a default collector image set, that it's a required property to be provided and there's a method that will set the defaults for you?

So, AddOpenTelemetryCollector is used if you want to specify a collector, whereas AddOpenTelemetryDefaultCollection uses the one that you have currently specified as a default.

Comment on lines 3 to 7
<PropertyGroup>
<TargetFramework>net9.0</TargetFramework>
<ImplicitUsings>enable</ImplicitUsings>
<Nullable>enable</Nullable>
</PropertyGroup>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<PropertyGroup>
<TargetFramework>net9.0</TargetFramework>
<ImplicitUsings>enable</ImplicitUsings>
<Nullable>enable</Nullable>
</PropertyGroup>

This will all be provided by the props files above.

Comment on lines 11 to 14
public EnvironmentVariableHook(ILogger<EnvironmentVariableHook> logger)
{
_logger = logger;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use a primary constructor

var resources = appModel.GetProjectResources();
var collectorResource = appModel.Resources.OfType<CollectorResource>().FirstOrDefault();

if (collectorResource == null)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (collectorResource == null)
if (collectorResource is null)

return Task.CompletedTask;
}

if (resources.Count() == 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (resources.Count() == 0)
if (!resources.Any())

foreach (var resourceItem in resources)
{
_logger.LogDebug($"Forwarding Telemetry for {resourceItem.Name} to the collector");
if (resourceItem == null) continue;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (resourceItem == null) continue;
if (resourceItem is null) continue;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

File no longer needed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

File no longer needed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can likely use this in a few places, it should help unblock #388 and #444, so could it go into the Shared folder and be added to the csproj.

/// <summary>
/// The version of the collector, defaults to latest
/// </summary>
public string CollectorVersion { get; set; } = "latest";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should pin to a version so it's known stable per the package version

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced that pinning is the right approach for that, especially since there isn't a v1 yet. It's generally accepted that people don't pin even in production.

/// <summary>
/// The image of the collector, defaults to ghcr.io/open-telemetry/opentelemetry-collector-releases/opentelemetry-collector-contrib
/// </summary>
public string CollectorImage { get; set; } = "ghcr.io/open-telemetry/opentelemetry-collector-releases/opentelemetry-collector-contrib";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this should be split to registry/image like we do with other container resources. And is there a reason you're using ghcr.io not Docker Hub?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ghcr is the recommended default since the limits went into dockerhub.

Do you have an example of the pattern other integrations use for the registry split.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

public const string Registry = "docker.io";
public const string Image = "ollama/ollama";

@martinjt
Copy link
Author

Interesting approach, I'm honestly not sure, but maybe a .WithDefaultImage() might be better?

I was thinking that the approach I showed was more common, but happy if not.

I kinda wanted a minimal default command though.

@aaronpowell
Copy link
Member

Interesting approach, I'm honestly not sure, but maybe a .WithDefaultImage() might be better?

I was thinking that the approach I showed was more common, but happy if not.

I kinda wanted a minimal default command though.

The problem with a method like WithDefaultImage is that you could setup one without any image specified which would be a runtime error, which would be a less than desirable outcome.

@github-actions github-actions bot added the Stale label Apr 6, 2025
@aaronpowell aaronpowell added awaiting response Waiting for the author of the issue to provide more information or answer a question and removed Stale labels Apr 8, 2025
@martinjt
Copy link
Author

martinjt commented Apr 9, 2025

I agree it's not ideal, but I'd prefer to not have them need to know what the image link is to collector. So I believe the best option is to use the default image we recommend in Otel right now (which is latest) when the extension is called without parameters.

@martinjt martinjt force-pushed the otel-collector-component branch from 7ac9a9d to a38c078 Compare August 5, 2025 20:39
@martinjt
Copy link
Author

martinjt commented Aug 5, 2025

I'm coming back at this now @aaronpowell, with a hope to move it forward so I don't need to maintain a separate one (happy to be codeowner here).

On the devcerts, it seems those issues haven't got a resolution, would it be ok to leave the devcerts in this project for now, and consolidate/move later? This project would actually fix both of those issues as the users could run the non-ssl collector and forward their data through the collector as a way to unblock them.

The big outstanding issue is the collector image model, can you let me know what you'd suggest there. It should be latest since there isn't a v1 of the collector yet, and the advice from the collector team is to always use latest.

@aaronpowell
Copy link
Member

On the devcerts, it seems those issues haven't got a resolution, would it be ok to leave the devcerts in this project for now, and consolidate/move later? This project would actually fix both of those issues as the users could run the non-ssl collector and forward their data through the collector as a way to unblock them.

If you can pop it in the Shared folder we can then add it as a linked file into several projects that could benefit from it in the CT as well.

The big outstanding issue is the collector image model, can you let me know what you'd suggest there. It should be latest since there isn't a v1 of the collector yet, and the advice from the collector team is to always use latest.

What's the release frequency that they are going with? The main reason that we pin to a version is so that we don't risk any unexpected changes to command line args or other behaviour getting introduced that impact someone who hasn't opt-ed in to using a newer version. If they are releasing new versions daily, then pinned isn't ideal, but generally speaking, pinning a version is going to be preferred, and people can always specify a version override if we don't release an updated package quick enough (Ollama is a good example of where we get caught on that front).

@martinjt
Copy link
Author

martinjt commented Aug 6, 2025

The release cycle for the collector is every 2 weeks. The compatibility issues wouldn't be commandline args, its more likely the config of collector components that might change, but we haven't done a breaking change in about 6 months and those are expected to be the last.

The biggest issue with pinning a version is security issues, since the amount of components in the collector means that the updates contain those updates regularly.

I'll happily take guidance on what the best way to get this published is.

@martinjt martinjt force-pushed the otel-collector-component branch from 0ae0e03 to 7c26b63 Compare August 7, 2025 20:57
@martinjt
Copy link
Author

martinjt commented Aug 7, 2025

I've done a few major updates, so it probably needs a full re-review.

  • Implemented inbound SSL, which can also be forced off (this helps fix some of the clients that can't use HTTPS because of the certificate issues)
  • Outbound SSL was removed, it won't work until the Certificate supports host.docker.internal or the *.localhost stuff works.
  • Moved to using .WithConfig("config.yaml"), I'm not sure how I'd fail startup if they don't provide that, I'm happy to implement it.
  • .WithConfig() supports multiple configs
  • Moved to using Random ports supplied by Aspire
  • Added SSL certificates for HTTP and gRPC
  • Made the Environment variables detect the protocol that the App wants and use that (useful as Node, by default, supports HTTP not gRPC)
  • Support for both of the endpoint environment variable names ASPIRE_DASHBOARD_OTLP_ENDPOINT_URL and DOTNET_DASHBOARD_OTLP_ENDPOINT_URL
  • Updated DevCerts to not try and add the certificates twice (this will only be useful when the Outbound SSL functionality is available.

This does require the user to implement insecure_skip_verify, but I can't find a way around that until the new certificate work to make the authority match.

@martinjt martinjt marked this pull request as ready for review August 7, 2025 21:35
@Copilot Copilot AI review requested due to automatic review settings August 7, 2025 21:35
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds an OpenTelemetry Collector integration to the .NET Aspire Community Toolkit, enabling users to add an OpenTelemetry Collector container to their Aspire applications for observability data collection and forwarding.

  • Introduces a new hosting integration with configurable settings for the OpenTelemetry Collector
  • Provides automatic HTTPS certificate handling for secure connections in development mode
  • Implements environment variable hooks to automatically configure project resources to send telemetry data to the collector

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/Shared/DevCertHostingExtensions.cs Shared utility for injecting ASP.NET Core HTTPS dev certificates into resources via environment variables
src/CommunityToolkit.Aspire.Hosting.OpenTelemetryCollector/OpenTelemetryCollectorSettings.cs Configuration settings class for the OpenTelemetry Collector
src/CommunityToolkit.Aspire.Hosting.OpenTelemetryCollector/EnvironmentVariableHook.cs Lifecycle hook to automatically configure project resources with collector endpoints
src/CommunityToolkit.Aspire.Hosting.OpenTelemetryCollector/CommunityToolkit.Aspire.Hosting.OpenTelemetryCollector.csproj Project file for the OpenTelemetry Collector integration
src/CommunityToolkit.Aspire.Hosting.OpenTelemetryCollector/CollectorResource.cs Resource definition for the OpenTelemetry Collector container
src/CommunityToolkit.Aspire.Hosting.OpenTelemetryCollector/CollectorExtensions.cs Extension methods for adding and configuring the OpenTelemetry Collector
CommunityToolkit.Aspire.slnx Solution file updated to include the new project

martinjt and others added 2 commits August 7, 2025 22:41
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@davidfowl davidfowl removed the awaiting response Waiting for the author of the issue to provide more information or answer a question label Aug 9, 2025
/// This method <strong>does not</strong> configure an HTTPS endpoint on the resource.
/// Use <see cref="ResourceBuilderExtensions.WithHttpsEndpoint{TResource}"/> to configure an HTTPS endpoint.
/// </remarks>
public static IResourceBuilder<TResource> RunWithHttpsDevCertificate<TResource>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't ship this in any of the intergrations as yet and I think we'd rewrite it at this point to behave a bit better given the advances in the model.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there something "wrong" in the design of how this has been implemented that should be revised? Having a way to bootstrap the dev certificate into a container has been a pain point with a few integrations and having them working with OTEL (DAB and Go Feature Flags off the top of my head).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can self-contain this inside of the Collector to reduce the impact? This was code taken from an example built by the Aspire team with a few amendments to allow some resilience.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's internal, so it'll only be available in the assemblies it's packaged with, and I want to use it for others, so I'm find with where it is situated.

I'm more questioning @davidfowl's comment, as I'm not sure what he's looking for or suggesting.

Copy link
Member

@aaronpowell aaronpowell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main change here would be to use the event callbacks of 9.4.

Then would you be able to introduce an example project so I can get a better idea of how this works.

/// <returns></returns>
public static IResourceBuilder<CollectorResource> WithAppForwarding(this IResourceBuilder<CollectorResource> builder)
{
builder.ApplicationBuilder.Services.TryAddLifecycleHook<EnvironmentVariableHook>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lifecycle hooks are mostly considered legacy with the event system now in place. The hook should be replacable with the OnAfterEndpointsAllocated callback.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to gather from the documentation whether the OnAfterEndpointsAllocated is resource specific, or part of the wider DistributedAppModel. My concern being that this needs to be applied after the Aspire code has added the dashboard environment variables.

/// <summary>
/// The image of the collector, defaults to ghcr.io/open-telemetry/opentelemetry-collector-releases/opentelemetry-collector-contrib
/// </summary>
public string CollectorImage { get; set; } = "ghcr.io/open-telemetry/opentelemetry-collector-releases/opentelemetry-collector-contrib";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

public const string Registry = "docker.io";
public const string Image = "ollama/ollama";

/// This method <strong>does not</strong> configure an HTTPS endpoint on the resource.
/// Use <see cref="ResourceBuilderExtensions.WithHttpsEndpoint{TResource}"/> to configure an HTTPS endpoint.
/// </remarks>
public static IResourceBuilder<TResource> RunWithHttpsDevCertificate<TResource>(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there something "wrong" in the design of how this has been implemented that should be revised? Having a way to bootstrap the dev certificate into a container has been a pain point with a few integrations and having them working with OTEL (DAB and Go Feature Flags off the top of my head).

@martinjt
Copy link
Author

I've done everything suggested other than moving the EnvironmentVariableHook. Since this needs to occur after all the resources had endpoints allocated, I'm struggling to see how I might achieve that.

I did have a thought of using OnAfterEndpointsAllocatedEvent, but that's deprecated, and then thought of using OnResourceEndpointsAllocated event on the collector resource, then access each resource to add that same event to add the endpoint, but I'm not sure how to access the eventing model from there?

@aaronpowell
Copy link
Member

Yes, sorry, I was mistaken, OnResourceEndpointsAllocated is the event that you'd want on the collector resource.

Is the logic that you'd have multiple collectors registered, or is it a singleton sort of thing?

@martinjt
Copy link
Author

The collector is a service that intercepts the OTLP telemetry from all the other services, so it intercepts all the other project resources and overrides the OTLP environment variable to point it to the collector instead.

For that reason, I thought that using the resource focused events wasn't something I could do.

This is instead of telling the user to access every resource and reference it to the collector.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add OpenTelemetry Collector extension
3 participants