Opened at 2025-01-16T14:22:00Z
Last modified at 2025-06-02T08:39:01Z
#4162 new enhancement
Infrastructure as Code to manage DNS configurations
Reported by: | btlogy | Owned by: | |
---|---|---|---|
Priority: | normal | Milestone: | undecided |
Component: | dev-infrastructure | Version: | n/a |
Keywords: | IaC | Cc: | |
Launchpad Bug: |
Description (last modified by btlogy)
Scope
AsIs: The DNS configurations of tahoe-lafs.org are manually managed by Meejah and/or Brian via the admin WebUI provided by the DNS registrar and hosting 3rd party Gandi.
The current DNS configurations lack visibility, reproducibility and agility, which makes them difficult, error-prone and slow to audit, review, change or improve.
ToBe: The DNS configuration would be declaratively defined in a version-controlled repository and deployed using automated workflows, based on the principle of Infrastructure as Code (IaC).
Value
- Contributors would be able to see the current configurations and propose changes using a well known workflow (pull request).
- Maintainers would be able to approve and deploy changes w/o interacting directly with the DNS provider.
- The configurations and the workflows would be consistent, repeatable, and easily auditable.
Requirements
- A fresh export of the DNS tahoe-lafs.org zone hosted by Gandi (optional)
- A valid Personal Access Token (PAT) to read/write this zone via the Gandi API
- Permissions to create/manage secrets in the infrastructure repository
- OpenTofu plan defining the current state in the existing infrastructure repository (WiP here)
- Automated workflow (e.g.: using GHA) to continuously integrate and deploy the plan (WiP here)
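The relationship between a zone export and a declared state can be sketched in a few lines. The Python below only illustrates the plan/apply idea and is not the OpenTofu configuration referenced above; the helper names (`parse_zone`, `plan`) and the records (documentation addresses under `example.org`) are hypothetical.

```python
from typing import NamedTuple


class Record(NamedTuple):
    name: str
    ttl: int
    rtype: str
    value: str


def parse_zone(text: str) -> set[Record]:
    """Parse simple 'name ttl IN type value' lines.

    Simplification: no $ORIGIN, no multi-line records, and ';' always
    starts a comment (so TXT values containing ';' are not supported).
    """
    records = set()
    for line in text.splitlines():
        line = line.split(";", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        name, ttl, klass, rtype, value = line.split(None, 4)
        if klass != "IN":
            raise ValueError(f"unsupported class: {klass}")
        records.add(Record(name, int(ttl), rtype, value))
    return records


def plan(current: set[Record], desired: set[Record]) -> dict[str, list[Record]]:
    """Return the records to create and to delete to reach the desired state."""
    return {
        "create": sorted(desired - current),
        "delete": sorted(current - desired),
    }


# Hypothetical data, not the real zone.
exported = """
example.org.      10800 IN A     192.0.2.10
www.example.org.  10800 IN CNAME example.org.
"""
declared = """
example.org.      10800 IN A     192.0.2.10
www.example.org.  300   IN CNAME example.org.
"""

changes = plan(parse_zone(exported), parse_zone(declared))
for action, records in changes.items():
    for record in records:
        print(action, *record)
```

Here only the TTL of the CNAME differs, so the plan contains one record to create and one to delete; a reviewer of a pull request would see exactly that difference before anything is deployed.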
Additional information
This enhancement is a very nice-to-have requirement for the execution of the MoveOffTrac project (in which we are already planning to re-use and expand the existing Infrastructure as Code repository):
And has already been discussed here:
In addition, it could help to make progress on these issues:
Change History (28)
comment:1 Changed at 2025-01-16T14:24:49Z by btlogy
- Description modified (diff)
comment:2 Changed at 2025-01-16T14:28:50Z by btlogy
- Description modified (diff)
comment:3 Changed at 2025-01-16T14:29:17Z by btlogy
- Keywords IaC added
comment:4 Changed at 2025-01-16T19:09:29Z by btlogy
comment:5 Changed at 2025-01-16T19:12:30Z by btlogy
- Description modified (diff)
comment:6 Changed at 2025-01-17T15:58:24Z by btlogy
- Description modified (diff)
comment:7 Changed at 2025-01-28T10:50:49Z by btlogy
If this is not already the case (and if it is still possible), it would be preferable to have the "tahoe-lafs.org" domain assigned to what Gandi defines as an organization (which does not have to be a legal entity).
Alternatively, the domain could stay assigned to the individual who first registered the name, because Gandi seems to treat each user account as a "user organization" anyway. However, the steps below would then give access to all other domains (and products) owned by that individual, which may not be a problem as long as all those resources are solely related to Tahoe-LAFS...
Here are the proposed steps, based on tests made in the Gandi sandbox:
- Create a team under the selected organization (e.g.: "DomainOps")
- Give this team only 2 additional permissions (on top of the default "View Organization") in the "Domains" scope: "See and renew domain names" and "Manage domain name technical configurations"
- Add members to this team by inviting them via their email address.
This should allow new members to create and rotate Personal Access Tokens, which will be used as secrets by (GitHub) runners to manage the DNS records (and nothing else).
Unfortunately, the sandbox does not allow the steps above to be fully tested. So it might be necessary to also give the team the "Manage personal access tokens" permission from the "Organization" scope...
comment:8 Changed at 2025-04-04T08:45:11Z by btlogy
This issue is blocking the deployment of the new services which should host the tickets, wiki and landing page after their migration out of Trac:
comment:9 Changed at 2025-04-07T18:16:30Z by meejah
I have explained many months ago that I'm fine with adding an A or AAAA record or two (via "click ops" in my Gandi account), but I don't believe I can create a token for just the delegated access I have for tahoe-lafs.org and will of course not be giving a token for my Gandi account, which controls many other domains.
Nobody has asked me to add such records. I was told you were talking to Brian about this.
(My understanding of the wider "migration" piece was that the next step was to weigh pros / cons of another self-hosted situation vs. cloud-hosted services. I'm not privy to any discussion that may have happened with Brian or others outside N&B and this tracker though)
comment:10 Changed at 2025-04-08T18:03:33Z by blaisep
In today's Nuts&Bolts, we suggested preparing the new content of the A, CNAME, AAAA, etc records and sending them to Meejah for a one-off change.
The more robust automation can be deployed separately. This decouples the architecture from the implementation details.
comment:11 Changed at 2025-04-08T20:11:03Z by btlogy
I would not have proposed the idea of an API key to delegate the access if it was not possible. But only the owner of the domain can act upon this. Hence the discussion with Brian.
Also, it's not going to be a single "one-off" change: there will be a few steps and possibly some urgent rollback(s) if things do not go as planned. Having someone else do the "click ops" synchronously (in another timezone) will increase downtime, be error-prone and make it needlessly harder for the person doing the migration.
comment:12 Changed at 2025-04-08T20:26:31Z by btlogy
If Brian is not available for this delegation and since Meejah cannot do it, maybe an alternative is to have the zone hosted elsewhere and change the NS records (separating the registrar from the hosting service).
If the automation is not deemed valuable in the long term, this option could be reverted after the migration (with only 2 one-off changes then).
comment:13 Changed at 2025-04-10T16:41:28Z by meejah
Can you describe what all these complex / real-time DNS changes are for?
I was imagining that any transition plan would look a lot more like: 1. set up new thing; 2. point DNS at new thing.
What's the best spot right now for an overview: pros, cons, alternatives etc? (Both so I know the right thing to look at, and for anyone reading this)
comment:14 Changed at 2025-04-15T18:14:41Z by blaisep
- Tasks:
Setup New Thing:
- @meejah extracts the list of host records in *.tahoe-lafs.org @ gandi
- @meejah reduces TTL to 60 (optional)
- @b3n prepares Hetzner with that list of hosts
Point DNS to New Thing
- @meejah updates gandi with the new NS server
Change gandi.net to point tahoe-lafs to the Hetzner server
Now:
`tahoe-lafs.org. 10800 IN NS a.dns.gandi.net.`
`tahoe-lafs.org. 10800 IN NS c.dns.gandi.net.`
`tahoe-lafs.org. 10800 IN NS b.dns.gandi.net.`
Later:
`tahoe-lafs.org. 60 IN NS hydrogen.ns.hetzner.com`
`tahoe-lafs.org. 60 IN NS helium.ns.hetzner.com`
`tahoe-lafs.org. 60 IN NS oxygen.ns.hetzner.com`
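One way to reason about the optional TTL reduction is to work out how long resolvers may keep serving the old NS set. The sketch below assumes resolvers honour the TTL of the zone's own NS records (10800 s now, 60 s after the reduction) and ignores the TTL of the delegation in the parent `org.` zone, which the registry controls. The timestamps and function names are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

OLD_TTL = 10800  # seconds, TTL of the current NS records shown above
NEW_TTL = 60     # seconds, reduced TTL proposed in the task list


def earliest_safe_switch(ttl_lowered_at: datetime, old_ttl: int = OLD_TTL) -> datetime:
    """After lowering the TTL, caches may hold the old value for up to old_ttl seconds."""
    return ttl_lowered_at + timedelta(seconds=old_ttl)


def worst_case_staleness(switch_at: datetime, ttl_in_caches: int) -> datetime:
    """Latest time at which a resolver may still answer with the pre-switch NS set."""
    return switch_at + timedelta(seconds=ttl_in_caches)


# Hypothetical timeline
lowered = datetime(2025, 4, 16, 9, 0, tzinfo=timezone.utc)
switch = earliest_safe_switch(lowered)
print("switch NS no earlier than: ", switch.isoformat())
print("old NS gone from caches by:", worst_case_staleness(switch, NEW_TTL).isoformat())
print("without lowering the TTL:  ", worst_case_staleness(lowered, OLD_TTL).isoformat())
```

Under these assumptions, lowering the TTL first shrinks the window in which a change (or a rollback) is only partially visible from three hours to about one minute.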
comment:15 Changed at 2025-04-15T20:45:25Z by btlogy
To clarify a bit:
- This issue is about defining the DNS configuration as code, not about migrating off Trac (though it would help doing this)
- The initial proposal described all the way above was to delegate the management of the DNS records using features provided by Gandi which is currently both the DNS registrar and the DNS hosting party. The new proposal described by Blaise in the comment above is an alternative way of (hopefully) achieving the same delegation, but by splitting the registrar from the hosting.
- The pros are the same as the ones listed in the initial description (see value), in addition to those new ones:
- separating the role of the registrar from the hosting one would likely reduce the disruption caused by losing (access to) one of those parties (e.g. the DNS zone could be easily migrated elsewhere).
- the steps to achieve the delegation seem slightly simpler than the ones described in comment:7: https://docs.gandi.net/en/domain_names/common_operations/changing_nameservers.html
- As for the cons and more alternatives, I would invite anyone willing to participate to describe those here in some new comments.
comment:16 Changed at 2025-04-15T21:02:17Z by btlogy
I was imagining that any transition plan would look a lot more like: 1. set up new thing; 2. point DNS at new thing.
Not if we want to smoothly integrate the services that will be replaced with the ones that will still be hosted on the Linode server (e.g.: valid certificates and working outgoing mail traffic).
Though, I suppose a couple of hours of blackout between step 1 and 2 and all legacy services left unreachable could work too.
comment:17 Changed at 2025-04-28T16:42:48Z by meejah
Can we not use HTTP-01 challenge for certificates? This does not require DNS changes, and is the default for Let's Encrypt AFAIK.
Adding self-hosting of email (and DNS?) seems like it goes the wrong way here; much of the "problem" being solved is that maintenance of self-hosted systems hasn't gone well for Tahoe-LAFS. Self-hosted CI rotted a while ago (i.e. nobody updated BuildBot, or its runners), and getting rid of self-hosted wiki+issues is much of the current "ask" here.
comment:18 Changed at 2025-05-01T20:24:39Z by btlogy
Can we not use HTTP-01 challenge for certificates? This does not require DNS changes...
Yes we can, and that's indeed the default approach, but HTTP-01 challenges do rely on having the DNS records changed so that Let's Encrypt can reach the server which needs a certificate.
Alternatively, we may try DNS-01 challenge to get a valid certificate for https://tahoe-lafs.org/ w/o changing the related CNAME yet. But either way, both require some DNS records to be changed.
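For reference, the two challenge types differ mainly in where the proof is published. Under RFC 8555, HTTP-01 serves the key authorization at a well-known URL on the host, while DNS-01 publishes a digest of it in a TXT record under `_acme-challenge`. The sketch below uses a made-up token and account-key thumbprint; a real ACME client computes these itself.

```python
import base64
import hashlib


def key_authorization(token: str, thumbprint: str) -> str:
    """RFC 8555 section 8.1: token and account-key thumbprint joined by a dot."""
    return f"{token}.{thumbprint}"


def http01_location(domain: str, token: str) -> str:
    """URL the CA fetches; the response body must be the key authorization."""
    return f"http://{domain}/.well-known/acme-challenge/{token}"


def dns01_record(domain: str, key_auth: str) -> tuple[str, str]:
    """TXT record name and value: base64url(SHA-256(key authorization)), unpadded."""
    digest = hashlib.sha256(key_auth.encode("ascii")).digest()
    value = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"_acme-challenge.{domain}.", value


# Placeholder values for illustration only
token = "example-token"
thumbprint = "example-thumbprint"
key_auth = key_authorization(token, thumbprint)

print(http01_location("tahoe-lafs.org", token))
name, value = dns01_record("tahoe-lafs.org", key_auth)
print(name, "IN TXT", f'"{value}"')
```

Either way the CA goes through DNS: for HTTP-01 it follows the A/AAAA/CNAME records to reach the server, for DNS-01 it queries the TXT record directly.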
Adding self-hosting of email (and DNS?) seems like it goes the wrong way here
As far as I remember, the outgoing email traffic from Trac is already self-hosted on the Linode server and a similar service will be required for the replacement of Trac (e.g.: email validation). And to make this work better than it currently does (see other tickets in the description), more DNS records will be required (e.g. DKIM).
much of the "problem" being solved is that maintenance of self-hosted systems hasn't gone well for Tahoe-LAFS. Self-hosted CI rotted a while ago (i.e. nobody updated BuildBot, or its runners)...
Then, let's try to make it easier for the nobodies who are willing to help here by managing the infrastructure as code.
getting rid of self-hosted wiki+issues is much of the current "ask" here.
The ask here is to manage DNS configurations from code and I've prepared a PR to make a step in that direction:
Hopefully this would help to replace the self-hosted wiki+issues with a solution that should be easier to manage, starting with the related DNS records.
comment:19 Changed at 2025-05-06T02:50:13Z by meejah
Concentrating on the wider point: my understanding of the entire "get off Trac" effort is to reduce self-hosting.
Instead, it seems like this all points to an increased burden: continuing to self-host a Trac replacement, hosting a Wiki / landing-page replacement, hosting redirection infrastructure and the addition of self-hosted/managed DNS as well. All while keeping (some of) the email self-hosted (the difficulties of self-hosting email were a main reason e.g. the mailing-list is hosted elsewhere: email simply wasn't getting through consistently).
comment:20 Changed at 2025-05-06T08:19:37Z by btlogy
Meejah: could you please post your comments about "get off Trac" on the correct issue: either #4095 for the requirement (where self-hosting was explicitly not excluded) or #4161 for the execution.
This issue is about "Infrastructure as Code to manage DNS configurations" which, in my opinion, could be really helpful for many issues.
My apologies if it was not clear in the above comments, but this issue is not proposing "the addition of self-hosted/managed DNS". The DNS zone is already managed by Gandi and since we've concluded that delegating its management programmatically with a token was unfeasible (there at the moment), the proposal is to have the zone managed by Hetzner instead.
The goal here (and in #4095) is rather to decrease the burden: it is not adding anything new, but just making what's already there easier to manage.
Meejah: could you please attach an export of the DNS zone tahoe-lafs.org currently hosted by Gandi?
comment:21 Changed at 2025-05-14T07:49:29Z by btlogy
During the last N&B (13th of May), Meejah said that:
- he was too busy to answer the related requests in Trac (or by email),
- he is unsure if he can export the content of the zone or change the name servers,
- he only knows that he can change A and AAAA records.
But he also clearly stated he will not do the changes described above w/o an explicit approval from Brian.
Recap:
- only Brian and Meejah can manage the DNS and Meejah only partly
- the initial ask for a token to read/write DNS records at Gandi could not be achieved w/o direct involvement of Brian.
- the workaround to host the zone elsewhere might be possible, assuming Meejah has the privileges, but he will not do it w/o approval from Brian anyway.
I still hope Meejah will upload the content of the zone, so we have a copy of that data somewhere, and/or comment on why he could/would not do it (there should not be any secrets in there IMO).
Meanwhile, I've created a new PR to manage only a sub-zone which should allow us to achieve the goals described above, albeit only for new services:
comment:22 Changed at 2025-05-14T08:38:14Z by btlogy
Meejah, could you please add the 3 following records:
of.tahoe-lafs.org. 3600 IN NS helium.ns.hetzner.de.
of.tahoe-lafs.org. 3600 IN NS hydrogen.ns.hetzner.com.
of.tahoe-lafs.org. 3600 IN NS oxygen.ns.hetzner.com.
Those name servers come from the Hetzner documentation and should allow us to manage the DNS configurations as described, albeit only for some new services rather than for all existing and future ones.
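The effect of these three records can be pictured as a longest-suffix match: names under `of.tahoe-lafs.org` are answered by the delegated name servers, everything else stays with the parent zone. The following is a simplified model (not a resolver), with the parent's name servers taken from the NS records quoted in comment:14. The function name and the `www` host names are hypothetical.

```python
DELEGATIONS = {
    "tahoe-lafs.org.": ["a.dns.gandi.net.", "b.dns.gandi.net.", "c.dns.gandi.net."],
    "of.tahoe-lafs.org.": [
        "helium.ns.hetzner.de.",
        "hydrogen.ns.hetzner.com.",
        "oxygen.ns.hetzner.com.",
    ],
}


def authoritative_zone(fqdn: str, delegations: dict[str, list[str]] = DELEGATIONS) -> str:
    """Return the most specific delegated zone that contains fqdn."""
    labels = fqdn.lower().rstrip(".").split(".")
    # Try the full name first, then strip one leading label at a time.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:]) + "."
        if candidate in delegations:
            return candidate
    raise LookupError(f"no known zone for {fqdn}")


for host in ("www.tahoe-lafs.org.", "www.of.tahoe-lafs.org.", "of.tahoe-lafs.org."):
    zone = authoritative_zone(host)
    print(f"{host:26} -> {zone:20} {', '.join(DELEGATIONS[zone])}")
```

Because the match works on whole labels, only names at or below `of.tahoe-lafs.org` move to the delegated servers; existing records in the parent zone are untouched.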
comment:23 Changed at 2025-05-15T11:44:02Z by hacklschorsch
I think the managed sub-zone is a good way forward.
It allows us to trial the GitHub-based CD setup. If we want, we can make parts of it more public/permanent as we go (i.e. by CNAME'ing www.tahoe-lafs.org to www.of.tahoe-lafs.org or so). And if all works well for long enough and we agree that would be useful, we can still use this to manage the 2nd level domain later.
comment:24 Changed at 2025-05-18T23:44:15Z by meejah
But he also clearly stated he will not do the changes described above w/o an explicit approval from Brian.
What I actually said is that I would not delegate the entire zone, nor give SSH access to Brian's machine without buy-in from Brian.
comment:25 Changed at 2025-05-21T10:35:31Z by hacklschorsch
From reviewing @btlogy's good work on this (see infrastructure#56), Hetzner seems to not have (at least official) support for hosting sub-zones.
We now have a working configuration, but it's not compliant with the spec - I can't say how bad that is / if that could come back to bite us later.
If we want to go DNS-spec-compliant, we could try another DNS provider that does support sub-zones properly. Here are two examples I picked from this list in the Let's Encrypt forum, both of which support configuration through OpenTofu:
- https://desec.io/ seems fully featured, has dnssec (mandatory even), and from the docs seems to support subdomain zones. They also are open source, Berlin-based, privacy focused, non-profit funded (i.a.) by NLnet and RIPE and the EU.
- https://dns.he.net/ Hurricane Electric is one of the cooler ones that do not require dnssec
comment:26 Changed at 2025-05-21T11:56:22Z by btlogy
Thank you hacklschorsch for this research.
Hetzner has confirmed by email that they support zones only for second-level domains, and that they do not recommend any changes to our current setup with regard to this limitation.
So, to be complete: another simple way to become compliant would be to delegate the whole domain as initially planned (see draft PR in infrastructure/pull#49)
One side effect of having the NS records available in the zone of the 2nd level domain and absent from the sub one is that scanning tools such as this one report failures:
https://mxtoolbox.com/SuperTool.aspx?action=dns%3aof.tahoe-lafs.org
Despite the feedback I got from the support of Hetzner, this might become a problem indeed.
comment:27 Changed at 2025-06-02T07:50:55Z by btlogy
In addition to the lack of (full) support for subdomain zones by Hetzner, I've also bumped into the lack of support for DNSSEC delegation for subdomains: most registrars, when they do support DNSSEC, only allow the owner to manage DS records for a (2nd level) domain (e.g. tahoe-lafs.org). This means the chain of trust will be broken for any subdomain (e.g. of.tahoe-lafs.org).
In short, it's best to avoid subdomain delegation whenever we can...
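The chain-of-trust argument can be modelled roughly as follows: a validating resolver can only treat a zone as secure if every delegation on the path from the root carries a DS record in the parent. This is a deliberately simplified model (it ignores signatures, key rollover and the distinction a validator makes between insecure and bogus answers), and the set of published DS records below is hypothetical.

```python
def zone_path(zone: str) -> list[str]:
    """'of.tahoe-lafs.org.' -> ['org.', 'tahoe-lafs.org.', 'of.tahoe-lafs.org.']"""
    labels = zone.rstrip(".").split(".")
    return [".".join(labels[i:]) + "." for i in range(len(labels) - 1, -1, -1)]


def is_secure(zone: str, ds_in_parent: set[str]) -> bool:
    """True only if each zone on the path has a DS record published in its parent."""
    return all(z in ds_in_parent for z in zone_path(zone))


# Hypothetical situation: a DS record can be published for the 2nd-level
# domain, but there is no way to publish one for the delegated sub-zone.
ds_published = {"org.", "tahoe-lafs.org."}

print(is_secure("tahoe-lafs.org.", ds_published))     # True
print(is_secure("of.tahoe-lafs.org.", ds_published))  # False
```

In this model the sub-zone cannot be validated no matter how it is signed, because the link from its parent is missing.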
comment:28 Changed at 2025-06-02T08:39:01Z by btlogy
I've quickly tested the Free DNS service (https://dns.he.net/) of Hurricane Electric:
- I do not see any documented API (and no way to create a token for it),
- I have found only one (rather old) TF provider which allows only to update (existing ?) A and AAAA records,
- I have not managed to create the of.tahoe-lafs.org subdomain even manually (it seems like we need to delegate first!?).
Even if they supported subdomains better than Hetzner, Hurricane Electric does not seem to be a match for IaC.
The Gandi API allows creating Personal Access Tokens with some granularity in terms of: