Matrix delegation and how it may bite you
Originally published on January 14, 2024
Last modified on August 21, 2024
For those who don’t know, delegation in Matrix is used in server-to-server
communication to figure out which server serves a given domain. As an example,
if my own Matrix homeserver were running on matrix.dujemihanovic.xyz instead of
dujemihanovic.xyz, I could delegate the latter to the former to save anyone
wanting to contact me from having to type out the matrix. prefix.
Besides the domain name, delegation can also specify which port to use
for server-to-server communication. The default is 8448, and if it’s blocked
you can delegate server-to-server traffic to 443, the port client-to-server
traffic uses by default. However, if you can, I’d strongly suggest using 8448! I
delegated S2S to 443 for almost the whole time I have had this server, for no
good reason, and it seems that this caused an extremely weird issue with a
certain room.
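To make the port rule concrete, here is a minimal Python sketch of how a delegated target such as matrix.dujemihanovic.xyz:443 splits into a host and a port, falling back to 8448 when no port is given. The function name is made up for illustration, and the sketch deliberately skips the SRV record lookup that real homeservers perform before falling back to the default port.

```python
from __future__ import annotations

# Default federation port when a delegated target names no port.
DEFAULT_FEDERATION_PORT = 8448


def split_delegated_target(target: str) -> tuple[str, int]:
    """Split 'host[:port]' into (host, port).

    Hypothetical helper, not part of any Matrix library. SRV lookups are
    ignored here; a missing port simply becomes 8448.
    """
    host, sep, port = target.rpartition(":")
    # No colon at all, or the last colon sits inside a bracketed IPv6
    # literal such as "[::1]": treat the whole string as the host.
    if not sep or port.endswith("]"):
        return target, DEFAULT_FEDERATION_PORT
    return host, int(port)


if __name__ == "__main__":
    print(split_delegated_target("dujemihanovic.xyz"))
    print(split_delegated_target("matrix.dujemihanovic.xyz:443"))
```

Delegating to 443 and delegating to 8448 differ only in the value this function would return; what matters is whether other servers can actually reach that port.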
What happened?
Message fetching kept breaking. When I joined the room, everything would work fine for the first few messages, but at some point I would start getting notifications without any new message showing up in the room. Logging out and back in would bring the missing messages into my client, but then the same cycle would repeat no matter how many times I logged out and back in (this also happened on clients other than Element desktop). To confirm that my homeserver was the issue, I joined the room with my old matrix.org account, and sure enough that worked just fine.
I tried the usual things such as restarting Dendrite and the whole VPS, but to no avail. I was pretty insistent that the issue was not with my homeserver but with the main server hosting the room (which, unsurprisingly, turned out to be false), so I gave up on it. The eye-opening moment came while I was reading the Conduit documentation (I had considered migrating to it), specifically this:
If Conduit runs behind Cloudflare reverse proxy, which doesn’t support port 8448 on free plans,
This implies that routing server-to-server traffic to 443 should only be done
if it’s absolutely impossible to use 8448 for this, and the Synapse
documentation said something similar:
However, if your homeserver’s APIs aren’t accessible on port 8448 and on the domain server_name points to, you will need to let other servers know how to find it using delegation.
Fixing the issue
Encouraged by this, I fixed up my server:
- allow port 8448 in ufw
- add something like this to Caddyfile:

    dujemihanovic.xyz:8448 {
        reverse_proxy /_matrix/* localhost:8008
    }

- change /.well-known/matrix/server to point to dujemihanovic.xyz:8448 (in theory, I could have gotten rid of that return directive altogether as 8448 is default anyway, but I still chose to specify it just to be safe)
- reload caddy and restart dendrite (the latter is, again, just to be safe)
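After a change like this, it can be worth checking what /.well-known/matrix/server actually returns. The following is a rough Python sketch, not something I ran as part of the fix: both function names are made up for illustration, and it assumes the file is served over HTTPS on the server name and contains a JSON object with an m.server key, as the Matrix specification describes.

```python
from __future__ import annotations

import json
import urllib.request


def fetch_well_known(server_name: str, timeout: float = 10.0) -> str:
    """Download the raw /.well-known/matrix/server body (hypothetical helper)."""
    url = f"https://{server_name}/.well-known/matrix/server"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def delegated_target(body: str) -> str | None:
    """Return the m.server value from a well-known body, or None if absent."""
    data = json.loads(body)
    target = data.get("m.server") if isinstance(data, dict) else None
    return target if isinstance(target, str) else None


if __name__ == "__main__":
    # Performs a real network request, so run it only against your own server.
    print(delegated_target(fetch_well_known("dujemihanovic.xyz")))
```

If the printed value is not the host and port you expect, other homeservers are most likely still being sent to the old port.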
Once all this was done, the room finally started acting normally.
Small sidenote
I must note that delegating federation to 443 should not cause breakage like
this. It still did in my case, which is why I wrote about it anyway. It’s very
unlikely that you will be affected by this issue, but I still believe it’s worth
pointing out in case you are.