Matrix delegation and how it may bite you

Originally published on January 14, 2024
Last modified on January 15, 2024

For those who don’t know, delegation in Matrix is used in server-to-server communication to figure out which server serves a given domain. For example, if my own Matrix homeserver were running on matrix.dujemihanovic.xyz instead of dujemihanovic.xyz, I could delegate the latter to the former, saving anyone who wants to contact me from having to type out the matrix. prefix.

Besides the domain name, delegation can also specify which port to use for server-to-server communication. The default is 8448; if that port is blocked, you can use delegation to route server-to-server traffic over 443, the port client-to-server traffic uses by default. However, if you can, I’d strongly suggest using 8448! For no good reason, I had been delegating S2S to 443 for almost the whole time I have had this server, and it seems that this caused an extremely weird issue with a certain room.

What happened?

Message fetching kept breaking. When I joined the room, everything would work fine for the first few messages, but at some point I would start getting notifications without any new message appearing in that room. I noticed that logging out and back in would bring the missing messages into my client, but then the aforementioned cycle would repeat no matter how many times I logged out and back in (this also happened on clients other than Element desktop). To confirm that my homeserver was the issue, I joined the room with my old matrix.org account, and sure enough, that worked just fine.

I tried the usual things, such as restarting Dendrite and the whole VPS, but to no avail. I was pretty insistent that the issue was not with my homeserver but with the main server hosting the room (which, unsurprisingly, turned out to be false), so I gave up on that. The eye-opening moment came when I was reading the Conduit documentation (I had considered migrating to it), specifically this:

If Conduit runs behind Cloudflare reverse proxy, which doesn’t support port 8448 on free plans,

This implies that routing server-to-server traffic to 443 should only be done if it’s absolutely impossible to use 8448 for this, and the Synapse documentation said something similar:

However, if your homeserver’s APIs aren’t accessible on port 8448 and on the domain server_name points to, you will need to let other servers know how to find it using delegation.
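
For reference, delegation of this kind is commonly published as a small JSON document at /.well-known/matrix/server on the delegated domain, with an "m.server" key naming the host (and optionally the port) to contact. Below is a minimal Python sketch of how a remote server might read that value. The function name parse_delegation is made up for illustration, and the sketch simplifies the real discovery process, which also consults SRV records before falling back to 8448.

```python
import json
from typing import Tuple

DEFAULT_FEDERATION_PORT = 8448  # Matrix default for server-to-server traffic


def parse_delegation(well_known_body: str) -> Tuple[str, int]:
    """Return (host, port) from a .well-known/matrix/server response body.

    Hypothetical helper for illustration only. Simplification: when no port
    is given, real servers try an SRV lookup first; here we go straight to
    the default port.
    """
    target = json.loads(well_known_body)["m.server"]
    host, sep, port = target.rpartition(":")
    if sep and port.isdigit():
        # Explicit port, e.g. "matrix.dujemihanovic.xyz:443"
        return host, int(port)
    # No explicit port: fall back to the default federation port
    return target, DEFAULT_FEDERATION_PORT


# Delegating dujemihanovic.xyz to a matrix. subdomain on port 443
print(parse_delegation('{"m.server": "matrix.dujemihanovic.xyz:443"}'))
# Delegating the hostname only, leaving the port at its default
print(parse_delegation('{"m.server": "matrix.dujemihanovic.xyz"}'))
```

The first call corresponds to the kind of setup I had been running (federation on 443); the second only changes the hostname and keeps 8448.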

Fixing the issue

Encouraged by this, I fixed up my server:

dujemihanovic.xyz:8448 {
        reverse_proxy /_matrix/* localhost:8008
}

Once all this was done, the room finally started acting normally.
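
If you want to check a setup like this from another machine, one option is to query the federation version endpoint (/_matrix/federation/v1/version) directly on port 8448. The following is a rough Python sketch, not something I ran as part of the fix. The helper names are made up, and it assumes the port is reachable from wherever you run it and that the server presents a valid TLS certificate.

```python
import json
import urllib.request


def federation_version_url(server_name: str, port: int = 8448) -> str:
    """Build the URL of the federation version endpoint (hypothetical helper)."""
    return f"https://{server_name}:{port}/_matrix/federation/v1/version"


def fetch_federation_version(server_name: str, port: int = 8448,
                             timeout: float = 10.0) -> dict:
    """Fetch and decode the JSON the homeserver returns on its federation port.

    Raises urllib.error.URLError if the port is blocked or TLS fails.
    """
    url = federation_version_url(server_name, port)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Needs network access; prints whatever the homeserver reports about itself.
    print(fetch_federation_version("dujemihanovic.xyz"))
```

If the request times out or is refused, port 8448 is probably not reachable from outside, and delegating to 443 may be the only option after all.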

Small sidenote

I must note that delegating federation to 443 should not cause breakage like this. It still did in my case, though, which is why I wrote about it anyway. It’s very unlikely that you will be affected by this issue, but I still believe it’s worth pointing out in case you are.