Sometimes you’re trying to fix a problem for ages (months in our case) and while the solution is really simple, you only find it by complete accident and while looking for something completely different.

(And yes, I do think that we need to finally get a network admin to take care of those things) 

For several months, our Exchange server “randomly” denied communicating with several of our partner’s mail servers. There were several of our partners who were not able to send us email and their emails would always bounce, although we could communicate wonderfully with the rest of the world. What was stunning is that there wasn’t any apparent commonality between the denied senders and the problem came and went and sometimes it would work and sometimes it wouldn’t.

First we thought that something was broken about our DNS entries and specifically about our MX record and how it was mapped to the actual server host record. So we reconfigured that – to no avail. Then we thought it’d be some problem with the SMTP filters in the firewall and spent time analyzing that. When that didn’t go anywhere, we suspected something was fishy about the network routing – it wasn’t any of that either. I literally spent hours looking at network traces trying to figure out what the problem was – nothing.

Yesterday, while looking for something totally different, I found the issue. Some time ago, during one of the email worm floods, we put in an explicit “deny” access control entry into the SMTP service for one Korean and one Japanese SMTP server that were sending us hundreds of messages per minute. The error that we made was to deny access by the server DNS name and not by their concrete IP address.

What happened was that because of this setting our SMTP server would turn around and try to resolve every sender’s IP address back to a host name to perform that check and that’s independent of the “Perform reverse DNS lookup on incoming messages” setting in the “Delivery”/“Advanced Delivery” dialog. It would then simply deny access to all those servers for which it could not find a host name by reverse lookup. I removed those two entries and now it all works again.

Of course, the error isn’t really ours, but the problem was. What’s broken is that the whole reverse DNS lookup story is something that seems (is) really hard to set up and that quite a few mail servers simply don’t reversely resolve into any host name. DNS sucks.

Updated: