DNS is perhaps one of the biggest sources of frustration for our clients (trumped perhaps only by e-mail). These issues mostly stem from the way DNS caching works.
Since every domain on the internet requires DNS to function, there are tons of DNS queries flying around all the time. DNS was engineered to be very fast (it primarily uses UDP instead of TCP), and since the actual DNS records (known as “zones” when you’re referring to a set for a domain) are just text and a number that don’t usually change very often, it’s very common to cache DNS records at multiple levels.
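First, a bit about how DNS works. When you request a URL like ‘nexcess.net’, your computer doesn’t know how to connect to ‘nexcess.net’ directly, because lower-level network components address each other by number and not by name. An IPv4 address is four 8-bit numbers, which is why IPs in IPv4 only go up to 255.255.255.255 (FF.FF.FF.FF in hexadecimal). DNS is the system that translates the name into one of those numeric addresses.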
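To make the 255-versus-FF point concrete, here is a small Python sketch using the standard `ipaddress` module. The helper name `to_dotted_hex` is made up for this post, and 192.0.2.1 is just a documentation address, not a real server.

```python
import ipaddress


def to_dotted_hex(ip: str) -> str:
    """Render an IPv4 address as four hexadecimal bytes (hypothetical helper)."""
    addr = ipaddress.IPv4Address(ip)
    # .packed is the address as 4 raw bytes; each byte can only hold 0-255 (00-FF).
    return ".".join(f"{byte:02X}" for byte in addr.packed)


# Under the hood the whole address is a single 32-bit number.
largest = int(ipaddress.IPv4Address("255.255.255.255"))

print(to_dotted_hex("255.255.255.255"))  # FF.FF.FF.FF
print(to_dotted_hex("192.0.2.1"))        # C0.00.02.01
print(largest == 2**32 - 1)              # True
```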
It’s also important to get some terminology down. The “resolver” is the client, which is usually an operating system component on your computer or one DNS server asking another DNS server. A “recursive” nameserver is what ends up doing a lot of the legwork for requests that it doesn’t have in cache: it recursively queries all the other nameservers needed to resolve your DNS request into an IP address that your computer understands. Recursive nameservers are what your ISP operates and what your home or office router likely has built in. An “authoritative” nameserver is the source; it basically says “I have the master set of records for so-and-so domain!”. The primary authoritative nameserver is named in the SOA record of a DNS zone, and the full set is listed in the zone’s NS records, if you’re curious. The “timeout” or “TTL” value of a DNS record is the “Time to Live”, and it specifies (in an ideal world) how long the DNS record is valid before the authoritative nameserver should be consulted for a fresh record, in case it changed. A typical default value here is 14400; since the TTL is specified in seconds, 14400 would be 4 hours.
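As a rough illustration of what a TTL means to a cache, here is a minimal Python sketch. The `CachedRecord` class, its field names, and the 192.0.2.10 address are all invented for this example; real resolvers track this differently, but the arithmetic is the same.

```python
from dataclasses import dataclass


@dataclass
class CachedRecord:
    """A cached DNS answer (hypothetical structure, for illustration only)."""
    name: str
    address: str
    ttl_seconds: int
    cached_at: float  # seconds on some clock, e.g. time.time()

    def is_fresh(self, now: float) -> bool:
        # The record may be reused until its TTL has elapsed.
        return (now - self.cached_at) < self.ttl_seconds


record = CachedRecord("example.com.", "192.0.2.10", ttl_seconds=14400, cached_at=0.0)

print(record.ttl_seconds / 3600)       # 4.0 (hours)
print(record.is_fresh(now=3 * 3600))   # True: 3 hours old, still valid
print(record.is_fresh(now=5 * 3600))   # False: past the 4-hour TTL, ask again
```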
When you ask your web browser to find ‘nexcess.net’, it will probably look in the fastest cache it has: the browser cache in RAM. If it can’t find the IP of ‘nexcess.net’ in the browser cache, it’ll pass the DNS request to the operating system, which has a cache of its own. If your operating system doesn’t have the DNS record, it will ask whatever DNS servers have been configured (on Linux, this is typically configured via /etc/resolv.conf; on Windows, via Start -> Run -> ncpa.cpl). That may be your local network nameserver (if one is configured, as is typical on a Microsoft Active Directory network and most other NAT-type environments) or the nameserver of your ISP. The nameserver specified in your operating system DNS configuration will almost always be a recursive resolver.
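The lookup order described above can be sketched in a few lines of Python. This is a simplification under stated assumptions: the caches are plain dictionaries, TTLs are ignored, and `resolve`, `fake_recursive_lookup`, and the 192.0.2.20 address are hypothetical stand-ins rather than anything a real browser or operating system exposes.

```python
from typing import Callable, Dict, List, Tuple

Cache = Dict[str, str]


def resolve(
    name: str,
    caches: List[Tuple[str, Cache]],
    recursive_lookup: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (address, where_it_came_from), checking the fastest cache first."""
    for label, cache in caches:
        if name in cache:
            return cache[name], label
    # Nobody local knew the answer, so ask the configured recursive nameserver.
    address = recursive_lookup(name)
    for _, cache in caches:
        cache[name] = address  # remember the answer at every level
    return address, "recursive nameserver"


def fake_recursive_lookup(name: str) -> str:
    # Stand-in for a real network query; the address is a documentation IP.
    return "192.0.2.20"


browser_cache: Cache = {}
os_cache: Cache = {}
layers = [("browser cache", browser_cache), ("operating system cache", os_cache)]

first = resolve("nexcess.net", layers, fake_recursive_lookup)
second = resolve("nexcess.net", layers, fake_recursive_lookup)
print(first)   # ('192.0.2.20', 'recursive nameserver')
print(second)  # ('192.0.2.20', 'browser cache')
```

The second call never leaves the machine, which is exactly why stale answers can linger at several levels at once.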
Another important thing to understand is how DNS is actually resolved. If you’ve ever worked with the popular DNS server daemon BIND, you know that failing to put a ‘.’ at the end of your domain will result in your DNS zone not working right. This is because BIND treats a name without the trailing ‘.’ as relative to the zone and appends the zone name to it. The full domain name ends at “the root”, which sits above ‘com’, ‘org’, ‘net’ and all the other top-level domains (“TLDs”). When you look for ‘www.nexcess.net’, a root DNS server would have a record for ‘net.’ pointing to the servers for that TLD, which would point to a server that is authoritative for ‘nexcess.net.’, which would have a record for ‘www.nexcess.net.’ (or for whatever server is authoritative for it), and so on. It’s common practice to just set up a CNAME record [alias] for the ‘www’ sub-domain to point to the main website. The ‘www’ convention started before the graphical web was popular, when it was typical for servers to run things other than the world-wide web directly on their main domain. ‘www’ isn’t anything special beyond a commonly used subdomain.
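If it helps to see the root-first walk written out, here is a short Python sketch that lists the names a recursive nameserver works through, each with its trailing dot. The function name `delegation_chain` is made up for this post, and the sketch only manipulates strings; it performs no actual DNS queries.

```python
from typing import List


def delegation_chain(name: str) -> List[str]:
    """List the fully qualified names from the root down to `name`."""
    labels = [label for label in name.rstrip(".").split(".") if label]
    chain = ["."]  # the root sits above every TLD
    # Walk right to left: TLD first, then the domain, then any sub-domains.
    for start in range(len(labels) - 1, -1, -1):
        chain.append(".".join(labels[start:]) + ".")
    return chain


print(delegation_chain("www.nexcess.net"))
# ['.', 'net.', 'nexcess.net.', 'www.nexcess.net.']
```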
Now that you hopefully understand a bit about DNS, I’ll let you in on a little secret: lots of caching nameservers don’t honor TTL values properly. The reasoning behind this is something along the lines of “domains don’t change hands very often, and when they do, a day or so is an acceptable time to wait”. Therefore, such a DNS server will ignore TTLs of less than X seconds and hold on to the record for X seconds instead. X really varies from ISP to ISP, but one common value is 4 hours. Another is 24 hours. Some very high-traffic nameservers in other countries might ignore a TTL under 48 hours. If you’re not seeing the problem yet, let me explain:
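In code, the behavior amounts to clamping the TTL to a floor. This is only a sketch of the idea; `effective_ttl` and its parameter names are invented here, and the real setting (where one exists) differs from one nameserver product to the next.

```python
HOUR = 3600


def effective_ttl(record_ttl: int, minimum_ttl: int) -> int:
    """How long a cache with a TTL floor actually keeps a record (hypothetical)."""
    # TTLs below the floor are ignored; longer TTLs are honored as-is.
    return max(record_ttl, minimum_ttl)


print(effective_ttl(4 * HOUR, minimum_ttl=0) // HOUR)           # 4  (well-behaved cache)
print(effective_ttl(4 * HOUR, minimum_ttl=24 * HOUR) // HOUR)   # 24 (floor wins)
print(effective_ttl(48 * HOUR, minimum_ttl=24 * HOUR) // HOUR)  # 48 (record TTL wins)
```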
Let’s say you host your DNS records with us for the website ‘example.com’. ‘example.com’ uses nameservers dns1-1.nexcess.net and dns1-2.nexcess.net. The TTL of all the DNS records is set to 14400 seconds (4 hours). User A requests the site from his ISP, which is 4 hops away from the actual server hosting ‘example.com’. Since none of his operating system, local nameserver, or ISP nameserver knows the IP of ‘example.com’ (it’s the first time he’s tried to visit it after finding it on Google), his ISP’s caching nameservers reach out all the way to the nexcess.net nameservers, dns1-1.nexcess.net and dns1-2.nexcess.net. They reply with the IP address for example.com and the TTL value of 4 hours. User A’s ISP caches these results for faster lookup next time.
User B, meanwhile, had already been to ‘example.com’ before User A moved it over to Nexcess a few hours ago, so she has the old records cached in her operating system, on her router, and at her ISP. It’s been 6 hours since ‘example.com’ moved, so you’d think that all the caches would have expired by now, but you’d be wrong. User B’s ISP thinks that caching DNS records for less than 24 hours is ridiculous, and caches them for 1 day if the TTL is set below 1 day. User B tries to go to ‘example.com’, but all she gets is an “Account Deleted” page from User A’s old web host, which deleted the account after User A moved to Nexcess and cancelled the old hosting. User B promptly sends User A an e-mail explaining that his website is broken, and User A ends up opening a ticket with Nexcess about the issue, which eventually leads them to write a blog post about it.
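For the curious, User B’s situation can be checked with a bit of arithmetic in Python. One number here is an assumption: the story doesn’t say when her ISP cached the old record, so this sketch supposes it happened 1 hour before the move. The function `serves_stale` is likewise made up for illustration.

```python
HOUR = 3600


def serves_stale(cached_at: int, moved_at: int, now: int,
                 record_ttl: int, minimum_ttl: int = 0) -> bool:
    """True if a cache is still handing out a record fetched before the move."""
    lifetime = max(record_ttl, minimum_ttl)  # a TTL floor stretches the lifetime
    still_cached = (now - cached_at) < lifetime
    return cached_at < moved_at and still_cached


moved_at = 0
cached_at = -1 * HOUR  # assumption: cached 1 hour before the move
now = 6 * HOUR         # 6 hours after the move, as in the story

honest_isp = serves_stale(cached_at, moved_at, now, record_ttl=4 * HOUR)
stubborn_isp = serves_stale(cached_at, moved_at, now, record_ttl=4 * HOUR,
                            minimum_ttl=24 * HOUR)

print(honest_isp)    # False: the 4-hour TTL expired, so the new IP is fetched
print(stubborn_isp)  # True: the 24-hour floor keeps the old IP alive
```

Under that assumption, User B keeps landing on the old host until 23 hours after the move, no matter what TTL the zone advertises.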