The following is work-in-progress. Part of the point of a blog is to look for feedback. So feedback is greatly appreciated. Its also to keep others from making my mistakes when thinking about the problem. Additionally, there may be things wrong in this post. Anything wrong is my fault. Feedback can be sent in email to nweaver at ICSI dot Berkeley dot EDU. Minor edits added 8/20
This post will focus on DNS blind attacks, detection strategies, and responses for protecting DNS resolvers. But first some background. There are at least four major properties of DNS attacks.
Blind vs Aware: In a blind attack, the attacker does not know the entropy sources (eg, TXID, port selection, captialization) of the request. In an aware attack, the attacker knows this information.
Transparent vs Blocking: In a transparent attack, the attacker is unable to block the normal authoritative server's response. In a blocking attack, the attacker can block the DNS server's response.
Direct Record vs glue: In a Direct Record attack, the attacker is only interested in corrupting the actual query, eg, that bad.foo.com point to EVIL. In a glue attack, the attacker is interested in producing corrupted answer, authoritative or additional records which are accepted by the server beyond the server's request, eg, that 123.foo.com points to target.foo.com and target.foo.com points to EVIL.
Transaction vs cache: In a transaction attack, the attacker only needs to corrupt the client which made the actual query. A cache attack requires that the resolver not just forward the attacker's value, but also cache it to use for future responses.
For example, Kaminski's attack is a "Blind Glue Cache" attack: the goal is to use a blind attack to cause the resolver to cache a bad authority or additional record, causing subsquent lookups for the target domain to go to the attacker's "authoritative" server.
There are many MANY variants possible, but I believe (if I'm wrong, PLEASE tell me) that all require the victim resolver to cache more than just the requested mapping of A goes to some value, because this enables the attacker to "Race until win against garbage" instead of "Race once and retry after TTL expiration". Normally, it is also a "Transparent" attack, but it could also involve a DOS of the authoritative server or enough traffic that the server's authoritative response is not returned.
Likewise, a man-in-the-middle is a "Aware and Blocking" attacker: he sees all packets and can ensure that only his response arrives, and nothing short of DNSSEC (or equivelent) can prevent a man-in-the-middle attack on DNS.
Blind attacks are actually easy to detect. Any blind attack which has a probability of succeeding will generate a large number of unsolicited DNS replies: DNS answers where the reply does not correspond to a request because it uses a different transaction ID and/or port. In our in-progress DNS cache poisoning detector, dns-poison.bro, these alerts are collected in the dns_weirds.log as "Unsolicited DNS Replies". I'd assume there is a similar snort rule available. (Note, the bro code is VERY experimental, and should NOT be viewed as authoritative. It is included mostly for interest and experimentation).
However, there are some issues that arise. To begin with, there can be false positives in the IDS. The first (marked in dns_weirds.log as "Since there has been no previous DNS traffic on this 4-tuple, we are ignoring it") can occur when the DNS requesting packet was dropped by the monitor so the IDS only sees the reply. The second is that a nontrivial number of servers (~100 when seen from ICSI over a roughly 10 day period) will generate this alert spontaneously, sending invalid replies to accompany valid replies. Thus a LOW rate of alerts is to be expected.
But a bigger problem is how to ACT on this information. There have been many proposals (I've been guilty of one myself) that by maintaining a count of failures per remote server, one could build a detection and response cycle against blind attacks into the DNS resolver without having to look for more entropy than the transaction ID.
This could work for a blind attacker who's targeting a particular authority: It doesn't require much stateholding (a single counter per authoritative server), and the response is simple: void the cache for any requests involving this authority (block "blind cache" attacks, regardless of glue policy or transparency), and observe that only limited scenarios benefit from a blind, transactional attack. [1]
However, there is a blind attack proposed by Masataka Ohta, poisted out to me by Wouter Wijngaards which can defeat such a detection and response cycle when only 16 bits of entropy (just the DNS transaction ID) is used, by generating poisoning requests targeting different domains when winning a single domain is sufficient for the attacker's goal.
As a real example, assume that the attacker has a list of 1000 domains, with targeting any single domain sufficient for his goal. At a single victim, he generates 1000 queries, on 1000 different authoritative servers, and also launches just 1000 blind responses. With just 16 bits of entropy, there is a 1.5% chance of success, and the successful packet will never tigger the alert. Thus I was WRONG: although such fine-grained detection is suitable for the IDS, it is NOT suitable for the DNS resolver as a substitute for more entropy.
But good entropy is (mostly) sufficient for countering blind attacks. If there are 32 bits of entropy (from a combination of a strongly random transaction ID, and a random source port), a blind attacker who wants a .1% chance of success would need to send rougly 4 million packets. The exact odds of success for the attacker with N bits of entropy and K attacker packets is 1 - (1 - 2^-N)^K. At 100B per packet, this would require sending 400 MB of data.
Add in a random selection of authoritative resolver (what unbound does to gain 2b of entropy) and random capitalization (David Dagon's 0x20 method) and you can easily get to 40b of entropy. Now just that .1% probability of success requires sending 1 billion packets, or 100 GB of data. Even .01% of success requires 10 GB of data.
Such an attack represents a significant DoS. If an attacker can send 100 GB of data, with spoofed source address, at your DNS resolver, to have just a .1% chance of corrupting a single cache entry, the attacker can instead make a lot more money sending 100 GB of data at a blackmail victim.
Yes, this represents a probabilistic bounds, but it is a very good bounds. Bots are a valuable resource, especially Bots capable of delivering spoofed IP packets. They can send spam, look for stolen credentials, or perform lots of other mayhem. Yet Bots which DoS are more likely to be removed, by both the ISP and end users due to the traffic. Attackers with that sort of resource aren't interested in just petty damage: they have reasons for their actions and wasting money is not one of them. "You don't need to outrun the bear, just the guy next to you".
Plus (Warning: Unbaked Speculation Ahead, feedback wanted!) one might be able to set an arbitrary bound on preventing blind, cache attacks, including Ohta's variant, by keeping a single counter of ALL unsolicited replies (rather than a per-server count which does not work). If it ever reaches the threshhold where an attacker has an X% chance of at least one success, simply invalidate the entire cache, because you don't know what's been corrupted.
This would effect a nontrivial increase in latency and traffic as the cache is repopulated, but even with a very sensitive threshold (eg .01%) and moderately poor entropy (32b), an attacker would need to send 40 MB of spoofed data just to degrade the cache performance a bit. If the cache had 1M active entries it needed to subsequently refetch, the attack would generate a few hundred MB of additional traffic to the DNS resolver from the additional requests. So there would be some amplification, but only a moderate (roughly 4x) amount to gain a very tight bounds on the blind attack.
For 40b of entropy, the attacker would have to send nearly 10 GB of data to meet this very sensitive threshold, yet would still only require fetching an additional few hundred MB of data back into the cache, a trivially small amount of amplification: the 10 GB DOS will be a far bigger effect on the server.
Finally (Warning: Still Rather Unbaked Speculation Ahead. Feedback wanted) there are many proposals to duplicate requests to increase entropy. For example David Dagon has proposed using duplication when a remote authoritative server doesn't support 0x20. I believe this can be useful also if the DNS server is behind a NAT or firewall that rewrites ports. If your resolver is behind a firewall or NAT you lose the entropy present in the port number. Since 32b of entropy is a bare minimum, and 40b is comfortable, losing nearly 16b of entropy is a big deal, and something needs to be done to bring the entropy up to a comfortable level.
Fortunatly, a resolver can check if its behind a NAT if there is a special authoritative nameserver. We have an ugly prototype: port.{anything}.nettest.icir.org will return the UDP port in the lower 16 bits of the returned address. This does create a phone-home, but a resolver implementation could use such a special name to check for the presence of a NAT or other device which rewrites UDP ports. (Dispite feeling otherwise, NATs and firewalls are not actually malicious, and would not benefit from whitelisting such a test.)
Now a resolver could respond in the following way: Instead of a single query, it generates two queries for the same value, returning the first response it gets, but only caching the response if both are identical. This should protect the cache, as forcing the attacker to match both queries effectively doubles the entropy. Thus if there are only 20 bits of entropy (a 16b ID plus 4 characters of capitalization/resolver selection, because the resolver is behind a NAT), for purposes of cache poising, the blind attacker now faces effectively 40 bits of entropy.
However, this reduces protection against a blind transaction attack. For purposes of corrupting the transaction, the attacker only needs to win one race out of two. Thus instead of facing 20b of entropy for corrupting the transaction, the attacker only needs 19b, as it doubles the chance for success. Since I can come up with only limited scenarios (dealing with already broken websites) where a blind, transaction attack would be useful, this risk of transactional corruption may be acceptable. But if transaction protection is desired, the resolver needs to wait for both responses, and only return them if they agree, leaving an open issue of what to do with a highly volatile domain which returns different responses to the same request.
Thus the conclusions:
1) For IDS purposes, its easy to detect blind cache poisoning attacks, although there are some low rate false positives that operators need to be aware of.
2) For purposes of the DNS resolver, "detect and respond" to blind attacks without increasing the entropy is a bad idea: it doesn't prevent all blind attacks, instead creating a false sense of security.
3) But adding entropy, to at least 32b, ideally 40b, and the blind attack is effectively defanged.
4) For further benefit, it appears plausible that the blind caching attack against a well-randomized server could be detected and blocked, and it looks plausible to protect the cache but not transactions from blind attacks even when behind a NAT by using duplicate requests.
5) Please give me feedback! It was feedback from experts that convinced me "there is no substitute for entropy".
Much thanks to feedback from Christian Kreibich, Vern Paxson, Robin Sommer, Mattias Vallentin and Wouter Wijngaaards.
[1] The only scenario I can come up with is cookie stealing, or similar hijacking, and can actually be used against a major site. The following example is how TRADITIONAL blind cache poisoning, against a 16b transaction ID as the only entropy, can be used to intercept yahoo mail as a blind transactional attack, rather than a cache attack.
mail.yahoo.com eventually resolves to being login.yahoo.akadns.net which has a ttl of just 60 seconds. Thus the attacker gets the victim to go to a web page the attacker controls. This attacker code has a bit of javascript that opens up an iFrame which points to mail.yahoo.com at a time the attacker choses.
When the web browser does the lookup, the attacker sends ~1000 poison attempts at the victim's DNS resolver. If it fails, ah well, no big loss, the attacker closes the iframe (by redirecting to another page) and tries again in 60 seconds. But if it succeeds, the web browser THINKS mail.yahoo.com is the attacker's machine, allowing him to man-in-the-middle the browser and enabling him to get the victim's login cookie etc (which are happily transmitted in the clear!). If DNS is protected by just a 16b transaction ID, the attacker has a 1.5% chance of winning the FIRST race, and if the victim doesn't notice for an hour, the attacker has a 60% chance.
Thus, at least for the purposes of protecting Yahoo mail, even DNS transactions need to be protected with significantly more entropy. (However, this is an extreme case, and Yahoo mail is broken in so many other ways because attackers on a wireless network can do the same thing.)