Friday, August 22, 2008

Some More Thoughts on Protecting DNS

I posted the following to the namedroppers list, and I'm crossposting here for comments. Again, this is "in progress", and largely shaped by discussions with many people, NOT a finished work. Comments are greatly appreciated.

One thought (again, I don't know how much this has been hashed out) is how robust do DNS resolvers REALLY need to be?

Any comments on the following?

I personally believe that resolvers should be robust to anything short of a man-in-the-middle ("Aware & Blocking"), yet man-in-the-middle is almost irrelevant for purposes of defending DNS. [1] Thus, as long as DNS resolvers are robust to anything short of a man-in-the-middle, it's good in practice.

At the same time, such robustness must ONLY involve changes to the resolvers: the resolvers should be able to protect themselves without relying on the authoritative servers making changes (although authoritative servers may see more load, especially servers which return out-of-standard nonsense). [2]

And probabilistic protection is perfectly good. We don't need perfect protection against blind attacks, we just need to ensure that the attacker resources involved are far better spent elsewhere.

My thoughts:

Against blind attacks, I believe the following principles should apply:

  • No birthday attacks, of course.
  • >=20b of entropy for purposes of protecting the transaction [3]
  • >=40b of entropy for purposes of protecting the cache [4]
  • Detecting when network entropy (port #) is reduced or eliminated by an in-path network device, allowing the resolver to accurately estimate the entropy it is generating in the requests.
  • Detect and respond when there is a 0.001% chance that a blind attack created one or more corrupt cache entries, by either revalidating or voiding the cache.
  • NO "race until win" scenarios for ANY given mapping in the cache: ALL blind attacks should be "race once per TTL", so that a blind attack always costs time as well as packets. [5]

ALL of the previous principles seem doable using existing techniques (port and 0x20 randomization, duplication, a single global counter of unsolicited replies, and some seemingly simple data-acceptance policies).

Against transparent attacks, I believe the following should apply:

If a resolver, within its normal transaction timeout, receives a second or subsequent reply with a differing value but which would have been otherwise accepted (matching on TXID, port, server, 0x20, etc), this must be considered an attack on both the transaction and the cache. [6]

If the resolver is an intermediary, it MUST forward the second response back to the original requester, with the TTL for the "new" value set to 1 second.

For the resolver's cache, any entry which MAY have been potentially affected by the second reply MUST be treated as corrupted by an attack. Thus, depending on how much information is recorded and available, various cache entries may need to be voided or undone (up to all cache entries for the domain in question, if more precise record keeping is unavailable). Even if this would not be considered an attack, it doesn't actually make sense to cache anything which may have been dependent on this response.
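As a sketch of this policy, the following shows one way a resolver might void dependent cache state when a conflicting second reply arrives. The record structure, the `forward` hook, and the coarse suffix-based bookkeeping are all illustrative assumptions on my part, not any real resolver's internals:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class PendingQuery:
    """Hypothetical record of a transaction that has already been answered."""
    domain: str
    answered_value: str
    forwarded: List[Tuple[str, int]] = field(default_factory=list)

    def forward(self, value: str, ttl: int) -> None:
        # Stand-in for relaying the reply to the original requester.
        self.forwarded.append((value, ttl))

def on_second_reply(pending: PendingQuery, reply_value: str, cache: dict) -> None:
    """Handle a late reply that would otherwise have been accepted."""
    if reply_value == pending.answered_value:
        return  # benign duplicate, nothing to do
    # Forward the conflicting answer with its TTL clamped to 1 second,
    # so downstream caches cannot hold the suspect value.
    pending.forward(reply_value, ttl=1)
    # Void every cache entry that may depend on this response; with only
    # coarse bookkeeping, that is everything under the domain in question.
    suspect = pending.domain
    for name in [n for n in cache
                 if n == suspect or n.endswith("." + suspect)]:
        del cache[name]

cache = {"example.com": "192.0.2.1", "www.example.com": "192.0.2.2",
         "other.org": "198.51.100.1"}
q = PendingQuery(domain="example.com", answered_value="192.0.2.1")
on_second_reply(q, "203.0.113.9", cache)
print(q.forwarded)    # [('203.0.113.9', 1)]
print(sorted(cache))  # ['other.org']
```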

For the final client, it is up to the client to determine what to do, but the strong suggestion is to consider this an attack and abort any transaction involving this name.

However, if the end client does not treat this as an attack and abort subsequent TCP connections to the final IP, this loss of transaction protection means that the few authorities which trigger this alert through "normal" behavior will simply see their results not cached rather than not accepted by the client.

[1] If an attacker can man-in-the-middle the naming stream of a victim, he can probably man-in-the-middle the data stream. If the end protocol desired is weak to a man-in-the-middle, reliable naming does no good. If the end protocol is robust to a man-in-the-middle, it never really trusted the DNS record.

[2] This is why I'm not a believer in DNSSEC. It has bad network effects (both the resolver and the authority need to change behavior, yet the changed behavior only protects the intersection of the two sets, so neither resolvers nor authorities have strong internal incentives to deploy), yet it does not actually provide, to my mind, all that much meaningful system security, because of the other options available to a man-in-the-middle in practice.

[3] Blind transaction attacks, when combined with protections that provide "race once per TTL per name", as far as I can tell, only work against some seriously broken protocols. By specifying a different level of protection for transactions and the cache, this allows duplication to protect the cache while still successfully resolving (but NOT caching) sites which return different values on back-to-back queries when duplication is needed to meet the entropy budget.

[4] Anyone who wants to launch a TB DOS to poison my cache has better things to do.

[5] I believe that the proper policy on accepting data into the cache can provide this, combined with treating a failure to lookup a name as an event with a ~30 second TTL. This property is an important COMPLEMENT to increased entropy, not a substitute. "Race until win" does not reduce the number of packets an attacker requires, only the time period.

[6] This will occur with high probability whenever there is a blind attack that doesn't also block the authoritative server or create a DoS condition. It will ALSO occur with high probability if an attacker is able to directly observe the request to craft a reply, but is not fully in path. More importantly, a preliminary survey suggests that only a few authorities actually return such nonsense.

Tuesday, August 19, 2008

Entropy, Blind Attacks, and DNS

The following is work-in-progress. Part of the point of a blog is to look for feedback, so feedback is greatly appreciated. It's also to keep others from making my mistakes when thinking about the problem. Additionally, there may be things wrong in this post; anything wrong is my fault. Feedback can be sent in email to nweaver at ICSI dot Berkeley dot EDU. Minor edits added 8/20

This post will focus on DNS blind attacks, detection strategies, and responses for protecting DNS resolvers. But first some background. There are at least four major properties of DNS attacks.

Blind vs Aware: In a blind attack, the attacker does not know the entropy sources (eg, TXID, port selection, capitalization) of the request. In an aware attack, the attacker knows this information.

Transparent vs Blocking: In a transparent attack, the attacker is unable to block the normal authoritative server's response. In a blocking attack, the attacker can block the authoritative server's response.

Direct Record vs Glue: In a Direct Record attack, the attacker is only interested in corrupting the answer to the actual query itself, so that the queried name points to EVIL. In a glue attack, the attacker is interested in producing corrupted answer, authority, or additional records which are accepted by the resolver beyond its actual request, pointing other names to EVIL.

Transaction vs cache: In a transaction attack, the attacker only needs to corrupt the client which made the actual query. A cache attack requires that the resolver not just forward the attacker's value, but also cache it to use for future responses.

For example, Kaminsky's attack is a "Blind Glue Cache" attack: the goal is to use a blind attack to cause the resolver to cache a bad authority or additional record, causing subsequent lookups for the target domain to go to the attacker's "authoritative" server.

There are many, MANY variants possible, but I believe (if I'm wrong, PLEASE tell me) that all require the victim resolver to cache more than just the requested A-record mapping, because this enables the attacker to "race until win against garbage" instead of "race once and retry after TTL expiration". Normally it is also a "Transparent" attack, but it could instead involve a DoS of the authoritative server or enough traffic that the server's authoritative response is not returned.

Likewise, a man-in-the-middle is an "Aware & Blocking" attacker: he sees all packets and can ensure that only his response arrives, and nothing short of DNSSEC (or equivalent) can prevent a man-in-the-middle attack on DNS.

Blind attacks are actually easy to detect. Any blind attack which has a probability of succeeding will generate a large number of unsolicited DNS replies: DNS answers where the reply does not correspond to a request because it uses a different transaction ID and/or port. In our in-progress DNS cache poisoning detector, dns-poison.bro, these alerts are collected in the dns_weirds.log as "Unsolicited DNS Replies". I'd assume there is a similar snort rule available. (Note, the bro code is VERY experimental, and should NOT be viewed as authoritative. It is included mostly for interest and experimentation).

However, there are some issues that arise. To begin with, there can be false positives in the IDS. The first (marked in dns_weirds.log as "Since there has been no previous DNS traffic on this 4-tuple, we are ignoring it") can occur when the DNS requesting packet was dropped by the monitor so the IDS only sees the reply. The second is that a nontrivial number of servers (~100 when seen from ICSI over a roughly 10 day period) will generate this alert spontaneously, sending invalid replies to accompany valid replies. Thus a LOW rate of alerts is to be expected.

But a bigger problem is how to ACT on this information. There have been many proposals (I've been guilty of one myself) that by maintaining a count of failures per remote server, one could build a detection and response cycle against blind attacks into the DNS resolver without having to look for more entropy than the transaction ID.

This could work for a blind attacker who is targeting a particular authority: it doesn't require much stateholding (a single counter per authoritative server), and the response is simple: void the cache for any requests involving this authority (blocking "blind cache" attacks, regardless of glue policy or transparency), and observe that only limited scenarios benefit from a blind, transactional attack. [1]

However, there is a blind attack proposed by Masataka Ohta, pointed out to me by Wouter Wijngaards, which can defeat such a detection and response cycle when only 16 bits of entropy (just the DNS transaction ID) is used, by generating poisoning attempts targeting different domains when winning any single domain is sufficient for the attacker's goal.

As a real example, assume that the attacker has a list of 1000 domains, where poisoning any single domain is sufficient for his goal. At a single victim, he generates 1000 queries, against 1000 different authoritative servers, and also launches just 1000 blind responses. With just 16 bits of entropy, there is a 1.5% chance of success, and the successful packet will never trigger the alert. Thus I was WRONG: although such fine-grained detection is suitable for the IDS, it is NOT suitable for the DNS resolver as a substitute for more entropy.

But good entropy is (mostly) sufficient for countering blind attacks. The exact odds of success for an attacker facing N bits of entropy and sending K packets are 1 - (1 - 2^-N)^K. If there are 32 bits of entropy (from a combination of a strongly random transaction ID and a random source port), a blind attacker who wants a .1% chance of success would need to send roughly 4 million packets. At 100B per packet, this would require sending 400 MB of data.
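As a rough check of that arithmetic (the 100-byte packet size is this post's working assumption, not a protocol constant):

```python
import math

def success_probability(entropy_bits: int, packets: int) -> float:
    """P(at least one spoofed reply matches) = 1 - (1 - 2^-N)^K."""
    return 1.0 - (1.0 - 2.0 ** -entropy_bits) ** packets

def packets_needed(entropy_bits: int, target: float) -> int:
    """Smallest K giving at least the target success probability."""
    p = 2.0 ** -entropy_bits
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))

# A .1% success chance against 32b costs ~4.3M packets (~430 MB at
# 100B/packet); against 40b it costs over a billion packets.
for bits in (32, 40):
    k = packets_needed(bits, 0.001)
    print(f"{bits}b: {k:,} packets, ~{k * 100 / 1e6:,.0f} MB at 100B/packet")
```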

Add in a random selection of authoritative server (what unbound does to gain 2b of entropy) and random capitalization (David Dagon's 0x20 method) and you can easily get to 40b of entropy. Now just that .1% probability of success requires sending 1 billion packets, or 100 GB of data. Even a .01% chance of success requires 10 GB of data.
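For illustration, a minimal sketch of 0x20-style randomization (the helper names are mine, and a real resolver must also verify that the reply echoes the same case pattern back):

```python
import random

def randomize_0x20(name: str, rng: random.Random) -> str:
    """Randomly toggle the case of each letter in a DNS query name.
    Since authoritative servers generally echo the query name's case
    back, each alphabetic character contributes ~1 bit of entropy."""
    return "".join(
        c.upper() if c.isalpha() and rng.random() < 0.5 else c.lower()
        for c in name
    )

def extra_entropy_bits(name: str) -> int:
    """Upper bound on the entropy 0x20 adds for this particular name."""
    return sum(1 for c in name if c.isalpha())

rng = random.Random()
print(randomize_0x20("www.example.com", rng))  # e.g. "wWw.ExAmPLe.cOm"
print(extra_entropy_bits("www.example.com"))   # 13
```

Note that the gain depends on the name: short, digit-heavy names get far fewer extra bits than the headline figure suggests.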

Such an attack represents a significant DoS. If an attacker can send 100 GB of data, with spoofed source address, at your DNS resolver, to have just a .1% chance of corrupting a single cache entry, the attacker can instead make a lot more money sending 100 GB of data at a blackmail victim.

Yes, this is a probabilistic bound, but it is a very good bound. Bots are a valuable resource, especially bots capable of delivering spoofed IP packets. They can send spam, look for stolen credentials, or perform lots of other mayhem. Yet bots which DoS are more likely to be removed, by both the ISP and the end user, due to the traffic. Attackers with that sort of resource aren't interested in just petty damage: they have reasons for their actions, and wasting money is not one of them. "You don't need to outrun the bear, just the guy next to you."

Plus (Warning: Unbaked Speculation Ahead, feedback wanted!) one might be able to set an arbitrary bound on blind cache attacks, including Ohta's variant, by keeping a single counter of ALL unsolicited replies (rather than a per-server count, which does not work). If it ever reaches the threshold where an attacker has an X% chance of at least one success, simply invalidate the entire cache, because you don't know what's been corrupted.

This would cause a nontrivial increase in latency and traffic as the cache is repopulated, but even with a very sensitive threshold (eg .01%) and moderately poor entropy (32b), an attacker would need to send 40 MB of spoofed data just to degrade the cache performance a bit. If the cache had 1M active entries it needed to subsequently refetch, the attack would generate a few hundred MB of additional traffic to the DNS resolver from the additional requests. So there would be some amplification, but only a moderate (roughly 4x) amount, in exchange for a very tight bound on the blind attack.
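The speculative counter-and-flush policy might look something like the following sketch. The class and threshold are illustrative (this is not any real resolver's mechanism), but the limit computation follows the same 1 - (1 - 2^-N)^K arithmetic as above:

```python
import math

class BlindAttackMonitor:
    """Hypothetical global counter of unsolicited DNS replies, flushing
    the whole cache once the cumulative count implies the attacker's
    chance of at least one success has crossed a threshold."""

    def __init__(self, entropy_bits: int, threshold: float = 0.0001):
        self.unsolicited = 0
        # Packet count at which P(>=1 success) reaches the threshold.
        p = 2.0 ** -entropy_bits
        self.limit = math.ceil(math.log(1.0 - threshold) / math.log(1.0 - p))

    def on_unsolicited_reply(self) -> bool:
        """Count one unsolicited reply; True means 'void the cache now'."""
        self.unsolicited += 1
        if self.unsolicited >= self.limit:
            self.unsolicited = 0  # reset after the flush
            return True
        return False

mon = BlindAttackMonitor(entropy_bits=32, threshold=0.0001)
# With 32b of entropy, a .01% threshold is reached only after roughly
# 430,000 unsolicited replies (~40 MB of spoofed traffic at 100B/packet).
print(mon.limit)
```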

For 40b of entropy, the attacker would have to send nearly 10 GB of data to meet this very sensitive threshold, yet this would still only require fetching an additional few hundred MB of data back into the cache, a trivially small amount of amplification: the 10 GB DoS will have a far bigger effect on the server.

Finally (Warning: Still Rather Unbaked Speculation Ahead. Feedback wanted!) there are many proposals to duplicate requests to increase entropy. For example, David Dagon has proposed using duplication when a remote authoritative server doesn't support 0x20. I believe duplication can also be useful when the resolver is behind a NAT or firewall that rewrites ports, because such rewriting destroys the entropy present in the port number. Since 32b of entropy is a bare minimum, and 40b is comfortable, losing nearly 16b of entropy is a big deal, and something needs to be done to bring the entropy back up to a comfortable level.

Fortunately, a resolver can check whether it is behind such a device given a special authoritative nameserver. We have an ugly prototype: port.{anything} will return the UDP source port of the query in the lower 16 bits of the returned address. This does create a phone-home, but a resolver implementation could use such a special name to check for the presence of a NAT or other device which rewrites UDP ports. (Despite feeling otherwise, NATs and firewalls are not actually malicious, and would not benefit from whitelisting such a test.)
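Assuming such a port-echo name exists (the prototype described above; the function name here is mine), the resolver-side check reduces to comparing the low 16 bits of the returned address against the port it actually sent from:

```python
import ipaddress

def port_rewritten(local_port: int, returned_addr: str) -> bool:
    """Given the A record returned by a port-echo name (which encodes
    the UDP source port the authoritative server saw in the low 16 bits
    of the address), report whether an in-path device rewrote the port."""
    seen_port = int(ipaddress.ip_address(returned_addr)) & 0xFFFF
    return seen_port != local_port

# If we queried from port 33201 and the echoed address ends in .129.177
# (129*256 + 177 = 33201), no rewriting occurred:
print(port_rewritten(33201, "10.0.129.177"))  # False
```

If the check reports rewriting, the resolver knows it is missing ~16b of port entropy and can fall back to duplication or other compensating measures.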

Now a resolver could respond in the following way: instead of a single query, it generates two queries for the same value, returning the first response it gets but only caching the response if both are identical. This should protect the cache, as forcing the attacker to match both queries effectively doubles the entropy. Thus if there are only 20 bits of entropy (a 16b ID plus 4 characters of capitalization/resolver selection, because the resolver is behind a NAT), for purposes of cache poisoning the blind attacker now faces effectively 40 bits of entropy.

However, this reduces protection against a blind transaction attack: for purposes of corrupting the transaction, the attacker only needs to win one race out of two. Thus instead of facing 20b of entropy for corrupting the transaction, the attacker effectively faces 19b, as duplication doubles his chance of success. Since I can come up with only limited scenarios (dealing with already-broken websites) where a blind transaction attack would be useful, this risk of transactional corruption may be acceptable. But if transaction protection is desired, the resolver needs to wait for both responses and only return them if they agree, leaving an open issue of what to do with a highly volatile domain which returns different responses to back-to-back requests.
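The duplicate-query policy described above can be sketched as follows; `resolve_once` is a stand-in for a real query path, and the tuple-returning interface is my own illustrative choice:

```python
from typing import Callable, Tuple

def resolve_with_duplication(
    name: str,
    resolve_once: Callable[[str], str],
    cache: dict,
) -> Tuple[str, bool]:
    """Fire two independent queries. Returns (answer, was_cached).
    The attacker must win BOTH races to poison the cache, effectively
    doubling the entropy for caching, while a single win still corrupts
    the (uncached) transaction."""
    first = resolve_once(name)   # handed back to the client immediately
    second = resolve_once(name)  # awaited only for the caching decision
    if first == second:
        cache[name] = first
        return first, True
    # Differing answers: serve but never cache (volatile or under attack).
    return first, False

cache = {}
answers = iter(["192.0.2.1", "192.0.2.1"])
ans, cached = resolve_with_duplication("example.com",
                                       lambda n: next(answers), cache)
print(ans, cached)  # 192.0.2.1 True
```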

Thus the conclusions:

1) For IDS purposes, it's easy to detect blind cache poisoning attacks, although there are some low-rate false positives that operators need to be aware of.

2) For purposes of the DNS resolver, "detect and respond" to blind attacks without increasing the entropy is a bad idea: it doesn't prevent all blind attacks, instead creating a false sense of security.

3) But add entropy, to at least 32b and ideally 40b, and the blind attack is effectively defanged.

4) For further benefit, it appears plausible that the blind caching attack against a well-randomized server could be detected and blocked, and it looks plausible to protect the cache but not transactions from blind attacks even when behind a NAT by using duplicate requests.

5) Please give me feedback! It was feedback from experts that convinced me "there is no substitute for entropy".

Many thanks for feedback from Christian Kreibich, Vern Paxson, Robin Sommer, Matthias Vallentin, and Wouter Wijngaards.

[1] The only scenario I can come up with is cookie stealing, or similar hijacking, and it can actually be used against a major site. The following example shows how TRADITIONAL blind cache poisoning, with a 16b transaction ID as the only entropy, can be used to intercept Yahoo mail as a blind transactional attack rather than a cache attack. The Yahoo mail name eventually resolves to a record with a TTL of just 60 seconds. Thus the attacker gets the victim to go to a web page the attacker controls. This attacker page has a bit of JavaScript that opens up an iFrame pointing at the mail server at a time the attacker chooses.

When the web browser does the lookup, the attacker sends ~1000 poison attempts at the victim's DNS resolver. If it fails, ah well, no big loss: the attacker closes the iFrame (by redirecting to another page) and tries again in 60 seconds. But if it succeeds, the web browser THINKS the mail server is the attacker's machine, allowing him to man-in-the-middle the browser and get the victim's login cookies etc. (which are happily transmitted in the clear!). If DNS is protected by just a 16b transaction ID, the attacker has a 1.5% chance of winning the FIRST race, and if the victim doesn't notice for an hour, the attacker has a 60% chance.
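The footnote's arithmetic checks out: ~1000 spoofed replies against a 16b TXID gives ~1.5% per 60-second race, and 60 races compound to roughly 60%:

```python
# P(win one race) = 1 - (1 - 2^-16)^1000; P(win within an hour) is the
# complement of losing all 60 back-to-back races.
per_race = 1.0 - (1.0 - 2.0 ** -16) ** 1000
over_hour = 1.0 - (1.0 - per_race) ** 60
print(f"per race: {per_race:.1%}, over an hour: {over_hour:.0%}")
```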

Thus, at least for the purposes of protecting Yahoo mail, even DNS transactions need to be protected with significantly more entropy. (However, this is an extreme case, and Yahoo mail is broken in so many other ways because attackers on a wireless network can do the same thing.)

Sunday, August 17, 2008

New Time Machine Release & Paper

Jointly with our colleagues at TU Berlin, we have just released a new version of the Time Machine, a high-performance packet bulk-recorder.

This is a major release providing many improvements in functionality and performance, including a NIDS interface for automatic control and retrieval. See the web page for more information about the changes.

We will present the Time Machine at the ACM SIGCOMM conference this week in Seattle, WA. The SIGCOMM paper also contains more background information, including a performance evaluation and experiences with interfacing the Time Machine to the Bro NIDS.

Wednesday, August 6, 2008

HotSec Paper on Network Visibility

We presented the following paper at HotSec last week: