Vulnerability Scanning and Clouds: An Attempt to Move the Dialog On…
Much has been said about public IaaS providers that expressly forbid customers from running network scans against their cloud hosted infrastructure. Failure to comply with the Terms of Service can result in account suspension or termination (ouch!). This post is my attempt to suggest a way forward. I welcome your feedback…
As has been noted before, a blanket ban on legitimate scanning activity by customers of their own infrastructure (whether outsourced or not) undermines security assurance processes and can make regulatory compliance impossible; e.g. PCI DSS mandates network vulnerability scanning as a control.
Vulnerability scanning is a stalwart practice of the Information Security community. Enterprises invest considerable time and money developing vulnerability management programs to help assess IT security risk across applications and infrastructure. Specifically, vulnerability scanners help identify potential security weaknesses at scale; e.g. missing patches, default passwords, coding or configuration weaknesses.
Vulnerability scanning is front of mind for Internet exposed or partner connected infrastructure. However, when said infrastructure is owned and/or operated by a service provider, some of the existing challenges associated with vulnerability scanning are magnified:
- Scans can cause outages. This can happen if the scanning policy includes Denial of Service checks or the scanning engine is configured with “aggressive” settings; e.g. connection entries in firewall state tables get exhausted. It’s also possible for scans to tickle obscure bugs in the target - or in devices en route to the target. Even without a full-on outage, poorly configured scans can still negatively impact performance or availability for other customers of shared infrastructure.
- Identifying unauthorised scans. Without a trusted, robust process for “blessing” or approving source IP addresses of customer scan engines, service providers cannot distinguish legitimate scans from scans with the evil bit set. Sure, they can use whois to determine source network ownership but even if the scan originates from a customer owned network, this does not necessarily mean it is authorised! Given this, many providers take the stance that all scans are treated as hostile unless pre-agreed.
- Scanning may trigger automated or manual actions by the provider. A common automated response from a provider is to apply traffic shaping to slow down the scan, or simply block the client IP address via an ACL update. This can lead to false negatives; i.e. vulnerabilities present are not discovered because the scanner IP was automagically identified as a noisy vulnerability scanner and auto-throttled/blocked. Even half-smart attackers can quickly deduce the presence of auto-response mechanisms (“huh, no response now”) and either switch to slow probes from multiple sources or go for gold with a one-shot exploit.
Enterprise customers on dedicated infrastructure at Tier 1 web hosting providers will either contract the hosting company (or their security partner) to perform vulnerability scans or do it themselves. Either way, for scanning to happen, agreement will need to be reached on scan scope, types of scans to be run (scanning tools & policies), time windows and source IP addresses used. Beyond that are the process issues of how results will be communicated, integration with ticketing systems etc.
The provider will limit the scan scope to the dedicated infrastructure allocated to the customer - scanning of shared infrastructure by the customer is generally a ‘no no’. Shared infrastructure, along with management networks, will instead be scanned by the provider to meet customer compliance mandates or security policies.
With Cloud “Infrastructure as a Service” providers, things get a little more complicated.
- A cloud is multi-tenant; i.e. the cloud platform is shared by multiple customers through software abstraction. The provider will naturally be concerned with the impact of any scanning activity, particularly if it causes any SLA violations.
- Further, cloud customers can spin up infrastructure on demand. New virtual servers can be brought to life automagically to handle increased load. This increased infrastructure footprint is still subject to the same compliance mandates though; i.e. it must be scanned within some time period of its appearance. Even if you are spinning up copies of a “known good/secure” virtual machine (VM), you still need to scan them. New vulnerabilities are published all the time, along with corresponding vulnerability checks - hence the need for both regular scans and representative scans. Further, vulnerability scanning isn’t just testing the VM; it also helps you verify the security controls outside the VM that are designed to protect it; e.g. a provider’s software firewall. Picking and choosing which pieces of your hosted infrastructure to scan is a slippery slope to selective exposure if not handled with care.
- Finally, we shouldn’t discount the “Clouding around” factor. Credit card payments for “instant on” infrastructure change the dynamic between cloud consumer and cloud provider. Similar to low-end, consumer-oriented shared hosting before it, you may never speak with, let alone meet, an employee of your provider before you use their services. There simply isn’t a conversation about scanning (the “conversation” today is a monologue found in the Terms of Service). Plus, if the provider fails to meet your needs, you can drop them at a moment’s notice and switch to another (Cloud baggage permitting…). In other words, it’s either not possible, or not convenient, to call up your provider to agree the principle and logistics of scanning the services they host on your behalf. Enterprise customers - or at least their security teams - will want that conversation and can likely strike a deal with a modified ToS to allow scanning of some sort, but this seems unnecessarily exclusionist to me.
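To make the compliance implication of on-demand infrastructure concrete, here is a minimal sketch of how a customer might queue every newly launched VM for scanning within a fixed window. All names (the event handler, the instance IDs, the 24-hour deadline) are illustrative assumptions, not any real provider’s API:

```python
# Hypothetical sketch: every newly launched VM must be queued for a
# vulnerability scan within a compliance deadline of its appearance.
from datetime import datetime, timedelta

SCAN_DEADLINE = timedelta(hours=24)  # e.g. "scan within 24h of launch" (assumed policy)

scan_queue = []

def on_instance_launched(instance_id: str, launched_at: datetime) -> None:
    """Assumed hook into the provider's instance-launch event stream."""
    scan_queue.append({
        "instance": instance_id,
        "scan_by": launched_at + SCAN_DEADLINE,  # latest acceptable scan time
    })

# Usage: an autoscaling event fires and the new VM is scheduled for scanning.
on_instance_launched("vm-1234", datetime(2010, 1, 1, 12, 0))
```

The point is that scan scheduling has to be event-driven, keyed off instance launches, rather than a static quarterly list of hosts.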
We can address these issues through a mix of provider open-mindedness, policy, process, technology and contract.
For cloud providers to attract certain customers, they may need to soften their policy on vulnerability scanning. Taking a hardline “no” stance precludes some workloads from ever entering the cloudosphere (with bigger consequences for enterprises seeking a strategic cloud partner). A preferred scenario has the cloud provider showing some understanding of enterprise prospects’ assurance needs and defining scanning parameters acceptable to its own operational risk tolerance.
Scanning is not an “unknown” risk; rather, it’s a very well understood activity with quantifiable elements (packet rate, state table usage etc.). Normal rate limiting could be temporarily or permanently loosened for customer-approved IP addresses to enable scans against a customer’s cloud IP addresses (not API endpoints or the cloud provider’s websites!) to complete in a reasonable time window. Besides, Internet systems are scanned, probed and attacked constantly by script kiddies, Internet surveyors and an assortment of bots and other lifeforms. So the bad guys get to scan because they don’t care, and yet the customer, who wants to do the “right thing”, is not allowed to. Is that rational?
Assuming a cloud provider with a more measured approach towards vulnerability scanning of customer cloud infrastructure, we now need a simple, mutually trusted mechanism to agree scan sources, rate limits etc. Something like a “ScanAuth” (Scan Authorise) API call offered by cloud providers that a customer can invoke with parameters conveying the source IP address(es) that will perform the scanning, and optionally a subset of their Cloud hosted IP addresses, scan start time and/or duration. This request would be signed by the customer’s API secret/private key as per other privileged API calls. The provider receiving the request can rely on the digital signature as proof that a scan is authorised with the associated parameters. After processing the scan authorisation request, the provider could return a status code approving or denying the request (with a possible reason code to allow resubmission with more acceptable parameters). This response could optionally include rate limits which the customer can use to tune the intensity of their scanner.
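A request/verify pair for the proposed ScanAuth call might look like the sketch below. The parameter names and the HMAC-SHA256 signing scheme are my assumptions for illustration; a real provider would use whatever canonicalisation and key scheme its other privileged API calls already use:

```python
# Hypothetical "ScanAuth" sketch: the customer signs the scan parameters with
# their API secret; the provider verifies the signature as proof the scan is
# authorised with exactly those parameters.
import hashlib
import hmac
import json

def build_scanauth_request(api_secret: bytes, scanner_ips: list,
                           target_ips: list, start_time: str,
                           duration_mins: int) -> dict:
    """Customer side: assemble and sign the scan authorisation request."""
    params = {
        "action": "ScanAuth",
        "scanner_ips": scanner_ips,      # IPs the scans will originate from
        "target_ips": target_ips,        # subset of the customer's cloud IPs
        "start_time": start_time,        # e.g. ISO 8601 window start
        "duration_mins": duration_mins,
    }
    # Canonicalise (sorted keys) then sign, as per other privileged API calls.
    payload = json.dumps(params, sort_keys=True).encode()
    params["signature"] = hmac.new(api_secret, payload, hashlib.sha256).hexdigest()
    return params

def verify_scanauth_request(api_secret: bytes, request: dict) -> bool:
    """Provider side: recompute the signature over the unsigned parameters."""
    claimed = request.get("signature", "")
    unsigned = {k: v for k, v in request.items() if k != "signature"}
    payload = json.dumps(unsigned, sort_keys=True).encode()
    expected = hmac.new(api_secret, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(claimed, expected)
```

On a successful verify, the provider would respond with an approve/deny status and any rate limits; tampering with any parameter after signing makes verification fail.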
The provider can now whitelist the customer-provided scanner IP(s) for the duration of the requested scanning window such that active countermeasures like anti-DoS controls are not triggered, resulting in a ‘cleaner’ scan (and hence a more accurate report).
Should the scanning activity exceed any specified limits, or communicate with IP addresses not associated with customer virtual machines, the provider could instantly blacklist the scanning IP or apply traffic shaping.
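The enforcement logic on the provider side could be as simple as the following sketch. The grant structure and return values are illustrative assumptions; the idea is just that a whitelisted scanner is exempt from countermeasures only inside its agreed window and scope, and is revoked the moment it strays:

```python
# Hypothetical provider-side enforcement: a whitelisted scanner IP is exempt
# from countermeasures only within the agreed window and against approved
# targets; out-of-scope traffic gets it blacklisted immediately.

approved = {
    "198.51.100.7": {                       # whitelisted scanner IP (example)
        "targets": {"203.0.113.10", "203.0.113.11"},
        "window": (1000, 2000),             # agreed start/end, epoch seconds
    },
}
blacklist = set()

def check_packet(src_ip: str, dst_ip: str, now: int) -> str:
    grant = approved.get(src_ip)
    if grant is None or src_ip in blacklist:
        return "normal-countermeasures"     # treated like any other traffic
    start, end = grant["window"]
    if not (start <= now <= end) or dst_ip not in grant["targets"]:
        blacklist.add(src_ip)               # strayed out of scope: revoke
        return "blacklisted"
    return "allow"                          # countermeasures suppressed
```

A real implementation would sit in the provider’s traffic-shaping or IDS layer rather than per-packet Python, but the decision table is the same.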
The bottom line: when everyone is clear on the need, approval process, scan parameters and abuse policy, this can be done with very little fuss.
A “ScanAuth” API call empowers the customer (or their nominated 3rd party) to scan their hosted Cloud infrastructure confident in the knowledge they won’t fall foul of the provider’s Terms of Service. It avoids a situation where a customer’s Cloud services are interrupted by an angry provider (availability fail!) or, in the worst case, the customer gets kicked off the Cloud entirely. Clearly, a lose/lose scenario.
What do you think?