Your Turn At The Bar Again? Security Costs in a Pay Per Drink Cloud

May 01, 2008

By Craig Balding

With in-house IT, you pay your upfront capital costs and maintenance fees and you get whatever compute power you paid for. If you over-specify, you have excess compute power or disk - you are wasting money. If you under-specify, you may be forced to raid your ‘rainy day’ budget and order new hardware.

A primary selling point of Cloud Computing is the ‘pay by the drink’ billing model - you only pay for the CPU cycles and storage you use - that’s it.

If you run any IT security tools at all, Cloud Computing may change the way you calculate your IT security budgets, because those tools consume the same metered CPU cycles and storage as everything else.

Assessing The Cost of Runtime Security

Security costs can be overt or hidden:

overt: budget line items spread across the infrastructure, security, compliance and midrange teams;
hidden: the runtime costs of security tools that execute on the systems themselves.

How many organisations know their runtime security compute costs? My guess is not many. Under the traditional IT billing model, you mostly don’t need to figure this stuff out. As long as your security tools don’t chew up the CPU unnecessarily or fill the disk, everyone is happy.
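As a rough illustration of what knowing your runtime security compute costs might involve, here is a minimal Python sketch. It assumes you can measure the average share of CPU your security agents consume on each system and that compute is billed linearly per vCPU-hour; the `HostUsage` type, the `runtime_security_cost` function, the host names and the price are all hypothetical, not figures from any real provider.

```python
from dataclasses import dataclass


@dataclass
class HostUsage:
    """Measured usage for one system (all values are illustrative assumptions)."""
    name: str
    vcpus: int              # vCPUs allocated to the system
    hours: float            # hours the system ran during the billing period
    agent_cpu_share: float  # average fraction of CPU used by security agents (0.0 to 1.0)


def runtime_security_cost(hosts: list[HostUsage], price_per_vcpu_hour: float) -> float:
    """Estimate the compute spend attributable to security agents.

    Assumes billing is linear in vCPU-hours, so the agents' share of the
    bill equals their share of the CPU time.
    """
    total = 0.0
    for host in hosts:
        vcpu_hours = host.vcpus * host.hours
        total += vcpu_hours * host.agent_cpu_share * price_per_vcpu_hour
    return total


# Illustrative run with a hypothetical fleet and price.
fleet = [
    HostUsage("FINANCE1", vcpus=4, hours=720, agent_cpu_share=0.20),
    HostUsage("WEB1", vcpus=2, hours=720, agent_cpu_share=0.05),
]
print(f"{runtime_security_cost(fleet, price_per_vcpu_hour=0.10):.2f}")
```

The point of the exercise is less the exact figure than having one at all: under in-house billing nobody is forced to compute it.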

The performance of security products varies greatly. On the negative side, poor design or implementation are problems only the vendor can address. Site-specific issues arise through all kinds of madness - customers failing to “read the label” and provision properly, insufficiently trained people making poor configuration choices, or simply relying on the default settings in a very non-default environment!

The negative side effects of in-line security tools hit home as system load increases. Access checks, logging and other ‘in-line’ security operations may perform fine under normal load but fail to scale once load increases past a certain threshold. This can lead to CPU spikes or poor disk access patterns.

Switch Off Or Pay Up?

To bring this closer to home, let's explore how the impact of security tools plays out today under traditional IT and tomorrow under Cloud Computing. Let's eavesdrop on a fictitious conversation between Oscar the Oracle DBA and Simon the Security Dude.

Oscar: Hey Simon, your Security Agents are killing system performance again. Anna in accounts called up to say they can’t do the quarterly close; the jobs are getting killed before they finish.

Simon: Hi Oscar, I understand, but we can’t just disable all the security!

Oscar: Well, we need to do something if we are going to finish posting our numbers this quarter. Are you volunteering to explain to our CEO why we didn’t?

Simon: Hmm. Let me check the agent logs, perhaps there is a problem.

Oscar: I already checked them, no errors reported.

Simon: Hmm. I’ll log a call with the Premium International Support Service.

Oscar: You did that last time and the support guy stuck to the party line that the security agent takes 5-10% of CPU. We know those numbers are wrong from our benchmarking - sometimes it takes 20% of CPU and always a lot more during quarter close.

Simon: Hmm. Are there any other processes running on the system we can disable for a while?

Oscar: Nope - we’re running as tight a ship as we can here. I’ve already told Steve from sourcing he is going to have to wait for his reports.

Simon: Hmm. Bugger. OK, I’ll disable the agents - but you must tell me as soon as the quarter close completes so I can start them up again.

Oscar: Thanks - will do.

A classic conversation under the ‘old regime’. Simon is forced into an operational security decision by an under-specified system or an over-indulgent security agent. His only option in this scenario is to disable the poorly scaling security tool. He can’t just scream “Need more power!” and have additional CPUs appear.

Now let's see how this plays out with Cloud Computing, where the change in paradigm removes the compute limits and ties your on-the-spot risk decisions directly to your costs and to the efficiency of your security tools:

Simon the Security Dude receives an auto-generated email from the Cloud Provider:

A virtual CPU was auto-inserted on virtual machine image FINANCE1 at 10:30am as Runtime Security Compute usage exceeded the agreed threshold in the SLA. Please note, you have now reached your soft credit limit - please click the link below to authorize an increase. You currently have 4 USD left in your account.

So what does Simon do now? He has already tapped into his security compute budget five times this week and he’s running low. The silver lining is that at least he gets to make the decision now - he isn’t forced to ‘switch off security’. If he has the cash, he can attempt to buy his way out of the problem. The obvious negative is “death by a thousand costs” - he’s running out of budget.
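The provider's email implies a simple billing rule: when security compute usage on an image crosses a threshold agreed in the SLA, a vCPU is added and charged to the account, and once the balance falls to a soft credit limit the customer has to authorise further spending. The Python sketch below models that rule under stated assumptions; the `Account` type, the `check_security_usage` function and every number in it are hypothetical and do not describe any real provider's API.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    """Hypothetical prepaid cloud account (not a real provider's API)."""
    balance: float            # remaining credit, in USD
    soft_credit_limit: float  # at or below this balance, more spend needs authorisation
    events: list[str] = field(default_factory=list)


def check_security_usage(
    account: Account,
    image: str,
    security_cpu_share: float,
    sla_threshold: float,
    vcpu_price: float,
) -> bool:
    """Apply the assumed SLA rule once; return True if a vCPU was auto-inserted."""
    if security_cpu_share <= sla_threshold:
        return False  # security compute is within the agreed threshold
    # Threshold exceeded: add a vCPU and charge the account for it.
    account.balance -= vcpu_price
    account.events.append(f"vCPU added to {image}; charged {vcpu_price:.2f} USD")
    # Warn (but do not block) once the balance reaches the soft credit limit.
    if account.balance <= account.soft_credit_limit:
        account.events.append(
            f"Soft credit limit reached; {account.balance:.2f} USD left, authorisation required"
        )
    return True


# Illustrative run with made-up numbers.
acct = Account(balance=29.0, soft_credit_limit=5.0)
check_security_usage(acct, "FINANCE1", security_cpu_share=0.22, sla_threshold=0.10, vcpu_price=25.0)
for event in acct.events:
    print(event)
```

Note that this rule never switches security off; it only spends money and asks for authorisation, which is exactly why the decision, and the bill, now lands on Simon.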

The root cause of the problem is that prior to moving to the Cloud, Simon didn’t have a handle on how much runtime security was *really* costing him. He didn’t know (a) his runtime security costs or (b) how much of that cost was unnecessary - caused by security tool inefficiency. He wasn’t the one paying, so most of the time he didn’t have to care. Even if he had found a way to calculate his costs, he’d still have to figure out how the performance differences between his in-house systems and the Cloud would skew his numbers.
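Simon's two unknowns, plus the performance skew, can be written as a small calculation. The Python sketch below assumes you have a benchmarked "efficient" baseline for the agent's CPU share and a measured ratio of how much longer the same work takes on cloud vCPUs than in-house; both inputs, the `project_cloud_security_cost` function and the example figures (20% observed, loosely echoing Oscar's benchmarking, against a 5% baseline) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SecurityCostBreakdown:
    total: float      # projected runtime security spend in the cloud
    necessary: float  # spend if the agent ran at its efficient baseline
    waste: float      # extra spend caused by tool inefficiency


def project_cloud_security_cost(
    in_house_vcpu_hours: float,
    observed_agent_share: float,
    baseline_agent_share: float,
    cloud_slowdown: float,
    price_per_vcpu_hour: float,
) -> SecurityCostBreakdown:
    """Project measured in-house agent usage onto a hypothetical cloud bill.

    cloud_slowdown is an assumed ratio of cloud vCPU-hours to in-house
    vCPU-hours for the same work (1.25 means the work takes 25% longer).
    """
    for share in (observed_agent_share, baseline_agent_share):
        if not 0.0 <= share <= 1.0:
            raise ValueError("agent CPU shares must be between 0 and 1")
    cloud_vcpu_hours = in_house_vcpu_hours * cloud_slowdown
    total = cloud_vcpu_hours * observed_agent_share * price_per_vcpu_hour
    # The baseline can never exceed what was actually observed.
    necessary_share = min(baseline_agent_share, observed_agent_share)
    necessary = cloud_vcpu_hours * necessary_share * price_per_vcpu_hour
    return SecurityCostBreakdown(total=total, necessary=necessary, waste=total - necessary)


# Illustrative inputs only.
breakdown = project_cloud_security_cost(
    in_house_vcpu_hours=2880,
    observed_agent_share=0.20,
    baseline_agent_share=0.05,
    cloud_slowdown=1.25,
    price_per_vcpu_hour=0.10,
)
print(f"total={breakdown.total:.2f} necessary={breakdown.necessary:.2f} waste={breakdown.waste:.2f}")
```

Splitting the figure this way separates (a) from (b): the `waste` component is the part a better-behaved tool, or a better configuration, could claw back.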

And therein lies the rub: if you don’t know what your runtime security costs are today - and where the waste is - how will you cope “tomorrow” when it’s always your turn to pay for drinks at the Cloud Bar?