Several months ago, I wrote a post about the Top 3 Mistakes in Data Loss Prevention (DLP). In that post, I mentioned that the typical (and logical) first step for any DLP program is the ability to identify and classify sensitive data. That sounds a little easier than it often turns out to be. There are a number of reasons why sensitive data identification can be tricky; here are just a few of them:
- Many organizations don’t have well-documented data sources – this consideration applies mainly to structured data in databases, where one would hope that data dictionaries are available – but often they aren’t.
- There are differing opinions as to scope. For example, should any or all unstructured data be considered when doing such assessments? Also, data that passes between organizations can lead to tough ownership and liability questions.
- There are different types of sensitivity to consider: PCI (payment card industry) data, PII (personally identifiable information), PHI (protected health information), SOX-regulated financial data, military data and other sorts of data that might be considered sensitive based on how it is used or how it could be exploited by attackers. I will review some of these in greater depth in a moment.
- There are sometimes differences of opinion as to whether the focus should be directed only at IT systems or expanded to all devices within an enterprise that might contain data (whether attached to networks or not).
- The task of actually doing this can be quite daunting in itself, depending on the scope and the size of the enterprise. This part is typically underestimated, and in some cases this sort of assessment may be the first time the enterprise in question has ever tried to understand all of its data.
A big reason why organizations might be thinking about identifying potentially sensitive data is the massive and continuous breaches that have occurred over the past few years. The latest, announced just this week, was at Quest Diagnostics, meaning that PHI (Protected Health Information) was likely involved. Other breaches, such as the one at the Office of Personnel Management (OPM), involved many more records – in the case of OPM, some 22 million PII records of current and former federal employees and contractors were stolen. The bottom line is that industry has spent a lot of time, money and effort on security but relatively little effort on determining what exactly might be at risk if or when that security fails. This is actually worse than it sounds for a lot of reasons – and it doesn’t sound good to begin with. Here are some of the reasons why not knowing your data is a bad thing:
- Because if you don’t know you have something that needs to be protected, you’re not likely to protect it. Or some things may need much stronger protection than others, but they’ve all been lumped into a generic, universal protection scheme.
- Just because your organization doesn’t know it has these assets doesn’t mean someone inside the organization can’t find out – and of course people on the outside can as well. This means the data may already be getting stolen, through internal or external channels, without anyone being aware, because the manner of attack isn’t as obvious.
- Depending on the data involved, there could even be situations where data loss or theft could literally destroy one’s business or organization. This has already happened and is likely to happen again.
The topic of how to actually go about doing this type of evaluation can be rather complex, but I’d like to highlight at least a few principles that can help guide such an effort; they are as follows:
Understand the scope implications – the level of effort associated with the project depends entirely upon the scope chosen. If this is understood up front, the likelihood of completing the project on time increases dramatically. This might require some inventory work even before the project begins, to get a rough idea of how many systems and attributes exist. Scope here also refers to the granularity of the evaluation – for example, attribute-level versus table-level scoring.
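To illustrate that pre-project inventory work, here is a minimal sketch of how one might get a rough count of tables and attributes in a single relational database via the ANSI information_schema. The database name, credentials and the choice of psycopg2/PostgreSQL are assumptions made for the example only – you would repeat something similar against each data source to size the overall effort.

```python
# Rough inventory sketch: count tables and columns per schema in one database.
# The DSN and the psycopg2 (PostgreSQL) choice are illustrative assumptions.
import psycopg2

conn = psycopg2.connect("dbname=inventory_target user=auditor")  # hypothetical DSN
with conn, conn.cursor() as cur:
    cur.execute("""
        SELECT table_schema,
               COUNT(DISTINCT table_name) AS tables,
               COUNT(*) AS columns
        FROM information_schema.columns
        WHERE table_schema NOT IN ('pg_catalog', 'information_schema')
        GROUP BY table_schema
        ORDER BY columns DESC
    """)
    for schema, tables, columns in cur.fetchall():
        print(f"{schema}: {tables} tables, {columns} columns (attributes)")
```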
Define your criteria up front – There are a number of guidelines and regulations regarding sensitive data to choose from; however, some of them overlap or conflict with one another. Common sense can quickly resolve those types of conflicts. Perhaps more important is how metrics will be defined and assessed. A metric in this case would be a risk rating or level based upon various classes of sensitive data, which can then serve as a model for how other data is rated.
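To make the metric idea concrete, here is a minimal sketch of one way such a risk rating might be modeled, with attribute-level scores rolling up into a table-level score (echoing the granularity point above). The class names, numeric levels and sample attributes are assumptions for illustration only; the real criteria would come out of the regulations you choose to follow.

```python
# Minimal sketch of a risk-rating metric. Class names and levels are
# illustrative assumptions, not a standard scale.
RISK_LEVELS = {"PCI": 4, "PHI": 4, "PII": 3, "FINANCIAL": 3, "INTERNAL": 2, "PUBLIC": 1}

def rate_attribute(data_classes: list[str]) -> int:
    """Highest risk level among the classes assigned to an attribute."""
    return max((RISK_LEVELS.get(c, 0) for c in data_classes), default=0)

def rate_table(attribute_ratings: dict[str, int]) -> int:
    """Table-level score is driven by the table's most sensitive attribute."""
    return max(attribute_ratings.values(), default=0)

# Example: a hypothetical customer table rated at attribute level, then rolled up
ratings = {
    "card_number": rate_attribute(["PCI"]),
    "ssn": rate_attribute(["PII"]),
    "favorite_color": rate_attribute([]),
}
print(ratings, "-> table risk:", rate_table(ratings))
```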
Automate the evaluation as much as possible – This is especially important in larger organizations with many systems and lots of unstructured data. The automation can take many forms, including exporting data dictionaries (where and when they exist) or data model metadata into a tracking database, or using a DLP tool to identify instances of sensitive data in documents and other unstructured sources. The latter activity depends upon the rulesets created in the criteria definition stage, and those rulesets will be very much dependent on the tools used.
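As a rough sketch of what the unstructured-data side of that automation can look like, the example below walks a directory of text files and flags matches against two simplified pattern rules. The scan location, file types and patterns are assumptions; a commercial DLP tool applies far more rigorous rules (Luhn validation, more data classes, false-positive handling) than this toy version.

```python
# Toy scan of unstructured files for sensitive-data patterns. The SSN and
# card-number regexes are deliberately simplified assumptions.
import re
from pathlib import Path

PATTERNS = {
    "PII_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PCI_CARD": re.compile(r"\b(?:\d[ -]?){15}\d\b"),
}

def scan_directory(root: str):
    findings = []
    for path in Path(root).rglob("*.txt"):  # scope assumption: plain-text files only
        text = path.read_text(errors="ignore")
        for label, pattern in PATTERNS.items():
            hits = pattern.findall(text)
            if hits:
                findings.append((str(path), label, len(hits)))
    return findings

if __name__ == "__main__":
    for path, label, count in scan_directory("./shared_drive"):  # hypothetical location
        print(f"{path}: {count} possible {label} hit(s)")
```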
Have a Mitigation Policy & Approach defined before you start – This is both a common sense and a liability consideration. Depending on the regulations involved (for PHI you’d have to be thinking about HIPAA and HITECH, for example), there can be penalties for knowingly failing to correct sensitive data issues. That means that as soon as you find sensitive data, you must have a plan in motion to address it.
Assign the proper resource/s to ensure it’s done right and gets finished – Don’t expect that you’ll have all the necessary talent in-house to conduct an evaluation like this. Even if you do have the right people, chances are they’re already fully occupied with other mission-critical tasks. It’s important to get the right people involved, as the results will determine the organization’s overall vulnerability for years to come.
Define the next steps as part of the initial evaluation – There will always be findings coming out of an evaluation like this that require some sort of remediation. Not all of those remediations can happen immediately, so it is important – especially given the potential liability – that all of the necessary next steps to address the findings are planned out before the evaluation concludes.
While this set of suggestions is not meant to be comprehensive, it does provide a starting point for most organizations in addressing issues relating to sensitive data. The reality is that almost every organization has some sensitive data, and the liability surrounding that data has grown tremendously in recent years. The costs of not getting control of this issue almost always outweigh the costs of implementing that control.
Copyright 2016, Stephen Lahanas