Today’s economy runs on data. The creation, sharing, analysis, and storage of data form the backbone of nearly every major business in the world. Even the smallest businesses handle large amounts of data daily. Because of its value, data is often considered one of the most important assets a business possesses. As a result, it must be properly protected. Managing the risks posed to sensitive data is a core function of cybersecurity. However, how can a business secure its data if it does not know what data it has or where it resides? It cannot.
Data discovery is the systematic practice of identifying, inventorying, and understanding data assets across an environment so that both people and systems can use them responsibly. Discovery matters because different types of data carry different levels of sensitivity and therefore require different security controls. Once business data has been discovered and inventoried, the next step becomes possible: assigning a classification level to each category of data based on its sensitivity.
A business should begin by reviewing its processes and identifying the specific data involved in each one. For example, employee payroll relies on employee timesheets, tax information, job and salary details, and union-related data. Even a relatively small process like payroll involves multiple categories of data. Similarly, an e-commerce business selling clothing would handle customer transactions, shipping information, payment card data, and product inventory records.
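As a minimal sketch of this first step, the process-to-data mapping can be recorded in a simple structure. The process names and the helper functions below are hypothetical and chosen only for illustration; the data categories come from the payroll and e-commerce examples above.

```python
# Hypothetical mapping of business processes to the data categories they use.
# The categories are taken from the payroll and e-commerce examples in the text.
PROCESS_DATA = {
    "payroll": [
        "employee timesheets",
        "tax information",
        "job and salary details",
        "union-related data",
    ],
    "e-commerce sales": [
        "customer transactions",
        "shipping information",
        "payment card data",
        "product inventory records",
    ],
}


def all_data_categories(process_data):
    """Return every distinct data category across all processes, sorted."""
    return sorted({cat for cats in process_data.values() for cat in cats})


def processes_using(process_data, category):
    """Return the names of the processes that involve a given data category."""
    return sorted(name for name, cats in process_data.items() if category in cats)
```

Keeping the mapping keyed by process makes it easy to ask both questions that matter at this stage: what data exists overall, and which processes touch a particular category.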
Once key business processes and their associated data categories have been identified, businesses should determine where this data is stored and which systems it passes through in transit. Hardware and software asset inventories support this effort, because an organization should be able to state what data is expected to reside on each system. In today’s hybrid environments, where both on-premises and cloud infrastructure are common, it is important to include cloud applications in this review. For example, a company cloud storage account containing project images can easily be overlooked. These mappings between data and systems will later support the creation of data flow diagrams and help define system boundaries.
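The mapping between data and systems can be sketched in the same spirit. Everything in the example below is an assumption made for illustration: the system names, the "on-premises" and "cloud" labels, and the helper functions are hypothetical, not part of any standard tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class System:
    name: str     # hypothetical system label
    hosting: str  # "on-premises" or "cloud"


# Hypothetical mapping of data categories to the systems that store or carry them.
DATA_LOCATIONS = {
    "payment card data": [System("payment-gateway", "cloud")],
    "employee timesheets": [System("hr-file-server", "on-premises")],
    "project images": [System("team-cloud-storage", "cloud")],
    "tax information": [],  # location not yet identified
}


def unmapped_categories(locations):
    """Return data categories that have no recorded system yet."""
    return sorted(cat for cat, systems in locations.items() if not systems)


def categories_in(locations, hosting):
    """Return data categories held on at least one system with the given hosting."""
    return sorted(
        cat
        for cat, systems in locations.items()
        if any(s.hosting == hosting for s in systems)
    )
```

Listing the categories with no recorded system gives a simple to-do list for the review, and filtering by hosting type helps confirm that cloud storage has not been left out.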
It is also important to document metadata. Metadata is information that describes the characteristics of data; in simple terms, it is “data about data.” It includes structural metadata, which defines data formats, syntax, and organization, and descriptive metadata, which provides information about the content itself, such as security labels. Metadata plays an important role in understanding how data is used and where it fits within the business. For example, in Windows NTFS file systems, file properties can reveal file types, creation and modification dates, and associated user accounts. A thorough data discovery process captures and documents this metadata alongside the data itself.
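One way to capture basic file metadata is sketched below using only the Python standard library. The function name and the record fields are hypothetical. The sketch records file type, size, and modification time only; creation time and file owner are reported differently across operating systems and are deliberately left out here.

```python
from datetime import datetime, timezone
from pathlib import Path


def collect_file_metadata(root):
    """Walk a directory tree and return one metadata record per regular file."""
    records = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue  # skip directories and other non-file entries
        info = path.stat()
        records.append(
            {
                "path": str(path),
                # File extension as a rough indicator of file type.
                "file_type": path.suffix.lower() or "(none)",
                "size_bytes": info.st_size,
                # Last modification time, normalized to UTC in ISO 8601 form.
                "modified_utc": datetime.fromtimestamp(
                    info.st_mtime, tz=timezone.utc
                ).isoformat(),
            }
        )
    return records
```

A file extension is only a hint about the real content, so a more careful survey would also inspect the files themselves.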
The end result of data discovery should be a comprehensive inventory of the data used within the business environment. This inventory should include the type and category of data, the business units responsible for it, the systems and networks it traverses, and relevant metadata such as file types, authors, and timestamps.
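A single inventory entry could be modeled as follows. This is a sketch under stated assumptions rather than a prescribed schema: the class name, field names, and CSV layout are hypothetical and simply mirror the items listed above.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class InventoryEntry:
    data_type: str       # e.g. "timesheet"
    category: str        # e.g. "payroll"
    business_unit: str   # unit responsible for the data
    systems: list = field(default_factory=list)    # systems the data resides on
    networks: list = field(default_factory=list)   # networks the data traverses
    metadata: dict = field(default_factory=dict)   # file type, author, timestamps


FIELDS = ["data_type", "category", "business_unit", "systems", "networks", "metadata"]


def to_csv(entries):
    """Serialize inventory entries to CSV text, flattening lists and metadata."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for e in entries:
        writer.writerow(
            {
                "data_type": e.data_type,
                "category": e.category,
                "business_unit": e.business_unit,
                "systems": "; ".join(e.systems),
                "networks": "; ".join(e.networks),
                "metadata": "; ".join(
                    f"{k}={v}" for k, v in sorted(e.metadata.items())
                ),
            }
        )
    return buffer.getvalue()
```

Exporting to CSV keeps the inventory readable in a spreadsheet, which is often where such records are reviewed by the business units responsible for the data.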
