Tokenization is the process of replacing sensitive data values with surrogate values or “tokens”. It is typically applied to sensitive data elements, including Personally Identifiable Information (PII), Electronic Protected Health Information (e-PHI), and Primary Account Numbers (PAN).
Tokenization offers benefits similar to encryption, but the two differ in several ways. Encryption transforms human-readable cleartext into illegible ciphertext, which can only be reversed with the proper key. Tokenization, on the other hand, replaces the sensitive data with random strings, or “tokens”, that have the same length and format as the original data, so the structure of the data fields is retained. For this reason, tokenization is often described as a form of obfuscation. Tokens should have no value to an attacker.
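To make the length-and-format property concrete, here is a minimal sketch in Python. The function name and the decision to replace every digit are illustrative assumptions, not a prescribed scheme; real tokenization products use vetted, audited generation methods.

```python
import secrets

def tokenize_pan(pan: str) -> str:
    """Hypothetical sketch: replace a Primary Account Number with a
    random token that preserves its length and all-digit format."""
    # Each digit becomes a cryptographically random digit, so the
    # token reveals nothing about the original PAN.
    return "".join(secrets.choice("0123456789") for _ in pan)

token = tokenize_pan("4111111111111111")
```

The token can flow through systems that expect a 16-digit number, such as legacy payment software, without exposing the real PAN.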
There are two versions of tokenization to be familiar with:
- Reversible: these tokens can be converted back to the original values they replaced. This process is known as pseudonymization.
- Irreversible: these tokens cannot be converted back to their original values, so the sensitive data is permanently obfuscated. This is typically achieved with a one-way hash function. This form of tokenization is referred to as anonymization.
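The two versions above can be sketched side by side. This is a simplified illustration, not a production design: the in-memory dictionary stands in for a secured token vault, and the function names are assumptions made for the example.

```python
import hashlib
import secrets

# Reversible tokenization (pseudonymization): a token vault maps each
# token back to the original value. Here a plain dict plays the vault.
vault: dict = {}

def pseudonymize(value: str) -> str:
    token = secrets.token_hex(8)
    vault[token] = value  # the stored mapping is what makes it reversible
    return token

def detokenize(token: str) -> str:
    return vault[token]

# Irreversible tokenization (anonymization): a one-way hash with no
# stored mapping, so the original value cannot be recovered.
def anonymize(value: str) -> str:
    return hashlib.sha256(value.encode()).hexdigest()
```

Note that whoever controls the vault can reverse every pseudonymized token, which is why reversible tokenization still counts as handling sensitive data.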
Each method of tokenization suits a different need. Reversible tokens should be used when two parties agree to share data with each other, but that data must be hidden from other actors who might intercept it in transit between them. Anonymization should be used when a piece of data needs to be published or viewed by a third party, but individual sensitive elements must be permanently obscured. It is therefore well suited to third-party analytics and to producing test data.
Individual tokens can also be separated into two categories:
- Single-Use: this token is unique to one single transaction.
- Multi-Use: this token is attached to a specific data field and can follow it across transactions.
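The distinction can be sketched as follows, under the simplifying assumptions that tokens are random hex strings and that a dictionary stands in for the multi-use mapping store (both are illustrative choices, not part of any particular product).

```python
import secrets

def single_use_token(value: str) -> str:
    # A fresh random token for every transaction; the same input
    # produces a different token each time.
    return secrets.token_hex(8)

# Multi-use: the same value always receives the same token, so the
# token can follow that data field across transactions.
_multi_use_map: dict = {}

def multi_use_token(value: str) -> str:
    if value not in _multi_use_map:
        _multi_use_map[value] = secrets.token_hex(8)
    return _multi_use_map[value]
```

Multi-use tokens allow analytics across transactions (e.g., counting purchases by the same card) without revealing the underlying value, at the cost of making the token itself a linkable identifier.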
Many users find tokenization easier to understand and implement than encryption. Organizations should consider deploying a reputable tokenization solution in their digital infrastructure. Tokenization may also be required for compliance with regulations such as PCI DSS and HIPAA.
