As with S3 cloud pools, Azure cloud pools allow objects to be accessed directly through Azure. This means objects must be stored in their native form, without packing. Vail uses Azure block blobs to store data. Azure blob storage is different enough from AWS S3 storage that support for linked buckets is too complex for this initial implementation.
Although the concepts are the same, Azure blob storage is significantly different from S3 storage. The differences present several issues:
Azure doesn't use delete markers. Without linking, this isn't a problem (destination cloud storage doesn't transfer delete markers or tiny objects). The lack of delete markers is the primary reason linked buckets are not supported.
Adding a delete marker in Vail could be handled easily with an unversioned delete carrying an IfUnmodifiedSince condition, but removing a delete marker may or may not require making a previous version current on Azure. Tracking delete and undelete operations made directly on the Azure container is even more difficult: we would have to create a virtual delete marker in Vail whenever all versions of an object on the Azure container are non-current, and remove that marker if a version is made current again.
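The virtual delete marker rule can be sketched as a single predicate. This is a hypothetical illustration, not Vail's implementation: `AzureVersion` and `needs_virtual_delete_marker` are invented names, and the sketch assumes Vail already has every version of one blob (for example, from a container listing that includes versions) with a flag saying whether it is current.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AzureVersion:
    # Hypothetical record for one version of a blob as seen in an Azure listing.
    version_id: str
    is_current: bool


def needs_virtual_delete_marker(versions: List[AzureVersion]) -> bool:
    """Return True when a blob has versions but none of them is current.

    Azure has no delete markers: an unversioned delete leaves every version
    non-current. Vail would have to present that state as a virtual delete
    marker and drop the marker once some version becomes current again.
    """
    return bool(versions) and not any(v.is_current for v in versions)
```

Because the marker is derived from the listing rather than stored, an undelete performed directly on Azure (making a previous version current) would make the marker disappear the next time the predicate is evaluated.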
Azure doesn't treat slash and backslash as ordinary characters. Backslash is converted to slash when a blob is written, so if two blob names differ only in their use of slash vs. backslash, Azure treats them as the same blob. Any run of repeated slash characters in a blob name is compressed to a single slash, and any leading slash is ignored. The end result is that object names differing only in these respects are all treated as the same object by Azure.
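The folding rules above can be approximated in a few lines, which is useful for detecting S3 object names that would collide on Azure. This is a sketch under the assumption that the three rules described here (backslash to slash, repeated slashes collapsed, leading slash dropped) are the only ones that matter; `azure_blob_key` and `find_collisions` are hypothetical helpers, not Vail functions.

```python
import re
from typing import Dict, List


def azure_blob_key(name: str) -> str:
    """Approximate the name Azure actually stores for `name`."""
    name = name.replace("\\", "/")   # backslash is converted to slash
    name = re.sub(r"/+", "/", name)  # runs of slashes collapse to one
    return name.lstrip("/")          # a leading slash is ignored


def find_collisions(names: List[str]) -> Dict[str, List[str]]:
    """Group S3 names that would map to the same Azure blob name."""
    groups: Dict[str, List[str]] = {}
    for n in names:
        groups.setdefault(azure_blob_key(n), []).append(n)
    return {key: group for key, group in groups.items() if len(group) > 1}
```

For example, under these rules `a/b`, `a\b`, and `/a//b` all fold to `a/b`.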
Azure limits a blob name to 256 slashes and 1024 characters in total. S3 limits the total key length to 1024 bytes and gives slashes no special meaning. If an object name has more than 250 slash characters, Vail strips the excess slashes when writing to Azure. The margin below Azure's limit leaves room for the slashes Vail adds when it prepends its own path to the object name.
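A minimal sketch of the slash limit follows. The text does not say which slashes are removed, so this sketch assumes the first 250 are kept and every later slash is dropped while the other characters stay in place; `limit_slashes` and `MAX_USER_SLASHES` are hypothetical names.

```python
MAX_USER_SLASHES = 250  # from the text: headroom below Azure's limit for Vail's own path


def limit_slashes(name: str, max_slashes: int = MAX_USER_SLASHES) -> str:
    """Drop every slash after the first `max_slashes` slashes in `name`.

    Assumption: leading slashes are kept and only the excess is removed.
    """
    out = []
    seen = 0
    for ch in name:
        if ch == "/":
            seen += 1
            if seen > max_slashes:
                continue  # skip slashes beyond the allowed count
        out.append(ch)
    return "".join(out)
```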
Azure restricts the key names that can be used for user metadata: keys must follow C# identifier naming rules, so the common HTTP header practice of separating words with dashes is not supported. User metadata whose key contains unsupported characters is not copied to Azure storage. Vail itself uses the metadata stored in its DynamoDB database. The Spectra-ID and Spectra-Date metadata values are stored on Azure as SpectraID and SpectraDate.
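The metadata filtering can be sketched as follows. This is a hypothetical illustration: `azure_metadata` is an invented helper, and the regular expression is an ASCII-only approximation of C# identifier rules (the real rules also accept other Unicode letters). Only the two Spectra renames come from the text.

```python
import re
from typing import Dict

# ASCII approximation of a C# identifier: letter or underscore, then letters, digits, underscores.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

# Renames described in the text for Vail's own metadata.
_SPECTRA_RENAMES = {"Spectra-ID": "SpectraID", "Spectra-Date": "SpectraDate"}


def azure_metadata(user_meta: Dict[str, str]) -> Dict[str, str]:
    """Build the metadata sent to Azure from S3-style user metadata.

    Spectra keys are renamed first; any remaining key that is not a valid
    identifier (for example "content-owner") is dropped.
    """
    result = {}
    for key, value in user_meta.items():
        key = _SPECTRA_RENAMES.get(key, key)
        if _IDENTIFIER.fullmatch(key):
            result[key] = value
    return result
```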
The current version of a blob cannot be explicitly deleted. An unversioned delete of the blob must be issued first (making the version non-current), then the versioned delete can be issued.
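The two-step delete can be sketched against a minimal client interface. `BlobClient`, `delete_version`, and `FakeClient` are hypothetical; they are not the Azure SDK's API. The in-memory fake exists only to show the ordering and enforces the rule that the current version cannot be deleted by version ID.

```python
from typing import Iterable, List, Optional, Protocol


class BlobClient(Protocol):
    # Hypothetical minimal interface; not the Azure SDK's API.
    def current_version_id(self, name: str) -> Optional[str]: ...
    def delete(self, name: str, version_id: Optional[str] = None) -> None: ...


def delete_version(client: BlobClient, name: str, version_id: str) -> None:
    """Permanently delete one version, demoting it first if it is current."""
    if client.current_version_id(name) == version_id:
        # Unversioned delete: the version is kept but becomes non-current.
        client.delete(name)
    # The versioned delete is allowed now that the version is not current.
    client.delete(name, version_id=version_id)


class FakeClient:
    """In-memory stand-in for a single blob that enforces the Azure rule."""

    def __init__(self, versions: Iterable[str], current: Optional[str]):
        self.versions = set(versions)
        self.current = current
        self.calls: List[Optional[str]] = []

    def current_version_id(self, name: str) -> Optional[str]:
        return self.current

    def delete(self, name: str, version_id: Optional[str] = None) -> None:
        self.calls.append(version_id)
        if version_id is None:
            self.current = None
        elif version_id == self.current:
            raise RuntimeError("cannot delete the current version")
        else:
            self.versions.discard(version_id)
```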
Block uploads on Azure (the equivalent of multipart uploads) don't provide an upload identifier. To keep simultaneous uploads of the same blob from interfering with each other, the block IDs must include a unique value in addition to the part number.
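One way to build such block IDs is sketched below, assuming a random UUID serves as the per-upload unique value; `new_upload_token` and `block_id` are hypothetical names. Azure block IDs are base64-encoded strings, and every block ID within a blob must have the same length, so the part number is zero-padded to a fixed width before encoding.

```python
import base64
import uuid


def new_upload_token() -> str:
    """Unique value for one upload; plays the role of an S3 upload ID."""
    return uuid.uuid4().hex  # 32 hex characters


def block_id(upload_token: str, part_number: int) -> str:
    """Build a base64 block ID that is unique per upload and per part.

    Zero-padding to 5 digits keeps every ID the same length for part
    numbers up to 99999, which covers S3's 10,000-part limit.
    """
    raw = f"{upload_token}-{part_number:05d}".encode("ascii")
    return base64.b64encode(raw).decode("ascii")
```

With a 32-character token, the unencoded ID is 38 bytes, within Azure's 64-byte limit for block IDs.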
New versions cannot be created while the current blob is on the Archive tier. The blob must first be deleted (which makes all versions of the object previous versions); then the new version can be added. Vail handles this by first attempting the PUT (or multipart upload). If the error indicates that the current blob is on the Archive tier, Vail deletes the blob and retries the operation. As a result, some blob data may be sent more than once when the current version is on archive storage.
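The try, delete, and retry flow can be sketched as follows. `ArchivedBlobError` and `put_with_archive_retry` are hypothetical; the sketch assumes the caller has already translated the Azure error for an archived current blob into that exception, and that `put` and `delete_blob` wrap the upload and the unversioned delete.

```python
from typing import Callable


class ArchivedBlobError(Exception):
    """Hypothetical error raised when the current blob is on the Archive tier."""


def put_with_archive_retry(put: Callable[[], str], delete_blob: Callable[[], None]) -> str:
    """Try the upload once; if the current blob is archived, demote it and retry.

    The retry re-sends the data, which is why some bytes may be transferred twice.
    """
    try:
        return put()
    except ArchivedBlobError:
        delete_blob()  # unversioned delete: existing versions become non-current
        return put()
```

Trying the PUT first keeps the common case (no archived current version) to a single request instead of checking the tier before every write.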
Credentials used to access a storage account don't grant access to the properties of the storage account itself. This prevents Vail from determining whether versioning is enabled on the storage account. Versioning is still required; we just can't enforce it programmatically.
All storage in Vail is assigned an AWS storage class. Azure blob storage supports Hot, Cool, and Archive tiers. An object on the Azure Archive tier must be changed to either the Hot or Cool tier before it can be accessed. As with AWS GLACIER, this restore can take a very long time (on the order of hours). Vail maps the assigned AWS storage class to an Azure tier as follows:
AWS Storage Class | Azure Tier |
---|---|
STANDARD_IA | Cool |
GLACIER, DEEP_ARCHIVE | Archive |
anything else | Hot |
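The mapping table translates directly into a small function. `azure_tier` is a hypothetical name used for illustration; every storage class not listed explicitly falls through to Hot, the remaining one of Azure's three tiers.

```python
def azure_tier(storage_class: str) -> str:
    """Map an AWS storage class name to an Azure access tier."""
    if storage_class == "STANDARD_IA":
        return "Cool"
    if storage_class in ("GLACIER", "DEEP_ARCHIVE"):
        return "Archive"
    return "Hot"  # STANDARD and every other class
```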
Although Azure supports multiple authentication schemes (including Active Directory), we use the storage account's shared key for authentication. In addition to enabling versioning on the storage account, the credentials provided to Vail must be configured through Azure RBAC to include the Microsoft.Storage/storageAccounts/blobServices/containers/blobs/delete and Microsoft.Storage/storageAccounts/blobServices/containers/blobs/deleteBlobVersion/action permissions.
Block blobs can be sent as a single entity or uploaded as a series of blocks that are then committed together, which is analogous to a multipart upload. Vail determines which method to use based on the object size.
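The size-based choice can be sketched as an upload plan. The text does not give Vail's cutoff or block size, so `SINGLE_PUT_LIMIT` and `BLOCK_SIZE` below are hypothetical values, and `plan_upload` is an invented helper. A single range corresponds to one Put Blob request; several ranges correspond to Put Block requests followed by a Put Block List that commits them in order.

```python
from typing import List, Tuple

SINGLE_PUT_LIMIT = 64 * 1024 * 1024  # hypothetical cutoff for a single-request upload
BLOCK_SIZE = 16 * 1024 * 1024        # hypothetical block size


def plan_upload(size: int, single_limit: int = SINGLE_PUT_LIMIT,
                block_size: int = BLOCK_SIZE) -> List[Tuple[int, int]]:
    """Return the (offset, length) ranges to send for an object of `size` bytes."""
    if size <= single_limit:
        return [(0, size)]  # one request for the whole object
    return [(off, min(block_size, size - off)) for off in range(0, size, block_size)]
```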
Azure supports immutable objects, both with explicit control (legal holds) and with an expiration (time-based retention). Vail's object locking doesn't use Azure's immutable storage settings, but it does work with them through special clone deletion processing. Azure storage behaves like S3 storage in this respect. See Object Locking for details.
Each version in Vail stores a list of the pack data for every pool where the data is stored. Data stored in an Azure cloud pool is also included in the version's pack lists. The pack list reference for a cloud pool identifies the version ID and ETag of the version as it exists in the cloud's bucket (on Azure, the container). The version ID is always used when referencing cloud pool data. The ETag is used as a final validation when Azure cloud pool data is used.
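A pack-list entry for a cloud pool and its final ETag check might look like the sketch below. `CloudPackRef`, `StaleCloudDataError`, and `validate_etag` are hypothetical names, and stripping surrounding quotes before comparing is an assumption about how the stored and returned ETags are formatted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CloudPackRef:
    """Hypothetical shape of a pack-list entry for a cloud pool."""
    pool: str
    version_id: str  # always used to address the data in the cloud
    etag: str        # checked before the data is trusted


class StaleCloudDataError(Exception):
    """Raised when the version read from the cloud no longer matches the reference."""


def validate_etag(ref: CloudPackRef, returned_etag: str) -> None:
    """Final check after reading a specific version from the cloud pool."""
    # Assumption: ETags may or may not carry surrounding quotes; compare without them.
    if ref.etag.strip('"') != returned_etag.strip('"'):
        raise StaleCloudDataError(
            f"{ref.pool}: ETag mismatch for version {ref.version_id}")
```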