The File Format¶
The file format is block oriented, similar to common image formats (e.g. PNG). It uses a magic header, followed by blocks with a four-byte identifier and a 64-bit size field. Blocks can have a static size, or use a “chunked format” where the size is determined by following a sequence of smaller “data chunks”.
This modular framework would allow blocks and types to be added and mixed freely, yet the format is deliberately strict. Not only is the number of blocks defined, the exact sequence of blocks is also fixed and must not be changed. A strict format speeds up verification and processing of the data. It also reduces the risk of attacks using manipulated files.
Overall Structure¶
8 bytes with magic 0xfe, FFE, 0x0d, 0x0a, 0x1a, 0x0a
n blocks with the following format:
Static Blocks¶
Static blocks are the default. They allow data to be read and written efficiently on random-access media:
4 bytes with the block type.
8 bytes with the size of the block. Big-endian, unsigned, 64-bit. A zero value indicates an empty block. No data bytes follow such a block. The size value has to be less than 0xffff_0000_0000_0000.
n bytes with the data of the block.
Size values equal to or greater than 0xffff_0000_0000_0000 are reserved for extensions of the format and are therefore invalid as regular size values.
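The magic check and the static block layout described above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names are hypothetical, the `FFE` part of the magic is assumed to be the three ASCII letters, and the caller is assumed to pass the maximum size allowed for the block it expects.

```python
import io
import struct

# Assumption: "FFE" in the magic is the three ASCII letters F, F, E.
MAGIC = b"\xfeFFE\x0d\x0a\x1a\x0a"
RESERVED_SIZE_START = 0xFFFF_0000_0000_0000  # sizes at or above this are reserved


def read_magic(stream):
    """Consume the 8-byte magic or raise ValueError."""
    if stream.read(len(MAGIC)) != MAGIC:
        raise ValueError("bad magic")


def read_static_block(stream, max_size):
    """Read one static block and return (block_type, data)."""
    header = stream.read(12)
    if len(header) != 12:
        raise ValueError("truncated block header")
    # 4-byte block type followed by a big-endian unsigned 64-bit size.
    block_type, size = struct.unpack(">4sQ", header)
    if size >= RESERVED_SIZE_START:
        raise ValueError("reserved size value in a static block")
    if size > max_size:
        raise ValueError("block exceeds its maximum size")
    data = stream.read(size)
    if len(data) != size:
        raise ValueError("truncated block data")
    return block_type, data


# Usage: a magic header followed by one small block.
example = io.BytesIO(MAGIC + b"CONF" + struct.pack(">Q", 3) + b"abc")
read_magic(example)
block_type, data = read_static_block(example, max_size=128)
```

Checking the size against the reserved range before reading any data keeps a reader from treating an extension marker as a very large regular block.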
Chunked Blocks¶
To allow encrypting large streams, there is an alternative chunked data format. This format uses the value 0xffff800000000000 as the size of the block, to indicate the data size is defined by chunks. The data of the block consists of a sequence of chunks, each prefixed with a 16-bit size and the corresponding number of bytes. A size value of zero indicates the end of the sequence.
This format is currently only allowed for the DATA block.
4 bytes with the block type (DATA).
8 bytes indicating the stream format. Value == 0xffff800000000000
n data chunks with the following format:
2 bytes with the size of the data chunk. Big-endian, unsigned, 16-bit. Value 0 means: end of stream.
n bytes as specified with the data chunk size.
The chunks shall use the maximum size of 0xffff bytes for all except the last chunk. The decryptor may check that this is the case and reject files with arbitrary chunk sizes.
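As an illustration of the chunked layout, the following sketch writes and reads a chunked DATA block. The function names are hypothetical, and the reader collects the whole payload in memory only to keep the example short. A real implementation would more likely pass each chunk on to the cipher as it arrives. The reader applies the optional strictness check, rejecting any chunk shorter than 0xffff that is not the last one.

```python
import io
import struct

CHUNKED_MARKER = 0xFFFF_8000_0000_0000  # size value that announces chunked data
MAX_CHUNK = 0xFFFF


def write_chunked_data(stream, payload):
    """Write a DATA block header followed by the payload as chunks."""
    stream.write(b"DATA" + struct.pack(">Q", CHUNKED_MARKER))
    for offset in range(0, len(payload), MAX_CHUNK):
        chunk = payload[offset:offset + MAX_CHUNK]
        stream.write(struct.pack(">H", len(chunk)) + chunk)
    stream.write(struct.pack(">H", 0))  # size 0 marks the end of the stream


def read_chunked_data(stream):
    """Read the chunks that follow the 12-byte DATA block header."""
    parts = []
    previous_size = MAX_CHUNK
    while True:
        raw = stream.read(2)
        if len(raw) != 2:
            raise ValueError("truncated chunk size")
        (size,) = struct.unpack(">H", raw)
        if size == 0:
            return b"".join(parts)
        if previous_size != MAX_CHUNK:
            raise ValueError("only the last chunk may be shorter than 0xffff")
        chunk = stream.read(size)
        if len(chunk) != size:
            raise ValueError("truncated chunk data")
        parts.append(chunk)
        previous_size = size


# Usage: round-trip a payload that needs more than one chunk.
payload = bytes(range(256)) * 300
buffer = io.BytesIO()
write_chunked_data(buffer, payload)
buffer.seek(12)  # skip block type and size marker
restored = read_chunked_data(buffer)
```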
Block Types and Correct Order¶
CONF
Encryption configuration (maximum size 128 bytes)¶
k:RSA-4096,e:AES-256,b:CBC,h:SHA3-512,v:1
Comma separated fields “<key>:<value>”. The result must be ASCII encoded and only contain the following limited range of characters (regular expression): [-_,:A-Za-z0-9]+
k
key algorithm
e
encryption algorithm
b
block algorithm
h
hash algorithm
v
file format version
There is no need to allow other algorithms. If the configuration does not exactly match the string shown above, it is safe to stop decoding with an error.
This string is part of the file format to describe the algorithms used in a form that can be read and understood by humans. It shall make it possible to reconstruct the data if knowledge of this format is lost but the keys and files are still available.
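A decoder could validate the CONF block along the following lines. This is a sketch under the rules stated above (maximum size, ASCII, the character range, and an exact match with the expected string). The function name and the returned dictionary are illustrative choices and not part of the format.

```python
import re

EXPECTED_CONF = "k:RSA-4096,e:AES-256,b:CBC,h:SHA3-512,v:1"
CONF_PATTERN = re.compile(r"[-_,:A-Za-z0-9]+")
CONF_MAX_SIZE = 128


def parse_conf(raw: bytes) -> dict:
    """Validate the CONF block data and return its fields as a dict."""
    if len(raw) > CONF_MAX_SIZE:
        raise ValueError("CONF block too large")
    # UnicodeDecodeError is a subclass of ValueError, so non-ASCII input
    # is reported the same way as the other failures.
    text = raw.decode("ascii")
    if not CONF_PATTERN.fullmatch(text):
        raise ValueError("CONF contains characters outside the allowed range")
    if text != EXPECTED_CONF:
        raise ValueError("unsupported configuration")
    return dict(field.split(":", 1) for field in text.split(","))


fields = parse_conf(EXPECTED_CONF.encode("ascii"))
```

Since only one configuration is valid, the exact comparison alone would be enough to accept or reject a file. The earlier checks mainly give more specific error messages.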
EPUB
Encryption Key Hash (maximum size 1k)¶
The hash for the used public key to encrypt the symmetric key. This hash is generated by using a SHA3-512 hash on the DER encoded public key in SubjectPublicKeyInfo format.
This block allows a quick verification which key was used to encrypt the file, without trial-and-error. The hash itself is required for the decryption process.
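The key hash can be computed with the Python standard library once the DER encoded SubjectPublicKeyInfo bytes are available. The sketch below assumes those bytes are obtained elsewhere, for example from the `cryptography` package via `public_key.public_bytes(Encoding.DER, PublicFormat.SubjectPublicKeyInfo)`. The function names are hypothetical.

```python
import hashlib
import hmac


def public_key_hash(spki_der: bytes) -> bytes:
    """Return the SHA3-512 digest of a DER encoded SubjectPublicKeyInfo."""
    return hashlib.sha3_512(spki_der).digest()


def key_matches(spki_der: bytes, epub_block_data: bytes) -> bool:
    """Check whether the EPUB block data belongs to the given public key."""
    return hmac.compare_digest(public_key_hash(spki_der), epub_block_data)
```

A decryptor holding several key pairs can use `key_matches` to pick the right private key before attempting any RSA operation.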
ESYM
Encrypted Symmetric Key (maximum size 1k)¶
For RSA-4096: The symmetric key is encrypted using OAEP padding, with the given hash size and no label.
This block contains the key for the symmetric encryption of the data. The 32 bytes of the AES-256 key are encrypted using the public RSA key. This is done using the optimal asymmetric encryption padding for RSA, also called RSA-OAEP and standardized in PKCS#1 and RFC 3447. The hash used in this encryption is SHA-256, the mask generation function is MGF1.
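import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric.padding import MGF1, OAEP

AES_KEY_LENGTH_BYTES = 32  # AES-256 uses a 256-bit (32-byte) key

# public_key is the recipient's RSA-4096 public key, loaded elsewhere.
encryption_key = os.urandom(AES_KEY_LENGTH_BYTES)
encrypted_encryption_key = public_key.encrypt(
    encryption_key,
    OAEP(
        mgf=MGF1(algorithm=hashes.SHA256()),
        algorithm=hashes.SHA256(),
        label=None))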
All encrypted blocks in the file are encrypted using this AES-256 key, but with different initialization vectors. See section “Encrypted Data Block Format” for all details.
META
Encrypted metadata (maximum size 10k)¶
The block with the encrypted metadata. If no metadata is given, this block and the hash block are empty.
See section “Metadata Format” for all details.
MDHA
Hash for metadata (maximum size 1k)¶
The hash for the decrypted metadata. Encrypted.
DATA
Encrypted file data (no maximum size)¶
The encrypted file data.
For empty files with a size of zero, this block and the hash block are empty.
DTHA
Hash for the decrypted data.¶
The hash for the decrypted file data. Encrypted.
ENDH
= End of file with hash¶
Marks the end of the file and contains a SHA3-512 hash for the whole file, up to this block, without the bytes of the block type. The bytes following this block type are therefore always 0,0,0,0,0,0,0,0x40 (the big-endian size of a 64-byte block), followed by the 64-byte hash.
In order to quickly check the integrity of a file, you can create a digest of the file data up to file_size - 76 bytes, then skip 12 bytes, read the next 64 and compare the digest.
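The quick integrity check described above can be sketched as follows. For brevity the example works on the complete file contents held in memory, whereas a real tool would more likely feed the file to the hash in pieces. The function name is hypothetical, and the check of the ENDH type bytes is an extra safeguard that the text above does not require.

```python
import hashlib
import hmac

DIGEST_SIZE = 64                      # SHA3-512 digest length in bytes
ENDH_TOTAL = 4 + 8 + DIGEST_SIZE      # block type + size field + digest = 76 bytes


def verify_end_hash(file_bytes: bytes) -> bool:
    """Return True if the trailing ENDH digest matches the rest of the file."""
    if len(file_bytes) < ENDH_TOTAL:
        return False
    body = file_bytes[:-ENDH_TOTAL]       # everything up to file_size - 76
    trailer = file_bytes[-ENDH_TOTAL:]
    if trailer[:4] != b"ENDH":
        return False
    stored_digest = trailer[12:]          # skip the 12 header bytes, read 64
    return hmac.compare_digest(hashlib.sha3_512(body).digest(), stored_digest)
```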
Block Order¶
The blocks must appear in the shown order.
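Because the sequence is fixed, a decoder can compare the block types it encounters against one constant tuple. The sketch below assumes the block types are the four ASCII bytes of the names used in this section. The function name is hypothetical.

```python
BLOCK_ORDER = (b"CONF", b"EPUB", b"ESYM", b"META", b"MDHA", b"DATA", b"DTHA", b"ENDH")


def check_block_order(block_types):
    """Raise ValueError unless block_types is exactly the required sequence."""
    seen = tuple(block_types)
    if seen == BLOCK_ORDER:
        return
    for position, expected in enumerate(BLOCK_ORDER):
        if position >= len(seen):
            raise ValueError(f"missing block {expected!r}")
        if seen[position] != expected:
            raise ValueError(
                f"expected {expected!r} at position {position}, got {seen[position]!r}")
    raise ValueError("unexpected extra blocks after ENDH")
```

In a streaming decoder the same comparison would be made one block at a time, so that decoding stops at the first block that is out of place.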
Encrypted Data Block Format¶
Static Blocks¶
The data format in an encrypted data block:
8 bytes, big-endian, unsigned, with the size of the decrypted data.
If the encrypted file is empty, the block is empty and this size is not given.
If this value is greater than zero, it is the size of the decrypted data in bytes.
16 bytes (for AES-256/CBC) with the IV for the encryption.
The encrypted data, aligned to the cipher block size.
If the encrypted data is empty, this block is empty.
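Splitting the contents of a static encrypted block into its three parts could look like the sketch below. It only separates the fields and checks their lengths. The decryption itself is left out, and the function name is hypothetical. After decrypting, the plaintext would be truncated to the returned size.

```python
import struct

IV_SIZE = 16            # for AES-256/CBC
CIPHER_BLOCK_SIZE = 16  # AES block size in bytes


def split_encrypted_block(block: bytes):
    """Return (decrypted_size, iv, ciphertext) for a static encrypted block."""
    if not block:
        return 0, b"", b""  # empty block: nothing was encrypted
    if len(block) < 8 + IV_SIZE:
        raise ValueError("encrypted block too short")
    (decrypted_size,) = struct.unpack(">Q", block[:8])
    iv = block[8:8 + IV_SIZE]
    ciphertext = block[8 + IV_SIZE:]
    if len(ciphertext) % CIPHER_BLOCK_SIZE != 0:
        raise ValueError("ciphertext is not aligned to the cipher block size")
    if decrypted_size > len(ciphertext):
        raise ValueError("decrypted size exceeds the ciphertext length")
    return decrypted_size, iv, ciphertext
```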
Streamed Blocks¶
As the size of the encrypted data is not known in advance, the encryption format for streamed blocks is different.
16 bytes (for AES-256/CBC) with the IV for the encryption.
The encrypted data, padded with ISO/IEC 9797-1 padding method 2.
ISO padding adds the byte 0x80 and fills the last block with zero bytes.
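The padding rule is short enough to show in full. The following sketch pads to the 16-byte AES block size and removes the padding again. The function names are hypothetical.

```python
def pad_iso9797_m2(data: bytes, block_size: int = 16) -> bytes:
    """Append 0x80, then zero bytes up to the next block boundary."""
    padded = data + b"\x80"
    return padded + b"\x00" * (-len(padded) % block_size)


def unpad_iso9797_m2(padded: bytes) -> bytes:
    """Remove the trailing zero bytes and the 0x80 marker."""
    stripped = padded.rstrip(b"\x00")
    if not stripped.endswith(b"\x80"):
        raise ValueError("invalid padding")
    return stripped[:-1]
```

Because the 0x80 marker is always added, data that already fills a whole block grows by one full block of padding, which keeps the removal unambiguous.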
Metadata Format¶
Each file can contain a custom block with metadata.
The metadata is stored as a UTF-8 encoded block in JSON format.
The JSON block must be stored in compact form, without pretty-printing.
The JSON block must encode a top level object like this:
{
"attribute1": "data1",
"attribute2": "data2",
"attribute3": "data3"
}
So there has to be an object with attributes (no top-level list, etc.), but the format of the attribute values is user defined and can contain lists and nested objects.
The size of the encrypted metadata must not exceed 100k.
Field names only consist of lowercase letters and the underscore character. They must be shorter than 64 characters.
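The metadata rules can be illustrated with the standard `json` module. In this sketch the function names are hypothetical, and the field name rule is applied to the top-level attributes only, which is an assumption, because the text does not say whether it also covers nested objects.

```python
import json
import re

# Lowercase letters and underscore, shorter than 64 characters.
FIELD_NAME = re.compile(r"[a-z_]{1,63}")


def check_field_names(fields: dict) -> None:
    for name in fields:
        if not FIELD_NAME.fullmatch(name):
            raise ValueError(f"invalid metadata field name: {name!r}")


def encode_metadata(fields: dict) -> bytes:
    """Serialize metadata as compact UTF-8 encoded JSON."""
    check_field_names(fields)
    # separators=(",", ":") removes the spaces json.dumps adds by default.
    return json.dumps(fields, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def decode_metadata(raw: bytes) -> dict:
    """Parse decrypted metadata and enforce the top-level object rule."""
    value = json.loads(raw.decode("utf-8"))
    if not isinstance(value, dict):
        raise ValueError("metadata must be a JSON object at the top level")
    check_field_names(value)
    return value
```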
Predefined Metadata Fields¶
file_path
The original absolute path of the encrypted file.
file_name
The original filename of the encrypted file.
file_size
The original size of the encrypted file.
created
The original created UTC date/time in ISO format (yyyy-mm-ddThh:mm:ss)
modified
The original modified UTC date/time in ISO format (yyyy-mm-ddThh:mm:ss)
mime_type
The MIME type of the file contents.
version
A version of the file, free format.
encryptor
The name of the application which encrypted the file.
Error Handling on Decoding¶
If there is a problem, decoding shall simply stop. Do not try to recover from the problem.
If the file is smaller than 256 bytes, it is invalid, stop decoding.
If an unknown block type field is read, stop decoding.
If an expected block is missing, stop decoding.
If the size of a block exceeds its specified maximum size, stop decoding.
If the CONF field does not match the exact encryption specification, stop decoding.
If the hash does not match the decoded data, stop decoding.