Network Neutral Data Interchange 0.1

a simplified hierarchical object model for EDI objects, **Baseline EDI Model**.

This specification describes a privacy-respecting mechanism for storing, indexing, and retrieving encrypted data at a storage provider for EDI transaction sets. It is often useful an organization wants to protect data in a way that the storage provider cannot view, analyze, aggregate, or resell the data. This approach also ensures that application data is portable and protected from storage provider data breaches.

Introduction

We store a significant amount of sensitive data online, such as personally identifying information (PII), trade secrets, family pictures, and customer information. The data that we store is often not protected in an appropriate manner.

Legislation, such as the General Data Protection Regulation (GDPR), incentivizes service providers to better preserve individuals' privacy, primarily through making the providers liable in the event of a data breach. This liability pressure has revealed a technological gap, whereby providers are often not equipped with technology that can suitably protect their customers. Encrypted Data Vaults fill this gap and provide a variety of other benefits.

This specification describes a privacy-respecting mechanism for transmitting, storing, indexing, and retrieving encrypted data at a storage provider. It is often useful when an individual or organization wants to protect data in a way that the storage provider cannot view, analyze, aggregate, or resell the data. This approach also ensures that application data is portable and protected from storage provider data breaches.

Why Do We Need Encrypted Data Vaults?

Explain why individuals and organizations that want to protect their privacy, trade secrets, and ensure data portability will benefit from using this technology. Explain how giving a standard API for the storage of user data empowering users to "bring their own storage", giving them control of their own information. Explain how applications that are written against a standard API and assume that users will bring their own storage can separate concerns and focus on the functionality of their application, removing the need to deal with storage infrastructure (instead leaving it to a specialist service provider that is chosen by the user).

Requiring client-side (edge) encryption for all data and metadata at the same time as enabling the user to store data on multiple devices and to share data with others, whilst also having searchable or queryable data, has been historically very difficult to implement in one system. Trade-offs are often made which sacrifice privacy in favor of usability, or vice versa.

Due to a number of maturing technologies and standards, we are hopeful that such trade-offs are no longer necessary, and that it is possible to design a privacy-preserving protocol for encrypted decentralized data storage that has broad practical appeal.

Ecosystem Overview

The problem of decentralized data storage has been approached from various different angles, and personal data stores (PDS), decentralized or otherwise, have a long history in commercial and academic settings. Different approaches have resulted in variations in terminology and architectures. The diagram below shows the types of components that are emerging, and the roles they play. Encrypted Data Vaults fulfill the low-level encrypted storage role.

diagram showing
the roles of different technologies in the encrypted
data vaults and Network Neutral Data Interchange ecosystem and how they interact. — Secure Data Storage layers

This section describes the roles of the core actors and the relationships between them in an ecosystem where this specification is expected to be useful. A role is an abstraction that might be implemented in many different ways. The separation of roles suggests likely interfaces and protocols for standardization. The following roles are introduced in this specification:

data vault controller: A role an entity might perform by creating, managing, and deleting data vaults. This entity is also responsible for granting and revoking authorization to storage agents to the data vaults that are under its control.
storage agent: A role an entity might perform by creating, updating, and deleting data in a data vault. This entity is typically granted authorization to to access a data vault by a data vault controller.
storage provider: A role an entity might perform by providing a raw data storage mechanism to a data vault controller. It is impossible for this entity to see the data that it is storing due to all data being encrypted at rest and in transit to and from the storage provider.

Requirements

The following sections elaborate on the requirements that have been gathered from the core use cases.

Privacy and multi-party encryption

One of the main goals of this system is ensuring the privacy of an entity's data so that it cannot be accessed by unauthorized parties, including the storage provider.

To accomplish this, the data must be encrypted both while it is in transit (being sent over a network) and while it is at rest (on a storage system).

Since data could be shared with more than one entity, it is also necessary for the encryption mechanism to support encrypting data to multiple parties.

Sharing and authorization

It is necessary to have a mechanism that enables authorized sharing of encrypted information among one or more entities.

The system is expected to specify one mandatory authorization scheme, but also allow other alternate authorization schemes. Examples of authorization schemes include OAuth2, Web Access Control, and [[ZCAP]]s (Authorization Capabilities).

Identifiers

The system should be identifier agnostic. In general, identifiers that are a form of URN or URL are preferred. While it is presumed that [[edi-CORE]] (Decentralized Identifiers, edis) will be used by the system in a few important ways, hard-coding the implementations to edis would be an anti-pattern.

Versioning and replication

It is expected that information can be backed up on a continuous basis. For this reason, it is necessary for the system to support at least one mandatory versioning strategy and one mandatory replication SLA, but also allow other alternate versioning and replication strategies.

Metadata and searching

Large volumes of data are expected to be stored using this system, which then need to be efficiently and selectively retrieved. To that end, an encrypted search mechanism is a necessary feature of the system.

It is important for clients to be able to associate metadata with the data such that it can be searched. At the same time, since privacy of both data and metadata is a key requirement, the metadata must be stored in an encrypted state, and service providers must be able to perform those searches in an opaque and privacy-preserving way, without being able to see the metadata.

Protocols

Since this system can reside in a variety of operating environments, it is important that at least one protocol is mandatory, but that other protocols are also allowed by the design. Examples of protocols include HTTP, gRPC, Bluetooth, and various binary on-the-wire protocols. An HTTPS API is defined in .

Design goals

This section elaborates upon a number of guiding principles and design goals that shape Encrypted Data Vaults.

Layered and modular architecture

A layered architectural approach is used to ensure that the foundation for the system is easy to implement while allowing more complex functionality to be layered on top of the lower foundations.

For example, Layer 1 might contain the mandatory features for the most basic system, Layer 2 might contain useful features for most deployments, Layer 3 might contain advanced features needed by a small subset of the ecosystem, and Layer 4 might contain extremely complex features that are needed by a very small subset of the ecosystem.

Prioritize privacy

This system is intended to protect an entity's privacy. When exploring new features, always ask "How would this impact privacy?". New features that negatively impact privacy are expected to undergo extreme scrutiny to determine if the trade-offs are worth the new functionality.

Push implementation complexity to the client

Servers in this system are expected to provide functionality strongly focused on the storage and retrieval of encrypted data. The more a server knows, the greater the risk to the privacy of the entity storing the data, and the more liability the service provider might have for hosting data. In addition, pushing complexity to the client enables service providers to provide stable server-side implementations while innovation can by carried out by clients.

Core Concepts

The following sections outline core concepts, such as encrypted storage, which form the foundation of this specification.

Encrypted Storage

An important consideration of encrypted data stores is which components of the architecture have access to the (unencrypted) data, or who controls the private keys. There are roughly three approaches: storage-side encryption, client-side (edge) encryption, and gateway-side encryption (which is a hybrid of the previous two).

Any data storage systems that let the user store arbitrary data also support client-side encryption at the most basic level. That is, they let the user encrypt data themselves, and then store it. This doesn't mean these systems are optimized for encrypted data however. Querying and access control for encrypted data may be difficult.

Storage-side encryption is usually implemented as whole- disk encryption or filesystem-level encryption. This is widely supported and understood, and any type of hosted cloud storage is likely to use storage-side encryption. In this scenario the private keys are managed by the service provider or controller of the storage server, which may be a different entity than the user who is storing the data. Encrypting the data while it resides on disk is a useful security measure should physical access to the storage hardware be compromised, but does not guarantee that only the original user who stored the data has access.

Conversely, client-side encryption offers a high level of security and privacy, especially if metadata can be encrypted as well. Encryption is done at the individual data object level, usually aided by a keychain or wallet client, so the user has direct access to the private keys. This comes at a cost, however, since the significant responsibility of key management and recovery falls squarely onto the end user. In addition, the question of key management becomes more complex when data needs to be shared.

Gateway-side encryption systems take an approach that combines techniques from storage-side and client-side encryption architectures. These storage systems, typically encountered among multi-server clusters or some "encryption as a platform" cloud service providers, recognize that client-side key management may be too difficult for some users and use cases, and offer to perform encryption and decryption themselves in a way that is transparent to the client application. At the same time, they aim to minimize the number of components (storage servers) that have access to the private decryption keys. As a result, the keys usually reside on "gateway" servers, which encrypt the data before passing it to the storage servers. The encryption/decryption is transparent to the client, and the data is opaque to the storage servers, which can be modular/pluggable as a result. Gateway-side encryption provides some benefits over storage-side systems, but also share the drawbacks: the gateway sysadmin controls the keys, not the user.

Structured Documents

The fundamental unit of storage in data vaults is the encrypted structured document which, when decrypted, provides a data structure that can be expressed in popular syntaxes such as JSON and CBOR. Documents can store structured data and metadata about the structured data. Structured document sizes are limited to 16MB.

Streams

For files larger than 16MB or for raw binary data formats such as audio, video, and office productivity files, a streaming API is provided that enables data to be streamed to/from a data vault. Streams are described using structured documents, but the storage of the data is separated from the structured document using a hashlink to the encrypted content.

Indexing

Data vaults are expected to store a very large number of documents of varying kinds. This means that it is important to be able to search the documents in a timely way, which creates a challenge for the storage provider as the content is encrypted. Previously this has been worked around with a certain amount of unencrypted metadata attached to the data objects. Another possibility is unencrypted listings of pointers to filtered subsets of data.

In the case of data vaults, an encrypted search scheme is provided for secure data vaults that enable data vault clients to do meta data indexing while not leaking metadata to the storage provider.

Architecture

Review this section for language that should be properly normative.

This section describes the architecture of the Encrypted Data Vault protocol, in the form of a client-server relationship. The vault isregarded as the server and the client acts as the interface used to interact with the vault.

This architecture is layered in nature, where the foundational layer consists of an operational system with minimal features, and where more advanced features are layered on top. Implementations can choose to implement only the foundational layer, or optionally, additional layers consisting of a richer set of features for more advanced use cases.

Server and client responsibilities

The server is assumed to be of low trust, and must have no visibility into the data that it persists. However, even in this model, the server still has a set of minimum responsibilities it must adhere to.

The client is responsible for providing an interface to the server, with bindings for each relevant protocol (HTTP, RPC, or binary over-the-wire protocols), as required by the implementation.

All encryption and decryption of data is done on the client side, at the edges. The data (including metadata) MUST be opaque to the server, and the architecture is designed to prevent the server from being able to decrypt it.

Layer 1 (L1) responsibilities

Layer 1 consists of a client-server system that is capable of encrypting data in transit and at rest.

Server: validate requests (L1)

When a vault client makes a request to store, query, modify, or delete data in the vault, the server validates the request. Since the actual data and metadata in any given request is encrypted, such validation is necessarily limited and largely depends on the protocol and the semantics of the request.

Server: Persist data (L1)

The mechanism a server uses to persist data, such as storage on a local, networked, or distributed file system, is determined by the implementation. The persistence mechanism is expected to adhere to the common expectations of a data storage provider, such as reliable storage and retrieval of data.

Server: Persist global configuration (L1)

A vault has a global configuration that defines the following properties:

Stream chunk size
Other config metadata

The configuration allows the the client to perform capability discovery regarding things like authorization, protocol, and replication mechanisms that are used by the server.

Server: enforcement of authorization policies (L1)

When a client makes a request to store, query, modify, or delete data in the vault, the server enforces any authorization policy that is associated with the request.

Client: encrypted data chunking (L1)

An Encrypted Data Vault is capable of storing many different types of data, including large unstructured binary data. This means that storing a file as a single entry would be challenging for systems that have limits on single record sizes. For example, some databases set the maximum size for a single record to 16MB. As a result, it is necessary that large data is chunked into sizes that are easily managed by a server. It is the responsibility of the client to set the chunk size of each resource and chunk large data into manageable chunks for the server. It is the responsibility of the server to deny requests to store chunks larger that it can handle.

Each chunk is encrypted individually using authenticated encryption. Doing so protects against attacks where an attacking server replaces chunks in a large file and requires the entire file to be downloaded and decrypted by the victim before determining that the file is compromised. Encrypting each chunk with authenticated encryption ensures that a client knows that it has a valid chunk before proceeding to the next one. Note that another authorized client can still perform an attack by doing authenticated encryption on a chunk, but a server is not capable of launching the same attack.

Client: Resource structure (L1)

The process of storing encrypted data starts with the creation of a Resource by the client, with the following structure.

Resource:

id (required)
meta
- meta.contentType MIME type
content - entire payload, or a manifest-like list of hashlinks to individual chunks

If the data is less than the chunk size, it is embedded directly into the content.

Otherwise, the data is sharded into chunks by the client (see next section), and each chunk is encrypted and sent to the server. In this case, content contains a manifest-like listing of URIs to individual chunks (integrity-protected by [[HASHLINK]].

Client: Encrypted resource structure (L1)

The process of creating the Encrypted Resource. If the data was sharded into chunks, this is done after the individual chunks are written to the server.

id
index - encrypted index tags prepared by the client (for use with privacy-preserving querying over encrypted resources)
Chunk size (if different from the default in global config)
Versioning metadata - such as sequence numbers, Git-like hashes, or other mechanisms
Encrypted resource payload - encoded as a jwe [[RFC7516]], cwe [[RFC8152]] or other appropriate mechanism

Layer 2 (L2) responsibilities

Layer 2 consists of a system that is capable of sharing data among multiple entities, of versioning and replication, and of performing privacy-preserving searches in an efficient manner.

Client: Encrypted search indexes (L2)

To enable privacy-preserving querying (where the search index is opaque to the server), the client must prepare a list of encrypted index tags (which are stored in the Encrypted Resource, alongside the encrypted data contents).

Need details about salting and encryption mechanism of index tags.

Client: Versioning and replication (L2)

A server must support at least one versioning/change control mechanism. Replication is done by the client, not by the server (since the client controls the keys, knows about which other servers to replicate to, etc.). If an Encrypted Data Vault implementation aims to provide replication functionality, it MUST also pick a versioning/change control strategy (since replication necessarily involves conflict resolution). Some versioning strategies are implicit ("last write wins", eg. rsync or uploading a file to a file hosting service), but keep in mind that a replication strategy always implies that some sort of conflict resolution mechanism should be involved.

Client: Sharing with other entities

An individual vault's choice of authorization mechanism determines how a client shares resources with other entities (authorization capability link or similar mechanism).

Layer 3 (L3) responsibilities

Server: Notifications (L3)

It is helpful if data storage providers are able to notify clients when changes to persisted data occurs. A server may optionally implement a mechanism by which clients can subscribe to changes in the vault.

Client: Vault-wide integrity protection (L3)

Vault-wide integrity protection is provided to prevent a variety of storage provider attacks where data is modified in a way that is undetectable, such as if documents are reverted to older versions or deleted. This protection requires that a global catalog of all the resource identifiers that belong to a user, along with the most recent version, is stored and kept up to date by the client. Some clients may store a copy of this catalog locally (and include integrity protection mechanism such as [[HASHLINK]] to guard against interference or deletion by the server.

Property	Description
sequence	A unique counter for the data vault in order to ensure that clients are properly synchronized to the data vault. The value is required and MUST be an unsigned 64-bit number.
controller	The entity or cryptographic key that is in control of the data vault. The value is required and MUST be a URI.
invoker	The root entities or cryptographic key(s) that are authorized to invoke an authorization capability to modify the data vault's configuration or read or write to it. The value is optional, but if present, MUST be a URI or an array of URIs. When this value is not present, the value of controller property is used for the same purpose.
delegator	The root entities or cryptographic key(s) that are authorized to delegate authorization capabilities to modify the data vault's configuration or read or write to it. The value is optional, but if present, MUST be a URI or an array of URIs. When this value is not present, the value of controller property is used for the same purpose.
referenceId	Used to express an application-specific reference identifier. The value is optional and, if present, MUST be a string.
keyAgreementKey.id	An identifier for the key agreement key. The value is required and MUST be a URI. The key agreement key is used to derive a secret that is then used to generate a key encryption key for the receiver.
keyAgreementKey.type	The type of key agreement key. The value is required and MUST be or map to a URI.
hmac.id	An identifier for the HMAC key. The value is required a MUST be or map to a URI.
hmac.type	The type of HMAC key. The value is required and MUST be or map to a URI.

Property	Description
id	An identifier for the structured document. The value is required and MUST be a Base58-encoded 128-bit random value.
meta	Key-value metadata associated with the structured document.
content	Key-value content for the structured document.

Property	Description
meta.chunks	Specifies the number of chunks in the stream.
stream.id	The identifier for the stream. The stream identifier MUST be a URI that references a stream on the same data vault. Once the stream has been written to the data vault, the content identifier MUST be updated such that it is a valid hashlink. To allow for streaming encryption, the value of the digest for the stream is assumed to be unknowable until after the stream has been written. The hashlink MUST exist as a content hash for the stream that has been written to the data vault.

Property	Description
id	An identifier for the encrypted document. The value is required and MUST be a Base58-encoded 128-bit random value.
sequence	A unique counter for the data vault in order to ensure that clients are properly synchronized to the data vault. The value is required and MUST be an unsigned 64-bit number.
jwe or cwe	A JSON Web Encryption or COSE Encrypted value that, if decoded, results in the corresponding StructuredDocument.

Data vault HTTPS API

This section introduces the HTTPS API for interacting with data vaults and their contents.

Discovering Service Endpoints

A website may provide service endpoint discovery by embedding JSON-LD in their top-most HTML web page (e.g. at https://example.com/):

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>Example Website</title>
    <link rel="stylesheet" href="style.css">
    <script src="script.js"></script>
    <script type="application/ld+json">
{
  "@context": "https://w3id.org/encrypted-data-vaults/v1",
  "id": "https://example.com/",
  "name": "Example Website",
  "dataVaultManagementService": "https://example.com/data-vaults"
}
    </script>
  </head>
  <body>
    <!-- page content -->
  </body>
</html>

Service descriptions may also be requested via content negotiation. In the following example a JSON-compatible service description is provided (e.g. curl -H "Accept: application/json" https://example.com/):

{
  "@context": "https://w3id.org/encrypted-data-vaults/v1",
  "id": "https://example.com/",
  "name": "Example Website",
  "dataVaultCreationService": "https://example.com/data-vaults"
}

Creating a data vault

A data vault is created by performing an HTTP POST of a DataVaultConfiguration to the dataVaultCreationService. The following HTTP status codes are defined for this service:

HTTP Status	Description
201	data vault creation was successful. The HTTP `Location` header will contain the URL for the newly created data vault.
400	data vault creation failed.
409	A duplicate data vault exists.

An example exchange of a data vault creation is shown below:

POST /data-vaults HTTP/1.1
Host: example.com
Content-Type: application/json
Accept: application/json, text/plain, */*
Accept-Encoding: gzip, deflate

{
  "sequence": 0,
  "controller": "edi:example:110456789",
  "referenceId": "urn:uuid:abc5a436-21f9-4b4c-857d-1f5569b2600d",
  "keyAgreementKey": {
    "id": "https://example.com/kms/11045",
    "type": "X25519KeyAgreementKey2020"
  },
  "hmac": {
    "id": "https://example.com/kms/67891",
    "type": "Sha256HmacKey2020"
  }
}

Explain the purpose of the controller property to root authority. Explain how Authorization Capabilities can be created and invoked via HTTP signatures to authorize reading and writing from/to data vaults.

If the creation of the data vault was successful, an HTTP 201 status code is expected in return:

HTTP/1.1 201 Created
Location: https://example.com/encrypted-data-vaults/z4sRgBJJLnYy
Cache-Control: no-cache, no-store, must-revalidate
Pragma: no-cache
Expires: 0
Date: Fri, 14 Jun 2020 18:35:33 GMT
Connection: keep-alive
Transfer-Encoding: chunked

Creating a Document

HTTP Status	Description
201	Structured document creation was successful. The HTTP `Location` header will contain the URL for the newly created document.
400	Structured document creation failed.

In order to convert a StructuredDocument to an EncryptedDocument an implementer MUST encode the StructuredDocument as a JWE or a COSE Encrypted object. Once the document is encrypted, it can be sent to the document creation service.

A protocol example of a document creation is shown below:

POST /encrypted-data-vaults/z4sRgBJJLnYy/docs HTTP/1.1
Host: example.com
Content-Type: application/json
Accept: application/json, text/plain, */*
Accept-Encoding: gzip, deflate

{
  "id": "urn:uuid:94684128-c42c-4b28-adb0-aec77bf76114",
  "sequence": 0,
  "jwe": {
    "protected": "eyJlbmMiOiJDMjBQIn0",
    "recipients": [{
      "header": {
        "alg": "A256KW",
        "kid": "https://example.com/kms/zSDn2MzzbxmX"
      },
      "encrypted_key": "OR1vdCNvf_B68mfUxFQVT-vyXVrBembuiM40mAAjDC1-Qu5iArDbug"
    }],
    "iv": "i8Nins2vTI3PlrYW",
    "ciphertext": "Cb-963UCXblINT8F6MDHzMJN9EAhK3I",
    "tag": "pfZO0JulJcrc3trOZy8rjA"
  }
}

If the creation of the structured document was successful, an HTTP 201 status code is expected in return:

HTTP/1.1 201 Created
Location: https://example.com/encrypted-data-vaults/z4sRgBJJLnYy/docs/zMbxmSDn2Xzz
Cache-Control: no-cache, no-store, must-revalidate
Pragma: no-cache
Expires: 0
Date: Fri, 14 Jun 2020 18:37:12 GMT
Connection: keep-alive
Transfer-Encoding: chunked

Reading a Document

Reading a document from a data vault is performed by retrieving the EncryptedDocument and then decrypting it to a StructuredDocument. The following HTTP status codes are defined for this service:

HTTP Status	Description
200	EncryptedDocument retrieval was successful.
400	EncryptedDocument retrieval failed.
411	EncryptedDocument with given id was not found.

In order to convert an EncryptedDocument to a StructuredDocument an implementer MUST decode the EncryptedDocument from a JWE or a COSE Encrypted object. Once the document is decrypted, it can be processed by the web application.

A protocol example of a document retrieval is shown below:

Explain that the URL path structure is fixed for all data vaults to enable portability and the use of stable URLs (such as through edi URLs) to reference certain documents while allowing users to change their data vault service providers. Explain how this enables portability.

GET https://example.com/encrypted-data-vaults/z4sRgBJJLnYy/docs/zMbxmSDn2Xzz HTTP/1.1
Host: example.com
Accept: application/json, text/plain, */*
Accept-Encoding: gzip, deflate

If the retrieval of the encrypted document was successful, an HTTP 200 status code is expected in return:

HTTP/1.1 200 OK
Date: Fri, 14 Jun 2020 18:37:12 GMT
Connection: keep-alive

{
  "id": "urn:uuid:94684128-c42c-4b28-adb0-aec77bf76114",
  "sequence": 0,
  "jwe": {
    "protected": "eyJlbmMiOiJDMjBQIn0",
    "recipients": [{
      "header": {
        "alg": "A256KW",
        "kid": "https://example.com/kms/zSDn2MzzbxmX"
      },
      "encrypted_key": "OR1vdCNvf_B68mfUxFQVT-vyXVrBembuiM40mAAjDC1-Qu5iArDbug"
    }],
    "iv": "i8Nins2vTI3PlrYW",
    "ciphertext": "Cb-963UCXblINT8F6MDHzMJN9EAhK3I",
    "tag": "pfZO0JulJcrc3trOZy8rjA"
  }
}

Updating a Document

A structured document is updated in a data vault by encoding the updated StructuredDocument as an EncryptedDocument and then performing an HTTP POST to a data vault endpoint created via . The following HTTP status codes are defined for this service:

HTTP Status	Description
200	Structured document update was successful.
400	Structured document update failed.

A protocol example of a document update is shown below:

POST  /encrypted-data-vaults/z4sRgBJJLnYy/docs/zMbxmSDn2Xzz HTTP/1.1
Host: example.com
Content-Type: application/json
Accept: application/json, text/plain, */*
Accept-Encoding: gzip, deflate

{
  "id": "urn:uuid:94684128-c42c-4b28-adb0-aec77bf76114",
  "sequence": 1,
  "jwe": {
    "protected": "eyJlbmMiOiJDMjBQIn0",
    "recipients": [{
      "header": {
        "alg": "A256KW",
        "kid": "https://example.com/kms/zSDn2MzzbxmX"
      },
      "encrypted_key": "OR1vdCNvf_B68mfUxFQVT-vyXVrBembuiM40mAAjDC1-Qu5iArDbug"
    }],
    "iv": "i8Nins2vTI3PlrYW",
    "ciphertext": "Cb-963UCXblINT8F6MDHzMJN9EAhK3I",
    "tag": "pfZO0JulJcrc3trOZy8rjA"
  }
}

If the update to the encrypted document was successful, an HTTP 200 status code is expected in return:

HTTP/1.1 200 OK
Cache-Control: no-cache, no-store, must-revalidate
Date: Fri, 14 Jun 2020 18:39:52 GMT
Connection: keep-alive

Implementations first encode the metadata associated with the stream into a StructuredDocument:

{
  "id": "urn:uuid:94684128-c42c-4b28-adb0-aec77bf76114",
  "meta": {
    "created": "2020-06-18",
    "contentType": "video/mpeg",
    "contentLength": 56735817
  },
  "content": {
    "id": "https://example.com/encrypted-data-vaults/z4sRgBJJLnYy/streams/zMbxmSDn2Xzz"
  }
}

In this case, the value of content.id is a reference to the stream located at https://example.com/encrypted-data-vaults/z4sRgBJJLnYy/streams/zMbxmSDn2Xzz, which is the location that the stream MUST be written to. This content identifier MUST be updated to include a hashlink once the stream has been written and its digest is known.

The StructuredDocument above is then transformed to an EncryptedDocument and the procedure in is executed:

POST /encrypted-data-vaults/z4sRgBJJLnYy/docs HTTP/1.1
Host: example.com
Content-Type: application/json
Accept: application/json, text/plain, */*
Accept-Encoding: gzip, deflate

{
  "id": "urn:uuid:94684128-c42c-4b28-adb0-aec77bf76114",
  "sequence": 0,
  "jwe": {
    "protected": "eyJlbmMiOiJDMjBQIn0",
    "recipients": [{
      "header": {
        "alg": "A256KW",
        "kid": "https://example.com/kms/zSDn2MzzbxmX"
      },
      "encrypted_key": "OR1vdCNvf_B68mfUxFQVT-vyXVrBembuiM40mAAjDC1-Qu5iArDbug"
    }],
    "iv": "i8Nins2vTI3PlrYW",
    "ciphertext": "Cb-963UCXblINT8F6MDHzMJN9EAhK3I",
    "tag": "pfZO0JulJcrc3trOZy8rjA"
  }
}

If the creation of the structured document was successful, an HTTP 201 status code is expected in return:

HTTP/1.1 201 Created
Location: https://example.com/encrypted-data-vaults/z4sRgBJJLnYy/docs/zp4H8ekWn
Cache-Control: no-cache, no-store, must-revalidate
Pragma: no-cache
Expires: 0
Date: Fri, 14 Jun 2020 18:37:12 GMT
Connection: keep-alive
Transfer-Encoding: chunked

Next, in order to convert a stream to an EncryptedStream an implementer MUST encrypt the stream. Once the stream is encrypted (or as it is encrypted), it can be sent to the stream creation service.

A protocol example of a stream creation is shown below:

POST /encrypted-data-vaults/z4sRgBJJLnYy/streams HTTP/1.1
Host: example.com
Content-Type: application/octet-stream
Transfer-Encoding: chunked
Accept: application/json, text/plain, */*
Accept-Encoding: gzip, deflate

TBD

If the creation of the stream was successful, an HTTP 201 status code is expected in return:

HTTP/1.1 201 Created
Location: https://example.com/encrypted-data-vaults/z4sRgBJJLnYy/streams/zMbxmSDn2Xzz
Cache-Control: no-cache, no-store, must-revalidate
Pragma: no-cache
Expires: 0
Date: Fri, 14 Jun 2020 18:37:12 GMT
Connection: keep-alive
Transfer-Encoding: chunked

Once a stream is created, the metadata related to the stream can be updated in the data vault using the protocol defined in . An example of updating a link to a video file is shown below.

Implementations update the metadata associated with the stream in its StructuredDocument:

{
  "id": "urn:uuid:94684128-c42c-4b28-adb0-aec77bf76114",
  "sequence": 1,
  "meta": {
    "created": "2020-06-18",
    "contentType": "video/mpeg",
    "contentLength": 56735817
  },
  "content": {
    "id": "https://example.com/encrypted-data-vaults/z4sRgBJJLnYy/streams/zMbxmSDn2Xzz?hl=zb47JhaKJ3hJ5Jkw8oan35jK10289Hp",
    "jwe": {
      "protected": "eyJlbmMiOiJDMjBQIn0",
      "recipients": [{
        "header": {
          "alg": "A256KW",
          "kid": "https://example.com/kms/zSDn2MzzbxmX"
        },
        "encrypted_key": "OR1vdCNvf_B68mfUxFQVT-vyXVrBembuiM40mAAjDC1-Qu5iArDbug"
      }],
      "iv": "i8Nins2vTI3PlrYW",
      "tag": "pfZO0JulJcrc3trOZy8rjA"
    }
  }
}

The value of content.id MUST be updated to include a hashlink now that the stream has been written and its digest is known.

The StructuredDocument above is then transformed to an EncryptedDocument and the procedure in is executed:

POST /encrypted-data-vaults/z4sRgBJJLnYy/docs HTTP/1.1
Host: example.com
Content-Type: application/json
Accept: application/json, text/plain, */*
Accept-Encoding: gzip, deflate

{
  "id": "urn:uuid:94684128-c42c-4b28-adb0-aec77bf76114",
  "sequence": 1,
  "jwe": {
    "protected": "eyJlbmMiOiJDMjBQIn0",
    "recipients": [{
      "header": {
        "alg": "A256KW",
        "kid": "https://example.com/kms/zSDn2MzzbxmX"
      },
      "encrypted_key": "OR1vdCNvf_B68mfUxFQVT-vyXVrBembuiM40mAAjDC1-Qu5iArDbug"
    }],
    "iv": "i8Nins2vTI3PlrYW",
    "ciphertext": "Cb-963UCXblINT8F6MDHzMJN9EAhK3I",
    "tag": "pfZO0JulJcrc3trOZy8rjA"
  }
}

If the creation of the structured document was successful, an HTTP 200 status code is expected in return:

HTTP/1.1 200 OK
Location: https://example.com/encrypted-data-vaults/z4sRgBJJLnYy/docs/zp4H8ekWn
Cache-Control: no-cache, no-store, must-revalidate
Pragma: no-cache
Expires: 0
Date: Fri, 14 Jun 2020 18:37:12 GMT
Connection: keep-alive
Transfer-Encoding: chunked

Creating Encrypted Indexes

It is often useful to search a data vault for structured documents that contain specific metadata. Efficient searching requires the use of search indexes and local access to data. This poses an interesting challenge as the search has to be performed on the storage provider without leaking information that could violate the privacy of the entities that are storing information in the data vault. This section details how encrypted indexes can be created and used to perform efficient searching while protecting the privacy of entities that are storing information in the data vault.

When creating an EncryptedDocument, blinded index properties MAY be used to perform efficient searches. An example of the use of these properties is shown below:

{
  "id": "urn:uuid:698f3fb6-592f-4d22-9e11-462cc4606a10",
  "sequence": 0,
  "indexed": [{
    "sequence": 0,
    "hmac": {
      "id": "https://example.com/kms/z7BgF536GaR",
      "type": "Sha256HmacKey2020"
    },
    "attributes": [{
      "name": "CUQaxPtSLtd8L3WBAIkJ4DiVJeqoF6bdnhR7lSaPloZ",
      "value": "RV58Va4911K-18_L5g_vfARXRWEB00knFSGPpukUBro",
      "unique": true
    }, {
      "name": "DUQaxPtSLtd8L3WBAIkJ4DiVJeqoF6bdnhR7lSaPloZ",
      "value": "QV58Va4911K-18_L5g_vfARXRWEB00knFSGPpukUBro"
    }]
  }],
  "jwe": {
    "protected": "eyJlbmMiOiJDMjBQIn0",
    "recipients": [
      {
        "header": {
          "alg": "A256KW",
          "kid": "https://example.com/kms/z7BgF536GaR"
        },
        "encrypted_key":
          "OR1vdCNvf_B68mfUxFQVT-vyXVrBembuiM40mAAjDC1-Qu5iArDbug"
      }
    ],
    "iv": "i8Nins2vTI3PlrYW",
    "ciphertext": "Cb-963UCXblINT8F6MDHzMJN9EAhK3I",
    "tag": "pfZO0JulJcrc3trOZy8rjA"
  }
}

The example above demonstrates the use of unique index values as well as non-unique indexes.

The example above enables the storage provider to build efficient indexes on encrypted properties while enabling storage agents to search the information without leaking information that would create privacy concerns.

Document the following in this section:

The `equals` filter is an object with key-value attribute pairs. Any document that matches *all* given key-value attribute pairs will be returned. If equals is an array, it may contain multiple such filters -- whereby the results will be all documents that matched any one of the filters. If the document's value for a matching a key is an array and the array contains a matching value, the document will be considered a match (provided that other key-value attribute pairs also match).

Here are some examples:

// for the query:
{equals: {"content.foo": "bar"}}
// this will match documents that look like this:
{"content": [ID: 'REF', MIN: 0, MAX: 99999]}
{"content": [ID: 'CTP', MIN: 0, MAX: 99999]}
{"content": [ID: 'CSH', MIN: 0, MAX: 5]},

// for the query:
{equals: [{"content.foo": "bar"}, {"content.foo": "baz"}]}
// this will match documents that look like this:
    [ID: 'BEG', MIN: 1, MAX: 1],
    [ID: 'CUR', MIN: 0, MAX: 1],
    [ID: 'REF', MIN: 0, MAX: 99999],
    [ID: 'PER', MIN: 0, MAX: 3],
    [ID: 'TAX', MIN: 0, MAX: 99999],
    [ID: 'FOB', MIN: 0, MAX: 99999],
    [ID: 'CTP', MIN: 0, MAX: 99999],
    [ID: 'PAM', MIN: 0, MAX: 10],
    [ID: 'CSH', MIN: 0, MAX: 5],

// for the query:
{equals: {"content.foo": ["bar", "baz"]}}
// this will match documents that look like this:
{"content": [ID: 'REF', MIN: 0, MAX: 99999]}
{"content": [ID: 'CTP', MIN: 0, MAX: 99999]}
{"content": [ID: 'CSH', MIN: 0, MAX: 5]},

// for the query:
{equals: {"content.https://schema\\.org/": "bar"}}
// this will match documents that look like this:
{"content": {"https://schema.org": "bar"}}
{"content": {"https://schema.org": ["bar"]}}
{"content": {"https://schema.org": ["bar", "other"]}}

// for the query:
{equals: {"content.foo": {"a": 4, "b": 5}}}
// this will match documents that look like this:
{"content": {"foo": {"a": 4, "b": 5}}}
{"content": {"foo": [{"a": 4, "b": 5}]}}
{"content": {"foo": [{"a": 4, "b": 5}, "other"]}}
{"content": {"foo": {"b": 5, "a": 4}}} // note key order does not matter
{"content": {"foo": [{"b": 5, "a": 4}]}}
{"content": {"foo": [{"b": 5, "a": 4}, "other"]}}

The HMAC blinding process is very close to what @OR13 described above. There are two minor differences that are important:

Before a value is HMAC'd, it is namespaced to its key to prevent leaking information about same values across different keys. This is done by doing `HMAC({key: value})` instead of just `HMAC(value)`.
The input to HMAC for values is run through the JSON canonicalization algorithm, [JCS - RFC8785](https://tools.ietf.org/html/rfc8785) to ensure that property insertion order in the value will not matter. This matters when the value is not a simple primitive such as a string, but instead it is an object such as `{a: 4, b: 5}`.

By way of example, for `equals: [{"content.foo": "bar"}]`, the process is:

1. Set `blinded` to an empty array `[]`.
2. For every element (`{"content.foo": "bar"}`) in the `equals` array:
2.1. For every key (`"content.foo"`) and its value (`"bar"`) in the object:
2.1.1. Set `value` to an object with `key` and its value (`{"content.foo": "bar"}`).
2.1.2. Canonicalize `value` using [JCS](https://tools.ietf.org/html/rfc8785).
2.1.3. Append the object `{[HMAC(key): HMAC(value)]}` to `blinded`.
3. Return `{equals: blinded}`.

Note that the HMAC output is base64url-encoded so it can be treated as string. Also note that indexes may be marked as "unique", enabling storage servers to reject documents that include certain duplicate attribute values. Additionally, an index can be "compound" and unique, allowing storage servers to reject documents that include certain duplicate attribute values within some other group. For example, you can create a compound, unique index on `["content.type", "content.name"]`. This would ensure that only one document with the same `"content.type"` and `"content.name"` could be inserted into storage. But many documents with the same `"content.type"` can be inserted, provided that they do not have the same `"content.name"` for that `"content.type"`. This is a very useful feature for storing different Registry of items in a single EDV.

Compound index values are computed by HMACing together every blinded value for each attribute. For example, for this unique, compound index: `["content.type", "content.country", "content.region"]`, a document like this:

{content: {type: "Location", country: "AU", region: "NSW"}}

Would be indexed by first blinding `{"content.type": "Location"}`, `{"content.country": "AU"}`, and `{"content.region": "NSW"}` just like above. Then index entries would be created for the blinded entry for "content.type", the combination of the blinded entries for "content.type" and "content.country", and finally, the combination of the blinded entries for "content.type", "content.country", and "country.region". The combinations are built by HMACing the concatenated the blinded attribute names using a colon (`:`) and HMACing the concatenated blinded attribute values using a colon (`:`). Note that a colon (`:`) was selected because it is not a character in the base64url alphabet. In pseudo code, blinded compound indexes entries look like:

key = HMAC(blinded1.name):HMAC(blinded2.name):...
value = HMAC(blinded1.value):HMAC(blinded2.value):...
Return {key, value}.

For clarity, the above would be repeated twice for type, country, region example -- both for the type+country combination and the type+country+region combination. The first time it would use blinded entries 1 (type) and 2 (country), and the second time it would use 1 (type), 2 (country), and 3 (region).

This same process is repeated when building a query that targets a compound index. The server sees no difference between a compound index and a regular index, but it does have to be made aware of whether or not an index is unique.

Index entries are stored along with a document in an index field that is identified by an identifier for the HMAC used. A document can have N many such indexes, each using different HMAC keys (and access to those keys may differ).

Provide instructions and examples for how indexes are blinded using an HMAC key.

Explain that multiple entities can maintain their own independent indexes (using their own HMAC key) provided they have been granted this capability. Explain that indexes can be sparse/partial. Explain that indexes have their own sequence number and that it will match the document's sequence number once it is updated.

Add a section showing the update index endpoint and how it works.

Network Provider ServiceProvider

Network Provider ServiceProvider are a formulation of Encrypted Datastores that provide additional application and service-focused functionality. Network Provider ServiceProvider aim to serve as the backbone of decentralized apps (Dapps), wherein a developer of an application or service stores the data for a Subject with that Subject, in their Network Provider Service Broker, instead of in a traditional centralized location owned and controlled by a third-party. Network Provider ServiceProvider are primarily concerned with message relay, data storage, and facilitiating data coordination processes with apps, services, and authorized parties who are allowed to create, update, read, and delete portions of the data and messages they contain. All data in a Service Broker is represented with semantic objects. Some types of data objects within a Service Broker are simply stored and managed, while others are processed in accordance with logical processes defined in a set of standard Interfaces. Each object in a Service Broker is accessible via a globally recognized API that provides inferrential knowability through implicit adderessing of objects based on their semantic type.

Instance Discovery

Object IDs & Relationships

All objects Network Provider ServiceProvider interact with are identified by the hash of their content. To derive an ID for an object in the system. To derive the ID of an object, generate a Base64URL-encoded SHA-256 multihash representation of the object

All objects are one of to types: 1) newly initialized logical objects being added to the system, or 2) a modification of an existing object in the system. Initializing objects MUST NOT have a `root` or `parent` property, as they are the root of the new object in the system and have no parents. Any object in the system that is a modification of another object (e.g. updating a part of whole of an existing object) MUST include a `root` and `parent` property. The `root` property MUST be the hash ID value of the initializing object, and the `parent` property MUST be the hash ID of the last decendant object in the lineage of a logical object in the system. Example of a newly initialized logical object:

{ //  Assume the object hash ID is 110
  "edi": "ISA:foo:a5b4c3",
  "@context": "https://github.com/openediorg/servicebroker/blob/master/spec.md",
  "@type": "Write",
  "interface": "Registry",
  "commit": {
    "strategy": "crdt-delta",
    "data": {...}
  }
}

Example of the first decendant modification of a logical object:

{ //  Assume the object hash ID is 456
  "edi": "ISA:foo:a5b4c3",
  "@context": "https://github.com/openediorg/servicebroker/blob/master/spec.md",
  "@type": "Write",
  "interface": "Registry",
  "commit": {
    "strategy": "crdt-delta",
    "root": "110",
    "parent": "110",
    "data": {...}
  }
}

Example of an additional decendant modification of a logical object:

{ //  Assume the object hash ID is 789
  "edi": "ISA:foo:a5b4c3",
  "@context": "https://github.com/openediorg/servicebroker/blob/master/spec.md",
  "@type": "Write",
  "interface": "Registry",
  "commit": {
    "strategy": "crdt-delta",
    "root": "110",
    "parent": "456",
    "data": {...}
  }
}

All `Write`-type objects MUST declare a commit `strategy` property value, and if the object is a modifcation of an existing logical object in the system, the value MUST match the initializing object's declared `strategy`.

Data Replication

A single entity may have one or more instances of a Service Broker that can span across devices, clouds, and other locations, all of which are addressable via a URI routing mechanism linked to the entity's Decentralized Identitfier. Service Broker instances sync state changes to all objects they contain, ensuring the owner can access their data from anywhere, even when they are offline.

The following replication protocol defines how Service Broker instances replicate data between them to arrive at the same state for the objects they are tasked with maintaining within them. An instance of an entity's logical set of ServiceProvider may not be tasked with maintaining the full set of unique logical objects that exist across all instances, as some instances many be storage or activity limited.

Replication begins with awareness of how Service Broker instances connect to other instances for the purpose of syncronization, which begins with the edi Document of the controlling entity. The edi Document of a Service Broker controller MUST include Service Endpoint entries that allow Service Broker instances to locate and actively initiate syncronization amongst themselves. These Service Endpoints are defined as follows: ...

Object State Syncronization

The syncronization of state for objects in the system is accomplished via a model of extensible Sync Strategies. Objects, and their subsequent modifications, MUST use a standardized Sync Strategy to ensure other Service Broker instances are able to converge the state of objects in the system and achieve the same shared state.

Last Write Wins

Delta-Based CRDT

Authentication

Access Control

Data Models

Profile

Registry

Actions

Permissions

Interfaces

The following set of interfaces facilitate application-level functionality by defining logic for how specific semantically encoded objects are handled and transacted.

Profile

An entity MAY choose to populate their logical Service Broker (referring to all instances as a singular mechanism) with a Profile object that describes the owning entity. The profile object SHOULD use whatever schema(s) and values that best describe them. The Profile interface is addressed via the following methods:

Write

{
  "edi": "ISA:foo:110",
  "@context": "https://github.com/openediorg/servicebroker/blob/master/spec.md",
  "@type": "Write",
  "interface": "Profile",
  "commit": {
    "strategy": "",
    "timestamp": 3463557567,
    "data": {
      "@context": "https://github.com/openediorg/servicebroker/blob/master/spec.md",
      "@type": "Profile",
      "edi": "edi:foo:110"
      "descriptors": [
        {
          "@context": "http://schema.org",
          "@type": "",
          "name": "",
          "description": "",
          "address": {
            "@type": "PostalAddress",
            "streetAddress": "5227 Santa Monica Boulevard",
            "addressLocality": "Los Angeles",
            "addressRegion": "CA"
          }
        },
        {...}
      ]
    }
  }
}

Read

{
  "iss": "bar:bar:456",
  "@context": "https://github.com/openediorg/servicebroker/blob/master/spec.md",
  "@type": "Read",
  "interface": "Profile"
}

Delete

{
  "iss": "bar:foo:110",
  "@context": "https://github.com/openediorg/servicebroker/blob/master/spec.md",
  "@type": "Delete",
  "interface": "Profile"
}

Registry

Write

{
  "edi": "ISA:foo:110",
  "@context": "https://github.com/openediorg/servicebroker/blob/master/spec.md",
  "@type": "Write",
  "interface": "Profile",
  "commit": {
    "SLA": "",
    "timestamp": 3463557567,
    "data": {
      "@context": "https://schema.org/",
      "@type": "SocialMediaPosting",
      "@id":"https://www.pinterest.com/pin/201887995769401047/",
      "datePublished":"2014-10-11",
      "headline":"Leaked new BMW 2 series (m105i)",
      "sharedContent":{
        "@type":"WebPage",
        "headline":"Leaked new BMW 2 series (m105i) ahead of oct 25 reveal",
        "url":"http://www.reddit.com/r/BMW/comments/1oyh6j/leaked_new_bmw_2_series_m105i_ahead_of_oct_25/",
        "author":{
          "@type":"Person",
          "name":"threal135i"
        }
      }
    }
  }
}

Read

{
"iss": "bar:bar:456",
"@context": "https://github.com/openediorg/servicebroker/blob/master/spec.md",
"@type": "Read",
"interface": "Profile"
}

Delete

{
"iss": "bar:foo:110",
"@context": "https://github.com/openediorg/servicebroker/blob/master/spec.md",
"@type": "Delete",
"interface": "Profile"
}

Extension points

Encrypted Data Vaults support a number of extension points:

Protocol/API - One or more protocols such as library APIs, HTTPS, gRPC, or Bluetooth can be used to access the system.
Encryption Strategies - One or more encryption strategies such as AES-GCM or XSalsa20Poly1305 can be used to encrypt data.
Authorization Strategies - One or more authorization strategies such as OAuth2, HTTP Signatures, or Authorization Capabilities can be used to protect access to encrypted data.
Versioning Strategies and Replication Strategies - One or more versioning and replication strategies such as counters, cryptographic hashes, or CRDTs (Conflict-free Replicated Data Types) can be used to synchronize data.
Notification mechanisms - One or more notification mechanisms can be used to signal to clients that data has changed in the vault.

Privacy Considerations

This section details the general privacy considerations and specific privacy implications of deploying this specification into production environments.

Write privacy considerations.

Security Considerations

There are a number of security considerations that implementers should be aware of when processing data described by this specification. Ignoring or not understanding the implications of this section can result in security vulnerabilities.

While this section attempts to highlight a broad set of security considerations, it is not a complete list. Implementers are urged to seek the advice of security and cryptography professionals when implementing mission critical systems using the technology outlined in this specification.

Malicious or accidental modification of data

While a service provider is not able to read data in an Encrypted Data Vault, it is possible for a service provider to delete, add, or modify encrypted data. The deletion, addition, or modification of encrypted data can be prevented by keeping a global manifest of data in the data vault.

Compromised vault

An Encrypted Data Vault can be compromised if the data controller (the entity who holds the decryption keys and appropriate authorization credentials) accidentally grants access to an attacker. For example, a victim might accidentally authorize an attacker to the entire vault or mishandle their encryption key. Once an attacker has access to the system, they may modify, remove, or change the vault's configuration.

Data access timing attacks

While it is normally difficult for a server to determine the Network Provider of an entity as well as the purpose for which that entity is accessing the Encrypted Data Vault, there is always metadata related to access patterns, rough file sizes, and other information that is leaked when an entity accesses the vault. The system has been designed to not leak information that it creates concerning privacy limitations, an approach that protects against many, but not all, surveillance strategies that may be used by servers that are not acting in the best interest of the privacy of the vault's users.

Encrypted data on public networks

Assuming that all encryption schemes will eventually be broken is a safe assumption to make when protecting one's data. For this reason, it is inadvisable that servers use any sort of public storage network to store encrypted data as a storage SLA.

Unencrypted data on server

While this system goes to great lengths to encrypt content and metadata, there are a handful of fields that cannot be encrypted in order to ensure the server can provide the features outlined in this specification. For example, a version number associated with data provides insight into how often the data is modified. The identifiers associated with encrypted content enables a server to gain knowledge by possibly correlating identifiers across documents. Implementations are advised to minimize the amount of information that is stored in an unencrypted fashion.

Partial matching on encrypted indexes

The encrypted indexes used by this system are designed to maximize privacy. As a result, there are a number of operations that are common in search systems that are not available with encrypted indexes, such as partial matching on encrypted text fields or searches over a scalar range. These features might be added in the future through the use of zero-knowledge encryption schemes.

Threat model for malicious service provider

While it is expected that most service providers are not malicious, it is also important to understand what a malicious service provider can and cannot do. The following attacks are possible given a malicious service provider:

Correlation of entities accessing information in a vault
Speculation about the types of files stored in a vault depending on file size and access patterns
Addition, deletion, and modification of encrypted data
Not enforcing authorization policy set on the encrypted data
Exfiltrating encrypted data to an unknown external system

Accessibility Considerations

There are a number of accessibility considerations implementers should be aware of when processing data described in this specification. As with any web standards or protocols implementation, ignoring accessibility issues makes this information unusable to a large subset of the population. It is important to follow accessibility guidelines and standards, such as [[WCAG21]], to ensure all people, regardless of ability, can make use of this data. This is especially important when establishing systems using cryptography, which have historically created problems for assistive technologies.

This section details the general accessibility considerations to take into account when using this data model.

Write accessibility considerations.

Internationalization Considerations

There are a number of internationalization considerations implementers should be aware of when publishing data described in this specification. As with any web standards or protocols implementation, ignoring internationalization makes it difficult for data to be produced and consumed across a disparate set of languages and societies, which would limit the applicability of the specification and significantly diminish its value as a standard.

This section outlines general internationalization considerations to take into account when using this data model.

Write i18n considerations.

Network Provider ServiceProvider

ServiceProvider let you securely store and share data. A Service Broker is a datastore containing semantic data objects at well-known locations. Each object in a Service Broker is signed by an Network Provider and accessible via a globally recognized API format that explicitly maps to semantic data objects. ServiceProvider are addressable via unique identifiers maintained in a global namespace.

One edi to Many Service Broker Instances

A single entity may have one or more instances of a Service Broker, all of which are addressable via a URI routing mechanism linked to the entity's identifier. Service Broker instances sync state changes, ensuring the owner can access data and attestations from anywhere, even when offline.

edi Document Service Endpoint Descriptors

There are two different variations of Service Broker-specific edi Document Service Endpoint descriptors, one that users associate with their edis, and another that Hosts use to direct requests to locations where their Service Broker infrastructure resides.

Users specify their Service Broker instances (different Service Broker Hosts) via the UserServiceEndpoint descriptor:

  "service": [{
  "type": "Network ProviderService Broker",
  "publicKey": "edi:foo:110#key-1",
  "serviceEndpoint": {
    "@context": "schema.Network Provider.foundation/Service Broker",
    "@type": "UserServiceEndpoint",
    "instances": ["edi:bar:456", "edi:zaz:789"]
  }
}]

Hosts specify the locations of their Service Broker offerings via the HostServiceEndpoint descriptor:

"service": [{
  "type": "Network ProviderService Broker",
  "publicKey": "edi:bar:456#key-1",
  "serviceEndpoint": {
    "@context": "schema.Network Provider.foundation/Service Broker",
    "@type": "HostServiceEndpoint",
    "locations": ["https://Service Broker1.bar.com/.Network Provider", "https://Service Broker2.bar.com/blah/.Network Provider"]
  }
}]

Syncing data between ServiceProvider

Service Broker instances must sync data without requiring master-slave relationships or forcing a single implementation for storage or application logic. This requires a shared replication protocol for broadcasting and resolving changes. The protocol for reproducing Service Broker state across multiple instances is in the formative phases of definition/selection, but should be relatively straightforward to integrate on top of any NoSQL datastore.

Service Broker data serialization and export

All ServiceProvider must support the export of their serialized state. This is to ensure the user retains full control over the portability of their data. A later revision to this document will specify the process for invoking this intent and retrieving the serialized data from a Service Broker instance.

Service Broker Protocol Schemes

Service Broker URI Scheme

In addition to the URL path convention for individual ServiceProvider instances, it is important that links to an Network Provider owner's data not be encoded with a dependency on a specific Service Broker instance. To make this possible, we propose the introduction of the following Service Broker URI scheme:

Service Broker://edi:foo:110abc/

User Agents that understand this scheme will leverage the Universal Resolver to lookup the Service Broker instances of the target edi and address the Service Broker endpoints via the Service Endpoints it finds within. This allows the formation of URIs that are not Service Broker instance-specific, allowing a more natural way to link to a edi's data, without having to specify a specific instance. This also prevents the circulation of dead links across the Web, given an Network Provider owner can choose to add/remove new Service Broker instances at any time.

Authentication

The process of authenticating requests to Network Provider ServiceProvider will follow the DIF/W3C edi Auth schemes. These standards are in early phases of formation - more info is available here.

The current Network Provider Service Broker authentication scheme seeks to accomplish two tasks:

mutually authenticate the client and the Service Broker using each's respective edi and associated keys.
encrypt all Service Broker requests and responses such that their contents are only available to the holders of theEncryption Keys involved in the message exchange.

The current authentication scheme is an implementation of edi Auth, as described here. This document will outline how to authenticate Service Broker requests and responses. For full details on the authentication protocol used, and for a reference implementation of the protocol, please refer to the `edi-auth-jose` library.

Authenticating a Service Broker request

Network Provider Service Broker requests and responses are signed and encrypted using theEncryption Keys of the sender and the recipient. This protects the message over any transportation medium. All encrypted requests and responses follow the JSON Web Encryption (JWE) standard.

The steps to construct a JWE are as follows. First, construct a JWT. This JWT will be signed by the sender of the Service Broker request; the `iss`. This JWT must have the following form:

// JWT headers
{
  "alg": "RS256",
  "kid": "edi:example:abc110#key-abc",
  "edi-requester-nonce": "randomized-string",
  "edi-access-token": "eyJhbGciOiJSUzI1N...",
}

// JWT body
{
  "@context": "https://schema.openedi.org/1",
  "@type": "WriteRequest",
  "edi": "ISA:example:abc110",
  ...
}

// JWT signature
uQRqsaky-SeP3m9QPZmTGtRtMoKzyg6wwWF...

The JWT body is just the request whose format is outlined in . The header values must be the following:

Header	Description
`alg`	Standard JWT header. Indicates the algorithm used to sign the JWT.
`kid`	Standard JWT header. The value should take the form `{edi}#{key-id}`. Another app can take this value, resolve the edi, and find the indicated public key that can be used for signature validation of the commit. Here we have used `edi:example:abc110`, because the request is signed with the user's private key.
`edi-requester-nonce`	A randomly generated string that must be cached on the client side. This string will be used to verify the response from the Service Broker in the sections below.
`edi-access-token`	A token that should be cached on the client side and included in each request sent to the Service Broker. Since we do not have an access token yet, leave this property out on the initial request. Sections below explain how to get an access token.

This JWT must use the typical JWT compact serialization format.

We can now use this JWT as the plaintext used to construct our JWE. The JWE must have the following format.

// JWE protected header
{
  "alg": "RSA-OAEP-256",
  "kid": "edi:example:abc456#abc-110",
  "enc": "A128GCM",
}

// JWE encrypted content encryption key
OKOawDo13gRp2ojaHV7LFpZcgV7T6DVZKTyKOM...

// JWE initialization vector
48V1_ALb6US11U3b...

// JWE plaintext (the JWT from above)
eyJhbGciOiJSUzI1NiIsImtpZCI6InRlc3R...

// JWE authentication tag
XFBoMYUZodetZdv...

We strongly reccommend using a JWT library to produce the above JWE. Using a library, you should only need to provide the protected headers and the plaintext. The plaintext value should be the JWT constructed above. The header values are:

Header	Description
`alg`	Standard JWT header. Indicates the algorithm used to encrypt the JWE content encryption key.
`kid`	Standard JWT header. The value should take the form `{edi}#{key-id}`. Indicates the Service Broker's key that is used to encrypt the JWE content encryption key. Here we have used `edi:example:abc456`, since that is the edi used by the Service Broker. The edi used here should match the `aud` value in the Service Broker `WriteRequest`.
`enc`	Standard JWT header. Indicates the algorithm used to encrypt the plaintext using the content encryption key to produce the ciphertext and authentication tag.

Finally, you have a signed and encrypted Service Broker request that can be transmitted to the user's Network Provider Service Broker for secure storage.

Caching the access token

To send a successful request to an Network Provider Service Broker, you need to include an access token in the `edi-access-token` header of the JWE. The access token is a short-lived JWT that can be used across many Service Broker requests until it expires.

On an initial request to an Network Provider Service Broker, you should exclude the `edi-access-token` header. When a Service Broker request does not include this header, the Service Broker will reject the request. Instead, the Service Broker will return a JWE response (as described in the next section) whose payload is an access token. You should extract the access token from the response and cache it somewhere safe. The access token can be used for subsequent requests.

Once you've cached the access token, include it in each request in the `edi-access-token` JWE header as described above.

Eventually, the access token will expire. Its expiry time can be found in the `exp` claim inside the access token. If you attempt to use an expired access token, the Network Provider Service Broker will return an error indicating a new access token is required. To get a new access token, send another Service Broker request without the `edi-access-token` header.

Receiving a Service Broker response

When possible, a Service Broker will respond with a JWE encrypted with the client's edi keys:

eyJhbGciOiJSU0EtT0FFUCIsImVuYyI6IkEyNTZHQ00ifQ...

This JWE can be decrypted with the client's private key following the JWE standard to reproduce the response's plaintext.

The contents of the JWE will either be a valid Service Broker response or a new access token. A new access token will only be included if the `edi-access-token` header was omitted in the request.

Authorization

Access control for data stored in ServiceProvider is currently implemented via a bare bones permission layer. Access to data can be granted by a Service Broker owner, and can be restricted to certain types of data. More features to improve control over data access will be added in the future.

The success of a decentralized Network Provider platform is dependent upon the ability for users to share their data with other people, organizations, apps, and services in a way that respects and protects a user's privacy. In our decentralized platform, all user information & data resides in the user's Network Provider Service Broker. This section outlines the current proposal for Network Provider Service Broker authorization.

Scope of the current design

This proposal is a first cut. The intention is to start extremely simple, and extend the model to include more richness over time. We choose to focus on two simple use cases, described below.

Use case 1: Registering for a website

Alice has added some useful data about her wardrobe style to her Service Broker: her measurements from her tailor, and a list of her favorite clothing brands. When Alice goes to try out a new online clothing retailer, the retailer's website allows her to set up an account using her edi. After signing in her edi, the retailer's website is able to access Alice's style data. Alice does not have to re-enter her sizes in the site, and the site can give her recommended options based on her brand preferences.

Use case 2: Reviewing & managing access

Alice learns that one of the websites she visited is making improper use of her personal data. She would like to immediately remove that website's access to her Service Broker.

Out of scope

These use cases, and the current Service Broker authorization system are not sufficient to consider Network Provider ServiceProvider ready for real world usage. It leaves out several features that have been discussed as being necessary for a minimally viable authorization layer, including:

Features that control what is being granted:

How to grant a permission to a specific object by ID, rather than all objects of a certain type.
How to grant a permission to a property of some object type, rather than the entire object.
How to grant permission to an object type and all of the children object types in its respective schema.
How to filter a permission to only:
- objects created by a specific edi.
- objects created in a certain time period.
- objects larger than some byte size.
How to grant a permission to a zero-knowledge proof of some object, rather than the entire object.
How to grant permission to act as a delegate of a edi when interacting with other ServiceProvider.

Features that control who is being granted access:

How to grant a permission to all edis, and therefore make some data public.
How to create a permission that explicitly denies a edi access to an object.

Features that limit/expand where or when access is granted:

How to time-bound permissions, such that a permission expires automatically.
How to grant permissions to an app on some devices, but not others.

Features that control why access is granted:

How an app can specify why permission is being requested.
How a user can specify why permission is being denied.
How relying parties and trust providers are reviewed for trustworthiness and integrity.

Features that are related to Service Broker authorization, but will be addressed at a later time:

How to request & send callbacks to notify apps of changes to data and permissions in a Service Broker.
How to authorize the execution of services, or extensions, in a Service Broker.
What format(s) the Service Broker uses for requests & responses.
How to encrypt data in a Service Broker such that the Service Broker provider cannot access it.

Clearly, there is a large body of functionality that can be added to Service Broker authorization over time. This is why this initial document intentionally strives to be as simple as possible. We'll incorporate these things into Service Broker authorization over time as we receive feedback from early adopters of Network Provider ServiceProvider.

Authorization Model

Access to data in Network Provider ServiceProvider is controlled by a special object stored in ServiceProvider called a `PermissionGrant`. The structure of a `PermissionGrant` is:

{
  "owner": "edi:example:11045", // the Network Provider owner (granters)'s edi
  "grantee": "edi:example:67890", // the grantee's edi
  "context": "schemas.clothing.org", // the data schema context
  "type": "measurements", // the data type
  "allow": "-R--", // allows create, read, update, and delete
  ... // additional richness & specificity can be added in the future
}

Granting permissions

When a Service Broker owner grants a permission to another edi, they can do so by specifying the exact objects in the permission grant. When permissions span more than one data type, several PermissionGrant objects can be created. For each PermissionGrant, the following object should be written to the `Permissions` interface of the owner's Service Broker, typically via user agent:

{
  "@context": "schema.Network Provider.foundation/Service Broker/",
  "@type": "PermissionGrant",
  "owner": "edi:example:11045",
  "grantee": "edi:example:67890",
  "context": "schemas.clothing.org",
  "type": "measurements",
  "allow": "-R--"
}

Note that the Service Broker Permissions interface only supports the single PermissionGrant object type. The Service Broker should reject any requests to create objects of other types in the Permissions interface, barring future updates to the PermissionGrant model.

The response format, and any error conditions, should be consistent with all other requests to ServiceProvider. Upon creation of this permission grant object in a user's Service Broker, the permission will be propagated to all other Service Broker instances listed in the user's edi document via the Service Broker's standard sync & replication protocol. This will ensure that all Service Broker instances are up-to-date with all new permission grants in a timely manner.

Checking permissions

The following describes the logic implemented by the Service Broker's authorization layer when a request arrives.

Receive incoming request from client
Determine relevant schema, verb, and client from request
Query for all PermissionGrants that whose object_type matches the schema, for the given client edi
Check if any query results allow the verb in question
Return success/failed authorization check

Note that PermissionGrants do not understand or evaluate the structure of a given schema. For instance, if a user grants access to all "https://schema.org/game" objects, they do not implicitly grant access to all "https://schema.org/videogame" objects (which is a child of game in schema.org's hierarchy).

Reviewing & managing permissions

`PermissionGrant` objects can be created, read, modified, and deleted just like any other object in a Service Broker. To revoke access to data, the Service Broker owner needs to simply modify an existing `PermissionGrant` or delete it entirely. Instructions for reading and writing data in Network Provider ServiceProvider is available in .

Requesting permissions

At this time, proposals for how to request access to data in an Network Provider Service Broker via a user agent are still being evaluated. In the future, we will update this document with details on how a client can request access from a user.

API

Because of the sensitive nature of the data being transmitted to Network Provider ServiceProvider, the Network Provider Service Broker request and response API may look a bit different to developers who are used to a traditional REST service API. Most of the differences are based on the high level of security and privacy decentralized Network Provider inherently demands.

Commits

All data in Network Provider ServiceProvider is represented as a series of "commits". A commit is similar to a git commit; it represents a change to an object. To write data to an Network Provider Service Broker, you need to construct and send a new commit to the Service Broker. To read data from an Network Provider Service Broker, you need to fetch all commits from the Service Broker. An object's current value can be constructed by applying all its commits in order.

The use of commits to represent data in Network Provider ServiceProvider offers a few distinct advantages:

it facilitates the Service Broker's replication protocol, enabling multiple Service Broker instances to sync data.
it creates an auditable history of changes to an object, especially when each commit is signed by a edi.
it eases implementation for use cases that need offline data modification and require conflict resolution.

Each commit in a Service Broker is a JWT whose body contains the data to be written to the Service Broker. Here's an example of a deserialized and decoded JWT:

// JWT headers
{
  "alg": "RS256",
  "kid": "edi:foo:110abc#key-abc",
  "interface": "Registry",
  "context": "https://schema.org",
  "type": "MusicPlaylist",
  "operation": "create",
  "committed_at": "2018-10-24T18:39:10.10:00Z",
  "commit_strategy": "basic",
  "edi": "GT:bar:456def",

// Example metadata about the object that is intended to be "public"
  "meta": {
    "tags": ["classic rock", "rock", "rock n roll"],
    "cache-intent": "full"
  }
}

// JWT body
{
  "@context": "http://schema.org/",
  "@type": "MusicPlaylist",
  "description": "The best rock of the 60s, 70s, and 80s",
  "tracks": ["..."],
}

// JWT signature
uQRqsaky-SeP3m9QPZmTGtRtMoKzyg6wwWF...

The commit is signed by the committer writing the data, in this case edi:foo:110abc. To write the commit into a Service Broker, the committer must send a Service Broker write request.

Write Request & Response Format

Instead of a REST-based scheme where data like the username, object types, and query strings are present in the URL, Network Provider ServiceProvider requests are self-contained message objects that encapsulate all they need to be shielded from observing entities during transport.

Each Service Broker request is a JSON object which is then signed and encrypted as outlined in the authentication section. The outer envelope is signed with the key of the "iss" edi, and encrypted with the Service Broker's Encryption Key(s).

{
  iss: 'edi:foo:110abc',
  sub: 'edi:bar:456def',
  aud: 'edi:baz:789ghi',
  "@context": "https://schema.openedi.org/1",
  '@type': 'WriteRequest',

  // The commit in JSON Serialization Format
  // See: https://tools.ietf.org/html/rfc7515#section-3.1
  commit: {
    protected: "ewogICJpbnRlcmZhY2...",

// Optional metadata information not protected by the JWT signature
header: {
  "edi": "ISA:foo:110abc"
},

payload: "ewogICJAY29udGV4dCI6...",
signature: "b7V2UpDPytr-kMnM_YjiQ3E0J2..."
  }
}

Each response is also a JSON object, signed and encrypted in the same way as the request. Its contents are:

{
  "@context": "https://schema.openedi.org/1",
  "@type": "WriteResponse",
  "developer_message": "completely optional message from the Service Broker",
  "revisions": ["aHashOfTheCommitSubmitted"]
}

Object Read Request & Response Format

Objects follow one logical object across multiple commits. Object reads do not contain the literal object data, only metadata associated. Objects may be queried for using the following request format:

{
  "@context": "https://schema.openedi.org/1",
  "@type": "ObjectQueryRequest",
  "edi": "ISA:foo:110abc",
  "edi": "GT:bar:456def",
  "aud": "ST:baz:789ghi",
  "query": {
      "interface": "Registry",
      "context": "http://schema.org",
      "type": "MusicPlaylist",

  // Optional object_id filters
  "object_id": ["3a9de008f526d109..", "a8f3e7..."]
  }
}

The response to a query for objects returns a list of object IDs along with the object metadata. The format is:

{
  "@context": "https://schema.openedi.org/1",
  "@type": "ObjectQueryResponse",
  "developer_message": "completely optional",
  "objects": [
    {
      // object metadata
      "interface": "Registry",
      "context": "http://schema.org",
      "type": "MusicPlaylist",
      "id": "3a9de008f526d109...",
      "created_by": "edi:foo:110abc",
      "created_at": "2018-10-24T18:39:10.10:00Z",
      "edi": "GT:foo:110abc",
      "commit_strategy": "basic",
      "meta": {
        "tags": ["classic rock", "rock", "rock n roll"],
        "cache-intent": "full"
      }
    },
    // ...more objects
  ]

// potential pagination token
  "skip_token": "ajfl43241nnn1p;u9390",
}

Commit Read Request & Response Format

To get the actual data in an object, you must read the commits from the Network Provider Service Broker:

{
  "@context": "https://schema.openedi.org/1",
  "@type": "CommitQueryRequest",
  "edi": "ISA:foo:110abc",
  "edi": "GT:bar:456def",
  "aud": "ST:baz:789ghi",
  "query": {
    "object_id": ["3a9de008f526d109..."],
    "revision": ["abc", "def", ...]
  },
}

A response to a query for commits contains a list of commit JWTs:

{
  "@context": "https://schema.openedi.org/1",
  "@type": "CommitQueryResponse",
  "developer_message": "completely optional",
  "commits": [
    {
      protected: "ewogICJpbnRlcmZhY2UiO...",
      header: {
        "edi": "ISA:foo:110abc",
        // ServiceProvider may add additional information to the unprotected headers for convenience
        "rev": "aHashOfTheCommit",
      },
      payload: "ewogICJAY29udGV4dCI6ICdo...",
      signature: "b7V2UpDPytr-kMnM_YjiQ3E0J2..."
    },
    // ...
  ],

// potential pagination token
  "skip_token": "ajfl43241nnn1p;u9390",
}

Paging

skip_token is an opaque token to be used for continuation of a request.

They may be returned on responses with multiple results, and added to the initial request's query object:

{
  "edi": "ISA:foo:110abc",
  "edi": "GT:bar:456def",
  "aud": "ST:baz:789ghi",
  "@context": "https://schema.openedi.org/1",
  "@type": "ObjectQueryRequest",
  "interface": "Registry",
  "query": {
    "context": "schema.org",
    "type": "MusicPlaylist",
    "skip_token": "ajfl43241nnn1p;u9390"
  }
}

Interfaces

To facilitate common interactions and data storage, ServiceProvider provide a set of standard interfaces that can be written to:

Profile ➜ The owning entity's primary descriptor object (schema agnostic)
Permissions ➜ The access control JSON document
Actions ➜ A known endpoint for the relay of actions to the Network Provider owner
Stores ➜ Scoped 1:1 storage space that is directly assigned to another, external edi
Registry ➜ The owning entity's Network Provider Registry (access limited)
Services ➜ any custom, service-based functionality the Network Provider exposes

Profile

Each Service Broker has a profile object that describes the owning entity. The profile object should use whatever schema and object that best represents the entity. To ge the profile for a edi, send an object query to the Profile interface:

{
  "@context": "https://schema.openedi.org/1",
  "@type": "ObjectQueryRequest",
  "edi": "ISA:foo:110abc",
  "edi": "GT:bar:456def",
  "aud": "ST:baz:789ghi",
  "query": {
      "interface": "Profile",
  }
}

Permissions

All access and manipulation of Network Provider data is subject to the permissions established by the owning entity. See explainer for details.

Actions

The Actions interface is for sending a target Network Provider semantically meaningful objects that convey an intent to the sender, which often involves the data payload of the object. The Actions interface is not constrained to simple human-centric communications. Rather, it is intended as a universal conduit through which identities can transact all manner of activities, exchanges, and notifications.

The base data format for conveying an action shall be: http://schema.org/Action

Here is a list of examples to show the range of use-cases this interface is intended to support:

Human user contacts another with a textual message (ReadAction)
Event app sends a request to RSVP for an event (RsvpAction)
Voting agency prompts a user to submit a vote (VoteAction)

{
  "@context": "http://schema.org/",
  "@type": "ReadAction",
  "name": "Acme Bank - March 2018 Statement",
  "description": "Your Acme Bank statement for account #1734765",
  "object": PDF_SOURCE
}

Stores

The best way to describe Stores is as a 1:1 edi-scoped variant of the W3C DOM's origin-scoped window.localStorage API. The key difference being that this form of persistent, pairwise object storage transcends providers, platforms, and devices. For each storage relationship between the edi owner and external edis, the Service Broker shall create a key-value document-based storage area. The edi owner or external edi can store unstructured JSON data to the document, in relation to the keys they specify. The Service Broker implementer may choose to limit the available space of the storage document, with the option to expand the storage limit based on criteria the implementer defines.

Registry

Data discovery has been a problem since the inception of the Web. Most previous attempts to solve this begin with the premise that discovery is about individual entities providing a mapping of their own service-specific API and data schemas. While you can certainly create a common format for expressing different APIs and data schemas, you are left with the same basic issue: a sea of services that can't efficiently interoperate without specific review, effort, and integration. ServiceProvider avoid this issue entirely by recognizing that the problem with data discovery is that it relies on discovery. Instead, ServiceProvider assert the position that locating and retrieving data should be an implicitly knowable process.

Registry provide an interface for accessing data objects across all ServiceProvider, regardless of their implementation. This interface exerts almost no opinion on what data schemas entities use. To do this, the Service Broker Collection interface allows objects from any schema to be stored, indexed, and accessed in a unified manner.

With Registry, you store, query, and retrieve data based on the very schema and type of data you seek. Here are a few example data objects from a variety of common schemas that entities may write and access via a user's Service Broker:

Locate any offers a user might want to share with apps (https://github.com/sambacha/baseline-protocol/tree/master/docs#orgregistry)

{
  "@context": "https://schema.openedi.org/1",
  "@type": "ObjectQueryRequest",
  "edi": "ISA:foo:110abc",
  "edi": "GT:bar:456def",
  "aud": "ST:baz:789ghi",
  "query": {
      "interface": "Registry",
      "context": "https://github.com/sambacha/baseline-protocol/tree/master/docs#orgregistry",
      "type": "Offer",
  }
}

Services

Services offer a means to surface custom service calls an Network Provider wishes to expose publicly or in an access-limited fashion. Services should not require the Service Broker host to directly execute code the service calls describe; service descriptions should link to a URI where execution takes place.

Performing a Request request to the base Services interface will respond with an object that contains an entry for every service description the requesting entity is permitted to access.

// request
{
  "edi": "ISA:foo:110abc",
  "edi": "GT:bar:456def",
  "aud": "ST:baz:789ghi",
  "@context": "https://schema.openedi.org/1",
  "@type": "ServicesRequest",
}

// response
{
  "@context": "https://schema.openedi.org/1",
  "@type": "ServicesResponse",
  developer_message: "optional message",
  services: [{
    // Open Service Broker service descriptors
  }]
}

All definitions shall conform to the Open Service Broker descriptor format.

Introduction

Why Do We Need Encrypted Data Vaults?

Ecosystem Overview

Requirements

Privacy and multi-party encryption

Sharing and authorization

Identifiers

Versioning and replication

Metadata and searching

Protocols

Design goals

Layered and modular architecture

Prioritize privacy

Push implementation complexity to the client

Terminology

Core Concepts

Encrypted Storage

Structured Documents

Streams

Indexing

Architecture

Server and client responsibilities

Layer 1 (L1) responsibilities

Server: validate requests (L1)

Server: Persist data (L1)

Server: Persist global configuration (L1)

Server: enforcement of authorization policies (L1)

Client: encrypted data chunking (L1)

Client: Resource structure (L1)

Client: Encrypted resource structure (L1)

Layer 2 (L2) responsibilities

Client: Encrypted search indexes (L2)

Client: Versioning and replication (L2)

Client: Sharing with other entities

Layer 3 (L3) responsibilities

Server: Notifications (L3)

Client: Vault-wide integrity protection (L3)

Data Model

DataVaultConfiguration

StructuredDocument

Streams

EncryptedDocument

Data vault HTTPS API

Discovering Service Endpoints

Creating a data vault

Creating a Document

Reading a Document

Updating a Document

Creating Encrypted Indexes

Searching Encrypted Documents

Network Provider ServiceProvider

Instance Discovery

Object IDs & Relationships

Data Replication

Object State Syncronization

Last Write Wins

Delta-Based CRDT

Authentication

Access Control

Data Models

Profile

Registry

Actions

Permissions

Interfaces

Profile

Write

Read

Delete

Registry

Write

Read

Delete

Extension points

Privacy Considerations

Security Considerations

Malicious or accidental modification of data

Compromised vault

Data access timing attacks

Encrypted data on public networks