Main Page

Robust Anonymous Data Records

Robust Anonymous Data Records (RADAR for short) is an application that makes redundant backups across a complex system. Here, the complex system is a heterogeneous set of computers that agree on a simple rule: each member can store its data safely on the network, provided it lends a portion of its own storage to the other members.

Robustness

The application was designed to work on an unstable network. It never assumes that any other member of the network will be reachable at a given time, or that a node knows the complete state of the network. This is why data replication is at the heart of the RADAR concept: the probability that none of the backups is reachable decreases as the number of replicas increases.
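As a rough illustration, assume each replica is unreachable independently with probability p. This is an assumption, since real outages are often correlated. Under it, the chance that every replica is unreachable is the product of the individual probabilities, or p^n for n identical replicas. The helper names below are hypothetical and not part of RADAR.

```python
from functools import reduce


def prob_all_unreachable(probs):
    """Probability that every replica is unreachable (assumes independence)."""
    return reduce(lambda acc, p: acc * p, probs, 1.0)


def min_replicas(p, target):
    """Smallest n such that p**n <= target, for per-replica unreachability p."""
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1)")
    if target <= 0:
        raise ValueError("target must be positive")
    n, risk = 1, p
    while risk > target:
        n += 1
        risk *= p
    return n


# If each node is offline half the time, 7 replicas keep the risk under 1%.
print(min_replicas(0.5, 0.01))
```

Doubling the number of replicas squares the residual risk, which is why adding replicas is an effective way to compensate for an unstable network.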

The members of the network (also called nodes) exchange information using a simple message-based protocol over UDP. They regularly send messages announcing how much storage space they have available, so that the other nodes can apply heuristics to choose where to send their backups. These messages are flooded through the network: whenever a node receives one, it decrements the message's time to live (TTL) and forwards it to a fixed number of other nodes.
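The flooding rule can be simulated in memory as sketched below. This is a minimal sketch, not RADAR's code: the `Announcement` and `Node` classes, the `FANOUT` value, the in-memory queue standing in for UDP sockets, and the "most free space first" target heuristic are all hypothetical choices made for illustration.

```python
import random
from collections import deque

FANOUT = 3  # hypothetical value for the "fixed number of other nodes"


class Announcement:
    """Free-space announcement; the field names are illustrative."""

    def __init__(self, origin, free_space, ttl):
        self.origin = origin
        self.free_space = free_space
        self.ttl = ttl


class Node:
    def __init__(self, name):
        self.name = name
        self.peers = []        # other Node objects this node can reach
        self.known_space = {}  # origin name -> last announced free space

    def pick_peers(self, exclude):
        """Choose up to FANOUT peers at random, skipping the named node."""
        candidates = [p for p in self.peers if p.name != exclude]
        return random.sample(candidates, min(FANOUT, len(candidates)))

    def receive(self, msg, outbox):
        """Record the announcement, decrement its TTL and forward it."""
        self.known_space[msg.origin] = msg.free_space
        ttl = msg.ttl - 1
        if ttl <= 0:
            return  # TTL exhausted: this copy is not forwarded
        for peer in self.pick_peers(exclude=msg.origin):
            outbox.append((peer, Announcement(msg.origin, msg.free_space, ttl)))

    def choose_backup_targets(self, count):
        """Hypothetical heuristic: prefer the nodes advertising the most space."""
        ranked = sorted(self.known_space.items(), key=lambda kv: kv[1], reverse=True)
        return [name for name, _ in ranked[:count]]


def flood(origin, free_space, ttl):
    """Simulate one announcement and return the number of deliveries."""
    outbox = deque(
        (peer, Announcement(origin.name, free_space, ttl))
        for peer in origin.pick_peers(exclude=origin.name)
    )
    deliveries = 0
    while outbox:
        node, msg = outbox.popleft()
        node.receive(msg, outbox)
        deliveries += 1
    return deliveries
```

Because nothing suppresses duplicates, one announcement can cost up to FANOUT + FANOUT**2 + ... + FANOUT**TTL deliveries, so the TTL and the fan-out together bound the traffic a single message generates.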

RADAR is also designed to work on networks that are unreliable or have low capacity. For this reason, files are transferred using one of two methods, depending on their size:

  • Larger files (over 10 kB) are transferred using a TLS-based protocol. This protocol ensures that the data cannot be tampered with and provides a quick way to authenticate nodes (via signed certificates), in addition to the reliable, ordered delivery of TCP.
  • Smaller files are transmitted using a UDP-based protocol. The data is signed and the sender's certificate is attached. The lack of a retransmission mechanism is offset by the number of replicas sent: while the probability of a datagram being lost can never be reduced to zero, it can be made very small.
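The size-based choice and the residual loss of the UDP path can be sketched as follows. The threshold constant reads "10 kB" as 10 KiB, which is an assumption, and the function names are hypothetical. The loss estimate assumes each copy of a datagram is lost independently.

```python
SIZE_THRESHOLD = 10 * 1024  # assumption: "10 kB" read as 10 KiB


def choose_transport(size_bytes):
    """Return the transfer method a file of this size would use."""
    return "tls" if size_bytes > SIZE_THRESHOLD else "udp"


def residual_loss(datagram_loss, copies):
    """Probability that every copy of a UDP datagram is lost.

    Assumes each copy is lost independently with probability datagram_loss.
    """
    return datagram_loss ** copies


for size in (2_000, 10 * 1024, 50_000):
    print(size, choose_transport(size))  # 2000 udp, 10240 udp, 50000 tls

print(residual_loss(0.05, 3))
```

With a 5% datagram loss rate and three copies, the chance of losing all of them is about 1.25e-4, roughly one in 8,000.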

Anonymity

The program can be used to process sensitive data, such as medical records. For this reason, the data must be made anonymous before it leaves its computer of origin. Every piece of data that leaves its node of origin is therefore encrypted using AES-256 and RSA-2048, so that no one other than the holder of the private half of the RSA key can decrypt it.

In the specific case of medical records, the data can also be split:

  • Name and personal data,
  • Textual data (medical practitioner's notes),
  • Numerical data,
  • Other data (medical imagery, etc.).
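One plausible way to perform this split is sketched below. Each part could then be encrypted and backed up on its own, so that a node holding numerical data need not also hold the patient's name. This motivation is an interpretation, not something the page states. The field names and the `FIELD_CATEGORIES` mapping are hypothetical.

```python
# Hypothetical mapping from record fields to the four categories listed above.
FIELD_CATEGORIES = {
    "name": "personal",
    "birth_date": "personal",
    "address": "personal",
    "practitioner_notes": "text",
    "heart_rate": "numerical",
    "blood_pressure": "numerical",
}


def split_record(record):
    """Split a record into per-category parts; unknown fields go to "other"."""
    parts = {"personal": {}, "text": {}, "numerical": {}, "other": {}}
    for field, value in record.items():
        parts[FIELD_CATEGORIES.get(field, "other")][field] = value
    return parts


record = {
    "name": "Jane Doe",
    "practitioner_notes": "Mild fever.",
    "heart_rate": 72,
    "xray.png": b"\x89PNG...",  # placeholder bytes for an image
}
for category, fields in split_record(record).items():
    print(category, sorted(fields))
```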

Backup Protocol

The backup restoration protocol works as follows:

  1. A node loses its data.
  2. It comes back online.
  3. It sends a backup request to the administrator.
  4. The administrator can then accept or reject the request.
  5. If the request is accepted, the network is flooded with a message that prompts the nodes holding the requester's backups to send the data back to it.
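Steps 3 to 5 can be modeled as in the sketch below. It is a simplification: the `Administrator` and `Peer` classes, the pluggable approval policy, and iterating over reachable peers in place of a real network flood are all assumptions made for illustration. Because several peers may hold the same chunk, duplicate replies are simply ignored.

```python
class Administrator:
    """Decides on restoration requests using a caller-supplied policy."""

    def __init__(self, policy):
        self.policy = policy

    def review(self, requester):
        return bool(self.policy(requester))


class Peer:
    def __init__(self, stored):
        # owner name -> {chunk id: encrypted blob} held on behalf of that owner
        self.stored = stored

    def on_restore_message(self, requester):
        """Answer a restore message with whatever this peer holds for requester."""
        return self.stored.get(requester, {})


def restore(requester, admin, reachable_peers):
    """Ask the administrator, then collect replies from the reachable peers.

    Returns None if the request is rejected, otherwise the recovered chunks.
    """
    if not admin.review(requester):
        return None
    recovered = {}
    for peer in reachable_peers:  # stands in for flooding the network
        for chunk_id, blob in peer.on_restore_message(requester).items():
            recovered.setdefault(chunk_id, blob)  # replicas are identical copies
    return recovered
```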

Implementation

Python

The application is written in Python 3.6. This language offers several advantages:

  • Multi-platform. The Python interpreter is available on all major platforms (Linux, Windows, macOS) and can be compiled on any other platform for which a C compiler is available.
  • Rich libraries. Python has a very active developer community and a large ecosystem of libraries.

Administration Program

The administration program has not been developed yet; this section describes its planned functionality. In the meantime, certificates can be managed with tools such as OpenSSL.

A central administrator controls the whole network. Its main role is to issue and revoke certificates, and to authorize the restoration of backups.

Certificates

Currently, the nodes use signed RSA certificates, though a shift to elliptic curve cryptography (ECC) is likely in the near future: ECC keys are much smaller for the same level of security, which would significantly reduce the overhead of message exchange.

They are issued by a self-signed certificate authority (CA) controlled by the administrator. The certificates ensure that only approved nodes can communicate over the network and provide a simple authentication mechanism.
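With Python's standard `ssl` module, a node could restrict its TLS connections to peers holding a certificate from this CA, as in the sketch below. The function name and file paths are hypothetical. Disabling hostname checking is an assumption: it presumes nodes are identified by their certificate rather than by a DNS name.

```python
import ssl


def make_node_context(server_side, ca_file=None, cert_file=None, key_file=None):
    """Build a TLS context that trusts only certificates issued by the RADAR CA."""
    protocol = ssl.PROTOCOL_TLS_SERVER if server_side else ssl.PROTOCOL_TLS_CLIENT
    # A bare SSLContext does not load the system trust store.
    ctx = ssl.SSLContext(protocol)
    if not server_side:
        # Assumption: peers are identified by certificate, not by hostname.
        ctx.check_hostname = False
    # Require a certificate from the other side too (mutual authentication).
    ctx.verify_mode = ssl.CERT_REQUIRED
    if ca_file:
        # Trust anchor: the administrator's self-signed CA certificate.
        ctx.load_verify_locations(cafile=ca_file)
    if cert_file:
        # This node's own CA-signed certificate and private key.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```

A node would then wrap its socket with `ctx.wrap_socket(sock)`; the handshake fails unless the peer presents a certificate that chains to the CA loaded from `ca_file`.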

Node Management

Adding Nodes

The administrator can approve certificate signing requests to allow new nodes to join the RADAR network. Nodes that have approved, non-revoked certificates are considered trusted peers by the other members of the network.

Removing Nodes

To remove a node from the network, the administrator only has to add its certificate to the certificate revocation list (CRL). Other nodes then only have to check certificates against this list to know whether a certificate is still valid.
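Python's `ssl` module can perform this check during the TLS handshake, as in the sketch below. It assumes the administrator publishes the CRL as a PEM file; `enable_revocation_check` and the file path are hypothetical. With `VERIFY_CRL_CHECK_LEAF` set, validation fails unless a CRL signed by the peer certificate's issuer has been loaded.

```python
import ssl


def enable_revocation_check(ctx, crl_path):
    """Make ctx reject peers whose certificate appears in the CA's CRL.

    The CRL is loaded before the flag is set, so a missing or unreadable
    file raises an error instead of leaving a context that rejects every peer.
    """
    # PEM-encoded CRLs can be loaded alongside CA certificates.
    ctx.load_verify_locations(cafile=crl_path)
    # Check the peer's own (leaf) certificate against the loaded CRL.
    ctx.verify_flags |= ssl.VERIFY_CRL_CHECK_LEAF
    return ctx
```

Since the list changes whenever a node is removed, nodes would need to fetch the current CRL and rebuild their context periodically; how RADAR would distribute the CRL is not specified here.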

Backup Management

The administrator can accept or reject any backup request that it receives.