Anonymous Two-Way Data Transfers

by Benad

Anonymous Two-Way Data Transfers (TWDT) are the last of the three kinds of communication built into the ANet protocol (queries, static data and TWDT).

What is an "Anonymous Two-Way Data Transfer"?

It is a communication channel between two specific nodes in a distributed network(1). Data can be sent from either end of the channel, so it flows in both directions. It is anonymous because neither end needs to know the actual address of the node at the other end, which lets both ends change their addresses without affecting the channel.

Proxy Paths

To allow this anonymity, both ends of the communication have to establish what is called a "proxy chain" through other nodes in the network. Here's how it works.

Let's say that node A wants to make a proxy chain through nodes B, C and D. Here, node D becomes the "proxy" of node A. The address of D is the only public information A has to give to other nodes so that they can establish connections. So, we have:

  A       B       C       D
Origin                  Proxy

To establish its proxy chain, node A tells B: "I want the proxy chain A-B-C-D". Then B tells C: "I want the proxy chain B-C-D". Here, C shouldn't know that the chain started from A, since the address of A could change. Finally, C tells D that it wants the proxy chain "C-D".
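The hop-by-hop setup can be sketched as a small Python simulation. This is not ANet code: `ChainNode` and `establish_chain` are hypothetical names, and the "message" each node hears is just a list of node names rather than a real wire format. It shows that each node learns only its predecessor and the part of the chain ahead of it.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ChainNode:
    name: str
    prev_node: Optional[str] = None  # the one node behind us that we know about
    next_node: Optional[str] = None  # the node we forward traffic to
    heard: list[str] = field(default_factory=list)  # chain request as received


def establish_chain(requested: list[str], nodes: dict[str, ChainNode]) -> None:
    """Pass a request such as ["A", "B", "C", "D"] down the chain.

    At each hop the sender drops everything behind itself, so the
    receiver only learns the chain starting at the sender.
    """
    for i in range(len(requested) - 1):
        sender, receiver = requested[i], requested[i + 1]
        nodes[receiver].heard = requested[i:]  # e.g. C hears ["B", "C", "D"]
        nodes[receiver].prev_node = sender
        nodes[sender].next_node = receiver


nodes = {name: ChainNode(name) for name in "ABCD"}
establish_chain(["A", "B", "C", "D"], nodes)
print(nodes["C"].heard)  # ['B', 'C', 'D']
print(nodes["D"].heard)  # ['C', 'D']
```

Only B ever sees A's name; C and D are given the chain starting at the node just before them.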

Whenever a node in the chain changes its address, it only has to inform the previous and following nodes in the chain. If a node becomes disconnected, it is up to the node before it (toward the Origin) to connect itself to the node after it, since each node "knows" only the node immediately before it in the chain, not any node further back.
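A sketch of these two maintenance rules, simplifying a chain to a Python list ordered from the Origin to the Proxy. The function names are hypothetical; the removal of a chain whose Origin or Proxy is lost follows the rule stated further down in this document.

```python
def neighbours_to_notify(chain: list[str], node: str) -> list[str]:
    """An address change only concerns the previous and following nodes."""
    i = chain.index(node)
    return chain[max(i - 1, 0):i] + chain[i + 1:i + 2]


def repair_after_disconnect(chain: list[str], lost: str) -> list[str]:
    """Splice out a lost intermediate node.

    The node just before it (toward the Origin) connects directly to the
    node just after it. Losing the Origin or the Proxy cannot be repaired.
    """
    i = chain.index(lost)
    if i == 0 or i == len(chain) - 1:
        raise ValueError("Origin or Proxy lost: the whole chain is removed")
    return chain[:i] + chain[i + 1:]


chain = ["A", "B", "C", "D"]
print(neighbours_to_notify(chain, "C"))     # ['B', 'D']
print(repair_after_disconnect(chain, "C"))  # ['A', 'B', 'D']
```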

Reconnection assumes that all the nodes in the chain share a common "secret" that only they know, allowing them to "prove" that they are not external nodes trying to insert themselves in the middle of the chain(2).
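The text does not say how a node "proves" that it belongs to the chain. One possible mechanism, assumed here and not taken from ANet, is an HMAC challenge-response over the shared secret, using only Python's standard library: the node being reconnected to sends a fresh random challenge, and the reconnecting node answers with an HMAC of it. The function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def make_challenge() -> bytes:
    """Fresh random nonce, so an old answer cannot be replayed."""
    return secrets.token_bytes(16)


def prove(shared_secret: bytes, challenge: bytes) -> bytes:
    """Answer a challenge without ever sending the secret itself."""
    return hmac.new(shared_secret, challenge, hashlib.sha256).digest()


def verify(shared_secret: bytes, challenge: bytes, proof: bytes) -> bool:
    """Constant-time comparison against the answer we expect."""
    return hmac.compare_digest(prove(shared_secret, challenge), proof)


secret = secrets.token_bytes(32)  # assumed to be agreed when the chain is set up
challenge = make_challenge()      # sent by the node being reconnected to
print(verify(secret, challenge, prove(secret, challenge)))    # True
print(verify(secret, challenge, prove(b"guess", challenge)))  # False
```

The secret itself never crosses the wire during reconnection, but it still has to reach the chain's nodes safely when the chain is set up, which is where the concern in footnote (2) applies.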

If no data is sent through the channel, pings have to be sent in both directions, back and forth between the Origin and the Proxy, through the whole chain. If a node does not receive a response from the next node within the allowed time(3), it has to tell its previous node that the connection was broken and that it will try to connect itself to the node after the unresponsive one. If either the Origin or the Proxy disconnects, the whole chain is removed.
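The reaction to a missing ping response can be sketched as follows. This is a one-directional, single-pass simplification (the text sends pings both ways), `alive` stands in for "answered within the allowed time", and the function name is hypothetical. Each node can pick a node further ahead because it was given the rest of the chain during setup.

```python
def ping_chain(chain: list[str], alive: set[str]) -> str:
    """Walk a keep-alive ping from the Origin to the Proxy.

    Returns a short description of what the chain has to do.
    """
    if chain[0] not in alive or chain[-1] not in alive:
        return "remove chain"  # Origin or Proxy gone
    for i in range(len(chain) - 1):
        here, nxt = chain[i], chain[i + 1]
        if nxt not in alive:  # no response within the allowed time
            # `here` warns its previous node, then connects to the first
            # responsive node after the broken link (the Proxy at worst).
            target = next(n for n in chain[i + 1:] if n in alive)
            return f"{here} reconnects to {target}"
    return "ok"


print(ping_chain(["A", "B", "C", "D"], {"A", "B", "D"}))  # B reconnects to D
```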

Finally, once two nodes have set up their own proxy chains, they can exchange the identities of their proxies and then both ask their proxies to connect to each other.
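Once both proxies are connected, the two chains form a single path that carries data both ways. A hypothetical sketch, with node names standing in for real connections: A's chain is A-B-C-D and X's chain is X-Y-Z.

```python
def join_channels(chain_a: list[str], chain_b: list[str]) -> list[str]:
    """Proxies connect: origin_a ... proxy_a, proxy_b ... origin_b."""
    return chain_a + list(reversed(chain_b))


def route(path: list[str], src: str, dst: str) -> list[str]:
    """Hops taken by data sent from src to dst, in either direction."""
    i, j = path.index(src), path.index(dst)
    return path[i:j + 1] if i <= j else list(reversed(path[j:i + 1]))


path = join_channels(["A", "B", "C", "D"], ["X", "Y", "Z"])
print(" -> ".join(route(path, "A", "X")))  # A -> B -> C -> D -> Z -> Y -> X
print(" -> ".join(route(path, "X", "A")))  # X -> Y -> Z -> D -> C -> B -> A
```

Neither Origin appears in the other's chain: A only ever learns the identity of X's proxy Z, and X only that of D.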

Identifying Nodes in a Distributed Network

Already, we're stuck with a major problem. As you may have already read[1], ANet is protocol-agnostic about the connections between nodes. So how do we identify the proxy, so that others can connect to your node through it?

The trick here is simple. You can identify a node by its address only if that address is unique within its protocol and within its own "subnet". A "subnet" is either the internet or one specific intranet.

Thus, when you want to connect to a proxy in some protocol and subnet, the search for your own proxy has to be based on that information: you want to find a proxy that can use the protocol and subnet of the other proxy.
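A node's identity can be modelled as the triple (protocol, subnet, address). The sketch below assumes that a node lists the (protocol, subnet, address) entries through which it can be reached; `NodeAddress`, `Node`, `can_reach` and the protocol and subnet labels are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class NodeAddress:
    protocol: str  # hypothetical label, e.g. "tcp"
    subnet: str    # "internet" or the name of one intranet
    address: str   # unique only within (protocol, subnet)


@dataclass(frozen=True)
class Node:
    name: str
    addresses: frozenset[NodeAddress]  # everywhere this node can be reached


def can_reach(candidate: Node, target: NodeAddress) -> bool:
    """True if the candidate can use the target's protocol and subnet."""
    return any(a.protocol == target.protocol and a.subnet == target.subnet
               for a in candidate.addresses)


# The same address string in two subnets names two different nodes.
lan = NodeAddress("tcp", "office-intranet", "192.0.2.7")
wan = NodeAddress("tcp", "internet", "192.0.2.7")
print(lan == wan)  # False

other_proxy = NodeAddress("tcp", "internet", "203.0.113.5")
b = Node("B", frozenset({lan}))
c = Node("C", frozenset({lan, NodeAddress("tcp", "internet", "198.51.100.2")}))
print(can_reach(b, other_proxy), can_reach(c, other_proxy))  # False True
```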

This search is, again, pretty simple. You send the query "I want to find a proxy capable of connecting to...". Whenever a node receives that query, it has to remember where it came from. When a node receives a reply, it has to append itself(4) to the list of nodes the reply contains (which will become the full proxy chain list) and send it back in the direction from which it received the original query. Once a node has received a reply, it has to ignore all further replies for the same "proxy chain" query. Those queries are identified by a pseudo-unique ID (a large random number), which times out after a minute or two. Since the first reply to come back wins, this method tends to produce the "fastest" proxy chain, that is, the one with the smallest ping time(5).
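The search can be simulated with a small event queue. This is a sketch under several assumptions not in the text: links have fixed, symmetric latencies; a node able to act as the proxy answers instead of forwarding the query; the query carries a hop limit (`ttl`) as in footnote (5); and expiring query IDs after a minute or two is left out. Each node numbers its own neighbours, and the reply collects the connection IDs described in footnote (4). All names are hypothetical.

```python
from __future__ import annotations

import heapq
import itertools
import random

Links = dict[str, dict[str, tuple[float, int]]]


def build_links(edges: list[tuple[str, str, float]]) -> Links:
    """links[a][b] = (latency, ID that a assigned to b); IDs are local to a."""
    links: Links = {}
    for a, b, latency in edges:
        for x, y in ((a, b), (b, a)):
            table = links.setdefault(x, {})
            table[y] = (latency, len(table) + 1)
    return links


def find_proxy_chain(links: Links, capable: set[str], origin: str,
                     ttl: int = 7) -> tuple[int, list[int]] | None:
    """Flood a proxy query; return (query ID, connection IDs from origin)."""
    query_id = random.getrandbits(64)  # pseudo-unique ID for this query
    came_from: dict[str, str | None] = {}  # where each node first saw it
    answered: set[str] = set()  # nodes that already relayed a reply
    seq = itertools.count()  # heap tie-breaker
    # event: (time, seq, kind, node, sender, ttl or reply path)
    events = [(0.0, next(seq), "query", origin, None, ttl)]
    while events:
        t, _, kind, node, sender, data = heapq.heappop(events)
        if kind == "query":
            if node in came_from:
                continue  # same query ID seen before: ignore this copy
            came_from[node] = sender
            if node in capable and node != origin:
                # Start the reply with the ID the sender assigned to us.
                latency, my_id = links[sender][node]
                heapq.heappush(events, (t + latency, next(seq), "reply",
                                        sender, node, [my_id]))
            elif data > 0:
                for neighbour, (latency, _) in links.get(node, {}).items():
                    if neighbour != sender:
                        heapq.heappush(events, (t + latency, next(seq), "query",
                                                neighbour, node, data - 1))
        else:  # a reply arrived at `node`
            if node in answered:
                continue  # only the first reply counts
            answered.add(node)
            if node == origin:
                return query_id, list(reversed(data))
            prev = came_from[node]
            latency, my_id = links[prev][node]  # append "itself" as prev sees it
            heapq.heappush(events, (t + latency, next(seq), "reply",
                                    prev, node, data + [my_id]))
    return None


edges = [("A", "B", 1), ("B", "C", 1), ("C", "P", 1), ("A", "E", 1), ("E", "Q", 5)]
links = build_links(edges)
_, path = find_proxy_chain(links, {"P", "Q"}, "A")
print(path)  # [1, 2, 2]: A's ID for B, B's ID for C, C's ID for P
```

Here the reply through P wins because its round trip (6 time units) is shorter than the one through Q (12). With `ttl=2`, P is out of range and the chain through E and Q, `[2, 2]`, is returned instead.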

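To summarize, here's what two nodes need to do to establish an anonymous two-way data transfer channel(6):

1. Each node sets up its own proxy chain. When a node wants to reach a known proxy, it finds its own proxy with a "find a proxy capable of connecting to..." query for that proxy's protocol and subnet.
2. The two nodes exchange the identities of their proxies, the only public information either of them gives out.
3. Both nodes ask their proxies to connect to each other.
4. While no data is flowing, pings travel along each chain to keep it alive; broken links are repaired, and a chain is removed if its Origin or Proxy disconnects.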

(1) I know, this might sound totally unrelated to a distributed network, but it is related. The thing is, there is no point in making an abstraction of anything if it removes functionality. Here, if "anonymous two-way data transfers" were not defined in ANet, doing anything similar to TCP between two nodes in the network would require some kind of "hack" of the ANet protocol, or would destroy the "anonymity" and "protocol-agnostic" advantages ANet already gives.

(2) Well, this is easy if we assume that there is no "packet sniffing" on the network you're using. But if packet sniffing is easy on your network, if you're using a public network, or if you're simply paranoid, public key encryption should solve the problem.

(3) Basically, the time to wait is proportional to the distance remaining in the chain, assuming no node is too hasty and gives up too soon. Since you only know the distance remaining on your side of the network, pings should actually be made with your own proxy.
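This rule can be written out. Numbering a chain from the Origin (position 0) to the Proxy (position n), and assuming a fixed per-hop allowance k (a hypothetical tuning constant; the text does not give one), node i waits roughly:

```latex
T_i = k\,(n - i), \qquad 0 \le i < n
```

For the chain A-B-C-D (n = 3) with k = 2 s, A waits 6 s, B waits 4 s and C waits 2 s. Each node gives up before the node behind it, so a break is reported by the node right before it rather than by a node further back that gave up too soon.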

(4) By "itself" I mean the connection ID that the node in that direction assigned to you. A "connection ID" is used by a node to define, for itself, the other nodes it is connected to, and is unrelated to the underlying connection protocol. A connection ID has meaning only to the node that defined it for itself, but it is enough to make a "path" in a network ("In the node you call A, go to the node it calls B, then go to the node that the last node calls C..."). This is actually the "shared secret" between both ends of a node-to-node connection. More on this in the high-level design.
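Following such a path of connection IDs is a simple table walk. A sketch with hypothetical per-node tables (`follow_path` is not an ANet function); note that the ID 2 means C to node B but P to node C.

```python
from __future__ import annotations


def follow_path(tables: dict[str, dict[int, str]], start: str,
                path: list[int]) -> str:
    """Walk a path of connection IDs, each read by the node it belongs to."""
    node = start
    for conn_id in path:
        node = tables[node][conn_id]  # this ID is only meaningful to `node`
    return node


tables = {
    "A": {1: "B", 2: "E"},
    "B": {1: "A", 2: "C"},
    "C": {1: "B", 2: "P"},
}
print(follow_path(tables, "A", [1, 2, 2]))  # P
```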

(5) As with "normal" queries, those queries cannot travel more than some number of nodes in the network (the "time-to-live"), so the "distance" is also important.

(6) Or "ATWDTC"? I think I'll stick with "TWDT", otherwise I'll confuse everyone...


About the references...

[1] Benad, "Distributed Networking".

Last update for this document: September 1, 2001, at 19:00:36 PST