Anonymous Two-Way Data Transfers

An anonymous two-way data transfer is a communication channel between two specific nodes in a distributed network(1). Data can be sent from either end of the channel, in both directions. It is anonymous because you don't need to know the specific address of the node at the other end, which allows both ends to change their addresses without affecting the communication channel.

Proxy Paths

To allow this anonymity, both ends of the communication have to establish what is called a "proxy chain" through the different nodes in the network. Here's how it works.

Let's say that node A wants to make a proxy chain through nodes B, C and D. Here, node D will become the "proxy" of node A. The address of D is the only public information A has to give to other nodes for them to be able to establish connections. So, we have:

  A       B       C       D
Origin                  Proxy

To establish its own proxy chain, node A will tell B: "I want the proxy chain A-B-C-D". Then, B will tell C: "I want the proxy chain B-C-D". Here, C shouldn't know that the chain started from A, since the address of A could change. Finally, C tells D that it wants a proxy chain "C-D".
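The hop-by-hop requests can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `setup_requests` and the use of plain letters as node names are made up here, and real ANet messages would of course carry more than a list of names.

```python
def setup_requests(chain):
    """Return one (sender, receiver, requested_chain) tuple per hop."""
    requests = []
    for i in range(len(chain) - 1):
        # A node only passes on the part of the chain that starts at itself,
        # so the receiver never learns which nodes came before the sender.
        requests.append((chain[i], chain[i + 1], chain[i:]))
    return requests


for sender, receiver, requested in setup_requests(["A", "B", "C", "D"]):
    wanted = "-".join(requested)
    print(f'{sender} tells {receiver}: "I want the proxy chain {wanted}"')
```

Note that the request C receives starts at B, so nothing in it mentions A.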

Whenever a node in the chain changes its address, it has to report the change only to the previous and the following node in the chain. If a node becomes disconnected, it is up to the previous node in the chain (the one towards the "origin" node) to connect itself to the node that followed the disconnected one, since each node "knows" only its own previous node in the chain, and nothing further back.
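As a rough sketch of that repair rule, assuming each node keeps the part of the chain that starts at itself (as in the setup requests), the node before the lost one already knows whom to reach next. The function name `repair` and its return shape are hypothetical, chosen only for this example.

```python
def repair(chain, lost):
    """Return ((rescuer, new_next), repaired_chain) after `lost` disconnects.

    If the lost node is the Origin or the Proxy, the whole chain is
    removed and (None, []) is returned.
    """
    if lost in (chain[0], chain[-1]):
        return None, []
    i = chain.index(lost)
    rescuer = chain[i - 1]
    # What the rescuer knows locally: itself, the lost node, and what follows.
    onward = chain[i - 1:]
    new_next = onward[2]
    return (rescuer, new_next), chain[:i] + chain[i + 1:]


print(repair(["A", "B", "C", "D"], "C"))
```

The node after the lost one could not do this job itself: it only knew the lost node, not the node before it.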

Reconnection systems assume that all nodes share a common "secret" that only the nodes in the chain will know, allowing them to "prove" that they are not external nodes trying to get in the middle of the chain(2).

If no data is sent through the channel, pings have to be sent in both directions of the chain, back and forth from the Origin to the Proxy, through the whole chain. If a node does not receive a response from the next node within the time allowed(3), it has to tell its previous node that the connection was broken and that it will try to connect itself with the following node. If either the Origin or the Proxy disconnects, then the whole chain is removed.
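Note (3) says the time to wait is proportional to the distance remaining in the chain. A minimal sketch of that idea, assuming a fixed allowance per hop (`per_hop_seconds` and its default value are invented for this example, not taken from ANet):

```python
def ping_timeout(chain, node, per_hop_seconds=2.0):
    """Seconds `node` waits for a ping answer coming back from the Proxy."""
    # Number of hops between this node and the Proxy at the end of the chain.
    hops_remaining = len(chain) - 1 - chain.index(node)
    return hops_remaining * per_hop_seconds


chain = ["A", "B", "C", "D"]
for node in chain:
    print(node, ping_timeout(chain, node))
```

With this rule a node closer to the Origin always waits longer than the nodes after it, so it does not give up before they have had their chance to answer.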

Finally, once two nodes have set up their own proxy chains, they can exchange information about what their proxies are, and then both ask their proxies to connect to each other.

Identifying Nodes in a Distributed Network

Already, we're stuck with a major problem. As you may have already read[1], ANet is protocol-agnostic about the connections between the nodes. So, how do we identify the proxy, to allow others to connect to your node through your proxy?

The trick here is simple. You can identify a node by its address only if that address is unique within its protocol and within its own "subnet". A "subnet" can be either the internet, or one specific intranet.

Thus, when you want to connect to a proxy in some protocol and subnet, your search for your own proxy has to be based on that information: you want to find a proxy that can use the protocol and subnet specified by the other proxy.
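One way to picture this is to treat a proxy's public identity as the triple (protocol, subnet, address). The sketch below is an assumption made for illustration: the `ProxyId` type, the `can_reach` helper and the sample values are not part of ANet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProxyId:
    protocol: str  # e.g. "tcp"
    subnet: str    # e.g. "internet", or the name of one intranet
    address: str   # only unique within (protocol, subnet)


def can_reach(supported, target):
    """True if a candidate proxy supports the target's protocol and subnet.

    `supported` is a set of (protocol, subnet) pairs the candidate can use.
    """
    return (target.protocol, target.subnet) in supported


other_proxy = ProxyId("tcp", "internet", "192.0.2.10")
print(can_reach({("tcp", "internet")}, other_proxy))
print(can_reach({("tcp", "office-intranet")}, other_proxy))
```

The same address string under another protocol or subnet is a different `ProxyId`, which is why the address alone is not enough.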

This search is, again, pretty simple. You send the query: "I want to find a proxy capable of connecting to...". Whenever a node receives that query, it has to remember where it came from. When a node receives a reply, it has to append itself(4) to the list of nodes the reply contains (which will become the full proxy chain list), and send it back in the direction from which it received the original query. Once a node has received a reply, it has to ignore all further replies for the same "proxy chain" query. Those queries will be identified by a pseudo-unique ID (large random number), which will time out after a minute or two. This method will try to make the "fastest" proxy chain, that is, the one with the smallest ping time(5).
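The search can be simulated on a small in-memory graph. This is a sketch under several assumptions, none of them from the ANet design itself: every link has the same latency (so breadth-first order stands in for "fastest"), a node ignores a query it has already seen, node names stand in for the connection IDs of note (4), and the pseudo-unique ID is left out because only one query is simulated. The names `find_proxy_chain`, `links`, `willing` and `ttl` are made up for the example.

```python
from collections import deque


def find_proxy_chain(links, origin, willing, ttl=5):
    """Flood one query from `origin` and return the first chain that comes back."""
    came_from = {origin: None}  # where each node first received the query from
    hops = {origin: 0}
    queue = deque([origin])
    repliers = []               # willing nodes, in the order they were reached
    while queue:
        node = queue.popleft()
        if node != origin and node in willing:
            repliers.append(node)  # it replies instead of redistributing
            continue
        if hops[node] >= ttl:
            continue               # time-to-live used up, stop redistributing
        for neighbour in links[node]:
            if neighbour not in came_from:
                came_from[neighbour] = node
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)

    passed_reply = set()  # nodes that have already passed one reply back
    result = None
    for proxy in repliers:
        chain = [proxy]
        node = came_from[proxy]
        while node is not None and node not in passed_reply:
            passed_reply.add(node)
            chain.append(node)     # the node appends itself to the reply
            node = came_from[node]
        # A reply that stopped early was ignored by a node on the way back.
        if node is None and result is None:
            result = chain[::-1]   # reorder as Origin ... Proxy
    return result


links = {
    "A": ["B", "E"], "B": ["A", "C"], "C": ["B", "D"],
    "D": ["C"], "E": ["A", "F"], "F": ["E"],
}
print(find_proxy_chain(links, "A", willing={"D", "F"}))
```

Here F is two hops away and D is three, so the reply from F reaches A first and the later reply from D is dropped when it arrives at A.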

Summary

To summarize, here's what two nodes need to do to establish an anonymous two-way data transfer channel(6):

1. One node starts to make its own proxy chain, for the protocol and subnet it wants.
   - The node sends "I want to make a proxy chain" as a query to all the other nodes. It is sent with a pseudo-unique key.
   - If a node doesn't want to become the proxy, it has to redistribute the query, but it has to remember where it received the original query from.
   - If a node wants to become a proxy, it has to reply with the information about the protocol and subnet it supports.
   - When a node receives a reply, it has to add itself to the list of nodes that will form the proxy chain, in the contents of the reply, and then send the reply back to where the corresponding query came from.
   - If any node receives further replies for a "proxy chain" query for which it already passed a reply, it has to ignore those replies.
2. That node then sends the information about its proxy to the other node.
3. The other node also makes its own proxy chain, but one that can connect to the other proxy's protocol and subnet.
   - The node sends "I want to make a proxy chain that can support this..." as a query to all the other nodes. It is sent with a pseudo-unique key.
   - The rest is similar to before.
4. At least one of the two nodes tells its proxy, through the proxy chain, to connect to the other proxy.

Notes

(1) I know, this might sound totally unrelated to a distributed network, but it is related. The thing is, there is no point in making an abstraction of anything if it removes functionality. Here, if "anonymous two-way data transfers" were not defined in ANet, doing anything similar to TCP between two nodes in the network would require some kind of "hack" of the ANet protocol, or destroying any kind of "anonymity" or "protocol-agnostic" advantages ANet already gave.

(2) Well, this can be easily done if we assume that there is no "packet sniffing" on the network you're using. But if you know that "packet sniffing" can be easily done on your network, if you're using a public network, or if you're simply paranoid, public key encryption should solve the problem.

(3) Basically, the time to wait is proportional to the distance remaining in the chain, assuming no node is too hasty in its wait and gives up too soon. Since you only know the distance remaining on your side of the network, pings should actually be made with your own proxy.

(4) By "itself" I mean the connection ID that the node in that direction assigned to you. A "connection ID" is used by a node to define, for itself, the other nodes it is connected to, and is unrelated to the underlying connection protocol. A connection ID has meaning only to the node that defined it for itself, but it is enough to make a "path" in a network ("In the node you call A, go to the node it calls B, then go to the node that the last node calls C..."). This is actually the "shared secret" between both ends of a node-to-node connection. More on this in the high-level design.
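To make the "path" idea concrete, here is a small sketch in which every node has its own private table of connection IDs. The table layout, the node names and the function `follow_path` are assumptions made for this example only; the actual design is left to the high-level design document.

```python
def follow_path(tables, start, path):
    """Walk a list of connection IDs, each one read in the current node's table."""
    node = start
    for connection_id in path:
        # The ID only has meaning inside the table of the node we are at.
        node = tables[node][connection_id]
    return node


# Each node numbers its own connections however it likes.
tables = {
    "origin": {7: "n1"},
    "n1": {3: "n2", 9: "origin"},
    "n2": {7: "proxy"},
}
print(follow_path(tables, "origin", [7, 3, 7]))
```

The ID 7 appears twice in the path but points to different nodes, because "origin" and "n2" each defined it for themselves.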

(5) As with "normal" queries, those queries cannot travel more than some number of nodes in the network (the "time-to-live"), so the "distance" is also important.

(6) Or "ATWDTC"? I think I'll stick with "TWDT", otherwise I'll confuse everyone...

References

[1] Benad, "Distributed Networking". Local link.

Last update for this document: September 1, 2001, at 19:0:36 PST