Static Data(1)

Basically, static data works like queries, but it is meant for data that is bigger and that stays relevant for a longer period of time. Queries, in contrast, are smaller but are distributed over the network faster(2).

Static data is meant for data that does not change often, that is, data that is not dynamic. For example, web pages that rarely change can be considered "static" data. If the data has to be generated "on-the-fly" (dynamic), then you should instead send a query that initiates a two-way data transfer[1].

The Network as a Database

Let's say you have some data that you want distributed and kept by as many computers as possible. In that case, you need to distribute it as static data. Every computer that keeps that data becomes a "duplicate" of you, and you in turn become a "duplicate" of the rest of the network. The advantage is that each computer has a local copy of your data and doesn't need to ask you "Do you have that data, and if so, can you give me a copy?".

So, the network somehow becomes a database, where each node has a copy of the same data as all the other nodes(3).
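To make the "duplicate" idea concrete, here is a minimal sketch in Python. It is only an illustration, not part of ANet: the function name replicate, the dictionary-based stores and the list of links are all assumptions made for this example. It keeps copying missing packets between directly linked nodes until every node holds the same set.

```python
def replicate(stores, links):
    """Copy missing packets between linked nodes until nothing changes.

    stores: dict mapping a node name to its local dict of key -> data
            (hypothetical layout, chosen for this sketch)
    links:  list of (node, node) pairs that are directly connected
    Returns the number of passes that moved at least one packet.
    """
    rounds = 0
    changed = True
    while changed:
        changed = False
        for left, right in links:
            # Data flows in both directions over each link.
            for src, dst in ((left, right), (right, left)):
                for key, data in stores[src].items():
                    if key not in stores[dst]:
                        stores[dst][key] = data
                        changed = True
        if changed:
            rounds += 1
    return rounds


# Three nodes in a line: a <-> b <-> c.
stores = {"a": {"k1": b"from a"}, "b": {}, "c": {"k2": b"from c"}}
links = [("a", "b"), ("b", "c")]
replicate(stores, links)
```

After the call, all three stores hold both packets, even though a and c never talked to each other directly.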

How Static Data Is Distributed

Because static data is substantially bigger than queries, we cannot afford to send all the static data from a node to all its adjacent nodes. So, uploading static data from one node to another has to be a two-step process.
The first step is to ask "Do you already have that data?". Actually, it is more like "Here's a list of what I have. What do you want?". The other node then replies with a list of the data it wants.
The second step is to actually upload the requested data from one node to the other.

Note, though, that a node can never say "I want this data" without first being offered the chance to download it (through the "Here's what I have" list). Thus, distributing static data is a passive process: a node doesn't need to "trigger" any action to receive static data(4).
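The two-step exchange described above can be sketched as follows. This is a rough Python illustration under my own assumptions: the Node class and its offer, wanted and upload methods are hypothetical names, and a real node would send these lists over a connection rather than call methods directly.

```python
class Node:
    """Hypothetical node that keeps static data packets in a local dict."""

    def __init__(self, name):
        self.name = name
        self.store = {}  # key -> packet contents

    def offer(self):
        """Step 1, first half: "Here's a list of what I have."""
        return set(self.store)

    def wanted(self, offered_keys):
        """Step 1, second half: reply with the offered keys not held yet.

        A node can only pick from the offered list; it has no way to
        ask for a key that was not offered.
        """
        return {key for key in offered_keys if key not in self.store}

    def upload(self, receiver):
        """Run both steps against receiver; return the keys transferred."""
        requested = receiver.wanted(self.offer())
        # Step 2: upload only what was requested.
        for key in requested:
            receiver.store[key] = self.store[key]
        return requested


a = Node("a")
b = Node("b")
a.store = {"k1": b"first", "k2": b"second"}
b.store = {"k1": b"first"}
sent = a.upload(b)        # only "k2" crosses the link
sent_again = a.upload(b)  # nothing left to send
```

Only the lists of keys are exchanged for data the receiver already has, which is the whole point of splitting the upload in two steps.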

Identifying Static Data

Already, we have a huge problem: how can we uniquely identify a static data "packet"(5)? Remember that the network is distributed, so no node can control the creation of all the keys that uniquely identify static data packets. From any one point in the network, there is no safe way to verify that the "key" you want to create is not already in use, or to ensure that no other node will use the same key for a different packet.

In practice, there would be no problem if the key for a packet were a checksum of its contents. But then, if we change the contents of the data, the corresponding key changes too. So, this kind of key is useful only when the data never changes. Otherwise, there's no way to find "the latest version of that packet".

The solution is to have two keys for each packet.
The "primary" key identifies the expected contents of the packet; this is some kind of file name, so that users can "know" what contents the file should have. The primary key is not unique. Several packets can have the exact same primary key, but it's up to the user to figure out which packet is the "right" one.
The "secondary" key uniquely identifies the actual contents of the packet; this is basically a checksum. So, it can tell the different "versions" of the packet apart and, if it is a digital signature, also identify the source that made the packet. With the secondary key, we can check whether the data has been tampered with, and then destroy any "broken" data that is detected(6).
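As an illustration of the two keys, here is a small Python sketch. The names StaticPacket, make_packet and is_intact are hypothetical, and using SHA-256 as the checksum is my assumption for the example; the text above only asks for "a checksum" or a digital signature.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class StaticPacket:
    primary_key: str    # human-readable name; NOT unique
    data: bytes         # the actual contents
    secondary_key: str  # checksum of the contents (SHA-256 assumed here)


def make_packet(primary_key, data):
    """Build a packet whose secondary key is derived from its contents."""
    return StaticPacket(primary_key, data, hashlib.sha256(data).hexdigest())


def is_intact(packet):
    """Recompute the checksum and compare it with the secondary key."""
    return hashlib.sha256(packet.data).hexdigest() == packet.secondary_key


# Two versions share the primary key but differ in their secondary key.
v1 = make_packet("news/index", b"old contents")
v2 = make_packet("news/index", b"new contents")

# A packet whose data no longer matches its secondary key is "broken".
tampered = StaticPacket("news/index", b"altered contents", v1.secondary_key)
```

Here is_intact returns True for v1 and v2 and False for tampered. A plain checksum like this detects altered contents but says nothing about who made the packet; that would need the digital signature variant.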

Static Data as Preemptive Caching

The major advantage of static data is that it can be distributed before clients on other nodes actually need it. Not only that, but nodes can tell what the network contains (as static data) without having to ask all the other nodes "What do you have?". Static data is distributed at a slower pace than if a node directly downloaded a specifically requested packet, but it greatly reduces, if not completely removes, the need to send several queries to "discover" what's on the network. What's on the network (for static data) is what is already on your computer, after some time(7).
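Once the static data has reached your node, "discovering" what's on the network becomes a purely local lookup. The sketch below assumes, for illustration only, that the local store is a Python dict keyed by (primary key, secondary key) pairs; search_local is a hypothetical helper.

```python
def search_local(store, term):
    """Return the primary keys in the local store that contain term.

    Nothing is sent to other nodes: the search only reads data that
    has already been distributed to this computer.
    """
    return sorted({primary for (primary, _secondary) in store if term in primary})


# Local store: two versions of one packet, plus an unrelated packet.
local_store = {
    ("songs/list", "ab12"): b"version 1",
    ("songs/list", "cd34"): b"version 2",
    ("docs/readme", "ef56"): b"hello",
}
matches = search_local(local_store, "songs")
```

However broad the search term is, the cost stays on the computer that runs the search.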

Notes

(1) While "static" is the right term, "data" might be too generic. Maybe the term "static packet" or "static record" would be better...

(2) Because of its size, a node will send some static data to a node it is connected to only if the receiving node doesn't already have the data. Thus, nodes must allow other nodes to ask them what static data they have.

(3) "Somehow" is the key word here. ANet is not an implementation of a database, since while it can be used to define storage, it doesn't define any transaction. It is up to the programmers of the database clients to define and enforce the transaction rules.

(4) More precisely, a node cannot trigger an action to receive static data on its own; that takes client interaction, through queries.

(5) With long-term memory structures, the smallest unit is a file. But in networking (especially IP[2]), it is a packet. Since ANet is more a networking protocol than a long-term memory structure, I suppose that it is better to use the term "static data packet" than "file", even though it sounds less "intuitive".

(6) So, I just completely avoided all the "uniqueness" problems that are starting to plague Freenet[3]. You see, it's that simple...

(7) In Gnutella[4], you have to constantly ask, through searches, what the other nodes have. So, some users are "spamming" the network with searches like "a", "b", and so on. With ANet, it could be possible to use static data for file lists, so that you know at all times what's on the network. Also, doing "stupid" searches will affect only your computer, as you search in the file lists that are already in your computer.

References

About the references...

[1] Benad, "Anonymous Two-Way Data Transfers". Local link.
[2] University of Southern California, "Internet Protocol". External link. Cached.
[3] Freenet Web Site. External link.
[4] Semi-Official Gnutella Web Site. External link.

Last update for this document: September 1, 2001, at 18:52:12 PST