17/07/2011

讓Python 實現 P2P 功能[下]


A P2P File Sharing Application (Python version)

This page walks through an example of using the P2P framework described on this website to implement a simple P2P protocol and GUI in Python. The program developed here is a very simple file sharing application.

Protocol Design

Before beginning to write any code, we must first think about the design of the P2P protocol that we wish to implement. This is, of course, probably the hardest part. For a true, "pure" P2P network, every node (computer) will be running the exact same set of algorithms and code and there is no centralized server or otherwise special node to keep things organized. Thus, we need to decide on the set of messages that the various nodes will exchange between each other and then focus on the behavior of a single node when it receives each type of message.
In the protcol we develop here, there will be 9 types of messages exchanged between nodes. Each message type is identified by a 4-character string and may be accompanied with additional message data, summarized as follows:
  • NAME: Requests a peer to reply with its "official" peer id.
  • LIST: Requests a peer to reply with the list of peers that it knows about.
  • JOIN pid host port: Requests a peer to add the supplied host/port combination, associated with the node identified by pid, to its list of known peers.
  • QUER return-pid key ttl: Queries a peer to see if the peer has any record of a file name matching key. If so, send a RESP message back to the node identified by return-pid; if not, propagate the query to all known peers with a decreased ttl (time-to-live) value, unless ttl is already 0.
  • RESP file-name pid: Notifies a peer that the node specified by pid has a file with the given name.
  • FGET file-name: Request a peer to reply with the contents of the specified file.
  • QUIT pid: Indicate to a peer that the node identified by pid wishes to be unregistered from the P2P system.
  • REPL ...: Used to indicate an acknowledgement of the other message types above or to send back results of a successful request.
  • ERRO msg: Used to indicate an erroneous or unsuccessful request.
An essential component of any P2P application is to decide how peers actually join to form a network -- that is, how does each node become aware of other nodes in the network. A single node most often does not directly know about all the other nodes in the network but if there are enough nodes with links between them, they should all be able to interact somehow. In our example here, each peer keeps track of a fixed number of other nodes in its peer list. A new node needs to know about at least one peer that is already in the network in order to join. The new node then builds its own list of known peers by asking the initial contact for its list and repeatedly asking each peer in that list for their lists of known nodes, until the new node's list of peers is full (or there are no more peers to contact). The depth of this search is limited by a number-of-hops parameter and the algorithm ensures, of course, that the node itself is not added to its own peer list, nor are duplicate peer names.
In the file sharing application we develop here, each node also keeps track of a mapping of file names to peer identifiers where the files are actually stored. Peer identifiers are strings of the form "host:port", using IP addresses directly for routing purposes. In the next section, we discuss in more detail how a single node should respond to incoming messages of the various types summarized above, and then we are ready to look at the actual Python code, which is quite straightforward since the P2P framework takes care of most of the infrastructure associated with the application other than the initial formation of the network and the message handling. We will also look at writing a graphical user interface for the file sharing protocol.

Handling Message

In this section, we discuss in more detail how the first 7 messages listed above are handled by a peer. The last two message types, REPL and ERRO, are used for replies or responses to incoming messages of the other 7 types. Unless otherwise indicated, responses to the various types of messages are sent back using the same socket connection. In particular, it is only a QUER message that may not result in an immediate reply, so responses to QUER messages (i.e. RESP messages) will be sent back to a node at a later point over a new connection. Thus, a node will often not know immediately the result of a query because, of course, the P2P network will have to be searched to determine which node actually has the file that is being queried for.

NAME

When a peer receives a NAME message (and no additional data), it simply responds by sending back (in a REPL message) its peer id, of the form "host:port".

LIST

When a peer receives a LIST message (and no additional data), it responds by first sending a REPL message with the number of id's in its list of known peers. Then it sends that many additional REPL messages, including in the data portion of each message a string with the peer id, host address, and port number (all three separated by whitespace) associated with each known peer in its list.

JOIN

A JOIN message is accompanied with additional data specifying a sending peer id, IP address, and port number. The JOIN message is used to request a peer to insert the sending node into its list of known peers. If the peer's list is already full, or the specified id is already in the list (or incorrect arguments are specified with the JOIN message), than an ERRO message is sent back in response. Otherwise, the remote node's information is added to the list and a REPL acknowledgement is sent.

QUER

The QUER message is the most complex to handle in this application. Upon receipt of this type of message, if the correct number of additional data is provided (return-peer-id, key, and ttl value), then the peer responds simply with an acknowledgement REPL message; otherwise an ERRO message is returned. Before exiting the handler routine for this message type (and closing the socket connection), however, a separate thread is started that actually processes the query information.
The 'processQuery' thread operates as follows: If the key is found to be a substring of any of the file names in the peer's list of files, then the peer sends a RESP message to the peer that originally initiated the query (given by the return-peer-id argument of the QUER message; in our protocol here, the IP address and port number of the peer can be determined by parsing the return-peer-id string, which will be of the format "host:port"). If the key is not found to be a substring of any of the file names in the peer's list, and the ttl value is greater than 0 (zero), then a QUER message is sent to every node in the list of known peers, with the samereturn-peer-id and key arguments, and the ttl value decremented by 1.

RESP

A peer will receive a RESP message in response to a QUER message that it had previously sent out to others in the network. (The design of our protocol here does not preclude the possibility of peers receiving RESP messages for files they had not queried for -- this may be viewed as either a feature or a bug.) The RESP message includes a file name and the peer id of the node owning a copy of the file. Upon receiving a RESP message, a peer does not need to send any reply; if the file name is not already present in its file list, an entry will be added to the list indicating that the file may be found at the specified node.

FGET

The data of an FGET message consists of a file name. If the peer receiving the FGET message does not own a copy of the specified file or if it is not readable for some reason, an ERRO reply is sent. Otherwise, the entire file is sent as the payload of a REPL message.

QUIT

A QUIT message, including a peer id in the data of the message, tells a peer to remove the specified node's information from its list of known peers as that node is preparing to leave the network. Note that, as is the case in most P2P protocols, node may also leave the network unexpectedly, so some sort of "stabilization" routine should be run by every peer node to periodically update its list of peers.

File Sharing Application Code

Having discussed the file sharing protocol, you may now download and view the actual Python code. This file, btfiler.py, extends the Peer class and contains methods for handling each of the messages as described above, as well as a method, buildpeers for filling in the initial list of known peers as the node attempts to join the P2P network. The constructor registers the handler methods and performs other simple initialization tasks.
You may also at this point look over the GUI front end for the file sharing application. The code is in this file, filergui.py. The next section briefly discusses the contents of this module.
^ TOP

The GUI Application

The FilerGUI class extends the basic Frame class and adds components and event handlers to the window. The window looks like the following:
The constructor of the FilerGUI class constructs and adds the GUI components to the frame window. It then initializes a FilerPeer object listening on the server port specified as a parameter to the constructor. The constructor also takes as a parameter the peer id of the initial peer to contact in order to attempt to join the P2P network. It uses the buildpeers method of the peer class to populate the list of known peers as described in the earlier sections. It then starts the mainloop of the peer object in a separate thread.
The FilerGUI constructor also schedules two tasks to run periodically, every 3 seconds. One is a "stabilizer" routine that checks the list of known peers to ensure that they are all still alive. For this purpose, we just use the checklivepeers method that is provided by the framework. The other task that runs periodically is a "refresh" routine that copies the entries in the internal list of known peers and the list of known file names to the corresponding components of the GUI window: the "Available Files" and "Peer List" list boxes.
The FilerGUI class contains a createWidgets method that lays out all of the components in the window frame above and registers appropriate event handlers with each button. The handlers for each button perform the following tasks, each programmed in its own method entitled "onAdd", "onSearch", etc.:
  • Refresh: Synchronizes the contents of the GUI list components with the internal peer and file lists of the node. This operation happens automatically every few seconds.
  • Remove: Removes the selected node id from the peer list.
  • Rebuild: Takes a "host:port" string from the accompanying text box and uses it to invoke the buildpeers method of the Filer Peer class, attempting to populate the list of known peers.
  • Search: Sends a QUER message with the contents of the accompanying text box to all peers in the list. As discussed above, responses to the QUER message will not arrive immediately. RESP messages received in the background will be handled by the mainloop of the peer as described above in the "Handling Messages" section. If a new entry is in fact added to the file list then it will be reflected the next time the "refresh" routine is executed.
  • Add: Registers the specified file name as being locally stored on the peer. Shared files are stored in the same directory as the application's executable file.
  • Fetch: Downloads the selected file to the local computer by sending an FGET message to the peer that is listed as hosting the file.
Other than the component layout code and the event handlers above, the FilerGUI source file has a main method that launches the application, taking (1) a port number to listen on for incoming connections, (2) the maximum number of peers to store in the peer list, and (3) the peer id of an initial contact in the P2P network that the node wishes to join.