Basic Socket Interface

This section describes the base socket interface. Sockets are an abstract interface designed to deal with any sort of networking. The definitions for the operations described here are in header files cleansocks.h and cleansri.h, though in practice applications will usually #include some higher level headers which will bring these along. All types and operations described here are defined in the cleansocks namespace, so you might want to say using namespace cleansocks;, or qualify the names you are using.

socket s;

A socket is a communications terminus. All messages or connections travel between two sockets, usually located on different computers. A C++ object of class socket represents one of these termination points. You won't generally use this class directly, however, since it is abstract and does not belong to any particular networking technology or protocol family. You will want to create concrete sockets, such as the TCPsocket object described later. Concrete sockets are derived from class socket, and can be used with all the operations described here.

endpoint e;

Endpoints are also abstract, and represent the name of a socket. The system uses an endpoint description to direct the information to the correct socket. Like sockets, you will not create endpoints directly, but use the concrete versions for a specific type of network, such as IPendpoint described later. An IPendpoint is a host name combined with a port number.
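To make the idea of an endpoint concrete, here is a minimal sketch of what an IP endpoint boils down to at the native level: a host name resolved to an address, plus a port. It calls POSIX getaddrinfo directly rather than using the cleansocks IPendpoint class. The helper name resolve_ipv4 is hypothetical, and the sketch handles IPv4 only.

```cpp
#include <netdb.h>        // getaddrinfo, freeaddrinfo, gai_strerror
#include <netinet/in.h>   // sockaddr_in
#include <sys/socket.h>   // AF_INET, SOCK_STREAM
#include <cstring>        // std::memcpy
#include <stdexcept>
#include <string>

// Hypothetical helper (not cleansocks code): turns a host name or dotted address
// and a port number into the native IPv4 address structure an IP endpoint stands for.
sockaddr_in resolve_ipv4(const std::string &host, unsigned short port) {
    addrinfo hints{};
    hints.ai_family = AF_INET;        // IPv4 only, to keep the sketch short
    hints.ai_socktype = SOCK_STREAM;
    addrinfo *res = nullptr;
    const std::string service = std::to_string(port);
    int rc = getaddrinfo(host.c_str(), service.c_str(), &hints, &res);
    if (rc != 0) throw std::runtime_error(gai_strerror(rc));
    sockaddr_in addr{};
    std::memcpy(&addr, res->ai_addr, sizeof addr);  // use the first result
    freeaddrinfo(res);
    return addr;
}
```

For example, resolve_ipv4("127.0.0.1", 8080) yields an address whose sin_port, converted with ntohs, is 8080.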

connect(socket &s, endpoint &e);

This call attempts to connect the socket s to the socket identified by endpoint e. It is used for stream-oriented protocols like TCP, usually by client programs connecting to a server. If the connection is successful, the socket can be used in the send and recv operations below. If unsuccessful, the method will throw an exception.

bind(socket &s, endpoint &e);

This assigns the given endpoint (name) e to the socket s. This is most often used by servers to specify where clients will need to contact them. For instance, a TCP-based server will use bind to specify which port clients should try to contact. Bind will succeed or throw an exception.

listen(socket &s [, int b]);

This places socket s in listen mode, which allows other sockets to connect to it. Servers use this call to allow clients to connect.

Incoming connections must be accepted (next method), much like answering an incoming phone call. The optional parameter b is the backlog size, which is the number of un-accepted connections allowed before additional connections are refused. The default value is 5. Listen will succeed or throw an exception.

socket s2 = accept(const socket &s [, endpoint& e]);

Accept receives a connection made to s, which must be listening. An accept call suspends the caller until some remote socket attempts to connect to s, then the caller resumes to process the new connection. This is not unlike a read from the keyboard, which waits until there is data to read, after which the reader resumes to process it.

The return value s2 is a local socket connected to the remote one used in the connect call that connected to s. Use s2 with send and recv to communicate with the remote socket. The socket s remains in listening mode, accepting additional connections.

If e is provided, it is an output parameter. After the call returns, it will contain the endpoint (name) identifying the remote socket that connected.

Accept will return a connected socket or throw an exception.
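The server sequence (bind, listen, accept) and the client sequence (connect) can be sketched with the native POSIX calls that a library like cleansocks presumably wraps. This is not the cleansocks API: the helpers open_server, accept_one and open_client are hypothetical, errors become std::system_error exceptions to mimic the throw-on-failure behavior described above, and everything runs on the loopback address so that one process can play both roles.

```cpp
#include <arpa/inet.h>    // htonl, htons, ntohs
#include <netinet/in.h>   // sockaddr_in, INADDR_LOOPBACK
#include <sys/socket.h>
#include <unistd.h>       // close
#include <cerrno>
#include <system_error>

// Close fd (if open) and throw, preserving the errno of the call that failed.
[[noreturn]] static void fail(int fd, const char *what) {
    int err = errno;
    if (fd >= 0) close(fd);
    throw std::system_error(err, std::generic_category(), what);
}

static sockaddr_in loopback(unsigned short port) {
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  // 127.0.0.1
    addr.sin_port = htons(port);
    return addr;
}

// Server: socket, bind, listen. Port 0 lets the OS choose a free port, which
// getsockname reports back so a client knows where to connect.
int open_server(unsigned short &port, int backlog = 5) {
    int s = socket(AF_INET, SOCK_STREAM, 0);
    if (s < 0) fail(-1, "socket");
    sockaddr_in addr = loopback(0);
    if (bind(s, reinterpret_cast<sockaddr *>(&addr), sizeof addr) < 0) fail(s, "bind");
    if (listen(s, backlog) < 0) fail(s, "listen");
    socklen_t len = sizeof addr;
    if (getsockname(s, reinterpret_cast<sockaddr *>(&addr), &len) < 0) fail(s, "getsockname");
    port = ntohs(addr.sin_port);
    return s;
}

// Server: wait for one connection; peer receives the client's address,
// like the optional endpoint parameter of accept.
int accept_one(int server, sockaddr_in &peer) {
    socklen_t len = sizeof peer;
    int conn = accept(server, reinterpret_cast<sockaddr *>(&peer), &len);
    if (conn < 0) fail(-1, "accept");
    return conn;
}

// Client: socket, connect.
int open_client(unsigned short port) {
    int c = socket(AF_INET, SOCK_STREAM, 0);
    if (c < 0) fail(-1, "socket");
    sockaddr_in addr = loopback(port);
    if (connect(c, reinterpret_cast<sockaddr *>(&addr), sizeof addr) < 0) fail(c, "connect");
    return c;
}
```

A single process can call open_server, then open_client, then accept_one: the client's connect completes before the accept because the pending connection waits in the listen backlog, the queue whose size the backlog parameter sets.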

int i = send(socket &s, const void *data, int size [, clean_flags flags ] );

int i = send(socket &s, const string &data [, clean_flags flags ] );

int i = send(socket &s, const array< [ sq ] char, N> &data [, int size ] [, clean_flags flags ] );

This attempts to send data on the indicated socket s, which must be connected. In the first form, the data are the first size bytes starting from the location given by pointer data. In the second form, the data are the contents of the string data. The third form sends data from a std::array of characters (the sq is an optional signed or unsigned). The call sends the first size elements. The size parameter defaults to N; providing a size larger than N throws an exception.

The socket s must be connected, meaning it must have been passed to a successful connect or returned by a successful accept. The bytes are sent through that connection, and the number of bytes sent is returned. This will generally be size, but may be less. Whether a return value less than size is a problem depends on what you are doing; if it matters, you'll have to check for it yourself.

The send operation will throw an exception for various errors or local network failures. Successful return implies successful dispatch, but send does not detect delivery failures.

The optional parameter flags specifies one or more options which modify the behavior of send. See the discussion below.

int i = recv(socket &s, const void *buf, int size [, clean_flags flags ] );

int i = recv(socket &s, array< [ sq ] char, N> &buf [, int size ] [, clean_flags flags ] );

This receives up to size bytes from socket s. In the first form, the data are placed in the memory region starting at the pointer buf and extending for size bytes. In the second, the data are placed in the first size positions of the character array buf. As with send, the socket s must be connected, and the data come from that connection. In the second form, size is optional, defaulting to the array size N, and recv will throw an exception if you pass a size larger than N. In the first form, size is required, and recv has no way to check it against the actual size of the buffer.

The return value i is the number of bytes actually received, which is often less than size. Upon failure, recv will throw an exception. If there is no data available, recv will cause your program to wait until some arrives. Even then, return values less than size are common, and simply indicate that size bytes have not (yet) arrived. A return value of zero indicates that the remote socket has been closed, a normal shutdown.

The optional parameter flags specifies one or more options which modify the behavior of recv. See the discussion below.

int i = sendto(socket &s, const void *data, int size [, clean_flags flags ], const endpoint &e);

int i = sendto(socket &s, const string &data [, clean_flags flags ], const endpoint &e);

int i = sendto(socket &s, array< [ sq ] char, N> &data [, int size ] [, clean_flags flags ], const endpoint &e);

Sendto is very similar to send, but is used with connection-less protocols and an unconnected socket s. Absent a connection, we must say where the data are to go, so we need the destination parameter, e. Data are sent to a socket which is bound to the endpoint given by e.

As with send, sendto success assures a valid dispatch of a message, but not necessarily delivery. This matters more without a connection, since we do not know if there even is any socket bound to e. The establishment of a connection provides several guarantees which we lack in this case.

int i = recvfrom(socket &s, const void *buf, int size [, clean_flags flags ] [, endpoint &e]);

int i = recvfrom(socket &s, array< [ sq ] char, N> &buf [, int size ] [, clean_flags flags ] [, endpoint &e] );

The recvfrom call is also used with connection-less protocols, and is very similar to recv. It waits for some data to be available, then places up to size bytes into buf and returns the number actually stored, just as recv does.

The socket s will not be connected, but must have been bound with an endpoint, and it receives data sent to that name. That data could come from anywhere. If e is provided, it is an output parameter and recvfrom fills it in with the endpoint identifier of the socket from which the data were sent.
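For the connection-less case, here is a minimal sketch of sendto and recvfrom over UDP on the loopback address, using the native POSIX calls rather than the cleansocks ones. The helpers bound_udp and receive_from are hypothetical; the sockaddr_in that receive_from fills in plays the role of the optional endpoint e.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <cerrno>
#include <cstddef>
#include <system_error>

// Create a UDP socket bound to 127.0.0.1 on an OS-chosen port, and report the
// bound address (via getsockname) so another socket can send to it.
int bound_udp(sockaddr_in &addr) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) throw std::system_error(errno, std::generic_category(), "socket");
    addr = sockaddr_in{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;  // port 0: let the OS pick
    socklen_t len = sizeof addr;
    if (bind(fd, reinterpret_cast<sockaddr *>(&addr), sizeof addr) < 0 ||
        getsockname(fd, reinterpret_cast<sockaddr *>(&addr), &len) < 0) {
        int err = errno;
        close(fd);
        throw std::system_error(err, std::generic_category(), "bind/getsockname");
    }
    return fd;
}

// Receive one datagram; from is filled in with the sender's address.
ssize_t receive_from(int fd, void *buf, std::size_t size, sockaddr_in &from) {
    socklen_t len = sizeof from;
    ssize_t n = recvfrom(fd, buf, size, 0, reinterpret_cast<sockaddr *>(&from), &len);
    if (n < 0) throw std::system_error(errno, std::generic_category(), "recvfrom");
    return n;
}
```

Typical use: bind a receiver with bound_udp, sendto its address from another socket, and receive_from then reports how many bytes arrived and which address and port they came from.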

getsockname(const socket &s, endpoint& e);

Fills in e with the local binding (name) of socket s. If the name has been assigned with the bind call, this will return the endpoint assigned. If the socket has been connected with connect, this returns the local endpoint of the connection (not what you connected to).

close(socket &s);

This closes the socket, indicating it is no longer used. Close terminates transmission, listening, or whatever the socket may be doing, then releases the associated system resources. Sockets should always be closed when no longer needed.

shutdownsend(socket &s);

shutdownrecv(socket &s);

For connected socket s, shut down either the sending or receiving half of the channel. After shutdownsend, recv on the far end will return zero after consuming any data already in transit. After shutdownrecv, our next recv will return zero. Transmission in the other direction may continue. These are required in some situations, but are rarely necessary. More commonly, a program just calls close when it is done with a socket.

Even if you shut down both directions of a connected socket, you still need to close a socket to release resources.
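The half-close behavior can be observed with a connected pair of native sockets, where POSIX shutdown(fd, SHUT_WR) corresponds to shutdownsend. This minimal sketch uses a Linux AF_UNIX socketpair instead of a network connection; half_close_demo and its result struct are hypothetical names.

```cpp
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <cerrno>
#include <system_error>

struct HalfCloseResult {
    ssize_t before_eof;  // bytes read that were already in transit
    ssize_t at_eof;      // next read after the peer's shutdown
    ssize_t reverse;     // bytes read in the other direction
};

HalfCloseResult half_close_demo() {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        throw std::system_error(errno, std::generic_category(), "socketpair");
    char buf[16];
    HalfCloseResult r{};
    send(sv[0], "bye", 3, 0);
    shutdown(sv[0], SHUT_WR);                        // like shutdownsend on sv[0]
    r.before_eof = recv(sv[1], buf, sizeof buf, 0);  // data sent before the shutdown
    r.at_eof = recv(sv[1], buf, sizeof buf, 0);      // then end of stream
    send(sv[1], "ok", 2, 0);                         // the other direction still works
    r.reverse = recv(sv[0], buf, sizeof buf, 0);
    close(sv[0]);                                    // shutdown alone does not free the sockets
    close(sv[1]);
    return r;
}
```

Calling half_close_demo() should report 3 bytes read before the end of the stream, then 0, then 2 bytes read in the reverse direction.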

Assigning and copying socket objects is generally allowed, but those objects are themselves references to underlying system objects. Therefore, the copies aren't really independent; operations on any copy effectively operate on all.

On Sending And Receiving

The socket send and recv operations take data from, and put data into, an area of memory specified by a pointer and a size. The first form of the cleansocks version works the same way. It would most often be called by passing an array and its size:

char data[256]; recv(s, data, sizeof data);

Under basic C rules, the array name parameter data is passed as a pointer to data[0], and the expression sizeof data evaluates to the physical size of the array, which is 256 bytes in this case. Of course, the function takes a void *, which can accept a pointer to any type of data, so it could also be called with

int data[256]; recv(s, data, sizeof data);

Here, sizeof data is the physical size of the whole array in bytes: 1024 if an int is four bytes, though that depends on the architecture. That is what the recv call wants: the physical size of the data area starting from the pointer. If you want to receive something which is not an array, you can do that with the ampersand operator.

int data; recv(s, &data, sizeof data);

This matches the type of the data parameter by explicitly taking the address of the variable, since it's not an array. You can also receive structure types this way.

You can also place data in the middle of an array with code like this:

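int data[256]; recv(s, data + 25, 75 * sizeof data[0]);

This tells recv to place data in the portion of the array from data[25] through data[99]. Pointer arithmetic on data + 25 counts in elements, but the size is always in bytes, so the 75 elements must be multiplied by the element size.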
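Because the size argument is a byte count, covering data[25] through data[99] of an int array takes 75 * sizeof data[0] bytes, not 75. The sketch below uses a hypothetical stand-in, fill_bytes, which like recv sees only a start pointer and a byte count; memset takes the place of incoming data.

```cpp
#include <cstddef>
#include <cstring>

// Stand-in for recv's view of its arguments: a start address and a byte count,
// with no idea how large the underlying array really is.
std::size_t fill_bytes(void *buf, std::size_t size, unsigned char value) {
    std::memset(buf, value, size);
    return size;
}
```

After fill_bytes(data + 25, 75 * sizeof data[0], 0xFF) on a zeroed int data[256], elements 25 through 99 have all bits set, while data[24] and data[100] stay zero.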

Note that recv gets only the start of the storage area and the size you give it. This is fine if you want to pass a size smaller than the array to use only part of the space. But if you pass a size which is larger than the array, such as

char data[256]; recv(s, data, 2500);

the recv has no way to know that you lied to it, and will happily store bytes past the end of the array. This is just a special case of C's lack of bounds checking on arrays, called a “buffer overflow.” It is also an important class of exploitable bugs in network software.

As noted above, cleansocks supports using a C++ 2011 std::array as the reception buffer instead of a native array. It has the advantage that it cannot overflow, but is more limited, since it only accepts arrays of characters (signed or unsigned), and can only fill from the start of the array. This part of the interface may be expanded in the future.

The send is similar, except that it is more common to send a size smaller than the buffer. This is because, while you still specify a data area with a pointer and a size, its meaning actually differs in an important way: For recv, you are specifying a block of memory to receive the transmission of as yet unknown size. For send, you are specifying a block of memory containing data to send, and you had best know how much you had in mind. So you might use a pattern like this:

char data[1024];
int datasize = 0;
// Some computation that places, say, 427 bytes into data,
// and updates datasize to keep track, so it becomes 427.
send(s, data, datasize);

Here, some space is allocated but not fully used, so you simply send what needs to be sent.

Buffer overflow is also possible with send, but in the other direction: the program transmits whatever happens to lie in memory past the end of the buffer, data that was perhaps not meant to be shared with the other endpoint.

Cleansocks adds the ability to send a std::string, for which the size of the string is used, as well as a std::array, similar to that with recv. These forms are checked and cannot overflow. Cleansocks does not support transmitting std::vector, though that might be a useful extension someday. If you have one lying around, you can send it using the basic interface, like this:

std::vector<char> vc;
// . . .
send(s, vc.data(), vc.size());

though, if you lie to send about the size, it will believe you.

Network transport protocols can use a byte stream, like TCP, or a sequence of messages, like UDP, and well, pretty much anything that's not TCP. The sending and receiving primitives reflect this difference in their behaviors. For a stream protocol, send adds the specified bytes to the stream, and recv gets any bytes which have arrived, up to the buffer size. There is no reason recv should ever return more than the buffer size; if more bytes are available, we'll just get 'em next time. But for a message protocol, recv must receive exactly one message. If the next message won't fit in the given space, the extra bytes will (generally) be discarded, and recv or recvfrom will return the size of the actual message, which will be more than the number of bytes actually stored, which is limited to the buffer size.

When sending, the provided buffer size will be the message size, and send or sendto will transmit one message. Most message protocols have some limit on the size of message they can send. If you exceed that, you will get an exception. (A stream protocol does not have individual messages, so there is no limit on the size.)

One important consequence of all this is that, with a message-oriented protocol, each message transmitted in a single send will be received by a single receive of the same size (unless it's not received at all). Each transmission retains its integrity. With a stream-oriented protocol, sending just adds the bytes to the stream. A send of 20 bytes followed by another of 15 could be received in a single operation getting 35 bytes, or in three getting 8, 23 and 4 bytes, respectively, or any other combination that receives 35 bytes. Transmissions are merged and divided however the network and OS like.
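The difference between message and stream semantics is easy to observe with a Linux socketpair, using SOCK_DGRAM as a stand-in for a message protocol like UDP and SOCK_STREAM as a stand-in for TCP. In this minimal sketch, recv_sizes is a hypothetical helper that sends 20 and then 15 bytes and records the size returned by each recv.

```cpp
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <cerrno>
#include <cstring>
#include <system_error>
#include <vector>

// Send a 20-byte and a 15-byte transmission over a native socket pair of the
// given type, then drain the receiving end, recording each recv's size.
std::vector<ssize_t> recv_sizes(int type) {
    int sv[2];
    if (socketpair(AF_UNIX, type, 0, sv) < 0)
        throw std::system_error(errno, std::generic_category(), "socketpair");
    char out[35];
    std::memset(out, 'x', sizeof out);
    send(sv[0], out, 20, 0);       // first transmission: 20 bytes
    send(sv[0], out + 20, 15, 0);  // second transmission: 15 bytes
    std::vector<ssize_t> sizes;
    char in[64];
    ssize_t n;
    // MSG_DONTWAIT: stop as soon as nothing more is queued instead of blocking.
    while ((n = recv(sv[1], in, sizeof in, MSG_DONTWAIT)) > 0) sizes.push_back(n);
    close(sv[0]);
    close(sv[1]);
    return sizes;
}
```

With SOCK_DGRAM the sizes are 20 and 15, one per message. With SOCK_STREAM only the total of 35 is guaranteed; how the bytes are split among recv calls is up to the implementation.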

Flags

Send and receive operations have an optional flags parameter which modifies their behavior. Native sockets have many flags, which vary from one platform to another. Cleansocks fully supports just three. Any of these may be passed as the flags parameter, and several may be passed at once by combining them with the | operator. Most options are meaningful only in limited circumstances, and will be ignored if used otherwise.

CSMSG_PEEK

This is meaningful on recv, and means to return the data but not consume it. The next recv will get the same data, and perhaps more if more has been received. Note that some peek implementations are lazy, so that, given two successive peeks, the second will never return more data than the previous, even if more has arrived. Only a regular recv can consume the newer data.

CSMSG_WAITALL

Also meaningful for recv on a streaming protocol. It instructs recv not to return until the entire buffer has been filled. It will return early if the connection is closed or in case of error.

CSMSG_DONTROUTE

Don't send the packet through a router. Only allows delivery to directly-connected hosts. This is only meaningful for sending, and only useful in some odd circumstances. It is included mainly because it seems to be implemented on all of Linux, BSD and WinSock.

Native socket flags are integer constants named with the prefix MSG_ rather than CSMSG_. Cleansocks flags have their own type rather than integer, which avoids problems with function overloading. (My original intention was to retain the same names, but I was not able to do that in any sane way.) To pass multiple flags, combine them with the | operator, just like the native ones.
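The overloading problem that a distinct flag type avoids can be sketched in a few lines. The class demo_flags below is hypothetical, not the real clean_flags, and its bit values are arbitrary. Because its constructor is explicit, a plain int never silently becomes a flag, so an overload taking an int size and one taking flags can coexist without ambiguity, and explicit construction is how a raw value gets promoted.

```cpp
#include <string>

// Hypothetical flag type (not the real clean_flags); bit values are arbitrary.
class demo_flags {
public:
    // explicit: an int becomes flags only when asked, never by accident.
    constexpr explicit demo_flags(int bits = 0) : bits_(bits) {}
    constexpr int bits() const { return bits_; }
    // Combine flags with |, as with native flags.
    friend constexpr demo_flags operator|(demo_flags a, demo_flags b) {
        return demo_flags(a.bits_ | b.bits_);
    }
private:
    int bits_;
};

constexpr demo_flags DEMO_PEEK(0x1);
constexpr demo_flags DEMO_WAITALL(0x2);

// Two overloads differing only in the trailing argument, like the array forms
// of recv with an optional size and optional flags. If flags were plain ints,
// these two declarations would have the same signature and could not coexist.
std::string which_overload(int size) { return "size " + std::to_string(size); }
std::string which_overload(demo_flags f) { return "flags " + std::to_string(f.bits()); }
```

which_overload(10) selects the size version, while which_overload(DEMO_PEEK | DEMO_WAITALL) selects the flags version.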

If you wish to use a native socket flag, you can promote it to a Cleansocks flag with explicit construction, such as clean_flags(MSG_MORE). Cleansocks will simply pass the flag to the underlying native send or recv call. This will also limit your program to platforms which support that flag.

A common Unix flag is MSG_NOSIGNAL. Without it, a send on a connection whose other end has been closed causes the program to receive a SIGPIPE signal, which terminates it unless it handles or ignores the signal. WinSock does not have this mechanism. Cleansocks always sends MSG_NOSIGNAL on Unix, so no signal is delivered. But if the underlying send call fails because the connection is closed, Cleansocks will throw a C++ exception, which is a bit like a signal after all, but consistent with other errors and across platforms.

If you promote and pass MSG_NOSIGNAL, it will not change the behavior of send, which gets MSG_NOSIGNAL in any case, though adding it won't do any harm. There is no way to ask Cleansocks not to send MSG_NOSIGNAL.