Here are some simple examples of working sockets programs that will repay careful study. Of course, you can always cut and paste the code into your own programs.
This program listens on a nominated port and logs all incoming messages to an indicated file. This version does not multi-thread or multi-task, so it can handle only one logging session at a time. The program can be tested by running a telnet session to the selected port number; note, however, that many telnet clients do not locally echo the text entered. If you are using such a client, check for an option to enable local echoing. PC-based clients do not usually echo locally; Unix-based clients do.
The program source file is called logger.c, and it is launched with two command line arguments specifying the port number and the log file name. Both of these could be vetted more thoroughly. The program is meant to run as a daemon, although it does not detach from its terminal itself, and it is shut down cleanly on receipt of SIGHUP.
First, the header files are included.
#include <netdb.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
The first five header files are associated with Internet sockets programming (<arpa/inet.h> declares inet_ntoa). <unistd.h> declares read and close, <signal.h> is required for signal handling, and the final four headers are standard ANSI-C headers.
The following global declarations and prototypes appear before main().
union sock
{
struct sockaddr s;
struct sockaddr_in i;
};
void closedown(int);
FILE *logfp;
The function main() starts with the following declarations and code.
int main(int argc,char *argv[])
{
int port;
union sock sock,work,peer;
int wsd,sd;
socklen_t addlen,peerlen;
time_t now;
char buff[BUFSIZ];
int i,rv,nrv;
if(argc!=3) exit(1);
port = atoi(argv[1]);
logfp = fopen(argv[2],"w");
if(logfp == NULL) exit(1);
The first four lines of declarations are for various network-related variables, and now holds the current time for logging. The other variables should be fairly obvious. The four lines of code process the command line arguments.
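As noted earlier, both arguments could be vetted more thoroughly; atoi, for instance, silently returns 0 for a non-numeric port. The following is a minimal sketch of a stricter check for the port, written as a hypothetical helper parse_port (not part of the original program) built on strtol.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: convert s to a TCP port number.
   Returns 0 and stores the value in *port, or -1 if s is not
   a complete decimal number in the range 1..65535. */
int parse_port(const char *s, int *port)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(s, &end, 10);
    /* Reject overflow, an empty string, trailing junk and out-of-range values. */
    if (errno != 0 || end == s || *end != '\0' || v < 1 || v > 65535)
        return -1;
    *port = (int)v;
    return 0;
}
```

In main, the atoi line could then be replaced by a test such as if(parse_port(argv[1],&port) == -1) exit(1);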
The next portion of code sets up the listening socket.
if(signal(SIGHUP,closedown) == SIG_ERR) exit(1);
sd = socket(AF_INET,SOCK_STREAM,0);
if(sd == -1)
{
perror("No socket");
exit(1);
}
memset(&sock,0,sizeof sock);
sock.i.sin_family = AF_INET;
sock.i.sin_port = htons(port);
sock.i.sin_addr.s_addr = htonl(INADDR_ANY);
rv = bind(sd,&(sock.s),sizeof(sock.i));
if(rv == -1)
{
perror("Bad Bind");
exit(1);
}
rv = listen(sd,2);
if(rv == -1)
{
perror("Bad listen");
exit(1);
}
The very first line of this block sets up signal handling for a clean closedown. The remaining code creates the socket, fills in the address structure, binds the socket to that address and puts the socket into the listening state with a backlog of 2 pending connection requests. The main processing loop now starts.
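Before the loop, one detail of the set-up code deserves attention: the sin_port and sin_addr fields must hold values in network (big-endian) byte order, which is what htons and htonl provide. On a little-endian machine such as an x86 PC, storing 8080 directly would make the program listen on port 36895 (0x901F) instead; on a big-endian machine such as a SPARC the conversion does nothing, which is how the omission can go unnoticed. The following is a minimal sketch of the address set-up as a hypothetical helper, fill_listen_addr, which is not part of the original program.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: prepare an IPv4 address that accepts
   connections on any local interface at the given port. */
void fill_listen_addr(struct sockaddr_in *sa, unsigned short port)
{
    memset(sa, 0, sizeof *sa);               /* also clears sin_zero */
    sa->sin_family = AF_INET;
    sa->sin_port = htons(port);              /* host -> network byte order */
    sa->sin_addr.s_addr = htonl(INADDR_ANY); /* 0.0.0.0, any interface */
}
```

In logger.c this would be called as fill_listen_addr(&sock.i,port) just before bind. The main processing loop of logger.c follows.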
do
{
addlen = sizeof(work);
wsd = accept(sd,&(work.s),&addlen);
if(wsd == -1) continue;
i = 0;
peerlen = sizeof(peer);
getpeername(wsd,&(peer.s),&peerlen);
time(&now);
fprintf(logfp,"Connection from %s at %s",inet_ntoa(peer.i.sin_addr),ctime(&now));
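The accept call blocks until there is an incoming connection request and then returns a new socket (descriptor in wsd) that can be used for communication with the remote client. The Internet-related information about the client is placed in work. Note the need for an otherwise unused variable, addlen, to hold the length of the address: accept uses it both as an input (the size of work) and as an output (the size of the address actually stored), so it must be set before every call. The function getpeername is used to get the address of the remote host for logging purposes, although the same address is already available in work. inet_ntoa converts the remote host's Internet address to a dotted-decimal string.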
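Because accept has already stored the client's address in work, the address can be logged without calling getpeername. The following minimal sketch formats such an address, including the client's port, for the log. format_peer is a hypothetical helper, and it uses inet_ntop, which, unlike inet_ntoa, writes into a caller-supplied buffer rather than a static one.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: write "a.b.c.d:port" for an IPv4 peer into buf.
   Returns buf, or NULL if formatting fails or buf is too small. */
char *format_peer(const struct sockaddr_in *peer, char *buf, size_t len)
{
    char ip[INET_ADDRSTRLEN];
    int n;

    if (inet_ntop(AF_INET, &peer->sin_addr, ip, sizeof ip) == NULL)
        return NULL;
    /* The port is stored in network byte order, so convert it back. */
    n = snprintf(buf, len, "%s:%u", ip, (unsigned)ntohs(peer->sin_port));
    if (n < 0 || (size_t)n >= len)
        return NULL;
    return buf;
}
```

It could be called as format_peer(&work.i,name,sizeof name), with name declared as, say, char name[32], and the result printed in place of the inet_ntoa call.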
The next portion of code is the start of the basic message processing loop for the current connection.
do
{
nrv = read(wsd,buff+i,BUFSIZ-i);
if(nrv > 0)
{
read, of course, reads from the socket and stores the incoming data in a buffer area. At this stage the program design is much complicated by the fact that the data could arrive in dribs and drabs or all at once. With a Unix telnet client each line of input typically results in a single message which can be read in one go; with a PC telnet client it is not uncommon for each character to be sent as a separate message. To cope with this, the program keeps track of what is already in the input buffer and, if the end of the message has not been seen, gives read the address of the next free location in the buffer. For the purposes of this exercise it is assumed that messages are terminated by CRLF (i.e. a carriage return character followed by a line feed character). This is the normal behaviour of Internet telnet clients, and many other Internet protocols terminate ASCII-encoded messages in the same fashion.
nrv is, of course, the number of bytes actually read, with zero indicating a closed connection and -1 some sort of error.
The next portion of code checks whether there is a complete message in the buffer and, if so, writes it to the logging file.
char *crlfp;
int fraglen;
i += nrv;
buff[i] = '\0';
if(crlfp=strstr(buff,"\r\n"))
{
*crlfp = '\0';
fprintf(logfp,"%s\n",buff);
fraglen = (buff+i)-(crlfp+2);
if(fraglen)
{
memmove(buff,crlfp+2,fraglen+1);
i = fraglen;
}
else i = 0;
}
Design is further complicated by the possibility that the data actually read into the buffer could include both the message terminating CRLF and part of the next message. [Note there is a bug in the program that could result in messages being lost if more than one message is received by a single call of read. This should be easy to fix.]
The program operates by first incrementing i to the position immediately after the last character received and then writing a zero byte to that position, so that the buffer contents "so far" form a proper string. [Note there is a bug in the program that could cause a crash if read filled the buffer completely, so that i reached BUFSIZ. This should be easy to fix.] The buffer string is then checked for the presence of CRLF using the library routine strstr. If it is found, a zero is written over the CR, so the buffer string is now the input message, which is logged. It only remains to check whether there is a fragment (of length fraglen) of the next message, move it to the start of the buffer if there is such a fragment, and reset the value of i.
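The two bugs noted above can be fixed together by extracting every complete message in a loop and by always leaving room for the terminating zero. The following minimal sketch, a hypothetical helper extract_lines that is not part of the original program, logs each CRLF-terminated message it finds and moves any leftover fragment to the start of the buffer. Like the original, it assumes the data contains no zero bytes.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: buf holds len bytes of received data, and
   buf must be at least len+1 bytes long so a terminating '\0' fits.
   Every complete CRLF-terminated message is written to log; any
   trailing partial message is moved to the start of buf.
   Returns the length of that fragment (the new value for i). */
size_t extract_lines(char *buf, size_t len, FILE *log)
{
    size_t start = 0;
    char *crlf;

    buf[len] = '\0';                        /* make the data a string */
    while ((crlf = strstr(buf + start, "\r\n")) != NULL) {
        *crlf = '\0';                       /* overwrite the CR */
        fprintf(log, "%s\n", buf + start);  /* log one whole message */
        start = (size_t)(crlf - buf) + 2;   /* skip past the CRLF */
    }
    memmove(buf, buf + start, len - start); /* keep the fragment */
    return len - start;
}
```

In the logger's inner loop, read would then be asked for at most BUFSIZ-1-i bytes, so that the terminator always fits, and i would be updated with i = extract_lines(buff,i+nrv,logfp). If i ever reaches BUFSIZ-1, the buffer holds a single over-long message with no CRLF, which must be logged or discarded before reading again.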
If read has returned zero (or -1), the connection has been closed (or has failed). This is duly logged.
}
else
{
close(wsd);
time(&now);
fprintf(logfp,"Connection closed at %s",ctime(&now));
break;
}
} while(1);
} while(1);
}
After logging, the break statement takes the program out of the inner loop, and the outer loop, which waits to accept an incoming connection, continues.
And finally, here is the code for the signal-handling function closedown.
void closedown(int i)
{
time_t now;
time(&now);
fprintf(logfp,"Logger closed down at %s",ctime(&now));
fclose(logfp);
exit(0);
}
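closedown calls fprintf, fclose and exit from inside a signal handler. None of these is async-signal-safe, so if SIGHUP arrives while the main program is itself in the middle of an fprintf, the results are unpredictable. A common, safer pattern is for the handler only to set a flag that the main loop checks. The following is a minimal sketch assuming the POSIX sigaction interface; the names shutdown_requested, on_sighup and install_shutdown_handler are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set asynchronously by the handler, polled by the main loop. */
static volatile sig_atomic_t shutdown_requested = 0;

static void on_sighup(int signo)
{
    (void)signo;
    shutdown_requested = 1;   /* the only work done in the handler */
}

/* Install on_sighup for SIGHUP without SA_RESTART, so that a blocked
   accept or read fails with EINTR and the loop can test the flag.
   Returns 0 on success, -1 on failure. */
int install_shutdown_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sighup;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;          /* deliberately no SA_RESTART */
    return sigaction(SIGHUP, &sa, NULL);
}
```

The outer loop would then become while(!shutdown_requested) { ... }, with accept returning -1 (errno set to EINTR) when SIGHUP interrupts it, and the closing log message and fclose moved after the loop. A small race remains: if the signal arrives just after the flag is tested but before accept is entered, shutdown is delayed until accept next returns.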
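The program can be compiled under Solaris as follows. On Linux and the BSDs the socket functions are part of the standard C library, so the -lsocket and -lnsl options are not needed there.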
cc -o logger logger.c -lsocket -lnsl
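This program will copy a WWW page into a local file. There are two command line arguments: the URL of the page and the name of the local file. There are a number of limitations on the operation of this program; for example, it speaks only plain HTTP on port 80, the URL must be given without the http:// prefix, and the HTTP response headers are saved in the local file along with the page itself.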
The program source is called getwww.c and it can be compiled in exactly the same way as the previous example.
First the included header files and global declarations.
#include <netdb.h>
#include <netinet/in.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <stdio.h>
#include <signal.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
union sock
{
struct sockaddr s;
struct sockaddr_in i;
};
For details see the previous example.
The function main includes the following data declarations.
int main(int argc,char *argv[])
{
union sock sock;
struct hostent host,*hp;
int sd;
char *hostname,*request,*cp;
FILE *local;
char buff[BUFSIZ];
int i,l,nrv;
The following code is used to process the command line arguments.
if(argc!=3) exit(1);
if(cp=strchr(argv[1],'/'))
{
*cp = '\0';
l = strlen(argv[1]);
hostname = malloc(l+1);
strcpy(hostname,argv[1]);
*cp = '/';
l = strlen(cp);
request = malloc(l+1);
strcpy(request,cp);
}
else
{
hostname = argv[1];
request = "/";
}
if((local = fopen(argv[2],"w")) == NULL) exit(1);
The first argument is the URL of the page to be copied. This will always start with a host name, optionally followed by the name of the page required. If the URL consists of just a host name, the program actually requests "/". If a page name is present, the program stores both the host name and the requested page name in dynamically allocated memory. [Note there is a bug here: the dynamically allocated memory areas are not subsequently freed, leading to a memory leak. This should be easy to fix.]
The second argument is the name of the file to hold the local copy. This is duly opened.
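One way to avoid the memory leak altogether is not to allocate memory at all. The following minimal sketch, a hypothetical helper split_url that is not part of the original program, copies the host and page parts of the first argument into buffers supplied by the caller.

```c
#include <string.h>

/* Hypothetical helper: split "host/path" into host and path using
   caller-supplied buffers, so nothing is allocated or freed.
   A URL with no '/' gets the path "/".
   Returns 0, or -1 if the host is empty or a part does not fit. */
int split_url(const char *url, char *host, size_t hostlen,
              char *path, size_t pathlen)
{
    const char *slash = strchr(url, '/');
    size_t hlen = slash ? (size_t)(slash - url) : strlen(url);
    const char *p = slash ? slash : "/";

    if (hlen == 0 || hlen >= hostlen || strlen(p) >= pathlen)
        return -1;
    memcpy(host, url, hlen);
    host[hlen] = '\0';
    strcpy(path, p);          /* fits: checked against pathlen above */
    return 0;
}
```

The simplest fix to the original code is instead to free hostname and request before the program ends, but only when a page name was present, since in the other branch they point at argv[1] and a string literal, neither of which may be freed.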
The next portion of code deals with the DNS lookup and establishes an initial connection with the server. [Note there is a bug in the program that would cause failure if the host were identified by an IP address rather than a DNS name. This should be easy to fix.]
memset(&sock,0,sizeof sock);
hp = gethostbyname(hostname);
if(hp == NULL)
{
fprintf(stderr,"DNS error %d\n",h_errno);
exit(1);
}
memcpy(&(sock.i.sin_addr.s_addr),*(hp->h_addr_list),sizeof(struct in_addr));
sock.i.sin_family = AF_INET;
sock.i.sin_port = htons(80);
sd = socket(AF_INET,SOCK_STREAM,0);
if(sd == -1) exit(1);
if(connect(sd,&(sock.s),sizeof(struct sockaddr_in)) == -1)
{
perror("Connection failed");
exit(1);
}
The library routine gethostbyname does the actual DNS lookup; it returns NULL if there is any sort of error and sets the global variable h_errno to indicate the cause. memcpy is used to copy the address it finds into the socket information structure. The standard HTTP port (80) is set up, a socket is obtained and a connection is opened.
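One way to remove the IP-address limitation noted above is to use getaddrinfo, the POSIX successor to gethostbyname, which accepts either a dotted-decimal address such as 192.0.2.7 or a host name, and normally converts the former directly without consulting DNS. The following is a minimal sketch; resolve_ipv4 is a hypothetical helper, restricted to IPv4 to match the rest of the program.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>

/* Hypothetical helper: look up host (name or dotted-decimal address)
   and store its first IPv4 address in *out. Returns 0 or -1. */
int resolve_ipv4(const char *host, struct in_addr *out)
{
    struct addrinfo hints, *res;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;       /* IPv4 only, like getwww.c */
    hints.ai_socktype = SOCK_STREAM; /* one result per address */
    if (getaddrinfo(host, NULL, &hints, &res) != 0)
        return -1;
    /* Use the first address returned, then release the list. */
    *out = ((struct sockaddr_in *)(void *)res->ai_addr)->sin_addr;
    freeaddrinfo(res);
    return 0;
}
```

In getwww.c the gethostbyname and memcpy lines could then be replaced by a test such as if(resolve_ipv4(hostname,&sock.i.sin_addr) == -1), followed by the existing error report.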
The next part of the code constructs the actual HTTP request in the memory area buff, silently hoping that it is big enough. strlen is used to determine the size of the actual request, which is then transmitted using write. [Note there is a bug here: the program should have checked the return value from write. This should be easy to fix.]
sprintf(buff,"GET %s HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n",request,hostname);
l = strlen(buff);
write(sd,buff,l);
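As it stands, write may transfer fewer bytes than requested, or fail outright, and the program never finds out. The following is a minimal sketch of the usual fix: a hypothetical helper, write_all, that loops until everything has been sent and retries if a signal interrupts the call.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write all len bytes of data to fd.
   Returns 0 once everything has been written, -1 on error. */
int write_all(int fd, const void *data, size_t len)
{
    const char *p = data;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n == -1) {
            if (errno == EINTR)
                continue;         /* interrupted by a signal: retry */
            return -1;            /* genuine error, errno says why */
        }
        p += n;                   /* partial write: send the rest */
        len -= (size_t)n;
    }
    return 0;
}
```

The write call in getwww.c would then become if(write_all(sd,buff,l) == -1) { perror("write"); exit(1); }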
Once the program has written the request, it can now read the reply and copy the received bytes into the local file using the following code.
do
{
nrv = read(sd,buff,BUFSIZ);
if(nrv > 0)
{
for(i=0;i<nrv;i++) putc(buff[i],local);
}
else break;
} while(1);
close(sd);
fclose(local);
}
And that is really all there is to a WWW browser, apart from interpreting the HTML and rendering it. Of course, this simple program doesn't detect network time-outs or other error conditions, and there are a significant number of HTTP features that should be handled, such as status codes, redirections and the chunked transfer encoding that an HTTP/1.1 server may use for the body of its reply.
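If the server stops sending without closing the connection, the read loop blocks indefinitely. One common remedy, shown here as a minimal sketch with the hypothetical helper names set_recv_timeout and read_or_timeout, is the SO_RCVTIMEO socket option, which makes a blocked read on the socket fail with EAGAIN or EWOULDBLOCK once the timeout expires. Time-outs during connect would need separate treatment.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: make read on sd give up after `seconds`.
   Returns 0 on success, -1 on failure (as setsockopt does). */
int set_recv_timeout(int sd, long seconds)
{
    struct timeval tv;

    tv.tv_sec = seconds;
    tv.tv_usec = 0;
    return setsockopt(sd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);
}

/* Hypothetical helper: like read, but returns -2 if the receive
   timeout expired before any data arrived. */
ssize_t read_or_timeout(int sd, void *buf, size_t len)
{
    ssize_t n = read(sd, buf, len);

    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return -2;
    return n;
}
```

In getwww.c, set_recv_timeout(sd,30) could be called after connect, and the read loop could use read_or_timeout, reporting a time-out when it returns -2. The 30-second value is an arbitrary example.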