
The Brief Life of a Packet

"It is better to give than to receive." - traditional saying.

"Be conservative in what you send, and liberal in what you receive." - Jon Postel.

3.1 Roadmap

In this chapter we take a lightweight, structured walk-through of the process of constructing and sending a packet in Linux, and then receiving and decoding the packet at the far side. We do this by way of three different examples, each seen from three different viewpoints: firstly, we look at the transfer of some data over a TCP connection (e.g. part of an HTTP download); then we look at a UDP packet exchange (e.g. part of a DNS lookup); finally, we look at a one-way RTP flow. The viewpoints are taken from the source code, from the execution, and from the "wire". In the last case, we also consider a router as part of the wire, and go into a bit more detail about its operation, looking at an extended case which entails capturing some audio, putting it into an RTP packet, calling the socket API to send the packet, going through the socket kernel code (skbuffs etc.), adding the UDP header, firewall filter checking, and multicast. The packet then traverses a link (SLIP or PPP, say) to a router which is also running Linux, with diffserv, so we get a look at a packet being scheduled in the packet queueing code by a WRR or other scheduler, then forwarded, say, over an Ethernet. Finally it is received by the driver and the IP layer, demultiplexed, and passed up through UDP to the process, which is scheduled, reads the socket and outputs the audio.

This combines two ways of explaining Internet communications ideas found in many RFCs and related Internet documents [1]: one in which the view of a single protocol is expanded through an operational example, and a top-down view of the entire protocol stack in one go.

Thus we also provide a rapid introduction to the rest of the book at a fast-track technical level.

[1] See, for example, TCP/IP Illustrated Volume 1.


3.1.1 User, Application and Protocol Viewpoints

There are at least three very different views of communication that one can take:

User: From a user perspective, in modern systems, communication is made as transparent as possible. When clicking on an anchor in a web page, there is no indication whether the download is local or very remote. Many FTP implementations include GUIs that make copying files the same whether remote or local. Usually the only visible differences are in naming (specifying a remote system name as part of a URL or FTP site name, although this can often be hidden in a global extension to a normal object or file naming system) and in security: access to a remote system usually (though not within an organisation) entails presenting more credentials than accessing local resources.

Application: An application is programmed to talk to an Application Programming Interface (API) to a lower level system. Normally, such an API hides detail and, especially in the area of data communication, this detail may be of a very complex and rich structure. If the API is to a lower level protocol, then it will hide the concurrency and state machine present in the lower level and attempt to present a more sequential, simpler style of programming for everyday implementors to understand and use.

Protocol: The protocol perspective is the most complex. Here we need to understand the way that information is exchanged, and to consider the fact that each end point in a communication session is autonomous, and has its own quirks, failures and successes. The channels between end points are imperfect, introducing errors, independent failures and performance degradation.

Each of these perspectives can be illustrated in different ways. In the user perspective, one is interested in the resource naming and the visible performance parameters. In the application perspective, one is interested in the APIs and interface programming. From the protocol perspective, one really wants to understand the range of sequences of possible events and the way that state is changed by events.

From a user perspective, we can show information models and the organisation of how resources are mapped into cyberspace. From an API perspective, we can document the interfaces: the function prototypes, the pre-requisites and results of functions, and how they relate in families to each other are usually part of good systems documentation in any case. However, as we move through the layers of abstraction to lower levels, these APIs often become more complex and the interactions more subtle. From a protocol perspective, we need to understand the behaviour of senders and receivers and the structure of the information exchanged. This latter view is often best illustrated through operational examples of a protocol, showing the actual data exchanged on the wire for correct and common error cases. In this sense, understanding a protocol stack implementation is often best done by effectively debugging it.


debugging tools (e.g. GDB) introduce changes to the function call mechanisms and therefore actually alter timing which in low level code (especially interrupt handling, but also anything that involves any fancy concurrency and scheduling) will alter the behaviour of the system under examination, often critically to the point where the problem being chased is no longer evident!

However, one can conduct the same type of activity "by hand". Thus in this chapter we start doing what the rest of this book does in detail, which is to take a structured walk through the Linux kernel communications stack. In doing so we occasionally take a look at a trace of packets on the wire, and occasionally at a trace of function calls, but largely, especially given space and time considerations, we concentrate on the code itself. This is because there is really one main example piece of code, while there is an infinite variety of execution traces of that code.

Many tracing, debugging and profiling tools do abound, and I would encourage you to use them to help find problems (especially non-intrusive tracing for performance problems). For now, let's look at some of the code!

Sending

In the diagram below, we try to illustrate the overall structure of the code, showing the way control flows from the application down to the driver, and thence out onto the transmission medium.

[Figure: time sequence of a send, with control passing from Application down through Transport, Network and Driver to the Wire.]


in detail at Inter-process communication APIs. The parameters are typically data and addressing information, and these are passed through to the transport protocol in the correct family, for example UDP, TCP, etc.

The kernel then triggers the appropriate state machine for the protocol and passes information through to the IP level, which does the necessary routing work and adds the appropriate IP header information, finally queueing the data for driver output. The device is then usually woken and actual transmission occurs. In fact, several of these stages may be deferred to allow more efficient use of CPU, memory and network resources, so that a network scheduler is then part of the picture. There are several complex levels of scheduling, which are not always explicit. Sometimes it is simply a matter of using a flow control and on-off scheduler mechanism: for instance, when looking at device output for busy links (or device input for full queues) we may rely simply on semaphores or other mechanisms to handshake between the layers and wake up the device driver, which does the link and physical layer work to send or receive a packet.
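The way each layer adds its header on the way down can be sketched in a few lines of user-space C. This is only an illustration under stated assumptions: `struct pkt`, `pkt_init()` and `pkt_push()` are hypothetical names invented here, loosely modelled on the idea of a buffer with headroom, and are not the kernel's skbuff structure or API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical miniature packet buffer with headroom (not the kernel sk_buff). */
#define PKT_ROOM 128

struct pkt {
    uint8_t  buf[PKT_ROOM];
    uint8_t *data;   /* start of the bytes assembled so far */
    size_t   len;    /* number of valid bytes from data onwards */
};

/* Place the payload at the tail of the buffer, leaving headroom in front. */
static void pkt_init(struct pkt *p, const void *payload, size_t n)
{
    assert(n <= PKT_ROOM);
    p->data = p->buf + PKT_ROOM - n;
    p->len = n;
    memcpy(p->data, payload, n);
}

/* Reserve hdr_len bytes in front of the current data; return where they start. */
static uint8_t *pkt_push(struct pkt *p, size_t hdr_len)
{
    assert((size_t)(p->data - p->buf) >= hdr_len);
    p->data -= hdr_len;
    p->len += hdr_len;
    return p->data;
}
```

Pushing an 8 byte UDP header and then a 20 byte IP header in front of a 5 byte payload leaves a 33 byte packet whose first byte is the start of the IP header, without the payload ever being copied again.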

Receiving

In the diagram here we try to show the converse process: from unsolicited arrival of a frame from the wire, up to delivery to an application that had, in fact, been earlier primed with the idea that it might want to receive some data sometime soon!

[Figure: time sequence of a receive, passing from the Wire up through Driver, Network and Transport to the Application. The application is blocked/descheduled during this period, until data is ready.]


The general system scheduler runs the network bottom half (and can dispatch tasklets or software interrupts) on completion of any kernel work done on behalf of a user process that made a system call. The bottom halves basically complete work that the hardware interrupts started: for example, they dequeue any packets in the interface queue for IP input, and send any packets waiting to be sent.

IP input checks packets, carries out any re-assembly work needed, and gets a route if the packet is to be forwarded rather than being destined for a local application process. If it is for local delivery, IP then demultiplexes the packets according to the IP protocol number to the appropriate transport handler receive function (UDP, TCP etc.).
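The demultiplexing step can be pictured as a table of receive functions indexed by the IP protocol number. The following is a minimal sketch, assuming a flat 256-entry array; the names `proto_register()`, `ip_demux()` and the two demo handlers are hypothetical and are not the kernel's own structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PROTOS 256   /* the IP protocol field is 8 bits wide */

typedef int (*proto_handler)(const uint8_t *segment, size_t len);

static proto_handler handlers[MAX_PROTOS];   /* hypothetical handler table */

/* Record the receive function for one IP protocol number. */
static void proto_register(uint8_t protocol, proto_handler h)
{
    assert(h != NULL);
    handlers[protocol] = h;
}

/* Hand a segment to the registered transport, or return -1 if there is none. */
static int ip_demux(uint8_t protocol, const uint8_t *segment, size_t len)
{
    proto_handler h = handlers[protocol];
    if (h == NULL)
        return -1;
    return h(segment, len);
}

/* Demo handlers that just report which transport was chosen. */
static int demo_tcp_rcv(const uint8_t *segment, size_t len)
{
    (void)segment; (void)len;
    return 6;    /* IP protocol number of TCP */
}

static int demo_udp_rcv(const uint8_t *segment, size_t len)
{
    (void)segment; (void)len;
    return 17;   /* IP protocol number of UDP */
}
```

After registering the two demo handlers under 6 and 17, `ip_demux()` reaches the matching one, and a protocol with no handler (ICMP, number 1, in this sketch) yields -1.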

TCP (and UDP) input then do further packet header checking, update state machinery, and then demultiplex the packet further to the correct socket (i.e. to the queue of data pending for any user process that has an outstanding read on that socket, which is "connected" to the appropriate port numbers).

Once the transport protocol demultiplexing routine has put data on the appropriate socket receive queue, it calls wakeup on any process waiting for data, thus making any such process ready for general running by the system scheduler.

Sometimes a process had earlier called recv, recvfrom, recvmsg, read or readv and was asleep. In that case data is copied from the receive queue to the user space buffer; the process is now woken, and continues with data in its buffers.

Recent Linux work on avoiding the receive packet copies (copying the buffer only on write) has improved this performance somewhat.

Forwarding

The final packet handling pattern we need to worry about is that of forwarding. Forwarding is (almost) entirely kernel bound. With the exception of managing the tables used to make forwarding decisions (see chapter seven) and performance (see chapter eight), the forwarding task for IP entails back-to-back reception and transmission.

3.1.2 Source Code, Execution and Wire Viewpoints

In the rest of this chapter, then, we look at the source code and trace through some simple vertical cuts across the stack for some examples, with a very slimmed down view of what occurs on the wire during these executions.

3.1.3 State, Memory etc

Throughout the code, protocols need to keep track of where they are. Rather than just using a set of randomly allocated, messy global memory, most protocols gather together all the appropriate information on a given flow (at their level, e.g. an IP route, a TCP connection, a UDP session) in one data structure. In BSD Unix this is generally referred to as the Protocol Control Block. In Linux, the general all-purpose data structure of skbuffs is also used to hold this data.


passing them upwards (e.g. link to the user socket structure to queue data and so on) as well as very complex state machinery for parts of the TCP machine.

3.2 TCP Example

First we look at the example of the Transmission Control Protocol and follow through the code applied to a single data packet as it is transmitted, then subsequently as one is received.

3.2.1 Socket Level

In this simple example, we assume that an application process (netscape, apache, etc.) has called write() on a socket, which maps into a system call (see chapter 2), which subsequently calls tcp_sendmsg() and deschedules the process (see chapter 2).

3.2.2 TCP Output

Here we look at the socket glue, then the TCP work. In the source distribution this (as is most of the rest of the code discussed in this chapter) resides in the src/net/ipv4/ directory of the (version 2.4) kernel distribution tree.

An extract from the code referenced below is displayed in the figure that follows.

inet_sendmsg()
    sk->prot->sendmsg()
...
tcp_ipv4.c:
    struct proto tcp_prot {}
tcp.c:
    tcp_sendmsg()
tcp_output.c:
    tcp_send_skb()
    tcp_transmit_skb()
        err = tp->af_specific->queue_xmit(skb);
        which maps to ip_queue_xmit()

Then in tcp_transmit_skb(), the state/header changes are managed. This is discussed in detail in chapter six.

3.2.3 IP Output Work

TCP calls IP functions to carry out output. Some of this work is structured as follows, and a relevant extract from the code is illustrated in a figure below.


    th = (struct tcphdr *) skb_push(skb, tcp_header_size);
    skb->h.th = th;
    skb_set_owner_w(skb, sk);

    /* Build TCP header and checksum it. */
    th->source = sk->sport;
    th->dest = sk->dport;
    th->seq = htonl(tcb->seq);
    th->ack_seq = htonl(tp->rcv_nxt);
    *(((__u16 *)th) + 6) = htons(((tcp_header_size >> 2) << 12) | tcb->flags);
    if (tcb->flags & TCPCB_FLAG_SYN) {
        /* RFC1323: The window in SYN & SYN/ACK segments
         * is never scaled.
         */
        th->window = htons(tp->rcv_wnd);
    } else {
        th->window = htons(tcp_select_window(sk));
    }
    th->check = 0;
    th->urg_ptr = 0;

Figure 3.3: Filling in Some Key TCP fields
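The one dense line in this extract packs the header length and the flags into a single 16-bit word. As a hedged, stand-alone illustration (the function name `tcp_offset_flags()` and the `TCP_FLAG_*` macros are made up for this sketch and are not kernel identifiers), the same arithmetic looks like this in user space:

```c
#include <assert.h>
#include <stdint.h>
#include <arpa/inet.h>   /* htons, ntohs */

/* TCP flag bits as they appear in the low byte of the word. */
#define TCP_FLAG_FIN 0x01
#define TCP_FLAG_SYN 0x02
#define TCP_FLAG_ACK 0x10

/*
 * Build the 16-bit word holding the data offset and the flags.
 * The header length in bytes is divided by four (>> 2) to give the
 * offset in 32-bit words, which occupies the top four bits (<< 12).
 * The result is returned in network byte order.
 */
static uint16_t tcp_offset_flags(unsigned header_bytes, uint8_t flags)
{
    assert(header_bytes % 4 == 0 && header_bytes >= 20 && header_bytes <= 60);
    return htons((uint16_t)(((header_bytes >> 2) << 12) | flags));
}
```

For a 20 byte header carrying a SYN, the word is 0x5002 once converted back to host order: offset 5 in the top nibble, and the SYN bit in the low byte.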

NF_HOOK is part of the network filter support in Linux, which is the machinery that does the low level work for firewalls and other functions, and is discussed in more detail in chapter ten.

netfilter.h:
    NF_HOOK(pf, hook, skb, indev, outdev, okfn)
        nf_hook_slow((pf), (hook), (skb), (indev), (outdev), (okfn))
ip_queue_xmit2
    skb->dst->output(skb)

This queues, or calls okfn(), which maps to ip_queue_xmit2(). The skb->dst structure was filled in by the local route lookup and holds the link level output routine. In ip_queue_xmit, the header changes are as shown in Figure 3.4.

This is covered in detail in chapter five.
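The pattern of "run the filters, then call the continuation" can be sketched independently of the kernel. The code below is a simplified illustration, not the netfilter implementation: it assumes only two verdicts (accept and drop) and leaves out the queueing case mentioned above, and every name in it (`hook_register()`, `run_hooks()`, `struct packet`, the demo functions) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum verdict { VERDICT_DROP = 0, VERDICT_ACCEPT = 1 };

/* Hypothetical stand-in for a packet; only what the demo needs. */
struct packet {
    int len;
    int dropped;
    int sent;
};

typedef enum verdict (*hook_fn)(struct packet *);
typedef int (*ok_fn)(struct packet *);

#define MAX_HOOKS 8
static hook_fn hooks[MAX_HOOKS];
static size_t  n_hooks;

/* Append a filter function to the list run for every packet. */
static void hook_register(hook_fn fn)
{
    assert(n_hooks < MAX_HOOKS);
    hooks[n_hooks++] = fn;
}

/* Run each filter in turn; okfn is called only if every filter accepts. */
static int run_hooks(struct packet *p, ok_fn okfn)
{
    for (size_t i = 0; i < n_hooks; i++) {
        if (hooks[i](p) == VERDICT_DROP) {
            p->dropped = 1;
            return -1;
        }
    }
    return okfn(p);
}

/* Demo filter: refuse anything larger than 1500 bytes. */
static enum verdict drop_oversized(struct packet *p)
{
    return p->len > 1500 ? VERDICT_DROP : VERDICT_ACCEPT;
}

/* Demo continuation standing in for the real output routine. */
static int demo_xmit(struct packet *p)
{
    p->sent = 1;
    return 0;
}
```

The design point is that the caller does not need to know whether any filters are installed: it always hands over the packet together with the function that should run next.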

3.2.4 Link Level Output

Output at the link level can be complex (see chapter 8). It includes specific technical details of driving devices, but the important part is that this is where Linux provides different treatment for different traffic types by providing fancy queueing and scheduling management.


    /* OK, we know where to send it, allocate and build IP header. */
    iph = (struct iphdr *) skb_push(skb, sizeof(struct iphdr) + (opt ? opt->optlen : 0));
    *((__u16 *)iph) = htons((4 << 12) | (5 << 8) | (sk->protinfo.af_inet.tos & 0xff));
    iph->tot_len = htons(skb->len);
    iph->frag_off = 0;
    iph->ttl = sk->protinfo.af_inet.ttl;
    iph->protocol = sk->protocol;
    iph->saddr = rt->rt_src;
    iph->daddr = rt->rt_dst;
    skb->nh.iph = iph;
    /* Transport layer set skb->h.foo itself. */

Figure 3.4: Filling in Some Key IP bits
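The first assignment in this extract sets three header fields at once: version 4, a header length of 5 words, and the type of service byte. A small user-space sketch of the same packing follows; the helper names `ip_first_word()` and `ip_header_bytes()` are assumptions made for this illustration, and the header length is a parameter here rather than the constant 5 used in the extract.

```c
#include <assert.h>
#include <stdint.h>
#include <arpa/inet.h>   /* htons, ntohs */

/*
 * First 16 bits of an IPv4 header, in network byte order:
 * version (4 bits), header length in 32-bit words (4 bits), TOS (8 bits).
 */
static uint16_t ip_first_word(unsigned ihl_words, uint8_t tos)
{
    assert(ihl_words >= 5 && ihl_words <= 15);
    return htons((uint16_t)((4u << 12) | (ihl_words << 8) | tos));
}

/* Recover the header length in bytes from that word. */
static unsigned ip_header_bytes(uint16_t net_word)
{
    return ((ntohs(net_word) >> 8) & 0xfu) * 4u;
}
```

With no options and a TOS of zero this gives the familiar 0x4500 seen at the start of most IPv4 packets in a hex dump.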

3.2.5 Link Level Input

Link level input requires dealing with hardware and software interrupts and DMA devices. This is also covered in detail in chapter four. Some relevant extracts from the code are displayed in the figures that follow.

Network device interrupts end up calling a function via the net devices structure, netif_rx() in core/dev.c.

Then the software interrupt schedules the rest, which eventually results in a call to net_rx_action(), which dequeues the packet and calls a packet handler based on the packet type (via a table). If the packet is IP, this calls ip_rcv() in ip_input.c, which checks various things.

3.2.6 IP Input Work

IP input has to decipher the packet, so the first thing to do is figure out if it is really "for me", e.g. is it IP version 4!

In ip_input.c, ip_rcv() does its checks, then calls NF_HOOK (to run filters), which may call directly, or queue, ip_rcv_finish(), which calls ip_route_input() in route.c, which fills in the "internal" route.

This includes a function (forward or local deliver); for local delivery it calls ip_local_deliver() in ip_input.c again, and this then calls ipprot->handler.

This pointer was previously filled in (via the inet_protocol structure) in protocol.c, which matches the protocol to the protocol handler function, in this case TCP: tcp_v4_rcv().

This is investigated in more detail in chapter five.

3.2.7 TCP Input Work

In the IPv4 case, the relevant TCP input code is kicked off from tcp_ipv4.c, which looks up whether we have a connection (tcp_v4_lookup), then tries to queue in process context, or else calls tcp_v4_do_rcv() then tcp_rcv_established()


    static void netdev_wakeup(void)
    {
        unsigned long xoff;

        spin_lock(&netdev_fc_lock);
        xoff = netdev_fc_xoff;
        netdev_fc_xoff = 0;
        while (xoff) {
            int i = ffz(~xoff);
            xoff &= ~(1<<i);
            netdev_fc_slots[i].stimul(netdev_fc_slots[i].dev);
        }
        spin_unlock(&netdev_fc_lock);
    }
    #endif

    static void get_sample_stats(int cpu)
    {
    #ifdef RAND_LIE
        unsigned long rd;
        int rq;
    #endif
        int blog = softnet_data[cpu].input_pkt_queue.qlen;
        int avg_blog = softnet_data[cpu].avg_blog;

        avg_blog = (avg_blog >> 1) + (blog >> 1);

        if (avg_blog > mod_cong) {
            /* Above moderate congestion levels. */
            softnet_data[cpu].cng_level = NET_RX_CN_HIGH;
    #ifdef RAND_LIE
            rd = net_random();
            rq = rd % netdev_max_backlog;
            if (rq < avg_blog) /* unlucky bastard */
                softnet_data[cpu].cng_level = NET_RX_DROP;
    #endif
        } else if (avg_blog > lo_cong) {
            softnet_data[cpu].cng_level = NET_RX_CN_MOD;
    #ifdef RAND_LIE
            rd = net_random();
            rq = rd % netdev_max_backlog;
            if (rq < avg_blog) /* unlucky bastard */
                softnet_data[cpu].cng_level = NET_RX_CN_HIGH;
    #endif
        } else if (avg_blog > no_cong)
            softnet_data[cpu].cng_level = NET_RX_CN_LOW;
        else  /* no congestion */
            softnet_data[cpu].cng_level = NET_RX_SUCCESS;

        softnet_data[cpu].avg_blog = avg_blog;
    }

    #ifdef OFFLINE_SAMPLE
    static void sample_queue(unsigned long dummy)
    {
        /* 10 ms 0r 1ms -- i dont care -- JHS */
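The core of get_sample_stats() is a running average of the input queue length that is compared against three thresholds. The sketch below reproduces just that arithmetic in user space, leaving out the RAND_LIE branches. The type and function names are hypothetical, and the threshold values used in the usage note (20, 100, 290) are assumptions chosen for illustration rather than values taken from the extract.

```c
#include <assert.h>

enum cng_level { CNG_NONE, CNG_LOW, CNG_MOD, CNG_HIGH };

/* Thresholds corresponding to no_cong, lo_cong and mod_cong in the extract. */
struct cng_thresholds {
    int no_cong;
    int lo_cong;
    int mod_cong;
};

struct backlog_state {
    int avg_blog;            /* running average of the queue length */
    enum cng_level level;    /* congestion level from the last sample */
};

/* Fold one queue-length sample into the average and classify the result. */
static void sample_backlog(struct backlog_state *s, int qlen,
                           const struct cng_thresholds *t)
{
    assert(qlen >= 0);
    /* Half the old average plus half the new sample. */
    s->avg_blog = (s->avg_blog >> 1) + (qlen >> 1);

    if (s->avg_blog > t->mod_cong)
        s->level = CNG_HIGH;
    else if (s->avg_blog > t->lo_cong)
        s->level = CNG_MOD;
    else if (s->avg_blog > t->no_cong)
        s->level = CNG_LOW;
    else
        s->level = CNG_NONE;
}
```

Starting from an average of zero with the assumed thresholds, a sample of 300 packets gives an average of 150 (moderate), a following sample of 600 gives 375 (high), and two idle samples then decay the average to 187 and 93, so the level steps back down through moderate to low.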


    enqueue:
                dev_hold(skb->dev);
                __skb_queue_tail(&queue->input_pkt_queue, skb);
                __cpu_raise_softirq(this_cpu, NET_RX_SOFTIRQ);
                local_irq_restore(flags);
    #ifndef OFFLINE_SAMPLE
                get_sample_stats(this_cpu);
    #endif
                return softnet_data[this_cpu].cng_level;
            }

            if (queue->throttle) {
                queue->throttle = 0;
    #ifdef CONFIG_NET_HW_FLOWCONTROL
                if (atomic_dec_and_test(&netdev_dropping))
                    netdev_wakeup();
    #endif
            }
            goto enqueue;
        }

        if (queue->throttle == 0) {
            queue->throttle = 1;
            netdev_rx_stat[this_cpu].throttled++;
    #ifdef CONFIG_NET_HW_FLOWCONTROL
            atomic_inc(&netdev_dropping);
    #endif
        }

    drop:
        netdev_rx_stat[this_cpu].dropped++;
        local_irq_restore(flags);

        kfree_skb(skb);
        return NET_RX_DROP;
    }
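The throttle flag in this extract gives the input queue a simple hysteresis: once the queue overflows, arriving packets are dropped until the queue has drained completely. The sketch below models that behaviour with a counter instead of a real queue. It is an interpretation resting on an assumption: the conditions that lead to the `enqueue` label are not shown in the extract, so the tests on the queue length here are a reading of the logic, and all names (`struct backlog`, `backlog_rx()`) are hypothetical.

```c
#include <assert.h>

enum rx_result { RX_QUEUED, RX_DROPPED };

/* Hypothetical model of the per-CPU input queue: a length and a flag. */
struct backlog {
    int qlen;          /* packets currently queued */
    int max_backlog;   /* limit above which throttling starts */
    int throttle;      /* set while the queue is refusing packets */
    int dropped;       /* count of packets refused so far */
};

/* Offer one packet to the queue; report whether it was queued or dropped. */
static enum rx_result backlog_rx(struct backlog *q)
{
    assert(q->max_backlog > 0);
    if (q->qlen <= q->max_backlog) {
        if (q->qlen == 0)
            q->throttle = 0;     /* fully drained: start accepting again */
        if (!q->throttle) {
            q->qlen++;
            return RX_QUEUED;
        }
    } else {
        q->throttle = 1;         /* overflow: refuse until drained */
    }
    q->dropped++;
    return RX_DROPPED;
}
```

The effect is that a burst which overflows the queue is not followed by a trickle of accepted packets as soon as one slot frees up; the receiver gets a chance to empty the whole backlog first.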


    /* Deliver skb to an old protocol, which is not threaded well
       or which do not understand shared skbs.
     */
    static int deliver_to_old_ones(struct packet_type *pt, struct sk_buff *skb, int last)
    {
        static spinlock_t net_bh_lock = SPIN_LOCK_UNLOCKED;
        int ret = NET_RX_DROP;


        if (!last) {
            skb = skb_clone(skb, GFP_ATOMIC);
            if (skb == NULL)
                return ret;
        }

        /* The assumption (correct one) is that old protocols
           did not depend on BHs different of NET_BH and TIMER_BH.
         */

        /* Emulate NET_BH with special spinlock */
        spin_lock(&net_bh_lock);

        /* Disable timers and wait for all timers completion */
        tasklet_disable(bh_task_vec+TIMER_BH);

        ret = pt->func(skb, skb->dev, pt);

        tasklet_enable(bh_task_vec+TIMER_BH);


tcp_event_data_recv(), does any acknowledgement needed and calls sk->data_ready.

The last function queues data to the socket.

3.2.8 Socket Input Work

In core/sock.c we see that the main effect is triggered because sk->data_ready = sock_def_readable;

    void sock_def_readable(struct sock *sk, int len)
    {
        read_lock(&sk->callback_lock);
        if (sk->sleep && waitqueue_active(sk->sleep))
            wake_up_interruptible(sk->sleep);
        sk_wake_async(sk, 1, POLL_IN);
        read_unlock(&sk->callback_lock);
    }

Figure 3.8: Socket Receive wake up process

3.2.9 On the wire

16:09:07.462590 brahms.cs.ucl.ac.uk.ssh > ovavu.cs.ucl.ac.uk.1023: P 2466:2574(108) ack 971 win 17376 <nop,nop,timestamp 7115105 382435932> (DF)

3.3 DNS/UDP Example

Here we look at the differences if the packet was a UDP packet, for example part of a DNS exchange. To a large extent, these are restricted to the actual UDP protocol function itself.

3.3.1 UDP Output

In udp.c, the main function of interest is udp_sendmsg(), for output.

3.3.2 UDP Input

UDP is small enough that it is all in one C file, udp.c. For receive, the function of interest is udp_recvmsg().
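One detail of the UDP receive path that is easy to show in isolation is what happens when the caller's buffer is smaller than the datagram: the payload is cut to fit and the truncation is flagged. The following is a minimal user-space sketch of that rule, not the kernel routine; `udp_copy_payload()` is a hypothetical helper, and the datagram is assumed to be a flat byte array that starts with the 8 byte UDP header.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define UDP_HDR_LEN 8   /* a UDP header is always 8 bytes */

/*
 * Copy the payload of a datagram (header included in dgram) into the
 * caller's buffer. If the payload does not fit, copy only what fits and
 * set *truncated. Returns the number of payload bytes copied.
 */
static size_t udp_copy_payload(const unsigned char *dgram, size_t dgram_len,
                               unsigned char *user_buf, size_t user_len,
                               int *truncated)
{
    assert(dgram_len >= UDP_HDR_LEN);
    size_t copied = dgram_len - UDP_HDR_LEN;

    *truncated = 0;
    if (copied > user_len) {
        copied = user_len;
        *truncated = 1;
    }
    memcpy(user_buf, dgram + UDP_HDR_LEN, copied);
    return copied;
}
```

Unlike a TCP byte stream, the bytes that did not fit are not kept for the next read, which is why the caller needs to be told that truncation happened.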

3.3.3 On the wire

16:11:35.321380 brahms.cs.ucl.ac.uk.1904 > bells.cs.ucl.ac.uk.domain: 60803+ (43)


    if (msg->msg_name) {
        struct sockaddr_in *usin = (struct sockaddr_in *)msg->msg_name;
        if (msg->msg_namelen < sizeof(*usin))
            return -EINVAL;
        if (usin->sin_family != AF_INET) {
            if (usin->sin_family != AF_UNSPEC)
                return -EINVAL;
        }

        ufh.daddr = usin->sin_addr.s_addr;
        ufh.uh.dest = usin->sin_port;
        if (ufh.uh.dest == 0)
            return -EINVAL;
    } else {
        if (sk->state != TCP_ESTABLISHED)
            return -ENOTCONN;
        ufh.daddr = sk->daddr;
        ufh.uh.dest = sk->dport;
        /* Open fast path for connected socket.
           Route will not be used, if at least one option is set.
         */
        connected = 1;
    }
    ipc.addr = sk->saddr;
    ufh.uh.source = sk->sport;

    ipc.opt = NULL;
    ipc.oif = sk->bound_dev_if;
    if (msg->msg_controllen) {
        err = ip_cmsg_send(msg, &ipc);
        if (err)
            return err;
        if (ipc.opt)
            free = 1;
        connected = 0;
    }
    if (!ipc.opt)
        ipc.opt = sk->protinfo.af_inet.opt;

    ufh.saddr = ipc.addr;
    ipc.addr = daddr = ufh.daddr;


    if (ipc.opt && ipc.opt->srr) {
        if (!daddr)
            return -EINVAL;
        daddr = ipc.opt->faddr;
        connected = 0;
    }
    tos = RT_TOS(sk->protinfo.af_inet.tos);
    if (sk->localroute || (msg->msg_flags & MSG_DONTROUTE) ||
        (ipc.opt && ipc.opt->is_strictroute)) {
        tos |= RTO_ONLINK;
        connected = 0;
    }

    if (MULTICAST(daddr)) {
        if (!ipc.oif)
            ipc.oif = sk->protinfo.af_inet.mc_index;
        if (!ufh.saddr)
            ufh.saddr = sk->protinfo.af_inet.mc_addr;
        connected = 0;
    }
    ...
        err = ip_route_output(&rt, daddr, ufh.saddr, tos, ipc.oif);
        if (err)
            goto out;

        err = -EACCES;
        if (rt->rt_flags & RTCF_BROADCAST && !sk->broadcast)
            goto out;
        if (connected)
            sk_dst_set(sk, dst_clone(&rt->u.dst));
    }

    if (msg->msg_flags & MSG_CONFIRM)
        goto do_confirm;
back_from_confirm:

    ufh.saddr = rt->rt_src;
    if (!ipc.addr)
        ufh.daddr = ipc.addr = rt->rt_dst;
    ufh.uh.len = htons(ulen);
    ufh.uh.check = 0;
    ufh.iov = msg->msg_iov;
    ufh.wcheck = 0;

    /* RFC1122: OK.  Provides the checksumming facility (MUST) as per */
    /* 4.1.3.4. It's configurable by the application via setsockopt() */
    /* (MAY) and it defaults to on (MUST). */

    err = ip_build_xmit(sk,
                        (sk->no_check == UDP_CSUM_NOXMIT ?
                         udp_getfrag_nosum :
                         udp_getfrag),
                        &ufh, ulen, &ipc, rt, msg->msg_flags);


    skb = skb_recv_datagram(sk, flags, noblock, &err);
    if (!skb)
        goto out;

    copied = skb->len - sizeof(struct udphdr);
    if (copied > len) {
        copied = len;
        msg->msg_flags |= MSG_TRUNC;
    }

    if (skb->ip_summed == CHECKSUM_UNNECESSARY) {
        err = skb_copy_datagram_iovec(skb, sizeof(struct udphdr), msg->msg_iov,
                                      copied);
    } else if (msg->msg_flags & MSG_TRUNC) {
        if (__udp_checksum_complete(skb))
            goto csum_copy_err;
        err = skb_copy_datagram_iovec(skb, sizeof(struct udphdr), msg->msg_iov,
                                      copied);
    } else {
        err = copy_and_csum_toiovec(msg->msg_iov, skb, sizeof(struct udphdr));

        if (err)
            goto csum_copy_err;
    }

    if (err)
        goto out_free;

    sock_recv_timestamp(msg, sk, skb);

    /* Copy the address. */
    if (sin)
    {
        sin->sin_family = AF_INET;
        sin->sin_port = skb->h.uh->source;
        sin->sin_addr.s_addr = skb->nh.iph->saddr;
        memset(sin->sin_zero, 0, sizeof(sin->sin_zero));
    }
    if (sk->protinfo.af_inet.cmsg_flags)
        ip_cmsg_recv(msg, skb);
    err = copied;


out_free:
    skb_free_datagram(sk, skb);
out:
    return err;

csum_copy_err:
    UDP_INC_STATS_BH(UdpInErrors);

    /* Clear queue. */
    if (flags & MSG_PEEK) {
        int clear = 0;
        spin_lock_irq(&sk->receive_queue.lock);
        if (skb == skb_peek(&sk->receive_queue)) {
            __skb_unlink(skb, &sk->receive_queue);
            clear = 1;
        }
        spin_unlock_irq(&sk->receive_queue.lock);
        if (clear)
            kfree_skb(skb);
    }

    skb_free_datagram(sk, skb);

    return -EAGAIN;
}

int udp_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
{
    struct sockaddr_in *usin = (struct sockaddr_in *) uaddr;
    struct rtable *rt;
    int err;


    if (addr_len < sizeof(*usin))
        return -EINVAL;

    if (usin->sin_family != AF_INET)
        return -EAFNOSUPPORT;

    sk_dst_reset(sk);

    err = ip_route_connect(&rt, usin->sin_addr.s_addr, sk->saddr,
                           sk->protinfo.af_inet.tos | sk->localroute, sk->bound_dev_if);
    if (err)
        return err;
    if ((rt->rt_flags & RTCF_BROADCAST) && !sk->broadcast) {
        ip_rt_put(rt);
        return -EACCES;
    }
    ...
    sk->state = TCP_ESTABLISHED;

    sk_dst_set(sk, &rt->u.dst);
    return(0);
}


14:19:51.488399 0:20:af:ab:e1:6e 8:0:20:7d:a5:36 ip 85:
    brahms.cs.ucl.ac.uk.2167 > bells.cs.ucl.ac.uk.domain: 6517+ (43)
14:19:51.490923 8:0:20:7d:a5:36 0:20:af:ab:e1:6e ip 214:
    bells.cs.ucl.ac.uk.domain > brahms.cs.ucl.ac.uk.2167: 6517* 1/2/2 (172)
14:22:05.123790 0:20:af:ab:e1:6e Broadcast arp 42: arp who-has merci.cs.ucl.ac.uk tell brahms.cs.ucl.ac.uk

3.4 RTP/UDP (multicast) Example

For multicast, most higher level protocols are built on top of UDP, so again we need to look in udp.c.

Sending is the same as for unicast, but the receive case has to handle the chance that there is more than one process waiting for copies of a multicast packet, so receiving is a tad different, as we can see in udp_v4_mcast_deliver().

    read_lock(&udp_hash_lock);
    sk = udp_hash[ntohs(uh->dest) & (UDP_HTABLE_SIZE - 1)];
    dif = skb->dev->ifindex;
    sk = udp_v4_mcast_next(sk, uh->dest, daddr, uh->source, saddr, dif);
    if (sk) {
        struct sock *sknext = NULL;

        do {
            struct sk_buff *skb1 = skb;

            sknext = udp_v4_mcast_next(sk->next, uh->dest, daddr,
                                       uh->source, saddr, dif);
            if (sknext)
                skb1 = skb_clone(skb, GFP_ATOMIC);

            if (skb1)
                udp_queue_rcv_skb(sk, skb1);
            sk = sknext;
        } while (sknext);
    } else
        kfree_skb(skb);
    read_unlock(&udp_hash_lock);

Figure 3.13: UDP Receive Multicast
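The loop above hands a clone to every matching socket except the last, which receives the original buffer, so that n receivers cost n-1 copies. The sketch below mimics that walk over a plain linked list. It is an illustration only: `struct msock`, `mcast_next()`, `mcast_deliver()` and the `clones_made` counter are hypothetical, matching is reduced to port and group address, and no real buffers are copied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical receiver record: one per socket on the hash chain. */
struct msock {
    uint16_t port;        /* local port the socket is bound to */
    uint32_t group;       /* multicast group joined (host byte order) */
    int delivered;        /* packets delivered to this socket */
    struct msock *next;
};

static int clones_made;   /* counts simulated buffer clones */

/* Find the next socket on the chain matching port and group, if any. */
static struct msock *mcast_next(struct msock *s, uint16_t port, uint32_t group)
{
    for (; s != NULL; s = s->next)
        if (s->port == port && s->group == group)
            return s;
    return NULL;
}

/* Deliver one packet to every matching socket; return how many got it. */
static int mcast_deliver(struct msock *chain, uint16_t port, uint32_t group)
{
    int n = 0;
    struct msock *sk;

    assert((group >> 28) == 0xE);   /* 224.0.0.0/4 is the multicast range */
    sk = mcast_next(chain, port, group);
    while (sk != NULL) {
        struct msock *sknext = mcast_next(sk->next, port, group);
        if (sknext != NULL)
            clones_made++;   /* another receiver follows: this one gets a copy */
        sk->delivered++;
        n++;
        sk = sknext;
    }
    return n;
}
```

With three sockets on the chain, two of which match, both receive the packet and exactly one clone is made.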

3.4.1 On the wire


16:25:49.790001 hocus.cs.ucl.ac.uk > 224.1.127.255: igmp nreport 224.1.127.255 [ttl 1]
16:25:49.829479 d230-17.uoregon.edu.1025 > 224.2.163.188.23824: udp 332
16:25:49.853864 d230-17.uoregon.edu.1025 > 224.2.163.188.23824: udp 332
16:25:49.898744 d230-17.uoregon.edu.1025 > 224.2.163.188.23824: udp 332
16:25:49.992876 d230-17.uoregon.edu.1025 > 224.2.163.188.23824: udp 332
16:25:50.104227 d230-17.uoregon.edu.1025 > 224.2.163.188.23824: udp 332
16:25:50.135776 d230-17.uoregon.edu.1025 > 224.2.163.188.23824: udp 332
16:25:50.212101 d230-17.uoregon.edu.1025 > 224.2.163.188.23824: udp 332
16:25:50.272045 d230-17.uoregon.edu.1025 > 224.2.163.188.23824: udp 332
16:25:50.276215 Ez.Stanford.EDU.33235 > 224.2.163.188.23825: udp 88

3.4.2 ...but with a router in the path...

We know where we are going, but not how to get there. Some kind strangers along the way will help: these are routers. See chapters seven and eight for more details on Linux as a router.

Linux works well as a router. Basically, you need to consider input, then output. The decision above, where a packet was discovered to be destined "for me", is made in the call in route.c, which validates the source address for the input device, then decides if the packet is for me, multicast, loopback or for someone else, and adds it to a hash based cache table of most recent routes.

Then if we are forwarding, we do fib_lookup(). This fills in the routing structure, including a reference to the input handler. If it is a non-local delivery, this will be set to ip_forward() in ip_forward.c.

ip_forward() does various checks (router alerts, route options etc.), TTL processing, MTU work, fragmenting, generating ICMP errors if necessary, and NAT if needed, then calls NF_HOOK (for net filter work, again see chapter 10). This will schedule a call to ip_forward_finish(), which checks the route cache for a fast route decision, and then does ip_send(skb).

Then ip_send() does fragmentation or not, and calls ip_finish_output(), which does yet more NF work, then calls dst->neighbour->output(skb); and/or hh->hh_output(skb); (hardware header cache stuff).
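Of the forwarding steps listed above, the TTL handling is the easiest to show in isolation. The sketch below decrements the TTL of a 20 byte IPv4 header and refreshes the header checksum, or reports expiry so that the caller could generate an ICMP error. It is a simplified user-space illustration: the names `ip_checksum()` and `forward_ttl()` are hypothetical, the header is assumed to carry no options, and the checksum is recomputed in full with the RFC 1071 algorithm rather than adjusted incrementally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Internet checksum (RFC 1071) over len bytes; len must be even here. */
static uint16_t ip_checksum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;

    assert(len % 2 == 0);
    for (size_t i = 0; i < len; i += 2)
        sum += (uint32_t)((hdr[i] << 8) | hdr[i + 1]);
    while (sum >> 16)                       /* fold carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

enum fwd_result { FWD_OK, FWD_TTL_EXPIRED };

/*
 * hdr is a 20 byte IPv4 header without options: the TTL is byte 8 and
 * the header checksum occupies bytes 10 and 11.
 */
static enum fwd_result forward_ttl(uint8_t hdr[20])
{
    if (hdr[8] <= 1)
        return FWD_TTL_EXPIRED;   /* a router would send ICMP time exceeded */

    hdr[8]--;
    hdr[10] = 0;
    hdr[11] = 0;
    uint16_t c = ip_checksum(hdr, 20);
    hdr[10] = (uint8_t)(c >> 8);
    hdr[11] = (uint8_t)(c & 0xff);
    return FWD_OK;
}
```

A header whose checksum is correct sums to zero under `ip_checksum()`, which gives a convenient check before and after the TTL is changed.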

3.5 Socket System Call Trace

As an illustration of the actual total API, including another protocol case, that of a raw socket, used to do ICMP access, let's look at the strace of the ping program; usage: ping bells

PING bells.cs.ucl.ac.uk (128.16.5.31) from 128.16.6.226 : 56(84) bytes of data.
64 bytes from bells.cs.ucl.ac.uk (128.16.5.31): icmp_seq=0 ttl=255 time=819 usec
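With a raw ICMP socket the program, not the kernel, builds the ICMP message. The sketch below assembles an echo request in a byte buffer and fills in its checksum. Nothing is sent, so it needs no privileges. It is an illustration under stated assumptions rather than the ping source: `inet_checksum()` and `build_echo()` are hypothetical names, and the identifier and sequence values in the usage note are arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ICMP_ECHO_REQUEST 8   /* ICMP type for echo request, code 0 */
#define ICMP_HDR_LEN      8

/* Internet checksum (RFC 1071); an odd trailing byte is padded with zero. */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((data[i] << 8) | data[i + 1]);
    if (i < len)
        sum += (uint32_t)(data[i] << 8);
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Build an echo request into out; returns the total message length. */
static size_t build_echo(uint8_t *out, size_t out_len, uint16_t id,
                         uint16_t seq, const void *payload, size_t n)
{
    assert(out_len >= ICMP_HDR_LEN + n);
    out[0] = ICMP_ECHO_REQUEST;       /* type */
    out[1] = 0;                       /* code */
    out[2] = 0;                       /* checksum, filled in below */
    out[3] = 0;
    out[4] = (uint8_t)(id >> 8);      /* identifier */
    out[5] = (uint8_t)(id & 0xff);
    out[6] = (uint8_t)(seq >> 8);     /* sequence number */
    out[7] = (uint8_t)(seq & 0xff);
    memcpy(out + ICMP_HDR_LEN, payload, n);

    uint16_t c = inet_checksum(out, ICMP_HDR_LEN + n);
    out[2] = (uint8_t)(c >> 8);
    out[3] = (uint8_t)(c & 0xff);
    return ICMP_HDR_LEN + n;
}
```

With 56 bytes of data the message is 64 bytes long, which matches the "64 bytes from" line in the ping output; adding a 20 byte IP header gives the 84 shown in parentheses.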

On the wire, this looks as follows (taken from another example):


execve("/bin/ping", ["ping", "-c", "1", "bells"], [/* 24 vars */]) = 0

brk(0) = 0x805d3d4

old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40014000

open("/etc/ld.so.preload", O_RDONLY) = -1 ENOENT (No such file or directory)

open("/etc/ld.so.cache", O_RDONLY) = 3

fstat(3, {st_mode=S_IFREG|0644, st_size=19066, ...}) = 0

old_mmap(NULL, 19066, PROT_READ, MAP_PRIVATE, 3, 0) = 0x40015000

close(3) = 0

open("/lib/libresolv.so.2", O_RDONLY) = 3

fstat(3, {st_mode=S_IFREG|0755, st_size=169720, ...}) = 0

read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\340&\0"..., 4096) = 4096

old_mmap(NULL, 60956, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x4001a000

mprotect(0x40026000, 11804, PROT_NONE) = 0

old_mmap(0x40026000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0xb000) = 0x40026000

old_mmap(0x40027000, 7708, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x4

close(3) = 0

open("/lib/libc.so.6", O_RDONLY) = 3

fstat(3, {st_mode=S_IFREG|0755, st_size=4101324, ...}) = 0

read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\210\212"..., 4096) = 4096

old_mmap(NULL, 1001564, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x40029000

mprotect(0x40116000, 30812, PROT_NONE) = 0

old_mmap(0x40116000, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0xec000) = 0x40116000

old_mmap(0x4011a000, 14428, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x

close(3) = 0

mprotect(0x40029000, 970752, PROT_READ|PROT_WRITE) = 0

mprotect(0x40029000, 970752, PROT_READ|PROT_EXEC) = 0

munmap(0x40015000, 19066) = 0

personality(PER_LINUX) = 0

getpid() = 4510

getuid() = 0

socket(PF_INET, SOCK_RAW, IPPROTO_ICMP) = 3

setuid(0) = 0

brk(0) = 0x805d3d4

brk(0x805d7ec) = 0x805d7ec

brk(0x805e000) = 0x805e000

gettimeofday({960371553, 351449}, NULL) = 0

getpid() = 4510

open("/etc/resolv.conf", O_RDONLY) = 4

fstat64(0x4, 0xbffff3d0) = -1 ENOSYS (Function not implemented)

fstat(4, {st_mode=S_IFREG|0644, st_size=43, ...}) = 0

old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40015000

read(4, "search cs.ucl.ac.uk\nnameserver 1"..., 4096) = 43

read(4, "", 4096) = 0

close(4) = 0

munmap(0x40015000, 4096) = 0

socket(PF_UNIX, SOCK_STREAM, 0) = 4

connect(4, {sin_family=AF_UNIX, path="

close(4) = 0


fstat(4, {st_mode=S_IFREG|0644, st_size=1744, ...}) = 0

old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40015000

read(4, "#\n# /etc/nsswitch.conf\n#\n# An ex"..., 4096) = 1744

read(4, "", 4096) = 0

close(4) = 0

munmap(0x40015000, 4096) = 0

open("/etc/ld.so.cache", O_RDONLY) = 4

fstat(4, {st_mode=S_IFREG|0644, st_size=19066, ...}) = 0

old_mmap(NULL, 19066, PROT_READ, MAP_PRIVATE, 4, 0) = 0x40015000

close(4) = 0

open("/lib/libnss_files.so.2", O_RDONLY) = 4

fstat(4, {st_mode=S_IFREG|0755, st_size=246652, ...}) = 0

read(4, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0p \0\000"..., 4096) = 4096

brk(0x805f000) = 0x805f000

old_mmap(NULL, 36384, PROT_READ|PROT_EXEC, MAP_PRIVATE, 4, 0) = 0x4011e000

mprotect(0x40126000, 3616, PROT_NONE) = 0

old_mmap(0x40126000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 4, 0x7000) = 0x40

close(4) = 0

munmap(0x40015000, 19066) = 0

open("/etc/host.conf", O_RDONLY) = 4

fstat(4, {st_mode=S_IFREG|0644, st_size=26, ...}) = 0

old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40015000

read(4, "order hosts,bind\nmulti on\n", 4096) = 26

read(4, "", 4096) = 0

close(4) = 0

munmap(0x40015000, 4096) = 0

open("/etc/hosts", O_RDONLY) = 4

fcntl(4, F_GETFD) = 0

fcntl(4, F_SETFD, FD_CLOEXEC) = 0

fstat(4, {st_mode=S_IFREG|0644, st_size=80, ...}) = 0

old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40015000

read(4, "127.0.0.1\tlocalhost.localdomain\t"..., 4096) = 80

read(4, "", 4096) = 0

close(4) = 0

munmap(0x40015000, 4096) = 0

open("/etc/ld.so.cache", O_RDONLY) = 4

fstat(4, {st_mode=S_IFREG|0644, st_size=19066, ...}) = 0

old_mmap(NULL, 19066, PROT_READ, MAP_PRIVATE, 4, 0) = 0x40015000

close(4) = 0

open("/lib/libnss_nisplus.so.2", O_RDONLY) = 4

fstat(4, {st_mode=S_IFREG|0755, st_size=252234, ...}) = 0

read(4, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\20\37\0"..., 4096) = 4096

old_mmap(NULL, 41972, PROT_READ|PROT_EXEC, MAP_PRIVATE, 4, 0) = 0x40127000

mprotect(0x40130000, 5108, PROT_NONE) = 0

old_mmap(0x40130000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 4, 0x8000) = 0x40

close(4) = 0

open("/lib/libnsl.so.1", O_RDONLY) = 4

fstat(4, {st_mode=S_IFREG|0755, st_size=370141, ...}) = 0

mprotect(0x40144000, 14376, PROT_NONE) = 0

old_mmap(0x40144000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 4, 0x11000) = 0x40144000

old_mmap(0x40146000, 6184, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x4

close(4) = 0

munmap(0x40015000, 19066) = 0

uname({sys="Linux", node="ovavu.cs.ucl.ac.uk", ...}) = 0

open("/var/nis/NIS_COLD_START", O_RDONLY) = -1 ENOENT (No such file or directory)

open("/var/nis/NIS_COLD_START", O_RDONLY) = -1 ENOENT (No such file or directory)

open("/etc/ld.so.cache", O_RDONLY) = 4

fstat(4, {st_mode=S_IFREG|0644, st_size=19066, ...}) = 0

old_mmap(NULL, 19066, PROT_READ, MAP_PRIVATE, 4, 0) = 0x40015000

close(4) = 0

open("/lib/libnss_nis.so.2", O_RDONLY) = 4

fstat(4, {st_mode=S_IFREG|0755, st_size=255963, ...}) = 0

read(4, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0`\37\0\000"..., 4096) = 4096

old_mmap(NULL, 38488, PROT_READ|PROT_EXEC, MAP_PRIVATE, 4, 0) = 0x40148000

mprotect(0x40150000, 5720, PROT_NONE) = 0

old_mmap(0x40150000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 4, 0x7000) = 0x40150000

close(4) = 0

munmap(0x40015000, 19066) = 0

uname({sys="Linux", node="ovavu.cs.ucl.ac.uk", ...}) = 0

open("/etc/ld.so.cache", O_RDONLY) = 4

fstat(4, {st_mode=S_IFREG|0644, st_size=19066, ...}) = 0

old_mmap(NULL, 19066, PROT_READ, MAP_PRIVATE, 4, 0) = 0x40015000

close(4) = 0

open("/lib/libnss_dns.so.2", O_RDONLY) = 4

fstat(4, {st_mode=S_IFREG|0755, st_size=67580, ...}) = 0

read(4, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\300\r\0"..., 4096) = 4096

old_mmap(NULL, 15088, PROT_READ|PROT_EXEC, MAP_PRIVATE, 4, 0) = 0x40152000

mprotect(0x40155000, 2800, PROT_NONE) = 0

old_mmap(0x40155000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 4, 0x2000) = 0x40155000

close(4) = 0

munmap(0x40015000, 19066) = 0

socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 4

connect(4, {sin_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("128.16.5.31")}}, 16) = 0

send(4, "V\'\1\0\0\1\0\0\0\0\0\0\5bells\2cs\3ucl\2ac\2uk\0"..., 36, 0) = 36

time(NULL) = 960371553

poll([{fd=4, events=POLLIN, revents=POLLIN}], 1, 5000) = 1

recvfrom(4, "V\'\205\200\0\1\0\1\0\4\0\6\5bells\2cs\3ucl\2ac\2uk\0"..., 1024, 0, {sin_family=AF_IN

close(4) = 0

socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 4

connect(4, {sin_family=AF_INET, sin_port=htons(1025), sin_addr=inet_addr("128.16.5.31")}}, 16) = 0

getsockname(4, {sin_family=AF_INET, sin_port=htons(1031), sin_addr=inet_addr("128.16.6.226")}}, [1

close(4) = 0

bind(3, {sin_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("128.16.6.226")}}, 16) = 0

setsockopt(3, IPPROTO_RAW, 1, [-6202], 4) = 0

getpid() = 4510

fstat(1, {st_mode=S_IFREG|0664, st_size=8905, ...}) = 0

rt_sigaction(SIGALRM, {0x8049e74, [], SA_INTERRUPT|0x4000000}, NULL, 8) = 0

gettimeofday({960371553, 403056}, NULL) = 0

gettimeofday({960371553, 403316}, NULL) = 0

sendmsg(3, {msg_name(16)={sin_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("128

rt_sigaction(SIGALRM, {0x804a670, [], SA_INTERRUPT|0x4000000}, NULL, 8) = 0

setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={10, 0}}, NULL) = 0

time(NULL) = 960371553

recvfrom(3, "E\0\0TR\250\0\0\377\1\\\337\200\20\5\37\200\20\6\342\0"..., 192, 0, {sin_fam

gettimeofday({960371553, 407464}, NULL) = 0

brk(0x8060000) = 0x8060000

socket(PF_UNIX, SOCK_STREAM, 0) = 4

connect(4, {sin_family=AF_UNIX, path="

close(4) = 0

open("/etc/hosts", O_RDONLY) = 4

fcntl(4, F_GETFD) = 0

fcntl(4, F_SETFD, FD_CLOEXEC) = 0

fstat(4, {st_mode=S_IFREG|0644, st_size=80, ...}) = 0

old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40016000

read(4, "127.0.0.1\tlocalhost.localdomain\t"..., 4096) = 80

read(4, "", 4096) = 0

close(4) = 0

munmap(0x40016000, 4096) = 0

open("/var/nis/NIS_COLD_START", O_RDONLY) = -1 ENOENT (No such file or directory)

uname({sys="Linux", node="ovavu.cs.ucl.ac.uk", ...}) = 0

socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 4

connect(4, {sin_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("128.16.5.31")}},

send(4, "V(\1\0\0\1\0\0\0\0\0\0\00231\0015\00216\003128\7in-add"..., 42, 0) = 42

time(NULL) = 960371553

poll([{fd=4, events=POLLIN, revents=POLLIN}], 1, 5000) = 1

recvfrom(4, "V(\205\200\0\1\0\1\0\3\0\3\00231\0015\00216\003128\7in"..., 1024, 0, {sin_fa

close(4) = 0

write(1, "PING bells.cs.ucl.ac.uk (128.16."..., 159PING bells.cs.ucl.ac.uk (128.16.5.31)

64 bytes from bells.cs.ucl.ac.uk (128.16.5.31): icmp_seq=0 ttl=255 time=4.1 ms

) = 159

rt_sigaction(SIGALRM, {SIG_IGN}, NULL, 8) = 0

write(1, "\n", 1

) = 1

write(1, "--- bells.cs.ucl.ac.uk ping stat"..., 141--- bells.cs.ucl.ac.uk ping statistic

1 packets transmitted, 1 packets received, 0% packet loss

round-trip min/avg/max = 4.1/4.1/4.1 ms

) = 141

munmap(0x40015000, 4096) = 0
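
The buffer returned by the recvfrom(3, ...) call on the raw ICMP socket in the trace above begins with the IP header of the echo reply, so we can decode it by hand. The following is a minimal sketch in Python, not part of ping itself: the names RAW_IP_HEADER, parse_ipv4_header and header_checksum_ok are ours, and the byte string is transcribed from the 20 header bytes that strace prints (the rest of the reply is elided in the listing and is not reconstructed here).

```python
import socket
import struct

# The first 20 bytes shown by strace for recvfrom(3, ...), copied using the
# same octal escapes. The byte that follows them in the trace (\0) is the
# first byte of the ICMP message and is not part of the IP header.
RAW_IP_HEADER = b"E\0\0TR\250\0\0\377\1\\\337\200\20\5\37\200\20\6\342"


def parse_ipv4_header(header: bytes) -> dict:
    """Unpack the fixed 20-byte part of an IPv4 header (network byte order)."""
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, protocol, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", header[:20])
    return {
        "version": ver_ihl >> 4,
        "header_len": (ver_ihl & 0x0F) * 4,  # IHL counts 32-bit words
        "total_len": total_len,
        "ident": ident,
        "ttl": ttl,
        "protocol": protocol,                # 1 is ICMP
        "checksum": checksum,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }


def header_checksum_ok(header: bytes) -> bool:
    """One's-complement sum over the ten 16-bit header words must be 0xFFFF."""
    total = sum(struct.unpack("!10H", header[:20]))
    while total >> 16:                       # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total == 0xFFFF


if __name__ == "__main__":
    for field, value in parse_ipv4_header(RAW_IP_HEADER).items():
        print(field, value)
    print("checksum ok:", header_checksum_ok(RAW_IP_HEADER))
```

Decoded this way, the header reports a source of 128.16.5.31, a destination of 128.16.6.226, protocol 1 (ICMP) and a TTL of 255, which agrees with the ttl=255 that ping prints. The total length of 84 bytes less the 20-byte header leaves 64 bytes, matching the "64 bytes from" in the output.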
