\input texinfo   @c -*-texinfo-*-      
@c %**start of header (This is for running Texinfo on a region.)
@setfilename gawkinet.info
@settitle TCP/IP Internetworking With @command{gawk}
@c %**end of header (This is for running Texinfo on a region.)
@c FIXME: web vs. Web

@dircategory Text creation and manipulation
@direntry
* Gawkinet: (gawkinet).         TCP/IP Internetworking With `gawk'.
@end direntry

@iftex
@set DOCUMENT book
@set CHAPTER chapter
@set SECTION section
@set DARKCORNER @inmargin{@image{lflashlight,1cm}, @image{rflashlight,1cm}}
@end iftex
@ifinfo
@set DOCUMENT Info file
@set CHAPTER major node
@set SECTION node
@set DARKCORNER (d.c.)
@end ifinfo
@ifhtml
@set DOCUMENT web page
@set CHAPTER chapter
@set SECTION section
@set DARKCORNER (d.c.)
@end ifhtml

@set FSF

@set FN file name
@set FFN File Name

@c merge the function and variable indexes into the concept index
@ifinfo
@synindex fn cp
@synindex vr cp
@end ifinfo
@iftex
@syncodeindex fn cp
@syncodeindex vr cp
@end iftex

@c If "finalout" is commented out, the printed output will show
@c black boxes that mark lines that are too long.  Thus, it is
@c unwise to comment it out when running a master in case there are
@c overfulls which are deemed okay.

@iftex
@finalout
@end iftex

@smallbook

@c Special files are described in chapter 6 Printing Output under
@c 6.7 Special File Names in gawk. I think the networking does not
@c fit into that chapter, thus this separate document. At over 50
@c pages, I think this is the right decision.  ADR.

@set TITLE TCP/IP Internetworking With @command{gawk}
@set EDITION 1.1
@set UPDATE-MONTH April, 2002
@c gawk versions:
@set VERSION 3.1
@set PATCHLEVEL 1

@copying
This is Edition @value{EDITION} of @cite{@value{TITLE}},
for the @value{VERSION}.@value{PATCHLEVEL} (or later) version of the GNU
implementation of AWK.
@sp 2
Copyright (C) 2000, 2001, 2002 Free Software Foundation, Inc.
@sp 2
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.2 or
any later version published by the Free Software Foundation; with the
Invariant Sections being ``GNU General Public License'', the Front-Cover
texts being (a) (see below), and with the Back-Cover Texts being (b)
(see below).  A copy of the license is included in the section entitled
``GNU Free Documentation License''.

@enumerate a
@item
``A GNU Manual''

@item
``You have freedom to copy and modify this GNU Manual, like GNU
software.  Copies published by the Free Software Foundation raise
funds for GNU development.''
@end enumerate
@end copying

@ifinfo
This file documents the networking features in GNU @command{awk}.

@insertcopying
@end ifinfo

@setchapternewpage odd

@titlepage
@title @value{TITLE}
@subtitle Edition @value{EDITION}
@subtitle @value{UPDATE-MONTH}
@author J@"urgen Kahrs
@author with Arnold D. Robbins

@c Include the Distribution inside the titlepage environment so
@c that headings are turned off.  Headings on and off do not work.

@page
@vskip 0pt plus 1filll
@sp 2
Published by:
@sp 1

Free Software Foundation @*
59 Temple Place --- Suite 330 @*
Boston, MA 02111-1307 USA @*
Phone: +1-617-542-5942 @*
Fax: +1-617-542-2652 @*
Email: @email{gnu@@gnu.org} @*
URL: @uref{http://www.gnu.org/} @*

ISBN 1-882114-93-0 @*

@insertcopying

@c @sp 2
@c Cover art by ?????.
@end titlepage

@iftex
@headings off
@evenheading @thispage@ @ @ @strong{@value{TITLE}} @| @|
@oddheading  @| @| @strong{@thischapter}@ @ @ @thispage
@end iftex

@ifnottex
@node Top, Preface, (dir), (dir)
@top General Introduction
@comment node-name, next,          previous, up

This file documents the networking features in GNU Awk (@command{gawk})
version 3.1 and later.

@insertcopying
@end ifnottex

@menu
* Preface::                          About this document.
* Introduction::                     About networking.
* Using Networking::                 Some examples.
* Some Applications and Techniques:: More extended examples.
* Links::                            Where to find the stuff mentioned in this
                                     document.
* GNU Free Documentation License::   The license for this document.
* Index::                            The index.

@detailmenu
* Stream Communications::          Sending data streams.
* Datagram Communications::        Sending self-contained messages.
* The TCP/IP Protocols::           How these models work in the Internet.
* Basic Protocols::                The basic protocols.
* Ports::                          The idea behind ports.
* Making Connections::             Making TCP/IP connections.
* Gawk Special Files::             How to do @command{gawk} networking.
* Special File Fields::            The fields in the special file name.
* Comparing Protocols::            Differences between the protocols.
* File /inet/tcp::                 The TCP special file.
* File /inet/udp::                 The UDP special file.
* File /inet/raw::                 The RAW special file.
* TCP Connecting::                 Making a TCP connection.
* Troubleshooting::                Troubleshooting TCP/IP connections.
* Interacting::                    Interacting with a service.
* Setting Up::                     Setting up a service.
* Email::                          Reading email.
* Web page::                       Reading a Web page.
* Primitive Service::              A primitive Web service.
* Interacting Service::            A Web service with interaction.
* CGI Lib::                        A simple CGI library.
* Simple Server::                  A simple Web server.
* Caveats::                        Network programming caveats.
* Challenges::                     Where to go from here.
* PANIC::                          An Emergency Web Server.
* GETURL::                         Retrieving Web Pages.
* REMCONF::                        Remote Configuration Of Embedded Systems.
* URLCHK::                         Look For Changed Web Pages.
* WEBGRAB::                        Extract Links From A Page.
* STATIST::                        Graphing A Statistical Distribution.
* MAZE::                           Walking Through A Maze In Virtual Reality.
* MOBAGWHO::                       A Simple Mobile Agent.
* STOXPRED::                       Stock Market Prediction As A Service.
* PROTBASE::                       Searching Through A Protein Database.
@end detailmenu
@end menu

@contents

@node Preface, Introduction, Top, Top
@unnumbered Preface

In May of 1997, J@"urgen Kahrs felt the need for network access
from @command{awk}, and, with a little help from me, set about adding
features to do this for @command{gawk}.  At that time, he
wrote the bulk of this @value{DOCUMENT}.

The code and documentation were added to the @command{gawk} 3.1 development
tree, and languished somewhat until I could finally get
down to some serious work on that version of @command{gawk}.
This finally happened in the middle of 2000.

In the meantime, J@"urgen wrote an article about the Internet special
files and @samp{|&} operator for @cite{Linux Journal}, and made a
networking patch for the production versions of @command{gawk}
available from his home page.
In August of 2000 (for @command{gawk} 3.0.6), this patch
also made it to the main GNU @command{ftp} distribution site.

For release with @command{gawk}, I edited J@"urgen's prose
for English grammar and style, as he is not a native English
speaker.  I also
rearranged the material somewhat for what I felt was a better order of
presentation, and (re)wrote some of the introductory material.

The majority of this document and the code are his work, and the
high quality and interesting ideas speak for themselves.  It is my
hope that these features will be of significant value to the @command{awk}
community.

@sp 1
@noindent
Arnold Robbins @*
Nof Ayalon, ISRAEL @*
March, 2001

@node Introduction, Using Networking, Preface, Top
@chapter Networking Concepts

This @value{CHAPTER} provides a (necessarily) brief introduction to
computer networking concepts.  For many applications of @command{gawk}
to TCP/IP networking, we hope that this is enough.  For more
advanced tasks, you will need deeper background, and it may be necessary
to switch to lower-level programming in C or C++.

There are two real-life models for the way computers send messages
to each other over a network.  While the analogies are not perfect,
they are close enough to convey the major concepts.
These two models are the phone system (reliable byte-stream communications),
and the postal system (best-effort datagrams).

@menu
* Stream Communications::       Sending data streams.
* Datagram Communications::     Sending self-contained messages.
* The TCP/IP Protocols::        How these models work in the Internet.
* Making Connections::          Making TCP/IP connections.
@end menu

@node Stream Communications, Datagram Communications, Introduction, Introduction
@section Reliable Byte-streams (Phone Calls)

When you make a phone call, the following steps occur:

@enumerate
@item
You dial a number.

@item
The phone system connects to the called party, telling
them there is an incoming call. (Their phone rings.)

@item
The other party answers the call, or, in the case of a
computer network, refuses to answer the call.

@item
Assuming the other party answers, the connection between
you is now a @dfn{duplex} (two-way), @dfn{reliable} (no data lost),
sequenced (data comes out in the order sent) data stream.

@item
You and your friend may now talk freely, with the phone system
moving the data (your voices) from one end to the other.
From your point of view, you have a direct end-to-end
connection with the person on the other end.
@end enumerate

The same steps occur in a duplex reliable computer networking connection.
There is considerably more overhead in setting up the communications,
but once it's done, data moves in both directions, reliably, in sequence.

@node Datagram Communications, The TCP/IP Protocols, Stream Communications, Introduction
@section Best-effort Datagrams (Mailed Letters)

Suppose you mail three different documents to your office on the
other side of the country on two different days.  Doing so
entails the following.

@enumerate
@item
Each document travels in its own envelope.

@item
Each envelope contains both the sender and the
recipient address.

@item
Each envelope may travel a different route to its destination.

@item
The envelopes may arrive in a different order from the one
in which they were sent.

@item
One or more may get lost in the mail.
(Although, fortunately, this does not occur very often.)

@item
In a computer network, one or more @dfn{packets}
may also arrive multiple times.  (This doesn't happen
with the postal system!)

@end enumerate

The important characteristics of datagram communications, like
those of the postal system, are thus:

@itemize @bullet
@item
Delivery is ``best effort;'' the data may never get there.

@item
Each message is self-contained, including the source and
destination addresses.

@item
Delivery is @emph{not} sequenced; packets may arrive out
of order, and/or multiple times.

@item
Unlike the phone system, overhead is considerably lower.
It is not necessary to set up the call first.
@end itemize

The price the user pays for the lower overhead of datagram communications
is lower reliability; it is often necessary for user-level
protocols that use datagram communications to add their own reliability
features on top of the basic communications.
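
As a rough illustration of such a reliability layer, the following sketch
shows the receiving side of a hypothetical protocol in which each datagram
carries a sequence number starting at 1.  The function name @code{deliver},
the variable names, and the numbering scheme are assumptions made for this
example; they are not part of @command{gawk} or of any standard protocol.
Duplicates are dropped, and messages that arrive early are held back until
the gap before them is filled:

@example
# Sketch only: sequence numbers start at 1, and lost datagrams
# are assumed to be retransmitted by the sender.
function deliver(seq, payload) @{
  if (seq < Expected || (seq in Held))
    return                        # duplicate: ignore it
  Held[seq] = payload             # keep until it is in order
  while (Expected in Held) @{      # pass on everything now in sequence
    print Held[Expected]
    delete Held[Expected]
    Expected++
  @}
@}

BEGIN @{
  Expected = 1
  deliver(2, "second")            # early: held back
  deliver(1, "first")             # fills the gap: both are printed
  deliver(2, "second")            # duplicate: dropped
  deliver(3, "third")
@}
@end example

Run as is, the sketch prints the three messages in order, once each.
A real protocol would also need acknowledgments and retransmission on
the sending side, which are omitted here.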

@node The TCP/IP Protocols, Making Connections, Datagram Communications, Introduction
@section The Internet Protocols

The Internet Protocol Suite (usually referred to as just TCP/IP)@footnote{
It should be noted that although the Internet seems to have conquered the
world, there are other networking protocol suites in existence and in use.}
consists of a number of different protocols at different levels or ``layers.''
For our purposes, three protocols provide the fundamental communications
mechanisms.  All other defined protocols are referred to as user-level
protocols (e.g., HTTP, used later in this @value{DOCUMENT}).

@menu
* Basic Protocols::             The basic protocols.
* Ports::                       The idea behind ports.
@end menu

@node Basic Protocols, Ports, The TCP/IP Protocols, The TCP/IP Protocols
@subsection The Basic Internet Protocols

@table @asis
@item IP
The Internet Protocol.  This protocol is almost never used directly by
applications.  It provides the basic packet delivery and routing infrastructure
of the Internet.  Much like the phone company's switching centers or the Post
Office's trucks, it is not of much day-to-day interest to the regular user
(or programmer).
It happens to be a best effort datagram protocol.

@item UDP
The User Datagram Protocol.  This is a best effort datagram protocol.
It provides a small amount of extra reliability over IP, and adds
the notion of @dfn{ports}, described in @ref{Ports, ,TCP and UDP Ports}.

@item TCP
The Transmission Control Protocol.  This is a duplex, reliable, sequenced
byte-stream protocol, again layered on top of IP, and also providing the
notion of ports.  This is the protocol that you will most likely use
when using @command{gawk} for network programming.
@end table

All other user-level protocols use either TCP or UDP to do their basic
communications.  Examples are SMTP (Simple Mail Transfer Protocol),
FTP (File Transfer Protocol), and HTTP (HyperText Transfer Protocol).
@cindex SMTP (Simple Mail Transfer Protocol)
@cindex FTP (File Transfer Protocol)
@cindex HTTP (Hypertext Transfer Protocol)

@node Ports, , Basic Protocols, The TCP/IP Protocols
@subsection TCP and UDP Ports

In the postal system, the address on an envelope indicates a physical
location, such as a residence or office building.  But there may be
more than one person at a location; thus you have to further qualify
the recipient by putting a person or company name on the envelope.

In the phone system, one phone number may represent an entire company,
in which case you need a person's extension number in order to
reach that individual directly.  Or, when you call a home, you have to
say, ``May I please speak to ...'' before talking to the person directly.

IP networking provides the concept of addressing.  An IP address represents
a particular computer, but no more.  In order to reach the mail service
on a system, or the FTP or WWW service on a system, you must have some
way to further specify which service you want.  In the Internet Protocol suite,
this is done with @dfn{port numbers}, which represent the services, much
like an extension number used with a phone number.

Port numbers are 16-bit integers.  Unix and Unix-like systems reserve ports
below 1024 for ``well known'' services, such as SMTP, FTP, and HTTP.
Numbers 1024 and above may be used by any application, although there is no
promise made that a particular port number is always available.

@node Making Connections, , The TCP/IP Protocols, Introduction
@section Making TCP/IP Connections (And Some Terminology)

Two terms come up repeatedly when discussing networking:
@dfn{client} and @dfn{server}.  For now, we'll discuss these terms
at the @dfn{connection level}, when first establishing connections
between two processes on different systems over a network.
(Once the connection is established, the higher level, or
@dfn{application level} protocols,
such as HTTP or FTP, determine who is the client and who is the
server.  Often, it turns out that the client and server are the
same in both roles.)

@cindex servers
The @dfn{server} is the system providing the service, such as the
web server or email server.  It is the @dfn{host} (system) which
is @emph{connected to} in a transaction.
For this to work though, the server must be expecting connections.
Much as there has to be someone at the office building to answer
the phone@footnote{In the days before voice mail systems!}, the
server process (usually) has to be started first and be waiting
for a connection.

@cindex clients
The @dfn{client} is the system requesting the service.
It is the system @emph{initiating the connection} in a transaction.
(Just as when you pick up the phone to call an office or store.)

In the TCP/IP framework, each end of a connection is represented by an
(@var{address}, @var{port}) pair.  For the duration of the connection,
the ports in use at each end are unique, and cannot be used simultaneously
by other processes on the same system.  (Only after closing a connection
can a new one be built up on the same port. This is contrary to the usual
behavior of fully developed web servers which have to avoid situations
in which they are not reachable. We have to pay this price in order to
enjoy the benefits of a simple communication paradigm in @command{gawk}.)

@cindex blocking
@cindex synchronous communications
Furthermore, once the connection is established, communications are
@dfn{synchronous}.@footnote{For the technically savvy, data reads
block---if there's no incoming data, the program is made to wait until
there is, instead of receiving a ``there's no data'' error return.} I.e.,
each end waits on the other to finish transmitting, before replying. This
is much like two people in a phone conversation.  While both could talk
simultaneously, doing so usually doesn't work too well.

In the case of TCP, the synchronicity is enforced by the protocol when
sending data.  Data writes @dfn{block} until the data have been received on the
other end.  For both TCP and UDP, data reads block until there is incoming
data waiting to be read.  This is summarized in the following table,
where an ``X'' indicates that the given action blocks.

@ifnottex
@multitable {Protocol}  {Reads}  {Writes}
@item TCP @tab X @tab X
@item UDP @tab X @tab
@item RAW @tab X @tab
@end multitable
@end ifnottex
@tex
\centerline{
\vbox{\bigskip % space above the table (about 1 linespace)
% Because we have vertical rules, we can't let TeX insert interline space
% in its usual way.
\offinterlineskip
\halign{\hfil\strut# &\vrule #& \hfil#\hfil& \hfil#\hfil\cr
Protocol&&\quad Reads\quad &Writes\cr
\noalign{\hrule}
\omit&height 2pt\cr
\noalign{\hrule height0pt}% without this the rule does not extend; why?
TCP&&X&X\cr
UDP&&X&\cr
RAW&&X&\cr
}}}
@end tex

@node Using Networking, Some Applications and Techniques, Introduction, Top
@comment node-name, next, previous, up
@chapter Networking With @command{gawk}

@c STARTOFRANGE netgawk
@cindex networks, @command{gawk} and
@c STARTOFRANGE gawknet
@cindex @command{gawk}, networking
The @command{awk} programming language was originally developed as a
pattern-matching language for writing short programs to perform
data manipulation tasks.
@command{awk}'s strength is the manipulation of textual data
that is stored in files.
It was never meant to be used for networking purposes.
To exploit its features in a
networking context, it's necessary to use an access mode for network connections
that resembles the access of files as closely as possible.

@cindex Perl
@cindex Python
@cindex Tcl/Tk
@command{awk} is also meant to be a prototyping language. It is used
to demonstrate feasibility and to play with features and user interfaces.
This can be done with file-like handling of network
connections.
@command{gawk} gives up
many of the advanced features of the TCP/IP family of protocols
in exchange for the convenience of simple connection handling.
The advanced
features are available when programming in C or Perl. In fact, the
network programming
in this @value{CHAPTER}
is very similar to what is described in books such as
@cite{Internet Programming with Python},
@cite{Advanced Perl Programming},
or
@cite{Web Client Programming with Perl}.

@cindex Perl, @command{gawk} networking and
@cindex Python, @command{gawk} networking and
@cindex Tcl/Tk, @command{gawk} and
However, you can do the programming here without first having to learn object-oriented
ideology; underlying languages such as Tcl/Tk, Perl, Python; or all of
the libraries necessary to extend these languages before they are ready for the Internet.

@cindex Transmission Control Protocol, See TCP
@cindex TCP (Transmission Control Protocol)
This @value{CHAPTER} demonstrates how to use the TCP protocol. The
other protocols are much less important for most users (UDP) or even
intractable (RAW).

@menu
* Gawk Special Files::          How to do @command{gawk} networking.
* TCP Connecting::              Making a TCP connection.
* Troubleshooting::             Troubleshooting TCP/IP connections.
* Interacting::                 Interacting with a service.
* Setting Up::                  Setting up a service.
* Email::                       Reading email.
* Web page::                    Reading a Web page.
* Primitive Service::           A primitive Web service.
* Interacting Service::         A Web service with interaction.
* Simple Server::               A simple Web server.
* Caveats::                     Network programming caveats.
* Challenges::                  Where to go from here.
@end menu

@node Gawk Special Files, TCP Connecting, Using Networking, Using Networking
@comment node-name,      next,  previous, up
@section @command{gawk}'s Networking Mechanisms

The @samp{|&} operator introduced in @command{gawk} 3.1 for use in
communicating with a @dfn{coprocess} is described in
@ref{Two-way I/O, ,Two-way Communications With Another Process, gawk, GAWK: Effective AWK Programming}.
It shows how to do two-way I/O to a
separate process, sending it data with @code{print} or @code{printf} and
reading data with @code{getline}.  If you haven't read it already, you should
detour there to do so.

@command{gawk} transparently extends the two-way I/O mechanism to simple networking through
the use of special @value{FN}s.  When a ``coprocess'' that matches
the special files we are about to describe
is started, @command{gawk} creates the appropriate network
connection, and then two-way I/O proceeds as usual.

@c last comma is part of see-also
@cindex input/output, two-way, See Also @command{gawk}, networking
@cindex TCP/IP, sockets and
At the C, C++, and Perl level, networking is accomplished
via @dfn{sockets}, an Application Programming Interface (API) originally
developed at the University of California at Berkeley that is now used
almost universally for TCP/IP networking.
Socket level programming, while fairly straightforward, requires paying
attention to a number of details, as well as using binary data.  It is not
well-suited for use from a high-level language like @command{awk}.
The special files provided in @command{gawk} hide the details from
the programmer, making things much simpler and easier to use.
@c Who sez we can't toot our own horn occasionally?

@c STARTOFRANGE filenet
@cindex filenames, for network access
@c STARTOFRANGE gawnetf
@cindex @command{gawk}, networking, filenames
@c STARTOFRANGE netgawf
@cindex networks, @command{gawk} and, filenames
The special @value{FN} for network access is made up of several fields, all
of which are mandatory:

@example
/inet/@var{protocol}/@var{localport}/@var{hostname}/@var{remoteport}
@end example

@cindex @code{/inet/} files (@command{gawk})
@cindex files, @code{/inet/} (@command{gawk})
@cindex localport field
@cindex remoteport field
The @file{/inet/} field is, of course, constant when accessing the network.
The @var{localport} and @var{remoteport} fields do not have a meaning
when used with @file{/inet/raw} because ``ports'' only apply to
TCP and UDP. So, when using @file{/inet/raw}, the port fields always have
to be @samp{0}.

@menu
* Special File Fields::         The fields in the special file name.
* Comparing Protocols::         Differences between the protocols.
@end menu

@node Special File Fields, Comparing Protocols, Gawk Special Files, Gawk Special Files
@subsection The Fields of the Special @value{FFN}
This @value{SECTION} explains the meaning of all the other fields,
as well as the range of values and the defaults.
All of the fields are mandatory.  To let the system pick a value,
or if the field doesn't apply to the protocol, specify it as @samp{0}:

@table @var
@cindex protocol field
@c last comma is part of secondary
@cindex TCP/IP, protocols, selecting
@item protocol
Determines which member of the TCP/IP
family of protocols is selected to transport the data across the
network. There are three possible values (always written in lowercase):
@samp{tcp}, @samp{udp}, and @samp{raw}. The exact meaning of each is
explained later in this @value{SECTION}.

@item localport
@cindex networks, ports, specifying
Determines which port on the local
machine is used to communicate across the network. It has no meaning
with @file{/inet/raw} and must therefore be @samp{0}.  Application-level clients
usually use @samp{0} to indicate they do not care which local port is
used---instead they specify a remote port to connect to. It is vital for
application-level servers to use a number different from @samp{0} here
because their service has to be available at a specific publicly known
port number. It is possible to use a name from @file{/etc/services} here.

@item hostname
@cindex hostname field
@cindex servers, as hosts
Determines which remote host is to
be at the other end of the connection. Application-level servers must fill
this field with a @samp{0} to indicate their being open for all other hosts
to connect to them and enforce connection level server behavior this way.
It is not possible for an application-level server to restrict its
availability to one remote host by entering a host name here.
Application-level clients must enter a name different from @samp{0}.
The name can be either symbolic
(e.g., @samp{jpl-devvax.jpl.nasa.gov}) or numeric (e.g., @samp{128.149.1.143}).

@item remoteport
Determines which port on the remote
machine is used to communicate across the network. It has no meaning
with @file{/inet/raw} and must therefore be @samp{0}.
For @file{/inet/tcp} and @file{/inet/udp},
application-level clients @emph{must} use a number
other than @samp{0} to indicate to which port on the remote machine
they want to connect. Application-level servers must fill this field with
a @samp{0}. Instead they specify a local port to which clients connect.
It is possible to use a name from @file{/etc/services} here.
@end table
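
To make the field layout concrete, the following sketch assembles
special @value{FN}s from their four fields.  The helper @code{inet_name}
is a hypothetical name introduced for this example, not a @command{gawk}
built-in, and port 8888 is an arbitrary choice:

@example
# Hypothetical helper: joins the four mandatory fields.
function inet_name(protocol, localport, hostname, remoteport) @{
  return "/inet/" protocol "/" localport "/" hostname "/" remoteport
@}

BEGIN @{
  # Client: the system picks the local port;
  # the remote host and port are given.
  print inet_name("tcp", 0, "localhost", 8888)
  # Server: fixed local port; host and remote port are 0.
  print inet_name("tcp", 8888, 0, 0)
@}
@end example

The two names printed are @file{/inet/tcp/0/localhost/8888} and
@file{/inet/tcp/8888/0/0}.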

@cindex networks, @command{gawk} and, connections
@cindex @command{gawk}, networking, connections
Experts in network programming will notice that the usual
client/server asymmetry found at the level of the socket API is not visible
here. This is for the sake of simplicity of the high-level concept. If this
asymmetry is necessary for your application,
use another language.
For @command{gawk}, it is
more important to enable users to write a client program with a minimum
of code. What happens when first accessing a network connection is seen
in the following pseudocode:

@smallexample
if ((name of remote host given) && (other side accepts connection)) @{
  rendez-vous successful; transmit with getline or print
@} else @{
  if ((other side did not accept) && (localport == 0))
    exit unsuccessful
  if (TCP) @{
    set up a server accepting connections
    this means waiting for the client on the other side to connect
  @} else
    ready
@}
@end smallexample

The exact behavior of this algorithm depends on the values of the
fields of the special @value{FN}. When in doubt, the following table
gives you the combinations of values and their meaning. If this
table is too complicated, focus on the three lines printed in
@strong{bold}. All the examples in
@ref{Using Networking, ,Networking With @command{gawk}},
use only the
patterns printed in bold letters.

@multitable {12345678901234} {123456} {123456} {1234567} {1234567890123456789012345}
@item @sc{protocol} @tab @sc{local port} @tab @sc{host name}
@tab @sc{remote port} @tab @sc{Resulting connection-level behavior}
@item @strong{tcp} @tab @strong{0} @tab @strong{x} @tab @strong{x} @tab
      @strong{Dedicated client, fails if immediately connecting to a
              server on the other side fails}
@item udp      @tab 0 @tab x @tab x @tab Dedicated client
@item raw      @tab 0 @tab x @tab 0 @tab Dedicated client, works only as @code{root}
@item @strong{tcp, udp} @tab @strong{x} @tab @strong{x} @tab @strong{x} @tab
      @strong{Client, switches to dedicated server if necessary}
@item @strong{tcp, udp} @tab @strong{x} @tab @strong{0} @tab @strong{0} @tab
      @strong{Dedicated server}
@item raw      @tab 0 @tab 0 @tab 0 @tab Dedicated server, works only as @code{root}
@item tcp, udp, raw @tab x @tab x @tab 0 @tab Invalid
@item tcp, udp, raw @tab 0 @tab 0 @tab x @tab Invalid
@item tcp, udp, raw @tab x @tab 0 @tab x @tab Invalid
@item tcp, udp @tab 0 @tab 0 @tab 0 @tab Invalid
@item tcp, udp @tab 0 @tab x @tab 0 @tab Invalid
@item raw      @tab x @tab 0 @tab 0 @tab Invalid
@item raw      @tab 0 @tab x @tab x @tab Invalid
@item raw      @tab x @tab x @tab x @tab Invalid
@end multitable
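
The table can also be read as a small decision procedure.  The following
sketch encodes it as a function; the name @code{inet_role} is an assumption
made for this example, and the function merely restates the table rather
than reproducing @command{gawk}'s internal logic.  A field counts
as @samp{x} when it is anything other than @samp{0}:

@example
function inet_role(protocol, localport, hostname, remoteport,   l, h, r) @{
  l = (localport "" != "0")    # 1 if the field is "x", 0 if it is 0
  h = (hostname "" != "0")
  r = (remoteport "" != "0")
  if (protocol == "tcp" || protocol == "udp") @{
    if (!l && h && r)   return "Dedicated client"
    if (l && h && r)    return "Client, switches to dedicated server"
    if (l && !h && !r)  return "Dedicated server"
  @} else if (protocol == "raw") @{
    if (!l && h && !r)  return "Dedicated client, works only as root"
    if (!l && !h && !r) return "Dedicated server, works only as root"
  @}
  return "Invalid"             # every other combination
@}

BEGIN @{
  print inet_role("tcp", 0, "localhost", 8888)
  print inet_role("tcp", 8888, 0, 0)
  print inet_role("udp", 0, 0, 8888)
@}
@end example

The three calls report a dedicated client, a dedicated server, and an
invalid combination, respectively.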

In general, TCP is the preferred mechanism to use.  It is the simplest
protocol to understand and to use.  Use the others only if circumstances
demand low overhead.

@node Comparing Protocols, , Special File Fields, Gawk Special Files
@subsection Comparing Protocols

This @value{SECTION} develops a pair of programs (sender and receiver)
that do nothing but send a timestamp from one machine to another. The
sender and the receiver are implemented with each of the three protocols
available and demonstrate the differences between them.

@menu
* File /inet/tcp::              The TCP special file.
* File /inet/udp::              The UDP special file.
* File /inet/raw::              The RAW special file.
@end menu

@node File /inet/tcp, File /inet/udp, Comparing Protocols, Comparing Protocols
@subsubsection @file{/inet/tcp}
@cindex @code{/inet/tcp} special files (@command{gawk})
@cindex files, @code{/inet/tcp} (@command{gawk})
@cindex TCP (Transmission Control Protocol)
Once again, TCP is the protocol to use in most cases.
(Use UDP when low overhead is a necessity, and use RAW for
network experimentation.)
The first example is the sender
program:

@example
# Server
BEGIN @{
  print strftime() |& "/inet/tcp/8888/0/0"
  close("/inet/tcp/8888/0/0")
@}
@end example

The receiver is very simple:

@example
# Client
BEGIN @{
  "/inet/tcp/0/localhost/8888" |& getline
  print $0
  close("/inet/tcp/0/localhost/8888")
@}
@end example

TCP guarantees that the bytes arrive at the receiving end in exactly
the same order that they were sent. No byte is lost
(except for broken connections), doubled, or out of order. Some
overhead is necessary to accomplish this, but this is the price to pay for
a reliable service.
It does matter which side starts first. The sender/server has to be started
first, and it waits for the receiver to read a line.
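
The receiver above does not notice when the connection cannot be
established.  A slightly more defensive variant, sketched here under the
assumption that the sender from this @value{SECTION} listens on port 8888
of @samp{localhost}, checks the return value of @code{getline}, which is
greater than zero only when a line was actually read.  The variable names
are chosen for this example only:

@example
# Client with a basic error check
BEGIN @{
  Service = "/inet/tcp/0/localhost/8888"
  if ((Service |& getline Line) > 0)
    print Line
  else
    print "no data received from " Service > "/dev/stderr"
  close(Service)
@}
@end example

Keeping the special @value{FN} in a variable also avoids repeating the
string in the call to @code{close}.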

@node File /inet/udp, File /inet/raw, File /inet/tcp, Comparing Protocols
@subsubsection @file{/inet/udp}
@cindex @code{/inet/udp} special files (@command{gawk})
@cindex files, @code{/inet/udp} (@command{gawk})
@cindex UDP (User Datagram Protocol)
@cindex User Datagram Protocol, See UDP
The server and client programs that use UDP are almost identical to their TCP counterparts;
only the @var{protocol} has changed. As before, it does matter which side
starts first. The receiving side blocks and waits for the sender.
In this case, the receiver/client has to be started first:

@page
@example
# Server
BEGIN @{
  print strftime() |& "/inet/udp/8888/0/0"
  close("/inet/udp/8888/0/0")
@}
@end example

The receiver is almost identical to the TCP receiver:

@example
# Client
BEGIN @{
  "/inet/udp/0/localhost/8888" |& getline
  print $0
  close("/inet/udp/0/localhost/8888")
@}
@end example

UDP cannot guarantee that the datagrams at the receiving end will arrive in exactly
the same order they were sent. Some datagrams could be
lost, some doubled, and some out of order. In exchange, UDP avoids the
overhead that TCP needs to provide reliability. This unreliable behavior is good enough for tasks
such as data acquisition, logging, and even stateless services like NFS.

@node File /inet/raw, , File /inet/udp, Comparing Protocols
@subsubsection @file{/inet/raw}
@cindex @code{/inet/raw} special files (@command{gawk})
@cindex files, @code{/inet/raw} (@command{gawk})
@cindex RAW protocol

This is an IP-level protocol. Only @code{root} is allowed to access this
special file. It is meant to be the basis for implementing
and experimenting with transport-level protocols.@footnote{This special file
is reserved, but not otherwise currently implemented.}
In the most general case,
the sender has to supply the encapsulating header bytes in front of the
packet and the receiver has to strip the additional bytes from the message.

@cindex dark corner, RAW protocol
RAW receivers cannot receive packets sent with TCP or UDP because the
operating system does not deliver the packets to a RAW receiver. The
operating system knows about some of the protocols on top of IP
and decides on its own which packet to deliver to which process.
@value{DARKCORNER}
Therefore, the UDP receiver must be used for receiving UDP
datagrams sent with the RAW sender. This is a dark corner, not only of
@command{gawk}, but also of TCP/IP.

@cindex SPAK utility
For extended experimentation with protocols, look into
the approach implemented in a tool called SPAK.
This tool reflects the hierarchical layering of protocols (encapsulation)
in the way data streams are piped out of one program into the next one.
It shows which protocol is based on which other (lower-level) protocol
by looking at the command-line ordering of the program calls.
Cleverly thought out, SPAK is much better than @command{gawk}'s
@file{/inet} for learning the meaning of each and every bit in the
protocol headers.

The next example uses the RAW protocol to emulate
the behavior of UDP. The sender program is the same as above, but with some
additional bytes that fill the places of the UDP fields:

@example
@group
BEGIN @{
  Message = "Hello world\n"
  SourcePort = 0
  DestinationPort = 8888
  MessageLength = length(Message)+8
  RawService = "/inet/raw/0/localhost/0"
  printf("%c%c%c%c%c%c%c%c%s",
      SourcePort/256, SourcePort%256,
      DestinationPort/256, DestinationPort%256,
      MessageLength/256, MessageLength%256,
      0, 0, Message) |& RawService
  fflush(RawService)
  close(RawService)
@}
@end group
@end example

Since this program tries
to emulate the behavior of UDP, it checks if
the RAW sender is understood by the UDP receiver but not if the RAW receiver
can understand the UDP sender. In a real network, the
RAW receiver is hardly
of any use because it gets every IP packet that
comes across the network. There are usually so many packets that
@command{gawk} would be too slow for processing them.
Only on a network with little
traffic can the IP-level receiver program be tested. Programs for analyzing
IP traffic on modem or ISDN channels should be possible.

Port numbers do not have a meaning when using @file{/inet/raw}. Their fields
have to be @samp{0}. Only TCP and UDP use ports. Receiving data from
@file{/inet/raw} is difficult, not only because of processing speed but also
because data is usually binary and not restricted to ASCII. This
implies that line separation with @code{RS} does not work as usual.

@node TCP Connecting, Troubleshooting, Gawk Special Files, Using Networking
@section Establishing a TCP Connection

@c STARTOFRANGE tcpcon
@cindex TCP (Transmission Control Protocol), connection, establishing
@c STARTOFRANGE netcon
@cindex networks, @command{gawk} and, connections
@c STARTOFRANGE gawcon
@cindex @command{gawk}, networking, connections
Let's observe a network connection at work. Type in the following program
and watch the output. Within a second, it connects via TCP (@file{/inet/tcp})
to the machine it is running on (@samp{localhost}) and asks the service
@samp{daytime} on the machine what time it is:

@cindex @code{getline} command
@example
BEGIN @{
  "/inet/tcp/0/localhost/daytime" |& getline
  print $0
  close("/inet/tcp/0/localhost/daytime")
@}
@end example

Even experienced @command{awk} users will find the second line strange in two
respects:

@itemize @bullet
@item
A special file is used as a shell command that pipes its output
into @code{getline}. One would rather expect to see the special file
being read like any other file (@samp{getline <
"/inet/tcp/0/localhost/daytime"}).

@item
@cindex @code{|} (vertical bar), @code{|&} operator (I/O)
@cindex vertical bar (@code{|}), @code{|&} operator (I/O)
The operator @samp{|&} has not been part of any @command{awk}
implementation (until now).
It is actually the only extension of the @command{awk}
language needed (apart from the special files) to introduce network access.
@end itemize

@cindex pipes, networking and
The @samp{|&} operator was introduced in @command{gawk} 3.1 in order to
overcome the crucial restriction that access to files and pipes in
@command{awk} is always unidirectional. It was formerly impossible to use
both access modes on the same file or pipe. Instead of changing the whole
concept of file access, the @samp{|&} operator
behaves exactly like the usual pipe operator except for two additions:

@itemize @bullet
@item
Normal shell commands connected to their @command{gawk} program with a @samp{|&}
pipe can be accessed bidirectionally. The @samp{|&} turns out to be a quite
general, useful, and natural extension of @command{awk}.

@item
Pipes that consist of a special @value{FN} for network connections are not
executed as shell commands. Instead, they can be read and written to, just
like a full-duplex network connection.
@end itemize

In the earlier example, the @samp{|&} operator tells @code{getline}
to read a line from the special file @file{/inet/tcp/0/localhost/daytime}.
We could also have printed a line into the special file. But instead we just
read a line with the time, printed it, and closed the connection.
(While we could just let @command{gawk} close the connection by finishing
the program, in this @value{DOCUMENT}
we are pedantic and always explicitly close the connections.)
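
To illustrate printing into the special file as well as reading from it,
the following sketch talks to the @samp{echo} service, which sends back
each line it receives.  It assumes that this service is enabled on
@samp{localhost}, which is often not the case on current systems; the
message text and the variable names are arbitrary.  If the service is not
running, the connection attempt fails:

@example
BEGIN @{
  Service = "/inet/tcp/0/localhost/echo"
  print "Hello, echo" |& Service         # write a line to the connection
  if ((Service |& getline Reply) > 0)    # read the line sent back
    print Reply
  close(Service)
@}
@end example

Both directions use the same special @value{FN}, so a single
@code{close} ends the connection.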

@node Troubleshooting, Interacting, TCP Connecting, Using Networking
@section Troubleshooting Connection Problems
@cindex advanced features, network connections
@c last comma is part of secondary
@cindex troubleshooting, networks, connections
It may well be that for some reason the program shown in the previous example does not run on your
machine. When looking at possible reasons for this, you will learn much
about typical problems that arise in network programming. First of all,
your implementation of @command{gawk} may not support network access
because it is
a pre-3.1 version or you do not have a network interface in your machine.
Perhaps your machine uses some other protocol, such as
DECnet or Novell's IPX. For the rest of this @value{CHAPTER},
we will assume
you work on a Unix machine that supports TCP/IP. If the previous example program does
not run on your machine, it may help to replace the name
@samp{localhost} with the name of your machine or its IP address. If the
program does run, you could replace @samp{localhost} with the name of another machine
in your vicinity so that the program connects to that machine instead.
You should now see the date and time printed by the program;
otherwise, your machine may not support the @samp{daytime} service.
Try changing the service to @samp{chargen} or @samp{ftp}. This way, the program
connects to other services that should give you some response. If you are
curious, you should have a look at your @file{/etc/services} file. It could
look like this:

@ignore
@multitable {1234567890123} {1234567890123} {123456789012345678901234567890123456789012}
@item Service @strong{name} @tab Service @strong{number}
@item echo @tab 7/tcp @tab echo sends back each line it receives
@item echo @tab 7/udp @tab echo is good for testing purposes
@item discard @tab 9/tcp @tab discard behaves like @file{/dev/null}
@item discard @tab 9/udp @tab discard just throws away each line
@item daytime @tab 13/tcp @tab daytime sends date & time once per connection
@item daytime @tab 13/udp
@item chargen @tab 19/tcp @tab chargen infinitely produces character sets
@item chargen @tab 19/udp @tab chargen is good for testing purposes
@item ftp @tab 21/tcp @tab ftp is the usual file transfer protocol
@item telnet @tab 23/tcp @tab telnet is the usual login facility
@item smtp @tab 25/tcp @tab smtp is the Simple Mail Transfer Protocol
@item finger @tab 79/tcp @tab finger tells you who is logged in
@item www @tab 80/tcp @tab www is the HyperText Transfer Protocol
@item pop2 @tab 109/tcp @tab pop2 is an older version of pop3
@item pop2 @tab 109/udp
@item pop3 @tab 110/tcp @tab pop3 is the Post Office Protocol
@item pop3 @tab 110/udp @tab pop3 is used for receiving email
@item nntp @tab 119/tcp @tab nntp is the USENET News Transfer Protocol
@item irc @tab 194/tcp @tab irc is the Internet Relay Chat
@item irc @tab 194/udp
@end multitable
@end ignore

@smallexample
# /etc/services:
#
# Network services, Internet style
#
# Name     Number/Protocol  Alternate name # Comments

echo        7/tcp
echo        7/udp
discard     9/tcp         sink null
discard     9/udp         sink null
daytime     13/tcp
daytime     13/udp
chargen     19/tcp        ttytst source
chargen     19/udp        ttytst source
ftp         21/tcp
telnet      23/tcp
smtp        25/tcp        mail
finger      79/tcp
www         80/tcp        http      # WorldWideWeb HTTP
www         80/udp        # HyperText Transfer Protocol
pop-2       109/tcp       postoffice    # POP version 2
pop-2       109/udp
pop-3       110/tcp       # POP version 3
pop-3       110/udp
nntp        119/tcp       readnews untp  # USENET News
irc         194/tcp       # Internet Relay Chat
irc         194/udp
@dots{}
@end smallexample

@cindex Linux
@cindex GNU/Linux
@cindex Microsoft Windows, networking
Here, you find a list of services that traditional Unix machines usually
support. If your GNU/Linux machine does not do so, it may be that these
services are switched off in some startup script. Systems running some
flavor of Microsoft Windows usually do @emph{not} support these services.
Nevertheless, it @emph{is} possible to do networking with @command{gawk} on
Microsoft
Windows.@footnote{Microsoft preferred to ignore the TCP/IP
family of protocols until 1995. Then came the rise of the Netscape browser
as a landmark ``killer application.'' Microsoft added TCP/IP support and
their own browser to Microsoft Windows 95 at the last minute. They even back-ported
their TCP/IP implementation to Microsoft Windows for Workgroups 3.11, but it was
a rather rudimentary and half-hearted implementation. Nevertheless,
the equivalent of @file{/etc/services} resides under
@file{C:\WINNT\system32\drivers\etc\services} on Microsoft Windows 2000.}
The first column of the file gives the name of the service, and
the second column gives a unique number and the protocol that one can use to connect to
this service.
The rest of the line is treated as a comment.
You see that some services (@samp{echo}) support TCP as
well as UDP.
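
As a small illustration of this column layout, the following sketch lists
the TCP services found in a services file. It assumes the layout shown
above (name, then @samp{@var{number}/@var{protocol}}); the array name
@code{portproto} is introduced here only for the example. Run it with
@samp{gawk -f @var{script} /etc/services}:

@example
# skip comment lines and empty lines
/^[^#]/ && NF >= 2 @{
  split($2, portproto, "/")      # "13/tcp" -> "13" and "tcp"
  if (portproto[2] == "tcp")
    printf "%-12s %5d\n", $1, portproto[1]
@}
@end example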

@node Interacting, Setting Up, Troubleshooting, Using Networking
@section Interacting with a Network Service

The next program interacts with a network service by printing something
into the special file. It asks the
so-called @command{finger} service if a user of the machine is logged in. When
testing this program, try changing @samp{localhost} to
some other machine name in your local network:

@c system if test ! -d eg         ; then mkdir eg         ; fi
@c system if test ! -d eg/network ; then mkdir eg/network ; fi
@example
@c file eg/network/fingerclient.awk
BEGIN @{
  NetService = "/inet/tcp/0/localhost/finger"
  print "@var{name}" |& NetService
  while ((NetService |& getline) > 0)
    print $0
  close(NetService)
@}
@c endfile
@end example

After telling the service on the machine which user to look for,
the program repeatedly reads lines that come as a reply. When no more
lines are coming (because the service has closed the connection), the
program also closes the connection. Try replacing @code{"@var{name}"} with your
login name (or the name of someone else logged in).  For a list
of all users currently logged in, replace @var{name} with an empty string
(@code{""}).

@cindex Linux
@cindex GNU/Linux
The final @code{close} command could be safely deleted from
the above script, because the operating system closes any open connection
by default when a script reaches the end of execution. In order to avoid
portability problems, it is best to always close connections explicitly.
With the Linux kernel,
for example, proper closing results in flushing of buffers. Letting
the close happen by default may result in discarding buffers.

@ignore
@c Chuck comments that this seems out of place.  He's right.  I dunno
@c where to put it though.
@cindex @command{finger} utility
@cindex RFC 1288
In the early days of the Internet (up until about 1992), you could use
such a program to check if some user in another country was logged in on
a specific machine.
RFC 1288@footnote{@uref{http://www.cis.ohio-state.edu/htbin/rfc/rfc1288.html}}
provides the exact definition of the @command{finger} protocol.
Every contemporary Unix system also has a command named @command{finger},
which functions as a client for the protocol of the same name.
Still today, some people maintain simple information systems
with this ancient protocol. For example, by typing
@samp{finger quake@@seismo.unr.edu}
you get the latest @dfn{Earthquake Bulletin} for the state of Nevada.

@cindex Earthquake Bulletin
@smallexample
$ finger quake@@seismo.unr.edu

[@dots{}]

DATE-(UTC)-TIME   LAT    LON      DEP   MAG    COMMENTS
yy/mm/dd hh:mm:ss   deg.   deg.    km

98/12/14 21:09:22  37.47N 116.30W   0.0 2.3Md  76.4 km   S of WARM SPRINGS, NEVA
98/12/14 22:05:09  39.69N 120.41W  11.9 2.1Md  53.8 km WNW of RENO, NEVADA      
98/12/15 14:14:19  38.04N 118.60W   2.0 2.3Md  51.0 km   S of HAWTHORNE, NEVADA 
98/12/17 01:49:02  36.06N 117.58W  13.9 3.0Md  74.9 km  SE of LONE PINE, CALIFOR
98/12/17 05:39:26  39.95N 120.87W   6.2 2.6Md 101.6 km WNW of RENO, NEVADA      
98/12/22 06:07:42  38.68N 119.82W   5.2 2.3Md  50.7 km   S of CARSON CITY, NEVAD
@end smallexample

@noindent
This output from @command{finger} contains the time, location, depth,
magnitude, and a short comment about
the earthquakes registered in that region during the last 10 days.
In many places today the use of such services is restricted
because most networks have firewalls and proxy servers between them
and the Internet. Most firewalls are programmed to not let
@command{finger} requests go beyond the local network.

@cindex Coke machine
Another (ab)use of the @command{finger} protocol involves several Coke machines
that are connected to the Internet. There is a short list of such
Coke machines.@footnote{@uref{http://ca.yahoo.com/Computers_and_Internet/Internet/Devices_Connected_to_the_Internet/Soda_Machines/}}
You can access them either from the command-line or with a simple
@command{gawk} script. They usually tell you about the different
flavors of Coke and beer available there. If you have an account there,
you can even order some drink this way.
@end ignore

When looking at @file{/etc/services} you may have noticed that the
@samp{daytime} service is also available with @samp{udp}. In the earlier
example, change @samp{tcp} to @samp{udp},
and change @samp{finger} to @samp{daytime}.
After starting the modified program, you see the expected day and time message.
The program then hangs, because it waits for more lines coming from the
service. However, they never come. This behavior is a consequence of the
differences between TCP and UDP. When using UDP, neither party is
automatically informed about the other closing the connection.
Continuing to experiment this way reveals many other subtle
differences between TCP and UDP. To avoid such trouble, one should always
remember the advice Douglas E.@: Comer and David Stevens give in
Volume III of their series @cite{Internetworking With TCP/IP}
(page 14):

@cindex TCP (Transmission Control Protocol), UDP and
@cindex UDP (User Datagram Protocol), TCP and
@cindex Internet, See networks
@quotation
When designing client-server applications, beginners are strongly
advised to use TCP because it provides reliable, connection-oriented
communication. Programs only use UDP if the application protocol handles
reliability, the application requires hardware broadcast or multicast,
or the application cannot tolerate virtual circuit overhead.
@end quotation
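
If you want to experiment with UDP anyway, one way around the hang described
above is to read a single reply instead of looping. The following is only a
sketch: it assumes that the @samp{daytime} service answers over UDP on your
machine and that your @command{gawk} is recent enough to support read
timeouts through @code{PROCINFO}; the timeout of two seconds is an arbitrary
choice:

@example
BEGIN @{
  NetService = "/inet/udp/0/localhost/daytime"
  PROCINFO[NetService, "READ_TIMEOUT"] = 2000   # milliseconds
  print "" |& NetService         # any datagram triggers a reply
  if ((NetService |& getline) > 0)
    print $0                     # exactly one reply is read
  else
    print "no reply" > "/dev/stderr"
  close(NetService)
@}
@end example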

@node Setting Up, Email, Interacting, Using Networking
@section Setting Up a Service
@c last comma is part of tertiary
@cindex networks, @command{gawk} and, service, establishing
@c last comma is part of tertiary
@cindex @command{gawk}, networking, service, establishing
The preceding programs behaved as clients that connect to a server somewhere
on the Internet and request a particular service. Now we set up such a
service to mimic the behavior of the @samp{daytime} service.
Such a server does not know in advance who is going to connect to it over
the network. Therefore, we cannot insert a name for the host to connect to
in our special @value{FN}.

Start the following program in one window. Notice that the service does
not have the name @samp{daytime}, but the number @samp{8888}.
From looking at @file{/etc/services}, you know that names like @samp{daytime}
are just mnemonics for predetermined 16-bit integers.
Only the system administrator (@code{root}) could enter
our new service into @file{/etc/services} with an appropriate name.
Also notice that the service name has to be entered into a different field
of the special @value{FN} because we are setting up a server, not a client:

@cindex @command{finger} utility
@cindex servers
@example
BEGIN @{
  print strftime() |& "/inet/tcp/8888/0/0"
  close("/inet/tcp/8888/0/0")
@}
@end example

Now open another window on the same machine.
Copy the client program given as the first example 
(@pxref{TCP Connecting, ,Establishing a TCP Connection})
to a new file and edit it, changing the name @samp{daytime} to
@samp{8888}.  Then start the modified client.  You should get a reply
like this:

@example
Sat Sep 27 19:08:16 CEST 1997
@end example
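
For reference, the modified client might look like the following sketch. It
assumes that the original client has the form described earlier (a single
@code{getline} from the special file, followed by @code{print} and
@code{close}):

@example
BEGIN @{
  "/inet/tcp/0/localhost/8888" |& getline
  print $0
  close("/inet/tcp/0/localhost/8888")
@}
@end example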

@noindent
Both programs explicitly close the connection.

@c first comma is part of primary
@cindex Microsoft Windows, networking, ports
@cindex networks, ports, reserved
@cindex Unix, network ports and
Now we will intentionally make a mistake to see what happens when the name
@samp{8888} (the so-called port) is already used by another service.
Start the server
program in both windows. The first one works, but the second one
complains that it could not open the connection. Each port on a single
machine can only be used by one server program at a time. Now terminate the
server program and change the name @samp{8888} to @samp{echo}. After restarting it,
the server program does not run any more, and you know why: there is already
an @samp{echo} service running on your machine. But even if this isn't true,
you would not get
your own @samp{echo} server running on a Unix machine,
because the ports with numbers smaller
than 1024 (@samp{echo} is at port 7) are reserved for @code{root}.
On machines running some flavor of Microsoft Windows, there is no restriction
that reserves ports 1 to 1023 for a privileged user; hence, you can start
an @samp{echo} server there.

Turning this short server program into something really useful is simple.
Imagine a server that first reads a @value{FN} from the client through the
network connection, then does something with the file and
sends a result back to the client. The server-side processing
could be:

@example
BEGIN @{
  NetService = "/inet/tcp/8888/0/0"
  NetService |& getline       # sets $0 and the fields
  CatPipe    = ("cat " $1)
  while ((CatPipe | getline) > 0)
    print $0 |& NetService
  close(NetService)
@}
@end example

@noindent
and we would
have a remote copying facility. Such a server reads the name of a file
from any client that connects to it and transmits the contents of the
named file across the net. The server-side processing could also be
the execution of a command that is transmitted across the network. From this
example, you can see how simple it is to open up a security hole on your
machine. If you allow clients to connect to your machine and
execute arbitrary commands, anyone would be free to do @samp{rm -rf *}.
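
One way to narrow this particular hole is to validate what the client sends
and to avoid the shell altogether. The following sketch is not a complete
security measure; it merely assumes that only plain @value{FN}s in the
server's current directory should be served. The names @code{name} and
@code{line} are introduced for this illustration:

@example
BEGIN @{
  NetService = "/inet/tcp/8888/0/0"
  NetService |& getline name
  # accept only plain names: no slashes, no shell metacharacters
  if (name ~ /^[A-Za-z0-9_][A-Za-z0-9_.-]*$/) @{
    while ((getline line < name) > 0)
      print line |& NetService
    close(name)
  @} else
    print "request rejected" |& NetService
  close(NetService)
@}
@end example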

@node Email, Web page, Setting Up, Using Networking
@section Reading Email
@c @cindex RFC 1939
@c @cindex RFC 821
@cindex @command{gawk}, networking, See Also email
@cindex networks, @command{gawk} and, See Also email
@cindex POP (Post Office Protocol)
@cindex SMTP (Simple Mail Transfer Protocol)
@cindex Post Office Protocol (POP)
@cindex Simple Mail Transfer Protocol (SMTP)
The distribution of email is usually done by dedicated email servers that
communicate with your machine using special protocols. To receive email, we
will use the Post Office Protocol (POP).  Sending can be done with the much
older Simple Mail Transfer Protocol (SMTP).
@ignore
@footnote{RFC 1939 defines POP.
RFC 821 defines SMTP.  See
@uref{http://rfc.fh-koeln.de/doc/rfc/html/rfc.html, RFCs in HTML}.}
@end ignore

@cindex email
When you type in the following program, replace @var{emailhost} with the
name of your local email server. Ask your administrator if the server has a
POP service, and then use its name or number in the program below.
Now the program is ready to connect to your email server, but it will not
succeed in retrieving your mail because it does not yet know your login
name or password. Replace them in the program and it
shows you the first email the server has in store:

@example
BEGIN @{
  POPService  = "/inet/tcp/0/@var{emailhost}/pop3"
  RS = ORS = "\r\n"
  print "user @var{name}"            |& POPService
  POPService                    |& getline
  print "pass @var{password}"         |& POPService
  POPService                    |& getline
  print "retr 1"                |& POPService
  POPService                    |& getline
  if ($1 != "+OK") exit
  print "quit"                  |& POPService
  RS = "\r\n\\.\r\n"
  POPService |& getline
  print $0
  close(POPService)
@}
@end example

@c @cindex RFC 1939
@cindex record separators, POP and
@cindex @code{RS} variable, POP and
@cindex @code{ORS} variable, POP and
@cindex POP (Post Office Protocol)
The record separators @code{RS} and @code{ORS} are redefined because the
protocol (POP) requires CR-LF to separate lines. After identifying
yourself to the email service, the command @samp{retr 1} instructs the
service to send the first of all your email messages in line. If the service
replies with something other than @samp{+OK}, the program exits; maybe there
is no email. Otherwise, the program first announces that it intends to finish
reading email, and then redefines @code{RS} in order to read the entire
email as multiline input in one record. From the POP RFC, we know that the body
of the email always ends with a single line containing a single dot.
The program looks for this using @samp{RS = "\r\n\\.\r\n"}.
When it finds this sequence in the mail message, it quits.
You can invoke this program as often as you like; it does not delete the
message it reads, but instead leaves it on the server.
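
The effect of such a regular expression in @code{RS} can be tried without
any mail server. The following sketch reads its standard input with the same
record separator and reports the size of each record; the sample input
suggested below is made up for the illustration:

@example
BEGIN @{ RS = "\r\n\\.\r\n" @}
@{
  # everything up to the line holding a single dot is one record
  print "record " NR ": " length($0) " characters"
@}
@end example

Feeding it, for instance, the output of
@samp{printf 'line one\r\nline two\r\n.\r\n'} should produce a single record
that contains both lines.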

@node Web page, Primitive Service, Email, Using Networking
@section Reading a Web Page
@cindex web pages
@cindex HTTP (Hypertext Transfer Protocol)
@cindex Hypertext Transfer Protocol, See HTTP
@c @cindex RFC 2068
@c @cindex RFC 2616

Retrieving a web page from a web server is as simple as
retrieving email from an email server. We only have to use a
similar, but not identical, protocol and a different port. The name of the
protocol is HyperText Transfer Protocol (HTTP) and the port number is usually
80.  As in the preceding @value{SECTION}, ask your administrator about the
name of your local web server or proxy web server and its port number
for HTTP requests.

@ignore
@c Chuck says this stuff isn't necessary
More detailed information about HTTP can be found at
the home of the web protocols,@footnote{@uref{http://www.w3.org/pub/WWW/Protocols}}
including the specification of HTTP in RFC 2068. The protocol specification
in RFC 2068 is concise and you can get it for free. If you need more
explanation and you are willing to pay for a book, you might be
interested in one of these books:

@enumerate

@item
When we started writing web clients and servers with @command{gawk},
the only book available with details about HTTP was the one by Paul Hethmon
called
@cite{Illustrated Guide to HTTP}.@footnote{@uref{http://www.browsebooks.com/Hethmon/?882}}
Hethmon not only describes HTTP,
he also implements a simple web server in C++.

@item
Since July 2000, O'Reilly offers the book by Clinton Wong called
@cite{HTTP Pocket Reference}.@footnote{@uref{http://www.oreilly.com/catalog/httppr}}
It only has 75 pages but its
focus definitely is HTTP. This pocket reference is not a replacement
for the RFC, but I wish I had had it back in 1997 when I started writing
scripts to handle HTTP.

@item
Another small booklet about HTTP is the one by Toexcell Incorporated Staff,
ISBN 1-58348-270-9, called
@cite{Hypertext Transfer Protocol Http 1.0 Specifications}

@end enumerate
@end ignore

The following program employs a rather crude approach toward retrieving a
web page. It uses the prehistoric syntax of HTTP 0.9, which almost all
web servers still support. The most noticeable thing about it is that the
program directs the request to the local proxy server whose name you insert
in the special @value{FN} (which in turn calls @samp{www.yahoo.com}):

@example
BEGIN @{
  RS = ORS = "\r\n"
  HttpService = "/inet/tcp/0/@var{proxy}/80"
  print "GET http://www.yahoo.com"     |& HttpService
  while ((HttpService |& getline) > 0)
     print $0
  close(HttpService)
@}
@end example

@c @cindex RFC 1945
@cindex record separators, HTTP and
@cindex @code{RS} variable, HTTP and
@cindex @code{ORS} variable, HTTP and
@cindex HTTP (Hypertext Transfer Protocol), record separators and
@cindex HTML (Hypertext Markup Language)
@cindex Hypertext Markup Language (HTML)
Again, lines are separated by a redefined @code{RS} and @code{ORS}.
The @code{GET} request that we send to the server is the only kind of
HTTP request that existed when the web was created in the early 1990s.
HTTP calls this @code{GET} request a ``method,'' which tells the
service to transmit a web page (here the home page of the Yahoo! search
engine). Version 1.0 added the request methods @code{HEAD} and
@code{POST}. The current version of HTTP is 1.1,@footnote{Version 1.0 of
HTTP was defined in RFC 1945.  HTTP 1.1 was initially specified in RFC
2068. In June 1999, RFC 2068 was made obsolete by RFC 2616, an update
without any substantial changes.} and knows the additional request
methods @code{OPTIONS}, @code{PUT}, @code{DELETE}, and @code{TRACE}.
You can fill in any valid web address, and the program prints the
HTML code of that page to your screen.

Notice the similarity between the responses of the POP and HTTP
services. First, you get a header that is terminated by an empty line, and
then you get the body of the page in HTML.  The lines of the headers also
have the same form as in POP. There is the name of a parameter,
then a colon, and finally the value of that parameter.
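
To look at such a header by itself, stop reading at the first empty line.
The following sketch talks directly to a web server instead of a proxy and
uses an HTTP 1.0 request; the host name @samp{www.example.org} is merely a
placeholder, and whether a given server answers this way is an assumption:

@example
BEGIN @{
  RS = ORS = "\r\n"
  HttpService = "/inet/tcp/0/www.example.org/80"
  print "GET / HTTP/1.0"              |& HttpService
  print "Host: www.example.org" ORS   |& HttpService  # ends the request
  # print the status line and the header lines only
  while ((HttpService |& getline) > 0 && $0 != "")
    print $0
  close(HttpService)
@}
@end example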

@cindex CGI (Common Gateway Interface), dynamic web pages and
@cindex Common Gateway Interface, See CGI
@cindex GIF image format
@cindex PNG image format
@cindex images, retrieving over networks
Images (@file{.png} or @file{.gif} files) can also be retrieved this way,
but then you
get binary data that should be redirected into a file. Another
application is calling a CGI (Common Gateway Interface) script on some
server. CGI scripts are used when the contents of a web page are not
constant, but generated instantly at the moment you send a request
for the page. For example, to get a detailed report about the current
quotes of Motorola stock shares, call a CGI script at Yahoo! with
the following:

@example
get = "GET http://quote.yahoo.com/q?s=MOT&d=t"
print get |& HttpService
@end example

You can also request weather reports this way.
@ignore
@cindex Boutell, Thomas
A good book to go on with is
the
@cite{HTML Source Book}.@footnote{@uref{http://www.utoronto.ca/webdocs/HTMLdocs/NewHTML/book.html}}
There are also some books on CGI programming
like @cite{CGI Programming in C & Perl},
by Thomas Boutell@footnote{@uref{http://cseng.aw.com/bookdetail.qry?ISBN=0-201-42219-0&ptype=0}},
and @cite{The CGI Book}.@footnote{@uref{http://www.cgibook.com}}
Another good source is @cite{The CGI Resource Index}.@footnote{@uref{http://www.cgi-resources.com}}
@end ignore

@node Primitive Service, Interacting Service, Web page, Using Networking
@section A Primitive Web Service
@c STARTOFRANGE webser
@cindex web service
Now we know enough about HTTP to set up a primitive web service that just
says @code{"Hello, world"} when someone connects to it with a browser.
Compared
to the situation in the preceding @value{SECTION}, our program switches roles: it
now behaves like the server we observed earlier. Since we are setting
up a server here, we have to insert the port number in the @samp{localport}
field of the special @value{FN}. The other two fields (@var{hostname} and
@var{remoteport}) have to contain a @samp{0} because we do not know in
advance which host will connect to our service.

In the early 1990s, all a server had to do was send an HTML document and
close the connection. Here, we adhere to the modern syntax of HTTP.
The steps are as follows:

@enumerate 1
@item
Send a status line telling the web browser that everything
is okay.

@item
Send a line to tell the browser how many bytes follow in the
body of the message. This was not necessary earlier because both
parties knew that the document ended when the connection closed. Nowadays
it is possible to stay connected after the transmission of one web page.
This is to avoid the network traffic necessary for repeatedly establishing
TCP connections for requesting several images. Thus, there is the need to tell
the receiving party how many bytes will be sent. The header is terminated
as usual with an empty line.

@item
Send the @code{"Hello, world"} body
in HTML.
@end enumerate

The useless @code{while} loop swallows the request of the browser.
We could actually omit the loop, and on most machines the program would still
work.
First, start the following program:

@example
@c file eg/network/hello-serv.awk
BEGIN @{
  RS = ORS = "\r\n"
  HttpService = "/inet/tcp/8080/0/0"
  Hello = "<HTML><HEAD>" \
          "<TITLE>A Famous Greeting</TITLE></HEAD>" \
          "<BODY><H1>Hello, world</H1></BODY></HTML>"
  Len = length(Hello) + length(ORS)
  print "HTTP/1.0 200 OK"          |& HttpService
  print "Content-Length: " Len ORS |& HttpService
  print Hello                      |& HttpService
  while ((HttpService |& getline) > 0)
     continue;
  close(HttpService)
@}
@c endfile
@end example

Now, on the same machine, start your favorite browser and let it point to
@uref{http://localhost:8080} (the browser needs to know on which port
our server is listening for requests). If this does not work, the browser
probably tries to connect to a proxy server that does not know your machine.
If so, change the browser's configuration so that the browser does not try to
use a proxy to connect to your machine.

@node Interacting Service, Simple Server, Primitive Service, Using Networking
@section A Web Service with Interaction
@cindex @command{gawk}, web and, See web service
@cindex web browsers, See web service
@c comma is part of primary
@cindex HTTP server, core logic
@cindex servers, HTTP
@ifinfo
This node shows how to set up a simple web server.
The subnode is a library file that we will use with all the examples in
@ref{Some Applications and Techniques}.
@end ifinfo

@menu
* CGI Lib::                     A simple CGI library.
@end menu

Setting up a web service that allows user interaction is more difficult and
shows us the limits of network access in @command{gawk}. In this @value{SECTION},
we develop a main program (a @code{BEGIN} pattern and its action)
that will become the core of event-driven execution controlled by a
graphical user interface (GUI).
Each HTTP event that the user triggers by some action within the browser
is received in this central procedure. Parameters and menu choices are
extracted from this request, and an appropriate measure is taken according to
the user's choice.
For example:

@cindex HTTP server, core logic
@example
BEGIN @{
  if (MyHost == "") @{
     "uname -n" | getline MyHost
     close("uname -n")
  @}
  if (MyPort ==  0) MyPort = 8080
  HttpService = "/inet/tcp/" MyPort "/0/0"
  MyPrefix    = "http://" MyHost ":" MyPort
  SetUpServer()
  while ("awk" != "complex") @{
    # header lines are terminated this way
    RS = ORS = "\r\n"
    Status   = 200          # this means OK
    Reason   = "OK"
    Header   = TopHeader
    Document = TopDoc
    Footer   = TopFooter
    if        (GETARG["Method"] == "GET") @{
        HandleGET()
    @} else if (GETARG["Method"] == "HEAD") @{
        # not yet implemented
    @} else if (GETARG["Method"] != "") @{
        print "bad method", GETARG["Method"]
    @}
    Prompt = Header Document Footer
    print "HTTP/1.0", Status, Reason       |& HttpService
    print "Connection: Close"              |& HttpService
    print "Pragma: no-cache"               |& HttpService
    len = length(Prompt) + length(ORS)
    print "Content-length:", len           |& HttpService
    print ORS Prompt                       |& HttpService
    # ignore all the header lines
    while ((HttpService |& getline) > 0)
        ;
    # stop talking to this client
    close(HttpService)
    # wait for new client request
    HttpService |& getline          
    # do some logging
    print systime(), strftime(), $0
    # read request parameters
    CGI_setup($1, $2, $3)
  @}
@}
@end example

This web server presents menu choices in the form of HTML links.
Therefore, it has to tell the browser the name of the host it is
residing on. When starting the server, the user may supply the name
of the host from the command line with @samp{gawk -v MyHost="Rumpelstilzchen"}.
If the user does not do this, the server looks up the name of the host it is
running on for later use as a web address in HTML documents. The same
applies to the port number. These values are inserted later into the
HTML content of the web pages to refer to the home system.

Each server that is built around this core has to initialize some
application-dependent variables (such as the default home page) in a procedure
@code{SetUpServer}, which is called immediately before entering the
infinite loop of the server. For now, we will write an instance that
initiates a trivial interaction.  With this home page, the client user
can click on two possible choices, and receive the current date either
in human-readable format or in seconds since 1970:

@example
function SetUpServer() @{
  TopHeader = "<HTML><HEAD>"
  TopHeader = TopHeader \
     "<title>My name is GAWK, GNU AWK</title></HEAD>"
  TopDoc    = "<BODY><h2>\
    Do you prefer your date <A HREF=" MyPrefix \
    "/human>human</A> or \
    <A HREF=" MyPrefix "/POSIX>POSIXed</A>?</h2>" ORS ORS
  TopFooter = "</BODY></HTML>"
@}
@end example

On the first run through the main loop, the default line terminators are
set and the default home page is copied to the actual home page. Since this
is the first run, @code{GETARG["Method"]} is not initialized yet, hence the
case selection over the method does nothing. Now that the home page is
initialized, the server can start communicating with a client browser.

@c @cindex RFC 2068
It does so by printing the HTTP header into the network connection
(@samp{print @dots{} |& HttpService}). This command blocks execution of
the server script until a client connects. If this server
script is compared with the primitive one we wrote before, you will notice
two additional lines in the header. The first instructs the browser
to close the connection after each request. The second tells the
browser that it should never try to @emph{remember} earlier requests
that had identical web addresses (no caching). Otherwise, it could happen
that the browser retrieves the time of day in the previous example just once,
and later it takes the web page from the cache, always displaying the same
time of day although time advances each second.

Having supplied the initial home page to the browser with a valid document
stored in the parameter @code{Prompt}, it closes the connection and waits
for the next request.  When the request comes, a log line is printed that
allows us to see which request the server receives. The final step in the
loop is to call the function @code{CGI_setup}, which reads all the lines
of the request (coming from the browser), processes them, and stores the
transmitted parameters in the array @code{PARAM}. The complete
text of these application-independent functions can be found in
@ref{CGI Lib, ,A Simple CGI Library}.
For now, we use a simplified version of @code{CGI_setup}:

@example
function CGI_setup(method, uri, version,    i, j) @{
  delete GETARG;         delete MENU;        delete PARAM
  GETARG["Method"] = method
  GETARG["URI"] = uri
  GETARG["Version"] = version
  i = index(uri, "?")
  # is there a "?" indicating a CGI request?
@group
  if (i > 0) @{
    split(substr(uri, 1, i-1), MENU, "[/:]")
    split(substr(uri, i+1), PARAM, "&")
    for (i in PARAM) @{
      j = index(PARAM[i], "=")
      GETARG[substr(PARAM[i], 1, j-1)] = \
                                  substr(PARAM[i], j+1)
    @}
  @} else @{    # there is no "?", no need for splitting PARAMs
    split(uri, MENU, "[/:]")
  @}
@end group
@}
@end example

First, the function clears all variables used for
global storage of request parameters. The rest of the function serves
the purpose of filling the global parameters with the extracted new values.
To accomplish this, the name of the requested resource is split into
parts and stored for later evaluation. If the request contains a @samp{?},
then the request has CGI variables seamlessly appended to the web address.
Everything in front of the @samp{?} is split up into menu items, and
everything behind the @samp{?} is a list of @samp{@var{variable}=@var{value}} pairs
(separated by @samp{&}) that also need splitting. This way, CGI variables are
isolated and stored. This procedure does not recognize special characters
that are transmitted in coded form.@footnote{As defined in RFC 2068.} Here, any
optional request header and body parts are ignored. We do not need
header parameters and the request body. However, when refining our approach or
working with the @code{POST} and @code{PUT} methods, reading the header
and body
becomes inevitable. Header parameters should then be stored in a global
array as well as the body.
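
A minimal sketch of how header lines might be collected follows.
The function name @code{ReadHeader} and the array @code{HEADER} are
assumptions made for illustration; they are not part of the library
presented later. The sketch also assumes that @code{RS} is set to
@code{"\r\n"} and that @code{HttpService} names the open connection:

@example
# Hypothetical helper, not part of the CGI library.
# Reads header lines up to the first empty line and stores
# them in the global array HEADER, indexed by lowercase name.
function ReadHeader(   line, pos, name, value) @{
  delete HEADER
  while ((HttpService |& getline line) > 0) @{
    if (line == "")               # empty line ends the header
      break
    pos = index(line, ":")
    if (pos == 0)                 # not a "Name: value" line
      continue
    name  = tolower(substr(line, 1, pos-1))
    value = substr(line, pos+1)
    sub(/^[ \t]+/, "", value)     # strip leading whitespace
    HEADER[name] = value
  @}
@}
@end example

A request body, if present, follows the empty line and
would have to be read separately.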

On each subsequent run through the main loop, one request from a browser is
received, evaluated, and answered according to the user's choice. This can be
done by letting the value of the HTTP method guide the main loop into
execution of the procedure @code{HandleGET}, which evaluates the user's
choice. In this case, we have only one hierarchical level of menus,
but in the general case,
menus are nested.
The menu choices at each level are
separated by @samp{/}, just as in @value{FN}s. Notice how simple it is to
construct menus of arbitrary depth:

@example
function HandleGET() @{
  if (       MENU[2] == "human") @{
    Footer = strftime() TopFooter
  @} else if (MENU[2] == "POSIX") @{
    Footer = systime()  TopFooter
  @}
@}
@end example
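
With a second menu level, the same pattern simply nests. The following
sketch assumes hypothetical URIs such as @samp{/time/human} and
@samp{/time/POSIX}, so that the second-level choice arrives
in @code{MENU[3]}:

@example
# Sketch only: the "time" menu level is an assumption for illustration.
function HandleGET() @{
  if (MENU[2] == "time") @{            # first menu level
    if (MENU[3] == "human") @{         # second menu level
      Footer = strftime() TopFooter
    @} else if (MENU[3] == "POSIX") @{
      Footer = systime()  TopFooter
    @}
  @}
@}
@end example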

The disadvantage of this approach is that our server is slow and can
handle only one request at a time. Its main advantage, however, is that
the server
consists of just one @command{gawk} program. No need for installing an
@command{httpd}, and no need for static separate HTML files, CGI scripts, or
@code{root} privileges. This is rapid prototyping.
This program can be started on the same host that runs your browser.
Then let your browser point to @uref{http://localhost:8080}.

@cindex XBM image format
@cindex images, in web pages
@cindex web pages, images in
@cindex GNUPlot utility
It is also possible to include images in the HTML pages.
Most browsers support the little-known
@file{.xbm} format,
which can hold only
monochrome pictures but is a plain ASCII format. Binary images are possible but
not as easy to handle. Another way of including images is to generate them
with a tool such as GNUPlot,
by calling the tool with the @code{system} function or through a pipe.

@node CGI Lib, , Interacting Service, Interacting Service
@subsection A Simple CGI Library
@quotation
@i{HTTP is like being married: you have to be able to handle whatever
you're given, while being very careful what you send back.}@*
Phil Smith III,@*
@uref{http://www.netfunny.com/rhf/jokes/99/Mar/http.html}
@end quotation

@c STARTOFRANGE cgilib
@cindex CGI (Common Gateway Interface), library
In @ref{Interacting Service, ,A Web Service with Interaction},
we saw the function @code{CGI_setup} as part of the web server
``core logic'' framework. The code presented there handles almost
everything necessary for CGI requests.
One thing it doesn't do is handle encoded characters in the requests.
For example, an @samp{&} is encoded as a percent sign followed by
the hexadecimal value: @samp{%26}.  These encoded values should be
decoded.
Following is a simple library to perform these tasks.
This code is used for all web server examples
in the rest of this @value{DOCUMENT}.
If you want to use it for your own web server, store the source code
in a file named @file{inetlib.awk}. Then you can include
these functions in your code by placing the following statement
on the first line of your script:

@example
@@include inetlib.awk
@end example

@noindent
But beware, this mechanism is
only possible if you invoke your web server script with @command{igawk}
instead of the usual @command{awk} or @command{gawk}.
Here is the code:

@example
@c file eg/network/coreserv.awk
# CGI Library and core of a web server
@c endfile
@ignore
@c file eg/network/coreserv.awk
#
# Juergen Kahrs, Juergen.Kahrs@@vr-web.de
# with Arnold Robbins, arnold@@gnu.org
# September 2000

@c endfile
@end ignore
@c file eg/network/coreserv.awk
# Global arrays
#   GETARG --- arguments to CGI GET command
#   MENU   --- menu items (path names)
#   PARAM  --- parameters of form x=y

# Optional variable MyHost contains host address
# Optional variable MyPort contains port number
# Needs TopHeader, TopDoc, TopFooter
# Sets MyPrefix, HttpService, Status, Reason

BEGIN @{
  if (MyHost == "") @{
     "uname -n" | getline MyHost
     close("uname -n")
  @}
  if (MyPort ==  0) MyPort = 8080
  HttpService = "/inet/tcp/" MyPort "/0/0"
  MyPrefix    = "http://" MyHost ":" MyPort
  SetUpServer()
  while ("awk" != "complex") @{
    # header lines are terminated this way
    RS = ORS    = "\r\n"
    Status      = 200             # this means OK
    Reason      = "OK"
    Header      = TopHeader
    Document    = TopDoc
    Footer      = TopFooter
    if        (GETARG["Method"] == "GET") @{
        HandleGET()
    @} else if (GETARG["Method"] == "HEAD") @{
        # not yet implemented
    @} else if (GETARG["Method"] != "") @{
        print "bad method", GETARG["Method"]
    @}
    Prompt = Header Document Footer
    print "HTTP/1.0", Status, Reason     |& HttpService
    print "Connection: Close"            |& HttpService
    print "Pragma: no-cache"             |& HttpService
    len = length(Prompt) + length(ORS)
    print "Content-length:", len         |& HttpService
    print ORS Prompt                     |& HttpService
    # ignore all the header lines
    while ((HttpService |& getline) > 0) 
        continue
    # stop talking to this client
    close(HttpService)
    # wait for new client request
    HttpService |& getline
    # do some logging
    print systime(), strftime(), $0
    CGI_setup($1, $2, $3)
  @}
@}

function CGI_setup(method, uri, version,   i, j)
@{
    delete GETARG
    delete MENU
    delete PARAM
    GETARG["Method"] = method
    GETARG["URI"] = uri
    GETARG["Version"] = version

    i = index(uri, "?")
    if (i > 0) @{  # is there a "?" indicating a CGI request?
        split(substr(uri, 1, i-1), MENU, "[/:]")
        split(substr(uri, i+1), PARAM, "&")
        for (i in PARAM) @{
            PARAM[i] = _CGI_decode(PARAM[i])
            j = index(PARAM[i], "=")
            GETARG[substr(PARAM[i], 1, j-1)] = \
	                                 substr(PARAM[i], j+1)
        @}
    @} else @{ # there is no "?", no need for splitting PARAMs
        split(uri, MENU, "[/:]")
    @}
    for (i in MENU)     # decode characters in path
        if (i + 0 > 4)  # but not those in host name (numeric test)
            MENU[i] = _CGI_decode(MENU[i])
@}
@c endfile
@end example

This isolates details in a single function, @code{CGI_setup}.
Decoding of encoded characters is pushed off to a helper function,
@code{_CGI_decode}. The use of the leading underscore (@samp{_}) in
the function name is intended to indicate that it is an ``internal''
function, although there is nothing to enforce this:

@example
@c file eg/network/coreserv.awk
function _CGI_decode(str,   hexdigs, i, pre, code1, code2,
                            val, result)
@{
   hexdigs = "123456789abcdef"

   i = index(str, "%")
   if (i == 0) # no work to do
      return str

   do @{
      pre = substr(str, 1, i-1)   # part before %xx
      code1 = substr(str, i+1, 1) # first hex digit
      code2 = substr(str, i+2, 1) # second hex digit
      str = substr(str, i+3)      # rest of string

      code1 = tolower(code1)
      code2 = tolower(code2)
      val = index(hexdigs, code1) * 16 \
            + index(hexdigs, code2)

      result = result pre sprintf("%c", val)
      i = index(str, "%")
   @} while (i != 0)
   if (length(str) > 0)
      result = result str
   return result
@}
@c endfile
@end example

This works by splitting the string apart around an encoded character.
The two digits are converted to lowercase characters and looked up in a string
of hex digits.  Note that @code{0} is not in the string on purpose;
@code{index} returns zero when it's not found, automatically giving
the correct value!  Once the hexadecimal value is converted from
characters in a string into a numerical value, @code{sprintf}
converts the value back into a real character.
The following is a simple test harness for the above functions:

@example
@c file eg/network/testserv.awk
BEGIN @{
  CGI_setup("GET",
  "http://www.gnu.org/cgi-bin/foo?p1=stuff&p2=stuff%26junk" \
       "&percent=a %25 sign",
  "1.0")
  for (i in MENU)
      printf "MENU[\"%s\"] = %s\n", i, MENU[i]
  for (i in PARAM)
      printf "PARAM[\"%s\"] = %s\n", i, PARAM[i]
  for (i in GETARG)
      printf "GETARG[\"%s\"] = %s\n", i, GETARG[i]
@}
@c endfile
@end example

And this is the result when we run it:

@c artificial line wrap in last output line
@example
$ gawk -f testserv.awk 
@print{} MENU["4"] = www.gnu.org
@print{} MENU["5"] = cgi-bin
@print{} MENU["6"] = foo
@print{} MENU["1"] = http
@print{} MENU["2"] = 
@print{} MENU["3"] = 
@print{} PARAM["1"] = p1=stuff
@print{} PARAM["2"] = p2=stuff&junk
@print{} PARAM["3"] = percent=a % sign
@print{} GETARG["p1"] = stuff
@print{} GETARG["percent"] = a % sign
@print{} GETARG["p2"] = stuff&junk
@print{} GETARG["Method"] = GET
@print{} GETARG["Version"] = 1.0
@print{} GETARG["URI"] = http://www.gnu.org/cgi-bin/foo?p1=stuff&
p2=stuff%26junk&percent=a %25 sign
@end example
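
The digit lookup in @code{_CGI_decode} can also be checked in isolation.
The following short sketch (not part of the library) converts the two
hexadecimal digits of @samp{%26} by hand:

@example
BEGIN @{
  hexdigs = "123456789abcdef"
  # "2" is at position 2 and "6" at position 6 in hexdigs
  val = index(hexdigs, "2") * 16 + index(hexdigs, "6")
  printf "%d %c\n", val, val        # 2*16 + 6 = 38, the code of "&"
  print index(hexdigs, "0")         # "0" is not found, giving 0
@}
@end example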

@node Simple Server, Caveats, Interacting Service, Using Networking
@section A Simple Web Server
@c STARTOFRANGE webserx
@cindex web servers
@c STARTOFRANGE serweb
@cindex servers, web
In the preceding @value{SECTION}, we built the core logic for event-driven GUIs.
In this @value{SECTION}, we finally extend the core to a real application.
No one would actually write a commercial web server in @command{gawk}, but
it is instructive to see that it is feasible in principle.

@cindex ELIZA program
@cindex Weizenbaum, Joseph
The application is ELIZA, the famous program by Joseph Weizenbaum that
mimics the behavior of a professional psychotherapist when talking to you.
Weizenbaum would certainly object to this description, but this is part of
the legend around ELIZA.
Take the site-independent core logic and append the following code:

@example
@c file eg/network/eliza.awk
function SetUpServer() @{
  SetUpEliza()
  TopHeader = \
    "<HTML><title>An HTTP-based System with GAWK</title>\
    <HEAD><META HTTP-EQUIV=\"Content-Type\"\
    CONTENT=\"text/html; charset=iso-8859-1\"></HEAD>\
    <BODY BGCOLOR=\"#ffffff\" TEXT=\"#000000\"\
    LINK=\"#0000ff\" VLINK=\"#0000ff\"\
    ALINK=\"#0000ff\"> <A NAME=\"top\">"
  TopDoc    = "\
   <h2>Please choose one of the following actions:</h2>\
   <UL>\
   <LI>\
   <A HREF=" MyPrefix "/AboutServer>About this server</A>\
   </LI><LI>\
   <A HREF=" MyPrefix "/AboutELIZA>About Eliza</A></LI>\
   <LI>\
   <A HREF=" MyPrefix \
      "/StartELIZA>Start talking to Eliza</A></LI></UL>"
  TopFooter = "</BODY></HTML>"
@}
@c endfile
@end example

@code{SetUpServer} is similar to the previous example,
except for calling another function, @code{SetUpEliza}.
This approach can be used to implement other kinds of servers.
The only changes needed to do so are hidden in the functions
@code{SetUpServer} and @code{HandleGET}. Perhaps it might be necessary to
implement other HTTP methods.
The @command{igawk} program that comes with @command{gawk}
may be useful for this process.
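
As a minimal sketch of this idea, the following pair of functions turns
the core logic into a tiny clock service. The page contents and the menu
name @samp{now} are assumptions chosen for illustration:

@example
# Sketch of a different server built on the same core logic.
function SetUpServer() @{
  TopHeader = "<HTML><HEAD><TITLE>Clock</TITLE></HEAD><BODY>"
  TopDoc    = "<P><A HREF=" MyPrefix "/now>What time is it?</A></P>"
  TopFooter = "</BODY></HTML>"
@}

function HandleGET() @{
  if (MENU[2] == "now")          # request for /now
    Document = "<P>" strftime() "</P>" TopDoc
@}
@end example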

When extending this example to a complete application, the first
thing to do is to implement the function @code{SetUpServer} to
initialize the HTML pages and some variables. These initializations
determine the way your HTML pages look (colors, titles, menu
items, etc.).

The function @code{HandleGET} is a nested case selection that decides
which page the user wants to see next.  Each nesting level refers to a menu
level of the GUI. Each case implements a certain action of the menu. On the
deepest level of case selection, the handler essentially knows what the
user wants and stores the answer into the variable that holds the HTML
page contents:

@smallexample
@c file eg/network/eliza.awk
function HandleGET() @{
  # A real HTTP server would treat some parts of the URI as a file name.
  # We take parts of the URI as menu choices and go on accordingly.
  if(MENU[2] == "AboutServer") @{
    Document    = "This is not a CGI script.\
      This is an httpd, an HTML file, and a CGI script all \
      in one GAWK script. It needs no separate www-server, \
      no installation, and no root privileges.\
      <p>To run it, do this:</p><ul>\
      <li> start this script with \"gawk -f httpserver.awk\",</li>\
      <li> and on the same host let your www browser open location\
           \"http://localhost:8080\"</li>\
      </ul><p>Details of HTTP come from:</p><ul>\
            <li>Hethmon:  Illustrated Guide to HTTP</li>\
            <li>RFC 2068</li></ul><p>JK 14.9.1997</p>"
  @} else if (MENU[2] == "AboutELIZA") @{
    Document    = "This is an implementation of the famous ELIZA\
        program by Joseph Weizenbaum. It is written in GAWK."
  @} else if (MENU[2] == "StartELIZA") @{
    gsub(/\+/, " ", GETARG["YouSay"])
    # Here we also have to substitute coded special characters
    Document    = "<form method=GET>" \
      "<h3>" ElizaSays(GETARG["YouSay"]) "</h3>\
      <p><input type=text name=YouSay value=\"\" size=60>\
      <br><input type=submit value=\"Tell her about it\"></p></form>"
  @}
@}
@c endfile
@end smallexample

Now we are down to the heart of ELIZA, so you can see how it works.
When the user has not yet said anything, ELIZA resets its money
counter and asks the user to say openly what comes to mind.
The subsequent answers are converted to uppercase characters and stored for
later comparison. ELIZA presents the bill when being confronted with
a sentence that contains the phrase ``shut up.'' Otherwise, it looks for
keywords in the sentence, conjugates the rest of the sentence, remembers
the keyword for later use, and finally selects an answer from the set of
possible answers:

@smallexample
@c file eg/network/eliza.awk
function ElizaSays(YouSay) @{
  if (YouSay == "") @{
    cost = 0
    answer = "HI, IM ELIZA, TELL ME YOUR PROBLEM"
  @} else @{
    q = toupper(YouSay)
    gsub("'", "", q)
    if(q == qold) @{
      answer = "PLEASE DONT REPEAT YOURSELF !"
    @} else @{
      if (index(q, "SHUT UP") > 0) @{
        answer = "WELL, PLEASE PAY YOUR BILL. ITS EXACTLY ... $"\
                 int(100*rand()+30+cost/100)
      @} else @{
        qold = q
        w = "-"                 # no keyword recognized yet
        for (i in k) @{          # search for keywords
          if (index(q, i) > 0) @{
            w = i
            break
          @}
        @}
        if (w == "-") @{         # no keyword, take old subject
          w    = wold
          subj = subjold
        @} else @{                # find subject 
          subj = substr(q, index(q, w) + length(w)+1)
          wold = w
          subjold = subj        #  remember keyword and subject
        @}
        for (i in conj)
           gsub(i, conj[i], q)   # conjugation
        # from all answers to this keyword, select one randomly
        answer = r[indices[int(split(k[w], indices) * rand()) + 1]]
        # insert subject into answer
        gsub("_", subj, answer)
      @}
    @}
  @}
  cost += length(answer) # for later payment : 1 cent per character
  return answer
@}
@c endfile
@end smallexample

In the long but simple function @code{SetUpEliza}, you can see tables
for conjugation, keywords, and answers.@footnote{The version shown
here is abbreviated.  The full version comes with the @command{gawk}
distribution.} The associative array @code{k}
contains indices into the array of answers @code{r}. To choose an
answer, ELIZA just picks an index randomly:

@example
@c file eg/network/eliza.awk
function SetUpEliza() @{
  srand()
  wold = "-"
  subjold = " "

  # table for conjugation
  conj[" ARE "     ] = " AM "
  conj["WERE "     ] = "WAS "
  conj[" YOU "     ] = " I "
  conj["YOUR "     ] = "MY "
  conj[" IVE "     ] =\
  conj[" I HAVE "  ] = " YOU HAVE "
  conj[" YOUVE "   ] =\
  conj[" YOU HAVE "] = " I HAVE "
  conj[" IM "      ] =\
  conj[" I AM "    ] = " YOU ARE "
  conj[" YOURE "   ] =\
  conj[" YOU ARE " ] = " I AM "

  # table of all answers
  r[1]   = "DONT YOU BELIEVE THAT I CAN  _"
  r[2]   = "PERHAPS YOU WOULD LIKE TO BE ABLE TO _ ?"
@c endfile
  @dots{}
@end example
@ignore
@c file eg/network/eliza.awk
  r[3]   = "YOU WANT ME TO BE ABLE TO _ ?"
  r[4]   = "PERHAPS YOU DONT WANT TO _ "
  r[5]   = "DO YOU WANT TO BE ABLE TO _ ?"
  r[6]   = "WHAT MAKES YOU THINK I AM _ ?"
  r[7]   = "DOES IT PLEASE YOU TO BELIEVE I AM _ ?"
  r[8]   = "PERHAPS YOU WOULD LIKE TO BE _ ?"
  r[9]   = "DO YOU SOMETIMES WISH YOU WERE _ ?"
  r[10]  = "DONT YOU REALLY _ ?"
  r[11]  = "WHY DONT YOU _ ?"
  r[12]  = "DO YOU WISH TO BE ABLE TO _ ?"
  r[13]  = "DOES THAT TROUBLE YOU ?"
  r[14]  = "TELL ME MORE ABOUT SUCH FEELINGS"
  r[15]  = "DO YOU OFTEN FEEL _ ?"
  r[16]  = "DO YOU ENJOY FEELING _ ?"
  r[17]  = "DO YOU REALLY BELIEVE I DONT _ ?"
  r[18]  = "PERHAPS IN GOOD TIME I WILL _ "
  r[19]  = "DO YOU WANT ME TO _ ?"
  r[20]  = "DO YOU THINK YOU SHOULD BE ABLE TO _ ?"
  r[21]  = "WHY CANT YOU _ ?"
  r[22]  = "WHY ARE YOU INTERESTED IN WHETHER OR NOT I AM _ ?"
  r[23]  = "WOULD YOU PREFER IF I WERE NOT _ ?"
  r[24]  = "PERHAPS IN YOUR FANTASIES I AM _ "
  r[25]  = "HOW DO YOU KNOW YOU CANT _ ?"
  r[26]  = "HAVE YOU TRIED ?"
  r[27]  = "PERHAPS YOU CAN NOW _ "
  r[28]  = "DID YOU COME TO ME BECAUSE YOU ARE _ ?"
  r[29]  = "HOW LONG HAVE YOU BEEN _ ?"
  r[30]  = "DO YOU BELIEVE ITS NORMAL TO BE _ ?"
  r[31]  = "DO YOU ENJOY BEING _ ?"
  r[32]  = "WE WERE DISCUSSING YOU -- NOT ME"
  r[33]  = "Oh, I _"
  r[34]  = "YOU'RE NOT REALLY TALKING ABOUT ME, ARE YOU ?"
  r[35]  = "WHAT WOULD IT MEAN TO YOU, IF YOU GOT _ ?"
  r[36]  = "WHY DO YOU WANT _ ?"
  r[37]  = "SUPPOSE YOU SOON GOT _"
  r[38]  = "WHAT IF YOU NEVER GOT _ ?"
  r[39]  = "I SOMETIMES ALSO WANT _"
  r[40]  = "WHY DO YOU ASK ?"
  r[41]  = "DOES THAT QUESTION INTEREST YOU ?"
  r[42]  = "WHAT ANSWER WOULD PLEASE YOU THE MOST ?"
  r[43]  = "WHAT DO YOU THINK ?"
  r[44]  = "ARE SUCH QUESTIONS IN YOUR MIND OFTEN ?"
  r[45]  = "WHAT IS IT THAT YOU REALLY WANT TO KNOW ?"
  r[46]  = "HAVE YOU ASKED ANYONE ELSE ?"
  r[47]  = "HAVE YOU ASKED SUCH QUESTIONS BEFORE ?"
  r[48]  = "WHAT ELSE COMES TO MIND WHEN YOU ASK THAT ?"
  r[49]  = "NAMES DON'T INTEREST ME"
  r[50]  = "I DONT CARE ABOUT NAMES -- PLEASE GO ON"
  r[51]  = "IS THAT THE REAL REASON ?"
  r[52]  = "DONT ANY OTHER REASONS COME TO MIND ?"
  r[53]  = "DOES THAT REASON EXPLAIN ANYTHING ELSE ?"
  r[54]  = "WHAT OTHER REASONS MIGHT THERE BE ?"
  r[55]  = "PLEASE DON'T APOLOGIZE !"
  r[56]  = "APOLOGIES ARE NOT NECESSARY"
  r[57]  = "WHAT FEELINGS DO YOU HAVE WHEN YOU APOLOGIZE ?"
  r[58]  = "DON'T BE SO DEFENSIVE"
  r[59]  = "WHAT DOES THAT DREAM SUGGEST TO YOU ?"
  r[60]  = "DO YOU DREAM OFTEN ?"
  r[61]  = "WHAT PERSONS APPEAR IN YOUR DREAMS ?"
  r[62]  = "ARE YOU DISTURBED BY YOUR DREAMS ?"
  r[63]  = "HOW DO YOU DO ... PLEASE STATE YOUR PROBLEM"
  r[64]  = "YOU DON'T SEEM QUITE CERTAIN"
  r[65]  = "WHY THE UNCERTAIN TONE ?"
  r[66]  = "CAN'T YOU BE MORE POSITIVE ?"
  r[67]  = "YOU AREN'T SURE ?"
  r[68]  = "DON'T YOU KNOW ?"
  r[69]  = "WHY NO _ ?"
  r[70]  = "DON'T SAY NO, IT'S ALWAYS SO NEGATIVE"
  r[71]  = "WHY NOT ?"
  r[72]  = "ARE YOU SURE ?"
  r[73]  = "WHY NO ?"
  r[74]  = "WHY ARE YOU CONCERNED ABOUT MY _ ?"
  r[75]  = "WHAT ABOUT YOUR OWN _ ?"
  r[76]  = "CAN'T YOU THINK ABOUT A SPECIFIC EXAMPLE ?"
  r[77]  = "WHEN ?"
  r[78]  = "WHAT ARE YOU THINKING OF ?"
  r[79]  = "REALLY, ALWAYS ?"
  r[80]  = "DO YOU REALLY THINK SO ?"
  r[81]  = "BUT YOU ARE NOT SURE YOU _ "
  r[82]  = "DO YOU DOUBT YOU _ ?"
  r[83]  = "IN WHAT WAY ?"
  r[84]  = "WHAT RESEMBLANCE DO YOU SEE ?"
  r[85]  = "WHAT DOES THE SIMILARITY SUGGEST TO YOU ?"
  r[86]  = "WHAT OTHER CONNECTION DO YOU SEE ?"
  r[87]  = "COULD THERE REALLY BE SOME CONNECTIONS ?"
  r[88]  = "HOW ?"
  r[89]  = "YOU SEEM QUITE POSITIVE"
  r[90]  = "ARE YOU SURE ?"
  r[91]  = "I SEE"
  r[92]  = "I UNDERSTAND"
  r[93]  = "WHY DO YOU BRING UP THE TOPIC OF FRIENDS ?"
  r[94]  = "DO YOUR FRIENDS WORRY YOU ?"
  r[95]  = "DO YOUR FRIENDS PICK ON YOU ?"
  r[96]  = "ARE YOU SURE YOU HAVE ANY FRIENDS ?"
  r[97]  = "DO YOU IMPOSE ON YOUR FRIENDS ?"
  r[98]  = "PERHAPS YOUR LOVE FOR FRIENDS WORRIES YOU"
  r[99]  = "DO COMPUTERS WORRY YOU ?"
  r[100] = "ARE YOU TALKING ABOUT ME IN PARTICULAR ?"
  r[101] = "ARE YOU FRIGHTENED BY MACHINES ?"
  r[102] = "WHY DO YOU MENTION COMPUTERS ?"
  r[103] = "WHAT DO YOU THINK MACHINES HAVE TO DO WITH YOUR PROBLEMS ?"
  r[104] = "DON'T YOU THINK COMPUTERS CAN HELP PEOPLE ?"
  r[105] = "WHAT IS IT ABOUT MACHINES THAT WORRIES YOU ?"
  r[106] = "SAY, DO YOU HAVE ANY PSYCHOLOGICAL PROBLEMS ?"
  r[107] = "WHAT DOES THAT SUGGEST TO YOU ?"
  r[108] = "I SEE"
  r[109] = "IM NOT SURE I UNDERSTAND YOU FULLY"
  r[110] = "COME COME ELUCIDATE YOUR THOUGHTS"
  r[111] = "CAN YOU ELABORATE ON THAT ?"
  r[112] = "THAT IS QUITE INTERESTING"
  r[113] = "WHY DO YOU HAVE PROBLEMS WITH MONEY ?"
  r[114] = "DO YOU THINK MONEY IS EVERYTHING ?"
  r[115] = "ARE YOU SURE THAT MONEY IS THE PROBLEM ?"
  r[116] = "I THINK WE WANT TO TALK ABOUT YOU, NOT ABOUT ME"
  r[117] = "WHAT'S ABOUT ME ?"
  r[118] = "WHY DO YOU ALWAYS BRING UP MY NAME ?"
@c endfile
@end ignore

@example
@c file eg/network/eliza.awk
  # table for looking up answers that
  # fit to a certain keyword 
  k["CAN YOU"]      = "1 2 3"
  k["CAN I"]        = "4 5"
  k["YOU ARE"]      =\
  k["YOURE"]        = "6 7 8 9"
@c endfile
  @dots{}
@end example
@ignore
@c file eg/network/eliza.awk
  k["I DONT"]       = "10 11 12 13"
  k["I FEEL"]       = "14 15 16"
  k["WHY DONT YOU"] = "17 18 19"
  k["WHY CANT I"]   = "20 21"
  k["ARE YOU"]      = "22 23 24"
  k["I CANT"]       = "25 26 27"
  k["I AM"]         =\
  k["IM "]          = "28 29 30 31"
  k["YOU "]         = "32 33 34"
  k["I WANT"]       = "35 36 37 38 39"
  k["WHAT"]         =\
  k["HOW"]          =\
  k["WHO"]          =\
  k["WHERE"]        =\
  k["WHEN"]         =\
  k["WHY"]          = "40 41 42 43 44 45 46 47 48"
  k["NAME"]         = "49 50"
  k["CAUSE"]        = "51 52 53 54"
  k["SORRY"]        = "55 56 57 58"
  k["DREAM"]        = "59 60 61 62"
  k["HELLO"]        =\
  k["HI "]          = "63"
  k["MAYBE"]        = "64 65 66 67 68"
  k[" NO "]         = "69 70 71 72 73"
  k["YOUR"]         = "74 75"
  k["ALWAYS"]       = "76 77 78 79"
  k["THINK"]        = "80 81 82"
  k["LIKE"]         = "83 84 85 86 87 88 89"
  k["YES"]          = "90 91 92"
  k["FRIEND"]       = "93 94 95 96 97 98"
  k["COMPUTER"]     = "99 100 101 102 103 104 105"
  k["-"]            = "106 107 108 109 110 111 112"
  k["MONEY"]        = "113 114 115"
  k["ELIZA"]        = "116 117 118"
@c endfile
@end ignore
@example
@c file eg/network/eliza.awk
@}
@c endfile
@end example
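
The random selection can be tried out separately. This sketch copies
two answers and one keyword from the tables; the subject
@samp{SLEEP} is a sample value chosen for illustration:

@example
BEGIN @{
  srand()
  r[4] = "PERHAPS YOU DONT WANT TO _ "
  r[5] = "DO YOU WANT TO BE ABLE TO _ ?"
  k["CAN I"] = "4 5"
  n = split(k["CAN I"], indices)   # indices[1] = "4", indices[2] = "5"
  pick = int(n * rand()) + 1       # random integer from 1 to n
  answer = r[indices[pick]]
  gsub("_", "SLEEP", answer)       # insert a sample subject
  print answer
@}
@end example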

@cindex Humphrys, Mark
@cindex ELIZA program
Some interesting remarks and details (including the original source code
of ELIZA) are found on Mark Humphrys' home page.  Yahoo!  also has a
page with a collection of ELIZA-like programs. Many of them are written
in Java, some of them disclosing the Java source code, and a few even
explain how to modify the Java source code.

@node Caveats, Challenges, Simple Server, Using Networking
@section Network Programming Caveats

@cindex networks, @command{gawk} and, troubleshooting
@cindex @command{gawk}, networking, troubleshooting
@cindex troubleshooting, @command{gawk}, networks
By now it should be clear
that debugging a networked application is more
complicated than debugging a single-process single-hosted application.
The behavior of a networked application sometimes looks noncausal because
it is not reproducible in a strong sense. Whether a network application
works or not sometimes depends on the following:

@itemize @bullet
@item
How crowded the underlying network is

@item
Whether the party at the other end is running

@item
The state of the party at the other end
@end itemize

@cindex troubleshooting, networks, timeouts
The most difficult problems for a beginner arise from the hidden states of the
underlying network. After closing a TCP connection, it's often necessary to wait
a short while before reopening the connection. Even more difficult is the
establishment of a connection that previously ended with a ``broken pipe.''
Those connections have to ``time out'' for a minute or so
before they can reopen.
Check this with the command @samp{netstat -a}, which
provides a list of still ``active'' connections.

@node Challenges, , Caveats, Using Networking
@section Where To Go From Here

@cindex Loebner, Hugh
@cindex contest
Now, you have learned enough to build your own application. You could,
for example, take part in the
Loebner Contest
to win $100,000.  The Loebner Prize is the first formal instantiation of
a Turing Test. Hugh Loebner agreed with The Cambridge Center for Behavioral
Studies to underwrite a contest designed to implement the Turing Test.
Dr.@: Loebner pledged a Grand Prize of $100,000 for the first computer whose
responses were indistinguishable from a human's. Each year, a prize
of $2000 and a bronze medal are awarded to the @emph{most} human computer.
The winner of the annual contest is the best entry relative to other entries 
that year, irrespective of how good it is in an absolute sense. Here is
an example of a conversation with the winning program of 1997:

@smallexample
(c)1997 Cambridge Center For Behavioral Studies all rights reserved [CONVERSE]
[Intelligent Research Ltd.] Start at: [1997/04/29 08:59:40]
PROGRAM[10:39:20]Hi, my name is Catherine, what's yours?
JUDGE02[10:39:36]My name is Nathan.
PROGRAM[10:39:37]Well, it's nice to meet you, Nathan.
PROGRAM[10:39:45]Did you see that story on CNN last night about the lesbian
PROGRAM[10:40:03]couple who came out at a White House party on Sunday?
JUDGE02[10:40:45]No.  I'm sorry I missed that.  How did the Prez take it?
PROGRAM[10:40:47]Ellen Degeneres was one of them - she was kissing her lover
PROGRAM[10:41:08]in front of Bill Clinton. Do you think he was very
PROGRAM[10:41:21]embarrassed?
JUDGE02[10:41:37]You tell me.  I didn't see it.
PROGRAM[10:41:38]Are you still a Friend of Bill Clinton?
JUDGE02[10:42:22]Yes.  I would like to talk about something else.  Do you know anything about sculpture.
PROGRAM[10:42:24]Well, I suppose he's doing his best like most of us, but he
PROGRAM[10:42:35]does seem to attract an UNUSUAL amount of bad publicity,
PROGRAM[10:42:51]wouldnt you say--can it all be bad luck?
JUDGE02[10:43:20]I asked you if you knew anything about sculpture.
PROGRAM[10:43:21]OK, but what happens now--does it all drift on, with
PROGRAM[10:43:36]everyone waiting to see how much his friends have said about
PROGRAM[10:43:47]him under threat of lots and lots of jail-or will something
PROGRAM[10:44:01]dramatic actually happen?
JUDGE02[10:44:19]Do you know what sculpture is?
PROGRAM[10:44:20]Well, youre more a fan of his than I am I guess---though I
PROGRAM[10:44:34]cant SWEAR what Id have done in that locked hotel room, I
PROGRAM[10:44:51]must say.
JUDGE02[10:45:26]Have you ever been to the Salmagundi Club?
PROGRAM[10:45:29]Never. to go back to the original discussion Time to get
PROGRAM[10:45:43]off this one I think and onto something more down to earth!
JUDGE02[10:46:20]What is more down to earth than art?
PROGRAM[10:46:21]I don't know what -- to be honest. -
PROGRAM[10:46:30]I suppose it depends on where you're coming from, but as the
PROGRAM[10:46:46]song goes in My Fair Lady, "Why can't a woman be more like a
PROGRAM[10:47:03]man? "
JUDGE02[10:47:52]I liked that song but I prefer "Why can't children be more like we are"
JUDGE02[10:48:10]from another play the name of which I cannot remember.
PROGRAM[10:48:12]What, if anything, do you enjoy about being a man?
JUDGE02[10:49:01]I would enjoy all aspects of life no matter what my gender.
PROGRAM[10:49:02]Do you think of yourself as being attractive?
@end smallexample

@cindex Clinton, Bill
This program insists on always returning to the same story about Bill
Clinton. Evidently, even a program with a rather narrow mind can behave so
much like a human being that it can win this prize. It is quite common to
let these programs talk to each other via network connections. But during the
competition itself, the program and its computer have to be present at the
place the competition is held. We all would love to see a @command{gawk}
program win in such an event. Maybe it is up to you to accomplish this?

Some other ideas for useful networked applications:
@itemize @bullet
@item
Read the file @file{doc/awkforai.txt} in the @command{gawk} distribution.
It was written by Ronald P.@: Loui (Associate Professor of
Computer Science, at Washington University in St.@: Louis,
@email{loui@@ai.wustl.edu}) and summarizes why
he teaches @command{gawk} to students of Artificial Intelligence. Here are
some passages from the text:

@cindex AI
@cindex PROLOG
@cindex Loui, Ronald
@cindex agent
@quotation
The GAWK manual can
be consumed in a single lab session and the language can be mastered by
the next morning by the average student.  GAWK's automatic
initialization, implicit coercion, I/O support and lack of pointers
forgive many of the mistakes that young programmers are likely to make.
Those who have seen C but not mastered it are happy to see that GAWK
retains some of the same sensibilities while adding what must be
regarded as spoonsful of syntactic sugar.@*
@dots{}@*
@cindex robot
There are further simple answers.  Probably the best is the fact that
increasingly, undergraduate AI programming is involving the Web.  Oren
Etzioni (University of Washington, Seattle) has for a while been arguing
that the ``softbot'' is replacing the mechanical engineers' robot as the
most glamorous AI testbed.  If the artifact whose behavior needs to be
controlled in an intelligent way is the software agent, then a language
that is well-suited to controlling the software environment is the
appropriate language.  That would imply a scripting language.  If the
robot is KAREL, then the right language is ``turn left; turn right.'' If
the robot is Netscape, then the right language is something that can
generate @samp{netscape -remote 'openURL(http://cs.wustl.edu/~loui)'} with
elan.@*
@dots{}@*
AI programming requires high-level thinking.  There have always been a few
gifted programmers who can write high-level programs in assembly language.
Most however need the ambient abstraction to have a higher floor.@*
@dots{}@*
Second, inference is merely the expansion of notation.  No matter whether
the logic that underlies an AI program is fuzzy, probabilistic, deontic,
defeasible, or deductive, the logic merely defines how strings can be 
transformed into other strings.  A language that provides the best
support for string processing in the end provides the best support for
logic, for the exploration of various logics, and for most forms of
symbolic processing that AI might choose to call ``reasoning'' instead of
``logic.''  The implication is that PROLOG, which saves the AI programmer
from having to write a unifier, saves perhaps two dozen lines of GAWK
code at the expense of strongly biasing the logic and representational
expressiveness of any approach.
@end quotation

Now that @command{gawk} itself can connect to the Internet, it should be obvious
that it is suitable for writing intelligent web agents.

@item
@command{awk} is strong at pattern recognition and string processing.
So, it is well suited to the classic problem of language translation.
A first try could be a program that knows the 100 most frequent English
words and their counterparts in German or French. The service could be
implemented by regularly reading email with the program above, replacing
each word by its translation and sending the translation back via SMTP.
Users would send English email to their translation service and get
back a translated email message in return. As soon as this works,
more effort can be spent on a real translation program.

@item
Another dialogue-oriented application (on the verge
of ridicule) is the email ``support service.'' Troubled customers write an
email to an automatic @command{gawk} service that reads the email. It looks
for keywords in the mail and assembles a reply email accordingly. By carefully
investigating the email header, and repeating these keywords through the
reply email, it is rather simple to give the customer a feeling that
someone cares. Ideally, such a service would search a database of previous
cases for solutions. If no such database exists, one could be built, for
example, from all the newsgroups, mailing lists, and FAQs on the Internet.
@end itemize

@node Some Applications and Techniques, Links, Using Networking, Top
@comment node-name,    next,  previous,      up

@chapter Some Applications and Techniques
In this @value{CHAPTER}, we look at a number of self-contained
scripts, with an emphasis on concise networking.  Along the way, we
work towards creating building blocks that encapsulate often-needed
functions of the networking world, show new techniques that
broaden the scope of problems that can be solved with @command{gawk}, and
explore leading-edge technology that may shape the future of networking.

We often refer to the site-independent core of the server that
we built in
@ref{Simple Server, ,A Simple Web Server}.
When building new and nontrivial servers, we
always copy this building block and append new instances of the two
functions @code{SetUpServer} and @code{HandleGET}.

This makes a lot of sense, since
this scheme of event-driven
execution provides @command{gawk} with an interface to the most widely
accepted standard for GUIs: the web browser. Now, @command{gawk} can rival even
Tcl/Tk.

@cindex Tcl/Tk, @command{gawk} and
Tcl and @command{gawk} have much in common. Both are simple scripting languages
that allow us to quickly solve problems with short programs. But Tcl has Tk
on top of it, and @command{gawk} had nothing comparable up to now. While Tcl
needs a large and ever-changing library (Tk, which was bound to the X Window
System until recently), @command{gawk} needs just the networking interface
and some kind of browser on the client's side. Besides better portability,
the most important advantage of this approach (embracing well-established
standards such as HTTP and HTML) is that @emph{we do not need to change the
language}. We let others do the work of fighting over protocols and standards.
We can use HTML, JavaScript, VRML, or whatever else comes along to do our work.

@menu
* PANIC::                       An Emergency Web Server.
* GETURL::                      Retrieving Web Pages.
* REMCONF::                     Remote Configuration Of Embedded Systems.
* URLCHK::                      Look For Changed Web Pages.
* WEBGRAB::                     Extract Links From A Page.
* STATIST::                     Graphing A Statistical Distribution.
* MAZE::                        Walking Through A Maze In Virtual Reality.
* MOBAGWHO::                    A Simple Mobile Agent.
* STOXPRED::                    Stock Market Prediction As A Service.
* PROTBASE::                    Searching Through A Protein Database.
@end menu

@node PANIC, GETURL, Some Applications and Techniques, Some Applications and Techniques
@section PANIC: An Emergency Web Server
@cindex PANIC program
@cindex networks, See Also web pages
@cindex web service
At first glance, the @code{"Hello, world"} example in
@ref{Primitive Service, ,A Primitive Web Service},
seems useless. By adding just a few lines, we can turn it into something useful.

The PANIC program tells everyone who connects that the local
site is not working. When a web server breaks down, it makes a difference
if customers get a strange ``network unreachable'' message, or a short message
telling them that the server has a problem. In such an emergency,
the hard disk and everything on it (including the regular web service) may
be unavailable. Rebooting the web server off a diskette makes sense in this
setting.

To use the PANIC program as an emergency web server, all you need are the
@command{gawk} executable and the program below on a diskette. By default,
it listens on port 8080. A different port may be supplied on the
command line by assigning the variable @code{MyPort}:

@example
@c file eg/network/panic.awk
BEGIN @{
  RS = ORS = "\r\n"
  if (MyPort ==  0) MyPort = 8080
  HttpService = "/inet/tcp/" MyPort "/0/0"
  Hello = "<HTML><HEAD><TITLE>Out Of Service</TITLE>" \
     "</HEAD><BODY><H1>" \
     "This site is temporarily out of service." \
     "</H1></BODY></HTML>"
  Len = length(Hello) + length(ORS)
  while ("awk" != "complex") @{    # always true: serve forever
    print "HTTP/1.0 200 OK"          |& HttpService
    print "Content-Length: " Len ORS |& HttpService
    print Hello                      |& HttpService
    while ((HttpService |& getline) > 0)
       continue;
    close(HttpService)
  @}
@}
@c endfile
@end example
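
To try the emergency server without a browser, a small @command{gawk}
client can fetch its reply. The following is only a sketch: the file name
@file{panictest.awk} is our own invention, and the client assumes that
PANIC runs on the local machine. Because PANIC keeps the connection open
until the client closes it, the client stops reading as soon as it has
seen the end of the HTML text:

@example
# panictest.awk (hypothetical name): fetch the PANIC message once
BEGIN @{
  RS = "\r\n"                       # PANIC terminates lines with CR LF
  if (MyPort == 0) MyPort = 8080    # same default as PANIC
  Server = "/inet/tcp/0/localhost/" MyPort
  while ((Server |& getline Line) > 0) @{
    print Line
    if (Line ~ /<\/HTML>/) break    # last record of the reply
  @}
  close(Server)
@}
@end example

It might be run with @samp{gawk -f panictest.awk} while PANIC is
serving in another window.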

@node GETURL, REMCONF, PANIC, Some Applications and Techniques
@section GETURL: Retrieving Web Pages
@cindex GETURL program
@cindex web pages, retrieving
GETURL is a versatile building block for shell scripts that need to retrieve
files from the Internet. It takes a web address as a command-line parameter and
tries to retrieve the contents of this address. The contents are printed
to standard output, while the header is printed to @file{/dev/stderr}.
A surrounding shell script
could analyze the contents and extract the text or the links. An ASCII
browser could be written around GETURL. But more interestingly, web robots are
straightforward to write on top of GETURL. On the Internet, you can find
several programs of the same name that do the same job. They are usually
much more complex internally and at least 10 times longer.

First, GETURL checks if it was called with exactly one web address.
Then, it checks if the user chose to use a special proxy server whose name
is handed over in a variable. By default, it is assumed that the local
machine serves as proxy. GETURL uses the @code{GET} method by default
to access the web page. By handing over the name of a different method
(such as @code{HEAD}), it is possible to choose a different behavior. With
the @code{HEAD} method, the user does not receive the body of the page
content, but does receive the header:

@example
@c file eg/network/geturl.awk
BEGIN @{
  if (ARGC != 2) @{
    print "GETURL - retrieve Web page via HTTP 1.0"
    print "IN:\n    the URL as a command-line parameter"
    print "PARAM(S):\n    -v Proxy=MyProxy"
    print "OUT:\n    the page content on stdout"
    print "    the page header on stderr"
    print "JK 16.05.1997"
    print "ADR 13.08.2000"
    exit
  @}
  URL = ARGV[1]; ARGV[1] = ""
  if (Proxy     == "")  Proxy     = "127.0.0.1"
  if (ProxyPort ==  0)  ProxyPort = 80
  if (Method    == "")  Method    = "GET"
  HttpService = "/inet/tcp/0/" Proxy "/" ProxyPort
  ORS = RS = "\r\n\r\n"
  print Method " " URL " HTTP/1.0" |& HttpService
  HttpService                      |& getline Header
  print Header > "/dev/stderr"
  while ((HttpService |& getline) > 0)
    printf "%s", $0
  close(HttpService)
@}
@c endfile
@end example

This program can be changed as needed, but be careful with the last lines.
Make sure transmission of binary data is not corrupted by additional line
breaks. Even as it is now, the byte sequence @code{"\r\n\r\n"} would
disappear if it were contained in binary data. Don't get caught in a
trap when trying a quick fix on this one.
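
One possible direction, offered only as a sketch and not as a tested fix
for GETURL, relies on @command{gawk}'s @code{RT} variable, which holds the
text that matched @code{RS}. Printing @code{RT} after each record writes
the separator back. The stand-alone filter below is not part of GETURL;
it only demonstrates that records split at @code{"\r\n\r\n"} can be
copied unchanged:

@example
# Copy standard input to standard output, although the
# records are split at "\r\n\r\n".
BEGIN @{ RS = "\r\n\r\n" @}
# RT is empty only if the input ended without a separator
@{ printf "%s%s", $0, RT @}
@end example

Binary data may still need further care, for example the @code{BINMODE}
variable on systems that distinguish text files from binary files.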

@node REMCONF, URLCHK, GETURL, Some Applications and Techniques
@section REMCONF: Remote Configuration of Embedded Systems
@cindex REMCONF program
@cindex Linux
@cindex GNU/Linux
@cindex Yahoo!
Today, you often find powerful processors in embedded systems.  Dedicated
network routers and controllers for all kinds of machinery are examples
of embedded systems. Processors like the Intel 80x86 or the AMD Elan are
able to run multitasking operating systems, such as XINU or GNU/Linux
in embedded PCs.  These systems are small and usually do not have
a keyboard or a display.  Therefore it is difficult to set up their
configuration. There are several widespread ways to set them up:

@itemize @bullet
@item
DIP switches

@item
Read Only Memories such as EPROMs

@item
Serial lines or some kind of keyboard

@item
Network connections via @command{telnet} or SNMP

@item
HTTP connections with HTML GUIs 
@end itemize

In this @value{SECTION}, we look at a solution that uses HTTP connections
to control variables of an embedded system that are stored in a file.
Since embedded systems have tight limits on resources like memory,
it is difficult to employ advanced techniques such as SNMP and HTTP
servers. @command{gawk} fits in quite nicely with its single executable
which needs just a short script to start working.
The following program stores the variables in a file, and a concurrent
process in the embedded system may read the file. The program uses the
site-independent part of the simple web server that we developed in
@ref{Interacting Service, ,A Web Service with Interaction}.
As mentioned there, all we have to do is to write two new procedures
@code{SetUpServer} and @code{HandleGET}:

@smallexample
@c file eg/network/remconf.awk
function SetUpServer() @{
  TopHeader = "<HTML><title>Remote Configuration</title>"
  TopDoc = "<BODY>\
    <h2>Please choose one of the following actions:</h2>\
    <UL>\
      <LI><A HREF=" MyPrefix "/AboutServer>About this server</A></LI>\
      <LI><A HREF=" MyPrefix "/ReadConfig>Read Configuration</A></LI>\
      <LI><A HREF=" MyPrefix "/CheckConfig>Check Configuration</A></LI>\
      <LI><A HREF=" MyPrefix "/ChangeConfig>Change Configuration</A></LI>\
      <LI><A HREF=" MyPrefix "/SaveConfig>Save Configuration</A></LI>\
    </UL>"
  TopFooter  = "</BODY></HTML>"
  if (ConfigFile == "") ConfigFile = "config.asc"
@}
@c endfile
@end smallexample

The function @code{SetUpServer} initializes the top level HTML texts
as usual. It also initializes the name of the file that contains the
configuration parameters and their values. If the user supplies
a name on the command line (in the variable @code{ConfigFile}), that name
is used; otherwise, the default is @file{config.asc}. The file is expected to
contain one parameter per line, with the name of the parameter in
column one and the value in column two.
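
To illustrate the file format, here is a sketch of how a concurrent
process might read such a file. The parameter names in the comment are
invented for this example, and the script is not part of REMCONF:

@example
# Hypothetical contents of config.asc:
#   hostname  router1
#   baudrate  9600
BEGIN @{
  if (ConfigFile == "") ConfigFile = "config.asc"
  while ((getline < ConfigFile) > 0)
    config[$1] = $2        # column one: name, column two: value
  close(ConfigFile)
  for (name in config)
    print name " = " config[name]
@}
@end example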

The function @code{HandleGET} reflects the structure of the menu
tree as usual. The first menu choice tells the user what this is all
about. The second choice reads the configuration file line by line
and stores the parameters and their values. Notice that the record
separator for this file is @code{"\n"}, in contrast to the record separator
for HTTP. The third menu choice builds an HTML table to show
the contents of the configuration file just read. The fourth choice
does the real work of changing parameters, and the last one just saves
the configuration into a file:

@smallexample
@c file eg/network/remconf.awk
function HandleGET() @{
  if(MENU[2] == "AboutServer") @{
    Document  = "This is a GUI for remote configuration of an\
      embedded system. It is implemented as one GAWK script."
  @} else if (MENU[2] == "ReadConfig") @{
    RS = "\n"
    while ((getline < ConfigFile) > 0)
       config[$1] = $2;
    close(ConfigFile)
    RS = "\r\n"
    Document = "Configuration has been read."
  @} else if (MENU[2] == "CheckConfig") @{
    Document = "<TABLE BORDER=1 CELLPADDING=5>"
    for (i in config)
      Document = Document "<TR><TD>" i "</TD>" \
        "<TD>" config[i] "</TD></TR>"
    Document = Document "</TABLE>"
  @} else if (MENU[2] == "ChangeConfig") @{
    if ("Param" in GETARG) @{            # any parameter to set?
      if (GETARG["Param"] in config) @{  # is  parameter valid?
        config[GETARG["Param"]] = GETARG["Value"]
        Document = (GETARG["Param"] " = " GETARG["Value"] ".")
      @} else @{
        Document = "Parameter <b>" GETARG["Param"] "</b> is invalid." 
      @}
    @} else @{
      Document = "<FORM method=GET><h4>Change one parameter</h4>\
        <TABLE BORDER CELLPADDING=5>\
        <TR><TD>Parameter</TD><TD>Value</TD></TR>\
        <TR><TD><input type=text name=Param value=\"\" size=20></TD>\
            <TD><input type=text name=Value value=\"\" size=40></TD>\
        </TR></TABLE><input type=submit value=\"Set\"></FORM>"
    @}
  @} else if (MENU[2] == "SaveConfig") @{
    for (i in config)
      printf("%s %s\n", i, config[i]) > ConfigFile
    close(ConfigFile)
    Document = "Configuration has been saved."
  @}
@}
@c endfile
@end smallexample

@cindex MiniSQL
We could also view the configuration file as a database. From this
point of view, the previous program acts like a primitive database server.
Real SQL database systems also make a service available by providing
a TCP port that clients can connect to. But the application level protocols
they use are usually proprietary and also change from time to time.
This is also true for the protocol that
MiniSQL uses.

@node URLCHK, WEBGRAB, REMCONF, Some Applications and Techniques
@section URLCHK: Look for Changed Web Pages
@cindex URLCHK program
Most people who make heavy use of Internet resources have a large
bookmark file with pointers to interesting web sites. It is impossible
to regularly check by hand if any of these sites have changed. A program
is needed to automatically look at the headers of web pages and tell
which ones have changed. URLCHK does the comparison after using GETURL
with the @code{HEAD} method to retrieve the header.

Like GETURL, this program first checks that it is called with exactly
one command-line parameter. URLCHK also takes the same command-line variables
@code{Proxy} and @code{ProxyPort} as GETURL,
because these variables are handed over to GETURL for each URL
that gets checked. The one and only parameter is the name of a file that
contains one line for each URL. In the first column, we find the URL, and
the second and third columns hold the length of the URL's body as found by
the last two checks. Now, we follow this plan:

@enumerate
@item
Read the URLs from the file and remember their most recent lengths

@item
Delete the contents of the file

@item
For each URL, check its new length and write it into the file

@item
If the most recent and the new length differ, tell the user
@end enumerate

It may seem a bit peculiar to read the URLs from a file together
with their two most recent lengths, but this approach has several
advantages. You can call the program again and again with the same
file. After running the program, you can regenerate the changed URLs
by extracting those lines that differ in their second and third columns:

@c inspired by URLCHK in iX 5/97 166.
@smallexample
@c file eg/network/urlchk.awk
BEGIN @{
  if (ARGC != 2) @{
    print "URLCHK - check if URLs have changed"
    print "IN:\n    the file with URLs as a command-line parameter"
    print "    file contains URL, old length, new length"
    print "PARAMS:\n    -v Proxy=MyProxy -v ProxyPort=8080"
    print "OUT:\n    same as file with URLs"
    print "JK 02.03.1998"
    exit
  @}
  URLfile = ARGV[1]; ARGV[1] = ""
  if (Proxy     != "") Proxy     = " -v Proxy="     Proxy
  if (ProxyPort != "") ProxyPort = " -v ProxyPort=" ProxyPort
  while ((getline < URLfile) > 0)
     Length[$1] = $3 + 0
  close(URLfile)      # now, URLfile is read in and can be updated
  GetHeader = "gawk " Proxy ProxyPort " -v Method=\"HEAD\" -f geturl.awk "
  for (i in Length) @{
    NewLength = 0    # do not reuse the previous URL's length
    GetThisHeader = GetHeader i " 2>&1"
    while ((GetThisHeader | getline) > 0)
      if (toupper($0) ~ /CONTENT-LENGTH/) NewLength = $2 + 0
    close(GetThisHeader)
    print i, Length[i], NewLength > URLfile
    if (Length[i] != NewLength)  # report only changed URLs
      print i, Length[i], NewLength
  @}
  close(URLfile)
@}
@c endfile
@end smallexample
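
The extraction of changed URLs mentioned above can be sketched as a
one-rule filter. It assumes the three-column layout of the URL file;
the name @file{changed.awk} is only an illustration:

@example
# changed.awk (hypothetical name): list URLs whose length changed
$2 != $3 @{ print $1 @}
@end example

The filter would be run on the same file that was handed to URLCHK.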

Another thing that may look strange is the way GETURL is called.
Before calling GETURL, we have to check if the proxy variables need
to be passed on. If so, we prepare strings that will become part
of the command line later. In @code{GetHeader}, we store these strings
together with the longest part of the command line. Later, in the loop
over the URLs, the URL and a redirection operator are appended to
@code{GetHeader} to form the command that reads the URL's header over the
Internet. GETURL always writes the header to @file{/dev/stderr}. That is
why we need the redirection operator to have the header piped in.

This program is not perfect because it assumes that a changed page
always has a changed length, which is not necessarily true. A more
advanced approach is to look at some other header line that
holds time information. But, as always when things get a bit more
complicated, this is left as an exercise to the reader.
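
As a starting point for that exercise, the following sketch extracts the
@samp{Last-Modified} header line, which many (but not all) servers send.
It is a stand-alone filter with a made-up name, @file{lastmod.awk}, and it
expects a header such as the one GETURL writes to standard error:

@example
# lastmod.awk (hypothetical name): print the Last-Modified value
BEGIN @{ RS = "\r?\n" @}            # accept CR LF or plain LF
toupper($1) == "LAST-MODIFIED:" @{
  sub(/^[^:]*:[ \t]*/, "")         # strip the header name
  print
  exit
@}
@end example

It could be fed with the output of the command that URLCHK builds in
@code{GetThisHeader}, and the value compared as a string with the one
stored from the previous run.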

@node WEBGRAB, STATIST, URLCHK, Some Applications and Techniques
@section WEBGRAB: Extract Links from a Page
@cindex WEBGRAB program
@c Inspired by iX 1/98 157.
@cindex robot
Sometimes it is necessary to extract links from web pages.
Browsers do it, web robots do it, and sometimes even humans do it.
Since we have a tool like GETURL at hand, we can solve this problem with
some help from the Bourne shell:

@example
@c file eg/network/webgrab.awk
BEGIN @{ RS = "http://[#%&\\+\\-\\./0-9\\:;\\?A-Z_a-z\\~]*" @}
RT != "" @{
   command = ("gawk -v Proxy=MyProxy -f geturl.awk " RT \
               " > doc" NR ".html")
   print command
@}
@c endfile
@end example

Notice that the regular expression for URLs is rather crude. A precise
regular expression is much more complex. But this one works
rather well. One problem is that it is unable to find internal links of
an HTML document.  Another problem is that
@samp{ftp}, @samp{telnet}, @samp{news}, @samp{mailto}, and other kinds
of links are missing in the regular expression.
However, it is straightforward to add them, if doing so is necessary for other tasks.

This program reads an HTML file and prints all the HTTP links that it finds.
It relies on @command{gawk}'s ability to use regular expressions as record
separators. With @code{RS} set to a regular expression that matches links,
the second action is executed each time a non-empty link is found.
We can find the matching link itself in @code{RT}.
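
To see the mechanism in isolation, the following sketch prints only the
links themselves. As an illustration of how other kinds of links might be
added, it also accepts @samp{ftp} URLs; this extension is our assumption
and is just as crude as the original expression:

@example
BEGIN @{ RS = "(http|ftp)://[#%&\\+\\-\\./0-9\\:;\\?A-Z_a-z\\~]*" @}
RT != "" @{ print RT @}    # RT holds the text that matched RS
@end example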

The action could use the @code{system} function to let another GETURL
retrieve the page, but here we use a different approach.
This simple program prints shell commands that can be piped into @command{sh}
for execution.  This way it is possible to first extract
the links, wrap shell commands around them, and redirect all the shell
commands into a file. After editing the file, execution of the file retrieves
exactly those files that we really need. In case we do not want to edit,
we can retrieve all the pages like this:

@smallexample
gawk -f geturl.awk http://www.suse.de | gawk -f webgrab.awk | sh
@end smallexample

@cindex Microsoft Windows
After this, you will find the contents of all referenced documents in
files named @file{doc*.html} even if they do not contain HTML code.
The most annoying thing is that we always have to pass the proxy to
GETURL. If you do not like to see the headers of the web pages
appear on the screen, you can redirect them to @file{/dev/null}.
Watching the headers appear can be quite interesting, because
it reveals details such as which web server the companies use.
Now, it is clear how the clever marketing people use web robots to
determine the market shares of Microsoft and Netscape in the web
server market.

Port 80 of any web server is like a small hole in a repellent firewall.
After attaching a browser to port 80, we usually catch a glimpse
of the bright side of the server (its home page). With a tool like GETURL
at hand, we are able to discover some of the more concealed
or even ``indecent'' services (i.e., lacking conformity to standards of quality).
It can be exciting to see the fancy CGI scripts that lie
there, revealing the inner workings of the server, ready to be called:

@itemize @bullet
@item
With a command such as:

@example
gawk -f geturl.awk http://any.host.on.the.net/cgi-bin/
@end example

some servers give you a directory listing of the CGI files.
Knowing the names, you can try to call some of them and watch
for useful results. Sometimes there are executables in such directories
(such as Perl interpreters) that you may call remotely. If there are
subdirectories with configuration data of the web server, this can also
be quite interesting to read.

@item
@cindex apache
The well-known Apache web server usually has its CGI files in the
directory @file{/cgi-bin}. There you can often find the scripts
@file{test-cgi} and @file{printenv}. Both tell you some things
about the current connection and the installation of the web server.
Just call:

@smallexample
gawk -f geturl.awk http://any.host.on.the.net/cgi-bin/test-cgi
gawk -f geturl.awk http://any.host.on.the.net/cgi-bin/printenv
@end smallexample

@item
Sometimes it is even possible to retrieve system files like the web
server's log file---possibly containing customer data---or even the file
@file{/etc/passwd}.
(We don't recommend this!)
@end itemize

@strong{Caution:}
Although this may sound funny or simply irrelevant, we are talking about
severe security holes. Try to explore your own system this way and make
sure that none of the above reveals too much information about your system.

@node STATIST, MAZE, WEBGRAB, Some Applications and Techniques
@section STATIST: Graphing a Statistical Distribution
@cindex STATIST program

@cindex GNUPlot utility
@cindex image format
@cindex GIF image format
@cindex PNG image format
@cindex PS image format
@cindex Boutell, Thomas
@iftex
@image{statist,3in}
@end iftex
In the HTTP server examples we've shown thus far, we never present an image
to the browser and its user. Presenting images is one task. Generating
images that reflect some user input and presenting these dynamically
generated images is another. In this @value{SECTION}, we use GNUPlot
for generating @file{.png}, @file{.ps}, or @file{.gif}
files.@footnote{Due to licensing problems, the default
installation of GNUPlot disables the generation of @file{.gif} files.
If your installed version does not accept @samp{set term gif},
just download and install the most recent version of GNUPlot and the
@uref{http://www.boutell.com/gd/, GD library}
by Thomas Boutell.
Otherwise you still have the chance to generate some
ASCII-art style images with GNUPlot by using @samp{set term dumb}.
(We tried it and it worked.)}

The program we develop takes the statistical parameters of two samples
and computes the t-test statistics. As a result, we get the probabilities
that the means and the variances of both samples are the same. In order to
let the user check plausibility, the program presents an image of the
distributions. The statistical computation follows
@cite{Numerical Recipes in C: The Art of Scientific Computing}
by William H.@: Press, Saul A.@: Teukolsky, William T.@: Vetterling, and Brian P.@: Flannery.
Since @command{gawk} does not have a built-in function
for the computation of the beta function, we use the @code{ibeta} function
of GNUPlot. As a side effect, we learn how to use GNUPlot as a
sophisticated calculator. The comparison of means is done as in @code{tutest},
paragraph 14.2, page 613, and the comparison of variances is done as in @code{ftest},
page 611 in @cite{Numerical Recipes}.
@cindex Numerical Recipes
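
The arithmetic that the server performs before it consults GNUPlot can be
tried on its own. The following sketch repeats the formulas that
@code{HandleGET} uses for the t statistic, its degrees of freedom, and the
F ratio; the intermediate variables @code{s1} and @code{s2} and the
@code{printf} format are ours:

@example
BEGIN @{
  # Defaults as in SetUpServer; override with -v m1=... and so on
  if (v1 == 0) v1 = 1;  if (v2 == 0) v2 = 1
  if (n1 == 0) n1 = 10; if (n2 == 0) n2 = 10
  s1 = v1/n1; s2 = v2/n2          # variance of each sample mean
  t  = (m1-m2)/sqrt(s1+s2)
  df = (s1+s2)*(s1+s2)/(s1*s1/(n1-1) + s2*s2/(n2-1))
  f  = (v1 > v2) ? v1/v2 : v2/v1  # larger variance in the numerator
  printf "t = %.4g, df = %.4g, f = %.4g\n", t, df, f
@}
@end example

With the defaults, this should print @samp{t = 0, df = 18, f = 1}.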

As usual, we take the site-independent code for servers and append
our own functions @code{SetUpServer} and @code{HandleGET}:

@smallexample
@c file eg/network/statist.awk
function SetUpServer() @{
  TopHeader = "<HTML><title>Statistics with GAWK</title>"
  TopDoc = "<BODY>\
   <h2>Please choose one of the following actions:</h2>\
   <UL>\
    <LI><A HREF=" MyPrefix "/AboutServer>About this server</A></LI>\
    <LI><A HREF=" MyPrefix "/EnterParameters>Enter Parameters</A></LI>\
   </UL>"
  TopFooter  = "</BODY></HTML>"
  GnuPlot    = "gnuplot 2>&1"
  m1=m2=0;    v1=v2=1;    n1=n2=10
@}
@c endfile
@end smallexample

Here, you see the menu structure that the user sees. Later, we
will see how the program structure of the @code{HandleGET} function
reflects the menu structure. What is missing here is the link for the
image we generate. In an event-driven environment, request,
generation, and delivery of images are separated.

Notice the way we initialize the @code{GnuPlot} command string for
the pipe. By default,
GNUPlot outputs the generated image via standard output, as well as
the results of @code{print}(ed) calculations via standard error.
The redirection causes standard error to be mixed into standard
output, enabling us to read results of calculations with @code{getline}.
By initializing the statistical parameters with some meaningful
defaults, we make sure the user gets an image the first time
he uses the program.

@cindex JavaScript
Following is the rather long function @code{HandleGET}, which
implements the contents of this service by reacting to the different
kinds of requests from the browser. Before you start playing with
this script, make sure that your browser supports JavaScript and that it also
has this option switched on. The script uses a short snippet of
JavaScript code for delayed opening of a window with an image.
A more detailed explanation follows:

@smallexample
@c file eg/network/statist.awk
function HandleGET() @{
  if(MENU[2] == "AboutServer") @{
    Document  = "This is a GUI for a statistical computation.\
      It compares means and variances of two distributions.\
      It is implemented as one GAWK script and uses GNUPLOT."
  @} else if (MENU[2] == "EnterParameters") @{
    Document = ""
    if ("m1" in GETARG) @{     # are there parameters to compare?
      Document = Document "<SCRIPT LANGUAGE=\"JavaScript\">\
        setTimeout(\"window.open(\\\"" MyPrefix "/Image" systime()\
         "\\\",\\\"dist\\\", \\\"status=no\\\");\", 1000); </SCRIPT>"
      m1 = GETARG["m1"]; v1 = GETARG["v1"]; n1 = GETARG["n1"]
      m2 = GETARG["m2"]; v2 = GETARG["v2"]; n2 = GETARG["n2"]
      t = (m1-m2)/sqrt(v1/n1+v2/n2)
      df = (v1/n1+v2/n2)*(v1/n1+v2/n2)/((v1/n1)*(v1/n1)/(n1-1) \
           + (v2/n2)*(v2/n2) /(n2-1))
      if (v1>v2) @{
          f = v1/v2
          df1 = n1 - 1
          df2 = n2 - 1
      @} else @{
          f = v2/v1
          df1 = n2 - 1
          df2 = n1 - 1
      @}
      print "pt=ibeta(" df/2 ",0.5," df/(df+t*t) ")"  |& GnuPlot
      print "pF=2.0*ibeta(" df2/2 "," df1/2 "," \
            df2/(df2+df1*f) ")"                    |& GnuPlot
      print "print pt, pF"                         |& GnuPlot
      RS="\n"; GnuPlot |& getline; RS="\r\n"    # $1 is pt, $2 is pF
      print "invsqrt2pi=1.0/sqrt(2.0*pi)"          |& GnuPlot
      print "nd(x)=invsqrt2pi/sd*exp(-0.5*((x-mu)/sd)**2)" |& GnuPlot
      print "set term png small color"             |& GnuPlot
      #print "set term postscript color"           |& GnuPlot
      #print "set term gif medium size 320,240"    |& GnuPlot
      print "set yrange[-0.3:]"                    |& GnuPlot
      print "set label 'p(m1=m2) =" $1 "' at 0,-0.1 left"  |& GnuPlot
      print "set label 'p(v1=v2) =" $2 "' at 0,-0.2 left"  |& GnuPlot
      print "plot mu=" m1 ",sd=" sqrt(v1) ", nd(x) title 'sample 1',\
        mu=" m2 ",sd=" sqrt(v2) ", nd(x) title 'sample 2'" |& GnuPlot
      print "quit"                                         |& GnuPlot
      GnuPlot |& getline Image
      while ((GnuPlot |& getline) > 0)
          Image = Image RS $0
      close(GnuPlot)
    @}
    Document = Document "\
    <h3>Do these samples have the same Gaussian distribution?</h3>\
    <FORM METHOD=GET> <TABLE BORDER CELLPADDING=5>\
    <TR>\
    <TD>1. Mean    </TD>\
    <TD><input type=text name=m1 value=" m1 " size=8></TD>\
    <TD>1. Variance</TD>\
    <TD><input type=text name=v1 value=" v1 " size=8></TD>\
    <TD>1. Count   </TD>\
    <TD><input type=text name=n1 value=" n1 " size=8></TD>\
    </TR><TR>\
    <TD>2. Mean    </TD>\
    <TD><input type=text name=m2 value=" m2 " size=8></TD>\
    <TD>2. Variance</TD>\
    <TD><input type=text name=v2 value=" v2 " size=8></TD>\
    <TD>2. Count   </TD>\
    <TD><input type=text name=n2 value=" n2 " size=8></TD>\
    </TR>                   <input type=submit value=\"Compute\">\
    </TABLE></FORM><BR>"
  @} else if (MENU[2] ~ "Image") @{     
    Reason = "OK" ORS "Content-type: image/png"
    #Reason = "OK" ORS "Content-type: application/x-postscript"
    #Reason = "OK" ORS "Content-type: image/gif"
    Header = Footer = ""
    Document = Image
  @}
@}
@c endfile
@end smallexample

@cindex PostScript
As usual, we give a short description of the service in the first
menu choice. The third menu choice shows us that generation and
presentation of an image are two separate actions. While the latter
takes place quite instantly in the third menu choice, the former
takes place in the much longer second choice. Image data passes from the
generating action to the presenting action via the variable @code{Image}
that contains a complete @file{.png} image, which is otherwise stored
in a file. If you prefer @file{.ps} or @file{.gif} images over the
default @file{.png} images, you may select these options by uncommenting
the appropriate lines. But remember to do so in two places: when
telling GNUPlot which kind of images to generate, and when transmitting the
image at the end of the program.

Looking at the end of the program,
the way we pass the @samp{Content-type} to the browser is a bit unusual.
It is appended to the @samp{OK} of the first header line
to make sure the type information becomes part of the header.
The other variables that get transmitted across the network are
made empty, because in this case we do not have an HTML document to
transmit, but rather raw image data to contain in the body.

Most of the work is done in the second menu choice. It starts with a
strange JavaScript code snippet. When first implementing this server,
we used a short @code{@w{"<IMG SRC="} MyPrefix "/Image>"} here. But then
browsers got smarter and tried to improve on speed by requesting the
image and the HTML code at the same time. When doing this, the browser
tries to build up a connection for the image request while the request for
the HTML text is not yet completed. The browser tries to connect
to the @command{gawk} server on port 8080 while port 8080 is still in use for
transmission of the HTML text. The connection for the image cannot be
built up, so the image appears as ``broken'' in the browser window.
We solved this problem by telling the browser to open a separate window
for the image, but only after a delay of 1000 milliseconds.
By this time, the server should be ready for serving the next request.

But there is one more subtlety in the JavaScript code.
Each time the JavaScript code opens a window for the image, the
name of the image is appended with a timestamp (@code{systime}).
Why this constant change of name for the image? Initially, we always named
the image @code{Image}, but then the Netscape browser noticed the name
had @emph{not} changed since the previous request and displayed the
previous image (caching behavior). The server core
is implemented so that browsers are told @emph{not} to cache anything.
Obviously HTTP requests do not always work as expected. One way to
circumvent the cache of such overly smart browsers is to change the
name of the image with each request. These three lines of JavaScript
caused us a lot of trouble.

The rest can be broken
down into two phases. At first, we check if there are statistical
parameters. When the program is first started, there usually are no
parameters because it enters the page coming from the top menu.
Then, we only have to present the user a form that he can use to change
statistical parameters and submit them. Subsequently, the submission of
the form causes the execution of the first phase because @emph{now}
there @emph{are} parameters to handle.

Now that we have parameters, we know there will be an image available.
Therefore we insert the JavaScript code here to initiate the opening
of the image in a separate window. Then,
we prepare some variables that will be passed to GNUPlot for calculation
of the probabilities. Prior to reading the results, we must temporarily
change @code{RS} because GNUPlot separates lines with newlines.
After instructing GNUPlot to generate a @file{.png} (or @file{.ps} or
@file{.gif}) image, we initiate the insertion of some text,
explaining the resulting probabilities. The final @samp{plot} command
actually generates the image data. This raw binary has to be read in carefully
without adding, changing, or deleting a single byte. Hence the unusual
initialization of @code{Image} and completion with a @code{while} loop.

When using this server, it soon becomes clear that it is far from being
perfect. It mixes source code of six scripting languages or protocols:

@itemize @bullet
@item GNU @command{awk} implements a server for the protocol:
@item HTTP which transmits:
@item HTML text which contains a short piece of:
@item JavaScript code opening a separate window.
@item A Bourne shell script is used for piping commands into:
@item GNUPlot to generate the image to be opened.
@end itemize

After all this work, the GNUPlot image opens in the JavaScript window
where it can be viewed by the user.

It is probably better not to mix up so many different languages.
The result is not very readable.  Furthermore, the
statistical part of the server does not take care of invalid input.
Among other things, negative variances cause invalid results.

@node MAZE, MOBAGWHO, STATIST, Some Applications and Techniques
@section MAZE: Walking Through a Maze In Virtual Reality
@cindex MAZE
@cindex VRML
@c VRML in iX 11/96 134.
@quotation
@cindex Perlis, Alan
@i{In the long run, every program becomes rococo, and then rubble.}@*
Alan Perlis
@end quotation

By now, we know how to present arbitrary @samp{Content-type}s to a browser.
In this @value{SECTION}, our server will present a 3D world to our browser.
The 3D world is described in a scene description language (VRML,
Virtual Reality Modeling Language) that allows us to travel through a
perspective view of a 2D maze with our browser. Browsers with a
VRML plugin enable exploration of this technology. We could do
one of those boring @samp{Hello world} examples here, that are usually
presented when introducing novices to
VRML. If you have never written
any VRML code, have a look at
the VRML FAQ.
Presenting a static VRML scene is a bit trivial; in order to expose
@command{gawk}'s new capabilities, we will present a dynamically generated
VRML scene. The function @code{SetUpServer} is very simple because it
only sets the default HTML page and initializes the random number
generator. As usual, the surrounding server lets you browse the maze.

@smallexample
@c file eg/network/maze.awk
function SetUpServer() @{
  TopHeader = "<HTML><title>Walk through a maze</title>"
  TopDoc = "\
    <h2>Please choose one of the following actions:</h2>\
    <UL>\
      <LI><A HREF=" MyPrefix "/AboutServer>About this server</A>\
      <LI><A HREF=" MyPrefix "/VRMLtest>Watch a simple VRML scene</A>\
    </UL>"
  TopFooter  = "</HTML>"
  srand()
@}
@c endfile
@end smallexample

The function @code{HandleGET} is a bit longer because it first computes
the maze and afterwards generates the VRML code that is sent across
the network. As shown in the STATIST example
(@pxref{STATIST}),
we set the type of the
content to VRML and then store the VRML representation of the maze as the
page content. We assume that the maze is stored in a 2D array. Initially,
the maze consists of walls only. Then, we add an entry and an exit to the
maze and let the rest of the work be done by the function @code{MakeMaze}.
Now, only the wall fields are left in the maze. By iterating over these
fields, we generate one line of VRML code for each wall field.

@smallexample
@c file eg/network/maze.awk
function HandleGET() @{
  if (MENU[2] == "AboutServer") @{
    Document  = "If your browser has a VRML 2 plugin,\
      this server shows you a simple VRML scene."
  @} else if (MENU[2] == "VRMLtest") @{
    XSIZE = YSIZE = 11              # initially, everything is wall
    for (y = 0; y < YSIZE; y++)
       for (x = 0; x < XSIZE; x++)
          Maze[x, y] = "#"
    delete Maze[0, 1]              # entry is not wall
    delete Maze[XSIZE-1, YSIZE-2]  # exit  is not wall
    MakeMaze(1, 1)
    Document = "\
#VRML V2.0 utf8\n\
Group @{\n\
  children [\n\
    PointLight @{\n\
      ambientIntensity 0.2\n\
      color 0.7 0.7 0.7\n\
      location 0.0 8.0 10.0\n\
    @}\n\
    DEF B1 Background @{\n\
      skyColor [0 0 0, 1.0 1.0 1.0 ]\n\
      skyAngle 1.6\n\
      groundColor [1 1 1, 0.8 0.8 0.8, 0.2 0.2 0.2 ]\n\
      groundAngle [ 1.2 1.57 ]\n\
    @}\n\
    DEF Wall Shape @{\n\
      geometry Box @{size 1 1 1@}\n\
      appearance Appearance @{ material Material @{ diffuseColor 0 0 1 @} @}\n\
    @}\n\
    DEF Entry Viewpoint @{\n\
      position 0.5 1.0 5.0\n\
      orientation 0.0 0.0 -1.0 0.52\n\
    @}\n"
    for (i in Maze) @{
      split(i, t, SUBSEP)
      Document = Document "    Transform @{ translation "
      Document = Document t[1] " 0 -" t[2] " children USE Wall @}\n"
    @}
    Document = Document "  ] # end of group for world\n@}"
    Reason = "OK" ORS "Content-type: model/vrml"
    Header = Footer = ""
  @}
@}
@c endfile
@end smallexample

Finally, we have a look at @code{MakeMaze}, the function that generates
the @code{Maze} array. When entered, this function assumes that the array
has been initialized so that each element represents a wall element and
the maze is initially full of wall elements. Only the entrance and the exit
of the maze should have been left free. The parameters of the function tell
us which element must be marked as not being a wall. After this, we take
a look at the four neighbouring elements and remember which we have already
treated. Of all the neighbouring elements, we take one at random and
walk in that direction. Therefore, the wall element in that direction has
to be removed and then, we call the function recursively for that element.
The maze is only completed if we iterate the above procedure for
@emph{all} neighbouring elements (in random order) and for our present
element by recursively calling the function for the present element. This
last iteration could have been done in a loop,
but it is much simpler to do recursively.

Notice that elements with coordinates that are both odd are assumed to be
on our way through the maze and the generating process cannot terminate
as long as there is such an element not being @code{delete}d. All other
elements are potentially part of the wall.

@smallexample
@c file eg/network/maze.awk
function MakeMaze(x, y) @{
  delete Maze[x, y]     # here we are, we have no wall here
  p = 0                 # count unvisited fields in all directions
  if (x-2 SUBSEP y   in Maze) d[p++] = "-x"
  if (x   SUBSEP y-2 in Maze) d[p++] = "-y"
  if (x+2 SUBSEP y   in Maze) d[p++] = "+x"
  if (x   SUBSEP y+2 in Maze) d[p++] = "+y"
  if (p>0) @{            # if there are unvisited fields, go there
    p = int(p*rand())   # choose one unvisited field at random
    if        (d[p] == "-x") @{ delete Maze[x - 1, y]; MakeMaze(x - 2, y)
    @} else if (d[p] == "-y") @{ delete Maze[x, y - 1]; MakeMaze(x, y - 2)
    @} else if (d[p] == "+x") @{ delete Maze[x + 1, y]; MakeMaze(x + 2, y)
    @} else if (d[p] == "+y") @{ delete Maze[x, y + 1]; MakeMaze(x, y + 2)
    @}                   # we are back from recursion
    MakeMaze(x, y);     # try again while there are unvisited fields
  @}
@}
@c endfile
@end smallexample
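
To check the generated maze without a VRML browser, the @code{Maze}
array can be printed as plain text. The following is only a minimal
sketch and not part of @file{maze.awk}: the function name
@code{PrintMaze} is our own invention, and it assumes that
@code{XSIZE}, @code{YSIZE}, and @code{Maze} have been set up as in
@code{HandleGET}:

@smallexample
# PrintMaze: hypothetical debugging helper (not in maze.awk).
# Prints "#" for each wall field and a blank for each free field.
function PrintMaze(    x, y, line) @{
  for (y = 0; y < YSIZE; y++) @{
    line = ""
    for (x = 0; x < XSIZE; x++)
      line = line (((x, y) in Maze) ? "#" : " ")
    print line
  @}
@}
@end smallexample

Calling it right after @samp{MakeMaze(1, 1)} prints one text line per
row of the maze, with the entry and the exit visible as gaps in the
outer wall.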

@node MOBAGWHO, STOXPRED, MAZE, Some Applications and Techniques
@section MOBAGWHO: a Simple Mobile Agent
@cindex MOBAGWHO program
@cindex agent
@quotation
@cindex Hoare, C.A.R.
@i{There are two ways of constructing a software design: One way is to
make it so simple that there are obviously no deficiencies, and the
other way is to make it so complicated that there are no obvious
deficiencies.} @*
C. A. R. Hoare
@end quotation

A @dfn{mobile agent} is a program that can be dispatched from a computer and
transported to a remote server for execution. This is called @dfn{migration},
which means that a process on another system is started that is independent
from its originator. Ideally, it wanders through
a network while working for its creator or owner. In places like
the UMBC Agent Web,
people are quite confident that (mobile) agents are a software engineering
paradigm that enables us to significantly increase the efficiency
of our work. Mobile agents could become the mediators between users and
the networking world. For an unbiased view of this technology,
see the remarkable paper @cite{Mobile Agents: Are they a good
idea?}.@footnote{@uref{http://www.research.ibm.com/massive/mobag.ps}}

@ignore
@c Chuck says to take all of this out.
@cindex Tcl/Tk
A good instance of this paradigm is
@cite{Agent Tcl},@footnote{@uref{http://agent.cs.dartmouth.edu/software/agent2.0/}}
an extension of the Tcl language. After introducing a typical
development environment, the aforementioned paper shows a nice little
example application that we will try to rebuild in @command{gawk}. The
@command{who} agent takes a list of servers and wanders from one server
to the next one, always looking to see who is logged in.
Having reached the last
one, it sends back a message with a list of all users it found on each
machine.

But before implementing something that might or might not be a mobile
agent, let us clarify the concept and some important terms. The agent
paradigm in general is such a young scientific discipline that it has
not yet developed a widely-accepted terminology. Some authors try to
give precise definitions, but their scope is often not wide enough
to be generally accepted. Franklin and Graesser ask
@cite{Is it an Agent or just a Program: A Taxonomy for Autonomous
Agents}@footnote{@uref{http://www.msci.memphis.edu/~franklin/AgentProg.html}}
and give even better answers than Caglayan and Harrison in their
@cite{Agent Sourcebook}.@footnote{@uref{http://www.aminda.com/mazzu/sourcebook/}}

@itemize @minus
@item
@i{An autonomous agent is a system situated within and a part of
an environment that senses that environment and acts on it, over time, in
pursuit of its own agenda and so as to effect what it senses in the future.}
(Quoted from Franklin and Graesser.)
@item
A mobile agent is able to transport itself from one machine to another.
@item
The term @dfn{migration} often denotes this process of moving.
But neither of the two sources above even mentions this term, while others
use it regularly.
@end itemize

Before delving into the (rather demanding) details of
implementation, let us give just one more quotation as a final
motivation. Steven Farley published an excellent paper called
@cite{Mobile Agent System Architecture},@footnote{This often
cited text originally appeared as a conference paper here:
@uref{http://www.sigs.com/publications/docs/java/9705/farley.html}
Many bibliographies on the Internet point to this dead link. Meanwhile,
the paper appeared as a contribution to a book called More Java Gems here:
@uref{http://uk.cambridge.org/computerscience/object/catalogue/0521774772/default.htm}}
in which he asks ``Why use an agent architecture?''

@quotation
If client-server systems are the currently established norm and distributed
object systems such as CORBA are defining the future standards, why bother
with agents? Agent architectures have certain advantages over these other
types. Three of the most important advantages are:
@cindex CORBA

@enumerate
@item
An agent performs much processing at the server where local bandwidth
is high, thus reducing the amount of network bandwidth consumed and increasing
overall performance. In contrast, a CORBA client object with the equivalent
functionality of a given agent must make repeated remote method calls to
the server object because CORBA objects cannot move across the network
at runtime.

@item
An agent operates independently of the application from which the
agent was invoked. The agent operates asynchronously, meaning that the
client application does not need to wait for the results. This is especially
important for mobile users who are not always connected to the network.

@item
The use of agents allows for the injection of new functionality into
a system at run time. An agent system essentially contains its own automatic
software distribution mechanism. Since CORBA has no built-in support for
mobile code, new functionality generally has to be installed manually.

@end enumerate

Of course a non-agent system can exhibit these same features with some
work. But the mobile code paradigm supports the transfer of executable
code to a remote location for asynchronous execution from the start. An
agent architecture should be considered for systems where the above features
are primary requirements.
@end quotation
@end ignore

When trying to migrate a process from one system to another,
a server process is needed on the receiving side. Several ways of
implementing migration come to mind, depending on the kind of
server process:

@itemize @bullet
@item
HTTP can be used as the protocol for delivery of the migrating
process. In this case, we use a common web
server as the receiving server process. A universal CGI script
mediates between migrating process and web server.
Each server willing to accept migrating agents makes this universal
service available. HTTP supplies the @code{POST} method to transfer
some data to a file on the web server. When a CGI script is called
remotely with the @code{POST} method instead of the usual @code{GET} method,
data is transmitted from the client process to the standard input
of the server's CGI script. So, to implement a mobile agent,
we must not only write the agent program to start on the client
side, but also the CGI script to receive the agent on the server side.

@cindex CGI (Common Gateway Interface)
@cindex apache
@item
The @code{PUT} method can also be used for migration. HTTP does not
require a CGI script for migration via @code{PUT}. However, with common web
servers there is no advantage to this solution, because web servers such as
Apache
require explicit activation of a special @code{PUT} script.

@item
@cite{Agent Tcl} pursues a different course; it relies on a dedicated server
process with a dedicated protocol specialized for receiving mobile agents.
@end itemize

Our agent example abuses a common web server as a migration tool. So, it needs a
universal CGI script on the receiving side (the web server). The receiving script is
activated with a @code{POST} request when placed into a location like
@file{/httpd/cgi-bin/PostAgent.sh}. Make sure that the server system uses a
version of @command{gawk} that supports network access (Version 3.1 or later;
verify with @samp{gawk --version}).

@example
@c file eg/network/PostAgent.sh
#!/bin/sh
MobAg=/tmp/MobileAgent.$$
# direct script to mobile agent file
cat > $MobAg
# execute agent concurrently
gawk -f $MobAg $MobAg > /dev/null &
# HTTP header, terminator and body
gawk 'BEGIN @{ print "\r\nAgent started" @}'
rm $MobAg      # delete script file of agent
@c endfile
@end example

By making its process id (@code{$$}) part of the unique @value{FN}, the
script avoids conflicts between concurrent instances of the script.
First, all lines
from standard input (the mobile agent's source code) are copied into
this unique file. Then, the agent is started as a concurrent process
and a short message reporting this fact is sent to the submitting client.
Finally, the script file of the mobile agent is removed because it is
no longer needed. Although it is a short script, there are several noteworthy
points:

@table @asis
@item Security
@emph{There is none}. In fact, the CGI script should never
be made available on a server that is part of the Internet because everyone
would be allowed to execute arbitrary commands with it. This behavior is
acceptable only when performing rapid prototyping.

@item Self-Reference
Each migrating instance of an agent is started
in a way that enables it to read its own source code as input
(the script file is named both after @option{-f} and as the data file)
and use the code for subsequent
migrations. This is necessary because it needs to treat the agent's code
as data to transmit. @command{gawk} is not the ideal language for such
a job. Lisp and Tcl are more suitable because they do not make a distinction
between program code and data.

@item Independence
After migration, the agent is not linked to its
former home in any way. By reporting @samp{Agent started}, it waves
``Goodbye'' to its origin. The originator may choose to terminate or not.
@end table

@cindex Lisp
The originating agent itself is started just like any other command-line
script, and reports the results on standard output.  By letting the name
of the original host migrate with the agent, the agent that migrates
to a host far away from its origin can report the result back home.
Having arrived at the end of the journey, the agent establishes
a connection and reports the results.  This is the reason for
determining the name of the host with @samp{uname -n} and storing it
in @code{MyOrigin} for later use.  We may also set variables with the
@option{-v} option from the command line. This interactivity is only
of importance in the context of starting a mobile agent; therefore this
@code{BEGIN} pattern and its action do not take part in migration:

@smallexample
@c file eg/network/mobag.awk
BEGIN @{
  if (ARGC != 2) @{
    print "MOBAG - a simple mobile agent"
    print "CALL:\n    gawk -f mobag.awk mobag.awk"
    print "IN:\n    the name of this script as a command-line parameter"
    print "PARAM:\n    -v MyOrigin=myhost.com"
    print "OUT:\n    the result on stdout"
    print "JK 29.03.1998 01.04.1998"
    exit
  @}
  if (MyOrigin == "") @{
     "uname -n" | getline MyOrigin
     close("uname -n")
  @}
@}
@c endfile
@end smallexample

Since @command{gawk} cannot manipulate and transmit parts of the program
directly, the source code is read and stored in strings.
Therefore, the program scans itself for
the beginning and the ending of functions.
Each line in between is appended to the code string until the end of
the function has been reached. A special case is this part of the program
itself. It is not a function.
Placing a similar framework around it causes it to be treated
like a function. Notice that this mechanism works for all the
functions of the source code, but it cannot guarantee that the order
of the functions is preserved during migration:

@smallexample
@c file eg/network/mobag.awk
#ReadMySelf
/^function /                     @{ FUNC = $2 @}
/^END/ || /^#ReadMySelf/         @{ FUNC = $1 @}
FUNC != ""                       @{ MOBFUN[FUNC] = MOBFUN[FUNC] RS $0 @}
(FUNC != "") && (/^@}/ || /^#EndOfMySelf/) \
                                 @{ FUNC = "" @}
#EndOfMySelf
@c endfile
@end smallexample

The web server code in
@ref{Interacting Service, ,A Web Service with Interaction},
was first developed as a site-independent core. Likewise, the
@command{gawk}-based mobile agent
starts with an agent-independent core, to which can be appended
application-dependent functions.  What follows is the only
application-independent function needed for the mobile agent:

@smallexample
@c file eg/network/mobag.awk
function migrate(Destination, MobCode, Label) @{
  MOBVAR["Label"] = Label
  MOBVAR["Destination"] = Destination
  RS = ORS = "\r\n"
  HttpService = "/inet/tcp/0/" Destination
  for (i in MOBFUN)
     MobCode = (MobCode "\n" MOBFUN[i])
  MobCode = MobCode  "\n\nBEGIN @{"
  for (i in MOBVAR)
     MobCode = (MobCode "\n  MOBVAR[\"" i "\"] = \"" MOBVAR[i] "\"")
  MobCode = MobCode "\n@}\n"
  print "POST /cgi-bin/PostAgent.sh HTTP/1.0"  |& HttpService
  print "Content-length:", length(MobCode) ORS |& HttpService
  printf "%s", MobCode                         |& HttpService
  while ((HttpService |& getline) > 0)
     print $0
  close(HttpService)
@}
@c endfile
@end smallexample

The @code{migrate} function prepares the
aforementioned strings containing the program code and transmits them to a
server. A consequence of this modular approach is that the @code{migrate}
function takes some parameters that aren't needed in this application,
but that will be in future ones. Its mandatory parameter @code{Destination} holds the
name (or IP address) of the server that the agent wants as a host for its
code. The optional parameter @code{MobCode} may contain some @command{gawk}
code that is inserted during migration in front of all other code.
The optional parameter @code{Label} may contain
a string that tells the agent what to do in program execution after
arrival at its new home site. One of the serious obstacles in implementing
a framework for mobile agents is that it does not suffice to migrate the
code. It is also necessary to migrate the state of execution of the agent. In
contrast to @cite{Agent Tcl}, this program does not try to migrate the complete set
of variables. The following conventions are used:

@itemize @bullet
@item
Each variable in an agent program is local to the current host and does
@emph{not} migrate.

@item
The array @code{MOBFUN} shown above is an exception. It is handled
by the function @code{migrate} and does migrate with the application.

@item
The other exception is the array @code{MOBVAR}. Each variable that
takes part in migration has to be an element of this array.
@code{migrate} also takes care of this.
@end itemize

Now it's clear what happens to the @code{Label} parameter of the
function @code{migrate}. It is copied into @code{MOBVAR["Label"]} and
travels alongside the other data. Since travelling takes place via HTTP,
records must be separated with @code{"\r\n"} in @code{RS} and
@code{ORS} as usual. The code assembly for migration takes place in
three steps:

@itemize @bullet
@item
Iterate over @code{MOBFUN} to collect all functions verbatim.

@item
Prepare a @code{BEGIN} pattern and put assignments to mobile
variables into the action part.

@item
Transmission itself resembles GETURL: the header with the request
and the @code{Content-length} is followed by the body. In case there is
any reply over the network, it is read completely and echoed to
standard output to avoid irritating the server.
@end itemize
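
As an illustration, the @code{BEGIN} rule assembled in the second step
might look as follows for the first migration, the one started by
@code{MyInit} later in this @value{SECTION}. This is a sketch, not
captured output: the host name @samp{myhost.com} is an assumed value of
@code{MyOrigin}, and the order of the assignments is not fixed because
it follows the traversal order of @samp{for (i in MOBVAR)}:

@example
BEGIN @{
  MOBVAR["Label"] = ""
  MOBVAR["Destination"] = "localhost/80"
  MOBVAR["MyOrigin"] = "myhost.com"
  MOBVAR["Machines"] = "localhost/80 max/80 moritz/80 castor/80"
@}
@end example

Because each value is written between double quotes, a value must itself
be a valid @command{gawk} string constant for the migrated program to
parse.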

The application-independent framework is now almost complete. What follows
is the @code{END} pattern that is executed when the mobile agent has
finished reading its own code. First, it checks whether it is already
running on a remote host or not. In case initialization has not yet taken
place, it starts @code{MyInit}. Otherwise (later, on a remote host), it
starts @code{MyJob}:

@smallexample
@c file eg/network/mobag.awk
END @{
  if (ARGC != 2) exit    # stop when called with wrong parameters
  if (MyOrigin != "")    # is this the originating host?
    MyInit()             # if so, initialize the application
  else                   # we are on a host with migrated data
    MyJob()              # so we do our job
@}
@c endfile
@end smallexample

All that's left to extend the framework into a complete application
is to write two application-specific functions: @code{MyInit} and
@code{MyJob}. Keep in mind that the former is executed once on the
originating host, while the latter is executed after each migration:

@smallexample
@c file eg/network/mobag.awk
function MyInit() @{
  MOBVAR["MyOrigin"] = MyOrigin
  MOBVAR["Machines"] = "localhost/80 max/80 moritz/80 castor/80"
  split(MOBVAR["Machines"], Machines)           # which host is the first?
  migrate(Machines[1], "", "")                  # go to the first host
  while (("/inet/tcp/8080/0/0" |& getline) > 0) # wait for result
    print $0                                    # print result
  close("/inet/tcp/8080/0/0")
@}
@c endfile
@end smallexample

As mentioned earlier, this agent takes the name of its origin
(@code{MyOrigin}) with it. Then, it takes the name of its first
destination and goes there for further work. Notice that this name has
the port number of the web server appended to the name of the server,
because the function @code{migrate} needs it this way to create
the @code{HttpService} variable. Finally, it waits for the result to arrive.
The @code{MyJob} function runs on the remote host:

@smallexample
@c file eg/network/mobag.awk
function MyJob() @{
  # forget this host
  sub(MOBVAR["Destination"], "", MOBVAR["Machines"])
  MOBVAR["Result"]=MOBVAR["Result"] SUBSEP SUBSEP MOBVAR["Destination"] ":"
  while (("who" | getline) > 0)               # who is logged in?
    MOBVAR["Result"] = MOBVAR["Result"] SUBSEP $0
  close("who")
  if (index(MOBVAR["Machines"], "/") > 0) @{   # any more machines to visit?
    split(MOBVAR["Machines"], Machines)       # which host is next?
    migrate(Machines[1], "", "")              # go there
  @} else @{                                    # no more machines
    gsub(SUBSEP, "\n", MOBVAR["Result"])      # send result to origin
    print MOBVAR["Result"] |& "/inet/tcp/0/" MOBVAR["MyOrigin"] "/8080"
    close("/inet/tcp/0/" MOBVAR["MyOrigin"] "/8080")
  @}
@}
@c endfile
@end smallexample

After migrating, the first thing to do in @code{MyJob} is to delete
the name of the current host from the list of hosts to visit. Now, it
is time to start the real work by appending the host's name to the
result string, and reading line by line who is logged in on this host.
A very annoying circumstance is the fact that the elements of
@code{MOBVAR} cannot hold the newline character (@code{"\n"}). If they
did, migration of this string would not work because the string would not
obey the syntax rule for a string in @command{gawk}.
@code{SUBSEP} is used as a temporary replacement.
If the list of hosts to visit holds
at least one more entry, the agent migrates to that place to go on
working there. Otherwise, we replace the @code{SUBSEP}s
with a newline character in the resulting string, and report it to
the originating host, whose name is stored in @code{MOBVAR["MyOrigin"]}.
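
The substitution can be tried in isolation. The following stand-alone
sketch joins two lines with @code{SUBSEP} and restores the newline
afterwards; the variable name @code{Result} and the sample strings are
made up for illustration:

@example
BEGIN @{
  # while travelling: one string without any newline character
  Result = "first line" SUBSEP "second line"
  # on arrival at the origin: turn the placeholders into newlines
  gsub(SUBSEP, "\n", Result)
  print Result
@}
@end example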

@node STOXPRED, PROTBASE, MOBAGWHO, Some Applications and Techniques
@section STOXPRED: Stock Market Prediction As A Service
@cindex STOXPRED program
@cindex Yahoo!
@quotation
@i{Far out in the uncharted backwaters of the unfashionable end of
the Western Spiral arm of the Galaxy lies a small unregarded yellow sun.}

@i{Orbiting this at a distance of roughly ninety-two million miles is an
utterly insignificant little blue-green planet whose ape-descendent life
forms are so amazingly primitive that they still think digital watches are
a pretty neat idea.}

@i{This planet has --- or rather had --- a problem, which was this:
most of the people living on it were unhappy for pretty much of the time.
Many solutions were suggested for this problem, but most of these were
largely concerned with the movements of small green pieces of paper,
which is odd because it wasn't the small green pieces of paper that
were unhappy.} @*
Douglas Adams, @cite{The Hitch Hiker's Guide to the Galaxy}
@end quotation

@cindex @command{cron} utility
Valuable services on the Internet are usually @emph{not} implemented
as mobile agents. There are much simpler ways of implementing services.
All Unix systems provide, for example, the @command{cron} service.
Unix system users can write a list of tasks to be done each day, each
week, twice a day, or just once. The list is entered into a file named
@file{crontab}.  For example, to distribute a newsletter on a daily
basis this way, use @command{cron} for calling a script each day early
in the morning.

@example
# run at 8 am on weekdays, distribute the newsletter
0 8 * * 1-5   $HOME/bin/daily.job >> $HOME/log/newsletter 2>&1
@end example

The script first looks for interesting information on the Internet,
assembles it in a nice form and sends the results via email to
the customers.

The following is an example of a primitive
newsletter on stock market prediction. It is a report which first
tries to predict the change of each share in the Dow Jones Industrial
Index for the particular day. Then it mentions some especially
promising shares as well as some shares which look remarkably bad
on that day. The report ends with the usual disclaimer which tells
every child @emph{not} to try this at home and hurt anybody.
@cindex Dow Jones Industrial Index

@smallexample
Good morning Uncle Scrooge,

This is your daily stock market report for Monday, October 16, 2000.
Here are the predictions for today:

        AA      neutral
        GE      up
        JNJ     down
        MSFT    neutral
        @dots{}
        UTX     up
        DD      down
        IBM     up
        MO      down
        WMT     up
        DIS     up
        INTC    up
        MRK     down
        XOM     down
        EK      down
        IP      down

The most promising shares for today are these:

        INTC            http://biz.yahoo.com/n/i/intc.html

The stock shares to avoid today are these:

        EK              http://biz.yahoo.com/n/e/ek.html
        IP              http://biz.yahoo.com/n/i/ip.html
        DD              http://biz.yahoo.com/n/d/dd.html
        @dots{}
@end smallexample

@ignore
@c Chuck suggests removing this paragraph
If you are not into stock market prediction but want to earn money
with a more humane service, you might prefer to send out horoscopes
to your customers. Or, once every refrigerator in every household on this side
of the Chinese Wall is connected to the Internet, such a service could
inspect the contents of your customer's refrigerators each day and
advise them on nutrition. Big Brother is watching them.
@end ignore

The script as a whole is rather long. In order to ease the pain of
studying other people's source code, we have broken the script
up into meaningful parts which are invoked one after the other.
The basic structure of the script is as follows:

@example
@c file eg/network/stoxpred.awk
BEGIN @{
  Init()
  ReadQuotes()
  CleanUp()
  Prediction()
  Report()
  SendMail()
@}
@c endfile
@end example

The earlier parts store data into variables and arrays which are
subsequently used by later parts of the script. The @code{Init} function
first checks if the script is invoked correctly (without any parameters).
If not, it informs the user of the correct usage. What follows are preparations
for the retrieval of the historical quote data. The names of the 30 stock
shares are stored in an array @code{name} along with the current date
in @code{day}, @code{month}, and @code{year}.

All users who are separated
from the Internet by a firewall and have to direct their Internet accesses
to a proxy must supply the name of the proxy to this script with the
@samp{-v Proxy=@var{name}} option. For most users, the default proxy and
port number should suffice.

@example
@c file eg/network/stoxpred.awk
function Init() @{
  if (ARGC != 1) @{
    print "STOXPRED - daily stock share prediction"
    print "IN:\n    no parameters, nothing on stdin"
    print "PARAM:\n    -v Proxy=MyProxy -v ProxyPort=80"
    print "OUT:\n    commented predictions as email"
    print "JK 09.10.2000"
    exit
  @}
  # Remember ticker symbols from Dow Jones Industrial Index
  StockCount = split("AA GE JNJ MSFT AXP GM JPM PG BA HD KO \
    SBC C HON MCD T CAT HWP MMM UTX DD IBM MO WMT DIS INTC \
    MRK XOM EK IP", name);
  # Remember the current date as the end of the time series
  day   = strftime("%d")
  month = strftime("%m")
  year  = strftime("%Y")
  if (Proxy     == "")  Proxy     = "chart.yahoo.com"
  if (ProxyPort ==  0)  ProxyPort = 80
  YahooData = "/inet/tcp/0/" Proxy "/" ProxyPort
@}
@c endfile
@end example

@cindex CSV format
There are two really interesting parts in the script. One is the
function which reads the historical stock quotes from an Internet
server. The other is the one that does the actual prediction. In
the following function we see how the quotes are read from the
Yahoo server. The data which comes from the server is in
CSV format (comma-separated values):

@example
@c file eg/network/stoxdata.txt
Date,Open,High,Low,Close,Volume
9-Oct-00,22.75,22.75,21.375,22.375,7888500
6-Oct-00,23.8125,24.9375,21.5625,22,10701100
5-Oct-00,24.4375,24.625,23.125,23.50,5810300
@c endfile
@end example

Each line holds the values for one trading day. The columns are
separated by commas, and the header (first) line names the kind of data
each column contains. First, @command{gawk} is instructed to
separate columns by commas (@samp{FS = ","}). In the loop that follows,
a connection to the Yahoo server is first opened, then a download takes
place, and finally the connection is closed. All this happens once for
each ticker symbol. In the body of this loop, an Internet address is
built up as a string according to the rules of the Yahoo server. The
starting and ending dates fall on the same calendar day, one year apart:
the series begins a year ago and ends today. All the action is initiated
within the @code{printf}
command which transmits the request for data to the Yahoo server.

In the inner loop, the server's data is first read and then scanned
line by line. Only lines which have six columns and the name of a month
in the first column contain relevant data. This data is stored
in the two-dimensional array @code{quote}; one dimension
being time, the other being the ticker symbol. During retrieval of the
first stock's data, the calendar dates of the time instances are stored
in the array @code{days} because we need them later.

@smallexample
@c file eg/network/stoxpred.awk
function ReadQuotes() @{
  # Retrieve historical data for each ticker symbol
  FS = ","
  for (stock = 1; stock <= StockCount; stock++) @{
    URL = "http://chart.yahoo.com/table.csv?s=" name[stock] \
          "&a=" month "&b=" day   "&c=" year-1 \
          "&d=" month "&e=" day   "&f=" year \
          "&g=d&q=q&y=0&z=" name[stock] "&x=.csv"
    printf("GET " URL " HTTP/1.0\r\n\r\n") |& YahooData
    while ((YahooData |& getline) > 0) @{
      if (NF == 6 && $1 ~ /Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec/) @{
        if (stock == 1)
          days[++daycount] = $1;
        quote[$1, stock] = $5
      @}
    @}
    close(YahooData)
  @}
  FS = " "
@}
@c endfile
@end smallexample
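
The filtering step of the inner loop can be tried without a network
connection. The following sketch is not part of @file{stoxpred.awk};
the helper name @code{ClosingPrice} and the use of @code{split} on a
string (instead of reading from the two-way pipe) are assumptions made
for illustration only. It applies the same two tests to lines taken from
the sample data shown earlier:

@smallexample
# Sketch only: ClosingPrice() is a hypothetical helper, not part
# of stoxpred.awk.  It returns the closing price if the line passes
# the same two tests as the inner loop, and "" otherwise.
function ClosingPrice(line,    f, n) @{
  n = split(line, f, ",")
  if (n == 6 && f[1] ~ /Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec/)
    return f[5]
  return ""
@}

BEGIN @{
  # The header line is rejected, so this prints an empty line
  print ClosingPrice("Date,Open,High,Low,Close,Volume")
  # A data line is accepted, so this prints 22.375
  print ClosingPrice("9-Oct-00,22.75,22.75,21.375,22.375,7888500")
@}
@end smallexample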

Now that we @emph{have} the data, it can be checked once again to make sure
that no individual stock is missing or invalid, and that all the stock quotes are
aligned correctly. Furthermore, we renumber the time instances. The
most recent day gets day number 1 and all other days get consecutive
numbers. All quotes are rounded to the nearest whole US dollar.

@smallexample  
@c file eg/network/stoxpred.awk
function CleanUp() @{
  # clean up time series; eliminate incomplete data sets
  for (d = 1; d <= daycount; d++) @{
    for (stock = 1; stock <= StockCount; stock++)
      if (! ((days[d], stock) in quote))
          stock = StockCount + 10
    if (stock > StockCount + 1)
        continue
    datacount++
    for (stock = 1; stock <= StockCount; stock++)
      data[datacount, stock] = int(0.5 + quote[days[d], stock])
  @}
  delete quote
  delete days
@}
@c endfile
@end smallexample
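
The expression @code{int(0.5 + x)} in @code{CleanUp} is a common rounding
idiom. The short sketch below shows its effect on a few of the sample
quotes; the helper name @code{Round} is ours and does not appear in
@file{stoxpred.awk}. The idiom is appropriate only for nonnegative
values such as stock quotes, because @code{int} truncates toward zero
(it would turn @minus{}1.7 into @minus{}1, not @minus{}2).

@smallexample
# Sketch only: Round() is a hypothetical helper wrapping the idiom
function Round(x) @{
  return int(0.5 + x)
@}

BEGIN @{
  print Round(22.375)    # prints 22
  print Round(23.50)     # prints 24
  print Round(21.5625)   # prints 22
@}
@end smallexample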

Now we have arrived at the second really interesting part of the whole affair.
What we present here is a very primitive prediction algorithm:
@emph{If a stock fell yesterday, assume it will also fall today; if
it rose yesterday, assume it will rise today}.  (Feel free to replace this
algorithm with a smarter one.) If a stock changed in the same direction
on two consecutive days, this is an indication which should be highlighted.
Two-day advances are stored in @code{hot} and two-day declines in
@code{avoid}.

The rest of the function is a sanity check. It counts the number of
correct predictions in relation to the total number of predictions
one could have made in the year before.

@smallexample
@c file eg/network/stoxpred.awk
function Prediction() @{
  # Predict each ticker symbol by prolonging yesterday's trend
  for (stock = 1; stock <= StockCount; stock++) @{
    if         (data[1, stock] > data[2, stock]) @{
      predict[stock] = "up"
    @} else if  (data[1, stock] < data[2, stock]) @{ 
      predict[stock] = "down" 
    @} else @{
      predict[stock] = "neutral"
    @}
    if ((data[1, stock] > data[2, stock]) && (data[2, stock] > data[3, stock]))
      hot[stock] = 1
    if ((data[1, stock] < data[2, stock]) && (data[2, stock] < data[3, stock]))
      avoid[stock] = 1  
  @}
  # Do a plausibility check: how many predictions proved correct?
  for (s = 1; s <= StockCount; s++) @{
    for (d = 1; d <= datacount-2; d++) @{
      if         (data[d+1, s] > data[d+2, s]) @{
        UpCount++
      @} else if  (data[d+1, s] < data[d+2, s]) @{
        DownCount++
      @} else @{
        NeutralCount++
      @}   
      if (((data[d, s]  > data[d+1, s]) && (data[d+1, s]  > data[d+2, s])) ||
          ((data[d, s]  < data[d+1, s]) && (data[d+1, s]  < data[d+2, s])) ||
          ((data[d, s] == data[d+1, s]) && (data[d+1, s] == data[d+2, s])))
        CorrectCount++
    @}   
  @}       
@}
@c endfile
@end smallexample
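
To see the plausibility check at work on a small scale, consider the
following sketch. It is not part of @file{stoxpred.awk}: the helper
@code{Trend} and the five sample prices are hypothetical. As in the
@code{data} array, index 1 is the most recent day. A forecast counts as
correct when the trend between two days is repeated on the following day.

@smallexample
# Sketch only: Trend() and the sample prices are hypothetical
function Trend(newer, older) @{
  if (newer > older) return "up"
  if (newer < older) return "down"
  return "neutral"
@}

BEGIN @{
  n = split("24 22 22 23 25", closes, " ")   # closes[1] is today
  for (d = 1; d <= n - 2; d++) @{
    total++
    # The trend from day d+2 to day d+1 is the forecast for day d
    if (Trend(closes[d+1], closes[d+2]) == Trend(closes[d], closes[d+1]))
      correct++
  @}
  print "forecast for tomorrow: " Trend(closes[1], closes[2])
  print (correct + 0) " of " total " forecasts correct"
@}
@end smallexample

With these sample prices, the sketch reports a forecast of @samp{up}
and 1 correct forecast out of 3.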

At this point the hard work has been done: the array @code{predict}
contains the predictions for all the ticker symbols. It is up to the
function @code{Report} to find some nice words to introduce the
desired information.

@smallexample
@c file eg/network/stoxpred.awk
function Report() @{
  # Generate report
  report =        "\nThis is your daily "
  report = report "stock market report for "strftime("%A, %B %d, %Y")".\n"
  report = report "Here are the predictions for today:\n\n"
  for (stock = 1; stock <= StockCount; stock++)  
    report = report "\t" name[stock] "\t" predict[stock] "\n"
  for (stock in hot) @{
    if (HotCount++ == 0)
      report = report "\nThe most promising shares for today are these:\n\n"
    report = report "\t" name[stock] "\t\thttp://biz.yahoo.com/n/" \
      tolower(substr(name[stock], 1, 1)) "/" tolower(name[stock]) ".html\n"
  @}
  for (stock in avoid) @{
    if (AvoidCount++ == 0)
      report = report "\nThe stock shares to avoid today are these:\n\n"
    report = report "\t" name[stock] "\t\thttp://biz.yahoo.com/n/" \
      tolower(substr(name[stock], 1, 1)) "/" tolower(name[stock]) ".html\n"
  @}   
  report = report "\nThis sums up to " HotCount+0 " winners and " AvoidCount+0
  report = report " losers. When using this kind\nof prediction scheme for"
  report = report " the 12 months which lie behind us,\nwe get " UpCount
  report = report " 'ups' and " DownCount " 'downs' and " NeutralCount
  report = report " 'neutrals'. Of all\nthese " UpCount+DownCount+NeutralCount
  report = report " predictions " CorrectCount " proved correct next day.\n"
  report = report "A success rate of "\
             int(100*CorrectCount/(UpCount+DownCount+NeutralCount)) "%.\n"
  report = report "Random choice would have produced a 33% success rate.\n"
  report = report "Disclaimer: Like every other prediction of the stock\n"
  report = report "market, this report is, of course, complete nonsense.\n"
  report = report "If you are stupid enough to believe these predictions\n"
  report = report "you should visit a doctor who can treat your ailment."
@}     
@c endfile
@end smallexample

The function @code{SendMail} goes through the list of customers and opens
a pipe to the @code{mail} command for each of them. Each one receives an
email message with a proper subject heading and is addressed by full name.

@smallexample
@c file eg/network/stoxpred.awk
function SendMail() @{ 
  # send report to customers
  customer["uncle.scrooge@@ducktown.gov"] = "Uncle Scrooge"
  customer["more@@utopia.org"           ] = "Sir Thomas More"
  customer["spinoza@@denhaag.nl"        ] = "Baruch de Spinoza"
  customer["marx@@highgate.uk"          ] = "Karl Marx"
  customer["keynes@@the.long.run"       ] = "John Maynard Keynes"
  customer["bierce@@devil.hell.org"     ] = "Ambrose Bierce"
  customer["laplace@@paris.fr"          ] = "Pierre Simon de Laplace"
  for (c in customer) @{
    MailPipe = "mail -s 'Daily Stock Prediction Newsletter' " c
    print "Good morning " customer[c] "," | MailPipe
    print report "\n.\n" | MailPipe
    close(MailPipe)
  @}
@}   
@c endfile
@end smallexample

Be patient when running the script by hand.
Retrieving the data for all the ticker symbols and sending the emails
may take several minutes to complete, depending upon network traffic
and the speed of the available Internet link.
The quality of the prediction algorithm is likely to be disappointing.
Try to find a better one.
Should you find one with a success rate of more than 50%, please tell
us about it! It is only for the sake of curiosity, of course. @code{:-)}

@ignore
@c chuck says to remove this
Let us give you one final indication as to what one can expect from
a prediction of stock data, which is sometimes said to contain much
randomness. One theory says that all relevant information to be taken
into account when estimating the price of a stock is contained in the
stock quotes. Every bit of useful information has influenced the
fair price. Therefore (the theory says) temporary changes (i.e., fluctuations
within a minute) have to be purely random. But what is the cause of
short-term changes in stock prices?

Stock prices are fixed when supply and demand meet each other.
What people are willing to pay reflects human expectations.
Human expectations are not necessarily random. On the Internet,
you can find an elucidating paper about predictability and human
expectations: 
@uref{http://it.ucsd.edu/IT/Newsletter/archives/meir/05meir.html,
@cite{Reflections on ``Universal Prediction of Individual Sequences''}}
The authors (Feder, Merhav, Gutman) introduce the reader to the subject
by telling a thrilling anecdote.
@cindex Shannon, Claude
@quotation
In the early 50's, at Bell Laboratories, David Hagelbarger built a
simple ``mind reading'' machine, whose purpose was to play the ``penny
matching'' game. In this game, a player chooses head or tail, while a
``mind reading'' machine tries to predict and match his choice.
Surprisingly, as Robert Lucky tells in his book ``Silicon Dreams'',
Hagelbarger's simple, 8-state machine, was able to match the ``pennies''
of its human opponent 5,218 times over the course of 9,795 plays.
Random guessing would lead to such a high success rate with a probability
less than one out of 10 billion! Shannon, who was interested in prediction,
information, and thinking machines, closely followed Hagelbarger's
machine, and eventually built his own stripped-down version of the machine,
having the same states, but one that used a simpler strategy at each state.
As the legend goes, in a duel between the two machines, Shannon's machine
won by a slight margin! No one knows if this was due to a superior algorithm
or just a chance happening associated with the specific sequence at that game.
In any event, the success of both these machines against ``untrained'' human
opponents was explained by the fact that the human opponents cannot draw
completely random
bits.
@end quotation 
@end ignore

@node PROTBASE, , STOXPRED, Some Applications and Techniques
@section PROTBASE: Searching Through A Protein Database
@cindex PROTBASE
@cindex NCBI, National Center for Biotechnology Information
@cindex BLAST, Basic Local Alignment Search Tool
@cindex Hoare, C.A.R.
@quotation
@i{Hoare's Law of Large Problems: Inside every large problem is a small
   problem struggling to get out.}
@end quotation

Yahoo's database of stock market data is just one among the many large
databases on the Internet. Another one is located at NCBI
(National Center for Biotechnology
Information). Established in 1988 as a national resource for molecular
biology information, NCBI creates public databases, conducts research
in computational biology, develops software tools for analyzing genome
data, and disseminates biomedical information. In this section, we
look at one of NCBI's public services, which is called BLAST
(Basic Local Alignment Search Tool).

You probably know that the information necessary for reproducing living
cells is encoded in the genetic material of the cells. The genetic material
is a very long chain of four base nucleotides. It is the order of
appearance (the sequence) of nucleotides which contains the information
about the substance to be produced. Scientists in biotechnology often
find a specific fragment, determine the nucleotide sequence, and need
to know where the sequence at hand comes from. This is where the large
databases enter the game. At NCBI, databases store the knowledge
about which sequences have ever been found and where they have been found.
When a scientist sends a sequence to the BLAST service, the server
looks for regions of genetic material in its database which
look the most similar to the delivered nucleotide sequence. After a
search time of some seconds or minutes the server sends an answer to
the scientist. In order to make access simple, NCBI chose to offer
their database service through popular Internet protocols. There are
four basic ways to use the so-called BLAST services:

@itemize @bullet
@item
The easiest way to use BLAST is through the web. Users may simply point
their browsers at the NCBI home page
and link to the BLAST pages.
NCBI provides a stable URL that may be used to perform BLAST searches
without interactive use of a web browser. This is what we will do later
in this section.
A demonstration client
and a @file{README} file demonstrate how to access this URL.

@item
Currently,
@command{blastcl3} is the standard network BLAST client.
You can download @command{blastcl3} from the
anonymous FTP location.

@item
BLAST 2.0 can be run locally as a full executable and can be used to run
BLAST searches against private local databases, or downloaded copies of the
NCBI databases. BLAST 2.0 executables may be found on the NCBI
anonymous FTP server.

@item
The NCBI BLAST Email server is the best option for people without convenient
access to the web. A similarity search can be performed by sending a properly
formatted mail message containing the nucleotide or protein query sequence to
@email{blast@@ncbi.nlm.nih.gov}. The query sequence is compared against the
specified database using the BLAST algorithm and the results are returned in
an email message. For more information on formulating email BLAST searches,
you can send a message consisting of the word ``HELP'' to the same address,
@email{blast@@ncbi.nlm.nih.gov}.
@end itemize

Our starting point is the demonstration client mentioned in the first option.
The @file{README} file that comes along with the client explains the whole
process in a nutshell. In the rest of this section, we first show
what such requests look like. Then we show how to use @command{gawk} to
implement a client in about 10 lines of code. Finally, we show how to
interpret the result returned from the service.

Sequences are expected to be represented in the standard
IUB/IUPAC amino acid and nucleic acid codes,
with these exceptions:  lower-case letters are accepted and are mapped
into upper-case; a single hyphen or dash can be used to represent a gap
of indeterminate length; and in amino acid sequences, @samp{U} and @samp{*}  
are acceptable letters (see below).  Before submitting a request, any numerical
digits in the query sequence should either be removed or replaced by   
appropriate letter codes (e.g., @samp{N} for unknown nucleic acid residue
or @samp{X} for unknown amino acid residue).
The nucleic acid codes supported are: 

@example
A --> adenosine               M --> A C (amino)
C --> cytidine                S --> G C (strong)
G --> guanine                 W --> A T (weak)
T --> thymidine               B --> G T C
U --> uridine                 D --> G A T
R --> G A (purine)            H --> A C T
Y --> T C (pyrimidine)        V --> G C A
K --> G T (keto)              N --> A G C T (any)
                              -  gap of indeterminate length       
@end example
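
A query sequence can be checked against this alphabet before it is sent.
The following sketch is only an illustration; the helper names
@code{Normalize} and @code{IsNucleotide} are hypothetical and are not part
of any NCBI client. @code{Normalize} removes digits and white space and
maps lower-case letters to upper-case, and @code{IsNucleotide} tests
whether only the nucleic acid codes listed above remain.

@smallexample
# Sketch only: Normalize() and IsNucleotide() are hypothetical helpers
function Normalize(seq) @{
  gsub(/[0-9 \t]/, "", seq)    # drop digits and white space
  return toupper(seq)
@}

function IsNucleotide(seq) @{
  # The hyphen stands last in the bracket expression, so it is literal
  return seq ~ /^[ACGTURYKMSWBDHVN-]+$/
@}

BEGIN @{
  s = Normalize("61 tgcttggctg aggagccata")
  print s                       # prints TGCTTGGCTGAGGAGCCATA
  print IsNucleotide(s)         # prints 1
  print IsNucleotide("TGCXA")   # prints 0, X is not a nucleic acid code
@}
@end smallexample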

Now you know the alphabet of nucleotide sequences. The last two lines
of the following example query show you such a sequence, which is obviously
made up only of elements of the alphabet just described. Store this example
query into a file named @file{protbase.request}. You are now ready to send
it to the server with the demonstration client.
  
@example
@c file eg/network/protbase.request
PROGRAM blastn
DATALIB month
EXPECT 0.75
BEGIN
>GAWK310 the gawking gene GNU AWK
tgcttggctgaggagccataggacgagagcttcctggtgaagtgtgtttcttgaaatcat
caccaccatggacagcaaa
@c endfile 
@end example

@cindex FASTA/Pearson format
The actual search request begins with the mandatory parameter @samp{PROGRAM}
in the first column followed by the value @samp{blastn} (the name of the
program) for searching nucleic acids.  The next line contains the mandatory
search parameter @samp{DATALIB} with the value @samp{month} for the newest
nucleic acid sequences.  The third line contains an optional @samp{EXPECT}
parameter and the value desired for it. The fourth line contains the
mandatory @samp{BEGIN} directive, followed by the query sequence in
FASTA/Pearson format.
Each line of information must be less than 80 characters in length.

The ``month'' database contains all new or revised sequences released in the 
last 30 days and is useful for searching against new sequences.
There are five different blast programs, @command{blastn} being the one that
compares a nucleotide query sequence against a nucleotide sequence database.

The last server directive that must appear in every request is the
@samp{BEGIN} directive. The query sequence should immediately follow the
@samp{BEGIN} directive and must appear in FASTA/Pearson format.
A sequence in
FASTA/Pearson format begins with a single-line description.
The description line, which is required, is distinguished from the lines of
sequence data that follow it by having a greater-than (@samp{>}) symbol
in the first column.  For the purposes of the BLAST server, the text of
the description is arbitrary.

If you prefer to use a client written in @command{gawk}, just store the following
10 lines of code into a file named @file{protbase.awk} and use this client
instead. Invoke it with @samp{gawk -f protbase.awk protbase.request}.
Then wait a minute and watch the result coming in. In order to replicate
the demonstration client's behaviour as closely as possible, this client
does not use a proxy server. We could also have extended the client program
in @ref{GETURL, ,Retrieving Web Pages}, to implement the client request from
@file{protbase.awk} as a special case.

@smallexample
@c file eg/network/protbase.awk
@{ request = request "\n" $0 @}

END @{
  BLASTService     = "/inet/tcp/0/www.ncbi.nlm.nih.gov/80"
  printf "POST /cgi-bin/BLAST/nph-blast_report HTTP/1.0\n" |& BLASTService
  printf "Content-Length: " length(request) "\n\n"         |& BLASTService
  printf "%s", request                                     |& BLASTService
  while ((BLASTService |& getline) > 0)
      print $0
  close(BLASTService)
@}
@c endfile
@end smallexample

The demonstration client from NCBI is 214 lines long (written in C) and
it is not immediately obvious what it does. Our client is so short that
it @emph{is} obvious what it does. First it loops over all lines of the
query and stores the whole query into a variable. Then the script
establishes an Internet connection to the NCBI server and transmits the
query by framing it with a proper HTTP request. Finally it receives
and prints the complete result coming from the server.
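
The read loop prints the response verbatim, including the HTTP header.
Because an HTTP header ends at the first empty line, a client that wants
only the body could cut the header off. The following sketch shows one
way to do this on a response held in a string. The helper name
@code{Body} and the sample response are hypothetical; they are not part
of @file{protbase.awk}.

@smallexample
# Sketch only: Body() is a hypothetical helper.  It returns everything
# after the first empty line of an HTTP response, or "" if there is none.
function Body(response,    pos) @{
  gsub(/\r/, "", response)     # tolerate CR-LF line ends (drops every CR)
  pos = index(response, "\n\n")
  return pos ? substr(response, pos + 2) : ""
@}

BEGIN @{
  r = "HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\nfirst line of result"
  print Body(r)                # prints: first line of result
@}
@end smallexample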

Now, let us look at the result. It begins with an HTTP header, which you
can ignore. Then there are some comments about the query having been
filtered to avoid spuriously high scores. After this, there is a reference
to the paper that describes the software being used for searching the
database. After a repetition of the original query's description we find the
list of significant alignments:

@smallexample
@c file eg/network/protbase.result
Sequences producing significant alignments:                        (bits)  Value

gb|AC021182.14|AC021182 Homo sapiens chromosome 7 clone RP11-733...    38  0.20
gb|AC021056.12|AC021056 Homo sapiens chromosome 3 clone RP11-115...    38  0.20
emb|AL160278.10|AL160278 Homo sapiens chromosome 9 clone RP11-57...    38  0.20
emb|AL391139.11|AL391139 Homo sapiens chromosome X clone RP11-35...    38  0.20
emb|AL365192.6|AL365192 Homo sapiens chromosome 6 clone RP3-421H...    38  0.20
emb|AL138812.9|AL138812 Homo sapiens chromosome 11 clone RP1-276...    38  0.20
gb|AC073881.3|AC073881 Homo sapiens chromosome 15 clone CTD-2169...    38  0.20
@c endfile
@end smallexample

This means that the query sequence was found in clones from seven
different human chromosomes. But the Expect value of 0.20 in the last
column means that the chance of an accidental match is rather high
(roughly 20%) in all cases and should be taken into account.
You may wonder what the first column means. It is a key to the specific
database in which this occurrence was found.  The unique sequence identifiers
reported in the search results can be used as sequence retrieval keys
via the NCBI server. The syntax of sequence header lines used by the NCBI
BLAST server depends on the database from which each sequence was obtained.
The table below lists the identifiers for the databases from which the
sequences were derived.
 
@ifinfo
@example
Database Name                     Identifier Syntax
============================      ========================
GenBank                           gb|accession|locus
EMBL Data Library                 emb|accession|locus
DDBJ, DNA Database of Japan       dbj|accession|locus
NBRF PIR                          pir||entry
Protein Research Foundation       prf||name
SWISS-PROT                        sp|accession|entry name
Brookhaven Protein Data Bank      pdb|entry|chain
Kabat's Sequences of Immuno@dots{}    gnl|kabat|identifier
Patents                           pat|country|number 
GenInfo Backbone Id               bbs|number 
@end example
@end ifinfo

@ifnotinfo
@multitable {Kabat's Sequences of Immuno@dots{}} {@code{@w{sp|accession|entry name}}}
@item GenBank @tab @code{gb|accession|locus}
@item EMBL Data Library @tab @code{emb|accession|locus}
@item DDBJ, DNA Database of Japan @tab @code{dbj|accession|locus}
@item NBRF PIR @tab @code{pir||entry}
@item Protein Research Foundation @tab @code{prf||name}
@item SWISS-PROT @tab @code{@w{sp|accession|entry name}}
@item Brookhaven Protein Data Bank @tab @code{pdb|entry|chain}
@item Kabat's Sequences of Immuno@dots{} @tab @code{gnl|kabat|identifier}
@item Patents @tab @code{pat|country|number}
@item GenInfo Backbone Id @tab @code{bbs|number}
@end multitable
@end ifnotinfo

 
For example, an identifier might be @samp{gb|AC021182.14|AC021182}, where the
@samp{gb} tag indicates that the identifier refers to a GenBank sequence,
@samp{AC021182.14} is its GenBank ACCESSION, and @samp{AC021182} is the GenBank LOCUS.
The identifier contains no spaces, so that a space indicates the end of the
identifier.
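
Since the identifier is the first space-delimited field and its parts are
separated by vertical bars, a line of the summary list is easy to take
apart with @code{split}. The following sketch assumes the three-part
syntax of GenBank, EMBL, and DDBJ identifiers; the helper name
@code{ParseHit} and the keys of the @code{hit} array are hypothetical.

@smallexample
# Sketch only: ParseHit() is a hypothetical helper for one summary line
function ParseHit(line, hit,    f, id, n) @{
  n = split(line, f, " ")      # f[1] is the identifier
  split(f[1], id, "|")         # a single-character separator is literal
  hit["tag"]       = id[1]
  hit["accession"] = id[2]
  hit["locus"]     = id[3]
  hit["bits"]      = f[n-1]    # next to last column: score in bits
  hit["expect"]    = f[n]      # last column: Expect value
@}

BEGIN @{
  line = "gb|AC021182.14|AC021182 Homo sapiens chromosome 7 " \
         "clone RP11-733...    38  0.20"
  ParseHit(line, hit)
  print hit["tag"], hit["accession"], hit["locus"]
  # prints gb AC021182.14 AC021182
  print hit["bits"], hit["expect"]
  # prints 38 0.20
@}
@end smallexample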

Let us continue in the result listing. Each of the seven alignments mentioned
above is subsequently described in detail. We will have a closer look at
the first of them.

@smallexample
>gb|AC021182.14|AC021182 Homo sapiens chromosome 7 clone RP11-733N23, WORKING DRAFT SEQUENCE, 4
             unordered pieces
          Length = 176383

 Score = 38.2 bits (19), Expect = 0.20
 Identities = 19/19 (100%)
 Strand = Plus / Plus

Query: 35    tggtgaagtgtgtttcttg 53
             |||||||||||||||||||
Sbjct: 69786 tggtgaagtgtgtttcttg 69804
@end smallexample 

This alignment was located on the human chromosome 7. The fragment on which
part of the query was found had a total length of 176383. Only 19 of the
nucleotides matched and the matching sequence ran from character 35 to 53
in the query sequence and from 69786 to 69804 in the fragment on chromosome 7.
If you are still reading at this point, you are probably interested in finding
out more about Computational Biology and you might appreciate the following
hints.

@cindex Computational Biology
@cindex Bioinformatics
@enumerate
@item
There is a book called @cite{Introduction to Computational Biology}
by Michael S. Waterman, which is worth reading if you are seriously
interested. You can find a good
book review
on the Internet.

@item
While Waterman's book can explain to you the algorithms employed internally
in the database search engines, most practitioners prefer to approach
the subject differently. The applied side of Computational Biology is
called Bioinformatics, and emphasizes the tools available for day-to-day
work as well as how to actually @emph{use} them. One of the very few affordable
books on Bioinformatics is
@cite{Developing Bioinformatics Computer Skills}.

@item
The sequences @emph{gawk} and @emph{gnuawk} are in widespread use in
the genetic material of virtually every earthly living being. Let us
take this as a clear indication that the divine creator has intended
@code{gawk} to prevail over other scripting languages such as @code{perl},
@code{tcl}, or @code{python} which are not even proper sequences. (:-)
@end enumerate

@node Links, GNU Free Documentation License, Some Applications and Techniques, Top
@chapter Related Links

This section lists the URLs for various items discussed in this @value{CHAPTER}.
They are presented in the order in which they appear.

@table @asis

@item @cite{Internet Programming with Python}
@uref{http://www.fsbassociates.com/books/python.htm}

@item @cite{Advanced Perl Programming}
@uref{http://www.oreilly.com/catalog/advperl}

@item @cite{Web Client Programming with Perl}
@uref{http://www.oreilly.com/catalog/webclient}

@item Richard Stevens's home page and book
@uref{http://www.kohala.com/~rstevens}

@item The SPAK home page
@uref{http://www.userfriendly.net/linux/RPM/contrib/libc6/i386/spak-0.6b-1.i386.html}

@item Volume III of @cite{Internetworking with TCP/IP}, by Comer and Stevens
@uref{http://www.cs.purdue.edu/homes/dec/tcpip3s.cont.html}

@item XBM Graphics File Format
@uref{http://www.wotsit.org/download.asp?f=xbm}

@item GNUPlot
@uref{http://www.cs.dartmouth.edu/gnuplot_info.html}

@item Mark Humphrys' Eliza page 
@uref{http://www.compapp.dcu.ie/~humphrys/eliza.html}

@item Yahoo! Eliza Information 
@uref{http://dir.yahoo.com/Recreation/Games/Computer_Games/Internet_Games/Web_Games/Artificial_Intelligence}

@item Java versions of Eliza 
@uref{http://www.tjhsst.edu/Psych/ch1/eliza.html}

@item Java versions of Eliza with source code 
@uref{http://home.adelphia.net/~lifeisgood/eliza/eliza.htm}

@item Eliza Programs with Explanations 
@uref{http://chayden.net/chayden/eliza/Eliza.shtml}

@item Loebner Contest
@uref{http://acm.org/~loebner/loebner-prize.htmlx}

@item Tcl/Tk Information
@uref{http://www.scriptics.com/}

@item Intel 80x86 Processors
@uref{http://developer.intel.com/design/platform/embedpc/what_is.htm}

@item AMD Elan Processors
@uref{http://www.amd.com/products/epd/processors/4.32bitcont/32bitcont/index.html}

@item XINU 
@uref{http://willow.canberra.edu.au/~chrisc/xinu.html}

@item GNU/Linux
@uref{http://uclinux.lineo.com/}

@item Embedded PCs
@uref{http://dir.yahoo.com/Business_and_Economy/Business_to_Business/Computers/Hardware/Embedded_Control/}

@item MiniSQL 
@uref{http://www.hughes.com.au/library/}

@item Market Share Surveys
@uref{http://www.netcraft.com/survey}

@item @cite{Numerical Recipes in C: The Art of Scientific Computing}
@uref{http://www.nr.com}

@item VRML
@uref{http://www.vrml.org}

@item The VRML FAQ
@uref{http://www.vrml.org/technicalinfo/specifications/specifications.htm#FAQ}

@item The UMBC Agent Web 
@uref{http://www.cs.umbc.edu/agents}

@item Apache Web Server
@uref{http://www.apache.org}

@item National Center for Biotechnology Information (NCBI)
@uref{http://www.ncbi.nlm.nih.gov}

@item Basic Local Alignment Search Tool (BLAST)
@uref{http://www.ncbi.nlm.nih.gov/BLAST/blast_overview.html}

@item NCBI Home Page
@uref{http://www.ncbi.nlm.nih.gov}

@item BLAST Pages
@uref{http://www.ncbi.nlm.nih.gov/BLAST}

@item BLAST Demonstration Client
@uref{ftp://ncbi.nlm.nih.gov/blast/blasturl/}

@item BLAST anonymous FTP location
@uref{ftp://ncbi.nlm.nih.gov/blast/network/netblast/}

@item BLAST 2.0 Executables
@uref{ftp://ncbi.nlm.nih.gov/blast/executables/}

@item IUB/IUPAC Amino Acid and Nucleic Acid Codes
@uref{http://www.uthscsa.edu/geninfo/blastmail.html#item6}

@item FASTA/Pearson Format
@uref{http://www.ncbi.nlm.nih.gov/BLAST/fasta.html}

@item Fasta/Pearson Sequence in Java
@uref{http://www.kazusa.or.jp/java/codon_table_java/}

@item Book Review of @cite{Introduction to Computational Biology}
@uref{http://www.acm.org/crossroads/xrds5-1/introcb.html}

@item @cite{Developing Bioinformatics Computer Skills}
@uref{http://www.oreilly.com/catalog/bioskills/}

@end table

@node GNU Free Documentation License
@unnumbered GNU Free Documentation License

@cindex FDL (Free Documentation License)
@cindex Free Documentation License (FDL)
@cindex GNU Free Documentation License
@center Version 1.2, November 2002

@display
Copyright @copyright{} 2000,2001,2002 Free Software Foundation, Inc.
59 Temple Place, Suite 330, Boston, MA  02111-1307, USA

Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
@end display

@enumerate 0
@item
PREAMBLE

The purpose of this License is to make a manual, textbook, or other
functional and useful document @dfn{free} in the sense of freedom: to
assure everyone the effective freedom to copy and redistribute it,
with or without modifying it, either commercially or noncommercially.
Secondarily, this License preserves for the author and publisher a way
to get credit for their work, while not being considered responsible
for modifications made by others.

This License is a kind of ``copyleft'', which means that derivative
works of the document must themselves be free in the same sense.  It
complements the GNU General Public License, which is a copyleft
license designed for free software.

We have designed this License in order to use it for manuals for free
software, because free software needs free documentation: a free
program should come with manuals providing the same freedoms that the
software does.  But this License is not limited to software manuals;
it can be used for any textual work, regardless of subject matter or
whether it is published as a printed book.  We recommend this License
principally for works whose purpose is instruction or reference.

@item
APPLICABILITY AND DEFINITIONS

This License applies to any manual or other work, in any medium, that
contains a notice placed by the copyright holder saying it can be
distributed under the terms of this License.  Such a notice grants a
world-wide, royalty-free license, unlimited in duration, to use that
work under the conditions stated herein.  The ``Document'', below,
refers to any such manual or work.  Any member of the public is a
licensee, and is addressed as ``you''.  You accept the license if you
copy, modify or distribute the work in a way requiring permission
under copyright law.

A ``Modified Version'' of the Document means any work containing the
Document or a portion of it, either copied verbatim, or with
modifications and/or translated into another language.

A ``Secondary Section'' is a named appendix or a front-matter section
of the Document that deals exclusively with the relationship of the
publishers or authors of the Document to the Document's overall
subject (or to related matters) and contains nothing that could fall
directly within that overall subject.  (Thus, if the Document is in
part a textbook of mathematics, a Secondary Section may not explain
any mathematics.)  The relationship could be a matter of historical
connection with the subject or with related matters, or of legal,
commercial, philosophical, ethical or political position regarding
them.

The ``Invariant Sections'' are certain Secondary Sections whose titles
are designated, as being those of Invariant Sections, in the notice
that says that the Document is released under this License.  If a
section does not fit the above definition of Secondary then it is not
allowed to be designated as Invariant.  The Document may contain zero
Invariant Sections.  If the Document does not identify any Invariant
Sections then there are none.

The ``Cover Texts'' are certain short passages of text that are listed,
as Front-Cover Texts or Back-Cover Texts, in the notice that says that
the Document is released under this License.  A Front-Cover Text may
be at most 5 words, and a Back-Cover Text may be at most 25 words.

A ``Transparent'' copy of the Document means a machine-readable copy,
represented in a format whose specification is available to the
general public, that is suitable for revising the document
straightforwardly with generic text editors or (for images composed of
pixels) generic paint programs or (for drawings) some widely available
drawing editor, and that is suitable for input to text formatters or
for automatic translation to a variety of formats suitable for input
to text formatters.  A copy made in an otherwise Transparent file
format whose markup, or absence of markup, has been arranged to thwart
or discourage subsequent modification by readers is not Transparent.
An image format is not Transparent if used for any substantial amount
of text.  A copy that is not ``Transparent'' is called ``Opaque''.

Examples of suitable formats for Transparent copies include plain
@sc{ascii} without markup, Texinfo input format, La@TeX{} input
format, @acronym{SGML} or @acronym{XML} using a publicly available
@acronym{DTD}, and standard-conforming simple @acronym{HTML},
PostScript or @acronym{PDF} designed for human modification.  Examples
of transparent image formats include @acronym{PNG}, @acronym{XCF} and
@acronym{JPG}.  Opaque formats include proprietary formats that can be
read and edited only by proprietary word processors, @acronym{SGML} or
@acronym{XML} for which the @acronym{DTD} and/or processing tools are
not generally available, and the machine-generated @acronym{HTML},
PostScript or @acronym{PDF} produced by some word processors for
output purposes only.

The ``Title Page'' means, for a printed book, the title page itself,
plus such following pages as are needed to hold, legibly, the material
this License requires to appear in the title page.  For works in
formats which do not have any title page as such, ``Title Page'' means
the text near the most prominent appearance of the work's title,
preceding the beginning of the body of the text.

A section ``Entitled XYZ'' means a named subunit of the Document whose
title either is precisely XYZ or contains XYZ in parentheses following
text that translates XYZ in another language.  (Here XYZ stands for a
specific section name mentioned below, such as ``Acknowledgements'',
``Dedications'', ``Endorsements'', or ``History''.)  To ``Preserve the Title''
of such a section when you modify the Document means that it remains a
section ``Entitled XYZ'' according to this definition.

The Document may include Warranty Disclaimers next to the notice which
states that this License applies to the Document.  These Warranty
Disclaimers are considered to be included by reference in this
License, but only as regards disclaiming warranties: any other
implication that these Warranty Disclaimers may have is void and has
no effect on the meaning of this License.

@item
VERBATIM COPYING

You may copy and distribute the Document in any medium, either
commercially or noncommercially, provided that this License, the
copyright notices, and the license notice saying this License applies
to the Document are reproduced in all copies, and that you add no other
conditions whatsoever to those of this License.  You may not use
technical measures to obstruct or control the reading or further
copying of the copies you make or distribute.  However, you may accept
compensation in exchange for copies.  If you distribute a large enough
number of copies you must also follow the conditions in section 3.

You may also lend copies, under the same conditions stated above, and
you may publicly display copies.

@item
COPYING IN QUANTITY

If you publish printed copies (or copies in media that commonly have
printed covers) of the Document, numbering more than 100, and the
Document's license notice requires Cover Texts, you must enclose the
copies in covers that carry, clearly and legibly, all these Cover
Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on
the back cover.  Both covers must also clearly and legibly identify
you as the publisher of these copies.  The front cover must present
the full title with all words of the title equally prominent and
visible.  You may add other material on the covers in addition.
Copying with changes limited to the covers, as long as they preserve
the title of the Document and satisfy these conditions, can be treated
as verbatim copying in other respects.

If the required texts for either cover are too voluminous to fit
legibly, you should put the first ones listed (as many as fit
reasonably) on the actual cover, and continue the rest onto adjacent
pages.

If you publish or distribute Opaque copies of the Document numbering
more than 100, you must either include a machine-readable Transparent
copy along with each Opaque copy, or state in or with each Opaque copy
a computer-network location from which the general network-using
public has access to download using public-standard network protocols
a complete Transparent copy of the Document, free of added material.
If you use the latter option, you must take reasonably prudent steps,
when you begin distribution of Opaque copies in quantity, to ensure
that this Transparent copy will remain thus accessible at the stated
location until at least one year after the last time you distribute an
Opaque copy (directly or through your agents or retailers) of that
edition to the public.

It is requested, but not required, that you contact the authors of the
Document well before redistributing any large number of copies, to give
them a chance to provide you with an updated version of the Document.

@item
MODIFICATIONS

You may copy and distribute a Modified Version of the Document under
the conditions of sections 2 and 3 above, provided that you release
the Modified Version under precisely this License, with the Modified
Version filling the role of the Document, thus licensing distribution
and modification of the Modified Version to whoever possesses a copy
of it.  In addition, you must do these things in the Modified Version:

@enumerate A
@item
Use in the Title Page (and on the covers, if any) a title distinct
from that of the Document, and from those of previous versions
(which should, if there were any, be listed in the History section
of the Document).  You may use the same title as a previous version
if the original publisher of that version gives permission.

@item
List on the Title Page, as authors, one or more persons or entities
responsible for authorship of the modifications in the Modified
Version, together with at least five of the principal authors of the
Document (all of its principal authors, if it has fewer than five),
unless they release you from this requirement.

@item
State on the Title page the name of the publisher of the
Modified Version, as the publisher.

@item
Preserve all the copyright notices of the Document.

@item
Add an appropriate copyright notice for your modifications
adjacent to the other copyright notices.

@item
Include, immediately after the copyright notices, a license notice
giving the public permission to use the Modified Version under the
terms of this License, in the form shown in the Addendum below.

@item
Preserve in that license notice the full lists of Invariant Sections
and required Cover Texts given in the Document's license notice.

@item
Include an unaltered copy of this License.

@item
Preserve the section Entitled ``History'', Preserve its Title, and add
to it an item stating at least the title, year, new authors, and
publisher of the Modified Version as given on the Title Page.  If
there is no section Entitled ``History'' in the Document, create one
stating the title, year, authors, and publisher of the Document as
given on its Title Page, then add an item describing the Modified
Version as stated in the previous sentence.

@item
Preserve the network location, if any, given in the Document for
public access to a Transparent copy of the Document, and likewise
the network locations given in the Document for previous versions
it was based on.  These may be placed in the ``History'' section.
You may omit a network location for a work that was published at
least four years before the Document itself, or if the original
publisher of the version it refers to gives permission.

@item
For any section Entitled ``Acknowledgements'' or ``Dedications'', Preserve
the Title of the section, and preserve in the section all the
substance and tone of each of the contributor acknowledgements and/or
dedications given therein.

@item
Preserve all the Invariant Sections of the Document,
unaltered in their text and in their titles.  Section numbers
or the equivalent are not considered part of the section titles.

@item
Delete any section Entitled ``Endorsements''.  Such a section
may not be included in the Modified Version.

@item
Do not retitle any existing section to be Entitled ``Endorsements'' or
to conflict in title with any Invariant Section.

@item
Preserve any Warranty Disclaimers.
@end enumerate

If the Modified Version includes new front-matter sections or
appendices that qualify as Secondary Sections and contain no material
copied from the Document, you may at your option designate some or all
of these sections as invariant.  To do this, add their titles to the
list of Invariant Sections in the Modified Version's license notice.
These titles must be distinct from any other section titles.

You may add a section Entitled ``Endorsements'', provided it contains
nothing but endorsements of your Modified Version by various
parties---for example, statements of peer review or that the text has
been approved by an organization as the authoritative definition of a
standard.

You may add a passage of up to five words as a Front-Cover Text, and a
passage of up to 25 words as a Back-Cover Text, to the end of the list
of Cover Texts in the Modified Version.  Only one passage of
Front-Cover Text and one of Back-Cover Text may be added by (or
through arrangements made by) any one entity.  If the Document already
includes a cover text for the same cover, previously added by you or
by arrangement made by the same entity you are acting on behalf of,
you may not add another; but you may replace the old one, on explicit
permission from the previous publisher that added the old one.

The author(s) and publisher(s) of the Document do not by this License
give permission to use their names for publicity for or to assert or
imply endorsement of any Modified Version.

@item
COMBINING DOCUMENTS

You may combine the Document with other documents released under this
License, under the terms defined in section 4 above for modified
versions, provided that you include in the combination all of the
Invariant Sections of all of the original documents, unmodified, and
list them all as Invariant Sections of your combined work in its
license notice, and that you preserve all their Warranty Disclaimers.

The combined work need only contain one copy of this License, and
multiple identical Invariant Sections may be replaced with a single
copy.  If there are multiple Invariant Sections with the same name but
different contents, make the title of each such section unique by
adding at the end of it, in parentheses, the name of the original
author or publisher of that section if known, or else a unique number.
Make the same adjustment to the section titles in the list of
Invariant Sections in the license notice of the combined work.

In the combination, you must combine any sections Entitled ``History''
in the various original documents, forming one section Entitled
``History''; likewise combine any sections Entitled ``Acknowledgements'',
and any sections Entitled ``Dedications''.  You must delete all
sections Entitled ``Endorsements.''

@item
COLLECTIONS OF DOCUMENTS

You may make a collection consisting of the Document and other documents
released under this License, and replace the individual copies of this
License in the various documents with a single copy that is included in
the collection, provided that you follow the rules of this License for
verbatim copying of each of the documents in all other respects.

You may extract a single document from such a collection, and distribute
it individually under this License, provided you insert a copy of this
License into the extracted document, and follow this License in all
other respects regarding verbatim copying of that document.

@item
AGGREGATION WITH INDEPENDENT WORKS

A compilation of the Document or its derivatives with other separate
and independent documents or works, in or on a volume of a storage or
distribution medium, is called an ``aggregate'' if the copyright
resulting from the compilation is not used to limit the legal rights
of the compilation's users beyond what the individual works permit.
When the Document is included in an aggregate, this License does not
apply to the other works in the aggregate which are not themselves
derivative works of the Document.

If the Cover Text requirement of section 3 is applicable to these
copies of the Document, then if the Document is less than one half of
the entire aggregate, the Document's Cover Texts may be placed on
covers that bracket the Document within the aggregate, or the
electronic equivalent of covers if the Document is in electronic form.
Otherwise they must appear on printed covers that bracket the whole
aggregate.

@item
TRANSLATION

Translation is considered a kind of modification, so you may
distribute translations of the Document under the terms of section 4.
Replacing Invariant Sections with translations requires special
permission from their copyright holders, but you may include
translations of some or all Invariant Sections in addition to the
original versions of these Invariant Sections.  You may include a
translation of this License, and all the license notices in the
Document, and any Warranty Disclaimers, provided that you also include
the original English version of this License and the original versions
of those notices and disclaimers.  In case of a disagreement between
the translation and the original version of this License or a notice
or disclaimer, the original version will prevail.

If a section in the Document is Entitled ``Acknowledgements'',
``Dedications'', or ``History'', the requirement (section 4) to Preserve
its Title (section 1) will typically require changing the actual
title.

@item
TERMINATION

You may not copy, modify, sublicense, or distribute the Document except
as expressly provided for under this License.  Any other attempt to
copy, modify, sublicense or distribute the Document is void, and will
automatically terminate your rights under this License.  However,
parties who have received copies, or rights, from you under this
License will not have their licenses terminated so long as such
parties remain in full compliance.

@item
FUTURE REVISIONS OF THIS LICENSE

The Free Software Foundation may publish new, revised versions
of the GNU Free Documentation License from time to time.  Such new
versions will be similar in spirit to the present version, but may
differ in detail to address new problems or concerns.  See
@uref{http://www.gnu.org/copyleft/}.

Each version of the License is given a distinguishing version number.
If the Document specifies that a particular numbered version of this
License ``or any later version'' applies to it, you have the option of
following the terms and conditions either of that specified version or
of any later version that has been published (not as a draft) by the
Free Software Foundation.  If the Document does not specify a version
number of this License, you may choose any version ever published (not
as a draft) by the Free Software Foundation.
@end enumerate

@c fakenode --- for prepinfo
@unnumberedsec ADDENDUM: How to use this License for your documents

To use this License in a document you have written, include a copy of
the License in the document and put the following copyright and
license notices just after the title page:

@smallexample
@group
  Copyright (C)  @var{year}  @var{your name}.
  Permission is granted to copy, distribute and/or modify this document
  under the terms of the GNU Free Documentation License, Version 1.2
  or any later version published by the Free Software Foundation;
  with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
  A copy of the license is included in the section entitled ``GNU
  Free Documentation License''.
@end group
@end smallexample

If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts,
replace the ``with...Texts.'' line with this:

@smallexample
@group
    with the Invariant Sections being @var{list their titles}, with
    the Front-Cover Texts being @var{list}, and with the Back-Cover Texts
    being @var{list}.
@end group
@end smallexample

If you have Invariant Sections without Cover Texts, or some other
combination of the three, merge those two alternatives to suit the
situation.

If your document contains nontrivial examples of program code, we
recommend releasing these examples in parallel under your choice of
free software license, such as the GNU General Public License,
to permit their use in free software.

@c Local Variables:
@c ispell-local-pdict: "ispell-dict"
@c End:


@node Index, , GNU Free Documentation License, Top
@comment node-name,    next,  previous,      up

@unnumbered Index
@printindex cp
@bye

Conventions:
1. Functions, built-in or otherwise, do NOT have () after them.
2. Gawk built-in vars and functions are in @code.  Also program vars and
   functions.
3. HTTP method names are in @code.
4. Protocols such as echo, ftp, etc. are in @samp.
5. URLs are in @url.
6. All RFCs go in the index.  Put a space between `RFC' and the number.