|
Subject: [zebra 18677] zebra initialisation bug and patch
Hi All,
I have found a bug in zebra that prevents its routing table and
interface database from being initialised properly. The problem occurs
when a request is made via the netlink socket but the kernel produces a
EWOULDBLOCK/EAGAIN when the result is trying to be retrieved via a
recvmsg(). Zebra does not do anything about this and continues to
function (with an empty routing table and interface list) as if nothing
has happened. With no such information the routing protocol dosn't work!
Two functions are called during the initialisation of Zebra:
interface_lookup_netlink() and netlink_route_read() - obtaining the
interfaces and routing table from the kernel respectively. These are the
only time these functions are called.
These functions, interface_lookup_netlink() and netlink_route_read(),
use netlink_parse_info() to recieve the data from the netlink socket.
The problem is, netlink_parse_info() returns (without error) when its
call to recvmsg() results in an errno EWOULDBLOCK/EAGAIN. This behaviour
is expected by other funtions calling netlink_parse_info() -
netlink_parse_info is simply recalled at a later stage. However, on
initialisation it is never recalled.
Since zebra is expected to nothing else during initialisation it was
easiest to temporarily change the netlink socket to BLOCK and wait
indefinently until the kernel responds with the required information.
Attached is a patch with these changes.
Comments and questions are welcome.
Please inform me if this patch is added to the Zebra source.
--israel
|
|
Date: Sat, 11 Jan 2003 23:26:28 +0100 (CET)
From: Yon Uriarte <havanna_moon@gmx.net>
To: "the list(tm) Zebra" <zebra@zebra.org>
Subject: [zebra 17217] [PATCH] show thread CPU
Hi,
a little patch from the 'stupid preprocessor tricks' collection to record
thread statistics.
Usage: "show thread cpu [r][w][t][e][x]"
Output Fields: self explaining I hope. Type is one of RWTEX for:
Read, Write (fd threads), Timer, Event, Execute.
Overhead vs. vanilla zebra: almost nothing. Vanilla CVS zebra already
collects thread run times.
Caveats: Under linux getrusage has a granularity of 10ms, which is almost
useless in this case. Run ./configure, edit config.h and comment out
"#define HAVE_RUSAGE", this way it will use getimeofday which has a much
better granularity. IMHO this is better, as cooperative threads are
effectively running during all that wall time (dont care if CPU
utilization was 3% or 99% during the time the thread was running (an
effective rusage combined with getimeofday could give that info)).
Maybe someone can give tips for other platforms on API granularity.
TODO: change some of the calls to thread_add_$KIND to
funcname_thread_add_$KIND with a meaningfull funcname, so users will get a
better idea of what's going on.
F.ex. (AFAIK):
ospf_spf_calculate_timer -> "Routes Step 1, areas SPF"
ospf_ase_calculate_timer -> "Routes Step 2, externals"
Could this be added to the unofficial patch collection?
Could someone with BGP keepalive problems run their bgpd with this patch
and post the results?
TIA, HTH, HAND, regards
yon
Example output:
--------------------------------
ospfd# show thread cpu
Runtime(ms) Invoked Avg uSecs Max uSecs Type Thread
14.829 31 478 585 T ospf_ase_calculate_timer
82.132 9838 8 291 EX ospf_nsm_event
0.029 1 29 29 E ospf_default_originate_timer
0.254 9 28 34 T ospf_db_desc_timer
0.026 7 3 11 T ospf_wait_timer
669.015 523 1279 490696 R vty_read
4.415 45 98 173 TE ospf_network_lsa_refresh_timer
15.026 31 484 588 T ospf_spf_calculate_timer
29.478 1593 18 122 E ospf_ls_upd_send_queue_event
0.173 1 173 173 T vty_timeout
4.173 242 17 58 E ospf_ls_ack_send_event
637.767 121223 5 55 T ospf_ls_ack_timer
39.373 244 161 2691 R zclient_read
12.169 98 124 726 EX ospf_ism_event
0.226 2 113 125 R vty_accept
537.776 14256 37 3813 W ospf_write
4.967 41 121 250 T ospf_router_lsa_timer
0.672 1 672 672 E zclient_connect
7.901 1658 4 26 T ospf_ls_req_timer
0.459 2 229 266 E ospf_external_lsa_originate_timer
3.203 60 53 305 T ospf_maxage_lsa_remover
108.341 9772 11 65 T ospf_ls_upd_timer
33.302 525 63 8628 W vty_flush
0.101 1 101 101 T ospf_router_lsa_update_timer
0.016 1 16 16 T ospf_router_id_update_timer
26.970 407 66 176 T ospf_lsa_maxage_walker
381.949 12244 31 69 T ospf_hello_timer
0.114 22 5 14 T ospf_inactivity_timer
34.290 1223 28 310 T ospf_lsa_refresh_walker
470.645 6592 71 665 R ospf_read
3119.791 180693 17 490696 RWTEX TOTAL
ospfd#
bgpd# sh t c TeX
Runtime(ms) Invoked Avg uSecs Max uSecs Type Thread
21.504 476 45 71 T bgp_keepalive_timer
17.784 1157 15 131 T bgp_reuse_timer
29.080 193 150 249 T bgp_scan
23.606 995 23 420 E bgp_event
317.734 28572 11 69 T bgp_routeadv_timer
0.084 1 84 84 E zlookup_connect
0.526 1 526 526 E zclient_connect
1.348 13 103 147 T bgp_start_timer
19.443 142 136 420 T bgp_connect_timer
16.032 772 20 27 T bgp_import
447.141 32322 13 526 TEX TOTAL
bgpd#
bgpd# show thread cpu rw
Runtime(ms) Invoked Avg uSecs Max uSecs Type Thread
155.043 7 22149 150659 R bgp_accept
129.638 180 720 53844 R vty_read
1.734 56 30 129 R zclient_read
0.255 2 127 148 R vty_accept
58.483 983 59 340 R bgp_read
171.495 29190 5 245 W bgp_write
13.884 181 76 2542 W vty_flush
530.532 30599 17 150659 RW TOTAL
bgpd#
--------------------------------
|