path: root/lib/thread.c
Age  Commit message  Author
2009-07-19  [bgpd] Stability fixes including bugs 397, 492  Chris Caputo
I've spent the last several weeks working on stability fixes to bgpd. These patches fix all of the numerous crashes, assertion failures, memory leaks and memory stomping I could find. Valgrind was used extensively.

Added new function bgp_exit() to help catch problems. If "debug bgp" is configured and bgpd exits with status of 0, statistics on remaining lib/memory.c allocations are printed to stderr. It is my hope that other developers will use this to stay on top of memory issues.

Example questionable exit:

  bgpd: memstats: Current memory utilization in module LIB:
  bgpd: memstats: Link List : 6
  bgpd: memstats: Link Node : 5
  bgpd: memstats: Hash : 8
  bgpd: memstats: Hash Bucket : 2
  bgpd: memstats: Hash Index : 8
  bgpd: memstats: Work queue : 3
  bgpd: memstats: Work queue item : 2
  bgpd: memstats: Work queue name string : 3
  bgpd: memstats: Current memory utilization in module BGP:
  bgpd: memstats: BGP instance : 1
  bgpd: memstats: BGP peer : 1
  bgpd: memstats: BGP peer hostname : 1
  bgpd: memstats: BGP attribute : 1
  bgpd: memstats: BGP extra attributes : 1
  bgpd: memstats: BGP aspath : 1
  bgpd: memstats: BGP aspath str : 1
  bgpd: memstats: BGP table : 24
  bgpd: memstats: BGP node : 1
  bgpd: memstats: BGP route : 1
  bgpd: memstats: BGP synchronise : 8
  bgpd: memstats: BGP Process queue : 1
  bgpd: memstats: BGP node clear queue : 1
  bgpd: memstats: NOTE: If configuration exists, utilization may be expected.

Example clean exit:

  bgpd: memstats: No remaining tracked memory utilization.

This patch fixes bug #397: "Invalid free in bgp_announce_check()".
This patch fixes bug #492: "SIGBUS in bgpd/bgp_route.c: bgp_clear_route_node()".

My apologies for not separating out these changes into individual patches. The complexity of doing so boggled what is left of my brain. I hope this is all still useful to the community.

This code has been production tested, in non-route-server-client mode, on a linux 32-bit box and a 64-bit box.
Release/reset functions, used by bgp_exit(), added to:

  bgpd/bgp_attr.c,h
  bgpd/bgp_community.c,h
  bgpd/bgp_dump.c,h
  bgpd/bgp_ecommunity.c,h
  bgpd/bgp_filter.c,h
  bgpd/bgp_nexthop.c,h
  bgpd/bgp_route.c,h
  lib/routemap.c,h

File by file analysis:

* bgpd/bgp_aspath.c: Prevent re-use of ashash after it is released.
* bgpd/bgp_attr.c: #if removed uncalled cluster_dup().
* bgpd/bgp_clist.c,h: Allow community_list_terminate() to be called from bgp_exit().
* bgpd/bgp_filter.c: Fix aslist->name use without allocation check, and also fix memory leak.
* bgpd/bgp_main.c: Created bgp_exit() exit routine. This function frees allocations made as part of bgpd initialization and, to some extent, configuration. If "debug bgp" is configured, memory stats are printed as described above.
* bgpd/bgp_nexthop.c: zclient_new() already allocates stream for ibuf/obuf, so bgp_scan_init() shouldn't do it too. Also, made it so zlookup is global so bgp_exit() can use it.
* bgpd/bgp_packet.c: bgp_capability_msg_parse() call to bgp_clear_route() adjusted to use new BGP_CLEAR_ROUTE_NORMAL flag.
* bgpd/bgp_route.h: Correct reference counter "lock" to be signed. bgp_clear_route() now accepts a bgp_clear_route_type of either BGP_CLEAR_ROUTE_NORMAL or BGP_CLEAR_ROUTE_MY_RSCLIENT.
* bgpd/bgp_route.c:
  - bgp_process_rsclient(): attr was being zero'ed and then bgp_attr_extra_free() was being called with it, even though it was never filled with valid data.
  - bgp_process_rsclient(): Make sure rsclient->group is not NULL before use.
  - bgp_processq_del(): Add call to bgp_table_unlock().
  - bgp_process(): Add call to bgp_table_lock().
  - bgp_update_rsclient(): memset clearing of new_attr not needed since declaration with "= { 0 }" does it. memset was already commented out.
  - bgp_update_rsclient(): Fix screwed up misleading indentation.
  - bgp_withdraw_rsclient(): Fix screwed up misleading indentation.
  - bgp_clear_route_node(): Support BGP_CLEAR_ROUTE_MY_RSCLIENT.
  - bgp_clear_node_queue_del(): Add call to bgp_table_unlock() and also free struct bgp_clear_node_queue used for work item.
  - bgp_clear_node_complete(): Do peer_unlock() after BGP_EVENT_ADD() in case peer is released by peer_unlock() call.
  - bgp_clear_route_table(): Support BGP_CLEAR_ROUTE_MY_RSCLIENT. Use struct bgp_clear_node_queue to supply data to worker. Add call to bgp_table_lock().
  - bgp_clear_route(): Add support for BGP_CLEAR_ROUTE_NORMAL or BGP_CLEAR_ROUTE_MY_RSCLIENT.
  - bgp_clear_route_all(): Use BGP_CLEAR_ROUTE_NORMAL.
  Bug 397 fixes:
  - bgp_default_originate()
  - bgp_announce_table()
* bgpd/bgp_table.h:
  - struct bgp_table: Added reference count. Changed type of owner to be "struct peer *" rather than "void *".
  - struct bgp_node: Correct reference counter "lock" to be signed.
* bgpd/bgp_table.c:
  - Added bgp_table reference counting.
  - bgp_table_free(): Fixed cleanup code. Call peer_unlock() on owner if set.
  - bgp_unlock_node(): Added assertion.
  - bgp_node_get(): Added call to bgp_lock_node() to code path that it was missing from.
* bgpd/bgp_vty.c:
  - peer_rsclient_set_vty(): Call peer_lock() as part of peer assignment to owner. Handle failure gracefully.
  - peer_rsclient_unset_vty(): Add call to bgp_clear_route() with BGP_CLEAR_ROUTE_MY_RSCLIENT purpose.
* bgpd/bgp_zebra.c: Made it so zclient is global so bgp_exit() can use it.
* bgpd/bgpd.c:
  - peer_lock(): Allow to be called when status is "Deleted".
  - peer_deactivate(): Supply BGP_CLEAR_ROUTE_NORMAL purpose to bgp_clear_route() call.
  - peer_delete(): Common variable listnode pn. Fix bug in which rsclient was only dealt with if not part of a peer group. Call bgp_clear_route() for rsclient, if appropriate, and do so with BGP_CLEAR_ROUTE_MY_RSCLIENT purpose.
  - peer_group_get(): Use XSTRDUP() instead of strdup() for conf->host.
  - peer_group_bind(): Call bgp_clear_route() for rsclient, and do so with BGP_CLEAR_ROUTE_MY_RSCLIENT purpose.
  - bgp_create(): Use XSTRDUP() instead of strdup() for peer_self->host.
  - bgp_delete(): Delete peers before groups, rather than after. And then rather than deleting rsclients, verify that there are none at this point.
  - bgp_unlock(): Add assertion.
  - bgp_free(): Call bgp_table_finish() rather than doing XFREE() itself.
* lib/command.c,h: Compiler warning fixes. Add cmd_terminate(). Fixed massive leak in install_element() in which cmd_make_descvec() was being called more than once for the same cmd->strvec/string/doc.
* lib/log.c: Make closezlog() check fp before calling fclose().
* lib/memory.c: Catch when alloc count goes negative by using signed counts. Correct #endif comment. Add log_memstats_stderr().
* lib/memory.h: Add log_memstats_stderr().
* lib/thread.c: thread->funcname was being accessed in thread_call() after it had been freed. Rearranged things so that thread_call() frees funcname. Also made it so thread_master_free() cleans up cpu_record.
* lib/vty.c,h: Use global command_cr. Add vty_terminate().
* lib/zclient.c,h: Re-enable zclient_free().
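Several entries in the commit above make a reference counter signed and add an assertion on unlock. The following is a minimal sketch of that pattern, not code from the patch: `struct node`, `node_new()`, `node_lock()` and `node_unlock()` are hypothetical stand-ins for the bgp_node/bgp_table helpers. With a signed counter, an extra unlock shows up as an impossible value the assertion can catch, instead of silently wrapping around.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted object; this is not the
 * real struct bgp_node. */
struct node {
    int lock;   /* signed, so an unbalanced unlock is detectable */
};

static struct node *node_lock(struct node *n)
{
    n->lock++;
    return n;
}

/* Drop one reference. Returns 1 if the node was freed, 0 otherwise. */
static int node_unlock(struct node *n)
{
    assert(n->lock > 0);   /* catches an unlock without a matching lock */
    n->lock--;
    if (n->lock == 0) {
        free(n);
        return 1;
    }
    return 0;
}

/* Allocate a node that starts out holding one reference. */
static struct node *node_new(void)
{
    struct node *n = calloc(1, sizeof(*n));
    if (n != NULL)
        node_lock(n);
    return n;
}
```

Every node_lock() must be paired with exactly one node_unlock(); the last unlock releases the memory.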
2009-06-30  [lib/cleanup] Use a typedef for the thread type  Paul Jakma
* lib/thread.{c,h}: As per subject. This will avoid head-scratching for next person who adds a thread-type and gets strange breakage.
2008-11-29  [lib] Fix timer precision.  Joakim Tjernlund
Whenever a thread adds a timer, funcname_thread_add_timer_timeval() gets called to add the timer. Before adding the timer, a quagga_gettimeofday() call is made to do some time housekeeping. However, quagga_gettimeofday() only updates recent_time, not relative_time, which is used to calculate the alarm_time. Replace with quagga_get_relative (NULL).
2008-08-22  [lib] hash compare function arguments ought to be const qualified  Stephen Hemminger
2008-08-14 Stephen Hemminger <stephen.hemminger@vyatta.com> * lib/hash.h: (struct hash) Hash comparator callback really ought to treat storage behind arguments as constant - a compare function with side-effects would be evil. * */*.c: Adjust comparator functions similarly, thus fixing at least a few compiler warnings about const qualifier being dropped. Signed-off-by: Paul Jakma <paul@quagga.net>
2006-08-27  [lib] Bug #134: threads should be more robust against backward time jumps  Paul Jakma
2006-08-25 Paul Jakma <paul.jakma@sun.com>
* thread.c: (general) Add support for monotonic clock, it may still jump forward by huge amounts, but should be immune to going backwards. Fixes bug #134.
  (quagga_gettimeofday_relative_adjust) helper, does what name says - adjusts gettimeofday based relative timer.
  (quagga_gettimeofday) helper to keep recent_time up to date.
  (quagga_get_relative) helper, update and fetch the relative timer using gettimeofday(). POSIX CLOCK_MONOTONIC is also supported, but the code is not enabled yet nor tested.
  (quagga_real_stabilised) helper, retrieve absolute time but stabilised so as to never decrease.
  (quagga_gettime) Exported interface, analogous to POSIX clock_gettime() in interface, supporting several clocks.
  (quagga_time) Exported interface, analogous to traditional time(), will never decrease.
  (recent_relative_time) Convenience function to retrieve relative_time timeval, similar to existing recent_time absolute timeval, for when an approximately recent value will do.
  (remainder) Update to use above helpers.
  (thread_getrusage) Previously was a macro, but needs to be a function to twiddle with thread.c private stuff.
* thread.c: Point the GETRUSAGE macro at previous function. Export quagga_gettime, quagga_time and recent_relative_time for general use.
2006-07-25  [lib] Optimise thread_call by caching pointer to thread history in the thread  Paul Jakma
2006-07-25 Paul Jakma <paul.jakma@sun.com> * thread.h: (struct thread) Add a cache pointer to the struct cpu_thread_history, if it is known - saving hash lookup on each thread_call. * thread.c: (thread_call) Cache the pointer to the cpu_thread_history, so that future thread_calls of same thread can avoid the hash_lookup.
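A minimal sketch of the caching idea described above, under simplifying assumptions: `struct cpu_history`, `struct task`, `history_lookup()` and `task_call()` are hypothetical, and a linear scan over a two-entry table stands in for the real hash lookup. The point is only that the lookup runs once per task and later calls reuse the cached pointer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified types; the real structs carry more fields. */
struct cpu_history {
    const char *funcname;
    unsigned long total_calls;
};

struct task {
    const char *funcname;
    struct cpu_history *hist;   /* cached pointer, NULL until first call */
};

static struct cpu_history table[] = {
    { "vty_read", 0 },
    { "bgp_scan", 0 },
};
static unsigned long lookups;   /* counts slow-path lookups */

/* Stand-in for the hash lookup: a linear scan over a tiny table. */
static struct cpu_history *history_lookup(const char *funcname)
{
    lookups++;
    for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++)
        if (strcmp(table[i].funcname, funcname) == 0)
            return &table[i];
    return NULL;
}

/* Record one invocation, looking up the history only if not yet cached. */
static void task_call(struct task *t)
{
    if (t->hist == NULL)
        t->hist = history_lookup(t->funcname);
    if (t->hist != NULL)
        t->hist->total_calls++;
}
```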
2005-05-19  2005-05-19 Paul Jakma <paul@dishone.st>  paul
* thread.c: (thread_cancel_event) the number of pending events cancelled is potentially useful information, don't throw it away, pass it back to the caller.
2005-05-06  2005-05-06 Paul Jakma <paul@dishone.st>  paul
* (general) extern and static'ification of functions in code and header. Cleanup any definitions with unspecified arguments. Add casts for callback assignments where the callback is defined, typically, as passing void *, but the function being assigned has some other pointer type defined as its argument, as gcc complains about casts from void * to X* via function arguments. Fix some old K&R style function argument definitions. Add noreturn gcc attribute to some functions, as appropriate. Add unused gcc attribute to some functions (eg ones meant to help while debugging). Add guard defines to headers which were missing them.
* command.c: (install_node) add const qualifier, still doesn't shut up the warning though, because of the double pointer.
  (cmp_node) ditto
* keychain.c: (key_str2time) Add GET_LONG_RANGE() macro, derived from vty.h ones to fix some of the (long) < 0 warnings.
* thread.c: (various) use thread_empty
  (cpu_record_hash_key) should cast to uintptr_t, a stdint.h type
* vty.h: Add VTY_GET_IPV4_ADDRESS and VTY_GET_IPV4_PREFIX so they can be removed from ospfd/ospf_vty.h
* zebra.h: Move definition of ZEBRA_PORT to here, to remove dependence of lib on zebra/zserv.h
2005-04-28  2005-04-27 Andrew J. Schorr <ajschorr@alumni.princeton.edu>  ajs
Add wall-clock timing statistics to 'show thread cpu' output. * thread.h: Define struct rusage_t to contain wall-clock time and cpu time. Change GETRUSAGE macro to collect both pieces of data. Make appropriate changes to struct cpu_thread_history to track CPU time and real time. Change proto for thread_consumed_time to return real and cpu time elapsed. And declare a new global variable 'struct timeval recent_time'. * thread.c (struct timeval recent_time): New global timestamp variable. (timeval_adjust): If timeout is negative, set to 0 (not 10 microseconds). And remove upper bound of 1,000,000 seconds, since this does not seem to make any sense (and it breaks funcname_thread_add_timer_timeval). (timeval_cmp): Should return long, not int. (vty_out_cpu_thread_history): Show CPU time and real time. (cpu_record_hash_print): Calculate totals for CPU and real time. (cpu_record_print): Change 'show thread cpu' title to show CPU and real time. (thread_timer_remain_second): Put current time in global recent_time. (funcname_thread_add_timer_timeval): Fix assert. Replace 2-case switch assignment with a ternary expression. Use global recent_time variable. Fix use of timeval_adjust (previously, the value was not actually being adjusted). (thread_cancel): Add missing "break" statement in case THREAD_BACKGROUND. (thread_timer_wait): Use global recent_time value instead of calling gettimeofday. And there's no need to check for negative timeouts, since timeval_subtract already sets these to zero. (thread_timer_process): Timers are sorted, so bail out once we encounter a timer that has not yet popped. And remove some extraneous asserts. (thread_fetch): Do not process foreground timers before calling select. Instead, add them to the ready list just after the select. Also, no need to maintain a count of the number of ready threads, since we don't care how many there are, just whether there's one at the head of the ready list (which is easily checked). 
Stick current time in global variable recent_time to reduce the number of calls to gettimeofday. Tighten logic for calculating the select timeout. (thread_consumed_time): Now returns real time and puts the elapsed cpu time in an additional argument. (thread_should_yield): Use real (wall-clock) time to decide whether to yield. (thread_call): Maintain CPU and real time statistics. * vty.c (vty_command): For slow commands, show real and cpu time.
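Two points from the entry above lend themselves to a short sketch: normalising a timeval so that negative results become zero, and walking a sorted timer list only until the first timer that has not yet expired. The code below is an assumption-laden illustration, not the thread.c implementation; `tv_adjust()`, `tv_cmp()` and `count_expired()` are hypothetical names, and a plain sorted array stands in for the timer list.

```c
#include <stddef.h>
#include <sys/time.h>

#define USEC_PER_SEC 1000000L

/* Normalise so that 0 <= tv_usec < 1000000, then clamp a negative
 * result to zero. No upper bound is applied. */
static struct timeval tv_adjust(struct timeval a)
{
    while (a.tv_usec >= USEC_PER_SEC) {
        a.tv_usec -= USEC_PER_SEC;
        a.tv_sec++;
    }
    while (a.tv_usec < 0) {
        a.tv_usec += USEC_PER_SEC;
        a.tv_sec--;
    }
    if (a.tv_sec < 0) {
        a.tv_sec = 0;
        a.tv_usec = 0;
    }
    return a;
}

/* Negative, zero or positive as a is before, equal to or after b.
 * Returns long so that a large difference is not truncated to int. */
static long tv_cmp(struct timeval a, struct timeval b)
{
    return a.tv_sec == b.tv_sec ? (long)(a.tv_usec - b.tv_usec)
                                : (long)(a.tv_sec - b.tv_sec);
}

/* Count expired timers in an array sorted by expiry time. Because the
 * array is sorted, the scan stops at the first timer still pending. */
static size_t count_expired(const struct timeval *timers, size_t n,
                            struct timeval now)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (tv_cmp(now, timers[i]) < 0)
            break;
    return i;
}
```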
2005-04-25  2005-04-25 Paul Jakma <paul.jakma@sun.com>  paul
* thread.c: Kill unused TIMER_NO_SORT bits
2005-04-22  2005-04-22 Paul Jakma <paul.jakma@sun.com>  paul
* thread.h: Add background thread type and thread_add_background macro and accompanying funcname_... function. export thread_should_yield, background threads can use it. Lower thread yield time to 10ms, 100ms is noticeable lag and a thread would only be /starting/ to finish sometime afterward.
* thread.c: (general) Add background thread type and schedule nearly all thread types through the ready list for fairness.
  (timeval_adjust) static qualifier missing
  (vty_out_cpu_thread_history) add support for printout of background threads
  (show_thread_cpu) ditto.
  (thread_master_debug) add debug of background list
  (thread_master_create) fixup long line
  (thread_add_unuse) add asserts for required state.
  (thread_master_free) free background thread list
  (funcname_thread_add_timer_timeval) make generic, able to support arbitrary timer-like thread types.
  (funcname_thread_add_timer) pass thread type to .._add_timer_timeval
  (funcname_thread_add_timer_msec) ditto
  (funcname_thread_add_background) Add a background thread, with an optional millisecond delay factor, using .._add_timer_timeval.
  (thread_cancel) Add background thread type. Move the thread_list_delete common to all cases to bottom of function, after the switch statement.
  (thread_cancel_event) indent
  (thread_timer_wait) Static qualifier, and make it able to cope with arbitrary timer-like thread lists, so it's of use to background threads too.
  (thread_process_fd) static qualifier. Again, make it take a list reference rather than thread_master. Fix indentation.
  (thread_timer_process) Check for ready timer-like threads in the given list and move them on to the ready list - code originally embedded in thread_fetch.
  (thread_fetch) Schedule all threads, other than events, through the ready list, to ensure fairness. Timer readying code moved to thread_timer_process so it can be reused for background threads. Remove the unneeded quagga_sigevent_process, as pointed out by John Lin <john.ch.lin@gmail.com>.
  (thread_should_yield) make this available.
2005-04-16  2005-04-16 Andrew J. Schorr <ajschorr@alumni.princeton.edu>  ajs
* configure.ac: Added AC_ARG_ENABLE(time-check). By default, warning messages will now be printed for threads or commands that take longer than 5 seconds, but this configure argument can be used to disable the checks or change the threshold. * thread.h (thread_consumed_time): Declare new function to calculate elapsed microseconds. * thread.c (thread_consumed_time): Must be global not static so we can call it from lib/vty.c:vty_command. (thread_should_yield): Surround with `#if 0' to make clear that this function is not currently being used anywhere. (thread_call): If CONSUMED_TIME_CHECK is defined, print a CPU HOG warning message if the thread takes more than CONSUMED_TIME_CHECK microseconds. * vty.c (vty_command): If CONSUMED_TIME_CHECK is defined, print a CPU HOG warning message if the command takes more than CONSUMED_TIME_CHECK microseconds.
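The time check described above could look roughly like the sketch below. The macro name CONSUMED_TIME_CHECK and the 5 second default come from the entry; everything else is assumed for illustration: `elapsed_usec()` and `call_checked()` are hypothetical, and wall-clock time from gettimeofday() is used as the measure, which may differ from what thread_consumed_time measured at this revision.

```c
#include <stdio.h>
#include <sys/time.h>

/* Threshold in microseconds; 5 seconds is the default named in the
 * entry. Defining it here by hand is an assumption of this sketch. */
#define CONSUMED_TIME_CHECK 5000000LL

/* Elapsed microseconds between two readings, with end after start. */
static long long elapsed_usec(struct timeval start, struct timeval end)
{
    return (long long)(end.tv_sec - start.tv_sec) * 1000000LL
         + (long long)(end.tv_usec - start.tv_usec);
}

/* Run func(arg) and warn on stderr if it exceeded the threshold.
 * Returns 1 if a warning was printed, 0 otherwise. */
static int call_checked(const char *name, void (*func)(void *), void *arg)
{
    struct timeval before, after;
    long long used;

    gettimeofday(&before, NULL);
    func(arg);
    gettimeofday(&after, NULL);

    used = elapsed_usec(before, after);
    if (used > CONSUMED_TIME_CHECK) {
        fprintf(stderr, "CPU HOG: %s ran for %lld microseconds\n",
                name, used);
        return 1;
    }
    return 0;
}
```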
2004-12-28  2004-12-28 Andrew J. Schorr <ajschorr@alumni.princeton.edu>  ajs
* thread.c: (funcname_thread_add_timer_msec) Reduce overflow risk.
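The entry does not say how the overflow risk was reduced. One common approach, sketched here with a hypothetical `msec_to_timeval()`, is to split the millisecond value into whole seconds and a remainder before scaling, so that the full value is never multiplied by 1000.

```c
#include <sys/time.h>

/* Convert a millisecond delay to a timeval. Only the remainder
 * (0..999) is multiplied by 1000, so the multiplication cannot
 * overflow even where long is 32 bits wide. */
static struct timeval msec_to_timeval(long msec)
{
    struct timeval tv;

    tv.tv_sec = msec / 1000;
    tv.tv_usec = 1000 * (msec % 1000);
    return tv;
}
```

Computing `msec * 1000` first would overflow a 32-bit long for any delay longer than about 35 minutes.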
2004-11-20  2004-11-19 Andrew J. Schorr <ajschorr@alumni.princeton.edu>  ajs
* global: Replace strerror with safe_strerror. And vtysh/vtysh.c needs to include "log.h" to pick up the declaration.
2004-10-31  2004-10-31 Paul Jakma <paul@dishone.st>  paul
* thread.c: Use XCALLOC and sizeof the type, not the pointer.
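The class of bug behind "sizeof the type, not the pointer" can be shown with a small sketch. Standard calloc() stands in for Quagga's XCALLOC macro here, and `struct master` is a hypothetical type, not the real thread_master.

```c
#include <stdlib.h>

/* Hypothetical structure, deliberately much larger than a pointer. */
struct master {
    int fds[64];
    long counters[8];
};

static struct master *master_create(void)
{
    /* sizeof(struct master) is the size of the whole object.
     * sizeof(struct master *) would only be the size of a pointer,
     * leaving the allocation far too small. */
    return calloc(1, sizeof(struct master));
}
```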
2004-10-31  2004-10-31 Paul Jakma <paul@dishone.st>  paul
* memory.h: Add MTYPE_THREAD_FUNCNAME and MTYPE_THREAD_STATS * thread.c: Update stats and funcname alloc/free to use previous specific memory type defines
2004-10-11  2004-10-11 Paul Jakma <paul@dishone.st>  paul
* thread.c: (funcname_thread_add_timer) (funcname_thread_add_timer_msec) Fix mistakes from last change. Pointed out by Liu Xin in [quagga-dev 1609].
2004-10-05  Number of warnings is down to 3 again in lib directory.  hasso
A lot of const's added to strings and a lot of int -> unsigned int changes.
2004-10-05  2004-10-05 Paul Jakma <paul@dishone.st>  paul
* thread.c: (funcname_thread_add_timer_timeval) new function, add timer at specified timeval. (funcname_thread_add_timer) use funcname_thread_add_timer_timeval. (funcname_thread_add_timer_msec) ditto
2004-07-22  2004-07-14 Paul Jakma <paul@dishone.st>  paul
* sigevent.c: (quagga_signal_handler) add a global caught flag, set the flags to a constant rather than increment to be kinder.
  (quagga_sigevent_process) new function, to do core of what quagga_signal_timer did. don't block signals at all as sig->caught is volatile sig_atomic_t and should be safe to access from signal and normal contexts. The signal blocking is unneeded paranoia, but is left intact under an ifdef, should some platform require it. Check global caught flag before iterating through array.
  (quagga_signal_timer) nearly everything moved to quagga_sigevent_process. Left in under ifdef, in case some platform could use a regular timer check for signals.
* sigevent.h: quagga_sigevent_process declaration.
* thread.c: (thread_fetch) check for signals at beginning of scheduler loop, check for signals if select returns EINTR.
2003-12-23  Merge isisd into the Quagga's framework:  jardin
- add privs support - use misc quagga's definitions - make it compile"able" - fix segfault cases related to hostname() - add debug isis xxx command This patch has been approved by Paul Jakma.
2003-03-27  Need to free the defunct funcname if we grab a thread from the unused list.  paul
2003-03-12  Fix memory leak in 'show thread cpu' command.  paul
2003-01-17  From havanna_moon@gmx.net Fri Jan 17 23:37:49 2003  paul
Date: Sat, 11 Jan 2003 23:26:28 +0100 (CET)
From: Yon Uriarte <havanna_moon@gmx.net>
To: "the list(tm) Zebra" <zebra@zebra.org>
Subject: [zebra 17217] [PATCH] show thread CPU

Hi,

a little patch from the 'stupid preprocessor tricks' collection to record thread statistics.

Usage: "show thread cpu [r][w][t][e][x]"

Output Fields: self-explanatory, I hope. Type is one of RWTEX for: Read, Write (fd threads), Timer, Event, Execute.

Overhead vs. vanilla zebra: almost nothing. Vanilla CVS zebra already collects thread run times.

Caveats: Under linux getrusage has a granularity of 10ms, which is almost useless in this case. Run ./configure, edit config.h and comment out "#define HAVE_RUSAGE", this way it will use gettimeofday which has a much better granularity. IMHO this is better, as cooperative threads are effectively running during all that wall time (don't care if CPU utilization was 3% or 99% during the time the thread was running (an effective rusage combined with gettimeofday could give that info)). Maybe someone can give tips for other platforms on API granularity.

TODO: change some of the calls to thread_add_$KIND to funcname_thread_add_$KIND with a meaningful funcname, so users will get a better idea of what's going on. F.ex. (AFAIK):

  ospf_spf_calculate_timer -> "Routes Step 1, areas SPF"
  ospf_ase_calculate_timer -> "Routes Step 2, externals"

Could this be added to the unofficial patch collection? Could someone with BGP keepalive problems run their bgpd with this patch and post the results?

TIA, HTH, HAND, regards
yon

Example output:
--------------------------------
ospfd# show thread cpu
Runtime(ms)  Invoked  Avg uSecs  Max uSecs  Type   Thread
14.829       31       478        585        T      ospf_ase_calculate_timer
82.132       9838     8          291        EX     ospf_nsm_event
0.029        1        29         29         E      ospf_default_originate_timer
0.254        9        28         34         T      ospf_db_desc_timer
0.026        7        3          11         T      ospf_wait_timer
669.015      523      1279       490696     R      vty_read
4.415        45       98         173        TE     ospf_network_lsa_refresh_timer
15.026       31       484        588        T      ospf_spf_calculate_timer
29.478       1593     18         122        E      ospf_ls_upd_send_queue_event
0.173        1        173        173        T      vty_timeout
4.173        242      17         58         E      ospf_ls_ack_send_event
637.767      121223   5          55         T      ospf_ls_ack_timer
39.373       244      161        2691       R      zclient_read
12.169       98       124        726        EX     ospf_ism_event
0.226        2        113        125        R      vty_accept
537.776      14256    37         3813       W      ospf_write
4.967        41       121        250        T      ospf_router_lsa_timer
0.672        1        672        672        E      zclient_connect
7.901        1658     4          26         T      ospf_ls_req_timer
0.459        2        229        266        E      ospf_external_lsa_originate_timer
3.203        60       53         305        T      ospf_maxage_lsa_remover
108.341      9772     11         65         T      ospf_ls_upd_timer
33.302       525      63         8628       W      vty_flush
0.101        1        101        101        T      ospf_router_lsa_update_timer
0.016        1        16         16         T      ospf_router_id_update_timer
26.970       407      66         176        T      ospf_lsa_maxage_walker
381.949      12244    31         69         T      ospf_hello_timer
0.114        22       5          14         T      ospf_inactivity_timer
34.290       1223     28         310        T      ospf_lsa_refresh_walker
470.645      6592     71         665        R      ospf_read
3119.791     180693   17         490696     RWTEX  TOTAL
ospfd#

bgpd# sh t c TeX
Runtime(ms)  Invoked  Avg uSecs  Max uSecs  Type   Thread
21.504       476      45         71         T      bgp_keepalive_timer
17.784       1157     15         131        T      bgp_reuse_timer
29.080       193      150        249        T      bgp_scan
23.606       995      23         420        E      bgp_event
317.734      28572    11         69         T      bgp_routeadv_timer
0.084        1        84         84         E      zlookup_connect
0.526        1        526        526        E      zclient_connect
1.348        13       103        147        T      bgp_start_timer
19.443       142      136        420        T      bgp_connect_timer
16.032       772      20         27         T      bgp_import
447.141      32322    13         526        TEX    TOTAL
bgpd#

bgpd# show thread cpu rw
Runtime(ms)  Invoked  Avg uSecs  Max uSecs  Type   Thread
155.043      7        22149      150659     R      bgp_accept
129.638      180      720        53844      R      vty_read
1.734        56       30         129        R      zclient_read
0.255        2        127        148        R      vty_accept
58.483       983      59         340        R      bgp_read
171.495      29190    5          245        W      bgp_write
13.884       181      76         2542       W      vty_flush
530.532      30599    17         150659     RW     TOTAL
bgpd#
--------------------------------
2002-12-13  Initial revision  paul