summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2011-03-21bgpd: Implement revised error handling for partial optional/trans. attributesPaul Jakma
* BGP error handling generally boils down to "reset session". This was fine when all BGP speakers pretty much understood all BGP messages. However the increasing deployment of new attribute types has shown this approach to cause problems, in particular where a new attribute type is "tunneled" over some speakers which do not understand it, and then arrives at a speaker which does but considers it malformed (e.g. corruption along the way, or because of early implementation bugs/interop issues). To mitigate this drafts before the IDR (likely to be adopted) propose to treat errors in partial (i.e. not understood by neighbour), optional transitive attributes, when received from eBGP peers, as withdrawing only the NLRIs in the affected UPDATE, rather than causing the entire session to be reset. See: http://tools.ietf.org/html/draft-scudder-idr-optional-transitive * bgp_aspath.c: (assegments_parse) Replace the "NULL means valid, 0-length OR an error" return value with an error code - instead taking pointer to result structure as arg. (aspath_parse) adjust to suit previous change, but here NULL really does mean error in the external interface. * bgp_attr.h (bgp_attr_parse) use an explictly typed and enumerated value to indicate return result. (bgp_attr_unintern_sub) cleans up just the members of an attr, but not the attr itself, for benefit of those who use a stack-local attr. * bgp_attr.c: (bgp_attr_unintern_sub) split out from bgp_attr_unintern (bgp_attr_unintern) as previous. (bgp_attr_malformed) helper function to centralise decisions on how to handle errors in attributes. (bgp_attr_{aspathlimit,origin,etc..}) Use bgp_attr_malformed. (bgp_attr_aspathlimit) Subcode for error specifc to this attr should be BGP_NOTIFY_UPDATE_OPT_ATTR_ERR. (bgp_attr_as4_path) be more rigorous about checks, ala bgp_attr_as_path. (bgp_attr_parse) Adjust to deal with the additional error level that bgp_attr_ parsers can raise, and also similarly return appropriate error back up to (bgp_update_receive). Try to avoid leaking as4_path. * bgp_packet.c: (bgp_update_receive) Adjust to deal with BGP_ATTR_PARSE_WITHDRAW error level from bgp_attr_parse, which should lead to a withdraw, by making the attribute parameter in call to (bgp_nlri_parse) conditional on the error, so the update case morphs also into a withdraw. Use bgp_attr_unintern_sub from above, instead of doing this itself. Fix error case returns which were not calling bgp_attr_unintern_sub and probably leaking memory. * tests/aspath_test.c: Fix to work for null return with bad segments
2011-03-21tools/multiple-bgpd.sh: set some community attributes to help test themPaul Jakma
2011-03-21bgpd: Try fix extcommunity resource allocation probs, particularly with 'set ↵Paul Jakma
extcom..' * Extended communities has some kind of resource allocation problem which causes a double-free if the 'set extcommunity ...' command is used. Try fix by properly interning extcommunities. Also, more generally, make unintern functions take a double pointer so they can NULL out callers references - a usefully defensive programming pattern for functions which make refs invalid. Sadly, this patch doesn't fix the problem entirely - crashes still occur on session clear. * bgp_ecommunity.h: (ecommunity_{free,unintern}) take double pointer args. * bgp_community.h: (community_unintern) ditto * bgp_attr.h: (bgp_attr_intern) ditto * bgp_aspath.h: (bgp_aspath.h) ditto * (general) update all callers of above * bgp_routemap.c: (route_set_ecommunity_{rt,soo}) intern the new extcom added to the attr, and unintern any old one. (route_set_ecommunity_{rt,soo}_compile) intern the extcom to be used for the route-map set. (route_set_ecommunity_*_free) unintern to match, instead of free (route_set_ecommunity_soo) Do as _rt does and don't just leak any pre-existing community, add to it (is additive right though?)
2011-03-21tests: Extend aspath_test.c with cases for invalid segments & attributesPaul Jakma
* aspath_test.c: Add more test cases. In particular ones to cover the last invalid-segment problem. Also add ability to specify aspath attribute headers and test them somewhat. NB: It's obvious this test has not been run for a year by anyone, despite 2 non-trivial commits to bgpd aspath code.
2011-03-21bgpd: Rollback some of the changes made for invalid AS_PATH segment fixPaul Jakma
Some of the changes made in commit cddb8112b80fa9867156c637d63e6e79eeac67bb don't work particularly well for other changes that need to be made to address BGP attribute error handling problems. In particular, returning a pointer from complex attribute data parsing functions will not suffice to express the require range of return status conditions. * bgp_aspath.c: (assegments_parse) Rollback to a more minimal set of changes to fix the original problem. (aspath_parse) Slightly needless pushing around of code, and taking 2 parameters to say whether ot use 2 or 4 byte encoding seems unnecessary. * bgp_attr.c: (bgp_attr_as{,4}path) Rollback, in preparation for BGP attribute error handling update.
2011-03-21bgpd: Remove AS Path limit/TTL functionalityPaul Jakma
* draft-ietf-idr-as-pathlimit doesn't seem to have gone anywhere, and its author does not think it will make progress in IDR. Remove all support introduced for it, but leave stubs for the commands to avoid breaking any configurations. Basically reverts cecab5e9725792e60a5e4b473e238a14cd85815d.
2011-03-21bgpd/security: CVE-2010-1674 Fix crash due to extended-community parser errorPaul Jakma
* bgp_attr.c: (bgp_attr_ext_communities) Certain extended-community attrs can leave attr->flag indicating ext-community is present, even though no extended-community object has been attached to the attr structure. Thus a null-pointer dereference can occur later. (bgp_attr_community) No bug fixed here, but tidy up flow so it has same form as previous. Problem and fix thanks to anonymous reporter.
2011-03-21ospf6d: Extend the "[no] debug ospf6 route" vty commandsTom Goff
* ospf6_route.c ([no_]debug_ospf6_route) Include memory as a debug option. This allows ospf6 route memory debugging to be enabled or disabled interactively or from a config file.
2011-03-21ospf6d: Route locking (memory) cleanupTom Goff
* ospf6_route.c: (ospf6_route_best_next) Allows unlock route, even when there's no next route. This is consistent with how ospf6_route_next() behaves. * ospf6_intra.c: (ospf6_intra_prefix_lsa_remove) Make sure the last route considered is always unlocked. This is needed when the for loop terminates because ospf6_route_is_prefix() returns zero.
2011-03-21ospf6d: Have ospf6d cleanup when it terminates normallyTom Goff
A clean exit makes it easier to use memory debuggers. * ospf6_asbr.c: (ospf6_asbr_terminate) Add a function to do route map cleanup. * ospf6_lsa.c: (ospf6_lsa_terminate) Add a function to cleanup the lsa handler vector. * ospf6_main.c: (ospf6_exit) Add an function that causes ospf6d to gracefully exit. * ospf6_message.c: (ospf6_message_terminate) Add a function that frees the send and receive buffers. * ospf6_top.c: (ospf6_delete) Enable the ospf6_delete() function. Disable ospf6 before freeing everything.
2011-03-21ospf6d: Remove obsolete codeTom Goff
* ospf6_area.c: (ospf6_area_delete) Get rid of unused code that refers to a nonexistent function and structure member.
2011-03-21ospf6d: Fix memory allocation issues in SPFTom Goff
* ospf6_area.c: Call ospf6_spf_table_finish() before deleting the spf table. This ensures that the associated ospf6_vertex structures are also freed. * ospf6_spf.c: Only allocate a priority queue when a spf calculation is actually performed. Also defer calling ospf6_spf_table_finish().
2011-03-21lib: zlog should clean up its memoryTom Goff
* log.c: (closezlog) Also free the dynamically allocated filename when a log is closed.
2011-03-21lib: Add a function to delete all interfacesTom Goff
if.c: (if_terminate) This adds a cleanup function that can be called when a daemon exits, similar to vty_terminate().
2011-03-21bgpd: use Jenkins hash for BGP transit, cluster and attr hashesStephen Hemminger
* bgp_attr.c: I observed while doing some debugging that even for simple tests there was a lot of hash collisions for BGP attributes. Switch to using Jhash rather than additive hashing. Probably overkill, but the function is fast and available. ({attrhash,cluster,transit}_hask_key_make) convert to Jenkins hash, instead of additive hash.
2011-03-21lib: Better hashing of string values using Bernstein hashStephen Hemminger
* hash.{h,c}: (string_hash_make) Hash optimised for strings, current implementation using Bernstein hash, which offers a good compromise between distribution and performance. * distribute.c: (distribute_hash_make) use previous instead of additive string hash. * if_rmap.c: (if_rmap_hash_make) ditto
2011-03-21bgpd: Remove extra lock on interior table nodeBarry Friedman
If the radix tree creates an extra interior node in bgp_node_get(), it locks the interior node even though this node is not returned to the caller, so it may never be unlocked. The lock prevents this node from being deleted. * bgpd/bgp_table.c: (bgp_node_get) Remove lock on interior node which prevents proper node deletion
2011-03-21bgpd: Fix display of unsigned attributesWataru Tanitsu
* bgp_route.c: (route_vty_out*) The local prefix, metric and weight values are all stored as uint32_t. Change the format to %u so that large values are not displayed as negative integers.
2011-03-21bgpd: fix use of free memory by update_rsclientStephen Hemminger
* bgp_route.c: (bgp_static_update_rsclient) BGP sometimes crashes when removing route server client because of use after free. The code to update rsclient created a local static copy of bgp attributes but neglected to handle the extra information pointer. The extra information was getting freed by bgp_attr_unintern() and reused later when the copy was passed to bgp_attr_intern(). The fix is to use the attr_dup function to create a copy of the extra information, then clean it up.
2011-03-21bgpd: unlock node on aggregate errorRobert Bays
* bgp_route.c: (bgp_aggregate_set) make sure to unlock BGP node if failure
2011-03-21bgpd: fix errors in aggregate address commandRobert Bays
* bgpd: (bgp_aggregate_{set,unset,delete}) This fixes locking and other issues with aggregate set/unset command
2011-03-21bgpd: use XCALLOC to allocate bgpd damp arrayStephen Hemminger
* bgpd: (bgp_damp_parameter_set) The BGP reuse_index is not initialized properly. This would cause sporadic crash when disabling dampening. Use XCALLOC correctly and the right size array is initialized and no memset is needed.
2011-03-21bgpd: fix bgp_node locking issuesChris Caputo
* bgpd: Connected table locks were being locked but not unlocked, such that eventually a lock would exceed 2^31 and become negative, thus triggering an assert later on. * bgp_main.c: (bgp_exit) delete connected elements along with ifp's. * bgp_nexthop.c: (bgp_nexthop_lookup{,_ipv6}) add missing unlocks (bgp_multiaccess_check_v4) ditto (bgp_connected_{add,delete}) Use a distinct memtype for bgp_connected_ref. (bgp_scan_finish) reset the nexthop cache to clean it up when bgpd exits * bgp_route.c: fix missing bgp_node unlocks * lib/memtype.c: (memory_list_bgp) add MTYPE_BGP_CONN * testing: has been tested for almost 2 months now.
2011-03-21lib: Fix accounting of memoryChris Hall
* lib/memory.c: (zrealloc) If is called with NULL pointer then it should increment allocations because it behaves the same as zmalloc. (zfree) is called with NULL pointer, it does nothing therefore allocation count should not change.
2011-03-20Merge paul/ospfd/201012-review ospfd and lib/ fixes and performance improvementsPaul Jakma
2011-03-18doc: fix "ipv6 address" interface command syntax (#608)Denis Ovsienko
2011-03-18bgpd: improve "monotonic" uptime correctionJohn Kemp
Older versions of Quagga/Zebra would output a value in MRT table dump files for "uptime" aka "ORIGINATED" that was a WALL clock value. Given that uptime is now internally a bgp_clock MONOTONIC value, the output in the MRT files is showing up as monotonic. Note: time of MRT dump is still recorded correctly as a time() based value, so we haven't lost that value. Proposal is to correct the uptime output on the vty and in the MRT files to again display something more akin to WALL time. * bgp_dump.c: (bgp_dump_routes_func) add conditional correction * bgp_route.c: (route_vty_out_detail) make correction conditional, move variable declaration to beginning of the function
2011-03-13ripngd: copy debug statements fix from ripdStephen Hemminger
Doesn't ripng needs same fix as ripd.
2011-02-24ripd: resolve debug statements issue (bug 442)Andrew J. Schorr
...A nasty bug, if you forgot to disable debugging, stored the config and reboot your machine - if you really depend on ripd, then the machine will not fully come back on the network, because ripd fails.
2011-01-17bgpd: VTY string fixes for debug commandsDavid Ward
* bgpd/bgp_debug.c: fix VTY strings for BGP debug commands to match correct syntax
2011-01-14bgpd: fix handling of "Unsupported Capability"Dmitrij Tejblum
* bgp_packet.c: (bgp_notify_receive) justify the difference between BGP_NOTIFY_OPEN_UNSUP_PARAM and BGP_NOTIFY_OPEN_UNSUP_CAPBL cases, as it is explained in RFC5492, page 3, paragraph 1. "Unsupported Capability" error does not mean, that the peer doesn't support capabilities advertisement -- quite the opposite (if the peer would not support capabilities advertisement, the code would be "Unsupported Optional Parameter"). Thus there is no reason to mark the peer as one non-supporting capabilities advertisement. Example: suppose the peer is in fact IPv6-only, but we didn't configure anything address-family specific for it. Then, the peer would refuse the session with "Unsupported Capability" code. If we internally set the peer as non-supporting capabilities advertisement after that, we will not be able to establish the session with it ever, even with a fixed configuration -- IPv6-only BGP session cannot be established without capabilities. In practice an edge case would be seen as the same IPv6 peer working with its "neighbor" block read from bgpd.conf, but not working, when slowly input in "conf t" mode.
2011-01-13ospf6d: fix crash in SPF calculationDmitrij Tejblum
* ospf6_spf.c: Don't replace a node with another node with a lower number of hops, instead get them from the queue in the correct order. (Actually, the replacement crashed the ospf6d daemon rather than worked.)
2010-12-08ospfd: Remove oi field from LSA, have network_lsa_refresh look up when neededPaul Jakma
* ospf_lsa.h: (struct ospf_lsa) remove oi pointer * ospf_lsa.c: (ospf_network_lsa_refresh) instead of keeping a pointer, just lookup the oi when it's needed. This decouples network LSA from oi lifetime and avoids having to invalidate pointers in LSAs when an oi changes, simplifying the code.
2010-12-08ospfd: potential fix for router-id change assert on refresh cleanup patchPaul Jakma
* ospf_lsa.c: (various) unregister LSAs from refresher before flushing.
2010-12-08ospfd: Fix maxage/flush to not try flood twice, remember maxages for longerPaul Jakma
2006-05-30 Paul Jakma <paul.jakma@sun.com> * (general) Fix confusion around MaxAge-ing and problem with high-latency networks. Analysis and suggested fixes by Phillip Spagnolo, in [quagga-dev 4132], on which this commit expands slightly. * ospf_flood.{c,h}: (ospf_lsa_flush) new function. Scope-general form of existing flush functions, essentially the dormant ospf_maxage_flood() but without the ambiguity of whether it is responsible for flooding. * ospf_lsa.c: (ospf_lsa_maxage) Role minimised to simply setup LSA on the Maxage list and schedule removal - no more. ospf_lsa_flush* being the primary way to kick-off flushes of LSAs. Don't hardcode the remover-timer value, which was too short for very high-latency networks. (ospf_maxage_lsa_remover) Just do what needs to be done to remove maxage LSAs from the maxage list, remove the call to ospf_flood_through(). Don't hardcode remove-timer value. (ospf_lsa_{install,flush_schedule}) ospf_lsa_flush is the correct entrypoint to flushing maxaged LSAs. (lsa_header_set) Use a define for the initial age, useful for testing. * ospf_opaque.c: (ospf_opaque_lsa_refresh) ditto. (ospf_opaque_lsa_flush_schedule) ditto. * ospfd.h: ({struct ospf,ospf_new}) Add maxage_delay parameter, interval to wait before running the maxage_remover. Supply a suitable default. Add a define for OSPF_LSA_INITIAL_AGE, see lsa_header_set().
2010-12-08ospfd: Unify router and network LSA refresh logic with general refresherPaul Jakma
* (general) Get rid of the router and network LSA specific refresh timers and make the general refresher do this instead. Get rid of the twiddling of timers for router/network LSA that was spread across the code. This lays the foundations for future, general LSA refresh improvements, such as making sequence rollover work, and having generic LSA delays. * ospfd.h: (struct ospf) Bye bye to the router-lsa update timer thread pointer. (struct ospf_area) and to the router-lsa refresh timer. * ospf_interface.h: Remove the network_lsa_self timer thread pointer * ospf_lsa.h: (struct ospf_lsa) oi field should always be there, for benefit of type-2/network LSA processing. (ospf_{router,network}_lsa_{update_timer,timer_add}) no timers for these more (ospf_{router,network}_lsa_update) more generic functions to indicate that some router/network LSAs need updating (ospf_router_lsa_update_area) update router lsa in a particular area alone. (ospf_{summary,summary_asbr,network}_lsa_refresh) replaced by the general ospf_lsa_refresh function. (ospf_lsa_refresh) general LSA refresh function
2010-12-08ospfd: Remember network LSA sequence numbers across up/downs of an interfacePaul Jakma
* ospf_interface.h: (struct ospf_if_params) add field for saved network LSA seqnum * ospf_interfa.c: (ospf_new_if_params) init network_lsa_seqnum field to initial seqnum - doesnt matter though. * ospf_lsa.c: (ospf_network_lsa_new) check for any saved sequence number, and use if it exists. Save the result back. This should help avoid needless round of LSUpdate/LSRequests when a neighbour has to tell the originator "uhm, i have something newer than that already". * ospf_vty.c: (show_ip_ospf_interface_sub) Show the saved network LSA seqnum
2010-12-08ospfd: Prioritise hellos for sending by queueing to head of output bufferPaul Jakma
* It's possible for the packet output buffer to be filled up with a long series of non-Hello packets in between Hellos packets, such that the router's neighbours don't receive the Hello packet in time, even though the hello-timer ran at about the right time. Fix this by prioritising Hello packets, letting them skip the queue and go ahead of any packets already on the queue. This problem can occur when there are lots of LSAs and slow links. * ospf_packet.h: (ospf_hello_send_sub) not used outside of ospf_packet.c * ospf_packet.c: (ospf_fifo_push_head) add packet to head of fifo (so its no longer really a fifo, but hey) (ospf_packet_add_top) add packet to top of the packet output queue. (ospf_hello_send_sub) Put Hello's at the top of the packet output queue. make it take in_addr_t parameter, so that this ospf_hello_send can re-use this code too. (ospf_hello_send) consolidate code by using ospf_hello_send_sub (ospf_poll_send,ospf_hello_reply_timer) adjust for ospf_hello_send_sub.
2010-12-08ospfd: Reset neighbour inactivity timer for any packet arrivalPaul Jakma
* The hello protocol monitors connectivity in 2 different ways: a) local -> remote b) remote -> local Connectivity is required in both directions (2-way) for adjacencies to form. The first requires a round-trip to detect, and is done by advertising which other hosts a router knows about in its hello messages. This allows a host to detect which other routers are and are not receiving its message. If a remote neighbour delists the local router, then the local router raises a "1-Way Received" event. The latter is straight-forward, and is detected by setting a timer for the neighbour. If another Hello packet is not received within this time then the neighbour is dead, and a separate "Inactive" event is raised. These are 2 different and relatively independent measures. Knowing that we can optimise the 2nd, remote->local measure and reset the timer when /any/ packet arrives from that neighbour. For any packet is as good as a Hello packet. This can help in marginal situations, where the number of protocol messages that must be sent sometimes can exceed the capacity of the network to transmit the messages within the configured dead-time. I.e. an OSPF network with lots of LSAs, slow links and/or slow hosts (e.g. O(10k) LSAs, O(100kbit) links, embedded CPUs, and O(10s) dead-times). This optimisation allows an OSPF network to run closer to this margin, and/or allows networks to perhaps better cope with rare periods of exceptional load, where otherwise they would not. It's fully compatible with plain OSPF implementations and doesn't prejudice dead-neighbour detection. * ospf_nsm.h: Rename HelloReceived event to PacketReceived. * ospf_nsm.c: (nsm_hello_received) -> nsm_packet_received * ospf_packet.c: Schedule PacketReceived whenever a valid message is received.
2010-12-08ospfd: the maxage_lsa_remover should check whether it needs to yield the cpuPaul Jakma
2010-12-08ospfd: Fix various route_unlock discrepenciesPaul Jakma
* ospf_ase.c: (ospf_ase_calculate_route) Fix compiler warning about eval needing brackets. (various) add defensive asserts. * ospf_lsdb.c: (ospf_lsdb_add) add missing node unlock if same lsa already was indexed. (ospf_lsdb_delete) check it's actually the same as specified lsa before deleting (ospf_lsdb_lookup_by_id_next) fix another corner case - no result => don't go on.
2010-12-08ospfd: fix lsa_refresh_walker unlock before use bugPaul Jakma
* ospf_lsa.c: (ospf_lsa_refresh_walker) fix an "unlock before use" bug (various) add asserts for lsa refcounting.
2010-12-08ospfd: interface code should leave network_lsa_self alonePaul Jakma
* ospf_interface.c: (ospf_if_{new,cleanup}) don't touch the network_lsa_self, ISM and NSM take care of cleaning it up if needs be + we want to keep network_lsa_self around when possible for the the seqnum. This shouldn't really make much difference though, particularly as we have a separate sequence number memory mechanism.
2010-12-08ospfd: OSPF_MIN_LS_ARRIVAL compare should be >= to match ospf_floodPaul Jakma
* ospf_packet.c: (ospf_ls_upd) the corresponding test on the arrival side in (ospf_flood) is <, so this should be >=, not >, purely for consistency. There is no practical effect here though.
2010-12-08ospfd: ospf_if_free can leave dangling references on ISM events - cancel themPaul Jakma
* ospf_interface.c: (ospf_if_free) events with dangling pointers left scheduled can be seriously bad for ospfd's health. Cancel the event.
2010-12-08ospfd: Lower level of some common messages from info to debugPaul Jakma
* ospf_{ism,network}.c: Certain oft-repeated but trivial messages should be debug log level, not info, to avoid spamming 'terminal monitor'
2010-12-08lib: Fix bug in prefix trie lookupPaul Jakma
* lib/table.c: (route_node_match) fix overshoot that was causing this function to go 1 bit too far and thus reading past end of prefix. (route_node_lookup) be defensive - don't assume others will clean up leaves when removing info.
2010-12-08lib: prefix.c nano-optimisationPaul Jakma
* lib/prefix.c: (prefix_match) nano-optimisation, let it return early without copying pointers.
2010-12-08lib: Make workqueue more conservative about ramping upPaul Jakma
* workqueue.c: (work_queue_run) Err more on the side of keeping granularity down, by being more conservative about increasing it. Also, fix mispelling.
2010-12-08lib: Add a command to clear the thread CPU history dataPaul Jakma
* (general) this can be useful when investigating thread latency problems, when you don't want to have to restart a daemon between tests. * thread.c: (cpu_record_(hash_)clear) wipe the stored thread cpu history data, according to the filter, similar to the vty print code. (clear_thread_cpu_cmd) new command to clear data. * thread.h: export new command * command.c: install it