Issue

On both SLC4 and SLC5, getaddrinfo() in glibc sorts the multiple IP addresses returned by DNS according to the destination address selection algorithm defined in RFC 3484. The implementation is explained by Ulrich Drepper.

The impact at CERN is this: DNS round-robin load balancing distributes the returned addresses evenly by rotating them, but glibc then re-sorts them by longest matching prefix with the client's source address before getaddrinfo() returns them to the application. This is a Good Thing™ in an IPv6 environment, but on IPv4 systems it violates the principle of least surprise: depending on its own source address, each client consistently prefers one address over the others.

Impact: applications do not connect() uniformly to the addresses behind a DNS alias; the distribution is skewed toward whichever address shares the longest prefix with the client.

Currently this affects all applications on SLC4/SLC5 that use getaddrinfo() to resolve DNS names, and it cannot be switched off. The only known workaround is for the application to ignore the order of the returned addresses and pick one at random, instead of always taking the first one and relying on the DNS server's rotation.

This issue does not affect applications using gethostbyname().

Example

lxplus257:~% for i in $( seq 100 ); do host lxplus.cern.ch | sed -n '/address/{p;q}'; done | sort | uniq -c | sort -n
     12 lx64slc4.cern.ch has address 137.138.141.148
     12 lx64slc4.cern.ch has address 137.138.141.149
     12 lx64slc4.cern.ch has address 137.138.141.156
     12 lx64slc4.cern.ch has address 137.138.4.19
     13 lx64slc4.cern.ch has address 137.138.5.220
     13 lx64slc4.cern.ch has address 137.138.5.223
     13 lx64slc4.cern.ch has address 137.138.5.233
     13 lx64slc4.cern.ch has address 137.138.5.234
lxplus257:~% for i in $( seq 100 ); do ssh -q lxplus.cern.ch hostname ; done | sort | uniq -c
     100 lxplus254.cern.ch
lxplus257:~%

In other words, while DNS nicely rotates the returned addresses, ssh always goes to lxplus254 (137.138.141.156), because that address has the longest matching prefix with lxplus257's source address (137.138.141.158).
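The prefix lengths can be checked by hand. All eight addresses share the first two octets (137.138) with the source, so only the third and fourth octets matter. XOR each octet with the corresponding source octet, and the number of leading zero bits in the result is the number of additional matching bits:

```latex
\begin{aligned}
&\text{source: } 141 = 10001101_2,\quad 158 = 10011110_2 \\[4pt]
&137.138.141.156:\quad 156 \oplus 158 = 10011100_2 \oplus 10011110_2 = 00000010_2
  &&\Rightarrow 24 + 6 = 30 \text{ bits} \\
&137.138.141.148:\quad 148 \oplus 158 = 10010100_2 \oplus 10011110_2 = 00001010_2
  &&\Rightarrow 24 + 4 = 28 \text{ bits} \\
&137.138.141.149:\quad 149 \oplus 158 = 10010101_2 \oplus 10011110_2 = 00001011_2
  &&\Rightarrow 24 + 4 = 28 \text{ bits} \\
&137.138.4.19:\quad 4 \oplus 141 = 00000100_2 \oplus 10001101_2 = 10001001_2
  &&\Rightarrow 16 + 0 = 16 \text{ bits} \\
&137.138.5.x:\quad 5 \oplus 141 = 00000101_2 \oplus 10001101_2 = 10001000_2
  &&\Rightarrow 16 + 0 = 16 \text{ bits}
\end{aligned}
```

137.138.141.156 is the unique maximum at 30 bits, so it is sorted first no matter where the DNS rotation placed it.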

-- PeterKelemen - 11 Mar 2009

Topic revision: r6 - 2009-03-13 - PeterKelemen