AFS (and X11) broken when logging in via SSH

Starting around the last week of June, we have seen several support requests along the lines of "no AFS token after SSH" or "no X11 auth forwarding after SSH". There are several overlapping and possibly related symptoms. The standard "user experience" is along the lines of:

/usr/bin/X11/xauth:  timeout in locking authority file /afs/cern.ch/user/f/foo/.Xauthority
hepix: E: /usr/bin/fs returned error, no tokens?
Unfortunately X11 forwarding also appears to be the very first write access to AFS, so several causes lead to similar symptoms.

In all cases, the user needs to supply ssh -vv ... output - diagnosing things without this makes no sense at all.
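
As a first pass over such a log, the signature lines quoted in the scenarios further down this page can be searched for mechanically. The following is only a rough sketch: the function name triage_ssh_log and the labels it prints are made up for illustration, and the match strings are simply the debug lines quoted on this page, which may differ between ssh versions.

```shell
#!/bin/sh
# triage_ssh_log FILE
# Hypothetical helper (not an existing tool): print one line per scenario
# whose signature string appears in a saved "ssh -vv" log.
triage_ssh_log() {
    log=$1
    found=0
    # SSH-1 with only a forwarded Kerberos 5 TGT
    if grep -qF 'Kerberos v5 TGT forwarded' "$log"; then
        echo "ssh1-krb5-tgt: try ssh -2"
        found=1
    fi
    # Public key authentication (SSH-1 RSA or SSH-2 publickey)
    if grep -qF -e 'RSA authentication accepted' \
                -e 'Authentication succeeded (publickey)' "$log"; then
        echo "pubkey: no AFS token expected"
        found=1
    fi
    # X11 cookie mismatch, e.g. caused by ~/.ssh/rc
    if grep -qF 'X11 auth data does not match fake data' "$log"; then
        echo "x11-cookie-mismatch: check ~/.ssh/rc"
        found=1
    fi
    if [ "$found" -eq 0 ]; then
        echo "no known pattern"
    fi
}
```

Typical use would be to save the debug output (ssh writes it to stderr, so something like ssh -vv ... 2> ssh.log) and then run triage_ssh_log ssh.log. A log can match more than one scenario, which is why every match is printed rather than only the first.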

This is possibly related to upgrading the last AFS KDC to Heimdal-1.

Different scenarios / symptoms:

corrupted .Xauthority-c

ALERT! Update: This issue is believed to have been fixed, with no new cases since mid-July, and no persistent .Xauthority-c files to be found...

Several users somehow managed to get a permanent .Xauthority-c file in their home directory. This appears to be due to some AFS corruption (the file can be "listed" but not stat'ed, removed or recreated).

Bernard says that the "salvage" message somehow links this file to a file under ~/.gconf (where short-lived lock files are created that have occasionally caused trouble for AFS in the past).

The only servers apparently affected are afs22 and afs91 (both on 1.4.4). The similarly-sized/-used afs36 runs 1.4.6 and hasn't had an issue yet. [FIXME - to be confirmed?]

"/usr/afs/bin/salvager -part /vicepad -vol 1933776333 -showlog -nowrite" gives something like:

@(#) OpenAFS 1.4.4 built  2007-03-27 4294967295 0
07/04/2008 12:01:51 STARTING AFS SALVAGER 2.4 (/usr/afs/bin/salvager -part /vicepad -vol 1933776333 -showlog)
07/04/2008 12:01:51 2 nVolumesInInodeFile 64
07/04/2008 12:01:51 CHECKING CLONED VOLUME 1933780659.
07/04/2008 12:01:51 user.abenelli.backup (1933780659) updated 07/03/2008 15:54
07/04/2008 12:01:51 SALVAGING VOLUME 1933776333.
07/04/2008 12:01:51 user.abenelli (1933776333) updated 07/04/2008 12:01
...
07/04/2008 12:01:51 totalInodes 5367
07/04/2008 12:01:51 dir vnode 1: invalid entry: ./.Xauthority-c (vnode 5402, unique 851942)
07/04/2008 12:01:51 dir vnode 1: ./.Xauthority-c (vnode 5402): unique changed from 851942 to 0 -- deleted
07/04/2008 12:01:51 Found 23 orphaned files and directories (approx. 6356 KB)
07/04/2008 12:01:51 Salvaged user.abenelli (1933776333): 4871 files, 397709 blocks

ALERT! Warning: if two files claim the same vnode, the salvager will destroy the content of one of them randomly!

ALERT! NOT a workaround: afs_admin salvage $HOME will make this file "normal" again, so that it can be removed and X11 forwarding works again, but some data may be lost (see the warning above).

Current suspicion: AFS fileserver bug; possibly related to locking; possibly fixed in 1.4.6. FIXED.

SSH-1, Kerberos-5 TGT only

ssh -1 with only a Kerberos5 TGT (i.e. no Krb4 TGT or AFS token) on the sender side will not get AFS tokens on the destination.

ssh(7727) debug1: Trying Kerberos v5 authentication.
ssh(7727) debug3: Trying to reverse map address 137.138.4.22.
ssh(7727) debug1: Kerberos v5 authentication accepted.
ssh(7727) debug1: Kerberos v5 TGT forwarded (foo@CERN.CH).
ssh(7727) debug1: Requesting compression at level 6.
If the server happens to be running in debug mode, we also get (on the client):
   user_pty(11384) debug3: Cannot get AFS token via Krb5/MIT

This issue is understood: the Kerberos ticket file name (via KRB5CCNAME) is not transferred to PAM from the "unprivileged" ssh process that received the forwarded ticket. pam_krb5afs then says (in debug mode):

Jul  4 11:00:44 lxcert-amd64 sshd[11383]: pam_krb5[11383]: no v5 creds for user 'foo', skipping session setup
Jul  4 11:00:44 lxcert-amd64 sshd[11383]: pam_krb5[11383]: pam_open_session returning 0 (Success)

The CERN sshd is capable of receiving forwarded AFS tokens, and of converting forwarded Krb4 TGTs into AFS tokens, but not of doing this for Krb5 (not part of MIT library or krbafs, and the daemon is not linked with either Heimdal or the minikafs library).

Workaround/Solutions:

  • use ssh -2 ...: the Kerberos5 TGT is transferred on a different code path (GSSAPI) that actually ends up in the right place on the receiver, or
  • ensure that the sender has a valid Kerberos4 TGT and/or AFS token (and that these also get passed over the SSH connection - but this will happen automatically if the Kerberos5 TGT is being transferred).

"temporary AFS token" gets dropped - FIXED.

ALERT! FIXED

Symptom is dmesg output like

afs: Tokens for user of AFS id 1234 for cell cern.ch are discarded (rxkad error=19270407)
which means (translate_et)
19270407 (rxk).7 = security object was passed a bad ticket
In other words, the client kernel module (= SSH server) loaded the token, tried to access something on AFS, then got told by the AFS server that the token is useless and decided to remove it again.
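
The number in the kernel message can also be taken apart by hand. The sketch below assumes the usual com_err convention that the low 8 bits of an error code are the index into the error table and the rest is the table base; the function name rxkad_split is made up, and the two message texts are simply the translate_et results quoted on this page.

```shell
#!/bin/sh
# rxkad_split CODE
# Hypothetical helper: split an rxkad error code into table base and index.
# Assumption: com_err-style codes keep the message index in the low 8 bits.
rxkad_split() {
    code=$1
    index=$((code % 256))
    base=$((code - index))
    # Only the two messages quoted on this page are known here.
    case "$index" in
        7)  text="security object was passed a bad ticket" ;;
        10) text="sealed data inconsistent" ;;
        *)  text="(run translate_et for the text)" ;;
    esac
    echo "base=$base index=$index $text"
}

rxkad_split 19270407
```

This is no replacement for translate_et, but it makes it easy to see that different numbers in dmesg belong to the same error table and differ only in the index.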

Seems to mostly affect "non-CERN" client machines (SL4, SL5) [FIXME - true?]. This is the most troublesome kind of ticket: it appears that this really is a recent change in behaviour, yet the Linux ssh/sshd haven't been updated for some time.

We seem to have the following variants:

for SSH sessions

The AFS token has been created from a forwarded KRB5 TGT via pam_krb5afs in these cases ([FIXME]- true for all cases?)

Invoking "GetToken" and "SetToken" (same machine or other machine) on the soon-to-be-evicted token gives a working token.

Invoking afs5log -5, aklog, or afs5log on the forwarded Krb5 TGT yields a working token.

for "native" klog

This has also been reported by offsite users invoking klog directly on their non-CERNified machines:
  • "SL5, kernel 2.6.18-92.1.6.el5 and openafs 1.4.7 (openafs-1.4.7-68.SL5.i686)"
  • "64bit linux box, with a 2.6.23.9 kernel, using openafs version 1.4.7."

ALERT! Update: FIXED, was due to a bug in Heimdal padding tickets to 48 bytes.

for "native" kinit

One instance seen (CT560274). Error message in /var/log/messages is something like:
Oct 29 13:47:23 HOST kernel: afs: Tokens for user of AFS id USERID for cell cern.ch are discarded (rxkad error=19270410)
translate_et says
19270410 (rxk).10 = sealed data inconsistent
Looks like the AFS token is usable for a short time, but then gets thrown away by the kernel.

no AFS token after Public key authentication

This is old: it has never worked and is well documented (e.g. on Q&A), but some recent calls appear to fall into this category. It surprises people who usually use Kerberos authentication/GSSAPI but have a "working" pubkey setup that kicks in whenever their credentials have expired, or who normally use pubkey with AFS token forwarding (an SSH-1 speciality) and have no valid AFS token at that moment.

Symptoms:

ssh(7846) debug1: Trying RSA authentication with key '/home/foo/.ssh/identity'
ssh(7846) debug1: Received RSA challenge from server.
ssh(7846) debug1: Remote: RSA authentication accepted.
ssh(7846) debug1: RSA authentication accepted by server.
or
ssh(7874) debug1: Authentication succeeded (publickey).

Solutions:

  • don't use Pubkey auth, or
  • don't expect write access to your AFS directory, and don't expect X11 to work (use ssh -x)

bad ~/.ssh/rc prevents X11 forwarding

(one case so far, AFS access actually works in this case)

In case the user has a ~/.ssh/rc file, normal X11 credential forwarding is broken unless that script is prepared to handle the X11 cookie itself.

Symptoms:

ssh(7878) debug2: x11_get_proto: /usr/bin/X11/xauth  list :0.0 2>/dev/null
ssh(7878) debug1: Requesting X11 forwarding with authentication spoofing.
...
ssh(7878) debug2: X11 auth data does not match fake data.
ssh(7878) X11 connection rejected because of wrong authentication.
ssh(7878) debug2: X11 rejected 1 i0/o0
Since the file needs to be accessible by sshd, it is likely to be at least listable (if not readable/executable) for unauthenticated AFS users - easy to check for.

Solution:

  • get rid of ~/.ssh/rc for a test
  • if really required: read the X11 cookie from STDIN and pass it on via /usr/bin/X11/xauth add "$DISPLAY" $cookie if $DISPLAY is set, and make sure the user knows that we normally don't support such bricolage.
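
A minimal sketch of an rc file that handles the cookie itself, modelled on the example in the sshd(8) manual page. It assumes that sshd passes a "protocol cookie" pair on standard input and sets DISPLAY; the function name add_x11_cookie and the XAUTH override variable are introduced here only for illustration.

```shell
#!/bin/sh
# Sketch of a ~/.ssh/rc that stores the forwarded X11 cookie itself.
# XAUTH is an illustrative override; the default is the path used on this page.
XAUTH=${XAUTH:-/usr/bin/X11/xauth}

add_x11_cookie() {
    # sshd is assumed to write "<protocol> <cookie>" to stdin.
    if read -r proto cookie && [ -n "$DISPLAY" ]; then
        case "$DISPLAY" in
            # Display bound to localhost: register it as a unix: display,
            # as the sshd(8) example does.
            localhost:*) target="unix:${DISPLAY#localhost:}" ;;
            *)           target="$DISPLAY" ;;
        esac
        echo "add $target $proto $cookie" | "$XAUTH" -q -
    fi
}

# Only touch stdin when X11 forwarding is actually in use.
if [ -n "$DISPLAY" ]; then
    add_x11_cookie
fi
```

Note that xauth still has to be able to write the authority file, so a script like this does not help in the scenarios where the session has no usable AFS token.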

Misc

There appears to be a small difference in the AFS token format, as stored in the kernel and obtainable via GetToken: in some cases this says Unix UID 1234, in others AFS ID 1234. The tokens command actually expects these formats and translates "AFS ID 1234" into User's (AFS ID 1234) tokens for afs@cern.ch [Expires ..] and "Unix UID 1234" into Tokens for afs@cern.ch [Expires Jul  5 ..]. Repeated ssh logins into a machine from the same KRB5 TGT may get one or the other, apparently at random...

Origin is Openafs src/auth/ktc.c:648, ktc_GetToken()

    500     struct ClearToken ct;
         (some fuzzing with copying over into ct, to be looked at)
    636                 if (ct.AuthHandle == -1) {
    637                     ct.AuthHandle = 999;
    638                 }
    639                 atoken->kvno = ct.AuthHandle;
  
    648                     if ((atoken->kvno == 999) ||        /* old style bcrypt ticket */
    649                         (ct.BeginTimestamp &&   /* new w/ prserver lookup */
    650                          (((ct.EndTimestamp - ct.BeginTimestamp) & 1) == 1))) {
    651                         sprintf(aclient->name, "AFS ID %d", ct.ViceId);
    652                     } else {
    653                         sprintf(aclient->name, "Unix UID %d", ct.ViceId);

and that in turn apparently takes the "AuthHandle" from something provided by the client.
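
Read on its own, the quoted condition means that the label depends only on the key version and on whether the token lifetime in seconds is odd or even, which might explain why the result looks random from one login to the next. A small sketch of that decision, transcribed from the excerpt above (the function name token_name_style and its arguments are made up for illustration):

```shell
#!/bin/sh
# token_name_style KVNO BEGIN END
# Hypothetical helper mirroring the quoted ktc_GetToken() condition:
# prints "AFS ID" or "Unix UID" for the given key version and timestamps.
token_name_style() {
    kvno=$1
    begin=$2
    end=$3
    # AuthHandle -1 is mapped to 999 before the test.
    if [ "$kvno" -eq -1 ]; then
        kvno=999
    fi
    if [ "$kvno" -eq 999 ]; then
        echo "AFS ID"
    elif [ "$begin" -ne 0 ] && [ $(( (end - begin) & 1 )) -eq 1 ]; then
        echo "AFS ID"
    else
        echo "Unix UID"
    fi
}

token_name_style 5 1215165600 1215252001   # odd lifetime (86401 s)
token_name_style 5 1215165600 1215252000   # even lifetime (86400 s)
```

The first call prints AFS ID and the second Unix UID, so two tokens whose lifetimes differ by a single second end up with different labels.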

On that subject, http://osdir.com/ml/file-systems.openafs.general/2003-06/msg00290.html says

The short answer is "it doesn't mean a thing".
Topic revision: r5 - 2008-10-29 - JanIven
 