Wednesday, November 01, 2017

Old scripting

The landscape

I'm helping someone with a scripting problem on an old system and an old shell. How old? Try IBM AIX 6.1, first released in 2007, and ksh93 "e" released in ... 1993. At least the AIX is a version of 6.1 from 2014! (Kudos to IBM for treating long-term support seriously.)

A second point to ponder. The goal is to improve remote scripting—running scripts on a remote machine. In this environment, ssh exists but is not used. The remote execution tool chosen is rexec, considered one of the most dangerous tools possible. But my remit is not to address the insecurity, just to improve the scripting. (They know this is a bad, and are actively working to eventually resolve.)

So, given these constraints, what problem am I solving?

Example problem

This environment makes extensive use of remotely executed scripts to wire together a distributed, locally-hosted system. Current scripts duplicate the same approach, each implemented as a one-off: Copy a script to a remote machine with rcp; use rexec to invoke the script, capturing the output to a file on the remote host; copy the captured file back to the local host; process the output file; sometimes clean up the remote host afterwards.

Some gotchas to watch out for with ksh93e or rexec:

  • Function tracing - Using the standard xtrace setting to trace script execution in ksh93 has problems with tracing functions, and requires using old-style function syntax
  • Variable scope - To keep variables local to a function in ksh93, you must use the new-style function syntax (note the conflict with tracing)
  • Exit broken with trap - When calling exit to quit a remote script, trap does not get a correct $? variable (it is always 0, as exit succeeded in returning a non-0 exit status). Instead one must "set" $? with the code of a failing command, and then leave with a plain call to exit
  • No pipefail - Release "e" of ksh93 just does not know anything about set -o pipefail, and there is no uninstrusive workaround. This now common feature showed up in release "g"
  • No exit code - Would you believe rexec does not itself exit with the exit code of the remote command, never has, and never will? It always exits 0 if the remote command could be started.
  • Buffered stderr - Empirically, rexec (at least the version with this AIX) buffers the stderr stream of remote commands, and only flushes when rexec exits, so the sense of ordering between stdout, stderr and the command-line prompt is even worse than usual (the actual handling is unspecified)

This problem and environment triggers a memory: The last time I worked on AIX was in 1994, and it was almost the same problem! I really thought I had escaped those days.

A solution

So I refactored. I couldn't change the use of rexec—this environment is not ready for SSH key management—, I couldn't replace KSH93 with BASH or replace AIX with Linux, but I could do something about the imperfect duplication and random detritus files.

The solution

Note the need to call a fail function instead of exit directly because of poor interaction with trap.

Assuming some help, such as a global progname variable (which could simply be $0), and avoiding remote temporary files:

_transfer_exit_code() {
    while read line
    do
        case $line in
            ^[0-9] | ^[1-9][0-9] | ^11[0-9] | ^12[0-7] ) return ${line#^} ;;
            * ) printf '%s\n' "$line" ;;
        esac
    done
    return 1  # ksh93e lacks pipefail; we get here when 'rscript' failed
}

rscript() {
    case $# in
        0 | 1 )
            echo "$progname: BUG: Usage: rexec SCRIPT-NAME HOSTNAME [ARGS]..." >&2 ;;
        * ) script_name=$1 ; shift
            hostname=$1 ; shift ;;
    esac
    # Trace callers script if we ourselves are being traced
    case $- in
        *x* ) _set_x='set -x' ;;
    esac

    rexec $hostname /usr/bin/ksh93 -s "$@" <<EOS | _transfer_exit_code
set - "$@"  # Only reasonable way to pass through function arguments

# Work around AIX ksh93 return code of exit ignored by trap
fail() {
    return \$1
}

# Our hook to capture the exit code for rexec who dumbly swallows it
trap 'rc=\$?; echo ^\$rc; exit \$rc' EXIT

PS4='+$script_name:\$(( LINENO - 14 )) (\$SECONDS) '
$_set_x

# The callers script
$(cat)
EOS
}

Example use

#!/usr/bin/ksh93

progname=${0##*/}

PS4='+$progname:$LINENO ($SECONDS) '

usage() {
    echo "Usage: $0 [-d] HOSTNAME"
}

. rexec.ksh

debug=false
while getopts :d opt
do
    case $opt in
        d ) debug=true ;;
        * ) usage >&2 ; exit 2 ;;
    esac
done
shift $(( OPTIND - 1 ))

case $# in
    1 ) hostname=$1 ;;
    * ) usage >&2 ; exit 2 ;;
esac

$debug && set -x

script_name=My-Remote-Script

tmp=${TMPDIR-/tmp}/$progname.$RANDOM
trap 'rm -f $tmp' EXIT

rscript $script_name $hostname Katy <<'EOS' >$tmp
echo $#: $1
fail 3
EOS

case $? in
    3 ) ;;
    * ) echo "$0: Did not pass through exit code" >&2 ; exit 1 ;;
esac

case "$(<$tmp)" in
    '1: Katy' ) ;;
    * ) echo "$0: Did not pass through arguments" >&2 ; exit 1 ;;
esac

Source

The code is in GitHub.