Friday, May 23, 2008

Getting back in the swing of things

Intro
It's been a while since I last blogged. A couple of years, in fact. I've been busy with work stuff and my own life, as happens with pretty much everybody. It has had its ups and its downs, but I suppose I'm in a pretty good place right now.

Building Python Extensions on OpenSolaris
I've got a new job, doing web development stuff. It's not OS development, but it pays the bills (and the people here are pretty cool). Since I've recently been getting back into working on / using OpenSolaris, I thought I'd go ahead and set up an environment for a new project of ours at work. We use Python by and large (though we do have some PHP projects), and we use Django for most of the things we do. So I was setting up a local server for our latest project, and I needed to install psycopg2.

The install went well enough, but when I ran
python manage.py syncdb
I was greeted with a traceback stating that libpq.so.5 could not be found. I had installed PostgreSQL from the Blastwave package, so initially when building psycopg2, I ran into the problem of it not knowing where my PostgreSQL headers were (I had forgotten to plop /opt/csw/postgresql/bin into my PATH for pg_config). But here, I was stumped.

In FreeBSD, I've had to run ldconfig(8) a few times, but ldconfig doesn't exist on OpenSolaris. I found out that the analogue to this is crle(1), but I was advised not to use it. Instead, it was suggested to rebuild psycopg2 with LDFLAGS containing -R /opt/csw/postgresql/lib. I clobbered the previous build and ran:
LDFLAGS='-R /opt/csw/postgresql/lib' python setup.py build
Everything went smoothly after the installation -- the -R flag records /opt/csw/postgresql/lib as a runpath in the built extension, so the runtime linker could find libpq.so.5, and my syncdb worked like a charm. (Well, after I went into Postgres and created the database, which I hadn't done yet. My brilliance never ceases to astound me.)

OpenSolaris Syscalls
I'm currently bashing my head against a brick wall trying to get cap-eye Install to work for me (I keep trashing my ability to boot the system) while I work on RFE 4616466. I think I have a working syscall for it, but until I get Install to work and actually get to boot the kernel and run a test program that uses the syscall, I have no idea. It was a pretty interesting challenge: from the kernel, how do you go from a local / remote address:port tuple to the PID responsible for that TCP connection?

It's kind of a tough answer, actually. Initially, I thought it was pretty straightforward. The tcp_s structure contains a field called tcp_cpid, which is the process that initially created the described TCP session. However, processes may hand off a socket to other processes (plenty of ways to do this), so the PID currently responsible for the socket may not be the one that created it. Indeed, the one that created it need not even be running, which is not at all an unlikely possibility. Additionally, multiple processes may manage the socket in question, but I chose to not support this: it would require the caller to know beforehand how many processes use the socket, as I would then have to copyout all the PIDs. I could alternatively let the caller specify a maximum number of PIDs they are interested in, but this seems too arbitrary to need to support.
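
To make the hand-off problem concrete, here is a small user-space sketch (not kernel code, and not part of the syscall itself). It assumes a POSIX system; the function name handoff_demo is made up for illustration. The parent creates a TCP socket, forks, and closes its own descriptor, so the socket lives on in a process other than the one that created it. Inheritance across fork is only one of the possible hand-off routes.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo helper: create a TCP socket, fork, and let the child
 * inherit it. Stores the creating PID in *creator and the child's PID in
 * *holder. Returns 1 if the child could still use the inherited socket,
 * 0 if it could not, and -1 on error.
 */
int handoff_demo(pid_t *creator, pid_t *holder)
{
    assert(creator != NULL && holder != NULL);

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    *creator = getpid();

    pid_t child = fork();
    if (child < 0) {
        close(fd);
        return -1;
    }
    if (child == 0) {
        /* Child: verify the inherited descriptor is a usable stream socket. */
        int type = 0;
        socklen_t len = sizeof(type);
        int ok = getsockopt(fd, SOL_SOCKET, SO_TYPE, &type, &len) == 0 &&
                 type == SOCK_STREAM;
        _exit(ok ? 0 : 1);
    }

    /* Parent: drop our reference. The child still holds its own. */
    close(fd);
    *holder = child;

    int status;
    if (waitpid(child, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

After a successful call, *creator and *holder differ, which is exactly the situation where a creator-PID hint like tcp_cpid stops being the right answer.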

My implementation goes something like this:

  1. First, we get a reference to the current IP netstack by calling netstack_get_current() and dereferencing its nu_ip member.
  2. We get a connection record by looking in the IP stack's connection fanout table, using a handy macro that hashes the remote address and the local/remote port pair (given the IP stack) into an index into this array: ipst->ips_ipcl_conn_fanout[IPCL_CONN_HASH(remote_addr, ports, ipst)];
  3. We then loop over all matches looking specifically for TCP connections on the local/remote tuple, matching with:
    IPCL_CONN_MATCH(connp, IPPROTO_TCP, raddr, laddr, ports)
  4. Once we've found a matching connection, we dereference its conn_tcp member to get our tcp_cpid as a hint.
  5. Now the fun part. We loop over all processes in the active zone.
    1. If the process' PID matches the tcp_cpid field, we return that value, as the process still exists and still has control over the socket (and this saves us from needing to do a bunch of lookups into the process' file table).
    2. We loop over the process' file table looking for the socket holding the connection. This was a particular bitch to figure out, but I think I got it:
      1. The file table is accessible by using the P_FINFO() macro on the struct proc.
      2. Starting from 0 and looping while less than filetable->fi_nfiles, we walk an ugly list of structures:
        1. We get the desired fp from filetable->fi_list[i]->uf_file.
        2. We access its associated vnode (as sockets are implemented using sockfs, we have a vnode for the socket) from fp->f_vnode.
        3. The socket node (struct sonode) is stored in the private vn->v_data field.
        4. The tcp_s struct for the socket is stored in the sonode's private so->so_priv field.
        5. If the pointer of the tcp_s struct here matches that of the tcp_s struct we grabbed from the connection, we've got a winner.
  6. If we don't find anything, we return ESRCH.
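
Steps 5 and 6 can be sketched in plain user-space C. To be clear, this is a simplified mock and not the kernel code: the structs below are stripped-down stand-ins that only borrow the field names mentioned above, p_pid and p_finfo are assumed names, tcp_owner_pid and proc_holds_tcp are hypothetical helpers, and all locking and reference counting is left out. The connection lookup from steps 1 to 4 is assumed to have already produced the tcp_s pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified stand-ins; the real kernel structures have many more fields. */
struct tcp_s    { int tcp_cpid; };            /* creator PID, used as a hint */
struct sonode   { void *so_priv; };           /* points at the tcp_s */
struct vnode    { void *v_data; };            /* points at the sonode */
struct file     { struct vnode *f_vnode; };
struct uf_entry { struct file *uf_file; };
struct uf_info  { int fi_nfiles; struct uf_entry **fi_list; };
struct proc     { int p_pid; struct uf_info p_finfo; };

/* Step 5.2: return 1 if process p has an open file backed by tcp. */
static int proc_holds_tcp(const struct proc *p, const struct tcp_s *tcp)
{
    const struct uf_info *ft = &p->p_finfo;

    for (int i = 0; i < ft->fi_nfiles; i++) {
        const struct uf_entry *ufe = ft->fi_list[i];
        if (ufe == NULL || ufe->uf_file == NULL)
            continue;                         /* unused descriptor slot */

        /*
         * A real implementation would first check that the vnode is a
         * socket before trusting v_data; this mock only stores sockets.
         */
        const struct vnode *vn = ufe->uf_file->f_vnode;
        if (vn == NULL || vn->v_data == NULL)
            continue;

        const struct sonode *so = vn->v_data;
        if (so->so_priv == tcp)
            return 1;                         /* same tcp_s: a winner */
    }
    return 0;
}

/*
 * Steps 5 and 6: scan procs[0..nprocs) for the owner of tcp.
 * Returns 0 and stores the PID in *pid_out, or returns ESRCH.
 */
int tcp_owner_pid(const struct proc *procs, int nprocs,
                  const struct tcp_s *tcp, int *pid_out)
{
    assert(pid_out != NULL);

    for (int i = 0; i < nprocs; i++) {
        /* Step 5.1: the creator still exists, so skip the table walk. */
        if (procs[i].p_pid == tcp->tcp_cpid) {
            *pid_out = procs[i].p_pid;
            return 0;
        }
        /* Step 5.2: otherwise look through this process' file table. */
        if (proc_holds_tcp(&procs[i], tcp)) {
            *pid_out = procs[i].p_pid;
            return 0;
        }
    }
    return ESRCH;                             /* step 6: nobody owns it */
}
```

Like the description above, the sketch reports only the first matching process, so a socket shared by several processes yields a single PID.
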
I got a good bit of help regarding implementation of syscalls from Eric Schrock's blog entry on adding new syscalls to OpenSolaris. After reading this and doing my own poking around, I also found SYSCALL.README, which has other useful information on the subject.

Anyway, that's about all for this entry. Perhaps more later :)

--dho
