2009-06-11

ZFS over iSCSI over ZFS

Some stuff I was hacking away on at work today; this is mostly a note to self:

The tl;dr version:

  1. double-check commands you type when playing with disks
  2. triple-check commands you type when playing with disks
  3. see #1 and #2
  4. the way to "refresh" the list of iSCSI targets on the initiator is `iscsiadm modify discovery -t enable` (yes, I already had it enabled; it works anyway)
  5. it's `zpool iostat [pool] [interval]`, not `zpool iostat [interval] [pool]`

So...I set out to find out what happens when we resize a disk that's shared out over iSCSI.

And, just to reassure myself that this will work with real data, I copied over some D&D stuff to the client:

> time find dnd -type f -exec cat {} \; | md5sum
b5f1fc52d09baaf9f3db34408ce9c184  -

real    4m6.008s
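
One caveat with this check: `find` hands files over in directory order, and nothing guarantees that order stays the same after the pool has been shuffled around, so identical data could in principle hash differently. A minimal sketch of an order-independent variant, assuming `md5sum` is on the PATH and no file name contains a newline (`tree_sum` is a name made up for this note, not an existing tool):

```shell
# tree_sum DIR: hash the contents of every regular file under DIR,
# visiting files in sorted path order so the result is repeatable.
# Assumes no file name contains a newline.
tree_sum() {
    find "$1" -type f | LC_ALL=C sort | while IFS= read -r f; do
        cat "$f"
    done | md5sum
}
```

Run as `tree_sum dnd` before and after the change and compare the two hashes.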

I started with a mirror of 2 disks, cleverly named foo and bar, both 10G. Resizing bar to 15G caused the initiator to disconnect, and the pool faulted. Then I removed the faulty disk and added the new one in, and voilà! 25G of space instead of my 9G mirror.

...crap; I wanted to mirror that. Backing out changes is a PITA, so I'll create a new disk (cleverly named zot) to be a mirror of foo during the upgrade to 15G.

Before adding zot (removed the targets that I'm not playing with):

fileserver:~# iscsitadm list target
Target: idle/robin/foo
    iSCSI Name: iqn.1986-03.com.sun:02:0fb34688-bd5c-6f54-ba25-ca3d972187f6
    Connections: 1
Target: idle/robin/bar
    iSCSI Name: iqn.1986-03.com.sun:02:2e83e95d-3841-44d7-c97c-c8de9fe7629c
    Connections: 1



client:/# iscsiadm list target
Target: iqn.1986-03.com.sun:02:2e83e95d-3841-44d7-c97c-c8de9fe7629c
        Alias: idle/robin/bar
        TPGT: 1
        ISID: 4000002a0000
        Connections: 0
Target: iqn.1986-03.com.sun:02:0fb34688-bd5c-6f54-ba25-ca3d972187f6
        Alias: idle/robin/foo
        TPGT: 1
        ISID: 4000002a0000
        Connections: 1

um...this is odd, since bar is part of my zpool on the client.

And then I added the new ZFS iSCSI target on the fileserver. mkiscsitarget.sh is a wrapper around all the steps:

  1. zfs create -s -V ${SIZE} ${TARGET}
  2. (optional) iscsitadm modify target -l ${INITIATOR} ${TARGET}
  3. (optional) iscsitadm modify target -p ${TPGT} ${TARGET}
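
The script itself isn't reproduced here. What follows is a rough sketch of what such a wrapper could look like, reconstructed from the three steps above and the argument order in the invocation below (target, size, initiator, TPGT). The `DRY_RUN` switch and the `run` helper are additions for illustration only, and the sketch assumes the parent dataset already has `shareiscsi=on`, so the new volume gets exported without an extra step:

```shell
# Hypothetical reconstruction of mkiscsitarget.sh, written as a function.
# Usage: mkiscsitarget TARGET SIZE [INITIATOR] [TPGT]
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

mkiscsitarget() {
    target=${1:-}
    size=${2:-}
    initiator=${3:-}
    tpgt=${4:-}
    if [ -z "$target" ] || [ -z "$size" ]; then
        echo "usage: mkiscsitarget TARGET SIZE [INITIATOR] [TPGT]" >&2
        return 2
    fi
    # 1. sparse (-s) volume of the requested size
    run zfs create -s -V "$size" "$target" || return 1
    # 2. optional: restrict the target to one initiator (ACL)
    if [ -n "$initiator" ]; then
        run iscsitadm modify target -l "$initiator" "$target" || return 1
    fi
    # 3. optional: bind the target to a target portal group tag
    if [ -n "$tpgt" ]; then
        run iscsitadm modify target -p "$tpgt" "$target" || return 1
    fi
}
```

Setting `DRY_RUN=1` before calling `mkiscsitarget idle/robin/zot 10G client 208` prints the three commands without touching any disks, which fits tl;dr items 1 through 3.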

fileserver:~# ./mkiscsitarget.sh idle/robin/zot 10G client 208
fileserver:~# iscsitadm list target
Target: idle/robin/foo
    iSCSI Name: iqn.1986-03.com.sun:02:0fb34688-bd5c-6f54-ba25-ca3d972187f6
    Connections: 1
Target: idle/robin/bar
    iSCSI Name: iqn.1986-03.com.sun:02:2e83e95d-3841-44d7-c97c-c8de9fe7629c
    Connections: 1
Target: idle/robin/zot
    iSCSI Name: iqn.1986-03.com.sun:02:e0fe1c0d-634e-e96e-f8ad-dbc3c304a6e5
    Connections: 1



client:/# iscsiadm modify discovery -t enable
client:/# iscsiadm list target
Target: iqn.1986-03.com.sun:02:0fb34688-bd5c-6f54-ba25-ca3d972187f6
        Alias: idle/robin/foo
        TPGT: 208
        ISID: 4000002a0000
        Connections: 1
Target: iqn.1986-03.com.sun:02:2e83e95d-3841-44d7-c97c-c8de9fe7629c
        Alias: idle/robin/bar
        TPGT: 208
        ISID: 4000002a0000
        Connections: 1
Target: iqn.1986-03.com.sun:02:e0fe1c0d-634e-e96e-f8ad-dbc3c304a6e5
        Alias: idle/robin/zot
        TPGT: 208
        ISID: 4000002a0000
        Connections: 1
Target: iqn.1986-03.com.sun:02:2e83e95d-3841-44d7-c97c-c8de9fe7629c
        Alias: idle/robin/bar
        TPGT: 1
        ISID: 4000002a0000
        Connections: 0
Target: iqn.1986-03.com.sun:02:0fb34688-bd5c-6f54-ba25-ca3d972187f6
        Alias: idle/robin/foo
        TPGT: 1
        ISID: 4000002a0000
        Connections: 0
client:/# format
Searching for disks...done

c1t010000144FF2985500002A004A31B8A6d0: configured with capacity of 10.00GB


AVAILABLE DISK SELECTIONS:
       0. c0t0d0 
          /pci@1f,0/ide@d/dad@0,0
       1. c1t010000144FF2985500002A004A31A1F7d0 
          /scsi_vhci/ssd@g010000144ff2985500002a004a31a1f7
       2. c1t010000144FF2985500002A004A31B8A6d0 
          /scsi_vhci/ssd@g010000144ff2985500002a004a31b8a6
       3. c1t010000144FF2985500002A004A304D18d0 
          /scsi_vhci/ssd@g010000144ff2985500002a004a304d18
Specify disk (enter its number): ^C

An aside: Ben Rockwood used ^D to get out of format in an article I ran across. Since that works, it would stand to reason that `format <&-` would work in bash; so much for reason. `format </dev/null` or `echo | format` work fine, but who wants to type all that?
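
My guess at the difference (I haven't checked what format actually does with it): `<&-` closes stdin outright, so a read fails with an error, whereas ^D and `/dev/null` deliver a clean end-of-file. `cat` is enough to show the two cases:

```shell
# EOF on stdin: the read returns zero bytes and cat exits successfully.
if cat </dev/null; then
    echo "EOF on stdin: ok"
else
    echo "EOF on stdin: error"
fi

# Closed stdin: the read fails (bad file descriptor), so cat exits non-zero.
if cat <&- 2>/dev/null; then
    echo "closed stdin: ok"
else
    echo "closed stdin: error"
fi
```

A program that treats a read error differently from end-of-file would behave differently in the two cases, which may be what format is doing.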

Oh...and I changed the TPGT for zot, bar, and foo - 208 is the box's nge0; 1 is actually TPGT 0 on the targets, but it comes across to the initiator as 1; no idea why that is. Evidently, iscsiadm has decided the disks are different (despite matching GUIDs) because the TPGT entries are different.

That's just silly; I'll move them back so it sees its targets again. Besides, my `zpool status` on the client is hanging and unkillable, so maybe the disk is gone and Solaris hasn't quite figured it out (40min seek time? sure, that makes sense).

fileserver:~# iscsitadm delete target -p 208 idle/robin/foo
fileserver:~# iscsitadm delete target -p 208 idle/robin/bar


client:/# iscsiadm modify discovery -t enable
client:/# iscsiadm list target
Target: iqn.1986-03.com.sun:02:0fb34688-bd5c-6f54-ba25-ca3d972187f6
        Alias: idle/robin/foo
        TPGT: 208
        ISID: 4000002a0000
        Connections: 1
Target: iqn.1986-03.com.sun:02:2e83e95d-3841-44d7-c97c-c8de9fe7629c
        Alias: idle/robin/bar
        TPGT: 208
        ISID: 4000002a0000
        Connections: 0
Target: iqn.1986-03.com.sun:02:e0fe1c0d-634e-e96e-f8ad-dbc3c304a6e5
        Alias: idle/robin/zot
        TPGT: 208
        ISID: 4000002a0000
        Connections: 1
Target: iqn.1986-03.com.sun:02:2e83e95d-3841-44d7-c97c-c8de9fe7629c
        Alias: idle/robin/bar
        TPGT: 1
        ISID: 4000002a0000
        Connections: 1
Target: iqn.1986-03.com.sun:02:0fb34688-bd5c-6f54-ba25-ca3d972187f6
        Alias: idle/robin/foo
        TPGT: 1
        ISID: 4000002a0000
        Connections: 0

wait...what?!? I tried moving foo back to 208, since that's where the client says it's connected to the disk, but nothing changed. Time to turn off the iSCSI sharing and see if ZFS can figure it out when things come back.

fileserver:~# zfs set shareiscsi=off idle/robin/foo
fileserver:~# zfs set shareiscsi=off idle/robin/bar
fileserver:~# zfs set shareiscsi=off idle/robin/zot
fileserver:~# iscsitadm list target
fileserver:~#



client:/# iscsiadm modify discovery -t enable
client:/# iscsiadm list target
Target: iqn.1986-03.com.sun:02:0fb34688-bd5c-6f54-ba25-ca3d972187f6
        Alias: idle/robin/foo
        TPGT: 208
        ISID: 4000002a0000
        Connections: 0
Target: iqn.1986-03.com.sun:02:2e83e95d-3841-44d7-c97c-c8de9fe7629c
        Alias: idle/robin/bar
        TPGT: 208
        ISID: 4000002a0000
        Connections: 0
Target: iqn.1986-03.com.sun:02:e0fe1c0d-634e-e96e-f8ad-dbc3c304a6e5
        Alias: idle/robin/zot
        TPGT: 208
        ISID: 4000002a0000
        Connections: 0
Target: iqn.1986-03.com.sun:02:2e83e95d-3841-44d7-c97c-c8de9fe7629c
        Alias: idle/robin/bar
        TPGT: 1
        ISID: 4000002a0000
        Connections: 0
Target: iqn.1986-03.com.sun:02:0fb34688-bd5c-6f54-ba25-ca3d972187f6
        Alias: idle/robin/foo
        TPGT: 1
        ISID: 4000002a0000
        Connections: 0
client:/# iscsiadm modify discovery -t disable
iscsiadm: logical unit in use
iscsiadm: Unable to complete operation

Weird; and the file system is still mounted. But I can't `zpool status` (going on an hour now), so it shouldn't still work, right?

> ls dnd/
2e                                   cerat
3.5e                                 dX_skills.ods
4e                                   dnd
...

Yes, I know I have a dnd/dnd; I really need to clean it up.

Okay...looks like things are still hosed; time to pull the plug on the processes:

client:/# kill 726 2531 17260
client:/# kill -9 726 2531 17260
client:/# kill -CONT 726 2531 17260
client:/# for i in `/pkgs/gnu/bin/seq 1 48`; do kill -$i 726 2531 17260; done
client:/#

Now, the box is just mocking me...I think this is why the default is to panic in such a situation.

Sharing all the stuff back out to client so it can come up "clean":

fileserver:~# zfs set shareiscsi=on idle/robin/foo
fileserver:~# iscsitadm modify target -l xitomatl idle/robin/foo
fileserver:~# iscsitadm modify target -p 208 idle/robin/foo

... ditto for bar and zot ...

fileserver:~# iscsitadm list target
Target: idle/robin/foo
    iSCSI Name: iqn.1986-03.com.sun:02:0fb34688-bd5c-6f54-ba25-ca3d972187f6
    Connections: 0
Target: idle/robin/bar
    iSCSI Name: iqn.1986-03.com.sun:02:2e83e95d-3841-44d7-c97c-c8de9fe7629c
    Connections: 0
Target: idle/robin/zot
    iSCSI Name: iqn.1986-03.com.sun:02:e0fe1c0d-634e-e96e-f8ad-dbc3c304a6e5
    Connections: 0
client:/# reboot
Connection to client closed by remote host.
Connection to client closed.

fileserver:~# iscsitadm list target
Target: idle/robin/foo
    iSCSI Name: iqn.1986-03.com.sun:02:0fb34688-bd5c-6f54-ba25-ca3d972187f6
    Connections: 1
Target: idle/robin/bar
    iSCSI Name: iqn.1986-03.com.sun:02:2e83e95d-3841-44d7-c97c-c8de9fe7629c
    Connections: 1
Target: idle/robin/zot
    iSCSI Name: iqn.1986-03.com.sun:02:e0fe1c0d-634e-e96e-f8ad-dbc3c304a6e5
    Connections: 1

Yes, it connected just after I told it to reboot and SSH kicked me out.

And, back to the client, now sitting at a white screen of unlife:

stop+a
Type 'go' to resume
> boot

On reboot:

NOTICE: iscsi session(12) iqn.1986-03.com.sun:02:0fb34688-bd5c-6f54-ba25-ca3d972187f6 online
NOTICE: iscsi session(9) iqn.1986-03.com.sun:02:2e83e95d-3841-44d7-c97c-c8de9fe7629c online
NOTICE: iscsi session(6) iqn.1986-03.com.sun:02:e0fe1c0d-634e-e96e-f8ad-dbc3c304a6e5 online

You'll notice these are the IQNs for foo, bar, and zot, respectively. After cde-login comes up, the box helpfully tells me that my pool has faulted.

After logging in and running a `zpool status`, it says that bar is offline. Waiting a little makes bar come back. Weird, but at least it recovered. It's not a mirror, but I still only seem to have 10G out of my 2 10G disks. I hate using the Windows approach in UNIX, though.

client:/# zpool status trump
  pool: trump
 state: ONLINE
 scrub: none requested
config:

        NAME                                     STATE     READ WRITE CKSUM
        trump                                    ONLINE       0     0     0
          c1t010000144FF2985500002A004A31A1F7d0  ONLINE       0     0     0
          c1t010000144FF2985500002A004A304D18d0  ONLINE       0     0     0

errors: No known data errors
client:/# format </dev/null
Searching for disks...done

c1t010000144FF2985500002A004A31B8A6d0: configured with capacity of 10.00GB


AVAILABLE DISK SELECTIONS:
       0. c0t0d0 
          /pci@1f,0/ide@d/dad@0,0
       1. c1t010000144FF2985500002A004A31A1F7d0 
          /scsi_vhci/ssd@g010000144ff2985500002a004a31a1f7
       2. c1t010000144FF2985500002A004A31B8A6d0 
          /scsi_vhci/ssd@g010000144ff2985500002a004a31b8a6
       3. c1t010000144FF2985500002A004A304D18d0 
          /scsi_vhci/ssd@g010000144ff2985500002a004a304d18
Specify disk (enter its number):
client:/# zfs list trump
NAME    USED  AVAIL  REFER  MOUNTPOINT
trump  1.59G  8.19G  1.59G  /disk/trump
client:/# zpool iostat trump 30
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
trump       1.59G  23.2G     13      0  1.59M      0

Weird...looks like it still thinks the devices are 10G and 15G.

fileserver:~# zfs get volsize idle/robin/foo
NAME            PROPERTY  VALUE           SOURCE
idle/robin/foo  volsize   10G             -
fileserver:~# zfs get volsize idle/robin/bar
NAME            PROPERTY  VALUE           SOURCE
idle/robin/bar  volsize   10G             -
fileserver:~# zfs get volsize idle/robin/zot
NAME            PROPERTY  VALUE           SOURCE
idle/robin/zot  volsize   10G             -
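
Just the arithmetic on the figures above, as a sketch (`total_g` is a throwaway helper; the numbers are copied from the `zfs list` and `zpool iostat` output): used plus available comes to very different totals depending on which tool is asked.

```shell
# total_g USED AVAIL: add two sizes given in GB and print the sum.
total_g() {
    awk -v u="$1" -v a="$2" 'BEGIN { printf "%.2fG\n", u + a }'
}

total_g 1.59 8.19    # figures from `zfs list trump`
total_g 1.59 23.2    # figures from `zpool iostat trump`
```

The first total is roughly one 10G disk; the second is roughly 10G plus 15G.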

And a quick data integrity check:

> time find dnd -type f -exec cat {} \; | md5sum
b5f1fc52d09baaf9f3db34408ce9c184  -

real    4m26.241s

Time to attach zot to foo to make a mirror, and wait for it to resilver.

client:/# zpool attach trump c1t010000144FF2985500002A004A31A1F7d0 c1t010000144FF2985500002A004A31B8A6d0
client:/# zpool status trump
  pool: trump
 state: FAULTED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
 scrub: resilver in progress for 0h0m, 6.39% done, 0h3m to go
config:

        NAME                                       STATE     READ WRITE CKSUM
        trump                                      FAULTED      0     6     0  insufficient replicas
          mirror                                   ONLINE       0     0     0
            c1t010000144FF2985500002A004A31A1F7d0  ONLINE       0     0     0
            c1t010000144FF2985500002A004A31B8A6d0  ONLINE       0     0     0
          c1t010000144FF2985500002A004A304D18d0    UNAVAIL      0     6     0  cannot open

errors: No known data errors

0h3m became 0h6m; 0h6m became 0h10m; 0h10m became 0h30m...I went to get food. I'll work on this more later.
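
For next time, rather than rerunning `zpool status` by hand: a sketch that pulls the percentage out of the `scrub:` line. It assumes the line looks exactly like the one in the output above (`resilver_pct` is a made-up name, and other ZFS versions may word this line differently):

```shell
# resilver_pct: read `zpool status` output on stdin and print the
# percentage from the "scrub:" line; prints nothing if no resilver
# is reported as in progress.
resilver_pct() {
    sed -n 's/^ *scrub: resilver in progress.* \([0-9.]*\)% done.*/\1/p'
}

# Hypothetical polling loop, using the pool name from this post:
#   while p=$(zpool status trump | resilver_pct); [ -n "$p" ]; do
#       echo "resilver: $p%"
#       sleep 60
#   done
```

Of course, this only helps if `zpool status` itself returns, which it didn't always do today.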

:wq