I currently have a vncserver going with all the windows open, in all
the necessary directories to do a single sampler run.
If you run vncviewer and log into megahard:5 with the online password,
you should be able to pick up wherever it last left off.
There are 4 working directories at the moment. Different samplers
should be run from separate directories, because the software passes information
between processes in files with the same names!
These are the directories that we're currently using:
/raid1/online/obs/subb1
/raid1/online/obs/subb2
/raid1/online/obs/subb3
/raid1/online/obs/subb4
I usually use subb1 for testing things with one sampler, so this should
have the current version of the software, cobra_acquisition, cobra_display,
cobra_command_testx and cobra_testy. It's worth making sure
you have the same versions of these programs in the other subb directories
if you're going to run more samplers.
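A rough sketch of how that version check could be scripted, assuming a POSIX shell. The helper name same_versions is made up for illustration and is not part of the COBRA software; the program names you pass in are whatever versions you are actually running.

```shell
#!/bin/sh
# Hypothetical helper (not part of the COBRA software).
# Usage: same_versions REF_DIR OTHER_DIR PROGRAM...
# Returns 0 if every PROGRAM in OTHER_DIR is byte-identical to the copy
# in REF_DIR, and 1 otherwise (a missing file counts as a mismatch).
same_versions() {
    ref_dir=$1
    other_dir=$2
    shift 2
    status=0
    for prog in "$@"; do
        # cmp -s prints nothing; it exits non-zero if the files differ
        # or if either file cannot be read.
        if ! cmp -s "$ref_dir/$prog" "$other_dir/$prog" 2>/dev/null; then
            echo "MISMATCH: $prog differs between $ref_dir and $other_dir"
            status=1
        fi
    done
    return $status
}
```

For example: same_versions /raid1/online/obs/subb1 /raid1/online/obs/subb2 cobra_acquisition cobra_display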
I'm using the file README in this directory to make notes. If you get
stuck or get various error messages, have a look at this file;
if I've met the error before, it may be noted.
ADLINK drivers | Log into the sampler node as root.
Check whether the drivers are loaded by typing lsmod:
    [root@node1-7 /root]# lsmod
    Module                  Size  Used by
    p7300b                 35633   1
    adl_mem_mgr             2632   2  [p7300b]
If these two drivers are not seen, install as follows:
    cd /raid1/bjoshi/testsamp/
    ./dask_inst_SMP.pl |
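The lsmod check above can be sketched as a small shell function, assuming a POSIX shell with awk. The name drivers_loaded is hypothetical; only the module names p7300b and adl_mem_mgr come from the notes above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the COBRA or ADLINK software).
# Reads lsmod output on stdin and exits 0 only if both the p7300b and
# adl_mem_mgr modules appear in the first column.
drivers_loaded() {
    awk '
        $1 == "p7300b"      { have_card = 1 }
        $1 == "adl_mem_mgr" { have_mem = 1 }
        END { exit !(have_card && have_mem) }
    '
}
```

Typical use on the sampler node would be: lsmod | drivers_loaded || echo "drivers missing, run dask_inst_SMP.pl"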
cobra_acquisition | This program should be run as root on the sampler node, in the same
subbx directory as everything else,
since it writes a parameter file header_ipcx which holds shared memory parameters for other processes to pick up.
Make sure you remove any stop_acq file from the directory before starting.
Run command:
    ./cobra_acquisition <nsleep> <rank> <mjd> <sec> <pac_size> <[logfile]>
For example:
    ./cobra_acquisition 1 2 52575 40000 16384 /scratch/COBRA/acq.log |
mpi software | All the rest of the software should be run from cobra, using the online account. |
mpi software
parameter files |
The processing software reads a set of parameter files on startup (in the
current directory, as usual).
They are:
clients, everything else is mostly standard. |
mpi software
starting it! |
First remove any files called shut, abort or setup.
The main processing software is started with the script mpistart20, which should be edited for the processing nodes that you want to use, e.g.
    mpimon cobra_test5 -- node1-8 1 node1-8 1 node1-7 1 node2-1 2 node2-2 2 node2-3 2 node2-4 2 node2-5 2 node2-6 2 node2-7 2 node2-8 2 node2-9 2 node2-10 2
in this example,
|
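A minimal sketch of the clean-up step before starting, assuming a POSIX shell. The function name clear_control_files is made up for illustration; the three file names are the ones listed above. Note that it removes the file called setup but leaves setup.in alone.

```shell
#!/bin/sh
# Hypothetical helper (not part of the COBRA software).
# Usage: clear_control_files [DIR]
# Removes stale shut, abort and setup files from DIR (default: the
# current directory) so that a fresh start is not stopped immediately.
clear_control_files() {
    dir=${1:-.}
    for name in shut abort setup; do
        # rm -f is quiet if the file does not exist
        rm -f "$dir/$name"
    done
}
```

For example, run clear_control_files /raid1/online/obs/subb1 before running mpistart20 from that directory.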
cobra_command | In a different window, you now need to start cobra_command, which links
the MPI programs to the
outside world. This needs to be run at least 3 seconds after running mpimon, and before all the node ready messages start coming along (see below).
The run command is:
    cobra_command_test7 192.168.0.108 1080 1082 d a
where the IP address is that of the master process, and the two port numbers can be pretty much anything (>1060) but must be those given in the parameter file for the master. d is a flag for logging and a is a flag to look for commands from arthur (rather than the keyboard).
If you've started successfully, you'll get the message:
    I Connected to: 192.168.0.102
    Connected to: 192.168.0.102 |
mpi software
what it does on startup... |
As each node starts you get a message, and finally a node ready message,
e.g.
    Linux node2-1 2.4.3-12scalismp #1 SMP Wed Oct 24 13:20:46 CEST 2001 i686 unknown
    Linux node2-1 2.4.3-12scalismp #1 SMP Wed Oct 24 13:20:46 CEST 2001 i686 unknown
    Linux node2-5 2.4.3-12scalismp #1 SMP Wed Oct 24 13:20:46 CEST 2001 i686 unknown
    Linux node2-5 2.4.3-12scalismp #1 SMP Wed Oct 24 13:20:46 CEST 2001 i686 unknown
    Linux node2-3 2.4.3-12scalismp #1 SMP Wed Oct 24 13:20:46 CEST 2001 i686 unknown
    Linux node2-3 2.4.3-12scalismp #1 SMP Wed Oct 24 13:20:46 CEST 2001 i686 unknown
    Linux node2-8 2.4.3-12scalismp #1 SMP Wed Oct 24 13:20:46 CEST 2001 i686 unknown
    Linux node2-8 2.4.3-12scalismp #1 SMP Wed Oct 24 13:20:46 CEST 2001 i686 unknown
    Linux node2-4 2.4.3-12scalismp #1 SMP Wed Oct 24 13:20:46 CEST 2001 i686 unknown
    Linux node2-2 2.4.3-12scalismp #1 SMP Wed Oct 24 13:20:46 CEST 2001 i686 unknown
    Linux node2-9 2.4.3-12scalismp #1 SMP Wed Oct 24 13:20:46 CEST 2001 i686 unknown
    Linux node2-9 2.4.3-12scalismp #1 SMP Wed Oct 24 13:20:46 CEST 2001 i686 unknown
    Linux node2-4 2.4.3-12scalismp #1 SMP Wed Oct 24 13:20:46 CEST 2001 i686 unknown
    Linux node2-2 2.4.3-12scalismp #1 SMP Wed Oct 24 13:20:46 CEST 2001 i686 unknown
    Linux node2-7 2.4.3-12scalismp #1 SMP Wed Oct 24 13:20:46 CEST 2001 i686 unknown
    Linux node2-7 2.4.3-12scalismp #1 SMP Wed Oct 24 13:20:46 CEST 2001 i686 unknown
    Linux node2-6 2.4.3-12scalismp #1 SMP Wed Oct 24 13:20:46 CEST 2001 i686 unknown
    Linux node2-6 2.4.3-12scalismp #1 SMP Wed Oct 24 13:20:46 CEST 2001 i686 unknown
    Linux node2-10 2.4.3-12scalismp #1 SMP Wed Oct 24 13:20:46 CEST 2001 i686 unknown
    Linux node2-10 2.4.3-12scalismp #1 SMP Wed Oct 24 13:20:46 CEST 2001 i686 unknown
    Linux node1-8 2.4.3-12scalismp #1 SMP Wed Oct 24 13:20:46 CEST 2001 i686 unknown
    Linux node1-8 2.4.3-12scalismp #1 SMP Wed Oct 24 13:20:46 CEST 2001 i686 unknown
    Linux node1-7 2.4.3-12scalismp #1 SMP Wed Oct 24 13:20:46 CEST 2001 i686 unknown
    Cobra_master on node1-8 checking if nodes are ready
    Node 0 ready
    Node 1 ready
    Server (node1-7) - System ready
    Pack sizes : 16777216 16777236 33555476
    Pack sizes : 16777216 16777236 33555476
    Pack sizes : 16777216 16777236 33555476
    Pack sizes : 16777216 16777236 33555476
    Pack sizes : 16777216 16777236 33555476
    Node 2 ready
    Node 3 ready
    Node 4 ready
    Node 5 ready
    Node 6 ready
    Node 7 ready
    Pack sizes : 16777216 16777236 33555476
    Pack sizes : 16777216 16777236 33555476
    Pack sizes : 16777216 16777236 33555476
    Pack sizes : 16777216 16777236 33555476
    Pack sizes : 16777216 16777236 33555476
    Pack sizes : 16777216 16777236 33555476
    Pack sizes : 16777216 16777236 33555476
    Pack sizes : 16777216 16777236 33555476
    Pack sizes : 16777216 16777236 33555476
    Pack sizes : 16777216 16777236 33555476
    Pack sizes : 16777216 16777236 33555476
    Pack sizes : 16777216 16777236 33555476
    Pack sizes : 16777216 16777236 33555476
    Pack sizes : 16777216 16777236 33555476
    Pack sizes : 16777216 16777236 33555476
    Node 8 ready
    Node 9 ready
    Node 10 ready
    Node 11 ready
    Node 12 ready
    Node 13 ready
    Node 14 ready
    Node 15 ready
    Node 16 ready
    Node 17 ready
    Node 18 ready
    Node 19 ready
    Node 20 ready
    Node 21 ready
    Node 22 ready
    Master (node1-8) - System ready
And that's it, ready to go... If you get "process 2 - segmentation violation", check that cobra_acquisition hasn't died. |
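If you want to check the startup without reading the whole scroll, something like the following might help. It is only a sketch: it assumes you have captured the mpimon output to a file yourself (for instance with tee), which the notes above do not do, and the function names count_ready_nodes and master_ready are made up.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the COBRA software). Both take the
# name of a file holding captured startup output.

# Print how many "Node N ready" lines have appeared so far.
count_ready_nodes() {
    grep -c '^Node [0-9][0-9]* ready' "$1"
}

# Exit 0 once the final "Master (...) - System ready" line is present.
# The earlier "Server (...) - System ready" line does not match.
master_ready() {
    grep -q '^Master (.*) - System ready' "$1"
}
```

For example, with the output saved in a file called startup.log (an assumed name): master_ready startup.log && echo "ready to go"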
Integrating - manually | Create a suitable pulsar.eph file and setup.in in subbx, then:
    touch setup
To stop an integration:
    touch abort
To shut down everything (including cobra_acquisition):
    touch shut |
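The three touch commands above could be wrapped up as follows. This is a sketch under assumptions: cobra_signal is a made-up name, and the check that pulsar.eph and setup.in exist before touching setup is my own addition rather than something the software enforces.

```shell
#!/bin/sh
# Hypothetical helper (not part of the COBRA software).
# Usage: cobra_signal setup|abort|shut [DIR]
# Creates the named control file in DIR (default: current directory).
cobra_signal() {
    action=$1
    dir=${2:-.}
    case $action in
        setup)
            # Starting an integration needs both input files in place.
            if [ ! -f "$dir/pulsar.eph" ] || [ ! -f "$dir/setup.in" ]; then
                echo "need pulsar.eph and setup.in in $dir" >&2
                return 1
            fi
            ;;
        abort|shut)
            ;;
        *)
            echo "usage: cobra_signal setup|abort|shut [dir]" >&2
            return 2
            ;;
    esac
    touch "$dir/$action"
}
```

For example: cobra_signal setup /raid1/online/obs/subb1 to start, then cobra_signal abort /raid1/online/obs/subb1 to stop the integration.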
Integrating - arthur driven | In yet another window, go to the directory /raid1/online/scripts/
and check the file
cobracq_config_lovell.dat. Currently the only part of this in use is the frequency of the sampler! Run the script /raid1/online/scripts/total_lovell.pl, which checks what the filter bank hardware is doing, creates the appropriate pulsar.eph and setup.in files, then starts and stops integrations in time with the filter bank observations. |
Progress messages | When a new setup file is created in the working directory, the following
sequence should take place:
    s 10 10 2002 J1635+2418 pulsar.eph 52557.000000 480 12 h 1352.500000
    Using parameter file pulsar.eph
    254
    Arg 28 225 254
and sends the data to cobra_master.
    header 1342.500000 5.000000 27.215000 0.00000020 27.215000 1342.500000 5.000000 0.000000 0.000467 2333
    Server sent setup command to data acq
    Server allocated send buffers
    Opening /scratch/COBRA/polyco.dat003
    Opening /scratch/COBRA/polyco.dat006
    Opening /scratch/COBRA/polyco.dat005
    Opening /scratch/COBRA/polyco.dat004
    Opening /scratch/COBRA/polyco.dat007
    Opening /scratch/COBRA/polyco.dat008
    Opening /scratch/COBRA/polyco.dat009
    Opening /scratch/COBRA/polyco.dat011
    Opening /scratch/COBRA/polyco.dat010
    Opening /scratch/COBRA/polyco.dat012
    Opening /scratch/COBRA/polyco.dat013
    1144
    Opening /scratch/COBRA/polyco.dat015
    Opening /scratch/COBRA/polyco.dat014
    Opening /scratch/COBRA/polyco.dat016
    Server allocated ring buffers
    Server print_epnhead
    Server waiting for ready from acq
    Opening /scratch/COBRA/polyco.dat017
    Opening /scratch/COBRA/polyco.dat018
    Opening /scratch/COBRA/polyco.dat020
    Opening /scratch/COBRA/polyco.dat019
    Opening /scratch/COBRA/polyco.dat021
    Opening /scratch/COBRA/polyco.dat022
    Server received ready from acq
    mstat now : 1
    Master (node1-8) - System ready
    Server sent begin command to data acq
    Computing profiles 3
    Computing profiles 4
    Computing profiles 5
    Computing profiles 6
    Computing profiles 7
    Computing profiles 8
    Computing profiles 3
    Computing profiles 4
    Computing profiles 9
etc., until a stop is received (an abort file created in the working directory). Integrating will continue indefinitely, with an epn file written after each x secs (specified in setup.in). The only indication that an epn file has been created is a new profile displayed by cobra_display, or a directory listing showing new files.
When the abort file is found, the following message is printed:
    new cmd e
Cobra master sends a stop round to all the other processes, which finish processing their current buffer and finally return to System ready.
    Computing profiles 10
    Computing profiles 11
    Server sent end command to data acq
    Server received ready from acq
    Computing profiles 4
    Computing profiles 13
    Computing profiles 3
    Computing profiles 6
    Computing profiles 12
    Computing profiles 5
    Computing profiles 7
    Computing profiles 8
    Master (node1-8) - System ready |
cobra_display | In the current directory, run cobra_display which checks for new integrations and displays them. |
To stop everything | In the current directory touch shut and Ctrl-C the total_lovell
script if you're running it.
There are long sleeps in some of the process shutdowns, and it may take about 60 secs to finally exit the mpi stuff. |