Dual 64
This is the user's guide and log book of the dual_64 machine. Log entries are listed in reverse chronological order, from the most recent to the oldest.
User's guide
THE DUAL_64 USAGE IS AT BETA STAGE. Important parameters such as IP and hostname are likely to change!
Briefing
The proposed methodology for using the fedora machine for personal use is:
- Create or copy a model with Mohid GUI at one's desktop.
- Transfer the model with winscp to the fedora machine in the Aplica folder. Soon there'll be a batch file created for that.
- Open a putty session to host 192.168.23.151, login fedora, pass fedora.
- Purge the .dat files so they are path/linux/windows independent.
- Copy an executable version to the exe path. For example:
- Launch ./MohidWater, sit back and enjoy the ride.
>./MohidWater
Here are some specific guidelines:
- MOHID applications are to be placed in /home/fedora/Aplica. Be careful not to overwrite any existing application. In the future, people will have their own login and will use the /home/user/Aplica folder.
- The path where the mohid bins will lie is /usr/bin/mohid. What lies in there are MOHID related libraries and executables.
- The path where the Mohid source lies is /home/fedora/Projects/mohid_v4. You can get the latest version from SourceOffSite with make nix.sos and then build a MohidWater binary with make nix.
- Here's the list of already existing binaries:
[fedora@dual_64 ~]$ ls -l /usr/bin/mohid/
-rw-r--r-- 1 root root 2452422 Jul 26 10:40 x64_doubleMohid_Base_1.lib
-rw-r--r-- 1 root root 2071128 Jul 26 10:40 x64_doubleMohid_Base_2.lib
-rwxr-xr-x 1 root root 7786218 Jul 26 10:40 x64_doubleMohidWater
-rw-r--r-- 1 root root 2436738 Jul 26 10:19 x64_singleMohid_Base_1.lib
-rw-r--r-- 1 root root 2065860 Jul 26 10:19 x64_singleMohid_Base_2.lib
-rwxr-xr-x 1 root root 7749376 Jul 26 10:19 x64_singleMohidWater
The naming convention is to add a prefix indicating the architecture (x64 or x32), the floating point precision (single or double), whether symbolic information for debugging is included (g) and whether symbolic information for profiling is included (g_p). These binaries are for generic use, but the administrator may freely update them with the latest version of MOHID. For consistency, it is strongly recommended to choose one of them and copy it to the exe directory of the application.
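As a minimal sketch of that recommendation, the helpers below build the path of a shared binary from its architecture and precision, then copy it into a simulation's exe directory. The function names and the MOHID_BIN_DIR variable are hypothetical (they are not part of the dual_64 setup); the file-name pattern is the one visible in the listing above.

```shell
# Hypothetical helpers; names are illustrative, not part of the dual_64 setup.
MOHID_BIN_DIR=${MOHID_BIN_DIR:-/usr/bin/mohid}

# Print the path of the shared MohidWater binary for an architecture
# (x64 or x32) and a precision (single or double).
mohid_binary_path() {
    printf '%s/%s_%sMohidWater\n' "$MOHID_BIN_DIR" "$1" "$2"
}

# Copy the chosen binary into an exe directory under the name MohidWater.
install_mohid_binary() {
    # $1: architecture, $2: precision, $3: exe directory of the simulation
    cp "$(mohid_binary_path "$1" "$2")" "$3/MohidWater" &&
        chmod u+x "$3/MohidWater"
}
```

For instance, `install_mohid_binary x64 double ThisSimulation/exe` would copy /usr/bin/mohid/x64_doubleMohidWater, so every run of that simulation keeps using the same executable even if the shared copy is updated later.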
Begin a new session
- Open a putty terminal and connect to Dual_64 or 192.168.20.160.
- username: fedora
- password: fedora
Purge the model .dat files
To purge the model files, a bash script called mohidpurge was written and made accessible from PATH.
To use it correctly, go to the model's directory (in linux), then call mohidpurge followed by the path the model had on the windows system. For example, for MyModel copied to linux from E:\Aplica\MyModel:
>cd MyModel
>mohidpurge E:\/Aplica\/MyModel
The .dat files written by Mohid GUI don't work on other platforms unless some changes are made, so the mohidpurge script does the following:
- all paths are changed into relative paths (beginning with '../..').
- all backslashes are changed to slashes ('\' becomes '/').
- every file named Nomfich is renamed nomfich, as linux is case-sensitive.
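The three steps above can be sketched in a few lines of shell. This is not the actual mohidpurge script (its source is not shown on this page); the function names are hypothetical, and the sketch assumes that the Windows model root is what gets replaced by '../..'.

```shell
# Hypothetical re-creation of the documented steps; not the real mohidpurge.

# Rewrite text read from stdin: backslashes become slashes and the
# Windows model root ($1, e.g. 'E:\Aplica\MyModel') becomes '../..'.
purge_dat_text() {
    win_root=$(printf '%s' "$1" | tr '\\' '/')
    # Escape regex metacharacters so sed matches the root literally.
    root_re=$(printf '%s' "$win_root" | sed 's/[][\\.*^$]/\\&/g')
    tr '\\' '/' | sed "s|$root_re|../..|g"
}

# Rename every file whose name starts with 'Nomfich' to start with 'nomfich'.
lowercase_nomfich() {
    # $1: directory to scan recursively
    find "$1" -type f -name 'Nomfich*' | while IFS= read -r f; do
        dir=$(dirname "$f")
        base=$(basename "$f")
        mv "$f" "$dir/n${base#N}"
    done
}
```

A single file could then be converted with `purge_dat_text 'E:\Aplica\MyModel' < Model.dat > Model.dat.new`, where Model.dat is a placeholder name.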
Run MohidWater
From your model root path (/home/fedora/Aplica/YourModel) in the putty session type:
>mohidinstall ThisSimulation
>cd ThisSimulation/exe
>./MohidWater
Run process in background
If you need to run the model as a background process (so that you can close your current shell without killing the MohidWater process), type instead:
>exec ./MohidWater &> MohidWater.log &
Remember the ampersand (the symbol at the end)! In this case, the MohidWater console output is redirected to MohidWater.log. If you want to rerun the model, you can skip the copy part (first line). For consistency, it is advised that you keep the same executable for all the runs in your application.
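An alternative sketch of the same idea uses the standard nohup utility, which makes the process ignore the hangup signal sent when the terminal closes. The function name run_in_background is hypothetical, and this variant is an assumption about what would suit the machine, not the method used on dual_64.

```shell
# Hypothetical helper: run a command detached from the terminal,
# sending both stdout and stderr to a log file.
run_in_background() {
    # $1: log file; remaining arguments: the command to run
    log=$1
    shift
    nohup "$@" < /dev/null > "$log" 2>&1 &
    # Print the process id so the run can be monitored or killed later.
    echo "$!"
}
```

From the exe directory, `run_in_background MohidWater.log ./MohidWater` would start the model, and `tail -f MohidWater.log` would follow its console output.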
Log
20090721
- installed & configured mailx and msmtp
- Created and scheduled a script that builds mohid with the makefile, then sends an email if an error or a warning is raised.
0 23 * * * $HOME/Projects/scripts/mohidnightlybuild.sh >/dev/null 2>&1
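The contents of mohidnightlybuild.sh are not recorded in this log. A minimal sketch of what such a script could look like is given below; the function names, the recipient argument and the "error or warning" text match are all assumptions, while `make nix` is the build command mentioned in the user's guide.

```shell
# Hypothetical sketch; the real mohidnightlybuild.sh is not shown in this log.

# Succeed if the build log mentions an error or a warning (case-insensitive).
build_has_problems() {
    # $1: build log file
    grep -Eiq 'error|warning' "$1"
}

# Build MOHID and mail the log if the build failed or looks suspicious.
nightly_build() {
    # $1: source directory, $2: recipient address (both assumptions)
    log=$(mktemp) || return 1
    (cd "$1" && make nix) > "$log" 2>&1
    status=$?
    if [ "$status" -ne 0 ] || build_has_problems "$log"; then
        mailx -s "MOHID nightly build: problems found" "$2" < "$log"
    fi
    rm -f "$log"
}
```

Matching on the words "error" and "warning" is deliberately crude: it may flag harmless lines, which is usually preferable to missing a broken build.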
20090714
Installing svn
- downloaded the svn source from subversion.tigris.org
- downloaded svn dependency apr and apr-util
- built apr
> ./configure
> make
> sudo make install
- built apr-util
> ./configure --with-apr=../apr-1.3.6
> make
> sudo make install
- downloaded sqlite
> mkdir subversion-1.3.6/directory
> cp sqlite3.c subversion-1.3.6/directory/sqlite3.c
- built svn
> ./configure --with-apr=../apr-1.3.6 --with-apr-util=../apr-util-1.3.8
> make
> sudo make install
20080129
- Installed ncdump and ncgen in the PATH.
20070725
Here are the 3 steps to create a fresh directory of the preop-model:
- Create the directory tree,
- Copy the GeneralData files,
- Copy the configuration files of each submodel;
> cd /home/Aplica/PreOp-Model
> for i in `find . -type d`; do mkdir /home/Aplica/PreOp-Model-V2/$i; done;
> cd /home/Aplica/PreOp-Model/GeneralData
> for i in `find . -type f`; do cp $i /home/Aplica/PreOp-Model-V2/GeneralData/$i; done;
> cd /home/Aplica/PreOp-Model
> for i in `find . | grep '_root\.dat'`; do cp $i /home/Aplica/PreOp-Model-V2/$i; done;
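The same three steps can be wrapped in one reusable function. The sketch below is a hedged variant, not the commands that were actually run: the function name is hypothetical, it uses `find -exec` so that file names containing spaces survive, and it selects configuration files with `-name '*_root.dat'`, which is slightly stricter than the grep pattern above. It assumes the new directory lies outside the old one.

```shell
# Hypothetical helper: create a fresh copy of a model's skeleton.
# Runs in a subshell, so the caller's working directory is unchanged.
clone_model_skeleton() (
    # $1: existing model root, $2: new model root (outside $1)
    mkdir -p "$2" || exit 1
    dst=$(cd "$2" && pwd) || exit 1
    cd "$1" || exit 1
    # 1. Create the directory tree.
    find . -type d -exec sh -c 'mkdir -p "$1/$2"' sh "$dst" {} \;
    # 2. Copy the GeneralData files.
    find ./GeneralData -type f -exec sh -c 'cp "$2" "$1/$2"' sh "$dst" {} \;
    # 3. Copy the *_root.dat configuration file of each submodel.
    find . -type f -name '*_root.dat' -exec sh -c 'cp "$2" "$1/$2"' sh "$dst" {} \;
)
```

With the paths from this entry, the call would be `clone_model_skeleton /home/Aplica/PreOp-Model /home/Aplica/PreOp-Model-V2`.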
I still need to change the bathymetries and the hdf5 input files accordingly. As for the mslp vs. atmospheric pressure question, I still need to test MohidWater with mslp on my own machine. I will only get fresh results by Friday.
20070716
Bad blocks
Here's what I did when trying to diagnose and recover from a hard-drive failure:
> su
> df
Filesystem     1K-blocks      Used Available Use% Mounted on
/dev/sda3      241023808  95102800 145921008  40% /
/dev/sda1          69972     13050     53309  20% /boot
/dev/sdb1      976529360 284465684 692063676  30% /home
tmpfs            4061936         0   4061936   0% /dev/shm
> debugreiserfs /dev/sda3
debugreiserfs 3.6.19 (2003 www.namesys.com)
Filesystem state: consistency is not checked after last mounting
Reiserfs super block in block 16 on 0x803 of format 3.6 with standard journal
Count of blocks on the device: 60257792
Number of bitmaps: 1839
Blocksize: 4096
Free blocks (count of blocks - used [journal, bitmaps, data, reserved] blocks): 36480251
Root block: 8323803
Filesystem is NOT clean
Tree height: 5
Hash function used to sort names: "r5"
Objectid map size 172, max 972
Journal parameters:
Device [0x0]
Magic [0x3435a9d1]
Size 8193 blocks (including 1 for journal header) (first block 18)
Max transaction length 1024 blocks
Max batch size 900 blocks
Max commit age 30
Blocks reserved by journal: 0
Fs state field: 0x0:
sb_version: 2
inode generation number: 333260
UUID: a6b186c9-fc9a-44ac-a82a-0545dd759994
LABEL:
Set flags in SB:
ATTRIBUTES CLEAN
> debugreiserfs /dev/sda1
reiserfs_open: the reiserfs superblock cannot be found on /dev/sda1.
debugreiserfs: can not open reiserfs on "/dev/sda1": no filesystem found
> debugreiserfs /dev/sdb1
debugreiserfs 3.6.19 (2003 www.namesys.com)
The problem has occurred looks like a hardware problem. If you have bad blocks, we advise you to get a new hard drive, because once you get one bad block that the disk drive internals cannot hide from your sight,the chances of getting more are generally said to become much higher (precise statistics are unknown to us), and this disk drive is probably not expensive enough for you to you to risk your time and data on it. If you don't want to follow that follow that advice then if you have just a few bad blocks, try writing to the bad blocks and see if the drive remaps the bad blocks (that means it takes a block it has in reserve and allocates it for use for of that block number). If it cannot remap the block, use badblock option (-B) with reiserfs utils to handle this block correctly.
bread: Cannot read the block (2): (Input/output error).
Aborted
20061124
- Added perl module Date::Calc with
> perl -MCPAN -e shell
cpan> install Date::Calc
- Implemented the automated download process in Aplica/PreOpModel.
20061021
NX (NoMachine) was installed; it makes it easy to connect remotely to the workstation. rdesktop was also installed; it connects remotely to windows systems that have remote desktop enabled:
> rdesktop -u Administrator -d MARETEC Einstein
To access fedora@dual_64, type:
$ vncviewer dual_64.maretec.ist.utl.pt:0
This log explains how to re-install GNOME using yum. Apparently, the X11 libraries were corrupted, so the graphical environment wouldn't start. Bummer! Here's the answer. By the way, if something goes wrong while downloading with yum, try purging the yum cache and updating yum first:
> sudo yum clean all
> sudo yum update
> sudo yum -y groupremove "GNOME Desktop Environment" (this line wasn't actually performed)
> sudo yum -y groupremove "X Window System"
> sudo yum -y groupinstall "X Window System"
> sudo yum -y groupinstall "GNOME Desktop Environment"
> sudo yum install samba
> system
Note: the samba daemon needs to be reinstalled too. Also, a firewall was set up by gnome. To disable it, see the Linux network article.
If everything went well, the gnome X environment should start with the following command:
> startx
20061021
This log explains how to recover the RAID1 logical drive:
RAID controller BIOS configuration
At boot, press [Ctrl-M] to enter the RAID controller BIOS setup:
-->Clear configuration
-->New configuration
-->Initialize Logical drive (will lose all info on drive!)
-->Save and reboot
Be careful now, don't initialize the system disk unless you want to reinstall the OS!
Creating and formatting a partition in /dev/sdb
To see available drives and devices:
> fdisk -l
To see drives with correct filesystems:
> df -T
Creating a DOS partition table on the Dual_64 RAID 1 logical drive, with one primary partition:
> fdisk /dev/sdb
#m> n
#m> p
#m> 1
#m> t 1
#m> 83
#m> w
Creating an ext3 filesystem on the partition spanning the whole RAID 1 drive:
> mkfs -t ext3 /dev/sdb1
> e2label /dev/sdb1 /mnt/RAID1
> vim /etc/fstab
#vim> LABEL=/mnt/RAID1 /mnt/RAID1 ext3 defaults 1 2
Moving /home to the RAID1 partition
Moving the /home directory to /dev/sdb1:
> init 1
> cd /home
> cp -ax * /mnt/RAID1
> mv /home /home.old
> mkdir /home
> mount /mnt/RAID1 /home
> vim /etc/fstab
#vim> LABEL=/mnt/RAID1 /home ext3 defaults 1 2
20060828
If one needs to mount the dvd-drive then type
>mount /dev/dvd /mnt/dvd
20060801
- MPICH successfully installed.
- PedroG started running Alqueva model using some experimental openmp implementations in ThomasZ algorithm.
- Troubleshooting the MPICH installation in linux. Edited the /etc/hosts file:
vim /etc/hosts
127.0.0.1 localhost.localdomain.com localhost
#added this line
192.168.20.160 dual_64.maretec.ist.utl.pt dual_64
Now mpdcheck -s / mpdcheck -c works!
20060728
The Mohid standard benchmark results:
All processors using OpenMP:
-------------------------- MOHID -------------------------
Program Mohid Water succefully terminated
Total Elapsed Time     : 86.0820
Total CPU time         : 322.9962
CPU usage (%)          : 375.2192
Workcycle Elapsed Time : 78.3990
Workcycle CPU time     : 313.3636
Workcycle CPU usage (%): 399.7036
----------------------------------------------------------
Single processor:
-------------------------- MOHID -------------------------
Program Mohid Water succefully terminated
Total Elapsed Time     : 91.3430
Total CPU time         : 91.3417
CPU usage (%)          : 99.9986
Workcycle Elapsed Time : 83.2600
Workcycle CPU time     : 83.2612
Workcycle CPU usage (%): 100.0014
----------------------------------------------------------
Characteristics of a single core:
>cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 33
model name : Dual Core AMD Opteron(tm) Processor 270
stepping : 2
cpu MHz : 1991.609
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow pni lahf_lm cmp_legacy
bogomips : 3987.91
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp
The results on Luis's PC (Pentium 4, 3.2 GHz, 2 GB RAM):
Total Elapsed Time     : 83.4550
Total CPU time         : 82.8906
CPU usage              : 99.3237
Workcycle Elapsed Time : 76.5790
Workcycle CPU time     : 76.0781
Workcycle CPU usage    : 99.3459
The first conclusions aren't too encouraging, although the build has not been optimized yet.
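To put numbers on that conclusion, the ratios between the single-processor and the OpenMP runs can be computed directly from the timings reported in this entry. The small helper below is only an illustrative sketch (the name ratio is hypothetical); the input values are copied from the benchmark outputs.

```shell
# Hypothetical helper: print the ratio of two timings with two decimals.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# Speedup in total elapsed time: single processor vs. all processors.
ratio 91.3430 86.0820     # prints 1.06
# Speedup in workcycle elapsed time.
ratio 83.2600 78.3990     # prints 1.06
# Total CPU time spent: all processors vs. single processor.
ratio 322.9962 91.3417    # prints 3.54
```

In other words, the OpenMP run finished only about 6% faster while consuming roughly 3.5 times as much CPU time.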
20060726
Testing x32/x64 architecture X single/double precision
Throughout these tests the same MohidWater baroclinic 3D model was run.
x64_single_openmp:
-------------------------- MOHID -------------------------
Program Mohid Water succefully terminated
Total Elapsed Time     : 1465.4191
Total CPU time         : 5857.4219
CPU usage (%)          : 399.7097
Workcycle Elapsed Time : 1463.8910
Workcycle CPU time     : 5855.1538
Workcycle CPU usage (%): 399.9720
----------------------------------------------------------
Testing the CPU workload distribution by processes
- Running 5 simultaneous MohidWater models by different users. Note how the workload is distributed among the CPUs:
>ps au
USER     PID   %CPU %MEM   VSZ   RSS TTY   STAT START  TIME COMMAND
fedora   1160  65.2  0.6 37968 24564 pts/3 R+   09:57  0:54 ./MohidWater
fedora   1162  99.1  0.6 38500 25004 pts/5 R+   09:57  1:04 ./MohidWater
fedora   1163  98.2  0.6 38508 25008 pts/1 R+   09:57  0:59 ./MohidWater
fedora   1164  99.5  0.6 38508 25012 pts/2 R+   09:57  0:57 ./MohidWater
fedora   1165  51.2  0.6 37960 24560 pts/4 R+   09:58  0:28 ./MohidWater
fedora   1170   0.1  0.0 59864  1664 pts/6 Ss   09:58  0:00 -bash
- Running 4 simultaneous MohidWater models by different users. Note how the workload is well balanced between the CPUs (each at 100% usage):
>ps au
USER     PID   %CPU %MEM   VSZ   RSS TTY   STAT START  TIME COMMAND
fedora   29734 99.9  0.7 46488 30408 pts/2 R+   20:49 18:23 ./MohidWater
fedora   29735 99.9  0.7 46068 30144 pts/3 R+   20:49 18:22 ./MohidWater
fedora   29736 99.8  0.7 46060 30136 pts/1 R+   20:49 18:19 ./MohidWater
fedora   29737 99.8  0.7 46068 30144 pts/4 R+   20:49 18:18 ./MohidWater
fedora   29742  0.0  0.0 59868  1688 pts/5 Ss   20:49  0:00 -bash
Conclusion: the linux OS natively distributes the different processes well among the CPUs. 4 single-threaded models can each run using 100% of one processor. With 5 single-threaded models, one CPU has to split its workload roughly 50%-50% between two of them. This is interesting, as 4 modelers who don't use the MPI option may share the same machine without loss of CPU power.
20060721
- Designed a battery of tests with profiling: A) run the standard benchmark, B) run a 3D application for:
- x64_single precision
- x64_double precision
- x32_single precision
- x32_double precision
- The makefile methodology was implemented and the MOHID files are easily retrievable from the SourceOffSite. MOHID now compiles in the dual_64 machine.
20060717
- The intel fortran compiler and the hdf5 and netcdf libraries were successfully installed. Users are fedora:fedora and root:... . The useradd command doesn't work? The machine's current IP is 192.168.20.160 and it is reachable by ssh, using the IP, within the intranet. Name resolution is still a problem. Samba is yet to be properly configured.
- As the intel fortran is distributed in rpms, the Fedora Core seemed appropriate and straightforward for installation. A boot DVD was made and installed. Thus a stripped version of Fedora Core 5 x86_64 is the current dual_64 OS. During installation, a custom installation was chosen, and all the compatibility and legacy packages available were installed (glibc, libstdc++, and libgcc among others).
- During the installation of glibc (legacy lib) things went wrong. Lost all control over the machine. Solution: reinstallation of the entire OS.
- The intel fortran compiler wouldn't install, as some required legacy libraries were missing.
- As of this date, the machine had been delivered the previous week with the Gentoo Vanilla distribution.