@zigen 's note

mpi


MPI (Message Passing Interface)

Put simply, a standard for doing parallel computation across PCs (strictly speaking, a standard for distributed-memory parallel processing).

The MPI standard is driven mainly by the MPI Forum; MPI-2 research and development seems to be in progress? (2006/12/20)
http://www.mpi-forum.org/

There is a variety of software built on MPI (just as Windows is not the only OS — or rather, like the many Linux variants that all share the same kernel?), forming something like the family tree below.
For now I intend to use MPICH.


For a start I ran it on two machines (each a Core2 Duo, so 4 CPUs in total). The image below is a capture of both machines' system monitors at the same moment (in single-user mode you can check with the top command).


[cfd@utmcc010 hime]$ ./paramset.sh M 1 2 2
[cfd@utmcc010 hime]$ mpicc -o LAMM122 himenobmtxps.c
[cfd@utmcc010 hime]$ mpicc -O -o LAMM122O himenobmtxps.c
[cfd@utmcc010 hime]$ mpicc -O1 -o LAMM122O1 himenobmtxps.c
[cfd@utmcc010 hime]$ mpicc -O2 -o LAMM122O2 himenobmtxps.c
[cfd@utmcc010 hime]$ mpicc -O3 -o LAMM122O3 himenobmtxps.c
[cfd@utmcc010 hime]$ ls
LAMM112 LAMM122 M111 M112 M122 himeno-np2.txt makefile.sample
LAMM112O LAMM122O M111O M112O1 M122O1 himeno-np2.txt~ param.h
LAMM112O1 LAMM122O1 M111O1 M112O2 M122O2 himeno2.o paramset.sh
LAMM112O2 LAMM122O2 M111O2 M112O3 M122O3 himenobmtxps.c
LAMM112O3 LAMM122O3 M111O3 M112O3] cc_himenoBMTxp_mpi.lzh himenobmtxps.o
[cfd@utmcc010 hime]$ mpirun -np 4 LAMM122
Sequential version array size
mimax = 129 mjmax = 129 mkmax = 257
Parallel version array size
mimax = 129 mjmax = 67 mkmax = 131
imax = 128 jmax = 65 kmax =129
I-decomp = 1 J-decomp = 2 K-decomp =2
Start rehearsal measurement process.
Measure the performance in 3 times.

MFLOPS: 1051.135143 time(s): 0.391306 1.702009e-03

Now, start the actual measurement process.
The loop will be excuted in 459 times
This will take about one minute.
Wait for a while

cpu : 60.335524 sec.
Loop executed for 459 times
Gosa : 1.019457e-03
MFLOPS measured : 1043.021642
Score based on Pentium III 600MHz : 12.590797
[cfd@utmcc010 hime]$ mpirun -np 4 LAMM122O
Sequential version array size
mimax = 129 mjmax = 129 mkmax = 257
Parallel version array size
mimax = 129 mjmax = 67 mkmax = 131
imax = 128 jmax = 65 kmax =129
I-decomp = 1 J-decomp = 2 K-decomp =2
Start rehearsal measurement process.
Measure the performance in 3 times.

MFLOPS: 1856.711289 time(s): 0.221529 1.702009e-03

Now, start the actual measurement process.
The loop will be excuted in 812 times
This will take about one minute.
Wait for a while

cpu : 59.410046 sec.
Loop executed for 812 times
Gosa : 8.322609e-04
MFLOPS measured : 1873.914897
Score based on Pentium III 600MHz : 22.620894
[cfd@utmcc010 hime]$ mpirun -np 4 LAMM122O1
Sequential version array size
mimax = 129 mjmax = 129 mkmax = 257
Parallel version array size
mimax = 129 mjmax = 67 mkmax = 131
imax = 128 jmax = 65 kmax =129
I-decomp = 1 J-decomp = 2 K-decomp =2
Start rehearsal measurement process.
Measure the performance in 3 times.

MFLOPS: 1833.250123 time(s): 0.224364 1.702009e-03

Now, start the actual measurement process.
The loop will be excuted in 802 times
This will take about one minute.
Wait for a while

cpu : 58.883746 sec.
Loop executed for 802 times
Gosa : 8.363915e-04
MFLOPS measured : 1867.379824
Score based on Pentium III 600MHz : 22.542007
[cfd@utmcc010 hime]$ mpirun -np 4 LAMM122O2
Sequential version array size
mimax = 129 mjmax = 129 mkmax = 257
Parallel version array size
mimax = 129 mjmax = 67 mkmax = 131
imax = 128 jmax = 65 kmax =129
I-decomp = 1 J-decomp = 2 K-decomp =2
Start rehearsal measurement process.
Measure the performance in 3 times.

MFLOPS: 1848.849559 time(s): 0.222471 1.702009e-03

Now, start the actual measurement process.
The loop will be excuted in 809 times
This will take about one minute.
Wait for a while

cpu : 59.082790 sec.
Loop executed for 809 times
Gosa : 8.335127e-04
MFLOPS measured : 1877.332719
Score based on Pentium III 600MHz : 22.662153
[cfd@utmcc010 hime]$ mpirun -np 4 LAMM122O3
Sequential version array size
mimax = 129 mjmax = 129 mkmax = 257
Parallel version array size
mimax = 129 mjmax = 67 mkmax = 131
imax = 128 jmax = 65 kmax =129
I-decomp = 1 J-decomp = 2 K-decomp =2
Start rehearsal measurement process.
Measure the performance in 3 times.

MFLOPS: 1848.617767 time(s): 0.222499 1.702009e-03

Now, start the actual measurement process.
The loop will be excuted in 808 times
This will take about one minute.
Wait for a while

cpu : 59.195166 sec.
Loop executed for 808 times
Gosa : 8.339165e-04
MFLOPS measured : 1871.452640
Score based on Pentium III 600MHz : 22.591171
[cfd@utmcc010 hime]$ date
2007年 1月 4日 木曜日 21:50:17 JST
[cfd@utmcc010 hime]$
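The decomposition sizes printed above (mjmax 129 → 67 and mkmax 257 → 131 under J-decomp = 2, K-decomp = 2, while mimax stays 129 under I-decomp = 1) look consistent with a simple rule: an undivided axis keeps its global extent, while a divided axis gets global/ndiv plus 3 overlap (halo) cells. This is inferred from the output only, not checked against the himenobmtxps.c source, so treat it as an assumption:

```shell
# Guess at the per-rank array-size rule (inferred from the benchmark output;
# not verified against himenobmtxps.c)
local_size() {
  # $1 = global size, $2 = number of divisions along that axis
  if [ "$2" -eq 1 ]; then
    echo "$1"                 # undivided axis keeps the global extent
  else
    echo $(( $1 / $2 + 3 ))   # divided axis: its share plus 3 overlap cells
  fi
}

local_size 129 1   # mimax, I-decomp = 1 -> 129
local_size 129 2   # mjmax, J-decomp = 2 -> 67
local_size 257 2   # mkmax, K-decomp = 2 -> 131
```

All three values match the "Parallel version array size" lines in the logs above.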





[cfd@utmcc010 hime]$ mpicc -o LAMM112 himenobmtxps.c
[cfd@utmcc010 hime]$ mpicc -O -o LAMM112O himenobmtxps.c
[cfd@utmcc010 hime]$ mpicc -O1 -o LAMM112O1 himenobmtxps.c
[cfd@utmcc010 hime]$ mpicc -O2 -o LAMM112O2 himenobmtxps.c
[cfd@utmcc010 hime]$ mpicc -O3 -o LAMM112O3 himenobmtxps.c
[cfd@utmcc010 hime]$ ls
LAMM112 LAMM112O3 M111O2 M112O2 M122O1 himeno-np2.txt himenobmtxps.o
LAMM112O M111 M111O3 M112O3 M122O2 himeno-np2.txt~ makefile.sample
LAMM112O1 M111O M112 M112O3] M122O3 himeno2.o param.h
LAMM112O2 M111O1 M112O1 M122 cc_himenoBMTxp_mpi.lzh himenobmtxps.c paramset.sh
[cfd@utmcc010 hime]$ date
2007年 1月 4日 木曜日 21:28:17 JST
[cfd@utmcc010 hime]$ mpirun -np 2 LAMM112
Sequential version array size
mimax = 129 mjmax = 129 mkmax = 257
Parallel version array size
mimax = 129 mjmax = 129 mkmax = 131
imax = 128 jmax = 128 kmax =129
I-decomp = 1 J-decomp = 1 K-decomp =2
Start rehearsal measurement process.
Measure the performance in 3 times.

MFLOPS: 562.444063 time(s): 0.731300 1.667103e-03

Now, start the actual measurement process.
The loop will be excuted in 246 times
This will take about one minute.
Wait for a while

cpu : 60.890822 sec.
Loop executed for 246 times
Gosa : 1.195825e-03
MFLOPS measured : 553.907178
Score based on Pentium III 600MHz : 6.686470
[cfd@utmcc010 hime]$
[cfd@utmcc010 hime]$ mpirun -np 2 LAMM112O
Sequential version array size
mimax = 129 mjmax = 129 mkmax = 257
Parallel version array size
mimax = 129 mjmax = 129 mkmax = 131
imax = 128 jmax = 128 kmax =129
I-decomp = 1 J-decomp = 1 K-decomp =2
Start rehearsal measurement process.
Measure the performance in 3 times.

MFLOPS: 930.466107 time(s): 0.442053 1.667103e-03

Now, start the actual measurement process.
The loop will be excuted in 407 times
This will take about one minute.
Wait for a while

cpu : 59.369238 sec.
Loop executed for 407 times
Gosa : 1.059472e-03
MFLOPS measured : 939.910841
Score based on Pentium III 600MHz : 11.346099
[cfd@utmcc010 hime]$ mpirun -np 2 LAMM112O1
Sequential version array size
mimax = 129 mjmax = 129 mkmax = 257
Parallel version array size
mimax = 129 mjmax = 129 mkmax = 131
imax = 128 jmax = 128 kmax =129
I-decomp = 1 J-decomp = 1 K-decomp =2
Start rehearsal measurement process.
Measure the performance in 3 times.

MFLOPS: 923.843610 time(s): 0.445222 1.667103e-03

Now, start the actual measurement process.
The loop will be excuted in 404 times
This will take about one minute.
Wait for a while

cpu : 58.842383 sec.
Loop executed for 404 times
Gosa : 1.061599e-03
MFLOPS measured : 941.336367
Score based on Pentium III 600MHz : 11.363307
[cfd@utmcc010 hime]$ mpirun -np 2 LAMM112O2
Sequential version array size
mimax = 129 mjmax = 129 mkmax = 257
Parallel version array size
mimax = 129 mjmax = 129 mkmax = 131
imax = 128 jmax = 128 kmax =129
I-decomp = 1 J-decomp = 1 K-decomp =2
Start rehearsal measurement process.
Measure the performance in 3 times.

MFLOPS: 925.789487 time(s): 0.444286 1.667103e-03

Now, start the actual measurement process.
The loop will be excuted in 405 times
This will take about one minute.
Wait for a while

cpu : 58.791508 sec.
Loop executed for 405 times
Gosa : 1.060972e-03
MFLOPS measured : 944.483005
Score based on Pentium III 600MHz : 11.401292
[cfd@utmcc010 hime]$ mpirun -np 2 LAMM112O3
Sequential version array size
mimax = 129 mjmax = 129 mkmax = 257
Parallel version array size
mimax = 129 mjmax = 129 mkmax = 131
imax = 128 jmax = 128 kmax =129
I-decomp = 1 J-decomp = 1 K-decomp =2
Start rehearsal measurement process.
Measure the performance in 3 times.

MFLOPS: 919.508017 time(s): 0.447321 1.667103e-03

Now, start the actual measurement process.
The loop will be excuted in 402 times
This will take about one minute.
Wait for a while

cpu : 59.145764 sec.
Loop executed for 402 times
Gosa : 1.062964e-03
MFLOPS measured : 931.871716
Score based on Pentium III 600MHz : 11.249055
[cfd@utmcc010 hime]$ date
2007年 1月 4日 木曜日 21:39:32 JST
[cfd@utmcc010 hime]$





The Himeno benchmark ran! The cause of the earlier failure: the binary had been compiled with MPICH, so it wouldn't run under LAM/MPI's mpirun (how did I not notice that, me?).
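A quick way to avoid this kind of mix-up is to check which implementation's compiler wrapper and launcher are actually first on the PATH, and what the wrapper expands to. The exact flags are assumptions for the versions installed here: `-show` is MPICH's, `-showme` is LAM/MPI's:

```shell
# Which mpicc/mpirun come first on the PATH?
which mpicc mpirun

# What does the wrapper actually invoke?
mpicc -show      # MPICH: prints the underlying compile command
mpicc -showme    # LAM/MPI: equivalent option
```

If `mpicc` and `mpirun` resolve to different installations, binaries will compile under one MPI and fail to launch under the other, as above.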

[cfd@utmcc010 hime]$ ./paramset.sh M 1 2 2
[cfd@utmcc010 hime]$ mpicc -o himeno2.o himenobmtxps.c
[cfd@utmcc010 hime]$ ls
M111 M112 M122 himeno-np2.txt makefile.sample
M111O M112O1 M122O1 himeno-np2.txt~ param.h
M111O1 M112O2 M122O2 himeno2.o paramset.sh
M111O2 M112O3 M122O3 himenobmtxps.c
M111O3 M112O3] cc_himenoBMTxp_mpi.lzh himenobmtxps.o
[cfd@utmcc010 hime]$ mpirun -np 4 himeno2.o
Sequential version array size
mimax = 129 mjmax = 129 mkmax = 257
Parallel version array size
mimax = 129 mjmax = 67 mkmax = 131
imax = 128 jmax = 65 kmax =129
I-decomp = 1 J-decomp = 2 K-decomp =2
Start rehearsal measurement process.
Measure the performance in 3 times.

MFLOPS: 1073.739694 time(s): 0.383068 1.702009e-03

Now, start the actual measurement process.
The loop will be excuted in 469 times
This will take about one minute.
Wait for a while

cpu : 60.063891 sec.
Loop executed for 469 times
Gosa : 1.013058e-03
MFLOPS measured : 1070.565156
Score based on Pentium III 600MHz : 12.923288
[cfd@utmcc010 hime]$
[cfd@utmcc010 hime]$ date
2007年 1月 4日 木曜日 21:09:21 JST
[cfd@utmcc010 hime]$



I tried running the Himeno benchmark, but it keeps getting rejected somehow...

[cfd@utmcc010 hime]$ ls
M111 M111O3 M112O3 M122O2 himeno-np2.txt~ param.h
M111O M112 M112O3] M122O3 himenobmtxps.c paramset.sh
M111O1 M112O1 M122 cc_himenoBMTxp_mpi.lzh himenobmtxps.o
M111O2 M112O2 M122O1 himeno-np2.txt makefile.sample
[cfd@utmcc010 hime]$ mpirun -np 4 M122O2
Invalid number of PE
Please check partitioning pattern or number of PE

It seems that [at least] one of the processes that was started with
mpirun did not invoke MPI_INIT before quitting (it is possible that
more than one process did not invoke MPI_INIT -- mpirun was only
notified of the first one, which was on node n0).

mpirun can *only* be used with MPI programs (i.e., programs that
invoke MPI_INIT and MPI_FINALIZE). You can use the "lamexec" program
to run non-MPI programs over the lambooted nodes.

p0_31624: p4_error: interrupt SIGx: 15
p0_26127: p4_error: interrupt SIGx: 15
[cfd@utmcc010 hime]$ date
2007年 1月 4日 木曜日 20:19:10 JST
[cfd@utmcc010 hime]$






Looks like it's working, but being asked for utmcc011's password every single time is a pain (for rsh it was enough to edit the hosts file, but what about ssh?).
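The standard fix is public-key authentication, so that lamboot can reach utmcc011 without a password. A sketch, where the key type and file paths are assumptions for an OpenSSH setup of this era:

```shell
# On utmcc010: generate a key pair with an empty passphrase
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa

# Append the public key to utmcc011's authorized_keys
# (this password prompt should be the last one)
cat ~/.ssh/id_rsa.pub | ssh cfd@utmcc011 \
  'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys'

# Verify: this should now print the date without prompting
ssh utmcc011 date
```

Note the permissions: sshd typically refuses keys if `~/.ssh` or `authorized_keys` on the remote side are group- or world-writable.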

[cfd@utmcc010 advanced]$ lamboot -v lamhosts

LAM 7.0.6/MPI 2 C++/ROMIO - Indiana University

n-1<31257> ssi:boot:base:linear: booting n0 (utmcc010)
n-1<31257> ssi:boot:base:linear: booting n1 (utmcc011)
cfd@utmcc011's password:
cfd@utmcc011's password:
n-1<31257> ssi:boot:base:linear: finished
[cfd@utmcc010 advanced]$ date
2007年 1月 4日 木曜日 17:23:55 JST
[cfd@utmcc010 advanced]$


Did it work??
"Woo hoo is a common expression of joy, especially as arising from success or good fortune." (from Wikipedia)


[cfd@utmcc010 advanced]$ recon -v lamhosts
n-1<31237> ssi:boot:base:linear: booting n0 (utmcc010)
n-1<31237> ssi:boot:base:linear: booting n1 (utmcc011)
cfd@utmcc011's password:
cfd@utmcc011's password:
n-1<31237> ssi:boot:base:linear: finished

Woo hoo!

recon has completed successfully. This means that you will most likely
be able to boot LAM successfully with the "lamboot" command (but this
is not a guarantee). See the lamboot(1) manual page for more
information on the lamboot command.

If you have problems booting LAM (with lamboot) even though recon
worked successfully, enable the "-d" option to lamboot to examine each
step of lamboot and see what fails. Most situations where recon
succeeds and lamboot fails have to do with the hboot(1) command (that
lamboot invokes on each host in the hostfile).

[cfd@utmcc010 advanced]$ date
2007年 1月 4日 木曜日 17:14:43 JST



So rsh and ssh both go through after all... orz

[cfd@utmcc010 advanced]$ rsh utmcc011 date
2007年 1月 4日 木曜日 17:09:48 JST
[cfd@utmcc010 advanced]$ rsh utmcc011 who
root :0 Dec 29 22:45
root pts/1 Dec 30 00:01 (:0.0)
root pts/2 Jan 1 20:26 (:0.0)
root pts/3 Jan 1 21:15 (:0.0)
root pts/4 Jan 1 21:32 (:0.0)
[cfd@utmcc010 advanced]$ rsh utmcc011 ls
Desktop
utmcc010
utmcc011
[cfd@utmcc010 advanced]$ ssh utmcc011 date
cfd@utmcc011's password:
2007年 1月 4日 木曜日 17:10:17 JST
[cfd@utmcc010 advanced]$



2007/01/04 17:12
The LAM/MPI log... is rsh (ssh) being rejected?
Even though MPICH is working fine, orz
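Since the plain rsh checks go through without a password while ssh prompts, one workaround is to point LAM at rsh instead of ssh via the LAMRSH environment variable (the recon output's sidenote mentions it). A sketch, assuming this LAM build honors LAMRSH:

```shell
# Tell LAM to use rsh as its remote agent for this session
export LAMRSH=rsh

# Then retry the boot check
recon -v lamhosts
```

The cleaner long-term fix is still passwordless ssh keys, but this should unblock lamboot with the rsh setup that already works.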

LAM tried to use the remote agent command "/usr/bin/ssh"
to invoke "echo $SHELL" on the remote node.

This usually indicates an authentication problem with the remote
agent, or some other configuration type of error in your .cshrc or
.profile file. The following is a list of items that you may wish to
check on the remote node:

[cfd@utmcc010 advanced]$ recon -v lamhosts
n-1<31160> ssi:boot:base:linear: booting n0 (utmcc010)
n-1<31160> ssi:boot:base:linear: booting n1 (utmcc011)
cfd@utmcc011's password:
ERROR: LAM/MPI unexpectedly received the following on stderr:
Connection closed by 202.13.13.17

LAM failed to execute a process on the remote node "utmcc011".
LAM was not trying to invoke any LAM-specific commands yet -- we were
simply trying to determine what shell was being used on the remote
host.

LAM tried to use the remote agent command "/usr/bin/ssh"
to invoke "echo $SHELL" on the remote node.

This usually indicates an authentication problem with the remote
agent, or some other configuration type of error in your .cshrc or
.profile file. The following is a list of items that you may wish to
check on the remote node:

       - You have an account and can login to the remote machine
       - Incorrect permissions on your home directory (should
         probably be 0755)
       - Incorrect permissions on your $HOME/.rhosts file (if you are
         using rsh -- they should probably be 0644)
       - You have an entry in the remote $HOME/.rhosts file (if you
         are using rsh) for the machine and username that you are
         running from
       - Your .cshrc/.profile must not print anything out to the
         standard error
       - Your .cshrc/.profile should set a correct TERM type
       - Your .cshrc/.profile should set the SHELL environment
         variable to your default shell

Try invoking the following command at the unix command line:

       /usr/bin/ssh -x -a utmcc011 -n echo $SHELL

You will need to configure your local setup such that you will *not*
be prompted for a password to invoke this command on the remote node.
No output should be printed from the remote node before the output of
the command is displayed.

When you can get this command to execute successfully by hand, LAM
will probably be able to function properly.

n-1<31160> ssi:boot:base:linear: Failed to boot n1 (utmcc011)
n-1<31160> ssi:boot:base:linear: aborted!

recon was not able to complete successfully. There can be any number
of problems that did not allow recon to work properly. You should use
the "-d" option to recon to get more information about each step that
recon attempts.

Any error message above may present a more detailed description of the
actual problem.

Here is general a list of prerequisites that *must* be fulfilled
before recon can work:

       - Each machine in the hostfile must be reachable and operational.
       - You must have an account on each machine.
       - You must be able to rsh(1) to the machine (permissions
         are typically set in the user's $HOME/.rhosts file).

       *** Sidenote: If you compiled LAM to use a remote shell program
           other than rsh (with the --with-rsh option to ./configure;
           e.g., ssh), or if you set the LAMRSH environment variable
           to an alternate remote shell program, you need to ensure
           that you can execute programs on remote nodes with no
           password.  For example:

       unix% ssh -x pinky uptime
       3:09am up 211 day(s), 23:49, 2 users, load average: 0.01, 0.08, 0.10

       - The LAM executables must be locatable on each machine, using
         the shell's search path and possibly the LAMHOME environment
         variable.
       - The shell's start-up script must not print anything on standard
         error.  You can take advantage of the fact that rsh(1) will
         start the shell non-interactively.  The start-up script (such
         as .profile or .cshrc) can exit early in this case, before
         executing many commands relevant only to interactive sessions
         and likely to generate output.
