
Commit 444af86 ("Updated demo")
1 parent c994f05

3 files changed: +125, -98 lines

README.md
Lines changed: 10 additions & 10 deletions

@@ -1,4 +1,4 @@
-mrsimform: MapReduce based Simulation Informatics
+simform: MapReduce based Simulation Informatics
 =========

 ### Written by Austin Benson, Paul Constantine, David F. Gleich, and Yangyang Hou
@@ -25,27 +25,27 @@ Here are some commands I ran to do a quick analysis on the EC2 cluster:

 ### Initialization

-$ make dir=hdfs://nebula/data/exodus-runs
+    $ make dir=hdfs://nebula/data/exodus-runs

-$ make setup_database name=runs variable=TEMP dir=hdfs://ec2-107-22-80-153.compute-1.amazonaws.com:8020/user/temp/simform/
-$ make -f runs preprocess
-$ make -f runs convert timestepfile=timesteps.txt
+    $ make setup_database name=runs variable=TEMP dir=hdfs://ec2-107-22-80-153.compute-1.amazonaws.com:8020/user/temp/simform/
+    $ make -f runs preprocess
+    $ make -f runs convert timestepfile=timesteps.txt

 In this case, we had to normalize time-steps across the different files as the default step-length is variable.

 ### Simple interpolation

-$ make -f runs predict design=design_points.txt points=new_points.txt
+    $ make -f runs predict design=design_points.txt points=new_points.txt

 and then dump out exodus files

-$ make -f runs seq2exodus numExodusfiles=10 OutputName=output/thermal_maze
+    $ make -f runs seq2exodus numExodusfiles=10 OutputName=output/thermal_maze

 ### SVD based Model Reduction

-$ make -f runs seq2mseq
-$ make -f runs model numExodusfiles=6
-$ make -f runs interpsvd design=design_points.txt points=new_points.txt
+    $ make -f runs seq2mseq
+    $ make -f runs model numExodusfiles=6
+    $ make -f runs interpsvd design=design_points.txt points=new_points.txt

 Setup
 ---------------
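The commands added in this README hunk form one linear pipeline, so they can be chained in a small throwaway script. The sketch below is only a convenience, not part of the repository: the script name, the `DIR` variable, and the `set -e` choice are assumptions, while the make targets and file names are exactly the ones shown above.

    #!/bin/bash
    # run_simform_pipeline.sh (hypothetical name): chain the make targets from README.md
    set -e   # assumption: stop at the first failing stage

    # HDFS directory holding the exodus runs; substitute your own cluster's path
    DIR=hdfs://ec2-107-22-80-153.compute-1.amazonaws.com:8020/user/temp/simform/

    # initialization
    make setup_database name=runs variable=TEMP dir=$DIR
    make -f runs preprocess
    make -f runs convert timestepfile=timesteps.txt

    # simple interpolation, then dump exodus files
    make -f runs predict design=design_points.txt points=new_points.txt
    make -f runs seq2exodus numExodusfiles=10 OutputName=output/thermal_maze

    # SVD based model reduction
    make -f runs seq2mseq
    make -f runs model numExodusfiles=6
    make -f runs interpsvd design=design_points.txt points=new_points.txt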

demo/DEMO.md
Lines changed: 110 additions & 86 deletions

@@ -1,63 +1,89 @@
-ssh into one of the node
+Whirr-EC2 Demo
+==============

-ssh -i $(HOME)/.ssh/id_rsa_whirr -o "UserKnownHostsFile /dev/null" -o StrictHostKeyChecking=no ssh -i /home/dgleich/.ssh/id_rsa_whirr -o "UserKnownHostsFile /dev/null" -o StrictHostKeyChecking=no 50.16.61.230
+This is a slightly terse guide to a quick check of our simform codes
+on a Whirr-launched EC2 Hadoop cluster.
+
+We begin where we left off in `README.md`.

-once there...
+Configuring the cluster
+-----------------------

-hadoop fs -mkdir /user/temp
-hadoop fs -mkdir /user/temp/simform
+1. ssh into one of the nodes

-# Only the temp directory has enough space
-cd /mnt/tmp
+    ssh -i $(HOME)/.ssh/id_rsa_whirr -o "UserKnownHostsFile /dev/null" -o StrictHostKeyChecking=no ssh -i /home/dgleich/.ssh/id_rsa_whirr -o "UserKnownHostsFile /dev/null" -o StrictHostKeyChecking=no 50.16.61.230

-mkdir scratch
+2. Once there, we want to create a place to put our exodus files

-cd scratch
+    hadoop fs -mkdir /user/temp
+    hadoop fs -mkdir /user/temp/simform
+
+3. After that's done, we need to get exodus files into the cluster.
+   The easiest way to do this is to copy them into the temp directory,
+   and then move them:

-rsync -avz mysource.computer.com:~/mydatadir/*.e scratch/
+    # Only the temp directory has enough space for a few GB of data
+    cd /mnt/tmp
+    mkdir scratch
+    cd scratch
+    # copy them from your computer
+    rsync -avz mysource.computer.com:~/mydatadir/*.e scratch/

-for f in `ls *.e`; do hadoop fs -put $f /user/temp/simform & ; done
+   The final step is to load the files into HDFS

-Wait until these finish
+    cd scratch
+    for f in `ls *.e`; do hadoop fs -put $f /user/temp/simform & ; done

-Now, we need to install numpy, scipy on all nodes
+   Wait until these finish. It can take a while.

-'ssh -i /home/dgleich/.ssh/id_rsa_whirr -o "UserKnownHostsFile /dev/null" -o StrictHostKeyChecking=no [email protected]'
-'ssh -i /home/dgleich/.ssh/id_rsa_whirr -o "UserKnownHostsFile /dev/null" -o StrictHostKeyChecking=no [email protected]'
-'ssh -i /home/dgleich/.ssh/id_rsa_whirr -o "UserKnownHostsFile /dev/null" -o StrictHostKeyChecking=no [email protected]'
-'ssh -i /home/dgleich/.ssh/id_rsa_whirr -o "UserKnownHostsFile /dev/null" -o StrictHostKeyChecking=no [email protected]'
-'ssh -i /home/dgleich/.ssh/id_rsa_whirr -o "UserKnownHostsFile /dev/null" -o StrictHostKeyChecking=no [email protected]'
+4. Meanwhile, we need to install some software on all the nodes. Using
+   the IP addresses in ~/.whirr/mrsimform-hadoop/instances, we can run the following
+   commands:

-for node in 107.22.80.153 50.17.5.207 50.16.113.97 107.20.113.124 23.21.6.71; do
-ssh -i /home/dgleich/.ssh/id_rsa_whirr -o "UserKnownHostsFile /dev/null" -o StrictHostKeyChecking=no dgleich@$node sudo apt-get install -y python-numpy python-scipy python-setuptools python-netcdf python-dev libatlas3gf-base
-ssh -i /home/dgleich/.ssh/id_rsa_whirr -o "UserKnownHostsFile /dev/null" -o StrictHostKeyChecking=no dgleich@$node sudo easy_install typedbytes ctypedbytes
-done
+    for node in 107.22.80.153 50.17.5.207 50.16.113.97 107.20.113.124 23.21.6.71; do
+      ssh -i /home/dgleich/.ssh/id_rsa_whirr -o "UserKnownHostsFile /dev/null" \
+        -o StrictHostKeyChecking=no $node \
+        sudo apt-get install -y python-numpy python-scipy python-setuptools \
+          python-netcdf python-dev libatlas3gf-base
+      ssh -i /home/dgleich/.ssh/id_rsa_whirr -o "UserKnownHostsFile /dev/null" \
+        -o StrictHostKeyChecking=no $node \
+        sudo easy_install -z typedbytes
+      ssh -i /home/dgleich/.ssh/id_rsa_whirr -o "UserKnownHostsFile /dev/null" \
+        -o StrictHostKeyChecking=no $node \
+        sudo easy_install ctypedbytes
+    done

-for node in 107.22.80.153 50.17.5.207 50.16.113.97 107.20.113.124 23.21.6.71; do
-ssh -i /home/dgleich/.ssh/id_rsa_whirr -o "UserKnownHostsFile /dev/null" -o StrictHostKeyChecking=no dgleich@$node sudo apt-get install -y libatlas3gf-base
-done
+   This will install all the necessary software on all the nodes,
+   **ASSUMING YOU UPDATE THE LIST OF IP ADDRESSES FOR YOUR EXAMPLE**

+5. Now, ssh into the head node, and let's install some of the
+   other software there.

-Now, ssh into the head node
+   **Basic setup**

-sudo apt-get install git-core
+    sudo apt-get install git-core

-mkdir devextern
-cd devextern
+    cd ~
+    mkdir devextern
+    cd devextern

-Install dumbo
+   **Install dumbo**

 sudo easy_install -z dumbo

-# install mrjob
-git clone https://github.com/dgleich/mrjob.git
-cd mrjob
-sudo python setup.py install
-cd ..
+   **Install mrjob**

-Now we need to get hyy-hadoop everywhere. Check ~/.whirr/mrsimform-hadoop/instances
+    git clone https://github.com/dgleich/mrjob.git
+    cd mrjob
+    sudo python setup.py install
+    cd ..
+
+   **Install hyy-hadoop**
+   Now we need to get hyy-hadoop everywhere.
+   Check `~/.whirr/mrsimform-hadoop/instances`
 for the private IPs of all the nodes:

+    cd ~/devextern
 git clone https://github.com/hyysun/Hadoop.git
 cd Hadoop
 cd python-hadoop
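Two small notes on the shell added in this hunk, with a hedged sketch rather than anything taken from the repository. The upload loop `for f in \`ls *.e\`; do ... & ; done` mixes `&` and `;`, which bash rejects as a syntax error; dropping the `;` and adding an explicit `wait` gives the "wait until these finish" behaviour directly. And instead of hardcoding the five IP addresses, the node list can be pulled from `~/.whirr/mrsimform-hadoop/instances`; the awk column below is an assumption about that file's layout, so check yours before relying on it.

    # parallel HDFS upload with an explicit wait
    cd /mnt/tmp/scratch
    for f in *.e; do
      hadoop fs -put "$f" /user/temp/simform &   # one background put per exodus file
    done
    wait   # block until every background put has finished

    # derive the node list instead of hardcoding it
    # (assumption: column 3 of the instances file is the public IP)
    NODES=$(awk '{print $3}' ~/.whirr/mrsimform-hadoop/instances)
    for node in $NODES; do
      ssh -i ~/.ssh/id_rsa_whirr -o "UserKnownHostsFile /dev/null" \
        -o StrictHostKeyChecking=no "$node" \
        sudo apt-get install -y python-numpy python-scipy python-setuptools \
          python-netcdf python-dev libatlas3gf-base
    done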
@@ -68,72 +94,70 @@
 ssh -o "UserKnownHostsFile /dev/null" -o StrictHostKeyChecking=no $node sudo easy_install Hadoop-0.2.tar.gz
 done

-We've got all the prereqs installed now. We can get the new codes!
-
-cd ~
-mkdir dev
-cd dev
-git clone https://github.com/hyysun/simform.git
-
-cd simform/src
+6. Install simform! We've got all the prereqs installed now. We can get the new codes!

-cd model
+    cd ~
+    mkdir dev
+    cd dev
+    git clone https://github.com/hyysun/simform.git

-# install feathers
-git clone https://github.com/klbostee/feathers.git
-cd feathers
-sh build.sh
-cp feathers.jar ..
+    cd simform/src

-export HADOOP_HOME=/usr/lib/hadoop
+   Except first we need to install feathers for dumbo.

-set the following as .mrjob.conf
+   **Install feathers**

-runners:
-  hadoop:
-    hadoop_home: /usr/lib/hadoop
-    jobconf:
-      mapreduce.task.timeout: 3600000
-      mapred.task.timeout: 3600000
-      mapred.reduce.tasks: 8
-      mapred.child.java.opts: -Xmx2G
-
+    cd model
+    git clone https://github.com/klbostee/feathers.git
+    cd feathers
+    sh build.sh
+    cp feathers.jar ..
+
+7. System setup. Run
+
+    export HADOOP_HOME=/usr/lib/hadoop
+
+   and set the following as .mrjob.conf
+
+    runners:
+      hadoop:
+        hadoop_home: /usr/lib/hadoop
+        jobconf:
+          mapreduce.task.timeout: 3600000
+          mapred.task.timeout: 3600000
+          mapred.reduce.tasks: 8
+          mapred.child.java.opts: -Xmx2G
+
+Running the codes
+------------------
+
 For the next step, we need the actual HDFS path. For my demo, it is:

 hdfs://ec2-107-22-80-153.compute-1.amazonaws.com:8020

-make setup_database name=runs variable=TEMP dir=hdfs://ec2-107-22-80-153.compute-1.amazonaws.com:8020/user/temp/simform/
+1. Build the database
+
+    make setup_database name=runs variable=TEMP \
+      dir=hdfs://ec2-107-22-80-153.compute-1.amazonaws.com:8020/user/temp/simform/

-make -f runs preprocess
+    make -f runs preprocess

 At this point, we need to edit the output directory to enable the mapred user
 to write to it

-hadoop fs -chmod 777 /user/temp/simform/output
-
-make -f runs convert timestepfile=timesteps.txt
-using normalized timesteps 20min36s
+    hadoop fs -chmod 777 /user/temp/simform/output

-make -f runs convert
-exodus2seq_output=hdfs://icme-hadoop1.localdomain/user/yangyang/simform/output/data.seq2/
-without using normalized timesteps 20min24s
+    make -f runs convert timestepfile=timesteps.txt

-make -f runs predict design=design_points.txt points=new_points.txt
-16min2s
+2. Make some predictions and save exodus files

-make -f runs seq2exodus numExodusfiles=10 OutputName=output/thermal_maze
-locally, 9min
+    make -f runs predict design=design_points.txt points=new_points.txt
+    make -f runs seq2exodus numExodusfiles=10 OutputName=output/thermal_maze

-SVD:
-make -f runs seq2mseq
-map: 40min
-reduce:55min
-total:1hr35min
-
-make -f runs model numExodusfiles=6
-full1 22min51s
-full2 1min
-full3 3min57s
-TSMatMul 3min29s
-total: 32min
+3. Compute the SVD

+    make -f runs seq2mseq
+
+    make -f runs model numExodusfiles=6
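The "set the following as .mrjob.conf" step in this hunk can be done in one shot with a heredoc. A minimal sketch, assuming mrjob reads `~/.mrjob.conf` on the head node; the keys and values are exactly the ones shown above.

    # write the mrjob configuration shown above to ~/.mrjob.conf
    cat > ~/.mrjob.conf <<'EOF'
    runners:
      hadoop:
        hadoop_home: /usr/lib/hadoop
        jobconf:
          mapreduce.task.timeout: 3600000
          mapred.task.timeout: 3600000
          mapred.reduce.tasks: 8
          mapred.child.java.opts: -Xmx2G
    EOF
    export HADOOP_HOME=/usr/lib/hadoop   # also from the hunk above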

demo/README.md
Lines changed: 5 additions & 2 deletions

@@ -27,16 +27,19 @@ this as "my-mrsimform-hadoop.properties"

 whirr launch-cluster --config my-mrsimform-hadoop.properties

-4. Login to the nodes
+4. Login to the nodes. whirr should spit out a list of nodes when
+   it launches. I just pick the first one.

 ssh -i $(HOME)/.ssh/id_rsa_whirr -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no ec2-50-16-181-181.compute-1.amazonaws.com

+5. Now see DEMO.md for how to configure the cluster and run a few commands.

 15. Destroy the cluster

 whirr destroy-cluster --config my-mrsimform-hadoop.properties

-
+References
+---------

 * <http://www.evanconkle.com/2011/11/run-hadoop-cluster-ec2-easy-apache-whirr/>
 * <http://archive.cloudera.com/cdh/3/whirr/contrib/python/running-mapreduce-jobs.html>
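The demo/README steps above reduce to three commands once the properties file exists; a throwaway wrapper could look like the sketch below. The script name and the idea of bundling the three steps are assumptions; the whirr commands and the ssh line are the ones from the diff, and the hostname is only the example shown there.

    #!/bin/bash
    # cluster_session.sh (hypothetical name): launch, log in, then tear down
    whirr launch-cluster --config my-mrsimform-hadoop.properties

    # log in to the first node whirr reports (the hostname below is just the example above)
    ssh -i $HOME/.ssh/id_rsa_whirr -o UserKnownHostsFile=/dev/null \
      -o StrictHostKeyChecking=no ec2-50-16-181-181.compute-1.amazonaws.com

    # after working through demo/DEMO.md, destroy the cluster
    whirr destroy-cluster --config my-mrsimform-hadoop.properties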
