Wednesday, January 27, 2010

Trying HBase, Hive, Pig, and HUE (formerly Cloudera Desktop) on a single machine (Ubuntu + Cloudera)


Let's try HBase, Hive, Pig, and HUE on a single Linux machine running Hadoop.
This assumes a single-node Ubuntu machine with Cloudera's Hadoop distribution (CDH3) already installed and running (see the CDH3 Hadoop setup article).
(Ubuntu 11.04/10.04/9.10/8.04 and Debian)

Other articles

CentOS: Hadoop (single node): here / HBase, Hive, Pig, HUE (formerly Cloudera Desktop), Oozie (single node): here
Ubuntu: Hadoop (single node): here / HBase, Hive, Pig, HUE (formerly Cloudera Desktop) (single node): this article



On a single-node Linux (Ubuntu) + Cloudera Hadoop environment,

install Cloudera's CDH3 packages for HBase, Hive, Pig, and HUE (via apt) and try them out.



The order is: 1. HBase, 2. Pig, 3. Hive, 4. HUE.

1. HBase

1-1. Installation (work as root on the Linux machine)

1-1-1. Install the HBase-related packages


apt-get -y install hadoop-hbase
apt-get -y install hadoop-hbase-master
apt-get -y install hadoop-hbase-regionserver
apt-get -y install hadoop-zookeeper-server


1-1-2. Configuration

To run HBase in distributed mode, edit the following two files.

/etc/zookeeper/zoo.cfg
Change `localhost` in the line server.0=localhost:2888:3888 to this machine's hostname (or IP address).
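For example, the edit can be done with sed (a sketch; myhost stands in for your actual hostname):

sed -i 's/^server\.0=localhost:/server.0=myhost:/' /etc/zookeeper/zoo.cfg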

/etc/hbase/conf/hbase-site.xml (overwrite with the following):

cat << EOF > /etc/hbase/conf/hbase-site.xml
<configuration>
<property>
<name>hbase.zookeeper.quorum</name>
<value>(the hostname or IP written in zoo.cfg)</value>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:8020/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
</configuration>
EOF

1-2. Startup

Note: with CDH3 beta2/beta3 the pid directory may be missing and HBase fails to start; create it first (see the comments at the end of this article):

mkdir /var/run/hbase


/etc/init.d/hadoop-zookeeper-server start
/etc/init.d/hadoop-hbase-master start
/etc/init.d/hadoop-hbase-regionserver start



Check that all the daemons are running:

jps


3148 TaskTracker
3319 Jps
2973 JobTracker
3053 DataNode
2787 HRegionServer
2889 NameNode
3272 HMaster
2376 QuorumPeerMain


1-3. Verifying the installation

Launch the HBase shell:

hbase shell


HBase Shell; enter 'help' for list of supported commands.
Type "exit" to leave the HBase Shell
Version 0.90.3-cdh3u1, r, Mon Jul 18 08:23:50 PDT 2011
hbase(main):001:0>


Inside the hbase shell, try the version, status, and exit commands:


hbase(main):001:0> version
0.90.3-cdh3u1, r, Mon Jul 18 08:23:50 PDT 2011
hbase(main):002:0> status
1 servers, 0 dead, 2.0000 average load
hbase(main):003:0> exit





Create a table (column families: transfer and location):

create 'yamanoteline', 'transfer', 'location'



Register data with put:

put 'yamanoteline', 'Shinagawa', 'location:ku', 'Minato'
put 'yamanoteline', 'Shinagawa', 'transfer:jreast', '3'
put 'yamanoteline', 'Shinagawa', 'transfer:subway', '0'
put 'yamanoteline', 'Shinagawa', 'transfer:other', '2'

put 'yamanoteline', 'Osaki', 'location:ku', 'Shinagawa'
put 'yamanoteline', 'Osaki', 'transfer:jreast', '2'
put 'yamanoteline', 'Osaki', 'transfer:subway', '0'
put 'yamanoteline', 'Osaki', 'transfer:other', '1'

put 'yamanoteline', 'Gotanda', 'location:ku', 'Shinagawa'
put 'yamanoteline', 'Gotanda', 'transfer:jreast', '0'
put 'yamanoteline', 'Gotanda', 'transfer:subway', '1'
put 'yamanoteline', 'Gotanda', 'transfer:other', '1'

Fetch a row (get) and scan a range of rows (scan):
hbase(main):001:0> get 'yamanoteline', 'Osaki'
COLUMN                  CELL
 location:ku            timestamp=1314086040085, value=Shinagawa
 transfer:jreast        timestamp=1314086040125, value=2
 transfer:other         timestamp=1314086040207, value=1
 transfer:subway        timestamp=1314086040165, value=0
4 row(s) in 0.0450 seconds
hbase(main):002:0> scan 'yamanoteline', {STARTROW => 'G', STOPROW => 'P'}
ROW                     COLUMN+CELL
 Gotanda                column=location:ku, timestamp=1314086040267, value=Shinagawa
 Gotanda                column=transfer:jreast, timestamp=1314086040306, value=0
 Gotanda                column=transfer:other, timestamp=1314086041294, value=1
 Gotanda                column=transfer:subway, timestamp=1314086040379, value=1
 Osaki                  column=location:ku, timestamp=1314086040085, value=Shinagawa
 Osaki                  column=transfer:jreast, timestamp=1314086040125, value=2
 Osaki                  column=transfer:other, timestamp=1314086040207, value=1
 Osaki                  column=transfer:subway, timestamp=1314086040165, value=0
2 row(s) in 0.1510 seconds
hbase(main):011:0>


To list every row:

scan 'yamanoteline'
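For a quick row total, the shell also has a count command (run inside hbase shell):

count 'yamanoteline'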

The data is stored under /hbase/ on HDFS.
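You can confirm this from the command line (a sketch; the listing will vary):

hadoop-0.20 fs -ls /hbase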
The test data used (Yamanote Line stations, their transfer line counts, and the ward they are in):

name       transfer:jreast  transfer:subway  transfer:other  location:ku
Shinagawa  3                0                2               Minato
Osaki      2                0                1               Shinagawa
Gotanda    0                1                1               Shinagawa

2. Pig

2-1. Installation (work as root on the Linux machine)


apt-get -y install hadoop-pig


2-2. Verifying the installation

Create test data (/var/tmp/pigtest.csv):

cat << TESTDATA > /var/tmp/pigtest.csv
Shinagawa,3,0,2,Minato
Osaki,2,0,1,Shinagawa
Gotanda,0,1,1,Shinagawa
Meguro,0,2,1,Shinagawa
Ebisu,2,1,0,Shibuya
Shibuya,2,3,3,Shibuya
Harajuku,0,1,0,Shibuya
Yoyogi,1,1,0,Shibuya
Shinjuku,5,3,3,Shinjuku
TESTDATA


Put it onto HDFS as /var/pigtest/test.csv:

hadoop-0.20 fs -put /var/tmp/pigtest.csv /var/pigtest/test.csv
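As a quick sanity check, cat the file back from HDFS:

hadoop-0.20 fs -cat /var/pigtest/test.csv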


Launch pig:

JAVA_HOME=/usr/lib/jvm/java-6-sun pig

2011-08-22 21:59:22,720 [main] INFO  org.apache.pig.Main - Logging error messages to: /root/pig_1314086362718.log
2011-08-22 21:59:23,165 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:8020
2011-08-22 21:59:23,678 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:8021
grunt> 



Load the CSV and filter for the stations located in Shibuya ward:

grunt> Y1 = LOAD '/var/pigtest/test.csv' USING PigStorage(',')
 AS (name: chararray,
 transfer_jreast: int, transfer_subway: int, transfer_other: int,
 location_ku: chararray);
grunt> Y2 = FILTER Y1 BY location_ku MATCHES 'Shibuya';
grunt> DUMP Y2;
...(a MapReduce job runs)...
...mapReduceLayer.MapReduceLauncher - Success!
(Ebisu,2,1,0,Shibuya)
(Shibuya,2,3,3,Shibuya)
(Harajuku,0,1,0,Shibuya)
(Yoyogi,1,1,0,Shibuya)
grunt> QUIT;
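Before quitting, the filtered relation could also be written back to HDFS with STORE instead of DUMP (a sketch; the output directory /var/pigtest/out is an arbitrary choice):

STORE Y2 INTO '/var/pigtest/out' USING PigStorage(',');

The result lands as part-* files under /var/pigtest/out, viewable with hadoop-0.20 fs -cat /var/pigtest/out/part-*.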


3. Hive

3-1. Installation (work as root on the Linux machine)


apt-get -y install hadoop-hive


3-2. Verifying the installation

Create test data (fields separated by Hive's default delimiter \001, rows by newlines):

echo -e "apple\0001red\0001100\0012lemon\0001yellow\0001120\0012orange\0001orange\000160" > /var/tmp/data1
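Since the \001 bytes are invisible in most editors, od can be used to inspect the file (the 001 field separators should show up between fields):

od -c /var/tmp/data1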


Launch hive:

hive

Hive history file=/tmp/root/hive_job_log_root_200912092154_480111727.txt
hive> 


Hive accepts an SQL-like query language (HiveQL).
Create a table and confirm it with SHOW TABLES:

hive> CREATE TABLE fruits (name STRING, color STRING, price INT);
OK
Time taken: 27.504 seconds
hive> SHOW TABLES;
OK
fruits
Time taken: 0.288 seconds

Load the test data and display it:

hive> LOAD DATA LOCAL INPATH '/var/tmp/data1' OVERWRITE INTO TABLE fruits;
Copying data from file:/var/tmp/data1
Loading data to table fruits
OK
Time taken: 1.104 seconds
hive> SELECT * FROM fruits;
apple red 100
lemon yellow 120
orange orange 60
Time taken: 0.665 seconds

Run a query with a condition (price under 80); this one launches a MapReduce job:

hive> SELECT * FROM fruits WHERE price < 80;
Total MapReduce jobs = 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_200912092139_0003, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_200912092139_0003
Kill Command = /usr/lib/hadoop/bin/hadoop job -Dmapred.job.tracker=localhost:8021 -kill job_200912092139_0003
2009-12-09 09:59:52,112 map = 0%, reduce =0%
...Mapreduce...
2009-12-09 10:00:34,989 map = 100%, reduce =100%
Ended Job = job_200912092139_0003
OK
orange orange 60
Time taken: 48.499 seconds
hive> QUIT;


Let's also try a JOIN.

Create a second data set (French-to-English fruit names):
echo -e "pomme\0001apple\0012citron\0001lemon\0012orange\0001orange" > /var/tmp/data2

Create the enfr table, load the data, and check it:

hive> CREATE TABLE enfr (fr STRING, en STRING);
OK
Time taken: 18.98 seconds
hive> LOAD DATA LOCAL INPATH '/var/tmp/data2' OVERWRITE INTO TABLE enfr;
Copying data from file:/var/tmp/data2
Loading data to table enfr
OK
Time taken: 1.115 seconds
hive> SELECT * FROM enfr;
OK
pomme apple
citron lemon
orange orange
Time taken: 0.875 seconds

Run the JOIN query (rows with price over 80):

hive> SELECT e.fr, f.price FROM fruits f JOIN enfr e ON f.name = e.en WHERE f.price > 80;
Total MapReduce jobs = 1
...Mapreduce...
2009-12-15 10:03:10,274 map = 100%, reduce =100%
Ended Job = job_200912152135_0004
OK
pomme 100
citron 120
Time taken: 132.47 seconds
hive> 
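The join result can also be written to an HDFS directory instead of the console (a sketch; /tmp/enfr_join is an arbitrary path):

hive> INSERT OVERWRITE DIRECTORY '/tmp/enfr_join'
    > SELECT e.fr, f.price FROM fruits f JOIN enfr e ON f.name = e.en;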

The data used:

fruits
name    color   price
apple   red     100
lemon   yellow  120
orange  orange  60

enfr
fr      en
pomme   apple
citron  lemon
orange  orange



The table data is stored under /user/hive/warehouse/ on HDFS.
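Listing that directory should show one subdirectory per table (fruits and enfr):

hadoop-0.20 fs -ls /user/hive/warehouse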

4. HUE(Hadoop User Experience)

HUE is the new name for Cloudera Desktop: the tool shipped as Cloudera Desktop up through CDH3 beta3 and became HUE in CDH3 (beta4 and later).

4-1. Installation (work as root on the Linux machine)

4-1-1. Install the packages


apt-get -y install hue hue-plugins


4-1-2. Configuration

(This step may be unnecessary if you use the preconfigured hadoop-0.20-conf-pseudo-hue package.) Set namenode_host and jobtracker_host in /etc/hue/hue.ini if the NameNode or JobTracker are not on this machine.


Add the following to /etc/hadoop/conf.pseudo.clouderadesktop/hdfs-site.xml
(inside the <configuration>...</configuration> element):

<property>
<name>dfs.namenode.plugins</name>
<value>org.apache.hadoop.thriftfs.NamenodePlugin</value>
<description>Comma-separated list of namenode plug-ins to be activated.
</description>
</property>
<property>
<name>dfs.datanode.plugins</name>
<value>org.apache.hadoop.thriftfs.DatanodePlugin</value>
<description>Comma-separated list of datanode plug-ins to be activated.
</description>
</property>
<property>
<name>dfs.thrift.address</name>
<value>0.0.0.0:9090</value>
</property>

Add the following to /etc/hadoop/conf.pseudo.clouderadesktop/mapred-site.xml
(inside the <configuration>...</configuration> element):

<property>
<name>jobtracker.thrift.address</name>
<value>0.0.0.0:9290</value>
</property>
<property>
<name>mapred.jobtracker.plugins</name>
<value>org.apache.hadoop.thriftfs.ThriftJobTrackerPlugin</value>
<description>Comma-separated list of jobtracker plug-ins to be activated.
</description>
</property>

4-1-3. Restart the Hadoop daemons and start HUE

/etc/init.d/hadoop-0.20-namenode restart
/etc/init.d/hadoop-0.20-datanode restart
/etc/init.d/hadoop-0.20-jobtracker restart
/etc/init.d/hadoop-0.20-tasktracker restart
/etc/init.d/hue start
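To confirm HUE came up, check that its web server is listening on port 8088 (a sketch):

netstat -ltn | grep 8088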


4-1-4. Firewall

If a firewall is running, open TCP port 8088.
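On Ubuntu, if ufw is managing the firewall, a rule like this opens the port (a sketch; skip it if no firewall is active):

ufw allow 8088/tcp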

4-2. Using HUE

Access HUE from a web browser (Firefox, etc.):
http://<server name>:8088/



Log in with, for example, username=hadoop, password=hadoop.


(The username and password entered at first login are registered as the account.)
(If the page appears garbled, switching the browser's character encoding from Western European to Unicode (UTF-8) should fix it.)







The available applications include:
File Browser: browse HDFS
Job Browser: view and manage MapReduce jobs
the tools carried over from Cloudera Desktop
Beeswax for Hive: a UI for running Hive queries (see below)

4-3. Running a Hive query (Beeswax for Hive)

Let's run a Hive query from the browser.

Open Beeswax for Hive; the tables created earlier (stored on HDFS) appear in the table list.

Enter a query and press QUERY to execute it.


--

2 comments:


  1. I followed this article; Hive's SQL is handy.

    HBase, however, failed to install with the following error:
    install: cannot change owner and permissions of `/usr/lib/hbase/pids': No such file or directory


  2. Moriyasu-san, thank you for the comment. While re-checking this article against CDH3 beta3, I ran into the /usr/lib/hbase/pids problem myself. For now, creating the /var/run/hbase directory by hand seems to make it work, though the directory appears to get deleted again at some point.
