Zeppelin version: 0.5.6
Zeppelin ships with a local Spark by default, so it can run without any cluster: download the binary package, extract it, and it is ready to use.
To use an external Spark cluster in YARN mode instead, configure it as follows.
Configuration:
vi zeppelin-env.sh
Add:
export SPARK_HOME=/usr/crh/current/spark-client
export SPARK_SUBMIT_OPTIONS="--driver-memory 512M --executor-memory 1G"
export HADOOP_CONF_DIR=/etc/hadoop/conf
Zeppelin Interpreter configuration
Note: restart the interpreter after changing its settings.
Set the master property under Properties as follows:
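For a YARN deployment, a typical value is yarn-client (a sketch; the exact value depends on your cluster and deploy mode):

master    yarn-client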
Create a new Notebook.
Tips: a few months ago Zeppelin was still at 0.5.6; the latest version is now 0.6.2. In Zeppelin 0.5.6 every notebook paragraph must start with %spark, whereas in 0.6.2 a paragraph with no directive defaults to Scala.
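For example, a 0.5.6 paragraph needs an explicit interpreter directive on its first line:

%spark
sc.version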
Without the prefix, Zeppelin 0.5.6 fails with:
Connect to 'databank:4300' failed

And running a SQL paragraph such as:

%spark.sql
select count(*) from tc.gjl_test0

fails with:
com.fasterxml.jackson.databind.JsonMappingException: Could not find creator property with name 'id' (in class org.apache.spark.rdd.RDDOperationScope) at [Source: {"id":"2","name":"ConvertToSafe"}; line: 1, column: 1]
at com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:148)
at com.fasterxml.jackson.databind.DeserializationContext.mappingException(DeserializationContext.java:843)
at com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.addBeanProps(BeanDeserializerFactory.java:533)
at com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.buildBeanDeserializer(BeanDeserializerFactory.java:220)
at com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.createBeanDeserializer(BeanDeserializerFactory.java:143)
at com.fasterxml.jackson.databind.deser.DeserializerCache._createDeserializer2(DeserializerCache.java:409)
at com.fasterxml.jackson.databind.deser.DeserializerCache._createDeserializer(DeserializerCache.java:358)
at com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCache2(DeserializerCache.java:265)
at com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCacheValueDeserializer(DeserializerCache.java:245)
at com.fasterxml.jackson.databind.deser.DeserializerCache.findValueDeserializer(DeserializerCache.java:143)
at com.fasterxml.jackson.databind.DeserializationContext.findRootValueDeserializer(DeserializationContext.java:439)
at com.fasterxml.jackson.databind.ObjectMapper._findRootDeserializer(ObjectMapper.java:3666)
at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3558)
at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2578)
at org.apache.spark.rdd.RDDOperationScope$.fromJson(RDDOperationScope.scala:85)
at org.apache.spark.rdd.RDDOperationScope$$anonfun$5.apply(RDDOperationScope.scala:136)
at org.apache.spark.rdd.RDDOperationScope$$anonfun$5.apply(RDDOperationScope.scala:136)
at scala.Option.map(Option.scala:145)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:136)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
at org.apache.spark.sql.execution.ConvertToSafe.doExecute(rowFormatConverters.scala:56)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:187)
at org.apache.spark.sql.execution.Limit.executeCollect(basicOperators.scala:165)
at org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:174)
at org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame.scala:1499)
at org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame.scala:1499)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:56)
at org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:2086)
at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$execute$1(DataFrame.scala:1498)
at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$collect(DataFrame.scala:1505)
at org.apache.spark.sql.DataFrame$$anonfun$head$1.apply(DataFrame.scala:1375)
at org.apache.spark.sql.DataFrame$$anonfun$head$1.apply(DataFrame.scala:1374)
at org.apache.spark.sql.DataFrame.withCallback(DataFrame.scala:2099)
at org.apache.spark.sql.DataFrame.head(DataFrame.scala:1374)
at org.apache.spark.sql.DataFrame.take(DataFrame.scala:1456)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.zeppelin.spark.ZeppelinContext.showDF(ZeppelinContext.java:297)
at org.apache.zeppelin.spark.SparkSqlInterpreter.interpret(SparkSqlInterpreter.java:144)
at org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:57)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:300)
at org.apache.zeppelin.scheduler.Job.run(Job.java:169)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:134)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Cause: a Jackson version conflict between the jars bundled with Zeppelin and the versions Spark expects.
Go into the /opt/zeppelin-0.5.6-incubating-bin-all directory:
# ls lib | grep jackson
jackson-annotations-2.5.0.jar
jackson-core-2.5.3.jar
jackson-databind-2.5.3.jar

Replace these with the following versions:
# ls lib | grep jackson
jackson-annotations-2.4.4.jar
jackson-core-2.4.4.jar
jackson-databind-2.4.4.jar

Tested: it works.
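To confirm which Jackson jar the interpreter actually loaded (useful when diagnosing this kind of conflict), a quick check can be run from a notebook paragraph; this is a sketch, not part of the original fix:

%spark
// Prints the jar file that ObjectMapper was loaded from
println(classOf[com.fasterxml.jackson.databind.ObjectMapper]
  .getProtectionDomain.getCodeSource.getLocation)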
Spark SQL can also be reached directly over Hive JDBC; only the port needs to change, as shown in the figure below:
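A minimal sketch of such a connection: the Spark Thrift server speaks the HiveServer2 wire protocol, so the ordinary Hive JDBC driver works unchanged. The host databank and port 10015 here are placeholders; substitute your Thrift server's address and port.

import java.sql.DriverManager

object SparkSqlOverJdbc {
  def main(args: Array[String]): Unit = {
    // Standard Hive JDBC driver; only the port differs from a plain Hive connection
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val conn = DriverManager.getConnection("jdbc:hive2://databank:10015/default")
    val stmt = conn.createStatement()
    val rs = stmt.executeQuery("select count(*) from tc.gjl_test0")
    while (rs.next()) println(rs.getLong(1))
    rs.close(); stmt.close(); conn.close()
  }
}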
Zeppelin's main capabilities:
Data ingestion
Data discovery
Data analytics
Data visualization
The data engines supported by the current versions (0.5–0.6) are the following:
Installation
Environment: CentOS 6.6
Build prerequisites
sudo yum update
sudo yum install java-1.7.0-openjdk
sudo yum install git
sudo yum install npm

Download the source code:
git clone https://github.com/apache/incubator-zeppelin.git

Build and package:
cd incubator-zeppelin
# build for Spark 1.4.x, Hadoop 2.4.x
mvn clean package -Pspark-1.4 -Dhadoop.version=2.4.0 -Phadoop-2.4 -DskipTests -P build-distr
The build output lands in zeppelin-distribution/target. Extract it:
tar -zxvf zeppelin-0.6.0-incubating-SNAPSHOT.tar.gz

Adjust the configuration: the port number and related settings live in zeppelin-site.xml, and startup environment variables in zeppelin-env.sh.
cp zeppelin-site.xml.template zeppelin-site.xml
cp zeppelin-env.sh.template zeppelin-env.sh

Start Zeppelin:
./bin/zeppelin-daemon.sh start

Stop Zeppelin (always stop it with this command; otherwise it may well refuse to start again. Don't ask how I know.):
./bin/zeppelin-daemon.sh stop

Web UI
That concludes installation. A follow-up usage guide will focus on visualizing Hive and Spark SQL queries, and will be extended over time.
1. First download the Zeppelin tarball and extract it (this host already has a Java environment installed).
2. Modify the configuration
Go into conf/:
Rename zeppelin-env.sh.template to zeppelin-env.sh
Rename zeppelin-site.xml.template to zeppelin-site.xml
Then edit conf/zeppelin-env.sh and add:
export SPARK_MASTER_IP=192.168.109.136
export SPARK_LOCAL_IP=192.168.109.136
3. Start Zeppelin
Go into Zeppelin's bin directory and run ./zeppelin-daemon.sh start
Then open 192.168.109.136:8080 in a browser to reach the UI.
Zeppelin is now up and running.
4. Basic Zeppelin usage
Paragraph output can be rendered in several forms (see the sketch after this list):
1. text
2. html
3. table
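A minimal sketch of how these forms are triggered from a Scala paragraph: Zeppelin inspects the printed output, and a leading %html or %table directive switches the renderer (the values below are made up for illustration):

%spark
// Plain text: ordinary println output is shown as text
println("hello zeppelin")
// HTML: output beginning with %html is rendered as markup
println("%html <h3>hello zeppelin</h3>")
// Table: %table with tab-separated columns and newline-separated rows
println("%table college\tcount\nCS\t10\nMath\t3")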
5. Analyzing data
The analysis I run most often is based on a school dataset: I have a file of school records in which each line's fields are separated by ",", and one of the fields is ethnicity. Here we analyze that field.
Because this Zeppelin instance runs on Linux, the raw data must first be copied onto the Linux machine; the file paths Zeppelin reads are paths on the Linux filesystem.
We can then chart the data: read the file, group the records by the field of interest, and count how many rows fall into each group, which gives us data ready for display.
a.
val text = sc.textFile("/tmp/xjdx.txt")
case class Person(college: String, time: Integer)
// Fields are comma-separated; field 10 holds the value we group on.
// Guard against short lines: fields(10) requires at least 11 fields.
val rdd1 = text.map { line =>
  val fields = line.split(",")
  if (fields.length > 10) Person(fields(10), 1)
  else Person("1", 1)
}
b.
rdd1.toDF().registerTempTable("rdd1")

c.
%sql
select college, count(1) from rdd1 group by college
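The same aggregation can also be written with the DataFrame API from a %spark paragraph, a small equivalent sketch:

%spark
// Group by college and count the rows in each group, then print the result
rdd1.toDF().groupBy("college").count().show()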