Workshop: Hands-on massive data processing tools and platforms

Speakers: Pili Hu and Charlie Chen (Hong Kong)

Community: Open Innovation Lab, CUHK.

Language: English (With English Slides)

Category: Cloud

Tag: Data Science Cloud

Beginners

About Speaker - Pili Hu

Pili Hu is an MPhil graduate of the Department of Information Engineering, The Chinese University of Hong Kong, where he researches decentralized social networks. Before joining CUHK, he worked at Baidu as a search engine algorithm R&D engineer. He is co-initiator and curator of Code4HK ( code4.hk ), co-founder and CEO of HyperLab ( hyperlab.io ), and co-initiator and curator of Open Innovation Lab ( cuhkoil.org ). He has spoken at HKOSC’13, BlackHat’14, BarCamp’14, etc.

About Speaker - Charlie Chen

Mr. Charlie Chen is both a law school student and a programmer at UDomain Web Hosting Company Ltd. He has experience coding in six programming languages, of which Scala is his favorite.

About the Topic

Several CUHK research students will present this workshop together. Through hands-on exercises on three widely used platforms, we aim to give the audience a deeper understanding of the de facto standard frameworks for massive data processing. Participants are required to have a solid command of Unix/Linux and a general understanding of programming and algorithms; knowing the basics of Docker (used to distribute OS images) is an advantage. We plan to take the whole afternoon.

  1. Hadoop (batch framework; MapReduce in v1; DAG workflows in v2)
  2. Spark (batch framework; iterative computations; using Scala or Python)
  3. GraphLab (graph framework; common ML algorithms in GL)
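
For readers new to the MapReduce model named in item 1, the following is a minimal single-machine sketch in plain Python, not Hadoop code and not workshop material. The function names (map_phase, shuffle, reduce_phase, word_count) and the sample input are hypothetical, chosen only to illustrate the map, shuffle, and reduce steps on a word count.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every whitespace-separated word in one line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, mimicking the step between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values collected for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical sample input; a real job would read from distributed storage.
    sample = ["big data big tools", "data tools"]
    print(word_count(sample))
```

In a real framework the map and reduce calls run in parallel across many machines and the shuffle moves data over the network; here everything runs in one process so the data flow is easy to follow.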