Distributed Data Management (WT 2018/19) - tele-TASK

26 episodes

The free lunch is over! Until the turn of the century, computer systems became steadily faster without any particular effort, simply because the hardware they ran on increased its clock speed with every new release. This trend has ended, and today's CPUs stall at around 3 GHz. The size of modern computer systems, measured in transistors and processing elements (cores in CPUs/GPUs, CPUs/GPUs in compute nodes, compute nodes in clusters), however, still increases constantly. This caused a paradigm shift in writing software: instead of optimizing code for a single thread, applications now need to solve their tasks in parallel to achieve noticeable performance gains. Distributed computing, i.e., the distribution of work across (potentially) physically isolated compute nodes, is the most extreme method of parallelization.

Big Data Analytics is a multi-million dollar market that grows constantly! The ability to control and use data is the most valuable capability of today's computer systems. Because data volumes grow so rapidly, and with them the complexity of the questions they should answer, data analytics, i.e., the ability to extract any kind of information from the data, becomes increasingly difficult. As data analytics systems cannot count on faster hardware to solve their performance problems, they need to embrace new software trends that let their performance scale with the still increasing number of processing elements.

In this lecture, we take a look at various technologies involved in building distributed, data-intensive systems. We discuss theoretical concepts (data models, encoding, replication, ...) as well as some of their practical implementations (Akka, MapReduce, Spark, ...). Since workload distribution is a concept that is useful for many applications, we focus in particular on data analytics.

Podcasts

Lecture Summary

Published: Feb. 5, 2019, 9:15 a.m.
Duration: 1 hour 32 minutes 33 seconds

Listed in: Education

Distributed Query Optimization (1)

Published: Jan. 22, 2019, 1:30 p.m.
Duration: 1 hour 26 minutes 3 seconds

Listed in: Education

Distributed Query Optimization (2)

Published: Jan. 22, 2019, 9:15 a.m.
Duration: 1 hour 23 minutes 23 seconds

Listed in: Education

Processing Streams

Published: Jan. 15, 2019, 9:15 a.m.
Duration: 1 hour 33 minutes 58 seconds

Listed in: Education

Stream Processing

Published: Jan. 14, 2019, 1:30 p.m.
Duration: 1 hour 27 minutes 31 seconds

Listed in: Education

Transactions

Published: Jan. 8, 2019, 9:15 a.m.
Duration: 1 hour 29 minutes 57 seconds

Listed in: Education

Consistency and Consensus

Published: Jan. 7, 2019, 1:30 p.m.
Duration: 1 hour 30 minutes 2 seconds

Listed in: Education

Distributed Systems

Published: Dec. 18, 2018, 9:15 a.m.
Duration: 1 hour 33 minutes 19 seconds

Listed in: Education

Spark - Hands On

Published: Dec. 17, 2018, 1:30 p.m.
Duration: 1 hour 28 minutes 41 seconds

Listed in: Education

Apache Spark

Published: Dec. 11, 2018, 9:15 a.m.
Duration: 1 hour 29 minutes 38 seconds

Listed in: Education

Beyond MapReduce

Published: Dec. 10, 2018, 1:30 p.m.
Duration: 1 hour 29 minutes 41 seconds

Listed in: Education

Distributed File Systems and MapReduce

Published: Dec. 4, 2018, 9:15 a.m.
Duration: 1 hour 27 minutes 15 seconds

Listed in: Education

Batch Processing

Published: Dec. 3, 2018, 1:30 p.m.
Duration: 1 hour 29 minutes 20 seconds

Listed in: Education

Partitioning

Published: Nov. 27, 2018, 9:15 a.m.
Duration: 1 hour 19 minutes 6 seconds

Listed in: Education

Replication

Published: Nov. 26, 2018, 1:30 p.m.
Duration: 1 hour 26 minutes 56 seconds

Listed in: Education

Storage and Retrieval

Published: Nov. 20, 2018, 9:15 a.m.
Duration: 1 hour 27 minutes 1 second

Listed in: Education

Data Models and Query Languages

Published: Nov. 13, 2018, 9:15 a.m.
Duration: 1 hour 24 minutes 5 seconds

Listed in: Education

Patterns

Published: Nov. 12, 2018, 1:30 p.m.
Duration: 1 hour 29 minutes

Listed in: Education

Akka Actor-Programming Part 2

Published: Nov. 6, 2018, 9:15 a.m.
Duration: 1 hour 29 minutes 57 seconds

Listed in: Education

Akka Actor-Programming Hands-on

Published: Nov. 5, 2018, 1:30 p.m.
Duration: 1 hour 28 minutes 43 seconds

Listed in: Education

Models of Dataflow

Published: Oct. 30, 2018, 9:15 a.m.
Duration: 1 hour 27 minutes 18 seconds

Listed in: Education

Encoding and Evolution

Published: Oct. 29, 2018, 1:30 p.m.
Duration: 1 hour 11 minutes 8 seconds

Listed in: Education

Data Warehouses

Published: Oct. 23, 2018, 9:15 a.m.
Duration: 1 hour 17 minutes 7 seconds

Listed in: Education

Distributed DBMS

Published: Oct. 22, 2018, 1:30 p.m.
Duration: 1 hour 29 minutes 1 second

Listed in: Education

Foundations

Published: Oct. 16, 2018, 9:15 a.m.
Duration: 1 hour 23 minutes 6 seconds

Listed in: Education

Introduction

Published: Oct. 15, 2018, 1:30 p.m.
Duration: 1 hour 12 minutes 41 seconds

Listed in: Education