EC Project 257859

Co-funded by the European Union

ROBUST Partner TUB: Publications at ACM RecSys and CIKM

ROBUST partner TU Berlin has published research papers at this year's Conference on Information and Knowledge Management (CIKM) and the 7th Recommender Systems Conference (RecSys). The paper titled “All Roads Lead to Rome: Optimistic Recovery for Distributed Iterative Data Processing” [1] proposes an optimistic recovery mechanism using algorithmic compensations. Executing data-parallel iterative algorithms on large datasets is crucial for many advanced analytical applications like community analytics. In the absence of failures, the proposed optimistic recovery scheme is able to outperform a pessimistic approach by a factor of two to five. In the presence of failures, the approach provides fast recovery and outperforms pessimistic approaches in the majority of cases.

The second paper, titled “Distributed Matrix Factorization with MapReduce using a series of Broadcast-Joins” [2], presents an efficient, distributed factorization of large matrices on clusters of commodity machines, which is crucial to applying latent factor models in industrial-scale recommender systems. The paper outlines an efficient, data-parallel low-rank matrix factorization with Alternating Least Squares which uses a series of broadcast-joins that can be efficiently executed with MapReduce. The paper also reports on experiments on two publicly available datasets and on an artificial dataset termed Bigflix, generated from the Netflix dataset. Bigflix contains 25 million users and more than 5 billion ratings, mimicking data sizes recently reported as Netflix's production workload. The paper demonstrates that the proposed approach is able to run an iteration of Alternating Least Squares in six minutes on this dataset. The implementation has also been contributed to the open source machine learning library Apache Mahout.
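The idea of optimistic recovery with algorithmic compensations described for [1] can be illustrated with a small, single-machine sketch. It is not the implementation from the paper: the choice of PageRank as the iterative algorithm, the toy graph, and the function names `pagerank_step`, `compensate`, and `run` are all assumptions made for illustration. Instead of writing checkpoints, the sketch reacts to a simulated failure by re-initializing the lost part of the state with a compensation function that restores a consistent state (here: ranks that sum to one), and then simply continues iterating towards the same fixpoint.

```python
# Illustrative sketch only (hypothetical names, toy data): optimistic recovery
# for a fixpoint iteration, using PageRank as the example algorithm.

def pagerank_step(ranks, out_links, damping=0.85):
    """One PageRank iteration; assumes every vertex has at least one out-link."""
    n = len(ranks)
    new_ranks = {v: (1.0 - damping) / n for v in ranks}
    for v, targets in out_links.items():
        share = damping * ranks[v] / len(targets)
        for t in targets:
            new_ranks[t] += share
    return new_ranks

def compensate(ranks, lost):
    """Compensation function: the ranks of the lost vertices are discarded and
    re-initialized uniformly so that all ranks sum to one again."""
    surviving = {v: r for v, r in ranks.items() if v not in lost}
    missing_mass = 1.0 - sum(surviving.values())
    for v in lost:
        surviving[v] = missing_mass / len(lost)
    return surviving

def run(out_links, iterations, fail_at=None, lost=()):
    """Iterate PageRank; optionally simulate losing the state of `lost` vertices
    right before iteration `fail_at`. No checkpoint is ever written."""
    ranks = {v: 1.0 / len(out_links) for v in out_links}
    for i in range(iterations):
        if i == fail_at and lost:
            ranks = compensate(ranks, set(lost))
        ranks = pagerank_step(ranks, out_links)
    return ranks

# Toy graph (assumption): adjacency lists, no dangling vertices.
GRAPH = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["a", "c"]}

failure_free = run(GRAPH, iterations=200)
recovered = run(GRAPH, iterations=200, fail_at=5, lost=("b", "c"))
```

In this sketch the failure-free run pays no checkpointing overhead, and the run with a simulated failure converges to the same ranks as the failure-free run, only possibly needing additional iterations after the compensation.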

[1] All Roads Lead to Rome: Optimistic Recovery for Distributed Iterative Data Processing
Sebastian Schelter, Stephan Ewen, Kostas Tzoumas, and Volker Markl. CIKM (2013)

[2] Distributed Matrix Factorization with MapReduce using a series of Broadcast-Joins
Sebastian Schelter, Christoph Boden, Martin Schenck, Alexander Alexandrov, and Volker Markl. RecSys (2013)