We are living in the era of massive data. Some estimate that more than 90% of the worlds data was generated in recent years. This large amount of data comes with technological challenges such as: distributed storage systems, synchronization issues, density constraints, power constraints and more.
The talk will be partitioned into two, independent parts in which some solutions to those challenges will be discussed. Specifically, we will touch upon dynamical distributed storage systems, and Polya string models.
In the first part of the talk we will discuss dynamical distributed storage. In distributed storage systems the data is partitioned and the parts are saved in several storage units, each part per unit. The storage units are unreliable and prone to failures. Upon such failure, the system repairs the failed unit using the information stored in other units. This repair process incurs in-network traffic. The problem of node repair based on erasure coding for distributed storage aims at optimizing the tradeoff of in-network traffic and storage overhead. The existing body of works focuses on the failure of a unit (or several units) and the ensuing reconstruction process, but puts less emphasis on the time evolution of the entire network and the inherent stochastic nature of unit failures. In the talk, we will present a dynamical version of distributed storage systems and will build upon the so called fixed cost model, suggested by Akhlaghi et al. in 2010. We will estimate the increment of the file size that can be reliably stored over the network.
In the second part of the talk we will discuss Polya string models. A motivating application for studying those models comes from in-vivo DNA-based storage systems. In those systems information is stored inside a living organism in order to be read from its decedents. Unfortunately, due to the DNA replication process, this information is prone to many kinds of errors, including insertions, deletions, and duplication errors. In the talk we will cover the case of duplication errors and specifically will focus on end-duplications. The end-duplication model assumes that at each step, a bit from the sequence is chosen uniformly at random, copied, and placed at the end of the sequence. Thus, at each step the length of the sequence is increased. We will present the entropy rate of the end-duplication model and of the noisy end-duplication (in which the bit that is placed at the end is passed through a binary channel).
Ohad Elishco is a post-doctoral researcher at the department of electrical engineering, University of Maryland at College Park. He received his B.Sc. in mathematics and his B.Sc. in electrical engineering in 2012 from Ben-Gurion University of the Negev, Israel; the M.Sc. and Ph.D degrees in electrical engineering from Ben-Gurion University if the Negev, Israel, in 2013, 2017 respectively. Between 2017-2018 he was a post-doctoral researcher in the department of electrical engineering, Massachusetts institute of technology. His research interests include coding theory and dynamical systems.