Accommodating bursts in distributed stream processing systems
There has been an explosion of innovation in open-source stream processing over the past few years. Frameworks such as Apache Spark and Apache Storm give developers stream abstractions on which to build applications; Apache Beam provides an API abstraction that lets developers write code independent of the underlying framework; and tools such as Apache NiFi and StreamSets Data Collector provide a user-interface abstraction, allowing data engineers to define data flows from high-level building blocks with little or no coding.
In recent years we have witnessed a proliferation of distributed stream processing systems that need to operate efficiently even when data bursts occur.
Pravega enables the ingestion capacity of a stream to grow and shrink according to workload, and it sends signals downstream so that Flink can scale accordingly.
We present an economical and fault-tolerant load balancing strategy (EFTLBS), based on an operator replication mechanism and a load shedding method, that fully utilizes network resources to realize continuous, highly available data stream processing over wide area networks without dynamic operator migration. In this paper, we first design an economical operator distribution (EOD) plan based on a bin-packing model, under the constraints of each stream's bandwidth and each server's CPU capacity. Next, we devise a super-operator (SO) that load-balances multi-degree operator replicas. Moreover, to improve the fault tolerance of the system, we color the SOs based on a coloring bin-packing (CBP) model that assigns peer operator replicas to different servers.
Proceedings of the 1984 CAUSE conference on information management and new technologies are presented (ERIC Archive; subjects: Business, Case Studies, College Administration, College Planning, Computer Oriented Programs, Data Processing, Higher Education, Industry, Information Centers, Information Networks, Library Networks, Management Information Systems, Microcomputers, Office Management, Technological Advancement, Technology Transfer). The contents include 49 papers covering seven subject areas: issues in higher education, managing the information resource, innovative technologies, office automation/networking, microcomputer issues and applications, promises and perils of technology, and applications.
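The EFTLBS abstract combines two placement ideas: bin-packing operator replicas onto servers under CPU and bandwidth limits (the EOD plan), and a coloring constraint that keeps peer replicas of the same operator on different servers (the CBP model). The sketch below illustrates that combination with a plain first-fit-decreasing heuristic. It is not the authors' algorithm: the `Replica`/`Server` model, the `place_replicas` function, and the simplification of treating bandwidth as a per-server network budget (rather than a per-stream constraint) are all hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    # Hypothetical model of one replica of a logical operator.
    operator: str      # logical operator name; peers share it (the "color" group)
    index: int         # replica number within that operator
    cpu: float         # CPU demand, e.g. in cores
    bandwidth: float   # network demand, e.g. in Mbit/s (simplified assumption)


@dataclass
class Server:
    name: str
    cpu_capacity: float
    bandwidth_capacity: float
    assigned: list = field(default_factory=list)

    def cpu_used(self) -> float:
        return sum(r.cpu for r in self.assigned)

    def bandwidth_used(self) -> float:
        return sum(r.bandwidth for r in self.assigned)

    def fits(self, replica: Replica) -> bool:
        # Bin-packing constraints: CPU and bandwidth must both stay within capacity.
        if self.cpu_used() + replica.cpu > self.cpu_capacity:
            return False
        if self.bandwidth_used() + replica.bandwidth > self.bandwidth_capacity:
            return False
        # Coloring constraint: never co-locate two peer replicas of one operator.
        return all(r.operator != replica.operator for r in self.assigned)


def place_replicas(replicas, servers):
    """First-fit decreasing: place the most demanding replicas first,
    each on the first server that satisfies every constraint."""
    ordered = sorted(replicas, key=lambda r: (r.cpu, r.bandwidth), reverse=True)
    placement = {}
    for replica in ordered:
        target = next((s for s in servers if s.fits(replica)), None)
        if target is None:
            raise ValueError(f"no server can host {replica.operator}#{replica.index}")
        target.assigned.append(replica)
        placement[(replica.operator, replica.index)] = target.name
    return placement


if __name__ == "__main__":
    servers = [Server("s1", 5.0, 100.0), Server("s2", 5.0, 100.0), Server("s3", 5.0, 100.0)]
    replicas = [
        Replica("parse", 0, 2.0, 40.0), Replica("parse", 1, 2.0, 40.0),
        Replica("join", 0, 3.0, 30.0), Replica("join", 1, 3.0, 30.0),
        Replica("sink", 0, 1.0, 20.0),
    ]
    print(place_replicas(replicas, servers))
    # {('join', 0): 's1', ('join', 1): 's2', ('parse', 0): 's1', ('parse', 1): 's2', ('sink', 0): 's3'}
```

In this toy run the two `join` replicas and the two `parse` replicas each end up on distinct servers, so losing any single server still leaves one replica of every replicated operator alive. First-fit decreasing is chosen only because it is a simple, well-known bin-packing heuristic; an exact or more economical packing would need a different solver.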
He holds a Ph.D. in computer science from the University of California, San Diego, and is interested in various aspects of distributed systems, including distributed algorithms, concurrency, and scalability.