Abstract
Continuous query processing in data stream management systems (DSMS) has received considerable attention recently. Many applications share the need to process data streams continuously. For most distributed streaming applications, centralized processing of continuous queries over distributed data is simply not viable. This paper addresses the problem of computing approximate answers to continuous join queries over distributed data streams. We present a new method, called DHTJoin, which combines hash-based placement of tuples in a Distributed Hash Table (DHT) with dissemination of queries that exploits the trees embedded in the underlying DHT, thereby incurring little overhead. DHTJoin also handles skew in join attribute values, which can hurt load balancing and result completeness. Our performance evaluation shows that DHTJoin can achieve significant performance gains in terms of network traffic.
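
The general idea of hash-based tuple placement mentioned in the abstract can be illustrated with a minimal sketch. This is not the DHTJoin algorithm itself: it assumes a fixed set of nodes, maps keys with a plain modulo instead of a real DHT key space, and ignores windows, query dissemination, and skew handling. The names `responsible_node`, `Node`, `route`, and `NUM_NODES` are hypothetical and chosen only for illustration.

```python
import hashlib
from collections import defaultdict

NUM_NODES = 8  # hypothetical number of nodes; a real DHT has a dynamic key space


def responsible_node(join_value, num_nodes=NUM_NODES):
    """Map a join attribute value to a node index via a stable hash.

    Assumption: modulo placement stands in for the DHT's own key lookup.
    """
    digest = hashlib.sha1(str(join_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


class Node:
    """Stores tuples of two streams, R and S, indexed by join attribute value."""

    def __init__(self):
        self.stored = {"R": defaultdict(list), "S": defaultdict(list)}

    def insert(self, stream, join_value, tup):
        """Probe the opposite stream for matches, then store the new tuple."""
        other = "S" if stream == "R" else "R"
        partners = self.stored[other][join_value]
        # Always emit results as (R tuple, S tuple) pairs.
        if stream == "R":
            matches = [(tup, p) for p in partners]
        else:
            matches = [(p, tup) for p in partners]
        self.stored[stream][join_value].append(tup)
        return matches


def route(nodes, stream, join_value, tup):
    """Send a tuple to the node responsible for its join attribute value."""
    target = nodes[responsible_node(join_value, len(nodes))]
    return target.insert(stream, join_value, tup)
```

Because tuples from both streams with the same join attribute value hash to the same node, each join result can be produced locally at that node. The same property also suggests why value skew matters: a very frequent value sends all of its tuples to a single node.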
Original language | English |
---|---|
Pages (from-to) | 291-317 |
Number of pages | 27 |
Journal | Distributed and Parallel Databases |
Volume | 26 |
Issue number | 2-3 |
State | Published - Dec 2009 |
Externally published | Yes |
Keywords
- Continuous join queries
- DHT networks
- Data stream management
- Distributed query execution
- Load balancing
- Result completeness