arXiv:2109.05344 [pdf] cs.DL cs.IT
Viewing citation trend of Indian physics and astronomy research papers since 2005 through the lens of some new indicators
Authors: Gopinath Das, Bidyarthi Dutta, Anup Kumar Das
Abstract: The indicator Citation Swing Factor (CSF) has recently been developed to measure the diffusion process quantitatively on the basis of h-core citations, excess citations and total citations. The observed or experimental value of CSF, as follows from the basic definition, is $(d\theta/d\varepsilon)$, which a theoretical calculation shows to equal $(-R^2/he^2)$, where $R$, $h$ and $e$ denote total citations, h-core citations and excess citations, respectively. The latter expression gives the expected or theoretical value of CSF. This paper computes $(d\theta/d\varepsilon)$ for Indian physics research output published in selected Indian journals from 2005 to 2020 and compares it with the corresponding theoretical values. The average error over the entire time span is 2.26%, indicating close agreement between the theoretically expected and practically observed values. Besides, three other scientometric indicators are introduced here, viz. Time-Normalised Total Cited Ratio (TC), Time-Normalised Cited Uncited Ratio (CU) and Time-Normalised Total Uncited Ratio (TU). The numerical values of these indicators are computed for the same sample, and their temporal variations and mutual interrelationships are determined by regression analysis.
Submitted 11 September, 2021; originally announced September 2021. Comments: 9 pages, 3 figures
arXiv:2109.02186 [pdf, other]
cs.NI eess.SY
Achieving QoS for Real-Time Bursty Applications over Passive Optical Networks
Authors: Dibbendu Roy, Aravinda S. Rao, Tansu Alpcan, Goutam Das, Marimuthu Palaniswami
Abstract: Emerging real-time applications such as those classified under ultra-reliable low latency communications (uRLLC) generate bursty traffic and have strict Quality of Service (QoS) requirements. Passive Optical Network (PON) is a popular access network technology, which is envisioned to handle such applications at the access segment of the network. However, the existing standards cannot handle strict QoS constraints. The available solutions rely on instantaneous heuristic decisions and maintain QoS constraints (mostly bandwidth) only in an average sense. Existing works with optimal strategies are computationally complex and are not suitable for uRLLC applications. This paper presents a novel, computationally efficient, far-sighted bandwidth allocation policy for facilitating bursty traffic in a PON framework while satisfying the strict QoS (age of information/delay and bandwidth) requirements of modern applications. To this end, we first design a delay-tracking mechanism which allows us to model the resource allocation problem from a control-theoretic viewpoint as a Model Predictive Control (MPC) problem. MPC helps in taking far-sighted decisions regarding resource allocation and captures the time-varying dynamics of the network. We provide computationally efficient polynomial-time solutions and show their implementation in the PON framework.
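The control-theoretic idea sketched in the abstract above, tracking backlog and choosing grants by optimizing over a prediction horizon, can be illustrated with a toy receding-horizon allocator. This is only a sketch of the MPC principle, not the paper's algorithm; the function name, the quadratic backlog penalty, the bandwidth weight, and the discrete grant set are all illustrative assumptions.

```python
from itertools import product

def mpc_grant(q0, arrivals_pred, capacity, horizon, grants, backlog_target):
    """Toy receding-horizon (MPC-style) grant sizing for one ONU queue.

    q0             : current queue backlog (packets)
    arrivals_pred  : predicted arrivals for each future slot (len >= horizon)
    capacity       : maximum grant per slot
    grants         : candidate grant sizes to search over (assumed discrete)
    backlog_target : backlog level above which delay violations are penalised
    Returns the grant to apply in the current slot.
    """
    best_cost, best_plan = float("inf"), None
    # Exhaustively evaluate every grant sequence over the horizon (fine for
    # tiny grant sets; a real implementation would use an efficient solver).
    for plan in product(grants, repeat=horizon):
        if any(g > capacity for g in plan):
            continue
        q, cost = q0, 0.0
        for t, g in enumerate(plan):
            q = max(q + arrivals_pred[t] - g, 0)  # queue dynamics
            # Penalise backlog above the target, plus a small bandwidth cost.
            cost += max(q - backlog_target, 0) ** 2 + 0.01 * g
        if cost < best_cost:
            best_cost, best_plan = cost, plan
    return best_plan[0]  # receding horizon: apply only the first decision

# Example: backlog 5, predicted arrivals 3/slot, candidate grants {0, 2, 4}.
g = mpc_grant(q0=5, arrivals_pred=[3, 3, 3], capacity=4,
              horizon=3, grants=[0, 2, 4], backlog_target=2)
```

At each slot only the first grant of the optimal plan is applied and the optimization is re-run, which is what lets MPC react to time-varying traffic.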
Compared to existing approaches, MPC reduces delay violations by approximately 15% for a delay-constrained application with a 1 ms target. Our approach is also robust to varying traffic arrivals.
Submitted 5 September, 2021; originally announced September 2021.
arXiv:2108.08235 [pdf, ps, other] cs.IT eess.SP
Adaptive Rate NOMA for Cellular IoT Networks
Authors: G. Sreya, S. Saigadha, Praful D. Mankar, Goutam Das, Harpreet S. Dhillon
Abstract: Internet-of-Things (IoT) technology is envisioned to enable a variety of real-time applications by interconnecting billions of sensors/devices deployed to observe some random physical processes. These IoT devices rely on low-power wide-area wireless connectivity for transmitting, mostly fixed- but small-size, status updates of their associated random processes. Cellular networks are seen as a natural candidate for providing reliable wireless connectivity to IoT devices. However, conventional orthogonal multiple access (OMA) for this massive number of devices is expected to degrade spectral efficiency. As a promising alternative to OMA, cellular base stations (BSs) can employ non-orthogonal multiple access (NOMA) for the uplink transmissions of mobile users and IoT devices. In particular, uplink NOMA can be configured such that the mobile user adapts its transmission rate to its channel condition while the IoT device transmits at a fixed rate. For this setting, we analyze the ergodic capacity of mobile users and the mean local delay of IoT devices using stochastic geometry.
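The adaptive-rate uplink NOMA configuration described above can be illustrated with a small Monte Carlo sketch: the mobile user adapts its rate treating the IoT signal as interference, and after successive interference cancellation the fixed-rate IoT transmission succeeds if the remaining SNR supports it. This is a single-cell, Rayleigh-fading toy model, not the paper's stochastic-geometry analysis; all parameter values and the function name are assumptions.

```python
import math
import random

def noma_uplink_mc(p_mobile=1.0, p_iot=1.0, noise=0.1, r_fixed=1.0,
                   trials=20000, seed=1):
    """Toy Monte Carlo for a two-user uplink NOMA link.

    Rayleigh fading (unit-mean exponential power gains). The mobile user
    adapts its rate to log2(1 + SINR), treating the IoT signal as
    interference; after SIC removes the mobile signal, the fixed-rate IoT
    transmission succeeds if the residual SNR supports r_fixed bits/s/Hz.
    Returns (mobile ergodic capacity estimate, IoT outage probability).
    """
    rng = random.Random(seed)
    cap_sum, outages = 0.0, 0
    for _ in range(trials):
        g_m = rng.expovariate(1.0)  # mobile channel power gain
        g_i = rng.expovariate(1.0)  # IoT channel power gain
        sinr_m = p_mobile * g_m / (p_iot * g_i + noise)
        cap_sum += math.log2(1.0 + sinr_m)
        # After SIC, the IoT device sees only noise.
        if math.log2(1.0 + p_iot * g_i / noise) < r_fixed:
            outages += 1
    return cap_sum / trials, outages / trials
```

Sweeping `r_fixed` in such a simulation shows the trade-off the abstract describes: a larger fixed IoT rate (or packet size) raises the IoT outage, i.e., tightens the achievable delay constraint.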
Our analysis demonstrates that the above NOMA configuration can provide better ergodic capacity for mobile users compared to OMA when the IoT devices' delay constraint is strict. Furthermore, we show that NOMA can support a larger packet size for IoT devices than OMA under the same delay constraint.
Submitted 18 August, 2021; originally announced August 2021.
arXiv:2105.09313 [pdf, ps, other] cs.CG cs.DS
Approximation Algorithms For The Dispersion Problems in a Metric Space
Authors: Pawan K. Mishra, Gautam K. Das
Abstract: In this article, we consider the $c$-dispersion problem in a metric space $(X,d)$. Let $P=\{p_{1}, p_{2}, \ldots, p_{n}\}$ be a set of $n$ points in a metric space $(X,d)$. For each point $p \in P$ and $S \subseteq P$, we define $cost_{c}(p,S)$ as the sum of the distances from $p$ to the nearest $c$ points in $S \setminus \{p\}$, where $c\geq 1$ is a fixed integer. We define $cost_{c}(S)=\min_{p \in S}\{cost_{c}(p,S)\}$ for $S \subseteq P$. In the $c$-dispersion problem, a set $P$ of $n$ points in a metric space $(X,d)$ and a positive integer $k \in [c+1,n]$ are given. The objective is to find a subset $S\subseteq P$ of size $k$ such that $cost_{c}(S)$ is maximized. We propose a simple polynomial-time greedy algorithm that yields a $2c$-factor approximation for the $c$-dispersion problem in a metric space. The best known result for the $c$-dispersion problem in the Euclidean metric space is $2c^2$, where $P \subseteq \mathbb{R}^2$ and the distance function is the Euclidean distance [Amano, K. and Nakano, S.
I., Away from Rivals, CCCG, pp. 68-71, 2018]. We also prove that the $c$-dispersion problem in a metric space is $W[1]$-hard.
Submitted 9 June, 2021; v1 submitted 19 May, 2021; originally announced May 2021. Comments: 9 pages. arXiv admin note: text overlap with arXiv:2105.09217
arXiv:2105.09217 [pdf, other] cs.CG cs.DS
Approximation Algorithms For The Euclidean Dispersion Problems
Authors: Pawan K. Mishra, Gautam K. Das
Abstract: In this article, we consider Euclidean dispersion problems. Let $P=\{p_{1}, p_{2}, \ldots, p_{n}\}$ be a set of $n$ points in $\mathbb{R}^2$. For each point $p \in P$ and $S \subseteq P$, we define $cost_\gamma(p,S)$ as the sum of the Euclidean distances from $p$ to the nearest $\gamma$ points in $S \setminus \{p\}$. We define $cost_\gamma(S)=\min_{p \in S}\{cost_\gamma(p,S)\}$ for $S \subseteq P$. In the $\gamma$-dispersion problem, a set $P$ of $n$ points in $\mathbb{R}^2$ and a positive integer $k \in [\gamma+1,n]$ are given. The objective is to find a subset $S\subseteq P$ of size $k$ such that $cost_\gamma(S)$ is maximized. We consider both the $2$-dispersion and the $1$-dispersion problem in $\mathbb{R}^2$. In addition, we consider the $2$-dispersion problem when the points are placed on a line. In this paper, we propose a simple polynomial-time $(2\sqrt 3 + \varepsilon)$-factor approximation algorithm for the $2$-dispersion problem, for any $\varepsilon > 0$, which improves the best known approximation factor of $4\sqrt3$ [Amano, K. and Nakano, S. I., An approximation algorithm for the $2$-dispersion problem, IEICE Transactions on Information and Systems, Vol. 103(3), pp.
506-508, 2020]. Next, we develop a common framework for designing approximation algorithms for Euclidean dispersion problems. With this framework, we improve the approximation factor to $2\sqrt 3$ for the $2$-dispersion problem in $\mathbb{R}^2$. Using the same framework, we propose a polynomial-time algorithm that returns an optimal solution for the $2$-dispersion problem when the points are placed on a line. Moreover, to show the effectiveness of the framework, we also propose a $2$-factor approximation algorithm for the $1$-dispersion problem in $\mathbb{R}^2$.
Submitted 19 May, 2021; originally announced May 2021. Comments: 17 pages
arXiv:2104.12147 [pdf] cs.NI
Learning Aided Auctioning for Opportunistic Scheduling in a Wireless Optical Network
Authors: Atri Mukhopadhyay, Goutam Das
Abstract: This paper focuses on Service Level Agreement (SLA) based end-to-end Quality of Service (QoS) maintenance across a wireless optical integrated network. The wireless network used is a Long Term Evolution/Long Term Evolution Advanced (LTE/LTE-A) mobile network, and the optical network comprises an Ethernet Passive Optical Network (EPON). The proposal targets opportunistic allocation of any bandwidth that is available after meeting the SLA requirements. Such opportunistic allocation is particularly beneficial when networking applications with varying urgency and QoS requirements are operated at the user end. The opportunistic allocation is carried out with the help of a Vickrey-Clarke-Groves (VCG) auction.
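For reference, the allocation-and-payment core of a VCG auction, specialized to unit-demand bidders competing for identical bandwidth slots (where each winner's VCG payment reduces to the highest losing bid), can be sketched as follows. The bid-preparation and learning-automata parts of the proposal above are not modeled here, and the function name is illustrative.

```python
def vcg_bandwidth_auction(bids, units):
    """VCG auction for `units` identical bandwidth slots, one slot per bidder.

    bids : {bidder: bid value}. The highest `units` bidders win; each pays
    the externality it imposes on the others, which for unit-demand bidders
    and identical items is the (units+1)-th highest bid (the best losing bid).
    Returns (winners list, {winner: payment}).
    """
    order = sorted(bids, key=bids.get, reverse=True)
    winners, losers = order[:units], order[units:]
    # Without any given winner, the displaced loser with the highest bid
    # would have won instead: that bid is the winner's VCG payment.
    price = bids[losers[0]] if losers else 0.0
    return winners, {w: price for w in winners}

# Example: three ONUs bidding for two spare slots.
winners, payments = vcg_bandwidth_auction({"onu1": 5.0, "onu2": 3.0, "onu3": 2.0}, 2)
```

Charging the externality rather than the bid is what makes truthful bidding a dominant strategy, which is presumably why a VCG design pairs well with the payment-learning loop the abstract describes.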
The proposal allows the users of the integrated network to decide the payment they want to make in order to opportunistically obtain bandwidth. Learning automata are used to let the users converge to the optimal payment value based on the network load. The payment made by the users is later used by the optical network units of the EPON to prepare the bids for the auction. The proposal has been verified through extensive simulations.
Submitted 25 April, 2021; originally announced April 2021.
arXiv:2103.08875 [pdf, other] cs.PF cs.NI
Queuing Analysis of Opportunistic Cognitive Radio IoT Network with Imperfect Sensing
Authors: Asif Ahmed Sardar, Dibbendu Roy, Washim Uddin Mondal, Goutam Das
Abstract: In this paper, we analyze a Cognitive Radio-based Internet-of-Things (CR-IoT) network comprising a Primary Network Provider (PNP) and an IoT operator. The PNP uses its licensed spectrum to serve its users. The IoT operator identifies the white-space in the licensed band at regular intervals and opportunistically exploits it to serve the IoT nodes under its coverage. IoT nodes are battery-operated devices that require periodic energy replenishment. We employ the Microwave Power Transfer (MPT) technique for its superior energy transfer efficiency over long distances. The white-space detection process is not always perfect, and the IoT operator may jeopardize the PNP's transmissions due to misdetection. To reduce the possibility of such interference, some of the spectrum holes must remain unutilized, even when the IoT nodes have data to transmit.
The IoT operator needs to decide what percentage of the white-space to keep unutilized, and how to use the rest judiciously for data transmission and energy replenishment, to maintain an optimal balance between the average interference inflicted on the PNP's users and the Quality-of-Service (QoS) experienced by the IoT nodes. Due to the periodic nature of the spectrum-sensing process, a Discrete-Time Markov Chain (DTMC) can realistically model this framework. In the literature, the activities of the PNP and the IoT operator are assumed to be mutually exclusive for ease of analysis. Our model incorporates possible overlaps between these activities, making the analysis more realistic. Using our model, the sustainability region of the CR-IoT network can be obtained. The accuracy of our analysis is demonstrated via extensive simulation.
Submitted 16 March, 2021; originally announced March 2021.
arXiv:2007.11997 [pdf, other] cs.DS cs.CC
Total Domination in Unit Disk Graphs
Authors: Sangram K. Jena, Gautam K. Das
Abstract: Let $G=(V,E)$ be an undirected graph. We call $D_t \subseteq V$ a total dominating set (TDS) of $G$ if each vertex $v \in V$ has a dominator in $D_t$ other than itself. Here we consider the TDS problem in unit disk graphs, where the objective is to find a minimum-cardinality total dominating set of an input graph. We prove that the TDS problem is NP-hard in unit disk graphs. Next, we propose an 8-factor approximation algorithm for the problem.
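The total-domination requirement stated above, that every vertex (including members of the set) must have a neighbor inside the set, can be made concrete with a simple greedy heuristic. This is a generic heuristic with no approximation guarantee, not the paper's 8-factor algorithm for unit disk graphs; it assumes the input graph has no isolated vertex (otherwise no TDS exists).

```python
def greedy_total_dominating_set(adj):
    """Greedy heuristic for a total dominating set.

    adj : {vertex: set of neighbours}. A vertex dominates exactly its
    neighbours, never itself, so the loop ends only when every vertex
    (including chosen ones) has a neighbour in the returned set.
    Repeatedly picks the vertex covering the most still-undominated vertices.
    """
    undominated = set(adj)
    dset = []
    while undominated:
        v = max(adj, key=lambda u: len(adj[u] & undominated))
        if not adj[v] & undominated:
            # Only possible if some undominated vertex has no neighbour.
            raise ValueError("graph has an isolated vertex; no TDS exists")
        dset.append(v)
        undominated -= adj[v]
    return dset

# Example: the path a-b-c-d; {b, c} totally dominates it.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
tds = greedy_total_dominating_set(path)
```

The subtlety versus ordinary domination is visible in the update step: adding `v` marks only `v`'s neighbours as dominated, so `v` itself still needs a neighbour added later unless one is already in the set.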
The running time of the proposed approximation algorithm is $O(n \log k)$, where $n$ is the number of vertices of the input graph and $k$ is the output size. We also show that the TDS problem admits a PTAS in unit disk graphs.
Submitted 23 July, 2020; originally announced July 2020.
arXiv:2007.00287 [pdf, ps, other] cs.IT cs.NI doi: 10.1109/LCOMM.2020.3002532
On Exact Distribution of Poisson-Voronoi Area in $K$-tier HetNets with Generalized Association Rule
Authors: Washim Uddin Mondal, Goutam Das
Abstract: This letter characterizes the exact distribution function of a typical Voronoi area in a $K$-tier Poisson network. The users obey a generalized association (GA) rule, which is a superset of the nearest-base-station association and maximum-received-power-based association (with arbitrary fading) rules commonly adopted in the literature. Combining Robbins' theorem and the probability generating functional of a Poisson point process, we obtain the exact moments of a typical $k$-th tier Voronoi area, $k \in \{1,\ldots,K\}$, under the GA rule. We apply this result in several special cases. For example, we prove that in multi-tier networks with the GA rule, the mean of the $k$-th tier Voronoi area can be expressed exactly in closed form. We also obtain simplified expressions for its higher-order moments for both average and instantaneous received-power-based user association. In single-tier networks with exponential fading, the latter association rule yields a closed-form expression for the second-order moment of a typical Voronoi area.
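The moments being characterized above can be checked numerically in the simplest special case. The sketch below is a hedged Monte Carlo estimate for a single tier under nearest-BS association, far simpler than the paper's exact $K$-tier results: by Slivnyak's theorem one may condition on a BS at the origin and measure its cell, and the exact mean area is $1/\lambda$. Window size, trial counts, and the helper name are assumptions chosen for speed.

```python
import math
import random

def _poisson(rng, mean):
    # Knuth's method; adequate for the modest means used here.
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def typical_voronoi_area_moments(lam=1.0, window=6.0, trials=250,
                                 samples=250, seed=0):
    """Monte Carlo moments of the typical Poisson-Voronoi cell area.

    Drops a Poisson point process of intensity lam in a window centred on a
    BS at the origin and estimates the origin cell's area by uniform
    sampling (a location belongs to the origin cell if the origin is its
    nearest BS). Returns (first moment, second moment). Exact mean is 1/lam;
    edge effects are negligible for the default window.
    """
    rng = random.Random(seed)
    half, areas = window / 2.0, []
    for _ in range(trials):
        n = _poisson(rng, lam * window * window)
        pts = [(rng.uniform(-half, half), rng.uniform(-half, half))
               for _ in range(n)]
        hits = 0
        for _ in range(samples):
            x, y = rng.uniform(-half, half), rng.uniform(-half, half)
            d0 = x * x + y * y  # squared distance to the origin BS
            if all((x - px) ** 2 + (y - py) ** 2 >= d0 for px, py in pts):
                hits += 1
        areas.append(window * window * hits / samples)
    m1 = sum(areas) / trials
    m2 = sum(a * a for a in areas) / trials
    return m1, m2
```

Such a simulation is the natural sanity check for the closed-form moment expressions the letter derives.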
We numerically evaluate this exact expression and compare it with an approximate result.
Submitted 1 July, 2020; originally announced July 2020. Journal ref: IEEE Communications Letters, 2020
arXiv:2006.15381 [pdf, other] cs.DS cs.CC
The Generalized Independent and Dominating Set Problems on Unit Disk Graphs
Authors: Sangram K. Jena, Ramesh K. Jallu, Gautam K. Das, Subhas C. Nandy
Abstract: In this article, we study generalized versions of the maximum independent set and minimum dominating set problems, namely, the maximum $d$-distance independent set problem and the minimum $d$-distance dominating set problem on unit disk graphs for a positive integer $d>0$. We first show that both the maximum $d$-distance independent set problem and the minimum $d$-distance dominating set problem are NP-hard. Next, we propose simple polynomial-time constant-factor approximation algorithms and PTASs for both problems.
Submitted 27 June, 2020; originally announced June 2020.
arXiv:2006.00390 [pdf, other] cs.NI cs.GT
Centralized and Decentralized Non-Cooperative Load-Balancing Games among Federated Cloudlets
Authors: Sourav Mondal, Goutam Das, Elaine Wong
Abstract: Edge computing servers such as cloudlets, deployed by different service providers across access networks, compensate for the scarce computational, memory, and energy resources of mobile devices. However, depending on the mobility patterns and dynamically varying computational requirements of the associated mobile devices, cloudlets in different parts of the network become either overloaded or under-loaded. Hence, load balancing among neighboring cloudlets is an essential research problem. Nonetheless, the existing load-balancing frameworks are unsuitable for low-latency applications. Thus, in this paper, we propose an economic and non-cooperative load-balancing game for low-latency applications among federated neighboring cloudlets, from the same as well as different service providers, with heterogeneous classes of job requests. First, we propose a centralized incentive mechanism to compute the pure-strategy Nash equilibrium load-balancing strategies of the cloudlets under the supervision of a neutral mediator. With this mechanism, we ensure that truthful revelation of private information to the mediator is a weakly-dominant strategy for all the federated cloudlets. Second, we propose a continuous-action reinforcement-learning-automata-based algorithm, which allows each cloudlet to independently compute the Nash equilibrium in a completely distributed network setting. We critically study the convergence properties of the designed learning algorithm, scaffolding our understanding of the underlying load-balancing game for faster convergence. Furthermore, through extensive simulations, we study the impact of exploration and exploitation on learning accuracy.
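The flavor of a non-cooperative load-balancing game between cloudlets can be conveyed with a toy best-response iteration; this is neither the paper's incentive mechanism nor its continuous-action learning automata. Two cloudlets each choose what fraction of their arrivals to forward to the other, selfishly minimizing their own jobs' mean M/M/1 delay plus a fixed network delay for forwarded jobs; all rates and the network-delay value are assumed.

```python
def best_response_load_balancing(lams=(8.0, 2.0), mus=(10.0, 10.0),
                                 net_delay=0.05, grid=101, rounds=50):
    """Best-response dynamics for a toy two-cloudlet load-balancing game.

    x[i] is the fraction of cloudlet i's arrivals forwarded to its
    neighbour. Each cloudlet repeatedly grid-searches its own best response
    given the other's current strategy; a fixed point approximates a
    pure-strategy Nash equilibrium. Returns the final fractions [x0, x1].
    """
    def delay(load, mu):  # M/M/1 mean sojourn time, infinite if unstable
        return 1.0 / (mu - load) if load < mu else float("inf")

    x = [0.0, 0.0]
    for _ in range(rounds):
        for i in (0, 1):
            j = 1 - i
            best, best_cost = x[i], float("inf")
            for k in range(grid):
                xi = k / (grid - 1)
                load_i = lams[i] * (1 - xi) + lams[j] * x[j]
                load_j = lams[j] * (1 - x[j]) + lams[i] * xi
                # Mean delay of cloudlet i's own jobs (kept + forwarded).
                cost = ((1 - xi) * delay(load_i, mus[i])
                        + xi * (delay(load_j, mus[j]) + net_delay))
                if cost < best_cost:
                    best, best_cost = xi, cost
            x[i] = best
    return x
```

With the heavily asymmetric default loads, the busy cloudlet learns to forward a substantial fraction of its jobs, which is the selfish analogue of the federated balancing the paper formalizes.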
This is the first study to show the effectiveness of reinforcement learning algorithms for load-balancing games among neighboring cloudlets.
Submitted 5 May, 2021; v1 submitted 30 May, 2020; originally announced June 2020.
arXiv:2005.13913 [pdf, other] cs.CC cs.DS
Liar's Domination in Unit Disk Graphs
Authors: Ramesh K. Jallu, Sangram K. Jena, Gautam K. Das
Abstract: In this article, we study a variant of the minimum dominating set problem known as the minimum liar's dominating set (MLDS) problem. We prove that the MLDS problem is NP-hard in unit disk graphs. Next, we show that the recent sub-quadratic time $\frac{11}{2}$-factor approximation algorithm \cite{bhore} for the MLDS problem is erroneous, and propose a simple $O(n + m)$ time 7.31-factor approximation algorithm, where $n$ and $m$ are the number of vertices and edges in the input unit disk graph, respectively. Finally, we prove that the MLDS problem admits a polynomial-time approximation scheme.
Submitted 28 May, 2020; originally announced May 2020.
arXiv:1909.10918 [pdf, other] cs.NI
Protocol design for energy efficient OLT transmitter in TWDM-PON guaranteeing SLA of up-stream and down-stream traffic
Authors: Sourav Dutta, Dibbendu Roy, Goutam Das
Abstract: Environmental and economic concerns promote research on designing energy-efficient Time and Wavelength Division Multiplexed Ethernet Passive Optical Networks (TWDM-EPON), the future extension of TDM-EPON.
In TDM-EPON, a plethora of research already exists on achieving energy savings at Optical Network Units (ONUs), which can easily be applied to TWDM-EPON ONUs. However, TWDM-EPON provides an additional opportunity for saving energy at the Optical Line Terminal (OLT). Existing protocols have primarily been designed for saving energy at the OLT receivers. Protocols that save energy at the OLT receivers depend only on Up-Stream (US) traffic scheduling, while their transmitter counterparts depend on both US and Down-Stream (DS) scheduling, since the OLT transmits GATE messages along with DS traffic. US and DS scheduling differ in a basic way: the MAC protocol does not allow the US traffic of an ONU to be scheduled over multiple disjoint time slots after its REPORT arrives, whereas this restriction is absent for DS traffic, so the grant of an ONU can be partitioned and each part scheduled at a different time. In this paper, we propose a method for saving energy at the OLT transmitters in TWDM-EPON while satisfying the SLAs. This includes a heuristic algorithm to partition the DS grants and schedule the parts. Through extensive simulations, we demonstrate that the proposed method provides a significant improvement in energy efficiency compared to existing protocols (up to 45%).
Submitted 24 September, 2019; originally announced September 2019.
arXiv:1907.11416 [pdf, other] cs.CC cs.DM math.CO
On $d$-distance $m$-tuple ($\ell, r$)-domination in graphs
Authors: Sangram K. Jena, Ramesh K. Jallu, Gautam K.
Das
Abstract: In this article, we study the $d$-distance $m$-tuple $(\ell, r)$-domination problem. Given a simple undirected graph $G=(V, E)$ and positive integers $d, m, \ell$ and $r$, a subset $V' \subseteq V$ is said to be a $d$-distance $m$-tuple $(\ell, r)$-dominating set if it satisfies the following conditions: (i) each vertex $v \in V$ is $d$-distance dominated by at least $m$ vertices in $V'$, and (ii) each subset $U$ of $V$ of size $r$ is $d$-distance dominated by at least $\ell$ vertices in $V'$. Here, a vertex $v$ is $d$-distance dominated by another vertex $u$ if the shortest-path distance between $u$ and $v$ is at most $d$ in $G$. A set $U$ is $d$-distance dominated by $\ell$ vertices if the size of the union of the $d$-distance neighborhoods of the vertices of $U$ within $V'$ is at least $\ell$. The objective of the $d$-distance $m$-tuple $(\ell, r)$-domination problem is to find a minimum-size subset $V' \subseteq V$ satisfying the above two conditions. We prove that deciding whether a graph $G$ has (i) a $1$-distance $m$-tuple $(\ell, r)$-dominating set of cardinality at most $k$ for each fixed value of $m, \ell$, and $r$, and (ii) a $d$-distance $m$-tuple $(\ell, 2)$-dominating set of cardinality at most $k$ for each fixed value of $d\ (> 1), m$, and $\ell$, is NP-complete (here $k$ is a positive integer).
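The two conditions defined above translate directly into a verification routine. The sketch below checks a candidate set against conditions (i) and (ii) using BFS for the $d$-distance neighborhoods; following the definition's "another vertex" wording, a vertex is assumed not to count as dominating itself. Function and variable names are illustrative.

```python
from collections import deque
from itertools import combinations

def is_dm_lr_dominating(adj, dset, d, m, l, r):
    """Check whether dset is a d-distance m-tuple (l, r)-dominating set.

    adj  : {vertex: set of neighbours} of a simple undirected graph
    dset : candidate set V'
    """
    def dominators(v):
        # Members of dset within graph distance d of v, excluding v itself.
        dist, queue = {v: 0}, deque([v])
        while queue:
            u = queue.popleft()
            if dist[u] == d:
                continue  # do not expand beyond distance d
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        return {u for u in dist if u in dset and u != v}

    cover = {v: dominators(v) for v in adj}
    # (i) every vertex is d-distance dominated by >= m vertices of dset
    if any(len(cover[v]) < m for v in adj):
        return False
    # (ii) every r-subset U is jointly dominated by >= l vertices of dset
    return all(len(set().union(*(cover[u] for u in U))) >= l
               for U in combinations(adj, r))
```

Such a checker is the natural certificate verifier behind the NP-completeness claims: it runs in polynomial time for fixed $r$.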
We also prove that for any $\varepsilon>0$, the $1$-distance $m$-tuple $(\ell, r)$-domination problem and the $d$-distance $m$-tuple $(\ell,2)$-domination problem cannot be approximated within factors of $(\frac{1}{2}- \varepsilon)\ln |V|$ and $(\frac{1}{4}- \varepsilon)\ln |V|$, respectively, unless $P = NP$.
Submitted 19 April, 2021; v1 submitted 26 July, 2019; originally announced July 2019.
arXiv:1905.07143 [pdf, ps, other] cs.IT
Buffer-aided Resource Allocation for a Price Based Opportunistic Cognitive Radio Network
Authors: Nilanjan Biswas, Goutam Das, Priyadip Ray
Abstract: In this paper, a resource allocation problem for an opportunistic cooperative cognitive radio network is considered, in which cognitive radio nodes send their hard decisions to the fusion center. The fusion center plays a dual role: it makes the global decision (i.e., the decision about the primary user's activity) and allocates transmission time durations among the cognitive radio nodes. Revenue-based utility functions are considered at the fusion center and the cognitive radio nodes. An optimization problem is formulated to maximize the fusion center's revenue while satisfying well-defined constraints. User selection among the cognitive radio nodes is performed in order to make the optimization problem feasible.
Submitted 17 May, 2019; originally announced May 2019. Comments: 32 pages, 8 figures, Journal
arXiv:1903.10000 [pdf, other]
cs.DB cs.LG
Approximate Query Processing using Deep Generative Models
Authors: Saravanan Thirumuruganathan, Shohedul Hasan, Nick Koudas, Gautam Das
Abstract: Data is generated at an unprecedented rate, surpassing our ability to analyze it. The database community has pioneered many novel techniques for Approximate Query Processing (AQP) that can give approximate results in a fraction of the time needed to compute exact results. In this work, we explore the use of deep learning (DL) for answering aggregate queries, specifically for interactive applications such as data exploration and visualization. We use deep generative models, an unsupervised-learning-based approach, to learn the data distribution faithfully, so that aggregate queries can be answered approximately by generating samples from the learned model. The model is often compact (a few hundred kilobytes), so that arbitrary AQP queries can be answered on the client side without contacting the database server. Our other contributions include identifying model bias and minimizing it through a rejection-sampling-based approach, and an algorithm for building model ensembles for AQP for improved accuracy. Our extensive experiments show that our proposed approach can provide answers with high accuracy and low latency.
Submitted 18 November, 2019; v1 submitted 24 March, 2019; originally announced March 2019. Comments: Accepted to ICDE 2020 as "Approximate Query Processing for Data Exploration using Deep Generative Models"
arXiv:1903.09999 [pdf, other]
cs.DB cs.LG
Multi-Attribute Selectivity Estimation Using Deep Learning
Authors: Shohedul Hasan, Saravanan Thirumuruganathan, Jees Augustine, Nick Koudas, Gautam Das
Abstract: Selectivity estimation - the problem of estimating the result size of queries - is a fundamental problem in databases. Accurate estimation of query selectivity involving multiple correlated attributes is especially challenging. Poor cardinality estimates can result in the selection of bad plans by the query optimizer. We investigate the feasibility of using deep-learning-based approaches for both point and range queries and propose two complementary approaches. Our first approach considers selectivity estimation as an unsupervised deep density estimation problem. We successfully introduce techniques from neural density estimation for this purpose. The key idea is to decompose the joint distribution into a set of tractable conditional probability distributions such that they satisfy the autoregressive property. Our second approach formulates selectivity estimation as a supervised deep learning problem that predicts the selectivity of a given query. We also introduce and address a number of practical challenges that arise when adapting deep learning to relational data. These include query/data featurization, incorporating query workload information in a deep learning framework, and the dynamic scenario where both data and workload queries can be updated.
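The autoregressive decomposition behind the first approach can be demonstrated in miniature: factor the joint distribution of the attributes into a chain of conditionals and multiply the per-attribute selectivities. In the sketch below each conditional is estimated by direct counting, which makes the chain exact for point queries; the paper replaces this counting with a learned neural density model, and the function name is illustrative.

```python
def autoregressive_selectivity(rows, query):
    """Chain-rule selectivity estimate for a conjunctive point query.

    rows  : list of tuples (the table)
    query : {column_index: required value}
    Factorises P(a_1 = v_1, a_2 = v_2, ...) into a product of conditionals
    P(a_i = v_i | earlier predicates), the autoregressive decomposition.
    """
    selectivity = 1.0
    matching = rows
    for col in sorted(query):
        if not matching:
            return 0.0
        hits = [r for r in matching if r[col] == query[col]]
        # Conditional probability of this predicate given the earlier ones.
        selectivity *= len(hits) / len(matching)
        matching = hits
    return selectivity

# Example: P(c0 = 2) * P(c1 = 'a' | c0 = 2) = 0.5 * 1.0 = 0.5
table = [(1, "a"), (1, "b"), (2, "a"), (2, "a")]
sel = autoregressive_selectivity(table, {0: 2, 1: "a"})
```

The point of the decomposition is that each conditional is low-dimensional and therefore tractable to model, even when the attributes are strongly correlated.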
Our extensive experiments, with a special emphasis on queries with a large number of predicates and/or small result sizes, demonstrate that the proposed techniques provide fast and accurate selectivity estimates with minimal space overhead. Submitted 17 June, 2019; v1 submitted 24 March, 2019; originally announced March 2019. arXiv:1902.04004 [pdf, other] cs.NI A 1-approximation algorithm for energy-efficient TDM-PON guaranteeing SLA of up-stream and down-stream traffic Authors: Sourav Dutta, Dibbendu Roy, Goutam Das Abstract: Economical and environmental concerns necessitate research on designing energy-efficient optical access networks, especially the Ethernet Passive Optical Network (EPON), one of the most widely accepted and deployed last-mile access networks. In this paper, our primary focus is on designing a protocol for saving energy at Optical Network Units (ONUs) while satisfying the Service Level Agreement (SLA). The SLA of both Up-Stream (US) and Down-Stream (DS) traffic can be satisfied only if the EPON network can react to instantaneous load changes during sleep periods of ONUs; to the best of our knowledge, no such proposal exists. Towards this target, we propose a mechanism that allows the Optical Line Terminal (OLT) to force ONUs to wake up from sleep mode. Here, we demonstrate that if the OLT can distribute the active ONUs (those with active transceivers) fairly among cycles, it provides a significant improvement in energy efficiency.
To achieve this, we formulate an ILP for fairly distributing active ONUs among cycles while simultaneously satisfying the SLA of both US and DS traffic. A polynomial-time $1$-approximation algorithm is proposed for solving this ILP. The convergence and complexity analysis of the algorithm are also performed. Extensive simulations show that a fair distribution of ONUs reduces the power consumption and the average delay at the same time, and that the reduction increases with the number of ONUs and the round-trip time. Submitted 11 February, 2019; originally announced February 2019. arXiv:1812.08605 [pdf, other] cs.NI Design of Energy-efficient EPON: a Novel Protocol Proposal and its Performance Analysis Authors: Sourav Dutta, Goutam Das Abstract: Economical and environmental concerns necessitate network engineers to focus on energy-efficient access network design. The optical network units (ONUs), being predominantly responsible for the energy consumption of the Ethernet Passive Optical Network (EPON), motivate us to design a novel protocol for saving energy at the ONU. The proposed protocol exploits different low power modes (LPM) and opts for the suitable one using traffic prediction. This scheme provides a significant improvement in energy efficiency, especially at high load (~40%), over existing protocols. A better understanding of the performance and a deeper insight into several design aspects can only be obtained through a detailed mathematical analysis.
The proposed protocol involves traffic prediction, which infringes the Markovian property. However, some pragmatic assumptions, along with a proper selection of observation instances and state descriptions, allow us to form a Discrete Time Markov Chain (DTMC) of the proposed algorithm. Thus, the primary objective of this paper is to propose a novel scheme for achieving energy efficiency at the ONU and to mathematically analyze its performance with the help of a DTMC. The analysis reveals that energy efficiency is more sensitive to the power consumption of doze mode than to the other LPMs, while the effect of sleep-to-wake-up time is minor. Submitted 20 December, 2018; originally announced December 2018. arXiv:1807.05258 [pdf, other] cs.DB QR2: A Third-party Query Reranking Service Over Web Databases Authors: Yeshwanth D. Gunasekaran, Abolfazl Asudeh, Sona Hasani, Nan Zhang, Ali Jaoua, Gautam Das Abstract: The ranked retrieval model has rapidly become the de-facto way of processing search queries in web databases. Despite extensive efforts on designing better ranking mechanisms, in practice many such databases fail to address the diverse and sometimes contradicting preferences of users. In this paper, we present QR2, a third-party service that uses nothing but the public search interface of a web database and enables on-the-fly processing of queries with any user-specified ranking function, whether or not that function is supported by the database. Submitted 13 July, 2018; originally announced July 2018.
Comments: 34th IEEE International Conference on Data Engineering (ICDE Demo), 2018 arXiv:1807.03128 [pdf] cs.CV PRED18: Dataset and Further Experiments with DAVIS Event Camera in Predator-Prey Robot Chasing Authors: Diederik Paul Moeys, Daniel Neil, Federico Corradi, Emmett Kerr, Philip Vance, Gautham Das, Sonya A. Coleman, Thomas M. McGinnity, Dermot Kerr, Tobi Delbruck Abstract: Machine vision systems using convolutional neural networks (CNNs) for robotic applications are increasingly being developed. Conventional vision CNNs are driven by camera frames at a constant sample rate, thus achieving a fixed latency and power consumption tradeoff. This paper describes further work on the first experiments with a closed-loop robotic system integrating a CNN together with a Dynamic and Active Pixel Vision Sensor (DAVIS) in a predator/prey scenario. The DAVIS, mounted on the predator Summit XL robot, produces frames at a fixed 15 Hz frame rate and Dynamic Vision Sensor (DVS) histograms containing 5k ON and OFF events at a variable frame rate ranging from 15-500 Hz, depending on the robot speeds. In contrast to conventional frame-based systems, the latency and processing cost depend on the rate of change of the image. The CNN is trained offline on the 1.25 h labeled dataset to recognize the position and size of the prey robot in the field of view of the predator. During inference, combining the ten output classes of the CNN allows extracting the analog position vector of the prey relative to the predator with a mean 8.7% error in angular estimation.
The system is compatible with conventional deep learning technology, but achieves a variable latency-power tradeoff that adapts automatically to the scene dynamics. Finally, investigations on the robustness of the algorithm, a human performance comparison, and a deconvolution analysis are also explored. Submitted 2 July, 2018; originally announced July 2018. Comments: 8 pages Journal ref: IEEE EBCCSP 2018 arXiv:1804.02831 [pdf, other] cs.IT A Novel Geometry-based Stochastic Double Directional Analytical Model for Millimeter Wave Outdoor NLOS Channels Authors: Rakesh R. T., Debarati Sen, Goutam Das Abstract: Millimeter wave (mmWave) communications, which essentially employ directional antennas, find applications spanning from indoor short-range wireless personal area networks to outdoor cellular networks. A thorough understanding of mmWave signal propagation through the wireless channel provides valuable insights for the design of such networks, which in turn dictates the achievable performance limits. High path loss, penetration loss, and negligible signal scattering are distinctive features of the mmWave channel which necessitate the development of new mathematical models. Importantly, the impact of directional antennas, which spatially filter multi-path components, needs to be embodied as an integral part of the channel model.
In this paper, we model outdoor directional non-line-of-sight mmWave channels using a combination of stochastic geometry and image theory, expressing channel parameters as joint functions of the transmitter-receiver separation distance, the antenna half power beamwidth, and the antenna beam pointing direction. By approximating the signal propagation paths due to first and second order reflections from buildings, closed-form analytical expressions for the average number of first order reflection components, the path loss, and the root-mean-square delay spread of the channel are derived. The accuracy of the model is verified by comparing numerically obtained results with experimental data reported by various urban outdoor peer-to-peer mmWave measurements, demonstrating the usefulness of the proposed analytical model for performance evaluation of mmWave communication networks. Submitted 9 April, 2018; originally announced April 2018. Comments: 14 pages arXiv:1802.10303 [pdf, other] cs.DB RRR: Rank-Regret Representative Authors: Abolfazl Asudeh, Azade Nazi, Nan Zhang, Gautam Das, H. V. Jagadish Abstract: Selecting the best items in a dataset is a common task in data exploration. However, the concept of "best" lies in the eyes of the beholder: different users may consider different attributes more important, and hence arrive at different rankings. Nevertheless, one can remove "dominated" items and create a "representative" subset of the data set, comprising the "best items" in it.
A Pareto-optimal representative is guaranteed to contain the best item of each possible ranking, but it can be almost as big as the full data. A smaller representative can be found if we relax the requirement to include the best item for every possible user, and instead just limit the users' "regret". Existing work defines regret as the loss in score from limiting consideration to the representative instead of the full data set, for any chosen ranking function. However, the score is often not a meaningful number, and users may not understand its absolute value. Sometimes small ranges in score can include large fractions of the data set. In contrast, users do understand the notion of rank ordering. Therefore, we instead consider the position of items in the ranked list to define regret, and propose the {\em rank-regret representative} as the minimal subset of the data containing at least one of the top-$k$ items of every possible ranking function. This problem is NP-complete. We use the geometric interpretation of items to bound their ranks over ranges of functions, and utilize notions from combinatorial geometry to develop effective and efficient approximation algorithms for the problem. Experiments on real datasets demonstrate that we can efficiently find small subsets with small rank-regrets. Submitted 3 March, 2018; v1 submitted 28 February, 2018; originally announced February 2018. arXiv:1712.09752 [pdf, other] cs.DB Designing Fair Ranking Schemes Authors: Abolfazl Asudeh, H. V. Jagadish, Julia Stoyanovich, Gautam Das Abstract: Items from a database are often ranked based on a combination of multiple criteria. A user may have the flexibility to accept combinations that weigh these criteria differently, within limits. On the other hand, this choice of weights can greatly affect the fairness of the produced ranking.
In this paper, we develop a system that helps users choose criterion weights that lead to greater fairness. We consider ranking functions that compute the score of each item as a weighted sum of (numeric) attribute values, and then sort items by their score. Each ranking function can be expressed as a vector of weights, or as a point in a multi-dimensional space. For a broad range of fairness criteria, we show how to efficiently identify regions in this space that satisfy these criteria. Using this identification method, our system is able to tell users whether their proposed ranking function satisfies the desired fairness criteria and, if it does not, to suggest the smallest modification that does. We develop user-controllable approximation and indexing techniques that are applied during preprocessing, and support sub-second response times during the online phase. Our extensive experiments on real datasets demonstrate that our methods are able to find solutions that satisfy fairness criteria effectively and efficiently. Submitted 4 January, 2018; v1 submitted 27 December, 2017; originally announced December 2017. arXiv:1710.03253 [pdf] cs.NI doi: 10.1109/JSYST.2020.2991325 Low Complexity Fair Scheduling in LTE/LTE-A Uplink Involving Multiple Traffic Classes Authors: Atri Mukhopadhyay, Goutam Das Abstract: The bulk of the research on Long Term Evolution/Long Term Evolution-Advanced packet scheduling is concentrated on the downlink; the uplink is comparatively less explored.
In the uplink, channel-aware scheduling with throughput maximization has been widely studied under an infinitely backlogged buffer model, which makes the investigations unrealistic. Therefore, we propose an optimal uplink packet scheduling procedure with realistic traffic sources. Firstly, we advocate a joint channel- and buffer-aware algorithm, which maximizes the actual transmitted bit count. Thereafter, we introduce delay constraints in our algorithm to support real-time traffic. We further enhance our algorithm by incorporating the varied delay and throughput requirements demanded by mixed traffic classes. Finally, we introduce priority flipping to minimize bandwidth starvation of lower-priority traffic in the presence of a high percentage of high-priority traffic. We observe that a delay constraint may render the optimization-based proposals infeasible. Therefore, to avoid infeasibility, we replace the delay constraint with delay outage minimization (DOM). DOM aims at minimizing the packet drop due to delay violation. Moreover, DOM also helps in reducing the problems to the well-known assignment problem, which can be solved by applying the Hungarian algorithm. Hence, our approach delivers an optimal allocation with low computational complexity. Submitted 15 May, 2020; v1 submitted 9 October, 2017; originally announced October 2017. Comments: in IEEE Systems Journal arXiv:1709.04616 [pdf, other]
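Aside: the reduction to an assignment problem in the abstract above can be illustrated with a toy utility matrix. The numbers and names here are hypothetical; for clarity this sketch brute-forces the optimum over all permutations, whereas the Hungarian algorithm the paper applies finds the same optimum in polynomial (O(n^3)) time.

```python
from itertools import permutations

# Hypothetical per-(user, resource-block) utility: transmittable bits that
# respect both channel quality and buffer occupancy, in the spirit of a
# joint channel- and buffer-aware scheduler.
utility = [
    [40, 10, 25],   # user 0 on RB 0/1/2
    [30, 35, 20],   # user 1
    [15, 45, 50],   # user 2
]

def best_assignment(u):
    """Exhaustively solve the assignment problem: one resource block per
    user, maximizing total utility. Brute force is fine for a 3x3 example;
    the Hungarian algorithm scales this to realistic sizes."""
    n = len(u)
    best, best_perm = -1, None
    for perm in permutations(range(n)):        # perm[i] = RB given to user i
        total = sum(u[i][perm[i]] for i in range(n))
        if total > best:
            best, best_perm = total, perm
    return best, best_perm

total, assign = best_assignment(utility)
print(total, assign)  # 40 + 35 + 50 = 125 with assignment (0, 1, 2)
```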
cs.NI Effect of Transmission Impairments in CO-OFDM Based Elastic Optical Network Design Authors: Sadananda Behera, Jithin George, Goutam Das Abstract: Coherent Optical Orthogonal Frequency Division Multiplexing (CO-OFDM) based Elastic Optical Network (EON) is one of the emerging technologies being considered for next-generation high data rate optical network systems. Routing and Spectrum Allocation (RSA) is an important aspect of EON. Apart from the spectral fragmentation created by the spectrum continuity and contiguity constraints of RSA, transmission impairments such as shot noise, amplified spontaneous emission (ASE) beat noise due to coherent detection, crosstalk in the cross-connect (XC), nonlinear interference, and filter narrowing limit the transmission reach of optical signals in EON. This paper focuses on the cross-layer joint optimization of delay-bandwidth product, fragmentation, and link congestion for RSA in CO-OFDM EON while considering the effect of physical layer impairments. First, we formulate an optimal Integer Linear Program (ILP) that achieves load balancing in the presence of transmission impairments and minimizes the delay-bandwidth product along with fragmentation. We next propose a heuristic algorithm for large networks with two different demand ordering techniques. We show the benefits of our algorithm compared to the existing load balancing algorithm. Submitted 14 September, 2017; originally announced September 2017. arXiv:1708.09218 [pdf, other]
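Aside: one common way to quantify the spectral fragmentation that RSA objectives like the one above try to minimize is the ratio of the largest free contiguous block to the total free spectrum on a link. This metric and the bitmap below are illustrative only, not necessarily the measure used in the paper.

```python
# Hypothetical fragmentation measure for one fiber link's spectrum-slot
# bitmap (True = occupied): 1 - (largest free contiguous run) / (total
# free slots). Low values mean large contiguous demands can still fit.
def fragmentation(slots):
    free_runs, run = [], 0
    for occupied in slots + [True]:        # sentinel flushes the last run
        if occupied:
            if run:
                free_runs.append(run)
            run = 0
        else:
            run += 1
    total_free = sum(free_runs)
    if total_free == 0:
        return 0.0
    return 1.0 - max(free_runs) / total_free

link = [True, False, False, True, False, True, False, False, False]
print(fragmentation(link))  # free runs of 2, 1, 3 slots -> 1 - 3/6 = 0.5
```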
cs.NI A novel online scheduling protocol for energy-efficient TWDM-OLT design Authors: Sourav Dutta, Dibbendu Roy, Chayan Bhar, Goutam Das Abstract: Design of energy-efficient access networks has emerged as an important area of research, since access networks consume $80-90\%$ of the overall Internet power consumption. TWDM-PON is envisaged to be one of the widely accepted future access technologies. TWDM-PON offers an additional opportunity to save energy at the OLT along with the existing energy-efficient ONU designs. In this paper, we focus on energy-efficient OLT design in a TWDM-PON. While most conventional methods employ a minimization of the number of wavelengths, we propose a novel approach that aims at minimizing the number of voids created by scheduling. In the process, for the first time, we present a low-complexity online scheduling algorithm for the upstream traffic considering delay constraints. Our extensive simulations demonstrate a significant improvement in energy efficiency of $\sim 25\%$ at high load at the OLT receivers. Furthermore, we provide an analytical upper bound on the energy efficiency of the OLT receivers and demonstrate that the proposed protocol achieves an energy efficiency very close to the bound, with a maximum deviation of $\sim 2\%$ for $64$ ONUs. Submitted 30 August, 2017; originally announced August 2017. arXiv:1708.04257 [pdf, ps, other]
cs.IT On Bounds of Spectral Efficiency of Optimally Beamformed NLOS Millimeter Wave Links Authors: Rakesh RT, Debarati Sen, Goutam Das Abstract: Beamforming is an indispensable feature of millimeter wave (mmWave) wireless communications, needed to compensate for the severe path loss incurred by high-frequency operation. In this paper, we introduce a novel framework to evaluate the spectral efficiency (SE) of non-line-of-sight (NLOS) mmWave links with optimal analog beamforming. Optimality here implies the joint selection of antenna beams at the transmitter and receiver that simultaneously maximizes the received power. We develop a mathematical framework based on the extended Saleh-Valenzuela channel model to embody the impact of optimal analog beamforming in the performance metrics for NLOS mmWave links. Practical mmWave channels are characterized by sparsity in the number of multi-path components; we exploit this feature to derive upper and lower bounds on the SE of beamformed directional links. Simulation results reveal that the proposed approach is fairly accurate for modeling beamformed links in most practical operating scenarios. We also study the impact of the overhead due to antenna beam training on the throughput (TP) of a link, and obtain an approximate solution for the optimal antenna half power beamwidth which maximizes TP. Submitted 23 November, 2017; v1 submitted 14 August, 2017; originally announced August 2017. Comments: Accepted for publication in IEEE Transactions on Vehicular Technology arXiv:1708.01371 [pdf, other]
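Aside: the notion of optimality in the abstract above (joint transmit/receive beam selection maximizing received power) amounts to an exhaustive search over beam pairs. The power table below is a made-up stand-in for the result of a beam-training sweep over two codebooks.

```python
# Hypothetical received-power table P[t][r] (mW) for each pair of candidate
# transmit (t) and receive (r) beams, as measured during beam training.
P = [
    [0.02, 0.10, 0.01],
    [0.50, 0.03, 0.04],   # tx beam 1 + rx beam 0 hits the strongest reflector
    [0.05, 0.07, 0.30],
]

def optimal_beam_pair(power):
    """Jointly select the (tx, rx) beam pair maximizing received power --
    the definition of optimal analog beamforming used for the SE bounds."""
    return max(((t, r) for t in range(len(power)) for r in range(len(power[0]))),
               key=lambda tr: power[tr[0]][tr[1]])

t, r = optimal_beam_pair(P)
print(t, r, P[t][r])  # 1 0 0.5
```

With N beams per side this sweep costs N^2 measurements, which is exactly the beam-training overhead whose impact on throughput the paper analyzes.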
cs.NI Constrained Receiver Scheduling in Flexible Time and Wavelength Division Multiplexed Optical Authors: Chayan Bhar, Arnab Mitra, Goutam Das Abstract: An increasing bandwidth demand has mandated a shift to time and wavelength division multiplexing (TWDM) techniques in optical access networks (OAN). Typical TWDM scheduling schemes consider scheduling of the optical line terminal receiver only. In this paper we identify an additional collision domain that is present in TWDM schemes offering security in addition to bandwidth flexibility. Scheduling of the identified collision domain is termed group scheduling. We illustrate that considering receiver scheduling only (as done in typical TWDM schemes) severely affects throughput when implemented on flexible and secure TWDM architectures. A novel media access control protocol is proposed in this paper that considers the multiple collision domains. Through simulations, we illustrate that the proposed scheme achieves a high throughput. A theoretical upper bound on throughput is also derived to explain the simulation results. A complexity reduction of the proposed scheme is also illustrated, making it an attractive proposal. Submitted 3 August, 2017; originally announced August 2017. arXiv:1707.03243 [pdf] cs.CR Malware in the Future? Forecasting of Analyst Detection of Cyber Events Authors: Jonathan Z. Bakdash, Steve Hutchinson, Erin G. Zaroukian, Laura R.
Marusich, Saravanan Thirumuruganathan, Charmaine Sample, Blaine Hoffman, Gautam Das Abstract: There have been extensive efforts in government, academia, and industry to anticipate, forecast, and mitigate cyber attacks. A common approach is time-series forecasting of cyber attacks based on data from network telescopes, honeypots, and automated intrusion detection/prevention systems. This research has uncovered key insights such as systematicity in cyber attacks. Here, we propose an alternate perspective on this problem by forecasting attacks that are analyst-detected and -verified occurrences of malware. We call these instances of malware cyber event data. Specifically, our dataset consists of analyst-detected incidents from a large operational Computer Security Service Provider (CSSP) for the U.S. Department of Defense, which rarely relies only on automated systems. Our dataset consists of weekly counts of cyber events over approximately seven years. Since all cyber events were validated by analysts, our dataset is unlikely to have the false positives that are often endemic in other data sources. Further, the higher-quality data could be used for resource allocation, estimation of security resources, and the development of effective risk-management strategies. We used a Bayesian State Space Model for forecasting and found that events one week ahead could be predicted. To quantify bursts, we used a Markov model. Our findings of systematicity in analyst-detected cyber attacks are consistent with previous work using other sources.
The advance information provided by a forecast may help with threat awareness by providing a probable value and range for future cyber events one week ahead. Other potential applications of cyber event forecasting include proactive allocation of resources and capabilities for cyber defense (e.g., analyst staffing and sensor configuration) in CSSPs. Enhanced threat awareness may improve cybersecurity. Submitted 8 June, 2018; v1 submitted 11 July, 2017; originally announced July 2017. Comments: Revised version resubmitted to journal arXiv:1705.03028 [pdf, other] cs.DB Assisting Service Providers In Peer-to-peer Marketplaces: Maximizing Gain Over Flexible Attributes Authors: Abolfazl Asudeh, Azade Nazi, Nick Koudas, Gautam Das Abstract: Peer-to-peer marketplaces such as AirBnB enable transactional exchange of services directly between people. In such platforms, those providing a service (hosts in AirBnB) are faced with various choices. For example, in AirBnB, although some amenities in a property (attributes of the property) are fixed, others are relatively flexible and can be provided without significant effort. Providing an amenity is usually associated with a cost. Naturally, different sets of amenities may have different "gains" for a host. Consequently, given a limited budget, deciding which amenities (attributes) to offer is challenging. In this paper, we formally introduce and define the problem of Gain Maximization over Flexible Attributes (GMFA).
We first prove that the problem is NP-hard and show that identifying an approximation algorithm with a constant approximation ratio is unlikely. We then provide a practically efficient exact algorithm for the GMFA problem for the general class of monotonic gain functions, which quantify the benefit of sets of attributes. As the next part of our contribution, we focus on the design of a practical gain function for GMFA. We introduce the notion of frequent-item based count (FBC), which utilizes the existing tuples in the database to define the notion of gain, and propose an efficient algorithm for computing it. We present the results of a comprehensive experimental evaluation of the proposed techniques on real data from AirBnB and demonstrate the practical relevance and utility of our proposal. Submitted 6 October, 2017; v1 submitted 8 May, 2017; originally announced May 2017. arXiv:1703.00080 [pdf, other] cs.DB Efficient Computation of Subspace Skyline over Categorical Domains Authors: Md Farhadur Rahman, Abolfazl Asudeh, Nick Koudas, Gautam Das Abstract: Platforms such as AirBnB, Zillow, Yelp, and related sites have transformed the way we search for accommodation, restaurants, etc. The underlying datasets in such applications have numerous attributes that are mostly Boolean or categorical. Discovering the skyline of such datasets over a subset of attributes would identify entries that stand out while enabling numerous applications.
Only a few algorithms are designed to compute the skyline over categorical attributes, and they are applicable only when the number of attributes is small. In this paper, we place the problem of skyline discovery over categorical attributes into perspective and design efficient algorithms for two cases. (i) In the absence of indices, we propose two algorithms, ST-S and ST-P, that exploit the categorical characteristics of the datasets, organizing tuples in a tree data structure that supports efficient dominance tests over the candidate set. (ii) We then consider the existence of widely used precomputed sorted lists. After discussing several approaches and studying their limitations, we propose TA-SKY, a novel threshold-style algorithm that utilizes sorted lists. Moreover, we further optimize TA-SKY and explore its progressive nature, making it suitable for applications with strict interactive requirements. In addition to an extensive theoretical analysis of the proposed algorithms, we conduct a comprehensive experimental evaluation on a combination of real (including the entire AirBnB data collection) and synthetic datasets to study the practicality of the proposed algorithms. The results showcase the superior performance of our techniques, outperforming applicable approaches by orders of magnitude. Submitted 30 May, 2017; v1 submitted 28 February, 2017; originally announced March 2017. arXiv:1611.07808 [pdf, other] cs.CG Hardness of Liar's Domination on Unit Disk Graphs Authors: Ramesh K Jallu, Gautam K Das Abstract: A unit disk graph is the intersection graph of a set of unit-diameter disks in the plane. In this paper we consider the liar's domination problem on unit disk graphs, a variant of the dominating set problem, which we call the {\it Euclidean liar's domination problem}.
In the Euclidean liar's domination problem, a set ${\cal P}=\{p_1,p_2,\ldots,p_n\}$ of $n$ points (disk centers) are given in the Euc… ▽ More A unit disk graph is the intersection graph of a set of unit diameter disks in the plane. In this paper we consider liar's domination problem on unit disk graphs, a variant of dominating set problem. We call this problem as {\it Euclidean liar's domination problem}. In the Euclidean liar's domination problem, a set ${\cal P}=\{p_1,p_2,\ldots,p_n\}$ of $n$ points (disk centers) are given in the Euclidean plane. For $p \in {\cal P}$, $N[p]$ is a subset of ${\cal P}$ such that for any $q \in N[p]$, the Euclidean distance between $p$ and $q$ is less than or equal to 1, i.e., the corresponding unit diameter disks intersect. The objective of the Euclidean liar's domination problem is to find a subset $D\; (\subseteq {\cal P})$ of minimum size having the following properties : (i) $N[p_i] \cap D \geq 2$ for $1 \leq i \leq n$, and (ii) $(N[p_i] \cup N[p_j]) \cap D \geq 3$ for $i\neq j, 1\leq i,j \leq n$. This article aims to prove the Euclidean liar's domination problem is NP-complete. △ Less Submitted 23 November, 2016; originally announced November 2016. Comments: 6 pages, 4 figures arXiv:1609.05670 ?[ pdf , ps , other ]? cs.IT cs.NI Load-aware Performance Analysis of Cell Center/Edge Users in Random HetNets Authors: Praful D. Mankar , Goutam Das , S. S. Pathak Abstract : For real-time traffic, the link quality and call blocking probability (both derived from coverage probability) are realized to be poor for cell edge users (CEUs) compared to cell center users (CCUs) as the signal reception in the cell center region is better compared to the cell edge region. 
In heterogeneous networks (HetNets), the uncoordinated channel access by different types of base stations d… ▽ More For real-time traffic, the link quality and call blocking probability (both derived from coverage probability) are realized to be poor for cell edge users (CEUs) compared to cell center users (CCUs) as the signal reception in the cell center region is better compared to the cell edge region. In heterogeneous networks (HetNets), the uncoordinated channel access by different types of base stations determine the interference statistics that further arbitrates the coverage probability. Thus, the spectrum allocation techniques have major impact on the performance of CCU and CEU. In this paper, the performance of CCUs and CEUs in a random two-tier network is studied for two spectrum allocation techniques namely: 1) co-channel (CSA), and 2) shared (SSA). For performance analysis, the widely accepted conception of modeling the tiers of HetNet using independent homogeneous Poisson point process (PPP) is considered to accommodate the spatial randomness in location of BSs. To incorporate the spatial randomness in the arrival of service and to aid the load-aware analysis, the cellular traffic is modeled using spatio-temporal PPP. Under this scenario, we have developed an analytical framework to evaluate the load-aware performance, including coverage and blocking probabilities, of CCUs and CEUs under both spectrum allocation techniques. Further, we provide insight into achievable area energy efficiency for SSA and CSA. The developed analytical framework is validated through extensive simulations. Next, we demonstrate the impact of traffic load and femto access points density on the performance of CCUs/CEUs under CSA and SSA. △ Less Submitted 9 March, 2017; v1 submitted 19 September, 2016; originally announced September 2016. Comments: 13 pages and 11 figures. 
Submitted to IEEE Transactions on Vehicular Technology.

arXiv:1609.05656 [pdf, ps, other] cs.IT
Coverage Analysis of Two-Tier HetNets for Co-Channel, Orthogonal, and Partial Spectrum Sharing under Fractional Load Conditions
Authors: Praful D. Mankar, Goutam Das, S. S. Pathak
Abstract: In heterogeneous networks, the random deployment of femto access points (FAPs) and macro base stations (MBSs) with uncoordinated channel access imposes severe inter-tier interference. In real-life networks, the MBS deployment process exhibits homogeneity, whereas FAPs tend to form clusters, as in malls, apartments, offices, etc. Therefore, a composite model of the MBSs and the FAPs using a Poisson point process and a Poisson cluster process, respectively, is employed for the evaluation of the coverage probability. A scenario with real-time traffic for the macro tier and best-effort traffic for the femto tier is considered. Cognition is introduced in the clustered FAPs to control the inter-tier interference. Furthermore, the impact of the macro-tier load is analyzed by exploiting the inherent coupling between the coverage probability and the activity factor of an MBS. Further, we study the effect of co-channel, orthogonal, and partial spectrum sharing modes on the coverage for given parameters such as load condition, FAP/MBS density, etc. We provide simulation validation for the derived coverage expressions and present a comparative analysis of the mentioned spectrum sharing modes.
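Both HetNet abstracts above rest on the same analytical backbone: base stations modeled as a homogeneous Poisson point process (PPP) and coverage defined as the probability that the SINR at a typical user exceeds a threshold. The following is a minimal Monte Carlo sketch of that backbone only — a single tier, no fading or clustering, and all parameter values are illustrative assumptions, not taken from the papers:

```python
import math
import random

def sample_poisson(rng, mean):
    """Knuth's method: sample a Poisson random variable with the given mean."""
    threshold, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def coverage_probability(sinr_threshold, bs_density=1e-5, alpha=4.0,
                         noise=1e-12, region=2000.0, trials=1000, seed=7):
    """Monte Carlo estimate of downlink coverage for a user at the origin.

    Base stations form a homogeneous PPP of intensity `bs_density` (per m^2)
    on a square of side `region` centred at the origin; the user attaches to
    the nearest base station and all the others act as interferers.  Path
    loss is r^-alpha; fading and shadowing are deliberately omitted.
    """
    rng = random.Random(seed)
    covered = 0
    for _ in range(trials):
        n = sample_poisson(rng, bs_density * region * region)
        if n == 0:
            continue  # no base station in the window: no coverage
        dists = [math.hypot(rng.uniform(-region / 2, region / 2),
                            rng.uniform(-region / 2, region / 2))
                 for _ in range(n)]
        # Received powers with a 1 m near-field cutoff to avoid the r=0 singularity.
        powers = sorted((max(d, 1.0) ** -alpha for d in dists), reverse=True)
        signal, interference = powers[0], sum(powers[1:])
        if signal / (noise + interference) >= sinr_threshold:
            covered += 1
    return covered / trials
```

The papers go much further (two tiers, clustered femtocells, traffic-dependent load, and cell center/edge user classes), but each of those refinements changes only how the point sets and the interference term above are generated.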
Submitted 18 February, 2017; v1 submitted 19 September, 2016; originally announced September 2016. Comments: 14 pages and 10 figures. Submitted to IEEE Transactions on Vehicular Technology.

arXiv:1606.09433 [pdf] cs.RO cs.CV
Steering a Predator Robot using a Mixed Frame/Event-Driven Convolutional Neural Network
Authors: Diederik Paul Moeys, Federico Corradi, Emmett Kerr, Philip Vance, Gautham Das, Daniel Neil, Dermot Kerr, Tobi Delbruck
Abstract: This paper describes the application of a Convolutional Neural Network (CNN) in the context of a predator/prey scenario. The CNN is trained and run on data from a Dynamic and Active Pixel Sensor (DAVIS) mounted on a Summit XL robot (the predator), which follows another one (the prey). The CNN is driven by both conventional image frames and dynamic vision sensor "frames" that consist of a constant number of DAVIS ON and OFF events. The network is thus "data driven" at a sample rate proportional to the scene activity, so the effective sample rate varies from 15 Hz to 240 Hz depending on the robot speeds. The network generates four outputs: steer right, left, center, and non-visible. After off-line training on labeled data, the network is imported onto the on-board Summit XL robot, which runs jAER and receives steering directions in real time. Successful results on closed-loop trials, with accuracies of up to 87% or 92% (depending on the evaluation criteria), are reported.
Although the proposed approach discards the precise DAVIS event timing, it offers the significant advantage of compatibility with conventional deep learning technology without giving up the advantage of data-driven computing.
Submitted 30 June, 2016; originally announced June 2016. Comments: Paper presented at the Second International Conference on Event-Based Control, Communication and Signal Processing (EBCCSP) 2016, Krakow, Poland

arXiv:1602.06454 [pdf, other] cs.SI cs.IR
Web Item Reviewing Made Easy By Leveraging Available User Feedback
Authors: Azade Nazi, Mahashweta Das, Gautam Das
Abstract: The widespread use of online review sites over the past decade has motivated businesses of all types to amass an expansive arsenal of user feedback to establish their reputation. Though a significant proportion of purchasing decisions are driven by the average rating, detailed reviews are critical for activities like buying an expensive digital SLR camera. Since writing a detailed review for an item is usually time-consuming, the number of reviews available on the Web is relatively small. Given a user and an item, our goal is to identify the top-$k$ meaningful phrases/tags to help her review the item easily. We propose a general constrained optimization framework based on three measures: relevance (how well the result set of tags describes an item), coverage (how well the result set of tags covers the different aspects of an item), and polarity (how well sentiment is attached to the result set of tags).
By adopting different definitions of coverage, we identify two concrete problem instances that enable a wide range of real-world scenarios. We develop practical algorithms with theoretical bounds to solve these problems efficiently. We conduct experiments on synthetic and real data crawled from the web to validate the effectiveness of our solutions.
Submitted 20 February, 2016; originally announced February 2016.

arXiv:1602.05100 [pdf, other] cs.DB DOI: 10.14778/2983200.2983205
Query Reranking As A Service
Authors: Abolfazl Asudeh, Nan Zhang, Gautam Das
Abstract: The ranked retrieval model has rapidly become the de facto way for search query processing in client-server databases, especially those on the web. Despite the extensive efforts in the database community on designing better ranking functions/mechanisms, many such databases in practice still fail to address the diverse and sometimes contradicting preferences of users on tuple ranking, perhaps (at least partially) due to the lack of expertise and/or motivation for the database owner to design truly effective ranking functions. This paper takes a different route on addressing the issue by defining a novel {\em query reranking problem}, i.e., we aim to design a third-party service that uses nothing but the public search interface of a client-server database to enable the on-the-fly processing of queries with any user-specified ranking function (with or without selection conditions), regardless of whether the ranking function is supported by the database.
We analyze the worst-case complexity of the problem and introduce a number of ideas, e.g., on-the-fly indexing, domination detection, and virtual tuple pruning, to reduce the average-case cost of the query reranking algorithm. We also present extensive experimental results on real-world datasets, in both offline and live online systems, that demonstrate the effectiveness of our proposed techniques.
Submitted 16 July, 2016; v1 submitted 6 February, 2016; originally announced February 2016. Journal ref: Proceedings of the VLDB Endowment (PVLDB), Vol. 9, No. 11, 2016

arXiv:1602.03730 [pdf, other] cs.DB
HDBSCAN: Density based Clustering over Location Based Services
Authors: Md Farhadur Rahman, Weimo Liu, Saad Bin Suhaim, Saravanan Thirumuruganathan, Nan Zhang, Gautam Das
Abstract: Location Based Services (LBS) have become extremely popular and are used by millions of users. Popular LBS run the entire gamut from mapping services (such as Google Maps) to restaurants (such as Yelp) and real estate (such as Redfin). The public query interfaces of LBS can be abstractly modeled as a kNN interface over a database of two-dimensional points: given an arbitrary query point, the system returns the k points in the database that are nearest to the query point. Often, k is set to a small value such as 20 or 50. In this paper, we consider the novel problem of enabling density based clustering over an LBS with only a limited kNN query interface.
Due to the query rate limits imposed by LBS, even retrieving every tuple once is infeasible. Hence, we seek to construct a cluster assignment function f(.) by issuing a small number of kNN queries, such that for any given tuple t in the database, which may or may not have been accessed, f(.) outputs the cluster assignment of t with high accuracy. We conduct a comprehensive set of experiments over benchmark datasets and popular real-world LBS such as Yahoo! Flickr, Zillow, Redfin, and Google Maps.
Submitted 16 February, 2016; v1 submitted 11 February, 2016; originally announced February 2016.

arXiv:1512.02138 [pdf, other] cs.DB
Discovering the Skyline of Web Databases
Authors: Abolfazl Asudeh, Saravanan Thirumuruganathan, Nan Zhang, Gautam Das
Abstract: Many web databases are "hidden" behind proprietary search interfaces that enforce the top-$k$ output constraint, i.e., each query returns at most $k$ of all matching tuples, preferentially selected and returned according to a proprietary ranking function. In this paper, we initiate research into the novel problem of skyline discovery over top-$k$ hidden web databases. Since skyline tuples provide critical insights into the database and include the top-ranked tuple for every possible ranking function following the monotonic order of attribute values, skyline discovery from a hidden web database can enable a wide variety of innovative third-party applications over one or multiple web databases.
Our research shows that the critical factor affecting the cost of skyline discovery is the type of search interface controls provided by the website. As such, we develop efficient algorithms for the three most popular types, i.e., one-ended range, free range, and point predicates, and then combine them to support web databases that feature a mixture of these types. Rigorous theoretical analysis and extensive real-world online and offline experiments demonstrate the effectiveness of our proposed techniques and their superiority over baseline solutions.
Submitted 20 March, 2016; v1 submitted 7 December, 2015; originally announced December 2015. Journal ref: Proceedings of the VLDB Endowment (PVLDB), Vol. 9, No. 7, 2016, pages 600-611

arXiv:1505.02441 [pdf, other] cs.DB
Aggregate Estimations over Location Based Services
Authors: Weimo Liu, Md Farhadur Rahman, Saravanan Thirumuruganathan, Nan Zhang, Gautam Das
Abstract: Location based services (LBS) have become very popular in recent years. They range from map services (e.g., Google Maps) that store geographic locations of points of interest, to online social networks (e.g., WeChat, Sina Weibo, FourSquare) that leverage user geographic locations to enable various recommendation functions. The public query interfaces of these services may be abstractly modeled as a kNN interface over a database of two-dimensional points on a plane: given an arbitrary query point, the system returns the k points in the database that are nearest to the query point.
In this paper we consider the problem of obtaining approximate estimates of SUM and COUNT aggregates by only querying such databases via their restrictive public interfaces. We distinguish between interfaces that return location information for the returned tuples (e.g., Google Maps) and interfaces that do not (e.g., Sina Weibo). For both types of interfaces, we develop aggregate estimation algorithms that are based on novel techniques for precisely computing or approximately estimating the Voronoi cell of tuples. We discuss a comprehensive set of real-world experiments for testing our algorithms, including experiments on Google Maps, WeChat, and Sina Weibo.
Submitted 13 May, 2015; v1 submitted 10 May, 2015; originally announced May 2015.

arXiv:1505.00079 [pdf, other] cs.SI physics.soc-ph
Leveraging History for Faster Sampling of Online Social Networks
Authors: Zhuojie Zhou, Nan Zhang, Gautam Das
Abstract: How to enable efficient analytics over such data has been an increasingly important research problem. Given the sheer size of such social networks, many existing studies resort to sampling techniques that draw random nodes from an online social network through its restrictive web/API interface. Almost all of them use the exact same underlying technique of random walk, a Markov Chain Monte Carlo based method which iteratively transitions from one node to a random neighbor.
Random walk fits naturally with this problem because, for most online social networks, the only query we can issue through the interface is to retrieve the neighbors of a given node (i.e., there is no access to the full graph topology). A problem with random walks, however, is the "burn-in" period, which requires a large number of transitions/queries before the sampling distribution converges to the stationary distribution that enables drawing samples in a statistically valid manner. In this paper, we consider the novel problem of speeding up the fundamental design of random walks (i.e., reducing the number of queries required) without changing the stationary distribution achieved, thereby enabling a more efficient "drop-in" replacement for existing sampling-based analytics techniques over online social networks. Our main idea is to leverage the history of random walks to construct a higher-order Markov chain. We develop two algorithms, Circulated Neighbors Random Walk (CNRW) and Groupby Neighbors Random Walk (GNRW), and prove that, no matter what the social network topology is, CNRW and GNRW offer better efficiency than baseline random walks while achieving the same stationary distribution. We demonstrate the superiority of our techniques over existing ones through extensive experiments on real-world social networks and synthetic graphs.
Submitted 9 May, 2015; v1 submitted 30 April, 2015; originally announced May 2015. Comments: Technical report for the VLDB 2015 paper

arXiv:1502.05106 [pdf, other] cs.DB
"The Whole Is Greater Than the Sum of Its Parts": Optimization in Collaborative Crowdsourcing
Authors: Habibur Rahman, Senjuti Basu Roy, Saravanan Thirumuruganathan, Sihem Amer-Yahia, Gautam Das
Abstract: In this work, we initiate the investigation of optimization opportunities in collaborative crowdsourcing. Many popular applications, such as collaborative document editing, sentence translation, or citizen science, resort to this special form of human-based computing, where crowd workers with appropriate skills and expertise are required to form groups to solve complex tasks. Central to any collaborative crowdsourcing process is the aspect of successful collaboration among the workers, which, for the first time, is formalized and then optimized in this work. Our formalism considers two main collaboration-related human factors, affinity and upper critical mass, appropriately adapted from organizational science and social theories. Our contributions are (a) proposing a comprehensive model for collaborative crowdsourcing optimization, (b) rigorous theoretical analyses to understand the hardness of the proposed problems, and (c) an array of efficient exact and approximation algorithms with provable theoretical guarantees. Finally, we present a detailed set of experimental results stemming from two real-world collaborative crowdsourcing applications using Amazon Mechanical Turk, as well as synthetic data analyses of the scalability and qualitative aspects of our proposed algorithms. Our experimental results successfully demonstrate the efficacy of our proposed solutions.
Submitted 12 April, 2015; v1 submitted 17 February, 2015; originally announced February 2015.

arXiv:1411.1455 [pdf, other]
cs.DB
Rank-Based Inference over Web Databases
Authors: Md Farhadur Rahman, Weimo Liu, Saravanan Thirumuruganathan, Nan Zhang, Gautam Das
Abstract: In recent years, there has been much research on the ranked retrieval model in structured databases, especially web databases. With this model, a search query returns the top-k tuples according to not just exact matches of selection conditions but also a suitable ranking function. This paper studies a novel problem on the privacy implications of database ranking. The motivation is a novel yet serious privacy leak we found in real-world web databases, caused by the design of the ranking function. Many such databases feature private attributes, e.g., a social network allows users to mark certain attributes as visible only to themselves, not to others. While these websites generally respect the privacy settings by not directly displaying private attribute values in search query answers, many of them nevertheless take such private attributes into account in the ranking function design. The conventional belief might be that tuple ranks alone are not enough to reveal the private attribute values. Our investigation, however, shows that this is not the case in reality. To address the problem, we introduce a taxonomy of the problem space with two dimensions: (1) the type of query interface and (2) the capability of adversaries. For each subspace, we develop a novel technique which either guarantees the successful inference of private attributes, or does so for a significant portion of real-world tuples.
We demonstrate the effectiveness and efficiency of our techniques through theoretical analysis, extensive experiments over real-world datasets, and successful online attacks over websites with tens to hundreds of millions of users, e.g., Amazon Goodreads and Renren.com.
Submitted 5 April, 2015; v1 submitted 5 November, 2014; originally announced November 2014.

arXiv:1410.7833 [pdf, other] cs.SI physics.soc-ph
Walk, Not Wait: Faster Sampling Over Online Social Networks
Authors: Azade Nazi, Zhuojie Zhou, Saravanan Thirumuruganathan, Nan Zhang, Gautam Das
Abstract: In this paper, we introduce a novel, general-purpose technique for faster sampling of nodes over an online social network. Specifically, unlike traditional random walks, which wait for the convergence of the sampling distribution to a predetermined target distribution (a waiting process that incurs a high query cost), we develop WALK-ESTIMATE, which starts with a much shorter random walk and then proactively estimates the sampling probability of the node reached, before using acceptance-rejection sampling to adjust the sampling probability to the predetermined target distribution. We present a novel backward random walk technique which provides provably unbiased estimates of the sampling probability, and demonstrate the superiority of WALK-ESTIMATE over traditional random walks through theoretical analysis and extensive experiments over real-world online social networks.
Submitted 1 November, 2014; v1 submitted 28 October, 2014; originally announced October 2014.
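The acceptance-rejection adjustment that WALK-ESTIMATE builds on has a standard textbook instance: a Metropolis-Hastings random walk that needs only a neighbor oracle yet targets the uniform distribution instead of the degree-proportional one a plain walk converges to. A self-contained sketch (the graph, walk length, and sample count are illustrative assumptions; this is not the paper's algorithm):

```python
import random

def mh_walk_sample(neighbors, start, length, rng):
    """One Metropolis-Hastings random walk targeting the uniform distribution
    over nodes, using only a neighbor oracle (no global topology access).

    A plain random walk is biased toward high-degree nodes; accepting a move
    from u to v with probability min(1, deg(u)/deg(v)) cancels that bias, at
    the cost of sometimes staying in place.
    """
    u = start
    for _ in range(length):
        v = rng.choice(neighbors(u))
        if rng.random() < len(neighbors(u)) / len(neighbors(v)):
            u = v  # accept the move; otherwise stay at u
    return u

# Toy usage on a star graph: a plain walk would sit on the hub half the time,
# while the corrected walk visits all five nodes roughly equally often.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
rng = random.Random(42)
counts = {node: 0 for node in star}
for _ in range(3000):
    counts[mh_walk_sample(lambda u: star[u], 0, 25, rng)] += 1
freqs = {node: count / 3000 for node, count in counts.items()}
```

WALK-ESTIMATE's contribution is precisely to avoid the long walks this baseline needs, by estimating the short walk's sampling probability and correcting it afterwards.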
arXiv:1403.2763 ?[ pdf , other ]? cs.DB Aggregate Estimation Over Dynamic Hidden Web Databases Authors: Weimo Liu , Saravanan Thirumuruganathan , Nan Zhang , Gautam Das Abstract : Many databases on the web are "hidden" behind (i.e., accessible only through) their restrictive, form-like, search interfaces. Recent studies have shown that it is possible to estimate aggregate query answers over such hidden web databases by issuing a small number of carefully designed search queries through the restrictive web interface. A problem with these existing work, however, is that they… ▽ More Many databases on the web are "hidden" behind (i.e., accessible only through) their restrictive, form-like, search interfaces. Recent studies have shown that it is possible to estimate aggregate query answers over such hidden web databases by issuing a small number of carefully designed search queries through the restrictive web interface. A problem with these existing work, however, is that they all assume the underlying database to be static, while most real-world web databases (e.g., Amazon, eBay) are frequently updated. In this paper, we study the novel problem of estimating/tracking aggregates over dynamic hidden web databases while adhering to the stringent query-cost limitation they enforce (e.g., at most 1,000 search queries per day). Theoretical analysis and extensive real-world experiments demonstrate the effectiveness of our proposed algorithms and their superiority over baseline solutions (e.g., the repeated execution of algorithms designed for static web databases). △ Less Submitted 1 May, 2014; v1 submitted 11 March, 2014; originally announced March 2014. arXiv:1401.1302 ?[ pdf , other ]? cs.DB cs.SI Optimization in Knowledge-Intensive Crowdsourcing Authors: Senjuti Basu Roy , Ioanna Lykourentzou , Saravanan Thirumuruganathan , Sihem Amer-Yahia , Gautam Das Abstract : We present SmartCrowd, a framework for optimizing collaborative knowledge-intensive crowdsourcing. 
SmartCrowd distinguishes itself by accounting for human factors in the process of assigning tasks to workers. Human factors designate workers' expertise in different skills, their expected minimum wage, and their availability. In SmartCrowd, we formulate task assignment as an optimization problem, an… ▽ More We present SmartCrowd, a framework for optimizing collaborative knowledge-intensive crowdsourcing. SmartCrowd distinguishes itself by accounting for human factors in the process of assigning tasks to workers. Human factors designate workers' expertise in different skills, their expected minimum wage, and their availability. In SmartCrowd, we formulate task assignment as an optimization problem, and rely on pre-indexing workers and maintaining the indexes adaptively, in such a way that the task assignment process gets optimized both qualitatively, and computation time-wise. We present rigorous theoretical analyses of the optimization problem and propose optimal and approximation algorithms. We finally perform extensive performance and quality experiments using real and synthetic data to demonstrate that adaptive indexing in SmartCrowd is necessary to achieve efficient high quality task assignment. △ Less Submitted 7 January, 2014; originally announced January 2014. Comments: 12 pages arXiv:1312.7243 ?[ pdf , ps , other ]? cs.DS Minimum Dominating Set for a Point Set in $\IR^2$ Authors: Ramesh K. Jallu , Prajwal R. Prasad , Gautam K. Das Abstract : In this article, we consider the problem of computing minimum dominating set for a given set $S$ of $n$ points in $\IR^2$. Here the objective is to find a minimum cardinality subset $S'$ of $S$ such that the union of the unit radius disks centered at the points in $S'$ covers all the points in $S$. 
We first propose a simple 4-factor and 3-factor approximation algorithms in $O(n^6 \log n)$ and… ▽ More In this article, we consider the problem of computing minimum dominating set for a given set $S$ of $n$ points in $\IR^2$. Here the objective is to find a minimum cardinality subset $S'$ of $S$ such that the union of the unit radius disks centered at the points in $S'$ covers all the points in $S$. We first propose a simple 4-factor and 3-factor approximation algorithms in $O(n^6 \log n)$ and $O(n^{11} \log n)$ time respectively improving time complexities by a factor of $O(n^2)$ and $O(n^4)$ respectively over the best known result available in the literature [M. De, G.K. Das, P. Carmi and S.C. Nandy, {\it Approximation algorithms for a variant of discrete piercing set problem for unit disk}, Int. J. of Comp. Geom. and Appl., to appear]. Finally, we propose a very important shifting lemma, which is of independent interest and using this lemma we propose a $\frac{5}{2}$-factor approximation algorithm and a PTAS for the minimum dominating set problem. △ Less Submitted 5 January, 2014; v1 submitted 27 December, 2013; originally announced December 2013. Comments: 14 pages, 8 figures arXiv:1304.0419 ?[ pdf , ps , other ]? cs.SI cs.DS cs.IR Top-K Product Design Based on Collaborative Tagging Data Authors: Mahashweta Das , Gautam Das , Vagelis Hristidis Abstract : The widespread use and popularity of collaborative content sites (e.g., IMDB, Amazon, Yelp, etc.) has created rich resources for users to consult in order to make purchasing decisions on various products such as movies, e-commerce products, restaurants, etc. Products with desirable tags (e.g., modern, reliable, etc.) have higher chances of being selected by prospective customers. This creates an o… ▽ More The widespread use and popularity of collaborative content sites (e.g., IMDB, Amazon, Yelp, etc.) 
has created rich resources for users to consult in order to make purchasing decisions on various products such as movies, e-commerce products, restaurants, etc. Products with desirable tags (e.g., modern, reliable, etc.) have higher chances of being selected by prospective customers. This creates an opportunity for product designers to design better products that are likely to attract desirable tags when published. In this paper, we investigate how to mine collaborative tagging data to decide the attribute values of new products and to return the top-k products that are likely to attract the maximum number of desirable tags when published. Given a training set of existing products with their features and user-submitted tags, we first build a Naive Bayes Classifier for each tag. We show that the top-k product design problem is NP-complete even if simple Naive Bayes Classifiers are used for tag prediction. We present a suite of algorithms for solving this problem: (a) an exact two-tier algorithm (based on top-k querying techniques), which performs much better than the naive brute-force algorithm and works well for moderate problem instances, and (b) a set of approximation algorithms for larger problem instances: a novel polynomial-time approximation algorithm with provable error bound and a practical hill-climbing heuristic. We conduct detailed experiments on synthetic and real data crawled from the web to evaluate the efficiency and quality of our proposed algorithms, as well as show how product designers can benefit by leveraging collaborative tagging information. Submitted 1 April, 2013; originally announced April 2013. Comments: The conference version appeared under the title "Leveraging collaborative tagging for web item design" in SIGKDD 2011, pages 538-546 ACM Class: H.4 arXiv:1211.5184 [pdf, other]
cs.SI cs.DS physics.soc-ph Faster Random Walks By Rewiring Online Social Networks On-The-Fly Authors: Zhuojie Zhou, Nan Zhang, Zhiguo Gong, Gautam Das Abstract: Many online social networks feature restrictive web interfaces which only allow the query of a user's local neighborhood through the interface. To enable analytics over such an online social network through its restrictive web interface, many recent efforts reuse existing Markov Chain Monte Carlo methods such as random walks to sample the social network and support analytics based on the samples. The problem with such an approach, however, is the large number of queries often required (i.e., a long "mixing time") for a random walk to reach a desired (stationary) sampling distribution. In this paper, we consider a novel problem of enabling a faster random walk over online social networks by "rewiring" the social network on-the-fly. Specifically, we develop Modified TOpology (MTO)-Sampler which, by using only information exposed by the restrictive web interface, constructs a "virtual" overlay topology of the social network while performing a random walk, and ensures that the random walk follows the modified overlay topology rather than the original one. We show that MTO-Sampler not only provably enhances the efficiency of sampling, but also achieves significant savings on query cost over real-world online social networks such as Google Plus, Epinions, etc. Submitted 21 November, 2012; originally announced November 2012. Comments: 15 pages, 14 figures; technical report for the ICDE 2013 paper.
The appendix contains proofs of all the theorems.