Page Not Found
Page not found. Your pixels are in another canvas.
A list of all the posts and pages found on the site. For you robots out there, an XML version is available for digesting as well.
About me
Archive Layout with Content
Posts by Category
Posts by Collection
CV
Markdown
Page not in menu
This is a page not in the main menu
Page Archive
Portfolio
Publications
Sitemap
Posts by Tags
Talk map
Talks and presentations
Teaching
Terms and Privacy Policy
Blog posts
Jupyter notebook markdown generator
Future Blog Post
Published:
This post will show up by default. To disable scheduling of future posts, edit config.yml and set future: false.
Blog Post number 4
Published:
This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
Blog Post number 3
Published:
This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
Blog Post number 2
Published:
This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
Blog Post number 1
Published:
This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
Portfolio item number 1
Short description of portfolio item number 1
Portfolio item number 2
Short description of portfolio item number 2
Zebra regression: estimating motion of periodically-emitted particles from unlabeled sightings
Authors: Yuyan Wang, Senaka Buthpitiya and Alex Fabrikant. In submission.
We introduce Zebra regression, a novel streaming regression task. A stream of periodically-created particles travels along a finite, 1-dimensional trajectory, and we observe a stream of sporadic, noisy reports that estimate the location of some particle, but without exact information on which particle is being observed. We seek a streaming clustering algorithm that maintains accurate estimates of the locations of the particles, where a cluster of data reports corresponds to a moving particle.
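The setting can be pictured with a toy sketch: particles emitted every fixed period move along a 1-dimensional track, and each noisy report is matched to the particle whose predicted position is closest. This is only an illustration of the problem setup under a constant-speed assumption, not the streaming clustering algorithm from the paper; all names and parameters below are illustrative.

```python
def particle_positions(t, period, speed, length):
    """Positions at time t of particles emitted every `period` time units,
    each moving at constant `speed` along a track of the given `length`.
    Returns (particle_id, position) pairs for particles still on the track."""
    out = []
    k = 0
    while k * period <= t:
        pos = (t - k * period) * speed
        if pos <= length:
            out.append((k, pos))
        k += 1
    return out

def assign_report(t, observed_pos, period, speed, length):
    """Assign a noisy location report to the most plausible particle id,
    by nearest predicted position (a naive stand-in for the clustering step)."""
    candidates = particle_positions(t, period, speed, length)
    return min(candidates, key=lambda kp: abs(kp[1] - observed_pos))[0]

# At t=25 with emissions every 10 units and unit speed, particles 0, 1, 2
# sit at positions 25, 15, 5; a report near 5 matches particle 2.
particle = assign_report(25, 4.7, period=10, speed=1, length=100)
```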
An Objective for Hierarchical Clustering in Euclidean Space and Its Connection to Bisecting K-means
Authors: Benjamin Moseley and Yuyan Wang. Published at: AAAI, 2020.
We design a new global objective function that measures the quality of hierarchical clustering solutions for inputs in Euclidean space, and discuss its theoretical connection to ground-truth inputs and to algorithms used in practice. The objective captures the criterion that has motivated the use of divisive clustering algorithms: when a split happens, points in the same cluster should be more similar than points in different clusters. The objective gives reasonable results on ground-truth inputs for hierarchical clustering, and we establish a theoretical connection between it and the bisecting k-means algorithm.
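Bisecting k-means, the divisive algorithm the objective connects to, is simple to sketch: recursively split each cluster in two with Lloyd's algorithm at k=2. This is a minimal illustration (function names and the small example are mine, and it assumes distinct points), not the paper's analysis.

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two point tuples."""
    return sum((x - y) ** 2 for x, y in zip(p, q))

def centroid(pts):
    return tuple(sum(p[i] for p in pts) / len(pts) for i in range(len(pts[0])))

def two_means(points, iters=20, seed=0):
    """Lloyd's algorithm with k=2; returns a two-way partition of `points`."""
    rng = random.Random(seed)
    c0, c1 = rng.sample(points, 2)
    a, b = [c0], [c1]
    for _ in range(iters):
        na = [p for p in points if dist2(p, c0) <= dist2(p, c1)]
        nb = [p for p in points if dist2(p, c0) > dist2(p, c1)]
        if not na or not nb:
            break  # degenerate split; keep the last valid partition
        a, b = na, nb
        c0, c1 = centroid(a), centroid(b)
    return a, b

def bisecting_kmeans(points):
    """Divisive hierarchical clustering: recursively split with 2-means.
    Internal tree nodes are [left, right] lists; leaves are the points."""
    if len(points) == 1:
        return points[0]
    a, b = two_means(points)
    return [bisecting_kmeans(a), bisecting_kmeans(b)]

tree = bisecting_kmeans([(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)])
```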
Fair Hierarchical Clustering
Authors: Sara Ahmadian, Alessandro Epasto, Marina Knittel, Ravi Kumar, Mohammad Mahdian, Benjamin Moseley, Philip Pham, Sergei Vassilvitskii and Yuyan Wang. Published at: NeurIPS, 2020.
Defines fairness in the context of hierarchical clustering (HC) and designs algorithms that efficiently produce fair HC solutions for existing HC objective functions. The proposed fairness notion is a hard constraint on HC trees: data points are assigned colors according to the value of a chosen protected feature, and in every cluster of the HC tree (apart from the leaves) no color may be overly dominant.
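One way to picture the hard constraint is as a check over the tree: in every internal cluster, no color may exceed some cap on its fraction of the points. The alpha-cap formulation below is an illustrative instance of the constraint described above, with my own names, not the paper's exact definition.

```python
def color_counts(tree, colors):
    """Count colors under a node. Internal nodes are [left, right] lists;
    leaves are point ids mapped to colors by the `colors` dict."""
    if not isinstance(tree, list):
        return {colors[tree]: 1}
    counts = {}
    for child in tree:
        for c, n in color_counts(child, colors).items():
            counts[c] = counts.get(c, 0) + n
    return counts

def is_fair(tree, colors, alpha):
    """Hard constraint check: in every non-leaf cluster of the HC tree,
    no color makes up more than an alpha fraction of the points."""
    if not isinstance(tree, list):
        return True  # leaves are exempt, as in the constraint above
    counts = color_counts(tree, colors)
    if max(counts.values()) > alpha * sum(counts.values()):
        return False
    return all(is_fair(child, colors, alpha) for child in tree)

colors = {1: "red", 2: "blue", 3: "red", 4: "blue"}
balanced = is_fair([[1, 2], [3, 4]], colors, alpha=0.5)    # every split mixed
unbalanced = is_fair([[1, 3], [2, 4]], colors, alpha=0.5)  # monochrome children
```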
Scaling Average-Linkage via Sparse Cluster Embeddings
Authors: Thomas Lavastida, Kefu Lu, Benjamin Moseley and Yuyan Wang. Published at: ACML, 2021.
Average-linkage, one of the most popular hierarchical clustering algorithms, does not scale to large data sets: the fastest known implementation has running time quadratic in the number of data points. This paper presents a technique we call cluster embedding, which maps clusters to points in a Euclidean space of slightly higher dimension such that the distances between the mapped points closely approximate the average distances between the clusters. Coupled with nearest-neighbor techniques, this embedding lets us efficiently find close clusters to merge and breaks the quadratic running-time bound. The speed-up is validated both theoretically and empirically.
Faster Hierarchical Clustering in General Metric Spaces using Approximate Nearest Neighbors
Authors: Benjamin Moseley, Sergei Vassilvitskii and Yuyan Wang. Published at: AISTATS, 2021.
We use approximate nearest neighbor (ANN) queries to improve the running time of single-linkage and average-linkage, two popular methods for hierarchical clustering, on inputs from general metric spaces where ANN techniques may or may not be available. We first propose a method that works for any metric space where an ANN structure exists, and then extend it to the more common scenario where an ANN-friendly proxy metric is known. We complement our theoretical analysis with an empirical evaluation.
Relational Algorithms for k-means Clustering
Authors: Benjamin Moseley, Kirk Pruhs, Alireza Samadian and Yuyan Wang. Published at: ICALP, 2021.
Most data encountered by data scientists is stored in relational databases. Running k-means on a relational database usually requires converting it into a set of data points, where each data point is a collection of values for different features (attributes); the size of this set can be exponential in the size of the relational database. We give a constant-approximation k-means algorithm that operates directly on the relational input, without converting the database to the conventional sample-feature matrix format. The algorithm takes advantage of the highly compact relational representation and achieves a running time that can be sublinear in the number of points the database represents.
Robust Online Correlation Clustering
Authors: Silvio Lattanzi, Benjamin Moseley, Sergei Vassilvitskii, Yuyan Wang and Rudy Zhou. Published at: NeurIPS, 2021.
In correlation clustering we are given a set of points along with recommendations for whether each pair of points should be placed in the same cluster or in separate clusters. The goal is to cluster the points so as to minimize disagreements with the recommendations. We study the correlation clustering problem in the online setting, where points arrive one at a time and, upon arrival, the algorithm must make an irrevocable cluster assignment decision. The online version is known to have a polynomial lower bound on the competitive ratio of any algorithm. In this work we go beyond worst-case analysis and show that the celebrated Pivot algorithm performs well when given access to a small number of random samples from the input. Moreover, we prove that in this setting Pivot is robust to additional adversarial perturbations of the sample set. We conclude with an empirical analysis validating our theoretical findings.
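The Pivot algorithm referenced above is simple to state: process points in random order, and let each still-unclustered point grab all of its still-unclustered "+" neighbors as a cluster. A minimal sketch of the classic offline variant (not the paper's sampled online setting; names are illustrative):

```python
import random

def pivot_correlation_clustering(nodes, positive_pairs, seed=0):
    """Pivot for correlation clustering.

    nodes: list of node ids.
    positive_pairs: set of frozenset({u, v}) pairs recommended to be together;
    every other pair is implicitly recommended to be separated.
    Returns a list of clusters (lists of node ids)."""
    order = nodes[:]
    random.Random(seed).shuffle(order)  # Pivot's random processing order
    unclustered = set(nodes)
    clusters = []
    for p in order:
        if p not in unclustered:
            continue  # already grabbed by an earlier pivot
        cluster = [p] + [v for v in unclustered
                         if v != p and frozenset((p, v)) in positive_pairs]
        for v in cluster:
            unclustered.discard(v)
        clusters.append(cluster)
    return clusters

# Two "+" triangles with no edges between them: each triangle becomes a cluster.
positive = {frozenset(e) for e in [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6)]}
clusters = pivot_correlation_clustering([1, 2, 3, 4, 5, 6], positive)
```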
Talk 1 on Relevant Topic in Your Field
Published:
This is a description of your talk, which is a markdown file that can be markdown-ified like any other post. Yay markdown!
Tutorial 1 on Relevant Topic in Your Field
Published:
Talk 2 on Relevant Topic in Your Field
Published:
Conference Proceeding talk 3 on Relevant Topic in Your Field
Published:
This is a description of your conference proceedings talk, note the different field in type. You can put anything in this field.
Teaching experience 1
Undergraduate course, University 1, Department, 2014
This is a description of a teaching experience. You can use markdown like any other post.
Teaching experience 2
Workshop, University 1, Department, 2015
This is a description of a teaching experience. You can use markdown like any other post.