Sitemap

A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.

Pages

Page Not Found

Page not found. Your pixels are in another canvas.

About me

About me

Archive Layout with Content

Posts by Category

Posts by Collection

CV

Markdown

Page not in menu

This is a page not in the main menu

Page Archive

Portfolio

Publications

Sitemap

Posts by Tags

Talk map

Talks and presentations

Teaching

Terms and Privacy Policy

Blog posts

Jupyter notebook markdown generator

Posts

Future Blog Post

less than 1 minute read

Published:

This post will show up by default. To disable scheduling of future posts, edit _config.yml and set future: false.

Blog Post number 4

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 3

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 2

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 1

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

portfolio

Portfolio item number 1

Short description of portfolio item number 1

Portfolio item number 2

Short description of portfolio item number 2

publications

Zebra regression: estimating motion of periodically-emitted particles from unlabeled sightings

Authors: Yuyan Wang, Senaka Buthpitiya and Alex Fabrikant

In submission.

We introduce Zebra regression, a novel streaming regression task. A stream of periodically-created particles travels along a finite, 1-dimensional trajectory, and we observe a stream of sporadic, noisy reports that estimate the location of some particle, but without exact information on which particle is being observed. We seek a streaming clustering algorithm that maintains accurate estimates of the locations of the particles, where a cluster of data reports corresponds to a moving particle.
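To make the setup concrete, here is a toy simulation (my own illustration, not from the paper) of the data-generating process described above: particles are emitted once per period and move along a 1-dimensional trajectory at constant speed, and we observe sporadic, noisy location reports without knowing which particle each report refers to. All parameter names and values are assumptions.

```python
# Toy simulation of the Zebra regression setup (illustration only, not the
# paper's code): particles emitted every `period` time units move at a fixed
# speed along a 1-D trajectory; reports are noisy and unlabeled.
import numpy as np

rng = np.random.default_rng(0)
period, speed, horizon, noise = 10.0, 1.0, 100.0, 2.0

emit_times = np.arange(0.0, horizon, period)                 # one particle per period
report_times = np.sort(rng.uniform(0.0, horizon, size=200))  # sporadic report times
# each report observes some already-emitted particle; the label is hidden from the algorithm
particle = np.array([rng.integers(0, int(t // period) + 1) for t in report_times])
true_pos = speed * (report_times - emit_times[particle])
reports = true_pos + rng.normal(scale=noise, size=len(report_times))
```

A streaming clustering algorithm would consume (report_times, reports) and try to maintain one accurate location estimate per particle.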

An Objective for Hierarchical Clustering in Euclidean Space and Its Connection to Bisecting K-means

Authors: Benjamin Moseley and Yuyan Wang

Published at: AAAI, 2020

Designs a new global objective function to measure the quality of Hierarchical Clustering solutions for inputs in Euclidean space and discusses its theoretical connection to ground-truth inputs and to algorithms used in practice. The objective captures the criterion that has motivated the use of divisive clustering algorithms: when a split happens, points in the same cluster should be more similar than points in different clusters. The objective gives reasonable results on ground-truth inputs for hierarchical clustering, and there is a theoretical connection between it and the bisecting k-means algorithm.

Link: here
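As a rough illustration of the bisecting k-means procedure that the objective is connected to, here is a minimal sketch (not the paper's code): repeatedly split the largest cluster with 2-means and record the splits as a tree. The function name and the use of scikit-learn's KMeans are my own choices.

```python
# Minimal bisecting k-means sketch: keep splitting the largest cluster in two
# with 2-means until the desired number of leaves is reached.
import numpy as np
from sklearn.cluster import KMeans

def bisecting_kmeans(X, n_leaves=4, seed=0):
    """Return a list of (parent_indices, left_indices, right_indices) splits."""
    clusters = [np.arange(len(X))]      # start with one cluster holding all points
    splits = []
    while len(clusters) < n_leaves:
        # pick the largest current cluster and divide it
        idx = max(range(len(clusters)), key=lambda i: len(clusters[i]))
        members = clusters.pop(idx)
        labels = KMeans(n_clusters=2, n_init=10, random_state=seed).fit_predict(X[members])
        left, right = members[labels == 0], members[labels == 1]
        splits.append((members, left, right))
        clusters.extend([left, right])
    return splits

X = np.random.default_rng(0).normal(size=(100, 2))
tree = bisecting_kmeans(X, n_leaves=4)
```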

Fair Hierarchical Clustering

Authors: Sara Ahmadian, Alessandro Epasto, Marina Knittel, Ravi Kumar, Mohammad Mahdian, Benjamin Moseley, Philip Pham, Sergei Vassilvitskii and Yuyan Wang

Published at: NeurIPS, 2020

Defines fairness in the context of Hierarchical Clustering (HC) and designs an algorithm that efficiently produces fair HC solutions for existing HC objective functions. The proposed fairness notion is a hard constraint on HC trees: data points are assigned colors according to their values of a chosen protected feature, and in every cluster of the HC tree (apart from the leaves) no color may be overly dominant.

Link: here
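The hard constraint can be stated compactly. Below is a minimal sketch (my own illustration, not the paper's algorithm) of the fairness check implied by the description: in every internal cluster of the HC tree, no single color may exceed a chosen fraction alpha of the cluster. The helper names and the threshold alpha are assumptions.

```python
# Fairness check for an HC tree: no color may make up more than an
# alpha-fraction of any internal (non-leaf) cluster.
from collections import Counter

def cluster_is_fair(colors, alpha=0.5):
    """colors: protected-feature colors of the points in one cluster."""
    counts = Counter(colors)
    return max(counts.values()) <= alpha * len(colors)

def tree_is_fair(internal_clusters, point_colors, alpha=0.5):
    """internal_clusters: iterable of point-index lists, one per non-leaf node."""
    return all(cluster_is_fair([point_colors[i] for i in c], alpha)
               for c in internal_clusters)
```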

Scaling Average-Linkage via Sparse Cluster Embeddings

Authors: Thomas Lavastida, Kefu Lu, Benjamin Moseley and Yuyan Wang

Published at: ACML, 2021

Average-linkage, one of the most popular hierarchical clustering algorithms, does not scale to large data sets: the fastest known implementation has running time quadratic in the number of data points. This paper presents a technique we call cluster embedding. It maps clusters to points in another Euclidean space of slightly higher dimension such that the distances between the mapped points closely approximate the average distances between the clusters. When coupled with nearest-neighbor techniques, this embedding helps us efficiently find close clusters to merge and is used to break the quadratic running-time bound. The speed-up is validated both theoretically and empirically.
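To make the bottleneck concrete, the sketch below computes the quantity the embedding is designed to approximate, the average pairwise distance between two clusters; evaluating it for every pair of clusters is what drives the quadratic (or worse) running time of naive average-linkage. The embedding itself is not reproduced here.

```python
# Average-linkage distance between two clusters: the mean of all pairwise
# Euclidean distances between their points.
import numpy as np

def average_linkage_distance(A, B):
    """A: (n, d) array of one cluster's points; B: (m, d) array of the other's."""
    diffs = A[:, None, :] - B[None, :, :]          # (n, m, d) pairwise differences
    return np.sqrt((diffs ** 2).sum(-1)).mean()    # mean pairwise distance
```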

Faster Hierarchical Clustering in General Metric Spaces using Approximate Nearest Neighbors

Authors: Benjamin Moseley, Sergei Vassilvitskii and Yuyan Wang

Published at: AISTATS, 2021

We use approximate nearest neighbor (ANN) queries to improve the running time of single-linkage and average-linkage, both popular methods for Hierarchical Clustering, for inputs in general metric spaces where ANN techniques may or may not be known. We first propose such a method that works for any general metric space where ANN exists, and then extend it to the more common scenario where an ANN-friendly proxy metric is known. We complement our theoretical analysis with an empirical evaluation.
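As a rough sketch of the primitive these methods accelerate, the snippet below uses an exact nearest-neighbor index (scikit-learn's NearestNeighbors, standing in here for an approximate one) to answer "which other point is closest?". The paper's actual merge schedule and its use of ANN in general metric spaces are not reproduced.

```python
# Nearest-neighbor lookup as a stand-in for the ANN primitive: find the
# closest pair of points via a single index query per point.
import numpy as np
from sklearn.neighbors import NearestNeighbors

X = np.random.default_rng(1).normal(size=(1000, 8))
index = NearestNeighbors(n_neighbors=2).fit(X)
dist, nbrs = index.kneighbors(X)          # column 0 is the point itself
i = int(np.argmin(dist[:, 1]))            # point with the closest "other" neighbor
print(i, int(nbrs[i, 1]), float(dist[i, 1]))
```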

Relational Algorithms for k-means Clustering

Authors: Benjamin Moseley, Kirk Pruhs, Alireza Samadian and Yuyan Wang

Published at: ICALP, 2021

Given a relational database, we give a constant-approximation k-means algorithm that operates directly on the relational input, without converting the database to the conventional sample-feature matrix format. Most data encountered by data scientists is stored in relational databases, and running k-means on one usually requires converting it into a set of data points, where each data point is a collection of values for different features (attributes); the size of this set can be exponential in the size of the relational database. Our algorithm takes advantage of the highly compact relational representation and achieves a running time that can be sublinear in the number of points the database represents, while still guaranteeing a constant-factor approximation of the optimal k-means.

Link: here
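For contrast, here is a small illustration (not the paper's algorithm) of the conventional path that the paper avoids: joining relational tables into one flat sample-feature matrix before running k-means. The joined table can be exponentially larger than the relations themselves; the table and column names below are made up for the example.

```python
# Conventional pipeline the paper avoids: materialize the join, then cluster.
import pandas as pd
from sklearn.cluster import KMeans

users  = pd.DataFrame({"uid": [1, 2], "age": [30, 40]})
visits = pd.DataFrame({"uid": [1, 1, 2, 2], "duration": [5, 7, 2, 9]})

flat = users.merge(visits, on="uid")      # one row per joined tuple; can blow up
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(flat[["age", "duration"]])
```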

Robust Online Correlation Clustering

Authors: Silvio Lattanzi, Benjamin Moseley, Sergei Vassilvitskii, Yuyan Wang and Rudy Zhou

Published at: NeurIPS, 2021

In correlation clustering we are given a set of points along with recommendations for whether each pair of points should be placed in the same cluster or in separate clusters. The goal is to cluster the points so as to minimize disagreements with the recommendations. We study the correlation clustering problem in the online setting, where points arrive one at a time, and upon arrival the algorithm must make an irrevocable cluster assignment. The online version is known to have a polynomial lower bound on the competitive ratio of any algorithm. In this work we go beyond worst-case analysis and show that the celebrated Pivot algorithm performs well when given access to a small number of random samples from the input. Moreover, we prove that Pivot is robust to additional adversarial perturbations of the sample set in this setting. We conclude with an empirical analysis validating our theoretical findings.
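For reference, here is a minimal sketch of the classical (offline) Pivot algorithm that the analysis builds on: pick a random unclustered point as the pivot, cluster it together with every unclustered point recommended to be with it, and repeat. The online, sample-based variant and the robustness guarantees from the paper are not reproduced.

```python
# Pivot for correlation clustering: repeatedly pick a random pivot and group
# it with all remaining points that have a "same cluster" recommendation to it.
import random

def pivot(points, same_cluster_pairs, seed=0):
    """same_cluster_pairs: set of frozensets {u, v} with a '+' recommendation."""
    rng = random.Random(seed)
    remaining = set(points)
    clusters = []
    while remaining:
        v = rng.choice(sorted(remaining))
        cluster = {v} | {u for u in remaining
                         if frozenset((u, v)) in same_cluster_pairs}
        clusters.append(cluster)
        remaining -= cluster
    return clusters
```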

talks

Talk 1 on Relevant Topic in Your Field

Published:

This is a description of your talk, which is a markdown file that can be markdown-ified like any other post. Yay markdown!

Conference Proceeding talk 3 on Relevant Topic in Your Field

Published:

This is a description of your conference proceedings talk; note the different value in the type field. You can put anything in this field.

teaching

Teaching experience 1

Undergraduate course, University 1, Department, 2014

This is a description of a teaching experience. You can use markdown like any other post.

Teaching experience 2

Workshop, University 1, Department, 2015

This is a description of a teaching experience. You can use markdown like any other post.