# Student Papers and Presentations

As the program was approaching its conclusion, students presented their work to the group. Below are the abstracts of the presentations, with links to the slides of the talks and to the full papers where available. Several students are currently working to polish their papers so more papers will be posted in the near future.

### Ariel Schvartzman, “An Approximation Algorithm for a Constraint Satisfaction Problem.” Faculty mentor: Yury Makarychev.

The Constraint Satisfaction Problem (CSP) encompasses a large family of problems in combinatorial optimization, including notorious NP-Hard problems such as 3-SAT or Max Cut. In fact, these two problems and many other CSPs are even NP-Hard to approximate within certain constant factors. Therefore, the most interesting results in this field are concerned with approximation algorithms and matching hardness of approximation proofs.

An instance of a $k$-CSP($d$) is a set $C$ of constraints and a set $X$ of variables such that every clause contains at most $k$ variables. We wish to assign values from the domain $[0, 1, ..., d-1]$ to the variables so as to maximize the number of satisfied constraints.

In this talk, we present a slight improvement to a recent algorithm by Makarychev and Makarychev [MM14] for this problem. The MM algorithm is efficient when $k > \log(d)$. We extend their result to all instances of the problem with the same approximation ratio, $O(kd/d^k)$. The previous best approximation ratio for $k < \log(d)$ was $O(d/d^k)$.

### Nathaniel Sauder, “Pooling Mechanisms and learnt invariance under rigid motion and deformation in Convolutional Neural Networks.” Faculty mentor: Greg Shakhnarovich.

Deep convolutional neural networks (DCNN) have emerged as a dominant architecture for a variety of computer vision tasks, including image classification, category level detection and segmentation. Local response pooling operation in such networks is important both for reducing computational and statistical cost of training the DCNNs, and for introducing desired invariances into the models. In this talk we consider modifications to commonly used max and average pooling schemes and methods of ensuring invariance under small deformations and rigid motions.

### Max Cytrynbaum and Wei Hu, “Maximum likelihood estimation of a multi-dimensional log-concave density via sos-convexity.” Faculty mentor: John Lafferty.

Shape constraints are a common approach for solving non-parametric density estimation problems. Log-concavity, in particular, is a well studied constraint with a variety of nice properties. In this talk, we consider a tractable relaxation of the general log-concave density estimation problem. Specifically, we let $s(x)$ be a multivariate convex polynomial and consider densities of the form $e^{-s(x)}$. While checking if a polynomial is convex is NP-hard in general, sos-convexity (sos="sum of squares") can be determined using semidefinite programming. In this context, we formulate maximum likelihood estimation of a log-sos-concave density as a convex optimization problem. We propose an algorithm using projected stochastic gradient method, in which biased gradient estimates are obtained through a Markov Chain Monte Carlo sampling procedure that is efficient for log-concave densities. Convergence rate of our algorithm and the theoretical properties of our estimator are demonstrated.

### Yo Joong (YJ) Choe: “Learning Additive Models with Sparse Convexity Patterns.” Faculty mentor: John Lafferty.

In nonparametric statistics, additive modeling is an efficient and interpretable technique for high-dimensional models, even when the true model is not additive. Shape constraints, such as requiring functions to be convex or monotone, also allow tractable methods that apply to numerous examples.

Here, we propose a novel estimation technique that combines additive models with shape constraints. Specifically, we consider a regression model in which the true function is additive and each component is either convex, concave, or identically zero. This model extends the idea of sparse additive models (SpAM) proposed by Ravikumar et al. (2008).

We first show that the problem can be expressed as a 0-1 mixed-integer second-order cone program (MISOCP), for which there exist solvers based on heuristics. Then, we describe our recently discovered quadratic program (QP) formulation which extends the idea of Lasso (Tibshirani, 1996). We present real-world examples as well as simulations comparing the different formulations.

### Ridwan Syed, “Approximation Resistance and Linear Threshold Predicates.” Faculty mentor: Madhur Tulsiani.

In the boolean $Max-k-CSP(f)$ problem we are given a predicate $f:\{-1,1\}^k \rightarrow \{-1,1\}$, a set of variables, and local constraints of the form $f(x_1,...,x_k)$, where each $x_i$ is either a variable or negated variable. The goal is to assign values to variables as to maximize the fraction of constraints which evaluate to $1$.

Many such problems are NP-Hard to solve exactly, and some are even NP-Hard to approximate better than a random assignment. Such problems, or rather their corresponding predicates $f$, are called approximation resistant.

Recently Khot et al. gave a characterization (assuming the Unique Games Conjecture) of a modified notion of approximation resistance in terms of the existence of a measure over a convex polytope corresponding to the predicate. In this talk we discuss this characterization as well as the approximability of linear threshold predicates (predicates of the form $sgn(w_1x_1+...+w_kx_k)$). We present evidence that a certain family of linear threshold predicates may be approximation resistant.

### Liran Chen, “Learning ensembles of convolutional networks.” Faculty mentor: Greg Shakhnarovich.

Convolutional Network Models have demonstrated impressive performance in image classification. When fitting complex models with non-convex objectives to train the network, the resulting model depends on stochastic learning procedure, i.e., the final network trained with gradient descent depends on the order of data in each epoch, initialization, learning rates, etc.

Ensemble learning is a method for generating multiple versions of a predictor network and using them to get an aggregated prediction. It reduces the variance portions in the bias-variance decomposition of the prediction error. In our project we have experimented with different ensemble methods that tend to contribute to dramatic error reduction. The tradeoff between number of models and their complexity (as controlled by learning duration) has been investigated and we show that ensemble learning may lead to accuracy gains along with reduction in training time.

### Stella Biderman, Kevin Cuddy, Min Jae Song, and Ang Li, “Complexity measures for Boolean functions. Sensitivity of uniform hypergraph properties.” Faculty mentor: Laszlo Babai.

We present a graph property with sensitivity $\Theta(\sqrt{n})$, where $n = {v \choose {2}}$ is the number of variables, and generalize it to a $k$-uniform hypergraph property with sensitivity $\Theta(\sqrt{n})$, where $n = {n \choose k}$ is again the number of variables. This yields smallest sensitivity yet achieved for a $k$-uniform hypergraph property. We then show that, for even $k$, there is a $k$-uniform hypergraph property that demonstrates a quadratic gap between sensitivity and block sensitivity. This matches the previously known largest gap found by Rubinstein (1995) for a general boolean function and Chakraborty (2005) for a cyclically invariant boolean function, and is the first known example of such a gap for a graph or hypergraph property.

### Shane Settle. “Automated Parameter Tuning for Machine Learning.” Faculty mentor: Greg Shakhnarovich

When constructing a model in machine learning, the goal is to build an accurate predictor by minimizing a loss function with respect to a training set. When minimizing the empirical loss of the model, it is important to keep in mind the trade off being made between the bias and variance of the model. Regularization, for example penalizing the norm of the parameters, offers a degree of control of this trade off. The use of regularization adds an additional parameter (hyperparameter) which itself needs to be set based on the available training data. Tuning this parameter is a potentially costly process, involving searching in a large space. This project explores methods for regularization parameter tuning, with the goal of providing a fully automated procedure that uses computation efficiently.

### Hing Yin (Joseph) Tsang, “Boolean Functions with Low Sensitivity.” Faculty mentor: Laszlo Babai.

We review several complexity measures for Boolean functions which are related to the study of the decision tree model. Among those measures, sensitivity is the simplest to define, but the most difficult to prove upper bounds in terms of it. We discuss the known upper bounds on other measures in terms of sensitivity and the proof techniques. By a simple decision tree construction, we suggest two conjectures that, if both are true, would imply a polynomial upper bound on decision tree complexity in terms of sensitivity.

### Enkhzaya (Eza) Enkhtaivan. “Property Testing in the Orientation Model.” Faculty mentor: Laszlo Babai.

We introduce property testing in the orientation model, which is an orientation of a given underlying undirected graph with bounded degree. For a certain class of bounded degree graphs G, it has been proven that strong connectivity can be tested in constant queries. In this paper, we prove this result when G is the toroidal grid.