1 Introduction
Many problems of optimal decision fall in the context of nonlinear optimization, where the goal is to find the maximum of a function over some domain ,
(1) 
In this context, the optimal decision results in a single value . However, many applications such as financial management may benefit from finding multiple locations with high values that are also sufficiently distinct from each other (in some sense) so as to build a diversified portfolio. Finding multiple optimal decisions which are suitably distinct presents new challenges. First, we must define the distinctness between selections of . In this work, the ability to sense relative distinctness between proposed portfolios is learned directly from the user by querying the financial expert on the distinctness of a proposed as compared to , which itself can be found through standard blackbox optimization.
Thus, the optimal decision problem can be considered as the problem of identifying a supplemental set of financial portfolios which balance a high value against a degree of distinctness from . The user may choose from among these to supplement in the eventual portfolio (thus these need not be local optima, nor within any specified range).
Finding multiple solutions to an optimization problem has been previously studied. The idea of multimodality has been extensively explored in the evolutionary algorithms literature; see, e.g.,
(Wong, 2015) and references therein. In the context of Bayesian optimization, other authors have explored this idea by finding multiple local optima for robustness (Guenther et al., 2014; Nogueira et al., 2016) or parallelization (Snoek et al., 2012; Ginsbourger et al., 2008; Nguyen et al., 2016) Another interesting area of finding diverse sets of solutions is the determinantal point process (Kulesza & Taskar, 2012).Along the path of recent advances in interactive learning in the context of optimization, this work relies on Bayesian optimization for nonlinear optimization because it is sample efficient. For decision making and preference learning, this sample efficiency is directly translated in a resourceefficient optimization in terms of computation and user queries. Following the same paradigm, (Dewancker et al., 2016) proposed to learn the user’s preference among a set of utility functions in multimetric optimization, (Brochu et al., 2008, 2010) used this concept for learning virtual materials and smoke simulations, (Thatte et al., 2017) applied the same approach for prothesis design and (Okuma et al., 2011) used it for image classification. Recently, (González et al., 2016) presented an alternative to preference learning based on Copeland functions.
2 Using Preferences to Infer Distinctness
Our goal is to build a model which, if given portfolios
, can estimate
(2) 
where is the user’s implicit sense of distinctness. For practical purposes, can be thought of as a distance, although is not required to satisfy the triangle inequality.
Therefore, using the model does not give an explicit distance between two portfolios; instead, it returns the probability that two portfolios are more distinct than two other portfolios. This allows for a flexible sense of distinctness as an explicit form need not be specified
a priori.This model is equivalent to a binary classification problem. Thus, we use logistic regression to model this probability. The input features are
, the concatenation of and , whereis the elementwise absolute value of a vector
.3 Active Preference Sampling within Bayesian Optimization
Algorithm 1 shows the proposed algorithm. As discussed earlier, is determined absent any input from the user: it is simply the global maximum of the portfolio performance function . To find the supplemental distinct “solutions” to the optimization problem we leverage the user’s sense of distinctness to inform decisions made within a Bayesian optimization loop.
First, we perform a standard Bayesian optimization loop where, at each iteration, a new suggestion is selected by maximizing the expected improvement (EI) function (Jones et al., 1998). After the initial optimization for , we focus on the problem of learning/exploiting distinctness. In this case, we continue with the Bayesian optimization, but generate solutions in parallel using the constant liar mechanism (Ginsbourger et al., 2008). During this supplemental search, the next suggestion is defined as the portfolio chosen to be most distinct by the user.
At each iteration, the rankings are converted into pairwise classification data points for the logistic model. For example, if the customer responds that should be ranked as less distinct from than , we now have the data
The choice of the number of ranked portfolios impacts the search for distinctness: fewer portfolios will bias towards exploitation and more portfolios will yield more exploration. More analysis on this impact will benefit the efficiency of the search process. For the experiments in Section 5 we choose which, while chosen somewhat arbitrarily, proves to be a reasonable balance between the amount of data provided from a query and difficulty for a user to resolve that query.
Although we have defined a sequential algorithm where the user is queried after each iteration of the Bayesian optimization algorithm, if the user has accessibility constraints, the user may respond to batch queries immediately after determining to fully prepare the model for the distinctness search. This is the ”Initialize classifier” step, and can be guided by principles similar to (Thatte et al., 2017) or (Dewancker et al., 2016).
4 Selecting Portfolios after the Search
Thus far, our process consists of identifying the globally optimal portfolio , and then searching for supplemental highperforming portfolios which enjoy varying levels of distinctness from based on the user’s preferences (as learned through queries administered either before or during the supplemental search). To resolve a trading strategy based on these supplemental portfolios, we try to address the multicriteria optimization problem
(3) 
Without the actual (we know only the preference model for (2)), we can only resolve our efficient set of points (for which can not be improved without decreasing ) in some partial sense. Let us denote all the suggested portfolios during the supplemental search as and state, without loss of generality, that
We define to be an distinctly efficient portfolio if
As the base case, we always include in the efficient set.
Because portfolios are indexed by nonincreasing function value, a portfolio is distinctly efficient only if the model predicts its distinctness from is greater than each point already in the efficient set that preceded it with probability at least . This value is a free parameter for the user to choose when deciding which portfolios to implement: higher values exclude portfolios which are less certainly distinct.
5 Numerical Experiments
The data for our experiments is pulled from the S&P 500 during 2016. Our trading strategy can take long positions in 5 aggregated industries (industrials, energy, consumer discretionary, utilities, telecommunications). For a specific day on which a trading strategy is desired, the defining (1) is the Sharpe ratio of a given portfolio for the previous 10 trading days, estimated with discrete 1 day intervals.
A portfolio is a partition of unity (i.e., and ), which must be enforced during the Bayesian optimization. To do this, we solve an adjacent problem involving such that our portfolio is
The optimization problem is limited to and is solved on a log scale. This allows us to approximately search the space of partitions of unity for .
We run the optimization loop for a total of 60 iterations to determine . Then a second 60 iteration length Bayesian optimization augmented with the user’s preferences as described in Section 3 to determine a set of points from which we can identify a set of distinctly efficient portfolios.
5.1 Visualizing the Efficient Points
Visualizing the solutions to (3) would normally be managed graphically with the Pareto frontier. In this setting, however, a standard frontier is unavailable because we have only the ability to estimate (2) and not to compute . Furthermore, different choices of will change the portfolios which appear in the efficient set. Figure 1 shows the impact of various values when identifying distinctly efficient portfolios with different preferences.
The first point to recognize in Figure 1 is that, because different preferences search the domain differently, the distribution of portfolio performances will vary; this is why the dashed line representing all 60 observed function values varies. The second point to note is that the portfolios distinct for a given value include all the portfolios distinct for any greater values. As such, the user has one final decision, albeit slightly outside the loop, in choosing the value to identify which of the portfolios from all the results are to be used in effecting a trading strategy.
5.2 Evaluating Portfolio Performance
In this section we study the impact of using distinctly efficient portfolios suggested by the strategy above to execute trades on a specific day. We build portfolios on the second Wednesday of each month of 2016, as formed based on the previous 10 trading days; January was excluded so as to use only data from 2016. The trading strategy consists of supplemented equally by the distinct portfolios,
Recall that are impacted by the choice of .
The performance of a trading algorithm is judged by both the empirical mean and variance observed over the 11 trading days. The left half of
Figure 2 shows the performance of a trading strategy based solely on as well as possible trading strategies learned for various preferences regarding distinctness. In particular, supplementing the portfolio with alternate portfolios generally has the effect of reducing variance, but it has the potential to do so without negatively impacting the mean (which would be the case if supplemental portfolios were added randomly as seen in the spread of random outcomes).The right half of Figure 2 shows the conditions under which variance can be reduced without impacting the mean, namely whether the user prefers distinctness among assets that will perform well. This graph shows a strong positive correlation in empirical mean for portfolios manually loaded with high performing assets and those which are only built with a preference for distinctness in high performing assets.
The important factor here is that we take no explicit action to cause the eventual trading strategy to contain high performing assets; we only encourage exploration of the performance function on the domain with a desire for distinctness as preferred by the user. Essentially, if the user is able to successfully identify assets which will perform well, can be supplemented in such a way as to reward this belief by decreasing the variance without decreasing the mean.
6 Future Work
The impact of varying in identifying the final trading strategy is unclear, especially given that, with enough information, the user’s preferences in all possible pairwise comparisons can be learned. Future work must determine how best to transition the results of the supplemental search into a wellbalanced trading strategy. Furthermore, we should analyze the necessary balance between working on identifying and the supplemental portfolios as evaluating a proposed portfolio could be both costly and time consuming. Additionally, the impact of batch querying after identifying has not yet been studied, but would likely be significant in a practical implementation.
References
 Brochu et al. (2008) Brochu, Eric, Freitas, Nando D, and Ghosh, Abhijeet. Active preference learning with discrete choice data. In Advances in neural information processing systems, pp. 409–416, 2008.
 Brochu et al. (2010) Brochu, Eric, Brochu, Tyson, and de Freitas, Nando. A Bayesian interactive optimization approach to procedural animation design. In Proceedings of the 2010 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, pp. 103–112. Eurographics Association, 2010.
 Dewancker et al. (2016) Dewancker, Ian, McCourt, Michael, and Ainsworth, Samuel. Interactive preference learning of utility functions for multiobjective optimization. In NIPS Future of Interactive Learning Machines Workshop, 2016.
 Ginsbourger et al. (2008) Ginsbourger, David, Le Riche, Rodolphe, and Carraro, Laurent. A multipoints criterion for deterministic parallel global optimization based on Gaussian processes. Technical report, March 2008.
 González et al. (2016) González, Javier, Dai, Zhenwen, Damianou, Andreas, and Lawrence, Neil. Bayesian optimisation with pairwise preferential returns. In NIPS Workshop in Bayesian Optimization, 2016.
 Guenther et al. (2014) Guenther, J., Lee, H. K. H., and Gray, G. A. Finding and choosing among multiple optima. Applied Mathematics, 5(2):300–317, 2014.
 Jones et al. (1998) Jones, D. R., Schonlau, M., and Welch, W. J. Efficient global optimization of expensive blackbox functions. Journal of Global optimization, 13(4):455–492, 1998.

Kulesza & Taskar (2012)
Kulesza, Alex and Taskar, Ben.
Determinantal point processes for machine learning.
Foundations and Trends in Machine Learning, 5(2–3):123–286, 2012.  Nguyen et al. (2016) Nguyen, V., Gupta, S. K., Rana, S., Li, C., and Venkatesh, S. Think globally, act locally: a local strategy for Bayesian optimization. In In NIPS Workshop on Bayesian Optimization , (NIPS), 2016.
 Nogueira et al. (2016) Nogueira, José, MartinezCantin, Ruben, Bernardino, Alexandre, and Jamone, Lorenzo. Unscented Bayesian optimization for safe robot grasping. In Proc. of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, 2016.

Okuma et al. (2011)
Okuma, Kenji, Brochu, Eric, Lowe, David G., and Little, James J.
An adaptive interface for active localization.
In
International Conference on Computer Vision Theory and Applications
, 2011.  Snoek et al. (2012) Snoek, Jasper, Larochelle, Hugo, and Adams, Ryan P. Practical Bayesian optimization of machine learning algorithms. In Advances in neural information processing systems, pp. 2951–2959, 2012.
 Thatte et al. (2017) Thatte, Nitish, Duan, Helei, and Geyer, Hartmut. A sampleefficient blackbox optimizer to train policies for humanintheloop systems with user preferences. IEEE Robotics and Automation Letters, 2(2):993–1000, 2017.
 Wong (2015) Wong, KaChun. Evolutionary multimodal optimization: A short survey. CoRR, abs/1508.00457, 2015. URL http://arxiv.org/abs/1508.00457.
Comments
There are no comments yet.