11:00 - 11:20
Transfer Learning in Credit Risk (491)
Hendra Suryanto (Rich Data Corporation, Singapore), Charles Guan (Rich Data Corporation, Singapore), Andrew Voumard (Rich Data Corporation, Singapore), Ghassan Beydoun (University of Technology Sydney)
In the credit risk domain, lenders frequently face situations where there is little or no historical lending outcome data. This generally results in limited or unaffordable credit for some individuals and small businesses. Transfer learning can potentially reduce this limitation by leveraging knowledge from related domains that have sufficient outcome data. We investigated the potential for applying transfer learning across various credit domains, for example, from the credit card lending and debt consolidation domain into the small business lending domain.
11:20 - 11:40
A Deep Multi-Task Approach for Residual Value Forecasting (410)
Ahmed Rashed (University of Hildesheim), Shayan Jawed (University of Hildesheim), Jens Rehberg (Volkswagen Financial Services AG, Braunschweig), Josif Grabocka (University of Hildesheim), Lars Schmidt-Thieme (University of Hildesheim), Andre Hintsches (Volkswagen Financial Services AG, Braunschweig)
Residual value forecasting plays an important role in many areas, e.g., for vehicles to price leasing contracts. High forecasting accuracy is crucial as any overestimation will lead to lost sales due to customer dissatisfaction, while underestimation will lead to a direct loss in revenue when reselling the car at the end of the leasing contract. Current forecasting models mainly rely on the trend analysis of historical sales records. However, these models require extensive manual steps to filter and preprocess those records, which in turn limits the frequency at which these models can be updated with new data. To automate, improve and speed up residual value forecasting, we propose a multi-task model that uses, besides the residual value itself as the main target, the actual mileage driven as a co-target. While combining off-the-shelf regression models with careful feature engineering already yields useful models, we show that for residual values, further quantities such as the actual driven mileage contain further useful information. As this information is not available when contracts are made and the forecast is due, we model the problem as a multi-task model that uses actual mileage as a training co-target. Experiments on three Volkswagen car models show that the proposed model significantly outperforms the straightforward modeling as a single-target regression problem.
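As a minimal sketch of the co-target idea, the training objective can be written as the main residual-value loss plus a weighted auxiliary loss on the actual mileage. The weighted-sum form, the use of mean squared error, and the names `multi_task_loss` and `aux_weight` are illustrative assumptions and are not taken from the paper.

```python
def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def multi_task_loss(rv_pred, rv_true, mileage_pred, mileage_true, aux_weight=0.5):
    """Joint training loss (hypothetical form).

    The residual value is the main target. The actual mileage is only a
    training co-target: it is known for finished contracts but not when
    the forecast is made, so it appears in the loss, never in the inputs.
    """
    main = mse(rv_pred, rv_true)
    auxiliary = mse(mileage_pred, mileage_true)
    return main + aux_weight * auxiliary
```

At prediction time only the residual-value output would be used; the mileage output exists to shape the shared representation during training.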
11:40 - 12:00
Shallow Self-Learning for Reject Inference in Credit Scoring (730)
Nikita Kozodoi (Humboldt University of Berlin; Kreditech, Hamburg), Panagiotis Katsas (Kreditech, Hamburg), Stefan Lessmann (Humboldt University of Berlin), Luis Moreira-Matias (Kreditech, Hamburg), Konstantinos Papakonstantinou (Kreditech, Hamburg)
Credit scoring models support loan approval decisions in the financial services industry. Lenders train these models on data from previously granted credit applications, where the borrowers' repayment behavior has been observed. This approach creates sample bias. The scoring model is trained on accepted cases only. Applying the model to screen applications from the population of all borrowers degrades its performance. Reject inference comprises techniques to overcome sampling bias through assigning labels to rejected cases. This paper makes two contributions. First, we propose a self-learning framework for reject inference. The framework is geared toward real-world credit scoring requirements through considering distinct training regimes for labeling and model training. Second, we introduce a new measure to assess the effectiveness of reject inference strategies. Our measure leverages domain knowledge to avoid artificial labeling of rejected cases during evaluation. We demonstrate this approach to offer a robust and operational assessment of reject inference. Experiments on a real-world credit scoring data set confirm the superiority of the suggested self-learning framework over previous reject inference strategies. We also find strong evidence in favor of the proposed evaluation measure assessing reject inference strategies more reliably, raising the performance of the eventual scoring model.
12:00 - 12:20
Cold-Start Recommendation for On-Demand Cinemas (689)
Beibei Li (State Key Laboratory of Computer Sciences, Institute of Software, Chinese Academy of Sciences; University of Chinese Academy of Sciences), Beihong Jin (State Key Laboratory of Computer Sciences, Institute of Software, Chinese Academy of Sciences; University of Chinese Academy of Sciences), Taofeng Xue (State Key Laboratory of Computer Sciences, Institute of Software, Chinese Academy of Sciences; University of Chinese Academy of Sciences), Kunchi Liu (State Key Laboratory of Computer Sciences, Institute of Software, Chinese Academy of Sciences; University of Chinese Academy of Sciences), Qi Zhang (Beijing iQIYI Cinema Management Co., Ltd.), Sihua Tian (Beijing iQIYI Cinema Management Co., Ltd.)
On-demand cinemas, which have emerged in recent years, provide offline entertainment venues for individuals and small groups. Because of the limitation of network speed and storage space, it is necessary to recommend movies to cinemas, that is, to suggest that cinemas download the recommended movies in advance. This is particularly true for new cinemas. For the new cinema cold-start recommendation, we build a matrix factorization framework and then fuse location categories of cinemas and co-popular relationships between movies in the framework. Specifically, location categories of cinemas are learned through LDA from the type information of POIs around the cinemas and used to approximate cinema latent representations. Moreover, an SPPMI matrix that reflects co-popular relationships between movies is constructed and factorized collectively with the interaction matrix by sharing the same item latent representations. Extensive experiments on real-world data are conducted. The experimental results show that the proposed approach yields significant improvements over state-of-the-art cold-start recommenders in terms of precision, recall and NDCG.
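For readers unfamiliar with SPPMI (shifted positive pointwise mutual information), the sketch below builds such a matrix from co-occurrence counts using the standard definition max(PMI(i, j) - log k, 0). How the paper defines "co-popular" is not stated in the abstract, so the input here, lists of movies that were popular together, and the names `sppmi`, `baskets`, and `k` are assumptions for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def sppmi(baskets, k=1.0):
    """Return a sparse SPPMI matrix as {(movie_a, movie_b): value}.

    baskets: iterable of movie lists; two movies co-occur when they
             appear in the same list (an assumed notion of co-popularity).
    k:       shift parameter; larger k keeps only stronger associations.
    """
    pair_counts = Counter()
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            # Store both directions so the matrix is symmetric.
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1

    # Marginal count of each movie over all co-occurrence pairs.
    item_counts = Counter()
    for (a, _), count in pair_counts.items():
        item_counts[a] += count
    total = sum(pair_counts.values())

    matrix = {}
    for (a, b), count in pair_counts.items():
        pmi = math.log(count * total / (item_counts[a] * item_counts[b]))
        shifted = pmi - math.log(k)
        if shifted > 0:  # keep only positive entries
            matrix[(a, b)] = shifted
    return matrix
```

In a collective factorization setting, this matrix would be factorized alongside the cinema-movie interaction matrix with the movie factors shared between the two.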
12:20 - 12:40
Scalable Bid Landscape Forecasting in Real-time Bidding (123)
Aritra Ghosh (University of Massachusetts, Amherst), Saayan Mitra (Adobe Research), Somdeb Sarkhel (Adobe Research), Jason Xie (Adobe Advertising Cloud), Gang Wu (Adobe Research), Viswanathan Swaminathan (Adobe Research)
In programmatic advertising, ad slots are usually sold using second-price (SP) auctions in real-time. The highest bidding advertiser wins but pays only the second highest bid (known as the winning price). In SP, for a single item, the dominant strategy of each bidder is to bid the true value from the bidder's perspective. However, in a practical setting, with budget constraints, bidding the true value is a sub-optimal strategy. Hence, to devise an optimal bidding strategy, it is of utmost importance to learn the winning price distribution accurately. Moreover, a demand-side platform (DSP), which bids on behalf of advertisers, observes the winning price only if it wins the auction. For losing auctions, a DSP can only treat its bidding price as the lower bound for the unknown winning price. In the literature, censored regression is typically used to model such partially observed data. A common assumption in censored regression is that the winning price is drawn from a fixed variance (homoscedastic) uni-modal distribution (most often Gaussian). However, in reality, these assumptions are often violated. We relax these assumptions and propose a heteroscedastic fully parametric censored regression approach, as well as a mixture density censored network. Our approach not only generalizes censored regression but also provides flexibility to model arbitrarily distributed real-world data. Experimental evaluation on the publicly available dataset for winning price estimation demonstrates the effectiveness of our method. Furthermore, we evaluate our algorithm on one of the largest demand-side platforms, and significant improvement has been achieved in comparison with the baseline solutions.
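The censoring described above can be illustrated with a small sketch of a Gaussian censored negative log-likelihood: won auctions contribute the density of the observed winning price, while lost auctions contribute the probability that the winning price is at least the losing bid. Passing a separate `sigma` per auction gives a heteroscedastic variant. The function names and the record layout are assumptions for illustration, and this is not the authors' implementation; in particular, it does not cover the mixture density network.

```python
import math


def gaussian_log_pdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def gaussian_log_survival(x, mu, sigma):
    """Log of P(X >= x) for X ~ N(mu, sigma^2)."""
    z = (x - mu) / sigma
    survival = 0.5 * math.erfc(z / math.sqrt(2.0))
    # Clamp to avoid log(0) for extreme z; a production version would
    # use a numerically stable log-survival instead.
    return math.log(max(survival, 1e-300))


def censored_nll(records):
    """Negative log-likelihood over (price, won, mu, sigma) records.

    won=True:  price is the observed winning price.
    won=False: price is the DSP's losing bid, a lower bound on the
               unobserved winning price (right-censored observation).
    mu, sigma: predicted mean and standard deviation for that auction.
    """
    total = 0.0
    for price, won, mu, sigma in records:
        if won:
            total -= gaussian_log_pdf(price, mu, sigma)
        else:
            total -= gaussian_log_survival(price, mu, sigma)
    return total
```

For a lost auction, the loss decreases as the predicted mean moves above the losing bid, which is how censored records still inform the model.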