The question that everybody wants to answer is whether wine ratings are related to its physicochemical properties. This analysis is then expanded to compare this constraint to a backorder rate constraint. The ability to forecast credit loss accurately is of vital importance to every financial institution for both decision support and regulatory compliance. On the agents end, the most experienced agents (oldest in age and biggest tenures with Ameritas) have been most successful selling UL policies, followed by the youngest group of agents in their thirties and shortest tenures of 2-5 years while the ones with 6-15 years tenure in the 45-55 years age band are more complacent and limited with the sales of these policies. Each relationship is rated based on such factors of risk. In this case, the same modeling results were obtained. The task also includes finding some other important information which can help us to provide better recommendations. Lixin Wang, Geolocation Optimization for Direct Mail Marketing Campaign, August 2019, (Michael Fry, Shu Chen) This data encompasses eleven years of claims. A scoring model, such as logistic regression, is used to compute activation scores from the set of attributes and these are ranked in descending order of activation likelihood. The dataset used for the project belongs to the e-commerce industry, specifically a women’s clothing website. On the contrary, direct marketing focusses on a small set of people who are believed to be interested in the product. Data contains information about 768 females, of which 268 females were diagnosed with Diabetes. With the advent of E-Commerce Industries everything from household items to cars are being made available online. Different Machine Learning algorithms were built to try and understand what are the factors that have a significant influence on the baby’s health and predict the health state of the baby based on these factors with the best possible accuracy. Principal Component Analysis will also be done to try and improve the model performance. To achieve this, I have made use of Apache Spark. Shivang Desai, Heart Disease Prediction, April 2016, (Dungang Liu, Edward Winkofsky) The primary macroeconomic drivers for 2nd Lien balance are Prime Rate, and All Transactions Home Price Index in Huntington footprint. One critical variable that impacts their orders is the time in which their representatives reply to customer emails. We integrate three levels of data (physician level, hospital level and patient level) into one. the IMDB score. For providing the best viewing experience and retaining the users, it is important for the OTT platforms to seamlessly suggest movies to both existing and new users. A recommendation system employs a statistical algorithm to predict the users’ preference and make suggestions based on those preferences. The objective is to analyze the dataset and identify the good customers from the bad customers (“charged off”) using machine learning techniques. The main goals of the project are to study various methods of feature engineering to determine the factors which affect the house prices and to implement and study several advance regression algorithms to predict the housing prices. This will help the bank establish a line of action and to quantify the exposure amount attributed to each customer. This is leading to high collection of user data. The advent of Convolutional Neural Networks has drastically improved the accuracy of image processing. It was decided to build the required model for one parking facility, tune the model and validate the results before the same could be scaled to multiple locations. In this report, we compared three representative types of collaborative filtering approaches derived from three distinct rationales using movie ratings data. This research paper focuses on the survival analysis that was performed to try and understand how long certain types of customers will stay with the company. The optimal value of K is 11 for k-nearest neighbor classifier which gives 98.23% accuracy. The non-emergency nature of elective procedures provides great opportunity for reducing costs. Specifically, the recommended products are beers from a consumer review aggregator named BeerAdvocate. We get an accuracy of 99.95% with the algorithm (adaptive boosting) on the scoring dataset and this is the expected accuracy in a general application using the same setup. For these reasons, maintaining a high renewal rate of STH’s is important to the teams on and off the field performance. The goal of this analysis is to use predictive modeling software to determine the most important variables of player grading for three types of reporting: contribution, durability, and performance. The tool used is Power BI. Based on these characteristics the study will explain which customers segments contribute high monetary value and which customer segments contribute low monetary value to the business. This project is an ad-hoc predictive analysis to determine the target customers for a Paper Towel producing CPG brand, (say YZ) targeting its customers with personalized coupon offers for various retailers. In today’s data driven world, it would make sense to take a similar approach to analyze dating preferences also. The company has its own fleet as well as it can subcontract aircrafts from other companies on a temporary basis. Then, from the deterministic model, a scenario-based stochastic model that assumes varying processing times is developed. The goal of this paper is to develop a valid deterministic mixed-integer linear-programming (MILP) model from which a valid stochastic model can be derived, and to explore how such models can potentially be utilized in an actual clinical setting. Verizon would want to leverage data insights and analytics to address these issues. This research was approached using two separate methodologies so as to compare results and also determine which one provides clearer results. The population for a direct mail campaign is selected by considering a variety of factors including marketing costs, mail offer, response score deciles, present value of the prospective customer, FICO score, response rate and approval rate of the customer. For example, the hospital requires that 90% of all cleaning requests should be completed within 60 minutes. Models presented very good results in predicting house price from the perspective of model MSE. Sravya Kasinadhuni, Email Fraud Detection- Spam and Ham Classification for Enron Email Dataset, July 2015, (Andrew Harrison, Edward Winkofsky) In order to be a truly customer-centric organization, there must first be a true understanding of who the consumer is which includes their needs, attitudes, preferences and behaviors. The data includes information on the winner, loser, rank, points of both the players in each set, venue, surface of the court and tournament details. I will also analyze how each of these have changed over the last ten years. Like humans, rodents also show a preference for high-fat diets. Random forest gave out the best results for the prediction of default based on the data. Cost curves are developed for these various scenarios to provide graphical support of the effects and tradeoffs inventory system decisions can have on total costs. Abhilash Mittapalli, Framework for Measuring Quality of User Experience on Cerkl and Analyzing Factors of High Impact, July 2016, (Dungang Liu, Tarek Kamil) Many organizations in India hope to detect and prevent this disease among people living in rural areas where medical screening is difficult to conduct. As they say, in many cases a subscriber may visit Netflix without knowing what exactly to watch. Analyzing factors such as distance, passenger traffic, competition among airlines and market dominance of a particular airline an attempt has been made to arrive at statistically significant results using linear regression. MS in Business Analytics Due to the large population density, physicians may need to take hundreds of patients every day at hospital, which is really time-consuming for patients. Many carriers have started working on SMS spam by allowing subscribers to report spam and taking action after appropriate investigation.
AutoPilot: 0,00 Euro Dus op de Duitse autobahn was dit bij 233 km/h helemaal niet goed gegaan verwacht ik. Companies need to understand the fluctuations of demand to keep the right amount of inventory on hand. Gene analysis is considered as a feasible approach for the predication of patient’s survival time. Currently, the next-best action planning is driven by analyzing the free text across multiple departments such as Oncology & Respiratory. There is a very high possibility of increasing mobile users in the future in these divisions. Lei Xia, A Study of Panel Data Analysis, May 23, 2011 (Martin Levy, Yan Yu) To facilitate these promotional activities, the manufacturer invests millions of dollars, which is driven by strategic negotiations with the retailers. Then, 10 different types of models and a model comparison are provided in the modeling section in order to find the best model to predict the foreclosure. The objective of this project is to show empirically why the robust optimization model performs better than the other models in the face of uncertainty. Our analysis is meant to assist teams during the decision-making process of drafting players by quantifying the tradeoffs inherent in each potential decision. While this was implemented for the presidential debate, the functions used are reusable and hence can be used to get the score for any other brand. In this project, COVID hotspots (counties) were identified to understand the risk of the spread of disease while taking important business decisions involving the geographical location. Kristen Bell, Effects of Bus Arrivals on Emergency-Department Patients, June 1, 2011 (David Kelton, Craig Froehle) Based on the results, the Lasso model with λ equal to 403.429 are preferred according to the performance of the results. However, if the amount of traffic through the intersection were to increase upwards of 20 percent, then a traffic light could possibly help decrease average queue lengths and the overall time spent in the intersection by all cars throughout the day. Satwinderpal Makkad, Functional Data Analysis of Plant Closings, April 2016, (Amitabh Raturi, Peng Wang) As part of an on-going project with UC MS - Business Analytics faculty, several attempts have been made at optimizing the scheduling process across all locations and specialties. Mahitha Sree Tammineedi, Analysis and Design of Balance Transfer Campaigns, July 2019, (Charles R. Sox, Jacob George) We had the employee feedback data. Arathi Nair, Demand Forecasting for Low-Volume, High-Variability Industrial Safety Products under Seasonality and Trend, December 4, 2013 (Uday Rao, David Kelton) It is composed of two projects. As an Analyst in the Business Intelligence team at Interstates Control Systems, West Chester, OH, I have worked on data extraction, data cleaning and data visualization with prime focus on Descriptive analytics. Based upon the results of the analysis it was found that the revenue generated from total inpatient services was negatively correlated to the net inpatient income but was positively correlated to the overall net income of the hospitals. Air-fare and the cost of flying have always been as much a matter of discussion as they have been a matter of speculation. In 2015, The University of Cincinnati began to transition its Student Information System from a homegrown system to a system created by Oracle PeopleSoft called Campus Solutions and branded by UC as Catalyst. Since healthcare service always related to issues of mortality and life quality for patients, hence online healthcare services and the patient satisfaction are always important to keep this industry running safely and efficiently. With a more effective recommendation system in place, the bank can better meet the individual needs of all customers and ensure their satisfaction. For this project, a linear model, generalized additive model, neural network model, and classification tree model are used to predict purchase prices in dollars. The data exhibits multiple seasonality with weekly and annual periods. One of the most important aspects of investment in volatile assets is risk control. The goal of the current project is to use the MovieLens data in R and build recommendation engines based on different techniques using the Recommender Lab package. Currently, the pharmacy chain delivers its prescriptions from seven sites using seven vehicles. The aim of this study is to show some of the advantages and disadvantages of Bayesian models in application to a particular data set used for classification in credit scoring. Harsh Singal, Python Notebooks for Data Mining Course, August 2020, (Yan Yu, Peng Wang). Visionworks business drives mostly on comprehensive eye exams and exam conversion percentage. The completion and service level agreement (SLA) compliance rate for IEN (Intelligence Engineering Network) projects at Verizon is lower than desired. A comparison of disparate classification methods will be evaluated on their predictive capabilities as well as the steps required in construction of the model. But the application which motivated me the most to take up a project in computer vision was Autopilot feature in Tesla. However, in the data driven model the accuracy was somewhat reduced and the other modeling techniques, except for classification trees, showed improvement. A logistic scoring model is built to create response or activation scores from the characteristic attributes that describe the name list. The goal of this project is to provide holistic experience to a user by providing recommendations based on various criteria such as popularity, user-user collaborative filtering and content filtering. In our analysis, we chose the primary elements to be Antimony (Sb), Germanium (Ge), Tellurium (Te) and the dopants for these combinations of primary elements are Ti, Bi, BiN, Mo, N, Sc, Al, AlSc, SiC, In, C, Si, SiN, O, W, Se, Er, Gd, Sn. If a bank is able to identify customers who have potential to spend more next year than what they have spent this year they can market better products to them and increase customer satisfaction along with their profits. Various binary classification models like logistic regression, random forest, XGBoost have been built and compared based on classifier performance and ability to correctly classify churned customers. In this analysis we will predict the price of second hand cars whose ads were posted on EBAY Kleinanzeigen based on various attributes of the car made available by the sellers. The second one is to build the ARIMA model to predict the call volume. Backward selection was utilized to select predictors and removed one at a time if determined that the predictor variable does not contribute to the overall prediction. However, the power to identify designer based on the fashion accessory is now becoming commonplace with the advent of computer vision. Coupons.com recognizes the power of context and has built Retailer iQ (RiQ) Platform that engage consumers with Client’s content in ways that are most relevant to them when they are most apt to receive it. The model clearly segregate event and nonevent. The classifier assumes that each new complaint is assigned to one and only one category. This project explores the AMES Housing Dataset which contains information on the residential property sales that occurred in Ames, Ohio from 2006 to 2010. Marketing activities require careful planning so that every step of the process is understood before you launch. Applicants’ demographic information is usually prohibited for collection but it is needed to perform fair lending analysis. The Nelson-Siegel factor model is used to fit the Treasury bond yield data from 1985 through 2000. These predictions will be achieved by implementing multiple Machine Learning Algorithms and then comparing them to find the best model for prediction purpose. As the production of its products is handled by a vendor, Dymatize needed upfront prediction on the demand from different customers for different products for a proper inventory planning and expansion of its business. Using statistical modeling, it would be possible to determine the various causes/factors that lead to employee attrition and predict whether an employee would leave the organization or not. Data from 1990 to 2011 are public and collected by the Bureau of Labor Statistics and the U.S. Census Bureau. It was found that the more positive the rating is, the more likely a peer customer votes the review to be positive. Furthermore, machine learning techniques such as Logistic regression, Classification tree, Random forests, and Support Vector Machine were used to predict the income level and subsequently identify the model that most accurately predicted the income level of an individual. Breast cancer is a cancer that develops from the breast tissue. This project was intended to find out the categories which are critical for employee satisfaction and which the clients need to focus on. The Hamilton County Auditor’s Office maintains and publishes real estate sale records. Further, consideration of market supply and demand during the same period gives a view into the drivers behind the decline and rebounds around the recession. Feature variables describe characteristics of the cell nuclei present in the image. By comparing the various model performances it is observed that for out-of-sample prediction, neural networks, logistic regression and random forests perform better. The multiple linear regression model has the best stability and acceptable predictive power. Motorists with riding experience less than 5 years have larger likelihood of Tibia Fracture injury. The results show how using test versus control groups helps in measuring true lift. The first three cases contains three different numbers of independent regressors. Additionally, based upon further research and simulation, alternative methods for outlier detection including Cumulative-sum (CUSUM) and Exponentially-weighted-moving-average (EWMA) control charts are compared and a path forward for the diagnostic tool is proposed. China has experienced rapid economic growth which benefited many industries but not the healthcare system. Aashish Reddy Takkala, Next-Purchase Propensity of a Customer, April 15, 2013 (David Rogers, Jeffrey Camm) The IBM Behavior Based Customer Insight for Banking solution works with IBM Predictive Customer Insight. Data used in this experiment is obtained from a survey conducted in Slovakia on participants aged 15-30 years. The biggest problem a company faces is of Churned Customers. In order to improve customer satisfaction, certain changes were deployed in June 2015 to make exceptions to these policies and refund the customers. Through response model analysis, 10k more customers could be targeted resulting in 26 more credit card accounts booked and a Net Present Value (NPV) increase of 9,824 USD for the bank. For the assessment of the analytic tools, I gathered the analytical requirements of the team and identified 4 evaluation phases. Lindner College of Business This makes the whole process of model building easier but in the hind sight the models includes all levels of a categorical variable without taking into consideration their significance. The project also includes creating various visualization scenarios with graphs and reports to generate business insights and layout steps in the campaign lifecycle. Once the best solution has been established for this use case, it can be then easily transferred onto other products being sold across the E-commerce spectrum. The logistic regression model produces better prediction in both the training and testing datasets and the classification tree provides evidence that the number of carts opened is the most statistically significant variable, prompting the management to focus the marketing efforts on visitors who put items in the cart and then abandon them later on. Then, the report also tries to build several statistical models which can predict the probability of an employee leaving the company given his information and conclude on the best model having highest performance. Some of the benefits of this system if implemented were that Schwab would be: The desired system will not only help the team in monitoring but also the development team to understand the performance of the application and the infrastructure management team to understand the amount of server usage for capacity planning. Yang He, Incremental Response Model for Improved Customer Targeting, July 2018. Item_MRP turned out to be the most important variable, followed by Outlet Type. The final model chosen was the linear model, which performed best. The main goal of this project is the design of a decision-support system to illustrate and measure the risk, flexibility, and potential impact of the decisions involving mass customization. We implement a branch-and-price column generation algorithm to overcome the problem of an intractable number of variables. Using machine learning would help these two teams to significantly reduce effort, time and money needed to classify the products and check the classifications. Organizations have a constant need to assess where they stand day-in and day-out and where they can improve. Regression tree shows better prediction performance. The process of dividing a market into homogeneous groups of customers is known as market segmentation. One issue is determining which model should be followed as a guide for investors to make an informed portfolio decision. For Cincinnati Bell to see positive ROIs customers must continue with their service for a minimum of several months. The total number of participants is 98. Our Model would help classify ships or vessels into respective categories and would save Maritime and Coastguard agency crucial time to respond to any emergencies. Ideally, the data should be the collection of optimal scores gleaned from past direct-mail campaigns. When a different company owns or guarantees for a company, the latter’s direct exposure also shows up as indirect exposure for the former. In this project, I parse and explore the database downloaded from drugbank website (https://www.drugbank.ca/), which contains plenty of information, including targets, manufacturer, price, monoisotopic mass, metabolism, toxicity, etc of over 13,000 drugs. The hospital management finds difficulty in manually deriving a nurse roster for a six-week period while trying to place an adequate number of nurses in the emergency-care unit of the hospital. The buying pattern is modeled as a Markov chain and transition-probability matrices are calculated for several product categories. Widespread application of the inverse problem in medicine, mathematical physics, meteorology, and economics has attracted much research. To solve this problem, I have used a concept called multi-label classification. Both simulations and real applications show encouraging results of the proposed estimators. Lending club issues unsecured loans to different segments of customers. It is becoming a major deterrent in customer usage of credit cards. Emily Fischer, COVID-19 Impact on Strength Training, August 2020 (Leonardo Lozano, Alex Wolfe). A comparison of the different techniques is also provided. Precision and Recall values are used as indicators of efficacy of a scheme. A lot of players are injured, he wont get much playing time if everyones fit. As part of CCB fraud modeling team my role is to build machine learning model for predicting Credit Card Bust out Account fraud.