Correlation among variables
-
Visualizing the variables gives a first, multi-dimensional view of how the categorical variables relate to ‘AdoptionSpeed’. A correlation plot of all variables is created to assess the positive and negative relationships among them. Figure 1 shows how strongly each variable is correlated with ‘AdoptionSpeed’. ‘Type’, ‘FurLength’, ‘Vaccinated’, ‘Dewormed’, ‘Sterilized’, and ‘Fee’ are negatively correlated with ‘AdoptionSpeed’, while ‘Age’, ‘Gender’, ‘MaturitySize’, and ‘Health’ are positively correlated with it. As mentioned in the Data section, the five breed and color columns contain numbers. These numbers only encode breed or color names and are not quantities, so the breed and color variables are excluded from the correlation analysis. Among the remaining variables, ‘Type’, ‘Age’, and ‘FurLength’ are more strongly correlated with ‘AdoptionSpeed’ than the other categorical variables. ‘Age’ has the largest coefficient, about 0.1, so ‘AdoptionSpeed’ tends to increase with the pet’s age. Older pets tend to have higher ‘AdoptionSpeed’ values, which means they wait longer to be adopted. A coefficient of about 0.1 is still weak, however, so age accounts for only a small part of the variation in ‘AdoptionSpeed’. It is not surprising that ‘Health’ and ‘MaturitySize’ are also positively correlated with ‘AdoptionSpeed’, although their coefficients are small. Adopters appear to prefer pets without injuries, and they favor small pets over extra-large ones.
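The correlation step can be sketched in plain Python. This is a minimal sketch under several assumptions. It uses Pearson's coefficient, although the section does not say which coefficient the correlation plot uses. It assumes the breed and color columns are named with the prefixes "Breed" and "Color" (for example `Breed1`, `Color1`). The helper names `pearson` and `correlations_with_target` are hypothetical, and the toy values are invented for illustration only.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")  # a constant column has no defined correlation
    return cov / (sx * sy)


def correlations_with_target(columns, target="AdoptionSpeed",
                             code_prefixes=("Breed", "Color")):
    """Correlate every column with the target, skipping breed/color code columns.

    The "Breed"/"Color" prefixes are an assumed naming convention.
    """
    y = columns[target]
    result = {}
    for name, values in columns.items():
        if name == target or name.startswith(code_prefixes):
            continue  # these numbers are labels for names, not quantities
        result[name] = pearson(values, y)

    # Sort by absolute coefficient (strongest first); undefined (NaN) values go last.
    def strength(item):
        r = item[1]
        return -1.0 if math.isnan(r) else abs(r)

    return dict(sorted(result.items(), key=strength, reverse=True))


# Invented toy data, not the real adoption dataset.
toy = {
    "Age": [2, 3, 12, 24, 1, 36],
    "FurLength": [1, 2, 1, 1, 3, 2],
    "Breed1": [307, 265, 307, 179, 307, 266],
    "Color1": [1, 2, 1, 7, 2, 1],
    "AdoptionSpeed": [1, 2, 3, 4, 1, 4],
}
print(correlations_with_target(toy))
```

Excluding the breed and color columns by name mirrors the reasoning above: their numeric codes have no order, so a correlation coefficient computed on them would be meaningless. The toy values do not reproduce the coefficients reported in this section.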
-
Next, pie charts, histograms, scatter plots, and line charts are used to examine the nature of the relationship between ‘AdoptionSpeed’ and each of the categorical variables under consideration. ‘Type’ serves as the stratification variable. It is used both when comparing ‘AdoptionSpeed’ with the other categorical variables and when analyzing ‘AdoptionSpeed’ on its own.
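The tabulations behind such stratified charts can be sketched as follows. This is a hypothetical sketch, not the author's code. It assumes each pet is described by parallel lists of ‘Type’ codes, category codes, and ‘AdoptionSpeed’ values. The function names are invented, and the toy values are made up. The first function gives the share of each ‘AdoptionSpeed’ value within each ‘Type’, which is what a stratified pie chart or histogram displays. The second gives the mean ‘AdoptionSpeed’ for each (‘Type’, category) pair, which could feed a stratified line chart.

```python
from collections import Counter, defaultdict


def speed_shares_by_type(types, speeds):
    """Share of each AdoptionSpeed value within each Type stratum."""
    if len(types) != len(speeds):
        raise ValueError("types and speeds must have the same length")
    counts = defaultdict(Counter)
    for t, s in zip(types, speeds):
        counts[t][s] += 1
    shares = {}
    for t, counter in counts.items():
        total = sum(counter.values())
        shares[t] = {s: counter[s] / total for s in sorted(counter)}
    return shares


def mean_speed_by_type_and_category(types, categories, speeds):
    """Mean AdoptionSpeed for every (Type, category) pair, e.g. (Type, FurLength)."""
    if not (len(types) == len(categories) == len(speeds)):
        raise ValueError("all inputs must have the same length")
    sums = defaultdict(float)
    counts = Counter()
    for t, c, s in zip(types, categories, speeds):
        sums[(t, c)] += s
        counts[(t, c)] += 1
    return {key: sums[key] / counts[key] for key in sorted(counts)}


# Invented toy data: Type codes, a categorical variable (e.g. FurLength), AdoptionSpeed.
types = [1, 1, 2, 2, 2, 1]
fur = [1, 2, 1, 1, 2, 1]
speeds = [2, 4, 1, 1, 3, 2]
print(speed_shares_by_type(types, speeds))
print(mean_speed_by_type_and_category(types, fur, speeds))
```

Working with counts and means rather than raw rows keeps the charting step simple. Each ‘Type’ stratum becomes its own pie chart or histogram, and each category becomes a point on a per-type line.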