Project Description :
The purpose of this project is to help lost animals find a new home. When an animal is up for adoption, prospective owners look closely at its picture before making any decision. It is understandable that not everyone wants the same breed of dog: family members, personal preferences and so on all play a part. However, when two animals are of the same breed, the quality of the picture can lead people to choose one over the other.
The files we have :
We have 1.04 GB of data, organised as follows :
Two folders, “test” and “train”, each containing pictures of pets
Test : 8 pictures (picked randomly when testing our algorithm)
Train : 9912 pictures
2 CSV files containing the metadata : train.csv and test.csv
1 example submission file
The test.csv file and the pictures in the test folder make up the test set, which lets us measure the final model’s error, and therefore its performance, on data it has never seen.
The different variables
We have two types of data: pictures and picture metadata. The metadata are characteristics that were entered by hand. We can use this metadata in addition to the pictures to build our model.
The picture metadata is stored in the train.csv and test.csv files. Each pet picture is labeled with the value 1 (yes) or 0 (no) for the following features :
Focus: The animal stands out from a neutral background, neither too near nor too far.
Eyes: Both eyes face the front or near the front, with at least one eye reasonably clear.
Face: The face is clear, facing the front or near the front.
Near: A single pet takes up a large part of the picture (roughly 50% of its height or width).
Action: The pet is in the middle of an action (jumping, for example).
Accessory: A physical or digital accessory (a toy or a digital sticker), excluding the collar and the leash.
Group: More than one pet in the picture.
Collage: The picture has been digitally edited (several pictures combined into one, or a frame added).
Human: There is a human in the picture.
Occlusion: Unwanted objects in the picture hide part of the pet (a human, a cage, a fence). Note that not every blocking object is considered an occlusion.
Info: Text or a label has been added (name of the pet, description).
Blur: The picture is blurred. Note that when this is true, the “Eyes” column is always 0.
These variables are binary.
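The two rules above (every feature is 0 or 1, and a blurred picture always has “Eyes” at 0) can be checked directly on the metadata. The following is a minimal sketch with pandas, assuming the CSV column names match the feature names listed above. The three rows are made up for illustration and do not come from the dataset.

```python
import pandas as pd

# Column names assumed to match the feature names listed above.
FEATURES = ["Focus", "Eyes", "Face", "Near", "Action", "Accessory",
            "Group", "Collage", "Human", "Occlusion", "Info", "Blur"]

def check_metadata(df: pd.DataFrame) -> dict:
    """Return simple sanity checks on the binary metadata columns."""
    all_binary = bool(df[FEATURES].isin([0, 1]).all().all())
    # Rows flagged as blurred should never have Eyes set to 1.
    blur_rule_ok = bool((df.loc[df["Blur"] == 1, "Eyes"] == 0).all())
    return {"all_binary": all_binary, "blur_rule_ok": blur_rule_ok}

# Hypothetical rows, only to show the call; real rows come from train.csv.
toy = pd.DataFrame([
    [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0],
], columns=FEATURES)
print(check_metadata(toy))
```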
Problem and hypotheses :
How can we improve pet adoption with data science applied to pictures ?
- Pictures in which the pet's eyes face the front are more appreciated.
- When there is a human in the picture, people are less inclined to adopt the pet.
- Blurred pictures have a huge impact on adoption.
- The more information there is about the pet, the more likely it is to be adopted.
The code :
We start by importing our libraries and loading our data :
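The original code cell is not reproduced here, so the following is only a sketch of what this step might look like. The dataset location in `DATASET_PATH` is an assumption (a typical Kaggle input path) and `load_tables` is a hypothetical helper name. The training steps later on additionally rely on fastai and timm.

```python
from pathlib import Path

import pandas as pd

# Assumed location of the competition data; adjust to your environment.
DATASET_PATH = Path("../input/petfinder-pawpularity-score")

def load_tables(root: Path = DATASET_PATH):
    """Read the train and test metadata tables found under `root`."""
    train_df = pd.read_csv(root / "train.csv")
    test_df = pd.read_csv(root / "test.csv")
    return train_df, test_df
```

Calling `train_df, test_df = load_tables()` then gives two DataFrames that can be inspected with `train_df.head()`.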
Now we have to check the data :
We have all of our CSV files : the train one, the test one and the submission one.
Checking the train CSV file :
This file contains all the main features we need, plus the popularity score, which is derived from page-view statistics. It is the column we want to predict.
Let’s turn the file names into full paths to make the images easier to access.
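A sketch of this renaming step, assuming the table has an `Id` column holding the image file name without extension and that the images are `.jpg` files. The helper name `add_image_paths` and the new `path` column are illustrative choices, not taken from the original code.

```python
from pathlib import Path

import pandas as pd

def add_image_paths(df: pd.DataFrame, image_dir) -> pd.DataFrame:
    """Replace the `Id` column by a `path` column pointing at each image file."""
    out = df.copy()
    # Assumption: each image is stored as <image_dir>/<Id>.jpg
    out["path"] = out["Id"].map(lambda image_id: str(Path(image_dir) / f"{image_id}.jpg"))
    return out.drop(columns=["Id"])
```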
Checking the number of images :
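One simple way to do this check is to count the image files on disk and compare the result with the number of rows in the CSV. The sketch below assumes `.jpg` files stored directly in the folder; `count_images` is a hypothetical helper.

```python
from pathlib import Path

def count_images(folder, pattern: str = "*.jpg") -> int:
    """Count the image files stored directly inside `folder`."""
    return sum(1 for _ in Path(folder).glob(pattern))
```

For the training set, `count_images(train_folder)` is expected to equal `len(train_df)`.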
Checking the distribution of the Pawpularity score :
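The original cell most likely drew a histogram. As a plot-free sketch, the same information can be obtained by counting how many scores fall into equal-width bins over the 0 to 100 range. The scores below are invented for illustration.

```python
import numpy as np

def score_histogram(scores, bins: int = 10):
    """Count scores in `bins` equal-width intervals covering 0 to 100."""
    counts, edges = np.histogram(scores, bins=bins, range=(0, 100))
    return counts.tolist(), edges.tolist()

# Hypothetical scores; in practice pass train_df["Pawpularity"].
toy_scores = [5, 28, 30, 33, 47, 100]
counts, edges = score_histogram(toy_scores, bins=4)
print(counts, edges)
print("mean score:", float(np.mean(toy_scores)))
```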
Normalisation of the Pawpularity score :
The score is an integer, so in addition to being a regression problem, it can also be treated as a 100-class classification problem. We decided to normalize this score to the range 0 to 1.
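The normalization is a plain division by the maximum score. Here is a minimal sketch, assuming the score column is named `Pawpularity`. The name of the new column, `norm_score`, is an illustrative choice.

```python
import pandas as pd

def add_normalized_score(df: pd.DataFrame, score_col: str = "Pawpularity",
                         max_score: int = 100) -> pd.DataFrame:
    """Add a `norm_score` column holding the score rescaled to [0, 1]."""
    out = df.copy()
    out["norm_score"] = out[score_col] / max_score
    return out

# Hypothetical scores, only to show the call.
toy = pd.DataFrame({"Pawpularity": [1, 50, 100]})
print(add_normalized_score(toy)["norm_score"].tolist())
```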
Let’s take an example picture to see what it looks like :
We get this picture :
We now load the data as DataLoaders objects, using the normalized score as the label.
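A possible way to build these DataLoaders with fastai is sketched below. The column names `path` and `norm_score`, the 80/20 split, the batch size, the image size and the seed are all assumptions, not values confirmed by the original code.

```python
def build_dataloaders(train_df, batch_size=32, image_size=224, seed=999):
    """Build fastai DataLoaders for image regression on the normalized score."""
    # Imported here so the sketch can be loaded without fastai installed.
    from fastai.vision.all import ImageDataLoaders, RegressionBlock, Resize

    return ImageDataLoaders.from_df(
        train_df,
        valid_pct=0.2,            # assumed 80/20 train/validation split
        seed=seed,
        fn_col="path",            # column holding the image file paths
        label_col="norm_score",   # normalized score used as the label
        y_block=RegressionBlock,  # continuous target rather than categories
        bs=batch_size,
        item_tfms=Resize(image_size),
    )
```

Using `RegressionBlock` keeps the label as a float between 0 and 1, which is what a loss based on sigmoid outputs needs.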
We train a Swin Transformer as our baseline. We use Ross Wightman's timm package to define the model. Because this competition does not allow internet access, we added timm's pre-trained weights as a dataset, and the following code lets timm find the files :
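The idea is to copy the checkpoint file into the folder where the weight loader looks for cached downloads. The sketch below assumes a timm version that loads pre-trained weights through the torch hub checkpoint cache (typically `~/.cache/torch/hub/checkpoints`). `stage_pretrained_weights` is a hypothetical helper, and the actual file names depend on the attached dataset.

```python
import shutil
from pathlib import Path

def stage_pretrained_weights(src_file, cache_dir) -> Path:
    """Copy a checkpoint into the directory where cached weights are looked up."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)   # create the cache folder if needed
    return Path(shutil.copy(src_file, cache_dir / Path(src_file).name))
```

Once the file is in place under the expected name, creating the model with pre-trained weights should not need any download.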
Model definition :
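A sketch of the model definition with timm. The text only says "Swin Transformer", so the exact variant used here, `swin_large_patch4_window7_224`, is an assumption, and so is the single output unit.

```python
def build_model(num_outputs=1, pretrained=True):
    """Create a Swin Transformer whose head has `num_outputs` units."""
    # Imported here so the sketch can be loaded without timm installed.
    import timm

    # Assumed variant; any Swin model name known to timm could be used instead.
    return timm.create_model(
        "swin_large_patch4_window7_224",
        pretrained=pretrained,
        num_classes=num_outputs,   # one output: the normalized score as a logit
    )
```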
We also define the metric we want to use. Because the score was normalized, we multiply the result by 100 to obtain an RMSE on the original 0 to 100 scale of the predictions.
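To make the computation concrete, here is a NumPy sketch of such a metric. The training code would work on torch tensors instead, and the name `petfinder_rmse` is illustrative. It assumes the model outputs raw logits and the targets are the normalized scores.

```python
import numpy as np

def petfinder_rmse(logits, targets) -> float:
    """RMSE on the 0-100 scale, from raw logits and targets in [0, 1]."""
    logits = np.asarray(logits, dtype=float).ravel()
    targets = np.asarray(targets, dtype=float).ravel()
    probs = 1.0 / (1.0 + np.exp(-logits))   # sigmoid maps logits to [0, 1]
    return 100.0 * float(np.sqrt(np.mean((probs - targets) ** 2)))
```

For example, a logit of 0 corresponds to a prediction of 0.5, so against a target of 0.4 the error is 0.1 on the normalized scale, that is 10 points of Pawpularity.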
Let’s define the learner for this task, also using mixed precision. We use BCEWithLogitsLoss to treat this problem as a classification problem.
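A sketch of the learner definition with fastai, assuming `dls`, `model` and `metric` were built beforehand. `BCEWithLogitsLossFlat` is fastai's flattened wrapper around BCEWithLogitsLoss. Its use here, rather than the plain PyTorch class, is an assumption.

```python
def build_learner(dls, model, metric):
    """Wrap data, model, loss and metric in a fastai Learner using mixed precision."""
    # Imported here so the sketch can be loaded without fastai installed.
    from fastai.vision.all import Learner, BCEWithLogitsLossFlat

    learn = Learner(
        dls,
        model,
        loss_func=BCEWithLogitsLossFlat(),  # targets are the scores in [0, 1]
        metrics=metric,
    )
    return learn.to_fp16()                  # enable mixed-precision training
```

With this loss the normalized score acts as a soft label: the model outputs one logit, and its sigmoid is pushed towards the target value between 0 and 1.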
We now have a learner object. Next, we have to find a suitable learning rate :
We choose a learning rate of 0.002. We save the best model and use an early stopping callback.
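A sketch of the training call with fastai's tracker callbacks. Only the learning rate of 0.002 comes from the text above. The number of epochs, the monitored quantity and the patience are assumptions, and a learning-rate search such as `learn.lr_find()` would normally be run before this step.

```python
def train(learn, epochs=5, lr=2e-3, monitor="valid_loss", patience=3):
    """Train with the one-cycle policy, keeping the best model and stopping early."""
    # Imported here so the sketch can be loaded without fastai installed.
    from fastai.callback.tracker import EarlyStoppingCallback, SaveModelCallback

    callbacks = [
        SaveModelCallback(monitor=monitor),                        # keep the best weights
        EarlyStoppingCallback(monitor=monitor, patience=patience), # stop when no progress
    ]
    learn.fit_one_cycle(epochs, lr, cbs=callbacks)
    return learn
```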
We plot the loss, put the model back in fp32, and can now export the model if we want to use it later (for an inference kernel).
We run inference with fastai. We preprocess the test CSV the same way as the training CSV, and the dls.test_dl function lets us create the test dataloader using the same pipeline we defined before.
We can confirm that test_dl is correct (the example test pictures are just noise, which is normal).
We now try to obtain predictions.
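A sketch of the prediction step, assuming `learn.get_preds` returns values already mapped to the 0 to 1 range (fastai is expected to apply the sigmoid activation attached to the loss function). The `Id` and `Pawpularity` column names are assumed to follow the example submission file, and `make_submission` is a hypothetical helper.

```python
import pandas as pd

def make_submission(learn, test_dl, ids) -> pd.DataFrame:
    """Predict on `test_dl` and return a table with Id and Pawpularity columns."""
    preds, _ = learn.get_preds(dl=test_dl)               # values expected in [0, 1]
    scores = [float(p) * 100 for p in preds.flatten()]   # back to the 0-100 scale
    return pd.DataFrame({"Id": list(ids), "Pawpularity": scores})
```

The result can be written out with `to_csv("submission.csv", index=False)`.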