|
Senior Deep Learning Engineer
I work in the Machine Learning (ML) team, where we deliver to the other
teams the machine learning models that power the company's products.
Main responsibilities:
- All phases of the model life cycle: problem framing, data
collection, model design, training and validation, and finally
quantization for on-device deployment (a minimal quantization sketch
follows this list).
- Design, development and maintenance of the internal deep learning
software infrastructure, as well as of its continuous integration
(CI) pipelines.
- Data engineering tasks, e.g. data wrangling and dataset exploration,
interfacing with SQL databases, and quality inspection of data coming
from annotation providers.
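As an illustration of the last step of that life cycle, here is a
minimal sketch of post-training quantization with PyTorch; the toy
model below is a hypothetical stand-in for a real trained network:

```python
import torch
import torch.nn as nn

# Toy stand-in for a trained float32 model (hypothetical architecture).
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
model.eval()

# Post-training dynamic quantization: Linear weights are stored as int8,
# activations are quantized on the fly at inference time.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 64)
print(quantized(x).shape)  # torch.Size([1, 10])
```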
|
Research
During my PhD I worked on a variety of topics, including
driver’s gaze prediction, image and video saliency, synthetic data,
differentiable rendering, object pose estimation and image generation.
Representative papers are highlighted.
|
|
Warp and Learn: Novel Views Generation for
Vehicles and Other Objects
Andrea Palazzi,
Luca Bergamini,
Simone Calderara,
Rita Cucchiara
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020
arXiv / code / bibtex
Self-supervised, semi-parametric approach for synthesizing novel views
of a vehicle from a single monocular image. Unlike parametric
(i.e. entirely learning-based) methods, we show how a priori
geometric knowledge about the object and the 3D world can be integrated
into a deep learning based image generation framework.
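The warping at the core of a semi-parametric pipeline can be sketched
with a differentiable image resampling step; here the displacement
field is a hypothetical constant shift, whereas in the paper it derives
from the object's geometry:

```python
import torch
import torch.nn.functional as F

# Differentiable warp of a source image by a displacement field.
img = torch.randn(1, 3, 128, 128)            # source view
ys, xs = torch.meshgrid(
    torch.linspace(-1, 1, 128), torch.linspace(-1, 1, 128), indexing="ij"
)
grid = torch.stack([xs, ys], dim=-1).unsqueeze(0)  # identity grid, (1, H, W, 2)
flow = torch.zeros_like(grid)
flow[..., 0] = 0.1                           # hypothetical shift to the right
warped = F.grid_sample(img, grid + flow, mode="bilinear",
                       padding_mode="border", align_corners=True)
```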
|
|
Learning to Detect and Track Visible and Occluded Body
Joints in a Virtual World
Matteo Fabbri,
Fabio Lanzi,
Simone Calderara,
Andrea Palazzi,
Roberto Vezzani,
Rita Cucchiara
European Conference on Computer Vision (ECCV), 2018
arXiv / dataset / code (dataset) / code (GTAV mod) / video / bibtex
To overcome the lack of surveillance data with tracking, body part and
occlusion annotations, we exploit the photo-realism of modern video games
to create a vast computer graphics dataset (~500,000 frames, ~10
million body poses) for people tracking in urban scenarios.
|
|
End-to-end 6-DoF Object Pose Estimation through
Differentiable Rasterization
Andrea Palazzi,
Luca Bergamini,
Simone Calderara,
Rita Cucchiara
European Conference on Computer Vision (ECCV) Workshops, 2018
arXiv / code / bibtex
We introduce an approximate differentiable renderer to refine a
6-DoF pose prediction using only 2D alignment information.
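A minimal sketch of the refinement loop: a toy orthographic keypoint
projection stands in for the paper's rasterizer, and the 6-DoF pose is
optimized by gradient descent to minimize a 2D alignment error (all
shapes and values below are hypothetical):

```python
import torch

def rotation_from_axis_angle(rvec):
    """Rodrigues' formula, differentiable w.r.t. the axis-angle vector."""
    theta = torch.sqrt(rvec.pow(2).sum() + 1e-12)
    k = rvec / theta
    zero = torch.zeros((), dtype=rvec.dtype)
    K = torch.stack([
        torch.stack([zero, -k[2], k[1]]),
        torch.stack([k[2], zero, -k[0]]),
        torch.stack([-k[1], k[0], zero]),
    ])
    return torch.eye(3) + torch.sin(theta) * K + (1 - torch.cos(theta)) * (K @ K)

def project(points, rvec, t):
    """Toy differentiable 'renderer': orthographic projection of 3D keypoints."""
    return (points @ rotation_from_axis_angle(rvec).T + t)[:, :2]

points = torch.randn(8, 3)                       # hypothetical object keypoints
target_2d = project(points, torch.tensor([0.2, -0.1, 0.3]),
                    torch.tensor([0.5, -0.2, 0.0]))  # observed 2D alignment

rvec = torch.zeros(3, requires_grad=True)        # initial pose guess
t = torch.zeros(3, requires_grad=True)
opt = torch.optim.Adam([rvec, t], lr=0.05)
for _ in range(200):
    opt.zero_grad()
    loss = (project(points, rvec, t) - target_2d).pow(2).mean()
    loss.backward()                              # gradients flow through the projection
    opt.step()
```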
|
|
Predicting the Driver's Focus of Attention: the DR(eye)VE
Project
Andrea Palazzi,
Davide Abati,
Francesco Solera,
Simone Calderara,
Rita Cucchiara
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2018
arXiv / dataset / code / video / bibtex
We introduce a dataset of human fixations while driving, and a model to
predict them given an urban scene.
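Fixation-map prediction models of this kind are commonly trained with
distribution-based losses; a minimal sketch of a KL-divergence
objective between predicted and ground-truth maps (not necessarily the
paper's exact loss):

```python
import torch

def kl_saliency_loss(pred, gt, eps=1e-8):
    """KL divergence between ground-truth and predicted fixation maps.

    pred, gt: (B, H, W) non-negative saliency maps.
    """
    pred = pred / (pred.sum(dim=(-2, -1), keepdim=True) + eps)  # to distributions
    gt = gt / (gt.sum(dim=(-2, -1), keepdim=True) + eps)
    kl = gt * (torch.log(gt + eps) - torch.log(pred + eps))
    return kl.sum(dim=(-2, -1)).mean()

pred = torch.rand(4, 112, 112)   # hypothetical predicted maps
gt = torch.rand(4, 112, 112)     # hypothetical ground-truth maps
print(kl_saliency_loss(pred, gt))
```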
|
|
Learning to Map Vehicles into Bird’s Eye View
Andrea Palazzi,
Guido Borghi,
Davide Abati,
Simone Calderara,
Rita Cucchiara
International Conference on Image Analysis and Processing, 2017
Best paper honorable mention
arXiv / dataset / code / video / bibtex
A dataset, created from computer games, with matched vehicle locations
in both the dashboard-camera view and the bird's eye view, together
with a baseline model for mapping locations across the two views.
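The paper learns this mapping from data; as a purely geometric point of
comparison, a ground-plane homography can carry a detection's contact
point into the bird's eye view. All correspondences below are
hypothetical:

```python
import numpy as np
import cv2

# Four hypothetical ground-plane correspondences: pixels in the dashboard
# camera image (src) and in the bird's eye view (dst).
src = np.float32([[420, 580], [860, 580], [1100, 710], [180, 710]])
dst = np.float32([[300, 100], [500, 100], [500, 400], [300, 400]])
H = cv2.getPerspectiveTransform(src, dst)

# Map the bottom-center of a vehicle's bounding box (its ground contact
# point) from the camera view into the bird's eye view.
pt = np.float32([[[640, 650]]])               # shape (1, 1, 2)
bev = cv2.perspectiveTransform(pt, H)[0, 0]
print(bev)
```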
|
|
Learning Where to Attend Like a Human Driver
Andrea Palazzi,
Francesco Solera,
Simone Calderara,
Stefano Alletto,
Rita Cucchiara
Intelligent Vehicles Symposium, 2017
arXiv / code / bibtex
We study the dynamics of the driver's gaze and use it as a proxy for
understanding the related attentional mechanisms. First, we build our
analysis upon two questions: where is the driver looking, and at what?
Second, we model the driver's gaze by training a coarse-to-fine
convolutional network on short sequences extracted from the DR(eye)VE
dataset.
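A minimal sketch of the coarse-to-fine idea (layer sizes and structure
are hypothetical, not the paper's exact architecture): predict a
low-resolution gaze map, upsample it, then refine it at full resolution
conditioned on the input frame:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoarseToFineGaze(nn.Module):
    """Coarse-to-fine gaze map prediction; all layer sizes hypothetical."""

    def __init__(self):
        super().__init__()
        self.coarse = nn.Sequential(          # low-resolution attention map
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 1),
        )
        self.fine = nn.Sequential(            # full-resolution refinement
            nn.Conv2d(4, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1),
        )

    def forward(self, x):
        coarse = self.coarse(x)
        up = F.interpolate(coarse, size=x.shape[-2:], mode="bilinear",
                           align_corners=False)
        return torch.sigmoid(self.fine(torch.cat([x, up], dim=1)))

model = CoarseToFineGaze()
print(model(torch.randn(1, 3, 112, 112)).shape)  # torch.Size([1, 1, 112, 112])
```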
|
|
DR(eye)VE: A Dataset for Attention-Based Tasks with
Applications to Autonomous and Assisted Driving
Stefano Alletto,
Andrea Palazzi,
Francesco Solera,
Simone Calderara,
Rita Cucchiara
CVPR Workshops, 2016
arXiv / code / bibtex
We propose a novel, publicly available dataset acquired during
actual driving. Our dataset, composed of more than 500,000 frames,
contains drivers’ gaze fixations and their temporal integration,
providing task-specific saliency maps. Geo-referenced locations,
driving speed and course complete the set of released data.
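The temporal integration step can be sketched as accumulating the
fixations that fall within a temporal window and smoothing them
spatially; the resolution, points and sigma below are hypothetical:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

H, W = 270, 480
# Hypothetical fixation points (row, col) accumulated over a temporal window.
fixations = [(120, 240), (125, 250), (118, 235)]

saliency = np.zeros((H, W), dtype=np.float32)
for r, c in fixations:
    saliency[r, c] += 1.0
saliency = gaussian_filter(saliency, sigma=15)  # spatial smoothing
saliency /= saliency.max() + 1e-8               # normalize to [0, 1]
```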
|
|