Gaurav Kumar

I am currently a graduate student in the Department of Computer Science at UCSD. Prior to joining UCSD, I worked as a research engineer at Samsung Research Centre, Bengaluru, India, in its Natural Language Understanding (NLU) team, where I worked on Samsung's smart assistant Bixby and on applications of deep learning in natural language processing and computer vision.

I completed my bachelor's degree at the Indian Institute of Technology, Kanpur, with a major in Electrical Engineering and a minor in Computer Science with an emphasis on machine learning applications. I had the opportunity to work with professors such as Dr. Harish Karnick, Dr. Ketan Rajawat, and Dr. Tanaya Guha. I also interned at the Centre for Integrative Neuroscience at the University of Tubingen in the summer of 2018.

Email  /  CV  /  Github  /  LinkedIn

Research and Project Work

I'm interested in computer vision, natural language processing, image processing, and machine learning. Most of my work lies at the intersection of different modalities, such as image and text, and draws on strong optimization techniques.

AMUSED: A Multi-Stream Vector Representation Method for Use In Natural Dialogue
Gaurav Kumar, Rishabh Joshi, Jaspreet Singh, Promod Yenigalla

The goal is to build a coherent and non-monotonous conversational agent with proper discourse and coverage. We propose an end-to-end multi-stream deep learning architecture that learns unified embeddings for query-response pairs by leveraging contextual information from memory networks and syntactic information from Graph Convolution Networks (GCN) applied over their dependency parse. One stream of this network also uses transfer learning, pre-training a bidirectional transformer to extract a semantic representation for each input sentence, and incorporates external knowledge through the neighborhood of the entities in a Knowledge Base (KB). We benchmark these embeddings on the next sentence prediction task and significantly improve upon existing techniques.
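
As a rough illustration of the syntactic stream, here is a minimal sketch of a single graph convolution step over a dependency parse. It assumes the common symmetric normalization with self-loops and treats dependency edges as undirected; the function name `gcn_layer` and its arguments are hypothetical and are not taken from the AMUSED implementation.

```python
import numpy as np

def gcn_layer(adj, feats, weight):
    """One graph convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adj    : (n, n) symmetric 0/1 adjacency built from dependency edges (assumed undirected)
    feats  : (n, d_in) token features H
    weight : (d_in, d_out) weight matrix W (learned in practice, passed in here)
    """
    a_hat = adj + np.eye(adj.shape[0])           # add self-loops so each token keeps its own features
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ feats @ weight, 0.0)  # ReLU

# Toy 3-token sentence whose parse forms a chain: token 0 - token 1 - token 2.
adj = np.array([[0.0, 1.0, 0.0],
                [1.0, 0.0, 1.0],
                [0.0, 1.0, 0.0]])
out = gcn_layer(adj, np.eye(3), np.eye(3))
```

With identity features and weights, `out` is simply the normalized adjacency matrix, which makes it easy to see how each token mixes in its syntactic neighbors.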

Understanding physical properties of objects by visual cues using neural networks
Research Work at Centre for Integrative Neuroscience, University of Tubingen

  • Worked with videos of cloth under varying external parameters to understand its material properties.
  • Experimented with CNN+LSTM, two-stream, and multi-loss networks, along with optical flow, for feature extraction.
  • Proposed a triplet loss function for videos, similar to the one in the FaceNet paper, to learn low-dimensional representations.
  • Results showed a strong correlation between human perception of physical properties and that of neural networks in the case of motion, with the perceptual graph being logarithmic. Work published in Journal of Vision, 2019.
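
The triplet loss mentioned above can be sketched as follows. This is a minimal FaceNet-style formulation on precomputed embeddings, not the project's code; the function name `triplet_loss`, the squared Euclidean distance, and the margin value of 0.2 are assumptions made for illustration.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Mean triplet loss over a batch of embeddings, each of shape (batch, dim).

    Pulls each anchor toward its positive and pushes it away from its negative
    until the two squared distances differ by at least `margin` (assumed value).
    """
    pos_dist = np.sum((anchor - positive) ** 2, axis=1)
    neg_dist = np.sum((anchor - negative) ** 2, axis=1)
    return float(np.maximum(pos_dist - neg_dist + margin, 0.0).mean())

# A well-separated triplet incurs no loss; swapping positive and negative does.
a = np.array([[0.0, 0.0]])
p = np.array([[0.0, 0.0]])
n = np.array([[1.0, 0.0]])
easy = triplet_loss(a, p, n)
hard = triplet_loss(a, n, p)
```

For videos, the embeddings would come from whatever network encodes each clip; only the loss itself is shown here.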

Converting Handwritten Mathematical Equations to LaTeX
Course project, Image Modeling Techniques, supervised by Dr. Tanaya Guha
[Slides]/[Report]

  • Developed, from scratch, a deep-learning-based pipeline to convert handwritten mathematical equations to LaTeX.
  • Used edge detectors and the Hough transform, along with other segmentation methods, to pre-process the image, and built a novel tree-structure-based algorithm to identify multilevel superscripts and subscripts.
  • Trained multiple classifiers, including SVMs, CNNs, and Random Forests, for symbol recognition and reached accuracies close to 98%, even on complex trigonometric symbols.
  • Awarded best project out of 20 others.

Detecting Semantically Similar Question Pairs
Course project, Natural Language Processing, supervised by Dr. Harish Karnick
[Slides]

  • The goal was to detect duplicate question pairs in the Quora question dataset based on sentence semantics and structure.
  • Experimented with Siamese LSTMs and attention-based methods along with varied embedding approaches. Incorporated features from the dependency tree to model syntactic information, which improved performance.
  • An ensemble of several such models with minor variations reached accuracies close to 82%.

Captcha Breaking
Course project, Machine Learning Techniques, supervised by Dr. Purushottam Kar
[Slides]

  • The objective was to build an efficient algorithm to break the captchas of the online SquirrelMail client using neural networks.
  • Applied feature engineering methods such as clustering and dominant-color-based segmentation to remove heavy noise and the orthogonal clutter running through the captcha text before feeding it to a character recognizer.
  • Implemented variants of CNN models from scratch for character recognition and reached 98% accuracy. The full captcha-breaking pipeline reached 85% accuracy on very noisy captchas and 98% on less noisy ones.

Robust Principal Component Analysis and Its Applications
Term paper, supervised by Dr. Ketan Rajawat
[Report]

  • Used Robust PCA to perform foreground-background separation in videos and image inpainting.
  • Worked on methods such as alternating projections, IALM, and gradient descent for extracting the low-rank and sparse components of a matrix. Used the CVX toolbox and the PROPACK package in MATLAB to perform the experiments.
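
To make the low-rank plus sparse decomposition concrete, here is a minimal NumPy sketch of the inexact augmented Lagrange multiplier (IALM) scheme. It is not the MATLAB code used for the term paper: the function name `robust_pca_ialm`, the default `lam = 1/sqrt(max(m, n))`, and the `mu` and `rho` settings are commonly used choices assumed here for illustration.

```python
import numpy as np

def robust_pca_ialm(M, lam=None, tol=1e-7, max_iter=500):
    """Split a nonzero matrix M into low-rank L plus sparse S.

    Approximately solves: minimize ||L||_* + lam * ||S||_1  subject to  L + S = M.
    """
    M = np.asarray(M, dtype=float)
    m, n = M.shape
    lam = 1.0 / np.sqrt(max(m, n)) if lam is None else lam
    norm_M = np.linalg.norm(M, "fro")
    spectral = np.linalg.norm(M, 2)                  # largest singular value
    Y = M / max(spectral, np.abs(M).max() / lam)     # dual variable initialization
    mu, rho = 1.25 / spectral, 1.5                   # assumed penalty schedule
    mu_max = mu * 1e7
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    for _ in range(max_iter):
        # L-step: singular value thresholding at level 1/mu
        U, sig, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = (U * np.maximum(sig - 1.0 / mu, 0.0)) @ Vt
        # S-step: entrywise soft thresholding at level lam/mu
        R = M - L + Y / mu
        S = np.sign(R) * np.maximum(np.abs(R) - lam / mu, 0.0)
        # Dual update and penalty increase
        Z = M - L - S
        Y = Y + mu * Z
        mu = min(mu * rho, mu_max)
        if np.linalg.norm(Z, "fro") / norm_M < tol:
            break
    return L, S

# Synthetic check: a rank-2 matrix corrupted by roughly 5% large sparse entries.
rng = np.random.default_rng(0)
low_rank = rng.standard_normal((60, 2)) @ rng.standard_normal((2, 60))
sparse = np.zeros((60, 60))
mask = rng.random((60, 60)) < 0.05
sparse[mask] = rng.uniform(-10.0, 10.0, size=int(mask.sum()))
L, S = robust_pca_ialm(low_rank + sparse)
```

For foreground-background separation, each column of `M` would hold one vectorized video frame, so that `L` captures the static background and `S` the moving foreground.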
