- I am honored to have been elected a member of the National Academy of Engineering (NAE). As an engineer and a scientist, I have always looked up to the distinguished members of the NAE, many of whom have inspired me throughout my career. It is humbling to join this group of iconic engineers. Read the Google blog post about this.
- I am also humbled and honored to have been elected an ACM Fellow. This award is really close to my heart. As a computer scientist and a search academic, I have always looked up to the ACM Fellows, who have dedicated their lives to the advancement of computing. Among them was my mentor and advisor, the late Gerard Salton, who was elected an ACM Fellow in 1995.
- Last year I also received The Asian Award for Outstanding Achievement in Science & Technology. Here is a video clip from that wonderful ceremony. The organizers even had pictures from the ceremony broadcast in Times Square.
My research interests are in the area of information retrieval (IR), its application to web search, web graph analysis, and user interfaces for search. Here are some of my selected publications (chronologically ordered). At Google I have worked on using IR techniques to improve web search. Before joining Google in 2000, I did research in the following sub-areas of information retrieval:
- Speech Retrieval: Increasing amounts of spoken communication, such as broadcast material, are stored in digital form for archival purposes. With advances in automatic speech recognition (ASR) technology, it is now possible to transcribe speech automatically with reasonable accuracy. Once speech is transcribed, IR methods can be used to search speech collections; think of this as a search engine for speech. The interesting problem, however, is searching speech despite the large number of recognition errors in the transcripts. More recently I have done some work in this area. At AT&T Labs, we developed SCAN, a system that combines speech recognition, information retrieval, and user interface techniques to provide a multimodal interface to speech archives.
- Document Ranking: Also called text/document searching/retrieval (that makes four phrases, by the way), this is the best-known part of our field. If you are reading this page, chances are that you have already used a "search engine". Document ranking is what search engines do: given a user query, rank a large collection of documents (web pages, news articles, your email, someone else's email that you happen to have hacked, ...) so that what you are looking for appears ahead of less useful (or useless) documents.
- Question Answering: People have questions, and they need answers, not documents. Automatic question answering will definitely be a significant advance over state-of-the-art information retrieval technology. Systems that can reliably answer questions without domain restrictions have not been developed yet. I organized the first few editions of the QA Track at the Text REtrieval Conference (TREC) to advance this sub-field of language processing.
- Document Routing/Filtering: This is the "query by example" version of document ranking. Once you point the system to a few "good documents", it tracks all NEW documents and points you only to those you should be looking at. Typically the system tries to find new documents that are similar to the documents you said were good.
- Automatic Text Summarization: Documents are huge, and we don't always want to read them in full. (I don't know about you, but I certainly don't have the patience. And given the stuff you find on the web ...) Techniques that automatically "summarize" documents will be tremendously useful. Domain-independent text summarization is very hard, at times even for humans, so machines typically summarize by text extraction: relevant pieces of text (sentences, paragraphs, ...) are extracted and presented as a "summary".
- Miscellaneous (TREC): Since 1992, the National Institute of Standards and Technology (NIST), along with DARPA, has sponsored an annual conference called the Text REtrieval Conference (TREC) to support research within the information retrieval community by providing the infrastructure necessary for large-scale evaluation of text retrieval methodologies. I have been actively participating in TREC since TREC-3 (held in 1994).
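The Speech Retrieval item above names the core difficulty: transcripts contain recognition errors, so a query word may never appear verbatim. The sketch below illustrates one generic remedy, matching on overlapping character trigrams instead of whole words, so that a misrecognized word still shares most of its fragments with the query. This is an assumed, illustrative technique, not a description of how SCAN worked, and the function names `char_ngrams` and `overlap` are hypothetical.

```python
from collections import Counter


def char_ngrams(text, n=3):
    # Hypothetical helper: count character n-grams of each word, padding
    # with spaces so word beginnings and endings form n-grams too.
    grams = Counter()
    for word in text.lower().split():
        padded = f" {word} "
        for i in range(len(padded) - n + 1):
            grams[padded[i:i + n]] += 1
    return grams


def overlap(query, transcript, n=3):
    # Fraction of the query's n-grams that also occur in the transcript,
    # using Counter's & operator (multiset intersection, min of counts).
    q = char_ngrams(query, n)
    t = char_ngrams(transcript, n)
    total = sum(q.values())
    return sum((q & t).values()) / total if total else 0.0
```

For example, if the recognizer transcribes "salton" as "saltin", an exact word match fails, yet `overlap("gerard salton", "a talk by gerard saltin about retrieval")` returns 0.75 because 9 of the query's 12 trigrams still appear in the transcript.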
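The Document Ranking item above can be made concrete with the classic vector space model pioneered by Gerard Salton: represent the query and each document as tf-idf weighted term vectors, then sort documents by their cosine similarity to the query. The following is a minimal sketch assuming whitespace tokenization and raw term frequencies; real search engines use far richer signals, and the function names here are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on whitespace; a real system would also strip
    # punctuation, stem words, and drop stopwords.
    return text.lower().split()


def idf_table(docs):
    # Inverse document frequency: terms found in fewer documents weigh more.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log(n / count) for term, count in df.items()}


def tfidf_vector(text, idf):
    # Raw term frequency times idf; terms unseen in the collection get 0.
    tf = Counter(tokenize(text))
    return {t: c * idf.get(t, 0.0) for t, c in tf.items()}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    # Return (score, doc_index) pairs, best match first; the sort is
    # stable, so tied documents keep their original order.
    idf = idf_table(docs)
    q = tfidf_vector(query, idf)
    scores = [(cosine(q, tfidf_vector(d, idf)), i) for i, d in enumerate(docs)]
    return sorted(scores, key=lambda s: s[0], reverse=True)
```

Cosine normalization keeps long documents from winning simply because they contain more words, and a term that occurs in every document gets an idf of log(1) = 0, so it does not affect the ordering.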
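The Document Routing/Filtering item above can be sketched as a profile-based filter: build a profile vector from the example "good documents", then pass along each new document whose cosine similarity to that profile clears a threshold. This resembles a simplified, positive-examples-only form of Rocchio relevance feedback; the `RoutingFilter` class, the raw-count weighting, and the 0.3 default threshold are all hypothetical choices for illustration.

```python
import math
from collections import Counter


def term_vector(text):
    # Raw term counts; a real system would use tf-idf or similar weights.
    return Counter(text.lower().split())


def cosine(a, b):
    # Counter returns 0 for missing terms, so b[t] is safe here.
    dot = sum(w * b[t] for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class RoutingFilter:
    """Hypothetical filter that flags new documents resembling known good ones."""

    def __init__(self, good_docs, threshold=0.3):
        # Profile = sum of the example vectors. The sum points in the same
        # direction as the centroid, so cosine scores are identical.
        self.profile = Counter()
        for doc in good_docs:
            self.profile.update(term_vector(doc))
        self.threshold = threshold

    def accept(self, new_doc):
        # Route the document to the user only if it is similar enough.
        return cosine(self.profile, term_vector(new_doc)) >= self.threshold
```

As a usage example, a filter built from `["python programming tutorial", "learn python programming"]` accepts "advanced python programming tips" (similarity about 0.63) and rejects "stock market news today" (similarity 0).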
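The Automatic Text Summarization item above describes summarization by extraction. Here is a minimal sketch, assuming a simple frequency heuristic in the spirit of Luhn's early work: score each sentence by how often its content words occur in the whole document, keep the top `k` sentences, and present them in their original order. The tiny stopword list, the naive sentence splitter, and the function names are hypothetical simplifications.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "it",
             "that", "for", "on", "with"}


def split_sentences(text):
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def words(text):
    # Lowercased content words, with stopwords removed.
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text, k=2):
    sentences = split_sentences(text)
    freq = Counter(words(text))

    def score(sentence):
        # Average document-wide frequency of the sentence's content words.
        ws = words(sentence)
        return sum(freq[w] for w in ws) / len(ws) if ws else 0.0

    # Pick the k highest-scoring sentences, then restore document order.
    top = sorted(range(len(sentences)),
                 key=lambda i: score(sentences[i]), reverse=True)[:k]
    return " ".join(sentences[i] for i in sorted(top))
```

Restoring the original sentence order keeps the extracted "summary" readable, since sentences chosen purely by score would otherwise appear in an arbitrary sequence.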
I was born in India in the state of Uttar Pradesh ("Northern State" in Hindi, my native language). I spent most of my boyhood in the foothills of the Himalayas. I got a BS degree in Computer Science from the University of Roorkee (now IIT Roorkee) in India, an MS, again in Computer Science, from the University of Minnesota (somehow, back then, I always found myself in cold places), and a PhD in Computer Science from Cornell University. At Cornell I studied with the late Prof. Gerard Salton, one of the founders of the field of IR. Somewhere between my degrees I had real jobs doing database programming and IR system hacking. After my PhD I joined AT&T Labs in 1996. In 2000, my friend Krishna Bharat persuaded me to join Google.