I’ll be on NASA TV on March 23, 2011 for the Leading Edge TV Show at 11 am ET. You can watch on NASA TV or www.nasa.gov for a streamed broadcast. We will also be doing a NASA Chat later that day.
NASA Chat, Mar 23, 2pm ET
When an airplane flies, hundreds of data streams fly from it every second: pilot reports, incident reports, control positions, instrument positions, warning modes. But there’s so much data that it’s been nearly impossible for airlines to do anything other than look back for the cause of something that’s already happened. Data mining is the art of digging through mountains of data when you don’t know what you’re looking for or what you might find. Popular search engines like Google™ do this every second. NASA is mining terabytes of aviation data to find issues before they become incidents. Ashok Srivastava will talk to us about the computer tools NASA is building to do the digging.
Jeff Hamlett will talk about how Southwest Airlines is already using data mining “gold” to update their flight operations.
- How is NASA figuring out how to find the needle in a haystack when we don’t know what either looks like?
- What’s an “algorithm”? What’s an “anomaly”? What’s a “precursor” and why do data miners use those words all the time?
- What has Southwest changed in its practices thanks to data mining?
- How is our data mining different from Google’s or Amazon’s? How is it the same?
Join the chat on Mar 23, a few minutes before 2 pm ET at:
The IEEE International Conference on Data Mining has recognized our work on Discovering Precursors to Aviation Safety Incidents and Accidents as a Top 10 Data Mining Case Study in the world. The citation covers novel algorithms for anomaly detection and predictive modeling as well as applications in the aviation domain.
The conference ended last week and drew many international participants, with talks and posters from top researchers. It featured a keynote talk by Stephen Boyd from Stanford University and invited speakers such as Piero Bonissone from GE, Vipin Kumar from the University of Minnesota, Dinkar Mylaraswamy from Honeywell, and Rama Nemani from NASA.
Thanks to everyone for participating and see you at CIDU 2011.
Here is an article that recently appeared in Flight International about some novel data mining and machine learning techniques that we are developing to potentially detect pilot fatigue. We are using anomaly detection techniques as well as predictive methods to look for objective indicators of fatigue. This is ongoing work between a number of groups within NASA and outside partners.
The annual ACM SIGKDD conference is the premier international forum for data mining researchers and practitioners from academia, industry, and government to share their ideas, research results and experiences. KDD-2010 will feature keynote presentations, oral paper presentations, poster sessions, workshops, tutorials, panels, exhibits, demonstrations, and the KDD Cup competition.
I’ll be giving an invited talk on Discovering Precursors to Aviation Safety Incidents and participating on a panel on the Next Generation of Transportation Systems: Greenhouse Emissions and Data Mining. We also have a paper on Multiple Kernel Learning for Heterogeneous Anomaly Detection.
The Workshops on Algorithms for Modern Massive Data Sets (MMDS) will address algorithmic, mathematical, and statistical challenges in modern large-scale data analysis. The goals of this series of workshops are to explore novel techniques for modeling and analyzing massive, high-dimensional, and nonlinearly-structured scientific and internet data sets, and to bring together computer scientists, statisticians, mathematicians, and data analysis practitioners to promote cross-fertilization of ideas.
The world is facing a number of critical challenges. Finding the next generation of solutions for energy supply, greenhouse emission reduction, and transportation problems is critical to sustaining the world and our civilization. The energy crisis is a major challenge that must be addressed to sustain and further develop the world. Greenhouse emissions are widely believed to be connected with energy consumption. Transportation systems have a significant effect on energy consumption and on greenhouse emissions. Many problems related to greenhouse emissions and the transportation industry are critically connected to the consumption and supply of energy. Information processing and advanced data analysis techniques are likely to play important roles in solving these problems for the next generation.
Efficient production, distribution, and consumption of existing and alternative energy will require supporting information processing networks to adaptively control and protect the underlying physical systems. Understanding the effects of greenhouse emissions requires advanced data analysis techniques for interpreting remotely sensed data. Reducing the carbon footprints of buildings, vehicles, and airplanes will require continuously monitoring sensors and detecting deviations from desired behavior. Designing the next generation of transportation networks becomes particularly challenging given the increasing demand for energy and the need to reduce greenhouse emissions. Sensor networks for highways, vehicles equipped with a diagnostic data bus, and machine-to-machine wireless communication networks are going to make advanced data mining techniques very important in the transportation industry. Computing itself is under scrutiny for its effect on greenhouse emissions and pollution. We need to pay close attention to the environmental impacts of computing and its supporting infrastructure. Overall, we need to explore technology for sustainable computing and computing technology for a sustainable world.
The “Next Generation Data Mining (NGDM’09) Summit: Dealing with Energy Crisis, Greenhouse Emissions, and Transportation Challenges” will bring together data mining researchers, scientists, and engineers from diverse backgrounds along with domain experts.
NGDM’09 will focus on the following areas:
1) Energy crisis, information processing and data mining
2) Greenhouse emissions, climate changes, and data mining
3) Transportation, emissions, and data mining
The summit will generate a report based on the presentations and discussions of the participants.
Chandra Bhat, University of Texas at Austin
Kirk Borne, George Mason University
Alok Choudhary, Northwestern University
Umesh Dayal, HP Labs
Wei Fan, IBM T. J. Watson Research Laboratory
Douglas Fisher, National Science Foundation
Auroop Ganguly, Oak Ridge National Laboratory
Johannes Gehrke, Cornell University
Carla Gomes, Cornell University
Vipin Kumar, University of Minnesota
Rich Lechner, IBM
Edward Maibach, George Mason University
Mark McGranaghan, Electric Power Research Inst.
Paul Melby, MITRE Corporation
Robert Neff, UMBC
Dino Pedreschi, Univ. of Pisa & Northeastern Univ.
Krishna Rajan, Iowa State University
Shashi Shekhar, University of Minnesota
Ashok Srivastava, NASA Ames Research Center
Eugene Tierney, US Env. Protection Agency
Ramasamy Uthurusamy, General Motors (Ret.)
Brian Worley, Oak Ridge National Laboratory
Philip Yu, University of Illinois at Chicago
Vince Mow, Mactec Federal Programs
Chris Stock, Verizon
A tutorial at the International Workshop on Structural Health Monitoring 2009
September 8th, 2009 1pm-5pm
The tutorial will present methods and applications in the area of data mining and machine learning for large-scale systems such as those found in structural health management applications. The purpose of the tutorial is to discuss and disseminate new publicly available data mining algorithms for anomaly detection and prediction in large-scale applications including distributed systems. We will discuss technical hurdles and possible solutions. Specific focus areas include:
- New anomaly detection algorithms that are fast and highly accurate.
- New prediction algorithms appropriate for massive data sets.
- Distributed data mining algorithms which are provably correct (they give the same answer whether data is centralized or distributed).
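To make the last point concrete, here is a minimal sketch of what "same answer whether data is centralized or distributed" can look like. It is not one of the tutorial's algorithms; it swaps in a simple z-score check built from mergeable summary statistics. The `Stats` class, the `flag_anomalies` helper, the threshold, and the sensor readings are all hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Stats:
    """Count, sum, and sum of squares for one partition of the data."""
    n: int = 0
    total: float = 0.0
    total_sq: float = 0.0

    @classmethod
    def from_values(cls, values):
        return cls(len(values), sum(values), sum(v * v for v in values))

    def merge(self, other):
        # Merging only adds the three summaries, so each site ships
        # three numbers instead of its raw data.
        return Stats(self.n + other.n,
                     self.total + other.total,
                     self.total_sq + other.total_sq)

    def mean(self):
        return self.total / self.n

    def std(self):
        # Population standard deviation from the merged summaries.
        variance = self.total_sq / self.n - self.mean() ** 2
        return math.sqrt(max(variance, 0.0))


def flag_anomalies(values, stats, threshold=2.0):
    """Return values more than `threshold` standard deviations from the mean."""
    mu, sigma = stats.mean(), stats.std()
    return [v for v in values if abs(v - mu) > threshold * sigma]


# Hypothetical sensor readings held at three separate sites.
sites = [[10.0, 11.0, 9.0], [10.0, 12.0, 50.0], [9.0, 10.0, 11.0, 10.0]]
all_values = [v for site in sites for v in site]

# Centralized: pool every reading in one place, then summarize.
central = Stats.from_values(all_values)

# Distributed: summarize locally, then merge the summaries.
distributed = Stats()
for site in sites:
    distributed = distributed.merge(Stats.from_values(site))

central_flags = flag_anomalies(all_values, central)
distributed_flags = [v for site in sites
                     for v in flag_anomalies(site, distributed)]

print(central_flags, distributed_flags)
```

Both paths flag the same reading because the merged summaries equal the pooled ones. With general floating-point data the sums can differ in the last bits depending on merge order, and the sum-of-squares formula loses precision on large values, so a production version would want a more stable update rule.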
The four-hour tutorial will be organized as a series of short lectures with adequate time for audience participation. We will provide an overview of the algorithms covered as well as demonstrations of the methods on real-world data sets. The tutorial will feature multiple speakers who are experts in data mining.
Ashok Srivastava, Ph.D., NASA Ames Research Center
Nikunj Oza, Ph.D., NASA Ames Research Center
Santanu Das, Ph.D., UARC, NASA Ames Research Center
Kanishka Bhaduri, Ph.D., MCT Inc, NASA Ames Research Center
To register, please visit: Tutorial Registration
Mehran Sahami and I just finished an edited book on text mining by a number of luminaries in the field. You can pick it up on Amazon.