This is an NSF-funded project focused on studying methods for extracting temporal information from text documents. The project abstract can be found here.
Principal Investigator: Prof. Parvathi Chundi
Computer Science Dept., University of Nebraska at Omaha
Current Members:
Our main goal is to discover temporal information from text documents. We assume that each document carries a publication or creation date that can be used to time-stamp its contents. Identifying the keywords or topics that are relevant to a particular time period is a challenging task. We use the concept of a measure function to let the user attach a significance value to keywords or topics during a time period. However, measure functions add an extra layer of computation that must be carefully managed so that we can process large document sets. A rudimentary prototype of the project is available here.
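To make the idea of a measure function concrete, here is a minimal sketch in Python. It is not the project's implementation: the function names, the monthly time periods, and the choice of measure (the fraction of a period's documents that contain the keyword) are all assumptions made for illustration. The project's own measure functions may be defined quite differently.

```python
from collections import defaultdict
from datetime import date


def month_key(stamp):
    """Map a date to a (year, month) time period. Monthly periods are an assumption."""
    return (stamp.year, stamp.month)


def keyword_significance(documents, keyword):
    """Hypothetical measure function: for each period, return the fraction
    of that period's documents whose text contains the keyword."""
    totals = defaultdict(int)  # number of documents per period
    hits = defaultdict(int)    # number of documents per period containing the keyword
    needle = keyword.lower()
    for stamp, text in documents:
        period = month_key(stamp)
        totals[period] += 1
        if needle in text.lower().split():
            hits[period] += 1
    return {period: hits[period] / totals[period] for period in totals}


# Toy time-stamped documents, invented for this example.
docs = [
    (date(2010, 1, 5), "energy markets rally"),
    (date(2010, 1, 20), "new energy policy announced"),
    (date(2010, 2, 3), "football season ends"),
]
scores = keyword_significance(docs, "energy")
print(scores)
```

Even this simple measure needs one pass over every document for each keyword, which hints at why the extra computation has to be managed carefully on large document sets.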
We also face the challenge of interpreting the usefulness of what we compute. We are currently studying the document set to identify which information is pertinent during which time interval, and comparing our findings with the output of our system.
· P. Chundi, A. Mills, and W. Chen, Evaluating Time Decompositions of Time Stamped Documents, UNO Tech Report CST-2010-2.
· W. Chen and P. Chundi, Trend Analysis of Topics Based on Segmentation, 2010 International Conference on Data Warehousing and Knowledge Discovery (DAWAK 2009).
· P. Chundi, M. Subramaniam, and D. K. Vasireddy, An Approach to Analyze the Enron Email Data Based on Segmentation, accepted for publication in Elsevier’s Data and Knowledge Engineering Journal.
· W. Chen and P. Chundi, Extracting Hot Spots of Multi-Keyword Topics from Time Stamped Documents, IEEE Conference on Computational Intelligence and Data Mining, 2009.
· H. P. Siy, P. Chundi, and M. Subramaniam, Summarizing Developer Work History Using Time Series Segmentation: Challenge Report, 2008 ICSE Workshop on Mining Software Repositories (MSR 2008).
· W. Chen and P. Chundi, An Approach for Discovering the Hot Spots of Topics from Time Stamped Documents, 2008 SIAM Workshop on Text Mining.
· P. Chundi and D. J. Rosenkrantz, Efficient Algorithms for Item-Set Time Series, Journal of Data Mining and Knowledge Discovery, 2008.
· P. Chundi, D. J. Rosenkrantz, H. Siy, and M. Subramaniam, A Segmentation-Based Approach for Temporal Analysis of Software Version Repositories, Journal of Software Maintenance and Evolution, Vol. 20(3).
· P. Chundi and D. J. Rosenkrantz, Segmentation of Time Series Data, Encyclopedia of Data Warehousing and Mining (2nd Edition), edited by Prof. John Wang.
· H. Siy, P. Chundi, D. J. Rosenkrantz, and M. Subramaniam, Discovering Dynamic Developer Relationships from Software Version Repositories Using Time Series Segmentation, 23rd IEEE International Conference on Software Maintenance, 2007.
· P. Chundi and D. J. Rosenkrantz, Information Preserving Time Decompositions for Time Stamped Documents, Journal of Data Mining and Knowledge Discovery, Vol. 13(1), 2006.
· R. Zhang and P. Chundi, Using Time Decompositions to Analyze PubMed Abstracts, International Conference on Computer Based Medical Systems, Jun 2006.
· P. Chundi, R. Zhang, and M. Castellanos, Entropy Based Measure Functions for Analyzing Time Stamped Documents, 2006 Text Mining Workshop, SIAM International Conference on Data Mining.
· P. Chundi, R. Zhang, and D. J. Rosenkrantz, Efficient Algorithms for Computing Time Decompositions for Time Stamped Documents, International Conference on Database and Expert Systems Applications, Sept 2005.
· P. Chundi and D. J. Rosenkrantz, On Lossy Time Decompositions of Time Stamped Documents, ACM Conference on Information and Knowledge Management, 2004.
· P. Chundi and D. J. Rosenkrantz, Constructing Time Decompositions of Time Stamped Documents, SIAM Data Mining Conference, 2004.
Point of Contact: Parvathi Chundi (pchundi@mail.unomaha.edu)
Date of Last Update: Oct 11th, 2010.