Data Mining, Quant, Statistics, Computer Science: Jobs, Resumes, Directory

Book Review: Text Mining Application Programming

Description

Text Mining Application Programming teaches software developers how to mine the vast amounts of information available on the Web, internal networks, and desktop files and turn it into usable data. The book helps developers understand the problems associated with managing unstructured text, and explains how to build their own mining tools using standard statistical methods from information theory, artificial intelligence, and operations research. Each topic is thoroughly explained and then followed by a practical implementation.

The book begins with a brief overview of text data, where it can be found, and the typical search engines and tools used to search and gather this text. It details how to build tools for extracting and using the text, and covers the mathematics behind many of the algorithms used in building these tools. From there you’ll learn how to build tokens from text, construct indexes, and detect patterns in text. You’ll also find methods to extract the names of people, places, and organizations from an email, a news article, or a Web page.

The next portion of the book covers how to find information on the Web, how the Web is structured, and how to build spiders to crawl it. Text categorization is also described in the context of managing email. The final part of the book covers information monitoring, summarization, and a simple Question & Answer (Q&A) system.

The code used in the book is written in Perl, but knowledge of Perl is not necessary to run the software. Developers with an intermediate level of Perl experience can customize the software. Although the book is about programming, methods are explained with English-like pseudocode and the source code is provided on the CD-ROM. After reading this book, you’ll be ready to tap into the wealth of information available online in ways you never thought possible.
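
As a rough illustration of the tokenizing and indexing steps mentioned in the description, here is a minimal Python sketch. It is not taken from the book, whose code is written in Perl and may work quite differently; the function names `tokenize` and `build_index` and the two sample documents are hypothetical, chosen only for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each token to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return {token: sorted(ids) for token, ids in index.items()}


# Hypothetical sample documents, used only to exercise the sketch.
docs = {
    "a": "Spiders crawl the Web.",
    "b": "The Web holds unstructured text.",
}
index = build_index(docs)
print(index["web"])
```

Looking up a token in the resulting dictionary returns the ids of the documents that contain it, which is the basic idea behind an inverted index. A real system would likely add stemming, stop-word handling, and persistent storage.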

Features

Teaches developers how to build text mining applications to manage vast amounts of text and turn it into useful data
Covers key topics such as information extraction, clustering, building spiders, text categorization, summarization, and natural language query systems
Shows step-by-step techniques for implementing text mining solutions, and provides customizable
On the CD!
* Source code and tools to test many of the text mining functions described in the book. The tools belong to Text Mine, an open source project hosted on SourceForge.net.
* A collection of Perl packages to extract named entities, parts of speech, phrases, and summaries of text documents
* CGI scripts to run Text Mine modules from a Web browser
* Scripts to collect pages from Web sites and articles from news feeds
* Tools to install Text Mine, search the WordNet dictionary, and test Text Mine functions
* Utilities to build and manage MySQL database tables
CD SYSTEM REQUIREMENTS
The code is written in Perl and has been tested on the Linux and Microsoft Windows platforms. The minimum hardware requirements are a Pentium III processor, 64MB RAM, and 50MB of hard disk space. On Microsoft Windows, the software requirements are Perl 5.6 or greater (ActiveState or Cygwin implementations), Apache, and MySQL. The software requirements on Linux are identical, except that Perl usually comes bundled with most Linux distributions.
