Information retrieval using an adaptive resonance theory (ART)-based neural net.

By: Alderman, George Allan, III
Contributor(s): Georgetown University
Material type: Text
Description: 167 p
ISBN: 0599839627
Subject(s): Language, Linguistics | Computer Science | Library Science | 0290 | 0984 | 0399
Dissertation note: Thesis (Ph.D.)--Georgetown University, 2000.

Source: Dissertation Abstracts International, Volume: 61-07, Section: A, page: 2679.

Mentor: Donald Loritz.

This study evaluated the usefulness of Simply Fuzzy ARTMAP (SFAM) as an algorithm for automatic grouping of documents. A test corpus of articles from the Wall Street Journal (WSJ) was encoded as word counts and presented to an SFAM network. Each training/test run consisted of 2, 4, or 8 groups of documents; all documents in a group shared the same topic marker tag as assigned by WSJ. The groups were mutually exclusive with respect to topic.
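
The word-count encoding described above can be illustrated with a minimal sketch. It is not the dissertation's code: the whitespace tokenizer, the scaling of counts by the document's largest count, and the complement coding step (a common convention for Fuzzy ART inputs) are all assumptions, and the function names `build_vocabulary` and `encode` are hypothetical.

```python
from collections import Counter

def build_vocabulary(docs):
    """Map each distinct lowercase token to a fixed vector index (assumed tokenizer: split on whitespace)."""
    tokens = sorted({tok for doc in docs for tok in doc.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}

def encode(doc, vocab):
    """Return word counts scaled to [0, 1], followed by their complements.

    Scaling by the largest count and appending complements are assumptions;
    the abstract only says documents were encoded as word counts.
    """
    counts = Counter(tok for tok in doc.lower().split() if tok in vocab)
    peak = max(counts.values(), default=1)
    # vocab was built in index order, so iterating it yields positions 0..n-1.
    a = [counts.get(tok, 0) / peak for tok in vocab]
    return a + [1.0 - v for v in a]

vocab = build_vocabulary(["stocks fall as rates rise", "rates rise again"])
x = encode("rates rates rise", vocab)
```

With complement coding, every input vector sums to the vocabulary size, which keeps long and short documents on the same scale when they are compared against category weights.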

The parameters studied were: ratio of training to test documents, vigilance (a measure of the encoding granularity of the network), and the minimum number of documents allowed within a categorization node after training. The results were: (1) Overall the system performed well above chance, but with a high error level. (2) Fuzzy ART-based models have been observed to have problems with overfitting of the data. This study supports those findings; the SFAM network implemented here did exhibit problems with proliferation of output category nodes. System performance was found to negatively correlate with the number of output category nodes. (3) Varying the initial setting of the vigilance parameter does not have a significant effect on system performance. (4) Forcing the system to maintain its vigilance at a lower value improves performance in some test runs. (5) Deleting excessively restrictive output category nodes improves system performance in some test runs.
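
Two of the parameters above, vigilance and the minimum number of documents per category node, can be sketched as follows. The match formula is the standard Fuzzy ART vigilance criterion; the node layout (a dict with a hypothetical `n_docs` field) and the function names are assumptions made for illustration, not the study's implementation.

```python
def fuzzy_and(x, w):
    """Component-wise minimum of input x and category weights w."""
    return [min(xi, wi) for xi, wi in zip(x, w)]

def match_value(x, w):
    """|x AND w| / |x|: the fraction of the input covered by the category weights."""
    return sum(fuzzy_and(x, w)) / sum(x)

def passes_vigilance(x, w, rho):
    """A category may accept x only if its match reaches the vigilance rho.

    Higher rho means finer-grained categories, and therefore more of them.
    """
    return match_value(x, w) >= rho

def prune_nodes(nodes, min_docs):
    """Drop category nodes that captured fewer than min_docs training documents.

    'n_docs' is a hypothetical field holding that per-node count.
    """
    return [n for n in nodes if n["n_docs"] >= min_docs]

x = [1.0, 0.5, 0.0, 0.5]
w = [0.5, 0.5, 0.5, 0.5]
print(match_value(x, w))  # 0.75
```

In this example the match is 1.5 / 2.0 = 0.75, so the category accepts the input at a vigilance of 0.7 and rejects it at 0.8.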

These findings indicate that SFAM has severe problems with how it learns new categories. During learning, SFAM allows its vigilance to ramp up to values so high that it overfits the data and/or forms spurious categories. As currently defined, the SFAM learning algorithm therefore does not seem useful for document grouping tasks. Modifications are suggested to address overfitting problems by (1) controlling the increase in vigilance so that it does not become excessive; (2) monitoring the categories formed to ensure that they are meaningful; and (3) using a category-specific as opposed to a global value for vigilance.
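
One possible reading of suggested modification (1) is to cap how far vigilance may rise during category search. The sketch below uses the usual Fuzzy ARTMAP search order and match-tracking step, but the cap `rho_cap`, the node layout, and the function names are hypothetical; the abstract does not specify how the increase should be controlled.

```python
def overlap(x, w):
    """|x AND w|, using the component-wise minimum as fuzzy AND."""
    return sum(min(xi, wi) for xi, wi in zip(x, w))

def choose_category(x, label, nodes, rho_base, rho_cap, alpha=0.001, eps=1e-6):
    """Return the index of the node to train on, or None if a new node is needed.

    nodes is a list of dicts with 'w' (weights) and 'label' (hypothetical layout).
    """
    rho = rho_base
    # Visit nodes in order of decreasing choice value |x AND w| / (alpha + |w|).
    order = sorted(
        range(len(nodes)),
        key=lambda j: -overlap(x, nodes[j]["w"]) / (alpha + sum(nodes[j]["w"])),
    )
    for j in order:
        m = overlap(x, nodes[j]["w"]) / sum(x)
        if m < rho:
            continue  # fails the current vigilance test
        if nodes[j]["label"] == label:
            return j
        # Match tracking: raise vigilance just above the failed match,
        # but never beyond rho_cap (the assumed modification).
        rho = min(m + eps, rho_cap)
    return None

x = [1.0, 0.0, 0.0, 1.0]
nodes = [
    {"w": [1.0, 0.0, 0.0, 1.0], "label": "B"},
    {"w": [0.8, 0.0, 0.0, 0.8], "label": "A"},
]
capped = choose_category(x, "A", nodes, rho_base=0.5, rho_cap=0.7)
uncapped = choose_category(x, "A", nodes, rho_base=0.5, rho_cap=2.0)
```

Here the capped search reuses the existing "A" node (index 1), while the effectively uncapped search returns `None` and would create a new category node, which is the kind of proliferation the findings describe.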

School code: 0076.
