Building on an analysis of tree-based multiple pattern matching algorithms, this paper proposes a multiple pattern matching algorithm based on a sequential binary tree. Experiments show that the algorithm has three properties: it is quick to construct, its memory cost is small, and its search is as fast as that of the traditional algorithm. The algorithm is suited to applications whose pattern set changes dynamically, that is, applications whose automaton must be constructed dynamically, so it has good application prospects.
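The abstract does not spell out the sequential binary tree itself, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes the pattern set is kept as a sorted array (the implicit, array-stored form of a binary search tree) and probed by binary search, which makes adding or removing a pattern cheap compared with rebuilding an automaton. The class name `SortedPatternSet` and its methods are hypothetical.

```python
import bisect


class SortedPatternSet:
    """Illustrative only: patterns held in a sorted list, searched by bisection."""

    def __init__(self, patterns=()):
        self._patterns = []
        for p in patterns:
            self.add(p)

    def add(self, pattern):
        # Insert while keeping the list sorted; duplicates are ignored.
        if not pattern:
            raise ValueError("empty pattern")
        i = bisect.bisect_left(self._patterns, pattern)
        if i == len(self._patterns) or self._patterns[i] != pattern:
            self._patterns.insert(i, pattern)

    def remove(self, pattern):
        # Delete the pattern if present; no structure needs rebuilding.
        i = bisect.bisect_left(self._patterns, pattern)
        if i < len(self._patterns) and self._patterns[i] == pattern:
            del self._patterns[i]

    def search(self, text):
        """Return (start, pattern) for every occurrence of every pattern."""
        hits = []
        max_len = max(map(len, self._patterns), default=0)
        for start in range(len(text)):
            limit = min(len(text), start + max_len)
            for end in range(start + 1, limit + 1):
                prefix = text[start:end]
                i = bisect.bisect_left(self._patterns, prefix)
                # The first entry >= prefix either starts with prefix,
                # or no stored pattern does and we can stop extending.
                if i == len(self._patterns) or not self._patterns[i].startswith(prefix):
                    break
                if self._patterns[i] == prefix:
                    hits.append((start, prefix))
        return hits


matcher = SortedPatternSet(["he", "she", "his", "hers"])
found = matcher.search("ushers")

# The pattern set can change between searches without a rebuild step.
matcher.add("us")
matcher.remove("she")
found_after_update = matcher.search("ushers")
```

This sketch trades search speed for simple updates. It makes no claim to reproduce the search performance reported in the abstract.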
Web pages contain richer content than pure text, such as hyperlinks, HTML tags, and metadata, so Web page categorization differs from pure text categorization. For Chinese news pages on the Internet, a practical algorithm is proposed for extracting subject concepts from Web pages without a thesaurus. After these category-subject concepts are incorporated into a knowledge base, Web pages are classified by a hybrid algorithm, using an experimental corpus extracted from Xinhua Net. Experimental results show that using Web page features improves categorization performance.
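As a rough illustration of the page features named above (hyperlinks, HTML tags, and metadata), the sketch below separates them from body text using Python's standard `html.parser`. It is not the authors' extraction or classification method. The class name `PageFeatures`, the choice of fields, and the sample page are all assumptions made for the example.

```python
from html.parser import HTMLParser


class PageFeatures(HTMLParser):
    """Illustrative only: collect title, meta tags, links, and body text."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}          # meta name -> content
        self.links = []         # href values of <a> tags
        self.anchor_text = []   # visible text of hyperlinks
        self.text = []          # remaining body text
        self._current = None    # tag whose text is being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") and attrs.get("content"):
            self.meta[attrs["name"].lower()] = attrs["content"]
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
            self._current = "a"
        elif tag in ("title", "script", "style"):
            self._current = tag

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        data = data.strip()
        if not data:
            return
        if self._current == "title":
            self.title += data
        elif self._current == "a":
            self.anchor_text.append(data)
        elif self._current not in ("script", "style"):
            self.text.append(data)


# Hypothetical sample page, not taken from the experimental corpus.
sample = (
    '<html><head><title>Markets rally</title>'
    '<meta name="keywords" content="finance, stocks"></head>'
    '<body><p>Shares rose on Monday.</p>'
    '<a href="/finance/">Finance news</a>'
    '<script>var x = 1;</script></body></html>'
)
features = PageFeatures()
features.feed(sample)
features.close()
```

Keeping the title, meta keywords, and anchor text apart from the body text lets a later classifier weight them differently, which is one plausible way to make use of Web page features.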