Introduction

Sinon is a Java tool that extracts textual information from Web sites. In other words, it can scrape any kind of text (HTML included) available on the Internet or in a filesystem. An XML configuration file written by the user tells Sinon which steps it must execute to reach the desired information. Page downloads and cookie management are transparent to the user. The extracted information and the extraction status are made available through the Sinon API.

The Sinon API does not provide any data manipulation or storage services. It only extracts data and notifies the listener classes specified in the XML configuration file of data extraction events.
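The listener-based design described above can be illustrated with a minimal sketch. None of the names below come from Sinon: `ExtractionListener`, `CollectingListener`, `ListenerSketch` and `extractTitle` are hypothetical, and the regex-based title lookup only stands in for a real extraction step. The point is the division of labour: the extractor reports events, and the listener decides what to do with the data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical callback interface; this is NOT the real Sinon API.
interface ExtractionListener {
    void dataExtracted(String fieldName, String value);
    void extractionFinished(boolean success);
}

// A listener that keeps the extracted values in memory.
// Storage is the listener's job, not the extractor's.
class CollectingListener implements ExtractionListener {
    final List<String> values = new ArrayList<>();
    boolean finished = false;
    boolean success = false;

    @Override
    public void dataExtracted(String fieldName, String value) {
        values.add(fieldName + "=" + value);
    }

    @Override
    public void extractionFinished(boolean success) {
        this.finished = true;
        this.success = success;
    }
}

class ListenerSketch {
    private static final Pattern TITLE =
        Pattern.compile("<title>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Stand-in for one extraction step: finds the <title> text and
    // reports it to the listener, then reports the overall status.
    static void extractTitle(String html, ExtractionListener listener) {
        Matcher m = TITLE.matcher(html);
        boolean found = m.find();
        if (found) {
            listener.dataExtracted("title", m.group(1).trim());
        }
        listener.extractionFinished(found);
    }

    public static void main(String[] args) {
        CollectingListener listener = new CollectingListener();
        extractTitle("<html><head><title> Sinon </title></head></html>", listener);
        System.out.println(listener.values);
    }
}
```

In this sketch the extractor never stores or transforms anything itself. A different listener could write to a database or a file without any change to the extraction code.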

News

Sinon news and development status can be found in the News section. A Sinon news RSS feed is also available.
