What is jsoup?

jsoup is a Java library for working with real-world HTML and XML. Its API lets you fetch URLs, parse content, and extract and manipulate data using DOM API methods, CSS selectors, and XPath queries. Because jsoup implements the WHATWG HTML5 specification, it parses HTML into the same DOM structure that modern browsers produce.

With jsoup you can scrape and parse HTML from a URL, file, or string; find and extract data through DOM traversal or CSS selectors; modify HTML elements, attributes, and text; and sanitize user-submitted content to guard against XSS attacks, producing clean HTML output. jsoup handles the full range of HTML found on the web, from well-formed and valid markup to invalid tag soup, and builds a sensible parse tree in each case.

For example, you can fetch the Wikipedia homepage, parse it into a DOM, and select the headlines from the "In the news" section into a list of elements for further processing. This range makes jsoup a common choice for web scraping and other Java tasks that involve reading or rewriting web content.
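The workflow described above (parse, select, sanitize) can be sketched in a few lines of Java. This is a minimal illustration, not an official sample: the class name `JsoupSketch`, the sample markup, and the `https://example.org/` base URI are all assumptions made for the example, and it assumes a recent jsoup release (one that ships `Safelist`) on the classpath. It parses a string rather than a live page so that it runs without network access; for a live page, `Jsoup.connect(url).get()` would take the place of `Jsoup.parse`.

```java
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.safety.Safelist;
import org.jsoup.select.Elements;

public class JsoupSketch {
    // Hypothetical sample markup standing in for a fetched page.
    // The <li> tags are deliberately left unclosed (tag soup).
    static final String HTML =
        "<div id=news><ul><li><a href='/a'>First headline</a>"
        + "<li><a href='/b'>Second headline</a></ul></div>";

    // Assumed base URI, used only to resolve relative links.
    static final String BASE_URI = "https://example.org/";

    // Parse the markup and collect the text of every link matched by a CSS selector.
    static List<String> headlines(String html) {
        Document doc = Jsoup.parse(html, BASE_URI);
        Elements links = doc.select("#news li a");
        return links.eachText();
    }

    // Resolve the first link's href against the base URI.
    static String firstAbsoluteLink(String html) {
        Document doc = Jsoup.parse(html, BASE_URI);
        Element link = doc.selectFirst("#news a[href]");
        return link == null ? "" : link.absUrl("href");
    }

    // Sanitize untrusted markup, keeping only the tags allowed by the basic safelist.
    static String sanitize(String untrusted) {
        return Jsoup.clean(untrusted, Safelist.basic());
    }

    public static void main(String[] args) {
        System.out.println(headlines(HTML));
        System.out.println(firstAbsoluteLink(HTML));
        System.out.println(sanitize("<p>Hi<script>alert(1)</script></p>"));
    }
}
```

The same `select` call accepts any CSS query, so adapting the sketch to a real page is mostly a matter of choosing a selector that matches that page's markup.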

Integrations

Offers API:
Yes, jsoup provides an API

Company Facts

Company Name:
jsoup
Company Website:
jsoup.org

Product Details

Deployment
Windows
Mac
Linux
Training Options
Documentation Hub
Support
Web-Based Support

Target Company Sizes
Individual
1-10
11-50
51-200
201-500
501-1000
1001-5000
5001-10000
10001+
Target Organization Types
Mid Size Business
Small Business
Enterprise
Freelance
Nonprofit
Government
Startup
Supported Languages
English

jsoup Categories and Features

Web Design Software

Autocompletion
Collaborative Editing
Content Management
Drag & Drop
Element Libraries
Programming Language Support
Syntax Highlighting
Templates