interfaces

Module containing the soupsavvy interfaces used across the package.

  • Executable - Interface for operations that can be executed on a single argument.

  • Comparable - Interface for objects that can be compared for equality.

  • TagSearcher - Interface for objects that can search within an IElement.

  • IElement - Interface for any tree structure compatible with soupsavvy.

  • SelectionApi - Interface for selecting elements based on a specific selector.

  • IBrowser - Interface for browser implementations compatible with soupsavvy.

class Executable

Bases: ABC

Interface for operations that can be executed on a single argument. Derived classes must implement the execute method.

abstract execute(arg: Any) → Any

Executes the operation on the given argument.
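A minimal sketch of implementing Executable. To keep it self-contained, it declares a local stand-in with the documented execute signature rather than importing soupsavvy; StripText is a hypothetical operation, not part of the library.

```python
from abc import ABC, abstractmethod
from typing import Any


class Executable(ABC):
    """Local stand-in mirroring the documented interface (not imported from soupsavvy)."""

    @abstractmethod
    def execute(self, arg: Any) -> Any:
        """Executes the operation on the given argument."""


class StripText(Executable):
    """Hypothetical operation: converts its argument to str and strips whitespace."""

    def execute(self, arg: Any) -> Any:
        return str(arg).strip()


operation = StripText()
print(operation.execute("  price: 10  "))  # prints: price: 10
```

Because execute is abstract, instantiating Executable directly (or a subclass that does not override execute) raises TypeError.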

class Comparable

Bases: ABC

Interface for objects that can be compared for equality. Derived classes must implement the __eq__ method.
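A sketch of a Comparable subclass, again using a local stand-in for the interface. TagNameSelector is hypothetical; the point is that equality is defined by the object's configuration rather than by its identity.

```python
from abc import ABC, abstractmethod


class Comparable(ABC):
    """Local stand-in mirroring the documented interface."""

    @abstractmethod
    def __eq__(self, other: object) -> bool: ...


class TagNameSelector(Comparable):
    """Hypothetical selector that compares equal when the tag names match."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __eq__(self, other: object) -> bool:
        # Returning NotImplemented lets Python try the reflected comparison.
        if not isinstance(other, TagNameSelector):
            return NotImplemented
        return self.name == other.name


print(TagNameSelector("div") == TagNameSelector("div"))   # True
print(TagNameSelector("div") == TagNameSelector("span"))  # False
```

Defining __eq__ without __hash__ makes instances unhashable in Python (the class's __hash__ is set to None); an implementation that must be usable in sets or as dict keys would also define __hash__.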

class JSONSerializable

Bases: ABC

Abstract class for objects that can be serialized to JSON. Derived classes must implement the json method, which returns any JSON-serializable object.

abstract json() → Any

Serializes the object to a JSON-compatible format.
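A sketch of the json contract, with a local stand-in for JSONSerializable and a hypothetical Product record. json() returns plain Python data, which the standard library's json.dumps can then encode.

```python
import json
from abc import ABC, abstractmethod
from typing import Any


class JSONSerializable(ABC):
    """Local stand-in mirroring the documented interface."""

    @abstractmethod
    def json(self) -> Any:
        """Serializes the object to a JSON-compatible format."""


class Product(JSONSerializable):
    """Hypothetical scraped record."""

    def __init__(self, name: str, price: float) -> None:
        self.name = name
        self.price = price

    def json(self) -> Any:
        # Only dicts, lists, strings, numbers, booleans and None, so json.dumps can encode it.
        return {"name": self.name, "price": self.price}


print(json.dumps(Product("Pen", 1.5).json()))  # {"name": "Pen", "price": 1.5}
```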

class TagSearcher

Bases: ABC

Interface for objects that can search within an IElement. Derived classes must implement the find and find_all methods, which process an IElement object and return results.

abstract find(tag: IElement, strict: bool = False, recursive: bool = True) → Any

Processes an IElement object and returns the result.

Parameters

tag : IElement

Any IElement object to process.

strict : bool, optional

If True, requires a result to be found in the element. By default False.

recursive : bool, optional

Specifies whether the search should be recursive. If False, only direct children of the element are searched. By default True.

Returns

Any

Processed result from the element.

abstract find_all(tag: IElement, recursive: bool = True, limit: int | None = None) → list[Any]

Processes an IElement object and returns a list of results.

Parameters

tag : IElement

Any IElement object to process.

recursive : bool, optional

Specifies whether the search should be recursive. If False, only direct children of the element are searched. By default True.

limit : int, optional

Maximum number of results to return. By default None, meaning all results are returned.

Returns

list[Any]

A list of results from the processed element.
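A minimal sketch of a TagSearcher, assuming for brevity that the searched object is a standard-library xml.etree.ElementTree element rather than a soupsavvy IElement. NameSearcher is hypothetical, and raising LookupError in strict mode is an assumption made for illustration; soupsavvy's own exception type may differ.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod
from typing import Any


class TagSearcher(ABC):
    """Local stand-in mirroring the documented interface."""

    @abstractmethod
    def find(self, tag: Any, strict: bool = False, recursive: bool = True) -> Any: ...

    @abstractmethod
    def find_all(self, tag: Any, recursive: bool = True, limit: int | None = None) -> list[Any]: ...


class NameSearcher(TagSearcher):
    """Hypothetical searcher matching ElementTree elements by tag name."""

    def __init__(self, name: str) -> None:
        self.name = name

    def find_all(self, tag: ET.Element, recursive: bool = True, limit: int | None = None) -> list[ET.Element]:
        # iter() yields `tag` itself first, so skip it; list(tag) gives direct children only.
        candidates = list(tag.iter())[1:] if recursive else list(tag)
        matches = [el for el in candidates if el.tag == self.name]
        return matches if limit is None else matches[:limit]

    def find(self, tag: ET.Element, strict: bool = False, recursive: bool = True) -> ET.Element | None:
        results = self.find_all(tag, recursive=recursive, limit=1)
        if results:
            return results[0]
        if strict:
            # Assumption: strict mode signals a missing match with an exception.
            raise LookupError(f"No <{self.name}> element found")
        return None


root = ET.fromstring("<div><p>a</p><section><p>b</p></section></div>")
searcher = NameSearcher("p")
print([p.text for p in searcher.find_all(root)])                   # ['a', 'b']
print([p.text for p in searcher.find_all(root, recursive=False)])  # ['a']
```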

class TagSearcherMeta(name, bases, namespace, /, **kwargs)

Bases: ABCMeta

Defines the same interface as TagSearcher, but at the metaclass level; it is used by BaseModel, mostly for typing purposes. Subclasses should implement find and find_all at the metaclass level or as classmethods. This is not enforced with abstractmethod here; the BaseModel class takes care of it.

find(tag: IElement, strict: bool = False, recursive: bool = True) → Any

Processes an IElement object and returns the result.

Parameters

tag : IElement

Any IElement object to process.

strict : bool, optional

If True, requires a result to be found in the element. By default False.

recursive : bool, optional

Specifies whether the search should be recursive. If False, only direct children of the element are searched. By default True.

Returns

Any

Processed result from the element.

find_all(tag: IElement, recursive: bool = True, limit: int | None = None) → list[Any]

Processes an IElement object and returns a list of results.

Parameters

tag : IElement

Any IElement object to process.

recursive : bool, optional

Specifies whether the search should be recursive. If False, only direct children of the element are searched. By default True.

limit : int, optional

Maximum number of results to return. By default None, meaning all results are returned.

Returns

list[Any]

A list of results from the processed element.
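The following sketch illustrates the idea behind TagSearcherMeta with local stand-ins: because find and find_all are declared on the metaclass, a class object itself can be typed as something that searches, and a subclass supplies the behavior through classmethods. Model is hypothetical and does not reproduce how BaseModel actually implements these methods.

```python
from __future__ import annotations

from abc import ABCMeta
from typing import Any


class TagSearcherMeta(ABCMeta):
    """Local stand-in: find/find_all declared on the metaclass, deliberately not abstract."""

    def find(cls, tag: Any, strict: bool = False, recursive: bool = True) -> Any:
        raise NotImplementedError

    def find_all(cls, tag: Any, recursive: bool = True, limit: int | None = None) -> list[Any]:
        raise NotImplementedError


class Model(metaclass=TagSearcherMeta):
    """Hypothetical class that supplies the search behavior as classmethods."""

    @classmethod
    def find(cls, tag: Any, strict: bool = False, recursive: bool = True) -> Any:
        return f"{cls.__name__} scraped from {tag!r}"

    @classmethod
    def find_all(cls, tag: Any, recursive: bool = True, limit: int | None = None) -> list[Any]:
        return [cls.find(tag)]


# The class object itself is used as the searcher; no instance is created.
print(Model.find("<html>"))  # Model scraped from '<html>'
```

A class that uses this metaclass without defining the classmethods falls back to the metaclass methods, which in this sketch raise NotImplementedError.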

class IElement(node: N, *args, **kwargs)

Bases: ABC, Generic[N]

Interface representing a general HTML node within a tree structure. IElement defines methods for common DOM operations, such as searching for elements, retrieving attributes, and navigating between nodes.

This interface enables consistent access to various HTML-parsing libraries or custom tree structures. Any implementation should wrap a node-like structure and allow soupsavvy components to operate on it seamlessly.

Current Implementations:

  • SoupElement: Wraps a BeautifulSoup node.

  • LXMLElement: Wraps an lxml node.

  • SeleniumElement: Wraps a Selenium WebElement.

  • PlaywrightElement: Wraps a Playwright ElementHandle.

__init__(node: N, *args, **kwargs) → None

Initializes the implementation with the given node.

Parameters

node : Any

Node to wrap for specific implementation.

*args : Any

Additional positional arguments to pass to the constructor.

**kwargs : Any

Additional keyword arguments to pass to the constructor.

classmethod from_node(node: N) → Self

Creates a new instance of the implementation from a node.

Parameters

node : Any

Node to wrap for specific implementation.

Returns

Self

New instance of the implementation with the given node.
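A reduced sketch of the construction helpers, with the abstract search and navigation methods omitted. ElementBase and EtreeElement are hypothetical names, and the assumption that from_node simply calls cls(node) is illustrative. Because from_node is a classmethod, calling it on a subclass yields an instance of that subclass.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from typing import Any, Generic, TypeVar

N = TypeVar("N")


class ElementBase(Generic[N]):
    """Hypothetical reduced stand-in for IElement: construction helpers only."""

    def __init__(self, node: N, *args: Any, **kwargs: Any) -> None:
        self._node = node

    @classmethod
    def from_node(cls, node: N) -> ElementBase[N]:
        # Assumption: from_node builds an instance of whichever subclass it is called on.
        return cls(node)

    @property
    def node(self) -> N:
        return self._node

    def get(self) -> N:
        return self._node


class EtreeElement(ElementBase[ET.Element]):
    """Hypothetical implementation wrapping xml.etree.ElementTree elements."""


root = ET.fromstring("<html><body/></html>")
wrapped = EtreeElement.from_node(root)
print(type(wrapped).__name__, wrapped.get().tag)  # EtreeElement html
```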

abstract find_all(name: str | None = None, attrs: dict[str, str | Pattern[str]] | None = None, recursive: bool = True, limit: int | None = None) → list[Self]

Finds all elements that match the specified tag name and attributes.

Parameters

name : str, optional

Name of the element to search for. If None, matches all elements.

attrs : dict[str, str | Pattern[str]], optional

Dictionary of attributes to match. Supports exact matches and regex patterns.

recursive : bool, optional

If True (the default), searches recursively through all descendants. If False, searches only direct children.

limit : int, optional

Maximum number of elements to return. If None, returns all matching elements.

Returns

list[Self]

List of elements that match the criteria, in depth-first order.
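One way the documented find_all parameters could be honored, sketched as a standalone function over the standard library's xml.etree.ElementTree. It is not one of soupsavvy's implementations: it returns raw ElementTree elements instead of wrapped instances, and applying compiled patterns with re.search is an assumption.

```python
from __future__ import annotations

import re
import xml.etree.ElementTree as ET


def _attr_matches(value: str | None, expected: str | re.Pattern[str]) -> bool:
    """Exact string comparison, or re.search for compiled patterns (assumed semantics)."""
    if value is None:
        return False
    if isinstance(expected, re.Pattern):
        return expected.search(value) is not None
    return value == expected


def find_all(
    node: ET.Element,
    name: str | None = None,
    attrs: dict[str, str | re.Pattern[str]] | None = None,
    recursive: bool = True,
    limit: int | None = None,
) -> list[ET.Element]:
    """Hypothetical find_all over ElementTree, following the documented parameters."""
    attrs = attrs or {}
    # iter() is depth-first in document order and yields `node` itself first, so skip it.
    candidates = list(node.iter())[1:] if recursive else list(node)
    results: list[ET.Element] = []
    for el in candidates:
        if name is not None and el.tag != name:
            continue
        if not all(_attr_matches(el.get(key), expected) for key, expected in attrs.items()):
            continue
        results.append(el)
        if limit is not None and len(results) >= limit:
            break
    return results


doc = ET.fromstring('<div><a class="btn primary" href="/x"/><p><a class="btn" href="/y"/></p></div>')
print([a.get("href") for a in find_all(doc, "a", {"class": re.compile(r"\bbtn\b")})])  # ['/x', '/y']
print([a.get("href") for a in find_all(doc, "a", {"class": "btn"})])                  # ['/y']
print([a.get("href") for a in find_all(doc, "a", recursive=False)])                   # ['/x']
```

This sketch compares the raw attribute string, so {"class": "btn"} does not match class="btn primary"; real implementations may treat multi-valued attributes such as class differently.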

property node: N

Returns the underlying node wrapped by the instance.

get() → N

Returns the node wrapped by the instance.

abstract find_subsequent_siblings(limit: int | None = None) → list[Self]

Finds siblings that follow this node in the document structure.

Parameters

limit : int, optional

Maximum number of sibling nodes to return. If None, returns all subsequent siblings.

Returns

list[Self]

List of subsequent sibling elements, in document order.
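ElementTree elements do not store a reference to their parent, so a sketch of find_subsequent_siblings over them needs the document root to locate the parent first. The helper below is hypothetical; an IElement implementation wrapping a node with parent access would not need the extra root argument.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def find_subsequent_siblings(
    root: ET.Element, node: ET.Element, limit: int | None = None
) -> list[ET.Element]:
    """Hypothetical helper: element siblings after `node`, in document order."""
    # No parent pointers in ElementTree, so scan from the root for the element containing `node`.
    parent = next((p for p in root.iter() if any(child is node for child in p)), None)
    if parent is None:
        return []  # `node` is the root (or not under it), so it has no siblings
    siblings = list(parent)
    following = siblings[siblings.index(node) + 1 :]
    return following if limit is None else following[:limit]


root = ET.fromstring("<ul><li>1</li><li>2</li><li>3</li><li>4</li></ul>")
second = root[1]
print([li.text for li in find_subsequent_siblings(root, second)])           # ['3', '4']
print([li.text for li in find_subsequent_siblings(root, second, limit=1)])  # ['3']
```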

abstract find_ancestors(limit: int | None = None) → list[Self]

Finds all ancestor nodes up to the root of the document.

Parameters

limit : int, optional

Maximum number of ancestors to return, starting from the closest ancestor. If None, returns all ancestors.

Returns

list[Self]

List of ancestor nodes, from nearest to root.
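A similar hypothetical helper for find_ancestors over ElementTree. It builds a child-to-parent map once (ElementTree has no parent pointers) and then walks upward, nearest ancestor first, stopping at limit.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def find_ancestors(root: ET.Element, node: ET.Element, limit: int | None = None) -> list[ET.Element]:
    """Hypothetical helper: ancestors of `node`, nearest first, up to `root`."""
    # Map every element to its parent in a single pass over the tree.
    parents = {child: parent for parent in root.iter() for child in parent}
    ancestors: list[ET.Element] = []
    current = parents.get(node)
    while current is not None and (limit is None or len(ancestors) < limit):
        ancestors.append(current)
        current = parents.get(current)
    return ancestors


root = ET.fromstring("<html><body><div><span>x</span></div></body></html>")
span = root.find(".//span")
print([el.tag for el in find_ancestors(root, span)])           # ['div', 'body', 'html']
print([el.tag for el in find_ancestors(root, span, limit=2)])  # ['div', 'body']
```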

abstract property children: Iterable[Self]

Returns an iterable of the direct child elements of this node.

Notes

Only tag elements are included; text and comment nodes are excluded.

Returns

Iterable[Self]

Iterable of direct child nodes, in document order.

abstract property descendants: Iterable[Self]

Returns an iterable of all descendant nodes of this node.

Notes

Only tag elements are included; text and comment nodes are excluded. Nodes are returned in depth-first order.

Returns

Iterable[Self]

Iterable of all descendant nodes.
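A short illustration, using the standard library's xml.etree.ElementTree rather than any soupsavvy implementation, of how the "tags only, depth-first" contract of children and descendants can be satisfied.

```python
import xml.etree.ElementTree as ET

source = "<div>text<!-- note --><p>a<b>bold</b></p><span/></div>"
root = ET.fromstring(source)

# Direct child elements only: text lives on .text/.tail, and the default
# parser discards comments, so neither shows up here.
children = list(root)

# iter() walks depth-first in document order and yields root itself first, so skip it.
descendants = list(root.iter())[1:]

print([el.tag for el in children])     # ['p', 'span']
print([el.tag for el in descendants])  # ['p', 'b', 'span']
```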

abstract property parent: Self | None

Returns the immediate parent node of this element, if it exists.

Returns

Optional[Self]

The parent element, or None if this is the root node.

abstract get_attribute(name: str) → str | None

Retrieves the value of a specified attribute for this node.

Parameters

name : str

Name of the attribute.

Returns

Optional[str]

The attribute value as a string, or None if the attribute does not exist.

Notes

For dynamic attributes (e.g., in browser contexts), the returned value reflects the current state of the element.

abstract property name: str

Returns the tag name of this element.

abstract property text: str

Gets the combined text content of this element.

Notes

Concatenates all text nodes within this element. The format may vary slightly across implementations depending on handling of whitespace or nested elements.

Returns

str

Text content of this element, or an empty string if none is found.
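With the standard library's xml.etree.ElementTree (used here only for illustration), the combined text can be assembled by joining itertext(), which yields the element's own text and that of all nested elements, including tail text, in document order.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<p>Hello <b>big</b> world<!-- hidden --></p>")

# "Hello " (p.text) + "big" (b.text) + " world" (b.tail); the comment is dropped by the parser.
text = "".join(root.itertext())
print(repr(text))  # 'Hello big world'
```

Other backends may produce different whitespace for the same markup, for example a browser returning rendered text, which is why the format can vary across implementations.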

css(selector: Any) → SelectionApi

Returns a SelectionApi for CSS-based selection.

Parameters

selector : Any

The CSS selector to apply.

Returns

SelectionApi

Initialized SelectionApi instance for CSS selection.

xpath(selector) → SelectionApi

Returns a SelectionApi for XPath-based selection.

Parameters

selector : Any

The XPath selector to apply.

Returns

SelectionApi

Initialized SelectionApi instance for XPath selection.

class SelectionApi(selector: Any)

Bases: ABC

Interface for selecting elements based on a specific selector.

SelectionApi is designed to handle complex CSS or XPath selections, simplifying element matching across various document structures. Implementing classes must define the select method for element matching.

__init__(selector: Any) → None

Initializes SelectionApi with the given selector.

Parameters

selector : Any

The selector used for locating elements.

abstract select(element: IElement) → list[IElement]

Selects elements within the given element that match the selector.

Parameters

element : IElement

The element to search within.

Returns

list[IElement]

A list of elements matching the selector within the provided element.
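A minimal sketch of a SelectionApi implementation, assuming ElementTree's limited XPath support as the backend. The stand-in base class mirrors the documented constructor and select method; EtreeXPathApi is hypothetical, and it operates on raw ElementTree elements rather than IElement wrappers.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod
from typing import Any


class SelectionApi(ABC):
    """Local stand-in mirroring the documented interface."""

    def __init__(self, selector: Any) -> None:
        self.selector = selector

    @abstractmethod
    def select(self, element: Any) -> list[Any]: ...


class EtreeXPathApi(SelectionApi):
    """Hypothetical XPath API backed by ElementTree's limited XPath subset."""

    def select(self, element: ET.Element) -> list[ET.Element]:
        # findall() evaluates the path relative to `element`, returning matches in document order.
        return element.findall(self.selector)


root = ET.fromstring('<ul><li class="a">1</li><li class="b">2</li><li class="a">3</li></ul>')
api = EtreeXPathApi(".//li[@class='a']")
print([li.text for li in api.select(root)])  # ['1', '3']
```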

class IBrowser(browser: B, *args, **kwargs)

Bases: ABC, Generic[B, E]

Interface representing a general browser for web navigation and interaction. IBrowser defines methods for common browser operations.

This interface enables consistent access to various browser implementations, such as Selenium or Playwright, for automating web interactions.

Any implementation should wrap a browser object and allow soupsavvy components to operate on it seamlessly.

Current Implementations:

  • SeleniumBrowser: Wraps a Selenium WebDriver instance.

  • PlaywrightBrowser: Wraps a Playwright Page instance.

__init__(browser: B, *args, **kwargs) → None

Initializes the implementation with the given browser instance.

Parameters

browser : Any

Browser instance to wrap for specific implementation.

*args : Any

Additional positional arguments to pass to the constructor.

**kwargs : Any

Additional keyword arguments to pass to the constructor.

property browser: B

Returns the underlying browser wrapped by the instance.

get() → B

Returns the browser wrapped by the instance.

abstract navigate(url: str) → None

Navigates the browser to the specified URL.

Parameters

url : str

The URL to navigate to.

abstract click(element: E) → None

Performs a click action on the specified element.

Parameters

element : IElement

The element to click, as an IElement implementation compatible with the browser.

abstract send_keys(element: E, value: str, clear: bool = True) → None

Sends keystrokes to the specified element.

Parameters

element : IElement

The element to send keys to, as an IElement implementation compatible with the browser.

value : str

The value to insert into the element.

clear : bool, optional

If True, clears existing content before sending keys. Defaults to True.
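To show the documented click and send_keys semantics without a real browser, the sketch below uses a hypothetical in-memory FakeBrowser and FakeInput; neither is part of soupsavvy, Selenium or Playwright, and the append behavior for clear=False is a simplification.

```python
from dataclasses import dataclass


@dataclass
class FakeInput:
    """Hypothetical stand-in for an element compatible with the browser."""

    value: str = ""
    clicks: int = 0


class FakeBrowser:
    """Hypothetical in-memory browser modeling the documented click/send_keys behavior."""

    def click(self, element: FakeInput) -> None:
        element.clicks += 1

    def send_keys(self, element: FakeInput, value: str, clear: bool = True) -> None:
        # clear=True (the default) removes existing content first; with clear=False
        # this sketch simply appends (a real browser types at the caret position).
        if clear:
            element.value = ""
        element.value += value


browser = FakeBrowser()
box = FakeInput(value="old")
browser.click(box)
browser.send_keys(box, "new")
print(box.value)  # new
browser.send_keys(box, "er", clear=False)
print(box.value)  # newer
```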

abstract get_document() → E

Returns the HTML document of the current page as an IElement.

Returns

IElement

The HTML document of the current page, as the soupsavvy IElement implementation compatible with the browser.

Raises

TagNotFoundException

If the <html> element is not found on the page.

abstract close() → None

Closes the browser and releases resources.

abstract get_current_url() → str

Returns the current URL of the browser.
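A sketch of the browser lifecycle (construction, navigate, get_current_url, get, close) using a reduced local stand-in for IBrowser with the element-related methods omitted. FakeDriver and InMemoryBrowser are hypothetical stand-ins for a Selenium WebDriver or Playwright Page wrapper.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Any, Generic, TypeVar

B = TypeVar("B")


class BrowserBase(ABC, Generic[B]):
    """Reduced local stand-in for IBrowser (click, send_keys, get_document omitted)."""

    def __init__(self, browser: B, *args: Any, **kwargs: Any) -> None:
        self._browser = browser

    @property
    def browser(self) -> B:
        return self._browser

    def get(self) -> B:
        return self._browser

    @abstractmethod
    def navigate(self, url: str) -> None: ...

    @abstractmethod
    def get_current_url(self) -> str: ...

    @abstractmethod
    def close(self) -> None: ...


class FakeDriver:
    """Hypothetical object standing in for a real browser driver."""

    def __init__(self) -> None:
        self.url = "about:blank"
        self.is_open = True


class InMemoryBrowser(BrowserBase[FakeDriver]):
    def navigate(self, url: str) -> None:
        self.browser.url = url

    def get_current_url(self) -> str:
        return self.browser.url

    def close(self) -> None:
        # Release whatever the wrapped driver holds; here just a flag.
        self.browser.is_open = False


browser = InMemoryBrowser(FakeDriver())
try:
    browser.navigate("https://example.com")
    print(browser.get_current_url())  # https://example.com
finally:
    browser.close()
print(browser.get().is_open)  # False
```

Because close releases resources, calling code would typically invoke it in a finally block, as above, so that it runs even if scraping fails.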