interfaces
Module with soupsavvy interfaces used across the package.
Executable - Interface for operations that can be executed on a single argument.
Comparable - Interface for objects that can be compared for equality.
TagSearcher - Interface for objects that can search within an IElement.
IElement - Interface for any tree structure compatible with soupsavvy.
SelectionApi - Interface for selecting elements based on a specific selector.
IBrowser - Interface for browser implementations compatible with soupsavvy.
- class Executable
Bases: ABC
Interface for operations that can be executed on a single argument. Derived classes must implement the execute method.
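The contract above can be illustrated with a minimal sketch. The `Executable` class below is a stand-in written for this example (an ABC with one abstract `execute` method, which is an assumption about the shape of the real class), and `Upper` is a hypothetical subclass; neither is imported from soupsavvy.

```python
from abc import ABC, abstractmethod
from typing import Any


class Executable(ABC):
    """Stand-in for the documented interface (not soupsavvy's own class)."""

    @abstractmethod
    def execute(self, arg: Any) -> Any:
        """Run the operation on a single argument."""


class Upper(Executable):
    """Hypothetical operation that upper-cases its argument."""

    def execute(self, arg: Any) -> Any:
        return str(arg).upper()


result = Upper().execute("soup")
```

Because `execute` is abstract, instantiating the stand-in `Executable` directly raises `TypeError`; only subclasses that implement it can be created.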
- class Comparable
Bases: ABC
Interface for objects that can be compared for equality. Derived classes must implement the __eq__ method.
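A short sketch of what implementing `__eq__` could look like. `Comparable` here is a local stand-in and `TagName` is a hypothetical subclass, both defined only for illustration.

```python
from abc import ABC, abstractmethod


class Comparable(ABC):
    """Stand-in for the documented interface (not soupsavvy's own class)."""

    @abstractmethod
    def __eq__(self, other: object) -> bool:
        """Return True if both objects are considered equal."""


class TagName(Comparable):
    """Hypothetical comparable object identified by a tag name."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __eq__(self, other: object) -> bool:
        # Defer to the other operand when the types do not match.
        if not isinstance(other, TagName):
            return NotImplemented
        return self.name == other.name


same = TagName("div") == TagName("div")
different = TagName("div") == TagName("span")
```

Returning `NotImplemented` for foreign types lets Python fall back to its default comparison, so comparing a `TagName` with a plain string evaluates to `False` rather than raising.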
- class JSONSerializable
Bases: ABC
Abstract class for objects that can be serialized to JSON. Subclasses must provide a json method that returns a JSON-serializable object.
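As a hedged illustration, the sketch below pairs a stand-in `JSONSerializable` with a hypothetical `Price` class whose `json` method returns a plain dict. The exact signature of `json` in soupsavvy is not shown above, so the no-argument form is an assumption.

```python
import json
from abc import ABC, abstractmethod
from typing import Any


class JSONSerializable(ABC):
    """Stand-in for the documented abstract class (not soupsavvy's own)."""

    @abstractmethod
    def json(self) -> Any:
        """Return an object that json.dumps can serialize."""


class Price(JSONSerializable):
    """Hypothetical value object."""

    def __init__(self, amount: float, currency: str) -> None:
        self.amount = amount
        self.currency = currency

    def json(self) -> Any:
        # Only built-in JSON types are returned: dict, float, str.
        return {"amount": self.amount, "currency": self.currency}


encoded = json.dumps(Price(9.5, "EUR").json(), sort_keys=True)
```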
- class TagSearcher
Bases: ABC
Interface for objects that can search within an IElement. Derived classes must implement the find and find_all methods, which process an IElement object and return results.
- abstract find(tag: IElement, strict: bool = False, recursive: bool = True) -> Any
Processes an IElement object and returns the result.
Parameters
- tag : IElement
Any IElement object to process.
- strict : bool, optional
If True, requires a result to be found in the element. By default False.
- recursive : bool, optional
Specifies whether the search is recursive. If False, only direct children of the element are searched. By default True.
Returns
- Any
Processed result from the element.
- abstract find_all(tag: IElement, recursive: bool = True, limit: int | None = None) -> list[Any]
Processes an IElement object and returns a list of results.
Parameters
- tag : IElement
Any IElement object to process.
- recursive : bool, optional
Specifies whether the search is recursive. If False, only direct children of the element are searched. By default True.
- limit : int, optional
Maximum number of results to return. By default None, which returns all results.
Returns
- list[Any]
A list of results from processed element.
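The roles of `strict`, `recursive` and `limit` can be sketched on a toy tree. Everything below is a stand-in: `Node` replaces IElement, `ByName` is a hypothetical searcher, and raising `LookupError` when `strict=True` finds nothing is an assumption about what "requires a result" means, not soupsavvy's documented behaviour.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Iterator, List, Optional


@dataclass
class Node:
    """Toy stand-in for an IElement: a name plus child nodes."""

    name: str
    children: List["Node"] = field(default_factory=list)

    def descendants(self) -> Iterator["Node"]:
        # Depth-first, document order.
        for child in self.children:
            yield child
            yield from child.descendants()


class TagSearcher(ABC):
    """Stand-in for the documented interface."""

    @abstractmethod
    def find(self, tag: Node, strict: bool = False, recursive: bool = True) -> Any: ...

    @abstractmethod
    def find_all(
        self, tag: Node, recursive: bool = True, limit: Optional[int] = None
    ) -> List[Any]: ...


class ByName(TagSearcher):
    """Hypothetical searcher matching nodes by name."""

    def __init__(self, name: str) -> None:
        self.name = name

    def find_all(self, tag, recursive=True, limit=None):
        pool = tag.descendants() if recursive else iter(tag.children)
        matches = [node for node in pool if node.name == self.name]
        return matches[:limit]  # a limit of None keeps every match

    def find(self, tag, strict=False, recursive=True):
        matches = self.find_all(tag, recursive=recursive, limit=1)
        if matches:
            return matches[0]
        if strict:
            # Assumed behaviour: strict mode fails loudly on no match.
            raise LookupError(f"no <{self.name}> found")
        return None


root = Node("div", [Node("p"), Node("section", [Node("p"), Node("a")])])
all_p = ByName("p").find_all(root)
direct_p = ByName("p").find_all(root, recursive=False)
```

Here `all_p` holds both `p` nodes while `direct_p` holds only the one that is a direct child of `root`.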
- class TagSearcherMeta(name, bases, namespace, /, **kwargs)
Bases: ABCMeta
Defines the same interface as TagSearcher, but at the metaclass level. It is used by BaseModel, mostly for typing purposes. Subclasses should implement find and find_all at the metaclass level or as classmethods; this is not enforced by abstractmethod here and is handled by the BaseModel class.
- find(tag: IElement, strict: bool = False, recursive: bool = True) -> Any
Processes an IElement object and returns the result.
Parameters
- tag : IElement
Any IElement object to process.
- strict : bool, optional
If True, requires a result to be found in the element. By default False.
- recursive : bool, optional
Specifies whether the search is recursive. If False, only direct children of the element are searched. By default True.
Returns
- Any
Processed result from the element.
- find_all(tag: IElement, recursive: bool = True, limit: int | None = None) -> list[Any]
Processes an IElement object and returns a list of results.
Parameters
- tag : IElement
Any IElement object to process.
- recursive : bool, optional
Specifies whether the search is recursive. If False, only direct children of the element are searched. By default True.
- limit : int, optional
Maximum number of results to return. By default None, which returns all results.
Returns
- list[Any]
A list of results from processed element.
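To make "at the metaclass level or as classmethods" concrete, here is a rough sketch in plain Python. The `TagSearcherMeta` below is a stand-in, `Title` is a hypothetical model, and a dict stands in for an element; how BaseModel actually wires this up is not shown in this reference and is not reproduced here.

```python
from abc import ABCMeta
from typing import Any, List, Optional


class TagSearcherMeta(ABCMeta):
    """Stand-in metaclass: declares find/find_all on the class object itself."""

    def find(cls, tag: Any, strict: bool = False, recursive: bool = True) -> Any:
        raise NotImplementedError

    def find_all(
        cls, tag: Any, recursive: bool = True, limit: Optional[int] = None
    ) -> List[Any]:
        raise NotImplementedError


class Title(metaclass=TagSearcherMeta):
    """Hypothetical model; a dict stands in for an element."""

    def __init__(self, text: str) -> None:
        self.text = text

    @classmethod
    def find(cls, tag, strict=False, recursive=True):
        value = tag.get("title")
        if value is None:
            if strict:
                raise LookupError("title not found")  # assumed strict behaviour
            return None
        return cls(value)

    @classmethod
    def find_all(cls, tag, recursive=True, limit=None):
        found = cls.find(tag)
        return [] if found is None else [found][:limit]


# The class itself is used as the searcher; no instance is needed.
title = Title.find({"title": "Soup"})
```

Attribute lookup on a class prefers names defined in the class body over plain functions on its metaclass, so `Title.find` resolves to the classmethod rather than to the metaclass placeholder.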
- class IElement(node: N, *args, **kwargs)
Bases: ABC, Generic[N]
Interface representing a general HTML node within a tree structure. IElement defines methods for common DOM operations, such as searching for elements, retrieving attributes, and navigating between nodes.
This interface enables consistent access to various HTML-parsing libraries or custom tree structures. Any implementation should wrap a node-like structure and allow soupsavvy components to operate on it seamlessly.
Current Implementations:
- SoupElement: Wraps a BeautifulSoup node.
- LXMLElement: Wraps an lxml node.
- SeleniumElement: Wraps a Selenium WebElement.
- PlaywrightElement: Wraps a Playwright ElementHandle.
- __init__(node: N, *args, **kwargs) -> None
Initializes the implementation with the given node.
Parameters
- classmethod from_node(node: N) -> Self
Creates a new instance of the implementation from a node.
Parameters
- node : Any
Node to wrap for specific implementation.
Returns
- Self
New instance of the implementation with the given node.
- abstract find_all(name: str | None = None, attrs: dict[str, str | Pattern[str]] | None = None, recursive: bool = True, limit: int | None = None) -> list[Self]
Finds all elements that match the specified element name and attributes.
Parameters
- name : str, optional
Name of the element to search for. If None, matches all elements.
- attrs : dict[str, str | Pattern[str]], optional
Dictionary of attributes to match. Supports exact matches and regex patterns.
- recursive : bool, optional
If True, searches recursively through all descendants. If False, searches only direct children.
- limit : int, optional
Maximum number of elements to return. If None, returns all matching elements.
Returns
- list[Self]
List of elements that match the criteria, in depth-first order.
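The name and attribute criteria can be sketched as a standalone predicate. The `matches` helper below is hypothetical and not part of soupsavvy. Using `Pattern.search` for regex values is an assumption, since the reference only says that regex patterns are supported; an implementation could equally use `match` or `fullmatch`.

```python
import re
from typing import Dict, Optional, Union

AttrValue = Union[str, "re.Pattern[str]"]


def matches(
    node_name: str,
    node_attrs: Dict[str, str],
    name: Optional[str] = None,
    attrs: Optional[Dict[str, AttrValue]] = None,
) -> bool:
    """Hypothetical check of one node against find_all-style criteria."""
    if name is not None and node_name != name:
        return False
    for key, expected in (attrs or {}).items():
        actual = node_attrs.get(key)
        if actual is None:
            return False  # attribute missing on the node
        if isinstance(expected, re.Pattern):
            if expected.search(actual) is None:  # assumed regex semantics
                return False
        elif actual != expected:  # plain strings must match exactly
            return False
    return True


link_ok = matches("a", {"href": "/docs/api"}, name="a",
                  attrs={"href": re.compile(r"^/docs")})
wrong_name = matches("a", {"href": "/docs/api"}, name="div")
```

With `name=None` and no `attrs`, every node passes, which corresponds to the "matches all elements" case described above.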
- property node: N
Returns the underlying node wrapped by the instance.
- abstract find_subsequent_siblings(limit: int | None = None) -> list[Self]
Finds siblings that follow this node in the document structure.
Parameters
- limit : int, optional
Maximum number of sibling nodes to return. If None, returns all siblings.
Returns
- list[Self]
List of subsequent sibling elements, in document order.
- abstract find_ancestors(limit: int | None = None) -> list[Self]
Finds all ancestor nodes up to the root of the document.
Parameters
- limit : int, optional
Maximum number of ancestors to return, starting from the closest ancestor. If None, returns all ancestors.
Returns
- list[Self]
List of ancestor nodes, from nearest to root.
- abstract property children: Iterable[Self]
Returns an iterable of the direct child elements of this node.
Notes
Only tag elements are included; text and comment nodes are excluded.
Returns
- Iterable[Self]
Iterable of direct child nodes, in document order.
- abstract property descendants: Iterable[Self]
Returns an iterable of all descendant nodes of this node.
Notes
Only tag elements are included; text and comment nodes are excluded. Nodes are returned in depth-first order.
Returns
- Iterable[Self]
Iterable of all descendant nodes.
- abstract property parent: Self | None
Returns the immediate parent node of this element, if it exists.
Returns
- Optional[Self]
The parent element, or None if this is the root node.
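The ordering rules stated for the navigation members above (depth-first descendants, ancestors from nearest to root, siblings in document order) can be checked on a small in-memory tree. `ToyNode` is a hypothetical class written for this sketch and does not implement the full IElement interface.

```python
from typing import Iterator, List, Optional


class ToyNode:
    """Hypothetical tree node used only to illustrate ordering."""

    def __init__(self, name: str, parent: Optional["ToyNode"] = None) -> None:
        self.name = name
        self.parent = parent
        self.children: List["ToyNode"] = []
        if parent is not None:
            parent.children.append(self)

    @property
    def descendants(self) -> Iterator["ToyNode"]:
        # Depth-first: each child is followed by its own subtree.
        for child in self.children:
            yield child
            yield from child.descendants

    def find_ancestors(self, limit: Optional[int] = None) -> List["ToyNode"]:
        found: List["ToyNode"] = []
        current = self.parent
        while current is not None and (limit is None or len(found) < limit):
            found.append(current)  # nearest ancestor first
            current = current.parent
        return found

    def find_subsequent_siblings(self, limit: Optional[int] = None) -> List["ToyNode"]:
        if self.parent is None:
            return []
        siblings = self.parent.children
        index = next(i for i, node in enumerate(siblings) if node is self)
        return siblings[index + 1:][:limit]


html = ToyNode("html")
body = ToyNode("body", html)
h1 = ToyNode("h1", body)
p = ToyNode("p", body)
em = ToyNode("em", p)
ul = ToyNode("ul", body)

names = [node.name for node in body.descendants]
ancestors = [node.name for node in em.find_ancestors()]
```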
- abstract get_attribute(name: str) -> str | None
Retrieves the value of a specified attribute for this node.
Parameters
- name : str
Name of the attribute.
Returns
- Optional[str]
The attribute value as a string, or None if the attribute does not exist.
Notes
For dynamic attributes (e.g., in browser contexts), the returned value reflects the current state of the element.
- abstract property name: str
Returns the tag name of this element.
- abstract property text: str
Gets the combined text content of this element.
Notes
Concatenates all text nodes within this element. The format may vary slightly across implementations depending on handling of whitespace or nested elements.
Returns
- str
Text content of this element, or an empty string if none is found.
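As a hedged sketch of the wrapper idea behind `node`, `from_node`, `get_attribute`, `name` and `text`, the example below wraps a standard-library `xml.etree.ElementTree.Element`. `ElementWrapper` and `ETreeElement` are hypothetical names; soupsavvy's own implementations wrap other libraries and may treat whitespace differently.

```python
import xml.etree.ElementTree as ET
from typing import Generic, Optional, TypeVar

N = TypeVar("N")


class ElementWrapper(Generic[N]):
    """Hypothetical base: stores the wrapped node and exposes it read-only."""

    def __init__(self, node: N, *args, **kwargs) -> None:
        self._node = node

    @classmethod
    def from_node(cls, node: N):
        return cls(node)

    @property
    def node(self) -> N:
        return self._node


class ETreeElement(ElementWrapper[ET.Element]):
    """Hypothetical wrapper around a standard-library XML element."""

    @property
    def name(self) -> str:
        return self.node.tag

    def get_attribute(self, name: str) -> Optional[str]:
        return self.node.get(name)  # None when the attribute is absent

    @property
    def text(self) -> str:
        # Concatenate every text node inside the element.
        return "".join(self.node.itertext())


element = ETreeElement.from_node(ET.fromstring("<p id='x'>Hello <b>world</b>!</p>"))
summary = (element.name, element.get_attribute("id"), element.text)
```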
- css(selector: Any) -> SelectionApi
Returns a SelectionApi for CSS-based selection.
Parameters
- selector : Any
The CSS selector to apply.
Returns
- SelectionApi
Initialized SelectionApi instance for CSS selection.
- xpath(selector) -> SelectionApi
Returns a SelectionApi for XPath-based selection.
Parameters
- selector : Any
The XPath selector to apply.
Returns
- SelectionApi
Initialized SelectionApi instance for XPath selection.
- class SelectionApi(selector: Any)
Bases: ABC
Interface for selecting elements based on a specific selector.
SelectionApi is designed to handle complex CSS or XPath selections, simplifying element matching across various document structures. Implementing classes should define select for element matching.
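A minimal sketch of the pattern, assuming `select` takes the element to search and returns a list of matches (the reference above names `select` but does not show its signature). `SelectionApi` is a stand-in and `ETreePathApi` is a hypothetical implementation built on the limited XPath subset of `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod
from typing import Any, List


class SelectionApi(ABC):
    """Stand-in for the documented interface; stores the selector."""

    def __init__(self, selector: Any) -> None:
        self.selector = selector

    @abstractmethod
    def select(self, element: Any) -> List[Any]:
        """Assumed signature: return the elements matching the selector."""


class ETreePathApi(SelectionApi):
    """Hypothetical implementation using ElementTree's XPath subset."""

    def select(self, element: ET.Element) -> List[ET.Element]:
        return element.findall(self.selector)


root = ET.fromstring("<div><p class='lead'>one</p><span><p>two</p></span></div>")
texts = [node.text for node in ETreePathApi(".//p").select(root)]
lead = [node.text for node in ETreePathApi("./p[@class='lead']").select(root)]
```

Keeping the selector on the instance means one API object can be built once and applied to many elements.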
- class IBrowser(browser: B, *args, **kwargs)
Bases: ABC, Generic[B, E]
Interface representing a general browser for web navigation and interaction. IBrowser defines methods for common browser operations.
This interface enables consistent access to various browser implementations like selenium or playwright for automation of web interactions.
Any implementation should wrap a browser object and allow soupsavvy components to operate on it seamlessly.
Current Implementations:
- SeleniumBrowser: Wraps a Selenium WebDriver instance.
- PlaywrightBrowser: Wraps a Playwright Page instance.
- __init__(browser: B, *args, **kwargs) -> None
Initializes the implementation with the given browser instance.
Parameters
- property browser: B
Returns the underlying browser wrapped by the instance.
Navigates the browser to the specified URL.
Parameters
- url : str
The URL to navigate to.
- abstract click(element: E) -> None
Performs a click action on the specified element.
Parameters
- element : IElement
The element to click. It must be of an implementation compatible with the browser.
- abstract send_keys(element: E, value: str, clear: bool = True) -> None
Sends keystrokes to the specified element.
Parameters
- element : IElement
The element to interact with. It must be of an implementation compatible with the browser.
- value : str
The value to insert into the element.
- clear : bool, optional
If True, clears existing content before sending keys. Defaults to True.
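The effect of `clear` and the wrapper shape of the interface can be sketched without a real browser. In the example below `IBrowser` is a reduced stand-in that declares only `click` and `send_keys`, while `FakeInput` and `FakeBrowser` are hypothetical test doubles that record actions in a list.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Generic, List, TypeVar

B = TypeVar("B")
E = TypeVar("E")


class IBrowser(ABC, Generic[B, E]):
    """Reduced stand-in for the documented interface."""

    def __init__(self, browser: B, *args, **kwargs) -> None:
        self._browser = browser

    @property
    def browser(self) -> B:
        return self._browser

    @abstractmethod
    def click(self, element: E) -> None: ...

    @abstractmethod
    def send_keys(self, element: E, value: str, clear: bool = True) -> None: ...


@dataclass
class FakeInput:
    """Hypothetical element: remembers its value and click count."""

    value: str = ""
    clicks: int = 0


class FakeBrowser(IBrowser[List[str], FakeInput]):
    """Hypothetical browser that logs actions instead of driving a page."""

    def click(self, element: FakeInput) -> None:
        element.clicks += 1
        self.browser.append("click")

    def send_keys(self, element: FakeInput, value: str, clear: bool = True) -> None:
        if clear:
            element.value = ""  # drop existing content first
        element.value += value
        self.browser.append("send_keys:" + value)


log: List[str] = []
browser = FakeBrowser(log)
field = FakeInput(value="old")
browser.send_keys(field, "new")                # replaces "old"
browser.send_keys(field, "er", clear=False)    # appends to "new"
browser.click(field)
```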