base

Module for base classes used across the package. Introduces another layer of abstraction for operations and selectors.

check_selector(x: Any, message: str | None = None) -> SoupSelector

Checks whether the provided object is a valid soupsavvy selector, i.e. an instance of SoupSelector, and raises an exception if it is not. For convenience, returns the provided object if it passes the check.

Parameters

x : Any

Any object to be validated as a correct selector.

message : str, optional

Custom message to be displayed if an exception is raised. By default None, which results in the default message.

Raises

NotSoupSelectorException

If provided object is not an instance of SoupSelector.
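The validate-and-return behavior described above can be sketched as follows. This is a minimal illustration, not soupsavvy's implementation: the SoupSelector and NotSoupSelectorException classes below are local stand-ins, and the default message text is an assumption.

```python
from typing import Any, Optional


class SoupSelector:
    """Stand-in for the real base class (hypothetical, for illustration)."""


class NotSoupSelectorException(Exception):
    """Stand-in for the exception named in the documentation."""


def check_selector(x: Any, message: Optional[str] = None) -> SoupSelector:
    # Validate the type and hand the same object back, so the call
    # can be used inline, e.g. in an assignment.
    if not isinstance(x, SoupSelector):
        # The default message wording here is an assumption.
        raise NotSoupSelectorException(
            message or f"Expected SoupSelector, got {type(x).__name__}."
        )
    return x


selector = SoupSelector()
validated = check_selector(selector)  # the very same object comes back

try:
    check_selector("div", message="Selector required.")
except NotSoupSelectorException as error:
    error_text = str(error)  # the custom message replaces the default one
```

Returning the validated object lets a caller check and assign in one expression, which is presumably the convenience the description refers to.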

check_operation(x: Any, message: str | None = None) -> BaseOperation

Checks whether the provided object is a valid soupsavvy operation, i.e. an instance of BaseOperation, and raises an exception if it is not. For convenience, returns the provided object if it passes the check.

Parameters

x : Any

Any object to be validated as a correct operation.

message : str, optional

Custom message to be displayed if an exception is raised. By default None, which results in the default message.

Raises

NotOperationException

If provided object is not an instance of BaseOperation.

check_tag_searcher(x: Any, message: str | None = None) -> TagSearcher

Checks whether the provided object is a valid soupsavvy TagSearcher, i.e. an instance of TagSearcher or another compatible type, such as a Model class, and raises an exception if it is not. For convenience, returns the provided object if it passes the check.

Parameters

x : Any

Any object to be validated as a correct TagSearcher.

message : str, optional

Custom message to be displayed if an exception is raised. By default None, which results in the default message.

Raises

NotTagSearcherException

If provided object is not an instance of TagSearcher or any other compatible type.

class SoupSelector

Bases: TagSearcher, Comparable

Base class for all soupsavvy selectors, which declaratively define a procedure for finding matching nodes in an HTML element.

Selectors can be combined with other selectors to create search procedures. They can be chained with operations to extract and transform the data.

Methods

  • find

    Finds the first element matching the selector in the provided element. If no element is found, returns None by default, or raises an exception if strict mode is enabled. Additionally, the recursive parameter can be set to False to search only direct children.

  • find_all

    Finds all elements matching the selector in the provided element and returns them in a list. Additionally, the limit and recursive parameters can be set.

Notes

  • A specific selector inheriting from this class needs to implement:
    • find_all method that returns a list of matching elements.

    • __eq__ method to compare two selectors for equality.

  • Optionally, the find method can be implemented to return the first matching element, but, by default, it uses find_all under the hood.

find(tag: IElement, strict: Literal[False] = False, recursive: bool = True) -> IElement | None
find(tag: IElement, strict: Literal[True] = False, recursive: bool = True) -> IElement
find(tag: IElement, strict: bool = False, recursive: bool = True) -> IElement | None

Finds the first matching element in provided IElement.

Parameters

tag : IElement

Any IElement object to search within.

strict : bool, optional

If True, raises an exception if no matching element is found in the markup. If False, returns None in that case. This parameter has no effect when an element is found. By default False.

recursive : bool, optional

Specifies if search should be recursive. If set to False, only direct children of the element will be searched. By default True.

Returns

IElement | None

First IElement object matching the selector, or None if there is no match.

Raises

TagNotFoundException

If strict parameter is set to True and no matching element was found.

abstract find_all(tag: IElement, recursive: bool = True, limit: int | None = None) -> list[IElement]

Finds all elements matching selector in provided IElement.

Parameters

tag : IElement

Any IElement object to search within.

recursive : bool, optional

Specifies if search should be recursive. If set to False, only direct children of the element will be searched. By default True.

limit : int, optional

Specifies maximum number of elements to return. By default None, in which case all found elements are returned.

Returns

list[IElement]

List of IElement objects matching selector. If none found, the list is empty.
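The relationship between find and find_all described above can be sketched with a toy element tree. Everything here is a stand-in written for illustration: Node plays the role of IElement, TagNotFound the role of TagNotFoundException, and ByName is a hypothetical selector. It is not soupsavvy's implementation.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Node:
    """Toy element: a tag name plus child nodes (stand-in for IElement)."""
    name: str
    children: List["Node"] = field(default_factory=list)

    def descendants(self, recursive: bool = True) -> Iterator["Node"]:
        # Direct children only when recursive is False, whole subtree otherwise.
        for child in self.children:
            yield child
            if recursive:
                yield from child.descendants()


class TagNotFound(Exception):
    """Stand-in for TagNotFoundException."""


class Selector(ABC):
    @abstractmethod
    def find_all(self, tag: Node, recursive: bool = True,
                 limit: Optional[int] = None) -> List[Node]:
        ...

    def find(self, tag: Node, strict: bool = False,
             recursive: bool = True) -> Optional[Node]:
        # Default behavior: delegate to find_all and ask for one result.
        found = self.find_all(tag, recursive=recursive, limit=1)
        if found:
            return found[0]
        if strict:
            raise TagNotFound("No matching element was found.")
        return None


class ByName(Selector):
    """Hypothetical selector matching elements by tag name."""

    def __init__(self, name: str) -> None:
        self.name = name

    def find_all(self, tag, recursive=True, limit=None):
        matches = [n for n in tag.descendants(recursive) if n.name == self.name]
        return matches[:limit]  # a limit of None keeps every match

    def __eq__(self, other):
        return isinstance(other, ByName) and self.name == other.name


root = Node("body", [Node("div", [Node("span")]), Node("span")])
all_spans = ByName("span").find_all(root)                      # nested and direct
direct_spans = ByName("span").find_all(root, recursive=False)  # direct child only
missing = ByName("p").find(root)                               # None, not strict
```

A subclass only has to supply find_all and __eq__; strict handling and the single-result shortcut live in the base class.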

class SelectableCSS

Bases: ABC

Interface for selectors that can be expressed clearly and unambiguously as a CSS selector, which matches the same elements as the find methods.

Notes

To implement the SelectableCSS interface, a child class must implement the abstract css property, which returns a string representing the CSS selector.

abstract property css: str

Returns string representing element css selector.

property selector: str

Returns string representing element css selector.
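A minimal sketch of this interface, built on the standard abc module. The ClassName selector is hypothetical, and the way selector delegates to css is an assumption made for illustration; the documentation only states that both return the CSS selector string.

```python
from abc import ABC, abstractmethod


class SelectableCSS(ABC):
    @property
    @abstractmethod
    def css(self) -> str:
        """CSS selector string; every concrete subclass must provide it."""

    @property
    def selector(self) -> str:
        # Assumption: the concrete property simply delegates to `css`.
        return self.css


class ClassName(SelectableCSS):
    """Hypothetical selector matching elements by class attribute."""

    def __init__(self, name: str) -> None:
        self.name = name

    @property
    def css(self) -> str:
        return f".{self.name}"


menu = ClassName("menu")
css_string = menu.css            # ".menu"
selector_string = menu.selector  # same value, via delegation
```

Because css is abstract, instantiating SelectableCSS directly, or a subclass that does not define css, raises TypeError.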

class CompositeSoupSelector(selectors: Iterable[SoupSelector])

Bases: SoupSelector

Interface for selectors consisting of multiple selectors.

Notes

To implement the CompositeSoupSelector interface, a child class must call its __init__ method with the provided selectors to set up the object.

Attributes

selectors : list[SoupSelector]

List of SoupSelector objects used for searching elements.

COMMUTATIVE = True
__init__(selectors: Iterable[SoupSelector]) -> None

Initializes composite selector object with provided selectors. Checks if all selectors are instances of SoupSelector.

Parameters

selectors : Iterable[SoupSelector]

Selectors used to search for elements.

Raises

NotSoupSelectorException

If any of provided parameters is not an instance of SoupSelector.

property selectors: list[SoupSelector]

Returns a list of selectors that composite selector consists of.

Returns

list[SoupSelector]

List of SoupSelector objects used for searching elements.
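The initialization contract can be sketched as below. All classes are local stand-ins, and AnyOf is a hypothetical child class; the sketch only illustrates validating each selector and exposing them as a list.

```python
from typing import Iterable, List


class SoupSelector:
    """Stand-in for the real base class (hypothetical, for illustration)."""


class NotSoupSelectorException(Exception):
    """Stand-in for the exception named in the documentation."""


class CompositeSoupSelector(SoupSelector):
    def __init__(self, selectors: Iterable[SoupSelector]) -> None:
        checked: List[SoupSelector] = []
        for candidate in selectors:
            # Every member has to be a selector itself.
            if not isinstance(candidate, SoupSelector):
                raise NotSoupSelectorException(
                    f"Expected SoupSelector, got {type(candidate).__name__}."
                )
            checked.append(candidate)
        self._selectors = checked

    @property
    def selectors(self) -> List[SoupSelector]:
        return self._selectors


class AnyOf(CompositeSoupSelector):
    """Hypothetical child class that forwards its arguments to the base."""

    def __init__(self, *selectors: SoupSelector) -> None:
        super().__init__(selectors)


first, second = SoupSelector(), SoupSelector()
combined = AnyOf(first, second)
```

Passing any iterable works here because the base class materializes it into a list before storing it.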

class BaseOperation

Bases: Executable, Comparable

Base class for all soupsavvy operations. Operations are used to process the selection results from the soup, extract and transform the data.

Operations can be chained together using the pipe operator ‘|’.

Example

>>> from soupsavvy.operations import Operation
>>> operation = Operation(str.lower) | Operation(str.strip)
>>> operation.execute("  TEXT  ")
'text'

Operations can be combined with selectors to extract and transform target information.

Example

>>> from soupsavvy import TypeSelector
>>> from soupsavvy.operations import Operation, Text
>>> # soup is assumed to be an existing IElement whose first div contains the text "42"
>>> selector = TypeSelector("div") | Text() | Operation(int)
>>> selector.find(soup)
42

BaseOperation inherits from the Comparable interface, so the __eq__ method needs to be implemented in derived classes.

execute(arg: Any) -> Any

Execute the operation on the given argument and return the result.

Parameters

arg : Any

Argument to be processed by the operation.

Returns

Any

Result of the operation.

Raises

BreakOperationException

If operation execution should be interrupted and propagated to caller.

FailedOperationExecution

If operation execution fails for any other reason.
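How pipe chaining and the two documented exception types could fit together is sketched below. The classes are local stand-ins with the same names as in the documentation, and the composition through __or__ is an assumption about one possible design, not soupsavvy's code.

```python
from typing import Any, Callable


class BreakOperationException(Exception):
    """Stand-in: signals that execution should stop and reach the caller."""


class FailedOperationExecution(Exception):
    """Stand-in: wraps any other failure raised during execution."""


class Operation:
    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def execute(self, arg: Any) -> Any:
        try:
            return self.func(arg)
        except (BreakOperationException, FailedOperationExecution):
            # Already one of the documented types: pass it on unchanged.
            raise
        except Exception as error:
            raise FailedOperationExecution(f"Operation failed: {error}") from error

    def __or__(self, other: "Operation") -> "Operation":
        # Left operand runs first; its result feeds the right operand.
        return Operation(lambda arg: other.execute(self.execute(arg)))


pipeline = Operation(str.lower) | Operation(str.strip)
result = pipeline.execute("  TEXT  ")

try:
    Operation(int).execute("not a number")
except FailedOperationExecution as error:
    failure_cause = type(error.__cause__).__name__  # the original ValueError
```

Wrapping unexpected errors in a single exception type gives callers one thing to catch, while the break exception travels through untouched.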

class OperationSearcherMixin

Bases: BaseOperation, TagSearcher

Mixin of the BaseOperation and TagSearcher interfaces. Allows operations to be used as field searchers in a model, so that the operation is performed directly on the scope element.

find_all(tag: IElement, recursive: bool = True, limit: int | None = None) -> list[Any]

Processes provided element and returns the result in a list.

Parameters

tag : IElement

Any IElement object to process.

recursive : bool, optional

Ignored, for consistency with interface.

limit : int, optional

Ignored, for consistency with interface.

Returns

list[Any]

Result of applied operation on element in a list.

find(tag: IElement, strict: bool = False, recursive: bool = True) -> Any

Processes provided element and returns the result.

Parameters

tag : IElement

Any IElement object to process.

strict : bool, optional

Ignored, for consistency with interface.

recursive : bool, optional

Ignored, for consistency with interface.

Returns

Any

Result of applied operation on element.
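The searcher-style wrappers described above amount to calling execute on the element itself. A minimal sketch, using a hypothetical SearcherMixin and a hypothetical Upper operation, with plain strings standing in for IElement objects:

```python
from typing import Any, List, Optional


class SearcherMixin:
    """Hypothetical mixin exposing an operation through find and find_all."""

    def execute(self, arg: Any) -> Any:
        raise NotImplementedError

    def find(self, tag: Any, strict: bool = False, recursive: bool = True) -> Any:
        # strict and recursive are accepted only to match the searcher interface.
        return self.execute(tag)

    def find_all(self, tag: Any, recursive: bool = True,
                 limit: Optional[int] = None) -> List[Any]:
        # recursive and limit are ignored; the single result is wrapped in a list.
        return [self.execute(tag)]


class Upper(SearcherMixin):
    """Hypothetical operation; a plain string stands in for an element."""

    def execute(self, arg: str) -> str:
        return arg.upper()


single = Upper().find("title")
wrapped = Upper().find_all("title", recursive=False, limit=5)
```

Since there is nothing to search for, find_all in this sketch always returns exactly one item, whatever limit is passed.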

class BrowserOperation

Bases: BaseOperation

Base class for operations that act on an IBrowser interface.

Browser operations are designed to perform actions with objects implementing the IBrowser interface. Each operation validates that the argument passed to the execute method is of this type. If the operation returns a value, it is passed through; otherwise the original IBrowser instance is returned.

Like standard operations, browser operations can be chained together using the pipe operator ‘|’.

Operations can be combined with selectors to extract and transform target information. Chaining other types of operations might result in errors.

Each derived operation class needs to implement the __eq__ method.

execute(arg: IBrowser) -> Any

Execute the operation on the given argument and return the result.

Parameters

arg : IBrowser

IBrowser instance to be processed by the operation.

Returns

Any

Result of the operation.

Raises

BreakOperationException

If operation execution should be interrupted and propagated to caller.

FailedOperationExecution

If operation execution fails for any other reason.
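The pass-through rule for browser operations can be sketched as follows. IBrowser is a local stand-in, the _run hook name and the Navigate and CurrentUrl operations are hypothetical, and raising TypeError for a wrong argument type is an assumption; the documentation does not name the exception used for that check.

```python
from typing import Any, Optional


class IBrowser:
    """Stand-in for the IBrowser interface (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.url: Optional[str] = None


class BrowserOperation:
    def execute(self, arg: Any) -> Any:
        # Reject anything that is not a browser before doing any work.
        if not isinstance(arg, IBrowser):
            raise TypeError(f"Expected IBrowser, got {type(arg).__name__}.")
        result = self._run(arg)  # `_run` is a hypothetical hook name
        # Pass a returned value through; otherwise hand the browser back.
        return arg if result is None else result

    def _run(self, browser: IBrowser) -> Any:
        raise NotImplementedError


class Navigate(BrowserOperation):
    """Hypothetical operation with no return value of its own."""

    def __init__(self, url: str) -> None:
        self.url = url

    def _run(self, browser: IBrowser) -> None:
        browser.url = self.url


class CurrentUrl(BrowserOperation):
    """Hypothetical operation that returns a value."""

    def _run(self, browser: IBrowser) -> Optional[str]:
        return browser.url


browser = IBrowser()
after_navigation = Navigate("https://example.com").execute(browser)
current = CurrentUrl().execute(after_navigation)
```

Returning the browser when there is no result is what lets several browser operations be chained one after another, each receiving the same instance.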

class ElementAction

Bases: Comparable

Abstract base class for actions that use browser to interact with an element.

Actions perform an operation on an element in a dynamic context, which requires both the browser context (IBrowser) and the target element active in that browser context (IElement).

ElementActions are not typical operations, so they cannot be chained with other soupsavvy operations. They are intended to be used within the ApplyTo operation.

Example

>>> from soupsavvy.browser.operations import ApplyTo, Click
>>> from soupsavvy import TypeSelector
>>> from soupsavvy.implementation.selenium import SeleniumBrowser
>>> from selenium import webdriver
>>> browser = SeleniumBrowser(webdriver.Chrome())
>>> action = Click()
>>> selector = TypeSelector('button')
>>> operation = ApplyTo(selector, action)
>>> operation.execute(browser)

ElementAction inherits from the Comparable interface, so the __eq__ method needs to be implemented in derived classes.

execute(browser: IBrowser, element: IElement) -> None

Execute the action on provided element within the browser context.

Parameters

browser : IBrowser

The browser instance within which the action is performed.

element : IElement

The element on which the action is performed using browser context.
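The two-argument contract of an action can be sketched with stand-ins. FakeBrowser, the Click class below and the apply_to helper are all hypothetical and written only for this illustration; in particular, apply_to is a simplified stand-in for what an ApplyTo-style operation might do once it has located an element, not the library's logic.

```python
from typing import List


class FakeBrowser:
    """Stand-in browser that records which elements were clicked."""

    def __init__(self) -> None:
        self.clicked: List[str] = []


class Click:
    """Hypothetical action; element names stand in for IElement objects."""

    def __init__(self, times: int = 1) -> None:
        self.times = times

    def execute(self, browser: FakeBrowser, element: str) -> None:
        # The action needs both the browser context and the target element.
        for _ in range(self.times):
            browser.clicked.append(element)

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Click) and self.times == other.times


def apply_to(browser: FakeBrowser, element: str, action: Click) -> FakeBrowser:
    # Simplified stand-in: run the action, then hand the browser back.
    action.execute(browser, element)
    return browser


browser = FakeBrowser()
returned = apply_to(browser, "button#submit", Click(times=2))
```

The action itself returns nothing; any chaining happens in the operation that wraps it, which is consistent with actions not being pipeable on their own.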