operations

Submodules

Content

Package with operations used to post-process the results of selections.

Classes

  • Operation - User defined operation with any function.

  • Text - Operation to extract text from the element.

  • Href - Operation to extract href attribute from the element.

  • Parent - Operation to extract parent element of the element.

  • Break - Operation to break the pipeline execution.

  • IfElse - Operation to control flow in the pipeline.

  • Continue - Operation to skip the operation and continue with the next one.

  • SkipNone - Operation to skip operation if input is None.

  • Suppress - Operation to suppress exceptions and return None instead.

class Operation(func: Callable, *args, **kwargs)

Bases: OperationSearcherMixin

Custom operation that wraps any function to be used with other soupsavvy components.

Example

>>> from soupsavvy.operations import Operation
... operation = Operation(str.lower)
... operation.execute("TEXT")
"text"

Operation uses the operation-searcher mixin, so it can extract information from an IElement directly through its find methods. It can therefore be used as a field in a model, and calling find instead of execute produces the same result.

__init__(func: Callable, *args, **kwargs) -> None

Initializes Operation with provided function and optional arguments.

Parameters

func : Callable

Any callable object that can be called with one positional argument.

*args : Any

Additional positional arguments passed to the operation function.

**kwargs : Any

Additional keyword arguments passed to the operation function.
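
How the stored arguments reach the wrapped function can be pictured with a small stand-in. This is a sketch, not soupsavvy's implementation: MiniOperation is a hypothetical class, and it assumes the input is passed as the first positional argument, followed by the stored *args and **kwargs.

```python
from typing import Any, Callable


class MiniOperation:
    """Hypothetical stand-in for Operation, for illustration only."""

    def __init__(self, func: Callable, *args: Any, **kwargs: Any) -> None:
        self.func = func
        self.args = args
        self.kwargs = kwargs

    def execute(self, arg: Any) -> Any:
        # Assumption: the input comes first, the stored arguments follow.
        return self.func(arg, *self.args, **self.kwargs)


split_csv = MiniOperation(str.split, ",")
parse_hex = MiniOperation(int, base=16)

print(split_csv.execute("a,b,c"))  # ['a', 'b', 'c']
print(parse_hex.execute("ff"))     # 255
```

Storing the arguments at construction time keeps every step in a pipeline a one-argument callable, which is what makes steps easy to chain.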

class Text

Bases: OperationSearcherMixin

Operation to extract text from IElement. Wrapper for the most common operation used in web scraping.

Example

>>> from soupsavvy.operations import Text
... operation = Text()
... operation.execute(tag)
"Extracted text from the tag"

Implements TagSearcher interface for convenience. It has find methods that can be used to extract text from provided element.

Example

>>> from soupsavvy.operations import Text
... operation = Text()
... operation.find(tag)
"Text"

Notes

Results of this operation may vary between implementations of IElement, as each of them extracts text differently.

class Href

Bases: OperationSearcherMixin

Operation to extract the href attribute from IElement. Wrapper for one of the most common operations used in web scraping. If the href attribute is not present, returns None.

Example

>>> from soupsavvy.operations import Href
... operation = Href()
... operation.execute(tag)
"www.example.com"

Implements TagSearcher interface for convenience. It has find methods that can be used to extract href from provided element.

Example

>>> from soupsavvy.operations import Href
... operation = Href()
... operation.find(tag)
"www.example.com"

class Parent

Bases: BaseOperation, SoupSelector

Operation to extract parent of IElement.

Example

>>> from soupsavvy.operations import Parent
... operation = Parent()
... operation.execute(tag)
"<div>...</div>"

Implements SoupSelector interface for convenience and can be used to extract parent of a provided tag without any conditions.

Example

>>> from soupsavvy.operations import Parent
... operation = Parent()
... operation.find(tag)
"<div>--tag--</div>"

Parent has BaseOperation higher in its MRO than SoupSelector, so using the pipe operator | on a Parent object results in an OperationPipeline instance.

Example

>>> from soupsavvy.operations import Parent
... operation = Parent() | Parent()
... operation.execute(tag)
"<div><div>--tag--</div></div>"
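
The effect of base-class order on the | operator can be reproduced with plain Python classes. In this sketch, OperationBase, SelectorBase and ParentLike are hypothetical names used only for illustration, and the returned tuples stand in for the pipeline and selector objects a real library would build.

```python
class OperationBase:
    """Hypothetical base whose | builds a pipeline."""

    def __or__(self, other):
        return ("pipeline", self, other)


class SelectorBase:
    """Hypothetical base whose | builds a selector union."""

    def __or__(self, other):
        return ("union", self, other)


class ParentLike(OperationBase, SelectorBase):
    """OperationBase is listed first, so its __or__ is found first."""


result = ParentLike() | ParentLike()

print(result[0])                                     # pipeline
print([cls.__name__ for cls in ParentLike.__mro__])
# ['ParentLike', 'OperationBase', 'SelectorBase', 'object']
```

Python resolves __or__ by walking the MRO from left to right, so swapping the two bases in the class statement would make the same expression build a union instead.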

Notes

If element does not have parent, returns None.

find_all(tag: IElement, recursive: bool = True, limit: int | None = None) -> list[IElement]

Finds all elements matching selector in provided IElement.

Parameters

tag : IElement

Any IElement object to search within.

recursive : bool, optional

Specifies if search should be recursive. If set to False, only direct children of the element will be searched. By default True.

limit : int, optional

Specifies maximum number of elements to return. By default None, all found elements are returned.

Returns

list[IElement]

List of IElement objects matching selector. If none found, the list is empty.

class Break

Bases: OperationSearcherMixin

Operation to break the pipeline execution and return the current result. Can be used in selection/operation pipelines with IfElse operation to conditionally stop the execution.

Example

>>> from soupsavvy.operations import Break, IfElse, Operation
... operation = IfElse(
...     lambda x: x == 0,
...     Break(),
...     Operation(lambda x: 100 / x),
... ) | Operation(lambda x: x + 1)
... operation.execute(0)
0
... operation.execute(100)
2.0

When Break is executed, the pipeline stops and returns the current result, so the operations that follow are not executed.
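
One way to get this early exit is to signal the break with an exception that the pipeline runner catches. The sketch below uses only plain functions; BreakSignal, break_ and run_pipeline are hypothetical names, and nothing here is taken from soupsavvy's own code.

```python
from typing import Any, Callable, Iterable


class BreakSignal(Exception):
    """Hypothetical signal carrying the result at the point of the break."""

    def __init__(self, result: Any) -> None:
        super().__init__("pipeline stopped")
        self.result = result


def break_(value: Any) -> Any:
    # Stop the pipeline and hand back the current value.
    raise BreakSignal(value)


def run_pipeline(steps: Iterable[Callable[[Any], Any]], value: Any) -> Any:
    """Apply steps in order; return early if a step raises BreakSignal."""
    try:
        for step in steps:
            value = step(value)
    except BreakSignal as signal:
        return signal.result
    return value


def guard(x: Any) -> Any:
    # Break on zero to avoid dividing by it, otherwise divide.
    return break_(x) if x == 0 else 100 / x


steps = [guard, lambda x: x + 1]

print(run_pipeline(steps, 0))    # 0
print(run_pipeline(steps, 100))  # 2.0
```

Using an exception lets a break raised inside a nested step unwind through any number of wrappers without each one having to check for a sentinel value.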

class IfElse(condition: Callable[[Any], bool], if_: BaseOperation, else_: BaseOperation)

Bases: OperationSearcherMixin

Operation to control flow in the pipeline. Executes one of two operations depending on the condition.

Example

>>> from soupsavvy.operations import IfElse, Operation
... operation = IfElse(
...     lambda x: x == 0,
...     Operation(lambda x: None),
...     Operation(lambda x: x / 100),
... )
... operation.execute(0)
None
... operation.execute(100)
1.0

Implements TagSearcher interface for convenience. It can conditionally apply operations to the element and can be used as model field.

Example

>>> from soupsavvy.operations import Href, IfElse, Text
... operation = IfElse(
...     lambda x: x.get("id") == "user",
...     Text(),
...     Href(),
... )
... operation.find(user_element)
"username"
... operation.find(other_element)
"www.example.com"

__init__(condition: Callable[[Any], bool], if_: BaseOperation, else_: BaseOperation) -> None

Initializes IfElse operation with condition and two operations.

Parameters

condition : Callable[[Any], bool]

Condition to check if the operation should be executed. If callable returns True, if_ operation is executed, otherwise else_.

if_ : BaseOperation

Operation to execute if condition is fulfilled.

else_ : BaseOperation

Operation to execute if condition is not fulfilled.
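
The role of the three parameters can be shown with a plain-function sketch. The helper if_else below is hypothetical and works on ordinary callables rather than BaseOperation instances; it only illustrates that exactly one branch runs for each input.

```python
from typing import Any, Callable

Step = Callable[[Any], Any]


def if_else(condition: Callable[[Any], bool], if_: Step, else_: Step) -> Step:
    """Hypothetical helper: pick one of two steps based on the input."""

    def run(value: Any) -> Any:
        # Only the selected branch is called, so the other one
        # never sees an input it cannot handle.
        branch = if_ if condition(value) else else_
        return branch(value)

    return run


describe = if_else(
    lambda x: isinstance(x, str),
    str.upper,
    lambda x: x * 2,
)

print(describe("abc"))  # ABC
print(describe(21))     # 42
```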

class Continue

Bases: OperationSearcherMixin

Operation to skip the current operation and move to the next one. Can be used in selection/operation pipelines with IfElse operation to conditionally skip the operation.

Example

>>> from soupsavvy.operations import Continue, IfElse, Operation
... operation = IfElse(
...     lambda x: x == 0,
...     Continue(),
...     Operation(lambda x: x / 100),
... ) | Operation(lambda x: x - 1)
... operation.execute(0)
-1
... operation.execute(100)
0.0

When Continue is executed, the current step is skipped: the input is passed on unchanged and the next operation in the pipeline is executed.
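
Seen from the pipeline, a skipped step behaves like an identity function. The sketch below models it that way, which is an assumption made for illustration; continue_ and pipe are hypothetical helpers, not soupsavvy functions.

```python
from functools import reduce
from typing import Any, Callable

Step = Callable[[Any], Any]


def continue_(value: Any) -> Any:
    # Assumption: skipping a step means passing the input through unchanged.
    return value


def pipe(*steps: Step) -> Step:
    """Chain steps left to right into a single callable."""
    return lambda value: reduce(lambda acc, step: step(acc), steps, value)


def scale(x: Any) -> Any:
    # Skip the division for zero, otherwise scale the value down.
    return continue_(x) if x == 0 else x / 100


pipeline = pipe(scale, lambda x: x - 1)

print(pipeline(0))    # -1
print(pipeline(100))  # 0.0
```

Unlike a break, the remaining steps still run, which is why the zero input ends up as -1 rather than 0.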

class SkipNone(operation: BaseOperation)

Bases: OperationWrapper

A wrapper that skips the operation if the input is None. Used to prevent exceptions where it’s safe and expected to skip the operation.

Example

>>> from soupsavvy.operations import SkipNone, Text
... operation = SkipNone(Text())
... operation.execute(None)
None

When the element was not found, which can be an expected outcome, the wrapped operation is skipped and None is returned.
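
The wrapper pattern behind this behaviour is short enough to sketch with plain callables. skip_none below is a hypothetical helper written for illustration; it is not the library's class.

```python
from typing import Any, Callable, Optional


def skip_none(step: Callable[[Any], Any]) -> Callable[[Optional[Any]], Any]:
    """Hypothetical wrapper: call step only when the input is not None."""

    def run(value: Optional[Any]) -> Any:
        if value is None:
            # Nothing to process, so the wrapped step is never called.
            return None
        return step(value)

    return run


strip_text = skip_none(str.strip)

print(strip_text("  hello  "))  # hello
print(strip_text(None))         # None
```

Without the wrapper, str.strip(None) would raise a TypeError. The check is deliberately `is None`, so falsy inputs such as an empty string are still processed.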

class Suppress(operation: BaseOperation, category: Type[Exception] | tuple[Type[Exception], ...] = Exception)

Bases: OperationWrapper

A wrapper that executes the operation and suppresses exceptions raised, returning None instead. Used to catch exceptions where it’s expected this might happen.

Example

>>> from soupsavvy.operations import Operation, Suppress
... operation = Suppress(Operation(int))
... operation.execute("")
None

This can be used with Default operation to provide a default value when the operation fails.

Example

>>> from soupsavvy.models import Default
... from soupsavvy.operations import Operation, Suppress
... operation = Default(Suppress(Operation(int)), 0)
... operation.execute("")
0

The operations in this example convert the text of an element to an integer when that text may be empty. If the value is not required, the default can be set to None or to a known value.

Suppress also accepts a category of exceptions to suppress. By default, it suppresses all exceptions that inherit from Exception.

Example

>>> from soupsavvy.operations import Operation, Suppress
... operation = Suppress(Operation(int), category=ValueError)
... operation.execute("not an integer")
None

Category can also be a tuple of exceptions. issubclass is used to check whether the cause of the exception is a subclass of the provided category.

Example

>>> from soupsavvy.operations import Operation, Suppress
... operation = Suppress(Operation(int), category=(AttributeError, ValueError))
... operation.execute("not an integer")
None

__init__(operation: BaseOperation, category: Type[Exception] | tuple[Type[Exception], ...] = Exception) -> None

Initialize Suppress operation instance.

Parameters

operation : BaseOperation

The operation to be wrapped.

category : Type[Exception] | tuple[Type[Exception], ...], optional

The exception type(s) to suppress. By default, suppresses all exceptions that inherit from Exception.

Raises

NotOperationException

If provided object is not an instance of BaseOperation.
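
The category check described for Suppress, where the cause of the failure is compared against the allowed types, can be sketched without the library. Everything below is hypothetical: FailedStep stands in for the exception raised when an operation fails, wrap_failures mimics an operation that chains the original error as its cause, and suppress is a plain-function version of the wrapper.

```python
from typing import Any, Callable, Tuple, Type, Union

Category = Union[Type[Exception], Tuple[Type[Exception], ...]]
Step = Callable[[Any], Any]


class FailedStep(Exception):
    """Hypothetical wrapper exception raised when a step fails."""


def wrap_failures(func: Step) -> Step:
    def run(value: Any) -> Any:
        try:
            return func(value)
        except Exception as error:
            # Chain the original error so it is available as __cause__.
            raise FailedStep(f"{func!r} failed") from error

    return run


def suppress(step: Step, category: Category = Exception) -> Step:
    def run(value: Any) -> Any:
        try:
            return step(value)
        except FailedStep as error:
            cause = error.__cause__
            # issubclass accepts a single class or a tuple of classes.
            if cause is not None and issubclass(type(cause), category):
                return None
            raise

    return run


to_int = wrap_failures(int)

print(suppress(to_int)("42"))                          # 42
print(suppress(to_int, ValueError)("not an integer"))  # None

try:
    suppress(to_int, (AttributeError, KeyError))("not an integer")
except FailedStep as error:
    print(type(error.__cause__).__name__)              # ValueError
```

Narrowing the category keeps unexpected failures visible: only the listed causes turn into None, and anything else is re-raised.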