operations
Submodules
Content
Package with operations used to post-process the results of selections.
Classes
Operation - User defined operation with any function.
Text - Operation to extract text from the element.
Href - Operation to extract href attribute from the element.
Parent - Operation to extract parent element of the element.
Break - Operation to break the pipeline execution.
IfElse - Operation to control flow in the pipeline.
Continue - Operation to skip the operation and continue with the next one.
SkipNone - Operation to skip operation if input is None.
Suppress - Operation to suppress exceptions and return None instead.
- class Operation(func: Callable, *args, **kwargs)
Bases: OperationSearcherMixin
Custom operation that wraps any function so it can be used with other soupsavvy components.
Example
>>> from soupsavvy.operations import Operation
... operation = Operation(str.lower)
... operation.execute("TEXT")
"text"
Operation is an operation-searcher mixin, so its find methods can be used to extract information from an IElement directly. It can therefore serve as a field in a model, and calling find instead of execute produces the same result.
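To make the wrapping idea concrete, here is a minimal plain-Python sketch. `FuncOperation` is a hypothetical stand-in and not soupsavvy's `Operation`; forwarding the extra `*args` and `**kwargs` to the wrapped function is an assumption based on the signature above.

```python
from typing import Any, Callable


class FuncOperation:
    """Hypothetical stand-in for a function-wrapping operation (not soupsavvy code)."""

    def __init__(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> None:
        self.func = func
        self.args = args
        self.kwargs = kwargs

    def execute(self, arg: Any) -> Any:
        # Assumption: extra arguments are forwarded after the input value.
        return self.func(arg, *self.args, **self.kwargs)


lower = FuncOperation(str.lower)
split = FuncOperation(str.split, ",")

lowered = lower.execute("TEXT")   # calls str.lower("TEXT")
parts = split.execute("a,b,c")    # calls str.split("a,b,c", ",")
```

The point of the wrapper is that any callable gains a uniform `execute` entry point, which is what lets it be combined with other components.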
- class Text
Bases: OperationSearcherMixin
Operation to extract text from an IElement. Wraps the most common operation in web scraping.
Example
>>> from soupsavvy.operations import Text
... operation = Text()
... operation.execute(tag)
"Extracted text from the tag"
Implements TagSearcher interface for convenience. It has find methods that can be used to extract text from provided element.
Example
>>> from soupsavvy.operations import Text
... operation = Text()
... operation.find(tag)
"Text"
Notes
Results of this operation may vary between implementations of IElement, as each of them extracts text differently.
- class Href
Bases: OperationSearcherMixin
Operation to extract the href attribute from an IElement. Wraps one of the most common operations in web scraping. If the href attribute is not present, returns None.
Example
>>> from soupsavvy.operations import Href
... operation = Href()
... operation.execute(tag)
"www.example.com"
Implements TagSearcher interface for convenience. It has find methods that can be used to extract href from provided element.
Example
>>> from soupsavvy.operations import Href
... operation = Href()
... operation.find(tag)
"www.example.com"
- class Parent
Bases: BaseOperation, SoupSelector
Operation to extract the parent of an IElement.
Example
>>> from soupsavvy.operations import Parent
... operation = Parent()
... operation.execute(tag)
"<div>...</div>"
Implements SoupSelector interface for convenience and can be used to extract parent of a provided tag without any conditions.
Example
>>> from soupsavvy.operations import Parent
... operation = Parent()
... operation.find(tag)
"<div>--tag--</div>"
Parent has BaseOperation higher in its MRO than SoupSelector, so using the pipe operator | on a Parent object results in an OperationPipeline instance.
Example
>>> from soupsavvy.operations import Parent
... operation = Parent() | Parent()
... operation.execute(tag)
"<div><div>--tag--</div></div>"
Notes
If element does not have parent, returns None.
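The parent lookup and its None rule can be illustrated with a toy tree. `Node` and `parent_of` are hypothetical names used only for this sketch; they stand in for an IElement implementation and are not part of soupsavvy.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical toy element; stands in for an IElement implementation."""

    name: str
    parent: Optional[Node] = None


def parent_of(node: Node) -> Optional[Node]:
    # Mirrors the documented rule: an element without a parent yields None.
    return node.parent


root = Node("html")
div = Node("div", parent=root)
span = Node("span", parent=div)

first = parent_of(span)     # one step up
second = parent_of(first)   # two steps up, analogous to Parent() | Parent()
missing = parent_of(root)   # the root has no parent
```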
- find_all(tag: IElement, recursive: bool = True, limit: int | None = None) -> list[IElement]
Finds all elements matching selector in provided IElement.
Parameters
- tag : IElement
Any IElement object to search within.
- recursive : bool, optional
Specifies if search should be recursive. If set to False, only direct children of the element will be searched. By default True.
- limit : int, optional
Specifies maximum number of elements to return. By default None, all found elements are returned.
Returns
- list[IElement]
List of IElement objects matching selector. If none found, the list is empty.
- class Break
Bases: OperationSearcherMixin
Operation to break the pipeline execution and return the current result. Can be used in selection/operation pipelines with the IfElse operation to conditionally stop the execution.
Example
>>> from soupsavvy.operations import Break, IfElse, Operation
... operation = IfElse(
...     lambda x: x == 0,
...     Break(),
...     Operation(lambda x: 100 / x),
... ) | Operation(lambda x: x + 1)
... operation.execute(0)
0
... operation.execute(100)
2.0
When Break is executed, the pipeline stops and returns the current result, so subsequent operations are not executed.
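One way to picture the break behavior is a signal that unwinds the pipeline loop. The sketch below is a plain-Python model under that assumption; `BreakSignal`, `break_`, `if_else`, and `run_pipeline` are hypothetical names, and whether soupsavvy implements Break with an exception internally is not stated here.

```python
from typing import Any, Callable, Sequence

Step = Callable[[Any], Any]


class BreakSignal(Exception):
    """Hypothetical signal carrying the result at the point of the break."""

    def __init__(self, result: Any) -> None:
        super().__init__("pipeline break")
        self.result = result


def break_(value: Any) -> Any:
    # Stop the pipeline and hand back the current value.
    raise BreakSignal(value)


def if_else(condition: Callable[[Any], bool], if_: Step, else_: Step) -> Step:
    def step(value: Any) -> Any:
        return if_(value) if condition(value) else else_(value)

    return step


def run_pipeline(steps: Sequence[Step], value: Any) -> Any:
    try:
        for step in steps:
            value = step(value)
    except BreakSignal as signal:
        # Remaining steps are skipped; the value at the break is returned.
        return signal.result
    return value


steps = [
    if_else(lambda x: x == 0, break_, lambda x: 100 / x),
    lambda x: x + 1,
]
zero_result = run_pipeline(steps, 0)       # break: the "+ 1" step never runs
hundred_result = run_pipeline(steps, 100)  # 100 / 100, then + 1
```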
- class IfElse(condition: Callable[[Any], bool], if_: BaseOperation, else_: BaseOperation)
Bases: OperationSearcherMixin
Operation to control flow in the pipeline. Executes one of two operations depending on the condition.
Example
>>> from soupsavvy.operations import IfElse, Operation
... operation = IfElse(
...     lambda x: x == 0,
...     Operation(lambda x: None),
...     Operation(lambda x: x / 100),
... )
... operation.execute(0)
None
... operation.execute(100)
1.0
Implements TagSearcher interface for convenience. It can conditionally apply operations to the element and can be used as model field.
Example
>>> from soupsavvy.operations import Href, IfElse, Text
... operation = IfElse(
...     lambda x: x.get("id") == "user",
...     Text(),
...     Href(),
... )
... operation.find(user_element)
"username"
... operation.find(other_element)
"www.example.com"
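- __init__(condition: Callable[[Any], bool], if_: BaseOperation, else_: BaseOperation) -> None
Initializes the IfElse operation with a condition and two operations.
Parameters
- condition : Callable[[Any], bool]
Function called with the input to decide which operation is executed.
- if_ : BaseOperation
Operation executed when the condition is met.
- else_ : BaseOperation
Operation executed when the condition is not met.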
- class Continue
Bases: OperationSearcherMixin
Operation to skip the current operation and move to the next one. Can be used in selection/operation pipelines with the IfElse operation to conditionally skip an operation.
Example
>>> from soupsavvy.operations import Continue, IfElse, Operation
... operation = IfElse(
...     lambda x: x == 0,
...     Continue(),
...     Operation(lambda x: x / 100),
... ) | Operation(lambda x: x - 1)
... operation.execute(0)
-1
... operation.execute(100)
0.0
When Continue is executed, the current operation is skipped and the pipeline proceeds to the next one.
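In contrast to a break, skipping can be modeled as a step that simply passes its input through. The following plain-Python sketch assumes that reading; `continue_`, `if_else`, and `run_pipeline` are hypothetical helpers, not soupsavvy code.

```python
from typing import Any, Callable, Sequence

Step = Callable[[Any], Any]


def continue_(value: Any) -> Any:
    # Hypothetical no-op step: return the input unchanged so the next step runs.
    return value


def if_else(condition: Callable[[Any], bool], if_: Step, else_: Step) -> Step:
    def step(value: Any) -> Any:
        return if_(value) if condition(value) else else_(value)

    return step


def run_pipeline(steps: Sequence[Step], value: Any) -> Any:
    for step in steps:
        value = step(value)
    return value


steps = [
    if_else(lambda x: x == 0, continue_, lambda x: x / 100),
    lambda x: x - 1,
]
zero_result = run_pipeline(steps, 0)       # division skipped, then 0 - 1
hundred_result = run_pipeline(steps, 100)  # 100 / 100, then 1.0 - 1
```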
- class SkipNone(operation: BaseOperation)
Bases: OperationWrapper
A wrapper that skips the operation if the input is None. Used to prevent exceptions where it is safe and expected to skip the operation.
Example
>>> from soupsavvy.operations import SkipNone, Text
... operation = SkipNone(Text())
... operation.execute(None)
None
When an element was not found, which may be expected, the wrapped operation is skipped and None is returned.
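The guard itself is small. As a rough plain-Python illustration (the `skip_none` helper is hypothetical and only models the behavior described above):

```python
from typing import Any, Callable, Optional


def skip_none(func: Callable[[Any], Any]) -> Callable[[Optional[Any]], Any]:
    """Hypothetical wrapper: call func only when the input is not None."""

    def wrapper(value: Optional[Any]) -> Any:
        if value is None:
            # Nothing to process, so the wrapped function is never called.
            return None
        return func(value)

    return wrapper


safe_upper = skip_none(str.upper)

skipped = safe_upper(None)     # str.upper(None) would raise TypeError
processed = safe_upper("abc")  # normal inputs pass straight through
```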
- class Suppress(operation: BaseOperation, category: Type[Exception] | tuple[Type[Exception], ...] = Exception)
Bases: OperationWrapper
A wrapper that executes the operation and suppresses any exception raised, returning None instead. Used to catch exceptions where a failure is expected to be possible.
Example
>>> from soupsavvy.operations import Operation, Suppress
... operation = Suppress(Operation(int))
... operation.execute("")
None
This can be used with Default operation to provide a default value when the operation fails.
Example
>>> from soupsavvy.models import Default  # assumed import location of Default
... from soupsavvy.operations import Operation, Suppress
... operation = Default(Suppress(Operation(int)), 0)
... operation.execute("")
0
The operations in this example try to convert the text of an element to an integer when that text may be empty. If the value is not required, the default can be set to None or to a known value.
Suppress also accepts a category of exceptions to suppress. By default, it suppresses all exceptions that inherit from Exception.
Example
>>> from soupsavvy.operations import Operation, Suppress
... operation = Suppress(Operation(int), category=ValueError)
... operation.execute("not an integer")
None
The category can also be a tuple of exceptions. issubclass is used to check whether the cause of the exception is a subclass of the provided category.
Example
>>> from soupsavvy.operations import Operation, Suppress
... operation = Suppress(Operation(int), category=(AttributeError, ValueError))
... operation.execute("not an integer")
None
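The category check described above can be sketched in plain Python. This model assumes the failing operation raises a wrapper error whose `__cause__` is the original exception; `FailedExecution`, `execute_int`, and `suppress` are hypothetical names and not soupsavvy's implementation.

```python
from typing import Any, Callable, Tuple, Type, Union

Category = Union[Type[Exception], Tuple[Type[Exception], ...]]


class FailedExecution(Exception):
    """Hypothetical wrapper error; the original error is kept as __cause__."""


def execute_int(value: Any) -> int:
    try:
        return int(value)
    except Exception as error:
        raise FailedExecution(f"int failed for {value!r}") from error


def suppress(func: Callable[[Any], Any], category: Category = Exception):
    def wrapper(value: Any) -> Any:
        try:
            return func(value)
        except FailedExecution as error:
            cause = error.__cause__
            # issubclass accepts a single class or a tuple of classes.
            if cause is not None and issubclass(type(cause), category):
                return None
            raise

    return wrapper


single = suppress(execute_int, category=ValueError)("not an integer")
several = suppress(execute_int, category=(KeyError, ValueError))("not an integer")

try:
    # The cause is a ValueError, which matches neither class in the tuple.
    suppress(execute_int, category=(AttributeError, TypeError))("not an integer")
    propagated = False
except FailedExecution:
    propagated = True
```

Errors whose cause falls outside the category propagate unchanged, which keeps unexpected failures visible.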
- __init__(operation: BaseOperation, category: Type[Exception] | tuple[Type[Exception], ...] = Exception) -> None
Initialize Suppress operation instance.
Parameters
- operation : BaseOperation
The operation to be wrapped.
- category : Type[Exception] | tuple[Type[Exception], ...], optional
The exception type(s) to suppress. By default, suppresses all exceptions that inherit from Exception.
Raises
- NotOperationException
If provided object is not an instance of BaseOperation.