operations.element

Module for operations specific to the IElement interface.

Classes

  • Text - Extracts text from element - most common operation.

  • Href - Extracts href attribute from element.

  • Parent - Extracts parent of element.

These components are designed to process IElement objects and extract the desired information. They can be used in combination with selectors.

class Text

Bases: OperationSearcherMixin

Operation to extract text from an IElement. It wraps the most common operation used in web scraping.

Example

>>> from soupsavvy.operations import Text
>>> operation = Text()
>>> operation.execute(tag)
"Extracted text from the tag"

Implements the TagSearcher interface for convenience, so its find methods can also be used to extract text from the provided element.
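The relationship between execute and find can be pictured with a minimal sketch. It assumes that find simply delegates to execute; the source does not confirm how OperationSearcherMixin is implemented. FindViaExecuteMixin and TextSketch are hypothetical stand-ins built on the standard library's xml.etree.ElementTree, not soupsavvy classes.

```python
import xml.etree.ElementTree as ET


class FindViaExecuteMixin:
    """Hypothetical mixin: exposes find() by delegating to execute()."""

    def find(self, element):
        # Assumption: searching with an operation means running it.
        return self.execute(element)


class TextSketch(FindViaExecuteMixin):
    """Stand-in for a text operation; not soupsavvy's Text."""

    def execute(self, element):
        # Concatenate the text of the element and all of its descendants.
        return "".join(element.itertext())


tag = ET.fromstring("<p>Hello <b>world</b></p>")
operation = TextSketch()

via_execute = operation.execute(tag)
via_find = operation.find(tag)
print(via_execute, "|", via_find)
```

Under this assumption both entry points return the same value, which is why an operation can be dropped in wherever a searcher is expected.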

Example

>>> from soupsavvy.operations import Text
>>> operation = Text()
>>> operation.find(tag)
"Text"

Notes

Results of this operation may vary between implementations of IElement, as each of them extracts text differently.
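To make the note concrete, here is a small sketch of how two reasonable text extraction strategies can disagree on the same markup. It uses only the standard library's xml.etree.ElementTree; text_concatenated and text_separated are hypothetical helpers and do not reproduce the behaviour of any particular IElement implementation.

```python
import xml.etree.ElementTree as ET

markup = "<div><p>First</p><p>Second</p></div>"
element = ET.fromstring(markup)


def text_concatenated(el):
    """Join all text nodes exactly as they appear, with no separator."""
    return "".join(el.itertext())


def text_separated(el, separator=" "):
    """Strip each text node, drop empty ones, and join with a separator."""
    parts = (part.strip() for part in el.itertext())
    return separator.join(part for part in parts if part)


concatenated = text_concatenated(element)
separated = text_separated(element)
print(repr(concatenated))  # 'FirstSecond'
print(repr(separated))     # 'First Second'
```

When the extracted text feeds a comparison or a parser, it is worth checking which strategy the chosen implementation follows.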

class Href

Bases: OperationSearcherMixin

Operation to extract the href attribute from an IElement. It wraps one of the most common operations used in web scraping. If the href attribute is not present, it returns None.
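Because a missing attribute yields None instead of raising, callers usually need an explicit check. The sketch below mirrors that contract with the standard library's xml.etree.ElementTree; href_of is a hypothetical helper, not part of soupsavvy.

```python
import xml.etree.ElementTree as ET


def href_of(element):
    """Return the href attribute of the element, or None when it is absent."""
    return element.get("href")


with_href = ET.fromstring('<a href="www.example.com">link</a>')
without_href = ET.fromstring("<a>link</a>")

found = href_of(with_href)
missing = href_of(without_href)

# Guard against None before using the value as a string.
label = missing if missing is not None else "no link"
print(found, "|", label)
```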

Example

>>> from soupsavvy.operations import Href
>>> operation = Href()
>>> operation.execute(tag)
"www.example.com"

Implements the TagSearcher interface for convenience, so its find methods can also be used to extract the href from the provided element.

Example

>>> from soupsavvy.operations import Href
>>> operation = Href()
>>> operation.find(tag)
"www.example.com"

class Parent

Bases: BaseOperation, SoupSelector

Operation to extract the parent of an IElement.

Example

>>> from soupsavvy.operations import Parent
>>> operation = Parent()
>>> operation.execute(tag)
"<div>...</div>"

Implements the SoupSelector interface for convenience, so it can be used to extract the parent of a provided tag without any conditions.

Example

>>> from soupsavvy.operations import Parent
>>> operation = Parent()
>>> operation.find(tag)
"<div>--tag--</div>"

Parent has BaseOperation higher in its MRO than SoupSelector, so using the pipe operator | on a Parent object results in an OperationPipeline instance.
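The effect of base class order on the | operator is plain Python behaviour and can be sketched without the library. OperationSketch, SelectorSketch and ParentSketch are hypothetical stand-ins, and the tuples they return only label which __or__ was chosen; they say nothing about what the real classes return.

```python
class OperationSketch:
    """Stand-in for an operation base class that defines `|`."""

    def __or__(self, other):
        return ("pipeline", self, other)


class SelectorSketch:
    """Stand-in for a selector base class that also defines `|`."""

    def __or__(self, other):
        return ("selector-combination", self, other)


class ParentSketch(OperationSketch, SelectorSketch):
    """OperationSketch is listed first, so it comes earlier in the MRO."""


mro_names = [cls.__name__ for cls in ParentSketch.__mro__]
combined = ParentSketch() | ParentSketch()
kind = combined[0]

print(mro_names)
print(kind)  # pipeline
```

Python looks up __or__ along the MRO and uses the first definition it finds, so swapping the order of the bases in the sketch would select the selector-side implementation instead.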

Example

>>> from soupsavvy.operations import Parent
>>> operation = Parent() | Parent()
>>> operation.execute(tag)
"<div><div>--tag--</div></div>"

Notes

If the element does not have a parent, returns None.
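A minimal sketch of this contract, using the standard library's xml.etree.ElementTree in place of an IElement implementation. parent_sketch and grandparent_sketch are hypothetical helpers; in particular, stopping early when a parent is missing is a choice made for this sketch, since the source does not say how a pipeline handles None.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<div><p><a>link</a></p></div>")

# ElementTree elements do not store a parent reference, so build a lookup.
parent_of = {child: parent for parent in root.iter() for child in parent}


def parent_sketch(element):
    """Return the parent element, or None for an element without one."""
    return parent_of.get(element)


def grandparent_sketch(element):
    """Apply the parent lookup twice, stopping early if a parent is missing."""
    parent = parent_sketch(element)
    return None if parent is None else parent_sketch(parent)


anchor = root.find(".//a")
paragraph = root.find("p")

print(parent_sketch(anchor).tag)       # p
print(grandparent_sketch(anchor).tag)  # div
print(parent_sketch(root))             # None
```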

find_all(tag: IElement, recursive: bool = True, limit: int | None = None) → list[IElement]

Finds all elements matching selector in provided IElement.

Parameters

tag : IElement

Any IElement object to search within.

recursive : bool, optional

Specifies if search should be recursive. If set to False, only direct children of the element will be searched. By default True.

limit : int, optional

Specifies maximum number of elements to return. By default None, all found elements are returned.

Returns

list[IElement]

List of IElement objects matching selector. If none found, the list is empty.
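The recursive and limit parameters follow a pattern common to element search APIs. The sketch below reproduces that general pattern with the standard library's xml.etree.ElementTree; find_all_sketch and its predicate argument are hypothetical and do not describe how Parent.find_all decides which elements match.

```python
import xml.etree.ElementTree as ET
from itertools import islice


def find_all_sketch(element, predicate, recursive=True, limit=None):
    """Collect matching elements, optionally only direct children, up to limit."""
    if recursive:
        # iter() yields the element itself first, so skip it.
        candidates = islice(element.iter(), 1, None)
    else:
        # Iterating an element yields only its direct children.
        candidates = iter(element)
    matches = (el for el in candidates if predicate(el))
    # islice with a stop of None returns every match.
    return list(islice(matches, limit))


root = ET.fromstring("<div><a>1</a><p><a>2</a></p><a>3</a></div>")


def is_anchor(el):
    return el.tag == "a"


everything = [el.text for el in find_all_sketch(root, is_anchor)]
direct_only = [el.text for el in find_all_sketch(root, is_anchor, recursive=False)]
first_two = [el.text for el in find_all_sketch(root, is_anchor, limit=2)]
nothing = find_all_sketch(root, lambda el: el.tag == "span")

print(everything, direct_only, first_two, nothing)
```

As in the documented signature, an unsuccessful search produces an empty list, so callers can iterate over the result without a None check.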