operations.element
Module for operations specific to IElement interface.
Classes
Text - Extracts text from element - most common operation.
Href - Extracts href attribute from element.
Parent - Extracts parent of element.
These components are designed to be used for processing IElement objects and extracting the desired information. They can be used in combination with selectors.
- class Text
Bases: OperationSearcherMixin
Operation to extract text from IElement. Wrapper of the most common operation used in web scraping.
Example
>>> from soupsavvy.operations import Text
>>> operation = Text()
>>> operation.execute(tag)
"Extracted text from the tag"
Implements TagSearcher interface for convenience. It has find methods that can be used to extract text from provided element.
Example
>>> from soupsavvy.operations import Text
>>> operation = Text()
>>> operation.find(tag)
"Text"
Notes
Results of this operation may vary between implementations of IElement, as each of them extracts text differently.
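The note above can be illustrated without soupsavvy itself. The following is a minimal sketch that uses the standard library's xml.etree.ElementTree as a stand-in for an IElement implementation. The helpers text_joined and text_separated are hypothetical and are not part of soupsavvy. They only show how two reasonable text-extraction strategies can return different strings for the same markup.

```python
import xml.etree.ElementTree as ET


def text_joined(element: ET.Element) -> str:
    """Concatenate all text nodes exactly as they appear (hypothetical helper)."""
    return "".join(element.itertext())


def text_separated(element: ET.Element, separator: str = " ") -> str:
    """Strip each text node, drop empty ones, join with a separator (hypothetical helper)."""
    parts = (chunk.strip() for chunk in element.itertext())
    return separator.join(part for part in parts if part)


markup = "<div><p>Hello</p><p>world</p></div>"
element = ET.fromstring(markup)

# Same element, two extraction strategies, two different results.
joined = text_joined(element)
separated = text_separated(element)
```

Here joined has no whitespace between the two paragraphs, while separated does. When switching between IElement implementations, it may be worth checking which behavior the underlying parser follows.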
- class Href
Bases: OperationSearcherMixin
Operation to extract href attribute from IElement. Wrapper of one of the most common operations used in web scraping. If href attribute is not present, returns None.
Example
>>> from soupsavvy.operations import Href
>>> operation = Href()
>>> operation.execute(tag)
"www.example.com"
Implements TagSearcher interface for convenience. It has find methods that can be used to extract href from provided element.
Example
>>> from soupsavvy.operations import Href
>>> operation = Href()
>>> operation.find(tag)
"www.example.com"
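The "returns None when href is not present" behavior can be sketched with the standard library alone. The example below uses xml.etree.ElementTree as a stand-in for an IElement, and extract_href is a hypothetical helper, not soupsavvy's implementation.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def extract_href(element: ET.Element) -> Optional[str]:
    """Return the href attribute value, or None when the attribute is absent."""
    # Element.get returns None by default for a missing attribute.
    return element.get("href")


link = ET.fromstring('<a href="www.example.com">Example</a>')
anchor_without_href = ET.fromstring("<a>No link</a>")

with_href = extract_href(link)
without_href = extract_href(anchor_without_href)
```

Callers that expect a string should handle the None case explicitly, for example by filtering out missing values before further processing.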
- class Parent
Bases: BaseOperation, SoupSelector
Operation to extract parent of IElement.
Example
>>> from soupsavvy.operations import Parent
>>> operation = Parent()
>>> operation.execute(tag)
"<div>...</div>"
Implements SoupSelector interface for convenience and can be used to extract parent of a provided tag without any conditions.
Example
>>> from soupsavvy.operations import Parent
>>> operation = Parent()
>>> operation.find(tag)
"<div>--tag--</div>"
Parent has BaseOperation higher in MRO than SoupSelector, so using the pipe operator | on a Parent object results in an OperationPipeline instance.
Example
>>> from soupsavvy.operations import Parent
>>> operation = Parent() | Parent()
>>> operation.execute(tag)
"<div><div>--tag--</div></div>"
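The MRO remark can be reproduced in plain Python. The sketch below does not use soupsavvy: OperationBase, SelectorBase, PipelineLike, SelectorCombinationLike and ParentLike are all hypothetical stand-ins. It only shows that when two base classes both define `__or__`, the one listed first in the class definition wins.

```python
class PipelineLike:
    """Hypothetical stand-in for the result of chaining two operations."""

    def __init__(self, *steps):
        self.steps = steps


class SelectorCombinationLike:
    """Hypothetical stand-in for whatever a selector's | would produce."""

    def __init__(self, *selectors):
        self.selectors = selectors


class OperationBase:
    def __or__(self, other):
        return PipelineLike(self, other)


class SelectorBase:
    def __or__(self, other):
        return SelectorCombinationLike(self, other)


class ParentLike(OperationBase, SelectorBase):
    """OperationBase comes first, so its __or__ is found first in the MRO."""


combined = ParentLike() | ParentLike()
mro_names = [cls.__name__ for cls in ParentLike.__mro__]
```

With this ordering, combined is a PipelineLike instance. Swapping the base classes in the definition of ParentLike would make the selector-style `__or__` take precedence instead.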
Notes
If element does not have parent, returns None.
- find_all(tag: IElement, recursive: bool = True, limit: int | None = None) -> list[IElement]
Finds all elements matching selector in provided IElement.
Parameters
- tag : IElement
Any IElement object to search within.
- recursive : bool, optional
Specifies if search should be recursive. If set to False, only direct children of the element will be searched. By default True.
- limit : int, optional
Specifies maximum number of elements to return. By default None, all found elements are returned.
Returns
- list[IElement]
List of IElement objects matching selector. If none found, the list is empty.
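The meaning of the recursive and limit parameters can be sketched with the standard library. The example below is not soupsavvy code: it uses xml.etree.ElementTree as a stand-in for IElement, matches on tag name instead of a real selector, and find_all_by_name is a hypothetical helper.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from itertools import islice


def find_all_by_name(
    element: ET.Element,
    name: str,
    recursive: bool = True,
    limit: int | None = None,
) -> list[ET.Element]:
    """Collect descendants (or only direct children) whose tag equals name."""
    if recursive:
        # iter() also yields the starting element itself, so it is skipped here.
        candidates = (node for node in element.iter(name) if node is not element)
    else:
        # Iterating an Element yields its direct children only.
        candidates = (child for child in element if child.tag == name)
    # islice with a stop of None consumes the whole iterator.
    return list(islice(candidates, limit))


root = ET.fromstring(
    "<div><p>one</p><section><p>two</p></section><p>three</p></div>"
)

all_found = [p.text for p in find_all_by_name(root, "p")]
children_only = [p.text for p in find_all_by_name(root, "p", recursive=False)]
first_two = [p.text for p in find_all_by_name(root, "p", limit=2)]
nothing = find_all_by_name(root, "span")
```

In this sketch the recursive search returns matches in document order, the non-recursive search skips the paragraph nested inside section, and a search with no matches returns an empty list rather than None.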