implementation.lxml
Module with the lxml implementation of IElement. The LXMLElement class is an adapter that makes an lxml tree compatible with the IElement interface and usable across the library.
- class LXMLElement(node: N, *args, **kwargs)
Bases: IElement[_Element]
Implementation of IElement for lxml trees. Adapter for lxml objects that makes them usable across the library.
Example
>>> from soupsavvy.implementation.lxml import LXMLElement
>>> from lxml.etree import fromstring
>>> node = fromstring("<html><body><div>example</div></body></html>")
>>> element = LXMLElement(node)
- find_all(name: str | None = None, attrs: dict[str, str | Pattern[str]] | None = None, recursive: bool = True, limit: int | None = None) -> list[Self]
Finds all elements that match the specified element name and attributes.
Parameters
- name : str, optional
Name of the element to search for. If None, matches all elements.
- attrs : dict[str, str | Pattern[str]], optional
Dictionary of attributes to match. Each value may be an exact string or a compiled regex pattern.
- recursive : bool, optional
If True, searches recursively through all descendants. If False, searches only direct children.
- limit : int, optional
Maximum number of elements to return. If None, returns all matching elements.
Returns
- list[Self]
List of elements that match the criteria, in depth-first order.
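A minimal sketch of the matching rules described above, written directly against `lxml.etree`. The helper names `find_all_sketch` and `_attrs_match` are hypothetical, and testing regex patterns with `re.search` (rather than a full match) is an assumption; the adapter's real matching may differ.

```python
import re

from lxml import etree


def _attrs_match(el, attrs):
    """Hypothetical helper: True if every entry in `attrs` matches `el`."""
    for key, expected in attrs.items():
        value = el.get(key)
        if value is None:  # attribute is missing entirely
            return False
        if isinstance(expected, re.Pattern):
            # Assumption: patterns are tested with re.search.
            if expected.search(value) is None:
                return False
        elif value != expected:  # plain strings must match exactly
            return False
    return True


def find_all_sketch(node, name=None, attrs=None, recursive=True, limit=None):
    """Hypothetical re-implementation of find_all on a raw lxml element."""
    attrs = attrs or {}
    # iterdescendants() walks depth-first in document order; iterchildren()
    # yields direct children only. Neither includes `node` itself.
    candidates = node.iterdescendants() if recursive else node.iterchildren()
    results = []
    for el in candidates:
        if limit is not None and len(results) >= limit:
            break
        if not isinstance(el.tag, str):  # skip comments and processing instructions
            continue
        if name is not None and el.tag != name:
            continue
        if _attrs_match(el, attrs):
            results.append(el)
    return results


root = etree.fromstring(
    '<div><p class="a">x</p><span><p class="ab">y</p></span><p>z</p></div>'
)
starts_with_a = {"class": re.compile(r"^a")}
print([p.text for p in find_all_sketch(root, "p", starts_with_a)])  # ['x', 'y']
print([p.text for p in find_all_sketch(root, "p", starts_with_a, recursive=False)])  # ['x']
print([p.text for p in find_all_sketch(root, "p", limit=2)])  # ['x', 'y']
```

The nested `<p class="ab">` is only reachable when `recursive=True`, which is why the non-recursive call returns a single element.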
- find_subsequent_siblings(limit: int | None = None) -> list[Self]
Finds siblings that follow this node in the document structure.
Parameters
- limit : int, optional
Maximum number of sibling elements to return. If None, returns all subsequent siblings.
Returns
- list[Self]
List of subsequent sibling elements, in document order.
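Assuming the adapter builds on lxml's own sibling iterator, the behavior can be sketched with `_Element.itersiblings()`, which yields following siblings in document order by default. The function name `find_subsequent_siblings_sketch` is hypothetical.

```python
from itertools import islice

from lxml import etree


def find_subsequent_siblings_sketch(node, limit=None):
    """Hypothetical equivalent of find_subsequent_siblings on a raw lxml element."""
    # Comments and processing instructions have a non-string tag, so skip them.
    siblings = (s for s in node.itersiblings() if isinstance(s.tag, str))
    # islice(..., None) consumes everything, matching limit=None.
    return list(islice(siblings, limit))


ul = etree.fromstring("<ul><li>1</li><li>2</li><!-- note --><li>3</li></ul>")
first = ul[0]
print([li.text for li in find_subsequent_siblings_sketch(first)])  # ['2', '3']
print([li.text for li in find_subsequent_siblings_sketch(first, limit=1)])  # ['2']
```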
- find_ancestors(limit: int | None = None) -> list[Self]
Finds all ancestor nodes up to the root of the document.
Parameters
- limit : int, optional
Maximum number of ancestors to return, starting from the closest ancestor. If None, returns all ancestors.
Returns
- list[Self]
List of ancestor nodes, from nearest to root.
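A short sketch of the nearest-to-root ordering, assuming the adapter relies on lxml's `_Element.iterancestors()`, which starts at the parent and walks up to the root. The name `find_ancestors_sketch` is hypothetical.

```python
from itertools import islice

from lxml import etree


def find_ancestors_sketch(node, limit=None):
    """Hypothetical equivalent of find_ancestors on a raw lxml element."""
    # iterancestors() yields the parent first, then each ancestor up to the root.
    return list(islice(node.iterancestors(), limit))


root = etree.fromstring("<html><body><div><p>x</p></div></body></html>")
p = root.find(".//p")
print([a.tag for a in find_ancestors_sketch(p)])  # ['div', 'body', 'html']
print([a.tag for a in find_ancestors_sketch(p, limit=2)])  # ['div', 'body']
```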
- get_attribute(name: str) -> str | None
Retrieves the value of a specified attribute for this node.
Parameters
- name : str
Name of the attribute.
Returns
- Optional[str]
The attribute value as a string, or None if the attribute does not exist.
Notes
For dynamic attributes (e.g., in browser contexts), the returned value reflects the current state of the element.
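For an lxml tree, a plausible sketch (assuming the adapter delegates to `_Element.get()`) shows the three cases: a present attribute, a multi-valued attribute such as `class` returned as one string, and a missing attribute returned as None. Because lxml trees are mutable in memory, "current state" here means whatever the tree holds at call time, including changes made with `.set()`. The name `get_attribute_sketch` is hypothetical.

```python
from lxml import etree


def get_attribute_sketch(node, name):
    """Hypothetical equivalent of get_attribute on a raw lxml element."""
    # _Element.get() returns None when the attribute is absent.
    return node.get(name)


link = etree.fromstring('<a href="/home" class="nav active">Home</a>')
print(get_attribute_sketch(link, "href"))   # /home
print(get_attribute_sketch(link, "class"))  # nav active (a single string, not a list)
print(get_attribute_sketch(link, "id"))     # None

link.set("id", "main")  # modify the in-memory tree
print(get_attribute_sketch(link, "id"))     # main
```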
- css(selector: str) -> CSSSelectApi
Returns a SelectionApi for CSS-based selection.
Parameters
- selector : Any
The CSS selector to apply.
Returns
- SelectionApi
Initialized SelectionApi instance for CSS selection.
- xpath(selector) -> LXMLXpathApi
Returns a SelectionApi for XPath-based selection.
Parameters
- selector : Any
The XPath selector to apply.
Returns
- SelectionApi
Initialized SelectionApi instance for XPath selection.
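lxml's `_Element.xpath()` returns a list only for node-set expressions; expressions such as `count(...)` return a float, and node-sets can contain text results instead of elements. The sketch below assumes, without confirmation from this page, that an element-oriented selection API keeps only element results. The helper `xpath_elements_sketch` is hypothetical.

```python
from lxml import etree


def xpath_elements_sketch(node, selector):
    """Hypothetical helper: evaluate XPath relative to `node`, keep only elements."""
    result = node.xpath(selector)
    # Non-node-set expressions return a float, bool or string instead of a list.
    if not isinstance(result, list):
        return []
    # Node-sets may also hold text() results or attribute values (str subclasses),
    # and comments, whose tag is not a string.
    return [r for r in result if isinstance(r, etree._Element) and isinstance(r.tag, str)]


root = etree.fromstring("<div><p>a</p><p>b</p></div>")
print(len(xpath_elements_sketch(root, "./p")))    # 2
print(xpath_elements_sketch(root, "./p/text()"))  # []
print(root.xpath("count(./p)"))                   # 2.0
print(xpath_elements_sketch(root, "count(./p)"))  # []
```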
- property children: Iterable[Self]
Returns an iterable of the direct child elements of this node.
Notes
Only tag elements are included; text and comment nodes are excluded.
Returns
- Iterable[Self]
Iterable of direct child nodes, in document order.
- property descendants: Iterable[Self]
Returns an iterable of all descendant nodes of this node.
Notes
Only tag elements are included; text and comment nodes are excluded. Nodes are returned in depth-first order.
Returns
- Iterable[Self]
Iterable of all descendant nodes.
- property parent: Self | None
Returns the immediate parent node of this element, if it exists.
Returns
- Optional[Self]
The parent element, or None if this is the root node.
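The three navigation properties above map naturally onto lxml primitives. This sketch assumes the adapter uses `iterchildren()`, `iterdescendants()` and `getparent()` and filters out comments and processing instructions, as the Notes describe; it is an illustration, not the library's code.

```python
from lxml import etree

root = etree.fromstring("<div>intro<!-- c --><p><b>x</b></p><span/></div>")

# children: direct child elements only. The text "intro" lives in root.text,
# not in the child list, and the comment is dropped by its non-string tag.
children = [el.tag for el in root.iterchildren() if isinstance(el.tag, str)]
print(children)  # ['p', 'span']

# descendants: every nested element, depth-first in document order.
descendants = [el.tag for el in root.iterdescendants() if isinstance(el.tag, str)]
print(descendants)  # ['p', 'b', 'span']

# parent: getparent() returns None for the root element.
b = root.find(".//b")
print(b.getparent().tag)  # p
print(root.getparent())   # None
```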
- property name: str
Returns the tag name of this element.
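In lxml the tag name is stored in `_Element.tag`. For namespaced documents that value uses Clark notation (`{namespace-uri}local-name`), and `etree.QName` can split it. Whether `LXMLElement.name` returns the raw tag or only the local name is not stated here, so treat this as background on the underlying lxml behavior rather than a description of the adapter.

```python
from lxml import etree

plain = etree.fromstring("<div/>")
print(plain.tag)  # div

svg = etree.fromstring('<svg xmlns="http://www.w3.org/2000/svg"><rect/></svg>')
print(svg.tag)                     # {http://www.w3.org/2000/svg}svg
print(etree.QName(svg).localname)  # svg
```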
- property text: str
Gets the combined text content of this element.
Notes
Concatenates all text nodes within this element. The exact output may vary slightly across implementations, depending on how whitespace and nested elements are handled.
Returns
- str
Text content of this element, or an empty string if none is found.
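A minimal sketch of the concatenation, assuming the adapter joins the strings produced by lxml's `_Element.itertext()`. It also shows the whitespace point from the Notes: indentation from a pretty-printed source is kept verbatim. The name `text_sketch` is hypothetical.

```python
from lxml import etree


def text_sketch(node):
    """Hypothetical equivalent of the text property on a raw lxml element."""
    # itertext() yields the element's text, then the text and tail of each
    # nested element, in document order.
    return "".join(node.itertext())


div = etree.fromstring("<div>Hello <b>world</b>!</div>")
print(text_sketch(div))  # Hello world!

empty = etree.fromstring("<br/>")
print(repr(text_sketch(empty)))  # ''

pretty = etree.fromstring("<div>\n  <p>a</p>\n</div>")
print(repr(text_sketch(pretty)))  # '\n  a\n'
```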