implementation.lxml

Module with the lxml implementation of IElement. The LXMLElement class is an adapter that makes an lxml tree compatible with the IElement interface and usable across the library.

class LXMLElement(node: N, *args, **kwargs)

Bases: IElement[_Element]

Implementation of IElement for an lxml tree. An adapter for lxml objects that makes them usable across the library.

Example

>>> from soupsavvy.implementation.lxml import LXMLElement
>>> from lxml.etree import fromstring
>>> node = fromstring("<html><body><div>example</div></body></html>")
>>> element = LXMLElement(node)

find_all(name: str | None = None, attrs: dict[str, str | Pattern[str]] | None = None, recursive: bool = True, limit: int | None = None) -> list[Self]

Finds all elements that match specified element name and attributes.

Parameters

name : str, optional

Name of the element to search for. If None, matches all elements.

attrs : dict[str, str | Pattern[str]], optional

Dictionary of attributes to match. Supports exact matches and regex patterns.

recursive : bool, optional

If True, searches recursively through all descendants. If False, searches only direct children.

limit : int, optional

Maximum number of elements to return. If None, returns all matching elements.

Returns

list[Self]

List of elements that match the criteria, in depth-first order.
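The matching rules above can be illustrated without the library. The following is a minimal sketch, not soupsavvy's implementation: `find_all_sketch` is a hypothetical helper, and it runs on the standard library's `xml.etree.ElementTree`, whose `fromstring`, `iter` and `get` calls are also available on lxml elements. It assumes that regex patterns are applied with `search` and that a plain string must equal the whole attribute value; the library may differ on both points.

```python
import re
from itertools import islice
from xml.etree.ElementTree import fromstring


def find_all_sketch(node, name=None, attrs=None, recursive=True, limit=None):
    """Hypothetical stand-in that mimics the documented find_all semantics."""
    attrs = attrs or {}

    def matches(el):
        if name is not None and el.tag != name:
            return False
        for key, expected in attrs.items():
            actual = el.get(key)
            if actual is None:
                return False
            if isinstance(expected, re.Pattern):
                # Assumption: patterns are tested with search().
                if not expected.search(actual):
                    return False
            elif actual != expected:
                return False
        return True

    # node.iter() yields the node itself first, then descendants depth-first;
    # iterating the node directly yields only its direct children.
    candidates = islice(node.iter(), 1, None) if recursive else iter(node)
    return list(islice(filter(matches, candidates), limit))


root = fromstring(
    '<div><p class="intro">a</p><span><p class="note-1">b</p></span><p>c</p></div>'
)
all_p = find_all_sketch(root, "p")
direct_p = find_all_sketch(root, "p", recursive=False)
noted = find_all_sketch(root, attrs={"class": re.compile(r"^note")})
first_p = find_all_sketch(root, "p", limit=1)
```

Here `recursive=False` skips the paragraph nested inside `span`, and `limit=1` stops after the first match in depth-first order.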

find_subsequent_siblings(limit: int | None = None) -> list[Self]

Finds siblings that follow this node in the document structure.

Parameters

limit : int, optional

Maximum number of sibling nodes to return. If None, returns all siblings.

Returns

list[Self]

List of subsequent sibling elements, in document order.
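As an illustration of the behaviour described above, the sketch below collects the siblings that follow a node. It is an assumption-laden stand-in, not the library's code: `subsequent_siblings_sketch` and `parent_of` are hypothetical names, and the standard library's `xml.etree.ElementTree` is used, which keeps no parent links. On lxml elements, `getparent()` and `itersiblings()` make the parent map unnecessary.

```python
from itertools import islice
from xml.etree.ElementTree import fromstring


def subsequent_siblings_sketch(node, parent_of, limit=None):
    """Hypothetical helper: siblings that come after `node`, in document order."""
    parent = parent_of.get(node)
    if parent is None:  # the root element has no siblings
        return []
    siblings = list(parent)
    # Locate the node by identity, then keep everything after it.
    position = next(i for i, el in enumerate(siblings) if el is node)
    return list(islice(siblings[position + 1:], limit))


root = fromstring("<ul><li>a</li><li>b</li><li>c</li><li>d</li></ul>")
# Build a child -> parent map once, since ElementTree has no getparent().
parent_of = {child: parent for parent in root.iter() for child in parent}

second = root[1]
following = subsequent_siblings_sketch(second, parent_of)
limited = subsequent_siblings_sketch(second, parent_of, limit=1)
```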

find_ancestors(limit: int | None = None) -> list[Self]

Finds all ancestor nodes up to the root of the document.

Parameters

limit : int, optional

Maximum number of ancestors to return, starting from the closest ancestor. If None, returns all ancestors.

Returns

list[Self]

List of ancestor nodes, from nearest to root.
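The nearest-to-root ordering and the effect of `limit` can be sketched as follows. This is a hedged illustration rather than the library's implementation: `ancestors_sketch` and `parent_of` are hypothetical names, and the standard library's `xml.etree.ElementTree` stands in for lxml, where `iterancestors()` yields the same sequence directly.

```python
from xml.etree.ElementTree import fromstring


def ancestors_sketch(node, parent_of, limit=None):
    """Hypothetical helper: walk parent links from the closest ancestor to the root."""
    found = []
    current = parent_of.get(node)
    while current is not None and (limit is None or len(found) < limit):
        found.append(current)
        current = parent_of.get(current)
    return found


root = fromstring("<html><body><div><span>x</span></div></body></html>")
# Child -> parent map, needed because ElementTree elements do not know their parent.
parent_of = {child: parent for parent in root.iter() for child in parent}

span = root.find(".//span")
all_ancestors = [el.tag for el in ancestors_sketch(span, parent_of)]
nearest_two = [el.tag for el in ancestors_sketch(span, parent_of, limit=2)]
```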

get_attribute(name: str) -> str | None

Retrieves the value of a specified attribute for this node.

Parameters

name : str

Name of the attribute.

Returns

Optional[str]

The attribute value as a string, or None if the attribute does not exist.

Notes

For dynamic attributes (e.g., in browser contexts), the returned value reflects the current state of the element.
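The string-or-None contract matches what `Element.get` returns on an element tree. The snippet below shows that contract with the standard library's `xml.etree.ElementTree`; lxml elements expose the same `get` call. Whether LXMLElement delegates to it is an assumption.

```python
from xml.etree.ElementTree import fromstring

link = fromstring('<a href="/docs" class="nav">Docs</a>')

href = link.get("href")    # attribute present: its value as a string
title = link.get("title")  # attribute absent: None
```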

css(selector: str) -> CSSSelectApi

Returns a SelectionApi for CSS-based selection.

Parameters

selector : str

The CSS selector to apply.

Returns

SelectionApi

Initialized SelectionApi instance for CSS selection.

xpath(selector) -> LXMLXpathApi

Returns a SelectionApi for XPath-based selection.

Parameters

selector : Any

The XPath selector to apply.

Returns

SelectionApi

Initialized SelectionApi instance for XPath selection.

property children: Iterable[Self]

Returns an iterable of the direct child elements of this node.

Notes

Only tag elements are included; text and comment nodes are excluded.

Returns

Iterable[Self]

Iterable of direct child nodes, in document order.

property descendants: Iterable[Self]

Returns an iterable of all descendant nodes of this node.

Notes

Only tag elements are included; text and comment nodes are excluded. Nodes are returned in depth-first order.

Returns

Iterable[Self]

Iterable of all descendant nodes.
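The difference between direct children and depth-first descendants can be seen on a small tree. This sketch uses the standard library's `xml.etree.ElementTree` as a stand-in and is not the library's code. With lxml, comments and processing instructions also show up during iteration, so an adapter would need to filter them out to get the tag-only behaviour described in the notes.

```python
from xml.etree.ElementTree import fromstring

root = fromstring("<div><p>one<b>bold</b></p><span>two</span></div>")

# Iterating an element yields its direct children in document order.
children = [el.tag for el in root]

# iter() walks the subtree depth-first and starts with the element itself,
# so the first item is dropped to keep descendants only.
descendants = [el.tag for el in root.iter()][1:]
```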

property parent: Self | None

Returns the immediate parent node of this element, if it exists.

Returns

Optional[Self]

The parent element, or None if this is the root node.

property name: str

Returns the tag name of this element.

property text: str

Gets the combined text content of this element.

Notes

Concatenates all text nodes within this element. The format may vary slightly across implementations depending on handling of whitespace or nested elements.

Returns

str

Text content of this element, or an empty string if none is found.
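One common way to concatenate every text node inside an element is to join the output of `itertext()`, which exists on both standard library and lxml elements. The sketch below assumes that approach for illustration; the library's exact whitespace handling may differ, as the notes above point out.

```python
from xml.etree.ElementTree import fromstring

node = fromstring("<div>Hello <b>big</b> world</div>")
# itertext() yields the text and tail strings of the subtree in document order.
text = "".join(node.itertext())

# An element without any text yields nothing, so the result is an empty string.
empty = "".join(fromstring("<br/>").itertext())
```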