soupsavvy package

Module contents

soupsavvy is a flexible search engine for BeautifulSoup, designed to provide more powerful search capabilities that make complex searches simple and web scraping tasks more efficient and manageable.

soupsavvy introduces the concept of a Selector, a declarative search procedure designed with simple and readable syntax. It encapsulates search logic, making it reusable across different scenarios.

The package offers various types of selectors that can be easily combined to perform more complex searches.
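
Selectors are combined through Python operators (&, |, ~, ^ and the combinator operators documented below). To illustrate the idea only, here is a minimal stdlib sketch of predicate objects that overload the boolean operators. `Node`, `Pred`, `tag` and `has_class` are hypothetical names, this is not soupsavvy's implementation, and treating ‘class’ as whitespace-separated tokens is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Node:
    """Hypothetical stand-in for an element: a tag name plus attributes."""
    name: str
    attrs: Dict[str, str] = field(default_factory=dict)


@dataclass
class Pred:
    """Hypothetical predicate wrapper; not soupsavvy's SoupSelector."""
    test: Callable[[Node], bool]

    def __call__(self, node: Node) -> bool:
        return self.test(node)

    def __and__(self, other: "Pred") -> "Pred":  # both must match
        return Pred(lambda n: self(n) and other(n))

    def __or__(self, other: "Pred") -> "Pred":  # at least one must match
        return Pred(lambda n: self(n) or other(n))

    def __invert__(self) -> "Pred":  # negation
        return Pred(lambda n: not self(n))

    def __xor__(self, other: "Pred") -> "Pred":  # exactly one of the two matches
        return Pred(lambda n: self(n) != other(n))


def tag(name: str) -> Pred:
    return Pred(lambda n: n.name == name)


def has_class(value: str) -> Pred:
    # Assumption: 'class' holds whitespace-separated tokens.
    return Pred(lambda n: value in n.attrs.get("class", "").split())


nodes = [Node("div", {"class": "widget"}), Node("span", {"class": "widget"}),
         Node("div", {"class": "menu"}), Node("a")]
selector = tag("div") & ~has_class("menu")
print([(n.name, n.attrs.get("class")) for n in nodes if selector(n)])  # [('div', 'widget')]
```

Each operator returns a new predicate, so compound searches stay declarative and reusable, which is the property the selector concept relies on.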

class UniversalSelector

Bases: SoupSelector, SelectableCSS

Selector representing a wildcard pattern that matches all elements in the HTML page.

Example

>>> UniversalSelector()

The CSS counterpart can be represented as:

Example

>>> *

It can be retrieved with the css property.

Example

>>> UniversalSelector().css
"*"

Notes

For more information on universal selector, see:

https://developer.mozilla.org/en-US/docs/Web/CSS/Universal_selectors

find_all(tag: IElement, recursive: bool = True, limit: int | None = None) -> list[IElement]

Finds all elements matching the selector within the provided IElement.

Parameters

tag: IElement

Any IElement object to search within.

recursive: bool, optional

Specifies whether the search should be recursive. If False, only direct children of the element are searched. By default True.

limit: int, optional

Maximum number of elements to return. By default None, meaning all found elements are returned.

Returns

list[IElement]

List of IElement objects matching the selector. If none are found, the list is empty.
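
The recursive and limit parameters can be illustrated with a small stdlib sketch that mimics UniversalSelector.find_all on an xml.etree.ElementTree tree instead of an IElement. `find_all_sketch` is a hypothetical helper; it assumes results come in document order and that the searched element itself is not included.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional


def find_all_sketch(root: ET.Element, recursive: bool = True,
                    limit: Optional[int] = None) -> List[ET.Element]:
    # recursive=True: all descendants; root.iter() yields root first, so skip it.
    # recursive=False: only the direct children of root.
    candidates = list(root.iter())[1:] if recursive else list(root)
    # limit=None returns everything; otherwise keep the first `limit` matches.
    return candidates if limit is None else candidates[:limit]


root = ET.fromstring("<body><div><p>a</p></div><span>b</span></body>")
print([e.tag for e in find_all_sketch(root)])                   # ['div', 'p', 'span']
print([e.tag for e in find_all_sketch(root, recursive=False)])  # ['div', 'span']
print([e.tag for e in find_all_sketch(root, limit=1)])          # ['div']
```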

property css: str

Returns the wildcard CSS selector matching all elements in the markup.

__init__() -> None

class AttributeSelector(name: str, value: Pattern[str] | str | None = None)

Bases: SoupSelector

Selector for finding elements based on their attribute value. Counterpart of CSS attribute selectors that extends their capability with regex pattern matching.

Example

>>> AttributeSelector(name="role", value="widget")

matches all elements that have a ‘role’ attribute with value “widget”.

Example

>>> <div role="widget">Hello World</div> ✔️
>>> <div class="menu">Hello World</div> ❌
>>> <div role="menu">Hello World</div> ❌

The CSS counterpart can be represented as:

Example

>>> [role="widget"]

When a regex pattern is used, re.search is applied to match the attribute value.

Example

>>> AttributeSelector(name="href", value=re.compile(r"wikipedia"))

Parameters

name: str

HTML element attribute name, e.g. “class”, “href”.

value: str | Pattern, optional

Value of the attribute to match. By default None; if not provided, a default pattern matching any sequence of characters is used, so the attribute value is not constrained.
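
A minimal sketch of the matching rules above, with attributes modeled as a plain dict of strings rather than an IElement. `attribute_matches` is a hypothetical helper. Treating an element that lacks the attribute as a non-match is an assumption, and multi-valued attributes such as ‘class’ may be handled differently by the library.

```python
from __future__ import annotations

import re


def attribute_matches(attrs: dict[str, str], name: str,
                      value: re.Pattern[str] | str | None = None) -> bool:
    """Hypothetical model of the matching rules described for AttributeSelector."""
    if name not in attrs:
        return False  # an element without the attribute never matches
    actual = attrs[name]
    if value is None:
        return True  # no value given: any attribute value is accepted
    if isinstance(value, str):
        return actual == value  # plain string: exact comparison
    return value.search(actual) is not None  # compiled pattern: re.search anywhere


print(attribute_matches({"role": "widget"}, "role", "widget"))  # True
print(attribute_matches({"role": "menu"}, "role", "widget"))    # False
print(attribute_matches({"href": "https://en.wikipedia.org/wiki/HTML"},
                        "href", re.compile(r"wikipedia")))      # True
```

Note the asymmetry: a plain string must equal the whole value, while a compiled pattern only needs to be found somewhere in it.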

Notes

For more information about attribute selectors, see:

https://developer.mozilla.org/en-US/docs/Web/CSS/Attribute_selectors

name: str
value: Pattern[str] | str | None = None
find_all(tag: IElement, recursive: bool = True, limit: int | None = None) -> list[IElement]

Finds all elements matching the selector within the provided IElement.

Parameters

tag: IElement

Any IElement object to search within.

recursive: bool, optional

Specifies whether the search should be recursive. If False, only direct children of the element are searched. By default True.

limit: int, optional

Maximum number of elements to return. By default None, meaning all found elements are returned.

Returns

list[IElement]

List of IElement objects matching the selector. If none are found, the list is empty.

__init__(name: str, value: Pattern[str] | str | None = None) -> None

class TypeSelector(name: str)

Bases: SoupSelector, SelectableCSS

Selector for finding elements based on their tag name (type). Counterpart of CSS type selectors.

Example

>>> TypeSelector("div")

matches all elements with the “div” tag name.

Example

>>> <div class="widget">Hello World</div> ✔️
>>> <a href="/shop">Hello World</a> ❌

The CSS counterpart can be represented as:

Example

>>> div

It can be retrieved with the css property.

Example

>>> TypeSelector("div").css
"div"

Parameters

name: str

Tag name of the element, e.g. “a”, “div”.

Notes

For more information about type selectors, see:

https://developer.mozilla.org/en-US/docs/Web/CSS/Type_selectors

name: str
find_all(tag: IElement, recursive: bool = True, limit: int | None = None) -> list[IElement]

Finds all elements matching the selector within the provided IElement.

Parameters

tag: IElement

Any IElement object to search within.

recursive: bool, optional

Specifies whether the search should be recursive. If False, only direct children of the element are searched. By default True.

limit: int, optional

Maximum number of elements to return. By default None, meaning all found elements are returned.

Returns

list[IElement]

List of IElement objects matching the selector. If none are found, the list is empty.

property css: str

Returns a string representing the CSS type selector.

__init__(name: str) -> None

class ExpressionSelector(f: Callable[[IElement], bool])

Bases: SoupSelector

Selector that matches elements based on a user-defined function (predicate) used as a filter for element objects.

Applies predicate to each element and returns those that satisfy the condition.

Parameters

f: Callable[[IElement], bool]

A user-defined function (predicate) that determines whether the element should be selected.

Examples

>>> selector = ExpressionSelector(lambda x: x.name not in {"a", "div"})
>>> selector.find(soup)

To perform operations on the underlying node, use the IElement.get() method or the IElement.node attribute.

Example

>>> selector = ExpressionSelector(lambda x: 'widget' in x.node.get('class', []))

The example above applies to a SoupElement object, which wraps bs4.Tag.

Notes

Any exceptions should be handled inside the provided function. If an exception is raised, it is propagated to the caller.
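
Because exceptions propagate, a predicate that indexes an attribute directly fails on the first element lacking it. The stdlib sketch below uses xml.etree.ElementTree elements in place of IElement; `filter_elements` and `has_widget_class` are hypothetical names. It contrasts a defensive predicate, which falls back to an empty string when the attribute is missing, with an unguarded one.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List


def filter_elements(root: ET.Element,
                    predicate: Callable[[ET.Element], bool]) -> List[ET.Element]:
    # Apply the predicate to every descendant of root. Exceptions raised by the
    # predicate are not caught here, so they reach the caller.
    return [el for el in root.iter() if el is not root and predicate(el)]


def has_widget_class(el: ET.Element) -> bool:
    # el.get() returns None for a missing attribute instead of raising KeyError.
    return "widget" in (el.get("class") or "").split()


root = ET.fromstring('<body><div class="widget menu"/><div/><a class="widget"/></body>')
print([el.tag for el in filter_elements(root, has_widget_class)])  # ['div', 'a']

# An unguarded predicate raises KeyError on the <div/> without a class attribute.
try:
    filter_elements(root, lambda el: "widget" in el.attrib["class"].split())
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: 'class'
```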

f: Callable[[IElement], bool]
find_all(tag: IElement, recursive: bool = True, limit: int | None = None) -> list[IElement]

Finds all elements matching the selector within the provided IElement.

Parameters

tag: IElement

Any IElement object to search within.

recursive: bool, optional

Specifies whether the search should be recursive. If False, only direct children of the element are searched. By default True.

limit: int, optional

Maximum number of elements to return. By default None, meaning all found elements are returned.

Returns

list[IElement]

List of IElement objects matching the selector. If none are found, the list is empty.

__init__(f: Callable[[IElement], bool]) -> None

class PatternSelector(pattern: Pattern[str] | str)

Bases: SoupSelector

Selector for finding elements based on text content pattern.

Example

>>> PatternSelector("Hello World")

matches all elements with the exact text content “Hello World”.

Example

>>> <div>Hello World</div> ✔️
>>> <div>Hello Python</div> ❌
>>> <div>Hello World 3</div> ❌

When a regex pattern is used, re.search is applied to match the element's text content.

Example

>>> PatternSelector(re.compile(r"[0-9]+"))

matches all elements with text content containing at least one digit.

Example

>>> <div>Hello World 123</div> ✔️
>>> <div>Hello World</div> ❌

Parameters

pattern: str | Pattern

Pattern to match against the element's text. Can be a string for an exact match or a compiled Pattern for more complex regular expressions.

Notes

An element does not match the pattern if it has any children, so only leaf nodes can be returned by PatternSelector find methods.
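
The matching rules above (exact string comparison, re.search for compiled patterns, leaf nodes only) can be sketched with the standard library. This sketch uses xml.etree.ElementTree instead of IElement and treats an element as a leaf when it has no child elements; `match_text` is a hypothetical helper, and whitespace handling by the library is not modeled.

```python
from __future__ import annotations

import re
import xml.etree.ElementTree as ET


def match_text(root: ET.Element, pattern: re.Pattern[str] | str) -> list[ET.Element]:
    results = []
    for el in root.iter():
        if len(el) > 0:
            continue  # element has child elements: only leaf nodes can match
        text = el.text or ""
        if isinstance(pattern, str):
            matched = text == pattern  # plain string: exact match
        else:
            matched = pattern.search(text) is not None  # regex: match anywhere
        if matched:
            results.append(el)
    return results


root = ET.fromstring(
    "<body><div>Hello World</div><div>Hello Python</div>"
    "<div>Hello World 3</div><div>123<span>x</span></div></body>"
)
print([el.text for el in match_text(root, "Hello World")])           # ['Hello World']
print([el.text for el in match_text(root, re.compile(r"[0-9]+"))])  # ['Hello World 3']
```

The `<div>123<span>x</span></div>` element contains digits but is skipped because it has a child element.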

pattern: Pattern[str] | str
find_all(tag: IElement, recursive: bool = True, limit: int | None = None) -> list[IElement]

Finds all elements matching the selector within the provided IElement.

Parameters

tag: IElement

Any IElement object to search within.

recursive: bool, optional

Specifies whether the search should be recursive. If False, only direct children of the element are searched. By default True.

limit: int, optional

Maximum number of elements to return. By default None, meaning all found elements are returned.

Returns

list[IElement]

List of IElement objects matching the selector. If none are found, the list is empty.

__init__(pattern: Pattern[str] | str) -> None

class XPathSelector(xpath: Any)

Bases: SoupSelector

Selector for finding elements based on XPath expressions.

Examples

>>> selector = XPathSelector("//p[@class='menu']")
>>> selector.find(soup)

Examples

>>> from lxml.etree import XPath
>>> selector = XPathSelector(XPath("//p[@class='menu']", smart_strings=False))
>>> selector.find(soup)

Expressions must target elements, not attributes or text content.

Examples

>>> selector = XPathSelector("//div//@href")
>>> selector.find(soup)
None

Notes

Equality checks compare only the XPath expression, as the lxml XPath object does not implement a more specific __eq__ method.

__init__(xpath: Any) -> None

Initializes XPathSelector with a given XPath expression.

Parameters

xpath: str | lxml.etree.XPath

String representation of an XPath expression or a compiled XPath object. It must target elements, not attributes or text content.

Raises

InvalidXPathSelector

If the provided XPath string cannot be compiled into an XPath object.

find_all(tag: IElement, recursive: bool = True, limit: int | None = None) -> list[IElement]

Finds all elements matching the selector within the provided IElement.

Parameters

tag: IElement

Any IElement object to search within.

recursive: bool, optional

Specifies whether the search should be recursive. If False, only direct children of the element are searched. By default True.

limit: int, optional

Maximum number of elements to return. By default None, meaning all found elements are returned.

Returns

list[IElement]

List of IElement objects matching the selector. If none are found, the list is empty.

class SelectorList(selector1: SoupSelector, selector2: SoupSelector, /, *selectors: SoupSelector)

Bases: CompositeSoupSelector

Counterpart of the CSS selector list. At least one selector from the list must match an element for it to be included.

Example

>>> SelectorList(TypeSelector("a"), AttributeSelector(name="class", value="widget"))

matches all elements that have the “a” tag name OR the ‘class’ attribute “widget”.

Example

>>> <a>Hello World</a> ✔️
>>> <div class="widget">Hello World</div> ✔️
>>> <div>Hello Python</div> ❌

The object can also be created by using the pipe operator | on SoupSelector objects.

Example

>>> TypeSelector("a") | ClassSelector("widget")

The CSS counterpart can be represented as:

Example

>>> a, .widget
>>> :is(a, .widget)

Notes

For more information on selector list, see:

https://developer.mozilla.org/en-US/docs/Web/CSS/Selector_list

__init__(selector1: SoupSelector, selector2: SoupSelector, /, *selectors: SoupSelector) -> None

Initializes the SelectorList object with the provided positional arguments.

Parameters

selectors: SoupSelector

At least two SoupSelector objects to match, accepted as positional arguments.

Raises

NotSoupSelectorException

If any of the provided parameters is not an instance of SoupSelector.

find_all(tag: IElement, recursive: bool = True, limit: int | None = None) -> list[IElement]

Finds all elements matching the selector within the provided IElement.

Parameters

tag: IElement

Any IElement object to search within.

recursive: bool, optional

Specifies whether the search should be recursive. If False, only direct children of the element are searched. By default True.

limit: int, optional

Maximum number of elements to return. By default None, meaning all found elements are returned.

Returns

list[IElement]

List of IElement objects matching the selector. If none are found, the list is empty.

class DescendantCombinator(selector1: SoupSelector, selector2: SoupSelector, /, *selectors: SoupSelector)

Bases: BaseCombinator

Counterpart of the CSS descendant combinator. Represents a relationship between selectors where every next matching element is a descendant of the previous one.

Example

>>> DescendantCombinator(TypeSelector("div"), ClassSelector("widget"))

matches all elements with the ‘widget’ class that are descendants of a ‘div’ element.

Example

>>> <div><a class="widget"></a></div> ✔️
>>> <div><div><a class="widget"></a></div></div> ✔️
>>> <div><a id="widget"></a></div> ❌
>>> <span><a class="widget"></a></span> ❌
>>> <a class="widget"></a> ❌

The object can also be created by using the right shift operator >> on SoupSelector objects.

Example

>>> TypeSelector("div") >> ClassSelector("widget")

The CSS counterpart can be represented as:

Example

>>> div .widget

Notes

For more information on the descendant combinator, see:

https://developer.mozilla.org/en-US/docs/Web/CSS/Descendant_combinator
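
A stdlib sketch of the descendant relationship, using xml.etree.ElementTree in place of IElement and plain predicates in place of selectors. `descendant`, `is_div` and `has_widget` are hypothetical helpers; the sketch builds a child-to-parent map (ElementTree has no parent pointers) and climbs from each candidate until some ancestor matches the first predicate.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List

Predicate = Callable[[ET.Element], bool]


def descendant(root: ET.Element, first: Predicate, second: Predicate) -> List[ET.Element]:
    # Map every element to its parent.
    parents = {child: parent for parent in root.iter() for child in parent}
    results = []
    for el in root.iter():
        if not second(el):
            continue
        node = parents.get(el)
        while node is not None:  # climb through every ancestor of el
            if first(node):
                results.append(el)
                break
            node = parents.get(node)
    return results


def is_div(el: ET.Element) -> bool:
    return el.tag == "div"


def has_widget(el: ET.Element) -> bool:
    return "widget" in (el.get("class") or "").split()


for html in ['<div><a class="widget"></a></div>',
             '<div><div><a class="widget"></a></div></div>',
             '<span><a class="widget"></a></span>']:
    root = ET.fromstring(f"<root>{html}</root>")
    print(html, bool(descendant(root, is_div, has_widget)))
```

The ancestor need not be the direct parent, which is what distinguishes this from the child combinator.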

class AndSelector(selector1: SoupSelector, selector2: SoupSelector, /, *selectors: SoupSelector)

Bases: CompositeSoupSelector

Selector representing an intersection of multiple selectors, where an element must be matched by all provided selectors. Counterpart of the CSS compound selector.

Example

>>> AndSelector(TypeSelector("div"), ClassSelector("widget"))

matches all elements that have the “div” tag name AND the ‘class’ attribute “widget”.

Example

>>> <div class="widget">Hello World</div> ✔️
>>> <span class="widget">Hello World</span> ❌
>>> <div class="menu">Hello World</div> ❌

The object can also be created by using the bitwise AND operator & on SoupSelector objects.

Example

>>> TypeSelector("div") & ClassSelector("widget")

The CSS counterpart can be represented as:

Example

>>> div.widget

Notes

For more information on compound selectors, see:

https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_selectors/Selector_structure#compound_selector

__init__(selector1: SoupSelector, selector2: SoupSelector, /, *selectors: SoupSelector) -> None

Initializes the AndSelector object with the provided positional arguments as selectors.

Parameters

selectors: SoupSelector

At least two SoupSelector objects to match, accepted as positional arguments.

Raises

NotSoupSelectorException

If any of the provided parameters is not an instance of SoupSelector.

find_all(tag: IElement, recursive: bool = True, limit: int | None = None) -> list[IElement]

Finds all elements matching the selector within the provided IElement.

Parameters

tag: IElement

Any IElement object to search within.

recursive: bool, optional

Specifies whether the search should be recursive. If False, only direct children of the element are searched. By default True.

limit: int, optional

Maximum number of elements to return. By default None, meaning all found elements are returned.

Returns

list[IElement]

List of IElement objects matching the selector. If none are found, the list is empty.

class NotSelector(selector: SoupSelector, /, *selectors: SoupSelector)

Bases: CompositeSoupSelector

Selector for finding elements that do not match provided selector(s). Counterpart of CSS :not() pseudo-class.

Example

>>> NotSelector(TypeSelector("div"))

matches all elements that do not have the “div” tag name.

Example

>>> <span class="widget">Hello World</span> ✔️
>>> <div class="menu">Hello World</div> ❌

The object can also be created by using the bitwise NOT operator ~ on a SoupSelector object.

Example

>>> ~TypeSelector("div")

This is equivalent to the first example.

Its CSS counterpart is the :not() pseudo-class.

Example

>>> :not(div)

Notes

For more information on :not() pseudo-class, see:

https://developer.mozilla.org/en-US/docs/Web/CSS/:not

__init__(selector: SoupSelector, /, *selectors: SoupSelector) -> None

Initializes the NotSelector object with the provided positional arguments as selectors.

Parameters

selectors: SoupSelector

At least one SoupSelector object whose matches are negated, accepted as positional arguments.

Raises

NotSoupSelectorException

If any of the provided parameters is not an instance of SoupSelector.

find_all(tag: IElement, recursive: bool = True, limit: int | None = None) -> list[IElement]

Finds all elements matching the selector within the provided IElement.

Parameters

tag: IElement

Any IElement object to search within.

recursive: bool, optional

Specifies whether the search should be recursive. If False, only direct children of the element are searched. By default True.

limit: int, optional

Maximum number of elements to return. By default None, meaning all found elements are returned.

Returns

list[IElement]

List of IElement objects matching the selector. If none are found, the list is empty.

class SelfSelector

Bases: SoupSelector

Selector matching only the element itself. It is a convenience component that can be used for compatibility.

Example

>>> SelfSelector()

always matches the tag that is passed to the find methods.

Notes

Can be used as the scope of a user-defined model when the element itself is the scope.

find_all(tag: IElement, recursive: bool = True, limit: int | None = None) -> list[IElement]

Finds all elements matching the selector within the provided IElement.

Parameters

tag: IElement

Any IElement object to search within.

recursive: bool, optional

Specifies whether the search should be recursive. If False, only direct children of the element are searched. By default True.

limit: int, optional

Maximum number of elements to return. By default None, meaning all found elements are returned.

Returns

list[IElement]

List of IElement objects matching the selector. If none are found, the list is empty.

class ChildCombinator(selector1: SoupSelector, selector2: SoupSelector, /, *selectors: SoupSelector)

Bases: BaseCombinator

Counterpart of the CSS child combinator. Represents a relationship between selectors where every next matching element is a direct child of the previous one.

Example

>>> ChildCombinator(TypeSelector("div"), TypeSelector("a"))

matches all ‘a’ elements that are direct children of ‘div’ elements.

Example

>>> <div class="widget"><a>Hello World</a></div> ✔️
>>> <div class="widget"><span></span><a>Hello World</a></div> ✔️
>>> <span class="widget"><a>Hello World</a></span> ❌
>>> <div class="menu"><span>Hello World</span></div> ❌

The object can also be created by using the greater-than operator > on SoupSelector objects.

Example

>>> TypeSelector("div") > TypeSelector("a")

This is equivalent to the first example.

The CSS counterpart can be represented as:

Example

>>> div > a

Notes

For more information on child combinator, see:

https://developer.mozilla.org/en-US/docs/Web/CSS/Child_combinator
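
For contrast with the descendant case, here is a stdlib sketch of the child relationship using xml.etree.ElementTree in place of IElement. `child` and `tag_is` are hypothetical helpers; only the direct parent of each candidate is checked.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List

Predicate = Callable[[ET.Element], bool]


def child(root: ET.Element, first: Predicate, second: Predicate) -> List[ET.Element]:
    # Map every element to its direct parent.
    parents = {c: p for p in root.iter() for c in p}
    # Keep elements matching `second` whose direct parent matches `first`.
    return [el for el in root.iter()
            if second(el) and el in parents and first(parents[el])]


def tag_is(name: str) -> Predicate:
    return lambda el: el.tag == name


root = ET.fromstring(
    '<root><div class="widget"><span></span><a>Hello World</a></div>'
    '<span class="widget"><a>Hello World</a></span></root>'
)
print([el.tag for el in child(root, tag_is("div"), tag_is("a"))])  # ['a']
```

Only the ‘a’ inside the ‘div’ is returned; the one inside the ‘span’ has the wrong parent.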

class NextSiblingCombinator(selector1: SoupSelector, selector2: SoupSelector, /, *selectors: SoupSelector)

Bases: BaseCombinator

Counterpart of the CSS next-sibling combinator. Represents a relationship between selectors where every next matching element is the sibling immediately following the previous one.

Example

>>> NextSiblingCombinator(TypeSelector("div"), TypeSelector("a"))

matches all ‘a’ elements that immediately follow ‘div’ elements, which means that both elements are children of the same parent element.

Example

>>> <div class="widget"></div><a>Hello World</a> ✔️
>>> <div class="widget"><a>Hello World</a></div> ❌
>>> <div class="widget"></div><span></span><a>Hello World</a> ❌

The object can also be created by using the plus operator + on SoupSelector objects.

Example

>>> TypeSelector("div") + TypeSelector("a")

This is equivalent to the first example.

The CSS counterpart can be represented as:

Example

>>> div + a

Notes

This is also known as the adjacent sibling combinator in CSS. For more information on the next-sibling combinator, see:

https://developer.mozilla.org/en-US/docs/Web/CSS/Next-sibling_combinator

class SubsequentSiblingCombinator(selector1: SoupSelector, selector2: SoupSelector, /, *selectors: SoupSelector)

Bases: BaseCombinator

Counterpart of the CSS subsequent-sibling combinator. Represents a relationship between selectors where every next matching element is a sibling following the previous one, but not necessarily immediately.

Example

>>> SubsequentSiblingCombinator(TypeSelector("div"), TypeSelector("a"))

matches all ‘a’ elements that follow ‘div’ elements.

Example

>>> <div class="widget"></div><a>Hello World</a> ✔️
>>> <div class="widget"></div><span></span><a>Hello World</a> ✔️
>>> <span class="widget"><a>Hello World</a></span> ❌
>>> <a>Hello World</a><div class="menu"></div> ❌

The object can also be created by using the multiplication operator * on SoupSelector objects.

Example

>>> TypeSelector("div") * TypeSelector("a")

The CSS counterpart can be represented as:

Example

>>> div ~ a

Notes

This combinator is also known as the general sibling combinator in CSS. For more information on the subsequent-sibling combinator, see:

https://developer.mozilla.org/en-US/docs/Web/CSS/Subsequent-sibling_combinator
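
The difference between the next-sibling and subsequent-sibling rules can be sketched side by side with the standard library, using xml.etree.ElementTree in place of IElement. `next_sibling`, `subsequent_sibling` and `tag_is` are hypothetical helpers; both only compare children of the same parent.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List

Predicate = Callable[[ET.Element], bool]


def next_sibling(root: ET.Element, first: Predicate, second: Predicate) -> List[ET.Element]:
    results = []
    for parent in root.iter():
        kids = list(parent)
        # Compare each child with the one immediately before it.
        for prev, el in zip(kids, kids[1:]):
            if first(prev) and second(el):
                results.append(el)
    return results


def subsequent_sibling(root: ET.Element, first: Predicate, second: Predicate) -> List[ET.Element]:
    results = []
    for parent in root.iter():
        seen_first = False
        for el in parent:
            # Check before updating the flag so an element is never its own sibling.
            if seen_first and second(el):
                results.append(el)
            if first(el):
                seen_first = True
    return results


def tag_is(name: str) -> Predicate:
    return lambda el: el.tag == name


root = ET.fromstring('<root><div class="widget"></div><span></span><a>Hello World</a></root>')
print(len(next_sibling(root, tag_is("div"), tag_is("a"))))        # 0
print(len(subsequent_sibling(root, tag_is("div"), tag_is("a"))))  # 1
```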

class AncestorCombinator(selector1: SoupSelector, selector2: SoupSelector, /, *selectors: SoupSelector)

Bases: BaseAncestorCombinator

Defines a relationship between selectors, where every next matching element is an ancestor of the previous one.

Example

>>> AncestorCombinator(TypeSelector("a"), TypeSelector("div"))

The given selector matches all ‘div’ elements that are ancestors of ‘a’ elements.

Example

>>> <div><span><a href="/shop"></a></span></div> ✔️
>>> <div><a href="/shop"></a></div> ✔️
>>> <div><span class="menu"></span></div> ❌
>>> <span><a class="menu"></a></span> ❌

The object can also be created by using the left shift operator << on SoupSelector objects.

Example

>>> TypeSelector("a") << TypeSelector("div")

Although this combinator does not have a direct counterpart in CSS, it can be represented as:

Example

>>> div:has(a)

class ParentCombinator(selector1: SoupSelector, selector2: SoupSelector, /, *selectors: SoupSelector)

Bases: BaseAncestorCombinator

Defines a relationship between selectors, where every next matching element is a parent of the previous one.

Example

>>> ParentCombinator(TypeSelector("a"), TypeSelector("div"))

The given selector matches all ‘div’ elements that are parents of ‘a’ elements.

Example

>>> <div><a href="/shop"></a></div> ✔️
>>> <div><span><a href="/shop"></a></span></div> ❌
>>> <span><a href="/shop"></a></span> ❌

The object can also be created by using the less-than operator < on SoupSelector objects.

Example

>>> TypeSelector("a") < TypeSelector("div")

Although this combinator does not have a direct counterpart in CSS, it can be represented as:

Example

>>> div:has(> a)
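
Both combinators return the outer element (here ‘div’), not the inner one. The stdlib sketch below models the two-selector forms with xml.etree.ElementTree in place of IElement; `ancestor`, `parent` and `tag_is` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List

Predicate = Callable[[ET.Element], bool]


def ancestor(root: ET.Element, first: Predicate, second: Predicate) -> List[ET.Element]:
    # Elements matching `second` that contain a descendant matching `first`.
    return [el for el in root.iter()
            if second(el) and any(first(d) for d in el.iter() if d is not el)]


def parent(root: ET.Element, first: Predicate, second: Predicate) -> List[ET.Element]:
    # Elements matching `second` with at least one direct child matching `first`.
    return [el for el in root.iter() if second(el) and any(first(c) for c in el)]


def tag_is(name: str) -> Predicate:
    return lambda el: el.tag == name


root = ET.fromstring('<root><div><span><a href="/shop"></a></span></div></root>')
print(len(ancestor(root, tag_is("a"), tag_is("div"))))  # 1
print(len(parent(root, tag_is("a"), tag_is("div"))))    # 0
```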

class HasSelector(selector: SoupSelector, /, *selectors: SoupSelector)

Bases: CompositeSoupSelector

Selector for finding elements that have at least one related element matching the provided relative selector(s).

Example

>>> HasSelector(TypeSelector("div"))

matches all elements that have any descendant with the “div” tag name. It uses the default combinator of relative selectors, which is the descendant combinator.

Example

>>> <span><div>Hello World</div></span> ✔️
>>> <span><a>Hello World</a></span> ❌

Other relative selectors can be used with the Anchor element.

Example

>>> HasSelector(Anchor > TypeSelector("div"))
>>> HasSelector(Anchor + TypeSelector("div"))

or by using RelativeSelector components directly:

Example

>>> HasSelector(RelativeChild(TypeSelector("div")))
>>> HasSelector(RelativeNextSibling(TypeSelector("div")))

Example

>>> <span><div>Hello World</div></span> ✔️
>>> <span><a><div>Hello World</div></a></span> ❌

In this case, HasSelector is anchored against each candidate element and matches only elements that have an element with the “div” tag name as a direct child.

This is the equivalent of the CSS :has() pseudo-class, which matches an element if any of the relative selectors passed as arguments match at least one element when anchored against it.

Example

>>> :has(div, a)
>>> :has(+ div, > a)

These examples translated to soupsavvy would be:

Example

>>> HasSelector(TypeSelector("div"), TypeSelector("a"))
>>> HasSelector(Anchor + TypeSelector("div"), Anchor > TypeSelector("a"))

Notes

Passing a RelativeDescendant selector into HasSelector is equivalent to using its selector directly, as the descendant combinator is the default.

Example

>>> HasSelector(RelativeDescendant(TypeSelector("div")))
>>> HasSelector(Anchor >> TypeSelector("div"))
>>> HasSelector(TypeSelector("div"))

All three of the above examples are equivalent.

For more information on :has() pseudo-class, see:

https://developer.mozilla.org/en-US/docs/Web/CSS/:has
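
To make the "any of the relative selectors" rule concrete, here is a stdlib sketch in which each relative selector is modeled as a (relation, predicate) pair evaluated from the anchor element. It uses xml.etree.ElementTree instead of IElement; `related`, `has` and `tag_is` are hypothetical names, only three relations are modeled, and this is not the library's Anchor or RelativeSelector API.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Tuple

Predicate = Callable[[ET.Element], bool]


def related(el: ET.Element, relation: str,
            parents: Dict[ET.Element, ET.Element]) -> List[ET.Element]:
    # Elements reachable from the anchor `el` through the given relation.
    if relation == "descendant":
        return [d for d in el.iter() if d is not el]
    if relation == "child":
        return list(el)
    if relation == "next_sibling":
        par = parents.get(el)
        if par is None:
            return []
        kids = list(par)
        i = kids.index(el)
        return kids[i + 1:i + 2]
    raise ValueError(f"unsupported relation: {relation}")


def has(root: ET.Element, *relatives: Tuple[str, Predicate]) -> List[ET.Element]:
    parents = {c: p for p in root.iter() for c in p}
    # An element matches if ANY relative selector finds ANY element from it.
    return [el for el in root.iter()
            if any(any(pred(r) for r in related(el, rel, parents))
                   for rel, pred in relatives)]


def tag_is(name: str) -> Predicate:
    return lambda el: el.tag == name


# Roughly `:has(+ div, > a)`: next sibling is a div, or a direct child is an a.
root = ET.fromstring("<body><p/><div/><section><a/></section></body>")
print([el.tag for el in has(root, ("next_sibling", tag_is("div")),
                               ("child", tag_is("a")))])  # ['p', 'section']
```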

__init__(selector: SoupSelector, /, *selectors: SoupSelector) -> None

Initializes the HasSelector object with the provided positional arguments as selectors.

Parameters

selectors: SoupSelector

SoupSelector objects to match, accepted as positional arguments. At least one selector is required to create HasSelector.

Raises

NotSoupSelectorException

If any of the provided parameters is not an instance of SoupSelector.

find_all(tag: IElement, recursive: bool = True, limit: int | None = None) -> list[IElement]

Finds all elements matching the selector within the provided IElement.

Parameters

tag: IElement

Any IElement object to search within.

recursive: bool, optional

Specifies whether the search should be recursive. If False, only direct children of the element are searched. By default True.

limit: int, optional

Maximum number of elements to return. By default None, meaning all found elements are returned.

Returns

list[IElement]

List of IElement objects matching the selector. If none are found, the list is empty.

OrSelector

alias of SelectorList

class ClassSelector(value: Pattern[str] | str | None = None)

Bases: SpecificAttributeSelector

Specific AttributeSelector for matching elements based on the ‘class’ attribute value.

Example

>>> ClassSelector("widget")

matches all elements that have a ‘class’ attribute with value “widget”.

Example

>>> <div class="widget">Hello World</div> ✔️
>>> <div class="content">Hello World</div> ❌

ClassSelector is a convenience wrapper for AttributeSelector, so the example above is equivalent to:

>>> AttributeSelector(name="class", value="widget")

The CSS counterpart can be represented as:

Example

>>> .widget

When a regex pattern is used, re.search is applied to match the attribute value.

Example

>>> ClassSelector(re.compile(r"nav"))

Notes

For more information about class selectors, see:

https://developer.mozilla.org/en-US/docs/Web/CSS/Class_selectors

class IdSelector(value: Pattern[str] | str | None = None)

Bases: SpecificAttributeSelector

Specific AttributeSelector for matching elements based on the ‘id’ attribute value.

Example

>>> IdSelector("main")

matches all elements that have an ‘id’ attribute with value “main”.

Example

>>> <div id="main">Hello World</div> ✔️
>>> <div id="content">Hello World</div> ❌

IdSelector is a convenience wrapper for AttributeSelector, so the example above is equivalent to:

>>> AttributeSelector(name="id", value="main")

The CSS counterpart can be represented as:

Example

>>> #main

When a regex pattern is used, re.search is applied to match the attribute value.

Example

>>> IdSelector(re.compile(r"content[0-9]+"))

Notes

For more information about ID selectors, see:

https://developer.mozilla.org/en-US/docs/Web/CSS/ID_selectors

class XORSelector(selector1: SoupSelector, selector2: SoupSelector, /, *selectors: SoupSelector)

Bases: CompositeSoupSelector

Selector representing an exclusive OR of multiple selectors, where element must be matched by exactly one of them.

Example

>>> XORSelector(TypeSelector("div"), ClassSelector("widget"))

Matches all elements that have either the “div” tag name or the ‘class’ attribute “widget”. Elements with both the “div” tag name and the ‘class’ attribute “widget” do not match the selector.

The object can also be created by using the XOR operator ^ (caret) on SoupSelector objects.

Example

>>> TypeSelector("div") ^ ClassSelector("widget")

This is a shortcut for defining an XOR operation between two selectors, like this:

Example

>>> selector1 = TypeSelector("div")
>>> selector2 = ClassSelector("widget")
>>> xor = (selector1 & (~selector2)) | ((~selector1) & selector2)
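
The shortcut above can be checked exhaustively on boolean match results. The sketch below is plain Python, with `xor_exactly_one` a hypothetical helper encoding the documented "exactly one" rule. It confirms the two-selector identity and shows that, for three or more inputs, "exactly one" differs from chaining Python's boolean ^, which computes odd parity.

```python
from itertools import product


def xor_exactly_one(*matches: bool) -> bool:
    # Documented XORSelector rule: matched by exactly one of the selectors.
    return sum(matches) == 1


# The two-selector shortcut (s1 & ~s2) | (~s1 & s2) agrees on every input.
for m1, m2 in product([False, True], repeat=2):
    shortcut = (m1 and not m2) or (not m1 and m2)
    assert shortcut == xor_exactly_one(m1, m2)

# With three inputs, "exactly one" is not the same as odd parity.
print(xor_exactly_one(True, True, True))  # False
print(True ^ True ^ True)                 # True
```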

__init__(selector1: SoupSelector, selector2: SoupSelector, /, *selectors: SoupSelector) -> None

Initializes the XORSelector object with the provided positional arguments as selectors.

Parameters

selectors: SoupSelector

At least two SoupSelector objects to match, accepted as positional arguments.

Raises

NotSoupSelectorException

If any of the provided parameters is not an instance of SoupSelector.

find_all(tag: IElement, recursive: bool = True, limit: int | None = None) -> list[IElement]

Finds all elements matching the selector within the provided IElement.

Parameters

tag: IElement

Any IElement object to search within.

recursive: bool, optional

Specifies whether the search should be recursive. If False, only direct children of the element are searched. By default True.

limit: int, optional

Maximum number of elements to return. By default None, meaning all found elements are returned.

Returns

list[IElement]

List of IElement objects matching the selector. If none are found, the list is empty.

class AnyTagSelector(**kwargs)

Bases: UniversalSelector

Deprecated alias for the UniversalSelector class.

class NthOfSelector(selector: SoupSelector, nth: str)

Bases: BaseNthOfSelector

Selector for finding nth-of elements in the soup, counting only the elements that match the provided SoupSelector instance.

Example

>>> selector = NthOfSelector(ClassSelector("item"), "2n+1")

matches every odd-numbered element among the elements with class “item”.

Example

>>> <div class="item">1</div> ✔️
... <div id="item"></div> ❌
... <div class="item">2</div> ❌
... <div class="item">3</div> ✔️
... <div class="widget"></div> ❌
... <div class="item">4</div> ❌

Notes

For more information about the standard :nth-of-type pseudo-class, visit:

https://developer.mozilla.org/en-US/docs/Web/CSS/:nth-of-type

class NthLastOfSelector(selector: SoupSelector, nth: str)

Bases: BaseNthOfSelector

Selector for finding nth-last-of elements in the soup, counting only the elements that match the provided SoupSelector instance.

Example

>>> selector = NthLastOfSelector(ClassSelector("item"), "2n+1")

matches every odd-numbered element among the elements with class “item”, counting from the last element.

Example

>>> <div class="item">1</div> ❌
... <div id="item"></div> ❌
... <div class="item">2</div> ✔️
... <div class="item">3</div> ❌
... <div class="widget"></div> ❌
... <div class="item">4</div> ✔️

Notes

For more information about the standard :nth-last-of-type pseudo-class, visit:

https://developer.mozilla.org/en-US/docs/Web/CSS/:nth-last-of-type
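
The an+b arithmetic behind both selectors can be sketched in plain Python. `parse_nth` and `nth_positions` are hypothetical helpers that work on one group of sibling elements already known to match the selector; the parser covers common forms ("odd", "even", "2n+1", "-n+3", "3") and is not the library's own parser. With from_end=True, positions are counted from the last element, as in NthLastOfSelector.

```python
import re
from typing import List, Tuple


def parse_nth(nth: str) -> Tuple[int, int]:
    """Return (a, b) for an expression of the form an+b."""
    nth = nth.replace(" ", "").lower()
    if nth == "odd":
        return 2, 1
    if nth == "even":
        return 2, 0
    m = re.fullmatch(r"([+-]?\d*)n([+-]\d+)?", nth)
    if m:
        a_str, b_str = m.groups()
        a = -1 if a_str == "-" else 1 if a_str in ("", "+") else int(a_str)
        return a, int(b_str) if b_str else 0
    return 0, int(nth)  # plain integer such as "3"


def nth_positions(count: int, nth: str, from_end: bool = False) -> List[int]:
    """1-based indices, in document order, of the items selected by `nth`."""
    a, b = parse_nth(nth)
    selected = []
    for i in range(1, count + 1):
        pos = count - i + 1 if from_end else i  # position in counting direction
        if a == 0:
            ok = pos == b
        else:
            k, rem = divmod(pos - b, a)  # need pos = a*k + b with integer k >= 0
            ok = rem == 0 and k >= 0
        if ok:
            selected.append(i)
    return selected


# Four matching "item" elements, as in the examples above.
print(nth_positions(4, "2n+1"))                 # [1, 3]
print(nth_positions(4, "2n+1", from_end=True))  # [2, 4]
```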

class OnlyOfSelector(selector: SoupSelector)

Bases: SoupSelector

Selector for finding elements that are the only ones among their siblings to match the provided SoupSelector instance.

Example

>>> selector = OnlyOfSelector(ClassSelector("item"))

matches all elements with class “item” that are the only child of their parent that matches the selector.

Example

>>> <div><div class="item"></div><a class="item"></a></div> ❌
>>> <div><div class="item"></div><a class="widget"></a></div> ✔️
>>> <div><div class="item"></div></div> ✔️
>>> <div><div class="widget"></div></div> ❌

Notes

For more information about the standard :only-of-type pseudo-class, visit:

https://developer.mozilla.org/en-US/docs/Web/CSS/:only-of-type
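
A stdlib sketch of the "only matching sibling" rule, using xml.etree.ElementTree in place of IElement; `only_of` and `has_item` are hypothetical helpers. For each parent, it collects the children that satisfy the predicate and keeps that child only when exactly one does.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List


def only_of(root: ET.Element, pred: Callable[[ET.Element], bool]) -> List[ET.Element]:
    results = []
    for parent in root.iter():
        matching = [child for child in parent if pred(child)]
        if len(matching) == 1:  # the only matching element among its siblings
            results.append(matching[0])
    return results


def has_item(el: ET.Element) -> bool:
    return "item" in (el.get("class") or "").split()


for html in ['<div><div class="item"></div><a class="item"></a></div>',
             '<div><div class="item"></div><a class="widget"></a></div>',
             '<div><div class="item"></div></div>',
             '<div><div class="widget"></div></div>']:
    print(html, len(only_of(ET.fromstring(html), has_item)))
```

The counts printed correspond to the four examples above: 0, 1, 1 and 0.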

__init__(selector: SoupSelector) -> None

Initializes the OnlyOfSelector instance.

Parameters

selector: SoupSelector

Any SoupSelector instance used to match elements.

Raises

NotSoupSelectorException

If selector is not an instance of SoupSelector.

find_all(tag: IElement, recursive: bool = True, limit: int | None = None) -> list[IElement]

Finds all elements matching the selector within the provided IElement.

Parameters

tag: IElement

Any IElement object to search within.

recursive: bool, optional

Specifies whether the search should be recursive. If False, only direct children of the element are searched. By default True.

limit: int, optional

Maximum number of elements to return. By default None, meaning all found elements are returned.

Returns

list[IElement]

List of IElement objects matching the selector. If none are found, the list is empty.

to_soupsavvy(node: Any) -> IElement

Converts a node of a supported type into the appropriate IElement instance, making it usable across soupsavvy with all its features.

Parameters

node: Any

A node object of a supported type. Currently supported implementations are: “beautifulsoup4”, “lxml”, “selenium” and “playwright”.

Returns

IElement

An instance of IElement, wrapping the node object.

Examples

>>> from bs4 import BeautifulSoup
>>> from soupsavvy import to_soupsavvy
>>> soup = BeautifulSoup("<p>Hello, World!</p>", "html.parser")
>>> element = to_soupsavvy(soup)

Raises

TypeError

If the node object is of an unsupported type.

Notes

If an IElement is passed as an argument, it is returned unchanged.

class CSS(css: str)

Bases: CSSSoupSelector

Selector for finding elements based on any provided CSS selector. It acts as a soupsieve adapter that allows any supported CSS selector to be used with other soupsavvy components.

Example

>>> CSS("div.menu")

Would match:

Example

>>> <div class="widget"> ❌
...    <div class="menu">Hello World</div> ✔️
... </div>
... <div class="menu_main"> ❌
...    <a class="menu">Hello World</a> ❌
... </div>
... <div class="menu"></div> ✔️

Notes

Supported selectors may vary between implementations, as each uses its own compatible library for CSS selection.

SELECTOR: str = '{}'
__init__(css: str) -> None

Initializes the selector with the provided CSS selector.

Parameters

css: str

CSS selector to be used for selecting elements.