selectors.general

Module with miscellaneous selectors.

Classes

  • TypeSelector - combines type and attribute selectors

  • PatternSelector - matches elements based on text content and selector

  • UniversalSelector - universal selector (*)

  • SelfSelector - matches the element itself

  • ExpressionSelector - matches elements based on user-defined function

class TypeSelector(name: str)[source]

Bases: SoupSelector, SelectableCSS

Selector for finding elements based on tag name (type). Counterpart of the CSS type selector.

Example

>>> TypeSelector("div")

matches all elements that have “div” tag name.

Example

>>> <div class="widget">Hello World</div> ✔️
>>> <a href="/shop">Hello World</a> ❌

CSS counterpart can be represented as:

Example

>>> div

And can be retrieved with css property.

Example

>>> TypeSelector("div").css
"div"

Parameters

name : str

Tag name of the element, e.g. “a”, “div”.

Notes

For more information about type selectors, see:

https://developer.mozilla.org/en-US/docs/Web/CSS/Type_selectors

name: str
find_all(tag: IElement, recursive: bool = True, limit: int | None = None) -> list[IElement][source]

Finds all elements matching selector in provided IElement.

Parameters

tag : IElement

Any IElement object to search within.

recursive : bool, optional

Specifies if search should be recursive. If set to False, only direct children of the element will be searched. By default True.

limit : int, optional

Specifies maximum number of elements to return. By default None, all found elements are returned.

Returns

list[IElement]

List of IElement objects matching selector. If none found, the list is empty.
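The effect of recursive and limit can be illustrated with a minimal sketch. It does not use this library: it relies on the standard-library xml.etree.ElementTree, and find_all_by_tag is a hypothetical stand-in for TypeSelector(name).find_all. Excluding the searched element itself from the results is an assumption of the sketch.

```python
import xml.etree.ElementTree as ET
from itertools import islice
from typing import List, Optional


def find_all_by_tag(
    root: ET.Element,
    name: str,
    recursive: bool = True,
    limit: Optional[int] = None,
) -> List[ET.Element]:
    """Hypothetical stand-in for TypeSelector(name).find_all(root, ...)."""
    if recursive:
        # All descendants in document order (assumption: root is excluded).
        candidates = (el for el in root.iter() if el is not root)
    else:
        # Only direct children of root.
        candidates = iter(root)
    matches = (el for el in candidates if el.tag == name)
    # islice with None as the stop value yields everything, like limit=None.
    return list(islice(matches, limit))


markup = "<body><div id='a'><div id='b'/><a/></div><div id='c'/></body>"
root = ET.fromstring(markup)

all_divs = [el.get("id") for el in find_all_by_tag(root, "div")]
child_divs = [el.get("id") for el in find_all_by_tag(root, "div", recursive=False)]
first_div = [el.get("id") for el in find_all_by_tag(root, "div", limit=1)]

print(all_divs)    # ['a', 'b', 'c']
print(child_divs)  # ['a', 'c']
print(first_div)   # ['a']
```

With recursive=False the nested div is skipped because it is not a direct child, and limit=1 stops after the first match in document order.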

property css: str

Returns string representing element css selector.

__init__(name: str) -> None
class PatternSelector(pattern: Pattern[str] | str)[source]

Bases: SoupSelector

Selector for finding elements based on text content pattern.

Example

>>> PatternSelector("Hello World")

matches all elements with exact text content “Hello World”.

Example

>>> <div>Hello World</div> ✔️
>>> <div>Hello Python</div> ❌
>>> <div>Hello World 3</div> ❌

When a regex pattern is used, re.search is applied to the element's text content.

Example

>>> PatternSelector(re.compile(r"[0-9]+"))

matches all elements with text content containing at least one digit.

Example

>>> <div>Hello World 123</div> ✔️
>>> <div>Hello World</div> ❌

Parameters

pattern: str | Pattern

Pattern to match against the text of the element. Can be a string for an exact match or a compiled Pattern for regular-expression matching.

Notes

Element does not match the pattern if it has any children. Only leaf nodes can be returned by PatternSelector find methods.
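The matching rules described above (exact comparison for strings, re.search for compiled patterns, leaf nodes only) can be sketched as follows. This is an illustration built on the standard library, not the selector's actual implementation; text_matches is a hypothetical helper, and the absence of any whitespace normalization is an assumption.

```python
import re
import xml.etree.ElementTree as ET
from typing import Union


def text_matches(element: ET.Element, pattern: Union[str, re.Pattern]) -> bool:
    """Hypothetical helper mirroring the documented PatternSelector rules."""
    # Elements with children never match: only leaf nodes qualify.
    if len(element) > 0:
        return False
    text = element.text or ""
    if isinstance(pattern, str):
        # Plain strings require an exact match of the text content.
        return text == pattern
    # Compiled patterns match anywhere in the text via re.search semantics.
    return pattern.search(text) is not None


exact = "Hello World"
digits = re.compile(r"[0-9]+")

leaf = ET.fromstring("<div>Hello World</div>")
longer = ET.fromstring("<div>Hello World 3</div>")
parent = ET.fromstring("<div>Hello World<span>!</span></div>")

print(text_matches(leaf, exact))     # True
print(text_matches(longer, exact))   # False
print(text_matches(longer, digits))  # True
print(text_matches(leaf, digits))    # False
print(text_matches(parent, exact))   # False, because it has a child element
```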

pattern: Pattern[str] | str
find_all(tag: IElement, recursive: bool = True, limit: int | None = None) -> list[IElement][source]

Finds all elements matching selector in provided IElement.

Parameters

tag : IElement

Any IElement object to search within.

recursive : bool, optional

Specifies if search should be recursive. If set to False, only direct children of the element will be searched. By default True.

limit : int, optional

Specifies maximum number of elements to return. By default None, all found elements are returned.

Returns

list[IElement]

List of IElement objects matching selector. If none found, the list is empty.

__init__(pattern: Pattern[str] | str) -> None
class UniversalSelector[source]

Bases: SoupSelector, SelectableCSS

Selector representing a wildcard pattern that matches all elements in the HTML page.

Example

>>> UniversalSelector()

CSS counterpart can be represented as:

Example

>>> *

And can be retrieved with css property.

Example

>>> UniversalSelector().css
"*"

Notes

For more information on universal selector, see:

https://developer.mozilla.org/en-US/docs/Web/CSS/Universal_selectors

find_all(tag: IElement, recursive: bool = True, limit: int | None = None) -> list[IElement][source]

Finds all elements matching selector in provided IElement.

Parameters

tag : IElement

Any IElement object to search within.

recursive : bool, optional

Specifies if search should be recursive. If set to False, only direct children of the element will be searched. By default True.

limit : int, optional

Specifies maximum number of elements to return. By default None, all found elements are returned.

Returns

list[IElement]

List of IElement objects matching selector. If none found, the list is empty.

property css: str

Returns wildcard css selector matching all elements in the markup.

__init__() -> None
class AnyTagSelector(**kwargs)[source]

Bases: UniversalSelector

Alias for UniversalSelector class. Deprecated component.

class SelfSelector[source]

Bases: SoupSelector

Selector matching only the element itself. Convenience component that can be used for compatibility.

Example

>>> SelfSelector()

always matches the tag that is passed to the find methods.

Notes

Can be used in user-defined model for scope if element itself is the scope.
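The difference between a wildcard match and a self match can be shown with a small sketch. It uses the standard-library xml.etree.ElementTree rather than this library; find_all_universal and find_all_self are hypothetical stand-ins, and the assumption that the wildcard search excludes the searched element itself is not confirmed by this page.

```python
import xml.etree.ElementTree as ET
from typing import List


def find_all_universal(tag: ET.Element) -> List[ET.Element]:
    """Hypothetical stand-in for UniversalSelector().find_all(tag)."""
    # Every descendant matches (assumption: the tag itself is excluded).
    return [el for el in tag.iter() if el is not tag]


def find_all_self(tag: ET.Element) -> List[ET.Element]:
    """Hypothetical stand-in for SelfSelector().find_all(tag)."""
    # The element passed in is the only match.
    return [tag]


root = ET.fromstring("<section><p><b/></p><a/></section>")

universal_tags = [el.tag for el in find_all_universal(root)]
self_tags = [el.tag for el in find_all_self(root)]

print(universal_tags)  # ['p', 'b', 'a']
print(self_tags)       # ['section']
```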

find_all(tag: IElement, recursive: bool = True, limit: int | None = None) -> list[IElement][source]

Finds all elements matching selector in provided IElement.

Parameters

tag : IElement

Any IElement object to search within.

recursive : bool, optional

Specifies if search should be recursive. If set to False, only direct children of the element will be searched. By default True.

limit : int, optional

Specifies maximum number of elements to return. By default None, all found elements are returned.

Returns

list[IElement]

List of IElement objects matching selector. If none found, the list is empty.

class ExpressionSelector(f: Callable[[IElement], bool])[source]

Bases: SoupSelector

Selector that matches elements based on a user-defined function (predicate) used as a filter on element objects.

Applies the predicate to each element and returns those that satisfy the condition.

Parameters

f : Callable[[IElement], bool]

A user-defined function (predicate) that determines whether the element should be selected.

Examples

>>> selector = ExpressionSelector(lambda x: x.name not in {"a", "div"})
>>> selector.find(soup)

To perform operations on the underlying node, use the IElement.get() method or the IElement.node attribute.

Example

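>>> selector = ExpressionSelector(lambda x: 'widget' in x.node.get('class', []))

This example assumes a SoupElement object, which wraps bs4.Tag. Using get with a default avoids a KeyError for elements without a class attribute.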

Notes

Any exception should be handled inside the provided function. If one is raised, it is propagated to the caller.
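The note on exceptions can be made concrete with a sketch. It uses the standard-library xml.etree.ElementTree instead of this library; select is a hypothetical stand-in for ExpressionSelector(f).find_all, and both predicates are illustrative names. The point is only that a filter which does not catch predicate errors passes them on to the caller.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List


def select(
    root: ET.Element, predicate: Callable[[ET.Element], bool]
) -> List[ET.Element]:
    """Hypothetical stand-in for ExpressionSelector(predicate).find_all(root)."""
    # No try/except on purpose: errors raised by the predicate reach the caller.
    return [el for el in root.iter() if el is not root and predicate(el)]


def unsafe_has_widget(el: ET.Element) -> bool:
    # Raises KeyError for elements without a class attribute.
    return "widget" in el.attrib["class"].split()


def safe_has_widget(el: ET.Element) -> bool:
    # Handles the missing attribute inside the predicate itself.
    return "widget" in el.get("class", "").split()


root = ET.fromstring(
    "<body>"
    "<div class='widget big' id='a'/>"
    "<p id='b'/>"
    "<span class='widget' id='c'/>"
    "</body>"
)

safe_ids = [el.get("id") for el in select(root, safe_has_widget)]
print(safe_ids)  # ['a', 'c']

try:
    select(root, unsafe_has_widget)
    propagated = False
except KeyError:
    # The <p> element has no class attribute, so the predicate raised.
    propagated = True
print(propagated)  # True
```

Keeping the guard inside the predicate means one malformed element does not abort the whole search.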

f: Callable[[IElement], bool]
find_all(tag: IElement, recursive: bool = True, limit: int | None = None) -> list[IElement][source]

Finds all elements matching selector in provided IElement.

Parameters

tag : IElement

Any IElement object to search within.

recursive : bool, optional

Specifies if search should be recursive. If set to False, only direct children of the element will be searched. By default True.

limit : int, optional

Specifies maximum number of elements to return. By default None, all found elements are returned.

Returns

list[IElement]

List of IElement objects matching selector. If none found, the list is empty.

__init__(f: Callable[[IElement], bool]) -> None