Testing

soupsavvy provides utilities for testing selectors in the soupsavvy.testing subpackage, so you can validate selectors and check that they handle edge cases correctly.

Generators

This subpackage offers HTML code generators for testing, allowing you to create controlled HTML structures to simulate different scenarios and test the accuracy of your selectors.

AttributeGenerator

AttributeGenerator creates string representations of HTML attributes. It is of limited use on its own, but combined with TagGenerator it allows more customizable HTML generation.

Empty Attribute

If only the first parameter (the attribute name) is passed to AttributeGenerator, it generates an attribute with an empty value.

from soupsavvy.testing import AttributeGenerator

generator = AttributeGenerator("class")
generator.generate()
'class=""'

Constant value

By passing the value parameter, you can set a specific value for the attribute.

from soupsavvy.testing import AttributeGenerator

generator = AttributeGenerator("class", value="book")
generator.generate()
'class="book"'

Templates

Templates add another layer of customization by generating strings based on predefined logic, useful for creating dynamic and varied content in your test HTML.

ChoiceTemplate

The ChoiceTemplate allows you to generate a string by randomly selecting from a provided list of strings. For reproducibility, the seed parameter can be set to ensure the same output is generated across multiple runs.

from soupsavvy.testing import AttributeGenerator, ChoiceTemplate

template = ChoiceTemplate(["book", "article", "blog"], seed=42)
generator = AttributeGenerator("class", value=template)
generator.generate()
'class="blog"'
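To illustrate why a fixed seed gives reproducible output, here is a minimal standard-library sketch of the idea. SeededChoice is a hypothetical stand-in written for this illustration; it is not soupsavvy's implementation, and ChoiceTemplate may handle seeding differently internally.

```python
import random


class SeededChoice:
    """Hypothetical stand-in for a seeded choice template (not soupsavvy code)."""

    def __init__(self, choices, seed=None):
        self._choices = list(choices)
        # A private Random instance keeps this generator independent
        # of the global random state.
        self._rng = random.Random(seed)

    def generate(self):
        # Pick one of the provided strings using the seeded generator.
        return self._rng.choice(self._choices)


options = ["book", "article", "blog"]
first = SeededChoice(options, seed=42)
second = SeededChoice(options, seed=42)

first_run = [first.generate() for _ in range(5)]
second_run = [second.generate() for _ in range(5)]
print(first_run == second_run)  # True
```

Two generators built with the same seed produce the same sequence, which is what makes a failing test case repeatable.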

RandomTemplate

The RandomTemplate generates a string from randomly selected ASCII characters. The length parameter defines the string length (default is 4). Like ChoiceTemplate, the seed parameter ensures consistent output if needed.

from soupsavvy.testing import AttributeGenerator, RandomTemplate

template = RandomTemplate(length=5, seed=42)
generator = AttributeGenerator("class", value=template)
generator.generate()
'class="NbrnT"'

User-defined Templates

For advanced customization, you can create your own templates by subclassing soupsavvy.testing.BaseTemplate and implementing the generate method to return a string based on your specific logic.

Here’s how you can define a custom template:

from soupsavvy.testing import BaseTemplate, TagGenerator


class CustomTemplate(BaseTemplate):
    def __init__(self, connection): ...

    def generate(self):
        # connects to external service
        result = "Hello from somewhere!"
        return result


template = CustomTemplate(connection=None)
generator = TagGenerator("span", text=template)
generator.generate()
'<span>Hello from somewhere!</span>'

TagGenerator

TagGenerator is the primary tool for generating HTML tags with customizable attributes, text, and child elements.

Name

The name parameter is required and specifies the tag name, such as div, span, or p.

from soupsavvy.testing import TagGenerator

generator = TagGenerator("div")
generator.generate()
'<div></div>'

Attributes

The attrs parameter allows you to define the attributes of the tag. It accepts an iterable containing:

  • str: Just the attribute name, resulting in an empty value.

  • tuple: A pair where the first element is the attribute name and the second is the value, either a string or a template.

  • AttributeGenerator: An object that dynamically generates attribute values.

Attribute names within a TagGenerator must be unique. Defining a generator with duplicate attributes raises an error.

from soupsavvy.testing import AttributeGenerator, RandomTemplate, TagGenerator

attrs = (
    "href",
    ("class", "link"),
    ("data-id", RandomTemplate(seed=42)),
    AttributeGenerator("title", value="buy"),
)
generator = TagGenerator("a", attrs=attrs)
generator.generate()
'<a href="" class="link" data-id="Nbrn" title="buy"></a>'
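The accepted input forms and the uniqueness rule can be sketched with plain Python. The render_attrs helper below is hypothetical and only mirrors the behavior described above for strings and tuples; it is not part of soupsavvy, and the ValueError is an assumption, since the exception type soupsavvy actually raises is not stated here.

```python
def render_attrs(attrs):
    """Hypothetical helper: render names and (name, value) pairs as an attribute string."""
    seen = set()
    parts = []
    for attr in attrs:
        # A bare string is an attribute name with an empty value.
        name, value = (attr, "") if isinstance(attr, str) else attr
        if name in seen:
            # Assumed exception type, chosen for this sketch only.
            raise ValueError(f"duplicate attribute: {name!r}")
        seen.add(name)
        parts.append(f'{name}="{value}"')
    return " ".join(parts)


rendered = render_attrs(["href", ("class", "link")])
print(rendered)  # href="" class="link"

try:
    render_attrs(["id", ("id", "main")])
except ValueError as error:
    print(error)  # duplicate attribute: 'id'
```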

Children

The children parameter lets you specify the tag’s children, which must be TagGenerator objects. If no children are specified, the tag is created without any.

from soupsavvy.testing import TagGenerator

child_generator = TagGenerator("span")
generator = TagGenerator(
    "div",
    attrs=["class"],
    children=[child_generator],
)
generator.generate()
'<div class=""><span></span></div>'

Self-closing Tags

Self-closing tags like br are automatically handled. Defining a self-closing tag with children will raise an error.

from soupsavvy.testing import TagGenerator

generator = TagGenerator("br")
generator.generate()
'<br/>'

Text

The text parameter allows you to add text content to the tag. This can be a static string or dynamically generated using templates.

from soupsavvy.testing import TagGenerator

generator = TagGenerator("span", text="Hello, World!")
generator.generate()
'<span>Hello, World!</span>'

from soupsavvy.testing import ChoiceTemplate, TagGenerator

template = ChoiceTemplate(["Hello, World!", "Hello, blog!"], seed=42)
generator = TagGenerator("span", text=template)
generator.generate()
'<span>Hello, World!</span>'

Usage

Let’s explore how to use these generators in practice. In this example, we’ll test a selector targeting span elements whose text starts with “Hello” and that are direct children of div elements with both class="book" and role="section" attributes.

We generate the HTML content dynamically with TagGenerator and verify that the selector identifies the intended elements.

import re

from bs4 import BeautifulSoup

from soupsavvy import (
    AttributeSelector,
    ClassSelector,
    PatternSelector,
    TypeSelector,
    to_soupsavvy,
)
from soupsavvy.testing import AttributeGenerator, ChoiceTemplate, TagGenerator

# 1: define the generator
template = ChoiceTemplate(["Hello, World!", "Hello, blog!"], seed=42)
child_generator = TagGenerator("span", text=template)
generator = TagGenerator(
    "div",
    attrs=[
        AttributeGenerator("class", value="book"),
        AttributeGenerator("role", value="section"),
    ],
    children=[child_generator],
)

# 2: define the selector
selector = (
    TypeSelector("div")
    & ClassSelector("book")
    & AttributeSelector("role", value="section")
) > (TypeSelector("span") & PatternSelector(re.compile(r"^Hello")))

# 3: generate the soup
text = generator.generate()
soup = BeautifulSoup(text, features="lxml")
element = to_soupsavvy(soup)

# 4: test selector on generated soup
selector.find(element)
SoupElement(<span>Hello, World!</span>)

Conclusion

By leveraging these generators, you can easily create dynamic HTML structures to validate your soupsavvy selectors. This allows you to test complex selectors in a controlled environment, ensuring they behave as expected.

Enjoy soupsavvy and leave us feedback!
Happy scraping!