{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Testing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`soupsavvy` includes utilities for testing selectors via the `soupsavvy.testing` subpackage, enabling you to validate selectors and ensure they handle various edge cases effectively." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Generators\n", "\n", "This subpackage offers HTML code generators for testing, allowing you to create controlled HTML structures to simulate different scenarios and test the accuracy of your selectors." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Attribute Generator\n", "\n", "The Attribute Generator creates string representations of HTML attributes. While limited on its own, it becomes useful when combined with the `TagGenerator` for more customizable HTML generation." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Empty Attribute\n", "\n", "If only the first parameter (the attribute name) is passed to `AttributeGenerator`, it generates an attribute with an empty value." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from soupsavvy.testing import AttributeGenerator\n", "\n", "generator = AttributeGenerator(\"class\")\n", "generator.generate()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Constant value\n", "\n", "By passing the `value` parameter, you can set a specific value for the attribute." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from soupsavvy.testing import AttributeGenerator\n", "\n", "generator = AttributeGenerator(\"class\", value=\"book\")\n", "generator.generate()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Templates\n", "\n", "Templates add another layer of customization by generating strings based on predefined logic, useful for creating dynamic and varied content in your test HTML." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### ChoiceTemplate\n", "\n", "The `ChoiceTemplate` allows you to generate a string by randomly selecting from a provided list of strings. For reproducibility, the `seed` parameter can be set to ensure the same output is generated across multiple runs." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from soupsavvy.testing import AttributeGenerator, ChoiceTemplate\n", "\n", "template = ChoiceTemplate([\"book\", \"article\", \"blog\"], seed=42)\n", "generator = AttributeGenerator(\"class\", value=template)\n", "generator.generate()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### RandomTemplate\n", "\n", "The `RandomTemplate` generates a string from randomly selected ASCII characters. The `length` parameter defines the string length (default is 4). Like `ChoiceTemplate`, the `seed` parameter ensures consistent output if needed." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from soupsavvy.testing import AttributeGenerator, RandomTemplate\n", "\n", "template = RandomTemplate(length=5, seed=42)\n", "generator = AttributeGenerator(\"class\", value=template)\n", "generator.generate()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### User-defined Templates\n", "\n", "For advanced customization, you can create your own templates by subclassing `soupsavvy.testing.BaseTemplate` and implementing the `generate` method to return a string based on your specific logic.\n", "\n", "Here’s how you can define a custom template:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from soupsavvy.testing import BaseTemplate, TagGenerator\n", "\n", "\n", "class CustomTemplate(BaseTemplate):\n", " def __init__(self, connection): ...\n", "\n", " def generate(self):\n", " # connects to external service\n", " result = \"Hello from somewhere!\"\n", " return result\n", "\n", "\n", "template = CustomTemplate(connection=None)\n", "generator = TagGenerator(\"span\", text=template)\n", "generator.generate()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### TagGenerator\n", "\n", "`TagGenerator` is the primary tool for generating HTML tags with customizable attributes, text, and child elements." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Name\n", "\n", "The `name` parameter is required and specifies the tag name, such as `div`, `span`, or `p`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from soupsavvy.testing import TagGenerator\n", "\n", "generator = TagGenerator(\"div\")\n", "generator.generate()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Attributes\n", "\n", "The `attrs` parameter allows you to define the attributes of the tag. It accepts an iterable containing:\n", "\n", "- `str`: Just the attribute name, resulting in an empty value.\n", "- `tuple`: A pair where the first element is the attribute name and the second is the value.\n", "- `AttributeGenerator`: An object that dynamically generates attribute values.\n", "\n", "Attributes of `TagGenerator` must be unique, trying to define it with duplicate attributes will raise an error." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from soupsavvy.testing import AttributeGenerator, RandomTemplate, TagGenerator\n", "\n", "attrs = (\n", " \"href\",\n", " (\"class\", \"link\"),\n", " (\"data-id\", RandomTemplate(seed=42)),\n", " AttributeGenerator(\"title\", value=\"buy\"),\n", ")\n", "generator = TagGenerator(\"a\", attrs=attrs)\n", "generator.generate()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Children\n", "\n", "The `children` parameter lets you specify the tag's children, which must be `TagGenerator` objects. If no children are specified, the tag is created without any." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from soupsavvy.testing import TagGenerator\n", "\n", "child_generator = TagGenerator(\"span\")\n", "generator = TagGenerator(\n", " \"div\",\n", " attrs=[\"class\"],\n", " children=[child_generator],\n", ")\n", "generator.generate()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Self-closing Tags\n", "\n", "Self-closing tags like `br` are automatically handled. Defining a self-closing tag with children will raise an error." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from soupsavvy.testing import TagGenerator\n", "\n", "generator = TagGenerator(\"br\")\n", "generator.generate()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Text\n", "\n", "The `text` parameter allows you to add text content to the tag. This can be a static string or dynamically generated using templates." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from soupsavvy.testing import TagGenerator\n", "\n", "generator = TagGenerator(\"span\", text=\"Hello, World!\")\n", "generator.generate()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from soupsavvy.testing import ChoiceTemplate, TagGenerator\n", "\n", "template = ChoiceTemplate([\"Hello, World!\", \"Hello, blog!\"], seed=42)\n", "generator = TagGenerator(\"span\", text=template)\n", "generator.generate()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Usage\n", "\n", "Let’s explore how to use these generators in practice. In this example, we’ll test a selector targeting `span` elements with text starting with \"Hello\" nested inside `div` elements that have both `class=\"book\"` and `role=\"section\"` attributes.\n", "\n", "We dynamically generate the HTML content using `TagGenerator` and verify if the selector accurately identifies the intended elements." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import re\n", "\n", "from bs4 import BeautifulSoup\n", "\n", "from soupsavvy import (\n", " AttributeSelector,\n", " ClassSelector,\n", " PatternSelector,\n", " TypeSelector,\n", " to_soupsavvy,\n", ")\n", "from soupsavvy.testing import AttributeGenerator, ChoiceTemplate, TagGenerator\n", "\n", "# 1: define the generator\n", "template = ChoiceTemplate([\"Hello, World!\", \"Hello, blog!\"], seed=42)\n", "child_generator = TagGenerator(\"span\", text=template)\n", "generator = TagGenerator(\n", " \"div\",\n", " attrs=[\n", " AttributeGenerator(\"class\", value=\"book\"),\n", " AttributeGenerator(\"role\", value=\"section\"),\n", " ],\n", " children=[child_generator],\n", ")\n", "\n", "# 2: define the selector\n", "selector = (\n", " TypeSelector(\"div\")\n", " & ClassSelector(\"book\")\n", " & AttributeSelector(\"role\", value=\"section\")\n", ") > (TypeSelector(\"span\") & PatternSelector(re.compile(r\"^Hello\")))\n", "\n", "# 3: generate the soup\n", "text = generator.generate()\n", "soup = BeautifulSoup(text, features=\"lxml\")\n", "element = to_soupsavvy(soup)\n", "\n", "# 4: test selector on generated soup\n", "selector.find(element)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Conclusion\n", "\n", "By leveraging these generators, you can easily create dynamic HTML structures to validate your `soupsavvy` selectors. This allows you to test complex selectors in a controlled environment, ensuring they behave as expected.\n", "\n", "**Enjoy `soupsavvy` and leave us feedback!** \n", "**Happy scraping!**" ] } ], "metadata": { "kernelspec": { "display_name": "soupsavvy", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.8" } }, "nbformat": 4, "nbformat_minor": 2 }