# Quick Start Guide

Get started with web scraping in minutes!

## 1. Installation

```bash
# Create virtual environment
python -m venv venv

# Activate virtual environment
# Windows:
venv\Scripts\activate
# Unix/MacOS:
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Set up environment variables
copy .env.example .env   # Windows
# or
cp .env.example .env     # Unix/MacOS
```

## 2. Basic Usage

### Command Line Interface

Scrape any website using the CLI:

```bash
# Basic scraping
python main.py https://example.com

# Use Selenium for JavaScript sites
python main.py https://example.com -m selenium

# Use Jina AI for text extraction
python main.py https://example.com -m jina -o output.txt

# Enable verbose logging
python main.py https://example.com -v
```

### Python Scripts

#### Simple Static Page Scraping

```python
from scrapers.basic_scraper import BasicScraper

# Scrape a static website
with BasicScraper() as scraper:
    result = scraper.scrape("https://quotes.toscrape.com/")

    if result["success"]:
        soup = result["soup"]

        # Extract quotes
        for quote in soup.select(".quote"):
            text = quote.select_one(".text").get_text()
            author = quote.select_one(".author").get_text()
            print(f"{text} - {author}")
```

#### JavaScript-Heavy Websites

```python
from scrapers.selenium_scraper import SeleniumScraper

# Scrape dynamic content
with SeleniumScraper(headless=True) as scraper:
    result = scraper.scrape(
        "https://quotes.toscrape.com/js/",
        wait_for=".quote"  # Wait for this element to load
    )

    if result["success"]:
        print(f"Page title: {result['title']}")
        # Process the data...
```

#### AI-Powered Text Extraction

```python
from scrapers.jina_scraper import JinaScraper

# Extract text intelligently with AI
with JinaScraper() as scraper:
    result = scraper.scrape(
        "https://news.ycombinator.com/",
        return_format="markdown"
    )

    if result["success"]:
        print(result["content"])
```

## 3. Save Your Data

```python
from data_processors.storage import DataStorage

storage = DataStorage()

# Save as JSON
data = {"title": "Example", "content": "Hello World"}
storage.save_json(data, "output.json")

# Save as CSV
data_list = [
    {"name": "John", "age": 30},
    {"name": "Jane", "age": 25}
]
storage.save_csv(data_list, "people.csv")

# Save as text
storage.save_text("Some text content", "output.txt")
```
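### Putting It Together

For a minimal end-to-end flow you can chain the calls shown above: scrape a page with `BasicScraper`, collect the fields you need, then persist them with `DataStorage`. The selectors and output filenames below are only illustrative; point them at your own target site.

```python
from scrapers.basic_scraper import BasicScraper
from data_processors.storage import DataStorage

quotes = []

# Scrape the page and collect each quote as a dictionary
with BasicScraper() as scraper:
    result = scraper.scrape("https://quotes.toscrape.com/")

    if result["success"]:
        for quote in result["soup"].select(".quote"):
            quotes.append({
                "text": quote.select_one(".text").get_text(strip=True),
                "author": quote.select_one(".author").get_text(strip=True),
            })

# Persist the collected data using the storage helpers from above
storage = DataStorage()
storage.save_json({"quotes": quotes}, "quotes.json")
storage.save_csv(quotes, "quotes.csv")
```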
## 4. Run Examples

Try the included examples:

```bash
# Basic scraping example
python examples/basic_example.py

# Selenium example
python examples/selenium_example.py

# Advanced tools example (requires API keys)
python examples/advanced_example.py
```

## 5. Common Patterns

### Extract Links from a Page

```python
from scrapers.basic_scraper import BasicScraper

with BasicScraper() as scraper:
    result = scraper.scrape("https://example.com")

    if result["success"]:
        links = scraper.extract_links(
            result["soup"],
            base_url="https://example.com"
        )
        for link in links:
            print(link)
```

### Click Buttons and Fill Forms

```python
from scrapers.selenium_scraper import SeleniumScraper

with SeleniumScraper(headless=False) as scraper:
    scraper.scrape("https://example.com/login")

    # Fill form fields
    scraper.fill_form("#username", "myuser")
    scraper.fill_form("#password", "mypass")

    # Click submit button
    scraper.click_element("#submit-btn")

    # Take screenshot
    scraper.take_screenshot("logged_in.png")
```

### Validate and Clean Data

```python
from data_processors.validator import DataValidator

# Validate email
is_valid = DataValidator.validate_email("test@example.com")

# Clean text
cleaned = DataValidator.clean_text("  Multiple   spaces  ")

# Validate required fields
data = {"name": "John", "email": "john@example.com"}
validation = DataValidator.validate_required_fields(
    data,
    required_fields=["name", "email", "phone"]
)

if not validation["valid"]:
    print(f"Missing: {validation['missing_fields']}")
```

## 6. Testing

Run the test suite:

```bash
# Run all tests
pytest tests/ -v

# Run specific test
pytest tests/test_basic_scraper.py -v

# Run with coverage
pytest tests/ --cov=scrapers --cov=utils --cov=data_processors
```

## 7. Advanced Features

### Deep Crawling with Firecrawl

```python
from scrapers.firecrawl_scraper import FirecrawlScraper

with FirecrawlScraper() as scraper:
    result = scraper.crawl(
        "https://example.com",
        max_depth=3,
        max_pages=50,
        include_patterns=["*/blog/*"],
        exclude_patterns=["*/admin/*"]
    )

    if result["success"]:
        print(f"Crawled {result['total_pages']} pages")
        for page in result["pages"]:
            print(f"- {page['url']}")
```

### Complex Workflows with AgentQL

```python
from scrapers.agentql_scraper import AgentQLScraper

with AgentQLScraper() as scraper:
    # Automated login
    result = scraper.login_workflow(
        url="https://example.com/login",
        username="user@example.com",
        password="password123",
        username_field="input[name='email']",
        password_field="input[name='password']",
        submit_button="button[type='submit']"
    )
```

### Exploratory Tasks with Multion

```python
from scrapers.multion_scraper import MultionScraper

with MultionScraper() as scraper:
    # Find best deal automatically
    result = scraper.find_best_deal(
        search_query="noise cancelling headphones",
        filters={
            "max_price": 200,
            "rating": "4.5+",
            "brand": "Sony"
        }
    )

    if result["success"]:
        print(result["final_result"])
```

## 8. Tips & Best Practices

1. **Always use context managers** (`with` statement) to ensure proper cleanup
2. **Respect rate limits** - the default is 2 seconds between requests
3. **Check robots.txt** before scraping a website (see the sketch after this list)
4. **Use appropriate User-Agent** headers
5. **Handle errors gracefully** - the scrapers include built-in retry logic
6. **Validate and clean data** before storing it
7. **Log everything** for debugging purposes
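As a concrete way to follow tips 2 and 3, Python's standard-library `urllib.robotparser` can check a site's `robots.txt` and any declared crawl delay before you start scraping. This guide doesn't ship a project helper for that, so treat the snippet below as a generic sketch; the User-Agent string and URLs are placeholders:

```python
from urllib import robotparser

USER_AGENT = "MyScraperBot/1.0"  # placeholder: use the User-Agent your scraper actually sends
target_url = "https://quotes.toscrape.com/some/page"

# Download and parse the site's robots.txt
rp = robotparser.RobotFileParser()
rp.set_url("https://quotes.toscrape.com/robots.txt")
rp.read()

if not rp.can_fetch(USER_AGENT, target_url):
    print(f"robots.txt disallows fetching {target_url}")
else:
    # Honour an explicit crawl delay if the site declares one
    delay = rp.crawl_delay(USER_AGENT)
    print(f"OK to fetch; requested crawl delay: {delay or 'none specified'}")
```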
## 9. Troubleshooting

### Issue: Selenium driver not found

```bash
# The project uses webdriver-manager to auto-download drivers
# If you have issues, manually install ChromeDriver:
# 1. Download from https://chromedriver.chromium.org/
# 2. Add to your system PATH
```

### Issue: Import errors

```bash
# Make sure you've activated the virtual environment
# and installed all dependencies
pip install -r requirements.txt
```

### Issue: API keys not working

```bash
# Make sure you've copied .env.example to .env
# and added your actual API keys
cp .env.example .env
# Edit .env with your keys
```

## 10. Next Steps

- Explore the `examples/` directory for more use cases
- Read the full `README.md` for detailed documentation
- Check out the `tests/` directory to see testing patterns
- Customize `config.py` for your specific needs
- Build your own scrapers extending `BaseScraper` (a starting-point sketch follows below)
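The last bullet suggests building on `BaseScraper`, but its exact interface isn't reproduced in this guide, so the sketch below is only a hypothetical starting point. It assumes the base class provides the context-manager behaviour and an HTTP session like the built-in scrapers do, and that a subclass returns the same `{"success": ...}` result dictionary used throughout this guide; check the actual base class in the `scrapers` package before copying it.

```python
from scrapers.base_scraper import BaseScraper  # import path is assumed; verify it in the scrapers package


class MyApiScraper(BaseScraper):
    """Hypothetical scraper for a site that exposes a JSON endpoint."""

    def scrape(self, url: str) -> dict:
        # self.session is assumed to be a requests-style session provided by BaseScraper;
        # adapt this to whatever the real base class exposes.
        try:
            response = self.session.get(url, timeout=30)
            response.raise_for_status()
            return {"success": True, "url": url, "data": response.json()}
        except Exception as exc:
            return {"success": False, "url": url, "error": str(exc)}
```

Happy Scraping! 🚀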