pstreams-providers/.docs/content/3.in-depth/4.sources-and-embeds.md

# Sources vs Embeds

Understanding the difference between sources and embeds is crucial for building scrapers effectively. They work together in a two-stage pipeline to extract playable video streams.

## The Two-Stage Pipeline

```
User Request → Source Scraper → What did source find?
                                      ↓
                               ┌─────────────┐
                               ↓             ↓
                        Direct Stream    Embed URLs
                               ↓             ↓
                          Play Video    Embed Scraper
                                            ↓
                                     Extract Stream
                                            ↓
                                      Play Video
```

**Flow Breakdown:**
1. **User requests** content (movie/TV show)
2. **Source scraper** searches the target website
3. **Source returns** either:
   - **Direct streams** → Ready to play immediately
   - **Embed URLs** → Need further processing
4. **Embed scraper** (if needed) extracts streams from player URLs
5. **Final result** → Playable video stream

## Sources: The Content Finders

**Sources** are the first stage - they find content on websites and return either:
1. **Direct video streams** (ready to play)
2. **Embed URLs** that need further processing

### Example: Autoembed Source

```typescript
// From src/providers/sources/autoembed.ts
async function comboScraper(ctx: ShowScrapeContext | MovieScrapeContext): Promise<SourcererOutput> {
  // 1. Call an API to find video sources
  const data = await ctx.proxiedFetcher(`/api/getVideoSource`, {
    baseUrl: 'https://tom.autoembed.cc',
    query: { type: mediaType, id }
  });

  // 2. Return embed URLs for further processing
  return {
    embeds: [{
      embedId: 'autoembed-english',  // Points to an embed scraper
      url: data.videoSource          // URL that embed will process
    }]
  };
}
```

**What this source does:**
- Queries an API with TMDB ID
- Gets back a video source URL
- Returns it as an embed for the `autoembed-english` embed scraper to handle

### Example: Catflix Source

```typescript
// From src/providers/sources/catflix.ts
async function comboScraper(ctx: ShowScrapeContext | MovieScrapeContext): Promise<SourcererOutput> {
  // 1. Build URL to the movie/show page
  const watchPageUrl = `${baseUrl}/movie/${mediaTitle}-${movieId}`;

  // 2. Scrape the page for embedded player URLs
  const watchPage = await ctx.proxiedFetcher(watchPageUrl);
  const $ = load(watchPage);

  // 3. Extract and decode the embed URL
  const mainOriginMatch = scriptData.data.match(/main_origin = "(.*?)";/);
  const decodedUrl = atob(mainOriginMatch[1]);

  // 4. Return embed URL for turbovid embed to process
  return {
    embeds: [{
      embedId: 'turbovid',          // Points to turbovid embed scraper
      url: decodedUrl               // Turbovid player URL
    }]
  };
}
```

**What this source does:**
- Scrapes a streaming website
- Finds encoded embed player URLs in the page source
- Decodes the URL and returns it for the `turbovid` embed scraper

## Embeds: The Stream Extractors

**Embeds** are the second stage - they take URLs from sources and extract the actual playable video streams. Each embed type knows how to handle a specific player or service.

### Example: Autoembed Embed (Simple)

```typescript
// From src/providers/embeds/autoembed.ts
async scrape(ctx) {
  // The URL from the source is already a direct HLS playlist
  return {
    stream: [{
      id: 'primary',
      type: 'hls',
      playlist: ctx.url,  // Use the URL directly as HLS playlist
      flags: [flags.CORS_ALLOWED],
      captions: []
    }]
  };
}
```

**What this embed does:**
- Takes the URL from autoembed source
- Treats it as a direct HLS playlist (no further processing needed)
- Returns it as a playable stream

### Example: Turbovid Embed (Complex)

```typescript
// From src/providers/embeds/turbovid.ts
async scrape(ctx) {
  // 1. Fetch the turbovid player page
  const embedPage = await ctx.proxiedFetcher(ctx.url);

  // 2. Extract encryption keys from the page
  const apkey = embedPage.match(/const\s+apkey\s*=\s*"(.*?)";/)?.[1];
  const xxid = embedPage.match(/const\s+xxid\s*=\s*"(.*?)";/)?.[1];

  // 3. Get decryption key from API
  const encodedJuiceKey = JSON.parse(
    await ctx.proxiedFetcher('/api/cucked/juice_key', { baseUrl })
  ).juice;

  // 4. Get encrypted playlist data
  const data = JSON.parse(
    await ctx.proxiedFetcher('/api/cucked/the_juice_v2/', {
      baseUrl, query: { [apkey]: xxid }
    })
  ).data;

  // 5. Decrypt the playlist URL
  const playlist = decrypt(data, atob(encodedJuiceKey));

  // 6. Return proxied stream (handles CORS/headers)
  return {
    stream: [{
      type: 'hls',
      id: 'primary',
      playlist: createM3U8ProxyUrl(playlist, ctx.features, streamHeaders),
      headers: streamHeaders,
      flags: [], captions: []
    }]
  };
}
```

**What this embed does:**
- Takes turbovid player URL from catflix source
- Performs complex extraction: fetches page → gets keys → decrypts data
- Returns the final HLS playlist with proper proxy handling

## Key Differences

| Sources | Embeds |
|---------|--------|
| **Find content** on websites | **Extract streams** from players |
| Return embed URLs OR direct streams | Always return direct streams |
| Handle website navigation/search | Handle player-specific extraction |
| Can return multiple server options | Process one specific player type |
| Example: "Find Avengers on Catflix" | Example: "Extract stream from Turbovid player" |

## Why This Separation?

### 1. **Reusability**
Multiple sources can use the same embed:
```typescript
// Both catflix and other sources can return turbovid embeds
{ embedId: 'turbovid', url: 'https://turbovid.com/player123' }
```

### 2. **Multiple Server Options**
Sources can provide backup servers:
```typescript
return {
  embeds: [
    { embedId: 'turbovid', url: 'https://turbovid.com/player123' },
    { embedId: 'vidcloud', url: 'https://vidcloud.co/embed456' },
    { embedId: 'dood', url: 'https://dood.watch/789' }
  ]
};
```

### 3. **Language/Quality Variants**
Sources can offer different options:
```typescript
return {
  embeds: [
    { embedId: 'autoembed-english', url: streamUrl },
    { embedId: 'autoembed-spanish', url: streamUrlEs },
    { embedId: 'autoembed-hindi', url: streamUrlHi }
  ]
};
```

### 4. **Specialization**
- **Sources** specialize in website structures and search
- **Embeds** specialize in player technologies and decryption

## How They Work Together

### Flow Example: Finding "Spirited Away"

1. **Source (catflix)**:
   - Searches catflix.su for "Spirited Away"
   - Finds movie page with embedded player
   - Extracts turbovid URL: `https://turbovid.com/embed/abc123`
   - Returns: `{ embedId: 'turbovid', url: 'https://turbovid.com/embed/abc123' }`

2. **Embed (turbovid)**:
   - Receives the turbovid URL
   - Scrapes the player page for encryption keys
   - Decrypts the actual HLS playlist URL
   - Returns: `{ stream: [{ playlist: 'https://cdn.example.com/movie.m3u8' }] }`

3. **Result**: User can now play the video stream

### Error Handling Chain

If the embed fails to extract a stream:
```typescript
// Source provides multiple backup options
return {
  embeds: [
    { embedId: 'turbovid', url: url1 },     // Try first
    { embedId: 'mixdrop', url: url2 },      // Fallback 1
    { embedId: 'dood', url: url3 }          // Fallback 2
  ]
};
```

The system tries each embed in rank order until one succeeds.

## Best Practices

### For Sources:
- Provide multiple embed options when possible
- Use descriptive embed IDs that match existing embeds
- Handle both movies and TV shows (combo scraper pattern)
- Return direct streams when embed processing isn't needed

### For Embeds:
- Focus on one player type per embed
- Handle errors gracefully with clear error messages
- Use proxy functions for protected streams
- Include proper headers and flags

### Registration:
```typescript
// In src/providers/all.ts
export function gatherAllSources(): Array<Sourcerer> {
  return [catflixScraper, autoembedScraper, /* ... */];
}

export function gatherAllEmbeds(): Array<Embed> {
  return [turbovidScraper, autoembedEnglishScraper, /* ... */];
}
```

Both sources and embeds must be registered in `all.ts` to be available for use.