Sources vs Embeds

Understanding the difference between sources and embeds is crucial for building scrapers effectively. They work together in a two-stage pipeline to extract playable video streams.

The Two-Stage Pipeline

User Request → Source Scraper → What did source find?
                                      ↓
                               ┌─────────────┐
                               ↓             ↓
                        Direct Stream    Embed URLs
                               ↓             ↓
                          Play Video    Embed Scraper
                                            ↓
                                     Extract Stream
                                            ↓
                                      Play Video

Flow Breakdown:

  1. User requests content (movie/TV show)
  2. Source scraper searches the target website
  3. Source returns either:
    • Direct streams → Ready to play immediately
    • Embed URLs → Need further processing
  4. Embed scraper (if needed) extracts streams from player URLs
  5. Final result → Playable video stream
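
The flow above can be sketched as a small dispatcher. This is a simplified, synchronous model and not the library's actual runner: the names `Stream`, `EmbedRef`, `SourceResult`, `EmbedScraper`, and `resolveStreams` are hypothetical, the `demo-player` scraper is a stand-in, and real scrapers are asynchronous.

```typescript
// Hypothetical, trimmed-down shapes; the real library types carry more fields.
type Stream = { id: string; type: 'hls' | 'file'; playlist: string };
type EmbedRef = { embedId: string; url: string };
type SourceResult = { stream?: Stream[]; embeds: EmbedRef[] };
type EmbedScraper = (url: string) => Stream[];

// Stage 1 output goes in; playable streams come out.
function resolveStreams(result: SourceResult, embedScrapers: Record<string, EmbedScraper>): Stream[] {
  // Direct streams need no second stage.
  if (result.stream && result.stream.length > 0) return result.stream;

  // Otherwise hand each embed URL to the scraper registered under its embedId.
  const streams: Stream[] = [];
  for (const embed of result.embeds) {
    const scraper = embedScrapers[embed.embedId];
    if (!scraper) continue; // unknown embedId: nothing can process this URL
    streams.push(...scraper(embed.url));
  }
  return streams;
}

// A stand-in embed scraper that pretends the playlist lives under the player URL.
const embedScrapers: Record<string, EmbedScraper> = {
  'demo-player': (url) => [{ id: 'primary', type: 'hls', playlist: `${url}/index.m3u8` }],
};

const direct = resolveStreams(
  { embeds: [], stream: [{ id: 'primary', type: 'hls', playlist: 'https://cdn.example.com/a.m3u8' }] },
  embedScrapers,
);
const viaEmbed = resolveStreams(
  { embeds: [{ embedId: 'demo-player', url: 'https://player.example.com/e/1' }] },
  embedScrapers,
);
```

In this sketch `direct` passes through untouched, while `viaEmbed` is produced by the second stage.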

Sources: The Content Finders

Sources are the first stage. They find content on a website and return one of two things:

  1. Direct video streams (ready to play)
  2. Embed URLs that need further processing

Example: Autoembed Source

// From src/providers/sources/autoembed.ts
async function comboScraper(ctx: ShowScrapeContext | MovieScrapeContext): Promise<SourcererOutput> {
  // 1. Call an API to find video sources
  //    (mediaType and id are derived from ctx.media; that setup is omitted from this excerpt)
  const data = await ctx.proxiedFetcher(`/api/getVideoSource`, {
    baseUrl: 'https://tom.autoembed.cc',
    query: { type: mediaType, id },
  });

  // 2. Return embed URLs for further processing
  return {
    embeds: [
      {
        embedId: 'autoembed-english', // Points to an embed scraper
        url: data.videoSource, // URL that embed will process
      },
    ],
  };
}

What this source does:

  • Queries an API with TMDB ID
  • Gets back a video source URL
  • Returns it as an embed for the autoembed-english embed scraper to handle
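
The excerpt uses `mediaType` and `id` without showing where they come from. The following is a minimal sketch of one way a combo scraper could derive them for both movies and episodes. The `Media` shapes and `buildQuery` are hypothetical, and the `'tv'` label and the `id/season/episode` layout are assumptions about the target API rather than something this page confirms.

```typescript
// Hypothetical, trimmed-down media shapes for illustration only.
type MovieMedia = { type: 'movie'; tmdbId: string };
type ShowMedia = {
  type: 'show';
  tmdbId: string;
  season: { number: number };
  episode: { number: number };
};
type Media = MovieMedia | ShowMedia;

// Build the query for a single API that serves both movies and episodes.
// The 'tv' label and the id/season/episode layout are assumptions about the API.
function buildQuery(media: Media): { type: string; id: string } {
  if (media.type === 'show') {
    return { type: 'tv', id: `${media.tmdbId}/${media.season.number}/${media.episode.number}` };
  }
  return { type: 'movie', id: media.tmdbId };
}

const movieQuery = buildQuery({ type: 'movie', tmdbId: '129' });
const showQuery = buildQuery({
  type: 'show',
  tmdbId: '1396',
  season: { number: 2 },
  episode: { number: 5 },
});
```

Narrowing on `media.type` lets one function serve both context types, which is the point of the combo scraper pattern.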

Example: Catflix Source

// From src/providers/sources/catflix.ts
async function comboScraper(ctx: ShowScrapeContext | MovieScrapeContext): Promise<SourcererOutput> {
  // 1. Build URL to the movie/show page
  const watchPageUrl = `${baseUrl}/movie/${mediaTitle}-${movieId}`;

  // 2. Scrape the page for embedded player URLs
  const watchPage = await ctx.proxiedFetcher(watchPageUrl);
  const $ = load(watchPage);

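  // 3. Extract and decode the embed URL
  //    (scriptData holds the script content that defines main_origin; locating it via $ is omitted from this excerpt)
  const mainOriginMatch = scriptData.data.match(/main_origin = "(.*?)";/);
  const decodedUrl = atob(mainOriginMatch[1]);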

  // 4. Return embed URL for turbovid embed to process
  return {
    embeds: [
      {
        embedId: 'turbovid', // Points to turbovid embed scraper
        url: decodedUrl, // Turbovid player URL
      },
    ],
  };
}

What this source does:

  • Scrapes a streaming website
  • Finds encoded embed player URLs in the page source
  • Decodes the URL and returns it for the turbovid embed scraper
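
The decode step can be isolated into a small helper that tolerates a missing or malformed value. This is a sketch, not the code in catflix.ts: `extractMainOrigin` is a hypothetical name, and only the `main_origin` regex is taken from the excerpt above.

```typescript
// Pull the base64-encoded player URL out of a script body and decode it.
// Returns null instead of throwing when the marker is missing or malformed.
function extractMainOrigin(scriptText: string): string | null {
  const match = scriptText.match(/main_origin = "(.*?)";/);
  if (!match) return null;
  try {
    return atob(match[1]);
  } catch {
    return null; // the captured value was not valid base64
  }
}

// Round-trip a made-up URL to exercise the helper.
const encoded = btoa('https://player.example.com/embed/abc123');
const found = extractMainOrigin(`var main_origin = "${encoded}";`);
const missing = extractMainOrigin('var something_else = "x";');
```

Returning `null` lets the source decide whether to throw a "not found" error or fall back to another embed.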

Embeds: The Stream Extractors

Embeds are the second stage. They take the URLs returned by sources and extract the actual playable video streams. Each embed scraper knows how to handle one specific player or service.

Example: Autoembed Embed (Simple)

// From src/providers/embeds/autoembed.ts
function embed(provider: { id: string; rank: number }) {
  return makeEmbed({
    id: provider.id,
    name: provider.id.split('-').map(word => word[0].toUpperCase() + word.slice(1)).join(' '),
    rank: provider.rank,
    flags: [flags.CORS_ALLOWED], // Embed flags match stream flags
    async scrape(ctx) {
      // The URL from the source is already a direct HLS playlist
      return {
        stream: [{
          id: 'primary',
          type: 'hls',
          playlist: ctx.url,  // Use the URL directly as HLS playlist
          flags: [flags.CORS_ALLOWED], // Stream flags
          captions: []
        }]
      };
    },
  });
}

What this embed does:

  • Takes the URL from autoembed source
  • Treats it as a direct HLS playlist (no further processing needed)
  • Returns it as a playable stream
  • Note: The embed's flags match its stream's flags, so flag-based filtering treats the embed and its stream consistently
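
To see why matching flags matters, consider a consumer that filters on required flags before running an embed. The sketch below is illustrative only: `Flag`, `Flagged`, and `hasRequiredFlags` are hypothetical, and the flag string values are assumptions rather than the library's definitions.

```typescript
// Hypothetical flag values and shapes; not the library's own definitions.
type Flag = 'cors-allowed' | 'ip-locked';
type Flagged = { id: string; flags: Flag[] };

// True when every required flag is present on the item.
function hasRequiredFlags(item: Flagged, required: Flag[]): boolean {
  return required.every((flag) => item.flags.indexOf(flag) !== -1);
}

const required: Flag[] = ['cors-allowed'];

// Consistent: the embed and its stream declare the same flags.
const embedOk: Flagged = { id: 'demo-embed', flags: ['cors-allowed'] };
const streamOk: Flagged = { id: 'primary', flags: ['cors-allowed'] };

// Inconsistent: the embed omits a flag that its stream declares.
const embedMismatch: Flagged = { id: 'demo-embed', flags: [] };

// Same answer at both levels when the flags match.
const consistent = hasRequiredFlags(embedOk, required) === hasRequiredFlags(streamOk, required);
// The mismatched embed is rejected even though its stream would have passed.
const skippedEarly = !hasRequiredFlags(embedMismatch, required) && hasRequiredFlags(streamOk, required);
```

With mismatched flags, an embed-level filter would discard an embed whose stream is actually usable.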

Example: Turbovid Embed (Complex)

// From src/providers/embeds/turbovid.ts
export const turbovidScraper = makeEmbed({
  id: 'turbovid',
  name: 'Turbovid',
  rank: 180,
  flags: [flags.CORS_ALLOWED], // Embed flags match stream flags
  async scrape(ctx) {
    // 1. Fetch the turbovid player page
    const embedPage = await ctx.proxiedFetcher(ctx.url);

    // 2. Extract encryption keys from the page
    const apkey = embedPage.match(/const\s+apkey\s*=\s*"(.*?)";/)?.[1];
    const xxid = embedPage.match(/const\s+xxid\s*=\s*"(.*?)";/)?.[1];

    // 3. Get decryption key from API
    const encodedJuiceKey = JSON.parse(
      await ctx.proxiedFetcher('/api/cucked/juice_key', { baseUrl })
    ).juice;

    // 4. Get encrypted playlist data
    const data = JSON.parse(
      await ctx.proxiedFetcher('/api/cucked/the_juice_v2/', {
        baseUrl, query: { [apkey]: xxid }
      })
    ).data;

    // 5. Decrypt the playlist URL
    const playlist = decrypt(data, atob(encodedJuiceKey));

    // 6. Return proxied stream (handles CORS/headers)
    return {
      stream: [{
        type: 'hls',
        id: 'primary',
        playlist: createM3U8ProxyUrl(playlist, ctx.features, streamHeaders),
        headers: streamHeaders,
        flags: [flags.CORS_ALLOWED], // Stream flags match embed flags
        captions: []
      }]
    };
  },
});

What this embed does:

  • Takes turbovid player URL from catflix source
  • Performs complex extraction: fetches page → gets keys → decrypts data
  • Returns the final HLS playlist with proper proxy handling
  • Note: The embed's flags match its stream's flags, so flag-based filtering treats the embed and its stream consistently
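
Because `apkey` and `xxid` come from optional-chained regex matches, either can be `undefined`, and the later `{ [apkey]: xxid }` query would then be meaningless. One way to fail early with a clear message is sketched below. `extractPlayerKeys` is a hypothetical helper; only the two regular expressions are taken from the excerpt.

```typescript
// Hypothetical helper: extract the two page constants, failing loudly if either is missing.
function extractPlayerKeys(page: string): { apkey: string; xxid: string } {
  const apkey = page.match(/const\s+apkey\s*=\s*"(.*?)";/)?.[1];
  const xxid = page.match(/const\s+xxid\s*=\s*"(.*?)";/)?.[1];
  if (!apkey || !xxid) {
    throw new Error('Player page is missing apkey or xxid; the player markup may have changed');
  }
  return { apkey, xxid };
}

// A page containing both constants yields both values.
const keys = extractPlayerKeys('const apkey = "k1"; const xxid = "v9";');

// A page without them produces a descriptive error instead of a confusing request.
let failure = '';
try {
  extractPlayerKeys('<html>no constants here</html>');
} catch (err) {
  failure = err instanceof Error ? err.message : String(err);
}
```

A specific message like this makes it easier to tell a changed player page apart from a network failure.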

Key Differences

| Sources | Embeds |
| --- | --- |
| Find content on websites | Extract streams from players |
| Return embed URLs OR direct streams | Always return direct streams |
| Handle website navigation/search | Handle player-specific extraction |
| Can return multiple server options | Process one specific player type |
| Example: "Find Avengers on Catflix" | Example: "Extract stream from Turbovid player" |

Why This Separation?

1. Reusability

Multiple sources can use the same embed:

// Both catflix and other sources can return turbovid embeds
{ embedId: 'turbovid', url: 'https://turbovid.com/player123' }

2. Multiple Server Options

Sources can provide backup servers:

return {
  embeds: [
    { embedId: 'turbovid', url: 'https://turbovid.com/player123' },
    { embedId: 'vidcloud', url: 'https://vidcloud.co/embed456' },
    { embedId: 'dood', url: 'https://dood.watch/789' },
  ],
};

3. Language/Quality Variants

Sources can offer different options:

return {
  embeds: [
    { embedId: 'autoembed-english', url: streamUrl },
    { embedId: 'autoembed-spanish', url: streamUrlEs },
    { embedId: 'autoembed-hindi', url: streamUrlHi },
  ],
};
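
When an API returns only some of the languages, the embed list can be built from whatever is available. This is a sketch under assumptions: `LanguageUrls` and `toLanguageEmbeds` are hypothetical, and the response shape is invented for illustration. Only the `autoembed-<language>` id scheme comes from the example above.

```typescript
type EmbedRef = { embedId: string; url: string };

// Hypothetical API response: one stream URL per language, some possibly absent.
type LanguageUrls = { [language: string]: string | undefined };

// Turn the available languages into embed entries, skipping missing URLs.
function toLanguageEmbeds(urls: LanguageUrls, languages: string[]): EmbedRef[] {
  const result: EmbedRef[] = [];
  for (const language of languages) {
    const url = urls[language];
    if (url) result.push({ embedId: `autoembed-${language}`, url });
  }
  return result;
}

// Spanish is absent from the response, so no embed is emitted for it.
const languageEmbeds = toLanguageEmbeds(
  { english: 'https://cdn.example.com/en.m3u8', hindi: 'https://cdn.example.com/hi.m3u8' },
  ['english', 'spanish', 'hindi'],
);
```

Skipping absent variants avoids handing an embed scraper an empty URL that is guaranteed to fail.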

4. Specialization

  • Sources specialize in website structures and search
  • Embeds specialize in player technologies and decryption

How They Work Together

Flow Example: Finding "Spirited Away"

  1. Source (catflix):

    • Searches catflix.su for "Spirited Away"
    • Finds movie page with embedded player
    • Extracts turbovid URL: https://turbovid.com/embed/abc123
    • Returns: { embedId: 'turbovid', url: 'https://turbovid.com/embed/abc123' }
  2. Embed (turbovid):

    • Receives the turbovid URL
    • Scrapes the player page for encryption keys
    • Decrypts the actual HLS playlist URL
    • Returns: { stream: [{ playlist: 'https://cdn.example.com/movie.m3u8' }] }
  3. Result: User can now play the video stream

Error Handling Chain

If the embed fails to extract a stream:

// Source provides multiple backup options
return {
  embeds: [
    { embedId: 'turbovid', url: url1 }, // Try first
    { embedId: 'mixdrop', url: url2 }, // Fallback 1
    { embedId: 'dood', url: url3 }, // Fallback 2
  ],
};

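The system tries each embed in order of the rank declared by its embed scraper until one succeeds, so the comments above assume turbovid outranks mixdrop, which in turn outranks dood.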
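
The fallback behaviour can be modelled in a few lines. This is a synchronous sketch, not the library's runner: `firstWorkingStream` and the `EmbedScraper` shape are hypothetical, it assumes a higher rank is tried earlier, and the mixdrop and dood ranks are made up for the demo.

```typescript
type Stream = { playlist: string };
type EmbedRef = { embedId: string; url: string };
// Hypothetical registry entry; real scrapers are async and receive a richer context.
type EmbedScraper = { id: string; rank: number; scrape: (url: string) => Stream[] };

// Try embeds from highest rank to lowest and return the first stream produced.
function firstWorkingStream(embeds: EmbedRef[], scrapers: EmbedScraper[]): Stream | null {
  const byId: Record<string, EmbedScraper> = {};
  for (const scraper of scrapers) byId[scraper.id] = scraper;

  // Keep only embeds with a registered scraper, highest rank first.
  const ordered = embeds
    .filter((embed) => byId[embed.embedId] !== undefined)
    .sort((a, b) => byId[b.embedId].rank - byId[a.embedId].rank);

  for (const embed of ordered) {
    try {
      const streams = byId[embed.embedId].scrape(embed.url);
      if (streams.length > 0) return streams[0];
    } catch {
      // This embed failed; move on to the next one.
    }
  }
  return null;
}

const tried: string[] = [];
const scrapers: EmbedScraper[] = [
  { id: 'turbovid', rank: 180, scrape: () => { tried.push('turbovid'); throw new Error('decrypt failed'); } },
  { id: 'mixdrop', rank: 100, scrape: (url) => { tried.push('mixdrop'); return [{ playlist: `${url}/index.m3u8` }]; } },
  { id: 'dood', rank: 50, scrape: (url) => { tried.push('dood'); return [{ playlist: `${url}/index.m3u8` }]; } },
];

// The array order is deliberately shuffled; rank decides what runs first.
const picked = firstWorkingStream(
  [
    { embedId: 'dood', url: 'https://dood.example.com/789' },
    { embedId: 'turbovid', url: 'https://turbovid.example.com/abc' },
    { embedId: 'mixdrop', url: 'https://mixdrop.example.com/456' },
  ],
  scrapers,
);
```

Here turbovid runs first and fails, mixdrop succeeds, and dood is never attempted.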

Best Practices

For Sources:

  • Provide multiple embed options when possible
  • Use descriptive embed IDs that exactly match the id of an existing embed scraper
  • Handle both movies and TV shows (combo scraper pattern)
  • Return direct streams when embed processing isn't needed

For Embeds:

  • Focus on one player type per embed
  • Handle errors gracefully with clear error messages
  • Use proxy functions for protected streams
  • Include proper headers and flags at both embed and stream levels
  • Embed flags should match stream flags for consistent filtering behavior

Registration:

// In src/providers/all.ts
export function gatherAllSources(): Array<Sourcerer> {
  return [catflixScraper, autoembedScraper /* ... */];
}

export function gatherAllEmbeds(): Array<Embed> {
  return [turbovidScraper, autoembedEnglishScraper /* ... */];
}

Both sources and embeds must be registered in all.ts to be available for use.
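
Since an `embedId` is used as a lookup key, registered ids need to be unique. A check along these lines could live in a unit test. It is a hypothetical sketch: `Registered` and `findDuplicateIds` are not part of the library, and the rank values other than turbovid's 180 are invented.

```typescript
// Hypothetical minimal shape shared by sources and embeds for this check.
type Registered = { id: string; rank: number };

// Return the ids that appear more than once in a registration list.
function findDuplicateIds(items: Registered[]): string[] {
  const counts: Record<string, number> = {};
  for (const item of items) counts[item.id] = (counts[item.id] || 0) + 1;
  return Object.keys(counts).filter((id) => counts[id] > 1);
}

// 'turbovid' is registered twice here on purpose.
const duplicates = findDuplicateIds([
  { id: 'turbovid', rank: 180 },
  { id: 'autoembed-english', rank: 10 },
  { id: 'turbovid', rank: 170 },
]);
```

An empty result means every registered scraper can be addressed unambiguously by its id.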