# Sources vs Embeds

Understanding the difference between sources and embeds is crucial for building scrapers effectively. They work together in a two-stage pipeline to extract playable video streams.

## The Two-Stage Pipeline

```
User Request → Source Scraper → What did source find?
                                          ↓
                                   ┌─────────────┐
                                   ↓             ↓
                             Direct Stream  Embed URLs
                                   ↓             ↓
                              Play Video   Embed Scraper
                                                 ↓
                                          Extract Stream
                                                 ↓
                                            Play Video
```

**Flow Breakdown:**

1. **User requests** content (movie/TV show)
2. **Source scraper** searches the target website
3. **Source returns** either:
   - **Direct streams** → Ready to play immediately
   - **Embed URLs** → Need further processing
4. **Embed scraper** (if needed) extracts streams from player URLs
5. **Final result** → Playable video stream

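
The flow above can be sketched as a small orchestration function. This is a simplified illustration, not the library's actual runner: the `Stream`, `EmbedRef`, `SourceOutput`, `SourceScraper`, and `EmbedScraper` types, the `resolveStreams` function, and the stub scrapers with their URLs are all hypothetical stand-ins.

```typescript
// Hypothetical, simplified shapes; the real library types carry more fields.
type Stream = { id: string; type: 'hls'; playlist: string };
type EmbedRef = { embedId: string; url: string };
type SourceOutput = { stream?: Stream[]; embeds: EmbedRef[] };
type SourceScraper = () => Promise<SourceOutput>;
type EmbedScraper = (url: string) => Promise<Stream[]>;

// Stage 1 runs the source; stage 2 runs embed scrapers only when needed.
async function resolveStreams(
  source: SourceScraper,
  embedScrapers: Record<string, EmbedScraper>,
): Promise<Stream[]> {
  const output = await source();

  // Direct streams are ready to play, so stage 2 is skipped.
  if (output.stream && output.stream.length > 0) return output.stream;

  // Otherwise hand each embed URL to the scraper registered under its embedId.
  for (const embed of output.embeds) {
    const scraper = embedScrapers[embed.embedId];
    if (!scraper) continue; // no scraper knows this player
    const streams = await scraper(embed.url);
    if (streams.length > 0) return streams;
  }
  throw new Error('No playable stream found');
}

// Stub scrapers for illustration only.
const demoSource: SourceScraper = async () => ({
  embeds: [{ embedId: 'demo-embed', url: 'https://player.example.com/abc' }],
});
const demoEmbeds: Record<string, EmbedScraper> = {
  'demo-embed': async (url) => [{ id: 'primary', type: 'hls', playlist: `${url}/master.m3u8` }],
};
```

Calling `resolveStreams(demoSource, demoEmbeds)` takes the embed branch here, because the stub source returns no direct stream.
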
## Sources: The Content Finders

**Sources** are the first stage - they find content on websites and return either:

1. **Direct video streams** (ready to play)
2. **Embed URLs** that need further processing

### Example: Autoembed Source

```typescript
// From src/providers/sources/autoembed.ts
// Excerpt: `mediaType` and `id` are computed earlier in the function (not shown)
async function comboScraper(ctx: ShowScrapeContext | MovieScrapeContext): Promise<SourcererOutput> {
  // 1. Call an API to find video sources
  const data = await ctx.proxiedFetcher(`/api/getVideoSource`, {
    baseUrl: 'https://tom.autoembed.cc',
    query: { type: mediaType, id }
  });

  // 2. Return embed URLs for further processing
  return {
    embeds: [{
      embedId: 'autoembed-english', // Points to an embed scraper
      url: data.videoSource         // URL that embed will process
    }]
  };
}
```

**What this source does:**

- Queries an API with the TMDB ID
- Gets back a video source URL
- Returns it as an embed for the `autoembed-english` embed scraper to handle

### Example: Catflix Source

```typescript
// From src/providers/sources/catflix.ts
// Excerpt: `baseUrl`, `mediaTitle`, `movieId`, and `scriptData` are set up elsewhere (not shown)
async function comboScraper(ctx: ShowScrapeContext | MovieScrapeContext): Promise<SourcererOutput> {
  // 1. Build URL to the movie/show page
  const watchPageUrl = `${baseUrl}/movie/${mediaTitle}-${movieId}`;

  // 2. Scrape the page for embedded player URLs
  const watchPage = await ctx.proxiedFetcher(watchPageUrl);
  const $ = load(watchPage);

  // 3. Extract and decode the embed URL
  const mainOriginMatch = scriptData.data.match(/main_origin = "(.*?)";/);
  // Guard against a null match so a page layout change fails with a clear error
  if (!mainOriginMatch) throw new Error('Unable to find main_origin in page script');
  const decodedUrl = atob(mainOriginMatch[1]);

  // 4. Return embed URL for turbovid embed to process
  return {
    embeds: [{
      embedId: 'turbovid', // Points to turbovid embed scraper
      url: decodedUrl      // Turbovid player URL
    }]
  };
}
```

**What this source does:**

- Scrapes a streaming website
- Finds the base64-encoded embed player URL in the page source
- Decodes the URL and returns it for the `turbovid` embed scraper

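
The extract-and-decode step can be isolated into a small helper that fails loudly when the page layout changes. This is a minimal sketch, assuming the script text contains a base64-encoded `main_origin` assignment as in the excerpt above; the `extractMainOrigin` helper and the sample script are hypothetical.

```typescript
// Hypothetical helper: pull the base64 `main_origin` value out of inline
// script text and decode it. Throws a descriptive error on a missing match.
function extractMainOrigin(scriptText: string): string {
  const match = scriptText.match(/main_origin = "(.*?)";/);
  if (!match) throw new Error('main_origin not found in script');
  return atob(match[1]);
}

// Sample input built for illustration; btoa produces the encoded form.
const sampleScript = `var main_origin = "${btoa('https://player.example.com/embed/abc123')}";`;
const decodedOrigin = extractMainOrigin(sampleScript);
```

Here `decodedOrigin` holds the original player URL again, and a script without the assignment raises an error instead of failing on a `null` index.
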
## Embeds: The Stream Extractors

**Embeds** are the second stage - they take URLs from sources and extract the actual playable video streams. Each embed type knows how to handle a specific player or service.

### Example: Autoembed Embed (Simple)

```typescript
// From src/providers/embeds/autoembed.ts
async scrape(ctx) {
  // The URL from the source is already a direct HLS playlist
  return {
    stream: [{
      id: 'primary',
      type: 'hls',
      playlist: ctx.url, // Use the URL directly as HLS playlist
      flags: [flags.CORS_ALLOWED],
      captions: []
    }]
  };
}
```

**What this embed does:**

- Takes the URL from autoembed source
- Treats it as a direct HLS playlist (no further processing needed)
- Returns it as a playable stream

### Example: Turbovid Embed (Complex)

```typescript
// From src/providers/embeds/turbovid.ts
// Excerpt: `baseUrl`, `decrypt`, and `streamHeaders` are defined elsewhere (not shown)
async scrape(ctx) {
  // 1. Fetch the turbovid player page
  const embedPage = await ctx.proxiedFetcher(ctx.url);

  // 2. Extract encryption keys from the page
  const apkey = embedPage.match(/const\s+apkey\s*=\s*"(.*?)";/)?.[1];
  const xxid = embedPage.match(/const\s+xxid\s*=\s*"(.*?)";/)?.[1];
  // Fail early if either value is missing, instead of sending a broken query
  if (!apkey || !xxid) throw new Error('Failed to get required values');

  // 3. Get decryption key from API
  const encodedJuiceKey = JSON.parse(
    await ctx.proxiedFetcher('/api/cucked/juice_key', { baseUrl })
  ).juice;

  // 4. Get encrypted playlist data
  const data = JSON.parse(
    await ctx.proxiedFetcher('/api/cucked/the_juice_v2/', {
      baseUrl, query: { [apkey]: xxid }
    })
  ).data;

  // 5. Decrypt the playlist URL
  const playlist = decrypt(data, atob(encodedJuiceKey));

  // 6. Return proxied stream (handles CORS/headers)
  return {
    stream: [{
      type: 'hls',
      id: 'primary',
      playlist: createM3U8ProxyUrl(playlist, ctx.features, streamHeaders),
      headers: streamHeaders,
      flags: [], captions: []
    }]
  };
}
```

**What this embed does:**

- Takes the turbovid player URL from the catflix source
- Performs a multi-step extraction: fetches the page → gets the keys → decrypts the data
- Returns the final HLS playlist with proper proxy handling

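
Because each step depends on a value scraped in the previous one, it helps to validate early and name the missing piece in the error. Below is a minimal sketch of the key-extraction step, using the same regular-expression shape as the excerpt; the `extractConst` helper and the sample page fragment are hypothetical.

```typescript
// Hypothetical helper: read `const <name> = "<value>";` from player page HTML.
function extractConst(page: string, name: string): string {
  const pattern = new RegExp(`const\\s+${name}\\s*=\\s*"(.*?)";`);
  const value = page.match(pattern)?.[1];
  if (!value) throw new Error(`Could not find "${name}" in the embed page`);
  return value;
}

// Sample page fragment made up for illustration.
const samplePage = '<script>const apkey = "k1"; const xxid = "v9";</script>';
const sampleApkey = extractConst(samplePage, 'apkey');
const sampleXxid = extractConst(samplePage, 'xxid');

// The pair becomes the query of the follow-up request, as in step 4 above.
const sampleQuery = { [sampleApkey]: sampleXxid };
```

With this shape, a changed player page produces an error that names the missing constant rather than a request with an `undefined` query key.
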
## Key Differences

| Sources | Embeds |
|---------|--------|
| **Find content** on websites | **Extract streams** from players |
| Return embed URLs OR direct streams | Always return direct streams |
| Handle website navigation/search | Handle player-specific extraction |
| Can return multiple server options | Process one specific player type |
| Example: "Find Avengers on Catflix" | Example: "Extract stream from Turbovid player" |

## Why This Separation?

### 1. **Reusability**

Multiple sources can use the same embed:

```typescript
// Both catflix and other sources can return turbovid embeds
{ embedId: 'turbovid', url: 'https://turbovid.com/player123' }
```

### 2. **Multiple Server Options**

Sources can provide backup servers:

```typescript
return {
  embeds: [
    { embedId: 'turbovid', url: 'https://turbovid.com/player123' },
    { embedId: 'vidcloud', url: 'https://vidcloud.co/embed456' },
    { embedId: 'dood', url: 'https://dood.watch/789' }
  ]
};
```

### 3. **Language/Quality Variants**

Sources can offer the same content in different languages or qualities, each handled by its own embed:

```typescript
return {
  embeds: [
    { embedId: 'autoembed-english', url: streamUrl },
    { embedId: 'autoembed-spanish', url: streamUrlEs },
    { embedId: 'autoembed-hindi', url: streamUrlHi }
  ]
};
```

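
When the upstream site only has some languages for a title, the embed list can be built from whatever URLs were actually found. This is a minimal sketch: `buildLanguageEmbeds` is a hypothetical helper, the URLs are placeholders, and it assumes embed IDs follow the `autoembed-<language>` pattern shown above.

```typescript
type EmbedRef = { embedId: string; url: string };

// Hypothetical helper: turn a language → URL map into embed entries,
// skipping languages for which no URL was found.
function buildLanguageEmbeds(urls: Record<string, string | undefined>): EmbedRef[] {
  const embeds: EmbedRef[] = [];
  for (const language of Object.keys(urls)) {
    const url = urls[language];
    if (!url) continue;
    embeds.push({ embedId: `autoembed-${language}`, url });
  }
  return embeds;
}

const variantEmbeds = buildLanguageEmbeds({
  english: 'https://cdn.example.com/en.m3u8',
  spanish: undefined, // not available for this title
  hindi: 'https://cdn.example.com/hi.m3u8',
});
```

Only the English and Hindi entries end up in `variantEmbeds`, so no embed scraper is invoked with an empty URL.
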
### 4. **Specialization**

- **Sources** specialize in website structures and search
- **Embeds** specialize in player technologies and decryption

## How They Work Together

### Flow Example: Finding "Spirited Away"

1. **Source (catflix)**:
   - Searches catflix.su for "Spirited Away"
   - Finds movie page with embedded player
   - Extracts turbovid URL: `https://turbovid.com/embed/abc123`
   - Returns: `{ embedId: 'turbovid', url: 'https://turbovid.com/embed/abc123' }`

2. **Embed (turbovid)**:
   - Receives the turbovid URL
   - Scrapes the player page for encryption keys
   - Decrypts the actual HLS playlist URL
   - Returns: `{ stream: [{ playlist: 'https://cdn.example.com/movie.m3u8' }] }`

3. **Result**: User can now play the video stream

### Error Handling Chain

If an embed fails to extract a stream, the other embeds returned by the source act as fallbacks:

```typescript
// Source provides multiple backup options
return {
  embeds: [
    { embedId: 'turbovid', url: url1 }, // Try first
    { embedId: 'mixdrop', url: url2 },  // Fallback 1
    { embedId: 'dood', url: url3 }      // Fallback 2
  ]
};
```

The system tries each embed in rank order until one succeeds.

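
The fallback behaviour can be sketched as a loop that moves on whenever an embed scraper throws or returns nothing. This is an illustration rather than the library's runner: `firstWorkingStream`, the types, and the stub scrapers are hypothetical, and it assumes each embed scraper carries a numeric `rank` where a higher value is tried first.

```typescript
type Stream = { id: string; type: 'hls'; playlist: string };
type EmbedRef = { embedId: string; url: string };
type EmbedScraper = { rank: number; scrape: (url: string) => Promise<Stream[]> };

// Try embeds from highest to lowest rank; return the first that yields a stream.
async function firstWorkingStream(
  embeds: EmbedRef[],
  scrapers: Record<string, EmbedScraper>,
): Promise<{ embedId: string; streams: Stream[] }> {
  const ordered = embeds
    .filter((e) => scrapers[e.embedId] !== undefined) // drop unknown players
    .sort((a, b) => scrapers[b.embedId].rank - scrapers[a.embedId].rank);

  const failures: string[] = [];
  for (const embed of ordered) {
    try {
      const streams = await scrapers[embed.embedId].scrape(embed.url);
      if (streams.length > 0) return { embedId: embed.embedId, streams };
      failures.push(`${embed.embedId}: no streams`);
    } catch (err) {
      // Record the reason and fall through to the next embed.
      failures.push(`${embed.embedId}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`All embeds failed (${failures.join('; ')})`);
}

// Stub scrapers for illustration: the top-ranked one always fails.
const stubScrapers: Record<string, EmbedScraper> = {
  turbovid: { rank: 3, scrape: async () => { throw new Error('player offline'); } },
  mixdrop: { rank: 2, scrape: async (url) => [{ id: 'primary', type: 'hls', playlist: url }] },
  dood: { rank: 1, scrape: async (url) => [{ id: 'primary', type: 'hls', playlist: url }] },
};
const candidateEmbeds: EmbedRef[] = [
  { embedId: 'dood', url: 'https://dood.example.com/e/3' },
  { embedId: 'turbovid', url: 'https://turbovid.example.com/e/1' },
  { embedId: 'mixdrop', url: 'https://mixdrop.example.com/e/2' },
];
```

With these stubs, `firstWorkingStream(candidateEmbeds, stubScrapers)` skips the failing `turbovid` scraper and resolves with the `mixdrop` result, regardless of the order in which the source listed the embeds.
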
## Best Practices

### For Sources:

- Provide multiple embed options when possible
- Use descriptive embed IDs that match existing embeds
- Handle both movies and TV shows (combo scraper pattern)
- Return direct streams when embed processing isn't needed

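
The last point can be sketched as a source that returns a stream directly when the upstream URL is already a playlist, and falls back to an embed otherwise. This is a sketch under assumptions: `toSourceOutput` is hypothetical, the types are simplified, and checking for an `.m3u8` path suffix is only an illustrative heuristic.

```typescript
// Hypothetical, simplified shapes; the real library types carry more fields.
type Stream = { id: string; type: 'hls'; playlist: string };
type EmbedRef = { embedId: string; url: string };
type SourceOutput = { stream?: Stream[]; embeds: EmbedRef[] };

// Decide whether a found URL is playable as-is or needs an embed scraper.
function toSourceOutput(url: string, embedId: string): SourceOutput {
  const path = new URL(url).pathname; // ignores any query string
  if (path.endsWith('.m3u8')) {
    return { embeds: [], stream: [{ id: 'primary', type: 'hls', playlist: url }] };
  }
  return { embeds: [{ embedId, url }] };
}

const directOutput = toSourceOutput('https://cdn.example.com/movie.m3u8?token=abc', 'turbovid');
const embedOutput = toSourceOutput('https://turbovid.com/embed/abc123', 'turbovid');
```

`directOutput` carries a ready-to-play stream and no embeds, while `embedOutput` defers to the `turbovid` embed scraper.
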
### For Embeds:

- Focus on one player type per embed
- Handle errors gracefully with clear error messages
- Use proxy functions for protected streams
- Include proper headers and flags

### Registration:

```typescript
// In src/providers/all.ts
export function gatherAllSources(): Array<Sourcerer> {
  return [catflixScraper, autoembedScraper, /* ... */];
}

export function gatherAllEmbeds(): Array<Embed> {
  return [turbovidScraper, autoembedEnglishScraper, /* ... */];
}
```

Both sources and embeds must be registered in `all.ts` to be available for use.

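
One way to picture why registration matters: the `embedId` a source returns is only useful if a registered embed has that ID. Below is a minimal sketch of such a lookup, with hypothetical names (`RegisteredEmbed`, `buildEmbedIndex`, `findUnregistered`); it is not the library's own validation code.

```typescript
// Hypothetical minimal view of a registered embed scraper.
type RegisteredEmbed = { id: string; name: string };

// Index embeds by ID, rejecting duplicates so lookups stay unambiguous.
function buildEmbedIndex(embeds: RegisteredEmbed[]): Map<string, RegisteredEmbed> {
  const index = new Map<string, RegisteredEmbed>();
  for (const embed of embeds) {
    if (index.has(embed.id)) throw new Error(`Duplicate embed id: ${embed.id}`);
    index.set(embed.id, embed);
  }
  return index;
}

// Report embed IDs that a source returns but no registered embed handles.
function findUnregistered(embedIds: string[], index: Map<string, RegisteredEmbed>): string[] {
  return embedIds.filter((id) => !index.has(id));
}

const embedIndex = buildEmbedIndex([
  { id: 'turbovid', name: 'Turbovid' },
  { id: 'autoembed-english', name: 'Autoembed English' },
]);
const unregisteredIds = findUnregistered(['turbovid', 'vidcloud'], embedIndex);
```

In this example `vidcloud` is reported as unregistered, which is the situation a source author would hit after returning an embed ID that was never added to the embed list.
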