pstreams-providers/.docs/content/3.in-depth/1.setup-and-prerequisites.md
2025-07-11 14:08:43 -06:00

# Setup and Prerequisites
Before you start building scrapers, you need to set up your development environment and understand the testing workflow.
## Environment Setup
### 1. Create Environment File
Create a `.env` file in the root of the repository with the following variables:
```env
MOVIE_WEB_TMDB_API_KEY = "your_tmdb_api_key_here"
MOVIE_WEB_PROXY_URL = "https://your-proxy-url.com" # Optional
```
**Getting a TMDB API Key:**
1. Create an account at [TheMovieDB](https://www.themoviedb.org/)
2. Go to Settings > API
3. Request an API key (choose "Developer" for free usage)
4. Use the provided key in your `.env` file
**Proxy URL (Optional):**
- Useful for testing scrapers that require proxy access
- Can help bypass geographical restrictions during development
- If not provided, the library will use default proxy services
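Before running the CLI, it can help to confirm that the variables above are actually set. The following is a minimal sketch, not part of the library: `checkEnv` and `EnvReport` are hypothetical names, and the function only checks the two variables named in this guide. It takes the environment as a plain object so it can be called with `process.env` or with test data.

```typescript
// Hypothetical helper (not a library API): reports which variables from
// this guide are missing or still set to the placeholder value.
type Env = Record<string, string | undefined>;

interface EnvReport {
  missing: string[]; // required variables that are absent or unusable
  warnings: string[]; // non-fatal notes about optional variables
}

function checkEnv(env: Env): EnvReport {
  const report: EnvReport = { missing: [], warnings: [] };

  // The TMDB key is required; treat the placeholder as "not set".
  const tmdbKey = (env.MOVIE_WEB_TMDB_API_KEY ?? "").trim();
  if (tmdbKey === "" || tmdbKey === "your_tmdb_api_key_here") {
    report.missing.push("MOVIE_WEB_TMDB_API_KEY");
  }

  // The proxy URL is optional, but if present it should parse as a URL.
  const proxy = (env.MOVIE_WEB_PROXY_URL ?? "").trim();
  if (proxy === "") {
    report.warnings.push("MOVIE_WEB_PROXY_URL not set; default proxy services will be used");
  } else {
    try {
      new URL(proxy);
    } catch {
      report.warnings.push("MOVIE_WEB_PROXY_URL is not a valid URL");
    }
  }

  return report;
}
```

In a Node.js script, this could be called as `checkEnv(process.env)` after the `.env` file has been loaded.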
### 2. Install Dependencies
Install all required dependencies:
```sh
pnpm install
```
## Familiarize Yourself with the CLI
The library provides a CLI tool that is essential for testing scrapers during development. Scrapers depend on live third-party sites that can change or go offline at any time, so unit tests aren't practical for them; the CLI is your primary testing tool.
### Interactive Mode
The easiest way to test is using interactive mode:
```sh
pnpm cli
```
This will prompt you for:
- **Fetcher mode** (native, node-fetch, browser)
- **Scraper ID** (source or embed)
- **TMDB ID** for the content (for sources)
- **Embed URL** (for testing embeds directly)
- **Season/episode numbers** (for TV shows)
### Command Line Mode
For repeatability and automation, you can specify arguments directly:
```sh
# Get help with all available options
pnpm cli --help
# Test a movie scraper
pnpm cli --source-id catflix --tmdb-id 11527
# Test a TV show scraper (Arcane S1E1)
pnpm cli --source-id zoechip --tmdb-id 94605 --season 1 --episode 1
# Test an embed scraper directly with a URL
pnpm cli --source-id turbovid --url "https://turbovid.eu/embed/DjncbDBEmbLW"
```
### Common CLI Examples
```sh
# Popular test cases
pnpm cli --source-id catflix --tmdb-id 11527 # The Shining
pnpm cli --source-id embedsu --tmdb-id 129 # Spirited Away
pnpm cli --source-id vidsrc --tmdb-id 94605 --season 1 --episode 1 # Arcane S1E1
# Testing different fetcher modes
pnpm cli --fetcher native --source-id catflix --tmdb-id 11527
pnpm cli --fetcher browser --source-id catflix --tmdb-id 11527
```
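If you find yourself retyping the same commands, the test cases can be kept as data and turned into argument lists. This is a sketch under stated assumptions: `CliTestCase` and `buildCliArgs` are hypothetical names, and only the flags shown in the examples above (`--fetcher`, `--source-id`, `--tmdb-id`, `--season`, `--episode`, `--url`) are used. It prints the commands rather than running them.

```typescript
// Hypothetical helper: builds CLI argument lists from test-case data.
// Only flags that appear in this guide are emitted.
type FetcherMode = "native" | "node-fetch" | "browser";

interface CliTestCase {
  sourceId: string;
  tmdbId?: number; // for source scrapers
  url?: string; // for testing an embed scraper directly
  season?: number; // TV shows only
  episode?: number; // TV shows only
  fetcher?: FetcherMode;
}

function buildCliArgs(tc: CliTestCase): string[] {
  const args: string[] = [];
  if (tc.fetcher !== undefined) args.push("--fetcher", tc.fetcher);
  args.push("--source-id", tc.sourceId);
  if (tc.tmdbId !== undefined) args.push("--tmdb-id", String(tc.tmdbId));
  if (tc.season !== undefined) args.push("--season", String(tc.season));
  if (tc.episode !== undefined) args.push("--episode", String(tc.episode));
  if (tc.url !== undefined) args.push("--url", tc.url);
  return args;
}

// Test cases taken from the examples in this guide.
const cases: CliTestCase[] = [
  { sourceId: "catflix", tmdbId: 11527 },
  { sourceId: "vidsrc", tmdbId: 94605, season: 1, episode: 1 },
  { sourceId: "catflix", tmdbId: 11527, fetcher: "browser" },
];

for (const tc of cases) {
  console.log(["pnpm", "cli", ...buildCliArgs(tc)].join(" "));
}
```

Returning an array instead of a single string keeps each argument separate, which avoids quoting problems if the list is later passed to a process spawner.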
### Fetcher Options
The CLI supports different fetcher modes:
- **`native`**: Uses Node.js built-in fetch (undici) - fastest
- **`node-fetch`**: Uses the node-fetch library
- **`browser`**: Starts headless Chrome for browser-like environment
::alert{type="warning"}
The browser fetcher requires running `pnpm build` first; otherwise, you'll get outdated results.
::
### Understanding CLI Output
#### Source Scraper Output (Returns Embeds)
```sh
pnpm cli --source-id catflix --tmdb-id 11527
```
Example output:
```js
{
  embeds: [
    {
      embedId: 'turbovid',
      url: 'https://turbovid.eu/embed/DjncbDBEmbLW'
    }
  ]
}
```
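Each embed returned by a source scraper can be tested on its own by passing its `embedId` and `url` back to the CLI. The sketch below turns a source result into those follow-up commands. `SourceOutput` and `embedFollowUps` are hypothetical names, and the types cover only the two fields visible in the example output.

```typescript
// Hypothetical types covering only the fields shown in the example output.
interface EmbedRef {
  embedId: string;
  url: string;
}

interface SourceOutput {
  embeds: EmbedRef[];
}

// Builds one follow-up CLI command per embed. JSON.stringify is used as a
// simple way to double-quote the URL; it is not full shell escaping.
function embedFollowUps(output: SourceOutput): string[] {
  return output.embeds.map(
    (embed) => `pnpm cli --source-id ${embed.embedId} --url ${JSON.stringify(embed.url)}`
  );
}

const example: SourceOutput = {
  embeds: [{ embedId: "turbovid", url: "https://turbovid.eu/embed/DjncbDBEmbLW" }],
};

for (const command of embedFollowUps(example)) {
  console.log(command);
}
```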
#### Embed Scraper Output (Returns Streams)
```sh
pnpm cli --source-id turbovid --url "https://turbovid.eu/embed/DjncbDBEmbLW"
```
Example output:
```js
{
  stream: [
    {
      type: 'hls',
      id: 'primary',
      playlist: 'https://proxy.fifthwit.net/m3u8-proxy?url=https%3A%2F%2Fqueenselti.pro%2Fwrofm%2Fuwu.m3u8&headers=%7B%22referer%22%3A%22https%3A%2F%2Fturbovid.eu%2F%22%2C%22origin%22%3A%22https%3A%2F%2Fturbovid.eu%22%7D',
      flags: [],
      captions: []
    }
  ]
}
```
**Notice the proxied URL**: The `createM3U8ProxyUrl()` function creates URLs like `https://proxy.fifthwit.net/m3u8-proxy?url=...&headers=...` to handle protected streams. Read more about this in [Advanced Concepts](/in-depth/advanced-concepts).
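When debugging, it is often useful to see which upstream playlist and headers are wrapped inside a proxied URL. The sketch below decodes the `url` and `headers` query parameters seen in the example output. It is not the implementation of `createM3U8ProxyUrl()`; `inspectProxiedPlaylist` is a hypothetical helper that assumes the parameter layout shown above, with `headers` holding a JSON object.

```typescript
interface ProxiedPlaylist {
  target: string; // the upstream playlist URL
  headers: Record<string, string>; // headers forwarded by the proxy
}

// Hypothetical helper: unpacks the `url` and `headers` query parameters
// of a proxied playlist URL like the one in the example output.
function inspectProxiedPlaylist(playlist: string): ProxiedPlaylist {
  const params = new URL(playlist).searchParams;

  const target = params.get("url");
  if (target === null) {
    throw new Error("no `url` query parameter found");
  }

  // searchParams.get() already percent-decodes, so the value is plain JSON.
  const rawHeaders = params.get("headers");
  const headers = rawHeaders ? (JSON.parse(rawHeaders) as Record<string, string>) : {};

  return { target, headers };
}

const info = inspectProxiedPlaylist(
  "https://proxy.fifthwit.net/m3u8-proxy?url=https%3A%2F%2Fqueenselti.pro%2Fwrofm%2Fuwu.m3u8&headers=%7B%22referer%22%3A%22https%3A%2F%2Fturbovid.eu%2F%22%2C%22origin%22%3A%22https%3A%2F%2Fturbovid.eu%22%7D"
);
console.log(info.target); // https://queenselti.pro/wrofm/uwu.m3u8
console.log(info.headers.referer); // https://turbovid.eu/
```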
#### Interactive Mode Flow
```sh
pnpm cli
```
```
✔ Select a fetcher mode · native
✔ Select a source · catflix
✔ TMDB ID · 11527
✔ Media type · movie
✓ Done!
{
  embeds: [
    {
      embedId: 'turbovid',
      url: 'https://turbovid.eu/embed/DjncbDBEmbLW'
    }
  ]
}
```
## Development Workflow
1. **Setup**: Create the `.env` file and install dependencies
2. **Research**: Study the target website's structure and player technology
3. **Code**: Build your scraper following the established patterns
4. **Register**: Add it to `all.ts` with a unique rank
5. **Test**: Use the CLI to test with multiple different movies and TV shows
6. **Iterate**: Fix issues and improve reliability
7. **Submit**: Create a pull request with thorough testing documentation
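The "unique rank" requirement in the Register step is easy to break by accident when adding a new scraper. As an illustration only, the sketch below finds rank collisions in a list of scrapers. The `RankedScraper` shape and `findDuplicateRanks` are hypothetical and say nothing about how `all.ts` is actually structured; the IDs and rank values in the sample are made up for the example.

```typescript
// Hypothetical shape: just an ID and a rank, for illustration.
interface RankedScraper {
  id: string;
  rank: number;
}

// Returns a map from each rank used more than once to the IDs sharing it.
function findDuplicateRanks(scrapers: RankedScraper[]): Map<number, string[]> {
  const byRank = new Map<number, string[]>();
  for (const scraper of scrapers) {
    const ids = byRank.get(scraper.rank) ?? [];
    ids.push(scraper.id);
    byRank.set(scraper.rank, ids);
  }

  const duplicates = new Map<number, string[]>();
  byRank.forEach((ids, rank) => {
    if (ids.length > 1) duplicates.set(rank, ids);
  });
  return duplicates;
}

// Made-up sample data: two entries share rank 100.
const sample: RankedScraper[] = [
  { id: "catflix", rank: 100 },
  { id: "zoechip", rank: 90 },
  { id: "my-new-scraper", rank: 100 },
];

findDuplicateRanks(sample).forEach((ids, rank) => {
  console.log(`rank ${rank} is shared by: ${ids.join(", ")}`);
});
```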
## Next Steps
Once your environment is set up:
1. Read [Provider System Overview](/in-depth/provider-system) to understand how scrapers work
2. Read [Building Scrapers](/in-depth/building-scrapers) for a detailed implementation guide
3. Check [Advanced Concepts](/in-depth/advanced-concepts) for error handling and best practices
::alert{type="info"}
Always test your scrapers with multiple different movies and TV shows to ensure reliability across different content types.
::