append ffmpeg dist

2026-02-13 23:49:43 -07:00
parent 66882233b0
commit 57f95fc74d
2 changed files with 357 additions and 0 deletions
@@ -2632,3 +2632,97 @@ So “input” and “output” are effectively whatever layout the scheduler us
---
_**User**_
cloned Rouji's into C:\Users\Nathan\Downloads\ffmpeg_distributed.
Can we use a batch to run a job? Should the clone be stored in R:\Videos\ffmpeg_distributed, or should we just append the script into this repo? Will it ssh into each machine, or do I need to run it on each? Where are the chunks stored before they are muxed into one?
---
_**Agent (model default, mode Agent)**_
<think><details><summary>Thought Process</summary>
The user has cloned Rouji's ffmpeg_distributed to C:\Users\Nathan\Downloads\ffmpeg_distributed and has several questions:
1. Can we use a batch to run a job?
2. Should the clone be stored in R:\Videos\ffmpeg_distributed, or append the script into the mux repo?
3. Will it SSH into each machine, or do they need to run it on each?
4. Where are the chunks stored before they are muxed into one?
I'm in Ask mode so I can only read and answer, not edit. Let me look at the Rouji script again to answer accurately - I already have the source from the previous fetch. Let me summarize:
From the ffmpeg_distributed.py I saw:
- It's invoked as: input_file, output_file, remote_args, concat_args, with -H host(s), -t tmp_dir, etc.
- It runs on ONE machine (the one you run the script from). It splits the input locally, then uses TaskThread which SSHs to each host and runs ffmpeg there (or runs locally if host is 'localhost'). So you run the script once, on one machine; it SSHs to the other hosts.
- tmp_dir: default is `'ffmpeg_segments_'+md5(input_file.encode()).hexdigest()` - so a directory named like ffmpeg_segments_<hash>, and it creates tmp_in = tmp_dir/in and tmp_out = tmp_dir/out. So chunks are in tmp_dir/in (input segments) and tmp_dir/out (encoded segments). The script runs from one machine, so tmp_dir is relative to where you run it (or you pass -t). The concat step reads from tmp_out and writes to output_file. So chunks are stored in the tmp directory on the machine where you run the script - by default in the current working directory (ffmpeg_segments_<hash>/in and .../out).
Let me re-read to be precise about SSH: "if self._host != 'localhost': ffmpeg_cmd = ['ssh', self._host, join(ffmpeg_cmd)]" - so the script runs on one box, and for each host it either runs ffmpeg locally or via ssh. So you run the script on ONE machine; that machine does the split, sends segments to workers via SSH (pipes through ssh), and receives encoded segments back, then does the concat. So the chunks (tmp_dir) are on the machine where you run the script.
I'll answer all four questions clearly.
</details></think>
---
Here are direct answers based on Rouji's `ffmpeg_distributed.py`:
---
## 1. Can you use a batch to run a job?
**Yes.** The script takes one input file and one output file per run. To process many files you run it in a loop. On Windows you can do that from a batch file, e.g.:
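```batch
@echo off
rem Worker hosts to encode on (one -H per host).
set HOSTS=-H host1 -H host2 -H host3
rem ffmpeg encoding options, passed to the script as one quoted argument.
set REMOTE_ARGS=-c:v libx264 -crf 23 -preset fast
rem One run per input file; the trailing "" is the (empty) concat-args argument.
for %%f in (R:\Videos\mux\input\*.mp4) do (
    python C:\Users\Nathan\Downloads\ffmpeg_distributed\ffmpeg_distributed.py %HOSTS% -s 60 "%%f" "R:\Videos\mux\output\%%~nxf" "%REMOTE_ARGS%" ""
)
```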
So: one “job” = one file; batching = multiple such runs (batch file, PowerShell, or Python wrapper).
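If you prefer the Python-wrapper route, a minimal sketch could look like the following. It mirrors the argument order used in the batch example above (input, output, remote args, concat args). The names `build_command` and `run_all`, the host names, and the paths are hypothetical placeholders, and the `-H` and `-s` flags are copied from the batch file rather than re-checked against the script.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical values copied from the batch example; adjust to your setup.
SCRIPT = r"C:\Users\Nathan\Downloads\ffmpeg_distributed\ffmpeg_distributed.py"
HOSTS = ["host1", "host2", "host3"]
REMOTE_ARGS = "-c:v libx264 -crf 23 -preset fast"


def build_command(input_file, output_dir, hosts=HOSTS, remote_args=REMOTE_ARGS):
    """Return the argument list for one run (one input file -> one output file)."""
    cmd = [sys.executable, SCRIPT]
    for host in hosts:
        cmd += ["-H", host]  # one -H per host, as in the batch file
    output_file = Path(output_dir) / Path(input_file).name
    # Positional order as in the batch file: input, output, remote args, concat args.
    cmd += ["-s", "60", str(input_file), str(output_file), remote_args, ""]
    return cmd


def run_all(input_dir, output_dir):
    """Run one distributed encode per .mp4, stopping at the first failure."""
    for input_file in sorted(Path(input_dir).glob("*.mp4")):
        subprocess.run(build_command(input_file, output_dir), check=True)
```

Calling `run_all(r"R:\Videos\mux\input", r"R:\Videos\mux\output")` would then behave like the batch loop, with the difference that `check=True` aborts the batch as soon as one encode fails.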
---
## 2. Clone in `R:\Videos\ffmpeg_distributed` or merge into the mux repo?
**Either is fine; it's about how you want to organize.**
- **Separate clone (e.g. `R:\Videos\ffmpeg_distributed`):** Keeps Rouji's project untouched and makes it easy to `git pull` upstream changes. Your batch/wrapper scripts (in the mux repo or elsewhere) just call `python R:\Videos\ffmpeg_distributed\ffmpeg_distributed.py ...`.
- **Copy/merge into the mux repo:** One place for “all my encoding stuff,” but you'd maintain the script yourself and won't get upstream updates cleanly.
Recommendation: keep the clone where you like (e.g. `R:\Videos\ffmpeg_distributed`) and call it from your existing repo (e.g. a `run_distributed_encode.bat` in `R:\Videos\mux` that uses that path). No need to merge the script into the mux repo unless you plan to fork and heavily customize it.
---
## 3. Does it SSH into each machine, or do you run it on each?
**You run the script on one machine only.** That machine:
- Splits the input into segments locally.
- For each segment, sends work to a host via **SSH** (or runs locally if the host is `localhost`). So it **does** SSH into each host you list; you don't run the script on each worker.
- Collects encoded segments back and runs the concat on the same machine.
So: one runner box, SSH to the others. You don't start the script on each worker.
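The local-versus-SSH decision can be sketched roughly as below. This is a simplified illustration of the pattern, not a copy of the script: `wrap_for_host` is a hypothetical helper name, and using `shlex.join` to build the remote command string is an assumption.

```python
import shlex


def wrap_for_host(host, ffmpeg_cmd):
    """Return the command to launch: unchanged for localhost, wrapped in ssh otherwise."""
    if host != "localhost":
        # ssh receives the remote command as a single shell-quoted string.
        return ["ssh", host, shlex.join(ffmpeg_cmd)]
    return list(ffmpeg_cmd)
```

Because every remote command is launched through `ssh` from the runner, each worker needs to accept SSH logins from that machine and have `ffmpeg` available; nothing else has to be started on the workers.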
---
## 4. Where are the chunks stored before they're muxed?
**On the same machine where you run the script**, in a temporary directory.
From the code:
- Default temp dir: `ffmpeg_segments_<md5 of input path>` (so a name like `ffmpeg_segments_a1b2c3d4...`).
- It creates two subdirs:
  - `tmp_dir/in`: raw segments from the split.
  - `tmp_dir/out`: encoded segments from the workers.
- That temp dir is created in the **current working directory** when you start the script, unless you pass `-t TMP_DIR` to override.
So chunks (both input segments and encoded segments) live in that temp dir on the runner machine until the script concatenates them into the final `output_file` and then deletes the temp dir (unless you use `--keep-tmp`). If you run from `R:\Videos\mux`, the default chunk location is something like `R:\Videos\mux\ffmpeg_segments_<hash>\in` and `...\out`.
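To predict where the chunks for a given file will land, the default naming can be reproduced with a few lines of Python. This is a sketch that assumes the default name is `'ffmpeg_segments_'` plus the MD5 hex digest of the input path string exactly as passed on the command line; `default_tmp_layout` is a hypothetical helper, not part of the script.

```python
from hashlib import md5
from pathlib import Path


def default_tmp_layout(input_file, cwd="."):
    """Return (tmp_dir, tmp_in, tmp_out) for an input path, relative to cwd."""
    # The hash is taken over the path string itself, not the file contents.
    tmp_dir = Path(cwd) / ("ffmpeg_segments_" + md5(input_file.encode()).hexdigest())
    return tmp_dir, tmp_dir / "in", tmp_dir / "out"
```

Note that under this assumption the same file referenced by a different path string (relative versus absolute, for example) would hash to a different directory name.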
---