2024-04-13 16:32:56 +10:00
# katana

> A fast crawler focused on execution in automation pipelines, offering both headless and non-headless crawling.
> See also: `gau`, `scrapy`, `waymore`.
> More information: <https://github.com/projectdiscovery/katana>.

- Crawl a list of URLs:

`katana -list {{https://example.com,https://google.com,...}}`

- Crawl a [u]RL in headless mode using Chromium:

`katana -u {{https://example.com}} {{[-hl|-headless]}}`

- Use `subfinder` to find subdomains, then use [p]a[s]sive sources (Wayback Machine, Common Crawl, and AlienVault) for URL discovery:

`subfinder {{[-dL|-list]}} {{path/to/domains.txt}} | katana {{[-ps|-passive]}}`

- Pass requests through a proxy (HTTP/SOCKS5) and use custom headers from a file:

`katana -proxy {{http://127.0.0.1:8080}} {{[-H|-headers]}} {{path/to/headers.txt}} -u {{https://example.com}}`

- Specify the crawling strategy, depth of subdirectories to crawl, and rate limit (requests per second):

`katana {{[-s|-strategy]}} {{depth-first|breadth-first}} {{[-d|-depth]}} {{value}} {{[-rl|-rate-limit]}} {{value}} -u {{https://example.com}}`

- Find subdomains using `subfinder`, crawl each for a maximum number of seconds, and write results to an output file:

`subfinder {{[-dL|-list]}} {{path/to/domains.txt}} | katana {{[-ct|-crawl-duration]}} {{value}} {{[-o|-output]}} {{path/to/output.txt}}`
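
As a rough illustration of how the placeholders above might be filled in, the sketch below prints a complete command line rather than running it, so it works without `katana` installed. The helper name `build_katana_cmd` and every value (URL, depth, rate limit, output path) are assumptions chosen for the example. Only flags already shown on this page are used.

```shell
#!/bin/sh
# Sketch only: the function name and all values are illustrative assumptions.

# Print a katana command line without executing it.
# Arguments: target URL, crawl depth, requests per second, output file.
build_katana_cmd() {
    printf 'katana -u %s -strategy breadth-first -depth %s -rate-limit %s -output %s\n' \
        "$1" "$2" "$3" "$4"
}

build_katana_cmd "https://example.com" 3 50 "results.txt"
# prints: katana -u https://example.com -strategy breadth-first -depth 3 -rate-limit 50 -output results.txt
```

Printing the command first makes it easy to review the flags before starting a crawl. The same flags can then be passed to `katana` directly.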