# OSINT

## OSINT Framework

Collects a ton of excellent resources for gathering intel.

{% embed url="<https://osintframework.com/>" %}

## Shodan & Censys

Search engines that constantly scan the Internet for exposed devices, grab service banners, and publish their findings. Great for seeing what attackers on the Internet will see for an IP you own.

{% embed url="<https://shodan.io>" %}

{% embed url="<https://censys.io>" %}
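For a quick look at what Shodan already has on an IP you own, without an API key, a minimal sketch can query Shodan's free InternetDB endpoint, which returns JSON with open ports, hostnames, CPEs, tags and known vulns. That endpoint is an assumption here (it isn't one of the links above), and `internetdb_url` / `shodan_quicklook` are hypothetical helper names:

```bash
# Hypothetical helper: builds the InternetDB lookup URL for one IP
internetdb_url() {
  printf 'https://internetdb.shodan.io/%s\n' "$1"
}

# Prints Shodan's cached JSON summary for the IP (no API key needed)
# With an API key, the official CLI gives more detail: shodan init <KEY>; shodan host <IP>
shodan_quicklook() {
  curl -s "$(internetdb_url "$1")"
}
```

Usage: `shodan_quicklook 203.0.113.5`. Censys has its own web search and API; the web UI linked above is the simplest starting point.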

## Tesseract

Tesseract is an OCR engine with a CLI tool that recognizes 100+ languages. Very useful for quickly grabbing text from images!

```bash
tesseract important.png stdout | grep -Ev '^$'    # OCR important.png, push text to stdout, then remove blank lines
tesseract important.png stdout --psm 11 -l eng    # PSM (Page Segmentation Mode) 11: find as much sparse text as possible, in no particular order
for i in *.jpg; do tesseract "$i" stdout --psm 11 -l eng >> words.txt; done    # Dirty bash loop to gather text from all jpgs in dir

# Clean words.txt into a wordlist, one filter per stage:
grep -Ev '^$' words.txt |   # Remove blank lines
  fmt -1 |                  # One word per line
  strings -n4 |             # Require min 4 ASCII printable characters
  grep -Ei '[a-z]' |        # Require at least one letter
  sort -u                   # Unique entries only
```

## Google Dorks

The Google Hacking Database (GHDB) contains a ton of premade dorks to find info.

{% embed url="<https://www.exploit-db.com/google-hacking-database>" %}

| Dork                                                   | Purpose                                                                  |
| ------------------------------------------------------ | ------------------------------------------------------------------------ |
| `whales -bitcoin`                                      | Searches for whales, excluding any results that mention bitcoin          |
| `site:m4lwhere.org`                                    | Restricts results to the site m4lwhere.org                               |
| `cache:`                                               | Shows Google's cached copy of a page (retired by Google in 2024)         |
| `ext:pdf` or `filetype:pdf`                            | Restricts results to a file extension or file type (here, PDF)           |
| `intitle:"Index of "`                                  | Searches for any page that has "Index of " in the title                  |
| `inurl:"*.cgi"`                                        | Searches for any page with ".cgi" in the URL                             |
| `site:m4lwhere.org intitle:"Index of" "last modified"` | Searches for directory listings on the site m4lwhere.org                 |
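To launch a dork from the terminal, it has to be URL-encoded into a search URL. A minimal bash sketch, assuming ASCII-only dorks and Google's standard `https://www.google.com/search?q=` URL; `urlencode` and `dork_url` are hypothetical helper names:

```bash
# Percent-encodes everything except RFC 3986 unreserved characters (assumes ASCII input)
urlencode() {
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [[:alnum:].~_-]) out+="$c" ;;
      *) out+=$(printf '%%%02X' "'$c") ;;   # "'c" makes printf emit the character's code
    esac
  done
  printf '%s\n' "$out"
}

# Hypothetical helper: wraps a dork in a Google search URL
dork_url() {
  printf 'https://www.google.com/search?q=%s\n' "$(urlencode "$1")"
}
```

Usage on Linux: `xdg-open "$(dork_url 'site:m4lwhere.org intitle:"Index of" "last modified"')"` (use `open` on macOS).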

## Certificate Transparency

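Browsers require publicly trusted CAs to publish every certificate they issue to public Certificate Transparency (CT) logs, which anyone can search. Since certificates list the hostnames they cover, this can be useful to find servers that are internal to a LAN or are not Internet accessible.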

{% embed url="<https://ui.ctsearch.entrust.com/ui/ctsearchui>" %}

{% embed url="<https://transparencyreport.google.com/https/certificates?hl=en>" %}
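CT logs can also be pulled from the CLI to build a subdomain list. A minimal sketch, assuming the crt.sh CT search service (not one of the two UIs above) and its `output=json` mode, where each entry's `name_value` field can hold several newline-separated hostnames; it needs `curl` and `jq`, and `ct_url`, `ct_names` and `ct_subdomains` are hypothetical helpers:

```bash
# Hypothetical helper: crt.sh query for every name under a domain (%25 is an encoded % wildcard)
ct_url() {
  printf 'https://crt.sh/?q=%%25.%s&output=json\n' "$1"
}

# Reads crt.sh JSON on stdin and prints unique hostnames
ct_names() {
  jq -r '.[].name_value' |   # one or more names per certificate, newline-separated
    sed 's/^\*\.//' |        # turn *.example.com into example.com
    sort -u
}

ct_subdomains() {
  curl -s "$(ct_url "$1")" | ct_names
}
```

Usage: `ct_subdomains m4lwhere.org`. Each hostname is only a lead; many will be stale or internal-only.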

## Credential Leaks

* <https://breachdirectory.org/>
* [https://leak-lookup.com/](https://leak-lookup.com/docs/search)
* <https://monitor.firefox.com/>
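Whether a specific password already shows up in breach data can be checked with Have I Been Pwned's Pwned Passwords range API without sending the password itself: only the first 5 hex characters of its SHA-1 hash leave the machine, and the response lists `SUFFIX:COUNT` lines for every leaked hash with that prefix. A minimal sketch assuming GNU coreutils `sha1sum` and `curl`; `pwned_split` and `pwned_count` are hypothetical helpers:

```bash
# Prints "PREFIX SUFFIX": first 5 and remaining 35 chars of the uppercase SHA-1 hash
pwned_split() {
  local hash
  hash=$(printf '%s' "$1" | sha1sum | cut -d' ' -f1 | tr '[:lower:]' '[:upper:]')
  printf '%s %s\n' "${hash:0:5}" "${hash:5}"
}

# Prints how many times the password appears in the corpus (0 if not found)
pwned_count() {
  local prefix suffix
  read -r prefix suffix < <(pwned_split "$1")
  curl -s "https://api.pwnedpasswords.com/range/$prefix" |
    tr -d '\r' |   # strip any CR line endings before matching
    awk -F: -v s="$suffix" '$1 == s { n = $2 } END { print n + 0 }'
}
```

Usage: `pwned_count 'Summer2024!'`. The k-anonymity design means the service never learns which of the returned hashes you were looking for.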

## Passive DNS

Passive DNS databases record historical DNS resolutions, so they occasionally list old or forgotten IPs for a site.
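One way to spot the forgotten ones is to diff a passive DNS export against what the domain resolves to today. A minimal sketch, assuming the historical IPs were already exported from whichever passive DNS source you use into a file with one IPv4 address per line; `stale_ips` and `current_ips` are hypothetical helpers, and `dig` comes from the BIND utilities:

```bash
# IPs present in the historical list ($1) but not in the current list ($2)
stale_ips() {
  comm -23 <(sort -u "$1") <(sort -u "$2")
}

# Current A records for a domain; the grep drops CNAME targets that dig +short also prints
current_ips() {
  dig +short A "$1" | grep -E '^[0-9]+(\.[0-9]+){3}$'
}
```

Usage: `stale_ips pdns_export.txt <(current_ips m4lwhere.org)`, where `pdns_export.txt` is a hypothetical export file. Any IP it prints is worth checking for a host that still answers.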

## Data Aggregators

[Hunter.io](http://hunter.io), compiles lists of org metadata; useful for identifying email addressing schemes.

[haveibeenpwned.com](http://haveibeenpwned.com), lists of pwned email accounts.

[dehashed.com](http://dehashed.com), public data dumps available, requires paid access.

[scylla.sh](http://scylla.sh), indexed data dumps, free, currently down.

Public data dump forums

Torrents
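Once an org's addressing scheme is known, candidate addresses can be generated from a list of employee names and then checked against the aggregators above. A minimal sketch (bash 4+ for `${var,,}` lowercasing); the `{first}`, `{last}`, `{f}`, `{l}` placeholder syntax and `email_from_pattern` are hypothetical:

```bash
# Substitutes lowercased names into an addressing pattern, e.g. '{first}.{last}' or '{f}{last}'
email_from_pattern() {
  local pattern="$1" first="${2,,}" last="${3,,}" domain="$4"
  local p_first='{first}' p_last='{last}' p_f='{f}' p_l='{l}'
  local addr="$pattern"
  addr=${addr//"$p_first"/"$first"}       # quoted patterns match literally
  addr=${addr//"$p_last"/"$last"}
  addr=${addr//"$p_f"/"${first:0:1}"}     # first initial
  addr=${addr//"$p_l"/"${last:0:1}"}      # last initial
  printf '%s@%s\n' "$addr" "$domain"
}
```

Usage: `while read -r first last; do email_from_pattern '{first}.{last}' "$first" "$last" m4lwhere.org; done < names.txt`, where `names.txt` is a hypothetical file of "First Last" lines.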

