# OSINT

## OSINT Framework

A categorized directory with a ton of excellent resources for gathering intel.

{% embed url="<https://osintframework.com/>" %}

## Shodan & Censys

These sites constantly scan the Internet for reachable devices, perform banner grabbing, and publish their findings publicly. Great for seeing what attackers on the Internet will see for an IP you own.

{% embed url="<https://shodan.io>" %}

{% embed url="<https://censys.io>" %}
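To check an IP you own from the command line, a minimal sketch along these lines may help. It assumes Shodan's REST host-lookup endpoint and an API key from your own account; the function names and the `SHODAN_API_KEY` variable are placeholders for illustration, not part of these notes.

```bash
# Hypothetical helper: build the Shodan REST API URL for a host lookup.
# The key comes from the second argument, or from the SHODAN_API_KEY
# environment variable (a placeholder name chosen for this sketch).
shodan_host_url() {
  printf 'https://api.shodan.io/shodan/host/%s?key=%s\n' "$1" "${2:-${SHODAN_API_KEY:-}}"
}

# Hypothetical helper: fetch the raw JSON record (open ports, banners) for one IP.
shodan_host() {
  curl -s "$(shodan_host_url "$1")"
}
```

With a key exported, `shodan_host 198.51.100.7` (a documentation-range address, swap in your own IP) should print the JSON record Shodan holds for that host.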

## Tesseract

Tesseract is an OCR engine with a CLI that recognizes 100+ languages. Very useful for quickly pulling text out of images!

```bash
tesseract important.png stdout | grep -v '^$'    # OCR important.png, push text to stdout, then remove blank lines
tesseract important.png stdout --psm 11 -l eng   # Set the PSM (Page Segmentation Mode) to 11, find as much text as possible in no particular order (Tesseract 3.x spelled this -psm)
for i in *.jpg; do tesseract "$i" stdout --psm 11 -l eng >> words.txt; done    # Dirty bash loop to gather text from all jpgs in dir

# Filters to pipe the OCR output through when building a wordlist:
grep -v '^$'     # Remove blank lines
fmt -1           # One word per line
strings -n4      # Require min 4 ASCII printable characters
grep -i '[a-z]'  # Require at least one letter
sort -u          # Unique entries only
```
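The individual filters above can be chained into one reusable function. This is a sketch under a couple of assumptions: `clean_words` is a made-up name, and `tr` and `awk` stand in for `fmt -1` and `strings -n4` so the pipeline does not depend on GNU-specific behavior.

```bash
# Hypothetical helper: turn raw OCR output on stdin into a deduplicated wordlist.
clean_words() {
  tr -s '[:space:]' '\n' |      # one word per line (portable stand-in for `fmt -1`)
    grep -v '^$' |              # remove blank lines
    awk 'length($0) >= 4' |     # keep words of at least 4 characters (stand-in for `strings -n4`)
    grep -i '[a-z]' |           # require at least one letter
    LC_ALL=C sort -u            # unique entries only, in a stable byte order
}
```

Usage would look like `tesseract important.png stdout --psm 11 -l eng | clean_words > words.txt`.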

## Google Dorks

The Google Hacking Database (GHDB) contains a ton of premade dorks for finding exposed info.

{% embed url="<https://www.exploit-db.com/google-hacking-database>" %}

| Dork                                                        | Purpose                                                             |
| ----------------------------------------------------------- | ------------------------------------------------------------------- |
| `whales -bitcoin`                                           | Searches for whales, excluding pages that mention bitcoin           |
| `site:m4lwhere.org`                                         | Restricts results to the site m4lwhere.org                          |
| `cache:`                                                    | Searches the Google cache only                                      |
| <p><code>ext:pdf</code></p><p><code>filetype:pdf</code></p> | Restricts results to the given file extension or type               |
| `intitle:"Index of "`                                       | Searches for any page that has "Index of " in the title             |
| `inurl:"*.cgi"`                                             | Searches for any page with ".cgi" in the URL                        |
| `site:m4lwhere.org intitle:"Index of" "last modified"`      | Searches for directory listings on the site m4lwhere.org            |
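When the same dorks need to be run against several targets, generating the query strings can save some typing. The sketch below only prints dorks built from the operators in the table; both function names are hypothetical.

```bash
# Hypothetical helper: print the directory-listing dork for one domain.
dir_listing_dork() {
  printf 'site:%s intitle:"Index of" "last modified"\n' "$1"
}

# Hypothetical helper: print one filetype dork per extension for a domain.
# Usage: filetype_dorks DOMAIN EXT [EXT...]
filetype_dorks() {
  domain="$1"
  shift
  for ext in "$@"; do
    printf 'site:%s filetype:%s\n' "$domain" "$ext"
  done
}
```

For example, `filetype_dorks m4lwhere.org pdf xlsx docx` prints three dorks, one per line, ready to paste into a search box.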

## Certificate Transparency

Publicly trusted CAs log the certificates they issue to public Certificate Transparency (CT) logs, since major browsers require it. Searching those logs can reveal hostnames of servers that are internal to a LAN or otherwise not Internet accessible.

{% embed url="<https://ui.ctsearch.entrust.com/ui/ctsearchui>" %}

{% embed url="<https://transparencyreport.google.com/https/certificates?hl=en>" %}
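CT search results can also be turned into a hostname list from the shell. The sketch below assumes a CT search service that returns JSON with a `name_value` field per certificate, with multiple names separated by an escaped `\n`; crt.sh is one such service, but it is not one of the sites listed in these notes, and `ct_hostnames` is a made-up name.

```bash
# Hypothetical helper: extract unique hostnames from CT search JSON on stdin.
# Assumes each certificate entry carries a "name_value" string field.
ct_hostnames() {
  grep -o '"name_value"[[:space:]]*:[[:space:]]*"[^"]*"' |   # isolate each name_value pair
    cut -d'"' -f4 |                                          # keep only the value
    awk '{ gsub(/\\n/, "\n"); print }' |                     # split escaped \n into real lines
    LC_ALL=C sort -u                                         # unique hostnames only
}
```

Usage, assuming crt.sh's JSON output: `curl -s 'https://crt.sh/?q=%25.example.com&output=json' | ct_hostnames`. A proper JSON parser such as `jq` would be more robust than `grep` if it is available.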

## Credential Leaks

* <https://breachdirectory.org/>
* [https://leak-lookup.com/](https://leak-lookup.com/docs/search)
* <https://monitor.firefox.com/>

## Passive DNS

Passive DNS databases record historical DNS resolutions, so they occasionally list old or forgotten IPs for a site.

## Data Aggregators

[Hunter.io](http://hunter.io): compiles lists of org metadata; useful for identifying email addressing schemes.

[haveibeenpwned.com](http://haveibeenpwned.com): lists pwned email accounts.

[dehashed.com](http://dehashed.com): public data dumps available; requires paid access.

[scylla.sh](http://scylla.sh): indexed data dumps; free, currently down.

Public data dump forums

Torrents
