Documentation

Indexing and crawling

Ahmia is mainly divided into three parts. The code for each part is open source and available on GitHub; you can click the respective links to access them.

  • Index: the data that Ahmia has collected. We use Elasticsearch to maintain this data.
  • Crawler: the part that crawls onion sites on the Tor network and feeds them to the index. Scrapy is one of the crawlers we use.
  • Site: the backbone of Ahmia, which includes the design of the website and makes the search engine work.
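The interplay of these three parts can be sketched in a few lines of plain Python. This is an illustrative toy, not Ahmia's actual code: the real project uses Scrapy for crawling and Elasticsearch for the index, and all names and data below are hypothetical.

```python
# Toy sketch of the crawler -> index -> site pipeline (hypothetical data).

def crawl():
    """Stand-in for the crawler: yields pages found on known onion sites."""
    yield {"url": "http://exampleonionaddr.onion/", "title": "Example",
           "text": "an example onion page"}

class Index:
    """Stand-in for the Elasticsearch index: stores and searches documents."""
    def __init__(self):
        self.docs = []

    def add(self, doc):
        self.docs.append(doc)

    def search(self, term):
        return [d for d in self.docs
                if term in d["text"] or term in d["title"]]

index = Index()
for page in crawl():      # the crawler feeds every page it finds...
    index.add(page)       # ...into the index
results = index.search("example")  # the site queries the index for the user
```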

Ahmia collects a list of known .onion sites along with information about them and saves it to its database. The collected data is filtered to remove child abuse content (see: Blacklist).

Once data is filtered, it is available for search at Ahmia.
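The filtering step amounts to dropping blacklisted sites before the data becomes searchable. The sketch below assumes the blacklist is a set of banned onion addresses; Ahmia's actual blacklist format and contents are not shown here, and every address is made up.

```python
# Hypothetical filtering step: exclude blacklisted onion addresses
# from the collected data before it is exposed to search.

BLACKLIST = {"bannedexampleaddr.onion"}  # assumed format: plain addresses

def is_allowed(doc):
    """Return True if the document's onion address is not blacklisted."""
    return doc["onion"] not in BLACKLIST

collected = [
    {"onion": "legitexampleaddr.onion", "title": "A legitimate page"},
    {"onion": "bannedexampleaddr.onion", "title": "A blacklisted page"},
]

searchable = [doc for doc in collected if is_allowed(doc)]
```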

Robots.txt

We honour robots.txt files. This means you can prevent our crawler from indexing your page by adding robots.txt rules, so that your page is not shown in the search results. However, your page will still appear in the list of all known hidden services that are online.

Check robotstxt.org for details on creating a robots.txt file.
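As a sketch of how such rules behave, the snippet below checks a minimal robots.txt against Python's standard-library parser. The rules and URLs are examples only; a site can target all crawlers with `User-agent: *` as shown, or name a specific user agent instead.

```python
# Check example robots.txt rules with the standard-library parser.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# A crawler honouring these rules skips /private/ but may fetch elsewhere.
print(rp.can_fetch("*", "http://exampleonionaddr.onion/private/page"))  # False
print(rp.can_fetch("*", "http://exampleonionaddr.onion/index.html"))    # True
```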
