INTRODUCTION

Nowadays many networks consist mostly of Windows clients. The primary mechanism
for public or restricted file sharing in such an environment is the built-in
sharing mechanism of Windows, which makes it possible to share any directory
over the network. Thanks to the Samba project, most UNIX systems can use the
same mechanism to communicate with Windows clients. For more on Samba see [0].

With Windows it is possible to search for a file on a given server. This works
well for networks with a primary file server that holds all the files a user
needs, which is usually the case in company networks. In other environments,
for example campus networks, it is usually not known on which host a file is
located. Another scenario is a network where the primary file server is
mirrored by a few backup servers. Should the primary server fail, it might
take a while to find out on which backup server a particular file is located.

This is where FemFind comes into play. FemFind consists of several Perl scripts.
At certain intervals all shares are crawled and the filenames are stored
in a database. FTP servers can also be crawled. At any time the user can
search for a file either via a WWW interface or with a Windows client.

FemFind is currently being used within FeM-Net [1], the students' campus LAN at
Technical University Ilmenau, where it has been running for over a year.


INTERNALS

The crawler (crawler.pl) is invoked at certain times each day via crontab
entries. It has two distinct modes of operation, 'complete crawl' and
'incremental crawl', which are described below. crawler.pl expects a command
line argument that tells it in which mode to run (-c, --complete or
-i, --incremental). The Perl script relies on Samba to handle the SMB
communication: it calls smbclient and nmblookup.
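
To give an idea of what working with smbclient output looks like, here is a
minimal shell sketch. It is not the code from crawler.pl. It filters the disk
shares out of a share listing in the layout printed by 'smbclient -L'. The
function names and the sample listing are made up for illustration, and share
names containing spaces are not handled.

```shell
#!/bin/sh
# Hypothetical helper, not part of FemFind: read a share listing in the
# layout printed by 'smbclient -L' on stdin and print the names of all
# shares of type "Disk". Share names containing spaces are NOT handled.
disk_shares() {
    awk '$2 == "Disk" { print $1 }'
}

# Made-up sample listing for the demo; real output contains more lines.
sample_listing() {
    printf '\tSharename       Type      Comment\n'
    printf '\t---------       ----      -------\n'
    printf '\tpublic          Disk      Public files\n'
    printf '\tIPC$            IPC       IPC Service\n'
    printf '\tmusic           Disk      \n'
}

# Prints the two disk shares of the sample, one per line.
sample_listing | disk_shares
```

Against a real host the listing would come from a call such as
'smbclient -L somehost -N' (-N suppresses the password prompt), and
'nmblookup -M WORKGROUP' asks for the master browser of a workgroup.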

First, the crawler contacts the master browser and retrieves a list of the
active hosts in all workgroups. In 'complete crawl' mode it contacts a host
if the host is listed in the database without a PreferedTime set, or if it is
not yet listed in the database but currently online (as determined by the
master browser's list). These hosts are crawled, and their share structure is
stored or updated in the database. A complete crawl should be done daily.

When the script is called in 'incremental crawl' mode, the hosts to be crawled
are determined as follows: based on the current time a time frame is calculated.
For each host that is already stored in the database, the PreferedTime field
is checked. If it falls within the time frame, the host will be crawled. In
addition, all hosts that are currently online but not yet included in the
database will also be crawled. Incremental crawls should be run a few times each
day. On small networks you can crawl every hour. Network traffic will still be
low, as each run only contacts new hosts and the hosts whose PreferedTime falls
within the current time frame.
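
The host selection for both modes can be sketched in a few lines of shell.
This is only an illustration under several assumptions, not code from
crawler.pl: the database is represented by a text file with one
'host PreferedTime' pair per line ('-' meaning no PreferedTime set), the
online hosts by a second file, and the time frame is assumed to start at the
current time and to last a given number of minutes. How crawler.pl really
calculates the frame is not documented here.

```shell
#!/bin/sh
# Illustration only, not code from crawler.pl.
# select_hosts MODE NOW WINDOW DBFILE ONLINEFILE
#   MODE        complete | incremental
#   NOW         current time as HH:MM (only used in incremental mode)
#   WINDOW      assumed length of the time frame in minutes
#   DBFILE      lines of "host PreferedTime", "-" meaning not set
#   ONLINEFILE  one currently online host per line
select_hosts() {
    awk -v mode="$1" -v now="$2" -v win="$3" '
        function minutes(t,   p) { split(t, p, ":"); return p[1] * 60 + p[2] }

        # Records from DBFILE: hosts already stored in the database.
        FILENAME == ARGV[1] {
            known[$1] = 1
            if (mode == "complete" && $2 == "-") print $1
            if (mode == "incremental" && $2 != "-") {
                # Minutes from NOW to the PreferedTime, wrapping at midnight.
                d = (minutes($2) - minutes(now) + 1440) % 1440
                if (d < win) print $1
            }
            next
        }

        # Records from ONLINEFILE: online hosts unknown to the database
        # are crawled in both modes.
        !($1 in known) { print $1 }
    ' "$4" "$5"
}

# Demo with made-up hosts.
tmp=$(mktemp -d)
printf '%s\n' 'alpha -' 'beta 07:30' 'gamma 13:15' > "$tmp/db"
printf '%s\n' 'alpha' 'delta' > "$tmp/online"
select_hosts complete    07:00 180 "$tmp/db" "$tmp/online"   # alpha, delta
select_hosts incremental 07:00 180 "$tmp/db" "$tmp/online"   # beta, delta
rm -r "$tmp"
```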

If a host is not reachable, a flag is set in the database. The crawler checks
this flag and tries to reach the host again on the next run. Once a certain
limit of failed attempts is reached, the host is deleted from the database.
This limit can be defined separately for SMB and FTP in femfind.conf. The
expire flag is cleared after each successful crawl.
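
The expiry rule can be pictured as a small state transition. The sketch below
assumes that the flag is a counter of consecutive failed attempts, which is an
interpretation of the description above and not taken from the crawler source.
The function name next_state is hypothetical.

```shell
#!/bin/sh
# Illustration only. Assumes the expire flag is a counter of failed attempts.
# next_state COUNT LIMIT REACHABLE
#   COUNT      failed attempts recorded so far
#   LIMIT      allowed number of failed attempts (per protocol)
#   REACHABLE  "yes" if the host answered this time, anything else if not
# Prints "keep N" (host stays, counter becomes N) or "delete".
next_state() {
    count=$1
    limit=$2
    reachable=$3
    if [ "$reachable" = yes ]; then
        echo "keep 0"            # a successful crawl clears the flag
        return 0
    fi
    count=$((count + 1))
    if [ "$count" -ge "$limit" ]; then
        echo "delete"            # limit reached, host is dropped
    else
        echo "keep $count"       # try again on the next run
    fi
}

next_state 0 3 no     # keep 1
next_state 2 3 no     # delete
next_state 2 3 yes    # keep 0
```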


INSTALLATION

FemFind has only been tested on Linux. I know of no reasons why it shouldn't
work on other UNIXes. Feedback is appreciated if you test it somewhere else!

What you need to have installed:
Samba [0]
MySQL [2]

Optional:
An httpd if you want to use the web interface
Some Windows boxes if you want to run the Windows frontend :)

Because some Perl modules have to be installed you should run the install
script as root.


A) Installing The Crawler - The Easy Way

As of v0.71 FemFind comes with a shell script for easy installation. Although
it has not been tested on many platforms yet, I recommend that you use this
script. Please report any problems you encounter to me.

These Perl modules are required by FemFind:
Msql-Mysql-modules [3] (which requires DBI [4] and Data-ShowTable [5])
libnet (for Net::FTP) [6]
Time::HiRes [7]

You can let the install script try to install these modules for you, or install
them before you run the script.

  ATTENTION: When installing libnet, Perl's CPAN installer sometimes tries to
             install Perl 5.6. If this happens, just download and install the
             libnet module [6] manually and then re-run the install script.
			 
- Unpack FemFind-0.72.tar.gz
    gzip -cd FemFind-0.72.tar.gz | tar xf -
    cd FemFind-0.72
	
- Edit femfind.conf and set all the variables that are in the first section.
  Please note that femfind.conf will be placed in /etc later, so if you want to
  change your configuration after you have run install.sh be sure to alter
  that file.

- Now run install.sh. This script will...
    o try to determine your Perl path, and modify the first line of each .pl
      file accordingly
    o install the required Perl modules via CPAN on request
    o install the FemFind modules
    o copy femfind.conf to /etc
    o run the database setup

- Database setup: This Perl script sets up MySQL for FemFind. It generates
  two users ('search' and db_crawler_login as specified in femfind.conf)
  and the database db_name (you know where to change this ;)

First crawl and crontab setup

- Recommended: Don't do the following as root. Choose a user for FemFind
  (maybe create a new one) and edit this user's crontab. FemFind does not need
  root privileges, and if you care about security there is no good reason to
  run it as root.
  (You will have to chown crawler.pl to your new user)

- Now you should test if you have everything set up properly by running
  'crawler.pl --complete'. You might want to time execution for the next step,
  the crontab setup. If there is a problem at this point, you should check the
  logfile first. If there is nothing helpful in there, try setting the
  debuglevel to 3 and re-run the crawler.

- Edit your crontab: Depending on how large your network is you have to define
  how often and at what time to invoke the crawler. Here's an example:
  
        0 13 * * * /home/femfind/crawler.pl --complete
        0 7,10,13,16,19 * * * /home/femfind/crawler.pl --incremental
  
  This executes the script in complete crawl mode once at 13:00, runs the
  incremental crawls 5 times and gives each at least 3 hours to complete.
  
  Please note that the complete crawl will take significantly longer than the
  incremental crawls.
  
  If you want to optimize search results run 'crawler.pl --complete' at a time
  when most hosts are online OR try to cover the whole day with your scans
  (works well on small networks).
  
  If you want to minimize interference with your network/servers and most of
  your servers run 24/7 anyway, you might want to do the complete crawl at
  night and spread a few incremental crawls over the day.
  
  The crawler detects if another instance is still running and terminates, thus
  avoiding an inconsistent database. 
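
How the crawler detects a running instance is not described in this README.
If you wrap your own jobs around it and want the same kind of protection, a
common shell technique is a lock directory, because mkdir either creates the
directory or fails in one atomic step. The sketch below is an illustration
only: the function name run_exclusively and the lock path are assumptions, and
a stale lock left behind by a crash is not cleaned up.

```shell
#!/bin/sh
# Illustration only, not how crawler.pl itself does it.
# Hypothetical lock location; override by setting LOCKDIR beforehand.
LOCKDIR=${LOCKDIR:-/tmp/femfind-crawler.lock}

# run_exclusively COMMAND [ARGS...]
# Runs COMMAND unless the lock directory already exists. Returns 1 without
# running anything if the lock is held, otherwise the status of COMMAND.
run_exclusively() {
    if ! mkdir "$LOCKDIR" 2>/dev/null; then
        echo "another instance seems to be running, giving up" >&2
        return 1
    fi
    "$@"
    status=$?
    rmdir "$LOCKDIR"             # release the lock again
    return "$status"
}
```

A crontab entry could then call a small wrapper script that runs, for example,
'run_exclusively /home/femfind/crawler.pl --incremental'.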

Crawler working? Now install the web interface (section C) or the Windows
client (section D).

  
B) Installing The Crawler - The Old (Hard?) Way

These Perl modules are required by FemFind:
Msql-Mysql-modules [3] (which requires DBI [4] and Data-ShowTable [5])
libnet (for Net::FTP) [6]
Time::HiRes [7]

Make sure you have these installed.

- Create a directory and put crawler.pl and femfind.conf in it.
  Make sure the permissions are set so that the cron daemon can invoke
  the crawler.pl script and femfind.conf can be read by everybody.

  Edit the first line of crawler.pl if you have Perl installed somewhere
  else. (Find it out with 'which perl')

- Edit modules/ConfigReader/ConfigReader.pm, line 29 and insert
  the absolute location of your femfind.conf (default: /etc/femfind.conf)
  Now run ./makemod from the modules subdirectory. This will build and
  install the two modules.
  
- Edit femfind.conf and set all the variables that are in the first section.

- MySQL setup: You have to set up a database and two users (or one user if you
  want to use the root account). You can use the mysql_setpermission script
  that ships with MySQL for this.

  Set up the database (the name must correspond to db_name in femfind.conf)
  and the first user with option 2. This user needs full access rights
  (select/update/create). Of course you should password-protect that account.
  You have to insert the password in the crawler.pl script (line 14).
  Local access is sufficient if the crawler runs on the same machine as the
  database.
  
  The second user name has to be 'search', with no password and only 'select'
  rights (option 5). The account has to be accessible from the host where you
  will run your httpd with the search scripts (usually localhost). If you want
  to use the Windows client, you have to permit logins from all hosts ('%').
  Note that '%' does not include localhost; you have to enter both.
  
- Run 'crawler.pl --tables'. This will create the table structure in the
  database.
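
If you prefer plain SQL over mysql_setpermission, the accounts described in
the MySQL setup step could be created roughly as in the sketch below. It only
prints the statements so you can review them first. Everything in it is an
assumption for illustration: the function name print_grants, the crawler
login 'crawler' and the password are placeholders, ALL PRIVILEGES stands in
for "full access rights", and the GRANT ... IDENTIFIED BY form belongs to
older MySQL releases (MySQL 8 wants a separate CREATE USER first).

```shell
#!/bin/sh
# Illustration only. Prints SQL, does not talk to any server.
# print_grants DB CRAWLER_USER CRAWLER_PASSWORD
# The password must not contain single quotes (no escaping is done here).
print_grants() {
    db=$1
    user=$2
    pass=$3
    cat <<EOF
CREATE DATABASE IF NOT EXISTS $db;
GRANT ALL PRIVILEGES ON $db.* TO '$user'@'localhost' IDENTIFIED BY '$pass';
GRANT SELECT ON $db.* TO 'search'@'localhost';
GRANT SELECT ON $db.* TO 'search'@'%';
FLUSH PRIVILEGES;
EOF
}

# Placeholder values, replace them with your own.
print_grants femfind crawler changeme
```

Once the output looks right it can be piped into the mysql command line
client, for example with '| mysql -u root -p'. The 'search' account appears
twice because '%' does not cover localhost.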

Now continue with 'First crawl and crontab setup' from section A) above.


C) Webinterface

- Copy the whole cgi-bin/femfind _directory_ from the distro to your cgi-bin.
  Adjust the Perl path in the .pl scripts if you didn't use install.sh.
  
- Copy the htdocs/femfind directory to your htdocs.
  
  Point your browser at
  http://your.webserver/cgi-bin/femfind/frontpage.pl
  http://your.webserver/femfind/index.html
  
  You can choose either one as the FemFind start page.

- Optional: If you want a German-language web interface, overwrite
  the files in your cgi-bin/femfind/ and htdocs/femfind/ directories with the
  files from german/. Currently no other languages are available.
  Internationalization will be possible in a future version.


D) Windows Client

- You need at least two servers that are running most of the time.
  Create a file 'setup' with four lines (do this with DOS/Windows to get
  CRLF linebreaks):
  
  DNS name of the MySQL server
  login for the server (default: search)
  password for the account (default: none)
  name of the database (default: femfind)
  
  Example (setup):
  ---cut here
  mysqlbox.codefactory.de
  search
  
  femfind
  ---cut here

- Put 'setup' on your servers.

- Create a file 'femfind.cfg' containing all the locations of your setup files,
  like so:

  Example (femfind.cfg):
  ---cut here
  \\fileserver\femfind\setup
  \\backup\femfind\setup
  ---cut here

  You have to distribute this file along with the FemFind Windows client.
  
  The obvious advantage of this setup is that whenever you decide to move your
  database to some other server you only have to change the 'setup' file on a
  few computers instead of every user having to change the setup.
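
If you would rather create the two files on your UNIX box, the CRLF
linebreaks can be written explicitly with printf. This is a minimal sketch
under assumptions: write_crlf is a hypothetical helper, the values are the
ones from the examples above, and femfind.cfg is written with CRLF as well
although only 'setup' is known to need it.

```shell
#!/bin/sh
# Illustration only.
# write_crlf FILE LINE...
# Writes every LINE argument to FILE, each terminated by CR LF.
write_crlf() {
    file=$1
    shift
    : > "$file"                          # create or truncate the file
    for line in "$@"; do
        printf '%s\r\n' "$line" >> "$file"
    done
}

# 'setup': server, login, (empty) password, database name.
write_crlf setup 'mysqlbox.codefactory.de' 'search' '' 'femfind'

# 'femfind.cfg': locations of the setup files.
write_crlf femfind.cfg '\\fileserver\femfind\setup' '\\backup\femfind\setup'
```

The empty third argument produces the blank password line. The UNC paths are
single-quoted so that the shell leaves the backslashes alone.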
  
  
Hints:
- If some special non-US characters (umlauts etc.) do not appear correctly,
  check your Linux setup. The LC_* variables must be set according to your
  language. 'locale' shows the current settings, 'locale -a' lists all
  available locales. Pick the one for your country, set LC_ALL to this value
  and export it. Now run the crawler again and see if it makes a difference.
  
  Example:
  LC_ALL="de_DE";export LC_ALL;crawler.pl --complete


FINETUNING

There are a few things you can configure. Run 'crawler.pl --modify'.

Options:
1 - Change PreferedTime for SMB Host
2 - Change PreferedTime for FTP Host
3 - Exclude SMB host from scanning

The PreferedTime tells the crawler at which time the host should be crawled.
You can exclude hosts from crawling with option 3.


SECURITY

FemFind is still in beta. It has not really been designed with security
in mind. Some safeguards have been implemented (no symlink following etc.), but
they might not work yet! The author is not responsible for any damage the
program might cause.

Please read the next section for further info.


KNOWN BUGS

- Linux shares that contain symlinks to the . directory or similar
  constructions will lead to a never-ending crawl (Example: SuSE distro). This
  is not a FemFind bug. The problem is that Samba does not differentiate
  between directories and symlinks. Temporary solution: Exclude hosts with such
  links from scanning by running 'crawler.pl --modify'
  (see: Finetuning)
  UPDATE: You can disable symlink following on a per-share basis in Samba
          (the 'follow symlinks' share parameter; in SWAT, choose Advanced
          View). This should fix the problem.

- There's one known reproducible incident where Samba failed to crawl a large
  share correctly. This seems to be a bug in smbclient.

- crawler.pl cleanup does not check if the temp file really exists when it tries
  to unlink. You can ignore the resulting error message.

- entries in femfind.mod cannot be removed from the 'crawler.pl --modify'
  menu; you have to edit the text file by hand

- some special characters will not appear correctly
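
For the symlink problem in the first item, it can help to find the offending
links on a Linux host before deciding whether to exclude it. The following
sketch lists all symlinks below a directory that resolve to a directory. The
function name dir_symlinks is hypothetical, and the sketch does not tell you
whether a link really forms a loop, only that it is a candidate.

```shell
#!/bin/sh
# Illustration only.
# dir_symlinks ROOT
# Prints every symbolic link below ROOT whose target is a directory.
# 'find -type l' selects the links themselves, 'test -d' follows them.
dir_symlinks() {
    find "$1" -type l -exec test -d {} \; -print
}

# Demo on a throwaway directory: 'loop' points back to its own directory,
# 'flink' points to a regular file and is therefore not reported.
demo=$(mktemp -d)
mkdir "$demo/share"
touch "$demo/share/file"
ln -s . "$demo/share/loop"
ln -s file "$demo/share/flink"
dir_symlinks "$demo/share"
rm -r "$demo"
```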


VERSION HISTORY

New versions will be released on Freshmeat [8] and my website [9].

FemFind-0.74.tar.gz

- fixed a bug in the ftp crawler, now files with spaces aren't ignored
  (thanx Dennis)

FemFind-0.73.tar.gz

- new options: disable_ftp and disable_smb
- install.sh now explicitly installs the DBI module from CPAN
- improvements, small fixes in Helper.pm and ConfigReader.pm
- lowered FTP connect timeout and better error handling

FemFind-0.72.tar.gz

- new option: search hidden default shares (NT) (yes/no, default is no)
              see femfind.conf  
- empty hosts will be removed from the database
- improved logging (timestamp, commandline)

FemFind-0.71.tar.gz

- FTP host list is now sorted
- restructured modules (now in 'FemFind::' hierarchy), converted them to real
  modules (Makefile.PL etc.)
- new module 'Helper' added, which contains some common subs
- femfind.conf moved to /etc
- changed some femfind.conf options, please DO NOT use your old file!
- added install.sh script that does most of the work
- added database setup script (makedb.pl)
- 'crawler' mysql password no longer stored in femfind.conf
- configurable user/password for connecting to shares (femfind.conf)
  (you can now set up your intranet with security=domain and create an
   account for FemFind that has read access to all hosts)
- auto detection of MySQL stats (no more editing offsets in frontpage.pl)
- FTP servers can be added with WINS name, DNS name or IP
- get_ip resolves WINS, DNS, IP correctly (not just using nmblookup which caused
  problems with some WINS hosts)
- more robust: bizarre workgroup-, host- and sharenames containing
  "$", "'",  "`", "|" or spaces (!) should work fine (but never say never :)

FemFind-0.70.tar.gz

- new command line syntax, please adjust your crontabs and minds
- brand new HTML pages, thanx to Fire
- new options: show directories only, user defined hits p/page in advanced mode
- display of a redirection page if MySQL server is currently down
  (new parameter in femfind.conf - backup_url)
- two bugs fixed in search3.pl

FemFind-0.68.tar.gz

- initial public release


FemFind-winclient-0.65.zip

- initial public release


WHAT NEXT?

There are quite a few plans on how FemFind could evolve.

- redesign of the crawler for more flexibility in host inclusion/exclusion
- support for robots.txt
- rewrite of the Windows client
- support for internationalization in both the webinterface and windows client
- storage abstraction layer: store your data in other databases (or even
  in your reiserfs)
- a modular redesign and rewrite which fully utilizes Perl's OOP facilities
  (sorry, I don't plan to go C++ or Java, I'm just too productive in Perl *g*)
- replacing smbclient calls with Alain Barbet's SmbClientParser module
  as soon as libsmb hits the streets

If there's something you think is missing in FemFind, don't hesitate to tell me.


LICENSE & COPYRIGHT

HTML Design Copyright (C) 2000 Ralf Prescher
All the rest Copyright (C) 1999, 2000 Martin Richtarsky

This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
as published by the Free Software Foundation; either version 2
of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.


FEEDBACK

If you use FemFind, please let me know (email below).
Feedback, be it negative or positive, is always appreciated and will certainly
motivate me to improve FemFind.

Mail me if anything in this README is unclear to you or you think something's
missing.

Email: femfind@codefactory.de


LINKS
  
[0] http://www.samba.org/

[1] http://www.fem.tu-ilmenau.de/

[2] http://www.mysql.org/

[3] http://www.perl.com/CPAN/authors/id/JWIED/Msql-Mysql-modules-1.2214.tar.gz
    Bundle::DBD::mysql
    
[4] http://www.perl.com/CPAN/authors/id/TIMB/DBI-1.14.tar.gz
    Bundle::DBI 

[5] http://www.perl.com/CPAN/authors/id/AKSTE/Data-ShowTable-3.3.tar.gz
	
[6] http://www.perl.com/CPAN/authors/id/GBARR/Bundle-libnet-1.00.tar.gz     
    Bundle::libnet
	
[7] http://www.perl.com/CPAN/authors/id/DEWEG/Time-HiRes-01.20.tar.gz

[8] http://www.freshmeat.net/

[9] http://femfind.codefactory.de/

Thanks to SourceForge for hosting the project page.