Difference between revisions of "Forensics, Undelete, and Data Recovery"

From Noah.org
Revision as of 10:14, 3 October 2014


undelete a file in Linux

You may have been told, "There is no undelete in Linux -- you're screwed!" You were told wrong.

Do not cleanly unmount the drive. You want to cut access to the drive before the OS or RAID controller has a chance to make the latest changes persistent. Unmounting guarantees that the caches are synced to disk, which could cause even more data loss. If possible, cut power to the machine or drive. Don't let anything else get written to the drive, because new data could overwrite the deleted files. If the drive is on a RAID controller with a battery-backed cache then you also want to unplug the drive from the controller. This might sound like going overboard, but it depends on how desperate you are. The idea is to prevent as many additional changes to the disk as possible, since some of the changes might still exist only in the buffers or cache for the drive. The reality is that these actions probably won't make much difference to the recovery process one way or the other, but if the data is important enough to try to recover then you might as well take them.

On most filesystems, deleted files are not actually erased. The space is marked as available and the file's contents remain on the disk until the filesystem overwrites that space with a new file. This could happen right away or it could take a long time before the file is actually lost.

Most data recovery tools fall into the category of "data carving". These tools scan the disk as a stream to identify bit patterns associated with common filetypes such as JPG, Word files, mp3, and video files. These tools sometimes recover corrupt files. They can also rarely recover filenames or other metadata.

Some data recovery tools make more use of knowledge of the underlying filesystem. For example, there are special tools only for ext2 or NTFS. But a tool that helps recover deleted files from an ext2 filesystem will not work on an ext3 filesystem even though the underlying format is the same.

Install packages on Ubuntu or Debian

See Install Forensics for a complete list of packages to install.

data extraction

To recover data you will need to extract it from the source drive to work on it in a separate work drive.

dd

The venerable old `dd` command is the most common way to create a raw dump of a device. The following does a simple dump of the entire drive, sdb. This means all partitions will be copied. The /dev/sdb device should be unmounted. It is possible that this will work while mounted, but you might also end up with a disk image in an inconsistent state.

dd bs=4096 conv=noerror,sync if=/dev/sdb | gzip -c > drive.img.gz
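
Once you have an image it is worth checking that it really matches the source before you start working on it. The following is only a sketch: it uses a small random file as a hypothetical stand-in for /dev/sdb (the names WORK, SRC, and fake_device.bin are made up for illustration) so that it can run without root or a spare drive.

```shell
#!/bin/sh
set -eu

# Hypothetical stand-in for /dev/sdb so the sketch runs without root or a spare drive.
WORK=$(mktemp -d)
SRC="$WORK/fake_device.bin"
dd if=/dev/urandom of="$SRC" bs=4096 count=256 2>/dev/null

# Same dump command as above, pointed at the stand-in.
dd bs=4096 conv=noerror,sync if="$SRC" 2>/dev/null | gzip -c > "$WORK/drive.img.gz"

# Checksum the source and the decompressed image; they should agree.
SRC_SUM=$(sha256sum < "$SRC" | cut -d' ' -f1)
IMG_SUM=$(gunzip -c "$WORK/drive.img.gz" | sha256sum | cut -d' ' -f1)

if [ "$SRC_SUM" = "$IMG_SUM" ]; then
    echo "image matches source"
else
    echo "image differs from source"
fi
```

On a real drive with read errors the checksums will probably not match, because conv=noerror,sync pads unreadable blocks with zeros. The same padding applies to a source whose size is not a multiple of the block size.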

ddrescue

`ddrescue` operates similarly to `dd` except that it knows how to gracefully handle bad sectors on a drive, whereas plain `dd` will either stall or exit at the first read error unless told otherwise with `conv=noerror`.

There is another tool called `dd_rescue` by Kurt Garloff, which is different and not what you want. Unfortunately the Ubuntu package repositories have confused the issue even more with their package names. The 'ddrescue' package does not install `ddrescue`; it installs `dd_rescue`.

# This will install the GNU ddrescue in /sbin/ddrescue
aptitude install gddrescue
# This will install Kurt Garloff's dd_rescue in /bin/dd_rescue
aptitude install ddrescue
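
A typical GNU ddrescue run is done in two passes that share a map file, so the second pass only revisits the areas the first pass could not read. This is a sketch under assumptions: it uses a small stand-in file instead of a real device, the file names are hypothetical, and it skips itself if GNU ddrescue is not installed.

```shell
#!/bin/sh
set -eu

# Hypothetical stand-in for the failing drive; a real run would use /dev/sdb.
WORK=$(mktemp -d)
SRC="$WORK/fake_device.bin"
dd if=/dev/urandom of="$SRC" bs=4096 count=64 2>/dev/null

if command -v ddrescue >/dev/null 2>&1; then
    # First pass: copy the easy areas quickly and skip the slow scraping phase.
    ddrescue -n "$SRC" "$WORK/drive.img" "$WORK/drive.map" >/dev/null
    # Second pass: reuse the map file and retry the bad areas up to 3 times.
    ddrescue -r3 "$SRC" "$WORK/drive.img" "$WORK/drive.map" >/dev/null
    STATUS=done
else
    STATUS=skipped
fi
echo "ddrescue run: $STATUS"
```

Keep the map file next to the image. If the copy is interrupted, running the same command again resumes where it left off.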

dd_rescue

This describes Kurt Garloff's dd_rescue (http://www.garloff.de/kurt/linux/ddrescue/). It is a different tool from GNU `ddrescue`, and the Ubuntu package names make the confusion worse: the 'ddrescue' package does not install `ddrescue`; it installs `dd_rescue`. Even Kurt Garloff's own URL for his dd_rescue tool is written as 'ddrescue'! What's the matter with these people? Have they all gone insane?

# This will install the GNU ddrescue in /sbin/ddrescue
aptitude install gddrescue
# This will install Kurt Garloff's dd_rescue in /bin/dd_rescue
aptitude install ddrescue

dls in Sleuthkit

This toolkit has a variety of tools for extracting raw data from a drive. The most valuable one to me has been `dls` (renamed `blkls` in Sleuth Kit 3.0 and later), which is similar to `dd` except that it extracts raw data only from the unused parts of a drive, where deleted files will be found. This saves time in later steps because you don't have to search through the entire drive for deleted files.

The Sleuthkit: http://www.sleuthkit.org/

testdisk -- partition recovery

TestDisk (http://www.cgsecurity.org/wiki/TestDisk) checks and recovers lost partitions from a device or image file.

Sadly, TestDisk, like PhotoRec, uses DOS-style command-line options, which means that the options start with a slash instead of a dash (/ instead of -), but you usually start testdisk without any options.

Note that the '/list' option lists only what the drive or image file currently shows for a partition table. This does not do a search. If the drive or image is corrupt in any way then this partition table list might be meaningless. Run testdisk without any options to start the interactive partition recovery tool that will actually search for partitions.

myrescue

I have not tested the myrescue utility, but I thought I should mention it since it is mentioned by others on the Internet.

data carving tools

These tools find data in streams based on patterns. They don't need filenames or valid inodes.

quickly visually scan thousands of photos

If you are searching for lost photos then you will often end up with a gigantic pile of images from the source you are carving. You can use mplayer or gstreamer to play the images back as a video. This is useful for quickly scanning the images by eye to find a group fitting some theme you are looking for.

mplayer:
mplayer "mf://*.jpg"

gstreamer:
gst-launch multifilesrc location="image%05d.jpg" ! jpegdec max-errors=-1 ! videoscale ! ffmpegcolorspace ! autovideosink
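
The gstreamer pipeline expects sequentially numbered files matching image%05d.jpg, but carving tools usually produce arbitrary names. One way around this is to symlink the carved files into a numbered series. This is a sketch: the directory names and the stand-in file names are hypothetical, and it assumes multifilesrc starts counting at 0.

```shell
#!/bin/sh
set -eu

WORK=$(mktemp -d)
mkdir "$WORK/carved" "$WORK/seq"

# Empty stand-in files; real carved output would come from photorec, foremost, etc.
for name in f0001234.jpg f0005678.jpg f0009999.jpg; do
    : > "$WORK/carved/$name"
done

# Link each carved file to image00000.jpg, image00001.jpg, ... in sorted order.
i=0
for f in "$WORK"/carved/*.jpg; do
    ln -s "$f" "$(printf '%s/seq/image%05d.jpg' "$WORK" "$i")"
    i=$((i + 1))
done

ls "$WORK/seq"
```

Using symlinks leaves the original carved files untouched, so you can still trace an interesting frame back to the file the carving tool wrote.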

PhotoRec (testdisk)

This is the first tool I go to for recovering photos, videos, and other files. It is the easiest and fastest and works with the least amount of effort. It may not always find everything, but it's the best first pass.

Despite the name, PhotoRec actually recovers many types of files besides photos. It was originally a special purpose tool for recovering photos deleted from flash memory cards, but it has grown into a general purpose tool that can identify many types of files. If you are looking to recover just JPEG images then recoverjpeg with the option -b 1 finds more files, but PhotoRec is probably the best tool I've used for recovering any type of file.

Install the testdisk package to get the photorec utility. TestDisk is a tool to recover or repair filesystems and undelete files.

PhotoRec uses DOS-style command-line options, which means that the options start with a slash instead of a dash (/ instead of -). But you actually don't use many command-line options. Most of the options are actually set interactively through a text GUI. To start PhotoRec simply point it at a device or a raw disk image (as made with dd or the like).

photorec /dev/sdb1
# or
photorec diskimage.img

When PhotoRec starts up, select [File Opt] and then select the file types (signatures) that you are looking for. Usually you want to press s to unselect everything, then go through and select only the file types you want.

PhotoRec: http://www.cgsecurity.org/wiki/PhotoRec

recoverjpeg

The recoverjpeg tool specializes in JPEG image files.

By default recoverjpeg assumes a block size of 512 bytes, but this does not match every filesystem. You may want to use a separate tool to determine the block size of the filesystem you want to recover from. If you don't want to worry about this and you don't mind long search times then use the -b 1 option (sets block size to 1 byte).

recoverjpeg -b 1 /dev/sdb1
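
If the filesystem (or another one created with the same settings) can be mounted read-only, GNU stat is one way to find its block size. This is a sketch: MOUNTPOINT is a hypothetical variable, and /tmp is used as the default only so the example runs anywhere.

```shell
#!/bin/sh
# Hypothetical mount point; /tmp is used here only so the sketch runs anywhere.
MOUNTPOINT=${MOUNTPOINT:-/tmp}

# GNU stat: -f queries the filesystem, %S prints its fundamental block size in bytes.
BLOCK_SIZE=$(stat -f -c %S "$MOUNTPOINT")
echo "block size: $BLOCK_SIZE bytes"

# The value could then be passed along, for example:
#   recoverjpeg -b "$BLOCK_SIZE" /dev/sdb1
```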

recoverjpeg: http://www.rfc1149.net/devel/recoverjpeg

scalpel

Scalpel is based on an early version of `foremost`. Supposedly `foremost` is a little better at finding files, but `scalpel` is faster and will handle files larger than 2GB.

scalpel: http://www.digitalforensicssolutions.com/Scalpel/

foremost

Foremost: http://foremost.sourceforge.net/

magicrescue

Magic Rescue: http://www.itu.dk/people/jobr/magicrescue/
Local cached copy: File:magicrescue-1.1.9.tar.gz

ext2 file recovery tools

These tools are becoming less relevant since ext2 is old and not found as often as ext3 and ext4. It is more difficult to undelete files in ext3 and ext4. See the section on data carving tools for recovering data from these newer filesystems.

e2undel

This tool specializes only in ext2 filesystems.

e2undel: http://e2undel.sourceforge.net/

recover

This tool specializes only in ext2 filesystems.

recover: http://recover.sourceforge.net/linux/recover/

Data file magic bytes

file    offset:magic_bytes
----    ------------------
png     0:89504e47
jpg     0:ffd8ffe0
mpeg    0:000001b3
mpeg    0:000001ba
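
Magic bytes like these are what data carving tools search for. The idea can be tried by hand with GNU grep, which can report the byte offset of every match in a binary stream. This is a sketch: the sample file is a made-up stand-in for a disk image, and it assumes a grep built with -P (PCRE) support.

```shell
#!/bin/sh
set -eu

WORK=$(mktemp -d)
SAMPLE="$WORK/sample.bin"

# 8 bytes of junk, then a JPEG/JFIF header (ff d8 ff e0), written with octal escapes.
printf 'junkjunk\377\330\377\340JFIF-data\377\331tail' > "$SAMPLE"

# -a: treat binary as text, -b: print byte offsets, -o: only the match,
# -P: allow \xNN escapes. LC_ALL=C makes grep work on raw bytes.
OFFSETS=$(LC_ALL=C grep -aboP '\xff\xd8\xff\xe0' "$SAMPLE" | LC_ALL=C cut -d: -f1)
echo "JPEG header candidates at byte offsets: $OFFSETS"
```

An offset found this way can be handed to dd (with skip= and a small bs=) to pull out the data starting at that header. The real carving tools go further and also work out where each file ends.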

see also

  • Man pages for 'magic(5)' and 'file(1)'.
  • Magic file databases on Linux stored in /usr/share/file or /usr/share/misc.
  • magic.db database from MagicDB.org: http://www.magicdb.org/magic.db
  • The UNIX `file` command: http://en.wikipedia.org/wiki/File_(Unix)

Example Recovery

Install some tools

Install `foremost` and `dls`

aptitude install sleuthkit  # This is a collection of forensic analysis tools that includes `dls`.
aptitude install foremost 

dls

You can use `dls` to dump the raw binary data of the free space on a partition. You can pipe that directly into `foremost` which intelligently tries to reconstruct files in raw binary streams.

`dls` works sort of like `dd` except that it dumps unallocated blocks. You can use `dd` instead of `dls`, but then you would be grabbing all the raw data from a disk including data from files that are not deleted. This example assumes the drive partition you want to recover from is /dev/sdb1.

dls /dev/sdb1 > ~/recovery/rawdata.dd
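
Newer releases of The Sleuth Kit ship this tool under the name `blkls` instead of `dls`. A small wrapper can pick whichever one is installed. This is a sketch: UNALLOC_TOOL is a hypothetical variable name, and the script only prints the command it would run instead of touching a device.

```shell
#!/bin/sh
# Newer Sleuth Kit releases ship `blkls`; older ones ship `dls`.
if command -v blkls >/dev/null 2>&1; then
    UNALLOC_TOOL=blkls
elif command -v dls >/dev/null 2>&1; then
    UNALLOC_TOOL=dls
else
    UNALLOC_TOOL=""
fi

if [ -n "$UNALLOC_TOOL" ]; then
    echo "would run: $UNALLOC_TOOL /dev/sdb1 > ~/recovery/rawdata.dd"
else
    echo "neither blkls nor dls found; install sleuthkit first"
fi
```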

foremost

`foremost` recovers files from a disk image. You use it with `dls` like this:

dls /dev/sdb1 | foremost 

If you want to reduce the amount of useless files that are recovered you can specify the file type you are looking for. For example, to recover any Microsoft Office documents you might do something like the following.

dls /dev/sdb1 | foremost -tole

To recover JPEG pictures do this:

dls /dev/sdb1 | foremost -tjpg

database recovery

MySQL

For dropped MyISAM tables you can try the undelete and data carving tools to find the table files.

InnoDB tables are often stored together in a single shared tablespace file (unless file-per-table is enabled), so this won't work. For InnoDB tables you can try using data carving tools on the ibdata file. This file is often found in /var/lib/mysql/ibdata1 and is accompanied by /var/lib/mysql/ib_logfile0 and /var/lib/mysql/ib_logfile1, which store transaction info. At this level data carving is probably not going to be much more sophisticated than opening ibdata1 in an editor and searching for strings. I actually recovered large portions of a wiki by running ibdata1 through the `strings` command and then sifting through the mess.

You can also try innodb-tools (http://code.google.com/p/innodb-tools/) if you are willing to put in a lot of work.
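
The `strings` approach can be sketched as follows. The file here is a tiny made-up stand-in for ibdata1, and extract_strings is a hypothetical helper that falls back to tr and grep when `strings` (from binutils) is not installed.

```shell
#!/bin/sh
set -eu

WORK=$(mktemp -d)
FAKE_IBDATA="$WORK/ibdata1"

# Made-up stand-in: readable text separated by non-printable bytes.
printf '\001\002\003== Main Page ==\004\005welcome to the wiki\006\007ab\001' > "$FAKE_IBDATA"

extract_strings() {
    if command -v strings >/dev/null 2>&1; then
        # Keep only runs of 8 or more printable characters.
        strings -n 8 "$1"
    else
        # Fallback: turn non-printable bytes into newlines, keep lines of 8+ characters.
        LC_ALL=C tr -c '[:print:]' '\n' < "$1" | grep -E '.{8,}'
    fi
}

# Save everything readable, then search it for a word you expect to find.
extract_strings "$FAKE_IBDATA" > "$WORK/ibdata1.txt"
MATCHES=$(grep -c 'wiki' "$WORK/ibdata1.txt" || true)
echo "lines mentioning 'wiki': $MATCHES"
```

On a real ibdata1 the output is large and noisy, so a higher minimum length (for example -n 20) cuts down on the mess you have to sift through.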

Memory dumping

You can cat the device file /dev/mem to get a copy of memory. You should pipe this to `netcat` or `ssh` or something that will copy the data over the network.

The `memdump` command can be used to output a copy of RAM. It skips empty regions, so this can help save space. Output goes to stdout, so you should pipe it into netcat or ssh to copy the data off the machine.

Other Resources

http://nij.ncjrs.gov/publications/Pub_Search.asp?category=99&PSID=31 NIJ Publications Related to Digital Forensics

http://nij.ncjrs.gov/publications/Pub_Search.asp?category=99&PSID=55 NIJ Publications Related to Computer Forensic Tool Testing

http://nij.ncjrs.gov/publications/Pub_Search.asp?category=99&PSID=30 NIJ publications on Electronic Crime

http://www.cftt.nist.gov/ Computer Forensics Tool Testing (CFTT)

http://www.nsrl.nist.gov/ National Software Reference Library

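http://www.forensicswiki.org/wiki/Tools:Data_Recovery Forensics Wiki. This is a very handy site and good for finding other tools.

http://www.forensicswiki.org/index.php?title=Websites Forensics Wiki's Resources. A link to some more links.

http://www.informationweek.com/news/storage/disaster_recovery/showArticle.jhtml?articleID=208403254&pgno=1 Disaster Recovery. This is an InformationWeek article with good info.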