Dave to cover the introduction, bearing in mind the outline below. How long it runs is up to you, Dave.
Describe the three format identification tools
Look at how the tools go about their job and how they recognise formats; briefly cover signature files.
Describe the characterisation tools
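Possible illustration for the "how they recognise formats" part: a minimal sketch of signature matching on the first bytes of a file. The table and the `identify` function are made up for the session and are not taken from any of the tools; real signature files are much larger and also describe offsets, byte ranges and priorities between formats.

```python
# Illustrative magic-number table (an assumption for teaching purposes,
# not the contents of any real signature file).
SIGNATURES = {
    "PDF": b"%PDF-",
    "PNG": b"\x89PNG\r\n\x1a\n",
    "GIF": b"GIF8",
    "ZIP": b"PK\x03\x04",
}


def identify(data: bytes) -> str:
    """Return the first format whose signature matches the start of data."""
    for name, magic in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return "unknown"


if __name__ == "__main__":
    print(identify(b"%PDF-1.4\n..."))
    print(identify(b"plain text with no signature"))
```

Worth pointing out that a ZIP signature also matches container formats built on ZIP, which is one reason real tools need more than a leading-bytes check.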
Scenarios to include (some of these, or all if time allows):
Aim to create a format profile using the three different format identification tools (a tool for each group). I'll cover some command line tips and give out a cheat sheet. For the same group of files find out how many formats and how many files of each format are in the set.
Bonus exercise: try to do the same thing with characterisation data if time allows. This will demonstrate the increase in both quantity and variety of data, plus the more delicate nature of characterisation.
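For the cheat sheet, the format profile step could be sketched roughly as below. This assumes each tool's output has already been reduced to a CSV with one row per file; the column names `path` and `format` and the sample rows are hypothetical, and each tool's real output will need its own mapping onto them.

```python
import csv
import io
from collections import Counter


def format_profile(csv_text: str, format_column: str = "format") -> Counter:
    """Count how many files were reported for each format.

    Assumes one row per file; rows with an empty format value are
    counted under "unidentified".
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[format_column] or "unidentified" for row in reader)


if __name__ == "__main__":
    # Made-up sample rows, standing in for a tool's real output.
    sample = (
        "path,format\n"
        "a.pdf,PDF 1.4\n"
        "b.pdf,PDF 1.4\n"
        "c.png,PNG\n"
        "d.bin,\n"
    )
    profile = format_profile(sample)
    print("distinct formats:", len(profile))
    for name, count in profile.most_common():
        print(count, name)
```

Running the same function over each group's output gives three profiles of the same files, which makes differences between the tools easy to compare.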
The messages to take away are that performing characterisation at scale takes time, and will probably have to be repeated, as both format identification and characterisation tools improve.
These tools are ideally combined in performant workflows, where components can be upgraded. For large collections these workflows may benefit from parallelisation.
Taming characterisation data is not easy, and if you use a variety of tools you're going to have a lot of it.
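If there is time to illustrate the parallelisation point, a minimal sketch might look like this. The `identify` function is a placeholder that only guesses from the file extension using the standard library; in a real workflow it would call whichever identification tool is in use, and it is kept as a separate function so that component can be swapped or upgraded without touching the rest.

```python
import mimetypes
from concurrent.futures import ThreadPoolExecutor


def identify(path: str) -> str:
    """Placeholder identifier: extension-based guess only (an assumption
    for this sketch, not how the identification tools work)."""
    mime, _encoding = mimetypes.guess_type(path)
    return mime or "unknown"


def identify_all(paths, workers: int = 4) -> dict:
    """Run identify() over many paths using a pool of worker threads."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with paths.
        return dict(zip(paths, pool.map(identify, paths)))


if __name__ == "__main__":
    for path, fmt in identify_all(["report.pdf", "scan.png"]).items():
        print(path, fmt)
```

Threads are used here on the assumption that the real work would be done by external tools or dominated by file reads; a process pool may suit CPU-heavy identification better.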