Ticket #123 (assigned enhancement)
Implement filters
| Reported by: | vreixo | Owned by: | vreixo |
|---|---|---|---|
| Priority: | major | Milestone: | libisofs-0.6.4 |
| Component: | libisofs | Version: | libisofs-0.6.3 |
| Keywords: | Cc: |
Description
The idea is to implement the concept of a Filter, i.e., the possibility of "filtering" file contents before writing them to the image. This filtering process can consist of:
- Cutting off some parts of the file
- Transforming file contents: encoding, compression, encryption...
The idea is that a FilterStream or similar takes care of applying the Filter. The open question is whether each filter implementation must create its own Stream (e.g. GzStream, EncryptStream...), or whether we can just provide a generic FilterStream that takes a reference to an IsoFilter interface, which each filter implements. In this second case, the Filter can be shared among several nodes:
IsoFilter *filter = iso_filter_gz_create(...);
iso_file_add_filter(file1, filter);
iso_file_add_filter(file2, filter);
...
i.e., the filter is a place where we can store configuration options for the Filter (encryption algorithm, key, ...). In this case, the FilterStream read function should be something like:
int filter_stream_read(Stream *s, buffer, ...)
{
    FilterStreamData *data = s->data;
    Stream *source = data->source;
    Filter *filter = data->filter;
    source->read(tmpbuffer);            // read raw bytes from the underlying stream
    filter->filter(tmpbuffer, buffer);  // transform tmpbuffer into buffer
}
However, it seems the filter->filter() function is not trivial to implement, as different filters may need different data chunks.
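The generic FilterStream idea can be sketched in plain C as follows. This is only a rough illustration: Stream, Filter, MemData, mem_read and xor_filter are hypothetical stand-ins, not libisofs types or functions, and the XOR transform is a placeholder for a real filter. Because it maps one input byte to one output byte, it sidesteps the chunking problem mentioned above; a compression filter would need its own internal buffering.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* All names here are hypothetical; none is part of the libisofs API. */
typedef struct Stream Stream;
struct Stream {
    /* Reads up to count bytes into buf; returns bytes read, 0 at EOF. */
    size_t (*read)(Stream *s, unsigned char *buf, size_t count);
    void *data;
};

typedef struct Filter Filter;
struct Filter {
    /* Transforms len bytes from in into out; returns bytes produced. */
    size_t (*filter)(Filter *f, const unsigned char *in, size_t len,
                     unsigned char *out);
    void *data; /* shared configuration, e.g. a key */
};

/* A source stream that reads from a memory buffer. */
typedef struct {
    const unsigned char *bytes;
    size_t size;
    size_t pos;
} MemData;

size_t mem_read(Stream *s, unsigned char *buf, size_t count)
{
    MemData *d = s->data;
    size_t left = d->size - d->pos;
    size_t n = count < left ? count : left;
    memcpy(buf, d->bytes + d->pos, n);
    d->pos += n;
    return n;
}

/* Placeholder filter: XOR every byte with a one-byte key. */
size_t xor_filter(Filter *f, const unsigned char *in, size_t len,
                  unsigned char *out)
{
    unsigned char key = *(unsigned char *)f->data;
    for (size_t i = 0; i < len; i++)
        out[i] = in[i] ^ key;
    return len;
}

/* The generic filter stream: a source stream plus a (shareable) filter. */
typedef struct {
    Stream *source;
    Filter *filter;
} FilterStreamData;

size_t filter_stream_read(Stream *s, unsigned char *buf, size_t count)
{
    assert(s != NULL && buf != NULL);
    FilterStreamData *d = s->data;
    unsigned char tmp[256];
    /* Read at most one temporary buffer per call (short reads are allowed). */
    size_t want = count < sizeof tmp ? count : sizeof tmp;
    size_t got = d->source->read(d->source, tmp, want);
    return d->filter->filter(d->filter, tmp, got, buf);
}
```

The same Filter object can be referenced by several FilterStreamData instances, which is the sharing property discussed above.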
Another solution is to implement each filter directly as an IsoStream implementation. In this case, each filter implements its own stream->read() function. This leads, however, to an ugly API, as the user needs to create each "FilterStream" implementation explicitly. And, in the end, we need the Filter idea (i.e. a shared context) anyway.
i.e., we need something like:
IsoFilter *filter1 = iso_filter_gz_create(...);
iso_file_add_filter(file1, filter1);
IsoFilter *filter2 = iso_filter_gz_create(...);
iso_file_add_filter(file2, filter2);
and this works only if we extend the IsoStream interface to a FilterStream where we define the original_stream field. Otherwise we need something like:
IsoStream *orig_stream = iso_file_get_stream(file1);
IsoFilter *filter1 = iso_filter_gz_create(orig_stream, ...);
iso_file_add_filter(file1, filter1);
or maybe directly, of course
iso_file_add_gz_filter(file1, ....);
but we still have the problem that the shared context cannot be used.
A final alternative is to define a generic FilterContext:
struct FilterContext {
    void *data; // filter-specific shared data
    IsoStream *(*get_filter)(IsoFile *); // factory: creates the filter stream for a file
};
where get_filter is a factory method that creates the concrete Filter implementation for each file. The API usage would then be:
FilterContext *filter = iso_filter_gz_create(...options...);
//the filter->get_filter gets filled with a ptr to a filter-dependent function
iso_file_add_filter(file, filter);
// it calls the filter->get_filter() function to get the IsoStream that is filter dependent.
// the user does not need to know the concrete IsoStream implementation for each filter.
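A minimal sketch of how the FilterContext factory could fit together, under some assumptions: DemoFile, DemoStream, DemoFilterContext and GzOptions are hypothetical stand-ins for IsoFile, IsoStream, FilterContext and the gz options, and no compression is actually performed. One deviation from the struct above is that get_filter here also receives the context, so the created stream can reach the shared data.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in types; the real libisofs IsoFile/IsoStream look different. */
typedef struct DemoStream DemoStream;
struct DemoStream {
    DemoStream *source;  /* stream being filtered, NULL for an original */
    const void *options; /* borrowed pointer to the shared options */
};

typedef struct {
    DemoStream *stream;  /* current content stream of the file */
} DemoFile;

typedef struct DemoFilterContext DemoFilterContext;
struct DemoFilterContext {
    void *data; /* filter-specific shared data */
    /* Factory: builds the filter-dependent stream for one file.
     * Assumption: the context is passed in as well. */
    DemoStream *(*get_filter)(DemoFilterContext *ctx, DemoFile *file);
};

typedef struct {
    int level; /* hypothetical option shared by all filtered files */
} GzOptions;

/* Filter-dependent factory function stored in the context. */
DemoStream *gz_get_filter(DemoFilterContext *ctx, DemoFile *file)
{
    DemoStream *s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    s->source = file->stream; /* wrap the file's current stream */
    s->options = ctx->data;   /* share, do not copy, the options */
    return s;
}

/* Creates the shared context and fills in the factory pointer. */
DemoFilterContext *demo_filter_gz_create(int level)
{
    DemoFilterContext *ctx = malloc(sizeof *ctx);
    GzOptions *opts = malloc(sizeof *opts);
    if (ctx == NULL || opts == NULL) {
        free(ctx);
        free(opts);
        return NULL;
    }
    opts->level = level;
    ctx->data = opts;
    ctx->get_filter = gz_get_filter;
    return ctx;
}

/* Generic entry point: the caller never sees the concrete stream type. */
int demo_file_add_filter(DemoFile *file, DemoFilterContext *ctx)
{
    assert(file != NULL && ctx != NULL);
    DemoStream *filtered = ctx->get_filter(ctx, file);
    if (filtered == NULL)
        return -1;
    file->stream = filtered;
    return 0;
}
```

With this shape, one context can be added to any number of files, and each file gets its own stream that points back at the same options.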
Some considerations:
- Given that a filter can change the file size, we would need to apply the filter twice: when the image structures are computed, and when the file is actually written. With complex filters, this can be a problem. Thus, all filters must have an "on_the_fly" property that decides whether the filter is applied each time the file is read, or applied once with the result stored in a temporary folder. The user could use that flag to prioritize either temporary hard disk space or computation time. A filter may legally ignore the flag (for example, for the cut-off filter it makes no sense to store a temporary file).
- I wonder whether we should provide some kind of plugin system for filters. Ideas?
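The "on_the_fly" trade-off from the first consideration could look roughly like the sketch below. Everything in it is hypothetical: DemoFilterState, strip_spaces and demo_apply are illustration names, the toy filter just removes spaces so that the output size differs from the input size, and an in-memory cache stands in for the temporary file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    int on_the_fly;       /* 1: re-run the filter on every pass; 0: run once */
    int runs;             /* how many times the filter actually ran */
    unsigned char *cache; /* stand-in for a file in a temporary folder */
    size_t cache_len;
} DemoFilterState;

/* Toy size-changing filter: copies in to out, dropping spaces. */
size_t strip_spaces(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        if (in[i] != ' ')
            out[n++] = in[i];
    return n;
}

/* Produces the filtered content; out must hold at least len bytes.
 * Called once when sizes are computed and once when data is written. */
size_t demo_apply(DemoFilterState *st, const unsigned char *in, size_t len,
                  unsigned char *out)
{
    assert(st != NULL && in != NULL && out != NULL);
    if (!st->on_the_fly && st->cache != NULL) {
        /* Stored result available: trade space for computation time. */
        memcpy(out, st->cache, st->cache_len);
        return st->cache_len;
    }
    size_t n = strip_spaces(in, len, out);
    st->runs++;
    if (!st->on_the_fly) {
        /* Keep the result for later passes; on allocation failure the
         * filter simply runs again next time. */
        st->cache = malloc(n ? n : 1);
        if (st->cache != NULL) {
            memcpy(st->cache, out, n);
            st->cache_len = n;
        }
    }
    return n;
}
```

Calling demo_apply twice, once for the sizing pass and once for the write pass, runs the toy filter twice with on_the_fly set and once otherwise.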
