What design pattern(s) should I use to build an extractor for a flat file based on a specification?
I have a flat file where each row has a specification. Say the first row is specification level 100 (a header row), all 200s are data rows, and level 199 is a file summary.
I have created a data table that contains the specifications, and I am able to load them into a List<RowSpecs> without any problem.
I open the file and read each row, and based on the first three characters I pull the specifications for that row. I iterate through the specs for that row and assign the data to the matching field in the appropriate element. Sometimes the data needs a conversion or a lookup, and sometimes there are special rules such as formatting.
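The dispatch described above can be driven entirely by data. The following is a minimal sketch in Python (chosen for brevity; the same shape carries over to C#). The `FieldSpec` class, the field names, and the offsets are illustrative assumptions, not the layout of the actual file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    name: str    # hypothetical field name
    start: int   # 0-based offset into the line (assumed)
    length: int  # fixed width of the field


# Hypothetical layouts keyed by the three-character record type.
SPECS = {
    "100": [FieldSpec("record_type", 0, 3), FieldSpec("file_date", 3, 8)],
    "200": [
        FieldSpec("record_type", 0, 3),
        FieldSpec("account", 3, 6),
        FieldSpec("amount", 9, 7),
    ],
}


def parse_line(line, specs=SPECS):
    """Pick the layout from the first three characters, then slice each field."""
    record_type = line[:3]
    try:
        fields = specs[record_type]
    except KeyError:
        raise ValueError(f"no specification for record type {record_type!r}") from None
    return {f.name: line[f.start:f.start + f.length].strip() for f in fields}


header = parse_line("10020110616")
row = parse_line("200ACC0010001250")
```

Adding a new record type then means adding an entry to the table rather than another branch in the code.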
I have coded it to make it work, but it seems very clunky and riddled with possible errors. Does anyone have a good suggestion for how to apply a design pattern for this type of situation? Multiple patterns?
The other trick is that I use reflection to match the spec's field name to the object's field name and then set the field formatting and the field data, so I have to make sure those names match, decorate them somehow, or build a mapping of some sort.
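One way to make the name matching less fragile is an explicit mapping that is validated once at startup, so a mismatch fails before any file is read. This is only a sketch in Python: `DataRow`, `FIELD_MAP`, and the field names are hypothetical, and `setattr` stands in for the reflection call.

```python
from dataclasses import dataclass, fields


@dataclass
class DataRow:  # hypothetical target object
    account: str = ""
    amount: str = ""


# Spec field name -> attribute name on the target object (illustrative).
FIELD_MAP = {"AccountNumber": "account", "Amount": "amount"}


def check_mapping(cls, field_map):
    """Fail early if the mapping points at attributes the class does not have."""
    known = {f.name for f in fields(cls)}
    missing = sorted(set(field_map.values()) - known)
    if missing:
        raise AttributeError(f"{cls.__name__} has no attribute(s): {missing}")


def populate(obj, values, field_map):
    """Copy parsed values onto the object using the validated mapping."""
    for spec_name, value in values.items():
        setattr(obj, field_map[spec_name], value)
    return obj


check_mapping(DataRow, FIELD_MAP)
row = populate(DataRow(), {"AccountNumber": "ACC001", "Amount": "12.50"}, FIELD_MAP)
```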
Anyway, any ideas would be appreciated! Also, I need to go both ways, in other words, I need to use the rowspecs to build a file or to parse the file.
Tony Blogumas, Jun 16, 2011
I appreciate the responses. It's been a while and I developed my own solution. I created a few tables in a database to store the data for the specifications of the file. This allows me to define any specification, including versions of the specifications. The detail rows contain the following fields: Id, RecordTypeId, FieldName, StartPosition, FieldLength, Description, IsFieldPadded, PaddingCharacter, FieldFormattingString, PaddingDirection, HasConstantValue, ConstantValue, DataTypeId.
Using this data I can create a collection of detail specifications that defines a row, and either parse it or build it against that specification.
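As a rough illustration of using one set of detail specifications in both directions, here is a Python sketch. The `DetailSpec` class borrows a few of the column names listed above; the 0-based start position, the meaning of the padding direction (the side on which padding is added), and the assumption that fields are contiguous are all guesses made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DetailSpec:
    field_name: str
    start_position: int                # assumed 0-based
    field_length: int
    padding_character: str = " "
    padding_direction: str = "right"   # assumed: side on which padding is added
    constant_value: Optional[str] = None


def build_line(specs, values):
    """Render one fixed-width line; assumes the fields are contiguous."""
    parts = []
    for spec in sorted(specs, key=lambda s: s.start_position):
        if spec.constant_value is not None:
            text = spec.constant_value
        else:
            text = str(values[spec.field_name])
        if len(text) > spec.field_length:
            raise ValueError(f"{spec.field_name}: {text!r} exceeds {spec.field_length} characters")
        if spec.padding_direction == "left":
            text = text.rjust(spec.field_length, spec.padding_character)
        else:
            text = text.ljust(spec.field_length, spec.padding_character)
        parts.append(text)
    return "".join(parts)


def parse_line(specs, line):
    """Slice each field out of the line and remove its padding."""
    result = {}
    for spec in specs:
        raw = line[spec.start_position:spec.start_position + spec.field_length]
        if spec.padding_direction == "left":
            raw = raw.lstrip(spec.padding_character)
        else:
            raw = raw.rstrip(spec.padding_character)
        result[spec.field_name] = raw
    return result


DATA_ROW = [
    DetailSpec("RecordType", 0, 3, constant_value="200"),
    DetailSpec("Account", 3, 8),
    DetailSpec("Amount", 11, 7, padding_character="0", padding_direction="left"),
]

line = build_line(DATA_ROW, {"Account": "ACC001", "Amount": "1250"})
parsed = parse_line(DATA_ROW, line)
```

Because both functions read the same specification objects, a change to a field's length or padding only has to be made in one place.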
It gets trickier when you have to know what the previous line was in order to apply the correct specification to the current line.
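A small sketch of one way to handle that dependency: key the layout lookup on the pair (previous record type, current record type) and fall back to a context-free entry. The record types and layout names below are invented for illustration, and the code is Python rather than the original C#.

```python
def assign_layouts(lines, layouts):
    """Yield (line, layout) pairs, choosing the layout from the previous record type.

    `layouts` maps (previous_type, current_type) to a layout; a key whose
    first element is None acts as the fallback for any previous type.
    """
    previous = None
    for line in lines:
        current = line[:3]
        layout = layouts.get((previous, current), layouts.get((None, current)))
        if layout is None:
            raise ValueError(f"no layout for {current!r} after {previous!r}")
        yield line, layout
        previous = current


# Hypothetical rule: a 210 row is a continuation only when it follows a 200 row.
LAYOUTS = {
    (None, "100"): "header",
    (None, "200"): "detail",
    ("200", "210"): "detail_continuation",
    (None, "210"): "standalone_note",
}

chosen = [layout for _, layout in
          assign_layouts(["100A", "200B", "210C", "210D"], LAYOUTS)]
```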
In pseudocode, here is the basic idea:
Read file into List<string>
-- determine the specifications required to decompose this file
Get Specifications for file
for each string in List
    get the specifications for this row -- in my case the first 3 characters of each row are the row type identifier
    create Collection to store decomposed objects (i.e. whatever each row holds the data for)
    for each specification in row specifications
        create new decomposed object item
        Extract the data for specification
(Here is where I ran into coding trouble: I had a huge switch based on RecordType and a method to extract the data based on record types. Very ugly but functional. I also had to add code in this method to check for field types such as Time, Date, or both, and to handle boolean and decimal conversions.)
Once you have the data for the field, you can easily set the value of the object's field.
In my case, if I could not convert some piece of data I would throw an exception, figure out why, handle it, and then just rerun the process.
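One possible alternative to the large switch is a lookup table of converters keyed by data type, which is essentially the Strategy pattern in table form. The sketch below is in Python; the type names, the date and time formats, the accepted boolean flags, and the `FieldConversionError` class are all assumptions made for the example.

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation


def to_bool(text):
    """Convert a hypothetical Y/N style flag to a boolean."""
    if text in ("Y", "1", "T"):
        return True
    if text in ("N", "0", "F"):
        return False
    raise ValueError(f"not a boolean flag: {text!r}")


# One converter per data type; adding a type means adding an entry here.
CONVERTERS = {
    "string": str.strip,
    "decimal": Decimal,
    "date": lambda text: datetime.strptime(text, "%Y%m%d").date(),
    "time": lambda text: datetime.strptime(text, "%H%M%S").time(),
    "bool": to_bool,
}


class FieldConversionError(ValueError):
    """Raised with enough context to locate the offending value in the file."""


def convert(data_type, raw, *, line_number, field_name):
    try:
        return CONVERTERS[data_type](raw)
    except (KeyError, ValueError, InvalidOperation) as exc:
        raise FieldConversionError(
            f"line {line_number}, field {field_name!r}: "
            f"cannot convert {raw!r} as {data_type}"
        ) from exc


amount = convert("decimal", "12.50", line_number=2, field_name="Amount")
posted = convert("date", "20110616", line_number=2, field_name="PostedOn")
```

Carrying the line number and field name in the exception keeps the "throw, investigate, rerun" workflow, while making the investigation step shorter.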
If anyone is interested, feel free to ping me by email at tony at blognstuff dot com.
Always interested in someone looking at my solution and showing me a better way!
Tony Blogumas, Mar 15, 2012
Hi, for your scenario I would think of the Factory pattern.
Everything else could be placed in a service that exposes a public method (you would of course need to add WebMethod attributes) for clients to call.
The same Factory pattern could be extended to provide different file schemas based on some logic (the filename?), in case you want a different set of file formats for different sets of files.
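A minimal sketch of such a factory, written in Python for brevity. The filename patterns, the `FileSchema` class, and the schema names are made up for illustration; a real implementation would presumably load them from the specification tables.

```python
import fnmatch
from dataclasses import dataclass


@dataclass(frozen=True)
class FileSchema:
    name: str
    record_type_length: int  # how many leading characters identify the row type


# Hypothetical registry: filename pattern -> schema.
_SCHEMAS = [
    ("PAY_*.txt", FileSchema("payments_v1", 3)),
    ("INV_*.dat", FileSchema("invoices_v2", 2)),
]


def schema_for(filename):
    """Factory: return the first schema whose pattern matches the filename."""
    for pattern, schema in _SCHEMAS:
        if fnmatch.fnmatchcase(filename, pattern):
            return schema
    raise LookupError(f"no schema registered for {filename!r}")


schema = schema_for("PAY_20110815.txt")
```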
You may also have a separate schema for delimited files and refer to it for the header and trailer as well.
You may add that to this code as you prefer.
Please share your thoughts on whether this suggestion works for you.
Tarriq Ferrose Khan
Tarriq Ferrose Khan, Aug 15, 2011
I am no expert, but I faced a somewhat similar issue when working with Excel files.
I used the Strategy pattern for extracting information from the files.
Secondly, I used the Flyweight pattern for capturing data before writing or processing it.
I used Collection Objects to capture data (List and Dictionary)
List<object> -> this can be another dictionary for quick access (if required) Dictionary<Id, Object>
Strategy pattern -> follows the principle of encapsulation: separate what changes
I hope this is helpful.
I apologise for my poor English.
Asmprogs Asmprogs, Jul 09, 2011
Tony, could you please post your code for reference?
Venkatesh D, Jul 05, 2011