What design pattern(s) to use to build a extractor for a flat file based on a specification?

 
50   50
Jun 16, 2011
 
I have a flat file where each row has a specification.  Say the first row specification level 100 a header row and all 200s are data rows and level 199 is a file summary.
I have created a data table that contains the specifications, and I am able to get the specifications into an object List<RowSpecs> no problem.

I open the file and I read each row and based on the first three characters I pull the specifications for that row.  I iterate through the specific specs for that row and assign the data to that field in the appropriate element.  Some times there is some conversion of the data, or a lookup that needs to occur on a piece of data, and sometimes there are some special rules like formatting etc. 

I have coded it to make it work, but it seems very clunky and riddled with possible errors.  Does anyone have a good suggestion for how to apply a design pattern for this type of situation?  Multiple patterns?

The other trick is that I use the field name to match the object field name to reflect back and set the field formatting and the field data, so I have to make sure those items match or decorate them somehow or build a mapping of some sort. 

Anyway, any ideas would be appreciated!  Also, I need to go both ways, in other words, I need to use the rowspecs to build a file or to parse the file.

thanks!



60   95.3
Jul 05, 2011
Tony, could you please post your code for reference..

59   94.7
Jul 09, 2011
I am no expert, but I faced somewhat similar issue when working with excel file.

I used Strategy pattern for extracting information from files.
Secondly, I used Flyweight pattern for capturing data before writing or processing.

I used Collection Objects to capture data (List and Dictionary)

List<object> -> this can be another dictionary for quick access (if required) Dictionary<Id, Object>

Object
{
 Dictionary<Property, Value>
}

Strategy Pattern -> follows principle of Encaplusation - Seperate what changes

Hope this will be helpful.

I appologise for poor english.


80   96.3
Aug 15, 2011

Hi for your scenario , I could think of Factory pattern .
File formats could normally vary like Delimited and Fixed Width .

Processing a Delimited File is fairly straight forward and processing Fixed Width needs extra effort.

For Fixed Width I would recommend to have a xml based File Schema, which could be loaded and parsed against the data file.

In case if the File format changes for Field Names, length, lookup formatting etc., all that you need to do is to change the XML and not the code.

This way your app becomes more resilient to change requirements and will add extensibility as well.

Please refer to the code snippet I have put in for your reference.

In Short
1) MyFile is the File object (product) being handled.
2) MyFileProcessor will return the FileProcessor to IFileProcessor (Delimeted or a FixedWidth) based on the File Format you choose (on the UI?)
3) IFileProcessor.ProcessFile() will process and return the data (as a DataTable - assumed for simplicity).

This way you may choose to have the code within the console application to stay on your app side code (ASP.NET or Windows forms etc.,)

All the other stuff could be placed on a service , which has public method (of course you got to add Webmethod attribs) to be triggered by the clients.

Please Note:

The same factory pattern could be extended to provide different file schemas , based on some logic (filename?) in case if you wish to have different set of file formats for different set of files.

Also, you may also have a separate schema for delimited files and refer them too for Header/Trailer.

You may add that to this code as you would prefer.

Please share your thoughts , if this suggestion is ok.

Thanks,

Tarriq Ferrose Khan


using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data;


namespace ConsoleApplication1
 {
     public enum CustomFileType {FixedWidth,Delimited}

     class Program
     {
         static void Main(string[] args)
         {
             //File object being handled
             MyFile myFile = new MyFile("Test.txt",CustomFileType.FixedWidth);
             
             //File Factory returning the exact File processor based on File Type 
             MyFileProcessorFactory processorFactory = new MyFileProcessorFactory(myFile);
             IFileProcessor iFileProcessor = processorFactory.GetFileProcessor();

             //Process File will call the ProcessFile method of the FileProcessor Product being returned by factory
             
             iFileProcessor.ProcessFile(myFile);
                 //myFile.ProcessedFileData will need to have the processed collection.
             Console.ReadLine();
         }
     }

     public interface IFileProcessor
     {
         MyFile ProcessFile(MyFile myFile);
     }

     public class MyFile
     {
         string _FileName = string.Empty;
         CustomFileType _FileType;
         DataTable _FileData = new DataTable();

         public string FileName
         {
             get {return _FileName; }            
         }
         public CustomFileType FileType
         {
             get { return _FileType; }
         }

         public DataTable ProcessedFileData
         {
             get { return _FileData; }
             set {_FileData=value; }
         }

         public MyFile(string FileName, CustomFileType fileType)
         {
             _FileName = FileName;
             _FileType = fileType;
         }
     }

     public class MyFileProcessorFactory
     {
         public IFileProcessor processor;
         public MyFile myFile;

         public MyFileProcessorFactory(MyFile _myfile)
         {
             myFile = _myfile;
         }

         public IFileProcessor GetFileProcessor()
         {

             switch (myFile.FileType)
             {
                 case CustomFileType.Delimited:
                     processor= new MyDelimitedFileProcessor();
                     break;
                 case CustomFileType.FixedWidth:
                     processor= new MyFixedWidthFileProcessor();
                     break;
             }
             return processor;
         }
     }

     public class MyFixedWidthFileProcessor : IFileProcessor
     {

         public MyFile ProcessFile(MyFile myfile)
         {
             
             if (LoadSchema())
             {
                 if (CheckFileIntegrity())
                 {
                     myfile.ProcessedFileData = CreateDataTable();
                 }
             }
             Console.WriteLine("Fixed");
             return myfile;
         }
         private bool LoadSchema()
         { 
             //Load the XML Schema.
             //if not loaded properly returns false.
             //else
             return true;
         }

         private bool CheckFileIntegrity()
         {
             //Check if the Header and Trailer format on the data file matches with the defined schema.
             //Checks if the actual number of rows on the file matches with Header /trailer count
             //returns false if any of the above it false
             //else
             return true;
         }

         private DataTable CreateDataTable()
         {
             //this will parse through the file against the loaded schema and return a Data Table
             return new DataTable();
         }
     }

     public class MyDelimitedFileProcessor : IFileProcessor
     {
         string StandardDelimiter = "|";
         //You may declare this as a property to have other delimiters - passed in from UI.
         public MyFile ProcessFile(MyFile myfile)
         {
             Console.WriteLine("Delimited");
             return myfile;
         }
         private DataTable CreateDataTable()
         {
             //this will parse through the file for the delimiter and return DataTable
             return new DataTable();
         }


     }



         
     
 }

 
Sample XML Schema - to be validated?

<?xml version="1.0" encoding="utf-8" ?>
<CustomFileSchema>
<HeaderSchema value="100;#########;mm/dd/yyyy"/>
<TrailerSchema value="199;#########;mm/dd/yyyy"/>
<DataSchema Tag="200">
<Field Name="Test1" Order="1" length="10" NeedsLookup="False" FieldFormatting="dd/mm/yyyy"></Field>
<Field Name="Test2" Order="2" length="8" NeedsLookup="true" ></Field>
<Field Name="Test3" Order="3" length="10" NeedsLookup="False"  FieldFormatting="$##,###,###.00"></Field>
</DataSchema>
</CustomFileSchema>

 

50   50
Mar 15, 2012
I appreciate the responses.  Its been a while and I developed my own solution.  I created a few tables in a database to store the data for the specifications of the file.  This allows me to define any specification including versions of the specifications.  The detail rows contain the following fields, Id, RecordTypeId, FieldName, StartPosition, FieldLength, Description, IsFieldPadded, PaddingCharacter, FieldFormattingString, PaddingDirection, HasConstantValue, ConstantValue, DataTypeId.

Using this data I can create a collection of detail specifications that defines a row, and either parse it or build it against that specification.

It gets trickier when you have to know what the previous line was in order to apply the correct specification to the current line.

But in psuedo code her is a basic idea:

Read file into List<string>
-- determine the specifications required to decompose this file
Get Specifications for file
for each string in List
        get the specifications for this row -- in my case the first 3 characters of each row are the row type identifier
        create Collection to store decomposed objects  (ie whatever each row holds the data for)
        for each specification in row specifications
                create new decomposed object item
                Extract the data for specification
                (here is where I ran into coding trouble -- I had a huge switch based on RecordType and a method to extract the data based on record types.  Very ugly but functional.  I also had to add code in this method to check for field types like Time or Date or both also for handling boolean conversion of data and decimal types)
                Once you have the data for the field then you could easily set the value of the objects field

In my case if I could not convert some piece of data I would throw an exception and then I could figure out why and handle that, then just rerun the process.

If anyone is interested, feel free to ping me and you can send email to tony at blognstuff dot com

Always interested in someone looking at my solution and showing me a better way!