Views: 31.9K
Replies: 4
Archived
|
What design pattern(s) to use to build a extractor for a flat file based on a specification?I have a flat file where each row has a specification. Say the first row specification level 100 a header row and all 200s are data rows and level 199 is a file summary.
I have created a data table that contains the specifications, and I am able to get the specifications into an object List<RowSpecs> no problem. I open the file and I read each row and based on the first three characters I pull the specifications for that row. I iterate through the specific specs for that row and assign the data to that field in the appropriate element. Some times there is some conversion of the data, or a lookup that needs to occur on a piece of data, and sometimes there are some special rules like formatting etc. I have coded it to make it work, but it seems very clunky and riddled with possible errors. Does anyone have a good suggestion for how to apply a design pattern for this type of situation? Multiple patterns? The other trick is that I use the field name to match the object field name to reflect back and set the field formatting and the field data, so I have to make sure those items match or decorate them somehow or build a mapping of some sort. Anyway, any ideas would be appreciated! Also, I need to go both ways, in other words, I need to use the rowspecs to build a file or to parse the file. thanks! Tony Blogumas, Jun 16, 2011
|
|
Reply 1I appreciate the responses. Its been a while and I developed my own solution. I created a few tables in a database to store the data for the specifications of the file. This allows me to define any specification including versions of the specifications. The detail rows contain the following fields, Id, RecordTypeId, FieldName, StartPosition, FieldLength, Description, IsFieldPadded, PaddingCharacter, FieldFormattingString, PaddingDirection, HasConstantValue, ConstantValue, DataTypeId.
Using this data I can create a collection of detail specifications that defines a row, and either parse it or build it against that specification. It gets trickier when you have to know what the previous line was in order to apply the correct specification to the current line. But in psuedo code her is a basic idea: Read file into List<string> -- determine the specifications required to decompose this file Get Specifications for file for each string in List get the specifications for this row -- in my case the first 3 characters of each row are the row type identifier create Collection to store decomposed objects (ie whatever each row holds the data for) for each specification in row specifications create new decomposed object item Extract the data for specification (here is where I ran into coding trouble -- I had a huge switch based on RecordType and a method to extract the data based on record types. Very ugly but functional. I also had to add code in this method to check for field types like Time or Date or both also for handling boolean conversion of data and decimal types) Once you have the data for the field then you could easily set the value of the objects field In my case if I could not convert some piece of data I would throw an exception and then I could figure out why and handle that, then just rerun the process. If anyone is interested, feel free to ping me and you can send email to tony at blognstuff dot com Always interested in someone looking at my solution and showing me a better way! Tony Blogumas, Mar 15, 2012
|
|
Reply 2Hi for your scenario , I could think of Factory pattern . All the other stuff could be placed on a service , which has public method (of course you got to add Webmethod attribs) to be triggered by the clients. The same factory pattern could be extended to provide different file schemas , based on some logic (filename?) in case if you wish to have different set of file formats for different set of files. Also, you may also have a separate schema for delimited files and refer them too for Header/Trailer. You may add that to this code as you would prefer. Please share your thoughts , if this suggestion is ok. Thanks, Tarriq Ferrose Khan using System; using System.Collections.Generic; using System.Linq; using System.Text; using System.Data; namespace ConsoleApplication1 { public enum CustomFileType {FixedWidth,Delimited} class Program { static void Main(string[] args) { //File object being handled MyFile myFile = new MyFile("Test.txt",CustomFileType.FixedWidth); //File Factory returning the exact File processor based on File Type MyFileProcessorFactory processorFactory = new MyFileProcessorFactory(myFile); IFileProcessor iFileProcessor = processorFactory.GetFileProcessor(); //Process File will call the ProcessFile method of the FileProcessor Product being returned by factory iFileProcessor.ProcessFile(myFile); //myFile.ProcessedFileData will need to have the processed collection. Console.ReadLine(); } } public interface IFileProcessor { MyFile ProcessFile(MyFile myFile); } public class MyFile { string _FileName = string.Empty; CustomFileType _FileType; DataTable _FileData = new DataTable(); public string FileName { get {return _FileName; } } public CustomFileType FileType { get { return _FileType; } } public DataTable ProcessedFileData { get { return _FileData; } set {_FileData=value; } } public MyFile(string FileName, CustomFileType fileType) { _FileName = FileName; _FileType = fileType; } } public class MyFileProcessorFactory { public IFileProcessor processor; public MyFile myFile; public MyFileProcessorFactory(MyFile _myfile) { myFile = _myfile; } public IFileProcessor GetFileProcessor() { switch (myFile.FileType) { case CustomFileType.Delimited: processor= new MyDelimitedFileProcessor(); break; case CustomFileType.FixedWidth: processor= new MyFixedWidthFileProcessor(); break; } return processor; } } public class MyFixedWidthFileProcessor : IFileProcessor { public MyFile ProcessFile(MyFile myfile) { if (LoadSchema()) { if (CheckFileIntegrity()) { myfile.ProcessedFileData = CreateDataTable(); } } Console.WriteLine("Fixed"); return myfile; } private bool LoadSchema() { //Load the XML Schema. //if not loaded properly returns false. //else return true; } private bool CheckFileIntegrity() { //Check if the Header and Trailer format on the data file matches with the defined schema. //Checks if the actual number of rows on the file matches with Header /trailer count //returns false if any of the above it false //else return true; } private DataTable CreateDataTable() { //this will parse through the file against the loaded schema and return a Data Table return new DataTable(); } } public class MyDelimitedFileProcessor : IFileProcessor { string StandardDelimiter = "|"; //You may declare this as a property to have other delimiters - passed in from UI. public MyFile ProcessFile(MyFile myfile) { Console.WriteLine("Delimited"); return myfile; } private DataTable CreateDataTable() { //this will parse through the file for the delimiter and return DataTable return new DataTable(); } } } Sample XML Schema - to be validated? <?xml version="1.0" encoding="utf-8" ?> <CustomFileSchema> <HeaderSchema value="100;#########;mm/dd/yyyy"/> <TrailerSchema value="199;#########;mm/dd/yyyy"/> <DataSchema Tag="200"> <Field Name="Test1" Order="1" length="10" NeedsLookup="False" FieldFormatting="dd/mm/yyyy"></Field> <Field Name="Test2" Order="2" length="8" NeedsLookup="true" ></Field> <Field Name="Test3" Order="3" length="10" NeedsLookup="False" FieldFormatting="$##,###,###.00"></Field> </DataSchema> </CustomFileSchema> Tarriq Ferrose Khan, Aug 15, 2011
|
|
Reply 3I am no expert, but I faced somewhat similar issue when working with excel file.
I used Strategy pattern for extracting information from files. Secondly, I used Flyweight pattern for capturing data before writing or processing. I used Collection Objects to capture data (List and Dictionary) List<object> -> this can be another dictionary for quick access (if required) Dictionary<Id, Object> Object { Dictionary<Property, Value> } Strategy Pattern -> follows principle of Encaplusation - Seperate what changes Hope this will be helpful. I appologise for poor english. Asmprogs Asmprogs, Jul 09, 2011
|
|
Reply 4Tony, could you please post your code for reference..
Venkatesh D, Jul 05, 2011
|