Translate

26.3.17

Read 20,000 lines in 3 seconds || .NET Application

Sometimes last year, I was working on this Desktop application (VB.NET) that had to read a large file and run some complex analysis to give output in various formats. 
Anyway, reading the large file took the better part of 5 weeks and I more or less tried all suggested solutions (from stackoverflow & co and MSDN website itself). Nothing. 

I tried datareaders, datatables, tableadapters, xmlserializer, etc ( the whole microsoft file reading arsenal). I cracked my brain alongside my friend and system analyst Adelore Oreoluwa (he's awesome). They were all either too slow, or ineffective. (Some read part of the file and skipped some, some misplaced part of my input data, some simply hung the entire application and consumed large memory).

I finally fixed the problem by reading all my data directly from Excel 2003 (.xls) format, using a dll file got from some spanish programmer friends.  The code snippets are below (in case you ever run into same problem). It read me over 20,000 rows of data in less than 3 seconds and misplaced none of my input. 

Imports Excel    //Of course you know where to put this

Private Sub CheckExcel()
        Dim result As DataSet = Nothing

        If Path.GetExtension(Me.txtfilename.Text).ToLower().Equals(".xls") Then
            'Call the ReadExcelFile function
            Call ReadExcelFile(result)

            'Check that the excel file matches your input template
            Call CheckTemplate(result)
        Else
        End If
 End Sub

Private Sub ReadExcelFile(ByRef result As DataSet)
        Dim stream As FileStream = File.Open(Me.OpenFileDialog1.FileName, FileMode.Open, FileAccess.Read)
        Dim excelReader As IExcelDataReader

        excelReader = ExcelReaderFactory.CreateBinaryReader(stream)

        result = excelReader.AsDataSet()
        Return
    End Sub


    Private Sub CheckTemplate(ByVal result As DataSet)
        Try
            objTable = result.Tables(0)

          Dim fields(9) As String
            For Each row As DataRow In objTable.Rows
                For j As Int16 = 0 To 8
                    If row(j).ToString <> String.Empty Then
                        fields(j) = row(j).ToString
                    End If
                Next

                Call loadtobase(fields)
            Next
            'MessageBox.Show(objTable.Rows.Count.ToString + " Records Loaded", "Success")
        Catch ex As Exception
            ' MessageBox.Show(ex.ToString)
            Exit Sub
        End Try
    End Sub

||||||--------------------------------------------------------------------------------------|||||||||||||||||||
This code is in VB.Net, I'm sure if C#, F#, or Visual C++ is your thing, you can easily translate to fit your needs. (I can help with the C#, just contact me).

The dll has to be placed in the debug folder, something like: C:\Users\Username\Documents\Visual Studio 2012\Projects\Zeed REgiz\SeedlingRegistry1\SeedlingRegistry\bin\Debug\Excel.dll
(Adjust path to suit your System settings).

The dll file can be found on my github: github.com/C-spydo    . It will work for any .NET language.

No comments:

Post a Comment