demarc
  

Resource Logic Client/Server Development Services


Flat File Bulk Import to SQL Server 2005


  
Context: The IRS publishes a list of non-profits (501c3) which is made up of churches, charities, foundations, and similar organizations.  This list currently contains about 890,000 entries.  The incoming file has a number of quirks.  It is not an MS-DOS-style text file: each record has a ASCII 10 linefeed end of record marker.  There are some unusual values and sequences, one of which can be mistaken for an open tag (' <') in XML, which cannot be written to an XML export.  One of the characters appears to be a translation of a Spanish ñ, which is stored as an ASCII character 26.  Websites on shared host servers (i.e., the typical $9.95 per month website) can import, at best, about 10,000 records through a CSV before timing out.  The point of this demonstration is to efficiently and repeatedly import the full set of records, applying needed corrections, and exporting chunks of XML that can be automatically imported into the host through a C# program and ASP.NET web page.
  
The reason I have developed this alternative path is that SSIS does not work with this file.  The following stored procedure, with it's table creation component commented out, is a means to import a flat file into a memo field (varchar(max)) as a first step:
  
CREATE procedure [dbo].[BulkLoadFromText]

as

set nocount on

--create table EOPubBulkLoad
--(
-- BulkLoadId int IDENTITY(1,1) NOT NULL,
-- IncomingFileName varchar(50) NOT NULL,
-- IncomingContents varchar(max) NULL
--)
insert into EOPubBulkLoad
select 'C:\IRSImport\eopub780310.txt', i.fullstream
from openrowset(bulk 'C:\IRSImport\eopub780310.txt', single_blob) as i(fullstream);
  
  
The first column of the insert contains the name of the file. The second column, shown here as the target of the 'openrowset' operation, is a single text column storing the entire incoming file.
  
Next: Parsing the text file.