This article has now been updated to include a script download with many more features, including support for almost all column types, copying files from subfolders in the source location and retaining the structure in the destination library, an overwrite option, ability to merge subfolders into a single location, and much more. Visit this page for details - http://get-spscripts.com/2010/10/bulk-upload-files-with-metadata-into.html.
A scenario I come across quite regularly is migrating data into SharePoint from a file system and tagging it with column values on its way in. The PowerShell script below provides a framework for doing exactly that. It doesn’t cover all the options that you might need from a fully featured migration platform but is a good starting point from which you can extend further.
The general order of the script is as follows:
- Specify the site, document library and local folder path variables (I haven't tested with network paths yet, but will update the blog when I have).
- Enumerate all files in the folder (not subfolders at the moment) and upload them into the document library in SharePoint. I have set it to upload the files to the root of the library, but you could change the script to copy into sub-folders or to migrate an entire folder structure from the file system to SharePoint.
- Connect to a manifest XML file and configure the column values. The columns themselves must already exist in the document library before running the script. The file specifies values for "Single line of text" and "Managed Metadata" (single value) columns, the only two column types supported in the script at the moment. With Managed Metadata columns, the script looks for the value in the term set first (choosing the first matching term if more than one term in the term set has the same name) and creates it if it isn't there. An example of a manifest file is shown below:
<?xml version="1.0" encoding="utf-8"?>
<Columns>
  <Column name="Technology Area" type="TaxonomyFieldType">
    <Values>
      <Value>SharePoint</Value>
    </Values>
  </Column>
  <Column name="Subject" type="Text">
    <Values>
      <Value>Document Management</Value>
    </Values>
  </Column>
</Columns>
- Check the files into the document library (optional depending on the document library configuration)
- Approve the files (optional depending on the document library configuration)
Here is the script:
#Setup default variables
$metadataManifest = [xml] (Get-Content "C:\Install\DocManifest.xml")
$webUrl = "http://portal"
$docLibraryName = "Documents"
$localFolderPath = "C:\Install\Docs"

#Get web and document library objects
$web = Get-SPWeb $webUrl
$docLibrary = $web.Lists[$docLibraryName]

#Attach to local folder and enumerate through all files
(Get-Item $localFolderPath).GetFiles() | ForEach-Object {

    #Read the file contents into a byte array
    $fileStream = $_.OpenRead()
    $contents = New-Object byte[] $fileStream.Length
    $fileStream.Read($contents, 0, [int]$fileStream.Length) | Out-Null
    $fileStream.Close()
    Write-Host "Copying" $_.Name "to" $docLibrary.Title "in" $web.Title "..."

    #Add file to the root folder of the document library
    $folder = $docLibrary.RootFolder
    $spFile = $folder.Files.Add($folder.Url + "/" + $_.Name, $contents, $true)
    $spItem = $spFile.Item

    #Walk through manifest XML file and configure column values on the file
    $metadataManifest.Columns.Column | ForEach-Object {

        #Single line of text column
        if ($_.type -eq "Text")
        {
            $columnName = $_.name
            Write-Host "Setting value on column" $columnName
            $_.Values.Value | ForEach-Object {
                $spItem[$columnName] = $_
                $spItem.Update()
                Write-Host "Value set to" $_
            }
        }

        #Single value Managed Metadata column
        if ($_.type -eq "TaxonomyFieldType")
        {
            $columnName = $_.name
            $taxonomySession = Get-SPTaxonomySession -Site $web.Site
            $termStore = $taxonomySession.DefaultSiteCollectionTermStore
            $taxonomyField = $docLibrary.Fields[$columnName]
            $termSet = $termStore.GetTermSet($taxonomyField.TermSetId)
            Write-Host "Setting value on column" $columnName
            $_.Values.Value | ForEach-Object {
                #Look for the term in the term set and create it if not found
                $termCollection = $termSet.GetTerms($_, $true)
                if ($termCollection.Count -eq 0)
                {
                    $term = $termSet.CreateTerm($_, 1033)
                    $termStore.CommitAll()
                }
                else
                {
                    $term = $termCollection[0]
                }
                $taxonomyField.SetFieldValue($spItem, $term)
                $spItem.Update()
                Write-Host "Value set to" $_
            }
        }
    }

    #Check in file to document library
    #MinorCheckIn=0, MajorCheckIn=1, OverwriteCheckIn=2
    $spFile.CheckIn("File copied from " + $localFolderPath, 1)
    if ($spFile.CheckOutStatus -eq "None") { Write-Host $spFile.Name "checked in" }

    #Approve file
    $spFile.Approve("File automatically approved after copying from " + $localFolderPath)
    if ($spItem["Approval Status"] -eq 0) { Write-Host $spFile.Name "approved" }
}

#Dispose of web object
$web.Dispose()
For brevity, I haven't added any exception handling to the script posted here, so I am assuming that the site, document library, local folder, permissions, columns, etc. have all been created and assigned correctly. PowerShell is fairly good at reporting issues, and if you use something like the Windows PowerShell ISE included with Windows Server 2008 R2 and Windows 7, it will report the location of an issue in the script and even supports breakpoints so that you can debug.
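If you do want some basic guard clauses, a minimal sketch of how you might wrap the upload loop is shown below. It assumes the same $webUrl, $docLibraryName and $localFolderPath variables as the main script; the exact checks you need will depend on your environment.

```powershell
#Minimal pre-flight checks and cleanup around the main upload loop
try
{
    $web = Get-SPWeb $webUrl -ErrorAction Stop
    $docLibrary = $web.Lists[$docLibraryName]
    if ($docLibrary -eq $null) { throw "Document library '$docLibraryName' not found in $webUrl" }
    if (-not (Test-Path $localFolderPath)) { throw "Local folder '$localFolderPath' does not exist" }

    #...main upload loop from the script above goes here...
}
catch
{
    Write-Host "Upload aborted:" $_.Exception.Message
}
finally
{
    #Dispose of the web object even if something went wrong
    if ($web -ne $null) { $web.Dispose() }
}
```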
Once you have this basic script working, you could introduce a whole host of other features: for example, create an XML or CSV file plan containing different folder structures on the file system and related document manifests so that you can copy larger numbers of documents in one hit, configure specific security permissions, attach to different content types, etc. I will be adding extra functionality myself and documenting it in future posts on this blog.
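As a starting point for the file plan idea, the sketch below shows how a CSV could drive multiple upload runs. The folder and manifest paths are hypothetical; each row would feed the $localFolderPath and $metadataManifest variables used by the main script.

```powershell
#Hypothetical file plan: each row pairs a source folder with a manifest file
$plan = @"
LocalFolder,Manifest
C:\Install\Docs,C:\Install\DocManifest.xml
C:\Install\Policies,C:\Install\PolicyManifest.xml
"@ | ConvertFrom-Csv

foreach ($row in $plan)
{
    Write-Host "Uploading" $row.LocalFolder "with manifest" $row.Manifest
    #Set $localFolderPath and $metadataManifest from $row here,
    #then run the upload loop from the main script
}
```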