Thursday, 22 July 2010

Upload multiple files into a document library using PowerShell and set metadata

This article has now been updated to include a script download with many more features, including support for almost all column types, copying files from subfolders in the source location and retaining the structure in the destination library, an overwrite option, ability to merge subfolders into a single location, and much more. Visit this page for details - http://get-spscripts.com/2010/10/bulk-upload-files-with-metadata-into.html.

A scenario I come across quite regularly is migrating data into SharePoint from a file system and tagging it with column values on its way in. The PowerShell script below provides a framework for doing exactly that. It doesn’t cover all the options that you might need from a fully featured migration platform but is a good starting point from which you can extend further.

The general order of the script is as follows:

  • Specify site, document library and local folder path variables (haven’t tested with network paths yet but will update blog when I have).
  • Enumerate through all files in the folder (not subfolders at the moment) and upload them into the document library in SharePoint. I have set it to upload the files to the root of the library, but you could change the script to copy into sub-folders or migrate an entire folder structure from the file system to SharePoint.
  • Connect to a manifest XML file and configure the column values - The columns themselves must already be present in the document library prior to running the script. The file specifies values for “Single line of text” and “Managed Metadata” (single value) column types – the only two columns types supported in the script at the moment. With Managed Metadata columns, the script will look for the value in the term set first (it will choose the first term it picks up if there is more than one in the term set with the same name) and create it if it isn’t there. An example of a manifest file is shown below:

<?xml version="1.0" encoding="utf-8"?>
<Columns>
  <Column name="Technology Area" type="TaxonomyFieldType">
    <Values>
      <Value>SharePoint</Value>
    </Values>
  </Column>
  <Column name="Subject" type="Text">
    <Values>
      <Value>Document Management</Value>
    </Values>
  </Column>
</Columns>

  • Check the files into the document library (optional depending on the document library configuration)
  • Approve the files (optional depending on the document library configuration)

Here is the script:

#Setup default variables
$metadataManifest = [xml] (Get-Content ("C:\Install\DocManifest.xml"))
$webUrl = "http://portal"
$docLibraryName = "Documents"
$localFolderPath = "c:\Install\Docs"

#Get web and document library objects
$web = Get-SPWeb $webUrl
$docLibrary = $web.Lists[$docLibraryName]
#Attach to local folder and enumerate through all files
$files = ([System.IO.DirectoryInfo] (Get-Item $localFolderPath)).GetFiles() | ForEach-Object {

    #Create file stream object from file
    $fileStream = ([System.IO.FileInfo] (Get-Item $_.FullName)).OpenRead()
    $contents = new-object byte[] $fileStream.Length
    $fileStream.Read($contents, 0, [int]$fileStream.Length);
    $fileStream.Close();

    write-host "Copying" $_.Name "to" $docLibrary.Title "in" $web.Title "..."

    #Add file
    $folder = $docLibrary.RootFolder
    $spFile = $folder.Files.Add($folder.Url + "/" + $_.Name, $contents, $true)
    $spItem = $spFile.Item

    #Walk through manifest XML file and configure column values on the file
    $metadataManifest.Columns.Column | ForEach-Object {
        #Single line of text column
        if ($_.Type -eq "Text")
        {
            $columnName = $_.Name
            write-host "Setting value on column"$columnName
            $_.Values.Value | ForEach-Object {
                $spItem[$columnName] = $_
                $spItem.Update()
                write-host "Value set to"$_
            }
        }
        #Single value Managed Metadata column
        if ($_.Type -eq "TaxonomyFieldType")
        {
            $columnName = $_.Name
            $taxonomySession = Get-SPTaxonomySession -Site $web.Site
            $termStore = $taxonomySession.DefaultSiteCollectionTermStore
            $taxonomyField = $docLibrary.Fields[$columnName]
            $termSet = $termStore.GetTermSet($taxonomyField.TermSetId)
            write-host "Setting value on column"$columnName
            $_.Values.Value | ForEach-Object {
                $termCollection = $termSet.GetTerms($_, $true)
                if ($termCollection.Count -eq 0)
                {
                    $term = $termSet.CreateTerm($_, 1033)
                    $termStore.CommitAll()
                }
                else
                {
                    $term = $termCollection[0]
                }
                $taxonomyField.SetFieldValue($spItem, $term)
                write-host "Value set to"$_
            }
        }
    }

    #Check in file to document library
    #MinorCheckIn=0, MajorCheckIn=1, OverwriteCheckIn=2
    $spFile.CheckIn("File copied from " + $filePath, 1)
    if ($spFile.CheckOutStatus -eq "None") { write-host $spfile.Name"checked in" }

    #Approve file
    $spFile.Approve("File automatically approved after copying from " + $filePath)
    if ($spItem["Approval Status"] -eq 0) { write-host $spfile.Name"approved" }
}
#Dispose of Web object
$web.Dispose()

For brevity purposes, I haven’t added any exception handling in the script posted here, so I am assuming that the site, document library, local folder, permissions, columns, etc. have all been created and assigned correctly. PowerShell is fairly good at reporting issues, and if you use something like Windows PowerShell ISE included with Windows Server 2008 R2 and Windows 7, it will report the location of the issue in the script and even supports breakpoints so that you can debug.

Once you have this basic script working, you could introduce a whole host of other features – e.g., create an XML or CSV file plan containing different folder structures on the file system and related document manifests so that you can copy larger numbers of documents up in one hit, configure specific security permissions, attach to different content types, etc. I will be adding extra functionality myself and documenting in future posts on this blog…

14 comments:

  1. Phil, you may be a saviour! I will test and let you know how I get on. I guess you've written this because there's a big gap in SP2010 functionality, with respect to assigning meta-data to multiple files (either during upload or afterwards)?
    Thanks!
    Doug

    ReplyDelete
  2. I quite like being referred to as a "saviour" :-) The script only supports single line of text and single value managed metadata columns at the moment, but I plan to increase support for other column types over the next few weeks. Hope it works out for you and please do let me know how you get on...

    ReplyDelete
  3. Hi Phil,

    Thanks for the great article, where do I add file name to upload documents to library? Any hwlp would be much appreciated.

    Many Thanks
    Pramod

    ReplyDelete
  4. Hi Pramod - you just specify a local folder name for the $localFolderPath variable and copy your files into that folder. Individual file names are not required.

    Hope that helps,
    Phil

    ReplyDelete
  5. Thanks Phil, I will try

    Thanks
    Pramod

    ReplyDelete
  6. Hi Phil,

    Can this be used on MOSS?

    Thanks

    ReplyDelete
  7. Not as is - but it wouldn't take too much adjustment to get it working. Just replace the Get-SPWeb command with a 2007 equivalent and get rid of the Managed Metadata references as the column type doesn't exist. Publishing a 2007 version on Codeplex is on my things to do list...It's just time...

    ReplyDelete
  8. Hi Phil,

    Can you run your script from an asp.net page if so how. I have tried with no luck.

    ReplyDelete
  9. Not that I am aware of - I would suggest converting the script to C# or VB and developing an ASP.NET page from there

    ReplyDelete
  10. Hi Phil, Great Script!

    Does it only work with the default manage MetaData Service? I created a new MetaData service but I cant get the Script to add MetaData from that service. If I try to add from the default it works. Have also tried to deleted the default Service from the Association list, but then I don teven get the blue 'Setting Value on Column....'

    Another thing is when I import a dokument with Swedish special character I get error.

    ReplyDelete
  11. This may well save me over 2 days of manual effort, it works great in development, but on the server it says the $web object is null ($web = Get-SPSite $SiteURL

    Any ideas what would cause that? The variable outbuts 'home' on dev, but just ends up going to the catch statement on QA.

    ReplyDelete
  12. Kamis - could be anything really - the obvious ones being permissions. Try running the PowerShell console by right-clicking on the shortcut and selecting "Run As Administrator".

    Also, make sure $web = Get-SPWeb $SiteURL and not Get-SPSite that you specified in your message above.

    ReplyDelete
  13. Thanks Phil, it was the load plug-ins option that worked in the end.

    Regards

    ReplyDelete
  14. To use this with MOSS 2007;

    Comment out #$web = Get-SPWeb $SiteUrl

    Add:

    [void][System.Reflection.Assembly]::LoadWithPartialName("Microsoft.SharePoint")
    $Site = New-Object Microsoft.Sharepoint.SPSite($siteurl)
    $Web = $Site.OpenWeb()

    It works as expected now with Sharepoint 2007

    ReplyDelete