How to Import and Export Files in RapidMiner


Introduction

RapidMiner is one of the leading data mining software suites. With more than 400 data mining modules, or operators, it is one of the most comprehensive and flexible data mining tools available. With over 10,000 downloads from SourceForge.net each month and more than 300,000 downloads in total, it is also one of the most widely used data mining tools. According to polls conducted by the data mining web portal KDnuggets.com among several hundred data mining experts, RapidMiner was the most widely used open source data mining tool and among the top three data mining tools overall in 2007 and 2008.

RapidMiner supports all steps of the data mining process: data loading, pre-processing, visualization, interactive process design and inspection, automated modeling, automated parameter and process optimization, automated feature construction and feature selection, evaluation, and deployment. RapidMiner can be used as a stand-alone program on the desktop with its graphical user interface (GUI), on a server via its command-line version, or as a data mining engine for your own products and a Java library for developers.

Background

For a recent BI project I needed a tool to migrate a SQL Server 2008 database, with all its data, to a PostgreSQL database. After trying several tools I chose RapidMiner for the task, and it proved well suited to this purpose.

ETL

ETL stands for Extract, Transform, Load. For example, you receive files or other data from vendors or other third parties, manipulate that data in some way, and then insert it into your own database.
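
The three ETL stages can be illustrated with a minimal Python sketch. This is not how RapidMiner implements ETL; it only shows the idea with the standard library. The vendor data, the column names, the `payments` table, and the in-memory SQLite database are all assumptions made up for the illustration.

```python
import csv
import io
import sqlite3

# Hypothetical vendor file: column names and values are invented for this sketch.
VENDOR_CSV = """name,amount
 Alice ,10.50
Bob,3
"""


def extract(text):
    """Extract: parse the raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim whitespace from names and convert amounts to floats."""
    return [(row["name"].strip(), float(row["amount"])) for row in rows]


def load(conn, records):
    """Load: insert the cleaned records into the (hypothetical) target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS payments (name TEXT, amount REAL)")
    conn.executemany("INSERT INTO payments VALUES (?, ?)", records)
    conn.commit()


def run_etl(conn, text=VENDOR_CSV):
    """Run all three stages and return the loaded rows for inspection."""
    load(conn, transform(extract(text)))
    return conn.execute("SELECT name, amount FROM payments ORDER BY name").fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")  # stand-in for your own database
    print(run_etl(connection))
```

Keeping each stage in its own function mirrors the way an ETL tool chains separate operators, so any one stage can be swapped without touching the others.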

What you Need

Microsoft® SQL Server® 2008 Express

 

SQL Server 2008 Express is a free edition of SQL Server that is an ideal data platform for learning and building desktop and small server applications, and for redistribution by ISVs. You can download it from http://www.microsoft.com/en-us/download/details.aspx?id=1695

I assume you have installed Microsoft® SQL Server® 2008 Express. For further information you can visit http://msdn.microsoft.com/en-us/library/dd981045(v=sql.100).aspx.

AdventureWorks database

You can download it from http://msftdbprodsamples.codeplex.com/downloads/get/478218 and attach it:

Sample Image

RapidMiner

You can download it from http://sourceforge.net/projects/rapidminer/files/1.%20RapidMiner/5.2/rapidminer-5.2.008x32-install.exe/download.

I assume you have installed RapidMiner. For further information you can visit http://rapid-i.com/wiki/index.php?title=RapidMiner_Installation_Guide.

ETL Process

Step 1:

Create a new process from File->New.

Step 2:

Create a new process from File->New.

Step 3:

Click Operators->Import->Data->Read Database.

Sample Image

Step 4:

Now we have to create a SQL Server connection for reading.

Sample Image

After creating the connection, assign it to the Read Database operator.

Sample Image

Step 5:

Click Build Query and select your table.

Sample Image

Step 6:

Click Operators->Export->Data->Write CSV. Select the Write CSV icon and set its parameters:

CSV file: Location of the file.

Column separator: Column separator character.

Connect the "out" port of “Read Database” to the "inp" port of “Write CSV”.

Sample Image

Behind the scenes

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.008">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
    <process expanded="true" height="370" width="614">
      <operator activated="true" class="read_database" compatibility="5.2.008" 
              expanded="true" height="60" name="Read Database" width="90" x="179" y="75">
        <parameter key="connection" value="MSSQLServer-AdventureWorks2008"/>
        <parameter key="query" value="SELECT *&#10;FROM &quot;HumanResources&quot;.&quot;Employee&quot;"/>
        <enumeration key="parameters"/>
      </operator>
      <operator activated="true" class="write_csv" compatibility="5.2.008" 
             expanded="true" height="76" name="Write CSV" width="90" x="380" y="75">
        <parameter key="csv_file" value="C:\Users\Masud\Desktop\exportcsv.csv"/>
        <parameter key="column_separator" value=","/>
      </operator>
      <connect from_op="Read Database" from_port="output" to_op="Write CSV" to_port="input"/>
      <connect from_op="Write CSV" from_port="through" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
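
For readers who want to see the same data flow outside RapidMiner, here is a hedged Python sketch of the two operators: run a query, then write the result as delimited text. It is not generated by RapidMiner. The in-memory SQLite table, its column names, and the `export_query_to_csv` helper are assumptions for illustration; a real run would connect to SQL Server with an appropriate driver instead.

```python
import csv
import io
import sqlite3


def export_query_to_csv(conn, query, out_file, column_separator=","):
    """Run a query and write the header plus all rows as delimited text.

    Plays the role of "Read Database" followed by "Write CSV".
    """
    cursor = conn.execute(query)
    writer = csv.writer(out_file, delimiter=column_separator, lineterminator="\n")
    writer.writerow([col[0] for col in cursor.description])  # column names
    row_count = 0
    for row in cursor:
        writer.writerow(row)
        row_count += 1
    return row_count


def demo():
    # Assumption: a tiny SQLite table stands in for HumanResources.Employee;
    # the column names and values here are invented.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Employee (EmployeeID INTEGER, Title TEXT)")
    conn.executemany(
        "INSERT INTO Employee VALUES (?, ?)",
        [(1, "Engineer"), (2, "Designer, Senior")],
    )
    # Use open("exportcsv.csv", "w", newline="") instead to write a real file.
    buffer = io.StringIO()
    count = export_query_to_csv(
        conn, "SELECT * FROM Employee ORDER BY EmployeeID", buffer
    )
    return count, buffer.getvalue()


if __name__ == "__main__":
    rows, text = demo()
    print(rows)
    print(text)
```

The `column_separator` argument corresponds to the Column separator parameter set in Step 6. Values that contain the separator are quoted automatically by the `csv` module.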

Step 7:

Now press Run or F11. All the data will be exported as a CSV file to your destination, and the output will look like this:

Sample Image

Conclusion

RapidMiner Community Edition is an excellent tool for ETL.

Using Business Intelligence Systems

Dashboard

The Graziadio Business Review article Using Dashboard Based Business Intelligence Systems discusses the challenges facing current business intelligence systems (BIS), the benefits of using BIS, and strategies for implementing future BIS. As the world has become more global, BIS has been applied across many industries. A dashboard is used to make real-time management decisions. It could be used to analyze a building or city project through different data sources and to identify issues that would otherwise be difficult to find in the sea of data.

Tools for organizing & analyzing urban data

• Traditional databases
• Data Warehouses
• Columnar databases for reporting
• Business Intelligence Systems
• Dashboards
• Data Mining
Data Warehouse
Columnar Database
Business Intelligence Systems
Data Mining
Using data in Excel for BIS:
I created this Excel spreadsheet from XML data exported from the Chicago Data Portal about the city-owned land inventory. It is a simplified example of what one can do with the data to discover relationships between data sources. I used a pivot table to count the properties within each area that had a particular zoning class. When filtering to business, one can see that a large number of properties in Englewood are zoned business. Why is this the case? These are the types of questions one can ask by layering different data sources, and the answers leave one better informed about what the issues within these complex data sources mean. I only analyzed the zoning classifications, but one data dimension or measurement does not tell the whole story. It is only when you compare different dimensions and classifications within the various hierarchies that you can really judge what the information means. For example, dividing the property tax funds the government receives from a certain area by the population of that area gives a per capita tax contribution to the fund.
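
The pivot-table count and the per capita calculation described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the spreadsheet itself. The records, the field names `community_area` and `zoning`, and the helper functions are hypothetical, and the real Chicago Data Portal export has different columns and values.

```python
from collections import Counter

# Hypothetical records, invented for this sketch.
PROPERTIES = [
    {"community_area": "Englewood", "zoning": "Business"},
    {"community_area": "Englewood", "zoning": "Business"},
    {"community_area": "Englewood", "zoning": "Residential"},
    {"community_area": "Austin", "zoning": "Business"},
]


def pivot_counts(records, row_key, column_key):
    """Count records per (row, column) pair, like a pivot table with a count aggregate."""
    return Counter((r[row_key], r[column_key]) for r in records)


def filter_column(counts, column_value):
    """Keep a single pivot column, e.g. only properties zoned 'Business'."""
    return {row: n for (row, col), n in counts.items() if col == column_value}


def per_capita(total_tax, population):
    """Tax contribution per resident of an area."""
    return total_tax / population


if __name__ == "__main__":
    counts = pivot_counts(PROPERTIES, "community_area", "zoning")
    print(filter_column(counts, "Business"))
    print(per_capita(1000.0, 250))  # made-up figures
```

Comparing a second measure, such as the per capita figure, against the zoning counts for the same areas is one way to layer data dimensions as the paragraph suggests.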
Business Intelligence App (dashboards)

Creating a Better Urban Quality of Life through Integrated Urban Design

INTRODUCTION:
Advances in information technology are bringing people together. People interact daily through their computers, phones, and tablets via social media sites, blogs, e-mail, and so on. The information people gain from the internet and other sources allows them to become knowledgeable in areas where they have no expertise and gives them a better understanding of what is going on. This access to information strengthens their credibility, and as a result people want to be more involved in decisions that affect them. The advances in information, communication, and general public knowledge have affected the delivery method of construction projects. The antiquated methods and processes of project delivery are no longer acceptable. Consumers are demanding integrated project delivery through the use of computer-aided software. This has made the AEC industry the pioneer, or the lab rat, for integrated project delivery, which can be applied to any type of project, including urban design.

BIM request page

John Tolva, CTO of Chicago, talks about “Code for America”
http://snd.sc/1c9HQAZ

Lack of experience is presenting problems related to effective means and methods of integrating people to create a better end product. Integrating people is the challenge, but integration provides a combined intelligence that only advances a project.

“Two heads are better than one” -John Heywood

In the studio I want to apply integrated project delivery principles to urban design, using advances in information systems to create a better urban city quality of life.

DATA SOURCES:

The more data that is available, the more tools decision makers will have. The best way to record data is through electronic means. Therefore, the more people use technology, the more data we will have.

URBAN CITY QUALITY OF LIFE

Data Warehouse

Pro Forma