#@title LICENSE
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
Data Exploration & Model Training with Vertex AI Extensions Code Interpreter¶
Authors | Christos Aniftos |
Michael W. Sherman | |
Reviewer | Meltem Subasioglu |
Last updated | 2024 04 09: Initial release |
2024 04 04: Complete draft |
Overview¶
This notebook shows how to use the Google-provided Code Interpreter extension for Vertex AI Extensions to do standard data science tasks like analyzing a dataset and training an ML model. If you're a data scientist, Code Interpreter can save you time getting up and running with a new dataset.
In this notebook you will use Code Interpreter to:
- Explore data
- Clean data
- Visualise data
- Train a linear regression model
- Generate predictions using that model
- Evaluate the predictions against the ground truth
If you're already familiar with Google Cloud and the Vertex AI Extensions Code Interpreter Extension, you can skip reading between here and the "Create the Data" section, but make sure to run the code cells.
Vertex AI Extensions¶
Vertex AI Extensions is a platform for creating and managing extensions that connect large language models to external systems via APIs. These external systems can provide LLMs with real-time data and perform data processing actions on their behalf. You can use pre-built or third-party extensions in Vertex AI Extensions.
Vertex AI Extensions Code Interpreter Extension¶
The Code Interpreter extension provides access to a Python interpreter with a sandboxed, secure execution environment that can be used with any model in the Vertex AI Model Garden. This extension can generate and execute code in response to a user query or workflow. It allows the user or LLM agent to perform various tasks such as data analysis and visualization on new or existing data files.
You can use the Code Interpreter extension to:
- Generate and execute code.
- Perform a wide variety of mathematical calculations.
- Sort, filter, select the top results, and otherwise analyze data (including data acquired from other tools and APIs).
- Create visualizations, plot charts, draw graphs, shapes, print results, etc.
Using this Notebook¶
Colab is recommended for running this notebook, but it can run in any IPython environment where you can connect to Google Cloud, install pip packages, etc.
If you're running outside of Colab, depending on your environment you may need to install pip packages (at the very least pandas and tabulate) that are included in the Colab environment by default but are not part of the Python Standard Library. Try running pip install with the library name of any import that fails. You'll also notice some comments in code cells that look like "@something"; these have special rendering in Colab, but you aren't missing out on any content or important functionality.
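One way to see up front which packages are missing is to probe for them before importing. The following is a minimal sketch using only the standard library; the helper name `find_missing_packages` is hypothetical and not part of this notebook.

```python
import importlib.util


def find_missing_packages(package_names):
    """Return the top-level names from package_names that cannot be imported."""
    return [
        name for name in package_names
        if importlib.util.find_spec(name) is None
    ]


# pandas and tabulate are the two packages this notebook mentions by name.
missing = find_missing_packages(["pandas", "tabulate"])
if missing:
    print("Install these before continuing:", ", ".join(missing))
else:
    print("All listed packages are importable.")
```

Anything reported as missing can then be installed with pip under the same name.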
This tutorial uses the following Google Cloud services and resources:
- Vertex AI Extensions
This notebook has been tested in the following environment:
- Python version = 3.10.12
- google-cloud-aiplatform version = 1.47.0
Useful Tips¶
- This notebook uses Generative AI capabilities. Re-running a cell that uses Generative AI capabilities may produce similar but not identical results.
- Because of #1, it is possible that output from Code Interpreter produces errors. If that happens, re-run the cell that produced the coding error. The newly generated code will likely be bug free. The run_code_interpreter method below helps automate this, but you may still need to rerun cells that generate working code that doesn't perfectly follow the instructions in the prompt.
- The use of Extensions and other Generative AI capabilities is subject to service quotas. Running the notebook using "Run All" may exceed your queries per minute (QPM) limitations. Run the notebook manually and, if you get a quota error, pause for up to 1 minute before retrying that cell. Code Interpreter defaults to Gemini on the backend and is subject to the Gemini quotas; view your Gemini quotas here.
- The Code Interpreter Extension is stateless, so a request to Code Interpreter has no knowledge of previous operations or of files ingested or produced in previous steps. With every request you therefore need to submit all the files and instructions needed for that request to complete successfully.
- When doing data science tasks with Code Interpreter, the pandas library will often be used, and common ways of using pandas generate a lot of warnings. Related to number 2 above, make sure you don't automatically rerun code merely because it generates warnings. One way to handle this is to instruct Code Interpreter to use the Python warnings library to suppress warnings. Step 2 below has an example of this.
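As a rough illustration of the last tip, the kind of generated code you would be asking for looks something like the sketch below. It uses only the standard `warnings` module; the function `noisy_step` is a hypothetical stand-in for a pandas call that emits a warning.

```python
import warnings


def noisy_step():
    """Hypothetical stand-in for a library call that emits a warning."""
    warnings.warn("this behavior is deprecated", FutureWarning)
    return 42


# Suppress warnings only inside this block so they do not clutter the
# output of the step that follows.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    result = noisy_step()

print(result)
```

Scoping the filter with `catch_warnings()` keeps warnings visible everywhere else, which is usually preferable to silencing them globally.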
Getting Started¶
The following steps are necessary to run this notebook, no matter what notebook environment you're using.
If you're entirely new to Google Cloud, get started here.
Google Cloud Project Setup¶
- Select or create a Google Cloud project. When you first create an account, you get a $300 free credit towards your compute/storage costs.
- Make sure that billing is enabled for your project.
- Enable the Vertex AI API.
Google Cloud Permissions¶
Make sure you have been granted the following roles for the GCP project you'll access from this notebook:
Install the Google Cloud Vertex AI Python SDK¶
Install the Google Cloud Vertex AI Python SDK, or upgrade to the latest version if you already have it installed.
!pip install google-cloud-aiplatform --upgrade
# Note -- this may not work in some non-Colab environments. If you get errors
# when running 'import vertexai' below, you'll need to find another way to
# install the latest google-cloud-aiplatform package into your notebook kernel.
# In some kernel setups running "%pip install google-cloud-aiplatform --upgrade"
# in a code cell works if "!pip install ...." doesn't.
Restart runtime¶
You may need to restart your notebook runtime to use the Vertex AI SDK. You can do this by running the cell below, which restarts the current kernel.
You may see the restart reported as a crash, but it is working as intended: you are merely restarting the runtime.
The restart might take a minute or longer. After it's restarted, continue to the next step.
import IPython
app = IPython.Application.instance()
app.kernel.do_shutdown(True)
{'status': 'ok', 'restart': True}
If you're using Colab, as long as the notebook runtime isn't deleted (even if it restarts) you don't need to re-run the previous cell.
If you're running this notebook in your own environment you shouldn't need to run the above pip cell again unless you delete your IPython kernel.
Authenticate¶
If you're using Colab, run the code in the next cell. Follow the popups and authenticate with an account that has access to your Google Cloud project.
If you're running this notebook somewhere besides Colab, make sure your environment has the right Google Cloud access. If that's a new concept to you, consider looking into Application Default Credentials for your local environment and initializing the Google Cloud CLI. More authentication options are discussed here.
# Colab authentication.
import sys
if "google.colab" in sys.modules:
    from google.colab import auth
    auth.authenticate_user()
    print('Authenticated')
Authenticated
Initialize the Google Cloud Vertex AI Python SDK¶
Start here if your Notebook kernel restarts (but isn't deleted), though if it's been a few hours you may need to run the Authentication steps above again.
To initialize the SDK, you need to set your Google Cloud project ID and region.
If you don't know your project ID, try the Google Cloud CLI commands gcloud config list or gcloud projects list. See the support page Locate the project ID for more information.
Set Your Project ID¶
PROJECT_ID = "YOUR_PROJECT_ID_HERE" # @param {type:"string"}
Set the Region¶
You can also change the REGION variable used by Vertex AI. Learn more about Vertex AI regions.
REGION = "us-central1" # @param {type: "string"}
Import the Vertex AI Python SDK¶
import vertexai
from vertexai.preview import extensions
vertexai.init(
    project=PROJECT_ID,
    location=REGION
)
Setup and Test the Code Interpreter Extension¶
Code Interpreter is provided by Google, so you can load it directly.
extension_code_interpreter = extensions.Extension.from_hub("code_interpreter")
extension_code_interpreter
Confirm your Code Interpreter extension is registered:
print("Name:", extension_code_interpreter.gca_resource.name)
print("Display Name:", extension_code_interpreter.gca_resource.display_name)
print("Description:", extension_code_interpreter.gca_resource.description)
Test Code Interpreter¶
To test Code Interpreter, ask it to generate a basic plot from a small dataset.
Note that the printed Code Interpreter response object below is a bit long because of the base64-encoded image file returned by Code Interpreter. Just scroll down a bit.
QUERY = """
Using the data below, construct a bar chart that includes only the height values with different colors for the bars:
tree_heights_prices = {
\"Pine\": {\"height\": 100, \"price\": 100},
\"Oak\": {\"height\": 65, \"price\": 135},
\"Birch\": {\"height\": 45, \"price\": 80},
\"Redwood\": {\"height\": 200, \"price\": 200},
\"Fir\": {\"height\": 180, \"price\": 162},
}
Please include the data in the generated code.
"""
response = extension_code_interpreter.execute(
    operation_id = "generate_and_execute",
    operation_params = {"query": QUERY},
)
print(response)
Now, dig deeper into the returned response object. pprint more clearly shows the generated code:
import pprint
pprint.pprint(response)
You'll notice the response object has an output_files object that contains (base64 encoded) files you'll want to extract.
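A minimal sketch of that extraction, using a hand-built stand-in for the response: the `fake_response` dictionary and the `save_output_files` helper are hypothetical, and the only assumption carried over from the real response is that each entry in `output_files` has a `name` and base64-encoded `contents`.

```python
import base64
import os
import tempfile


def save_output_files(response, target_dir):
    """Decode each base64 entry in response['output_files'] into target_dir."""
    saved_paths = []
    for output_file in response.get("output_files", []):
        path = os.path.join(target_dir, output_file["name"])
        with open(path, "wb") as f:
            f.write(base64.b64decode(output_file["contents"]))
        saved_paths.append(path)
    return saved_paths


# Hypothetical stand-in for a Code Interpreter response.
fake_response = {
    "generated_code": "print('hello')",
    "output_files": [
        {"name": "notes.txt", "contents": base64.b64encode(b"hello").decode()},
    ],
}

# Write into a temporary directory so the demonstration leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    paths = save_output_files(fake_response, tmp_dir)
    with open(paths[0], "rb") as f:
        decoded = f.read()

print(decoded)
```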
In the next section you'll create some helper functions that make it easier to work with Code Interpreter's response object.
Code Interpreter Helper Functions¶
These functions are optional when using Code Interpreter but make it easier to inspect Code Interpreter's output, assemble Code Interpreter requests, and run generated code.
process_response¶
process_response displays the generated code and any output files, shows the output from code execution, surfaces code execution errors, and saves output files.
If the output of process_response looks strange, try making your notebook window wider. This will help keep the HTML layout organized.
To use this functionality, call process_response(response), where response is the Code Interpreter response object.
import base64
import json
import pprint
import pandas
import sys
import IPython
if sys.version_info[0] < 3:
    from StringIO import StringIO
else:
    from io import StringIO
css_styles = """
<style>
.main_summary {
font-weight: bold;
font-size: 14px; color: #4285F4;
background-color:rgba(221, 221, 221, 0.5); padding:8px;}
.main_summary:hover {background-color: rgba(221, 221, 221, 1);}
details {
background-color:#fff;
border: 1px solid #E8EAED;
padding:0px;
margin-bottom:2px; }
details img {width:50%}
details > div {padding:10px; }
div#left > * > div {
overflow:auto;
max-height:400px; }
div#right > pre {
overflow:auto;
max-height:600px;
background-color: ghostwhite;
padding: 10px; }
details details > div { overflow: scroll; max-height:400px}
details details {
background-color:rgba(246, 231, 217, 0.2);
border: 1px solid #FBBC04;}
details details > summary {
padding: 8px;
background-color:rgba(255, 228, 196, 0.6); }
details details > summary:hover { background-color:rgba(255, 228, 196, 0.9); }
div#left {width: 64%; padding:0 1%; }
div#right {
border-left: 1px solid silver;
width: 30%;
float: right;
padding:0 1%; }
body {color: #000; background-color: white; padding:10px 10px 40px 10px; }
#main { border: 1px solid #FBBC04; padding:10px 0; display: flow-root; }
h3 {color: #000; }
code { font-family: monospace; color: #900; padding: 0 2px; font-size: 105%; }
</style>
"""
# Parser to visualise the content of returned files as HTML.
def parse_files_to_html(outputFiles, save_files_locally = True):
    IMAGE_FILE_EXTENSIONS = set(["jpg", "jpeg", "png"])
    file_list = []
    details_tml = """<details><summary>{name}</summary><div>{html_content}</div></details>"""
    if not outputFiles:
        return "No Files generated from the code"
    # Sort output_files so images are displayed before other files such as JSON.
    for output_file in sorted(
        outputFiles,
        key=lambda x: x["name"].split(".")[-1] not in IMAGE_FILE_EXTENSIONS,
    ):
        file_name = output_file.get("name")
        file_contents = base64.b64decode(output_file.get("contents"))
        if save_files_locally:
            # Use a context manager so the file handle is always closed.
            with open(file_name, "wb") as f:
                f.write(file_contents)
        if file_name.split(".")[-1] in IMAGE_FILE_EXTENSIONS:
            # Render Image
            file_html_content = ('<img src="data:image/png;base64, '
                                 f'{output_file.get("contents")}" />')
        elif file_name.endswith(".json"):
            # Pretty print JSON
            json_pp = pprint.pformat(
                json.loads(file_contents.decode()),
                compact=False,
                width=160)
            file_html_content = (f'<span>{json_pp}</span>')
        elif file_name.endswith(".csv"):
            # CSV
            csv_md = pandas.read_csv(
                StringIO(file_contents.decode())).to_markdown(index=False)
            file_html_content = f'<span>{csv_md}</span>'
        elif file_name.endswith(".pkl"):
            # PKL
            file_html_content = '<span>Preview N/A</span>'
        else:
            file_html_content = f"<span>{file_contents.decode()}</span>"
        file_list.append({'name': file_name, "html_content": file_html_content})
    buffer_html = [ details_tml.format(**_file) for _file in file_list ]
    return "".join(buffer_html)
# Processing code interpreter response to html visualization.
def process_response(response: dict, save_files_locally = True) -> None:
    result_template = """
    <details open>
    <summary class='main_summary'>{summary}:</summary>
    <div><pre>{content}</pre></div>
    </details>
    """
    result = ""
    code = response.get('generated_code')
    if 'execution_result' in response and response['execution_result']!="":
        result = result_template.format(
            summary="Executed Code Output",
            content=response.get('execution_result'))
    else:
        result = result_template.format(
            summary="Executed Code Output",
            content="Code does not produce printable output.")
    if response.get('execution_error', None):
        result += result_template.format(
            summary="Generated Code Raised a (Possibly Non-Fatal) Exception",
            content=response.get('execution_error', None))
    # Pass the caller's save_files_locally choice through instead of forcing True.
    result += result_template.format(
        summary="Files Created <u>(Click on filename to view content)</u>",
        content=parse_files_to_html(
            response.get('output_files', []),
            save_files_locally=save_files_locally))
    display(
        IPython.display.HTML(
            ( f"{css_styles}"
              f"""
    <div id='main'>
    <div id="right">
    <h3>Generated Code by Code Interpreter</h3>
    <pre><code>{code}</code></pre>
    </div>
    <div id="left">
    <h3>Code Execution Results</h3>
    {result}
    </div>
    </div>
    """
            )
        )
    )
run_code_interpreter¶
run_code_interpreter eases calling Code Interpreter by encoding files to base64 (a Code Interpreter requirement) and submitting the files alongside the instructions. It also automates retries (5 by default) if the generated code doesn't execute or if Code Interpreter fails due to exceeding Gemini (time-based) quotas. Additionally, run_code_interpreter populates a global CODE_INTERPRETER_WRITTEN_FILES variable to aid with cleaning up files created by Code Interpreter.
To use this functionality, call run_code_interpreter(instructions, filenames, retry_num, retry_wait_time), where instructions is the prompt for Code Interpreter, filenames is a list of local files in the working directory to submit to Code Interpreter, retry_num optionally changes the default number of retries from 5, and retry_wait_time optionally changes the default 15 second wait between retries.
from time import sleep
CODE_INTERPRETER_WRITTEN_FILES = []
def run_code_interpreter(instructions: str,
                         filenames: list[str] = [],
                         retry_num: int = 5,
                         retry_wait_time: int = 15) -> dict:
    global CODE_INTERPRETER_WRITTEN_FILES
    # Code Interpreter requires file contents to be base64 encoded.
    file_arr = [
        {
            "name": filename,
            "contents": base64.b64encode(open(filename, "rb").read()).decode()
        }
        for filename in filenames
    ]
    attempts = 0
    res = {}
    while attempts <= retry_num:
        attempts += 1
        res = extension_code_interpreter.execute(
            operation_id = "generate_and_execute",
            operation_params = {
                "query": instructions,
                "files": file_arr
            },
        )
        # Record written files so they can be cleaned up later.
        CODE_INTERPRETER_WRITTEN_FILES.extend(
            [item['name'] for item in res.get('output_files', [])])
        if not res.get('execution_error', None):
            return res
        elif attempts <= retry_num:
            print(f"The generated code produced an error {res.get('execution_error')}"
                  f" -Automatic retry attempt # {attempts}/{retry_num}")
            # Wait before the next attempt.
            sleep(retry_wait_time)
    # Every attempt failed; return the last response so the error can be inspected.
    return res
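Because `CODE_INTERPRETER_WRITTEN_FILES` is just a list of file names, cleanup can be a short loop. The sketch below is one possible approach, not part of the original helpers: `remove_written_files` is a hypothetical function, and it assumes the recorded names are valid paths relative to the current working directory.

```python
import os
import tempfile


def remove_written_files(filenames):
    """Delete each listed file that exists; return the names actually removed."""
    removed = []
    for filename in set(filenames):  # The same name may be recorded more than once.
        if os.path.isfile(filename):
            os.remove(filename)
            removed.append(filename)
    return removed


# Demonstration with a throwaway file rather than real Code Interpreter output.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo_output.png")
    missing_path = os.path.join(tmp_dir, "never_written.csv")
    with open(demo_path, "wb") as f:
        f.write(b"placeholder")
    removed_files = remove_written_files([demo_path, demo_path, missing_path])
    still_exists = os.path.exists(demo_path)

print(removed_files, still_exists)
```

In this notebook the equivalent call would be `remove_written_files(CODE_INTERPRETER_WRITTEN_FILES)`, followed by resetting the list to empty.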
Using the Helper Functions¶
To demonstrate the helper functions you will write a CSV of data, send the CSV with a prompt to Code Interpreter, examine the response, and run the code locally.
import csv
tree_heights_prices = {
"Pine": {"height": 100, "price": 100},
"Oak": {"height": 65, "price": 135},
"Birch": {"height": 45, "price": 80},
"Redwood": {"height": 200, "price": 200},
"Fir": {"height": 180, "price": 162},
}
with open('tree_data.csv', 'w', newline='') as csvfile:
    fieldnames = ['Tree', 'Height', 'Price']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    for tree, data in tree_heights_prices.items():
        writer.writerow({'Tree': tree, 'Height': data['height'], 'Price': data['price']})
response = run_code_interpreter("Make a bar chart of the heights of the trees.",
['tree_data.csv'])
process_response(response)
Generated Code by Code Interpreter
```python
import pandas as pd
import matplotlib.pyplot as plt
# Load the data from the CSV file
data = pd.read_csv("tree_data.csv")
# Create a bar chart of the heights of the trees
plt.bar(data["Tree"], data["Height"])
# Set the chart title and labels
plt.title("Heights of Trees")
plt.xlabel("Tree")
plt.ylabel("Height (feet)")
# Show the chart
plt.show()
```
Code Execution Results
Executed Code Output:
Code does not produce printable output.
Files Created (Click on filename to view content):
code_execution_image_1_1CEsZrzCL8-S2ukPo6my0Ak.png
Create the Data¶
The following code writes a local CSV file of synthetic data. This is a simple dataset of students containing attributes about sleeping and eating habits along with academic performance. The dataset is fictional and does not represent reality; it is only used to demonstrate Code Interpreter capabilities.
%%writefile students.csv
StudentID,Gender,ExtraActivitiesGroup,EatingHabits,SleepingHabits,Reading,Writing,Maths
1,Male,nan,Healthy,Satisfactory,75,80,78
2,Female,Group B,Mixed,Non-Satisfactory,nan,70,67
3,nan,Group A,Unhealthy,Satisfactory,55,60,58
4,Female,Group C,Healthy,Non-Satisfactory,70,75,73
5,Male,Group B,Mixed,Satisfactory,60,65,63
6,Female,Group A,Unhealthy,Non-Satisfactory,50,55,53
7,Male,Group C,Healthy,Satisfactory,80,85,83
8,Female,Group B,Mixed,Non-Satisfactory,65,70,67
9,Male,Group A,Unhealthy,Satisfactory,55,60,58
10,Male,nan,Mixed,Non-Satisfactory,80,78,85
11,Female,Group B,Unhealthy,Satisfactory,65,68,70
12,Female,Group A,Healthy,Non-Satisfactory,52,57,55
13,nan,Group C,Unhealthy,Satisfactory,78,75,79
14,Female,Group B,Mixed,Non-Satisfactory,63,70,65
15,Male,Group A,Healthy,Satisfactory,82,87,80
16,Male,Group C,Unhealthy,Non-Satisfactory,57,60,54
17,Female,Group A,Mixed,Satisfactory,67,65,63
18,Male,Group B,Unhealthy,Non-Satisfactory,55,62,58
19,nan,Group C,Healthy,Satisfactory,88,85,87
20,Female,Group B,Mixed,Non-Satisfactory,67,75,68
21,Male,Group A,Unhealthy,Satisfactory,53,58,55
22,Female,Group C,Healthy,Non-Satisfactory,80,77,82
23,Male,Group A,Mixed,Satisfactory,60,63,60
24,Female,Group B,Unhealthy,Non-Satisfactory,65,62,60
25,Male,Group C,Healthy,Satisfactory,90,92,88
26,Female,Group B,Mixed,Non-Satisfactory,58,65,60
27,Male,Group A,Unhealthy,Satisfactory,67,60,65
28,Male,Group C,Healthy,Non-Satisfactory,72,78,73
29,Female,Group A,Mixed,Satisfactory,55,62,58
30,Male,Group B,Unhealthy,Non-Satisfactory,78,75,72
31,Female,Group C,Healthy,Satisfactory,85,87,83
32,Female,Group A,Mixed,Non-Satisfactory,70,65,67
33,Male,Group B,Unhealthy,Satisfactory,62,67,65
34,Male,Group C,Healthy,Non-Satisfactory,77,83,75
35,nan,Group A,Mixed,Satisfactory,65,63,60
36,Female,Group B,Unhealthy,Non-Satisfactory,72,78,70
37,Male,Group C,Healthy,Satisfactory,80,87,83
38,Female,Group A,Mixed,Non-Satisfactory,75,70,72
39,Male,Group B,Unhealthy,Satisfactory,65,67,60
40,nan,Group C,Healthy,Non-Satisfactory,82,88,80
41,Female,Group A,Mixed,Satisfactory,77,72,70
42,Male,Group B,Unhealthy,Non-Satisfactory,67,62,63
43,Male,Group C,Healthy,Satisfactory,92,90,88
44,Female,Group A,Mixed,Non-Satisfactory,80,75,77
45,nan,Group B,Unhealthy,Satisfactory,72,75,73
46,Female,Group C,Healthy,Non-Satisfactory,83,80,85
47,Male,Group A,Mixed,Satisfactory,75,72,73
48,Male,Group B,Unhealthy,Non-Satisfactory,60,63,58
49,nan,Group C,Healthy,Satisfactory,90,92,88
50,Female,Group A,Mixed,Non-Satisfactory,85,80,82
51,Male,Group B,Unhealthy,Satisfactory,70,67,65
52,Female,Group C,Healthy,Non-Satisfactory,78,83,77
53,Male,Group B,Mixed,Satisfactory,65,63,62
54,Male,Group A,Unhealthy,Non-Satisfactory,52,57,55
55,nan,Group C,Healthy,Satisfactory,75,78,73
56,Female,Group B,Mixed,Non-Satisfactory,70,77,72
57,Male,Group A,Unhealthy,Satisfactory,62,65,63
58,Female,Group C,Healthy,Non-Satisfactory,88,85,83
59,Male,Group B,Mixed,Satisfactory,78,80,77
60,nan,Group A,Unhealthy,Non-Satisfactory,67,60,65
61,Female,Group C,Healthy,Satisfactory,83,80,82
62,Male,Group B,Mixed,Non-Satisfactory,72,68,70
63,Male,Group A,Unhealthy,Satisfactory,62,57,60
64,Female,Group C,Healthy,Non-Satisfactory,90,87,88
65,Male,Group B,Mixed,Satisfactory,85,82,80
66,nan,Group A,Unhealthy,Non-Satisfactory,55,62,58
67,Female,Group C,Healthy,Satisfactory,77,85,80
68,Male,Group B,Mixed,Non-Satisfactory,65,72,67
69,Male,Group A,Unhealthy,Satisfactory,67,60,68
70,Female,Group C,Healthy,Non-Satisfactory,92,90,85
71,Male,Group B,Mixed,Satisfactory,77,85,82
72,nan,Group A,Unhealthy,Non-Satisfactory,62,55,60
73,Female,Group C,Healthy,Satisfactory,83,87,85
74,Male,Group B,Mixed,Non-Satisfactory,68,72,65
75,Male,Group A,Unhealthy,Satisfactory,53,58,55
76,nan,Group C,Healthy,Non-Satisfactory,88,83,87
77,Female,Group B,Mixed,Satisfactory,72,70,73
78,Male,Group A,Unhealthy,Non-Satisfactory,70,65,67
79,Male,Group C,Healthy,Satisfactory,80,85,80
80,Female,Group B,Mixed,Non-Satisfactory,75,72,75
81,nan,Group A,Unhealthy,Satisfactory,55,60,58
82,Female,Group C,Healthy,Non-Satisfactory,80,77,82
83,Male,Group B,Mixed,Satisfactory,68,70,68
84,Male,Group A,Unhealthy,Non-Satisfactory,62,57,63
85,Female,Group C,Healthy,Satisfactory,90,92,88
86,nan,Group B,Mixed,Non-Satisfactory,67,72,67
87,Female,Group A,Unhealthy,Satisfactory,53,60,58
88,Male,Group C,Healthy,Non-Satisfactory,75,78,73
89,Male,Group B,Mixed,Satisfactory,82,80,83
90,nan,Group A,Unhealthy,Non-Satisfactory,65,62,63
91,Female,Group C,Healthy,Satisfactory,80,83,80
92,Male,Group B,Mixed,Non-Satisfactory,85,80,82
93,Male,Group A,Unhealthy,Satisfactory,62,67,65
94,nan,Group C,Healthy,Non-Satisfactory,90,87,92
95,Female,Group B,Mixed,Satisfactory,77,75,78
96,Female,Group A,Unhealthy,Non-Satisfactory,67,60,68
97,nan,Group C,Healthy,Satisfactory,77,83,78
98,Male,Group B,Mixed,Non-Satisfactory,62,68,65
99,Male,Group A,Unhealthy,Satisfactory,52,57,58
100,Female,Group C,Healthy,Non-Satisfactory,72,75,77
101,Male,Group B,Mixed,Satisfactory,70,67,72
102,nan,Group A,Unhealthy,Non-Satisfactory,67,62,65
103,Female,Group C,Healthy,Satisfactory,83,87,85
104,Male,Group B,Mixed,Non-Satisfactory,80,77,82
105,Male,Group A,Unhealthy,Satisfactory,55,62,53
106,Female,Group C,Healthy,Non-Satisfactory,92,90,88
107,nan,Group B,Mixed,Satisfactory,78,83,78
108,Female,Group A,Unhealthy,Non-Satisfactory,72,65,70
109,Male,Group C,Healthy,Satisfactory,83,80,85
110,Female,Group B,Mixed,Non-Satisfactory,68,72,63
111,Male,Group A,Unhealthy,Satisfactory,60,63,63
112,nan,Group C,Healthy,Non-Satisfactory,72,78,73
113,Female,Group B,Mixed,Satisfactory,80,83,83
114,Male,Group A,Unhealthy,Non-Satisfactory,70,65,67
115,Female,Group C,Healthy,Satisfactory,90,87,92
116,Male,Group B,Mixed,Non-Satisfactory,85,82,80
117,Male,Group A,Unhealthy,Satisfactory,52,57,55
118,Female,Group C,Healthy,Non-Satisfactory,77,85,80
119,nan,Group B,Mixed,Satisfactory,68,70,68
120,Female,Group A,Unhealthy,Non-Satisfactory,53,60,58
121,Male,Group C,Healthy,Satisfactory,75,80,77
122,Female,Group B,Mixed,Non-Satisfactory,67,72,67
123,Male,Group B,Unhealthy,Satisfactory,70,67,72
124,Female,Group A,Mixed,Non-Satisfactory,62,57,60
125,nan,Group C,Healthy,Satisfactory,80,83,80
126,Male,Group B,Mixed,Non-Satisfactory,62,68,60
127,Male,Group A,Unhealthy,Satisfactory,55,60,58
128,Female,Group C,Healthy,Non-Satisfactory,92,90,85
129,Male,Group B,Mixed,Satisfactory,85,82,80
130,Female,Group A,Unhealthy,Non-Satisfactory,75,70,72
131,nan,Group C,Healthy,Satisfactory,77,83,78
132,Male,Group B,Mixed,Non-Satisfactory,80,77,82
133,Male,Group A,Unhealthy,Satisfactory,62,67,60
134,Female,Group C,Healthy,Non-Satisfactory,90,87,92
135,Male,Group B,Mixed,Satisfactory,78,83,78
136,Female,Group A,Unhealthy,Non-Satisfactory,55,62,58
137,Male,Group C,Healthy,Satisfactory,80,83,80
138,Male,Group B,Mixed,Non-Satisfactory,67,70,63
139,nan,Group A,Unhealthy,Satisfactory,65,62,65
140,Female,Group C,Healthy,Non-Satisfactory,88,83,87
141,Female,Group B,Mixed,Satisfactory,70,77,70
142,Male,Group A,Unhealthy,Non-Satisfactory,52,57,55
143,Male,Group C,Healthy,Satisfactory,85,80,82
144,Male,Group B,Mixed,Non-Satisfactory,82,80,83
145,nan,Group A,Unhealthy,Satisfactory,60,63,63
146,Female,Group C,Healthy,Non-Satisfactory,90,87,92
147,Female,Group B,Mixed,Satisfactory,75,72,77
148,Male,Group A,Unhealthy,Non-Satisfactory,57,60,54
149,nan,Group C,Healthy,Satisfactory,80,85,82
150,Female,Group B,Mixed,Non-Satisfactory,80,75,83
151,Male,Group A,Unhealthy,Satisfactory,78,75,79
152,Male,Group C,Healthy,Non-Satisfactory,92,90,88
153,nan,Group B,Mixed,Satisfactory,65,63,62
154,Female,Group A,Unhealthy,Non-Satisfactory,53,58,55
155,Male,Group C,Healthy,Satisfactory,83,87,82
156,Female,Group B,Mixed,Non-Satisfactory,85,80,83
157,Male,Group A,Unhealthy,Satisfactory,70,67,72
158,Male,Group C,Healthy,Non-Satisfactory,90,87,92
159,Female,Group B,Mixed,Satisfactory,68,70,68
160,Female,Group A,Unhealthy,Non-Satisfactory,67,60,70
161,nan,Group C,Healthy,Satisfactory,90,92,88
162,Male,Group B,Mixed,Non-Satisfactory,85,82,80
163,Male,Group A,Unhealthy,Satisfactory,65,62,65
164,Female,Group C,Healthy,Non-Satisfactory,83,87,85
165,nan,Group B,Mixed,Satisfactory,78,83,78
166,Female,Group A,Unhealthy,Non-Satisfactory,55,62,58
167,Male,Group C,Healthy,Satisfactory,80,83,80
168,Female,Group B,Mixed,Non-Satisfactory,67,70,63
169,Male,Group A,Unhealthy,Satisfactory,52,57,55
170,nan,Group C,Healthy,Non-Satisfactory,82,88,80
171,Male,Group B,Mixed,Satisfactory,80,83,83
172,Female,Group A,Unhealthy,Non-Satisfactory,75,70,72
173,Male,Group B,Healthy,Satisfactory,90,87,88
174,Male,Group B,Mixed,Non-Satisfactory,62,68,65
175,nan,Group A,Unhealthy,Satisfactory,62,57,63
176,Female,Group C,Healthy,Non-Satisfactory,77,85,80
177,Male,Group B,Mixed,Satisfactory,68,70,68
178,Male,Group A,Unhealthy,Non-Satisfactory,53,60,58
179,Female,Group C,Healthy,Satisfactory,90,87,92
180,Male,Group B,Mixed,Non-Satisfactory,70,67,75
181,nan,Group A,Unhealthy,Satisfactory,65,62,65
182,Female,Group C,Healthy,Non-Satisfactory,83,87,85
183,nan,Group A,Mixed,Satisfactory,75,78,77
184,Female,Group A,Unhealthy,Non-Satisfactory,55,62,58
185,Male,Group C,Healthy,Satisfactory,80,83,80
186,Male,Group A,Mixed,Non-Satisfactory,85,82,80
187,Male,Group A,Unhealthy,Satisfactory,78,75,79
188,nan,Group C,Healthy,Non-Satisfactory,80,85,83
189,Female,Group B,Mixed,Satisfactory,70,77,70
190,Male,Group A,Unhealthy,Non-Satisfactory,57,60,54
191,nan,Group C,Healthy,Satisfactory,92,90,85
192,Female,Group B,Mixed,Non-Satisfactory,80,75,83
193,Male,Group A,Unhealthy,Satisfactory,53,58,55
194,nan,Group C,Healthy,Non-Satisfactory,75,78,77
195,Female,Group B,Mixed,Satisfactory,65,63,62
196,Female,Group A,Unhealthy,Non-Satisfactory,67,60,70
197,Male,Group A,Healthy,Satisfactory,85,80,87
198,Male,Group B,Mixed,Non-Satisfactory,85,82,80
199,Male,Group A,Unhealthy,Satisfactory,72,65,70
200,nan,Group C,Healthy,Non-Satisfactory,90,87,92
201,Female,Group B,Mixed,Satisfactory,68,70,68
202,Female,Group A,Unhealthy,Non-Satisfactory,62,57,63
203,nan,Group A,Healthy,Satisfactory,82,88,80
204,Female,Group B,Mixed,Non-Satisfactory,80,77,82
205,Male,Group A,Unhealthy,Satisfactory,67,60,68
206,Male,Group A,Healthy,Non-Satisfactory,90,87,92
207,Female,Group B,Mixed,Satisfactory,78,83,78
208,Female,Group A,Unhealthy,Non-Satisfactory,72,65,70
209,nan,Group C,Healthy,Satisfactory,77,83,78
210,Male,Group B,Mixed,Non-Satisfactory,62,68,65
211,Male,Group A,Unhealthy,Satisfactory,53,58,55
212,Male,Group A,Healthy,Non-Satisfactory,92,90,85
213,Female,Group B,Mixed,Satisfactory,68,70,68
214,Female,Group A,Unhealthy,Non-Satisfactory,75,70,72
215,nan,Group B,Healthy,Satisfactory,77,83,78
216,Female,Group B,Mixed,Non-Satisfactory,67,70,63
217,Male,Group A,Unhealthy,Satisfactory,52,57,55
218,nan,Group C,Healthy,Non-Satisfactory,90,87,92
219,Female,Group B,Mixed,Satisfactory,85,82,80
220,Female,Group A,Unhealthy,Non-Satisfactory,55,62,58
221,Male,Group A,Healthy,Satisfactory,80,83,80
222,Male,Group B,Mixed,Non-Satisfactory,60,63,63
223,Male,Group A,Unhealthy,Satisfactory,78,75,79
224,Female,Group C,Healthy,Non-Satisfactory,75,78,77
225,nan,Group B,Mixed,Satisfactory,70,67,72
226,Male,Group A,Unhealthy,Non-Satisfactory,70,65,67
227,nan,Group C,Healthy,Satisfactory,90,92,88
228,Female,Group B,Mixed,Non-Satisfactory,85,82,80
229,Male,Group A,Unhealthy,Satisfactory,65,62,65
230,Female,Group C,Healthy,Non-Satisfactory,83,87,85
231,nan,Group B,Mixed,Satisfactory,75,78,77
232,Female,Group A,Unhealthy,Non-Satisfactory,55,62,58
233,Male,Group C,Healthy,Satisfactory,80,83,80
234,Male,Group B,Mixed,Non-Satisfactory,85,82,80
235,Male,Group A,Unhealthy,Satisfactory,78,75,79
236,Female,Group C,Healthy,Non-Satisfactory,83,87,85
237,nan,Group A,Mixed,Satisfactory,80,83,83
238,Female,Group B,Mixed,Non-Satisfactory,75,70,77
239,Male,Group A,Unhealthy,Non-Satisfactory,62,57,63
240,nan,Group C,Healthy,Non-Satisfactory,82,88,80
241,Female,Group B,Mixed,Satisfactory,80,77,82
242,Male,Group A,Unhealthy,Satisfactory,60,63,63
243,Female,Group C,Healthy,Non-Satisfactory,90,87,92
244,Male,Group B,Mixed,Non-Satisfactory,82,80,83
245,nan,Group C,Healthy,Satisfactory,77,83,78
246,Male,Group B,Mixed,Non-Satisfactory,72,68,70
247,Female,Group A,Unhealthy,Satisfactory,65,62,65
248,Male,Group C,Healthy,Non-Satisfactory,80,85,83
249,Female,Group A,Mixed,Non-Satisfactory,70,65,67
250,nan,Group C,Healthy,Non-Satisfactory,83,80,85
251,Female,Group B,Mixed,Satisfactory,68,70,68
252,Female,Group A,Unhealthy,Non-Satisfactory,62,57,63
253,Male,Group C,Healthy,Satisfactory,92,90,88
254,Female,Group B,Mixed,Non-Satisfactory,80,75,83
255,nan,Group C,Healthy,Satisfactory,90,92,88
256,Female,Group B,Mixed,Satisfactory,70,77,70
257,Male,Group A,Unhealthy,Non-Satisfactory,52,57,55
258,nan,Group C,Healthy,Non-Satisfactory,75,78,77
259,Female,Group B,Mixed,Non-Satisfactory,80,77,82
260,Male,Group A,Unhealthy,Satisfactory,55,62,58
261,nan,Group C,Healthy,Satisfactory,82,88,80
262,Female,Group B,Mixed,Non-Satisfactory,72,65,70
263,Male,Group A,Unhealthy,Non-Satisfactory,65,62,65
264,Female,Group C,Healthy,Non-Satisfactory,90,87,92
265,Male,Group B,Mixed,Satisfactory,77,85,82
266,Female,Group A,Unhealthy,Non-Satisfactory,55,62,58
267,nan,Group C,Healthy,Satisfactory,83,80,85
268,Female,Group B,Mixed,Non-Satisfactory,85,82,80
269,Male,Group A,Unhealthy,Satisfactory,62,57,63
270,Female,Group C,Healthy,Non-Satisfactory,77,85,80
271,nan,Group B,Mixed,Satisfactory,70,67,72
272,Male,Group A,Unhealthy,Non-Satisfactory,53,60,58
273,Male,Group C,Healthy,Satisfactory,75,80,77
274,Female,Group B,Mixed,Non-Satisfactory,80,75,83
275,Male,Group A,Unhealthy,Satisfactory,52,57,55
276,nan,Group C,Healthy,Non-Satisfactory,92,90,85
277,Female,Group B,Mixed,Satisfactory,68,72,65
278,Male,Group A,Unhealthy,Non-Satisfactory,70,65,67
279,nan,Group C,Healthy,Satisfactory,80,83,80
280,Female,Group B,Mixed,Non-Satisfactory,75,72,75
281,Male,Group A,Unhealthy,Satisfactory,57,60,54
282,Female,Group C,Healthy,Non-Satisfactory,78,83,77
283,nan,Group B,Mixed,Satisfactory,70,67,72
284,Female,Group A,Unhealthy,Non-Satisfactory,62,57,63
285,Male,Group C,Healthy,Satisfactory,90,87,88
286,Male,Group B,Mixed,Non-Satisfactory,82,80,83
287,nan,Group C,Healthy,Satisfactory,77,83,78
288,Female,Group B,Mixed,Non-Satisfactory,72,70,73
289,Male,Group A,Unhealthy,Satisfactory,65,62,65
290,Female,Group C,Healthy,Non-Satisfactory,90,87,92
291,nan,Group B,Mixed,Satisfactory,70,63,60
292,Female,Group A,Unhealthy,Non-Satisfactory,55,62,58
293,Male,Group C,Healthy,Satisfactory,75,80,77
294,Male,Group B,Mixed,Non-Satisfactory,85,82,80
295,nan,Group A,Mixed,Satisfactory,80,75,77
296,Female,Group C,Healthy,Non-Satisfactory,77,83,78
297,Female,Group B,Mixed,Non-Satisfactory,67,72,67
298,Male,Group A,Unhealthy,Satisfactory,67,60,68
299,Male,Group B,Healthy,Satisfactory,88,85,87
300,Female,Group A,Mixed,Non-Satisfactory,78,75,79
301,Male,Group C,Unhealthy,Satisfactory,75,78,72
302,Female,Group B,Mixed,Non-Satisfactory,72,65,70
303,Male,Group A,Healthy,Non-Satisfactory,85,82,80
304,Female,Group C,Healthy,Non-Satisfactory,77,83,78
305,Male,Group A,Mixed,Non-Satisfactory,72,65,70
306,Female,Group B,Unhealthy,Satisfactory,72,78,70
307,nan,Group A,Healthy,Satisfactory,82,88,80
308,Female,Group C,Mixed,Non-Satisfactory,72,75,77
309,Male,Group B,Mixed,Non-Satisfactory,62,68,65
310,Female,Group A,Unhealthy,Satisfactory,53,60,58
311,nan,Group C,Healthy,Satisfactory,90,92,88
312,Female,Group B,Mixed,Non-Satisfactory,80,77,82
313,Male,Group A,Unhealthy,Non-Satisfactory,67,60,68
314,nan,Group C,Healthy,Satisfactory,77,83,78
315,Female,Group B,Mixed,Satisfactory,75,72,75
316,Male,Group A,Unhealthy,Non-Satisfactory,52,57,55
317,Female,Group C,Healthy,Non-Satisfactory,90,87,92
318,Male,Group B,Mixed,Non-Satisfactory,85,82,80
Writing students.csv
Step 1: Analyze the Dataset¶
Send a prompt with instructions that use data from the students.csv file attached to the Code Interpreter call.
Understanding the Dataset Using Plots¶
In this step you are going to use Gemini to generate plot ideas. Provide the first 30 lines of the CSV (the header plus 29 data rows) and prompt Gemini in natural language to propose plots. Then you will use Code Interpreter to execute those plot ideas.
from vertexai.preview.generative_models import (
    GenerativeModel,
    Part,
    HarmCategory,
    HarmBlockThreshold,
)
from pathlib import Path
model = GenerativeModel("gemini-1.0-pro-001")
csv_content = Path("students.csv").read_text().split('\n')
sample = '\n'.join(csv_content[:30])
prompt = f"""
Data sample:
{sample}
You are a data scientist and you are using Code Interpreter to run data
operations and generate plots/charts. Code interpreter generates code from
natural language instructions.
Based on the data, create about 8 prompt instructions in natural language for
Code Interpreter to use to create code that generates plots that help you
understand the data.
Do not use StudentID as it is unique identifier.
There is no time attribute in the dataset so do not suggest plotting something over time.
You can use boxplots, pie charts, scatter charts, and bar charts."""
ideas = model.generate_content(
    prompt,
    generation_config={
        "max_output_tokens": 2048,
        "temperature": 0.1,
        "top_p": 1,
    },
    safety_settings={
        HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
        HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
        HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
        HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    },
    stream=False,
)
print(f"Gemini responded with the following suggestions: \n\n{ideas.text}")
Gemini responded with the following suggestions:

1. Create a boxplot to show the distribution of Reading scores for each Gender.
2. Create a pie chart to show the proportion of students in each ExtraActivitiesGroup.
3. Create a scatter chart to show the relationship between EatingHabits and SleepingHabits.
4. Create a bar chart to show the average Maths score for each ExtraActivitiesGroup.
5. Create a boxplot to show the distribution of Writing scores for each EatingHabits category.
6. Create a scatter chart to show the relationship between Reading and Writing scores.
7. Create a bar chart to show the average Maths score for each SleepingHabits category.
8. Create a pie chart to show the proportion of students with Satisfactory SleepingHabits for each Gender.
Thank you Gemini! Next, ask Code Interpreter to plot these ideas.
Note: Code Interpreter might fail to plot some of the suggestions because they might be poorly defined. In the instructions below you are asking Code Interpreter to iterate over those ideas and, if one fails, to continue with the next plot idea instead of stopping. In other words, you are asking Code Interpreter to plot as many of the ideas as possible.
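If you would rather isolate failures on the client side, one option is to split Gemini's numbered suggestions into separate instructions before sending them. The sketch below is only an illustration: `split_numbered_ideas` and `sample_ideas` are hypothetical names, and the regular expression assumes the suggestions follow the "1. ... 2. ..." numbering shown above (a sentence that ends in a number followed by a period would be split incorrectly).

```python
import re


def split_numbered_ideas(text: str) -> list[str]:
    """Split text such as '1. foo 2. bar' into ['foo', 'bar']."""
    # Split at "N. " markers found at the start of the text or after whitespace.
    parts = re.split(r"(?:^|\s)\d+\.\s+", text.strip())
    return [part.strip() for part in parts if part.strip()]


# Hypothetical stand-in for ideas.text, shortened to three suggestions.
sample_ideas = (
    "1. Create a boxplot to show the distribution of Reading scores for each Gender.\n"
    "2. Create a pie chart to show the proportion of students in each ExtraActivitiesGroup.\n"
    "3. Create a scatter chart to show the relationship between Reading and Writing scores."
)

ideas_list = split_numbered_ideas(sample_ideas)
for idea in ideas_list:
    print(idea)
```

Each entry of `ideas_list` could then be passed on its own as the `instructions` argument of `run_code_interpreter`, at the cost of one call per plot.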
response = run_code_interpreter(instructions=f"""
Create the following plots.
Make sure each plot is in its own file and do not overlay multiple plots, so for every plot reset the process.
Make sure plots have visible numbers or percentages when applicable and labels.
If any of the following produces an exception make sure you catch it and continue to the next item in the list:
{ideas.text}
""", filenames= ['students.csv'])
process_response(response)
The generated code produced an error pie requires either y column or 'subplots=True' -Automatic retry attempt # 1/5
Generated Code by Code Interpreter
```python
import pandas as pd
import matplotlib.pyplot as plt
# Read the data from the CSV file
data = pd.read_csv("students.csv")
# 1. Boxplot of Reading scores for each Gender
try:
    plt.figure()
    data.boxplot(column="Reading", by="Gender")
    plt.xlabel("Gender")
    plt.ylabel("Reading Score")
    plt.title("Distribution of Reading Scores by Gender")
    plt.savefig("boxplot_reading_gender.png")
except Exception as e:
    print(f"Error creating boxplot of Reading scores for each Gender: {e}")
# 2. Pie chart of ExtraActivitiesGroup proportions
try:
    plt.figure()
    data["ExtraActivitiesGroup"].value_counts().plot(kind="pie", autopct="%1.1f%%")
    plt.title("Proportion of Students in Each ExtraActivitiesGroup")
    plt.savefig("piechart_extraactivitiesgroup.png")
except Exception as e:
    print(f"Error creating pie chart of ExtraActivitiesGroup proportions: {e}")
# 3. Scatter plot of EatingHabits and SleepingHabits
try:
    plt.figure()
    plt.scatter(data["EatingHabits"], data["SleepingHabits"])
    plt.xlabel("Eating Habits")
    plt.ylabel("Sleeping Habits")
    plt.title("Relationship between Eating Habits and Sleeping Habits")
    plt.savefig("scatterplot_eatinghabits_sleepinghabits.png")
except Exception as e:
    print(f"Error creating scatter plot of EatingHabits and SleepingHabits: {e}")
# 4. Bar chart of average Maths score for each ExtraActivitiesGroup
try:
    plt.figure()
    data.groupby("ExtraActivitiesGroup")["Maths"].mean().plot(kind="bar")
    plt.xlabel("ExtraActivitiesGroup")
    plt.ylabel("Average Maths Score")
    plt.title("Average Maths Score for Each ExtraActivitiesGroup")
    plt.savefig("barchart_maths_extraactivitiesgroup.png")
except Exception as e:
    print(f"Error creating bar chart of average Maths score for each ExtraActivitiesGroup: {e}")
# 5. Boxplot of Writing scores for each EatingHabits category
try:
    plt.figure()
    data.boxplot(column="Writing", by="EatingHabits")
    plt.xlabel("Eating Habits")
    plt.ylabel("Writing Score")
    plt.title("Distribution of Writing Scores by Eating Habits")
    plt.savefig("boxplot_writing_eatinghabits.png")
except Exception as e:
    print(f"Error creating boxplot of Writing scores for each EatingHabits category: {e}")
# 6. Scatter plot of Reading and Writing scores
try:
    plt.figure()
    plt.scatter(data["Reading"], data["Writing"])
    plt.xlabel("Reading Score")
    plt.ylabel("Writing Score")
    plt.title("Relationship between Reading and Writing Scores")
    plt.savefig("scatterplot_reading_writing.png")
except Exception as e:
    print(f"Error creating scatter plot of Reading and Writing scores: {e}")
# 7. Bar chart of average Maths score for each SleepingHabits category
try:
    plt.figure()
    data.groupby("SleepingHabits")["Maths"].mean().plot(kind="bar")
    plt.xlabel("Sleeping Habits")
    plt.ylabel("Average Maths Score")
    plt.title("Average Maths Score for Each SleepingHabits Category")
    plt.savefig("barchart_maths_sleepinghabits.png")
except Exception as e:
    print(f"Error creating bar chart of average Maths score for each SleepingHabits category: {e}")
# 8. Pie chart of proportion of students with Satisfactory SleepingHabits for each Gender
try:
    plt.figure()
    data.groupby(["Gender", "SleepingHabits"])["SleepingHabits"].count().unstack().loc["Satisfactory"].plot(kind="pie", autopct="%1.1f%%")
    plt.title("Proportion of Students with Satisfactory SleepingHabits for Each Gender")
    plt.savefig("piechart_satisfactorysleepinghabits_gender.png")
except Exception as e:
    print(f"Error creating pie chart of proportion of students with Satisfactory SleepingHabits for each Gender: {e}")
```
Code Execution Results
Executed Code Output:
Error creating pie chart of proportion of students with Satisfactory SleepingHabits for each Gender: 'Satisfactory'
Files Created (Click on filename to view content):
code_execution_image_10_CCIsZva6Hs-S2ukPo6my0Ak.png
code_execution_image_9_CCIsZva6Hs-S2ukPo6my0Ak.png
code_execution_image_8_CCIsZva6Hs-S2ukPo6my0Ak.png
code_execution_image_7_CCIsZva6Hs-S2ukPo6my0Ak.png
code_execution_image_6_CCIsZva6Hs-S2ukPo6my0Ak.png
code_execution_image_5_CCIsZva6Hs-S2ukPo6my0Ak.png
code_execution_image_4_CCIsZva6Hs-S2ukPo6my0Ak.png
code_execution_image_3_CCIsZva6Hs-S2ukPo6my0Ak.png
code_execution_image_2_CCIsZva6Hs-S2ukPo6my0Ak.png
code_execution_image_1_CCIsZva6Hs-S2ukPo6my0Ak.png
barchart_maths_sleepinghabits.png
scatterplot_reading_writing.png
boxplot_writing_eatinghabits.png
barchart_maths_extraactivitiesgroup.png
scatterplot_eatinghabits_sleepinghabits.png
piechart_extraactivitiesgroup.png
boxplot_reading_gender.png
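The `'Satisfactory'` error reported for the eighth plot looks like a pandas `KeyError`. After `unstack()`, the rows are Gender values and the columns are SleepingHabits values, so `.loc["Satisfactory"]` searches the row labels for a gender called "Satisfactory". The sketch below uses a tiny made-up frame (not the real dataset) to show the column selection that the plot seems to need. Treat it as one plausible fix, not as what Code Interpreter would generate.

```python
import pandas as pd

# Tiny made-up sample with the same two columns as students.csv.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male", "Female"],
    "SleepingHabits": ["Satisfactory", "Satisfactory", "Non-Satisfactory",
                       "Non-Satisfactory", "Satisfactory"],
})

# Rows are Gender values, columns are SleepingHabits values.
counts = df.groupby(["Gender", "SleepingHabits"]).size().unstack(fill_value=0)

# "Satisfactory" is a column label, so select it as a column
# instead of using .loc, which looks at the row labels.
satisfactory_by_gender = counts["Satisfactory"]
print(satisfactory_by_gender)

# satisfactory_by_gender.plot(kind="pie", autopct="%1.1f%%") would then draw the chart.
```

Note that `groupby` drops rows whose key is missing by default, so students without a Gender value would not appear in such a chart.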
You may notice some generated errors, and some plots that look strange or are entirely blank. Maybe there are some issues with the data? Check whether the data has any missing values.
response = run_code_interpreter(instructions="Are there any missing values in my data? show results in a nice table",
filenames= ['students.csv'])
process_response(response)
Generated Code by Code Interpreter
```python
import pandas as pd
# Load the data from the CSV file
data = pd.read_csv("students.csv")
# Check for missing values
missing_values_count = data.isnull().sum()
# Create a DataFrame to display the results
missing_values_df = pd.DataFrame({"Column": missing_values_count.index, "Missing Values": missing_values_count.values})
# Print the DataFrame
print(missing_values_df.to_string())
```
Code Execution Results
Executed Code Output:
                 Column  Missing Values
0             StudentID               0
1                Gender              62
2  ExtraActivitiesGroup               2
3          EatingHabits               0
4        SleepingHabits               0
5               Reading               1
6               Writing               0
7                 Maths               0
Files Created (Click on filename to view content):
No Files generated from the code
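The Gender column in students.csv contains the literal text `nan`. `pandas.read_csv` treats strings such as `nan`, `NaN`, and `NA`, as well as empty fields, as missing values by default, which is why those rows are counted above. A minimal sketch on a made-up four-row sample (the `csv_text` content and the `Missing %` column are illustrative additions, not part of the notebook):

```python
import io

import pandas as pd

# Made-up sample: two literal "nan" genders and one empty Reading field.
csv_text = """StudentID,Gender,Reading
1,Male,75
2,nan,
3,Female,55
4,nan,70
"""

df = pd.read_csv(io.StringIO(csv_text))

# Count missing values per column and express them as a share of all rows.
missing = df.isnull().sum()
summary = pd.DataFrame({
    "Missing Values": missing,
    "Missing %": (missing / len(df) * 100).round(1),
})
print(summary)
```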
You can also use Code Interpreter to generate a statistics report.
response = run_code_interpreter("Generate a detailed statistics report from the data.",
filenames= ['students.csv'])
process_response(response)
The generated code produced an error agg function failed [how->mean,dtype->object] -Automatic retry attempt # 1/5
Generated Code by Code Interpreter
```python
import pandas as pd
# Read the data from the CSV file
data = pd.read_csv("students.csv")
# Generate descriptive statistics for each numerical column
numerical_columns = ["Reading", "Writing", "Maths"]
descriptive_stats = data[numerical_columns].describe()
# Print the descriptive statistics
print("Descriptive Statistics:")
print(descriptive_stats)
# Calculate the percentage of students in each gender category
gender_counts = data["Gender"].value_counts(normalize=True) * 100
# Print the percentage of students in each gender category
print("\nPercentage of Students in Each Gender Category:")
print(gender_counts)
# Calculate the percentage of students in each ExtraActivitiesGroup category
extra_activities_group_counts = data["ExtraActivitiesGroup"].value_counts(normalize=True) * 100
# Print the percentage of students in each ExtraActivitiesGroup category
print("\nPercentage of Students in Each ExtraActivitiesGroup Category:")
print(extra_activities_group_counts)
# Calculate the percentage of students in each EatingHabits category
eating_habits_counts = data["EatingHabits"].value_counts(normalize=True) * 100
# Print the percentage of students in each EatingHabits category
print("\nPercentage of Students in Each EatingHabits Category:")
print(eating_habits_counts)
# Calculate the percentage of students in each SleepingHabits category
sleeping_habits_counts = data["SleepingHabits"].value_counts(normalize=True) * 100
# Print the percentage of students in each SleepingHabits category
print("\nPercentage of Students in Each SleepingHabits Category:")
print(sleeping_habits_counts)
# Calculate the correlation between numerical columns
correlation_matrix = data[numerical_columns].corr()
# Print the correlation matrix
print("\nCorrelation Matrix:")
print(correlation_matrix)
# Calculate the mean of each numerical column grouped by gender
gender_grouped_means = data.groupby("Gender")[numerical_columns].mean()
# Print the mean of each numerical column grouped by gender
print("\nMean of Numerical Columns Grouped by Gender:")
print(gender_grouped_means)
# Calculate the mean of each numerical column grouped by ExtraActivitiesGroup
extra_activities_group_grouped_means = data.groupby("ExtraActivitiesGroup")[numerical_columns].mean()
# Print the mean of each numerical column grouped by ExtraActivitiesGroup
print("\nMean of Numerical Columns Grouped by ExtraActivitiesGroup:")
print(extra_activities_group_grouped_means)
# Calculate the mean of each numerical column grouped by EatingHabits
eating_habits_grouped_means = data.groupby("EatingHabits")[numerical_columns].mean()
# Print the mean of each numerical column grouped by EatingHabits
print("\nMean of Numerical Columns Grouped by EatingHabits:")
print(eating_habits_grouped_means)
# Calculate the mean of each numerical column grouped by SleepingHabits
sleeping_habits_grouped_means = data.groupby("SleepingHabits")[numerical_columns].mean()
# Print the mean of each numerical column grouped by SleepingHabits
print("\nMean of Numerical Columns Grouped by SleepingHabits:")
print(sleeping_habits_grouped_means)
```
Code Execution Results
Executed Code Output:
Descriptive Statistics:
          Reading     Writing       Maths
count  317.000000  318.000000  318.000000
mean    73.022082   73.698113   73.088050
std     11.154105   10.507474   10.488921
min     50.000000   55.000000   53.000000
25%     65.000000   63.000000   65.000000
50%     75.000000   75.000000   73.000000
75%     80.000000   83.000000   82.000000
max     92.000000   92.000000   92.000000

Percentage of Students in Each Gender Category:
Gender
Male      52.34375
Female    47.65625
Name: proportion, dtype: float64

Percentage of Students in Each ExtraActivitiesGroup Category:
ExtraActivitiesGroup
Group A    35.443038
Group B    33.860759
Group C    30.696203
Name: proportion, dtype: float64

Percentage of Students in Each EatingHabits Category:
EatingHabits
Mixed        34.905660
Healthy      33.333333
Unhealthy    31.761006
Name: proportion, dtype: float64

Percentage of Students in Each SleepingHabits Category:
SleepingHabits
Non-Satisfactory    51.572327
Satisfactory        48.427673
Name: proportion, dtype: float64

Correlation Matrix:
          Reading   Writing     Maths
Reading  1.000000  0.912343  0.967204
Writing  0.912343  1.000000  0.914739
Maths    0.967204  0.914739  1.000000

Mean of Numerical Columns Grouped by Gender:
          Reading    Writing      Maths
Gender
Female  73.661157  74.057377  73.950820
Male    70.910448  71.552239  70.902985

Mean of Numerical Columns Grouped by ExtraActivitiesGroup:
                        Reading    Writing      Maths
ExtraActivitiesGroup
Group A               64.705357  64.607143  65.107143
Group B               73.103774  73.570093  72.700935
Group C               82.443299  84.226804  82.556701

Mean of Numerical Columns Grouped by EatingHabits:
                Reading    Writing      Maths
EatingHabits
Healthy       82.783019  84.518868  82.792453
Mixed         73.490909  73.387387  73.036036
Unhealthy     62.267327  62.683168  62.960396

Mean of Numerical Columns Grouped by SleepingHabits:
                    Reading    Writing      Maths
SleepingHabits
Non-Satisfactory  73.177914  73.170732  73.103659
Satisfactory      72.857143  74.259740  73.071429
Files Created (Click on filename to view content):
No Files generated from the code
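The retry message `agg function failed [how->mean,dtype->object]` is what recent pandas versions raise when `mean` is applied to text columns. The failed first attempt is not shown, so this cause is an assumption, but the retried code avoids the problem by selecting the numeric columns before aggregating. The sketch below shows that approach, plus the `numeric_only=True` alternative, on made-up data:

```python
import pandas as pd

# Made-up sample mixing text and numeric columns.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male"],
    "EatingHabits": ["Healthy", "Mixed", "Healthy", "Unhealthy"],
    "Reading": [75, 65, 85, 55],
    "Maths": [78, 67, 83, 58],
})

# Option 1: pick the numeric columns explicitly before aggregating.
numerical_columns = ["Reading", "Maths"]
means_explicit = df.groupby("Gender")[numerical_columns].mean()

# Option 2: let pandas skip the non-numeric columns.
means_numeric_only = df.groupby("Gender").mean(numeric_only=True)

print(means_explicit)
```

Selecting the columns explicitly makes the report's contents predictable; `numeric_only=True` is handier when the column list is not known in advance.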
Plot a correlation matrix.
response = run_code_interpreter("""
Plot a correlation matrix of the Maths, Reading, and Writing fields.
First set the seaborn font scale to 0.5.
Make width and height to 4 using figsize.
Use Blue base gradient for coloring where dark blue means high correlation.""",
filenames= ['students.csv'])
process_response(response)
Generated Code by Code Interpreter
```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Read data from CSV file
data = pd.read_csv("students.csv")
# Select relevant columns
data = data[["Maths", "Reading", "Writing"]]
# Set seaborn font scale
sns.set(font_scale=0.5)
# Create correlation matrix
corr = data.corr()
# Plot correlation matrix with blue base gradient
plt.figure(figsize=(4, 4))
sns.heatmap(corr, annot=True, cmap="Blues")
# Set title and labels
plt.title("Correlation Matrix of Maths, Reading, and Writing")
plt.xlabel("Features")
plt.ylabel("Features")
# Display plot
plt.show()
```
Code Execution Results
Executed Code Output:
Code does not produce printable output.
Files Created (Click on filename to view content):
code_execution_image_1_SCIsZu_INM-S2ukPo6my0Ak.png
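`DataFrame.corr()` computes the Pearson correlation coefficient by default, so each cell of the heatmap is a value between -1 and 1. To make that number less of a black box, here is a plain-Python sketch of the calculation. The `pearson` helper and the short score lists are illustrative, not part of the notebook.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Illustrative scores: each Maths value is the Reading value plus 3,
# so the two lists are perfectly linearly related.
reading = [75, 55, 70, 60, 50]
maths = [78, 58, 73, 63, 53]

r = pearson(reading, maths)
print(f"Pearson r = {r:.3f}")
```

A coefficient close to 1, like the 0.97 reported between Reading and Maths above, means the two scores rise and fall together almost linearly.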
Step 2: Clean the Dataset¶
In this step you will fix some issues identified in the analysis above.
Fix Missing Values¶
Fix the missing values issue in the dataset and produce a new file students_clean.csv.
You'll see in the example below that Code Interpreter is instructed to ignore FutureWarnings. This is because Code Interpreter favors pandas for data transformations, and pandas emits many non-fatal warnings. The run_code_interpreter method retries code that throws errors, but because these pandas warnings are non-fatal, there is no reason to retry code that only produces warnings.
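As a minimal sketch of what that instruction asks for, the snippet below shows `warnings.simplefilter` dropping a `FutureWarning` while other warning categories still come through. `legacy_call` is a hypothetical stand-in for a library call that emits a non-fatal warning.

```python
import warnings


def legacy_call():
    """Stand-in for a library call that emits a non-fatal FutureWarning."""
    warnings.warn("this behavior will change", FutureWarning)
    return "ok"


with warnings.catch_warnings(record=True) as caught:
    # Start from a known state, then ignore FutureWarning only.
    warnings.simplefilter("always")
    warnings.simplefilter(action="ignore", category=FutureWarning)

    result = legacy_call()                       # the FutureWarning is dropped
    warnings.warn("still visible", UserWarning)  # other categories pass through

print(result, [w.category.__name__ for w in caught])
```

Filtering by category keeps the suppression narrow, so unrelated warnings remain visible.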
instr = """
Use the warnings library to suppress all category=FutureWarning.
Replace Gender missing values with Unknown.
Replace missing ExtraActivitiesGroup values with Group X.
Replace missing Reading, Writing, or Maths values with the mean value of that column.
Write the results in students_clean.csv.
"""
response = run_code_interpreter(instructions=instr, filenames= ['students.csv'])
process_response(response)
Generated Code by Code Interpreter
```python
import pandas as pd
import warnings
# Suppress FutureWarnings
warnings.simplefilter(action="ignore", category=FutureWarning)
# Read data from CSV file
df = pd.read_csv("students.csv")
# Replace missing values in Gender column with "Unknown"
df["Gender"].fillna("Unknown", inplace=True)
# Replace missing values in ExtraActivitiesGroup column with "Group X"
df["ExtraActivitiesGroup"].fillna("Group X", inplace=True)
# Calculate mean values for Reading, Writing, and Maths columns
mean_reading = df["Reading"].mean()
mean_writing = df["Writing"].mean()
mean_maths = df["Maths"].mean()
# Replace missing values in Reading, Writing, and Maths columns with the mean values
df["Reading"].fillna(mean_reading, inplace=True)
df["Writing"].fillna(mean_writing, inplace=True)
df["Maths"].fillna(mean_maths, inplace=True)
# Write the cleaned data to a new CSV file
df.to_csv("students_clean.csv", index=False)
```
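The generated code calls `fillna(..., inplace=True)` on a selected column, a pattern that recent pandas versions warn about and that is likely one source of the FutureWarnings being suppressed (this is an assumption; the warnings themselves are not shown). A sketch of the same cleaning rules without in-place calls, on a made-up three-row sample:

```python
import io

import pandas as pd

# Made-up sample with one missing value per rule.
csv_text = """StudentID,Gender,ExtraActivitiesGroup,Reading,Writing,Maths
1,Male,,75,80,78
2,Female,Group B,,70,67
3,nan,Group A,55,60,58
"""

df = pd.read_csv(io.StringIO(csv_text))

# One replacement value per column; score columns use their own mean.
fill_values = {
    "Gender": "Unknown",
    "ExtraActivitiesGroup": "Group X",
    "Reading": df["Reading"].mean(),
    "Writing": df["Writing"].mean(),
    "Maths": df["Maths"].mean(),
}

# fillna with a dict returns a new frame and leaves df untouched.
clean = df.fillna(fill_values)
print(clean)
```

Calling `clean.to_csv("students_clean.csv", index=False)` afterwards would write the file in the same way as the generated code.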
Code Execution Results
Executed Code Output:
Code does not produce printable output.
Files Created (Click on filename to view content):
students_clean.csv
| StudentID | Gender | ExtraActivitiesGroup | EatingHabits | SleepingHabits | Reading | Writing | Maths | |------------:|:---------|:-----------------------|:---------------|:-----------------|----------:|----------:|--------:| | 1 | Male | Group X | Healthy | Satisfactory | 75 | 80 | 78 | | 2 | Female | Group B | Mixed | Non-Satisfactory | 73.0221 | 70 | 67 | | 3 | Unknown | Group A | Unhealthy | Satisfactory | 55 | 60 | 58 | | 4 | Female | Group C | Healthy | Non-Satisfactory | 70 | 75 | 73 | | 5 | Male | Group B | Mixed | Satisfactory | 60 | 65 | 63 | | 6 | Female | Group A | Unhealthy | Non-Satisfactory | 50 | 55 | 53 | | 7 | Male | Group C | Healthy | Satisfactory | 80 | 85 | 83 | | 8 | Female | Group B | Mixed | Non-Satisfactory | 65 | 70 | 67 | | 9 | Male | Group A | Unhealthy | Satisfactory | 55 | 60 | 58 | | 10 | Male | Group X | Mixed | Non-Satisfactory | 80 | 78 | 85 | | 11 | Female | Group B | Unhealthy | Satisfactory | 65 | 68 | 70 | | 12 | Female | Group A | Healthy | Non-Satisfactory | 52 | 57 | 55 | | 13 | Unknown | Group C | Unhealthy | Satisfactory | 78 | 75 | 79 | | 14 | Female | Group B | Mixed | Non-Satisfactory | 63 | 70 | 65 | | 15 | Male | Group A | Healthy | Satisfactory | 82 | 87 | 80 | | 16 | Male | Group C | Unhealthy | Non-Satisfactory | 57 | 60 | 54 | | 17 | Female | Group A | Mixed | Satisfactory | 67 | 65 | 63 | | 18 | Male | Group B | Unhealthy | Non-Satisfactory | 55 | 62 | 58 | | 19 | Unknown | Group C | Healthy | Satisfactory | 88 | 85 | 87 | | 20 | Female | Group B | Mixed | Non-Satisfactory | 67 | 75 | 68 | | 21 | Male | Group A | Unhealthy | Satisfactory | 53 | 58 | 55 | | 22 | Female | Group C | Healthy | Non-Satisfactory | 80 | 77 | 82 | | 23 | Male | Group A | Mixed | Satisfactory | 60 | 63 | 60 | | 24 | Female | Group B | Unhealthy | Non-Satisfactory | 65 | 62 | 60 | | 25 | Male | Group C | Healthy | Satisfactory | 90 | 92 | 88 | | 26 | Female | Group B | Mixed | Non-Satisfactory | 58 | 65 | 60 | | 27 | Male | Group A | 
Unhealthy | Satisfactory | 67 | 60 | 65 | | 28 | Male | Group C | Healthy | Non-Satisfactory | 72 | 78 | 73 | | 29 | Female | Group A | Mixed | Satisfactory | 55 | 62 | 58 | | 30 | Male | Group B | Unhealthy | Non-Satisfactory | 78 | 75 | 72 | | 31 | Female | Group C | Healthy | Satisfactory | 85 | 87 | 83 | | 32 | Female | Group A | Mixed | Non-Satisfactory | 70 | 65 | 67 | | 33 | Male | Group B | Unhealthy | Satisfactory | 62 | 67 | 65 | | 34 | Male | Group C | Healthy | Non-Satisfactory | 77 | 83 | 75 | | 35 | Unknown | Group A | Mixed | Satisfactory | 65 | 63 | 60 | | 36 | Female | Group B | Unhealthy | Non-Satisfactory | 72 | 78 | 70 | | 37 | Male | Group C | Healthy | Satisfactory | 80 | 87 | 83 | | 38 | Female | Group A | Mixed | Non-Satisfactory | 75 | 70 | 72 | | 39 | Male | Group B | Unhealthy | Satisfactory | 65 | 67 | 60 | | 40 | Unknown | Group C | Healthy | Non-Satisfactory | 82 | 88 | 80 | | 41 | Female | Group A | Mixed | Satisfactory | 77 | 72 | 70 | | 42 | Male | Group B | Unhealthy | Non-Satisfactory | 67 | 62 | 63 | | 43 | Male | Group C | Healthy | Satisfactory | 92 | 90 | 88 | | 44 | Female | Group A | Mixed | Non-Satisfactory | 80 | 75 | 77 | | 45 | Unknown | Group B | Unhealthy | Satisfactory | 72 | 75 | 73 | | 46 | Female | Group C | Healthy | Non-Satisfactory | 83 | 80 | 85 | | 47 | Male | Group A | Mixed | Satisfactory | 75 | 72 | 73 | | 48 | Male | Group B | Unhealthy | Non-Satisfactory | 60 | 63 | 58 | | 49 | Unknown | Group C | Healthy | Satisfactory | 90 | 92 | 88 | | 50 | Female | Group A | Mixed | Non-Satisfactory | 85 | 80 | 82 | | 51 | Male | Group B | Unhealthy | Satisfactory | 70 | 67 | 65 | | 52 | Female | Group C | Healthy | Non-Satisfactory | 78 | 83 | 77 | | 53 | Male | Group B | Mixed | Satisfactory | 65 | 63 | 62 | | 54 | Male | Group A | Unhealthy | Non-Satisfactory | 52 | 57 | 55 | | 55 | Unknown | Group C | Healthy | Satisfactory | 75 | 78 | 73 | | 56 | Female | Group B | Mixed | Non-Satisfactory | 70 | 77 | 72 | | 57 | 
Male | Group A | Unhealthy | Satisfactory | 62 | 65 | 63 | | 58 | Female | Group C | Healthy | Non-Satisfactory | 88 | 85 | 83 | | 59 | Male | Group B | Mixed | Satisfactory | 78 | 80 | 77 | | 60 | Unknown | Group A | Unhealthy | Non-Satisfactory | 67 | 60 | 65 | | 61 | Female | Group C | Healthy | Satisfactory | 83 | 80 | 82 | | 62 | Male | Group B | Mixed | Non-Satisfactory | 72 | 68 | 70 | | 63 | Male | Group A | Unhealthy | Satisfactory | 62 | 57 | 60 | | 64 | Female | Group C | Healthy | Non-Satisfactory | 90 | 87 | 88 | | 65 | Male | Group B | Mixed | Satisfactory | 85 | 82 | 80 | | 66 | Unknown | Group A | Unhealthy | Non-Satisfactory | 55 | 62 | 58 | | 67 | Female | Group C | Healthy | Satisfactory | 77 | 85 | 80 | | 68 | Male | Group B | Mixed | Non-Satisfactory | 65 | 72 | 67 | | 69 | Male | Group A | Unhealthy | Satisfactory | 67 | 60 | 68 | | 70 | Female | Group C | Healthy | Non-Satisfactory | 92 | 90 | 85 | | 71 | Male | Group B | Mixed | Satisfactory | 77 | 85 | 82 | | 72 | Unknown | Group A | Unhealthy | Non-Satisfactory | 62 | 55 | 60 | | 73 | Female | Group C | Healthy | Satisfactory | 83 | 87 | 85 | | 74 | Male | Group B | Mixed | Non-Satisfactory | 68 | 72 | 65 | | 75 | Male | Group A | Unhealthy | Satisfactory | 53 | 58 | 55 | | 76 | Unknown | Group C | Healthy | Non-Satisfactory | 88 | 83 | 87 | | 77 | Female | Group B | Mixed | Satisfactory | 72 | 70 | 73 | | 78 | Male | Group A | Unhealthy | Non-Satisfactory | 70 | 65 | 67 | | 79 | Male | Group C | Healthy | Satisfactory | 80 | 85 | 80 | | 80 | Female | Group B | Mixed | Non-Satisfactory | 75 | 72 | 75 | | 81 | Unknown | Group A | Unhealthy | Satisfactory | 55 | 60 | 58 | | 82 | Female | Group C | Healthy | Non-Satisfactory | 80 | 77 | 82 | | 83 | Male | Group B | Mixed | Satisfactory | 68 | 70 | 68 | | 84 | Male | Group A | Unhealthy | Non-Satisfactory | 62 | 57 | 63 | | 85 | Female | Group C | Healthy | Satisfactory | 90 | 92 | 88 | | 86 | Unknown | Group B | Mixed | Non-Satisfactory | 67 
| 72 | 67 | | 87 | Female | Group A | Unhealthy | Satisfactory | 53 | 60 | 58 | | 88 | Male | Group C | Healthy | Non-Satisfactory | 75 | 78 | 73 | | 89 | Male | Group B | Mixed | Satisfactory | 82 | 80 | 83 | | 90 | Unknown | Group A | Unhealthy | Non-Satisfactory | 65 | 62 | 63 | | 91 | Female | Group C | Healthy | Satisfactory | 80 | 83 | 80 | | 92 | Male | Group B | Mixed | Non-Satisfactory | 85 | 80 | 82 | | 93 | Male | Group A | Unhealthy | Satisfactory | 62 | 67 | 65 | | 94 | Unknown | Group C | Healthy | Non-Satisfactory | 90 | 87 | 92 | | 95 | Female | Group B | Mixed | Satisfactory | 77 | 75 | 78 | | 96 | Female | Group A | Unhealthy | Non-Satisfactory | 67 | 60 | 68 | | 97 | Unknown | Group C | Healthy | Satisfactory | 77 | 83 | 78 | | 98 | Male | Group B | Mixed | Non-Satisfactory | 62 | 68 | 65 | | 99 | Male | Group A | Unhealthy | Satisfactory | 52 | 57 | 58 | | 100 | Female | Group C | Healthy | Non-Satisfactory | 72 | 75 | 77 | | 101 | Male | Group B | Mixed | Satisfactory | 70 | 67 | 72 | | 102 | Unknown | Group A | Unhealthy | Non-Satisfactory | 67 | 62 | 65 | | 103 | Female | Group C | Healthy | Satisfactory | 83 | 87 | 85 | | 104 | Male | Group B | Mixed | Non-Satisfactory | 80 | 77 | 82 | | 105 | Male | Group A | Unhealthy | Satisfactory | 55 | 62 | 53 | | 106 | Female | Group C | Healthy | Non-Satisfactory | 92 | 90 | 88 | | 107 | Unknown | Group B | Mixed | Satisfactory | 78 | 83 | 78 | | 108 | Female | Group A | Unhealthy | Non-Satisfactory | 72 | 65 | 70 | | 109 | Male | Group C | Healthy | Satisfactory | 83 | 80 | 85 | | 110 | Female | Group B | Mixed | Non-Satisfactory | 68 | 72 | 63 | | 111 | Male | Group A | Unhealthy | Satisfactory | 60 | 63 | 63 | | 112 | Unknown | Group C | Healthy | Non-Satisfactory | 72 | 78 | 73 | | 113 | Female | Group B | Mixed | Satisfactory | 80 | 83 | 83 | | 114 | Male | Group A | Unhealthy | Non-Satisfactory | 70 | 65 | 67 | | 115 | Female | Group C | Healthy | Satisfactory | 90 | 87 | 92 | | 116 | Male | 
Group B | Mixed | Non-Satisfactory | 85 | 82 | 80 | | 117 | Male | Group A | Unhealthy | Satisfactory | 52 | 57 | 55 | | 118 | Female | Group C | Healthy | Non-Satisfactory | 77 | 85 | 80 | | 119 | Unknown | Group B | Mixed | Satisfactory | 68 | 70 | 68 | | 120 | Female | Group A | Unhealthy | Non-Satisfactory | 53 | 60 | 58 | | 121 | Male | Group C | Healthy | Satisfactory | 75 | 80 | 77 | | 122 | Female | Group B | Mixed | Non-Satisfactory | 67 | 72 | 67 | | 123 | Male | Group B | Unhealthy | Satisfactory | 70 | 67 | 72 | | 124 | Female | Group A | Mixed | Non-Satisfactory | 62 | 57 | 60 | | 125 | Unknown | Group C | Healthy | Satisfactory | 80 | 83 | 80 | | 126 | Male | Group B | Mixed | Non-Satisfactory | 62 | 68 | 60 | | 127 | Male | Group A | Unhealthy | Satisfactory | 55 | 60 | 58 | | 128 | Female | Group C | Healthy | Non-Satisfactory | 92 | 90 | 85 | | 129 | Male | Group B | Mixed | Satisfactory | 85 | 82 | 80 | | 130 | Female | Group A | Unhealthy | Non-Satisfactory | 75 | 70 | 72 | | 131 | Unknown | Group C | Healthy | Satisfactory | 77 | 83 | 78 | | 132 | Male | Group B | Mixed | Non-Satisfactory | 80 | 77 | 82 | | 133 | Male | Group A | Unhealthy | Satisfactory | 62 | 67 | 60 | | 134 | Female | Group C | Healthy | Non-Satisfactory | 90 | 87 | 92 | | 135 | Male | Group B | Mixed | Satisfactory | 78 | 83 | 78 | | 136 | Female | Group A | Unhealthy | Non-Satisfactory | 55 | 62 | 58 | | 137 | Male | Group C | Healthy | Satisfactory | 80 | 83 | 80 | | 138 | Male | Group B | Mixed | Non-Satisfactory | 67 | 70 | 63 | | 139 | Unknown | Group A | Unhealthy | Satisfactory | 65 | 62 | 65 | | 140 | Female | Group C | Healthy | Non-Satisfactory | 88 | 83 | 87 | | 141 | Female | Group B | Mixed | Satisfactory | 70 | 77 | 70 | | 142 | Male | Group A | Unhealthy | Non-Satisfactory | 52 | 57 | 55 | | 143 | Male | Group C | Healthy | Satisfactory | 85 | 80 | 82 | | 144 | Male | Group B | Mixed | Non-Satisfactory | 82 | 80 | 83 | | 145 | Unknown | Group A | Unhealthy | 
Satisfactory | 60 | 63 | 63 | | 146 | Female | Group C | Healthy | Non-Satisfactory | 90 | 87 | 92 | | 147 | Female | Group B | Mixed | Satisfactory | 75 | 72 | 77 | | 148 | Male | Group A | Unhealthy | Non-Satisfactory | 57 | 60 | 54 | | 149 | Unknown | Group C | Healthy | Satisfactory | 80 | 85 | 82 | | 150 | Female | Group B | Mixed | Non-Satisfactory | 80 | 75 | 83 | | 151 | Male | Group A | Unhealthy | Satisfactory | 78 | 75 | 79 | | 152 | Male | Group C | Healthy | Non-Satisfactory | 92 | 90 | 88 | | 153 | Unknown | Group B | Mixed | Satisfactory | 65 | 63 | 62 | | 154 | Female | Group A | Unhealthy | Non-Satisfactory | 53 | 58 | 55 | | 155 | Male | Group C | Healthy | Satisfactory | 83 | 87 | 82 | | 156 | Female | Group B | Mixed | Non-Satisfactory | 85 | 80 | 83 | | 157 | Male | Group A | Unhealthy | Satisfactory | 70 | 67 | 72 | | 158 | Male | Group C | Healthy | Non-Satisfactory | 90 | 87 | 92 | | 159 | Female | Group B | Mixed | Satisfactory | 68 | 70 | 68 | | 160 | Female | Group A | Unhealthy | Non-Satisfactory | 67 | 60 | 70 | | 161 | Unknown | Group C | Healthy | Satisfactory | 90 | 92 | 88 | | 162 | Male | Group B | Mixed | Non-Satisfactory | 85 | 82 | 80 | | 163 | Male | Group A | Unhealthy | Satisfactory | 65 | 62 | 65 | | 164 | Female | Group C | Healthy | Non-Satisfactory | 83 | 87 | 85 | | 165 | Unknown | Group B | Mixed | Satisfactory | 78 | 83 | 78 | | 166 | Female | Group A | Unhealthy | Non-Satisfactory | 55 | 62 | 58 | | 167 | Male | Group C | Healthy | Satisfactory | 80 | 83 | 80 | | 168 | Female | Group B | Mixed | Non-Satisfactory | 67 | 70 | 63 | | 169 | Male | Group A | Unhealthy | Satisfactory | 52 | 57 | 55 | | 170 | Unknown | Group C | Healthy | Non-Satisfactory | 82 | 88 | 80 | | 171 | Male | Group B | Mixed | Satisfactory | 80 | 83 | 83 | | 172 | Female | Group A | Unhealthy | Non-Satisfactory | 75 | 70 | 72 | | 173 | Male | Group B | Healthy | Satisfactory | 90 | 87 | 88 | | 174 | Male | Group B | Mixed | Non-Satisfactory | 62 | 
68 | 65 | | 175 | Unknown | Group A | Unhealthy | Satisfactory | 62 | 57 | 63 | | 176 | Female | Group C | Healthy | Non-Satisfactory | 77 | 85 | 80 | | 177 | Male | Group B | Mixed | Satisfactory | 68 | 70 | 68 | | 178 | Male | Group A | Unhealthy | Non-Satisfactory | 53 | 60 | 58 | | 179 | Female | Group C | Healthy | Satisfactory | 90 | 87 | 92 | | 180 | Male | Group B | Mixed | Non-Satisfactory | 70 | 67 | 75 | | 181 | Unknown | Group A | Unhealthy | Satisfactory | 65 | 62 | 65 | | 182 | Female | Group C | Healthy | Non-Satisfactory | 83 | 87 | 85 | | 183 | Unknown | Group A | Mixed | Satisfactory | 75 | 78 | 77 | | 184 | Female | Group A | Unhealthy | Non-Satisfactory | 55 | 62 | 58 | | 185 | Male | Group C | Healthy | Satisfactory | 80 | 83 | 80 | | 186 | Male | Group A | Mixed | Non-Satisfactory | 85 | 82 | 80 | | 187 | Male | Group A | Unhealthy | Satisfactory | 78 | 75 | 79 | | 188 | Unknown | Group C | Healthy | Non-Satisfactory | 80 | 85 | 83 | | 189 | Female | Group B | Mixed | Satisfactory | 70 | 77 | 70 | | 190 | Male | Group A | Unhealthy | Non-Satisfactory | 57 | 60 | 54 | | 191 | Unknown | Group C | Healthy | Satisfactory | 92 | 90 | 85 | | 192 | Female | Group B | Mixed | Non-Satisfactory | 80 | 75 | 83 | | 193 | Male | Group A | Unhealthy | Satisfactory | 53 | 58 | 55 | | 194 | Unknown | Group C | Healthy | Non-Satisfactory | 75 | 78 | 77 | | 195 | Female | Group B | Mixed | Satisfactory | 65 | 63 | 62 | | 196 | Female | Group A | Unhealthy | Non-Satisfactory | 67 | 60 | 70 | | 197 | Male | Group A | Healthy | Satisfactory | 85 | 80 | 87 | | 198 | Male | Group B | Mixed | Non-Satisfactory | 85 | 82 | 80 | | 199 | Male | Group A | Unhealthy | Satisfactory | 72 | 65 | 70 | | 200 | Unknown | Group C | Healthy | Non-Satisfactory | 90 | 87 | 92 | | 201 | Female | Group B | Mixed | Satisfactory | 68 | 70 | 68 | | 202 | Female | Group A | Unhealthy | Non-Satisfactory | 62 | 57 | 63 | | 203 | Unknown | Group A | Healthy | Satisfactory | 82 | 88 | 80 | | 
204 | Female | Group B | Mixed | Non-Satisfactory | 80 | 77 | 82 | | 205 | Male | Group A | Unhealthy | Satisfactory | 67 | 60 | 68 | | 206 | Male | Group A | Healthy | Non-Satisfactory | 90 | 87 | 92 | | 207 | Female | Group B | Mixed | Satisfactory | 78 | 83 | 78 | | 208 | Female | Group A | Unhealthy | Non-Satisfactory | 72 | 65 | 70 | | 209 | Unknown | Group C | Healthy | Satisfactory | 77 | 83 | 78 | | 210 | Male | Group B | Mixed | Non-Satisfactory | 62 | 68 | 65 | | 211 | Male | Group A | Unhealthy | Satisfactory | 53 | 58 | 55 | | 212 | Male | Group A | Healthy | Non-Satisfactory | 92 | 90 | 85 | | 213 | Female | Group B | Mixed | Satisfactory | 68 | 70 | 68 | | 214 | Female | Group A | Unhealthy | Non-Satisfactory | 75 | 70 | 72 | | 215 | Unknown | Group B | Healthy | Satisfactory | 77 | 83 | 78 | | 216 | Female | Group B | Mixed | Non-Satisfactory | 67 | 70 | 63 | | 217 | Male | Group A | Unhealthy | Satisfactory | 52 | 57 | 55 | | 218 | Unknown | Group C | Healthy | Non-Satisfactory | 90 | 87 | 92 | | 219 | Female | Group B | Mixed | Satisfactory | 85 | 82 | 80 | | 220 | Female | Group A | Unhealthy | Non-Satisfactory | 55 | 62 | 58 | | 221 | Male | Group A | Healthy | Satisfactory | 80 | 83 | 80 | | 222 | Male | Group B | Mixed | Non-Satisfactory | 60 | 63 | 63 | | 223 | Male | Group A | Unhealthy | Satisfactory | 78 | 75 | 79 | | 224 | Female | Group C | Healthy | Non-Satisfactory | 75 | 78 | 77 | | 225 | Unknown | Group B | Mixed | Satisfactory | 70 | 67 | 72 | | 226 | Male | Group A | Unhealthy | Non-Satisfactory | 70 | 65 | 67 | | 227 | Unknown | Group C | Healthy | Satisfactory | 90 | 92 | 88 | | 228 | Female | Group B | Mixed | Non-Satisfactory | 85 | 82 | 80 | | 229 | Male | Group A | Unhealthy | Satisfactory | 65 | 62 | 65 | | 230 | Female | Group C | Healthy | Non-Satisfactory | 83 | 87 | 85 | | 231 | Unknown | Group B | Mixed | Satisfactory | 75 | 78 | 77 | | 232 | Female | Group A | Unhealthy | Non-Satisfactory | 55 | 62 | 58 | | 233 | Male | 
Group C | Healthy | Satisfactory | 80 | 83 | 80 | | 234 | Male | Group B | Mixed | Non-Satisfactory | 85 | 82 | 80 | | 235 | Male | Group A | Unhealthy | Satisfactory | 78 | 75 | 79 | | 236 | Female | Group C | Healthy | Non-Satisfactory | 83 | 87 | 85 | | 237 | Unknown | Group A | Mixed | Satisfactory | 80 | 83 | 83 | | 238 | Female | Group B | Mixed | Non-Satisfactory | 75 | 70 | 77 | | 239 | Male | Group A | Unhealthy | Non-Satisfactory | 62 | 57 | 63 | | 240 | Unknown | Group C | Healthy | Non-Satisfactory | 82 | 88 | 80 | | 241 | Female | Group B | Mixed | Satisfactory | 80 | 77 | 82 | | 242 | Male | Group A | Unhealthy | Satisfactory | 60 | 63 | 63 | | 243 | Female | Group C | Healthy | Non-Satisfactory | 90 | 87 | 92 | | 244 | Male | Group B | Mixed | Non-Satisfactory | 82 | 80 | 83 | | 245 | Unknown | Group C | Healthy | Satisfactory | 77 | 83 | 78 | | 246 | Male | Group B | Mixed | Non-Satisfactory | 72 | 68 | 70 | | 247 | Female | Group A | Unhealthy | Satisfactory | 65 | 62 | 65 | | 248 | Male | Group C | Healthy | Non-Satisfactory | 80 | 85 | 83 | | 249 | Female | Group A | Mixed | Non-Satisfactory | 70 | 65 | 67 | | 250 | Unknown | Group C | Healthy | Non-Satisfactory | 83 | 80 | 85 | | 251 | Female | Group B | Mixed | Satisfactory | 68 | 70 | 68 | | 252 | Female | Group A | Unhealthy | Non-Satisfactory | 62 | 57 | 63 | | 253 | Male | Group C | Healthy | Satisfactory | 92 | 90 | 88 | | 254 | Female | Group B | Mixed | Non-Satisfactory | 80 | 75 | 83 | | 255 | Unknown | Group C | Healthy | Satisfactory | 90 | 92 | 88 | | 256 | Female | Group B | Mixed | Satisfactory | 70 | 77 | 70 | | 257 | Male | Group A | Unhealthy | Non-Satisfactory | 52 | 57 | 55 | | 258 | Unknown | Group C | Healthy | Non-Satisfactory | 75 | 78 | 77 | | 259 | Female | Group B | Mixed | Non-Satisfactory | 80 | 77 | 82 | | 260 | Male | Group A | Unhealthy | Satisfactory | 55 | 62 | 58 | | 261 | Unknown | Group C | Healthy | Satisfactory | 82 | 88 | 80 | | 262 | Female | Group B | 
Mixed | Non-Satisfactory | 72 | 65 | 70 | | 263 | Male | Group A | Unhealthy | Non-Satisfactory | 65 | 62 | 65 | | 264 | Female | Group C | Healthy | Non-Satisfactory | 90 | 87 | 92 | | 265 | Male | Group B | Mixed | Satisfactory | 77 | 85 | 82 | | 266 | Female | Group A | Unhealthy | Non-Satisfactory | 55 | 62 | 58 | | 267 | Unknown | Group C | Healthy | Satisfactory | 83 | 80 | 85 | | 268 | Female | Group B | Mixed | Non-Satisfactory | 85 | 82 | 80 | | 269 | Male | Group A | Unhealthy | Satisfactory | 62 | 57 | 63 | | 270 | Female | Group C | Healthy | Non-Satisfactory | 77 | 85 | 80 | | 271 | Unknown | Group B | Mixed | Satisfactory | 70 | 67 | 72 | | 272 | Male | Group A | Unhealthy | Non-Satisfactory | 53 | 60 | 58 | | 273 | Male | Group C | Healthy | Satisfactory | 75 | 80 | 77 | | 274 | Female | Group B | Mixed | Non-Satisfactory | 80 | 75 | 83 | | 275 | Male | Group A | Unhealthy | Satisfactory | 52 | 57 | 55 | | 276 | Unknown | Group C | Healthy | Non-Satisfactory | 92 | 90 | 85 | | 277 | Female | Group B | Mixed | Satisfactory | 68 | 72 | 65 | | 278 | Male | Group A | Unhealthy | Non-Satisfactory | 70 | 65 | 67 | | 279 | Unknown | Group C | Healthy | Satisfactory | 80 | 83 | 80 | | 280 | Female | Group B | Mixed | Non-Satisfactory | 75 | 72 | 75 | | 281 | Male | Group A | Unhealthy | Satisfactory | 57 | 60 | 54 | | 282 | Female | Group C | Healthy | Non-Satisfactory | 78 | 83 | 77 | | 283 | Unknown | Group B | Mixed | Satisfactory | 70 | 67 | 72 | | 284 | Female | Group A | Unhealthy | Non-Satisfactory | 62 | 57 | 63 | | 285 | Male | Group C | Healthy | Satisfactory | 90 | 87 | 88 | | 286 | Male | Group B | Mixed | Non-Satisfactory | 82 | 80 | 83 | | 287 | Unknown | Group C | Healthy | Satisfactory | 77 | 83 | 78 | | 288 | Female | Group B | Mixed | Non-Satisfactory | 72 | 70 | 73 | | 289 | Male | Group A | Unhealthy | Satisfactory | 65 | 62 | 65 | | 290 | Female | Group C | Healthy | Non-Satisfactory | 90 | 87 | 92 | | 291 | Unknown | Group B | Mixed | 
Satisfactory | 70 | 63 | 60 | | 292 | Female | Group A | Unhealthy | Non-Satisfactory | 55 | 62 | 58 | | 293 | Male | Group C | Healthy | Satisfactory | 75 | 80 | 77 | | 294 | Male | Group B | Mixed | Non-Satisfactory | 85 | 82 | 80 | | 295 | Unknown | Group A | Mixed | Satisfactory | 80 | 75 | 77 | | 296 | Female | Group C | Healthy | Non-Satisfactory | 77 | 83 | 78 | | 297 | Female | Group B | Mixed | Non-Satisfactory | 67 | 72 | 67 | | 298 | Male | Group A | Unhealthy | Satisfactory | 67 | 60 | 68 | | 299 | Male | Group B | Healthy | Satisfactory | 88 | 85 | 87 | | 300 | Female | Group A | Mixed | Non-Satisfactory | 78 | 75 | 79 | | 301 | Male | Group C | Unhealthy | Satisfactory | 75 | 78 | 72 | | 302 | Female | Group B | Mixed | Non-Satisfactory | 72 | 65 | 70 | | 303 | Male | Group A | Healthy | Non-Satisfactory | 85 | 82 | 80 | | 304 | Female | Group C | Healthy | Non-Satisfactory | 77 | 83 | 78 | | 305 | Male | Group A | Mixed | Non-Satisfactory | 72 | 65 | 70 | | 306 | Female | Group B | Unhealthy | Satisfactory | 72 | 78 | 70 | | 307 | Unknown | Group A | Healthy | Satisfactory | 82 | 88 | 80 | | 308 | Female | Group C | Mixed | Non-Satisfactory | 72 | 75 | 77 | | 309 | Male | Group B | Mixed | Non-Satisfactory | 62 | 68 | 65 | | 310 | Female | Group A | Unhealthy | Satisfactory | 53 | 60 | 58 | | 311 | Unknown | Group C | Healthy | Satisfactory | 90 | 92 | 88 | | 312 | Female | Group B | Mixed | Non-Satisfactory | 80 | 77 | 82 | | 313 | Male | Group A | Unhealthy | Non-Satisfactory | 67 | 60 | 68 | | 314 | Unknown | Group C | Healthy | Satisfactory | 77 | 83 | 78 | | 315 | Female | Group B | Mixed | Satisfactory | 75 | 72 | 75 | | 316 | Male | Group A | Unhealthy | Non-Satisfactory | 52 | 57 | 55 | | 317 | Female | Group C | Healthy | Non-Satisfactory | 90 | 87 | 92 | | 318 | Male | Group B | Mixed | Non-Satisfactory | 85 | 82 | 80 |
Remove Outliers¶
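Remove outliers by keeping only the rows whose 'Reading', 'Writing', and 'Maths' scores all fall between the 0.05 and 0.95 quantiles of their respective columns.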
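As a point of reference, the same kind of quantile trimming can be sketched locally in pandas. This is a minimal sketch, not the code that Code Interpreter generates: the helper name `trim_outliers` and the toy data are hypothetical and chosen only for illustration. It shows that each column is trimmed against its own bounds and a row survives only if it passes every check, so the filter can drop more than 10% of the rows.

```python
import pandas as pd


def trim_outliers(df, columns, q_low=0.05, q_high=0.95):
    """Keep rows whose values lie within [q_low, q_high] quantiles in every listed column.

    Hypothetical helper for illustration only.
    """
    lower = df[columns].quantile(q_low)   # per-column lower bounds
    upper = df[columns].quantile(q_high)  # per-column upper bounds
    # A row is kept only if all listed columns are inside their bounds.
    in_range = (df[columns].ge(lower) & df[columns].le(upper)).all(axis=1)
    return df[in_range]


# Toy data (made up): the extreme Reading and Maths values sit in different rows.
toy = pd.DataFrame(
    {
        "Reading": [10, 60, 65, 70, 75, 80, 100],
        "Maths": [60, 55, 62, 66, 70, 78, 74],
    }
)

trimmed = trim_outliers(toy, ["Reading", "Maths"])
print("Rows before:", len(toy))
print("Rows after:", len(trimmed))
```

In this toy frame, Reading removes the first and last rows while Maths removes two different rows, so four of the seven rows are dropped even though each column trims only its own tails.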
instr = """
Print the initial number of rows.
Remove any outliers in the 'Reading', 'Writing', and 'Maths' columns based on quantiles between 0.05 and 0.95.
Write the new dataset in students_clean_v2.csv.
Print the total number of rows after removing outliers.
"""
response = run_code_interpreter(instructions=instr, filenames=['students_clean.csv'])
process_response(response)
Generated Code by Code Interpreter
```python
import pandas as pd
# Read the data from the CSV file
df = pd.read_csv("students_clean.csv")
# Print the initial number of rows
print("Initial number of rows:", len(df))
# Remove outliers in the 'Reading', 'Writing', and 'Maths' columns
q_low = 0.05
q_high = 0.95
df = df[
(df["Reading"] >= df["Reading"].quantile(q_low))
& (df["Reading"] <= df["Reading"].quantile(q_high))
& (df["Writing"] >= df["Writing"].quantile(q_low))
& (df["Writing"] <= df["Writing"].quantile(q_high))
& (df["Maths"] >= df["Maths"].quantile(q_low))
& (df["Maths"] <= df["Maths"].quantile(q_high))
]
# Write the new dataset to a CSV file
df.to_csv("students_clean_v2.csv", index=False)
# Print the total number of rows after removing outliers
print("Number of rows after removing outliers:", len(df))
```
Code Execution Results
Executed Code Output:
Initial number of rows: 318
Number of rows after removing outliers: 272
Files Created (Click on filename to view content):
students_clean_v2.csv
| StudentID | Gender | ExtraActivitiesGroup | EatingHabits | SleepingHabits | Reading | Writing | Maths | |------------:|:---------|:-----------------------|:---------------|:-----------------|----------:|----------:|--------:| | 1 | Male | Group X | Healthy | Satisfactory | 75 | 80 | 78 | | 2 | Female | Group B | Mixed | Non-Satisfactory | 73.0221 | 70 | 67 | | 3 | Unknown | Group A | Unhealthy | Satisfactory | 55 | 60 | 58 | | 4 | Female | Group C | Healthy | Non-Satisfactory | 70 | 75 | 73 | | 5 | Male | Group B | Mixed | Satisfactory | 60 | 65 | 63 | | 7 | Male | Group C | Healthy | Satisfactory | 80 | 85 | 83 | | 8 | Female | Group B | Mixed | Non-Satisfactory | 65 | 70 | 67 | | 9 | Male | Group A | Unhealthy | Satisfactory | 55 | 60 | 58 | | 10 | Male | Group X | Mixed | Non-Satisfactory | 80 | 78 | 85 | | 11 | Female | Group B | Unhealthy | Satisfactory | 65 | 68 | 70 | | 13 | Unknown | Group C | Unhealthy | Satisfactory | 78 | 75 | 79 | | 14 | Female | Group B | Mixed | Non-Satisfactory | 63 | 70 | 65 | | 15 | Male | Group A | Healthy | Satisfactory | 82 | 87 | 80 | | 17 | Female | Group A | Mixed | Satisfactory | 67 | 65 | 63 | | 18 | Male | Group B | Unhealthy | Non-Satisfactory | 55 | 62 | 58 | | 19 | Unknown | Group C | Healthy | Satisfactory | 88 | 85 | 87 | | 20 | Female | Group B | Mixed | Non-Satisfactory | 67 | 75 | 68 | | 21 | Male | Group A | Unhealthy | Satisfactory | 53 | 58 | 55 | | 22 | Female | Group C | Healthy | Non-Satisfactory | 80 | 77 | 82 | | 23 | Male | Group A | Mixed | Satisfactory | 60 | 63 | 60 | | 24 | Female | Group B | Unhealthy | Non-Satisfactory | 65 | 62 | 60 | | 26 | Female | Group B | Mixed | Non-Satisfactory | 58 | 65 | 60 | | 27 | Male | Group A | Unhealthy | Satisfactory | 67 | 60 | 65 | | 28 | Male | Group C | Healthy | Non-Satisfactory | 72 | 78 | 73 | | 29 | Female | Group A | Mixed | Satisfactory | 55 | 62 | 58 | | 30 | Male | Group B | Unhealthy | Non-Satisfactory | 78 | 75 | 72 | | 31 | Female | Group C | Healthy 
| Satisfactory | 85 | 87 | 83 | | 32 | Female | Group A | Mixed | Non-Satisfactory | 70 | 65 | 67 | | 33 | Male | Group B | Unhealthy | Satisfactory | 62 | 67 | 65 | | 34 | Male | Group C | Healthy | Non-Satisfactory | 77 | 83 | 75 | | 35 | Unknown | Group A | Mixed | Satisfactory | 65 | 63 | 60 | | 36 | Female | Group B | Unhealthy | Non-Satisfactory | 72 | 78 | 70 | | 37 | Male | Group C | Healthy | Satisfactory | 80 | 87 | 83 | | 38 | Female | Group A | Mixed | Non-Satisfactory | 75 | 70 | 72 | | 39 | Male | Group B | Unhealthy | Satisfactory | 65 | 67 | 60 | | 40 | Unknown | Group C | Healthy | Non-Satisfactory | 82 | 88 | 80 | | 41 | Female | Group A | Mixed | Satisfactory | 77 | 72 | 70 | | 42 | Male | Group B | Unhealthy | Non-Satisfactory | 67 | 62 | 63 | | 44 | Female | Group A | Mixed | Non-Satisfactory | 80 | 75 | 77 | | 45 | Unknown | Group B | Unhealthy | Satisfactory | 72 | 75 | 73 | | 46 | Female | Group C | Healthy | Non-Satisfactory | 83 | 80 | 85 | | 47 | Male | Group A | Mixed | Satisfactory | 75 | 72 | 73 | | 48 | Male | Group B | Unhealthy | Non-Satisfactory | 60 | 63 | 58 | | 50 | Female | Group A | Mixed | Non-Satisfactory | 85 | 80 | 82 | | 51 | Male | Group B | Unhealthy | Satisfactory | 70 | 67 | 65 | | 52 | Female | Group C | Healthy | Non-Satisfactory | 78 | 83 | 77 | | 53 | Male | Group B | Mixed | Satisfactory | 65 | 63 | 62 | | 55 | Unknown | Group C | Healthy | Satisfactory | 75 | 78 | 73 | | 56 | Female | Group B | Mixed | Non-Satisfactory | 70 | 77 | 72 | | 57 | Male | Group A | Unhealthy | Satisfactory | 62 | 65 | 63 | | 58 | Female | Group C | Healthy | Non-Satisfactory | 88 | 85 | 83 | | 59 | Male | Group B | Mixed | Satisfactory | 78 | 80 | 77 | | 60 | Unknown | Group A | Unhealthy | Non-Satisfactory | 67 | 60 | 65 | | 61 | Female | Group C | Healthy | Satisfactory | 83 | 80 | 82 | | 62 | Male | Group B | Mixed | Non-Satisfactory | 72 | 68 | 70 | | 63 | Male | Group A | Unhealthy | Satisfactory | 62 | 57 | 60 | | 64 | Female | 
Group C | Healthy | Non-Satisfactory | 90 | 87 | 88 | | 65 | Male | Group B | Mixed | Satisfactory | 85 | 82 | 80 | | 66 | Unknown | Group A | Unhealthy | Non-Satisfactory | 55 | 62 | 58 | | 67 | Female | Group C | Healthy | Satisfactory | 77 | 85 | 80 | | 68 | Male | Group B | Mixed | Non-Satisfactory | 65 | 72 | 67 | | 69 | Male | Group A | Unhealthy | Satisfactory | 67 | 60 | 68 | | 71 | Male | Group B | Mixed | Satisfactory | 77 | 85 | 82 | | 73 | Female | Group C | Healthy | Satisfactory | 83 | 87 | 85 | | 74 | Male | Group B | Mixed | Non-Satisfactory | 68 | 72 | 65 | | 75 | Male | Group A | Unhealthy | Satisfactory | 53 | 58 | 55 | | 76 | Unknown | Group C | Healthy | Non-Satisfactory | 88 | 83 | 87 | | 77 | Female | Group B | Mixed | Satisfactory | 72 | 70 | 73 | | 78 | Male | Group A | Unhealthy | Non-Satisfactory | 70 | 65 | 67 | | 79 | Male | Group C | Healthy | Satisfactory | 80 | 85 | 80 | | 80 | Female | Group B | Mixed | Non-Satisfactory | 75 | 72 | 75 | | 81 | Unknown | Group A | Unhealthy | Satisfactory | 55 | 60 | 58 | | 82 | Female | Group C | Healthy | Non-Satisfactory | 80 | 77 | 82 | | 83 | Male | Group B | Mixed | Satisfactory | 68 | 70 | 68 | | 84 | Male | Group A | Unhealthy | Non-Satisfactory | 62 | 57 | 63 | | 86 | Unknown | Group B | Mixed | Non-Satisfactory | 67 | 72 | 67 | | 87 | Female | Group A | Unhealthy | Satisfactory | 53 | 60 | 58 | | 88 | Male | Group C | Healthy | Non-Satisfactory | 75 | 78 | 73 | | 89 | Male | Group B | Mixed | Satisfactory | 82 | 80 | 83 | | 90 | Unknown | Group A | Unhealthy | Non-Satisfactory | 65 | 62 | 63 | | 91 | Female | Group C | Healthy | Satisfactory | 80 | 83 | 80 | | 92 | Male | Group B | Mixed | Non-Satisfactory | 85 | 80 | 82 | | 93 | Male | Group A | Unhealthy | Satisfactory | 62 | 67 | 65 | | 95 | Female | Group B | Mixed | Satisfactory | 77 | 75 | 78 | | 96 | Female | Group A | Unhealthy | Non-Satisfactory | 67 | 60 | 68 | | 97 | Unknown | Group C | Healthy | Satisfactory | 77 | 83 | 78 | | 
98 | Male | Group B | Mixed | Non-Satisfactory | 62 | 68 | 65 | | 100 | Female | Group C | Healthy | Non-Satisfactory | 72 | 75 | 77 | | 101 | Male | Group B | Mixed | Satisfactory | 70 | 67 | 72 | | 102 | Unknown | Group A | Unhealthy | Non-Satisfactory | 67 | 62 | 65 | | 103 | Female | Group C | Healthy | Satisfactory | 83 | 87 | 85 | | 104 | Male | Group B | Mixed | Non-Satisfactory | 80 | 77 | 82 | | 107 | Unknown | Group B | Mixed | Satisfactory | 78 | 83 | 78 | | 108 | Female | Group A | Unhealthy | Non-Satisfactory | 72 | 65 | 70 | | 109 | Male | Group C | Healthy | Satisfactory | 83 | 80 | 85 | | 110 | Female | Group B | Mixed | Non-Satisfactory | 68 | 72 | 63 | | 111 | Male | Group A | Unhealthy | Satisfactory | 60 | 63 | 63 | | 112 | Unknown | Group C | Healthy | Non-Satisfactory | 72 | 78 | 73 | | 113 | Female | Group B | Mixed | Satisfactory | 80 | 83 | 83 | | 114 | Male | Group A | Unhealthy | Non-Satisfactory | 70 | 65 | 67 | | 116 | Male | Group B | Mixed | Non-Satisfactory | 85 | 82 | 80 | | 118 | Female | Group C | Healthy | Non-Satisfactory | 77 | 85 | 80 | | 119 | Unknown | Group B | Mixed | Satisfactory | 68 | 70 | 68 | | 120 | Female | Group A | Unhealthy | Non-Satisfactory | 53 | 60 | 58 | | 121 | Male | Group C | Healthy | Satisfactory | 75 | 80 | 77 | | 122 | Female | Group B | Mixed | Non-Satisfactory | 67 | 72 | 67 | | 123 | Male | Group B | Unhealthy | Satisfactory | 70 | 67 | 72 | | 124 | Female | Group A | Mixed | Non-Satisfactory | 62 | 57 | 60 | | 125 | Unknown | Group C | Healthy | Satisfactory | 80 | 83 | 80 | | 126 | Male | Group B | Mixed | Non-Satisfactory | 62 | 68 | 60 | | 127 | Male | Group A | Unhealthy | Satisfactory | 55 | 60 | 58 | | 129 | Male | Group B | Mixed | Satisfactory | 85 | 82 | 80 | | 130 | Female | Group A | Unhealthy | Non-Satisfactory | 75 | 70 | 72 | | 131 | Unknown | Group C | Healthy | Satisfactory | 77 | 83 | 78 | | 132 | Male | Group B | Mixed | Non-Satisfactory | 80 | 77 | 82 | | 133 | Male | Group A | 
Unhealthy | Satisfactory | 62 | 67 | 60 | | 135 | Male | Group B | Mixed | Satisfactory | 78 | 83 | 78 | | 136 | Female | Group A | Unhealthy | Non-Satisfactory | 55 | 62 | 58 | | 137 | Male | Group C | Healthy | Satisfactory | 80 | 83 | 80 | | 138 | Male | Group B | Mixed | Non-Satisfactory | 67 | 70 | 63 | | 139 | Unknown | Group A | Unhealthy | Satisfactory | 65 | 62 | 65 | | 140 | Female | Group C | Healthy | Non-Satisfactory | 88 | 83 | 87 | | 141 | Female | Group B | Mixed | Satisfactory | 70 | 77 | 70 | | 143 | Male | Group C | Healthy | Satisfactory | 85 | 80 | 82 | | 144 | Male | Group B | Mixed | Non-Satisfactory | 82 | 80 | 83 | | 145 | Unknown | Group A | Unhealthy | Satisfactory | 60 | 63 | 63 | | 147 | Female | Group B | Mixed | Satisfactory | 75 | 72 | 77 | | 149 | Unknown | Group C | Healthy | Satisfactory | 80 | 85 | 82 | | 150 | Female | Group B | Mixed | Non-Satisfactory | 80 | 75 | 83 | | 151 | Male | Group A | Unhealthy | Satisfactory | 78 | 75 | 79 | | 153 | Unknown | Group B | Mixed | Satisfactory | 65 | 63 | 62 | | 154 | Female | Group A | Unhealthy | Non-Satisfactory | 53 | 58 | 55 | | 155 | Male | Group C | Healthy | Satisfactory | 83 | 87 | 82 | | 156 | Female | Group B | Mixed | Non-Satisfactory | 85 | 80 | 83 | | 157 | Male | Group A | Unhealthy | Satisfactory | 70 | 67 | 72 | | 159 | Female | Group B | Mixed | Satisfactory | 68 | 70 | 68 | | 160 | Female | Group A | Unhealthy | Non-Satisfactory | 67 | 60 | 70 | | 162 | Male | Group B | Mixed | Non-Satisfactory | 85 | 82 | 80 | | 163 | Male | Group A | Unhealthy | Satisfactory | 65 | 62 | 65 | | 164 | Female | Group C | Healthy | Non-Satisfactory | 83 | 87 | 85 | | 165 | Unknown | Group B | Mixed | Satisfactory | 78 | 83 | 78 | | 166 | Female | Group A | Unhealthy | Non-Satisfactory | 55 | 62 | 58 | | 167 | Male | Group C | Healthy | Satisfactory | 80 | 83 | 80 | | 168 | Female | Group B | Mixed | Non-Satisfactory | 67 | 70 | 63 | | 170 | Unknown | Group C | Healthy | Non-Satisfactory | 
82 | 88 | 80 | | 171 | Male | Group B | Mixed | Satisfactory | 80 | 83 | 83 | | 172 | Female | Group A | Unhealthy | Non-Satisfactory | 75 | 70 | 72 | | 173 | Male | Group B | Healthy | Satisfactory | 90 | 87 | 88 | | 174 | Male | Group B | Mixed | Non-Satisfactory | 62 | 68 | 65 | | 175 | Unknown | Group A | Unhealthy | Satisfactory | 62 | 57 | 63 | | 176 | Female | Group C | Healthy | Non-Satisfactory | 77 | 85 | 80 | | 177 | Male | Group B | Mixed | Satisfactory | 68 | 70 | 68 | | 178 | Male | Group A | Unhealthy | Non-Satisfactory | 53 | 60 | 58 | | 180 | Male | Group B | Mixed | Non-Satisfactory | 70 | 67 | 75 | | 181 | Unknown | Group A | Unhealthy | Satisfactory | 65 | 62 | 65 | | 182 | Female | Group C | Healthy | Non-Satisfactory | 83 | 87 | 85 | | 183 | Unknown | Group A | Mixed | Satisfactory | 75 | 78 | 77 | | 184 | Female | Group A | Unhealthy | Non-Satisfactory | 55 | 62 | 58 | | 185 | Male | Group C | Healthy | Satisfactory | 80 | 83 | 80 | | 186 | Male | Group A | Mixed | Non-Satisfactory | 85 | 82 | 80 | | 187 | Male | Group A | Unhealthy | Satisfactory | 78 | 75 | 79 | | 188 | Unknown | Group C | Healthy | Non-Satisfactory | 80 | 85 | 83 | | 189 | Female | Group B | Mixed | Satisfactory | 70 | 77 | 70 | | 192 | Female | Group B | Mixed | Non-Satisfactory | 80 | 75 | 83 | | 193 | Male | Group A | Unhealthy | Satisfactory | 53 | 58 | 55 | | 194 | Unknown | Group C | Healthy | Non-Satisfactory | 75 | 78 | 77 | | 195 | Female | Group B | Mixed | Satisfactory | 65 | 63 | 62 | | 196 | Female | Group A | Unhealthy | Non-Satisfactory | 67 | 60 | 70 | | 197 | Male | Group A | Healthy | Satisfactory | 85 | 80 | 87 | | 198 | Male | Group B | Mixed | Non-Satisfactory | 85 | 82 | 80 | | 199 | Male | Group A | Unhealthy | Satisfactory | 72 | 65 | 70 | | 201 | Female | Group B | Mixed | Satisfactory | 68 | 70 | 68 | | 202 | Female | Group A | Unhealthy | Non-Satisfactory | 62 | 57 | 63 | | 203 | Unknown | Group A | Healthy | Satisfactory | 82 | 88 | 80 | | 204 | 
Female | Group B | Mixed | Non-Satisfactory | 80 | 77 | 82 | | 205 | Male | Group A | Unhealthy | Satisfactory | 67 | 60 | 68 | | 207 | Female | Group B | Mixed | Satisfactory | 78 | 83 | 78 | | 208 | Female | Group A | Unhealthy | Non-Satisfactory | 72 | 65 | 70 | | 209 | Unknown | Group C | Healthy | Satisfactory | 77 | 83 | 78 | | 210 | Male | Group B | Mixed | Non-Satisfactory | 62 | 68 | 65 | | 211 | Male | Group A | Unhealthy | Satisfactory | 53 | 58 | 55 | | 213 | Female | Group B | Mixed | Satisfactory | 68 | 70 | 68 | | 214 | Female | Group A | Unhealthy | Non-Satisfactory | 75 | 70 | 72 | | 215 | Unknown | Group B | Healthy | Satisfactory | 77 | 83 | 78 | | 216 | Female | Group B | Mixed | Non-Satisfactory | 67 | 70 | 63 | | 219 | Female | Group B | Mixed | Satisfactory | 85 | 82 | 80 | | 220 | Female | Group A | Unhealthy | Non-Satisfactory | 55 | 62 | 58 | | 221 | Male | Group A | Healthy | Satisfactory | 80 | 83 | 80 | | 222 | Male | Group B | Mixed | Non-Satisfactory | 60 | 63 | 63 | | 223 | Male | Group A | Unhealthy | Satisfactory | 78 | 75 | 79 | | 224 | Female | Group C | Healthy | Non-Satisfactory | 75 | 78 | 77 | | 225 | Unknown | Group B | Mixed | Satisfactory | 70 | 67 | 72 | | 226 | Male | Group A | Unhealthy | Non-Satisfactory | 70 | 65 | 67 | | 228 | Female | Group B | Mixed | Non-Satisfactory | 85 | 82 | 80 | | 229 | Male | Group A | Unhealthy | Satisfactory | 65 | 62 | 65 | | 230 | Female | Group C | Healthy | Non-Satisfactory | 83 | 87 | 85 | | 231 | Unknown | Group B | Mixed | Satisfactory | 75 | 78 | 77 | | 232 | Female | Group A | Unhealthy | Non-Satisfactory | 55 | 62 | 58 | | 233 | Male | Group C | Healthy | Satisfactory | 80 | 83 | 80 | | 234 | Male | Group B | Mixed | Non-Satisfactory | 85 | 82 | 80 | | 235 | Male | Group A | Unhealthy | Satisfactory | 78 | 75 | 79 | | 236 | Female | Group C | Healthy | Non-Satisfactory | 83 | 87 | 85 | | 237 | Unknown | Group A | Mixed | Satisfactory | 80 | 83 | 83 | | 238 | Female | Group B | 
Mixed | Non-Satisfactory | 75 | 70 | 77 | | 239 | Male | Group A | Unhealthy | Non-Satisfactory | 62 | 57 | 63 | | 240 | Unknown | Group C | Healthy | Non-Satisfactory | 82 | 88 | 80 | | 241 | Female | Group B | Mixed | Satisfactory | 80 | 77 | 82 | | 242 | Male | Group A | Unhealthy | Satisfactory | 60 | 63 | 63 | | 244 | Male | Group B | Mixed | Non-Satisfactory | 82 | 80 | 83 | | 245 | Unknown | Group C | Healthy | Satisfactory | 77 | 83 | 78 | | 246 | Male | Group B | Mixed | Non-Satisfactory | 72 | 68 | 70 | | 247 | Female | Group A | Unhealthy | Satisfactory | 65 | 62 | 65 | | 248 | Male | Group C | Healthy | Non-Satisfactory | 80 | 85 | 83 | | 249 | Female | Group A | Mixed | Non-Satisfactory | 70 | 65 | 67 | | 250 | Unknown | Group C | Healthy | Non-Satisfactory | 83 | 80 | 85 | | 251 | Female | Group B | Mixed | Satisfactory | 68 | 70 | 68 | | 252 | Female | Group A | Unhealthy | Non-Satisfactory | 62 | 57 | 63 | | 254 | Female | Group B | Mixed | Non-Satisfactory | 80 | 75 | 83 | | 256 | Female | Group B | Mixed | Satisfactory | 70 | 77 | 70 | | 258 | Unknown | Group C | Healthy | Non-Satisfactory | 75 | 78 | 77 | | 259 | Female | Group B | Mixed | Non-Satisfactory | 80 | 77 | 82 | | 260 | Male | Group A | Unhealthy | Satisfactory | 55 | 62 | 58 | | 261 | Unknown | Group C | Healthy | Satisfactory | 82 | 88 | 80 | | 262 | Female | Group B | Mixed | Non-Satisfactory | 72 | 65 | 70 | | 263 | Male | Group A | Unhealthy | Non-Satisfactory | 65 | 62 | 65 | | 265 | Male | Group B | Mixed | Satisfactory | 77 | 85 | 82 | | 266 | Female | Group A | Unhealthy | Non-Satisfactory | 55 | 62 | 58 | | 267 | Unknown | Group C | Healthy | Satisfactory | 83 | 80 | 85 | | 268 | Female | Group B | Mixed | Non-Satisfactory | 85 | 82 | 80 | | 269 | Male | Group A | Unhealthy | Satisfactory | 62 | 57 | 63 | | 270 | Female | Group C | Healthy | Non-Satisfactory | 77 | 85 | 80 | | 271 | Unknown | Group B | Mixed | Satisfactory | 70 | 67 | 72 | | 272 | Male | Group A | Unhealthy | 
Non-Satisfactory | 53 | 60 | 58 | | 273 | Male | Group C | Healthy | Satisfactory | 75 | 80 | 77 | | 274 | Female | Group B | Mixed | Non-Satisfactory | 80 | 75 | 83 | | 277 | Female | Group B | Mixed | Satisfactory | 68 | 72 | 65 | | 278 | Male | Group A | Unhealthy | Non-Satisfactory | 70 | 65 | 67 | | 279 | Unknown | Group C | Healthy | Satisfactory | 80 | 83 | 80 | | 280 | Female | Group B | Mixed | Non-Satisfactory | 75 | 72 | 75 | | 282 | Female | Group C | Healthy | Non-Satisfactory | 78 | 83 | 77 | | 283 | Unknown | Group B | Mixed | Satisfactory | 70 | 67 | 72 | | 284 | Female | Group A | Unhealthy | Non-Satisfactory | 62 | 57 | 63 | | 285 | Male | Group C | Healthy | Satisfactory | 90 | 87 | 88 | | 286 | Male | Group B | Mixed | Non-Satisfactory | 82 | 80 | 83 | | 287 | Unknown | Group C | Healthy | Satisfactory | 77 | 83 | 78 | | 288 | Female | Group B | Mixed | Non-Satisfactory | 72 | 70 | 73 | | 289 | Male | Group A | Unhealthy | Satisfactory | 65 | 62 | 65 | | 291 | Unknown | Group B | Mixed | Satisfactory | 70 | 63 | 60 | | 292 | Female | Group A | Unhealthy | Non-Satisfactory | 55 | 62 | 58 | | 293 | Male | Group C | Healthy | Satisfactory | 75 | 80 | 77 | | 294 | Male | Group B | Mixed | Non-Satisfactory | 85 | 82 | 80 | | 295 | Unknown | Group A | Mixed | Satisfactory | 80 | 75 | 77 | | 296 | Female | Group C | Healthy | Non-Satisfactory | 77 | 83 | 78 | | 297 | Female | Group B | Mixed | Non-Satisfactory | 67 | 72 | 67 | | 298 | Male | Group A | Unhealthy | Satisfactory | 67 | 60 | 68 | | 299 | Male | Group B | Healthy | Satisfactory | 88 | 85 | 87 | | 300 | Female | Group A | Mixed | Non-Satisfactory | 78 | 75 | 79 | | 301 | Male | Group C | Unhealthy | Satisfactory | 75 | 78 | 72 | | 302 | Female | Group B | Mixed | Non-Satisfactory | 72 | 65 | 70 | | 303 | Male | Group A | Healthy | Non-Satisfactory | 85 | 82 | 80 | | 304 | Female | Group C | Healthy | Non-Satisfactory | 77 | 83 | 78 | | 305 | Male | Group A | Mixed | Non-Satisfactory | 72 | 
65 | 70 | | 306 | Female | Group B | Unhealthy | Satisfactory | 72 | 78 | 70 | | 307 | Unknown | Group A | Healthy | Satisfactory | 82 | 88 | 80 | | 308 | Female | Group C | Mixed | Non-Satisfactory | 72 | 75 | 77 | | 309 | Male | Group B | Mixed | Non-Satisfactory | 62 | 68 | 65 | | 310 | Female | Group A | Unhealthy | Satisfactory | 53 | 60 | 58 | | 312 | Female | Group B | Mixed | Non-Satisfactory | 80 | 77 | 82 | | 313 | Male | Group A | Unhealthy | Non-Satisfactory | 67 | 60 | 68 | | 314 | Unknown | Group C | Healthy | Satisfactory | 77 | 83 | 78 | | 315 | Female | Group B | Mixed | Satisfactory | 75 | 72 | 75 | | 318 | Male | Group B | Mixed | Non-Satisfactory | 85 | 82 | 80 |
Step 3: Training a Model¶
Now that you have cleaned the dataset, in this step you will train a regression model to predict the Maths score based on student attributes.
Split the Data¶
Create a training set and an evaluation set with an 80%/20% split.
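The sizes of the two files follow from simple arithmetic. The sketch below assumes the evaluation share is rounded up and the training set receives the remaining rows, which to our understanding is how scikit-learn's `train_test_split` behaves when only `test_size` is given as a fraction. The helper name `split_sizes` is hypothetical.

```python
import math


def split_sizes(n_rows, eval_fraction=0.2):
    """Return (n_train, n_eval) for a fractional split.

    Assumption: the evaluation count is rounded up and the training
    set gets whatever is left. Hypothetical helper for illustration.
    """
    n_eval = math.ceil(n_rows * eval_fraction)
    n_train = n_rows - n_eval
    return n_train, n_eval


# 272 is the row count reported after outlier removal.
n_train, n_eval = split_sizes(272)
print("train rows:", n_train)
print("evaluate rows:", n_eval)
```

Because 20% of 272 is 54.4, the evaluation set is rounded up to 55 rows and the training set holds the remaining 217.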
instr = """
Split the data into 2 files. 80% in train.csv and 20% in evaluate.csv.
Print the number of rows in each file excluding the header."""
response = run_code_interpreter(instructions=instr, filenames=['students_clean_v2.csv'])
process_response(response)
Generated Code by Code Interpreter
```python
import pandas as pd
from sklearn.model_selection import train_test_split
# Read the data from the uploaded file
data = pd.read_csv("students_clean_v2.csv")
# Split the data into train and test sets
train_data, eval_data = train_test_split(data, test_size=0.2, random_state=42)
# Save the train data to a CSV file
train_data.to_csv("train.csv", index=False)
# Save the evaluation data to a CSV file
eval_data.to_csv("evaluate.csv", index=False)
# Print the number of rows in each file
print("Number of rows in train.csv:", len(train_data))
print("Number of rows in evaluate.csv:", len(eval_data))
```
Code Execution Results
Executed Code Output:
Number of rows in train.csv: 217
Number of rows in evaluate.csv: 55
Files Created (Click on filename to view content):
evaluate.csv
| StudentID | Gender | ExtraActivitiesGroup | EatingHabits | SleepingHabits | Reading | Writing | Maths |
|------------:|:---------|:-----------------------|:---------------|:-----------------|----------:|----------:|--------:|
| 35 | Unknown | Group A | Mixed | Satisfactory | 65 | 63 | 60 |
| 135 | Male | Group B | Mixed | Satisfactory | 78 | 83 | 78 |
| 90 | Unknown | Group A | Unhealthy | Non-Satisfactory | 65 | 62 | 63 |
| 149 | Unknown | Group C | Healthy | Satisfactory | 80 | 85 | 82 |
| 231 | Unknown | Group B | Mixed | Satisfactory | 75 | 78 | 77 |
| 162 | Male | Group B | Mixed | Non-Satisfactory | 85 | 82 | 80 |
| 245 | Unknown | Group C | Healthy | Satisfactory | 77 | 83 | 78 |
| 52 | Female | Group C | Healthy | Non-Satisfactory | 78 | 83 | 77 |
| 185 | Male | Group C | Healthy | Satisfactory | 80 | 83 | 80 |
| 291 | Unknown | Group B | Mixed | Satisfactory | 70 | 63 | 60 |
| 215 | Unknown | Group B | Healthy | Satisfactory | 77 | 83 | 78 |
| 314 | Unknown | Group C | Healthy | Satisfactory | 77 | 83 | 78 |
| 267 | Unknown | Group C | Healthy | Satisfactory | 83 | 80 | 85 |
| 93 | Male | Group A | Unhealthy | Satisfactory | 62 | 67 | 65 |
| 194 | Unknown | Group C | Healthy | Non-Satisfactory | 75 | 78 | 77 |
| 229 | Male | Group A | Unhealthy | Satisfactory | 65 | 62 | 65 |
| 266 | Female | Group A | Unhealthy | Non-Satisfactory | 55 | 62 | 58 |
| 172 | Female | Group A | Unhealthy | Non-Satisfactory | 75 | 70 | 72 |
| 121 | Male | Group C | Healthy | Satisfactory | 75 | 80 | 77 |
| 68 | Male | Group B | Mixed | Non-Satisfactory | 65 | 72 | 67 |
| 260 | Male | Group A | Unhealthy | Satisfactory | 55 | 62 | 58 |
| 312 | Female | Group B | Mixed | Non-Satisfactory | 80 | 77 | 82 |
| 53 | Male | Group B | Mixed | Satisfactory | 65 | 63 | 62 |
| 48 | Male | Group B | Unhealthy | Non-Satisfactory | 60 | 63 | 58 |
| 219 | Female | Group B | Mixed | Satisfactory | 85 | 82 | 80 |
| 11 | Female | Group B | Unhealthy | Satisfactory | 65 | 68 | 70 |
| 27 | Male | Group A | Unhealthy | Satisfactory | 67 | 60 | 65 |
| 234 | Male | Group B | Mixed | Non-Satisfactory | 85 | 82 | 80 |
| 126 | Male | Group B | Mixed | Non-Satisfactory | 62 | 68 | 60 |
| 29 | Female | Group A | Mixed | Satisfactory | 55 | 62 | 58 |
| 131 | Unknown | Group C | Healthy | Satisfactory | 77 | 83 | 78 |
| 78 | Male | Group A | Unhealthy | Non-Satisfactory | 70 | 65 | 67 |
| 170 | Unknown | Group C | Healthy | Non-Satisfactory | 82 | 88 | 80 |
| 263 | Male | Group A | Unhealthy | Non-Satisfactory | 65 | 62 | 65 |
| 296 | Female | Group C | Healthy | Non-Satisfactory | 77 | 83 | 78 |
| 8 | Female | Group B | Mixed | Non-Satisfactory | 65 | 70 | 67 |
| 139 | Unknown | Group A | Unhealthy | Satisfactory | 65 | 62 | 65 |
| 77 | Female | Group B | Mixed | Satisfactory | 72 | 70 | 73 |
| 138 | Male | Group B | Mixed | Non-Satisfactory | 67 | 70 | 63 |
| 137 | Male | Group C | Healthy | Satisfactory | 80 | 83 | 80 |
| 30 | Male | Group B | Unhealthy | Non-Satisfactory | 78 | 75 | 72 |
| 145 | Unknown | Group A | Unhealthy | Satisfactory | 60 | 63 | 63 |
| 287 | Unknown | Group C | Healthy | Satisfactory | 77 | 83 | 78 |
| 23 | Male | Group A | Mixed | Satisfactory | 60 | 63 | 60 |
| 88 | Male | Group C | Healthy | Non-Satisfactory | 75 | 78 | 73 |
| 252 | Female | Group A | Unhealthy | Non-Satisfactory | 62 | 57 | 63 |
| 103 | Female | Group C | Healthy | Satisfactory | 83 | 87 | 85 |
| 244 | Male | Group B | Mixed | Non-Satisfactory | 82 | 80 | 83 |
| 108 | Female | Group A | Unhealthy | Non-Satisfactory | 72 | 65 | 70 |
| 211 | Male | Group A | Unhealthy | Satisfactory | 53 | 58 | 55 |
| 19 | Unknown | Group C | Healthy | Satisfactory | 88 | 85 | 87 |
| 178 | Male | Group A | Unhealthy | Non-Satisfactory | 53 | 60 | 58 |
| 272 | Male | Group A | Unhealthy | Non-Satisfactory | 53 | 60 | 58 |
| 294 | Male | Group B | Mixed | Non-Satisfactory | 85 | 82 | 80 |
| 133 | Male | Group A | Unhealthy | Satisfactory | 62 | 67 | 60 |

train.csv
| StudentID | Gender | ExtraActivitiesGroup | EatingHabits | SleepingHabits | Reading | Writing | Maths | |------------:|:---------|:-----------------------|:---------------|:-----------------|----------:|----------:|--------:| | 38 | Female | Group A | Mixed | Non-Satisfactory | 75 | 70 | 72 | | 216 | Female | Group B | Mixed | Non-Satisfactory | 67 | 70 | 63 | | 167 | Male | Group C | Healthy | Satisfactory | 80 | 83 | 80 | | 232 | Female | Group A | Unhealthy | Non-Satisfactory | 55 | 62 | 58 | | 42 | Male | Group B | Unhealthy | Non-Satisfactory | 67 | 62 | 63 | | 20 | Female | Group B | Mixed | Non-Satisfactory | 67 | 75 | 68 | | 86 | Unknown | Group B | Mixed | Non-Satisfactory | 67 | 72 | 67 | | 174 | Male | Group B | Mixed | Non-Satisfactory | 62 | 68 | 65 | | 13 | Unknown | Group C | Unhealthy | Satisfactory | 78 | 75 | 79 | | 273 | Male | Group C | Healthy | Satisfactory | 75 | 80 | 77 | | 76 | Unknown | Group C | Healthy | Non-Satisfactory | 88 | 83 | 87 | | 297 | Female | Group B | Mixed | Non-Satisfactory | 67 | 72 | 67 | | 265 | Male | Group B | Mixed | Satisfactory | 77 | 85 | 82 | | 214 | Female | Group A | Unhealthy | Non-Satisfactory | 75 | 70 | 72 | | 83 | Male | Group B | Mixed | Satisfactory | 68 | 70 | 68 | | 22 | Female | Group C | Healthy | Non-Satisfactory | 80 | 77 | 82 | | 118 | Female | Group C | Healthy | Non-Satisfactory | 77 | 85 | 80 | | 230 | Female | Group C | Healthy | Non-Satisfactory | 83 | 87 | 85 | | 130 | Female | Group A | Unhealthy | Non-Satisfactory | 75 | 70 | 72 | | 199 | Male | Group A | Unhealthy | Satisfactory | 72 | 65 | 70 | | 98 | Male | Group B | Mixed | Non-Satisfactory | 62 | 68 | 65 | | 63 | Male | Group A | Unhealthy | Satisfactory | 62 | 57 | 60 | | 112 | Unknown | Group C | Healthy | Non-Satisfactory | 72 | 78 | 73 | | 235 | Male | Group A | Unhealthy | Satisfactory | 78 | 75 | 79 | | 44 | Female | Group A | Mixed | Non-Satisfactory | 80 | 75 | 77 | | 181 | Unknown | Group A | Unhealthy | Satisfactory | 65 | 
62 | 65 | | 96 | Female | Group A | Unhealthy | Non-Satisfactory | 67 | 60 | 68 | | 295 | Unknown | Group A | Mixed | Satisfactory | 80 | 75 | 77 | | 107 | Unknown | Group B | Mixed | Satisfactory | 78 | 83 | 78 | | 236 | Female | Group C | Healthy | Non-Satisfactory | 83 | 87 | 85 | | 147 | Female | Group B | Mixed | Satisfactory | 75 | 72 | 77 | | 144 | Male | Group B | Mixed | Non-Satisfactory | 82 | 80 | 83 | | 89 | Male | Group B | Mixed | Satisfactory | 82 | 80 | 83 | | 213 | Female | Group B | Mixed | Satisfactory | 68 | 70 | 68 | | 129 | Male | Group B | Mixed | Satisfactory | 85 | 82 | 80 | | 269 | Male | Group A | Unhealthy | Satisfactory | 62 | 57 | 63 | | 302 | Female | Group B | Mixed | Non-Satisfactory | 72 | 65 | 70 | | 305 | Male | Group A | Mixed | Non-Satisfactory | 72 | 65 | 70 | | 246 | Male | Group B | Mixed | Non-Satisfactory | 72 | 68 | 70 | | 228 | Female | Group B | Mixed | Non-Satisfactory | 85 | 82 | 80 | | 182 | Female | Group C | Healthy | Non-Satisfactory | 83 | 87 | 85 | | 79 | Male | Group C | Healthy | Satisfactory | 80 | 85 | 80 | | 3 | Unknown | Group A | Unhealthy | Satisfactory | 55 | 60 | 58 | | 87 | Female | Group A | Unhealthy | Satisfactory | 53 | 60 | 58 | | 173 | Male | Group B | Healthy | Satisfactory | 90 | 87 | 88 | | 164 | Female | Group C | Healthy | Non-Satisfactory | 83 | 87 | 85 | | 168 | Female | Group B | Mixed | Non-Satisfactory | 67 | 70 | 63 | | 111 | Male | Group A | Unhealthy | Satisfactory | 60 | 63 | 63 | | 125 | Unknown | Group C | Healthy | Satisfactory | 80 | 83 | 80 | | 165 | Unknown | Group B | Mixed | Satisfactory | 78 | 83 | 78 | | 203 | Unknown | Group A | Healthy | Satisfactory | 82 | 88 | 80 | | 307 | Unknown | Group A | Healthy | Satisfactory | 82 | 88 | 80 | | 84 | Male | Group A | Unhealthy | Non-Satisfactory | 62 | 57 | 63 | | 136 | Female | Group A | Unhealthy | Non-Satisfactory | 55 | 62 | 58 | | 34 | Male | Group C | Healthy | Non-Satisfactory | 77 | 83 | 75 | | 251 | Female | Group B | 
Mixed | Satisfactory | 68 | 70 | 68 | | 226 | Male | Group A | Unhealthy | Non-Satisfactory | 70 | 65 | 67 | | 274 | Female | Group B | Mixed | Non-Satisfactory | 80 | 75 | 83 | | 208 | Female | Group A | Unhealthy | Non-Satisfactory | 72 | 65 | 70 | | 308 | Female | Group C | Mixed | Non-Satisfactory | 72 | 75 | 77 | | 7 | Male | Group C | Healthy | Satisfactory | 80 | 85 | 83 | | 64 | Female | Group C | Healthy | Non-Satisfactory | 90 | 87 | 88 | | 304 | Female | Group C | Healthy | Non-Satisfactory | 77 | 83 | 78 | | 205 | Male | Group A | Unhealthy | Satisfactory | 67 | 60 | 68 | | 193 | Male | Group A | Unhealthy | Satisfactory | 53 | 58 | 55 | | 75 | Male | Group A | Unhealthy | Satisfactory | 53 | 58 | 55 | | 184 | Female | Group A | Unhealthy | Non-Satisfactory | 55 | 62 | 58 | | 97 | Unknown | Group C | Healthy | Satisfactory | 77 | 83 | 78 | | 132 | Male | Group B | Mixed | Non-Satisfactory | 80 | 77 | 82 | | 288 | Female | Group B | Mixed | Non-Satisfactory | 72 | 70 | 73 | | 202 | Female | Group A | Unhealthy | Non-Satisfactory | 62 | 57 | 63 | | 36 | Female | Group B | Unhealthy | Non-Satisfactory | 72 | 78 | 70 | | 15 | Male | Group A | Healthy | Satisfactory | 82 | 87 | 80 | | 40 | Unknown | Group C | Healthy | Non-Satisfactory | 82 | 88 | 80 | | 33 | Male | Group B | Unhealthy | Satisfactory | 62 | 67 | 65 | | 155 | Male | Group C | Healthy | Satisfactory | 83 | 87 | 82 | | 59 | Male | Group B | Mixed | Satisfactory | 78 | 80 | 77 | | 110 | Female | Group B | Mixed | Non-Satisfactory | 68 | 72 | 63 | | 196 | Female | Group A | Unhealthy | Non-Satisfactory | 67 | 60 | 70 | | 210 | Male | Group B | Mixed | Non-Satisfactory | 62 | 68 | 65 | | 47 | Male | Group A | Mixed | Satisfactory | 75 | 72 | 73 | | 289 | Male | Group A | Unhealthy | Satisfactory | 65 | 62 | 65 | | 207 | Female | Group B | Mixed | Satisfactory | 78 | 83 | 78 | | 160 | Female | Group A | Unhealthy | Non-Satisfactory | 67 | 60 | 70 | | 31 | Female | Group C | Healthy | Satisfactory | 
85 | 87 | 83 | | 309 | Male | Group B | Mixed | Non-Satisfactory | 62 | 68 | 65 | | 166 | Female | Group A | Unhealthy | Non-Satisfactory | 55 | 62 | 58 | | 186 | Male | Group A | Mixed | Non-Satisfactory | 85 | 82 | 80 | | 1 | Male | Group X | Healthy | Satisfactory | 75 | 80 | 78 | | 271 | Unknown | Group B | Mixed | Satisfactory | 70 | 67 | 72 | | 116 | Male | Group B | Mixed | Non-Satisfactory | 85 | 82 | 80 | | 284 | Female | Group A | Unhealthy | Non-Satisfactory | 62 | 57 | 63 | | 237 | Unknown | Group A | Mixed | Satisfactory | 80 | 83 | 83 | | 113 | Female | Group B | Mixed | Satisfactory | 80 | 83 | 83 | | 41 | Female | Group A | Mixed | Satisfactory | 77 | 72 | 70 | | 69 | Male | Group A | Unhealthy | Satisfactory | 67 | 60 | 68 | | 176 | Female | Group C | Healthy | Non-Satisfactory | 77 | 85 | 80 | | 248 | Male | Group C | Healthy | Non-Satisfactory | 80 | 85 | 83 | | 300 | Female | Group A | Mixed | Non-Satisfactory | 78 | 75 | 79 | | 14 | Female | Group B | Mixed | Non-Satisfactory | 63 | 70 | 65 | | 313 | Male | Group A | Unhealthy | Non-Satisfactory | 67 | 60 | 68 | | 270 | Female | Group C | Healthy | Non-Satisfactory | 77 | 85 | 80 | | 32 | Female | Group A | Mixed | Non-Satisfactory | 70 | 65 | 67 | | 209 | Unknown | Group C | Healthy | Satisfactory | 77 | 83 | 78 | | 5 | Male | Group B | Mixed | Satisfactory | 60 | 65 | 63 | | 141 | Female | Group B | Mixed | Satisfactory | 70 | 77 | 70 | | 37 | Male | Group C | Healthy | Satisfactory | 80 | 87 | 83 | | 239 | Male | Group A | Unhealthy | Non-Satisfactory | 62 | 57 | 63 | | 189 | Female | Group B | Mixed | Satisfactory | 70 | 77 | 70 | | 197 | Male | Group A | Healthy | Satisfactory | 85 | 80 | 87 | | 241 | Female | Group B | Mixed | Satisfactory | 80 | 77 | 82 | | 163 | Male | Group A | Unhealthy | Satisfactory | 65 | 62 | 65 | | 71 | Male | Group B | Mixed | Satisfactory | 77 | 85 | 82 | | 159 | Female | Group B | Mixed | Satisfactory | 68 | 70 | 68 | | 150 | Female | Group B | Mixed | 
Non-Satisfactory | 80 | 75 | 83 | | 306 | Female | Group B | Unhealthy | Satisfactory | 72 | 78 | 70 | | 286 | Male | Group B | Mixed | Non-Satisfactory | 82 | 80 | 83 | | 80 | Female | Group B | Mixed | Non-Satisfactory | 75 | 72 | 75 | | 261 | Unknown | Group C | Healthy | Satisfactory | 82 | 88 | 80 | | 74 | Male | Group B | Mixed | Non-Satisfactory | 68 | 72 | 65 | | 51 | Male | Group B | Unhealthy | Satisfactory | 70 | 67 | 65 | | 220 | Female | Group A | Unhealthy | Non-Satisfactory | 55 | 62 | 58 | | 183 | Unknown | Group A | Mixed | Satisfactory | 75 | 78 | 77 | | 46 | Female | Group C | Healthy | Non-Satisfactory | 83 | 80 | 85 | | 143 | Male | Group C | Healthy | Satisfactory | 85 | 80 | 82 | | 180 | Male | Group B | Mixed | Non-Satisfactory | 70 | 67 | 75 | | 28 | Male | Group C | Healthy | Non-Satisfactory | 72 | 78 | 73 | | 254 | Female | Group B | Mixed | Non-Satisfactory | 80 | 75 | 83 | | 249 | Female | Group A | Mixed | Non-Satisfactory | 70 | 65 | 67 | | 92 | Male | Group B | Mixed | Non-Satisfactory | 85 | 80 | 82 | | 45 | Unknown | Group B | Unhealthy | Satisfactory | 72 | 75 | 73 | | 283 | Unknown | Group B | Mixed | Satisfactory | 70 | 67 | 72 | | 55 | Unknown | Group C | Healthy | Satisfactory | 75 | 78 | 73 | | 109 | Male | Group C | Healthy | Satisfactory | 83 | 80 | 85 | | 259 | Female | Group B | Mixed | Non-Satisfactory | 80 | 77 | 82 | | 278 | Male | Group A | Unhealthy | Non-Satisfactory | 70 | 65 | 67 | | 188 | Unknown | Group C | Healthy | Non-Satisfactory | 80 | 85 | 83 | | 50 | Female | Group A | Mixed | Non-Satisfactory | 85 | 80 | 82 | | 171 | Male | Group B | Mixed | Satisfactory | 80 | 83 | 83 | | 224 | Female | Group C | Healthy | Non-Satisfactory | 75 | 78 | 77 | | 247 | Female | Group A | Unhealthy | Satisfactory | 65 | 62 | 65 | | 4 | Female | Group C | Healthy | Non-Satisfactory | 70 | 75 | 73 | | 122 | Female | Group B | Mixed | Non-Satisfactory | 67 | 72 | 67 | | 61 | Female | Group C | Healthy | Satisfactory | 83 | 80 | 
82 | | 156 | Female | Group B | Mixed | Non-Satisfactory | 85 | 80 | 83 | | 2 | Female | Group B | Mixed | Non-Satisfactory | 73.0221 | 70 | 67 | | 262 | Female | Group B | Mixed | Non-Satisfactory | 72 | 65 | 70 | | 120 | Female | Group A | Unhealthy | Non-Satisfactory | 53 | 60 | 58 | | 57 | Male | Group A | Unhealthy | Satisfactory | 62 | 65 | 63 | | 192 | Female | Group B | Mixed | Non-Satisfactory | 80 | 75 | 83 | | 91 | Female | Group C | Healthy | Satisfactory | 80 | 83 | 80 | | 240 | Unknown | Group C | Healthy | Non-Satisfactory | 82 | 88 | 80 | | 39 | Male | Group B | Unhealthy | Satisfactory | 65 | 67 | 60 | | 279 | Unknown | Group C | Healthy | Satisfactory | 80 | 83 | 80 | | 9 | Male | Group A | Unhealthy | Satisfactory | 55 | 60 | 58 | | 201 | Female | Group B | Mixed | Satisfactory | 68 | 70 | 68 | | 233 | Male | Group C | Healthy | Satisfactory | 80 | 83 | 80 | | 285 | Male | Group C | Healthy | Satisfactory | 90 | 87 | 88 | | 127 | Male | Group A | Unhealthy | Satisfactory | 55 | 60 | 58 | | 104 | Male | Group B | Mixed | Non-Satisfactory | 80 | 77 | 82 | | 95 | Female | Group B | Mixed | Satisfactory | 77 | 75 | 78 | | 151 | Male | Group A | Unhealthy | Satisfactory | 78 | 75 | 79 | | 60 | Unknown | Group A | Unhealthy | Non-Satisfactory | 67 | 60 | 65 | | 102 | Unknown | Group A | Unhealthy | Non-Satisfactory | 67 | 62 | 65 | | 10 | Male | Group X | Mixed | Non-Satisfactory | 80 | 78 | 85 | | 17 | Female | Group A | Mixed | Satisfactory | 67 | 65 | 63 | | 67 | Female | Group C | Healthy | Satisfactory | 77 | 85 | 80 | | 292 | Female | Group A | Unhealthy | Non-Satisfactory | 55 | 62 | 58 | | 154 | Female | Group A | Unhealthy | Non-Satisfactory | 53 | 58 | 55 | | 21 | Male | Group A | Unhealthy | Satisfactory | 53 | 58 | 55 | | 195 | Female | Group B | Mixed | Satisfactory | 65 | 63 | 62 | | 82 | Female | Group C | Healthy | Non-Satisfactory | 80 | 77 | 82 | | 298 | Male | Group A | Unhealthy | Satisfactory | 67 | 60 | 68 | | 157 | Male | Group A 
| Unhealthy | Satisfactory | 70 | 67 | 72 | | 293 | Male | Group C | Healthy | Satisfactory | 75 | 80 | 77 | | 268 | Female | Group B | Mixed | Non-Satisfactory | 85 | 82 | 80 | | 303 | Male | Group A | Healthy | Non-Satisfactory | 85 | 82 | 80 | | 73 | Female | Group C | Healthy | Satisfactory | 83 | 87 | 85 | | 62 | Male | Group B | Mixed | Non-Satisfactory | 72 | 68 | 70 | | 124 | Female | Group A | Mixed | Non-Satisfactory | 62 | 57 | 60 | | 58 | Female | Group C | Healthy | Non-Satisfactory | 88 | 85 | 83 | | 280 | Female | Group B | Mixed | Non-Satisfactory | 75 | 72 | 75 | | 204 | Female | Group B | Mixed | Non-Satisfactory | 80 | 77 | 82 | | 282 | Female | Group C | Healthy | Non-Satisfactory | 78 | 83 | 77 | | 223 | Male | Group A | Unhealthy | Satisfactory | 78 | 75 | 79 | | 18 | Male | Group B | Unhealthy | Non-Satisfactory | 55 | 62 | 58 | | 242 | Male | Group A | Unhealthy | Satisfactory | 60 | 63 | 63 | | 299 | Male | Group B | Healthy | Satisfactory | 88 | 85 | 87 | | 258 | Unknown | Group C | Healthy | Non-Satisfactory | 75 | 78 | 77 | | 198 | Male | Group B | Mixed | Non-Satisfactory | 85 | 82 | 80 | | 66 | Unknown | Group A | Unhealthy | Non-Satisfactory | 55 | 62 | 58 | | 256 | Female | Group B | Mixed | Satisfactory | 70 | 77 | 70 | | 56 | Female | Group B | Mixed | Non-Satisfactory | 70 | 77 | 72 | | 101 | Male | Group B | Mixed | Satisfactory | 70 | 67 | 72 | | 277 | Female | Group B | Mixed | Satisfactory | 68 | 72 | 65 | | 26 | Female | Group B | Mixed | Non-Satisfactory | 58 | 65 | 60 | | 65 | Male | Group B | Mixed | Satisfactory | 85 | 82 | 80 | | 238 | Female | Group B | Mixed | Non-Satisfactory | 75 | 70 | 77 | | 187 | Male | Group A | Unhealthy | Satisfactory | 78 | 75 | 79 | | 310 | Female | Group A | Unhealthy | Satisfactory | 53 | 60 | 58 | | 221 | Male | Group A | Healthy | Satisfactory | 80 | 83 | 80 | | 225 | Unknown | Group B | Mixed | Satisfactory | 70 | 67 | 72 | | 301 | Male | Group C | Unhealthy | Satisfactory | 75 | 78 | 72 
| | 175 | Unknown | Group A | Unhealthy | Satisfactory | 62 | 57 | 63 | | 153 | Unknown | Group B | Mixed | Satisfactory | 65 | 63 | 62 | | 177 | Male | Group B | Mixed | Satisfactory | 68 | 70 | 68 | | 114 | Male | Group A | Unhealthy | Non-Satisfactory | 70 | 65 | 67 | | 100 | Female | Group C | Healthy | Non-Satisfactory | 72 | 75 | 77 | | 250 | Unknown | Group C | Healthy | Non-Satisfactory | 83 | 80 | 85 | | 140 | Female | Group C | Healthy | Non-Satisfactory | 88 | 83 | 87 | | 318 | Male | Group B | Mixed | Non-Satisfactory | 85 | 82 | 80 | | 24 | Female | Group B | Unhealthy | Non-Satisfactory | 65 | 62 | 60 | | 222 | Male | Group B | Mixed | Non-Satisfactory | 60 | 63 | 63 | | 81 | Unknown | Group A | Unhealthy | Satisfactory | 55 | 60 | 58 | | 123 | Male | Group B | Unhealthy | Satisfactory | 70 | 67 | 72 | | 315 | Female | Group B | Mixed | Satisfactory | 75 | 72 | 75 | | 119 | Unknown | Group B | Mixed | Satisfactory | 68 | 70 | 68 |
Train the Model¶
Now train a model to predict the Maths score from the remaining categorical attributes, excluding the Reading, Writing, and StudentID columns.
model_training_instruction = """
Train a regression model to predict Maths score based on other fields.
Exclude Reading and Writing and StudentID columns, and separate the Maths column as a label.
Use the rest of the columns to train a model to predict the Maths score.
All the columns apart from the label are categorical so treat them as such.
Use a sklearn pipeline to do data transformations and modeling together.
At the end export the pipeline as pipeline.pkl.
Do not split the data, the file is only the training data.
Report back MAE and R2 using the training data.
Do not use sklearn.externals.
"""
response = run_code_interpreter(model_training_instruction, ['train.csv'])
process_response(response)
The generated code produced an error OneHotEncoder.__init__() got an unexpected keyword argument 'sparse_out' -Automatic retry attempt # 1/5
The generated code produced an error OneHotEncoder.__init__() got an unexpected keyword argument 'sparse' -Automatic retry attempt # 2/5
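The two messages above show the helper retrying after the generated code failed. As a rough illustration of that pattern, here is a minimal sketch of a retry loop. The function name `run_with_retries`, the `flaky_task` stand-in, and the limit of 5 retries are assumptions made for this example; they are not the notebook's actual helper implementation.

```python
def run_with_retries(task, max_retries=5):
    """Run task(); on failure, retry up to max_retries more times.

    Hypothetical helper for illustration only.
    """
    for retry in range(max_retries + 1):
        try:
            return task()
        except Exception as err:
            # Give up once the retry budget is spent.
            if retry == max_retries:
                raise
            print(
                f"The generated code produced an error {err} "
                f"- automatic retry attempt # {retry + 1}/{max_retries}"
            )


# Stand-in for "generate code and execute it": fails twice, then succeeds.
calls = {"count": 0}


def flaky_task():
    calls["count"] += 1
    if calls["count"] <= 2:
        raise TypeError("got an unexpected keyword argument")
    return "ok"


result = run_with_retries(flaky_task)
```

Because each retry asks the model for fresh code, a later attempt can succeed with different arguments, which is what happened in the run above.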
Generated Code by Code Interpreter
```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
# Load the data
data = pd.read_csv("train.csv")
# Drop unnecessary columns
data = data.drop(["StudentID", "Reading", "Writing"], axis=1)
# Separate the label
y = data["Maths"]
X = data.drop("Maths", axis=1)
# Create a pipeline for data transformation and modeling
categorical_transformer = OneHotEncoder(handle_unknown="ignore")
preprocessor = ColumnTransformer(
    transformers=[
        ("cat", categorical_transformer, X.select_dtypes("object").columns)
    ]
)
model = LinearRegression()
pipeline = Pipeline(steps=[("preprocessor", preprocessor), ("model", model)])
# Fit the pipeline on the training data
pipeline.fit(X, y)
# Export the pipeline
import pickle
with open("pipeline.pkl", "wb") as f:
    pickle.dump(pipeline, f)
# Evaluate the model
y_pred = pipeline.predict(X)
mae = mean_absolute_error(y, y_pred)
r2 = r2_score(y, y_pred)
print(f"MAE: {mae}")
print(f"R2: {r2}")
```
Code Execution Results
Executed Code Output:
MAE: 5.117511520737327
R2: 0.4954749879656516
Files Created (Click on filename to view content):
pipeline.pkl
Preview N/A
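The generated pipeline one-hot encodes every categorical column and sets `handle_unknown="ignore"`. To make that behaviour concrete, here is a small pure-Python sketch of the idea, not scikit-learn's implementation. The helper names `fit_categories` and `one_hot` and the unseen value `"Vegan"` are made up for this illustration.

```python
def fit_categories(rows, columns):
    """Collect the sorted set of values seen in each column at fit time."""
    return {col: sorted({row[col] for row in rows}) for col in columns}


def one_hot(row, categories):
    """Encode one row as 0/1 flags, one flag per known category value.

    A value that was not seen at fit time sets no flag, so its column
    block is all zeros (the "ignore" behaviour).
    """
    encoded = []
    for col, values in categories.items():
        encoded.extend(1 if row[col] == value else 0 for value in values)
    return encoded


train_rows = [
    {"Gender": "Female", "EatingHabits": "Mixed"},
    {"Gender": "Male", "EatingHabits": "Healthy"},
    {"Gender": "Unknown", "EatingHabits": "Unhealthy"},
]
categories = fit_categories(train_rows, ["Gender", "EatingHabits"])

# Both values were seen during fitting.
seen = one_hot({"Gender": "Male", "EatingHabits": "Mixed"}, categories)
# "Vegan" is a hypothetical value absent from the training rows.
unseen = one_hot({"Gender": "Male", "EatingHabits": "Vegan"}, categories)
```

Ignoring unknown values lets prediction proceed on rows whose categories never appeared in `train.csv`, at the cost of treating such a value as carrying no information.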
Step 5: Using the Model to Predict¶
In this step you will use the exported pipeline.pkl to run predictions on the held-out evaluation data in evaluate.csv.
model_predict_instruction = """
Load the .pkl file and run predictions on evaluate.csv.
Export predictions in a new predictions.csv.
The prediction should be in new column called 'pred'.
Calculate and print MAE and R2 using columns Maths and pred.
Do not use sklearn.externals.
"""
response = run_code_interpreter(model_predict_instruction, ['pipeline.pkl','evaluate.csv'])
process_response(response)
Generated Code by Code Interpreter
```python
import pandas as pd
from sklearn.metrics import mean_absolute_error, r2_score
# Load the pipeline and data
pipeline = pd.read_pickle("pipeline.pkl")
data = pd.read_csv("evaluate.csv")
# Make predictions
data["pred"] = pipeline.predict(data)
# Calculate and print MAE and R2
mae = mean_absolute_error(data["Maths"], data["pred"])
r2 = r2_score(data["Maths"], data["pred"])
print(f"MAE: {mae}")
print(f"R2: {r2}")
# Export predictions
data.to_csv("predictions.csv", index=False)
```
Code Execution Results
Executed Code Output:
MAE: 5.4
R2: 0.4059213479543414
Files Created (Click on filename to view content):
predictions.csv
| StudentID | Gender | ExtraActivitiesGroup | EatingHabits | SleepingHabits | Reading | Writing | Maths | pred | |------------:|:---------|:-----------------------|:---------------|:-----------------|----------:|----------:|--------:|-------:| | 35 | Unknown | Group A | Mixed | Satisfactory | 65 | 63 | 60 | 74 | | 135 | Male | Group B | Mixed | Satisfactory | 78 | 83 | 78 | 76 | | 90 | Unknown | Group A | Unhealthy | Non-Satisfactory | 65 | 62 | 63 | 62.5 | | 149 | Unknown | Group C | Healthy | Satisfactory | 80 | 85 | 82 | 80 | | 231 | Unknown | Group B | Mixed | Satisfactory | 75 | 78 | 77 | 74 | | 162 | Male | Group B | Mixed | Non-Satisfactory | 85 | 82 | 80 | 76 | | 245 | Unknown | Group C | Healthy | Satisfactory | 77 | 83 | 78 | 80 | | 52 | Female | Group C | Healthy | Non-Satisfactory | 78 | 83 | 77 | 80 | | 185 | Male | Group C | Healthy | Satisfactory | 80 | 83 | 80 | 82 | | 291 | Unknown | Group B | Mixed | Satisfactory | 70 | 63 | 60 | 74 | | 215 | Unknown | Group B | Healthy | Satisfactory | 77 | 83 | 78 | 79 | | 314 | Unknown | Group C | Healthy | Satisfactory | 77 | 83 | 78 | 80 | | 267 | Unknown | Group C | Healthy | Satisfactory | 83 | 80 | 85 | 80 | | 93 | Male | Group A | Unhealthy | Satisfactory | 62 | 67 | 65 | 64.5 | | 194 | Unknown | Group C | Healthy | Non-Satisfactory | 75 | 78 | 77 | 80 | | 229 | Male | Group A | Unhealthy | Satisfactory | 65 | 62 | 65 | 64.5 | | 266 | Female | Group A | Unhealthy | Non-Satisfactory | 55 | 62 | 58 | 62.5 | | 172 | Female | Group A | Unhealthy | Non-Satisfactory | 75 | 70 | 72 | 62.5 | | 121 | Male | Group C | Healthy | Satisfactory | 75 | 80 | 77 | 82 | | 68 | Male | Group B | Mixed | Non-Satisfactory | 65 | 72 | 67 | 76 | | 260 | Male | Group A | Unhealthy | Satisfactory | 55 | 62 | 58 | 64.5 | | 312 | Female | Group B | Mixed | Non-Satisfactory | 80 | 77 | 82 | 74 | | 53 | Male | Group B | Mixed | Satisfactory | 65 | 63 | 62 | 76 | | 48 | Male | Group B | Unhealthy | Non-Satisfactory | 60 | 63 | 58 | 
64.5 | | 219 | Female | Group B | Mixed | Satisfactory | 85 | 82 | 80 | 74 | | 11 | Female | Group B | Unhealthy | Satisfactory | 65 | 68 | 70 | 62.5 | | 27 | Male | Group A | Unhealthy | Satisfactory | 67 | 60 | 65 | 64.5 | | 234 | Male | Group B | Mixed | Non-Satisfactory | 85 | 82 | 80 | 76 | | 126 | Male | Group B | Mixed | Non-Satisfactory | 62 | 68 | 60 | 76 | | 29 | Female | Group A | Mixed | Satisfactory | 55 | 62 | 58 | 74 | | 131 | Unknown | Group C | Healthy | Satisfactory | 77 | 83 | 78 | 80 | | 78 | Male | Group A | Unhealthy | Non-Satisfactory | 70 | 65 | 67 | 64.5 | | 170 | Unknown | Group C | Healthy | Non-Satisfactory | 82 | 88 | 80 | 80 | | 263 | Male | Group A | Unhealthy | Non-Satisfactory | 65 | 62 | 65 | 64.5 | | 296 | Female | Group C | Healthy | Non-Satisfactory | 77 | 83 | 78 | 80 | | 8 | Female | Group B | Mixed | Non-Satisfactory | 65 | 70 | 67 | 74 | | 139 | Unknown | Group A | Unhealthy | Satisfactory | 65 | 62 | 65 | 62.5 | | 77 | Female | Group B | Mixed | Satisfactory | 72 | 70 | 73 | 74 | | 138 | Male | Group B | Mixed | Non-Satisfactory | 67 | 70 | 63 | 76 | | 137 | Male | Group C | Healthy | Satisfactory | 80 | 83 | 80 | 82 | | 30 | Male | Group B | Unhealthy | Non-Satisfactory | 78 | 75 | 72 | 64.5 | | 145 | Unknown | Group A | Unhealthy | Satisfactory | 60 | 63 | 63 | 62.5 | | 287 | Unknown | Group C | Healthy | Satisfactory | 77 | 83 | 78 | 80 | | 23 | Male | Group A | Mixed | Satisfactory | 60 | 63 | 60 | 76 | | 88 | Male | Group C | Healthy | Non-Satisfactory | 75 | 78 | 73 | 82 | | 252 | Female | Group A | Unhealthy | Non-Satisfactory | 62 | 57 | 63 | 62.5 | | 103 | Female | Group C | Healthy | Satisfactory | 83 | 87 | 85 | 80 | | 244 | Male | Group B | Mixed | Non-Satisfactory | 82 | 80 | 83 | 76 | | 108 | Female | Group A | Unhealthy | Non-Satisfactory | 72 | 65 | 70 | 62.5 | | 211 | Male | Group A | Unhealthy | Satisfactory | 53 | 58 | 55 | 64.5 | | 19 | Unknown | Group C | Healthy | Satisfactory | 88 | 85 | 87 | 80 | | 
178 | Male | Group A | Unhealthy | Non-Satisfactory | 53 | 60 | 58 | 64.5 | | 272 | Male | Group A | Unhealthy | Non-Satisfactory | 53 | 60 | 58 | 64.5 | | 294 | Male | Group B | Mixed | Non-Satisfactory | 85 | 82 | 80 | 76 | | 133 | Male | Group A | Unhealthy | Satisfactory | 62 | 67 | 60 | 64.5 |
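To see what the two reported metrics measure, the sketch below computes MAE and R2 from scratch on the first four rows of the predictions table above (StudentID 35, 135, 90, and 149). The function names are chosen for this example. Because it uses only four rows, the results will differ from the full-table figures reported by Code Interpreter.

```python
def mean_abs_error(y_true, y_pred):
    """Average absolute difference between labels and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares).

    A value of 0 means the predictions do no better than always
    predicting the mean of y_true.
    """
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Maths and pred values copied from the first four rows of predictions.csv.
maths = [60, 78, 63, 82]
pred = [74, 76, 62.5, 80]

mae = mean_abs_error(maths, pred)
r2 = r_squared(maths, pred)
print(f"MAE: {mae}")
print(f"R2: {r2}")
```

MAE is in the same units as the Maths score, so it reads directly as "points off on average", while R2 is unitless and compares the model against a constant mean prediction.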
Cleanup¶
In this tutorial you used Code Interpreter from Vertex AI Extensions to process data, train a linear regression model, and run predictions.
Cleaning Up Extensions¶
Run the next code block to remove the extension you registered in this notebook.
extension_code_interpreter.delete()
If you restarted the notebook runtime, you may have some stray registered Extensions. This next line of code shows you all the Extensions registered in your project:
extensions.Extension.list()
You can use the Google Cloud Console to view and delete any stray registered Extensions.
If you want to delete all the extensions in your project, uncomment this code block (remove the surrounding triple quotes) and run it. WARNING: This cannot be undone!
"""
clean_ids = []
for element in extensions.Extension.list():
    clean_ids.append(str(element).split("extensions/")[1])

for id in clean_ids:
    extension = extensions.Extension(id)
    extension.delete()
"""
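The block above recovers each extension ID by splitting a string on `extensions/`. The sketch below isolates that parsing step so it can be checked without touching any cloud resources. It assumes the resource name follows the pattern `projects/.../locations/.../extensions/<id>`; both the helper name and the example value are hypothetical.

```python
def extension_id_from_resource_name(resource_name: str) -> str:
    """Return the trailing ID of an extension resource name.

    Assumes a name shaped like
    'projects/<project>/locations/<location>/extensions/<id>'.
    """
    marker = "/extensions/"
    if marker not in resource_name:
        raise ValueError(f"Not an extension resource name: {resource_name!r}")
    # Split on the last occurrence so earlier path segments cannot interfere.
    return resource_name.rsplit(marker, 1)[1]


# Hypothetical resource name used only for illustration.
example_name = "projects/123/locations/us-central1/extensions/456"
example_id = extension_id_from_resource_name(example_name)
```

Raising an error on an unexpected string is a deliberate choice here: a bulk delete is safer when it stops on input it does not recognize.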
Cleaning Up Local Files¶
If you used the run_code_interpreter helper function, you can quickly clean up the files created by Code Interpreter. First, take a look at the names of the files it created:
print(set(CODE_INTERPRETER_WRITTEN_FILES))
If you don't want to keep any of these files, uncomment and run the next code block. WARNING: These files will all be deleted, and this cannot be undone.
# import os
# _ = [os.remove(filename) for filename in set(CODE_INTERPRETER_WRITTEN_FILES)
# if os.path.isfile(filename)]
Uncomment to remove two more files created by this notebook:
# os.remove('students.csv')
# os.remove('tree_data.csv')
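The file cleanup above deletes each tracked file once and skips names that no longer exist. Here is a minimal, self-contained sketch of the same logic that runs inside a temporary directory, so nothing in your working directory is touched. The function name `remove_files` and the file names are assumptions for this example.

```python
import os
import tempfile


def remove_files(filenames):
    """Delete each distinct existing file and return the removed paths."""
    removed = []
    for filename in set(filenames):  # set() drops duplicate entries
        if os.path.isfile(filename):  # skip missing files and directories
            os.remove(filename)
            removed.append(filename)
    return sorted(removed)


with tempfile.TemporaryDirectory() as workdir:
    existing = os.path.join(workdir, "predictions.csv")
    missing = os.path.join(workdir, "never_written.csv")
    with open(existing, "w") as handle:
        handle.write("pred\n")

    # The duplicate entry and the missing file are both handled safely.
    removed = remove_files([existing, existing, missing])
    still_there = os.path.exists(existing)
```

Returning the list of removed paths makes it easy to confirm what was deleted before moving on.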